
1st sitemaps junked - temp duplicated content for 2nd

Discussion in 'Google Sitemaps' started by bgillingham, Jul 12, 2006.

  1. #1
    I have had a software download site up and running since May of this year. Since creating and submitting my initial sitemap to Google, I quickly saw my index count go from nothing up to 11,000 - 14,000 (seemingly common numbers for fresh sitemaps of large sites, no?). I used to have just one huge sitemap file, but now I have 6.

    Anyway, after only about 4 days it declined to around 600 - 900. What was it that Google suddenly didn't like? Are all of the links that Google dropped gone for a long time? I got no official warning from them about any violation on my pages, so it is hard to come to any conclusion, but...

    Was There a Reason?

    There was some absolutely evil PHP code in my sidebar that used HIDDEN TEXT to hide potential warning messages whenever a getimagesize() call failed. :eek: The day the index started to drop was the same day one of my sidebar items caused those warning messages to appear on every page of my site.

    I have since fixed the code. Just don't ever allow hidden text on any of your pages - I learned the hard way.
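    For reference, a safer version of that kind of sidebar code fails silently instead of emitting (hidden) warning text into the page. This is only a sketch - the helper name is hypothetical, since the actual sidebar code isn't shown:

    ```php
    <?php
    // Hypothetical sidebar thumbnail helper: if the image can't be read,
    // emit nothing at all rather than a warning wrapped in hidden text.
    function thumbnail_tag($path) {
        $info = @getimagesize($path);   // @ suppresses the PHP warning itself
        if ($info === false) {
            return '';                  // missing/broken image: render nothing
        }
        // $info[3] is a ready-made 'width="..." height="..."' attribute string
        return '<img src="' . htmlspecialchars($path) . '" ' . $info[3] . '>';
    }
    ```

    The point is that nothing hidden ever reaches the HTML, so a broken image can't trip a hidden-text filter.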

    Another thing that changed before the decline was a modification to my robots.txt file that prevents all bots from hitting the download-link pages (they affected the download count), the voting pages (duh), and the screenshot pages. This could account for a modest percentage loss - but not 90%+. Only a few of these links were in my big sitemap file initially, and all references are gone now. At the same time, I also fixed links that returned 403 errors because they contained "FTP" / "Telnet" in the "*.php?xyq=* FTP *" parameters.
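    The robots.txt change described above might look something like this (a sketch only - the paths are hypothetical, since the actual URLs aren't shown in the thread):

    ```
    User-agent: *
    # Keep bots off the action pages so they don't skew counts
    Disallow: /download.php
    Disallow: /vote.php
    Disallow: /screenshot.php
    ```

    Note that Disallow only stops crawling; URLs already indexed can linger for a while, which fits the "modest loss, not 90%+" reasoning above.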

    My Current Sitemaps

    I recreated the sitemaps and attempted something that I think may work... The new sitemaps do not contain the same URLs for the 30,000+ pages I am trying to get indexed. There is a modified parameter value for the program's ID -- which the code can react to accordingly.

    Old URL pattern:
    http://mydomain.com/products-13355.html

    New URL pattern:
    http://mydomain.com/products-A13355.html

    Now, the code has to handle this extra value - and serve up pretty much the same page as the old URL, except for a couple of things... whenever the URL is missing the "A", issue a 303 redirect to the new URL pattern. I haven't added this bit of code yet, which is why I am posting here - I don't want to make another mistake in Google's eyes, or I'll just have to buy a new domain.

    So, to be redundant - the 303 redirect would instruct all search engines that the page will permanently be found at the targeted URL (the new "A" pattern).

    This should, in theory, cause the old URLs to be dropped from the index for good - while the new pattern of URLs should all be fine, since, moving forward, none of them could have any hidden text problems.
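    The redirect logic described above could be sketched like this in PHP (the function name and URL handling are my assumptions, not the site's actual code; whether 303 or 301 is the right status code is debated in the reply below):

    ```php
    <?php
    // Hypothetical helper: given the raw product ID parsed from the URL,
    // return the canonical "A"-prefixed URL to redirect to, or null if
    // the request already uses the new pattern.
    function redirect_target($id) {
        if ($id === '' || $id[0] === 'A') {
            return null;                       // already the new pattern
        }
        return '/products-A' . $id . '.html';  // old pattern: redirect
    }

    // In the page handler, something like:
    // if (($to = redirect_target($id)) !== null) {
    //     header('Location: ' . $to, true, 303); // or 301 for a permanent move
    //     exit;
    // }
    ```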

    I suppose - PM me if you want to check the site out, etc...
     
    bgillingham, Jul 12, 2006 IP
  2. #2
    It sounds like you may want to be using [R=303,L] at the end of your rewrite line to force Google to automatically redirect the request. A standard 303 (See Other) is set up like other 3xx status codes in that Google should NOT automatically redirect the request. They will get the content from your 'A' version and use it on the 'non-A' version of your URL, setting you up for dupe content penalties.

    This is only if both pages (A and non-A) have the same content (I assume they do) and if you are no longer using the non-A version for your pages.

    Aside from this, the 303 status code may not be supported by all spiders (and browsers), which could itself cause problems elsewhere. HTTP 1.0 clients will look at your 303 and scratch their heads. It is clearer than a 302, but only for HTTP 1.1 browsers/spiders/etc. All of the 3-5 major browsers handle 1.1, but spiders are more esoteric and diverse, making assumptions more dangerous.
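    A rewrite line like the one suggested above might look like this in an Apache .htaccess - a sketch based on the URL pattern from the first post (swap R=303 for R=301 if you want to signal a permanent move, which is what search engines conventionally honor):

    ```
    RewriteEngine On
    # Old pattern (no "A") -> new "A"-prefixed pattern, with an explicit status.
    RewriteRule ^products-([0-9]+)\.html$ /products-A$1.html [R=303,L]
    ```

    Without an explicit R= status, mod_rewrite defaults an external redirect to 302, which carries exactly the ambiguity discussed above.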
     
    MaxPowers, Jul 13, 2006 IP