Archive.org

Discussion in 'General Business' started by itsonlyme, Mar 17, 2007.

  1. #1
    I remember reading somewhere that you can stop Archive.org from saving the history of your webpage in their Wayback Machine. Does anyone here know how it's done? I can't remember where I saw it.
     
    itsonlyme, Mar 17, 2007
  2. prodigy

    prodigy Guest

    #2
    Google is your friend :)
     
    prodigy, Mar 17, 2007
  3. adultuserbars

    adultuserbars Peon

    #3
    On the flip side, how can you get listed there, and how can you increase the archive rate?
     
    adultuserbars, Mar 17, 2007
  4. D'Godown

    D'Godown Well-Known Member

    #4
    Ban their crawler in .htaccess.
     
    D'Godown, Mar 17, 2007
  5. bookscanning.com

    bookscanning.com Peon

    #5
    Usually their bots visit your site automatically.

    Timo
     
    bookscanning.com, Mar 18, 2007
  6. itsonlyme

    itsonlyme Peon

    #6
    How do you do that in .htaccess?
     
    itsonlyme, Mar 19, 2007
  7. casperl

    casperl Peon

    #7
    As far as I remember, they have their own bot, and just as you exclude other bots, you can add a line for it to your robots.txt file. Try searching for it.
     
    casperl, Mar 19, 2007
  8. jared

    jared Peon

    #8
    Hope this helps. Cheers :D
     
    jared, Mar 19, 2007
  9. charter

    charter Guest

    #9
    You can also keep them out via robots.txt.
     
    charter, Mar 19, 2007
  10. drig

    drig Peon

    #10
    The Internet Archive is not interested in offering access to Web sites or other Internet documents whose authors do not want their materials in the collection. To remove your site from the Wayback Machine, place a robots.txt file at the top level of your site (e.g. www.yourdomain.com/robots.txt) and then submit your site below.

    The robots.txt file will do two things:

    1. It will remove all documents from your domain from the Wayback Machine.
    2. It will tell us not to crawl your site in the future.

    To exclude the Internet Archive’s crawler (and remove documents from the Wayback Machine) while allowing all other robots to crawl your site, your robots.txt file should say:

    User-agent: ia_archiver
    Disallow: /
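
If you want to sanity-check those two lines before uploading them, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_allowed and the test URL are my own assumptions for illustration; this only shows how a standards-following parser reads the rules, not how the Internet Archive's crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt quoted above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL; substitute a page from your own site.
    page = "http://www.yourdomain.com/index.html"
    print("ia_archiver allowed:", is_allowed(ROBOTS_TXT, "ia_archiver", page))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

With this file, the parser should refuse ia_archiver and let every other user agent through, since no other record applies to them.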

    Robots.txt is the most widely used method for controlling the behavior of automated robots on your site (all major robots, including those of Google, Alta Vista, etc. respect these exclusions). It can be used to block access to the whole domain, or any file or directory within. There are a large number of resources for webmasters and site owners describing this method and how to use it. Here are some:

    * http://www.robotstxt.org/
    * http://pageresource.com/zine/robotstxt.htm
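
For the "any file or directory" case mentioned above, a rough sketch in Python follows. The function build_robots_txt and the example paths /private/ and /drafts/old.html are hypothetical; the check at the end uses the standard urllib.robotparser, so it reflects generic robots.txt semantics rather than any particular crawler's behavior.

```python
from typing import Iterable
from urllib.robotparser import RobotFileParser


def build_robots_txt(user_agent: str, blocked_paths: Iterable[str]) -> str:
    """Build one robots.txt record that blocks the given paths for one crawler."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in blocked_paths]
    return "\n".join(lines) + "\n"


# Hypothetical paths: one directory and one single file.
robots_txt = build_robots_txt("ia_archiver", ["/private/", "/drafts/old.html"])

# Parse the generated text to confirm the rules behave as intended.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

if __name__ == "__main__":
    print(robots_txt)
    for path in ("/private/a.html", "/drafts/old.html", "/index.html"):
        url = "http://www.yourdomain.com" + path
        print(path, "allowed:", parser.can_fetch("ia_archiver", url))
```

Only the listed directory and file are blocked for ia_archiver here; the rest of the site stays crawlable.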

    Once you have put a robots.txt file up, submit your site (www.yourdomain.com) on the form on http://pages.alexa.com/help/webmasters/index.html#crawl_site.

    The robots.txt file must be placed at the root of your domain (www.yourdomain.com/robots.txt). If you cannot put a robots.txt file up, read our exclusion policy. If you think it applies to you, send a request to us at .
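
Since the file only counts at the root of the domain, here is a small sketch that works out where robots.txt has to live for any page on a site. The function name robots_txt_url and the sample page URL are assumptions for illustration; it uses only urllib.parse from the standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Placeholder page; any URL on the same host gives the same answer.
    print(robots_txt_url("http://www.yourdomain.com/blog/post.html?page=2"))
```

A robots.txt placed in a subdirectory such as /blog/robots.txt would not be at this location, so crawlers that follow the convention would not look for it there.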

    Source: http://www.archive.org/about/exclude.php

    I searched for "exclude archive.org" on Google and it was the first result.
     
    drig, Mar 19, 2007