How many GB would it take to index all URLs and links on the planet?

Discussion in 'General Chat' started by szynka, Mar 25, 2005.

  1. #1
    Hi,

    How many GB would it take to index all URLs, and all
    links to those URLs, on the planet?

    If you want to write an answer, please do, and include
    the way you calculated it.

    By the way, if you know any software that could handle it,
    please post it.

    (All in all, I want to do something like link:site.com inanchor:keyword,
    to find out how many sites link to a specific site with a specific anchor.)
     
    szynka, Mar 25, 2005 IP
  2. l234244

    l234244 Peon

    Messages:
    1,225
    Likes Received:
    50
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Are you for real?
     
    l234244, Mar 25, 2005 IP
  3. andy

    andy Peon

    Messages:
    26
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Ask google
     
    andy, Mar 25, 2005 IP
  4. king_cobra

    king_cobra Peon

    Messages:
    373
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    0
    #4
    It's called the web graph by search engine researchers. They store the web as a graph, with pages as the nodes and links as the edges. That's how they calculate precisely how many links you have.

    And this graph is constantly (every second) changing as the bot crawls. Apparently Google has the biggest graph, and that's not for sale. But I heard some minor search engines sell their web graphs so that other search engines can start with a good DB. I don't know the cost or where to get it. Do a Google search about it.

    Anyway, the graph is in the TB range, some 2-3 TB, so you can imagine what Google has in store. They cache the pages too, so it's HUGE. Google's web graph will be around 1000 TB, and the cache store I can't imagine.
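To make the "pages are nodes, links are edges" idea concrete, here is a minimal sketch of a toy web graph that can answer the original question (how many links point at a site with a given anchor). The class name `WebGraph`, its methods, and the example URLs are all hypothetical, and a real search engine would store this very differently.

```python
from collections import defaultdict


class WebGraph:
    """Toy web graph: pages are nodes, links are directed edges with anchor text."""

    def __init__(self):
        # target URL -> list of (source URL, anchor text) pairs
        self.inlinks = defaultdict(list)

    def add_link(self, source, target, anchor):
        """Record one link from source to target with its anchor text."""
        self.inlinks[target].append((source, anchor))

    def count_inlinks(self, target, keyword=None):
        """Count links pointing at target, optionally only those whose
        anchor text contains keyword (case-insensitive)."""
        links = self.inlinks.get(target, [])
        if keyword is None:
            return len(links)
        keyword = keyword.lower()
        return sum(1 for _, anchor in links if keyword in anchor.lower())


# Made-up example data
graph = WebGraph()
graph.add_link("a.example/page1", "site.example", "cheap widgets")
graph.add_link("b.example/home", "site.example", "Widgets store")
graph.add_link("c.example/blog", "site.example", "click here")

print(graph.count_inlinks("site.example"))             # all inbound links
print(graph.count_inlinks("site.example", "widgets"))  # only anchors containing "widgets"
```

Indexing by target rather than by source is the design choice that makes a link:site.com style lookup cheap: all inbound links for a page sit in one list.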
     
    king_cobra, Mar 25, 2005 IP
  5. dchoe

    dchoe Active Member

    Messages:
    64
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    93
    #5
    eleventyoogle
     
    dchoe, Mar 25, 2005 IP
  6. fryman

    fryman Kiss my rep

    Messages:
    9,604
    Likes Received:
    777
    Best Answers:
    0
    Trophy Points:
    370
    #6
    So, you'd need a few million dollars to get started... and I doubt you can do it with some "software".

    Time to stop dreaming and get back to work...
     
    fryman, Mar 25, 2005 IP
  7. SHT

    SHT Active Member

    Messages:
    266
    Likes Received:
    21
    Best Answers:
    0
    Trophy Points:
    73
    #7
    Whatever you think it is, triple it, multiply it by 9999999999999999999999999, and write down the answer. It is a lot more than that.

    It will be massive, a lot bigger than you would expect.

    Greg
     
    SHT, Mar 25, 2005 IP
  8. myshtern

    myshtern Active Member

    Messages:
    68
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    58
    #8
    1000 TB isn't even that much nowadays.
    If you got started now, it would be really difficult to catch up.
     
    myshtern, Mar 25, 2005 IP
  9. king_cobra

    king_cobra Peon

    Messages:
    373
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    0
    #9
    Remember that they are not storing it the way we would. They have the best DB optimisation. They use Oracle to its maximum. The web graph is not that huge once it's in their DB, but the cache store will be massive.
     
    king_cobra, Mar 25, 2005 IP
  10. szynka

    szynka Peon

    Messages:
    68
    Likes Received:
    6
    Best Answers:
    0
    Trophy Points:
    0
    #10
    Great answer, I'll mail them.

    king_cobra >> how did you calculate that?

    And what's eleventyoogle?

    l234244 - after minutes of lol, yes, I'm for real.
     
    szynka, Mar 25, 2005 IP
  11. szynka

    szynka Peon

    Messages:
    68
    Likes Received:
    6
    Best Answers:
    0
    Trophy Points:
    0
    #11
    I've sent mail to Google and a local search engine.

    king_cobra - I'm thinking about the web graph only.
     
    szynka, Mar 25, 2005 IP
  12. MELLA

    MELLA Peon

    Messages:
    5,189
    Likes Received:
    267
    Best Answers:
    0
    Trophy Points:
    0
    #12
    Oh oh, do let us know what their answer is!
     
    MELLA, Mar 25, 2005 IP
  13. normanu

    normanu Well-Known Member

    Messages:
    18
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    101
    #13
    Yeah dude, ask Google or another major search engine.
     
    normanu, Mar 25, 2005 IP
  14. ziandra

    ziandra Well-Known Member

    Messages:
    142
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    138
    #14
    Ugh. I gotta engage brain. I think it is estimated that Google indexes less than 10% of all web pages on the net. The number 8% sticks in my mind. That does not include dynamically generated pages, which are a bugger to index. So, take the marketing lies about how many pages they index and multiply by ten or twelve, and that is how many there are.

    The problem is less storage and more bandwidth and an intelligent retry strategy. Also, dealing with those nasty, NASTY dynamic pages will drive you nutz.
     
    ziandra, Mar 28, 2005 IP
  15. fryman

    fryman Kiss my rep

    Messages:
    9,604
    Likes Received:
    777
    Best Answers:
    0
    Trophy Points:
    370
    #15
    LMAO!
    :D
     
    fryman, Mar 28, 2005 IP
  16. szynka

    szynka Peon

    Messages:
    68
    Likes Received:
    6
    Best Answers:
    0
    Trophy Points:
    0
    #16
    Great advice normanu, I've sent a question to Microsoft Search. They've got a cool search builder, better than Google's.

    How many GB would it take to index all URLs and links on the planet?

    Not the planet, but still some data:
    8 058 044 651 * 50 bytes * 16 = 5.86299913 terabytes

    50 bytes per URL, 16 links on each page, which is still much more than there really is.
    Using Western Digital 180 GB drives at a price of $0.50/GB,

    that would be

    $3001 and 85c for hard discs. :)
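The arithmetic above can be checked with a short sketch. It reuses the numbers from the post (8 058 044 651 pages, 50 bytes per URL, 16 links per page, $0.50 per GB); the function and constant names are made up for illustration. One assumption worth flagging: the 5.86 figure only comes out if a terabyte is taken as 2^40 bytes and a gigabyte as 2^30 bytes. With decimal units the total would be closer to 6.45 TB.

```python
# Numbers taken from the post; names are illustrative only.
PAGES = 8_058_044_651   # page count used in the post
BYTES_PER_URL = 50      # assumed average stored URL size
LINKS_PER_PAGE = 16     # assumed average links per page
PRICE_PER_GB = 0.50     # dollars per GB, from the post


def estimate_link_index(pages=PAGES, bytes_per_url=BYTES_PER_URL,
                        links_per_page=LINKS_PER_PAGE,
                        price_per_gb=PRICE_PER_GB):
    """Return (total bytes, size in GB, size in TB, disk cost in dollars).

    Assumes binary units: 1 GB = 2**30 bytes, 1 TB = 2**40 bytes.
    """
    total_bytes = pages * bytes_per_url * links_per_page
    size_gb = total_bytes / 2**30
    size_tb = total_bytes / 2**40
    cost = size_gb * price_per_gb
    return total_bytes, size_gb, size_tb, cost


total_bytes, size_gb, size_tb, cost = estimate_link_index()
print(f"{total_bytes} bytes = {size_gb:.2f} GB = {size_tb:.8f} TB")
print(f"raw disk cost: ${cost:.2f}")
```

This covers raw link storage only. It leaves out page text, any index structures, and redundancy, so it is a lower bound under the post's own assumptions.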


    What software could do it?
     
    szynka, Mar 28, 2005 IP
  17. davedx

    davedx Peon

    Messages:
    429
    Likes Received:
    21
    Best Answers:
    0
    Trophy Points:
    0
    #17
    davedx, Mar 29, 2005 IP
  18. redking

    redking Member

    Messages:
    93
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    43
    #18
    Dayam son...are you dumb? Even if your 5.86299913 terabytes calculation is accurate (probably low IMHO), you would need 30+ hard drives, and more on top of that because you need redundancy. Oh, and please don't tell me you plan on using IDE drives too...LOL. Oh, BTW, Dell doesn't sell computers that hold 30+ drives. And software that can index webpages and store them on a RAID 5 disk array with 30+ drives doesn't write itself, and it's not sold online either.

    You need to do a lot more research and take some computer science classes, because you are clueless.
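For what it's worth, the "30+ drives" figure can be reproduced with a rough sketch. It assumes the 180 GB drives and the raw size from the earlier post, treats GB as 2^30 bytes on both sides, and models redundancy as a single RAID 5 array with one drive's worth of parity. The helper name `drives_needed` is hypothetical, and a real setup would more likely split the drives across several arrays with hot spares.

```python
import math


def drives_needed(total_gb, drive_gb, parity_drives=1):
    """Return (data drives, total drives including parity).

    Assumes one array where parity costs a fixed number of drives
    (1 for a single RAID 5 array).
    """
    data_drives = math.ceil(total_gb / drive_gb)
    return data_drives, data_drives + parity_drives


# Raw link storage from the earlier estimate, in GB (2**30 bytes)
total_gb = 8_058_044_651 * 50 * 16 / 2**30

data, total = drives_needed(total_gb, drive_gb=180)
print(f"{data} data drives, {total} drives with RAID 5 parity")
```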
     
    redking, Mar 29, 2005 IP
  19. fryman

    fryman Kiss my rep

    Messages:
    9,604
    Likes Received:
    777
    Best Answers:
    0
    Trophy Points:
    370
    #19

    I think that sums it up :D
     
    fryman, Mar 29, 2005 IP
  20. davedx

    davedx Peon

    Messages:
    429
    Likes Received:
    21
    Best Answers:
    0
    Trophy Points:
    0
    #20
    Erm... while there's a degree of naivety here, I don't think calling him names is much smarter. Don't shoot someone down for being curious or asking interesting questions. And it's really not hugely complicated to build a network of storage devices to hold this kind of information, e.g. http://www.microsoft.com/windows2000/techinfo/howitworks/fileandprint/dfsnew.asp and http://www.google.com/search?hl=en&...l&q=unix+distributed+file+systems&btnG=Search. There are plenty of "out of the box" solutions to the storage question.

    Maybe not both, but there's plenty of software that does either.
     
    davedx, Mar 29, 2005 IP