The Internet Archive Needs Your Help in Capturing 2 Billion Web Pages From Around the World
Using a grant from the Mellon Foundation, received in recognition of our contributions to the ongoing development of the Heritrix web crawler, the Internet Archive will commence a global crawl to capture 2 billion pages and will make these pages available to the public. This project is designed to create a unique global snapshot of the Web and to help improve and demonstrate the scalability of the Heritrix web crawler.
We invite countries from around the world to submit URLs.
Eligible groups include libraries, archives, and other memory and cultural institutions.
See Also: What is Heritrix?
Heritrix is the Internet Archive’s open-source, extensible, web-scale, archival-quality web crawler project.
Source: The Internet Archive
See Also: Webcast: Brewster Kahle on “Universal Access to Human Knowledge”
An archived presentation (from last week) by Brewster Kahle.
