More than Four Million Permanently Archived Web Pages from U.S. Congress Now Available via 109th Congress Web Harvest

Last year we told you about the Presidential Term “Web Harvest” from the National Archives (NARA) and The Internet Archive, the people who give us The Wayback Machine.
Key Facts:
+ Over 75 Million Archived .gov and .mil Pages (6.5 Terabytes)
+ Unlike The Wayback Machine, Keyword Searchable Using Nutch Technology. You can also begin your search with a specific URL.
+ More Here and Here.

Now, something new from the same team.
New Web Harvest Online: 109th Congress

What does it contain?
+ More than four million pages (42 GB) crawled and archived between 11/11/06 and 12/11/06
+ Browse by Member Name
+ Browse by Committee Name
+ Browse by Leadership
+ Browse by House or Senate Organizations

The harvest produced a public reference copy of the web sites for the purpose of continual availability to the public, and also produced a record copy to be retained in the holdings of NARA…Web sites included in the harvest were identified from information provided by the Web Systems Branch of the House Information Resources staff and by Senate webmasters in the Offices of the Secretary of the Senate and the Sergeant at Arms.

The crawl was done using The Internet Archive’s open-source Heritrix crawler.

Learn More about the 109th Congress Web Harvest

See Also: Dr. James Billington, Librarian of Congress, recently told a U.S. House committee that the average life of a web site is between 44 and 75 days.

Sources: NARA and The Internet Archive
