Archive for the ‘Information Science’ Category

The Promise and Peril of Big Data

Friday, March 5th, 2010

The Promise and Peril of Big Data explores the implications of inferential technologies used to analyze massive amounts of data and the ways in which these techniques can positively affect business, medicine, and government. The report is the result of the Eighteenth Annual Roundtable on Information Technology.

+ Full Paper (PDF; 352 KB)

Source: The Aspen Institute

Blue Ribbon Task Force Report: Preserving Our Digital Knowledge Base Must be a Public Priority

Thursday, March 4th, 2010

From a Red Orbit Summary:

Addressing one of the most urgent societal challenges of the Information Age – ensuring that valued digital information will be accessible not just today, but in the future – requires solutions that are at least as much economic and social as technical, according to a new report by a Blue Ribbon Task Force.

The Final Report from the Blue Ribbon Task Force on Sustainable Digital Preservation and Access, called “Sustainable Economics for a Digital Planet: Ensuring Long-term Access to Digital Information”, is the result of a two-year effort focusing on the critical economic challenges of preserving an ever-increasing amount of information in a world gone digital.

From the News Release:

Although not all of this data should be preserved, digital data within the public interest – digital official and historical documents, research data sets, YouTube videos of presidential addresses, etc. – must be retained to maintain an accurate and complete “digital record” of our society. Such digital information is now part of what is known as cyberinfrastructure, an organized aggregate of computers, networks, data, storage, software systems, and the experts who run them that is vital to our life and work in the Information Age.

[Snip]

“Addressing the issues of value, incentives, and roles and responsibilities helps us understand who benefits from long-term access to digital materials, who should be responsible for preservation, and who should pay for it,” said Brian Lavoie, research scientist at OCLC and Task Force co-chair. “Neglecting to account for any of these conditions significantly reduces the prospects of achieving sustainable digital preservation activities over the long run.”

+ Access the Complete Report (116 pages; PDF)

+ Access the Blue Ribbon Task Force on Sustainable Digital Preservation and Access Web Site

+ See Also: Coming to DC on April 1, 2010: A National Conversation on the Economic Sustainability of Digital Information

The [one-day] Symposium will focus on one of the most pressing issues in today’s Information Age: identifying practical solutions to the economic challenges of preserving today’s deluge of valuable digital information. The event will feature four “Conversations” with distinguished experts from the academic, private, and public sectors.

Source: Blue Ribbon Task Force on Sustainable Digital Preservation and Access

Brown University: Digital Initiative Starts

Thursday, March 4th, 2010

From the Article:

In an effort to support the use of new technologies in teaching, learning and research, the University Library launched the Center for Digital Scholarship in 2009. The center will serve as “a focal point … at Brown for digital humanities and digital library methods and tools,” according to the CDS Web site.

The CDS is the result of a merger of three existing programs — the Center for Digital Initiatives, the Scholarly Technology Group and the Women Writers Project. The three programs were already collaborating, and “the convergence (was) natural,” said Massimo Riva, a professor of Italian studies who has worked closely with the CDS.

[Snip]

In addition to other activities, the CDS provides scholarly grants for faculty members working on a number of different types of projects. This includes endeavors involving digitizing primary source material, making data easily Web-accessible or bringing together research in a digital publication.

Access the Center for Digital Scholarship Web Site

Source: Brown Daily Herald

New: Free Online Archives Containing 137 Years of Popular Science Magazine

Thursday, March 4th, 2010

The entire archive (137 years) of Popular Science is now accessible, searchable, and free via the PopSci web site. Here’s the official announcement.

You’ll see that the technology and scanning come with the assistance of Google. In fact, you’ll notice a Google ad at the bottom of each scanned page.

Direct to Popular Science Archive

The Popular Science Archive intro page points out that “advanced features and browsing” are in the works, and they are needed. For example, the ability to limit a search to a specific year or range of years would be very helpful.

Nevertheless, it’s more useful (and interesting!) free content available to all.

Source: CrunchGear, Twitter, PopSci

Dr. Peter Jacso Reviews Wolfram Alpha in His Final Gale.com Review

Thursday, March 4th, 2010

We’re sorry to see Peter Jacso, reference reviewer supreme, Professor, and Chair of the Library and Information Science Program at the University of Hawaii at Manoa, end his column for Gale.com with this month’s review.

We’ve learned A LOT from Dr. Jacso and his reviews (he’s done more than 220 during the past 10 years) and sincerely appreciate the many kind words he’s given to ResourceShelf over the years.

His final review is an in-depth look (to put it mildly) at Wolfram|Alpha.

From the Review:

This unique “computational knowledge engine”, the brainchild of one of the most talented contemporary mathematicians, Stephen Wolfram, is said to be based on more than 10 trillion data (which number is comparable to the number of people who ever lived, and more than three times the number of stars in our galaxy). I would have not known this, but I quickly learned it by looking up the term trillion in Wolfram|Alpha.

If this were not enough, it can serve much more data than that because it also calculates new data from many of the raw data that appeared in economic time series, factbooks, yearbooks, encyclopedias, almanacs, directories and a large variety of statistical compendia. It is meant for questions that can be answered mostly through numbers. It has great potential to become a widely used important resource for situations when numeric data is needed rather than deep thoughts and verbalization, but it is not there yet, it is not a finished work that would only need updates with fresh, current data.

[Snip]

There is a reason that the author (or I might as well say composer), calls it a computational knowledge engine. He wants to set it apart from the dozens of search engines. Still, many reviewers compared it to Google, which is like comparing apples and oranges. Google and the other search engines are actually pointers, sending you to Web sites, whereas Wolfram|Alpha is a direct ready-reference source itself.

[Snip]

Wolfram|Alpha is a very interesting ready reference source, and there is no beggary in the answers that can be reckon’d. On the contrary, there is revelry in the answers if the key facts can be summed up compactly.

That’s why good quality abstracts have been appreciated, and why senseless ones are depreciated as those produced by Google Scholar in 307,000 records with the same “abstract” pondering “why this message is appearing”, and 500,000 “abstracts” assuring the user that “the visual presentation will be degraded” – both because of your browser.

Don’t worry, it is not your browser’s problem. The problem is with Google Scholar’s crawlers that triggered these error messages and then gathered them as “abstracts” from the web sites of the most respected scholarly journals. Their publishers gave the key to their entire digital archives, and the precious metadata to Google Scholar’s developers. If you go to the publishers’ site you will find the real metadata, including the real abstracts for free.

Source: Gale.com

Google Book Search March Madness: Paths Forward for the Google Books Settlement (Diagram)

Thursday, March 4th, 2010

Just Released.

Here’s a well done “info rich” diagram (it’s cool too!) developed by Jonathan Band and released by the Library Copyright Alliance. If the chart sorta/kinda reminds you of NCAA Basketball Tournament brackets, that’s the concept. However, you’ll not find b-ball pairings here but rather some of the potential paths (more are possible) that the Google Book Search case could take going forward.

Access the Diagram (PDF)

Jonathan Band Writes:

[This] chart attempts to diagram some of the possible paths forward. Notwithstanding the complexity of the chart, it does not reflect all the possible permutations. For example, it does not mention stays pending appeals nor whether litigation would proceed as a class action. Moreover, the chart does not address the substantive reasons why a certain outcome may occur, e.g., the basis for Judge Chin accepting or rejecting the settlement. And it doesn’t begin to address the issue of Congressional intervention through legislation. In short, the precise way forward is more difficult to predict than the NCAA tournament. And although the next step in the GBS saga may occur this March, many more NCAA tournaments will come and go before the buzzer sounds on this dispute.

The diagram was designed by Tricia Donovan from ARL.

Source: Library Copyright Alliance
ALA, ARL, and ACRL are Members

As the Internet Replaces Print Publishing, Urge to ‘Unpublish’ Means Censoring History

Wednesday, March 3rd, 2010

From the Article:

Once upon a time, news stories were entombed in newspaper “morgues” and rarely saw the dusty light of day.

Now the news never dies. Millions of people can search the archives online — an amazing benefit unless, perhaps, you’re someone who was actually in the news.

In a recent survey (PDF) of 110 news organizations, the Toronto Star found that increasingly, publishers are fielding regular requests from anxious and embarrassed readers to “unpublish” information, sometimes months or years after it first appeared online.

[Snip]

On a much broader scale, “unpublishing” is the wholesale loss of content that can occur when an online journal or Web archive is sold or goes bankrupt, or the software needed to read it becomes obsolete. It’s expensive to transfer records from an old server to a newer, faster version that operates with different formats and programs. A floppy disk has a half-life of about five years.

“It’s not clear who’s responsible to archive digital material,” said Stanley Katz, director of the Princeton University Center for Arts and Cultural Policy Studies. “Some of the stuff’s going to go away altogether. We are likely to lose whole subsets of it. If we keep renewing everything, we can keep it going. But the question is whether there is money and commitment enough to keep it going. The odds are that money will be applied selectively.”

Access the Complete Article

Source: AlterNet

Under Consideration: An Online Archive Named Video.gov

Wednesday, March 3rd, 2010

We posted earlier today about a speech Eugene Huang from the FCC made earlier this week stating that his organization believes in and will push for free access to PACER filings.

Aliya Sternstein reports on Nextgov that the FCC would also like to make an “online archive named Video.gov to preserve agencies’ Web content and possibly information provided by the media” part of the broadband bill.

The planned national digital archives for the 21st century would expand upon the government’s Data.gov Web site, a warehouse of downloadable federal statistics, and be maintained by the National Archives and Records Administration, the Library of Congress and other agencies, said Eugene Huang, FCC’s director of government performance and civic engagement for the national broadband plan.

Access the Complete Article

Source: Nextgov

See Also: Full Text of the Speech By Eugene Huang from the FCC

See Also: FCC to Call for Government Data Overhaul, Broadband Plan Will Recommend Free Access to PACER Docs

New! The Coalition for Networked Information (CNI) YouTube Channel is Now Live

Tuesday, March 2nd, 2010

From the Announcement:

Current offerings include Bernard Frischer’s closing plenary address on 3D modeling of cultural heritage sites and monuments (fall 2009), David Rosenthal’s discussion of the longevity of digital documents (spring 2009), and presentations by Clifford Lynch, Herbert Van de Sompel, and others. Recordings from future meetings will be made available from the site.

You can access the CNI YouTube Channel at:
http://www.youtube.com/cnivideo

Source: Coalition for Networked Information

Official Google Blog Looks at Search Trends from the Winter Olympics

Tuesday, March 2nd, 2010

An interesting post from Google today about how and when people searched for information during the games in Vancouver.

You’ll find a narrative and graphs (number of queries by date for selected search terms) for these countries:

+ Norway

+ Japan

+ Korea

+ U.S.

+ Canada

Finally, the list of the five athletes (worldwide) who received the most searches:

1. Shaun White (U.S. men’s halfpipe)

2. 김연아 (Kim Yu-Na; Korea ladies’ figure skating)

3. Lindsey Vonn (U.S. ladies’ downhill Alpine skiing)

4. Sven Kramer (Netherlands men’s 5000m speed skating)

5. Evan Lysacek (U.S. men’s figure skating)

We wouldn’t be surprised to see Bing post this type of list, and we will be on the lookout.

Access the Complete Blog Post

Source: Official Google Blog

Research from Northwestern U.: Tell Me More, Finding the Facts that Online News Leaves Out

Tuesday, March 2nd, 2010

A tool for automatically “enhancing” and updating a news story (with newer content) is currently being developed at the Intelligent Information Lab at Northwestern University in Evanston, IL.

From a New Scientist Article:

Even the most conscientiously crafted news story can leave out information that might have changed your opinion. Some may even do so deliberately. A prototype web service changes that, by sourcing additional quotes, figures and other information to augment any given online news article.

Unlike existing news aggregators or “related stories” features, the new service, called Tell Me More, presents fresh details without repeating information in the original story, along with other material left out of the original story (for a variety of reasons).

[Snip]

The software then trawls Google News, Yahoo News or other news aggregators to find related articles. These are analysed in the same way as the original story so that a comparison can be made to uncover any information not included in the initial article.

[Snip]

“By subtracting named entities, quotes and numbers from one document to the other you can determine what information is genuinely new,” says Francisco Iacobelli of Northwestern University, who co-developed Tell Me More.

“We then characterise this information as additional actors, figures or quotes and present the information next to the initial story.”

Iacobelli intends Tell Me More to give readers a more balanced view of events by presenting them with additional and perhaps conflicting information not included by their initial news source.
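
To make the “subtracting” idea concrete, here is a minimal sketch of how such a comparison might look. It is not the Tell Me More code, which is not public. The function names `extract_features` and `new_information` are our own hypothetical choices, and the regular expressions are a crude stand-in for real named-entity, quote, and number extraction.

```python
import re


def extract_features(text):
    """Pull rough quotes, numbers, and entity-like phrases out of a text.

    Assumption: simple regular expressions stand in for whatever
    extraction the real system uses.
    """
    # Text between straight double quotes.
    quotes = set(re.findall(r'"([^"]+)"', text))
    # Integers, decimals, and percentages such as 300, 1,200 or 5%.
    numbers = set(re.findall(r'\d+(?:[.,]\d+)*%?', text))
    # Runs of two or more capitalized words, a rough proxy for named entities.
    entities = set(re.findall(r'\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b', text))
    return {"quotes": quotes, "numbers": numbers, "entities": entities}


def new_information(original, related):
    """Return features found in the related story but not in the original."""
    orig = extract_features(original)
    rel = extract_features(related)
    # Set difference per feature type: what the related story adds.
    return {kind: rel[kind] - orig[kind] for kind in rel}


if __name__ == "__main__":
    original = 'Mayor Jane Smith said the budget grew 5% this year.'
    related = ('Mayor Jane Smith said the budget grew 5%. '
               'Critic John Doe said "it is too little" and cited 300 layoffs.')
    for kind, items in new_information(original, related).items():
        print(kind, sorted(items))
```

Whatever survives the subtraction (here, the extra person, quote, and figure) is what a reader of the first story never saw. A real system would need far better extraction and some way to match paraphrases, which this sketch ignores.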

Two More Points:

1) As pointed out in the article, Tell Me More might have some serious copyright issues to overcome.
2) The software is not for public use.

You can read about a test that New Scientist conducted using Tell Me More software near the conclusion of the article.

Learn More?

A) “Tell Me More” Info Page from Northwestern U.

B) Learn About Other Projects from the Intelligent Info Lab.
Very Interesting Reading!!!

Projects include:
+ Beyond Broadcast
+ So You Say
+ News At Seven
+ Make My Page

There are several other projects at the Intelligent Info Lab and you’ll find links to all of them here.

Sources: New Scientist, IIL at NWU

New: IMLS and SGS Issue Report on the Preservation of World Cultural Heritage

Tuesday, March 2nd, 2010

From the Announcement:

The Institute of Museum and Library Services (IMLS) and the Salzburg Global Seminar announce release of the report, “Connecting to the World’s Collections: Making the Case for Conservation and Preservation of Our Cultural Heritage” based on a seminar held in Salzburg, Austria, October 28-November 1, 2009. The seminar, part of the IMLS’s multi-year initiative on collections care, Connecting to Collections: A Call to Action, explored global themes related to conservation and preservation, including international needs, issues, perspectives, and accomplishments.

[Snip]

The report includes practical recommendations to ensure optimal collections conservation worldwide and the Salzburg Declaration on the Conservation and Preservation of Cultural Heritage, which was passed by 60 participants hailing from 32 countries. The session combined presentations by leading experts in conservation and preservation throughout the world with small working groups tasked to make recommendations for future action in key areas, including emergency preparedness, education and training, public awareness, new preservation approaches, and assessment and planning. To access these resources, click here.

Access Full Text Report: “Connecting to the World’s Collections: Making the Case for Conservation and Preservation of Our Cultural Heritage”

Sources: IMLS, SGS

Search News from Bing, Twitter, and Google

Monday, March 1st, 2010

Three items to inform you about today.

First, Matt McGee reports on Search Engine Land that Bing now provides search history with their auto-suggest feature that appears when the search box “drops down” and offers search term suggestions.

Queries from your search history will appear in purple, while other queries will show in blue. The auto-suggest box offers options like “Manage History” and “History Off” for searchers who don’t want this feature enabled.

See Also: Official Announcement from the Bing Blog

Second, Twitter is now giving several search tools access to its “firehose” feed, which delivers EVERY TWEET in real time. Some Twitter tools don’t have firehose access.

From The Next Web:

Twitter has long kept a very tight grip on just who can have access to the stream of Twitter updates. All of them, that is. Limited access is open to anyone and everyone who wants to play.

According to the company, some 50,000 are using the rate limited APIs. Twitter was long rumored to give out very limited access to the firehose (all tweets in real time) due to scaling problems. More people pulling in data means more server load, something which Twitter always has too much of.

Whatever is true, Twitter is finally loosening their grip, and giving some new companies access to the firehose. All the data as it comes in, it’s a potential goldmine.

Beginning today, Ellerdale, Collecta, Kosmix, Scoopler, twazzup, CrowdEye, and Chainn Search have firehose access.

From the Twitter Blog:

Recently we’ve announced partnerships with Yahoo!, Google, and Microsoft. These Web leaders gained access beyond our free offerings—we licensed them the full feed of all public tweets.

[Snip]

More than fifty thousand interesting applications are currently using our freely available, rate-limited platform offerings. With access to the full Firehose of data, it is possible to move far beyond the Twitter experiences we know today. In fact, we’re pretty sure that some amazing innovation is possible.

From Venture Beat:

Companies like Google and Microsoft have already gotten a headstart on mining Twitter’s data after signing agreements to incorporate tweets into real-time search. Terms were not disclosed, but Bloomberg reported that the deals made the company profitable with $25 million in additional revenue. Some of the larger startups in Twitter’s ecosystem like Seesmic and Tweetmeme also have had financial arrangements for deeper data access for some time, although they also have not disclosed details.

Finally, this will not be the week that Google doesn’t make an acquisition.

This afternoon we learned that Google had acquired Picnik, a web-based photo editing service.

From the NY Times:

The Picnik service is offered through Picasa, Google’s photo storage site. Although the company says users will still be able to access other social networking and photo storage sites from Picnik, the sale leaves an open question about how long the service will continue to run on some of its competitor sites in the future, including Facebook, Flickr and Photoworks.

The sale puts Google in yet another competing business with Adobe, going up against Photoshop.com, and with Apple and the basic photo editing tools within iPhoto.

See Also: Google Blog Post re: Picnik

Library of Congress Digital Preservation Podcast Now Part of iTunes U

Monday, March 1st, 2010

From the Announcement:

The new Library of Congress podcast series of interviews with prominent digital preservation practitioners was recently named a noteworthy podcast at iTunes U.

These podcasts offer a chance to hear experts talk about current approaches to keeping digital content accessible over time.

The podcasts are available on the Library of Congress digital preservation website and by subscription through iTunes U.

Source: LC/NDIIPP

National Central Library of Taiwan to Collaborate with UW Libraries’ East Asia Library to Digitize Rare Chinese Classic Books

Monday, March 1st, 2010

From the Announcement:

University of Washington Libraries’ East Asia Library (EAL) and National Central Library of Taiwan (NCL) announce a project to digitize Chinese rare books held at the East Asia Library and Special Collections at University of Washington Libraries, slated to commence in Summer 2010.

The EAL rare book collection includes approximately 600 titles of Chinese rare books, including:

+ the rare books of Ming Dynasty (1368-1644)

+ editions of the Joseph F. Rock collection including many rare local gazetteers of South West China

+ the Qian Qianyi (a late Ming Dynasty poet-historian, 1582-1664) collection

+ the Hellmut Wilhelm collection (books from Professor Wilhelm, a former UW faculty member and renowned sinologist)

NCL will contribute approximately $91,000 USD, 2-3 staff members, as well as the digitizing equipment. With a target of 80,000 digital images and associated descriptions by December 2012, priority will be given to scanning the titles that are not duplicated by the NCL collection, then to scanning duplicate titles in different editions, and finally, to scanning missing volumes or missing pages of each partner Library’s collections. The digitization work will take place in the East Asia Library, with the EAL staff working with staff from NCL.

Once digitized, the collection will be part of the NCL Chinese rare book bibliographic database.

Only four US institutions (the Library of Congress, Princeton, UC Berkeley, and the University of Chicago), in addition to more than 30 libraries and institutions in the world, are part of this database.

Source: University of Washington Libraries
Hat Tip: ALA Direct

Portico’s Web Site Receives Makeover, Includes Tools to Educate About Digital Preservation

Monday, March 1st, 2010

From the Announcement:

More than 650 libraries and 90 publishers (representing over 2,000 scholarly associations) around the world are working together to support the digital preservation of scholarly content through Portico.

Highlights include:

+ Who Participates? – Detailed sortable lists of participating libraries, publishers, and titles (e-journals, e-books, and d-collections), including bibliographic information, archive holdings, and information about the preservation status of titles and their availability for post-cancellation access.

+ The Archive: Content & Access – A summary of archive growth and contents over time, a snapshot of current facts and figures, information about ‘triggered content,’ and readily accessible links to our archive audit and access sites.

+ Preservation Step-by-Step – An easy-to-understand overview, supported by visual aids, to help librarians and publishers educate others in their organizations about how digital content is preserved in Portico.

+ How Portico Saves You Time and Money – Information on how to get value from Portico participation today, including links to our holdings comparison service and to case studies from librarian colleagues.

Source: Portico

Research: Digital Archive Study Aims to Create Permanence From the Web

Monday, March 1st, 2010

From the Article:

How can we curate and make permanent the narratives and transient experiences we share daily on the web? Can we preserve a player’s participation in an Alternate Reality Game that spans continents and platforms, or in reading a story that disappears from the world once its last page is turned?

Dr Tom Abba of the University of the West of England is investigating this – he has just been awarded an early career research grant to identify strategies for archiving new and existing digital works. These works or narratives are ‘born-digital’ – story forms created on the web, but echoing the shapes of novels, films, poems, and other media. His research into how to classify and curate these digital narratives will strengthen UWE’s emerging reputation for research into new and interactive media, focused through the University’s Digital Cultures Research Centre.

[Snip]

Tom says, “The transitory nature of the web, and the speed at which things emerge and quickly vanish, causes all sorts of problems for scholars looking to understand new forms of story. The third insight for my research was recognising that there was an opportunity to take hold of some of those curatorial questions, and try to determine what was worth holding onto for future generations and why.”

Access the Complete Article

Source: University of the West of England, Bristol

In Haiti Digital Archivists Work to Save Rare Books, Historical Documents

Saturday, February 27th, 2010

From the Article:

But Brooke Wooldridge also learned help was desperately needed to rescue and preserve the treasures that help chronicle Haiti’s history, clustered mostly in the four institutions in downtown Port-au-Prince.

“First I worried about the people and making sure everybody working at these institutions were OK, and then I thought about all of those collections,” said Wooldridge, project coordinator for the Digital Library of the Caribbean at Florida International University. “I felt very conflicted. Emotionally, I knew there was so much life lost, but I also knew that if the collections were ignored, Haiti’s collective memory could be lost. I knew we needed to help.”

So Wooldridge quickly assembled like-minded culturalists who were already a part of the Digital Library, an international coalition of research, governmental and educational institutions that provides access to Caribbean-related electronic materials.

The organization, founded in 2004, was perfectly poised to help. Led by Wooldridge, it had already been working with Haiti’s librarians and curators over the years to digitize their collections. Within weeks, the group launched a campaign to rally international contributors, raise money and provide technical support for the recovery and protection of Haiti’s cultural resources — the already brittle rare books and documents scattered and dusted by the quake.

Source: Miami Herald