Professional Reading Shelf
Government Information–United States
Source: Harvard Business School Working Knowledge
Mr. Info: Take the Money – It’s Free!
“(Matthew) Lesko has a unique writing process: plagiarism. It turns out, he says, that in the government, nothing is copyrighted. He simply cut and pasted text from government publications for his first New York Times bestseller, and has been ‘writing’ that way ever since. His description might be a little breezy, however. The real value Lesko adds is in his rigorous and tireless research efforts, as well as the extremely logical and helpful organization of the material. While it is true that anyone can find these resources on the Web or by calling government numbers or writing government agencies, not everyone has the time or inclination to do so.” In other words, Lesko’s work saves people time and effort. This is a skill/service that the library community should also stress in our marketing.
Archive for April, 2004
Document security fears grow
Tuesday, April 27th, 2004
Online Documents
Source: FCW
Document security fears grow
From the article, “Problems with maintaining the confidentiality of electronic documents and preventing document tampering are on the rise, according to a security manager at Adobe Systems Inc. Although he would not divulge details of any specific incident of document tampering in the federal government, John Landwehr, group manager for security solutions and strategy at Adobe, said cases of document spoofing represent a growing problem for both government and corporate offices.”
2004 Search Engine Meeting: Presentations Now Online
Monday, April 26th, 2004
Professional Reading Shelf
Information Retrieval
Presentations from the 2004 Search Engine Meeting Are Now Available Online
Some really interesting and informative reading for your already full reading lists. The conference took place in The Hague, The Netherlands, 19-20 April 2004. Here’s a selected list of the presentations. I STRONGLY urge you to review the entire list. The page also contains bio info for all speakers. All of the presentations are either PDF or PPT files.
+ Quantity versus quality?
Karen Spärck Jones, Computer Laboratory, University of Cambridge, UK
+ The Subtle Side of Retrieval
Elizabeth Liddy, Syracuse University, New York, USA
+ Text and XML querying – Is There a Common Ground?
Prabhakar Raghavan, Verity, California, USA
+ Product Intro: A Holistic Approach to Search
Tuoc Luong, Ask Jeeves, California, USA
+ Information Retrieval: A Single Point of Access
Susan Feldman, IDC, Connecticut, USA
+ Double the Value of Search Using User Behaviour
Laust Sondergaard, Mondosoft, Denmark
+ Social Software and New Search
Stephen E Arnold, AIT, Kentucky, USA
+ Human Intervention in the Search Process
Martin Belam, BBCi Search, UK
+ Learning to Harvest Information for the Semantic Web
Fabio Ciravegna, University of Sheffield, UK
+ Formalising the Concept of Serendipity in Web Searching
Olivier Ertzscheid, University of Toulouse, and Gabriel Gallezot, University of Nice
+ Turbo10: The Mechanics of a Deep Net Metasearch Engine
Nigel Hamilton, Turbo10.com, UK
+ A Relevance Model for Web Image Search
Ethan V. Munson and Cheng Thao, University of Wisconsin, USA
+ Access to Archives of Digital Video Information
Alan Smeaton, Dublin City University, Ireland
+ Organising personal pictures with content analysis technology
Sébastien Gilles, LTU Technologies, France
–
Weblogs
RSS
Source: Library + Information Update
Weblogs and RSS in information work
“How can weblogs be used in a library and information service? Ian Winship looks at some of the serious contenders.” A big thanks to the author, Ian Winship, for mentioning ResourceShelf.
Britannica Subsidiary Unveils English-Arabic Search Engine
Monday, April 26th, 2004
Web Search
Source: Information Today
Britannica Subsidiary Unveils English-Arabic Search Engine
Paula Hane writes, “Melingo, Ltd. (http://www.melingo.com), a company that has provided advanced search capabilities for complex languages, has just introduced Morfix CL, its English-Arabic-English Cross-Language Search with Embedded Translation. What that means is that English-speaking researchers can search through Arabic material without knowing any Arabic at all – and see a results page with a translation of each Arabic word or phrase. Melingo, a subsidiary of Encyclopaedia Britannica, Inc., is carefully positioning its Morfix technology as a complement to other search engines. The company is concentrating its efforts on aiding the search process and not on highlighting the process of machine translation, which it says is still very inaccurate. Melingo claims that Morfix CL represents a breakthrough in Arabic language analysis and a boon to intelligence agencies and businesses, which today process growing amounts of Arabic data with limited numbers of qualified human translators.” A demo of the cross-language English-Arabic technology is available at Morfix.com.
ScienceDirect Announces New Product Line
Monday, April 26th, 2004
Information Industry News
+ Elsevier…ScienceDirect Announces New Product Line
From the announcement, “ScienceDirect (www.sciencedirect.com), announces that Elsevier Book Series titles are now available on the platform. Now multiple users throughout an institution can simultaneously access this important complement to primary research, previously only available through print subscriptions.”
+ Dialog… Company Names New CTO
A Conversation With Sergey Brin
Monday, April 26th, 2004
Web Search–Google
Google IPO Roundup
+ A Conversation With Sergey Brin (via eWeek)
Topics include Gmail, RSS, and privacy.
–
+ A Quirky Brilliance vs. the Dreams of Venture Capitalists (via The New York Times)
From the article, “There are many good reasons to avoid a public stock offering and the close scrutiny it brings. Indeed, this week the scrutiny will intensify as the company approaches a deadline to file financial disclosures. But in Google’s case, its hesitancy up to this point has been a symptom of a long-running battle for control between its two brainy, headstrong founders and the powerful, strong-willed financiers who gave them the money to turn their graduate school project into one of the world’s leading brands, according to several people in and outside Google… Attention is being focused this week on Google because Thursday is the deadline for it to file financial disclosure documents under Securities and Exchange Commission rules. It could meet those requirements by filing papers for a public stock offering – what the venture capitalists are said to favor. Or it could simply file the disclosure papers, perhaps along with a statement that it will begin eventually to move toward a public offering. A person close to the company said last week that it would proceed with this slower course.”
–
+ Google float moves a step closer (via Reuters)
Monday, April 26th, 2004
Resources, Reports, Tools, Lists, and Full-Text Documents (5 Items)
Internet–Statistics
Source: Pew Internet & American Life Project
New Data, 14% of Internet users say they no longer download music files
Summary ||| Direct to Full Text
Thanks to PW for the tip.
–
Business–United States–Lists & Rankings
Source: Forbes
Just Released, CEO Compensation Report 2004
–
Business–United States–Lists & Rankings
Source: Washington Post
New, The Washington Post 200 2004
The largest companies in the region. Browse by category or search.
–
Science Museums
Source: National Academies
New, Marian Koshland Science Museum
The museum, part of the National Academy of Sciences, opened in DC last week. Web site includes virtual exhibits.
–
Higher Education–United States–Statistics
Source: NSF
New Report, Science and Engineering Degrees: 1966-2001
The Corbis Image Database
Sunday, April 25th, 2004
Image Databases
Source: San Jose Mercury News
digital library
“By making more and more of its images digital, Corbis can keep greater control over them and lower its expenses. It can sell images directly over the Web, offer new tools to search through them, and use software to track where each image appears, cracking down on piracy.”
See Also: Image-Seeker
This demo from LTU Technologies allows you to keyword search 65,000 royalty-free images from the Corbis database. From the site, “Image-Seeker is an image analyzer that describes images according to their visual features. ‘Because an image is worth a thousand words’, using image similarity combined with text-based search dramatically improves search processes. This demo presents visual search on a large selection of over 65,000 Corbis Royalty-Free images. It is the most intuitive way to navigate while searching for images.”
Pearson and O’Reilly Do e-Textbook Deal
Sunday, April 25th, 2004
Books
Source: e-consultancy
Pearson launches web-textbooks programme
“Publishing giant Pearson is launching a project in the US to offer students digital textbooks at half the price of the printed versions…The group’s education division is to launch the project, SafariX Textbooks Online, as a joint venture with another US publisher O’Reilly Media, which offers textbooks on technology, and already uses the Safari system. Rather than offering textbooks for digital download, Safari hosts books online with the ability to annotate and navigate through a web browser.”
See Also: A Bit More in this Reuters Article
Albert Einstein and the Librarian
Sunday, April 25th, 2004
Library and Info Briefs (4 Items)
From Companion’s Lost Diary, a Portrait of Einstein in Old Age (via The New York Times)
The “companion” was Johanna Fantova, a librarian at Princeton University. She was a graduate of the library school at the University of North Carolina and was the first map curator at the Firestone Library. According to this AP story, it was at Einstein’s urging that Fantova attended library school.
–
+ An Interview with Kevin Starr (via Sacramento Bee)
Starr just retired after ten years as California state librarian.
–
+ £7m library book finally revealed (via The Scotsman)
–
+ New Look for National Library of Medicine Web Site Coming Soon
National Archives and Records Administration Annual Report 2003
Saturday, April 24th, 2004
Professional Reading Shelf (2 Items)
National Archives–United States
Source: NARA
Full Text, National Archives and Records Administration Annual Report 2003
–
Media Archives
Source: FCW
NASA to merge media archives
From the article, “Space officials want proposals for a NASA archiving system that would create a one-stop multimedia source for the public.”
Search Engine Showdown Offering a Set of Search-Related Bookmarklets
Saturday, April 24th, 2004
Bookmarklets
Search Engine Showdown Offers Search-Related Bookmarklets
Thanks Greg! A couple of days ago Jesse Ruderman announced he has also compiled a page of these useful tools.
Challenges in Web Search Engines
Saturday, April 24th, 2004
Web Search–Google
More on the Google/Anti-Semitic Site Story
Important and interesting reads from Seth Finkelstein and Danny Sullivan. There is no need to comment on this specific issue again, but here are a couple of comments about search engine manipulation.
Last October, I commented that while most of the press coverage was focusing on paid inclusion (which Google doesn’t offer) and paid placement and their potential effects on the web searcher, it was hard to find press coverage noting that organic search results can be manipulated (yes, even Google’s results). This manipulation is the nature of the beast (we should learn to deal with it), and it is another reminder that general web engines are more than just “research tools” in the way a librarian might think of Dialog, LN, Factiva, and many others. Finkelstein correctly points out, “Google ranks popularity, not authority. And popularity is a measure which is vulnerable to many games. Any system of evaluation is subject to manipulation.” While link analysis is similar in many ways to citation analysis, tools like ISI’s Citation Indexes and ISI’s Impact Factors are less susceptible to manipulation (NOT totally free of it) because it’s a much smaller universe of material to control.
Let’s remember web engines are also advertising/marketing vehicles. As Danny points out, results appearing in the 20th position are all but invisible to the average searcher. Sullivan’s comments remind me of what someone told me at a presentation for the book I co-authored with Chris Sherman. A member of the audience told me that Chris and I failed to mention a large portion of the Invisible Web in our book. After taking a deep breath, I asked her what we forgot. She told me that for many searchers if it’s not in the first five or seven results it’s all but invisible. She was right!
The power searcher needs, first, to be aware of this issue and, second, to utilize advanced search syntax, term selection, specialized databases and other tools to assist in producing more precise result sets. This can help minimize problems. I also think that Teoma’s method of determining relevance might be less susceptible to manipulation.
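Finkelstein’s point that popularity-based ranking is vulnerable to games can be illustrated with a toy example. The sketch below uses a simplified, textbook-style PageRank calculation. That choice is an assumption made for illustration, not a description of Google’s actual ranking system, and the page names and the 20-page link farm are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    This is a teaching sketch, not any search engine's real algorithm.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


def position(rank, page):
    """1-based position of `page` when pages are sorted by score (ties by name)."""
    ordered = sorted(rank, key=lambda p: (-rank[p], p))
    return ordered.index(page) + 1


# Hypothetical mini-web: nobody links to "spam".
honest_web = {
    "library": ["archive"],
    "archive": ["library"],
    "news": ["library", "archive"],
    "spam": [],
}

# The same web after 20 hypothetical throwaway pages all link to "spam".
link_farm = {f"farm{i}": ["spam"] for i in range(20)}
manipulated_web = {**honest_web, **link_farm}

before = position(pagerank(honest_web), "spam")
after = position(pagerank(manipulated_web), "spam")
print(before, after)
```

In this toy graph the page with no genuine inbound links moves from last place to first once the link farm points at it. A citation index drawing on a smaller, curated set of sources makes this kind of gaming harder, though not impossible.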
See Also: Challenges in Web Search Engines
This twelve-page paper was written by Dr. Monika Henzinger (Research Director, Google), Dr. Rajeev Motwani (Professor at Stanford) and Dr. Craig Silverstein (Director of Technology, Google). From the abstract, “…article presents a high-level discussion of some of the problems with information retrieval that are unique to web search engines. The goal is to raise awareness and stimulate research in these areas.” Content quality, spam, cloaking, duplicate hosts and vaguely structured data are some of the topics discussed.
–
See Also, Full Text, Just Released, Web Spam Taxonomy
From the abstract, “Web spamming refers to actions intended to mislead search engines and give some pages higher ranking than they deserve. Recently, the amount of web spam has increased dramatically, leading to a degradation of search results. This paper presents a comprehensive taxonomy of current spamming techniques, which we believe can help in developing appropriate countermeasures.”
Classic Computer Magazine Archive
Saturday, April 24th, 2004
Resources, Reports, Tools, Lists, and Full-Text Documents (2 Items)
The following two items were culled from the Infomine What’s New Newsletter
Computers
Classic Computer Magazine Archive
The Classic Computer Magazine Archive presents the full text of early personal computing magazines, including images and advertisements. Contents indexes are offered along with columns, product reviews, software, and cover images. Site is searchable. The site has posted the full text of more than 150 individual issues from the following magazines:
Antic (1982-1990)
STart (1986-1991) Dedicated to the Atari ST computer
Creative Computing (1974-1985)
Creative Computing Video and Arcade Games (1983)
Compute! (1979-1994)
Tandy Computer Whiz Kids (1984-1991)
–
Vietnam War
Source: Texas Tech University
The Virtual Vietnam Archive
“The Virtual Vietnam Archive currently contains over 605,000 pages of scanned documents. This searchable archive allows the user to limit results to items available online. Documents, images, audio, finding aids, moving images, periodicals and computer media are available. The search page also has browse indexes of military terms and collection titles. An acronyms database (more than 500 terms) is available from the main archive page as an aid to research.” An Operations Database and Acronym Database are also available.
Google and Akamai: Cult of Secrecy vs. Kingdom of Openness
Friday, April 23rd, 2004
Web Search–Google
Source: News.com
Google’s SafeSearch Filtering Draws Some Fire
From the article, “Google’s SafeSearch flaws are more than academic–they can have serious consequences for innocent Web site operators blocked out by them. Google is the most widely used search engine on the Web, and failure to appear in its listings can have a direct impact on sales for some companies, particularly smaller enterprises with limited marketing budgets…Google claims SafeSearch ‘uses advanced proprietary technology that checks keywords and phrases’ and filters out only Web pages containing pornography and explicit sexual content. ‘That’s not very bright,’ said Karen Schneider, a librarian who runs the Librarians’ Index to the Internet and has made a study of filtering software. SafeSearch is ‘certainly evocative of the very primitive CyberSitter-type tools of the mid-1990s–not a tool of fairly sophisticated development.’” Some of you might remember that in August, ResourceShelf reported that SafeSearch was blocking pages from the WhiteHouse.Gov site and other non-offensive sites. Some of the problems I highlighted in August have been corrected but, as this new article documents, MANY others still exist.
See Also: Empirical Analysis of Google SafeSearch (Benjamin Edelman, Harvard University)
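To show why a filter that “checks keywords and phrases” can block innocent pages, here is a minimal sketch of a naive substring-based filter. It is an assumption for illustration only: SafeSearch’s real logic is proprietary and is not reproduced here, and the blocked terms and sample pages are hypothetical.

```python
# Hypothetical block list; real filters use far larger and more varied lists.
BLOCKED_TERMS = ["sex", "breast"]


def is_blocked(page_text, blocked_terms=BLOCKED_TERMS):
    """Return True if any blocked term appears anywhere in the text.

    Plain substring matching ignores word boundaries and context,
    which is exactly what causes innocent pages to be overblocked.
    """
    text = page_text.lower()
    return any(term in text for term in blocked_terms)


# Hypothetical, entirely inoffensive pages.
pages = {
    "county-history": "A short history of Essex and Middlesex counties",
    "health": "Breast cancer screening guidelines for clinics",
    "gardening": "Growing tomatoes in containers",
}

blocked = sorted(name for name, text in pages.items() if is_blocked(text))
print(blocked)
```

Two of the three harmless pages are blocked here, one because a place name happens to contain a listed term and one because a medical term does. Matching on whole words would fix the first case but not the second, which needs some understanding of context.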
more Google…
Google and Akamai: Cult of Secrecy vs. Kingdom of Openness (via TechReview)
Simson Garfinkel writes, “The king of search is tapping into what may be the largest grid of computers on the planet. And it remains extraordinarily secretive about its core technologies – perhaps because it senses a potential competitor in dotcom era flameout Akamai.”
Prints and Photographs Online Catalog (PPOC) from The Library of Congress Adds Content
Friday, April 23rd, 2004
The Library of Congress
More Digitized Content Added to the Prints and Photographs Online Catalog (PPOC) from The Library of Congress
More digitized content from LC hits the web! The Prints and Photographs Online Catalog (PPOC) is closing in on the one million image mark. Here are a couple of recently added collections:
+ National Child Labor Committee Collection (NCLC)
PPOC now offers expanded and enhanced access to approximately 5,100 NCLC photographs [ca.1908-1924] which were primarily taken by the photographer Lewis Hine. These photographs are useful for their examination of labor, reform movements, working class families, education, public health, urban and rural housing conditions, industrial and agricultural sites, and other aspects of urban and rural life in America in the early twentieth century. The collection’s catalog records include a wealth of information, including the locations and names of individuals and businesses featured in the photographs, transcribed from the collection’s original caption cards.
+ U.S. News and World Report Magazine Photograph Collection
Selected photographs from this extensive collection are now available in the Prints and Photographs Online Catalog. In preparation for the magazine’s 70th anniversary, U.S. News staff selected more than 100 photos taken between 1952 and 1983 of newsworthy subjects, including the struggle for African American civil rights, presidential campaigns, and the visits to the United States of foreign dignitaries, as well as life in Vietnam, the Middle East and Russia. Many of the photos were taken by staff photographers and have no known publication restrictions.
See Also: Learn More About the PPOC
FAST Search and Transfer Keeps Going, This Time It’s Ziff-Davis
Friday, April 23rd, 2004
Enterprise Search (2 Items)
+ FAST Search and Transfer Continues to Add Customers, This Time It’s Ziff-Davis
It seems that every week they’re announcing big-time clients. Several recent announcements involve publishing companies. In addition to today’s Z-D announcement, we’ve seen agreements with Knight-Ridder and Reuters.
–
+ IBM expands search push with Masala (via News.com)
From the article, “The computing giant, based in Armonk, N.Y., is gearing up to release Masala, a new version of its DB2 Information Integrator software that will let corporate employees retrieve information from databases, applications and the Web at the same time.” Btw, a quick review of ResourceShelfPLUS will show that IBM has been building a large collection of search-related patents.
Now Available: New Version of Opera Browser
Friday, April 23rd, 2004
Web Browsers
Cool! A New Version (Beta) of the Opera Web Browser is Online
Say hello to Opera 7.50 for Windows Beta 1! I just learned of the release and still haven’t had time to check it out but I thought those of you who use Opera (those of you who don’t should) would like to know. Screenshot here.
+ Company supplied list of changes
+ Beta available for numerous operating systems (Windows, Mac, Linux, FreeBSD and Solaris)
European Union Glossary
Friday, April 23rd, 2004
Resources, Reports, Tools, Lists, and Full-Text Documents (5 Items)
United States–Statistics
Source: U.S. Census
Statistical Abstracts: Historical
“Statistical Abstract data presented here ranges from our most recent edition to the historical abstracts compiled throughout the decades. Some of the data were scanned as an effort to make historical abstract information available to the public. The display of data will continue as historical records become available.” Access Statistical Abstracts from 1878 – 2001 through this page.
See Also: Mini Historical Statistics
Files available in PDF or XLS formats.
–
European Union–Glossary
European Union Glossary
Source: European Union
“The following glossary contains some 250 terms relating to European integration and the institutions and activities of the EU.”
See also: A Plain Language Guide to Eurojargon
–
History
Source: Managing Information News
Library and Museum Material Help Captain Cook Website
“A new virtual exhibition charting the life and voyages of one of the North East’s most famous sons, Captain James Cook, has gone live (www.captcook-ne.co.uk)…. The website is funded through the British Library’s Reaching the Regions programme, in partnership with the North East Museums, Libraries and Archives Council and the Captain Cook Birthplace Museum.”
–
Hunger–United States–Statistics
Source: Food Research and Action Center
Just Released, State of the States 2004 Report (PDF; 307 KB)
“This FRAC report on the State of the States provides basic data describing the extent of hunger and the use of nutrition programs for the United States as a whole and for each of the 50 states and the District of Columbia. Through these data the State of the States gives a snapshot of how well or badly each state is doing in using available tools to meet the needs of hungry people and improve the health of low-income families.” (Thanks, AT)
Press release
–
Tourism–United States–Lists & Rankings
Source: U.S. Department of Commerce
+ Just Released, Market Share: Overseas Visitors To Select U.S. States And Territories
+ Just Released, Overseas Visitors To Select U.S. Cities/Hawaiian Islands
A Framework of Guidance for Building Good Digital Collections
Friday, April 23rd, 2004
Professional Reading Shelf (2 Items)
Digital Collections
+ A Framework of Guidance for Building Good Digital Collections (via NISO)
–
Usability
Source: InfoDesign
Jared Spool: The InfoDesign Interview
“Jared is one of the most important – and best-recognized – voices in the field of usability. User Interface Engineering, the firm that he founded in 1988, is the world’s largest research, training and consulting firm specializing in website and product usability.” (Thanks, LRK) Some insights from the interview:
+ “We discovered that there are basically 14 types of questions, no matter what the subject matter. We’re hoping these 14 types, which we’re calling topic perspectives, can guide designers to plan and implement an initial information resource that is complete and helpful and delights their users.”
+ “Our goal at UIE really is quite simple: We want to eliminate any frustration that comes from the introduction of new technology.”
+ “We’ve come a long way from our roots of being a usability testing service. We really don’t do that anymore, primarily because our research has shown that the most successful design teams are those that do their own testing. Farming your testing out substantially reduces its effectiveness.”
+ “Overly simplistic usability testing can produce too many issues, most of which will not have any desirable effect on the goals of the organization. As a result, teams can easily waste valuable resources fixing things that don’t need fixing.”
+ “Most users are failing on most websites and nobody knows why. We don’t even have a good handle on how to find out why. So, we basically ignore the problem.”
+ “What is the #1 contributor to the user having a good experience? Our research shows that users are most satisfied with a site when they complete their objectives. When they don’t achieve their objectives, they become significantly dissatisfied with the site. Little else really matters beyond completing objectives.”
+ “Information architects look at the world from structure and navigation. Designers look at the visual presentation and communication. Usability folks see the world from a user frustration perspective. These aren’t separate branches of knowledge. They are different viewpoints by which you attack the same problem: creating a successful design.”
+ “If designers can’t have guidelines, how do they know what to design? Our philosophy is to use an iterative approach. Take a design – any design – it doesn’t matter. Put it in front of users. Change anything that doesn’t work. Repeat.”
