Web Search
Source: Microsoft Watch
Search Titans Talk Futures
“Microsoft is known to be prepping new search technologies that are expected to allow users to search seamlessly across their local machines, corporate networks and the Internet. The new MSN Search part of the equation is expected to debut later this year or early next. A first version of the WinFS file-system subsystem will be integrated into Longhorn when it ships in 2006+. And A9 recently unveiled a beta version of a new search site that builds on top of Google.” Report on World Wide Web Conference on presentations by Rick Rashid, senior VP in charge of Microsoft Research, and Udi Manber, CEO of Amazon.com’s A9 subsidiary.
See Also: ResourceShelfPLUS Has a Compilation Containing Links to Many of the Papers Presented at This Week’s World Wide Web Conference
See Also: Udi Manber, the head of A9, spoke at the University of Washington in November. You can watch an archived version of his lecture here. It’s titled, “The World’s Information at Everyone’s Fingertips.”
Archive for May, 2004
Search Titans Talk Futures
Friday, May 21st, 2004
A New Smart Search Feature from Jeeves: Movie Info
Friday, May 21st, 2004
Web Search–Ask Jeeves
A New Smart Search Feature from Jeeves: Movie Info
In April we ran an item about Ask Jeeves launching Famous People Search. I just noticed that a new Smart Search feature is available and, like the others, can potentially save the searcher time and effort. If you search for a new movie you’ll find a box at the top of the page containing movie plot info and DIRECT links to the trailer, the official site, reviews (via RottenTomatoes.com), showtime info and more. Hopefully, they’ll continue to add more film info and include entries for older films. No special syntax is required to trigger this feature. I also noticed that you will find info for some films released in the past year. You can find more Ask.com SmartSearch shortcuts here.
More Press for Vivisimo
Friday, May 21st, 2004
Web Search–Vivisimo
Source: Pittsburgh Tribune-Review
More Press for Vivisimo
From the article, “At the same time Google’s founding duo began their journey to fame and fortune as researchers at Stanford University, a group of Carnegie Mellon University computer scientists initiated their own project in the summer of 1998 to tackle the problem of information overload. ‘The only way to address the problem is to let users see a lot more of what’s out there, but with less effort,’ said Raul Valdes-Perez, 47, who led the CMU effort [now CEO of Vivisimo].” I’ve mentioned several times that Vivisimo calls this idea “selective ignorance.” It’s explained in this paper.
A Contest to Manipulate Google Results
Friday, May 21st, 2004
Web Search–Google
Source: The Wall Street Journal
A Contest to Manipulate Google Results (registration required)
From the article, “An online ad company and a search-themed Web site are sponsoring a contest that shows how easy search results can be manipulated. The winners: sites that rank highest in Google searches on June 7 and July 7 for the “invented” term (actually a play on Dark Blue Sea Ltd., an Australian company that is a contest sponsor).” The apparent ease with which some people manipulate Google (and other web engines) is not good news for the company, for web search, or, most importantly, for the typical searcher who enters 2.5 terms and clicks the search button.
Library Stuff’s Steven Cohen was kind enough to share a couple of comments with ResourceShelf.
He writes, “While this article shows the public an important lesson on the ability to manipulate results in Google, it also displays what librarians have been teaching our patrons for years. That what is found on the first ten hits can lead the searcher to results that may not be the best for the particular issue at hand. In fact, we also preach the use of numerous other resources to gain quality results (LII, etc). Google does not equal quality research, as this article implicitly demonstrates.”
See Also: Andy Beal points out that the report is incorrect in stating that “cheating” is not allowed.
Peter Morville Launches Findability.org
Friday, May 21st, 2004
Information Architecture
Information Access
Peter Morville Launches Findability.org
Peter Morville, guru, President of Semantic Studios, and co-author of what for many people is the bible of web info architecture, has a new site. It’s loaded with reading, links, and discussion. Findability.org is “dedicated to findability and the design of findable objects.” I’ll be spending plenty of time here.
“The Most Important New Library to be Built in a Generation”
Friday, May 21st, 2004
Professional Reading Shelf
Public Libraries
Source: The New Yorker
High Tech Bibliophilia
Paul Goldberger, The New Yorker’s architecture critic, calls Seattle’s new Central Library, designed by Rem Koolhaas, “the most important new library to be built in a generation, and the most exhilarating.”
See Also: Libraries dust off stuffy image
–
Libraries
Inclusion of Library Ets Haim in Memory of the World Register Celebrated in Amsterdam
“The Library Ets Haim, a unique collection of Judaica held in the Portuguese Synagogue complex in Amsterdam, The Netherlands, became officially part of UNESCO’s Memory of the World Register, when the Memory of the World Certificate was handed over by UNESCO’s Elisabeth Longworth to the President of the Board of Governors of the Library during a ceremony yesterday evening in Amsterdam.”
–
Institutional Repositories
ARCHIMEDE : A Canadian software solution for institutional repositories
From the announcement, “Laval University Library recently launched the third component of its institutional repository. Called Archimede (http://archimede.bibl.ulaval.ca), this component covers e-prints, pre-prints, post-prints and other research publications from faculty members and research communities.”
FedBizOpps.gov Up for Bids
Friday, May 21st, 2004
Federal Government–United States–Databases
Source: FCW
FedBizOpps Up for Bids
From the article, “The General Services Administration has issued a solicitation for a new contractor to take over FedBizOpps, a Web site that gives contractors and agencies access to information about federal contracting opportunities.” The article includes highlights from the RFP.
Friday, May 21st, 2004
Resources, Reports, Tools, Lists, and Full-Text Documents
Business–United States–Statistics
Source: U.S. Census
Just Released, 2002 Economic Census: Advance Nonemployer Statistics
Summary ||| Direct to Full Text
–
Population–United States
Source: U.S. Census
New, Fact Sheet: Facts About the Native Hawaiian and Other Pacific Islander Population in the United States
–
Health–Statistics–United States
Source: CDC
Fact Sheet: Facts about Prevalence of Arthritis–U.S., 2004
–
Labor–Statistics
Source: BLS
Just Released, International Comparisons of Hourly Compensation Costs for Production Workers in Manufacturing, revised data for 2002
–
Legal Resources–United States
Source: National Center for State Courts
CourTopics
“Over 100 NCSC topic folders contain overviews, research reports, information about programs and services, frequently asked questions, best practices, and publications.” Topics covered include everything from Acquiring Technology to Workload and Resource Assessment. Most (but not all) topics include an overview, FAQ and resource guide (PDFs); some include additional related NCSC documents. The resource guides are basically extensive bibliographies. Some of the topics are likely to be of interest to consumers, such as Adoption, Custody and Support, Impaired Driving, Mental Health, and Traffic Offenses.
Also available at this site: Court Statistics Project
+ Examining the Work of State Courts
+ State Court Caseload Statistics
+ Caseload Highlights
Information Today, Inc. Acquires Faulkner Information Services
Friday, May 21st, 2004
Save and Search Web Pages Viewed in Your Browser
Thursday, May 20th, 2004
Resource of the Week
Web Tools
All About Seruku
This week we have another example of innovative and useful work coming from a small company in the search and info retrieval space.
Say hello to Seruku.
Seruku is a toolbar-based application that helps you find and access ANY and ALL web pages that have appeared in your browser. Its simplicity, along with its ability to save the user plenty of time and aggravation, makes it a resource that will appeal to the masses.
As we “work the web”, most of us are constantly looking at and reading hundreds of pages in our browsers. Trying to go back and find previously viewed material, however, can be time-consuming and, in some cases, pretty much impossible.
Why? Reviewing your browser’s history file isn’t always easy since it contains only URLs and page titles. And the ephemeral nature of material on the web can pose many problems. Pages you looked at on Monday can be gone for good the following Friday — if not sooner.
Seruku Toolbar 1.1 ($24.95/Windows only/45 day free trial) solves many of these problems. As you visit web pages, it automatically makes a copy (called a snapshot) of every html web page you’ve viewed in your browser, stores it locally, indexes the content and then, when needed, allows you to keyword search the full text of this material. Very cool and very useful.
After downloading (3.5MB) and installing the program, you’ll be up and running in a matter of minutes.
The Seruku site offers plenty of background about how the product operates. In a nutshell, it’s really two separate programs: a toolbar and an indexing/database program that is completely separate from the Internet Explorer cache.
Those with privacy concerns will be relieved to know that these have been addressed by Seruku. The company mentions many times that all of the material you save is kept on YOUR computer. No information about what you’ve saved and when you saved it is transmitted over the web.
Using Seruku is very easy. All html pages that appear in your browser are automatically saved — or recorded — into the database. Of course, you can click to toggle the recording function on and off.
Searching your local datastore (where the material is stored on your computer) is straightforward. Enter your search terms and go. Seruku utilizes an implied AND between terms. You can also limit your search by date. For example, you can search only those pages you’ve seen within the last three days, week, month, or between two specific dates.
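For readers curious what “snapshot, index, then search with an implied AND and a date limit” might look like in practice, here is a minimal sketch in Python. It is not Seruku’s code; the company has not published its internals here, and the names `Snapshot`, `SnapshotStore`, `record` and `search` are made up for illustration. It simply assumes each recorded page is stored with the date it was seen and that every word on the page goes into an inverted index.

```python
import re
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Snapshot:
    """One recorded page (hypothetical structure, for illustration only)."""
    url: str
    text: str
    seen: date


@dataclass
class SnapshotStore:
    """A toy local datastore: recorded pages plus an inverted index."""
    pages: list = field(default_factory=list)
    index: dict = field(default_factory=dict)  # term -> set of page ids

    def record(self, url, text, seen):
        """Save a page and index every distinct word it contains."""
        page_id = len(self.pages)
        self.pages.append(Snapshot(url, text, seen))
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            self.index.setdefault(term, set()).add(page_id)
        return page_id

    def search(self, query, start=None, end=None):
        """Return URLs of pages containing ALL query terms (implied AND),
        optionally limited to pages seen between start and end."""
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return []
        # Implied AND: intersect the sets of pages for each term.
        ids = set.intersection(*(self.index.get(t, set()) for t in terms))
        return [
            self.pages[i].url
            for i in sorted(ids)
            if (start is None or self.pages[i].seen >= start)
            and (end is None or self.pages[i].seen <= end)
        ]
```

Calling `record` for each page as it is viewed and then `search("desktop search", start=some_date)` mirrors the workflow described above: every term must match, and the date arguments narrow the results to a window of viewing dates.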
A search results page includes links to live versions of each page along with links you can click on to view copies of the recorded pages.
The toolbar also offers a button that will run your query in Yahoo, Google and other web engines.
William Grosso, the owner and lead developer of Seruku, told me that most users will use about 3-5 gigabytes of hard drive space a year. Of course, Seruku makes managing the datastore easy. For example, you can remove a specific page or a group of unused pages, and back up the datastore.
At this point Seruku is only available for Internet Explorer but a version for Mozilla is in development. Grosso also let me know that future releases will offer an option to add any web engine to the toolbar similar to what’s currently available from NeedleSearch or the Copernic Toolbar. Improving the format and content of snippets on result pages is also a priority. I’m glad to see that the development team realizes that work is needed in these areas. Improvements will make Seruku an even more valuable tool.
Other personal search products like SurfSaver or the web-based service Furl can also be useful tools for the web researcher. These products allow you to add keywords, descriptions and other info to each page after you decide to save it. You can also organize the saved material into folders. The problem is that you must first decide to save the page. The beauty of Seruku is that all of the saving takes place automatically. It’s always on and recording (unless you decide to toggle it off) what’s in your browser. It’s all there. If additional access points are needed or having access to saved pages from various computers is required, SurfSaver and Furl can help. It all depends on your needs. Btw, another personal search tool, Scopeware, ceased operations on May 15th.
In his legendary 1945 essay, “As We May Think,” Vannevar Bush writes:
Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and, to coin one at random, “memex” will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.
Seruku is not exactly the memex device that Bush describes; it can only save html content. But it is certainly a useful step forward in realizing Bush’s vision in today’s web world. Kudos to Grosso for not only developing this product (it’s been needed for a long time), but also for making it so easy to use.
See Also: Direct to the Seruku FAQ
RFID Technology for Libraries
Thursday, May 20th, 2004
Professional Reading Shelf
RFID
Source: Public Library Association
New, Full Text, RFID Technology for Libraries
“Tech Notes are short, Web-based papers introducing specific technologies for public librarians.” Other Tech Notes and tech overviews published or revised in the last month:
+ Weblogs (Written by Steven Cohen)
+ Filtering Technology and CIPA Compliance
+ E-Books
–
Scholarly Publishing
Source: SPARC
New, Open Access: Unlocking the Value of Scientific Research
A paper presented by SPARC Director Rick Johnson at a conference on digital resources sponsored by the University of Oklahoma Libraries reviews some of the market forces that are pushing towards a tipping point in scholarly communication.
–
Digital Archiving
Source: ERPANET/CODATA
New, Final Report: Seminar on the Selection, Appraisal and Retention of Digital Scientific Data
From an announcement, “Rapid advances in technology are impacting the way scientists work, allowing greater amounts of digital data to be produced in the majority of scientific disciplines. These technological advances are also changing the way scientists interact, creating opportunities for collaborations across disciplines, institutions, and countries. The ever-increasing data that are generated through these advances require active curation to ensure their longevity. The international ERPANET/CODATA seminar examined the current state of practice of the selection, appraisal and retention among diverse scientific communities and discussed how archival concepts can best be applied to the management and long-term preservation of digital data. The seminar, held from 15th-17th of December 2003 at the Biblioteca Nacional in Lisbon, brought together more than sixty-five researchers, data managers, information specialists, archivists, and librarians from thirteen countries to discuss the issues involved in making critical decisions regarding the long-term preservation of the scientific record.”
Thursday, May 20th, 2004
Resources, Reports, Tools, Lists, and Full-Text Documents
Iraq–Media
Source: Institute for War and Peace Reporting
Iraqi Press Monitor
“IWPR’s Iraqi Press Monitor is a daily survey of the main stories in Iraq’s newspapers. It features the top 7 stories of the day, along with a political cartoon. Stories are selected and summarised by Ali Mohammed Jawad and Ali Kadhim Marzook in Baghdad. The selections are edited by Eric Watkins. Monday through to Thursday, the service focuses on key news stories, while on Friday it reviews the leading opinion pieces.”
–
Congressional Research Service
Source: CRS (via Franklin Pierce Law Center)
Franklin Pierce Law Center Updates CRS Page With Numerous New/Recently Updated IP-Related CRS Reports
+ Internet Taxation
+ H.R. 1417: The Copyright Royalty and Distribution Reform Act of 2004
+ Computer Security: A Summary of Selected Federal Laws, Executive Orders, and Presidential Directives
+ Protecting Noncreative Databases: Bills Before the 108th Congress
+ Internet Privacy: Overview and Pending Legislation
+ Internet Commerce and State Sales and Use Taxes
+ Obscenity, Child Pornography, and Indecency: Recent Developments and Pending Issues
–
Armed Forces–United States
The Pentagon Launches Streaming Video News Site
This site is powered by FeedRoom technology. The “company line” direct from the Pentagon. From the site, The Pentagon Channel broadcasts military news and information for the 2.6 million members of the U.S. Armed Forces through programming including:
* Department of Defense news briefings
* Military news
* Interviews with top Defense officials
* Short stories about the work of our military
You can also find the video at http://pentagonchannel.feedroom.com. According to the web site, The Pentagon Channel is also distributed via satellite.
See Also: More About FeedRoom in this 2002 Posting
–
Baby Names–United States–Lists & Rankings
Source: SSA
Recently Released, Most Popular Baby Names 2003
You can find more info (geographic breakdowns, names by decade, historical lists) here.
See Also: Most Popular Baby Names in the United Kingdom 2003
Wednesday, May 19th, 2004
Resources, Reports, Tools, Lists, and Full-Text Documents
Iraq–Environment
Source: United Nations Environment Programme
Recently Released: Desk Study on the Environment in Iraq (PDF; 2.11 MB)
“The approach of this Desk Study is environmental and technical. The intent is not to attach blame for various environmental problems. Rather, it is to provide an overview of chronic and war-related environmental issues, and to identify the steps needed to safeguard the environment. Top priorities include environmental issues that have a direct link with easing the humanitarian situation, especially the restoration of water, power, sanitation networks and ensuring food security.”
–
Conventions–Glossary
Source: Convention Industry Council
Convention Industry Glossary
“Do you know podium from lectern? Lavaliere microphone from handheld? Search this glossary of almost 3,800 terms, acronyms, and abbreviations and save yourself time, money, and trouble.”
–
Lobbyists–United States
Source: Meriam Library, California State University, Chico
Political Advocacy Groups: A Directory of United States Lobbyists
“To research the ideology of an editorialist or understand why a group was consulted, refer to their homepage through the alphabetic list found here. To find a source for a story or a perspective on an issue, browse the subject arrangement to choose an appropriate group.” Maintained by Kathi Carlisle Fountain, Reference/Political Science & Social Work Librarian.
–
State Courts–United States
Source: National Center for State Courts
Public Access to Court Records
“This site is an information clearinghouse on the topic of public access to court records and the current debate on privacy concerns that arise as courts improve and expand their court information systems and put more information on the Internet.”
–
Election 2004
Factiva Media Visibility Index (SM) Tracks the Hot-Button Issues for 2004 Presidential Election During the Week Ending May 9, 2004
–
Butterflies
Butterflies of North America
Source: USGS
“The Butterflies of North America Web site is a ‘work in progress,’ consisting primarily of the following information:
+ Distribution maps showing the counties in which occurrence of a particular species has been verified
+ Photos of the adult and caterpillar (when available)
+ Species accounts containing information on size, identifying characteristics, life history, flight, caterpillar hosts, adult food, habitat, species range, conservation status, and management needs
+ Species checklists for each county in the U. S. and state in northern Mexico”
–
Plants
Source: Cornell University Animal Science Department
Poisonous Plants Information Database
“This is a growing reference that includes plant images, pictures of affected animals and presentations concerning the botany, chemistry, toxicology, diagnosis and prevention of poisoning of animals by plants and other natural flora (fungi, etc.).” Search by scientific name, common name, primary poisons, or species most often affected. Browse alphabetical lists of common names or botanical names. Includes an FAQ about poisonous plants, information on toxic and medicinal agents in plants, identification of the species of animals most commonly affected, and links to related sites.
An extra special thanks to Shirl Kennedy, ResourceShelf Contributing Editor, for all of her help today.
Sites to try when other engines fail you
Wednesday, May 19th, 2004
Online Research
Web Search
Source: San Jose Mercury News
Sites to try when other engines fail you
Thanks to Michael Bazeley for soliciting my comments (he also asked Tara at ResearchBuzz). I’m happy to see that he decided to include a mention of the vast online resources that public libraries offer. It’s also exciting that Bazeley and the editors at the Merc realize that other online search tools exist.
Wednesday, May 19th, 2004
Professional Reading Shelf
Scholarly Communication
Source: C&RL News
Report: Scholarly Communication in Europe
“SPARC Europe Director David Prosser writes an article for the May issue of C&RL News presenting recent developments from a European perspective.”
–
Weblogs
Academic Libraries
Source: Library Journal
UM Library Offers Free Blogs
“With the April launch of UThink, a program under the library’s auspices to offer free blogs to the university community, Minnesota is among the first university libraries to become the center for blogging.”
Wednesday, May 19th, 2004
Citation Analysis
Scholarly Publishing
Source: ISI
+ Science in Japan, 1999-2003
+ Mathematics: High-Impact U.S. Universities, 1999-2003
+ Journals with Multiple Hot Papers
+ Space Science: High-Impact Institutions, 1993-2003
Endeca Now Powering Abebooks
Wednesday, May 19th, 2004
Enterprise Search
Endeca Now Powers Abebooks Database
We try to keep you informed when large public-facing web sites begin using new search software. This is currently the case at Abebooks, the used book database/vendor. The site now uses Endeca Search and “Guided Navigation” technology. Here’s an overview of what’s available. It’s also possible to browse titles by category. What I like most about Endeca is the ease with which a user can refine their results by simply pointing and clicking the refinements listed on the right side of a results page.
See Also: Direct to Abebooks Advanced Search Interface
See Also: More About Endeca from SearchTools.com
Feds help create PDF archiving standard
Wednesday, May 19th, 2004
Digital Archives
Source: GCN
Feds help create PDF archiving standard
From the article, “A number of federal agencies are working to create an archiving version of the Portable Document Format, offered by Adobe Systems Inc. of Mountain View, Calif. The version will be submitted to the International Organization for Standardization for approval as an international standard. The committee hopes to release a draft of the PDF/A standard by early next year with a final standard out by the end of 2005…Federal agencies are grappling with the issue of archiving documents for long-term storage. ‘The task is a difficult one given the ever-changing nature of the IT industry,’ Levenson said. ‘Software or operating systems in use today may be hard to locate 50 years from now. Agencies need to ensure the information they have is available for the public.’ Levenson said that his own agency, the U.S. Courts, is in the midst of converting to an electronic filing system, so the agency wants to ensure that the files it archives will remain easily readable for the ages.”
See Also: Direct to PDF-Archive Committee Home Page
Wednesday, May 19th, 2004
Web Search–Google
Source: The New York Times
Google and Desktop Search
More hype, or yet another business that Google is getting into? John Markoff reports that Google is preparing to launch a desktop search tool. Google declined to comment (what else is new). The article does not mention that Hotbot/Terra Lycos already offers a free tool (using dtSearch technology) that allows you to search your hard drive. It also allows you to keyword search your RSS feeds. I also like Scopeware for desktop searching.
and in other Google News…
GEICO Sues Google and Overture over Trademarks (via News.com)
Lawsuits over trademarks will be a major search-related news story. I think it’s a bit ironic that GEICO, a company owned by Warren Buffett’s Berkshire Hathaway, is suing the company. You may remember that the Google IPO letter was inspired by Buffett and includes a quote by the legendary investor.
All About Building Nutch: A Case Study
Tuesday, May 18th, 2004
Web Search–Nutch
Source: ACM Queue
All About Building Nutch: A Case Study
This article from the April issue of ACM Queue was written by Mike Cafarella and Doug Cutting from the open source search engine Nutch. From the article, “…we started the Nutch software project, an open source search engine free for anyone to download, modify, and run, either as an internal intranet search engine or as a public Web search service. As you may have just read in Anna Patterson’s “Why Writing Your Own Search Engine Is Hard,” writing a search engine is not easy. As such, our article focuses on Nutch’s technical challenges, but of course we hope Nutch will offer improvements in both the technical and social spheres. By enabling more people to run search engines, and by making the code open, we hope search algorithms will become as transparent as their importance demands.”
See Also: Additional Search-Related Articles from the April Issue of ACM Queue
+ A Conversation with Matt Wells (Gigablast’s Developer)
+ Searching Vs. Finding
By William A. Woods from Sun Microsystems Laboratories.
+ Enterprise Search: Tough Stuff
This article was written by Rajat Mukherjee and Jianchang Mao from Verity.
