Archive for the ‘Search Tools’ Category

United Nations Development Group Members — Job Links

Monday, September 28th, 2009

UNDG Members – Job Links

The United Nations Development Group (UNDG) unites 32 UN funds, programmes, agencies, departments, and offices plus five observers that play a role in development. Together, the UNDG coordinates efforts to deliver coherent, effective and efficient development assistance at the country level. For the UNDG to accomplish this critical duty, talented and committed human resources are essential. Each UNDG organization recruits staff independently. The links below will route you to the UNDG members’ job vacancy pages. Please note the links will open in a new browser window.

Long list of direct links to agency career sites.

Source: United Nations Development Group

Peter Jacso Takes on Google Scholar Finding Ghost Authors, Lost Authors, and Other Problems

Thursday, September 24th, 2009

Access the Full Text of the Entire Article

With all of the talk about Google Book Search lately, little has been written about Google Scholar. Now, in a lengthy and well-documented analysis (numerous screenshots) published in Library Journal, Dr. Peter Jacso from the University of Hawaii at Manoa, a monthly columnist for Gale/Cengage and a friend of ResourceShelf, documents some of the problems (two of them named in the title of the article) that he has found while using Google Scholar [GS] during the past several months. Actually, some of the problems go back years.

Here are just a few passages from Dr. Jacso’s article that we found to be of greatest interest:

They [the Google Scholar developers] decided—very unwisely—not to use the good metadata generously offered to them by scholarly publishers and indexing/abstracting services, but instead chose to try and figure them out through ostensibly smart crawler and parser programs.

Millions of records have erroneous metadata, as well as inflated publication and citation counts

A free tool, Google Scholar has become the most convenient resource to find a few good scholarly papers—often in free full-text format—on even the most esoteric topics. [Our emphasis] For topical keyword searches, GS is most valuable. But it cannot be used to analyze the publishing performance and impact of researchers.

Very often, the real authors are relegated to ghost authors deprived of their authorship along with publication and citation counts. [Our emphasis] In the scholarly world, this is critical, as the mantra “publish or perish” is changing to “publish, get cited or perish.”


[Our emphasis] While GS developers have fixed some of the most egregious problems that I reported in several reviews, columns and conference/workshop presentations since 2004—such as the 910,000 papers attributed to an author named “Password”—other large-scale nonsense remains and new absurdities are produced every day.

The numbers in GS are inflated for two main reasons. First, GS lumps together the number of master records (created from actual publications), and the number of citation records (distinguished by the prefix: [citation]) when reporting the total hits for author name search.

…fee-based Web of Science and Scopus have lower article and citation counts and scientometric indicators, as they have a far more selectively defined source base with fewer journals from which to gather publication and citations data. In addition, they count only the master records for the authors’ publication count (as they should), and keep the stray and orphan citations in a separate file.
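To make the counting difference concrete, here is a minimal sketch in Python. The record titles are invented for illustration, and the only detail taken from the article is the [citation] prefix. Nothing here reflects how Google Scholar, Web of Science, or Scopus actually store their data.

```python
# Hypothetical hit list for one author-name search; titles are invented.
hits = [
    "Metadata quality in scholarly search",
    "[citation] Metadata quality in scholarly search",
    "[citation] An older conference talk",
    "Citation counting revisited",
]


def is_citation_record(title: str) -> bool:
    """True for citation-only records, marked by the [citation] prefix."""
    return title.lower().startswith("[citation]")


def count_hits(titles):
    """Return (total hits, master records only) for a list of titles."""
    master = [t for t in titles if not is_citation_record(t)]
    return len(titles), len(master)


total, master_only = count_hits(hits)
print(f"total hits: {total}, master records: {master_only}")
```

Reporting the first number as a publication count is the lumping Dr. Jacso describes; the second number is closer to what a master-records-only count would show.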

Unfortunately, the bad metadata has a long reach. These numbers are taken at face value by the free utilities such as the Google Scholar Citation Count gadget by Jan Feyereisl and the sophisticated and pretty Publish or Perish (PoP) software (produced by Tarma Software).

As about 10.2 million records from GBS [Google Book Search] are incorporated now in GS, the metadata disaster likely will continue unabated. It is bad enough to have so many records with erroneous publication years, titles, authors, and journal names.

In its stupor, the parser fancies as author names (parts of) section titles, article titles, journal names, company names, and addresses, such as Methods (42,700 records), Evaluation (43,900), Population (23,300), Contents (25,200), Technique(s) (30,000), Results (17,900), Background (10,500), or—in a whopping number of records—Limited (234,000) and Ltd (452,000). The numbers kept growing by several hundred thousands hits for the cumulative total of the above “authors” during the few days this paper was being written. More screenshots are available here.

Lost Authors

These errors could be considered relatively harmless if they did not affect the contributions of genuine, real scholars. But the biggest problem is when the mess replaces real scholars with ghost authors, leaving the former as lost authors.


[Our emphasis] Certainly the entire database isn’t rotten, just a few million records. That may be a relatively small percentage—Google won’t reveal the total number of records, and these are just my few forensic search test queries—but there’s ample cause for worry.

In case of GBS [Google Book Search], Google relied on its collective Pavlovian reflex to blame the publishers and libraries (meaning the librarians, catalogers, indexers) for the wrong metadata.

In the case of Google Scholar, these same Googlish arguments will not fly, because practically all the scholarly publishers gave Google—hats in hand—their digital archive with metadata. The idea was to have Google index it and drive traffic to the publishers’ sites.

Yes, GS has fixed fairly quickly some of the major errors that I earlier used to demonstrate its illiteracy and innumeracy, but have so far left millions of others untouched.

GS designers have sent very under-trained, ignorant crawlers/parsers to recognize and fetch the metadata elements on their own. Not all of the indexing/abstracting services are perfect and consistent, but their errors are dwarfed by the types and volume of those in GS. This is the perfect example of the lethal mix of ignorance and arrogance GS developers applied to metadata and relevance ranking issues.

The parsers have not improved much in the past five years despite much criticism. GS developers corrected some errors that got negative publicity, but these were Band-Aids, where brain surgery and extensive parser training is required. Without these, GS will keep producing similar errors on a mega-scale.

Again, these highlights are only a small portion of the entire article, which also includes numerous screenshots. You can access the full text here.

Source: Library Journal

U.S. Congress Considers Building a Bailout Database

Friday, September 18th, 2009

From the Article:

One year after the Wall Street meltdown, Congress is considering a bill that would build a massive database to track bailout funds.

[Snip]

If the bill to create a centralized database makes it through Congress, President Obama may have no reason to reject it. The White House has been pushing for open government data and has built new services, such as Data.gov and Recovery.gov, which tracks stimulus spending.

U.S. Rep. Carolyn Maloney (D-N.Y.), who introduced the legislation (HR 1242), has not offered a cost estimate but is adamant about the need to track the funds. Maloney said she wants a technology that’s capable of monitoring spending in near real time.

At a House Committee on Financial Services subcommittee hearing today, Maloney said the TARP data isn’t usable. “You have to go to 25 different agencies to put it together,” she told the committee.

Source: Computerworld

See Also: Track the Bill, HR 1242 (via GovTrack.us)

Online Database: Search FBI Uniform Crime Reports

Friday, September 18th, 2009

+ Search by State and City (if available).

+ Years: 2005-2008.

Access Database

Source: FBI Uniform Crime Reporting Program

New from Google Labs: “Fast Flip” Your News and Magazine Reading and NY Times Has “Skimming” Prototype

Tuesday, September 15th, 2009

From Greg Sterling’s SEL Blog Post:

The previously rumored Google news site “Flipper” is in fact launching today as “Fast Flip” in Google Labs. But maybe it should be called Google Skimmer because it permits people to move very quickly through lots of visually rich news pages from dozens of partner publications. According to the Google Blog Post:

Fast Flip is a new reading experience that combines the best elements of print and online articles. Like a print magazine, Fast Flip lets you browse sequentially through bundles of recent news, headlines and popular topics, as well as feeds from individual top publishers. As the name suggests, flipping through content is very fast, so you can quickly look through a lot of pages until you find something interesting. At the same time, we provide aggregation and search over many top newspapers and magazines, and the ability to share content with your friends and community. Fast Flip also personalizes the experience for you, by taking cues from selections you make to show you more content from sources, topics and journalists that you seem to like. In short, you get fast browsing, natural magazine-style navigation, recommendations from friends and other members of the community and a selection of content that is serendipitous and personalized.

Much More in Greg Sterling’s Search Engine Land Post (with screen shots) where he points out that iPhone and Android versions are available.

Access Fast Flip (Beta) from Google Labs

Source: Search Engine Land

See Also: It’s worth noting that New York Times has offered an “Article Skimmer” prototype since early 2009 when it was described in this article.

Here at The Times, we often hear a common story of usage from our customers: Reading the Sunday Times, spreading out the paper on a table while eating brunch. For many of our customers, this ritual is fundamental to their enjoyment of the weekend, and its absence would be jolting.

[Snip]

Instead, our focus was on the fundamentals of the experience. It is empowering to spread so much information out on a table, so we spread as many stories as we can fit into the space of your screen. It is easier and more relaxing to scan a surface of information than flip through a stack, so information is laid out in a rigid two-dimensional grid. The sections do not flip into place; instead, they slide up and down. If you want to imagine the whole of the content as a giant uncut scroll of paper, don’t let us stop you.

In June, 2009 “Article Skimmer” had its third release. You can read about it here.

The third release prototype has (according to the paper):
+ Improved Navigation
+ Arrow Keys: Even More Useful (Many users have expressed delight at being able to move around the article skimmer using the up and down arrow keys. Now you can move between pages using the left and right arrow keys.)
+ The Addition of the Times Wire

Bing 2.0 “Visual Search” (Beta) Launches, Allows Search By Pictures

Monday, September 14th, 2009

Bing 2.0 Visual Search: Their Motto? “Start with pictures to find results faster!”

Access Bing Visual Search (Beta)

From the SEL Blog Post:

Bing Visual Search lets searchers browse easily through a slick interface of “structured data sets from trusted partners” using Silverlight technology. At launch, Bing Visual Search will earn a spot on the homepage search categories, just under Travel, although depending on the homepage image of the day, those links can sometimes get lost in the background colors of the photo.

[Snip]

The concept behind Visual Search is simple: use clear imagery to help users sort through large sets of data easily. Certain categories of search lend themselves more easily to this than others, likely the reason why Bing has launched this feature in beta with a fairly limited set of visual information: cars, animals, people and products. Users must have Silverlight installed on their browsers to fully experience Visual Search.

[Snip]

Research-based topics including politicians, US States and items like the periodic table are useful applications, but perhaps only to a limited audience, such as younger students working on school projects. However, this also affords Bing the opportunity to appeal to a new generation of searchers, who are highly dependent on visual cues and ease of use, as iPods and iPhones have shown us. Of course, visualization does have additional appeal to the middle demographic using those products as well.

Much More in Elisabeth Osmeloski’s Blog Post

Source: Search Engine Land

A bit more from the ResourceShelf Team:

To limit your visual search, look for a group of narrowing limits located in the left margin of a visual search category page. Here’s an example for U.S. Politicians. You can narrow by:

+ Party
+ State Represented
+ First Term
+ Gender

In this case, you can also place your cursor on top of an image and identify the politician by name, party affiliation, and age.

Once you’ve made your selection, the politician’s name is automatically placed in the search box, ready to conduct a web search.

Compare the politicians search with this one for Billboard’s past songs. On the songs home page you can sort by song title. In the left margin you can narrow by:

+ Decade
+ Year
+ Artist
+ Genre

IBM’s New Image Recognition-Based Search and a Few Others

Thursday, September 10th, 2009

From the Article:

We’ve all seen photos of ourselves in locations we can’t quite remember. Often they’re from exotic travels or from days long past. Regardless of the reason for your memory loss, IBM is working on a tool that can help. In collaboration with the European Union consortium, the company is testing SAPIR (Search in Audio-Visual Content Using Peer-to-peer Information Retrieval). The image matching search technology allows users to pull results from large collections of audio-visual content without using tags for search. Instead, users can upload images and match them to similar ones – perhaps even ones with signage and labels. The system analyzes everything from digital photographs, to sound files to video. From here it automatically indexes and ranks the media for retrieval.

Source: ReadWriteWeb

On a Somewhat Related Note:

See Also: LTU Technologies

See Also: LTU Demo Photo Search Using Corbis Images

See Also: A Picture is Worth a Thousand Words: Image Search Technology (via Digital Buzz)
From the article:

“Our technology examines the pixel content of images, the different shapes, the structure, the texture, the colors, the arrangements,” Winter says. “We encode that into a bit of binary code that we call the image DNA. That image DNA is sort of a mid-level description of the image. We use that data to compare images and classify them and track them. We can actually compare image DNA pretty easily.”
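As a rough illustration of what comparing compact binary signatures can look like, the Python sketch below uses Hamming distance (the number of differing bits). This is our assumption for illustration only: the article does not say how LTU compares its image DNA, and the 8-bit signatures here are made up.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where two binary signatures differ."""
    return bin(a ^ b).count("1")


# Invented 8-bit signatures; a real signature would be far longer.
sig_photo = 0b10110010
sig_near = 0b10110110   # differs from sig_photo in one bit
sig_far = 0b01001101    # differs from sig_photo in every bit

print(hamming_distance(sig_photo, sig_near))
print(hamming_distance(sig_photo, sig_far))
```

Under this assumption, a smaller distance would mean two images are more alike, which is one simple way binary codes can be compared "pretty easily."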

See Also: Like.com

Like.com is the first true visual search engine, where the contents of photos are used to search and retrieve similar items.
+ Likeness Search – the ability to search by image instead of text
+ Like Detail – finds items that have a specific feature you like (such as a buckle, straps, bezel, etc)
+ Like Color – find color variants of the item you desire
+ Like Celebrity – find clothing, shoes and accessories similar to those worn by your favorite celebrities
+ Like This – the ability to upload your own photo of your favorite item and find the same or similar product

CrowdEye Twitter Search Upgrades With Google-Like Features

Thursday, September 3rd, 2009

Matt McGee writes:

CrowdEye, one of the new Twitter-based real-time search engines that has launched this year, has just upgraded its service with several new features that will remind you of … Google? It’s true. The service now includes a PageRank-like measurement system, a customizable home page (sorta like iGoogle), and the ability to perform site: searches like you would on Google.

Matt continues to review (with screen caps) the new features including:

+ Personalized Home Page
+ CrowdEye Rank
+ Site: Searches

Source: Search Engine Land

Greg Sterling on Local Search Coming to Twitter with Idearc

Wednesday, September 2nd, 2009

Greg Sterling explains how Idearc works (with screen caps):

1) You follow sp411 and then it will automatically follow you a few seconds later
2) You then send a direct message to sp411 (“d sp411”) with a query and location. Example: pizza in Seattle, “d sp411 pizza Seattle”.
3) Results will appear in all the Twitter notification places (email, SMS and direct message)
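
The message format in step 2 can be sketched in a few lines of Python. The function name is ours (hypothetical), and the code only builds the text of the direct message; it does not talk to Twitter or to sp411.

```python
def build_sp411_message(query: str, location: str) -> str:
    """Build the direct-message text in the form 'd sp411 <query> <location>'."""
    parts = ["d", "sp411", query.strip(), location.strip()]
    # Drop empty pieces so a missing location does not leave a trailing space.
    return " ".join(p for p in parts if p)


print(build_sp411_message("pizza", "Seattle"))
```

For the example in the post, this prints d sp411 pizza Seattle.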

Listings come from Superpages.

Source: Search Engine Land

Netbase Debuts HealthBase Demo

Wednesday, September 2nd, 2009

From the Article by Greg Sterling:

To “come out” in a manner of speaking and demonstrate its capabilities to a broader public, Netbase has launched vertical search site HealthBase, a kind of “technology showcase” for the company’s “content intelligence” platform and semantic search capabilities. If HealthBase gets a positive response I was told perhaps the company will move into the consumer search business. But that’s not the main point of the site at the moment. Indeed there’s a very “enterprise-y” quality to the look and feel of HealthBase.

Access HealthBase

Continue Reading the Search Engine Land Article

Source: SEL

A Few Comments from Gary

1) First, a bit of a stickler.
Yes, it’s the first day for HealthBase (and things can change quickly) but it would be useful if HealthBase provided a complete list of the sources it’s crawling. They have a small list on the first page of each section (and that’s a good start) but a complete list would be even more helpful to researchers. We do give kudos to HealthBase for providing a “source list” to show where the results come from. However, the kudos only go so far. Why? If you search for “causes of H1N1 (swine flu),” clicking the source list takes you to the source but makes you rerun the entire search. Not very helpful.

2) One of the sources not listed on the first page, but which we did find in results, is Wikipedia. We’ll keep the “is Wikipedia useful for health researchers” argument out of it for now. We did a search for “poor posture” in the “causes and conditions” tab. OK, no problem. We then selected “Joint.” The second result was from Wikipedia, dated April, 2009. One of Wikipedia’s strengths (and maybe a weakness in some cases) is its currency. It would be useful to let users know that this is (is it?) the most current version of the material available. When we found other Wikipedia material, it carried other dates. Here’s an example. We searched (using the “causes of conditions” tab) for diabetes. Under the “infection” tab we found a Wikipedia result from May, 2009.

3) Finally, when searching for journal material (like what you find in PubMed) it takes some clicking around to find the full citation to the abstract (if available) and bibliographic information. HealthBase could and should make this easier. In fact, they could work with database vendors and document delivery services to provide full text access to the article.

An In-Depth Look at Surchur

Tuesday, September 1st, 2009

Matt McGee writes:

Surchur has been around for more than a year, but its recent facelift aims to take real-time search toward a new idea: real-time discovery.

[Snip]
…Surchur is going a lot further with its new home page. Founder Todd Hogan calls it a merger of real-time search with real-time discovery.

The home page is now positioned as a “Real-Time Board” that shows trending topics from Google Trends, Yahoo Buzz, Bing xRank, CNN Popular Stories, Twitter, and Technorati. This is the discovery aspect: There’s no need to search for what’s hot when Surchur is doing it already, and ranking what it finds from a number of sites. The hot terms are divided into two categories: Hot Topics, which is based on an overall score, and Catching Fire, which lists the fastest-growing terms. As you’d expect, there’s occasionally some overlap in the two categories.

Hogan says the Real-Time Board is updated every 20-50 minutes. For each term, it ranks the popularity on a scale of 1-10, with separate marks for how hot the term is on Twitter, in the blogosphere, and on Surchur itself. A fourth column reveals where Surchur found the hot term.

Access Surchur

Much more in the review (with screen caps) here.

Source: Search Engine Land

It’s semantic – easier solution to annotate and search images

Monday, August 31st, 2009

From the Article:

Innovative software developed in Europe that makes it easier to organise, search and navigate collections of digital images will soon be available to media agencies, photographers and, potentially, anyone trying to keep up with photo-happy Facebook or Flickr friends.

The ImageNotion software, which is expected to go on sale next year, takes a user-friendly approach to semantic image annotation and search, a technology that links the content of photos to concepts so as to make the images understandable by computers.

Such systems have typically required end users to use a manually developed ontology – a lexicon of predefined concepts used to assign machine-readable semantic meaning to information – and then train the software to correctly annotate different images. For example, an apple would need to be defined in an ontology for fruit and then photos of apple trees could be tagged as such.

The ImageNotion system strips away much of that complexity for the end user, combining semantic annotation with a variety of other technologies, from text mining and object recognition to face detection and face identification, in order to permit many more images to be accurately annotated with little or no user intervention.

“When you mention ontologies to most people they just switch off. A photographer, an image agency employee or a web user doesn’t want and shouldn’t have to learn how the technology works, they just want to be able to use it,” explains Gabor Nagypal, who oversaw development of the ImageNotion software as technical and scientific coordinator of the EU-funded IMAGINATION project. “Because of that, our goal has been to make the technology transparent and intuitive to use,” he adds.

Demo ImageNotion

Source: ICT CORDIS (via ACM TechNews)

Image Searching: New Look, Advanced Features for NLM Images from the History of Medicine (IHM)

Thursday, August 27th, 2009

From the Announcement:

The History of Medicine Division of the National Library of Medicine announces the launch of a new image platform for its premier database, Images from the History of Medicine. Using award winning software developed by Luna Imaging, Inc., NLM offers greatly enhanced searching and viewing capabilities to image researchers. Patrons can view search results in a multi-image display, download high resolution copies of their favorite images, zoom in on image details, move images into a patron-defined workspace for further manipulation, and create media groups for presenting images and sharing them via e-mail or posting on blogs. With these new capabilities, NLM greatly enhances usability of its image collection, where inspection and comparison of images is often as important as access to bibliographic data. IHM is available free of charge.

Source: National Library of Medicine Technical Bulletin

Spanish Language Content: Univision.com Boosts Video Search With AOL’s Truveo

Wednesday, August 26th, 2009

From the Article:

Visitors to Univision.com are in for a completely different video-search experience, as they will now be able to browse a comprehensive worldwide library of Spanish-language videos made available thanks to a new partnership between Univision Interactive Media and Truveo, the AOL-owned video-search engine.

Source: MultiChannel News

Labeling Library Archives Is a Game at Dartmouth College

Wednesday, August 26th, 2009

From the Blog Post:

Professor Mary Flanagan wants students to go online and label library archives – for free.

Ms. Flanagan, a digital-humanities professor at Dartmouth College, is creating an Internet-based game in which users create descriptive tags for library images to improve searching through the library’s database. Although the program will be tested at the college’s library, Ms. Flanagan says the game will be open source and available for others to download and build upon.

Source: The Wired Campus

See Also: Google Image Labeler (via Wikipedia)

See Also: Direct to Google Image Labeler

Wikipedia to Limit Changes to Articles on People

Tuesday, August 25th, 2009

From the Article:

Officials at the Wikimedia Foundation, the nonprofit in San Francisco that governs Wikipedia, say that within weeks, the English-language Wikipedia will begin imposing a layer of editorial review on articles about living people.

The new feature, called “flagged revisions,” will require that an experienced volunteer editor for Wikipedia sign off on any change made by the public before it can go live. Until the change is approved — or in Wikispeak, flagged — it will sit invisibly on Wikipedia’s servers, and visitors will be directed to the earlier version.

Source: NY Times

A Brief Note to Google on Newspaper Digitization

Saturday, August 8th, 2009

First, congrats to Google on the massive increase (4x bigger, but exact numbers were not given) to their newspaper digitization project. Well done and we look forward to more in the future. That said, may we ask a small favor? How about a list, a catalog of sorts, of the newspapers you’re making available via the project? It would be most welcome. As more papers get digitized you can simply add the titles and/or change the run dates. Heck, you can limit by newspaper* on the advanced search interface but where does one go to find the list of papers? How can one limit by date if we don’t know what date range is available? We believe this is info that should be on the search interface home page or, if that’s not possible, it could be placed on one of your well-documented help pages.

* On the newspaper archive advanced search page Google lists NewsBank as a newspaper source. It’s not. NewsBank is an aggregator of newspapers.

Looking for More Digitized Newspapers?

Check out the Chronicling America Project from the Library of Congress. They just digitized their one millionth page. Free.

Australian Newspapers Digitisation Program
4.3 million articles (1.95 million scanned pages) are now available and full-text searchable. Some of the material is accessible via Google’s Newspaper Digitization program.

New Zealand: Papers Past
Over 1.3 million digitized newspaper pages. Free.

NewspaperARCHIVE.com
Fee-based (monthly or yearly subscription) from Heritage Microfilm. According to their documentation NA is adding about 2.5 million pages per month. Recently, they added the Stars and Stripes newspaper from 1948-1999.

UK: The Times of London (1785-1985)
Fee-Based.

British Newspapers 1800-1900
2 million pages. Fee-based.

Many libraries (of all types) provide FREE remote access (from home or office) to digitized newspapers. Just ask your librarian.

Wolfram|Alpha Can Help You Make Smart Food Choices

Tuesday, July 28th, 2009

From the Blog Post:

Whether you are concerned about monitoring your total fat, cholesterol, sodium, sugar, carbohydrates, or other nutrients, Wolfram|Alpha can provide you with this information for an individual food item, a meal, or a comprehensive calculation of your daily diet.

Source: W|A Blog

Springer Launches Innovative Publisher-Based Image Collection

Thursday, July 16th, 2009

Barbara Quint writes:

After a year and a half of planning and development, Springer Science+Business Media, an international scholarly publisher based in Germany but operating in 20 countries, has launched SpringerImages (www.springerimages.com). The massive collection of 1.6 million scientific, technological, and medical images includes photos, tables and figures, charts, graphs, histograms, and other illustrations. Although covering all scientific subject areas, some 61% of the collection focuses on medical and life sciences. Drawing on its own vast collection of content, Springer provides multilayered, in-depth indexing. Subscribers can use the material liberally as long as they do not use it for direct commercial purposes. In an interesting development, SpringerImages includes a small but growing collection of open access images, which are available to anyone, no registration required. (our emphasis)

Source: Information Today NewsBreaks

NIH and Wikimedia Foundation Collaborate to Improve Online Health Information

Wednesday, July 15th, 2009

From the Announcement:

The National Institutes of Health and the Wikimedia Foundation, the nonprofit organization that operates the Wikipedia online encyclopedia, are joining forces to make health and science information more accessible and reliable. This collaboration is the first of its kind for both organizations.

“NIH works to ensure that the information it provides on science and health is of the highest quality and reaches the widest audience,” said John Burklow, NIH associate director for communications and public liaison. “We look forward to this opportunity to collaborate with the Wikimedia Foundation and participate in a resource that is used by millions of people around the world.”

After the Wikipedia Academy, NIH subject matter experts will be able to contribute to Wikipedia and also help develop best practices for future sessions. Instructions about how to contribute, including video of the Wikipedia Academy at NIH, will be available on the NIH and the Wikipedia websites for scientists across the country.

Source: National Institutes of Health
Hat Tip: P.W.