Archive for the ‘Papers and Presentations’ Category

Paper — Extremely Fast Text Feature Extraction for Classification and Indexing

Wednesday, September 3rd, 2008

Extremely Fast Text Feature Extraction for Classification and Indexing

Most research in speeding up text mining involves algorithmic improvements to induction algorithms, and yet for many large scale applications, such as classifying or indexing large document repositories, the time spent extracting word features from texts can itself greatly exceed the initial training time. This paper describes a fast method for text feature extraction that folds together Unicode conversion, forced lowercasing, word boundary detection, and string hash computation. We show empirically that our integer hash features result in classifiers with equivalent statistical performance to those built using string word features, but require far less computation and less memory.
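As a rough illustration of what folding these steps together can look like, here is a minimal Python sketch. It is not the paper's implementation: the FNV-1a hash, the `isalnum()` word-boundary rule, and the function name `hash_features` are all assumptions made for this example. It lowercases, detects word boundaries, and hashes in one pass over the text, without building any word strings.

```python
FNV_OFFSET = 0xCBF29CE484222325  # 64-bit FNV-1a offset basis (assumed hash choice)
FNV_PRIME = 0x100000001B3        # 64-bit FNV-1a prime
MASK64 = 0xFFFFFFFFFFFFFFFF      # keep the running hash within 64 bits


def hash_features(text):
    """Return one integer hash per word, computed in a single pass over text."""
    features = []
    h = FNV_OFFSET
    in_word = False
    for ch in text:
        if ch.isalnum():
            # Lowercase the character and mix its UTF-8 bytes into the hash.
            for byte in ch.lower().encode("utf-8"):
                h = ((h ^ byte) * FNV_PRIME) & MASK64
            in_word = True
        elif in_word:
            # A non-word character closes the current word: emit its hash.
            features.append(h)
            h = FNV_OFFSET
            in_word = False
    if in_word:
        # Emit the final word if the text does not end on a separator.
        features.append(h)
    return features
```

Calling `hash_features("Hello, WORLD hello")` yields three integers, with the first and third equal because case is folded before hashing.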

+ Full Paper (PDF; 348 KB)

Source: HP Labs

The Google Controversy — Two Years Later

Wednesday, August 27th, 2008

The Google Controversy — Two Years Later (PDF; 56 KB)

Two years have passed since Google startled the world with its free, online, high-resolution mapping products of the world. Foreign governments expressed their shock and concern about such detailed imagery in the hands of the general populace, with their facilities and state secrets exposed to the world. “Today, with the advent of civilian satellites here and abroad, we have opened wide the window on places and events that, not so long ago, only spies could see,” writes Sharon Weinberger.

As the initial shock wore off, five main responses to the “Google threat” emerged from nations around the world: negotiations with Google, banning Google products, developing a similar product, taking evasive measures, and nonchalance. This report discusses foreign reporting and government response to the online mapping revolution after the initial brouhaha.

Source: Open Source Center (via Secrecy News)

Paper — Making Web 2.0 Accessibility Mainstream

Thursday, August 21st, 2008

Making Web 2.0 Accessibility Mainstream

Research into ‘Web 2.0 accessibility’ for people with disabilities has recently gained momentum in library and information science studies due to the unique problems disabled individuals face because they must rely on digitized formats. People with disabilities who use assistive technologies are often restricted by incompatibility issues involving software and hardware when retrieving Web content since many resources have been constructed without consideration for disabled users. The result has been a new dilemma emerging for many information centers and libraries regarding how to provide access to Web 2.0 technologies which are not designed for persons with disabilities and are incompatible with many assistive technologies. Careful consideration must be given in the development stage of web design to the layout, navigation and compatibility of different assistive technologies used to view the site.

+ Full Paper (PDF; 149 KB)

Source: Cheris Carpenter (via E-LIS)

Research Paper Looks at Biomedical Text-Mining

Thursday, July 31st, 2008

From the abstract:

Biomedical text-mining has great promise to improve the usefulness of genomic researchers. The goal of text-mining is to analyze large collections of unstructured documents for the purposes of extracting interesting and non-trivial patterns of knowledge. The analysis of biomedical texts and available databases, such as Medline and PubMed, can help to interpret a phenomenon, to detect gene relations, or to establish comparisons among similar genes in different specific databases.

Source: Gálvez, Carmen and Moya-Anegón, Félix (2008) Text-mining research in genomics. In Guimaraes, Nuno and Isaías, Pedro, Eds. Proceedings IADIS International Conference Applied Computing 2008, pp. 277-283, Algarve (Portugal).

Just Released — Academic Libraries: 2006

Tuesday, July 8th, 2008

Academic Libraries: 2006

The Academic Libraries: 2006 First Look summarizes services, staff, collections, and expenditures of academic libraries in 2- and 4-year, degree-granting postsecondary institutions in the 50 states and the District of Columbia. The nation’s 3,600 academic libraries held 1.0 billion books; serial backfiles; and other paper materials, including government documents at the end of FY 2006, and there were 144.1 million circulation transactions from their general collections. During the same time period, academic libraries’ expenditures totaled $6.2 billion.

+ Full Report (PDF; 1.1 MB)
+ Supplemental Table (PDF; 169 KB)

Open Source Software in Education

Monday, July 7th, 2008

Open Source Software in Education

Educational institutions have rushed to put their academic resources and services online, bringing the global community onto a common platform and awakening the interest of investors. Despite continuing technical challenges, online education shows great promise. Open source software offers one approach to addressing the technical problems in providing optimal delivery of online learning.

Source: Education Quarterly (EDUCAUSE)

Information Literacy from the Trenches: How Do Humanities and Social Science Majors Conduct Academic Research?

Wednesday, June 18th, 2008

Information Literacy from the Trenches: How Do Humanities and Social Science Majors Conduct Academic Research? (PDF; 697 KB)

This article examines the ways in which students majoring in humanities and social sciences conceptualize and operationalize course-related research. Findings are presented from an information-seeking behavior study with data collected from student discussion groups, a student survey, and a content analysis of professors’ research assignment handouts. Results indicate that students first use course readings and library resources for academic research and then rely on public Internet sites later in their research process. Students adopt a hybrid approach to course-related research. A majority of students in this study leveraged both human and computer-mediated resources to compensate for their lack of information literacy. In particular, students faced problems with determining information needs for assignments, selecting and critically evaluating resources, and gauging professors’ expectations for quality research.

Source: College & Research Libraries, forthcoming (Alison J. Head)

Google’s Joe Kraus on How to Make the Web More Social

Monday, June 16th, 2008

Google’s Joe Kraus on How to Make the Web More Social

Can the Internet be made more social? This is a question with which Joe Kraus, director of product management at Google, constantly has to grapple. He believes every killer app on the web — instant messaging, e-mail, blogging, photo-sharing — has succeeded because it helps people connect with one another. For Kraus, this means the Internet has an inherently social character, but it can be enhanced further — an area he continues to explore through Google initiatives such as Open Social and Friend Connect. Wharton legal studies professor Kevin Werbach spoke with Kraus recently about the increasing socialization of the Internet. Kraus will speak about social computing at the Supernova conference in San Francisco on June 16.

Audio also available.

Source: Knowledge@Wharton

Paper — Government Data and the Invisible Hand

Sunday, June 15th, 2008

Government Data and the Invisible Hand

If the next Presidential administration really wants to embrace the potential of Internet-enabled government transparency, it should follow a counter-intuitive but ultimately compelling strategy: reduce the federal role in presenting important government information to citizens. Today, government bodies consider their own websites to be a higher priority than technical infrastructures that open up their data for others to use. We argue that this understanding is a mistake. It would be preferable for government to understand providing reusable data, rather than providing websites, as the core of its online publishing responsibility.

Rather than struggling, as it currently does, to design sites that meet each end-user need, we argue that the executive branch should focus on creating a simple, reliable and publicly accessible infrastructure that exposes the underlying data. Private actors, either nonprofit or commercial, are better suited to deliver government information to citizens and can constantly create and reshape the tools individuals use to find and leverage public data. The best way to ensure that the government allows private parties to compete on equal terms in the provision of government data is to require that federal websites themselves use the same open systems for accessing the underlying data as they make available to the public at large.

Several options available for retrieval of full text (PDF; 113 KB).

Source: Yale Journal of Law & Technology (via SSRN)

Paper — Journal Prices, Book Acquisitions, and Sustainable College Library Collections

Sunday, June 15th, 2008

Journal Prices, Book Acquisitions, and Sustainable College Library Collections (PDF; 489 KB)

Library collections are economically sustainable only if the rate of increase in costs is no greater than the rate of increase in the library acquisitions budget. Because book prices increase at a much lower rate than journal prices, undergraduate libraries can achieve economic sustainability through a renewed emphasis on books rather than journals. Book-centered collections are consistent with the goals of many undergraduate colleges, and books rather than journals may provide the best teaching resources even in those fields that rely heavily on journals for the communication of original research results.

Source: College & Research Libraries, forthcoming (William H. Walters)

Report: Paying faculty to use library resources: Course enhancement grants at Ohio State University Libraries

Friday, June 13th, 2008

From the article:

At the suggestion of the assistant director for collections, instruction, and public service, the Ohio State University Libraries in fall 2005 initiated a program to provide grants to faculty members to enhance their courses with the library’s electronic resources. The purpose of this program was twofold: to maximize use of electronic resources for which the library was already paying and to encourage collaboration between faculty and librarians in course development.

The libraries initially set aside $50,000 to implement the program, deciding that for each accepted proposal the faculty member would get $2,000 to teach the course and another $2,000 if the course was taught a second time. In addition, the librarian associated with the project would get $1,000. The grants were considered incentives; there was no requirement that the money be used to implement the activities set forth in the proposals.

Source: C&RL News

Internet Information and Communication Behavior during a Political Moment: The Iraq War, March 2003

Thursday, June 5th, 2008

Internet Information and Communication Behavior during a Political Moment: The Iraq War, March 2003

This article explores the Internet as a resource for political information and communication in March 2003, when American troops were first sent to Iraq, offering us a unique setting of political context, information use, and technology. Employing a national survey conducted by the Pew Internet & American Life Project, we examine the political information behavior of the Internet respondents through an exploratory factor analysis; analyze the effects of personal demographic attributes and political attitudes, traditional and new media use, and technology on online behavior through multiple regression analysis; and assess the online political information and communication behavior of supporters and dissenters of the Iraq War. The factor analysis suggests four factors: activism, support, information seeking, and communication. The regression analysis indicates that gender, political attitudes and beliefs, motivation, traditional media consumption, perceptions of bias in the media, and computer experience and use predict online political information behavior, although the effects of these variables differ for the four factors. The information and communication behavior of supporters and dissenters of the Iraq War differed significantly. We conclude with a brief discussion of the value of “interdisciplinary poaching” for advancing the study of Internet information practices.

+ Full Paper (PDF; 352 KB)

Source: E-LIS

Lists & Rankings: Canada’s Fastest-Growing Companies

Wednesday, June 4th, 2008

Canada’s Fastest-Growing Companies, 2008

The PROFIT 100 table ranks the 100 fastest-growing companies in Canada by percentage revenue growth from 2002-07, while the Next 100 table features companies ranked from Nos. 101 to 200.

Sort tables by:
* Rank
* Alphabetical order
* Revenue 2002
* Revenue 2007
* Growth 2002-07 (%)
* Profit margin
* Employees (number of)
* Exports as % of sales

Source: Canadian Business

Exploring historical trends using taxonomic name metadata

Monday, May 19th, 2008

Exploring historical trends using taxonomic name metadata

Background
Authority and year information have been attached to taxonomic names since Linnaean times. The systematic structure of taxonomic nomenclature facilitates the ability to develop tools that can be used to explore historical trends that may be associated with taxonomy.

Results
Of the more than 10.7 million taxonomic names that are part of the uBio system (http://www.ubio.org), approximately 3 million were identified as having taxonomic authority information from the years 1750 to 2004. A pipe-delimited file was then generated, organized according to a Linnaean hierarchy and by years from 1750 to 2004, and imported into an Excel workbook. A series of macros was developed to create an Excel-based tool and a complementary Web site to explore the taxonomic data. A cursory and speculative analysis of the data reveals observable trends that may be attributable to significant events that are of both taxonomic (e.g., publishing of key monographs) and societal importance (e.g., world wars). The findings also help quantify the number of taxonomic descriptions that may be made available through digitization initiatives.
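To make the data-preparation step concrete, here is a minimal Python sketch that tallies names per group and year from pipe-delimited rows. The paper does not give its file layout, so the three-column layout `group|name|year` and the function name `names_per_year` are assumptions for illustration only. The authors used Excel macros rather than Python.

```python
import csv
from collections import Counter


def names_per_year(lines, first_year=1750, last_year=2004):
    """Count names per (group, year) from pipe-delimited rows.

    Assumed (hypothetical) row layout: group|name|year
    """
    counts = Counter()
    for row in csv.reader(lines, delimiter="|"):
        if len(row) != 3:
            continue  # skip rows that do not match the assumed layout
        group, _name, year_text = row
        try:
            year = int(year_text)
        except ValueError:
            continue  # skip rows without a numeric year
        if first_year <= year <= last_year:
            counts[(group, year)] += 1
    return counts


# Illustrative rows only; the layout is assumed, not taken from uBio.
sample = [
    "Hominidae|Homo sapiens|1758",
    "Canidae|Canis lupus|1758",
    "Canidae|Vulpes vulpes|1758",
    "Tyrannosauridae|Tyrannosaurus rex|1905",
]
counts = names_per_year(sample)
```

Here `counts[("Canidae", 1758)]` is 2, and the same counter could be summed over groups to chart descriptions per year.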

Conclusions
Temporal organization of taxonomic data can be used to identify interesting biological epochs relative to historically significant events and ongoing efforts. We have developed an Excel workbook and complementary Web site that enables one to explore taxonomic trends for Linnaean taxonomic groupings, from Kingdoms to Families.

+ Full Paper (PDF; 4.2 MB)

Source: BMC Evolutionary Biology

New Technical Report from MSR: Email Information Flow in Large-Scale Enterprises

Monday, May 19th, 2008

Email Information Flow in Large-Scale Enterprises
14 pages; PDF.
by Thomas Karagiannis; Milan Vojnović

From the abstract:

We present analysis results of email communications in a large-scale enterprise network. Our study first focuses on understanding the social graph induced by email communications between individual users. Specifically, we examine how email communication flows are correlated with user profiles, the organization structure, and how outside information penetrates the enterprise. We then concentrate on understanding the information processing load imposed on users and the strategies applied by users in email triage. To the best of our knowledge, this is the first measurement study of email communications of a global enterprise network comprising email data from over 100,000 employees spread across multiple continents. Our analysis results inform the design of network applications that take into account typical user behaviour in social interactions and solitary information processing. Our large-scale dataset further allows us to examine the validity of several hypotheses suggested by the social network theory.

Source: Microsoft Research