Science: Internet Security Program as an Archaeological Tool

A CAPTCHA is a distorted string of numbers or letters that must be read and typed, acting as a security measure on the World Wide Web. You might have solved a CAPTCHA before in order to gain entry into a secure website such as an email provider, ticket seller, social network, or blog. Now, researchers have modified the basic algorithm behind this online security program to help recognize words from faded texts that computerized optical character recognition programs are unable to decipher.

This new program, reCAPTCHA, was developed by Luis von Ahn and colleagues and is currently in use by over 40,000 websites. It captures the efforts expended by human users all over the world, who collectively type more than 100 million CAPTCHAs each day. In this way, the program capitalizes on a task that humans can perform but computers still cannot.

The reCAPTCHA program is highlighted in the 15 August issue of Science, the journal of AAAS.

In an effort to preserve human knowledge and make information more accessible to the world (as well as to turn a profit), physical books and other texts are being digitized en masse. But the numbers and letters on a page are often faded or otherwise obscured, especially since many of these texts are old, worn, and out of print.

Specialized character-recognition computer programs scan the physical documents and create bitmap images of the text. From these images, the programs can often determine the intended message and re-create the actual text in digital form. However, this technology is far from perfect, and on average, the programs fail to recognize 20% of the text they convert to images. This is where reCAPTCHA comes into play.

Source: American Association for the Advancement of Science
