
There's a very exciting project, called reCaptcha, that helps with the digitisation of old books: http://recaptcha.net/learnmore.html

It's an online security check that stops "bots" (automated programs) from abusing websites. It does this by asking a human to type in what they see: text that a computer cannot recognise.

There are two words in every reCaptcha. One is already known to the computer (the "control" word). The other comes from the book digitisation project at the Internet Archive (http://www.archive.org/).

Scanned books (for which there are no digital files) are run through optical character recognition (OCR) software, which automatically "reads" the letters and converts them into editable, and therefore searchable, text. Some letters are not recognised correctly by the software, and this is where the system needs a little human help.

Every time a reCaptcha is entered, what you typed for the control word is compared with the answer the computer already knows. If it matches, you pass the security check. What you typed for the other word is sent to the Internet Archive, where the previously unrecognised word is replaced with your human interpretation. Because you don't know which word is the control and which is the unknown one, you have to type both carefully, and in this way both are supplied correctly.

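To make the checking step concrete, here is a minimal Python sketch of the logic described above. It is not reCaptcha's actual code: the `Challenge` and `Archive` classes, the `check_answer` function and the word ID are all hypothetical. It also assumes that answers are compared ignoring case and surrounding spaces, and that a reading of the unknown word is only kept when the control word was typed correctly.

```python
from dataclasses import dataclass, field


@dataclass
class Challenge:
    """A hypothetical reCaptcha challenge: one control word, one unknown word."""
    control_word: str      # answer the computer already knows
    unknown_word_id: str   # hypothetical ID of the unreadable word in a scan


@dataclass
class Archive:
    """Stand-in for the Internet Archive: collects human readings per word."""
    readings: dict[str, list[str]] = field(default_factory=dict)

    def record(self, word_id: str, text: str) -> None:
        self.readings.setdefault(word_id, []).append(text)


def normalise(text: str) -> str:
    # Assumption: comparison ignores case and surrounding whitespace.
    return text.strip().lower()


def check_answer(challenge: Challenge, typed_control: str,
                 typed_unknown: str, archive: Archive) -> bool:
    """Return True if the user passes the security check.

    The user cannot tell the two words apart, so a correct control word
    is taken as evidence that the unknown word was typed carefully too.
    """
    passed = normalise(typed_control) == normalise(challenge.control_word)
    if passed:
        # Assumption: only readings from users who passed are kept.
        archive.record(challenge.unknown_word_id, normalise(typed_unknown))
    return passed


archive = Archive()
challenge = Challenge(control_word="morning", unknown_word_id="book42-p17-w3")
print(check_answer(challenge, "Morning", "overlooked", archive))   # True
print(check_answer(challenge, "mourning", "overlocked", archive))  # False
print(archive.readings)  # {'book42-p17-w3': ['overlooked']}
```

The archive keeps a list of readings per word rather than a single answer. A real system would probably want several people to agree on a reading before replacing the OCR text, but that step is left out of this sketch.
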
In the ten seconds it takes you to enter the two words, you've made a small contribution to worldwide knowledge.

Ten seconds is not a lot for an individual, but when you consider that about 60 000 000 (60 million!) reCaptchas are solved every day, that adds up to more than 150 000 hours per day spent on digitising books!

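The daily total is straightforward arithmetic. Here is a quick check in Python, using only the figures from the text (60 million reCaptchas a day at about ten seconds each). `daily_hours` is an illustrative helper, not part of any real API:

```python
SECONDS_PER_HOUR = 60 * 60


def daily_hours(solves_per_day: int, seconds_per_solve: float) -> float:
    """Total human time spent per day, in hours."""
    return solves_per_day * seconds_per_solve / SECONDS_PER_HOUR


hours = daily_hours(60_000_000, 10)
print(f"{hours:,.0f} hours per day")  # 166,667 hours per day
```

That is roughly 167 000 hours a day, comfortably above 150 000.
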
So remember, next time you enter a reCaptcha,
you're helping to grow the worldwide knowledge
base!