Cape Town Book Fair

 

did you know?

There's a very exciting project called reCaptcha that helps with the digitisation of old titles: http://recaptcha.net/learnmore.html



It's an online security check that prevents abuse by "bots" by asking humans to type in the text they see, text which a computer cannot reliably recognise.

There are two words in every reCaptcha. One is known to the computer (the "control" word) and the other is supplied by the book digitisation project at the Internet Archive - http://www.archive.org/

Scanned books (for which there are no digital files) are sent through OCR (Optical Character Recognition) software, which automatically "reads" the letters and converts them into editable, and therefore searchable, text. Some words are not correctly recognised by the software, and this is where the system needs a little human help.

Every time a reCaptcha is entered, what you typed for the control word (the one known to the computer) is compared to what the computer expects to see. If it matches, you pass the security test. What you typed for the other word (from the Internet Archive) is sent to the archive, where the previously unrecognised word is replaced with the human interpretation. The user doesn't know which is the control word and which is the unknown word, so there is good reason to type both correctly.
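The two-word check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the actual reCaptcha service is built: the names Challenge, check_challenge and transcriptions are hypothetical, and the case-insensitive comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Challenge:
    control_word: str    # the word whose answer the computer already knows
    unknown_word: str    # hypothetical ID of the scanned word OCR could not read
    control_first: bool  # display order, which the user is never told


def check_challenge(challenge, typed_words, transcriptions):
    """Pass the test if the control word matches, then record the other word."""
    if len(typed_words) != 2:
        return False
    if challenge.control_first:
        typed_control, typed_unknown = typed_words
    else:
        typed_unknown, typed_control = typed_words
    # Assumption: the comparison ignores case and surrounding spaces.
    if typed_control.strip().lower() != challenge.control_word.lower():
        return False
    # The human reading of the unknown word is kept for the digitisation project.
    transcriptions.setdefault(challenge.unknown_word, []).append(typed_unknown.strip())
    return True
```

A failed control word means nothing is recorded, so a careless or automated answer never reaches the archive in this sketch.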

In the ten seconds it takes you to enter the two words, you've made a small contribution to worldwide knowledge.

Ten seconds is not a lot for an individual, but when one considers that about 60 000 000 (60 million!) reCaptchas are solved every day, it adds up to roughly 167 000 hours per day spent on the digitisation of books!
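For anyone who wants to check the sum, here is the arithmetic as a short Python sketch. It takes the two figures in the text at face value (60 million solves a day, ten seconds each); the function name hours_per_day is just for illustration.

```python
SOLVES_PER_DAY = 60_000_000  # figure quoted in the text
SECONDS_PER_SOLVE = 10       # figure quoted in the text


def hours_per_day(solves, seconds_each):
    """Total human time spent per day, converted from seconds to hours."""
    return solves * seconds_each / 3600


print(round(hours_per_day(SOLVES_PER_DAY, SECONDS_PER_SOLVE)))  # 166667
```

That is 600 million seconds a day, or roughly 167 000 hours.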

So remember, next time you enter a reCaptcha, you're helping to grow the worldwide knowledge base!