Posted by: David | January 7, 2010

Did You Know …

… that those funny little verification thingies, the ones you sometimes have to type words into to demonstrate that you are NOT a software robot, are, in some cases, helping to correct digitized libraries?

Yeah, I didn’t know that either.

A coworker was telling me about that today, and I had to do some Googling as soon as I got home. So I found out that CAPTCHA (a funky acronym for Completely Automated Public Turing test to tell Computers and Humans Apart) was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford of Carnegie Mellon University. At the time, they developed the first CAPTCHA to be used by Yahoo. You may recall that the Turing Test was proposed as a way to tell whether you are dealing with a human or a software “intelligence”.

But reCAPTCHA is this whole verification thing with a really cool side function. And it’s a function that HUMANS are actually good for! Hip hip … hooray! Please click the link for some detail from the reCAPTCHA site and save me some copying and pasting. In a nutshell, book digitization projects (e.g. Project Gutenberg) are benefiting from human intervention, which is fixing incorrectly scanned and OCR’d text a little bit at a time. Not only is this brilliant, but it’s heartwarming. To combat spambots, we humans are being handed the mistakes generated by other software, and we’re correcting them “manually”. Is this cool or am I just a total fracking nerd? Don’t answer that, because I should have known about this a couple of years ago, like this other blogger did. D’OH!
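For fellow nerds, here is a rough Python sketch of how I understand the trick: you are shown two words, one the system already knows (the control) and one the OCR software choked on. If you get the control word right, your reading of the other word counts as a vote. Everything named here (`UnknownWord`, `check_challenge`, the `votes_needed` threshold) is made up by me for illustration; it is not the actual reCAPTCHA code or API.

```python
from collections import Counter


class UnknownWord:
    """A scanned word the OCR software could not read (hypothetical model)."""

    def __init__(self, image_id, votes_needed=3):
        self.image_id = image_id
        self.votes_needed = votes_needed  # assumed threshold, not the real one
        self.votes = Counter()

    def add_vote(self, text):
        # Normalize so "Upon" and "upon " count as the same reading.
        self.votes[text.strip().lower()] += 1

    def resolved(self):
        """Return the agreed reading, or None if humans have not agreed yet."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.votes_needed else None


def check_challenge(control_answer, typed_control, unknown, typed_unknown):
    """Pass the user only if the control word matches; then record a vote."""
    if typed_control.strip().lower() != control_answer.lower():
        return False  # probably a bot (or a typo): ignore the other word
    unknown.add_vote(typed_unknown)
    return True
```

So the spambot check and the book correction ride on the same typing: the control word keeps the robots out, and the second word gets "read" once enough humans agree on it.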



  1. I did not know that. Sooo…


    I try my best to look at the bright side. 😉

    It’s not our poor writing skills, but rather our being able, as humans, to decipher words that have been poorly scanned and interpreted by the OCR (Optical Character Recognition) software used when the books are scanned. The errors arise when the software cannot decide what letter is represented by the bunch of pixels it’s looking at.

    But nice job looking at the bright side and thanks for commenting. We lost to the machines back in 1945. 😦

  2. Oh, cool! I knew what Captcha was, generally, and also Project Gutenberg, but THIS I didn’t know. 🙂

    Isn’t it? It’s like the SETI project, but using human brains and eyeballs and for totally terrestrial purposes.

  3. Please don’t taunt the machines. Long live the machines. I love them…they are great. They do nothing wrong. I was never here. Goodbye.

    Shhhhh! The Internet is listening!

  4. Well, I am happy to see that even better OCRs than I have are not very good yet.

    There are also a few OCRs online but they are not better. I am surprised that they have not yet combined the OCR with a dictionary so that in addition to pixels the computer would go through a probability calculus the same as the human brain does. Maybe it could do it even faster.

    I guess that is a valid reason for happiness. But I disagree. OCRs have improved dramatically, though they are still not as good as human cognition. Speech recognition is still primitive too. And yet, one can see the significant improvement in programs like Dragon Naturally Speaking from versions 10 years ago to those available now. These things are coming. I think that physical computational power has been an obstacle. But it’s a temporary obstacle.

  5. Actually, many OCR programs do indeed use dictionaries as well as trying to decipher pixels. And they use probabilities, such as Markov chains.
    But the idea of having people help out is brilliant.

    If you like the idea of captcha you may also like the idea of Mechanical Turk.
    Okay, so that is probably more nerdy than blogging should ever allow. Sorry.

  6. Wait. Are you saying the government is watching my every move with this robot word thing . . . still trying to corner me? Should I be paranoid?

    Only if you are a robot. Otherwise, the government won’t be watching your every move, unless you are really really HOT.
