Optical character recognition (OCR) is "Recognition of printed or written characters by a computer. Involves computer software designed to translate images of typewritten text — usually captured by a scanner — into machine-editable text or to translate pictures of characters into a standard encoding scheme representing them in ASCII or Unicode." (MultiLingual)

"The accurate recognition of Latin-script, typewritten text is now considered largely a solved problem. Typical accuracy rates exceed 99%, although certain applications demanding even higher accuracy require human review for errors.
Recognition of hand printing, cursive handwriting, and even the printed typewritten versions of some other scripts (especially those with a very large number of characters), is still the subject of active research." (Wikipedia)
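
The quoted figure of "99%" does not say how accuracy is measured. One common convention, assumed here rather than taken from the sources, is character-level accuracy: one minus the edit distance between a hand-checked reference transcription and the OCR output, divided by the reference length. The sketch below illustrates that convention; the function names `edit_distance` and `character_accuracy` are hypothetical and not part of any OCR product.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Assumed metric: 1 - edit_distance / len(reference), floored at 0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    reference = "Optical character recognition"
    ocr_output = "Optical charactcr recognition"  # one misread letter
    print(f"{character_accuracy(reference, ocr_output):.4f}")
```

Under this measure, 99% accuracy still leaves roughly one wrong character in every hundred, which may be why some applications add human review.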

OCR software


  • OmniPage by Nuance is a well-known commercial OCR package
  • FineReader by ABBYY is noted for its recognition of extended Latin characters, although whether it can recognize certain characters with diacritics remains an open question
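
One way to probe the diacritics question for any OCR package is to compare its output against a hand-checked transcription and flag cases where the two differ only in diacritics. The following is a minimal sketch using Python's standard `unicodedata` module; the helper names `strip_diacritics` and `lost_diacritics` are hypothetical, and nothing here reflects the API or tested behavior of OmniPage or FineReader.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD, then drop combining marks (accents, cedillas, ...)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lost_diacritics(reference: str, ocr_output: str) -> bool:
    """True if the texts differ, but only in their diacritics."""
    ref = unicodedata.normalize("NFC", reference)
    out = unicodedata.normalize("NFC", ocr_output)
    return ref != out and strip_diacritics(ref) == strip_diacritics(out)


if __name__ == "__main__":
    # Hypothetical reference text and OCR output, for illustration only.
    print(lost_diacritics("café", "cafe"))
    print(lost_diacritics("café", "café"))
```

This check only covers letters that Unicode decomposes into a base letter plus combining marks. Letters such as "Ł" have no such decomposition, so they pass through `strip_diacritics` unchanged and would need separate handling.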

References

MultiLingual. 2007. "Glossary." 2007 Resource Directory. http://www.multilingual.com/resourceDirectory/

Wikipedia. "Optical character recognition." http://en.wikipedia.org/wiki/Optical_character_recognition