tesseract-ocr


tesseract-ocr is an OCR engine originally developed by Hewlett-Packard and now sponsored by Google. It is highly accurate and reads binary, grayscale, or color images and outputs text.

Recent releases

  •  04 Nov 2012 01:50

    Release Notes: This release adds a C API, a new Visual Studio 2008 solution, right-to-left/bidi capability in the output iterators for Hebrew and Arabic, paragraph detection in layout analysis/post OCR, fixes for inconsistent x-height during training and for over-chopping, simultaneous multi-language capability, a refactored top-level word recognition module, an experimental equation detector, improved handling of resolution from input images, and a blamer module for error analysis. It also cleans up the externally used namespace by removing includes from baseapi.h.

  •  31 Oct 2011 19:28

    Release Notes: This release adds thread safety, a recognizer for Arabic, PageIterator and ResultIterator, and more.

  •  02 Oct 2010 21:21

    Release Notes: Preparations were made for thread safety. A major new page layout analysis module was added. HOCR output was added. Many more languages were added. Most of the function header comments were documented with doxygen. Leptonica was added for main image I/O and handling.

Recent comments

08 Mar 2011 10:57 Teiman Thumbs up

It's easy to use and has good recognition quality. I recommend it over other similar engines.

