jPDFText is a Java library to extract text from PDF documents. PDF documents can be processed to extract the textual content for archiving, storage, searching, or indexing. jPDFText is built on top of Qoppa's proprietary PDF technology, so there is no need for any third party software or drivers. Main Features: loading PDF documents from files, network drives, URLs, or input streams; extracting text; and extracting words as a vector of Strings. It is written entirely in Java, which allows your application to remain platform independent. There is no need to install or configure additional drivers or software when deploying.
jPDFViewer is a Java bean which embeds a PDF viewer in your Java applications and applets. It can read, display, and print PDF files, fill interactive PDF forms (AcroForms, XFA forms), view all markup annotations, validate and display digital signatures, and perform text search, selection, and highlighting. It provides easy navigation with different views: thumbnails, bookmarks, annotations, etc. It has a customizable toolbar and user interface. It supports all image types, including JBIG2 and JPEG 2000, and all PDF font types (Types 0-3, OpenType, TrueType). It supports Acrobat PDF format 1.7, including layers, file attachments, and all PDF color spaces (pattern and separation among them). There's no need to install or configure additional drivers or software when deploying.
Japplis Toolbox is a compilation of text utilities in one application. It can encode and decode URL, Base64, Hex, SoundEx, or Metaphone. It can convert numbers between binary, octal, decimal, and hexadecimal, and convert numbers to dates. It gives you text information such as character count, word count, MD5, or SHA. You can get Java system properties, environment variables, or Swing default values. It checks regular expressions and finds their matches. It can also manipulate lines of text by sorting, reversing, shuffling, deleting duplicates, trimming spaces, or numbering lines.
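For readers unfamiliar with these operations, here is a minimal sketch of a few of them (Base64 encoding, radix conversion, MD5, and duplicate-line removal) using only the standard JDK. It is not Japplis Toolbox code; the class name TextUtilSketch and its method names are hypothetical and chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration class; not part of Japplis Toolbox.
public class TextUtilSketch {

    // Encode text as Base64 using UTF-8 bytes.
    static String base64Encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Decode a Base64 string back to UTF-8 text.
    static String base64Decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    // Render a number in the given radix (2 = binary, 8 = octal, 16 = hexadecimal).
    static String toRadix(long value, int radix) {
        return Long.toString(value, radix);
    }

    // Compute the MD5 digest of the text and return it as lowercase hex.
    static String md5Hex(String text) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(text.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Remove duplicate lines while keeping the first occurrence of each, in order.
    static List<String> deleteDuplicates(List<String> lines) {
        return new ArrayList<>(new LinkedHashSet<>(lines));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(base64Encode("hello"));
        System.out.println(base64Decode("aGVsbG8="));
        System.out.println(toRadix(255, 16));
        System.out.println(md5Hex("hello"));
        System.out.println(deleteDuplicates(Arrays.asList("a", "b", "a", "c")));
    }
}
```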
The Big Faceless Java PDF Viewer is a Swing component that can display PDF documents. It is intended for developers who don't require the full API. The PDF Viewer can be installed as an applet, an application, via Java Web Start, or embedded in a Swing application. Printing, saving, text search, forms, digital signatures, and annotations are some of the many features available. The viewer can be tailored to include just the features you need, and is a cost-effective solution for those needing the features of Adobe Acrobat on a Java platform.
Pymur provides Python bindings to the C++ based Lemur Toolkit. The Lemur Toolkit is an open-source toolkit designed to facilitate research in language modeling and information retrieval. Lemur supports a wide range of industrial and research language applications such as ad-hoc retrieval, site-search, and text mining.
a2b invokes conversion tools in sequence to convert files from one type to another, possibly performing some extra processing along the way. It is a wrapper for many programs like gif2png, latex2html, netpbm, mencoder, ffmpeg, oggenc, etc., that can convert content from one type to another. It can convert text, documents, images, audio, video, and more. Some examples: a2b test.mp3 test.ogg; mencoder_opts="-ss 60 -endpos 10" a2b video.avi clip.flv; a2b -g=subtitles dvd://8 movie.sub; a2b -q http://sam.nipl.net/ sam.aac. That last example downloads the author's Web page, converts it to text with lynx -dump, speaks the text with flite or espeak, and converts the audio file to AAC with faac.
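The idea of invoking conversion tools in sequence can be sketched as a shortest-path search over a table of known single-step conversions. The Java sketch below is an assumption about how such a chain might be chosen, not a2b's actual implementation: the class ConversionChainSketch, the type names, and the conversion table are hypothetical, and only the tool names come from the description above.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not a2b's real logic.
public class ConversionChainSketch {

    // Assumed table: source type -> (target type -> tool performing that step).
    static final Map<String, Map<String, String>> TOOLS = new HashMap<>();

    static void register(String from, String to, String tool) {
        TOOLS.computeIfAbsent(from, k -> new LinkedHashMap<>()).put(to, tool);
    }

    static {
        register("html", "txt", "lynx -dump");
        register("txt", "wav", "espeak");
        register("wav", "aac", "faac");
        register("wav", "ogg", "oggenc");
        register("gif", "png", "gif2png");
    }

    // Breadth-first search for the shortest tool chain from one type to another.
    // Returns an empty list when no chain exists (or when from equals to).
    static List<String> findChain(String from, String to) {
        Map<String, String> previous = new HashMap<>(); // type -> type it was reached from
        Deque<String> queue = new ArrayDeque<>();
        previous.put(from, null);
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(to)) {
                break;
            }
            Map<String, String> steps =
                    TOOLS.getOrDefault(current, Collections.<String, String>emptyMap());
            for (String next : steps.keySet()) {
                if (!previous.containsKey(next)) {
                    previous.put(next, current);
                    queue.add(next);
                }
            }
        }
        LinkedList<String> chain = new LinkedList<>();
        if (!previous.containsKey(to)) {
            return chain; // target type was never reached
        }
        // Walk back from the target, collecting the tool used for each step.
        for (String type = to; previous.get(type) != null; type = previous.get(type)) {
            chain.addFirst(TOOLS.get(previous.get(type)).get(type));
        }
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(findChain("html", "aac"));
    }
}
```

With this assumed table, the chain found from html to aac uses the same three tools named in the last example: lynx -dump, then espeak, then faac.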