The place where I can keep my story

I build infrastructure for creative web applications. I am always in search of great user experiences, focused on great performance and on delivering the best results on a daily basis.

Tapan's World

Posts Tagged

OCR (Optical Character Reader)

Extract Text From PDFs (Including Scanned Copies) – Ubuntu way

October 15, 2012 General, Linux, Ubuntu

To extract all text from PDFs (including text in images or scanned copies), we can use a combination of Ghostscript and a command-line OCR tool called tesseract-ocr. First we need to convert our PDF to individual image files (TIFF) so we can then run OCR on them. We need Ghostscript for that. It’s probably already installed on…
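
The excerpt is cut off before the post's own commands, so the following is only a minimal sketch of the two steps described above, not the author's exact method. It assumes the gs and tesseract binaries are on the PATH. The function names, the page-%04d.tif naming scheme, the tiffgray device, and the 300 dpi resolution are all illustrative assumptions.

```python
import subprocess
from pathlib import Path


def ghostscript_command(pdf_path, out_dir, dpi=300):
    """Build the gs argv that renders each PDF page to its own TIFF file."""
    # Hypothetical naming scheme: Ghostscript expands %04d to the page number.
    pattern = Path(out_dir) / "page-%04d.tif"
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=tiffgray",          # assumed device: 8-bit grayscale TIFF
        f"-r{dpi}",                   # assumed resolution in dpi
        f"-sOutputFile={pattern}",
        str(pdf_path),
    ]


def tesseract_command(tiff_path, lang="eng"):
    """Build the tesseract argv; tesseract appends .txt to the output base."""
    tiff_path = Path(tiff_path)
    output_base = tiff_path.with_suffix("")
    return ["tesseract", str(tiff_path), str(output_base), "-l", lang]


def extract_text(pdf_path, out_dir, lang="eng"):
    """Render the PDF to TIFF pages, OCR each page, and join the text."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(ghostscript_command(pdf_path, out_dir), check=True)

    pages = []
    for tiff in sorted(out_dir.glob("page-*.tif")):
        subprocess.run(tesseract_command(tiff, lang), check=True)
        pages.append(tiff.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)
```

Calling extract_text("scan.pdf", "pages") would leave one TIFF and one text file per page in the pages directory and return the combined text. On Ubuntu, both tools are typically available through the ghostscript and tesseract-ocr packages.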