
OCR (Optical Character Recognition)

Extract Text From PDFs (Including Scanned Copies) – the Ubuntu Way

To extract all text from PDFs, including text inside images or scanned pages, we can combine Ghostscript with a command-line OCR tool called tesseract-ocr. First we convert the PDF into individual image files (TIFF), one per page, so that tesseract can then run OCR on them. Ghostscript handles that conversion. It's probably already installed on…
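The two-step pipeline described above (Ghostscript renders pages to TIFF, tesseract reads each TIFF) can be sketched in Python as follows. This is a minimal sketch, not the post's exact commands. It assumes the `gs` and `tesseract` binaries are on the PATH (on Ubuntu they come from the `ghostscript` and `tesseract-ocr` packages). The `tiffgray` device, the 300 DPI resolution, the `page-%03d.tif` file naming, the English language model (`-l eng`) and the helper names `ghostscript_cmd`, `tesseract_cmd` and `pdf_to_text` are illustrative assumptions.

```python
import shutil
import subprocess
from pathlib import Path


def ghostscript_cmd(pdf: Path, out_dir: Path, dpi: int = 300) -> list[str]:
    """Build a Ghostscript command that renders each PDF page to its own TIFF."""
    return [
        "gs",
        "-dNOPAUSE",          # don't pause between pages
        "-dBATCH",            # exit once the input file is processed
        "-dSAFER",            # restrict file operations while interpreting the PDF
        "-sDEVICE=tiffgray",  # assumption: 8-bit grayscale TIFF output
        f"-r{dpi}",           # rendering resolution in DPI (assumed default 300)
        # %03d is replaced by the zero-padded page number (assumed naming scheme)
        f"-sOutputFile={out_dir / 'page-%03d.tif'}",
        str(pdf),
    ]


def tesseract_cmd(image: Path, lang: str = "eng") -> list[str]:
    """Build a tesseract command that writes the recognised text to stdout."""
    return ["tesseract", str(image), "stdout", "-l", lang]


def pdf_to_text(pdf: Path, work_dir: Path, lang: str = "eng") -> str:
    """Render `pdf` to TIFFs in `work_dir`, OCR every page, and join the text."""
    for tool in ("gs", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(
                f"{tool} not found; try: sudo apt install ghostscript tesseract-ocr"
            )
    work_dir.mkdir(parents=True, exist_ok=True)
    # Step 1: PDF -> one TIFF per page.
    subprocess.run(ghostscript_cmd(pdf, work_dir), check=True)
    # Zero-padded names make lexical order match page order.
    pages = sorted(work_dir.glob("page-*.tif"))
    # Step 2: OCR each page image and collect its text.
    texts = []
    for page in pages:
        result = subprocess.run(
            tesseract_cmd(page, lang), check=True, capture_output=True, text=True
        )
        texts.append(result.stdout)
    return "\n".join(texts)
```

For example, `print(pdf_to_text(Path("scan.pdf"), Path("pages")))` would print the OCR text of every page of a hypothetical `scan.pdf`. Writing to `stdout` instead of an output base name avoids creating one `.txt` file per page; if you prefer the per-page files, pass an output base name to tesseract instead of `stdout`.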
