ocrmypdf - add an OCR text layer to PDF files

OCRmyPDF takes a regular PDF containing only images and generates a
searchable PDF/A file from it.
It uses the Tesseract OCR engine and so supports all the languages
that Tesseract does.
Some other main features:
* Places OCR text accurately below the image to ease copy / paste
* Keeps the exact resolution of the original embedded images
* When possible, inserts OCR information as a lossless operation
without rendering vector information
* Keeps file size about the same
* If requested, deskews and/or cleans the image before performing OCR
* Validates input and output files
* Provides debug mode to enable easy verification of the OCR results
* Processes pages in parallel when more than one CPU core is available
* Battle-tested on thousands of PDFs, a test suite and continuous
integration


Install Howto

  1. Update the package index:
    $ sudo apt-get update
  2. Install the ocrmypdf deb package:
    $ sudo apt-get install ocrmypdf
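
After installation, a first run might look like the sketch below. It is
only an illustration: the file names scanned.pdf and scanned-ocr.pdf are
placeholders, and the options shown (-l for the Tesseract language,
--deskew, --jobs) are one plausible combination rather than a recommended
setup. The script calls ocrmypdf only when both the program and the input
file are present; otherwise it prints the command it would have run.

```shell
#!/bin/sh
# Placeholder file names (assumptions for this example).
input="scanned.pdf"
output="scanned-ocr.pdf"

# -l eng    : OCR language, passed through to Tesseract
# --deskew  : straighten pages before performing OCR
# --jobs 2  : process pages in parallel on up to two CPU cores
set -- -l eng --deskew --jobs 2 "$input" "$output"

if command -v ocrmypdf >/dev/null 2>&1 && [ -f "$input" ]; then
    ocrmypdf "$@"
else
    # Nothing to process here; show the command instead of running it.
    echo "ocrmypdf $*"
fi
```

Languages other than English need the matching Tesseract language data
installed before they can be passed to -l.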




2019-03-13 - Marc Deslauriers <marc.deslauriers@ubuntu.com>
ocrmypdf (8.0.1+dfsg-1ubuntu2) disco; urgency=medium
* tests/test_main.py: disable an additional test that uses the enormous
PDF file.
2019-03-12 - Marc Deslauriers <marc.deslauriers@ubuntu.com>
ocrmypdf (8.0.1+dfsg-1ubuntu1) disco; urgency=medium
* tests/test_main.py: disable test that uses an enormous PDF file that
fails due to insufficient RAM when autopkgtests are run.
2019-01-26 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (8.0.1+dfsg-1) unstable; urgency=medium
* New upstream release.
2019-01-14 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (8.0.0+dfsg-3) unstable; urgency=medium
* Require python3-pdfminer (>= 20181108+dfsg-3).
2019-01-14 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (8.0.0+dfsg-2) unstable; urgency=medium
* Revert changes in previous upload that disabled usage of pdfminer.six.
It turns out that the blocking problem was not #886291, but instead
the problem fixed by the 20181108+dfsg-3 upload of src:pdfminer.
Thanks to Daniele Tricoli for the fix.
2019-01-11 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (8.0.0+dfsg-1) unstable; urgency=medium
* New upstream release.
- Add tests/resources/enron1.pdf to Files-Excluded
See https://github.com/pikepdf/pikepdf/issues/21
- Patch out test_prevent_gs_invalid_xml
This test requires tests/resources/enron1.pdf
- Tighten dependency on tesseract-ocr.
- Tighten {build-,}dep on pikepdf.
* Drop dependencies on python3-pdfminer & patch pdfminer.six out of setup.py.
OCRmyPDF's usage of pdfminer is broken due to #886291.  The problem is
not likely to be fixed in time for the buster freeze, so disable
pdfminer functionality for now.
Also see https://github.com/jbarlow83/OCRmyPDF/issues/339
* Drop bogus Debian changes to upstream file tests/test_main.py by
checking out the file from tag v8.0.0+dfsg (Closes: #918891).
The changes were introduced in upstream releases 6.2.4 and 6.2.5 and
dropped by 7.4.0.  The merge of upstream version 7.4.0 into the Debian
packaging branch was not done correctly, such that the changes
2019-01-06 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (7.4.0-3) unstable; urgency=medium
* Upload to unstable.
2019-01-04 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (7.4.0-2) experimental; urgency=medium
* Regenerate manpage.
2019-01-04 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (7.4.0-1) experimental; urgency=medium
* New upstream release.
- Tighten {build-,}deps on python3-img2pdf, python3-pikepdf, python3-ruffus
- Drop python3-libxmp build-dep and autopkgtest dep
- Add python3-pdfminer versioned {build-,}dep.
- Add python3-cffi autopkgtest dep.
* In override_dh_auto_build, delete the line `from . import leptonica`
from debian/.debhelper/ocrmypdf/__init__.py.
The directory debian/.debhelper/ocrmypdf is just a hack so that
upstream's doc build can find the version number, and the cffi setup
does not work inside debian/.debhelper/ocrmypdf, so avoid the dlopen
2018-10-20 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (7.2.1-1) experimental; urgency=medium
* New upstream release.

