ocrmypdf - add an OCR text layer to PDF files

OCRmyPDF generates a searchable PDF/A file from a regular PDF
containing only images.
It uses the Tesseract OCR engine and so supports all the languages
that Tesseract does.
Other main features:
* Places OCR text accurately below the image to ease copy/paste
* Keeps the exact resolution of the original embedded images
* When possible, inserts OCR information as a lossless operation
without rendering vector information
* Keeps file size about the same
* If requested, deskews and/or cleans the image before performing OCR
* Validates input and output files
* Provides debug mode to enable easy verification of the OCR results
* Processes pages in parallel when more than one CPU core is available
* Battle-tested on thousands of PDFs, a test suite and continuous
  integration
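
A minimal sketch of putting several of these features to work on a
directory of scanned PDFs. The -l, --deskew and --jobs options are real
ocrmypdf command-line options; the ocr_dir function, the OCRMYPDF
override variable and the scans/searchable directory names are
hypothetical and only there for illustration. It also assumes the pages
contain English text.

```shell
#!/bin/sh
# Sketch: add an OCR text layer to every PDF in one directory.
# Hypothetical names: ocr_dir, OCRMYPDF, scans/, searchable/.
# Assumes the scanned pages are English text.
set -eu

# Command to run; set OCRMYPDF to another command for a dry run.
OCRMYPDF=${OCRMYPDF:-ocrmypdf}

ocr_dir() {
  in_dir=$1
  out_dir=$2
  mkdir -p "$out_dir"
  for f in "$in_dir"/*.pdf; do
    [ -e "$f" ] || continue   # the glob matched no files
    # -l eng    language for Tesseract to recognize
    # --deskew  straighten each page before running OCR
    # --jobs 4  process up to four pages in parallel
    "$OCRMYPDF" -l eng --deskew --jobs 4 "$f" "$out_dir/$(basename "$f")"
  done
}
```

Usage: ocr_dir scans searchable. Writing to a separate output directory
leaves the original scans untouched.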


Install Howto

  1. Update the package index:
    # sudo apt-get update
  2. Install the ocrmypdf deb package:
    # sudo apt-get install ocrmypdf
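
After installation, a quick sanity check is to confirm that ocrmypdf and
the external programs it drives are on the PATH. The tool list is an
assumption drawn from this page (Tesseract for OCR, Ghostscript as gs,
and unpaper, which the changelog describes as optional); check_tools is
a hypothetical helper, not part of the package.

```shell
#!/bin/sh
# Sketch: report which of the given commands are on PATH.
# check_tools is a hypothetical helper; it returns non-zero if any is missing.
check_tools() {
  status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

# Required programs (assumed list): ocrmypdf, Tesseract, Ghostscript.
check_tools ocrmypdf tesseract gs || echo "install the missing packages first"
# unpaper is only needed for the cleaning step (ocrmypdf --clean).
check_tools unpaper || echo "without unpaper, --clean is unavailable"
```

Running ocrmypdf --version is another quick way to confirm the command
itself works.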




2018-11-12 - Graham Inggs <ginggs@ubuntu.com>
ocrmypdf (6.1.2-1ubuntu1.1) bionic; urgency=medium
* Backport changes from 6.2.4-1 in cosmic for
  compatibility with Ghostscript >= 9.23 (LP: #1802966)
* Drop debian/source/options, not needed in Ubuntu
* Separate xfail-on-s390x.patch into its own file
* Disable JPEG passthrough for Ghostscript >= 9.23
* DOCINFO fixes for Ghostscript >= 9.24
2018-04-24 - Graham Inggs <ginggs@ubuntu.com>
ocrmypdf (6.1.2-1ubuntu1) bionic; urgency=medium
* XFAIL tests failing on big-endian architectures,
  see Debian bug #849094
2018-03-31 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (6.1.2-1) unstable; urgency=low
* New upstream release (Closes: #888917).
* Various updates to d/copyright due to project relicensing and source
  tree rearrangement.
  - Additionally update upstream contact e-mail address.
  - Additionally use https for Format: field.
* Add python3-defusedxml build-dep.
* Drop python3-pytest-xdist autopkgtest dependency.
* Drop SETUPTOOLS_SCM_PRETEND_VERSION hack from d/rules.
  Obsoleted by upstream changes.
* Update override_dh_auto_build for source tree rearrangement.
* Update d/tests/control for source tree rearrangement.
* Add README.Debian about the lack of PyMuPDF support.
* Add debian/NEWS to detail breaking changes in command line interface.
  Breaking changes in the ocrmypdf library are not detailed because
  ocrmypdf is not considered to provide a stable public API.
* Expand reasoning in first bullet point of 5.5-2 changelog entry.
* Patch setup.py to remove addopts key under tool:pytest section.
  The '-n' command line option is not supported by recent pytest.
2018-01-27 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (5.5-2) unstable; urgency=medium
* Disable test suite at package build time.
  Rely on autopkgtest instead.  The test suite now takes a prohibitively
  long time to run; upstream expects it to be run after OCRmyPDF is
  installed so running it during the build relies on fragile code in
  d/rules; and it requires a number of heavy build dependencies which
  makes it less convenient to build the package, and to backport the
  package to Debian stable.
* Move test suite dependencies d/control -> d/tests/control.
* Set PYBUILD_INSTALL_ARGS to pass --force to setup.py.
  This prevents the build from aborting because tools like unpaper, qpdf
  etc. are not installed.  These programs are not actually needed to
  build the package.
* Demote unpaper Depends -> Recommends.
  Upstream considers it to be optional.
* Add --locale to help2man call in gen-man-page target.
* Regenerate manpage.
2018-01-20 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (5.5-1) unstable; urgency=medium
* New upstream release.
2017-12-16 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (5.4.4-1) unstable; urgency=medium
* New upstream release.
* Add new build-dep for test suite: python3-pytest-timeout.
* Update sed(1) call in override_dh_auto_build for changes to __init__.py.
* Update d/copyright.
  - Upstream have listed Julien Pfefferkorn in LICENSE.rst but the diff
    between upstream releases shows that he holds copyright on
    hocrtransform.py alone.  Thus, he is not listed under "Files: *".
* Declare compliance with Debian Policy 4.1.2.
2017-10-14 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (5.4-1) unstable; urgency=medium
* New upstream release.
* Drop Testsuite: field.
  See Lintian tag unnecessary-testsuite-autopkgtest-header.
* Bump standards version to 4.1.1 (no changes required).

