ocrmypdf - add an OCR text layer to PDF files

Property              Value
Distribution          Ubuntu 19.04 (Disco Dingo)
Repository            Ubuntu Universe amd64
Package filename      ocrmypdf_8.0.1+dfsg-1ubuntu2_all.deb
Package name          ocrmypdf
Package version       8.0.1+dfsg
Package release       1ubuntu2
Package architecture  all
Package type          deb
Category              universe/graphics
Homepage              https://github.com/jbarlow83/OCRmyPDF
License               -
Maintainer            Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Download size         87.19 KB
Installed size        431.00 KB

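
The package filename above packs the name, version, release, and architecture fields into one string. As a rough sketch of how those fields relate, the following shell function splits a filename of the form name_version-release_arch.deb. The function name parse_deb_filename is hypothetical, and the sketch assumes the version contains a release part after its last hyphen, as it does here.

```shell
#!/bin/sh
# Hypothetical helper: split "name_version-release_arch.deb" into its fields.
# Assumes the version has a "-release" suffix (true for this package).
parse_deb_filename() {
    file=${1%.deb}                 # strip the .deb extension
    name=${file%%_*}               # text before the first underscore
    rest=${file#*_}                # "version-release_arch"
    full_version=${rest%_*}        # text before the last underscore
    arch=${rest##*_}               # text after the last underscore
    upstream=${full_version%-*}    # everything before the last hyphen
    release=${full_version##*-}    # everything after the last hyphen
    printf 'name=%s version=%s release=%s arch=%s\n' \
        "$name" "$upstream" "$release" "$arch"
}

parse_deb_filename ocrmypdf_8.0.1+dfsg-1ubuntu2_all.deb
# prints: name=ocrmypdf version=8.0.1+dfsg release=1ubuntu2 arch=all
```

The split on the last hyphen matters because the upstream part of a version may itself contain hyphens, while the release part may not.
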
OCRmyPDF adds an OCR text layer to a regular PDF that contains only
images and writes the result as a searchable PDF/A file.
It uses the Tesseract OCR engine and so supports all the languages
that Tesseract does.
Some other main features:
* Places OCR text accurately below the image to ease copy / paste
* Keeps the exact resolution of the original embedded images
* When possible, inserts OCR information as a lossless operation
without rendering vector information
* Keeps file size about the same
* If requested, deskews and/or cleans the image before performing OCR
* Validates input and output files
* Provides debug mode to enable easy verification of the OCR results
* Processes pages in parallel when more than one CPU core is
available
* Battle-tested on thousands of PDFs, a test suite and continuous
integration
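
Several of the features listed above map onto command-line options. The sketch below wraps one plausible invocation in a shell function; the function name ocr_pdf, the DRY_RUN switch, the file names, and the default values are all assumptions made for illustration. The options themselves (--deskew, -l for the Tesseract language, --jobs for parallel workers) are ones the ocrmypdf command accepts.

```shell
#!/bin/sh
# Hypothetical wrapper: OCR one PDF with deskewing and parallel jobs.
# Set DRY_RUN=1 to print the command instead of running it.
ocr_pdf() {
    src=$1
    dst=$2
    lang=${3:-eng}     # Tesseract language code; "eng" is an assumed default
    jobs=${4:-2}       # number of parallel workers; 2 is an assumed default
    # Rebuild the positional parameters as the full command line.
    set -- ocrmypdf --deskew -l "$lang" --jobs "$jobs" "$src" "$dst"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Print the command that would be run for a scanned file (names are examples).
DRY_RUN=1 ocr_pdf scan.pdf scan-ocr.pdf
# prints: ocrmypdf --deskew -l eng --jobs 2 scan.pdf scan-ocr.pdf
```

Languages other than English need the matching Tesseract language data installed before -l will accept them.
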


Package                               Version     Architecture  Repository
ocrmypdf_8.0.1+dfsg-1ubuntu2_all.deb  8.0.1+dfsg  all           Ubuntu Universe
ocrmypdf                              -           -             -


Name                          Version constraint
ghostscript                   >= 9.18~dfsg~
icc-profiles-free             -
liblept5                      -
python3-cffi-backend-api-max  >= 9729
python3-cffi-backend-api-min  <= 9729
python3-chardet               -
python3-img2pdf               >= 0.3.0
python3-pdfminer              >= 20181108+dfsg-3
python3-pikepdf               >= 1.0.2
python3-pil                   -
python3-pkg-resources         -
python3-reportlab             -
python3-ruffus                >= 2.8
python3:any                   -
qpdf                          >= 8.0.2
tesseract-ocr                 >= 4.0.0
zlib1g                        -
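
The versioned requirements above can be checked against an existing system before installing. This is a minimal sketch, assuming a Debian or Ubuntu system with dpkg available; the function name check_min_version is hypothetical, and only three of the listed requirements are checked as examples.

```shell
#!/bin/sh
# Hypothetical helper: compare an installed package against a minimum version.
# Uses dpkg-query to read the installed version and dpkg --compare-versions
# to apply Debian version ordering (which handles "~" and "+dfsg" suffixes).
check_min_version() {
    pkg=$1
    min=$2
    have=$(dpkg-query -W -f='${Version}' "$pkg" 2>/dev/null) || have=
    if [ -z "$have" ]; then
        echo "$pkg: missing"
    elif dpkg --compare-versions "$have" ge "$min"; then
        echo "$pkg: ok ($have >= $min)"
    else
        echo "$pkg: too old ($have < $min)"
    fi
}

check_min_version ghostscript '9.18~dfsg~'
check_min_version qpdf 8.0.2
check_min_version tesseract-ocr 4.0.0
```

In normal use apt-get resolves these constraints itself, so a check like this is mainly useful for diagnosing a failed or held-back installation.
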


Type            URL
Mirror          archive.ubuntu.com
Binary Package  ocrmypdf_8.0.1+dfsg-1ubuntu2_all.deb
Source Package  ocrmypdf

Install Howto

  1. Update the package index:
    $ sudo apt-get update
  2. Install the ocrmypdf package:
    $ sudo apt-get install ocrmypdf
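
After the two steps above, a quick sanity check can confirm that the main executables are on the PATH. The sketch below is one way to do that; the function name check_tool is hypothetical, and the executable names tesseract, gs, and qpdf are the commands normally shipped by the tesseract-ocr, ghostscript, and qpdf packages.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s: found at %s\n' "$1" "$(command -v "$1")"
    else
        printf '%s: not found\n' "$1"
        return 1
    fi
}

# Check ocrmypdf and the external programs it relies on.
for tool in ocrmypdf tesseract gs qpdf; do
    check_tool "$tool" || true
done

# If ocrmypdf is present, print the version that was installed.
if command -v ocrmypdf >/dev/null 2>&1; then
    ocrmypdf --version
fi
```
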




2019-03-13 - Marc Deslauriers <marc.deslauriers@ubuntu.com>
ocrmypdf (8.0.1+dfsg-1ubuntu2) disco; urgency=medium
* tests/test_main.py: disable an additional test that uses the enormous
  PDF file.

2019-03-12 - Marc Deslauriers <marc.deslauriers@ubuntu.com>
ocrmypdf (8.0.1+dfsg-1ubuntu1) disco; urgency=medium
* tests/test_main.py: disable test that uses an enormous PDF file that
  fails due to insufficient RAM when autopkgtests are run.

2019-01-26 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (8.0.1+dfsg-1) unstable; urgency=medium
* New upstream release.

2019-01-14 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (8.0.0+dfsg-3) unstable; urgency=medium
* Require python3-pdfminer (>= 20181108+dfsg-3).

2019-01-14 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (8.0.0+dfsg-2) unstable; urgency=medium
* Revert changes in previous upload that disabled usage of pdfminer.six.
  It turns out that the blocking problem was not #886291, but instead
  the problem fixed by the 20181108+dfsg-3 upload of src:pdfminer.
  Thanks to Daniele Tricoli for the fix.

2019-01-11 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (8.0.0+dfsg-1) unstable; urgency=medium
* New upstream release.
  - Add tests/resources/enron1.pdf to Files-Excluded
    See https://github.com/pikepdf/pikepdf/issues/21
  - Patch out test_prevent_gs_invalid_xml
    This test requires tests/resources/enron1.pdf
  - Tighten dependency on tesseract-ocr.
  - Tighten {build-,}dep on pikepdf.
* Drop dependencies on python3-pdfminer & patch pdfminer.six out of setup.py.
  OCRmyPDF's usage of pdfminer is broken due to #886291.  The problem is
  not likely to be fixed in time for the buster freeze, so disable
  pdfminer functionality for now.
  Also see https://github.com/jbarlow83/OCRmyPDF/issues/339
* Drop bogus Debian changes to upstream file tests/test_main.py by
  checking out the file from tag v8.0.0+dfsg (Closes: #918891).
  The changes were introduced in upstream releases 6.2.4 and 6.2.5 and
  dropped by 7.4.0.  The merge of upstream version 7.4.0 into the Debian
  packaging branch was not done correctly, such that the changes

2019-01-06 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (7.4.0-3) unstable; urgency=medium
* Upload to unstable.

2019-01-04 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (7.4.0-2) experimental; urgency=medium
* Regenerate manpage.

2019-01-04 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (7.4.0-1) experimental; urgency=medium
* New upstream release.
  - Tighten {build-,}deps on python3-img2pdf, python3-pikepdf, python3-ruffus
  - Drop python3-libxmp build-dep and autopkgtest dep
  - Add python3-pdfminer versioned {build-,}dep.
  - Add python3-cffi autopkgtest dep.
* In override_dh_auto_build, delete the line `from . import leptonica`
  from debian/.debhelper/ocrmypdf/__init__.py.
  The directory debian/.debhelper/ocrmypdf is just a hack so that
  upstream's doc build can find the version number, and the cffi setup
  does not work inside debian/.debhelper/ocrmypdf, so avoid the dlopen

2018-10-20 - Sean Whitton <spwhitton@spwhitton.name>
ocrmypdf (7.2.1-1) experimental; urgency=medium
* New upstream release.

See Also

Package Description
ocrodjvu_0.10.4-1_all.deb tool to perform OCR on DjVu documents
ocserv_0.12.2-2_amd64.deb OpenConnect VPN server compatible with Cisco AnyConnect VPN
ocsinventory-agent_2.4.2-2_amd64.deb Hardware and software inventory tool (client)
ocsinventory-reports_2.5+dfsg1-1_all.deb Hardware and software inventory tool (Administration Console)
ocsinventory-server_2.5+dfsg1-1_all.deb Hardware and software inventory tool (Communication Server)
octave-arduino_0.3.0-2_all.deb Octave Arduino Toolkit
octave-bart_0.4.04-2_all.deb Octave bindings for BART
octave-bim_1.1.5-6_all.deb PDE solver using a finite element/volume approach in Octave
octave-biosig_1.9.3-2_amd64.deb Octave bindings for BioSig library
octave-bsltl_1.1.1-2_all.deb biospeckle laser tool library for Octave
octave-cgi_0.1.2-2_all.deb Common Gateway Interface for Octave
octave-common_4.4.1-5_all.deb architecture-independent files for octave
octave-communications-common_1.2.1-7_all.deb communications package for Octave (arch-indep files)
octave-communications_1.2.1-7_amd64.deb communications package for Octave
octave-control_3.1.0-3_amd64.deb computer-aided control system design (CACSD) for Octave