## START: Set by rpmautospec
## (rpmautospec version 0.2.6)
%define autorelease(e:s:pb:) %{?-p:0.}%{lua:
    release_number = 4;
    base_release_number = tonumber(rpm.expand("%{?-b*}%{!?-b:1}"));
    print(release_number + base_release_number - 1);
}%{?-e:.%{-e*}}%{?-s:.%{-s*}}%{?dist}
## END: Set by rpmautospec

# Sphinx-generated HTML documentation is not suitable for packaging; see
# https://bugzilla.redhat.com/show_bug.cgi?id=2006555 for discussion.
#
# We can generate PDF documentation as a substitute.
%bcond_without doc_pdf

Name:           python-pdfminer
Version:        20220524
Release:        %autorelease
Summary:        Tool for extracting information from PDF documents

# The entire source is MIT except:
#
# Public Domain:
#   pdfminer/arcfour.py
#     - If this is a bundled library, its origin is unclear
#   pdfminer/ascii85.py
#     - If this is a bundled library, its origin is unclear
#
# APAFML:
#   pdfminer/fontmetrics.py
#     - Data extracted and converted from the AFM files:
#       https://www.ctan.org/tex-archive/fonts/adobe/afm/
#
# BSD:
#   pdfminer/cmap/*
#     - Both the original bundled data and the data generated from the
#       adobe-mappings-cmap package are BSD-licensed.
#
# ASL 2.0 and MIT:
#   pdfminer/_saslprep.py
#     - Forked from ASL 2.0 code by MongoDB, Inc.—originally
#       pymongo/saslprep.py in mongo-python-driver (python-pymongo), with
#       additional modifications in pyHanko (not yet packaged).
#
# Note that pdfminer/glyphlist.py contains data extracted and converted from
# https://partners.adobe.com/public/developer/en/opentype/glyphlist.txt under
# the Adobe Glyph List License; but that this license is just an MIT variant
# (https://fedoraproject.org/wiki/Licensing:MIT?rd=Licensing/MIT#AdobeGlyph).
License:        MIT and Public Domain and APAFML and BSD and (ASL 2.0 and MIT)
URL:            https://github.com/pdfminer/pdfminer.six
# This has the samples/ directory stripped out. While upstream claims the
# sample PDFs are “freely distributable”, they have unclear or unspecified
# licenses, which makes them unsuitable for Fedora. This applies especially,
# but not exclusively, to the contents of samples/nonfree.
#
# Generated with ./get_source.sh %%{version}
Source0:        pdfminer.six-%{version}-filtered.tar.xz
# Script to generate Source0; see comments above.
Source1:        get_source.sh
# Man pages written by hand for Fedora in groff_man(7) format using the
# command’s --help output
Source2:        dumppdf.1
Source3:        pdf2txt.1

BuildArch:      noarch

BuildRequires:  python3-devel

BuildRequires:  make
# We use the Japan1, Korea1, GB1, and CNS1 CMaps:
BuildRequires:  adobe-mappings-cmap-devel >= 20190730

%if %{with doc_pdf}
BuildRequires:  python3dist(sphinx)
BuildRequires:  python3-sphinx-latex
BuildRequires:  latexmk
%endif

# We do not generate BR’s from the “dev” extra because it includes an exact
# version requirement on mypy (and we do not intend to do typechecking), and it
# pulls in nox and black. We just want to use plain pytest.
BuildRequires:  python3dist(pytest)

%global common_description %{expand:
Pdfminer.six is a community maintained fork of the original PDFMiner. It is a
tool for extracting information from PDF documents. It focuses on getting and
analyzing text data. Pdfminer.six extracts the text from a page directly from
the source code of the PDF. It can also be used to get the exact location, font
or color of the text.

It is built in a modular way such that each component of pdfminer.six can be
replaced easily. You can implement your own interpreter or rendering device
that uses the power of pdfminer.six for other purposes than text analysis.

Check out the full documentation on Read the Docs
(https://pdfminersix.readthedocs.io/).

Features:

  • Written entirely in Python.
  • Parse, analyze, and convert PDF documents.
  • PDF-1.7 specification support. (well, almost).
  • CJK languages and vertical writing scripts support.
  • Various font types (Type1, TrueType, Type3, and CID) support.
  • Support for extracting images (JPG, JBIG2, Bitmaps).
  • Support for various compressions (ASCIIHexDecode, ASCII85Decode, LZWDecode,
    FlateDecode, RunLengthDecode, CCITTFaxDecode)
  • Support for RC4 and AES encryption.
  • Support for AcroForm interactive form extraction.
  • Table of contents extraction.
  • Tagged contents extraction.
  • Automatic layout analysis.}


%description %{common_description}


%package -n python3-pdfminer
Summary:        %{summary}

# The import name is pdfminer. The upstream project name (as specified in
# setup.py) is pdfminer.six, which results in a canonical project name of
# pdfminer-six.
%py_provides python3-pdfminer-six

# One file, pdfminer/_saslprep.py, is forked from ASL 2.0 code by MongoDB,
# Inc.—originally pymongo/saslprep.py in mongo-python-driver
# (python-pymongo)—with additional modifications in pyHanko (not yet packaged),
# where it is pyhanko/pdf_utils/_saslprep.py.
#
# Since this is a fork of the python-pymongo module, and since the fork is not
# part of pyHanko’s public API, there is no possibility of using an unbundled
# version.
#
# The version history of the fork is not clear. We add unversioned virtual
# Provides for both libraries of origin.
Provides:       bundled(python3dist(pymongo))
Provides:       bundled(python3dist(pyhanko))

%description -n python3-pdfminer %{common_description}


%package doc
Summary:        Documentation for pdfminer
# See the base package License field for non-MIT sources; it appears that none
# of these contribute to the documentation.
License:        MIT

%description doc %{common_description}


%pyproject_extras_subpkg -n python3-pdfminer image


%prep
%autosetup -n pdfminer.six-%{version}

# Unbundle cmap data; it will be replaced in %%build.
rm -vf cmaprsrc/* pdfminer/cmap/*

# Remove shebang line in non-script source
sed -r -i '1{/^#!/d}' pdfminer/psparser.py
# Fix unversioned Python shebangs
%py3_shebang_fix tools

# Imitate the “publish” GitHub action, which sets the version metadata from the
# git tag when publishing to PyPI. See .github/workflows/actions.yml.
sed -r -i 's/__VERSION__/%{version}/g' pdfminer/__init__.py


%generate_buildrequires
%pyproject_buildrequires -x %{?with_doc_pdf:docs,}image


%build
# Symlink the unbundled CMap resources and convert to the pickled format.
for cmap in Japan1 Korea1 GB1 CNS1
do
  ln -s "%{adobe_mappings_rootpath}/${cmap}/cid2code.txt" \
      "cmaprsrc/cid2code_Adobe_${cmap}.txt"
done
%make_build cmap PYTHON='%{python3}'

%pyproject_wheel

%if %{with doc_pdf}
PYTHONPATH="${PWD}" %make_build -C docs latex SPHINXOPTS='%{?_smp_mflags}'
%make_build -C docs/build/latex LATEXMKOPTS='-quiet'
%endif


%install
%pyproject_install
%pyproject_save_files pdfminer

install -t '%{buildroot}%{_mandir}/man1' -D -p -m 0644 \
    '%{SOURCE2}' '%{SOURCE3}'

# Also, ship symlinks of the scripts without the .py extension.
for script in pdf2txt dumppdf
do
  ln -sf "${script}.py" "%{buildroot}%{_bindir}/${script}"
done


%check
# Skipped tests (and ignored files) are those that require the sample PDFs,
# which are not included in our version of the source tarball.
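#
# Reading aid for the idiom below (an editorial note, not upstream
# documentation): each assignment appends one term to a pytest -k expression.
# "${k-}" expands to the current value of k, or to nothing if k is unset;
# "${k+ and }" expands to " and " only if k is already set. Assuming k starts
# out unset, the first assignment yields
#   not TestDumpImages
# and the second yields
#   not TestDumpImages and not TestDumpPDF
# so the expression never begins with a stray " and ".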
k="${k-}${k+ and }not TestDumpImages"
k="${k-}${k+ and }not TestDumpPDF"
k="${k-}${k+ and }not TestExtractPages"
k="${k-}${k+ and }not TestExtractText"
k="${k-}${k+ and }not TestOpenFilename"
k="${k-}${k+ and }not TestPdf2Txt"
k="${k-}${k+ and }not TestPdfDocument"
k="${k-}${k+ and }not TestPdfPage"
k="${k-}${k+ and }not test_font_size"
k="${k-}${k+ and }not test_paint_path_quadrilaterals"
k="${k-}${k+ and }not test_pdf_with_empty_characters_horizontal"
k="${k-}${k+ and }not test_pdf_with_empty_characters_vertical"

%pytest -k "${k-}" \
    --ignore='tests/test_tools_dumppdf.py' \
    --ignore='tests/test_tools_pdf2txt.py'


%files -n python3-pdfminer -f %{pyproject_files}
%license LICENSE docs/licenses/LICENSE.pyHanko

%{_bindir}/pdf2txt
%{_bindir}/pdf2txt.py
%{_mandir}/man1/pdf2txt.1*

%{_bindir}/dumppdf
%{_bindir}/dumppdf.py
%{_mandir}/man1/dumppdf.1*


%files doc
%license LICENSE
%doc CHANGELOG.md
%doc CONTRIBUTING.md
%doc README.md
%if %{with doc_pdf}
%doc docs/build/latex/pdfminersix.pdf
%endif


%changelog
* Fri Jul 22 2022 Fedora Release Engineering 20220524-4
- Rebuilt for https://fedoraproject.org/wiki/Fedora_37_Mass_Rebuild

* Sat Jul 09 2022 Benjamin A. Beasley 20220524-3
- Fix extra newline in description

* Tue Jun 14 2022 Python Maint 20220524-2
- Rebuilt for Python 3.11

* Thu May 26 2022 Benjamin A. Beasley 20220524-1
- Update to 20220524 (close RHBZ#2089917)

* Sun May 08 2022 Benjamin A. Beasley 20220506-2
- Replace sed-patch with upstream PR#755

* Sat May 07 2022 Benjamin A. Beasley 20220506-1
- Update to 20220506 (close RHBZ#2082716)

* Mon Mar 21 2022 Benjamin A. Beasley 20220319-2
- Generate BR for “image” extra even when docs are disabled

* Sun Mar 20 2022 Benjamin A. Beasley 20220319-1
- Update to 20220319 (close RHBZ#2065998)

* Fri Jan 21 2022 Fedora Release Engineering 20211012-5
- Rebuilt for https://fedoraproject.org/wiki/Fedora_36_Mass_Rebuild

* Sat Nov 27 2021 Benjamin A. Beasley 20211012-4
- Reduce LaTeX PDF build verbosity

* Mon Nov 22 2021 Benjamin A. Beasley 20211012-3
- Minor spec file style changes

* Mon Oct 25 2021 Benjamin A. Beasley 20211012-2
- Use %%%%python3 macro instead of %%%%__python3

* Tue Oct 19 2021 Benjamin A. Beasley 20211012-1
- Update to 20211012 (close RHBZ#1763506)

* Sun Oct 17 2021 Benjamin A. Beasley 20200517-12
- Another small man page fix

* Sun Oct 17 2021 Benjamin A. Beasley 20200517-11
- Man page typo fix

* Thu Oct 14 2021 Benjamin A. Beasley 20200517-10
- Use adobe_mappings_rootpath macro

* Thu Oct 14 2021 Benjamin A. Beasley 20200517-9
- Add BSD to the base License field; make -doc MIT only

* Thu Oct 14 2021 Benjamin A. Beasley 20200517-8
- Comprehensive packaging improvements
- Switch to pyproject-rpm-macros (“new guidelines”)
- Do not distribute questionably-licensed sample PDFs, and skip the tests
  that require them
- Build PDF documentation in a new -doc subpackage (instead of simply
  distributing the documentation sources)
- Correct License field from “MIT” to “MIT and Public Domain and APAFML”
- Add downstream man pages for command-line tools
- Switch cmap-resources BR to adobe-mappings-cmap

* Fri Jul 23 2021 Fedora Release Engineering - 20200517-5
- Rebuilt for https://fedoraproject.org/wiki/Fedora_35_Mass_Rebuild

* Fri Jun 04 2021 Python Maint - 20200517-4
- Rebuilt for Python 3.10

* Wed Jan 27 2021 Fedora Release Engineering - 20200517-3
- Rebuilt for https://fedoraproject.org/wiki/Fedora_34_Mass_Rebuild

* Wed Jul 29 2020 Fedora Release Engineering - 20200517-2
- Rebuilt for https://fedoraproject.org/wiki/Fedora_33_Mass_Rebuild

* Wed Jun 24 2020 Elliott Sales de Andrade - 20200517-1
- Update to latest version

* Tue May 26 2020 Miro Hrončok - 20181108-7
- Rebuilt for Python 3.9

* Thu Jan 30 2020 Fedora Release Engineering - 20181108-6
- Rebuilt for https://fedoraproject.org/wiki/Fedora_32_Mass_Rebuild

* Thu Oct 03 2019 Miro Hrončok - 20181108-5
- Rebuilt for Python 3.8.0rc1 (#1748018)

* Mon Aug 19 2019 Miro Hrončok - 20181108-4
- Rebuilt for Python 3.8

* Fri Jul 26 2019 Fedora Release Engineering - 20181108-3
- Rebuilt for https://fedoraproject.org/wiki/Fedora_31_Mass_Rebuild

* Sat Feb 02 2019 Fedora Release Engineering - 20181108-2
- Rebuilt for https://fedoraproject.org/wiki/Fedora_30_Mass_Rebuild

* Tue Jan 08 2019 Elliott Sales de Andrade - 20181108-1
- Update to latest version
- Enable tests
- Fix crypto dependency
- Switch to automatic Requires
- Drop Python 2 subpackage

* Sat Jul 14 2018 Fedora Release Engineering - 20170720-8
- Rebuilt for https://fedoraproject.org/wiki/Fedora_29_Mass_Rebuild

* Thu Jul 05 2018 Ben Rosser - 20170720-7
- Stop package from using 'python' to run cmap script.

* Tue Jun 19 2018 Miro Hrončok - 20170720-6
- Rebuilt for Python 3.7

* Tue May 22 2018 Ben Rosser - 20170720-5
- Rebuild against new cmap resources package.

* Fri Feb 09 2018 Fedora Release Engineering - 20170720-4
- Rebuilt for https://fedoraproject.org/wiki/Fedora_28_Mass_Rebuild

* Fri Jan 26 2018 Iryna Shcherbina - 20170720-3
- Update Python 2 dependency declarations to new packaging standards
  (See https://fedoraproject.org/wiki/FinalizingFedoraSwitchtoPython3)

* Thu Jul 27 2017 Fedora Release Engineering - 20170720-2
- Rebuilt for https://fedoraproject.org/wiki/Fedora_27_Mass_Rebuild

* Mon Jul 24 2017 Ben Rosser - 20170720-1
- Update to latest upstream release.

* Fri Apr 21 2017 Ben Rosser - 20170419-1
- Update to latest upstream release, fixing a logging bug from 20170418.

* Fri Apr 21 2017 Ben Rosser - 20170418-2
- Now that upstream patch removing chbangs was merged, don't chmod library
  files.

* Wed Apr 19 2017 Ben Rosser - 20170418-1
- Updated to latest upstream release.

* Sat Feb 11 2017 Fedora Release Engineering - 20160614-7
- Rebuilt for https://fedoraproject.org/wiki/Fedora_26_Mass_Rebuild

* Mon Dec 19 2016 Miro Hrončok - 20160614-6
- Rebuild for Python 3.6

* Sat Oct 22 2016 Ben Rosser - 20160614-5
- Add missing requires on python-six and python-chardet.

* Fri Sep 9 2016 Ben Rosser - 20160614-4
- Rebuild against latest cmap-resources.

* Tue Jul 19 2016 Fedora Release Engineering - 20160614-3
- https://fedoraproject.org/wiki/Changes/Automatic_Provides_for_Python_RPM_Packages

* Tue Jun 21 2016 Ben Rosser 20160614-2
- I forgot to actually apply the patch to remove chbangs from library files.
  Apply said patch.

* Tue Jun 14 2016 Ben Rosser 20160614-1
- Update to latest upstream version of package.
- Use local version of patch.

* Sat Feb 27 2016 Ben Rosser 20160202-3
- Added a patch to remove the chbangs from all library files.
- Write correct sed command to make python3 scripts run with python3.

* Sat Feb 27 2016 Ben Rosser 20160202-2
- Through the use of some gratuitous sed, the python2 package only depends on
  /usr/bin/python2.
- The python3 version is still a little weird; it pulls in /usr/bin/python and
  I'm not sure why.
- Also, make the python 3 scripts be the default ones.

* Fri Feb 26 2016 Ben Rosser 20160202-1
- Update to latest upstream release.

* Thu Feb 04 2016 Fedora Release Engineering - 20151013-6
- Rebuilt for https://fedoraproject.org/wiki/Fedora_24_Mass_Rebuild

* Fri Jan 1 2016 Ben Rosser 20151013-5
- Version bump to silence rpmlint.

* Fri Jan 1 2016 Ben Rosser 20151013-4
- Upgrade path; obsolete and provide the pdfminer-six package in the COPR.
- Now replace the original python-pdfminer package with this one.

* Fri Jan 1 2016 Ben Rosser 20151013-3
- Upgrade path; obsolete and provide python-pdfminer up until rawhide.

* Sat Dec 19 2015 Ben Rosser 20151013-2
- Ship symlinks of the pdfminer scripts without the .py suffix.

* Fri Dec 18 2015 Ben Rosser - 20151013-1
- Initial package of the pdfminer.six fork using pyp2rpm.

* Thu Jun 18 2015 Fedora Release Engineering - 20140328-3
- Rebuilt for https://fedoraproject.org/wiki/Fedora_23_Mass_Rebuild

* Sat Aug 23 2014 Ben Rosser 20140328-2
- Replaced /usr/bin with bindir macro in install section.

* Sat Aug 16 2014 Ben Rosser 20140328-1
- Updated to latest version of pdfminer.
- Changed specfile to depend on the correct cmap-* packages.

* Thu Sep 20 2012 Ben Rosser 20110515-4
- Removed bundled cmap, changed to depend on cmap package instead

* Thu Jul 05 2012 Ben Rosser 20110515-3
- Removed BuildRoot, clean, and first line of install
- Fixed issue with cmap data not being copied into package
- Fixed license (cmap is under BSD, not MIT)

* Tue May 22 2012 Ben Rosser 20110515-2
- Fixed unowned directory issue and cleaned up the spec file

* Fri May 18 2012 Ben Rosser 20110515-1
- Initial version of the package