lp:~mgedmin/+junk/pdf2html
Created by Marius Gedminas.
Python wrapper around pdftohtml (from poppler-utils) that tries hard to preserve paragraphs.
- Get this branch: bzr branch lp:~mgedmin/+junk/pdf2html
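The description above says the script wraps pdftohtml from poppler-utils. The following is only a minimal sketch of how such a wrapper might invoke it, not the branch's actual code. The helper names `build_pdftohtml_command` and `run_pdftohtml` are hypothetical, and using `-xml -stdout` is an assumption; `-hidden` and `-nodrm` are the options named in revision 42.

```python
import subprocess


def build_pdftohtml_command(pdf_path, extra_args=()):
    """Build an argv list for pdftohtml (hypothetical helper, not from the branch).

    -xml and -stdout are assumptions about how a wrapper might read the output;
    -hidden and -nodrm are the options mentioned in revision 42.
    """
    return ["pdftohtml", "-xml", "-stdout", "-hidden", "-nodrm",
            *extra_args, pdf_path]


def run_pdftohtml(pdf_path):
    """Run pdftohtml and return its raw output as bytes.

    Requires pdftohtml (poppler-utils) to be installed and on PATH.
    """
    result = subprocess.run(build_pdftohtml_command(pdf_path),
                            check=True, capture_output=True)
    return result.stdout
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.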
Branch information
- Owner: Marius Gedminas
- Status: Mature
Recent revisions
- 49. By Marius Gedminas
New option: --encoding
Fix extra spaces after superscript.
And I can't easily untangle these two commits because bzr is not git.
- 48. By Marius Gedminas
Better superscript handling logic.
Doesn't mistakenly join footnotes with text on the next page.
- 45. By Marius Gedminas
Show topmost and bottommost coordinates with --debug.
Helps the user discover the value of --header-pos or --footer-pos.
- 42. By Marius Gedminas
Pass -hidden and -nodrm to pdftohtml.
The -hidden option lets you see copy-pasteable text when the PDF itself is
just a bunch of bitmaps.
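Revision 45 above reports the topmost and bottommost coordinates so that a user can pick values for --header-pos or --footer-pos. The sketch below shows one way such extents could be computed from pdftohtml's XML output; it is not the branch's actual logic. The function name `vertical_extent` and the sample document are hypothetical, and treating "bottommost" as top plus height is an assumption.

```python
import xml.etree.ElementTree as ET


def vertical_extent(page):
    """Return (topmost, bottommost) for one <page> element, or None if it has no text.

    Assumes each <text> element carries integer "top" and "height" attributes,
    as in pdftohtml -xml output. "Bottommost" here means top + height, which is
    an assumption about what a debug report would show.
    """
    tops = []
    bottoms = []
    for text in page.iter("text"):
        top = int(text.get("top"))
        height = int(text.get("height"))
        tops.append(top)
        bottoms.append(top + height)
    if not tops:
        return None
    return min(tops), max(bottoms)


# Hand-written sample in the shape of pdftohtml -xml output (illustrative only).
SAMPLE = """<pdf2xml>
<page number="1" height="1188" width="918">
<text top="40" left="100" width="200" height="12" font="0">Running header</text>
<text top="120" left="100" width="600" height="16" font="1">Body text</text>
<text top="1130" left="450" width="20" height="12" font="0">1</text>
</page>
</pdf2xml>"""

root = ET.fromstring(SAMPLE)
for page in root.iter("page"):
    print(page.get("number"), vertical_extent(page))  # prints: 1 (40, 1142)
```

In this sample the running header sits at top=40 and the page number at top=1130, so cutoffs somewhere between those values and the body text would separate them from the main text.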
Branch metadata
- Branch format: Branch format 6
- Repository format: Bazaar pack repository format 1 with rich root (needs bzr 1.0)