Comment 10 for bug 741628

Robert Ancell (robert-ancell) wrote :

The solution was found by Rafał Mużyło in this bug:
https://bugs.gentoo.org/show_bug.cgi?id=380429

This is now fixed for the 3.1.90 release.

To fix broken documents run the following:
simple-scan --fix-pdf ~/Documents/*.pdf

It should be safe to run this on all PDF documents, but PLEASE BACK UP FIRST. It also copies each existing document to DocumentName.pdf~, so you still have the original in case anything goes wrong.
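
For the backup step, here is a minimal sketch of one way to copy the PDFs aside before running the fix. The function name backup_pdfs and the backup directory are hypothetical and not part of simple-scan; adjust the paths to wherever your documents live.

```python
import glob
import os
import shutil


def backup_pdfs(src_dir, backup_dir):
    """Copy every *.pdf in src_dir into backup_dir; return the new paths."""
    if not os.path.isdir(backup_dir):
        os.makedirs(backup_dir)
    copied = []
    for path in sorted(glob.glob(os.path.join(src_dir, '*.pdf'))):
        dest = os.path.join(backup_dir, os.path.basename(path))
        # copy2 keeps timestamps and permissions along with the contents
        shutil.copy2(path, dest)
        copied.append(dest)
    return copied
```

For example, backup_pdfs(os.path.expanduser('~/Documents'), os.path.expanduser('~/pdf-backup')) would copy the same files that the command above touches. It only looks at the top level of the directory, matching the ~/Documents/*.pdf glob.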

If you can't wait for the next simple-scan release, you can also run this Python program (e.g. python fixpdf.py broken.pdf > fixed.pdf):
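import re
import sys

# Read as bytes: PDF files contain binary data.
with open(sys.argv[1], 'rb') as f:
    lines = f.readlines()

# Write bytes to stdout (sys.stdout.buffer exists on Python 3 only).
out = getattr(sys.stdout, 'buffer', sys.stdout)

# Number of bytes by which object offsets shift once the header is fixed.
xref_offset = 0
for n, line in enumerate(lines):
    # Fix PDF header and binary comment ('%%...' -> '%...')
    if n in (0, 1) and line.startswith(b'%%'):
        xref_offset -= 1
        line = line[1:]

    # Fix xref format (five-digit generation number, trailing space)
    match = re.match(br'(\d{10}) 0000 n\n', line)
    if match is not None:
        offset = int(match.group(1))
        line = b'%010d 00000 n \n' % (offset + xref_offset)

    # Fix xref offset (second-to-last line of the file)
    if n == len(lines) - 2:
        line = b'%d\n' % (int(line) + xref_offset)

    # Fix EOF marker ('%%%%EOF' -> '%%EOF')
    if n == len(lines) - 1 and line.startswith(b'%%%%'):
        line = line[2:]

    out.write(line)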
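
To illustrate what the program checks and rewrites, here is a small sketch with two hypothetical helpers (looks_broken and fix_xref_entry are names made up for this example, not part of simple-scan). It assumes, as the program above does, that an affected file starts with a doubled '%%PDF' header and that its xref entries carry a four-digit generation number. The bytes formatting needs Python 3.5 or later.

```python
import re

# Broken entries look like b'0000000017 0000 n\n' (18 bytes).
XREF_ENTRY = re.compile(br'(\d{10}) 0000 n\n')


def looks_broken(path):
    """Return True if the file starts with the doubled '%%PDF' header."""
    with open(path, 'rb') as f:
        return f.read(5) == b'%%PDF'


def fix_xref_entry(line, shift):
    """Rewrite one broken xref entry, moving its offset by shift bytes.

    Lines that do not match the broken format are returned unchanged.
    """
    match = XREF_ENTRY.match(line)
    if match is None:
        return line
    # A well-formed entry is 20 bytes: offset, generation, 'n', space, newline
    return b'%010d 00000 n \n' % (int(match.group(1)) + shift)
```

With both header lines fixed the shift is -2, so fix_xref_entry(b'0000000017 0000 n\n', -2) gives b'0000000015 00000 n \n'.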