Comment 45 for bug 39890

In , Paul Sladen (sladen) wrote :

It's perfectly natural for every letter in a word to be in a different font, colour, or style, so that needs to be allowed for (the text could even be a scan of some handwriting, or several scans). What pdftotext is trying to do is replicate what would be found in a Tagged PDF (Section 9.7 of the PDF Reference), effectively synthesising what would ideally be found in '/BMC.../EMC' '/ActualText' contents.
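
To illustrate the first point, here is a rough sketch of word grouping that looks only at glyph geometry and never at the font. It is not pdftotext's actual algorithm; `Glyph`, `group_into_words` and the `gap_ratio` threshold are hypothetical names and values picked for the example, and it assumes all glyphs sit on a single left-to-right line.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph box
    width: float  # advance width of the glyph
    font: str     # recorded, but deliberately ignored when grouping


def group_into_words(glyphs, gap_ratio=0.3):
    """Split one line of glyphs into words using horizontal gaps only.

    gap_ratio is an assumed threshold: a gap wider than this fraction of
    the larger neighbouring glyph width starts a new word.
    """
    words = []
    current = ""
    prev_right = None
    prev_width = None
    for g in sorted(glyphs, key=lambda g: g.x):
        if prev_right is not None:
            gap = g.x - prev_right
            if gap > gap_ratio * max(prev_width, g.width):
                words.append(current)
                current = ""
        current += g.char
        prev_right = g.x + g.width
        prev_width = g.width
    if current:
        words.append(current)
    return words


# Every letter uses a different font; the grouping is unaffected.
line = [
    Glyph("H", 0.0, 6.0, "Times-Bold"),
    Glyph("i", 6.0, 3.0, "Helvetica"),
    Glyph("y", 14.0, 5.0, "Courier"),
    Glyph("o", 19.0, 5.0, "Times-Italic"),
    Glyph("u", 24.0, 5.0, "Helvetica"),
]
print(group_into_words(line))
```

The font is kept on each glyph only to show that a font change between adjacent letters does not influence where the word breaks fall.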

To that end, I'm inclined to agree that (b) sounds like a non-ideal kludge on top of a non-ideal kludge.