characters change when selecting text

Bug #39890 reported by Rikard Nordgren
This bug affects 30 people
Affects           Status        Importance  Assigned to          Milestone
Poppler           Fix Released  Medium
poppler (Ubuntu)  Fix Released  Medium      Ubuntu Desktop Bugs

Bug Description

I have problems with the following pdf:
www.lacim.uqam.ca/~plouffe/OEIS/citations/MAS-R9821.pdf

When I select text, some characters change. This happens in many places in this PDF file. To reproduce the bug, just select some random places in the PDF. In some places the text disappears, in others the characters change completely, and elsewhere the italics are removed.

I am using the latest updates of Dapper, with Evince 0.5.2.

Gary Coady (garycoady) wrote :

Thank you for reporting this issue.

I can also reproduce it. I don't think the characters are really different characters, just drawn in a different font. For example, the character ']' (right square bracket) looks like the character '#' in the font cmmi10, and that is the character that appears when it is highlighted.

Changed in evince:
status: Unconfirmed → Confirmed
Sebastien Bacher (seb128) wrote :

I've forwarded the issue upstream: https://bugs.freedesktop.org/show_bug.cgi?id=6923

Changed in evince:
assignee: nobody → desktop-bugs
In upstream, Sebastien Bacher (seb128) wrote :

That bug has been described on
https://launchpad.net/distros/ubuntu/+source/evince/+bug/39890

"I have problems with the following pdf:
www.lacim.uqam.ca/~plouffe/OEIS/citations/MAS-R9821.pdf

When marking text some characters change. This happens on many places in this
pdf-file. To reproduce the bug just mark some random places in the pdf. In some
places the text disappears. In some the characters change totally. In others the
italics is removed.

I use the latest update of dapper. Evince 0.5.2
...
> Thank you for reporting this issue.

I can also reproduce it. I don't think that the different characters are really
different characters, just a different font. For example, the character ']'
(right square bracket) looks like the character '#' in the font cmmi10, and that
is the character that appears when it is highlighted."

Changed in evince:
assignee: nobody → desktop-bugs
importance: Undecided → Medium
VLK (valentyn) wrote :

Have the same problem here!

Evince 0.8.1 / Ubuntu Feisty

Thijs Kinkhorst (kink) wrote :

Just to confirm, I have the problem most notably when selecting text in "small caps" style: the text changes to lowercase characters when it is selected.

Kévin Guilloy (felixzero) wrote : Re: [Bug 39890] Re: characters change when selecting text

Yes, actually every "special" character has problems with selection in
evince. So it is not only maths formulas, but also "small caps" and
accents.

On Monday 17 September 2007 at 15:05 +0000, Thijs Kinkhorst wrote:
> Just to confirm, I have the problem most notably when selecting text in
> "small caps" style: the text changes to lowercase characters when it is
> selected.
>

In upstream, Sebastien Bacher (seb128) wrote :

That's still happening using the cairo backend and poppler 0.6

Sebastien Bacher (seb128) wrote :

the poppler task is enough

Changed in poppler:
status: Confirmed → Triaged
Changed in evince:
status: New → Invalid
Changed in evince:
assignee: desktop-bugs → pedrocpneto
status: Invalid → Confirmed
Changed in evince:
assignee: pedrocpneto → desktop-bugs
Vytas (vytas) wrote :

I see this bug in the form of disappearing characters (especially those in the Latin Extended-A Unicode block, for example the letters ą, č, ę, ė, į, ų, ū). However, files created with OpenOffice seem to work correctly.

Alex Eftimie (alexeftimie) wrote :

I can also report this bug with evince 2.20.1 using poppler 0.6.2 on Ubuntu 7.10.
The problem appears only when selecting characters with diacritics, like Romanian ş, ţ, ă, î, â. Some random blind selection (an empty area shown as selected instead of the text) also occurs, but only in files containing those characters.

Wolfgang Pittermann (pittermann-deactivatedaccount) wrote : Re: [Bug 39890] Re: characters change when selecting text

Hi Alex,

thanks for your email :-)

Indeed, the problem is displaying 'special' characters... all others
work fine!

In German, it is "Ü", "ü", "Ä", "ä", "Ö", "ö" and "ß".

Regards,

Wolfgang

On Wed, 13 Feb 2008 17:11:10 -0000, "Alex Eftimie"
<email address hidden> said:
> I can also report this bug with evince 2.20.1 using poppler 0.6.2 on
> ubuntu 7.10.
> The problem appears only when selecting diacritics characters, like
> Romanian: ş,ţ,ă,î,â. Some random blind-selection (like showing selected a
> void area instead of text) also appears, but only in files with that
> characters.

tweedledee (terrywatt-deactivatedaccount) wrote :

Still ongoing in Hardy. However, as upstream is not seeing these posts, please post there instead to try to encourage something to happen. Frankly, though, Evince is worse in Gutsy than it was in Hardy (major memory leaks, font rendering problems, and poor printing quality), so after more than a year of Evince I think I'm going to have to break down and switch to Adobe Reader.

Nils Hartmann (x378) wrote :

I think I have a similar problem: when I select text, it looks like the attached screenshot,
and sometimes some random "white blocks" appear...

Broflofski.Eric (eric-78) wrote :

In my document the Euro signs (€) multiply when the text is selected.
After the original Euro sign come 4 to 9 partly overlapping additional Euro signs. The original text under the additional Euro signs lies beneath white blocks, which seem to make room for them.

I can't say whether the problem is caused by the Euro sign itself or by the group of characters. In my document the Euro sign always appears in a context like this:
   " 6 Mio. €/km "

s3a (gamingtechnology) wrote :

I have the same problem.

Evince 2.22.2
Ubuntu 8.04 x86_64

In upstream, Alkisg+freedesktop-org (alkisg+freedesktop-org) wrote :

Created an attachment (id=18147)
Minimal pdf that reproduces the problem to help debugging

I've located exactly when the bug appears, and I've hand-coded a ***minimal*** PDF as a test case. I'm attaching it; when you select the text you should see
BCDΓ
transforming to
ABC<display artifact>

(notice the change from BCD to ABC).

Kind regards,
Alkis Georgopoulos

Alkis Georgopoulos (alkisg) wrote :

This happens with improperly produced PDF files that use custom encodings but have no CMap (ToUnicode) entries. Unfortunately, there are a lot of them out there!

When selecting text, evince (erroneously) displays the characters that will be copied to the clipboard, instead of the original glyphs.

Explanation: There is a PDF specification "feature", where one can declare that the character 'A' maps to the glyph 'B'. So if 'A' is written in a pdf text object, it displays as 'B' but it gets copied as 'A'!

My English is poor, so I made a ***minimal*** .pdf that reproduces this bug.

If you open it in Evince and select all the text, the line that contains
BCDΓ
should display
ABC<display artifact>
when selected!

Kind regards,
Alkis Georgopoulos

Christopher Yeleighton (giecrilj) wrote :

Adobe Reader 8 plugin complains it cannot read the document Alkis concocted.

Alkis Georgopoulos (alkisg) wrote :

I don't know about the plugin; it's a hand-made minimal example to show the developers that this line causes the problem:
/Differences [65 /B /C /D /Gamma]

When the text is not selected, the differences array is used, and evince displays
BCDΓ
When the text is selected, the differences array is not used, and evince displays
ABC<artifact>

Notice the change from BCD to ABC.

Acrobat (on Windows, not the Reader) reads the attachment fine, but some things like the xref table are probably broken (I wasn't willing to count byte offsets by hand!).
Evince reconstructs the table automatically, though, so whether other programs can read the PDF is not the point here; I just tried to pinpoint when the problem occurs to make it easier for the developers to fix it.

I can fix the xref table if it's required, but I don't think it matters...
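
The /Differences mechanism behind the `/Differences [65 /B /C /D /Gamma]` line can be illustrated with a minimal C++17 sketch. It is not poppler's code: `DiffEntry`, `applyDifferences` and `glyphsFor` are hypothetical names, and the sketch assumes a base encoding in which codes 65 to 68 name the glyphs A to D, as StandardEncoding does. Applying the array makes code 65 draw glyph B, while ignoring it yields A, which is the BCD to ABC shift Alkis describes.

```cpp
#include <map>
#include <string>
#include <variant>
#include <vector>

// One element of a /Differences array: either a starting character code
// (an integer) or a glyph name assigned to the current code.
using DiffEntry = std::variant<int, std::string>;

// Apply a /Differences array on top of a base encoding (code -> glyph name).
// An integer sets the current code; each name is assigned to the current
// code, which then advances by one.
std::map<int, std::string> applyDifferences(std::map<int, std::string> base,
                                            const std::vector<DiffEntry>& diffs) {
  int code = 0;
  for (const auto& entry : diffs) {
    if (std::holds_alternative<int>(entry)) {
      code = std::get<int>(entry);
    } else {
      base[code++] = std::get<std::string>(entry);
    }
  }
  return base;
}

// Look up the glyph name drawn for each character code.
std::vector<std::string> glyphsFor(const std::map<int, std::string>& encoding,
                                   const std::vector<int>& codes) {
  std::vector<std::string> glyphs;
  for (int code : codes) {
    auto it = encoding.find(code);
    glyphs.push_back(it != encoding.end() ? it->second : ".notdef");
  }
  return glyphs;
}
```

For the codes 65 to 68, the glyphs are B, C, D, Gamma with the array applied and A, B, C, D without it. The thread reports a display artifact rather than "D" in the fourth position; this sketch does not try to model that.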

Tyler Rusk (tdrusk) wrote :

I have reproduced this with a PDF my professor sent me. I was highlighting text when this happened. See the attachment for an example.

Pedro Villavicencio (pedro) wrote :

It's a poppler issue, not an evince one; rejecting that task.

Changed in evince:
status: Confirmed → Invalid
In upstream, Damian Barberon (damian01w) wrote :

Anyone?

In upstream, Albert Astals Cid (aacid) wrote :

Can you try with a newer poppler?

In upstream, Sebastien Bacher (seb128) wrote :

The example on this bug is still buggy on 0.11.3

In upstream, Juanjo Marín (juanj-marin-deactivatedaccount) wrote :

The example on this bug is still buggy on Evince 2.29.3 with poppler 0.12.0

tester8 (tester8) wrote :

I have problems with formula highlighting in Evince 2.28.1 (from the repository) on Ubuntu 9.10.
The Poppler version is 0.12.0-0ubuntu2.1.

When I select a formula, its content changes. This happens in a PDF converted from ODT.
Attached: the original ODT, the PDF, and screenshots of the unhighlighted and highlighted text in Evince and Adobe Reader.

In upstream, Andrew Tinka (tinka-berkeley) wrote :

evincebug.pdf still displays the buggy behavior in evince 2.30.3, poppler 0.12.4

Changed in poppler:
importance: Unknown → Medium
Ettore Atalan (atalanttore) wrote :

I have the same problem with the following PDF:
http://publica.fraunhofer.de/eprints/urn:nbn:de:0011-n-1174865.pdf

On the first page, selecting the heading ("Abbildung von Straßendaten für Qualitätsuntersuchungen - Ein Vergleich von OpenStreetMap mit Navteq") changes its text. Very annoying...

Ettore Atalan (atalanttore) wrote :
Changed in poppler:
importance: Medium → Unknown
In upstream, José Aliste (jose-aliste) wrote :

For the problem in the minimal example, the culprit seems to be TextSelectionPainter::visitWord, which assumes that all the chars in a word use the same font. This can be demonstrated by adding a space after "Word:" in the bad line. It also explains why this bug is so common with LaTeX files.

In upstream, José Aliste (jose-aliste) wrote :

Created attachment 43360
Patch fixing the issue

Actually, all the chars in a TextWord MUST share the same font, so I added a check that also starts a new word when the font changes. The attached patch fixes the bug in the minimal example and many other problems when selecting text in pdfLaTeX-generated files, but there are still issues rendering selections with bitmap fonts: the glyph seems right, but the size is wrong.
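
The rule this first patch adds can be pictured with a small C++ sketch. It is a simplification under stated assumptions: the `Char` and `Word` types and `buildWords` are hypothetical, and real poppler decides word boundaries from glyph positions rather than literal space characters. It only shows the extra condition: a font change also starts a new word, so every word keeps a single font.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified input: one entry per character, tagged with the
// id of the font it was drawn in.
struct Char {
  char ch;
  int fontId;
};

// A word that carries a single font, as the first patch assumes.
struct Word {
  std::string text;
  int fontId;
};

// Group characters into words: a space ends the current word (as before),
// and a font change now also starts a new word (the check the patch adds).
std::vector<Word> buildWords(const std::vector<Char>& chars) {
  std::vector<Word> words;
  bool inWord = false;
  for (const Char& c : chars) {
    if (c.ch == ' ') {
      inWord = false;
      continue;
    }
    if (!inWord || words.back().fontId != c.fontId) {
      words.push_back({std::string(1, c.ch), c.fontId});
      inWord = true;
    } else {
      words.back().text += c.ch;
    }
  }
  return words;
}
```

For "strike:" with the ":" in a second font, this yields the two words "strike" and ":", which is the kind of split that later shows up as the pdftotext regression.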

In upstream, José Aliste (jose-aliste) wrote :

The last patch also does not fix some problems when selecting ligatures in some fonts; see http://java.sun.com/docs/books/jvms/second_edition/ClassFileFormat-final-draft.pdf, page 54, and look for the word "reflective", which is rendered as "reßective" (although the selected text is indeed "reflective"). This does not happen with all fonts; for instance, I could not reproduce the problem in a pdflatex-generated file.

Dmitry Shachnev (mitya57) wrote :

I have marked bug #39321 as a duplicate, since Evince developer Jose Aliste confirmed it (https://bugzilla.gnome.org/show_bug.cgi?id=439070#c12).
Also, he has made a patch for poppler that resolves the issue (https://bugs.freedesktop.org/show_bug.cgi?id=6923#c9). What about applying it in Ubuntu?

Changed in evince:
importance: Unknown → Medium
status: Unknown → Confirmed
In upstream, Albert Astals Cid (aacid) wrote :

You are comparing pointers to the font infos; shouldn't you be comparing the font infos they point to?

Martin Pitt (pitti) wrote :

Thanks for pointing this out! I subscribed to the upstream bug, and will follow it there (the patch doesn't seem to be complete yet).

In upstream, José Aliste (jose-aliste) wrote :

I assumed that the only code updating curFont is TextPage::updateFont. If that is true, then from what I understood of the code, the fonts array holds unique FontInfos, so it would be safe to compare pointers instead of FontInfos. Is this right? If so, and you still want me to rework the patch so it compares the FontInfos instead of the pointers, I won't argue and will update the patch, but I'd appreciate an answer so I can better understand the code. Thanks.
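
José's argument relies on an interning invariant: if every font-info pointer comes from a pool that stores each distinct value exactly once, pointer equality and value equality coincide. A minimal C++ sketch of that invariant follows; `FontInfo` and `FontPool` are hypothetical names, not poppler's classes.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical font description; poppler's real font info holds much more.
struct FontInfo {
  std::string name;
  bool operator==(const FontInfo& other) const { return name == other.name; }
};

// Stores each distinct FontInfo exactly once. intern() returns the stored
// entry for an equal value, so two pointers obtained from the same pool are
// equal exactly when the FontInfos they point to are equal.
class FontPool {
 public:
  const FontInfo* intern(const FontInfo& font) {
    for (const auto& stored : fonts_) {
      if (*stored == font) return stored.get();
    }
    fonts_.push_back(std::make_unique<FontInfo>(font));
    return fonts_.back().get();
  }

 private:
  // unique_ptr keeps each entry at a stable address as the vector grows.
  std::vector<std::unique_ptr<FontInfo>> fonts_;
};
```

The invariant breaks as soon as any FontInfo is created outside the pool: such an object compares equal by value but not by address. That is the risk behind Albert's question if some code path bypasses TextPage::updateFont.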

In upstream, José Aliste (jose-aliste) wrote :

Created attachment 43420
alternate patch comparing gfxFont instead of pointers to FontInfos

Even though what I said in my last comment may be true, I followed your advice and
updated the patch to compare the GfxFonts instead of the pointers to the FontInfos.

In upstream, José Aliste (jose-aliste) wrote :

Created attachment 43421
updated alternate patch

I apologize for the spam. Previous patch was clearly wrong. Updated patch.

In upstream, Albert Astals Cid (aacid) wrote :

The patch changes (breaks) the behaviour of pdftotext, so it is not acceptable. You can see that in the attached PDF:
"bold strike:"
is now extracted as
"bold strike :"

In upstream, Albert Astals Cid (aacid) wrote :

Created attachment 43455
The pdf with the pdftotext regression

Changed in evince:
status: Confirmed → Unknown
In upstream, José Aliste (jose-aliste) wrote :

Thanks for the regression test case. The problem here is that "bold:" has two fonts, since "bold" is italicised and ":" is not. Before the patch, "bold:" is a single TextWord, so pdftotext gets the text right, but the selection is drawn badly because the selected ":" is drawn italicised. After the patch, "bold:" gets split into "bold" and ":", so it is drawn correctly when selected, but you get the regression you pointed out. So I believe I am stuck with this choice:

a) I could allow more than one font in a TextWord, and adapt the code that draws the TextWord to use that fact, so the selected ":" does not get transformed into an italicised ":" when drawn.

b) Or I could fix the TextDumper to be aware that in some cases there is no space between two TextWords.

c) Do you have another way?

I think I want to take approach a), even if it could be more complicated; approach b) seems like it could break more things than it would fix (for example, I suspect selection by words would not work at all). What do you think?
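
The regression and option b) can be modelled with a small C++ sketch. Everything in it is hypothetical: the `Word` type with a `spaceAfter` flag and both output functions are illustrations, not poppler's actual text output code. It assumes, for illustration, an output routine that puts a space between every pair of words, which reproduces "bold strike :" once "strike" and ":" become separate words.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified word: its text plus whether the input had a
// space after it.
struct Word {
  std::string text;
  bool spaceAfter;
};

// Naive output: always put a space between consecutive words. After the
// split, "strike" and ":" are separate words, so this prints "strike :".
std::string dumpNaive(const std::vector<Word>& words) {
  std::string out;
  for (std::size_t i = 0; i < words.size(); ++i) {
    if (i > 0) out += ' ';
    out += words[i].text;
  }
  return out;
}

// Option b): only emit a space where the input actually had one.
std::string dumpRespectingSpaces(const std::vector<Word>& words) {
  std::string out;
  for (std::size_t i = 0; i < words.size(); ++i) {
    out += words[i].text;
    if (words[i].spaceAfter && i + 1 < words.size()) out += ' ';
  }
  return out;
}
```

For the words "bold" (space after), "strike" and ":" (no space after either), the naive routine gives "bold strike :" and the space-aware one gives "bold strike:". As José notes, fixing output this way leaves the word structure split, which is why he prefers a).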

In upstream, Paul Sladen (sladen) wrote :

It's perfectly natural for every letter in a word to be in a different font, colour, or style, so that needs to be allowed for (the text could even be a scan of some handwriting, or several scans). What pdftotext is trying to do is replicate what would be found in a Tagged PDF (Section 9.7 of the PDF Reference), effectively synthesising what would ideally be found in '/BMC.../EMC' '/ActualText' contents.

To that end, I'm inclined to agree that (b) sounds like a non-ideal kludge on top of a non-ideal kludge.

In upstream, Albert Astals Cid (aacid) wrote :

I tend to agree with Paul that it is fine for a word to have different fonts, but on the other hand one can argue whether the ":" in that situation is part of the word or not.

I'd say (a) is the better solution, but if you can get (b) with "clean enough" code I won't be opposed to it either.

Personally I think text selection should have been done on the client side (as we do in Okular) and not by "modifying" poppler's internal structures, but that's a bit too late I guess :D

In upstream, Paul Sladen (sladen) wrote :

Albert: given the lack of a space ( ) in the input, there shouldn't be one in the output! :)

In upstream, Albert Astals Cid (aacid) wrote :

Paul: Of course, I never said this is acceptable behaviour. (What makes you think I did?)

I only said that, from a technical point of view, you can argue that ":" is a different word, since you won't find the word "strike:" in a dictionary.

In upstream, Gasche-dylc (gasche-dylc) wrote :

Has there been any additional progress on this issue? I must say I have been encountering it quite frequently for a long time, and I for one would be very happy to see a fix applied.

Anyway, thanks to anyone working on this.

Changed in poppler:
importance: Unknown → Medium
In upstream, Dmitry Shachnev (mitya57) wrote :

An interesting thing:
When I select the "Wrong" line in the PDF from comment 2, BCDΓ changes not to ABC<display artifact> but to ABC<non-selected letter Γ>, and when I then copy and paste that line, I get "Wrong:BCD", without the last letter.

@José: Please, do something with this! Sometimes selecting text does really terrible things (take a look at 3 screenshots in this archive: http://ubuntuone.com/p/11Pa/, for example).

In upstream, Albert Astals Cid (aacid) wrote :

Dmitry, please do not modify those fields; that's not for users to choose.

Exeleration-G (exeleration-g) wrote :

Bug still there in Ubuntu 12.04, Evince 3.4.0

no longer affects: evince
In upstream, Jason Crain (jcrain) wrote :

Created attachment 65449
Allow multiple fonts in a TextWord

This patch modifies TextWord so that each character can have a different font. The function poppler_page_get_text_attributes is updated to allow attributes to change mid-word.
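
With a font recorded per character, a selection painter can redraw a word as runs of consecutive characters that share a font, instead of using one font for the whole word. A minimal C++ sketch of that idea; `TextWordSketch`, `Run` and `fontRuns` are hypothetical names and this is not the patch's actual code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified word in the spirit of the patch: instead of one
// font for the whole word, each character records its own font.
struct TextWordSketch {
  std::u32string text;           // one code point per character
  std::vector<int> fontPerChar;  // fontPerChar[i] is the font of text[i]
};

// A run of consecutive characters drawn in the same font.
struct Run {
  int fontId;
  std::u32string text;
};

// Split a word into maximal same-font runs, so a selection painter can
// redraw each run in its own font instead of using the first character's
// font for the whole word.
std::vector<Run> fontRuns(const TextWordSketch& w) {
  std::vector<Run> runs;
  for (std::size_t i = 0; i < w.text.size(); ++i) {
    if (runs.empty() || runs.back().fontId != w.fontPerChar[i]) {
      runs.push_back({w.fontPerChar[i], std::u32string()});
    }
    runs.back().text += w.text[i];
  }
  return runs;
}
```

A word such as "strike:" with the ":" in a second font stays one word (so text extraction is unaffected) but is painted as the two runs "strike" and ":".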

In upstream, Carlos Garcia Campos (carlosgc) wrote :

(In reply to comment #25)
> Created attachment 65449 [details] [review]
> Allow multiple fonts in a TextWord
>
> This patch modifies TextWord so that each character can be a different font.
> Function poppler_page_get_text_attributes is updated to allow for attributes to
> change mid-word.

Thanks for the patch! Albert, could you run the regtests for the text backend with this patch, please?

In upstream, Albert Astals Cid (aacid) wrote :

Sure, on my todo (man i'm behind again in poppler :-()

In upstream, Carlos Garcia Campos (carlosgc) wrote :

Sure, no hurry, thanks!

In upstream, Albert Astals Cid (aacid) wrote :

Good news first: it does not change the output of pdftotext (well, it does for one file, but I don't consider that a regression).

Yay :-)

Now onto the review of the code itself.

From what I can read, it seems possible to have fonts with different WModes (i.e. fonts written vertically and horizontally) in the same word; I don't think this makes any sense.

What do you think?

Note I have *not* verified that it fixes the problem mentioned in this bug. Carlos, can you do that part?

In upstream, Albert Astals Cid (aacid) wrote :

There's also a weird thing.

If I get all the character positions and dump them to a file, there are sometimes minor changes, e.g. in https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/881407/+attachment/2571595/+files/Manning-ExtJS_in_Action.pdf

the diff says

--- old/foo 2012-08-17 01:16:21.566960179 +0200
+++ new/foo 2012-08-17 01:16:21.566960179 +0200
@@ -51275,13 +51275,16 @@
 QRectF(289.28,579.535 2.4948x4.725)
 true
 QRectF(289.28,579.535 2.4948x4.725)
-"FumiK"
-QRectF(294.666,579.535 10.0606x4.725)
-true
+"Fumi"
+QRectF(294.666,579.535 7.64003x4.725)
+false
 QRectF(294.666,579.535 2.08791x4.725)
 QRectF(296.754,579.535 1.73864x4.725)
 QRectF(298.492,579.535 2.8904x4.725)
-QRectF(301.383,579.535 0.349866x4.725)
+QRectF(301.383,579.535 0.923076x4.725)
+"K"
+QRectF(301.733,579.535 2.99376x4.725)
+true
 QRectF(301.733,579.535 2.99376x4.725)
 "®"
 QRectF(308.586,580.274 3.06445x3.885)

the more interesting part is

-QRectF(301.383,579.535 0.349866x4.725)
+QRectF(301.383,579.535 0.923076x4.725)

It seems the "i" went from being 0.349866 wide to being 0.923076 wide. Any idea why these changes happen? Looking at the code, I don't see any obvious reason for them.

In upstream, Carlos Garcia Campos (carlosgc) wrote :

(In reply to comment #29)

> Note i have *not* verified it fixes the problem mentioned in this bug. Carlos
> can you do that part?

It's weird: I don't see any difference with and without the patch, using poppler-glib-demo and the minimal example attached to this bug.

In upstream, Jason Crain (jcrain) wrote :

The difference in character width is because of this change:

@@ -863,7 +839,7 @@ void TextLine::coalesce(UnicodeMap *uMap) {
  word0->spaceAfter = gTrue;
  word0 = word1;
  word1 = word1->next;
- } else if (word0->font == word1->font &&
+ } else if (word0->font[word0->len - 1] == word1->font[0] &&
    word0->underlined == word1->underlined &&
    fabs(word0->fontSize - word1->fontSize) <
      maxWordFontSizeDelta * words->fontSize &&

Before, the words "Fumi" and "K" were being merged. Letters "F" and "K" are using the font BookmanOldStyle. Letters "umi" are using ArialMT. Because of this font check, the words are no longer being merged, making the width of the i different.

Yes, it is possible for one word to have fonts with different WModes. Still looking into it, but I suppose it will need to start a new word on WMode changes.

Carlos: I don't know why you do not see a difference. Without this patch, selecting text in the minimal example changes the text to "ABC". With this patch, it stays "BCD".

In upstream, Albert Astals Cid (aacid) wrote :

(In reply to comment #32)
> The difference in character width is because of this change:
>
> @@ -863,7 +839,7 @@ void TextLine::coalesce(UnicodeMap *uMap) {
> word0->spaceAfter = gTrue;
> word0 = word1;
> word1 = word1->next;
> - } else if (word0->font == word1->font &&
> + } else if (word0->font[word0->len - 1] == word1->font[0] &&
> word0->underlined == word1->underlined &&
> fabs(word0->fontSize - word1->fontSize) <
> maxWordFontSizeDelta * words->fontSize &&
>
>
> Before, the words "Fumi" and "K" were being merged. Letters "F" and "K" are
> using the font BookmanOldStyle. Letters "umi" are using ArialMT. Because of
> this font check, the words are no longer being merged, making the width of the
> i different.

I understand why they are not merged (from the code, not why that code was put there), but why does the "i" now have a different width that makes it overlap with the "K", when previously it didn't?

In upstream, Jason Crain (jcrain) wrote :

(In reply to comment #33)
> I understand why they are not merged (by code not, why the code is put there),
> but why the "i" has a different width that makes it overlap with the "K" when
> previously didn't?

Before being merged, the width of the 'i' is 0.923076. In TextWord::merge, the right edge of 'i' is set equal to the left edge of 'K', reducing its width to 0.349866.
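
Jason's explanation comes down to simple arithmetic on the coordinates from Albert's diff. The sketch below is hypothetical (`CharBox` and `clampToNext` model only the effect he describes for TextWord::merge, not its code): the unmerged "i" starts at 301.383 and is 0.923076 wide, and the "K" starts at 301.733.

```cpp
// Hypothetical character box, horizontal extent only.
struct CharBox {
  double xMin;
  double xMax;
  double width() const { return xMax - xMin; }
};

// Models the effect described for TextWord::merge: the last character of
// the first word is trimmed so its right edge meets the left edge of the
// next word's first character.
CharBox clampToNext(CharBox last, const CharBox& next) {
  last.xMax = next.xMin;
  return last;
}
```

Trimming leaves the "i" 301.733 - 301.383 ≈ 0.35 wide, which agrees with the 0.349866 in the old output up to the rounding of the printed coordinates. Without the merge, the "i" keeps its full 0.923076 width and overlaps the "K".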

In upstream, Albert Astals Cid (aacid) wrote :

Ok, let's not worry about that for the moment. I'm more concerned about Carlos not being able to verify this indeed fixes the problem it's supposed to be fixing.

Carlos can you give it a second try?

In upstream, José Aliste (jose-aliste) wrote :

I tried the patched poppler with the minimal example, and the glyphs no longer change under the selection, which is VERY GOOD!! But I can't select the Gamma glyph...

In upstream, Jason Crain (jcrain) wrote :

Created attachment 65992
Allow multiple fonts in a TextWord

It was possible for a TextWord to have fonts for both horizontal and vertical writing modes. This version of the patch starts a new word when the WMode changes.
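
As described, the word-boundary rule in this revision reduces to a small predicate. The C++ sketch below is hypothetical: it models only the font and WMode aspect, while real word breaking also depends on spacing and other attributes.

```cpp
// Hypothetical per-character attributes relevant to this rule.
struct CharAttrs {
  int fontId;
  int wMode;  // PDF writing mode: 0 = horizontal, 1 = vertical
};

// A font change alone no longer starts a new word, but a change of
// writing mode does. fontId is deliberately not consulted here.
bool startsNewWord(const CharAttrs& prev, const CharAttrs& cur) {
  return prev.wMode != cur.wMode;
}
```

Two horizontal characters in different fonts stay in one word, while a switch from horizontal to vertical writing always begins a new one.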

In upstream, Jason Crain (jcrain) wrote :

I am not sure how to fix the problem with the Gamma glyph not being selectable. It is not selectable because it has a zero width, because there is no Widths array in the file. This is a violation of the spec. Poppler has a mechanism to provide default widths, but this does not work for symbols.
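
The chain from a missing Widths array to an unselectable glyph can be sketched in C++. Everything here is a hypothetical illustration (the names, and the 0.5 fallback used when trying it out), not poppler's code or its actual default width: with no Widths entry and no fallback, the glyph gets zero width, and a zero-width box can never be hit by a selection.

```cpp
#include <map>
#include <optional>

// Hypothetical font metrics: a /Widths table that may lack entries (or be
// missing entirely, as in the minimal test PDF) plus an optional default.
struct FontMetrics {
  std::map<int, double> widths;         // character code -> glyph width
  std::optional<double> fallbackWidth;  // hypothetical default for missing codes
};

// Width used for a character: its /Widths entry if present, otherwise the
// fallback, otherwise zero.
double glyphWidth(const FontMetrics& m, int code) {
  auto it = m.widths.find(code);
  if (it != m.widths.end()) return it->second;
  return m.fallbackWidth.value_or(0.0);
}

// Point-in-interval hit test: a zero-width glyph has an empty interval,
// so no point can ever select it.
bool hitTest(double glyphX, double width, double pointX) {
  return pointX >= glyphX && pointX < glyphX + width;
}
```

In the minimal example the Gamma is code 68; with no width it cannot be selected, and giving it any positive fallback width makes it selectable, which is the idea behind the "Change default fallback character width" patch.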

In upstream, Dmitry Shachnev (mitya57) wrote :

Created attachment 66101
Another small test case

The patch doesn't fully solve the problem for me. I have attached another PDF file (which only contains one math formula) with which I still can reproduce the issue.

When I try to select the Sigma ("∑") in Evince, limits ("m=0" and "∞") disappear; when I try to select "x/2" fraction, I get an orange or black rectangle (I use Ubuntu's Ambiance theme, so normally selection color should be always orange).

I can confirm that the "Minimal pdf" attached here is fixed by this patch, though.

In upstream, Albert Astals Cid (aacid) wrote :

Jason, do you want me to have another look at your patch, or do you want to try to fix the issue reported by Dmitry first?

In upstream, Jason Crain (jcrain) wrote :

Yes, please review the patch as it is.

I have looked at the document from Dmitry. It exposes two issues, but these are separate from the minimal PDF: 1) TextLines may be drawn over each other, obscuring other text. 2) In CairoOutputDev::updateFillColor, the color may not be set correctly.

In upstream, Jason Crain (jcrain) wrote :

Created attachment 66163
Change default fallback character width

Also, I don't know how you feel about fixing something caused by violating the PDF specification, but this patch changes the default width for a missing character, making the gamma symbol selectable in the minimal pdf.

In upstream, Albert Astals Cid (aacid) wrote :

(In reply to comment #37)
> Created attachment 65992 [details] [review]
> Allow multiple fonts in a TextWord
>
> It was possible for a TextWord to have fonts for both horizontal and vertical
> writing modes. This version of the patch starts a new word when the WMode
> changes.

When running pdftotext with this patch on a PDF I'm going to attach, it crashes (without the patch it works). Can you please fix it?

In upstream, Albert Astals Cid (aacid) wrote :

Created attachment 66375
The pdf where the patch crashes

In upstream, Jason Crain (jcrain) wrote :

Created attachment 66450
Check for NaN in TextPage::addChar

I don't think this is related to my earlier patch. For me, this pdf crashes both with and without it. This document is doing very strange things with the current transformation matrix (CTM) and inline images. Pages 6 and 15 are filled with lines like this:

    q 18 0 0 -1 2782 6350 cm
    q BI
    <IMAGE DICT>
    ID <IMAGE DATA>
    EI Q
    q 19 0 0 -1 2782 6350 cm
    q BI
    <IMAGE DICT>
    ID <IMAGE DATA>
    EI Q

Note the unbalanced q/Q for saving/restoring the graphics state. This means that the graphics state is not ever being properly restored and the `cm' operator is scaling the CTM until its components become NaN. This leads to TextWord::base being NaN. This breaks calculations in TextPool::addWord, causing wordBaseIdx to be INT_MIN, causing the text pool to not be initialized to NULLs, which causes a crash when an invalid pointer is read and dereferenced from the pool.

As a test, adding a call to restoreState() in Gfx::opBeginImage allows the page to render properly and without crashing. Otherwise, poppler either crashes or places text in an invalid location.

The attached patch adds a check for NaN to TextPage::addChar and throws away chars with invalid positions.
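
Jason's chain of reasoning can be reproduced numerically. The C++ sketch below is hypothetical (it is not poppler's Gfx or TextPage code) and simplifies the stream to a single repeated `18 0 0 -1 2782 6350 cm` with no matching `Q`. Each `cm` multiplies the CTM's horizontal scale by 18 until it overflows to infinity; the next `cm` then computes 0 × infinity, which is NaN in IEEE arithmetic, and any text position transformed by that CTM becomes NaN too. The last function shows the kind of NaN guard the patch adds to TextPage::addChar.

```cpp
#include <array>
#include <cmath>

// PDF affine matrix [a b c d e f]: x' = a*x + c*y + e, y' = b*x + d*y + f.
using Mat = std::array<double, 6>;

// The `cm` operator: CTM' = M x CTM (PDF row-vector convention).
Mat concat(const Mat& m, const Mat& ctm) {
  return {m[0] * ctm[0] + m[1] * ctm[2],
          m[0] * ctm[1] + m[1] * ctm[3],
          m[2] * ctm[0] + m[3] * ctm[2],
          m[2] * ctm[1] + m[3] * ctm[3],
          m[4] * ctm[0] + m[5] * ctm[2] + ctm[4],
          m[4] * ctm[1] + m[5] * ctm[3] + ctm[5]};
}

// Apply `cm` repeatedly with no matching Q, starting from the identity, and
// return the number of operators after which some CTM entry is NaN
// (or -1 if that does not happen within maxOps).
int opsUntilNaN(const Mat& m, int maxOps) {
  Mat ctm{1, 0, 0, 1, 0, 0};
  for (int i = 1; i <= maxOps; ++i) {
    ctm = concat(m, ctm);
    for (double v : ctm) {
      if (std::isnan(v)) return i;
    }
  }
  return -1;
}

// A guard in the spirit of the TextPage::addChar patch: reject characters
// whose computed position is not a number.
bool validCharPos(double x, double y) {
  return !std::isnan(x) && !std::isnan(y);
}
```

With that single matrix, the first NaN appears after roughly 250 operators, a count that pages "filled with lines like this" can easily reach. The sketch does not model the alternating 18/19 factors or the later INT_MIN index in TextPool::addWord.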

In upstream, Jason Crain (jcrain) wrote :

Created attachment 66454
Allow multiple fonts in a TextWord

Updated patch to apply to current git master.

In upstream, Albert Astals Cid (aacid) wrote :

I've committed this patch to master (what will be 0.22) and not to 0.20, since it breaks the public glib API.

I'm closing this bug since the patch fixes the originally reported issue. If there are issues not fixed by this patch, please open a new bug; big bugs get difficult to keep track of otherwise.

In upstream, Carlos Garcia Campos (carlosgc) wrote :

(In reply to comment #47)
> I've commited this patch to master (what will be 0.22) and not to 0.20 since it
> breaks the public glib api

Really? why? I agree with not committing it to the stable branch, but I don't think the patch breaks the API, or am I missing something? Breaking the API should be avoided even in master.

In upstream, Albert Astals Cid (aacid) wrote :

Oh, wait, I didn't realize the functions it adds new parameters to are private; ignore my "breaks API" part.

In upstream, Carlos Garcia Campos (carlosgc) wrote :

(In reply to comment #49)
> Oh, wait, didn't realize the functions it's adding new parameters are private,
> ignore my "breaks API" part.

No problem :-) but please, ask me before committing any change that might break the public glib API even in master.

Changed in poppler:
status: Confirmed → Fix Released
In upstream, Dmitry Shachnev (mitya57) wrote :

Unfortunately, this bug is still present in Poppler 0.20.5 on Ubuntu 13.04.

When I open my test case (attachment 66101) in Evince and try to select the "2m+α" part, it changes to "20+=" every time. Should I reopen this bug or file a new one instead?

In upstream, Dmitry Shachnev (mitya57) wrote :

(In reply to comment #51)
> Unfortunately, this bug is still present in Poppler 0.20.5 on Ubuntu 13.04.

Please ignore that, it seems that Poppler version is too old in Ubuntu. Will prepare an update now :)

no longer affects: evince (Ubuntu)
Changed in poppler (Ubuntu):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package poppler - 0.22.4-0ubuntu1

---------------
poppler (0.22.4-0ubuntu1) saucy; urgency=low

  * New upstream release 0.22.4 (LP: #1135995).
    - Should fix the text selection problems (LP: #39890).
    - Should fix issues with PDF forms (LP: #1153517).
  * Drop all upstream patches and refresh other patches.
  * Change the soname version in package name: libpoppler28 -> libpoppler37.
  * Update symbols files.
  * Update debian/copyright (taken from Debian packaging Git).
 -- Dmitry Shachnev <email address hidden> Fri, 17 May 2013 17:07:14 +0400

Changed in poppler (Ubuntu):
status: Fix Committed → Fix Released
In upstream, nh2 (nh2) wrote :

Created attachment 104775
Unselected text with evince 3.10.3 (Ubuntu 14.04)

In upstream, nh2 (nh2) wrote :

Created attachment 104776
Selected/Corrupted text with evince 3.10.3 (Ubuntu 14.04)

I'm not convinced this bug is fixed; see the attachment images from my evince 3.10.3 in Ubuntu 14.04.

Or is this a different issue?

In upstream, Jason Crain (jcrain) wrote :

(In reply to comment #54)
> I'm not convinced this bug is fixed; see the attachment images from my
> evince 3.10.3 in Ubuntu 14.04.
>
> Or is this a different issue?

That is a different issue. Please open a new bug, attach the PDF, and include your libpoppler version.

In upstream, Jason Crain (jcrain) wrote :

*** Bug 9608 has been marked as a duplicate of this bug. ***

In upstream, Jason Crain (jcrain) wrote :

*** Bug 13441 has been marked as a duplicate of this bug. ***
