Strange character showing in file listing in Nautilus in some views, for Thai language file names

Bug #986008 reported by Chanchao
This bug affects 1 person
Affects             Status        Importance  Assigned to       Milestone
Pango               Fix Released  Medium
pango1.0 (Ubuntu)   Fix Released  High        Unassigned
  Precise           Fix Released  High        Sebastien Bacher

Bug Description

Impact: the default file manager (Nautilus) renders Thai filenames incorrectly

Development Fix: the bug has been fixed in quantal

Stable Fix: it's a trivial patch coming from https://bugzilla.gnome.org/show_bug.cgi?id=677090 (upstream bug report)

Regression Potential: could break Thai string rendering in another way, but that seems pretty unlikely since the patch only adds a special rule for the specific character that was rendered in a buggy way

Test Case:
- download the document from comment #10
- open the folder where it's downloaded in nautilus
- see if the filename is correct or there are square chars like in screenshot in comment #3

...

In some views in Nautilus, there is a strange character appearing right after particular characters such as . (dot) and - (dash) when using Thai script in the file name.

Note that this does not happen in List view (Ctrl-2), only in the icon views (Ctrl-1 or Ctrl-3). Also, when renaming a file, the strange character disappears while the filename is being edited, but re-appears just before any dots, dashes or underscores once the rename is done.

I attach some screenshots. Screenshots 1 and 3 show the problem, with the character appearing in front of a dot or dash. Screenshot 2 (list view) shows that there is no issue there. Screenshot 4 shows what happens when renaming a file: the strange character(s) disappear.

Note that when I take a file name and insert a dot or dash at any point within the Thai name, the strange character appears in the affected views. This seems to happen only with dot, dash and underscore, but not with regular a-z characters or characters like #, $, ! etc. (I did not try all 128 low ASCII characters though).

EDIT: I can only add one attachment. Will add the other ones in follow-up comments.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: nautilus 1:3.4.1-0ubuntu1
ProcVersionSignature: Ubuntu 3.2.0-23.36-generic 3.2.14
Uname: Linux 3.2.0-23-generic x86_64
ApportVersion: 2.0.1-0ubuntu5
Architecture: amd64
Date: Fri Apr 20 10:41:58 2012
GsettingsChanges:
 org.gnome.nautilus.window-state geometry '1154x546+125+114'
 org.gnome.nautilus.window-state start-with-status-bar true
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Beta amd64 (20120301)
ProcEnviron:
 LANGUAGE=en_US:en
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: nautilus
UpgradeStatus: No upgrade log present (probably fresh install)

Chanchao (custom) wrote :

Screenshot 2, showing that the issue does not occur in List view.

Chanchao (custom) wrote :

Screenshot 3, showing the issue again in compact icon view (Ctrl-3).

Chanchao (custom) wrote :

Screenshot 4: Showing that the strange character disappears when renaming a file.

After completing the renaming (hitting enter) the strange character(s) show up again.

Chanchao (custom) wrote :

Comment: I could also add that this bug has been there for a while, from the start of Beta 1 at least. It's not due to a recent update or system change.

Chanchao (custom) wrote :

Final screen shot: It also happens on the desktop, and happens for files of any type. Note that regular Latin character file names are not affected.

Chanchao (custom) wrote :

^ Interestingly, it only happens when the non-Latin character precedes the dot or dash. Looking at the three icons at the top in the previous screenshot, those have some Thai language in them, but a Latin character precedes the dot. The issue does not occur.
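
To make the condition described above concrete, here is a minimal sketch in C that checks whether the character immediately before a given byte offset falls in the Unicode Thai block (U+0E00 to U+0E7F). The helper names `codepoint_before` and `thai_precedes` are hypothetical and appear in neither nautilus nor pango; the sketch assumes the filename is valid UTF-8 and only illustrates the observation, not the cause of the bug.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the UTF-8 code point that ends just before
 * s[pos]. Returns 0 if pos is 0 or the bytes do not form a well-shaped
 * sequence. Assumes the input is otherwise valid UTF-8. */
uint32_t codepoint_before(const char *s, size_t pos)
{
    if (pos == 0)
        return 0;

    /* Step back over continuation bytes (10xxxxxx), at most three of them. */
    size_t start = pos - 1;
    while (start > 0 && pos - start < 4 &&
           ((unsigned char)s[start] & 0xC0) == 0x80)
        start--;

    const unsigned char *b = (const unsigned char *)s + start;
    size_t len = pos - start;

    if (b[0] < 0x80 && len == 1)
        return b[0];
    if ((b[0] & 0xE0) == 0xC0 && len == 2)
        return ((uint32_t)(b[0] & 0x1F) << 6) | (b[1] & 0x3F);
    if ((b[0] & 0xF0) == 0xE0 && len == 3)
        return ((uint32_t)(b[0] & 0x0F) << 12) |
               ((uint32_t)(b[1] & 0x3F) << 6) | (b[2] & 0x3F);
    if ((b[0] & 0xF8) == 0xF0 && len == 4)
        return ((uint32_t)(b[0] & 0x07) << 18) |
               ((uint32_t)(b[1] & 0x3F) << 12) |
               ((uint32_t)(b[2] & 0x3F) << 6) | (b[3] & 0x3F);
    return 0;
}

/* Hypothetical helper: true if the character right before name[pos]
 * is in the Thai block U+0E00..U+0E7F. */
bool thai_precedes(const char *name, size_t pos)
{
    uint32_t cp = codepoint_before(name, pos);
    return cp >= 0x0E00 && cp <= 0x0E7F;
}
```

Called with the offset of a dot or dash in a filename, `thai_precedes` separates the two cases from the screenshot: a Thai character directly before the dot (affected) versus a Latin character before it (not affected).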

Chanchao (custom)
description: updated
Sebastien Bacher (seb128) wrote :

Thank you for your bug report. Could you add an example of such a file to the bug?

Changed in nautilus (Ubuntu):
importance: Undecided → Low
status: New → Incomplete
Sebastien Bacher (seb128) wrote :

Is that issue new in precise? Do you know when it started?

Chanchao (custom) wrote :

I don't know when it started, but I don't recall ever seeing it in 11.10.

I attach a small text file with a name that uses Thai characters. The strange character appears after the dash and after the dot.

I tested it with some other non-Western scripts (Korean and Hindi) and the issue does not occur there. It seems specific to Thai.

Chanchao (custom) wrote :

Note: I added the requested file a couple of days ago but the status of this bug is still 'incomplete'.

Please review and let me know if I can provide additional information, whether you were able to reproduce the bug (saving the attached file on the desktop should show it), and whether it can be classified as a bug or the status should be updated to something else.

Thanks!

Changed in nautilus (Ubuntu):
status: Incomplete → Triaged
Changed in nautilus (Ubuntu Precise):
status: New → Triaged
importance: Undecided → High
Changed in nautilus (Ubuntu):
importance: Low → High
Sebastien Bacher (seb128) wrote :

Thanks, that's a bug in the code that helps wrap labels at the right position in the icon view.

nautilus-icon-canvas-item.c has this code:

"
#define ZERO_WIDTH_SPACE "\xE2\x80\x8B"

for (p = text; *p != '\0'; p++) {
    str = g_string_append_c (str, *p);

    if (*p == '_' || *p == '-' || (*p == '.' && !g_ascii_isdigit(*(p+1)))) {
        /* Ensure that we allow to break after '_' or '.' characters,
         * if they are not followed by a number */
        str = g_string_append (str, ZERO_WIDTH_SPACE);
    }
}
"

It seems like the "zero width space" is what is creating the issue; I'm unsure why that's happening though.

One bug I can see is that the code iterates over char types, and a non-ASCII UTF-8 glyph might not fit in a single char, so it could insert a value in the middle of a UTF-8 sequence and corrupt it. That doesn't seem to be what is happening here, though, since the "weird" glyphs are added after "_" and ".".
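
As a rough illustration of the loop quoted above, here is a GLib-free sketch of the same logic in plain C. The function name `add_break_hints` is hypothetical and the buffer handling is an assumption made to keep the example self-contained; it is not the nautilus implementation. One relevant fact about UTF-8: every byte of a multi-byte sequence has its high bit set (0x80 or above), so a byte-wise comparison against the ASCII characters '_', '-' and '.' cannot match inside such a sequence, which is consistent with the observation that the sequences themselves are not being corrupted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ZERO_WIDTH_SPACE "\xE2\x80\x8B" /* U+200B encoded as UTF-8 */

/* Hypothetical helper mirroring the quoted loop: returns a newly allocated
 * copy of text with U+200B inserted after '_' and '-', and after '.' when
 * it is not followed by an ASCII digit. The caller frees the result.
 * Returns NULL if allocation fails. */
char *add_break_hints(const char *text)
{
    assert(text != NULL);

    size_t len = strlen(text);
    /* Worst case: every input byte is followed by the 3-byte ZWSP. */
    char *out = malloc(len * 4 + 1);
    if (out == NULL)
        return NULL;

    size_t n = 0;
    for (const char *p = text; *p != '\0'; p++) {
        out[n++] = *p;

        /* Bytes of multi-byte UTF-8 sequences are >= 0x80, so these ASCII
         * comparisons only ever match real '_', '-' and '.' characters. */
        int next_is_digit = (p[1] >= '0' && p[1] <= '9');
        if (*p == '_' || *p == '-' || (*p == '.' && !next_is_digit)) {
            memcpy(out + n, ZERO_WIDTH_SPACE, 3);
            n += 3;
        }
    }
    out[n] = '\0';
    return out;
}
```

For a name such as "a-b.txt" the result carries a zero width space after the dash and after the dot, while "v1.2" is returned unchanged because the dot is followed by a digit. A Thai name followed by ".txt" gets the zero width space right after the dot, which is exactly the character sequence the renderer then has to handle.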

Sebastien Bacher (seb128) wrote :

That code is hackish. I had to fix a bug in it recently and pondered dropping it; maybe we should do that. I confirmed that dropping it fixes the issue.

Sebastien Bacher (seb128) wrote :

I've opened https://bugzilla.gnome.org/show_bug.cgi?id=674924 upstream to discuss the issue with them.

Changed in nautilus (Ubuntu Precise):
assignee: nobody → Sebastien Bacher (seb128)
Changed in nautilus:
importance: Unknown → Medium
status: Unknown → New
Changed in nautilus:
status: New → Fix Released
Chanchao (custom)
description: updated
Sebastien Bacher (seb128) wrote :

Upstream said on IRC "looks like a bug in the pango Thai shaper", so I'm reassigning there. The upstream bug is https://bugzilla.gnome.org/show_bug.cgi?id=677090

affects: nautilus (Ubuntu) → pango1.0 (Ubuntu)
no longer affects: nautilus
Changed in pango:
importance: Unknown → Medium
status: Unknown → New
Changed in pango1.0 (Ubuntu):
status: Triaged → Fix Committed
Changed in pango1.0 (Ubuntu Precise):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pango1.0 - 1.30.0-0ubuntu4

---------------
pango1.0 (1.30.0-0ubuntu4) quantal; urgency=low

  * debian/patches/git_thai_zero_width_spaces.patch:
    - correctly handle zero width spaces in thai (lp: #986008)
 -- Sebastien Bacher <email address hidden> Wed, 30 May 2012 18:02:05 +0200

Changed in pango1.0 (Ubuntu):
status: Fix Committed → Fix Released
description: updated
Clint Byrum (clint-fewbar) wrote : Please test proposed package

Hello Chanchao, or anyone else affected,

Accepted pango1.0 into precise-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed
Chanchao (custom) wrote :

Excellent, thank you very much! I will test at the first opportunity.

Stéphane Graber (stgraber) wrote :

Did you have a chance to test it yet?

Chanchao (custom) wrote :

Sorry for the delay, I was travelling on a business trip. I just updated these packages from the -proposed repository and it works perfectly!

Thank you very much to everyone involved in providing a speedy solution!!

Cheers,
Chanchao

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pango1.0 - 1.30.0-0ubuntu3

---------------
pango1.0 (1.30.0-0ubuntu3) precise-proposed; urgency=low

  * debian/patches/git_thai_zero_width_spaces.patch:
    - correctly handle zero width spaces in thai (lp: #986008)
 -- Sebastien Bacher <email address hidden> Wed, 30 May 2012 18:02:05 +0200

Changed in pango1.0 (Ubuntu Precise):
status: Fix Committed → Fix Released
Changed in pango:
status: New → Fix Released