w3mman2html.cgi doesn't correctly underline UTF-8 characters

Bug #680202 reported by Piotr P. Karwasz
This bug affects 1 person
Affects                     Status         Importance   Assigned to   Milestone
Ubuntu Manpage Repository   Fix Released   Undecided    Unassigned
w3m                         Unknown        Unknown
w3m (Ubuntu)                Fix Released   Undecided    Unassigned

Bug Description

Binary package hint: w3m

Ubuntu version: Ubuntu 10.10
Package version: 0.5.2-6

w3mman2html.cgi does not handle underlined UTF-8 text correctly: it underlines every byte separately. For example, the backspace escape code _^Hé is transformed into the HTML code <u>\xC3</u>\xA9, which splits the two-byte UTF-8 encoding of é into two invalid one-byte sequences.
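The failure can be reproduced outside the CGI script with a few lines of Python. This is only a sketch of the byte-wise behaviour described above, not the script's actual code; the function name `underline_bytewise` is made up for illustration.

```python
import re

BS = b"\x08"  # backspace, used by nroff-style output for overstriking


def underline_bytewise(data: bytes) -> bytes:
    """Naive conversion: treat the single byte after '_\\b' as the character."""
    return re.sub(rb"_\x08(.)", rb"<u>\1</u>", data, flags=re.DOTALL)


# "_", backspace, then the two-byte UTF-8 encoding of é (0xC3 0xA9).
raw = b"_" + BS + "\u00e9".encode("utf-8")
html = underline_bytewise(raw)
print(html)  # b'<u>\xc3</u>\xa9'

try:
    html.decode("utf-8")
except UnicodeDecodeError as exc:
    # The lead byte 0xC3 is now followed by "<" instead of 0xA9.
    print("result is not valid UTF-8:", exc.reason)
```

The closing tag lands between the lead byte and its continuation byte, so the output can no longer be decoded as UTF-8.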

In fact, it assumes that backspace escape codes have the form __^Hé or é^H__ (with two underscores). However, as far as I could test with the Ubuntu man program, only one underscore is generated, regardless of the length of the UTF-8 encoding of that letter (man version 2.5.7-4, groff version 1.20.1-10).

As far as I could see, the bold and italic escape codes contain only one backspace character, so the match for multiple backspace characters is unnecessary, though harmless.

I am submitting a patch that should handle bold and underline escapes correctly regardless of the length of the UTF-8 character. Until now, only 2-byte characters were taken into account.
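The idea of matching one whole UTF-8 character after the backspace, whatever its length, might look roughly like the following Python sketch. It is not the attached patch (which targets the Perl CGI script); the pattern `UTF8_CHAR` and the function `overstrike_to_html` are hypothetical names, the UTF-8 pattern is simplified, and HTML escaping and combined bold-plus-underline sequences are deliberately left out.

```python
import re

# One UTF-8 encoded character: printable ASCII, or a lead byte followed by
# one to three continuation bytes (simplified: overlong forms are not rejected).
UTF8_CHAR = (
    rb"(?:[\x20-\x7e]"
    rb"|[\xc2-\xdf][\x80-\xbf]"
    rb"|[\xe0-\xef][\x80-\xbf]{2}"
    rb"|[\xf0-\xf4][\x80-\xbf]{3})"
)

UNDERLINE = re.compile(rb"_\x08(" + UTF8_CHAR + rb")")   # _ ^H X
BOLD = re.compile(rb"(" + UTF8_CHAR + rb")\x08\1")       # X ^H X


def overstrike_to_html(data: bytes) -> bytes:
    """Convert single-backspace underline and bold escapes to HTML tags."""
    data = UNDERLINE.sub(rb"<u>\1</u>", data)
    data = BOLD.sub(rb"<b>\1</b>", data)
    # Merge runs of adjacent tags so a whole word gets one pair of tags.
    return data.replace(b"</u><u>", b"").replace(b"</b><b>", b"")


e_acute = "\u00e9".encode("utf-8")   # é, 2 bytes
euro = "\u20ac".encode("utf-8")      # €, 3 bytes

print(overstrike_to_html(b"_\x08" + e_acute))            # underlined é
print(overstrike_to_html(b"_\x08" + euro))               # underlined €
print(overstrike_to_html(b"_\x08n_\x08" + e_acute))      # underlined "né"
print(overstrike_to_html(e_acute + b"\x08" + e_acute))   # bold é
```

Only a single underscore and a single backspace are matched here, in line with what man and groff were observed to emit.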

If the man page is in a single-byte encoding instead of UTF-8, the underline matching code may match too many characters, as with the combination _^Hé . Such sequences should be very rare, however, since usually only whole words are underlined, so a backspace escape code will be followed either by a space or by another backspace escape code.
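To illustrate the over-match, here is a small Python sketch using a hypothetical UTF-8-aware underline pattern (simplified, and not taken from the patch). The Latin-1 input is an assumed example: an underlined é (byte 0xE9) directly followed by two non-underlined bytes in the range 0x80-0xBF, here » (0xBB) and « (0xAB).

```python
import re

# Hypothetical UTF-8-aware underline pattern: "_", backspace, then one
# character that is either printable ASCII or a UTF-8 multi-byte sequence.
UNDERLINE = re.compile(
    rb"_\x08([\x20-\x7e]"
    rb"|[\xc2-\xdf][\x80-\xbf]"
    rb"|[\xe0-\xef][\x80-\xbf]{2}"
    rb"|[\xf0-\xf4][\x80-\xbf]{3})"
)

# Latin-1 bytes: _ ^H 0xE9 0xBB 0xAB. Only é was meant to be underlined.
latin1 = b"_\x08" + "\u00e9\u00bb\u00ab".encode("latin-1")
out = UNDERLINE.sub(rb"<u>\1</u>", latin1)
print(out)  # b'<u>\xe9\xbb\xab</u>'

# The common case: the underlined letter is followed by a space.
unmatched = UNDERLINE.sub(rb"<u>\1</u>", b"_\x08\xe9 ")
print(unmatched)  # b'_\x08\xe9 '
```

Because 0xE9 looks like the lead byte of a three-byte UTF-8 sequence, the two following bytes are pulled inside the underline. When a space follows instead, nothing is over-matched; in this sketch the sequence is simply left unconverted, and how the actual patch treats that case is not shown in the report.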

Tags: patch w3mman

Piotr P. Karwasz (chopinhauer) wrote :
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "Correct underline processing and more UTF-8 support" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package w3m - 0.5.3-11

---------------
w3m (0.5.3-11) unstable; urgency=low

  * Update 130_siteconf.patch to fix segfault (closes: #718612)
  * New patch 180_execdict.patch to fix potentially segfault
  * New patch 190_Strchop.patch to fix potentially segfault
  * New patch 900_ChangeLog.patch to update ChangeLog
  * Update 015_debian-version.patch to 0.5.3+debian-11

 -- Tatsuya Kinoshita <email address hidden> Thu, 08 Aug 2013 19:06:06 +0900

Changed in w3m (Ubuntu):
status: New → Fix Released
Joshua Powers (powersj) wrote :

Marking fix released on the manpage repo as well, as I believe this is taken care of.

Changed in ubuntu-manpage-repository:
status: New → Fix Released