Bad characters in Python logger output when using rsyslog

Bug #1029640 reported by Aimon Bustardo
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
python2.7 (Ubuntu)
Fix Released
High
Scott Kitterman
Precise
Fix Released
High
Scott Kitterman
Quantal
Fix Released
High
Scott Kitterman

Bug Description

[IMPACT]

Any UTF-8 messages that are sent to syslog by a Python application are corrupted.

[TESTCASE]

Run the code in comment #9. You can either do this by running the python interpreter and pasting the code into the python shell or creating a file with the code and running it as python foo where foo is the name of the file.

Then check /var/log/syslog for the mesage "AUDIT: TEST LOGER FROM PYTHON". There will be a few characters of garbage or odd looking numbers before the word AUDIT. If you see that, you've recreated the problem.

Install the updated packages from -proposed and re-run the python code from comment #9. Now there should be now garbage or unusual characters. Something like:

root: AUDIT: TEST LOGER FROM PYTHON

If you get that, the fix is verified.

[Regression Potential]

Nil. Patch is backported from upstream and is easily visually verified as correct.

[Other Info]

I ran this by Barry Warsaw and he agreed it would be important to get into 12.04.1.

Original Bug:

Ubuntu 12.0.4 LTS 64bit
python2.7-minimal 2.7.3-0ubuntu3
rsyslog 5.8.6-1ubuntu8

Python converts all syslog messages to UTF8 before sending to syslog. It also prepends the Byte Order Mark (BOM) of the Unicode Standard. This prepended BOM causes bad characters when using rsyslog (have not verified with std syslog or syslog-ng).

Example log line:

Jul 25 13:36:03 mc 2012-07-25 13:36:03 INFO nova.api.openstack.wsgi [req-48a555a5-6d2a-4a38-8384-3b4684357e72 19f932a5b0b34655989f4cb761522bb3 2617e657fdf84569a6be7977318e46c8] http://MASKED:8774/v1.1/2617e657fdf84569a6be7977318e46c8/os-hosts/MASKED.json?ignore_awful_caching1343248563 returned with HTTP 200

Note the ' ' before the date field.

Interesting find on issues from another site:

"Yes, "" is the Byte Order Mark (BOM) of the Unicode Standard. Specifically it is the hex bytes EF BB BF, which form the UTF-8 representation of the BOM, misinterpreted as ISO 8859/1 text instead of UTF-8.

Probably what it means is that you are using a text editor that is saving files in UTF-8 with the BOM, when it should be saving without the BOM. It could be PHP files that have the BOM, in which case they'd appear as literal text on your page. Or it could be translated text you pasted into Joomla! edit windows.

The Unicode Consortium's FAQ on the Byte Order Mark is at http://www.unicode.org/faq/utf_bom.html#BOM ."

Note that if I edit the file: /usr/lib/python2.7/logging/handlers.py as shown in this patch, the bad characters go away:
----------------------------------------------------------
@@ -797,9 +797,10 @@
                                             self.mapPriority(record.levelname))
         # Message is a string. Convert to bytes as required by RFC 5424
         if type(msg) is unicode:
+ # Morph
            msg = msg.encode('utf-8')
- if codecs:
- msg = codecs.BOM_UTF8 + msg
+ #if codecs:
+ # msg = codecs.BOM_UTF8 + msg
         msg = prio + msg
         try:
             if self.unixsocket:

----------------------------------------------------

Perhaps something is wrong with the 'codecs' condition??

Aimon Bustardo (aimonb)
description: updated
Aimon Bustardo (aimonb)
description: updated
description: updated
description: updated
Revision history for this message
Aimon Bustardo (aimonb) wrote :
Revision history for this message
Aimon Bustardo (aimonb) wrote :
Revision history for this message
Aimon Bustardo (aimonb) wrote :

This indeed is a known Python bug (http://bugs.python.org/msg158447). Ubuntu Python is out of date.

Revision history for this message
Scott Kitterman (kitterman) wrote :

Fixed in quantal:

        # Message is a string. Convert to bytes as required by RFC 5424
        if type(msg) is unicode:
            msg = msg.encode('utf-8')
            if codecs:
                msg = codecs.BOM_UTF8 + msg
        msg = prio + msg

Changed in python2.7 (Ubuntu):
status: New → Fix Released
Changed in python2.7 (Ubuntu Precise):
milestone: none → ubuntu-12.04.1
status: New → Triaged
importance: Undecided → High
Changed in python2.7 (Ubuntu Quantal):
importance: Undecided → High
Revision history for this message
Scott Kitterman (kitterman) wrote :

If someone can write up a reliable test case so the fix can be verified, I'll get the patch backfitted to 12.04's python2.7.

Revision history for this message
Aimon Bustardo (aimonb) wrote :

Any python app using the std python logger to syslog will exhibit the behavior. Openstack is one example. it is apparent when tailing the syslog. I am willing to write up a test case however I have no idea what format you guys use.. can you point me to previous ones?

Revision history for this message
Scott Kitterman (kitterman) wrote :

Here's the requirements: https://wiki.ubuntu.com/StableReleaseUpdates#SRU_Bug_Template

It doesn't have to be very formal, just provide the needed information. I did quickly try one python application I use that uses syslog and I did not immediately see this behavior. If you can whip up a couple of lines of python to use to trigger the case, that would be helpful too.

Other than the test case, the other things in the template are trivial. I don't mind dealing with them if you don't wan to.

Revision history for this message
Aimon Bustardo (aimonb) wrote :

Yeah I can do that.. WIll write up a code snippet and attach with details.

Revision history for this message
Aimon Bustardo (aimonb) wrote :

ok this code will reproduce. Note that key was to convert to utf8 before sending to syslog.

# vim: tabstop=4 shiftwidth=4 softtabstop=4
import logging
from logging.handlers import SysLogHandler

# our new audit level
# NOTE(jkoelker) Since we synthesized an audit level, make the logging
# module aware of it so it acts like other levels.
logging.AUDIT = logging.INFO + 1
logging.addLevelName(logging.AUDIT, 'AUDIT')

logger = logging.getLogger()
logger.setLevel(logging.INFO)
facility = getattr(logging.handlers.SysLogHandler, 'LOG_SYSLOG', None)

syslog = logging.handlers.SysLogHandler(address='/dev/log', facility=facility)

formatter = logging.Formatter('%(name)s: %(levelname)s: %(message)s')
syslog.setFormatter(formatter)
logger.addHandler(syslog)
# 'application' code
msg = unicode("TEST LOGER FROM PYTHON", "utf-8")
logger.log(logging.AUDIT, msg)

Revision history for this message
Aimon Bustardo (aimonb) wrote :

btw output from above code:

# /usr/bin/python log_stuff.py
Jul 27 13:52:04 mc root: #277root: AUDIT: TEST LOGER FROM PYTHON

Revision history for this message
Scott Kitterman (kitterman) wrote :

Turns out it's not fixed in quantal either.

Changed in python2.7 (Ubuntu Quantal):
status: Fix Released → Triaged
milestone: none → ubuntu-12.10
milestone: ubuntu-12.10 → ubuntu-12.10-beta-1
Revision history for this message
Aimon Bustardo (aimonb) wrote :

You sure? This commit diff accurately applies the fix (removal of the BOM addition): http://hg.python.org/cpython/rev/af46a001d5ec

Revision history for this message
Scott Kitterman (kitterman) wrote :

Yes. It needs to be applied to both releases as they've currently got the same python2.7 version:

 python2.7 | 2.7.3-0ubuntu3 | precise | source, amd64, armel, armhf, i386, powerpc
 python2.7 | 2.7.3-0ubuntu3 | quantal | source, amd64, armel, armhf, i386, powerpc

Looking into it now.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package python2.7 - 2.7.3-0ubuntu4

---------------
python2.7 (2.7.3-0ubuntu4) quantal; urgency=low

  * Cherry pick af46a001d5ec to remove Python syslog BOM insertion code that
    was causing corrupt log messages in syslog. LP: #1029640
 -- Scott Kitterman <email address hidden> Fri, 27 Jul 2012 22:54:14 -0400

Changed in python2.7 (Ubuntu Quantal):
status: Triaged → Fix Released
description: updated
Changed in python2.7 (Ubuntu Precise):
status: Triaged → In Progress
Changed in python2.7 (Ubuntu Quantal):
assignee: nobody → Scott Kitterman (kitterman)
Changed in python2.7 (Ubuntu Precise):
assignee: nobody → Scott Kitterman (kitterman)
Revision history for this message
Adam Conrad (adconrad) wrote : Please test proposed package

Hello Aimon, or anyone else affected,

Accepted python2.7 into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/python2.7/2.7.3-0ubuntu3.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in python2.7 (Ubuntu Precise):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Aimon Bustardo (aimonb) wrote :

 I have installed the proposed python and it tested successfully. No bad characters seen in logs.

Changed in python2.7 (Ubuntu Precise):
status: Fix Committed → Confirmed
Revision history for this message
Aimon Bustardo (aimonb) wrote :

I have set precise to "Confirmed". I do not have a Quantal system to confirm the Quantal change.

Revision history for this message
Scott Kitterman (kitterman) wrote :

"Confirmed" means the bug is confirmed to exist. The way to confirm that a fix doesn't what it's supposed to is to change the bug tag from verification-needed to verification-done (I've done this for you).

Thank you for testing.

Changed in python2.7 (Ubuntu Precise):
status: Confirmed → Fix Committed
tags: added: verification-done
removed: verification-needed
Revision history for this message
Scott Kitterman (kitterman) wrote :

Precise was where we needed verification. The requirements for the development release (Quantal) are different. Thanks again. Assuming no regressions are identified, this should be released to all Ubuntu 12.04 users in precise-updates in about a week.

Revision history for this message
Aimon Bustardo (aimonb) wrote :

Ha! Ok I see how you guys do this now. Thanks for bearing with me as I learn the ropes. And thank you for your support getting this patch pushed to precise.

Revision history for this message
Colin Watson (cjwatson) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regresssions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package python2.7 - 2.7.3-0ubuntu3.1

---------------
python2.7 (2.7.3-0ubuntu3.1) precise-proposed; urgency=low

  * Cherry pick af46a001d5ec to remove Python syslog BOM insertion code that
    was causing corrupt log messages in syslog. LP: #1029640
 -- Scott Kitterman <email address hidden> Fri, 27 Jul 2012 22:54:14 -0400

Changed in python2.7 (Ubuntu Precise):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.