Bad characters in Python logger output when using rsyslog
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
python2.7 (Ubuntu) | Fix Released | High | Scott Kitterman |
Precise | Fix Released | High | Scott Kitterman |
Quantal | Fix Released | High | Scott Kitterman |
Bug Description
[IMPACT]
Any UTF-8 messages that are sent to syslog by a Python application are corrupted.
[TESTCASE]
Run the code in comment #9. You can either paste it into an interactive Python interpreter, or save it to a file and run it as python foo, where foo is the name of the file.
Then check /var/log/syslog for the message "AUDIT: TEST LOGER FROM PYTHON". There will be a few garbage characters or odd-looking numbers before the word AUDIT. If you see them, you have reproduced the problem.
Install the updated packages from -proposed and re-run the Python code from comment #9. There should now be no garbage or unusual characters, just something like:
root: AUDIT: TEST LOGER FROM PYTHON
If you get that, the fix is verified.
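The code from comment #9 is not reproduced in this report. As a hypothetical alternative that inspects exactly what the handler puts on the wire (instead of reading /var/log/syslog), the sketch below points `logging.handlers.SysLogHandler` at a throwaway UDP socket on localhost and prints the raw datagram. It assumes Python 3 and the handler's default facility (`user`), so an INFO record gets priority `<14>`; the logger name `bom-check` is made up for illustration.

```python
import logging
import logging.handlers
import socket

# Receiver: a UDP socket on an ephemeral localhost port stands in for rsyslog.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(5)
port = receiver.getsockname()[1]

# Sender: SysLogHandler over UDP to that port (default facility LOG_USER).
handler = logging.handlers.SysLogHandler(address=("127.0.0.1", port))
logger = logging.getLogger("bom-check")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# A Unicode message: in Python 2.7.3 only unicode messages received the BOM.
logger.info(u"AUDIT: TEST LOGER FROM PYTHON")
packet, _ = receiver.recvfrom(4096)

handler.close()
receiver.close()

print(packet)
# A fixed handler sends b'<14>AUDIT: ...'; an affected one has
# b'\xef\xbb\xbf' immediately after '<14>'.
print("BOM present:", packet.startswith(b"<14>\xef\xbb\xbf"))
```

On an interpreter that still has the bug, the same script would show the three BOM bytes right after the priority field, which is what rsyslog then writes into the log as garbage.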
[Regression Potential]
Nil. Patch is backported from upstream and is easily visually verified as correct.
[Other Info]
I ran this by Barry Warsaw and he agreed it would be important to get into 12.04.1.
Original Bug:
Ubuntu 12.04 LTS 64-bit
python2.7-minimal 2.7.3-0ubuntu3
rsyslog 5.8.6-1ubuntu8
Python's SysLogHandler encodes Unicode messages to UTF-8 before sending them to syslog, and it also prepends the UTF-8 Byte Order Mark (BOM) of the Unicode Standard. With rsyslog, this prepended BOM shows up as bad characters in the log (I have not verified this with the standard syslog or syslog-ng).
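To make the byte-level effect concrete, here is a minimal Python 3 re-creation of the branch that the patch in this report comments out (from Python 2.7.3's `SysLogHandler.emit`). It assumes priority `<14>` (facility `user`, level `info`); the function name `legacy_syslog_payload` is hypothetical and is not part of the `logging` API.

```python
import codecs


def legacy_syslog_payload(prio, msg, add_bom=True):
    """Hypothetical re-creation of Python 2.7.3's SysLogHandler payload logic.

    `prio` is the "<N>" priority string; `msg` is the formatted text.
    """
    # Python 2.7.3 appended a NUL terminator before encoding to UTF-8.
    data = (msg + "\000").encode("utf-8")
    if add_bom:
        # The step the patch removes: the BOM is glued to the message start.
        data = codecs.BOM_UTF8 + data
    # The priority goes first, so the BOM ends up right after "<N>".
    return prio.encode("ascii") + data


buggy = legacy_syslog_payload("<14>", "AUDIT: TEST LOGER FROM PYTHON")
fixed = legacy_syslog_payload("<14>", "AUDIT: TEST LOGER FROM PYTHON", add_bom=False)
print(buggy[:10])  # b'<14>\xef\xbb\xbfAUD'
print(fixed[:10])  # b'<14>AUDIT:'
```

The only difference between the two payloads is the three bytes EF BB BF between the priority and the text.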
Example log line:
Jul 25 13:36:03 mc 2012-07-25 13:36:03 INFO nova.api.
Note the stray characters before the date field.
Interesting find on issues from another site:
"Yes, "ï»¿" is the Byte Order Mark (BOM) of the Unicode Standard. Specifically it is the hex bytes EF BB BF, which form the UTF-8 representation of the BOM, misinterpreted as ISO 8859/1 text instead of UTF-8.
Probably what it means is that you are using a text editor that is saving files in UTF-8 with the BOM, when it should be saving without the BOM. It could be PHP files that have the BOM, in which case they'd appear as literal text on your page. Or it could be translated text you pasted into Joomla! edit windows.
The Unicode Consortium's FAQ on the Byte Order Mark is at http://
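The garbage is easy to reproduce without syslog at all: take the three UTF-8 BOM bytes and decode them as ISO 8859-1 (Latin-1), which is the misinterpretation the quoted explanation describes. A short Python 3 check:

```python
import codecs

bom = codecs.BOM_UTF8            # the UTF-8 encoding of U+FEFF
print(bom.hex())                 # efbbbf
mojibake = bom.decode("latin-1")
print(mojibake)                  # ï»¿  (three Latin-1 characters)
# Decoded correctly as UTF-8 it is a single, invisible code point.
print(bom.decode("utf-8") == "\ufeff")  # True
```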
Note that if I edit the file: /usr/lib/
```diff
@@ -797,9 +797,10 @@
         # Message is a string. Convert to bytes as required by RFC 5424
         if type(msg) is unicode:
+            # Morph
             msg = msg.encode('utf-8')
-            if codecs:
-                msg = codecs.BOM_UTF8 + msg
+            #if codecs:
+            #    msg = codecs.BOM_UTF8 + msg
         msg = prio + msg
         try:
             if self.unixsocket:
```
Perhaps something is wrong with the 'codecs' condition?
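A likely reason the BOM misbehaves regardless of the `codecs` check is where it lands. RFC 5424 (section 6.4) allows a BOM only at the start of the MSG part, which comes after the full HEADER (`<PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID`) and STRUCTURED-DATA. Python 2.7.3 sends no such header, so the BOM sits directly after `<PRI>`, where a receiver falling back to traditional BSD-style (RFC 3164) parsing will likely treat it as ordinary message text. The sketch contrasts the two layouts; both functions are hypothetical helpers, the NUL terminator is omitted, and the header fields are placeholders (`-` is RFC 5424's NILVALUE).

```python
import codecs

BOM = codecs.BOM_UTF8
NIL = "-"  # RFC 5424 NILVALUE


def python273_packet(pri, msg):
    # Python 2.7.3 layout: <PRI>, then the BOM, then the message; no header.
    return "<{}>".format(pri).encode("ascii") + BOM + msg.encode("utf-8")


def rfc5424_packet(pri, msg, timestamp=NIL, hostname=NIL, app_name=NIL,
                   procid=NIL, msgid=NIL, structured_data=NIL):
    # Hypothetical helper: HEADER SP STRUCTURED-DATA SP MSG, with the BOM
    # at the start of MSG as RFC 5424 describes for UTF-8 messages.
    header = "<{}>1 {} {} {} {} {}".format(pri, timestamp, hostname,
                                          app_name, procid, msgid)
    prefix = "{} {} ".format(header, structured_data).encode("ascii")
    return prefix + BOM + msg.encode("utf-8")


msg = "AUDIT: TEST LOGER FROM PYTHON"
print(python273_packet(14, msg)[:8])  # b'<14>\xef\xbb\xbfA'
print(rfc5424_packet(14, msg)[:18])   # b'<14>1 - - - - - - '
```

In the second layout the BOM follows the header and structured data instead of the priority, so a parser that understands the header never sees it as part of the hostname or tag.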
description: updated
Changed in python2.7 (Ubuntu Precise): status Triaged → In Progress
Changed in python2.7 (Ubuntu Quantal): assignee nobody → Scott Kitterman (kitterman)
Changed in python2.7 (Ubuntu Precise): assignee nobody → Scott Kitterman (kitterman)
Also filed a bug with rsyslog: http://bugzilla.adiscon.com/show_bug.cgi?id=346