Comment 7 for bug 295161

John A Meinel (jameinel) wrote :

So there is a subtle issue in how we handle revision properties that we need a better answer for.

Specifically, we write the per-file message as a blob to a cElementTree.Element object (as the '.text' member).

This seems to serialize down to a text string while preserving everything. However, the parsing functions for cElementTree end up normalizing end-of-line characters, as though you had used a "text" mode file rather than a "binary" mode file.

Here is a simple example using Python >= 2.5:

from xml.etree.cElementTree import Element, tostring, fromstring
e = Element('element-name')
e.text = 'text\rwith\nvarious\r\nline endings\n'
e.tail = '\n'
s = tostring(e)
print repr(s) # '<element-name>text\rwith\nvarious\r\nline endings\n</element-name>\n'
f = fromstring(s)
print f.text == e.text # False
print repr(f.text) # 'text\nwith\nvarious\nline endings\n'

So it would seem that the data is written to disk with all of its '\r' characters intact, but the code that reads it back into memory is collapsing '\r\n' into '\n' and transforming a lone '\r' into '\n'.
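
The rule described above can be written down as a small helper and checked against the parser. This is only a sketch, written for Python 3 with xml.etree.ElementTree rather than the cElementTree module used in the example above; normalize_line_endings is a hypothetical name, not something in bzr or the standard library.

```python
import xml.etree.ElementTree as ET


def normalize_line_endings(text):
    # Hypothetical helper: mimic what the parser appears to do on read.
    # Collapse '\r\n' first, then turn any remaining lone '\r' into '\n'.
    return text.replace('\r\n', '\n').replace('\r', '\n')


original = 'text\rwith\nvarious\r\nline endings\n'

element = ET.Element('element-name')
element.text = original

# Serialize and parse again, as in the example above.
round_tripped = ET.fromstring(ET.tostring(element)).text

print(repr(round_tripped))
print(round_tripped == original)
print(round_tripped == normalize_line_endings(original))
```

If the last comparison holds, the parser's behaviour matches the two-step replacement, which means the original mix of line endings cannot be recovered from the parsed text alone.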

This isn't detected by the 'sha1' check, because that check happens on the raw bytes, which still have all the necessary information.
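
To illustrate why a checksum over the raw bytes stays silent, here is a minimal sketch in Python 3. It is not bzr's actual verification code: the element name 'property', the sample message and the variable names are all assumptions made up for the illustration.

```python
import hashlib
import xml.etree.ElementTree as ET

message = 'first line\r\nsecond line\r'

element = ET.Element('property')  # hypothetical element name
element.text = message
raw_bytes = ET.tostring(element)

# Checksum recorded at write time, computed over the serialized bytes.
recorded_sha1 = hashlib.sha1(raw_bytes).hexdigest()

# At read time the same raw bytes are checked first, then parsed.
sha1_ok = hashlib.sha1(raw_bytes).hexdigest() == recorded_sha1
parsed_text = ET.fromstring(raw_bytes).text

print(sha1_ok)                 # the checksum passes
print(parsed_text == message)  # but the parsed text has changed
```

The check and the parse look at different things: the hash covers the bytes on disk, while the data loss only happens afterwards, inside the parser.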