Ok, I think I see what's happening. In doc8 I am using 'chardet' to detect the file's encoding.
Trying it on this file gives a bad encoding:
>>> import chardet
>>> b = open('test.rst', 'rb').read()
>>> chardet.detect(b)['encoding']
'ISO-8859-2'
So python then decodes the file from 'ISO-8859-2' to unicode.
That text goes into the docutils rst parser, and iterating over the parsed lines gives back each line as it was decoded with 'ISO-8859-2':
>>> y = b'known exploit in the wild, for example – the time between advance notification'
>>> import six
>>> g = six.text_type(y, encoding='ISO-8859-2')
>>> print g
known exploit in the wild, for example â the time between advance notification
>>> len(g)
80
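So that seems to be the cause (bad detection): the en dash is three bytes in UTF-8, and 'ISO-8859-2' turns each byte into its own character, so the line comes out as 80 characters instead of the 78 it actually has.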
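One possible way around this, purely as a sketch and not what doc8 actually does: try a strict UTF-8 decode first and only fall back to the detected encoding when that fails. The helper name decode_with_utf8_first is made up for illustration, and the fallback encoding is passed in by hand here where the chardet result would normally go.

```python
# Minimal sketch (assumption: preferring UTF-8 is acceptable for the
# files being checked). decode_with_utf8_first is a hypothetical helper,
# not part of doc8 or chardet.
def decode_with_utf8_first(raw, fallback_encoding):
    """Decode bytes as UTF-8, falling back to the given encoding."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Not valid UTF-8, so use whatever the detector suggested.
        return raw.decode(fallback_encoding)

# Same line as above, with the en dash written as its UTF-8 bytes.
raw = (b'known exploit in the wild, for example \xe2\x80\x93 '
       b'the time between advance notification')

text = decode_with_utf8_first(raw, 'ISO-8859-2')
print(len(text))                      # 78
print(len(raw.decode('ISO-8859-2')))  # 80
```

Bytes that are not valid UTF-8 still go through the fallback, so a real Latin-2 file would decode as before.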