Problem with chardet integration (need a bytearray)

Bug #571812 reported by Felipe Kellermann
This bug affects 1 person
Affects: Beautiful Soup
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I have a fix for this bug, but I'm registering it anyway so I can link it to a fix branch.

Here is the issue: when BeautifulSoup calls chardet, it passes a str where chardet requires a bytearray, so the exception below is raised whenever chardet is used. My patch converts the markup to a bytearray before handing it to chardet.

{{{
  File "Fetcher.py", line 262, in parse
    self._soup = BeautifulSoup(self._raw, convertEntities=BeautifulStoneSoup.HTML_ENTITIES)
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/BeautifulSoup.py", line 1517, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/BeautifulSoup.py", line 1142, in __init__
    self._feed(isHTML=isHTML)
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/BeautifulSoup.py", line 1166, in _feed
    smartQuotesTo=self.smartQuotesTo, isHTML=isHTML)
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/BeautifulSoup.py", line 1787, in __init__
    u = self._convertFrom(chardet.detect(self.markup)['encoding'])
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/chardet/__init__.py", line 24, in detect
    u.feed(aBuf)
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/chardet/universaldetector.py", line 116, in feed
    if prober.feed(aBuf) == constants.eFoundIt:
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/chardet/charsetgroupprober.py", line 60, in feed
    st = prober.feed(aBuf)
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/chardet/utf8prober.py", line 53, in feed
    codingState = self._mCodingSM.next_state(c)
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/chardet/codingstatemachine.py", line 44, in next_state
    byteCls = self._mModel['classTable'][c]
}}}
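A minimal Python 3 sketch of why the last frame fails, assuming (as that frame shows) that chardet's state machine indexes its `classTable` with each element of the buffer it is fed. `CLASS_TABLE` and `byte_classes` are hypothetical stand-ins, not chardet's actual code. The traceback above is cut off before the exception message, so the `TypeError` below is what this sketch raises, not a copy of the original error.

```python
# Hypothetical stand-in for chardet's per-model 'classTable':
# one entry per possible byte value (0-255).
CLASS_TABLE = [b % 4 for b in range(256)]


def byte_classes(buf):
    """Look up the class of each element of buf, as next_state() does with c."""
    return [CLASS_TABLE[c] for c in buf]


markup = "<p>caf\u00e9</p>"

# Iterating a text string yields 1-character strings, and a list cannot be
# indexed with a string, so the lookup raises TypeError.
try:
    byte_classes(markup)
except TypeError as exc:
    print("str input fails:", exc)

# Iterating a bytearray yields ints in 0-255, so every lookup succeeds.
data = bytearray(markup, "utf-8")
print(byte_classes(data))
```

Iterating a bytearray yields integers in both Python 2.6+ and Python 3, which is why converting the markup to a bytearray before calling chardet makes the lookup work.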


Changed in beautifulsoup:
assignee: nobody → Felipe Kellermann (felipekellermann)
status: New → Fix Committed
Leonard Richardson (leonardr) wrote:

Are you using a custom/bleeding-edge version of chardet? That might explain why I don't see this error.

Aaron DeVore (aaron-devore) wrote:

Your problem is in this line of chardet (codingstatemachine.py, in next_state):

byteCls = self._mModel['classTable'][c]

It should instead be:

byteCls = self._mModel['classTable'][ord(c)]

Try switching to the latest released version of chardet, 2.0.1.
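The `ord(c)` change above fixes the lookup on chardet's side when iterating the input yields 1-character strings, as iterating a Python 2 str does. A minimal sketch, assuming the buffer may be either a str or a bytearray: since `ord()` rejects ints, and iterating a bytearray already yields ints, it normalises each element before indexing. `byte_value`, `byte_classes`, and `CLASS_TABLE` are hypothetical names, not chardet's API.

```python
# Hypothetical stand-in for chardet's 'classTable' (one entry per byte value).
CLASS_TABLE = [b % 4 for b in range(256)]


def byte_value(c):
    """Hypothetical helper: return the integer byte value of one buffer element."""
    # 1-character strings need ord(); ints (from bytearray/bytes) pass through.
    return ord(c) if isinstance(c, str) else c


def byte_classes(buf):
    """Look up each element's class, normalising it first as ord(c) would."""
    return [CLASS_TABLE[byte_value(c)] for c in buf]


print(byte_classes("ab"))              # [1, 2]
print(byte_classes(bytearray(b"ab")))  # [1, 2]
```

This sketch only handles text whose characters are below U+0100; other characters produce an `ord()` value past the end of a 256-entry table.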

Changed in beautifulsoup:
status: Fix Committed → Fix Released
