javascript parsing error

Bug #357067 reported by Arthur Clune
This bug affects 2 people
Affects                  Status         Importance   Assigned to
Beautiful Soup           Fix Released   Undecided    Unassigned
Python                   Fix Released   Unknown
beautifulsoup (Debian)   Fix Released   Unknown
beautifulsoup (Ubuntu)   Fix Released   Low          Unassigned
python2.6 (Ubuntu)       Invalid        Low          Unassigned
python2.7 (Ubuntu)       Fix Released   Low          Unassigned

Bug Description

>>> p = """
... <HTML>
... <HEAD>
... </HEAD>
... <BODY>
... <script type=text/javascript>
... rgvij="></if";
... </script>
... </BODY>
... </html>
... """
>>> soup = BeautifulSoup(p)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'BeautifulSoup' is not defined
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup(p)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/arthur/honeynet/code/js_decode/python/lib/python2.5/site-packages/BeautifulSoup.py", line 1499, in __init__
    'th' : ['tr'],
  File "/Users/arthur/honeynet/code/js_decode/python/lib/python2.5/site-packages/BeautifulSoup.py", line 1230, in __init__
    """We need to pop up to the previous tag of this type, unless
  File "/Users/arthur/honeynet/code/js_decode/python/lib/python2.5/site-packages/BeautifulSoup.py", line 1263, in _feed
    #If we encounter one of the nesting reset triggers
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/HTMLParser.py", line 150, in goahead
    k = self.parse_endtag(i)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/HTMLParser.py", line 314, in parse_endtag
    self.error("bad end tag: %r" % (rawdata[i:j],))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: bad end tag: u'</if";\n</script>', at line 7, column 9
>>>

This works correctly in the Beautiful Soup 3.0.x series.
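For comparison, here is a minimal sketch that feeds the same markup to Python 3's standard-library `html.parser`. It does not use Beautiful Soup itself, and it assumes a current Python 3 interpreter. The `ScriptCollector` class is a hypothetical helper written only for this illustration. It records the text inside each `<script>` element so you can see that `</if` is treated as ordinary script content and does not raise an error.

```python
from html.parser import HTMLParser

# The markup from the report, unchanged.
SNIPPET = """
<HTML>
<HEAD>
</HEAD>
<BODY>
<script type=text/javascript>
rgvij="></if";
</script>
</BODY>
</html>
"""


class ScriptCollector(HTMLParser):
    """Hypothetical helper: records the raw text of each <script> element."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names before calling this method.
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self.in_script:
            self.scripts[-1] += data


parser = ScriptCollector()
parser.feed(SNIPPET)
parser.close()
print(parser.scripts)
```

On a current Python 3 this should print a one-element list whose string contains `rgvij="></if";`. The parser itself never raises the `HTMLParseError` shown in the traceback above.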

Daniel Darabos (darabos-daniel) wrote:

This happens even if the JavaScript is wrapped in <!-- -->. I think at least this case should be handled, because the contents of an HTML comment are easy to skip (easier than the contents of string literals inside JavaScript blocks).
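Here is a minimal sketch of the comment-wrapped case. Like the previous example, it assumes Python 3's `html.parser` rather than Beautiful Soup. Modern `html.parser` treats `<script>` content as raw text that ends only at `</script>`. As a result, the `<!--` and `-->` markers reach `handle_data` as part of the script, and no comment callback fires. `RawScriptText` and `MARKUP` are illustrative names, not part of any library.

```python
from html.parser import HTMLParser

# Same JavaScript as the report, wrapped in an HTML comment (an old idiom
# for hiding scripts from browsers that did not understand <script>).
MARKUP = """<html><body>
<script type="text/javascript"><!--
rgvij="></if";
//--></script>
</body></html>"""


class RawScriptText(HTMLParser):
    """Illustrative subclass: collects script text and any comments seen."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.script_text = ""
        self.comments = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.script_text += data

    def handle_comment(self, data):
        # Only called for comments outside raw-text elements.
        self.comments.append(data)


p = RawScriptText()
p.feed(MARKUP)
p.close()
print(repr(p.script_text))
print(p.comments)
```

With raw-text handling, the comment wrapper needs no special case. It is simply part of the script body, and only a real `</script>` closes the element.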

Changed in beautifulsoup (Debian):
status: Unknown → New
Changed in python:
status: Unknown → New
Santiago M. Mola (smola)
Changed in beautifulsoup:
status: New → Confirmed
KAMI (kami911) wrote:

Any fix?

Arthur Clune (arthur-clune) wrote:

An update has been posted on the web page:

https://bugs.launchpad.net/beautifulsoup/+bug/357067

Matthias Klose (doko) wrote:

The upstream Python report has a workaround for BeautifulSoup.

Changed in python-defaults (Ubuntu):
importance: Undecided → Low
status: New → Triaged
Mathias Gug (mathiaz)
Changed in beautifulsoup (Ubuntu):
importance: Undecided → Low
Changed in beautifulsoup (Debian):
status: New → Confirmed
Changed in beautifulsoup (Debian):
status: Confirmed → Fix Released
Launchpad Janitor (janitor) wrote:

This bug was fixed in the package beautifulsoup - 3.2.0-1

---------------
beautifulsoup (3.2.0-1) unstable; urgency=low

  * Adopting beautifulsoup for Debian Python Modules Team. (Closes: #612875)
  * New upstream version.
    - The 3.2 release reverts back to the 3.0 SGMLParser approach.
      (Closes: #564160, LP: #392968)
    - <script> blocks are correctly handled again
      (Closes: #516824, LP: #357067)
    - Upstream no longer ships a changelog. (Closes: #530408)
  * Bump standards version to 3.9.1. Moved into python section.
  * Switch to Source Format 3.0 (quilt).
  * Switch to dh_python2.
    - Use X-Python-Version.
  * debian/control:
    - Drop -XB-Python-Version. Deprecated.
    - Drop Provides, Replaces, Conflicts. Versioned package names for Python
      modules are deprecated. No supported releases have packages requiring
      them.
    - Add Homepage.
    - Add Vcs- URLs.
    - Recommend python-chardet.
  * Bump debhelper dependency and compat level to 8.
  * Use DEP5 format debian/copyright.
  * Add watch file. (Closes: #607864)
  * Don't install tests as an example.
  * debian/rules:
    - Use minimal dh 7 style.
    - Run test suite during build.
 -- Stefano Rivera <email address hidden> Tue, 15 Feb 2011 19:21:30 +0000

Changed in beautifulsoup (Ubuntu):
status: New → Fix Released
Changed in beautifulsoup:
status: Confirmed → Fix Released
Matthias Klose (doko)
affects: python-defaults (Ubuntu) → python2.6 (Ubuntu)
Changed in python2.7 (Ubuntu):
importance: Undecided → Low
status: New → Triaged
Launchpad Janitor (janitor) wrote:

This bug was fixed in the package python2.7 - 2.7.2~rc1-1

---------------
python2.7 (2.7.2~rc1-1) unstable; urgency=low

  * Python 2.7.2 release candidate 1.
  * Update libpython symbols file for m68k (Thorsten Glaser). Closes: #627458.
  * Apply proposed patch for issue #670664. LP: #357067.
 -- Matthias Klose <email address hidden> Mon, 30 May 2011 18:14:23 +0000
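As the upstream issue (CPython #670664) describes it, the patch makes `HTMLParser` treat `<script>` content as raw text that ends only at the matching `</script>`. The parser no longer tries to read every `</` inside the script as an end tag, which is what produced the "bad end tag: u'</if\";\n</script>'" error above. The function below is a hypothetical, simplified sketch of that idea using only the `re` module. It is not the actual patch, and it ignores details such as `<style>` and the full closing-tag grammar.

```python
import re


def split_raw_text(html, start, tag="script"):
    """Hypothetical helper illustrating raw-text (CDATA-like) handling.

    `start` is the index just after an opening <script ...> tag. Everything
    up to the first case-insensitive </script> is returned as opaque text,
    so sequences such as </if inside JavaScript strings are never parsed
    as end tags. Returns (raw_text, index_after_closing_tag).
    """
    closing = re.compile(r"</\s*%s\s*>" % re.escape(tag), re.IGNORECASE)
    match = closing.search(html, start)
    if match is None:
        # Unterminated element: treat the rest of the input as raw text.
        return html[start:], len(html)
    return html[start:match.start()], match.end()


doc = '<script type=text/javascript>\nrgvij="></if";\n</SCRIPT><p>after</p>'
open_tag_end = doc.index(">") + 1  # first ">" closes the <script ...> tag
body, rest = split_raw_text(doc, open_tag_end)
print(repr(body))
print(doc[rest:])
```

The key design choice is that, inside the element, the scanner searches only for the one closing tag that can legitimately end it, instead of dispatching on every `</`.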

Changed in python2.7 (Ubuntu):
status: Triaged → Fix Released
Changed in python:
status: New → Fix Released
dino99 (9d9) wrote:

Support for this version has ended.

Changed in python2.6 (Ubuntu):
status: Triaged → Invalid