beautifulsoup 3.1 is buggy, provide 3.0 by default

Bug #392968 reported by Marien Zwart
This bug affects 5 people
Affects                  Status         Importance   Assigned to   Milestone
Beautiful Soup           Invalid        Undecided    Unassigned
beautifulsoup (Debian)   Fix Released   Unknown
beautifulsoup (Ubuntu)   Fix Released   Wishlist     Unassigned

Bug Description

As documented at http://www.crummy.com/software/BeautifulSoup/3.1-problems.html, the BeautifulSoup 3.1 branch uses a different parser in order to be compatible with Python 3. Unfortunately, this parser handles invalid HTML much worse than the old one did, and parsing invalid HTML is a very common reason to use BeautifulSoup at all. Upstream recommends using either the latest 3.0.x release (3.0.7a at the time of writing) or some other library. I think it would make sense to provide BeautifulSoup 3.0.x by default (and to provide 3.1 under a different name), given the number of people on freenode's #python I have already had to point to that page.
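
For anyone unsure which branch they have installed, a minimal sketch of a check follows. It assumes the 3.x module exposes a `__version__` string (not confirmed in this report). The helper names `release_series`, `uses_31_parser` and `installed_version` are hypothetical, and "3.1.0.1" below is only an illustrative version string.

```python
import re


def release_series(version):
    """Return the (major, minor) series of a version string such as '3.0.7a'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def uses_31_parser(version):
    """True if the version belongs to the 3.1 series described in this report."""
    return release_series(version) == (3, 1)


def installed_version():
    """Return the installed BeautifulSoup 3.x version string, or None.

    Assumption: the module is importable as 'BeautifulSoup' and carries a
    __version__ attribute; if either is missing, None is returned.
    """
    try:
        import BeautifulSoup
    except ImportError:
        return None
    return getattr(BeautifulSoup, "__version__", None)


if __name__ == "__main__":
    for candidate in ("3.0.7a", "3.1.0.1", "3.2.0"):
        print(candidate, uses_31_parser(candidate))
```

Only the series is compared, so suffixes such as the "a" in 3.0.7a do not affect the result.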

Benedikt Kristinsson (benedikt-k) wrote :

I agree with this. Beautiful Soup 3.1 is broken. However, as it has some more features than 3.0, it should be provided in a separate package.

Chuck Short (zulcss)
Changed in beautifulsoup (Ubuntu):
importance: Undecided → Wishlist
status: New → Confirmed
Stefano Rivera (stefanor) wrote :

> Beautiful Soup 3.1 is broken. However, as it has some more features than 3.0, it should be provided in a separate package.

That is not something that can easily be done with Python packages: only a single version can be in the archive at a time. BeautifulSoup 4 looks like it'll have the package name "beautifulsoup", so it may be installable concurrently with 3.x, which is called "BeautifulSoup" (on Unix).

I've taken over the maintenance of this package in Debian, and now need to work out how best to handle this... :/
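
If two major versions do end up installable side by side under different module names, calling code could pick whichever is present. The following is only a sketch of that idea. The module name `bs4` is an assumption that does not come from this thread (it is the name Beautiful Soup 4 was later released under), and `first_importable` and `SOUP_CANDIDATES` are hypothetical names.

```python
import importlib


def first_importable(candidates):
    """Return the first attribute that can be loaded from (module, attribute) pairs.

    Modules that are not installed are skipped; None is returned if no
    candidate can be loaded.
    """
    for module_name, attribute in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        if hasattr(module, attribute):
            return getattr(module, attribute)
    return None


# Assumed module names: "bs4" for the 4.x series, "BeautifulSoup" for 3.x.
SOUP_CANDIDATES = (
    ("bs4", "BeautifulSoup"),
    ("BeautifulSoup", "BeautifulSoup"),
)

if __name__ == "__main__":
    soup_class = first_importable(SOUP_CANDIDATES)
    print("no Beautiful Soup found" if soup_class is None else soup_class)
```

The order of the candidates decides which version wins when both are installed.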

Changed in beautifulsoup (Debian):
status: Unknown → New
Changed in beautifulsoup (Debian):
status: New → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package beautifulsoup - 3.2.0-1

---------------
beautifulsoup (3.2.0-1) unstable; urgency=low

  * Adopting beautifulsoup for Debian Python Modules Team. (Closes: #612875)
  * New upstream version.
    - The 3.2 release reverts back to the 3.0 SGMLParser approach.
      (Closes: #564160, LP: #392968)
    - <script> blocks are correctly handled again
      (Closes: #516824, LP: #357067)
    - Upstream no longer ships a changelog. (Closes: #530408)
  * Bump standards version to 3.9.1. Moved into python section.
  * Switch to Source Format 3.0 (quilt).
  * Switch to dh_python2.
    - Use X-Python-Version.
  * debian/control:
    - Drop -XB-Python-Version. Deprecated.
    - Drop Provides, Replaces, Conflicts. Versioned package names for Python
      modules are deprecated. No supported releases have packages requiring
      them.
    - Add Homepage.
    - Add Vcs- URLs.
    - Recommend python-chardet.
  * Bump debhelper dependency and compat level to 8.
  * Use DEP5 format debian/copyright.
  * Add watch file. (Closes: #607864)
  * Don't install tests as an example.
  * debian/rules:
    - Use minimal dh 7 style.
    - Run test suite during build.
 -- Stefano Rivera <email address hidden> Tue, 15 Feb 2011 19:21:30 +0000

Changed in beautifulsoup (Ubuntu):
status: Confirmed → Fix Released
Changed in beautifulsoup:
status: New → Invalid