lp:beautifulsoup

Created by Leonard Richardson and last modified
Get this branch:
bzr branch lp:beautifulsoup
Only Leonard Richardson can upload to this branch. If you are Leonard Richardson please log in for upload directions.

Branch merges

Related bugs

Related blueprints

Branch information

Owner:
Leonard Richardson
Project:
Beautiful Soup
Status:
Development

Recent revisions

603. By Leonard Richardson on 2021-04-09

Brought in fuzz tests from the oss-project into Beautiful Soup's unit test suite.

602. By Leonard Richardson on 2021-02-14

NavigableString and its subclasses now implement the get_text()
  method, as well as the properties .strings and
  .stripped_strings. These methods will either return the string
  itself, or nothing, so the only reason to use this is when iterating
  over a list of mixed Tag and NavigableString objects. [bug=1904309]

601. By Leonard Richardson on 2021-02-14

The 'html5' formatter now treats attributes whose values are the
  empty string as HTML boolean attributes. Previously (and in other
  formatters), an attribute value must be set as None to be treated as
  a boolean attribute. In a future release, I plan to also give this
  behavior to the 'html' formatter. Patch by Isaac Muse. [bug=1915424]

600. By Leonard Richardson on 2021-02-13

The behavior of methods like .get_text() and .strings now differs
  depending on the type of tag. The change is visible with HTML tags
  like <script>, <style>, and <template>. Starting in 4.9.0, methods
  like get_text() returned no results on such tags, because the
  contents of those tags are not considered 'text' within the document
  as a whole.

  But a user who calls script.get_text() is working from a different
  definition of 'text' than a user who calls div.get_text()--otherwise
  there would be no need to call script.get_text() at all. In 4.10.0,
  the contents of (e.g.) a <script> tag are considered 'text' during a
  get_text() call on the tag itself, but not considered 'text' during
  a get_text() call on the tag's parent.

  Because of this change, calling get_text() on each child of a tag
  may now return a different result than calling get_text() on the tag
  itself. That's because different tags now have different
  understandings of what counts as 'text'. [bug=1906226] [bug=1868861]

599. By Leonard Richardson on 2021-02-13

Corrected the use of special string container classes in cases when a
  single tag may contain strings with different containers; such as
  the <template> tag, which may contain both TemplateString objects
  and Comment objects. [bug=1913406]

598. By Leonard Richardson on 2021-02-13

Added a second way to pass specify encodings to UnicodeDammit and
  EncodingDetector, based on the order of precedence defined in the
  HTML5 spec, starting at:
  https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding

  Encodings in 'known_definite_encodings' are tried first, then
  byte-order-mark sniffing is run, then encodings in 'user_encodings'
  are tried. The old argument, 'override_encodings', is now a
  deprecated alias for 'known_definite_encodings'.

  This changes the default behavior of the html.parser and lxml tree
  builders, in a way that may slightly improve encoding
  detection but will probably have no effect. [bug=1889014]

597. By Leonard Richardson on 2021-02-13

Performance improvement when processing tags that speeds up overall
  tree construction by 2%. Patch by Morotti. [bug=1899358]

596. By Leonard Richardson on 2021-02-13

Improve the warning issued when a directory name (as opposed to
  the name of a regular file) is passed as markup into the BeautifulSoup
  constructor. [bug=1913628]

595. By Leonard Richardson on 2021-02-13

Corrected output when the namespace prefix associated with a
  namespaced attribute is the empty string, as opposed to
  None. [bug=1915583]

594. By Leonard Richardson on 2020-10-24

Exclude more tests from the package. Patch by Ville Skyttä.

Branch metadata

Branch format:
Branch format 7
Repository format:
Bazaar repository format 2a (needs bzr 1.16 or later)
Stacked on:
lp:beautifulsoup/3.2
This branch contains Public information 
Everyone can see this information.