Merge ~andres-he/beautifulsoup:add_new_line_on_br_tags into beautifulsoup:master
Status: | Needs review |
---|---|
Proposed branch: | ~andres-he/beautifulsoup:add_new_line_on_br_tags |
Merge into: | beautifulsoup:master |
Diff against target: | 62 lines (+37/-0), 2 files modified: bs4/element.py (+15/-0), test_explaining_issue-remove_this.py (+22/-0) |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Leonard Richardson | | | Pending |
Review via email: mp+462910@code.launchpad.net |
Commit message
Add a line break during text acquisition when the element is a br tag, so that strings are not unexpectedly joined
Description of the change
There are cases where you find something like this in HTML pages:
URL 1: www.example-
(You can find a real example at this link; look for "Blog" in the HTML: https:/
(When I view that page with the inspector tool, the HTML is prettified, so the <br> is surrounded by newlines, but the downloaded HTML looks like the case above.)
The current implementation of text acquisition (get_text()) ignores <br> tags, resulting in a string like the following (for the example above): URL 1: www.example-
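The joining behaviour can be illustrated with a minimal sketch that uses only the Python standard library. This is not Beautiful Soup's implementation and not the patch in this branch; `TextExtractor`, `extract_text`, `newline_on_br`, and the sample markup are hypothetical names and data chosen for illustration. It collects text nodes from a fragment and compares ignoring <br> with emitting a newline for it.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes; optionally emit a newline for each <br>.

    Hypothetical helper for illustration, not part of bs4.
    """

    def __init__(self, newline_on_br):
        super().__init__()
        self.newline_on_br = newline_on_br
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # <br> carries no text of its own, so without this branch
        # the strings on either side of it are concatenated directly.
        if tag == "br" and self.newline_on_br:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(html, newline_on_br):
    """Return the concatenated text of an HTML fragment."""
    parser = TextExtractor(newline_on_br)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Hypothetical sample: two strings separated only by a <br>.
SAMPLE = "<p>first line<br>second line</p>"

joined = extract_text(SAMPLE, newline_on_br=False)
separated = extract_text(SAMPLE, newline_on_br=True)

print(repr(joined))     # 'first linesecond line'
print(repr(separated))  # 'first line\nsecond line'
```

With the <br> ignored, the two strings run together; with a newline emitted, they stay on separate lines, which is the behaviour this branch proposes for text acquisition.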
I included a test in the root directory of the repository (called test_explaining
With this new implementation, the test I added passes, along with the already existing tests.
Thank you for your work on this super helpful library.
Fixes #2058695: https://bugs.launchpad.net/beautifulsoup/+bug/2058695