Merge lp:~mjumbewu/beautifulsoup/text-white-space-fix into lp:beautifulsoup/3.2

Proposed by Mjumbe Wawatu Ukweli
Status: Rejected
Rejected by: Leonard Richardson
Proposed branch: lp:~mjumbewu/beautifulsoup/text-white-space-fix
Merge into: lp:beautifulsoup/3.2
Diff against target: 39 lines (+13/-2)
2 files modified
BeautifulSoup.py (+3/-2)
BeautifulSoupTests.py (+10/-0)
To merge this branch: bzr merge lp:~mjumbewu/beautifulsoup/text-white-space-fix
Reviewer            Review Type   Date Requested   Status
Leonard Richardson                                 Disapprove
Review via email: mp+62629@code.launchpad.net

This proposal supersedes a proposal from 2011-05-27.

Description of the change

BeautifulSoup removes too much white space in getText, because each string is stripped before the strings are joined. For example, the text of "<p>This is a <i>test</i>, ok?" should be "This is a test, ok?". Instead, BS returns "This is atest, ok?".

This invalidates bug #788986
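
The spacing problem described above can be reproduced without the parser. The following is a minimal sketch, not BeautifulSoup code: the list PIECES is a hypothetical stand-in for the text nodes of the example markup, and join_stripped / join_unstripped are illustrative names that do not exist in the library.

```python
# Hypothetical stand-ins for the text nodes of
# "<p>This is a <i>test</i>, ok?"; no parser is involved.
PIECES = ["This is a ", "test", ", ok?"]


def join_stripped(pieces, separator=""):
    """Strip every piece before joining, which loses the space before "test"."""
    return separator.join(piece.strip() for piece in pieces)


def join_unstripped(pieces, separator=""):
    """Join the pieces untouched, keeping the white space at their edges."""
    return separator.join(pieces)


print(join_stripped(PIECES))    # This is atest, ok?
print(join_unstripped(PIECES))  # This is a test, ok?
```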

Leonard Richardson (leonardr) wrote:

Bug 788986 was fixed another way.

review: Disapprove

Unmerged revisions

45. By Mjumbe Wawatu Ukweli

In getText, multiple white space characters get truncated to one.

44. By Mjumbe Wawatu Ukweli

Preserve spacing when using getText.

Preview Diff

=== modified file 'BeautifulSoup.py'
--- BeautifulSoup.py 2010-11-21 13:35:35 +0000
+++ BeautifulSoup.py 2011-05-27 08:49:28 +0000
@@ -569,9 +569,10 @@
         current = self.contents[0]
         while current is not stopNode:
             if isinstance(current, NavigableString):
-                strings.append(current.strip())
+                strings.append(current)
             current = current.next
-        return separator.join(strings)
+        result = separator.join(strings)
+        return re.sub(r'\s+', ' ', result)

     text = property(getText)


=== added file 'BeautifulSoup.pyc'
Binary files BeautifulSoup.pyc 1970-01-01 00:00:00 +0000 and BeautifulSoup.pyc 2011-05-27 08:49:28 +0000 differ
=== modified file 'BeautifulSoupTests.py'
--- BeautifulSoupTests.py 2010-11-21 13:25:36 +0000
+++ BeautifulSoupTests.py 2011-05-27 08:49:28 +0000
@@ -220,6 +220,16 @@
         soup = BeautifulSoup("<ul><li>spam</li><li>eggs</li><li>cheese</li>")
         self.assertEquals(soup.ul.text, "spameggscheese")
         self.assertEquals(soup.ul.getText('/'), "spam/eggs/cheese")
+
+    def testTextHasCorrectSpacing(self):
+        soup = BeautifulSoup("<p>This is a <i>test</i>.")
+        self.assertEquals(soup.text, "This is a test.")
+        self.assertEquals(soup.getText('/'), "This is a /test/.")
+
+    def testMultipleSpacesBecomeOne(self):
+        soup = BeautifulSoup("<p>This is a \n\n<i>test</i>.")
+        self.assertEquals(soup.text, "This is a test.")
+        self.assertEquals(soup.getText('/'), "This is a /test/.")

 class ThatsMyLimit(SoupTest):
     "Tests the limit argument."
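
The join-then-collapse step in the proposed change can be tried in isolation. This is a rough sketch under stated assumptions, not the library's implementation: get_text_patched is a hypothetical helper that takes a plain list of strings in place of the NavigableString nodes the real method walks, and the sample list mirrors the markup in testMultipleSpacesBecomeOne.

```python
import re


def get_text_patched(pieces, separator=""):
    """Join unstripped pieces, then collapse each white space run to one space."""
    result = separator.join(pieces)
    return re.sub(r"\s+", " ", result)


# Hypothetical text nodes for "<p>This is a \n\n<i>test</i>."
pieces = ["This is a \n\n", "test", "."]

print(get_text_patched(pieces))       # This is a test.
print(get_text_patched(pieces, "/"))  # This is a /test/.
```

Because the substitution runs on the already joined string, white space inside the separator is collapsed as well: in this sketch a separator of "\n" comes out as a single space.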
