Calling 'string' or 'text' methods on a script-type tag returns no results

Bug #1906226 reported by Mitar
This bug affects 1 person
Affects         Status         Importance   Assigned to   Milestone
Beautiful Soup  Fix Released   Undecided    Unassigned

Bug Description

.stripped_strings on a <script> tag does not return the tag's contents, while .string does. On a <p> tag, both return the same text. Example:

>>> from bs4 import BeautifulSoup
>>> BeautifulSoup("""<html><body><script>Test.</script></body></html>""").select_one('body script').string
'Test.'
>>> list(BeautifulSoup("""<html><body><script>Test.</script></body></html>""").select_one('body script').stripped_strings)
[]

Compare with:

>>> BeautifulSoup("""<html><body><p>Test.</p></body></html>""").select_one('body p').string
'Test.'
>>> list(BeautifulSoup("""<html><body><p>Test.</p></body></html>""").select_one('body p').stripped_strings)
['Test.']

Tested on beautifulsoup4==4.9.3.

summary: - .string vs. .stripped_strings on script tag
+ Calling 'string' or 'text' methods on a script-type tag returns no results
Leonard Richardson (leonardr) wrote :

Let's imagine you have markup like this:

<div>
Some text.
<script>
Some more text.
</script>
</div>

Generally speaking, users expect div.get_text() to say "Some text.", not "Some text. Some more text." People don't consider the contents of <script> tags to be "text". That's the original point of issue #1868861.

However, when you call get_text() *on a <script> tag*, it's reasonable to assume that you _do_ consider the contents of a <script> tag to be "text"--otherwise you wouldn't bother calling the method. The way I implemented #1868861 excludes "Some more text." even when you call script.get_text(). That's the underlying cause of this issue.
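
As a rough illustration of that cause (a sketch, assuming Beautiful Soup 4.9.0 or later and the built-in html.parser; the variable names are only for illustration): the contents of a <script> tag are stored as a Script object, a subclass of NavigableString. A text method that collects only plain NavigableString objects would therefore skip them, which appears to be what happened here.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Script

# Markup modelled on the example above, collapsed onto one line.
soup = BeautifulSoup(
    "<div>Some text.<script>Some more text.</script></div>", "html.parser"
)

div_string = soup.div.contents[0]   # the string directly inside <div>
script_string = soup.script.string  # the string inside <script>

# Ordinary text is a plain NavigableString...
print(type(div_string).__name__)
# ...while <script> contents get the more specific Script class,
# which is still a kind of NavigableString.
print(type(script_string) is Script)
print(isinstance(script_string, NavigableString))
```

This also suggests why .string still returned 'Test.' in the report: it hands back the tag's single child directly, without filtering by string type.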

Revision 600 puts a system into place that changes the behavior of tags like <script>, <style>, and <template> to something more like what you are looking for.
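
A minimal sketch of the resulting behavior, assuming Beautiful Soup 4.10.0 or later and the built-in html.parser (the parser choice and variable names are assumptions for illustration; the thread itself only names revision 600 and the 4.10.0 release):

```python
from bs4 import BeautifulSoup

markup = """<div>
Some text.
<script>
Some more text.
</script>
</div>"""

soup = BeautifulSoup(markup, "html.parser")

# Called on the enclosing <div>, the script contents are still left out.
div_strings = list(soup.div.stripped_strings)

# Called on the <script> tag itself, its contents now count as text.
script_strings = list(soup.script.stripped_strings)
script_text = soup.script.get_text(strip=True)

print(div_strings)     # ['Some text.'] expected on 4.10.0+
print(script_strings)  # ['Some more text.'] expected on 4.10.0+
print(script_text)
```

On 4.9.x the last two results would come back empty, which is the behavior reported in this bug.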

Mitar (mitar) wrote :

Awesome. Thanks for the explanation.

Changed in beautifulsoup:
status: New → Fix Committed
Leonard Richardson (leonardr) wrote :

Released in 4.10.0.

Changed in beautifulsoup:
status: Fix Committed → Fix Released