Merge beautifulsoup:soupsieve-integration into beautifulsoup:master

Proposed by Leonard Richardson
Status: Merged
Merged at revision: 91397340f4736b78c57c4b07cd11ef0587919200
Proposed branch: beautifulsoup:soupsieve-integration
Merge into: beautifulsoup:master
Diff against target: 2048 lines (+1070/-541)
11 files modified
CHANGELOG (+26/-0)
bs4/__init__.py (+1/-0)
bs4/css.py (+253/-0)
bs4/element.py (+39/-57)
bs4/tests/test_css.py (+493/-0)
bs4/tests/test_pageelement.py (+1/-427)
doc.ko/index.html (+1/-1)
doc.ptbr/Makefile (+130/-0)
doc.ptbr/source/index.rst (+1/-1)
doc.zh/source/index.rst (+1/-1)
doc/source/index.rst (+124/-54)
Reviewer Review Type Date Requested Status
Leonard Richardson Pending
Review via email: mp+436759@code.launchpad.net

Preview Diff

1diff --git a/CHANGELOG b/CHANGELOG
2index 086be15..7b9b673 100644
3--- a/CHANGELOG
4+++ b/CHANGELOG
5@@ -3,6 +3,32 @@ Note: Beautiful Soup's official support for Python 2 ended on January 1st,
6 4.9.3. In the Launchpad Git repository, the final revision to support
7 Python 2 was revision 70f546b1e689a70e2f103795efce6d261a3dadf7.
8
9+= 4.12.0 (Unreleased)
10+
11+* Introduced the .css property, which centralizes all access to
12+ the Soup Sieve API. This allows Beautiful Soup to give direct
13+ access to as much of Soup Sieve as makes sense, without cluttering
14+ the BeautifulSoup and Tag classes with a lot of new methods.
15+
16+ This does mean one addition to the BeautifulSoup and Tag classes
17+ (the .css property itself), so this might be a breaking change if you
18+ happen to use Beautiful Soup to parse XML that includes a tag called
19+ <css>. In particular, code like this will not work in 4.12.0:
20+
21+ soup.css['id']
22+
23+ Code like this will work just as before:
24+
25+ soup.find('css')['id']
26+
27+ The Soup Sieve methods supported through the .css property are
28+ select(), select_one(), iselect(), closest(), match(), filter(),
29+ and escape(). The BeautifulSoup and Tag classes still support the
30+ select() and select_one() methods; they have not been deprecated,
31+ but they have been demoted to convenience methods.
32+
33+ [bug=2003677]
34+
35 = 4.11.2 (20230131)
36
37 * Fixed test failures caused by nondeterministic behavior of
38diff --git a/bs4/__init__.py b/bs4/__init__.py
39index db71cc7..a5128d2 100644
40--- a/bs4/__init__.py
41+++ b/bs4/__init__.py
42@@ -43,6 +43,7 @@ from .dammit import UnicodeDammit
43 from .element import (
44 CData,
45 Comment,
46+ CSS,
47 DEFAULT_OUTPUT_ENCODING,
48 Declaration,
49 Doctype,
50diff --git a/bs4/css.py b/bs4/css.py
51new file mode 100644
52index 0000000..b237051
53--- /dev/null
54+++ b/bs4/css.py
55@@ -0,0 +1,253 @@
56+"""Integration code for CSS selectors using Soup Sieve (pypi: soupsieve)."""
57+
+import warnings
+
58+try:
59+ import soupsieve
60+except ImportError as e:
61+ soupsieve = None
62+ warnings.warn(
63+ 'The soupsieve package is not installed. CSS selectors cannot be used.'
64+ )
65+
66+
67+class CSS(object):
68+ """A proxy object against the soupsieve library, to simplify its
69+ CSS selector API.
70+
71+ Acquire this object through the .css attribute on the
72+ BeautifulSoup object, or on the Tag you want to use as the
73+ starting point for a CSS selector.
74+
75+ The main advantage of doing this is that the tag to be selected
76+ against doesn't need to be explicitly specified in the function
77+ calls, since it's already scoped to a tag.
78+ """
79+
80+ def __init__(self, tag, api=soupsieve):
81+ """Constructor.
82+
83+ You don't need to instantiate this class yourself; instead,
84+ access the .css attribute on the BeautifulSoup object, or on
85+ the Tag you want to use as the starting point for your CSS
86+ selector.
87+
88+ :param tag: All CSS selectors will use this as their starting
89+ point.
90+
91+ :param api: A plug-in replacement for the soupsieve module,
92+ designed mainly for use in tests.
93+ """
94+ if api is None:
95+ raise NotImplementedError(
96+ "Cannot execute CSS selectors because the soupsieve package is not installed."
97+ )
98+ self.api = api
99+ self.tag = tag
100+
101+ def escape(self, ident):
102+ """Escape a CSS identifier.
103+
104+ This is a simple wrapper around soupsieve.escape(). See the
105+ documentation for that function for more information.
106+ """
107+ if soupsieve is None:
108+ raise NotImplementedError(
109+ "Cannot escape CSS identifiers because the soupsieve package is not installed."
110+ )
111+ return self.api.escape(ident)
112+
113+ def _ns(self, ns):
114+ """Normalize a dictionary of namespaces."""
115+ if ns is None:
116+ ns = self.tag._namespaces
117+ return ns
118+
119+ def _rs(self, results):
120+ """Normalize a list of results to a ResultSet.
121+
122+ A ResultSet is more consistent with the rest of Beautiful
123+ Soup's API, and ResultSet.__getattr__ has a helpful error
124+ message if you try to treat a list of results as a single
125+ result (a common mistake).
126+ """
127+ # Import here to avoid circular import
128+ from bs4.element import ResultSet
129+ return ResultSet(None, results)
130+
131+ def select_one(self, select, namespaces=None, flags=0, **kwargs):
132+ """Perform a CSS selection operation on the current Tag and return the
133+ first result.
134+
135+ This uses the Soup Sieve library. For more information, see
136+ that library's documentation for the soupsieve.select_one()
137+ method.
138+
139+ :param select: A CSS selector.
140+
141+ :param namespaces: A dictionary mapping namespace prefixes
142+ used in the CSS selector to namespace URIs. By default,
143+ Beautiful Soup will use the prefixes it encountered while
144+ parsing the document.
145+
146+ :param flags: Flags to be passed into Soup Sieve's
147+ soupsieve.select_one() method.
148+
149+ :param kwargs: Keyword arguments to be passed into SoupSieve's
150+ soupsieve.select_one() method.
151+
152+ :return: A Tag, or None if the selector has no match.
153+ :rtype: bs4.element.Tag
154+
155+ """
156+ return self.api.select_one(
157+ select, self.tag, self._ns(namespaces), flags, **kwargs
158+ )
159+
160+ def select(self, select, namespaces=None, limit=0, flags=0, **kwargs):
161+ """Perform a CSS selection operation on the current Tag.
162+
163+ This uses the Soup Sieve library. For more information, see
164+ that library's documentation for the soupsieve.select()
165+ method.
166+
167+ :param select: A string containing a CSS selector.
168+
169+ :param namespaces: A dictionary mapping namespace prefixes
170+ used in the CSS selector to namespace URIs. By default,
171+ Beautiful Soup will pass in the prefixes it encountered while
172+ parsing the document.
173+
174+ :param limit: After finding this number of results, stop looking.
175+
176+ :param flags: Flags to be passed into Soup Sieve's
177+ soupsieve.select() method.
178+
179+ :param kwargs: Keyword arguments to be passed into SoupSieve's
180+ soupsieve.select() method.
181+
182+ :return: A ResultSet of Tag objects.
183+ :rtype: bs4.element.ResultSet
184+
185+ """
186+ if limit is None:
187+ limit = 0
188+
189+ return self._rs(
190+ self.api.select(
191+ select, self.tag, self._ns(namespaces), limit, flags,
192+ **kwargs
193+ )
194+ )
195+
196+ def iselect(self, select, namespaces=None, limit=0, flags=0, **kwargs):
197+ """Perform a CSS selection operation on the current Tag.
198+
199+ This uses the Soup Sieve library. For more information, see
200+ that library's documentation for the soupsieve.iselect()
201+ method. It is the same as select(), but it returns a generator
202+ instead of a list.
203+
204+ :param select: A string containing a CSS selector.
205+
206+ :param namespaces: A dictionary mapping namespace prefixes
207+ used in the CSS selector to namespace URIs. By default,
208+ Beautiful Soup will pass in the prefixes it encountered while
209+ parsing the document.
210+
211+ :param limit: After finding this number of results, stop looking.
212+
213+ :param flags: Flags to be passed into Soup Sieve's
214+ soupsieve.iselect() method.
215+
216+ :param kwargs: Keyword arguments to be passed into SoupSieve's
217+ soupsieve.iselect() method.
218+
219+ :return: A generator
220+ :rtype: types.GeneratorType
221+ """
222+ return self.api.iselect(
223+ select, self.tag, self._ns(namespaces), limit, flags, **kwargs
224+ )
225+
226+ def closest(self, select, namespaces=None, flags=0, **kwargs):
227+ """Find the Tag closest to this one that matches the given selector.
228+
229+ This uses the Soup Sieve library. For more information, see
230+ that library's documentation for the soupsieve.closest()
231+ method.
232+
233+ :param select: A string containing a CSS selector.
234+
235+ :param namespaces: A dictionary mapping namespace prefixes
236+ used in the CSS selector to namespace URIs. By default,
237+ Beautiful Soup will pass in the prefixes it encountered while
238+ parsing the document.
239+
240+ :param flags: Flags to be passed into Soup Sieve's
241+ soupsieve.closest() method.
242+
243+ :param kwargs: Keyword arguments to be passed into SoupSieve's
244+ soupsieve.closest() method.
245+
246+ :return: A Tag, or None if there is no match.
247+ :rtype: bs4.Tag
248+
249+ """
250+ return self.api.closest(
251+ select, self.tag, self._ns(namespaces), flags, **kwargs
252+ )
253+
254+ def match(self, select, namespaces=None, flags=0, **kwargs):
255+ """Check whether this Tag matches the given CSS selector.
256+
257+ This uses the Soup Sieve library. For more information, see
258+ that library's documentation for the soupsieve.match()
259+ method.
260+
261+ :param select: A CSS selector.
262+
263+ :param namespaces: A dictionary mapping namespace prefixes
264+ used in the CSS selector to namespace URIs. By default,
265+ Beautiful Soup will pass in the prefixes it encountered while
266+ parsing the document.
267+
268+ :param flags: Flags to be passed into Soup Sieve's
269+ soupsieve.match() method.
270+
271+ :param kwargs: Keyword arguments to be passed into SoupSieve's
272+ soupsieve.match() method.
273+
274+ :return: True if this Tag matches the selector; False otherwise.
275+ :rtype: bool
276+ """
277+ return self.api.match(
278+ select, self.tag, self._ns(namespaces), flags, **kwargs
279+ )
280+
281+ def filter(self, select, namespaces=None, flags=0, **kwargs):
282+ """Filter this Tag's direct children based on the given CSS selector.
283+
284+ This uses the Soup Sieve library. It works the same way as
285+ passing this Tag into that library's soupsieve.filter()
286+ method. For more information, see the
287+ documentation for soupsieve.filter().
288+
+ :param select: A CSS selector.
+
289+ :param namespaces: A dictionary mapping namespace prefixes
290+ used in the CSS selector to namespace URIs. By default,
291+ Beautiful Soup will pass in the prefixes it encountered while
292+ parsing the document.
293+
294+ :param flags: Flags to be passed into Soup Sieve's
295+ soupsieve.filter() method.
296+
297+ :param kwargs: Keyword arguments to be passed into SoupSieve's
298+ soupsieve.filter() method.
299+
300+ :return: A ResultSet of Tag objects.
301+ :rtype: bs4.element.ResultSet
302+
303+ """
304+ return self._rs(
305+ self.api.filter(
306+ select, self.tag, self._ns(namespaces), flags, **kwargs
307+ )
308+ )
309diff --git a/bs4/element.py b/bs4/element.py
310index 583d0e8..619fb73 100644
311--- a/bs4/element.py
312+++ b/bs4/element.py
313@@ -8,14 +8,8 @@ except ImportError as e:
314 import re
315 import sys
316 import warnings
317-try:
318- import soupsieve
319-except ImportError as e:
320- soupsieve = None
321- warnings.warn(
322- 'The soupsieve package is not installed. CSS selectors cannot be used.'
323- )
324
325+from bs4.css import CSS
326 from bs4.formatter import (
327 Formatter,
328 HTMLFormatter,
329@@ -69,13 +63,13 @@ PYTHON_SPECIFIC_ENCODINGS = set([
330 "string-escape",
331 "string_escape",
332 ])
333-
334+
335
336 class NamespacedAttribute(str):
337 """A namespaced string (e.g. 'xml:lang') that remembers the namespace
338 ('xml') and the name ('lang') that were used to create it.
339 """
340-
341+
342 def __new__(cls, prefix, name=None, namespace=None):
343 if not name:
344 # This is the default namespace. Its name "has no value"
345@@ -146,14 +140,14 @@ class ContentMetaAttributeValue(AttributeValueWithCharsetSubstitution):
346 return match.group(1) + encoding
347 return self.CHARSET_RE.sub(rewrite, self.original_value)
348
349-
350+
351 class PageElement(object):
352 """Contains the navigational information for some part of the page:
353 that is, its current location in the parse tree.
354
355 NavigableString, Tag, etc. are all subclasses of PageElement.
356 """
357-
358+
359 def setup(self, parent=None, previous_element=None, next_element=None,
360 previous_sibling=None, next_sibling=None):
361 """Sets up the initial relations between this element and
362@@ -163,7 +157,7 @@ class PageElement(object):
363
364 :param previous_element: The element parsed immediately before
365 this one.
366-
367+
368 :param next_element: The element parsed immediately before
369 this one.
370
371@@ -257,11 +251,11 @@ class PageElement(object):
372 default = object()
373 def _all_strings(self, strip=False, types=default):
374 """Yield all strings of certain classes, possibly stripping them.
375-
376+
377 This is implemented differently in Tag and NavigableString.
378 """
379 raise NotImplementedError()
380-
381+
382 @property
383 def stripped_strings(self):
384 """Yield all strings in this PageElement, stripping them first.
385@@ -294,11 +288,11 @@ class PageElement(object):
386 strip, types=types)])
387 getText = get_text
388 text = property(get_text)
389-
390+
391 def replace_with(self, *args):
392- """Replace this PageElement with one or more PageElements, keeping the
393+ """Replace this PageElement with one or more PageElements, keeping the
394 rest of the tree the same.
395-
396+
397 :param args: One or more PageElements.
398 :return: `self`, no longer part of the tree.
399 """
400@@ -410,7 +404,7 @@ class PageElement(object):
401 This works the same way as `list.insert`.
402
403 :param position: The numeric position that should be occupied
404- in `self.children` by the new PageElement.
405+ in `self.children` by the new PageElement.
406 :param new_child: A PageElement.
407 """
408 if new_child is None:
409@@ -546,7 +540,7 @@ class PageElement(object):
410 "Element has no parent, so 'after' has no meaning.")
411 if any(x is self for x in args):
412 raise ValueError("Can't insert an element after itself.")
413-
414+
415 offset = 0
416 for successor in args:
417 # Extract first so that the index won't be screwed up if they
418@@ -912,7 +906,7 @@ class PageElement(object):
419 :rtype: bool
420 """
421 return getattr(self, '_decomposed', False) or False
422-
423+
424 # Old non-property versions of the generators, for backwards
425 # compatibility with BS3.
426 def nextGenerator(self):
427@@ -936,7 +930,7 @@ class NavigableString(str, PageElement):
428
429 When Beautiful Soup parses the markup <b>penguin</b>, it will
430 create a NavigableString for the string "penguin".
431- """
432+ """
433
434 PREFIX = ''
435 SUFFIX = ''
436@@ -1059,10 +1053,10 @@ class PreformattedString(NavigableString):
437 as comments (the Comment class) and CDATA blocks (the CData
438 class).
439 """
440-
441+
442 PREFIX = ''
443 SUFFIX = ''
444-
445+
446 def output_ready(self, formatter=None):
447 """Make this string ready for output by adding any subclass-specific
448 prefix or suffix.
449@@ -1144,7 +1138,7 @@ class Stylesheet(NavigableString):
450 """
451 pass
452
453-
454+
455 class Script(NavigableString):
456 """A NavigableString representing an executable script (probably
457 Javascript).
458@@ -1250,7 +1244,7 @@ class Tag(PageElement):
459 if ((not builder or builder.store_line_numbers)
460 and (sourceline is not None or sourcepos is not None)):
461 self.sourceline = sourceline
462- self.sourcepos = sourcepos
463+ self.sourcepos = sourcepos
464 if attrs is None:
465 attrs = {}
466 elif attrs:
467@@ -1308,7 +1302,7 @@ class Tag(PageElement):
468 self.interesting_string_types = builder.string_containers[self.name]
469 else:
470 self.interesting_string_types = self.DEFAULT_INTERESTING_STRING_TYPES
471-
472+
473 parserClass = _alias("parser_class") # BS3
474
475 def __copy__(self):
476@@ -1329,7 +1323,7 @@ class Tag(PageElement):
477 for child in self.contents:
478 clone.append(child.__copy__())
479 return clone
480-
481+
482 @property
483 def is_empty_element(self):
484 """Is this tag an empty-element tag? (aka a self-closing tag)
485@@ -1433,7 +1427,7 @@ class Tag(PageElement):
486 i.contents = []
487 i._decomposed = True
488 i = n
489-
490+
491 def clear(self, decompose=False):
492 """Wipe out all children of this PageElement by calling extract()
493 on them.
494@@ -1521,7 +1515,7 @@ class Tag(PageElement):
495 if not isinstance(value, list):
496 value = [value]
497 return value
498-
499+
500 def has_attr(self, key):
501 """Does this PageElement have an attribute with the given name?"""
502 return key in self.attrs
503@@ -1608,7 +1602,7 @@ class Tag(PageElement):
504 def __repr__(self, encoding="unicode-escape"):
505 """Renders this PageElement as a string.
506
507- :param encoding: The encoding to use (Python 2 only).
508+ :param encoding: The encoding to use (Python 2 only).
509 TODO: This is now ignored and a warning should be issued
510 if a value is provided.
511 :return: A (Unicode) string.
512@@ -1770,7 +1764,7 @@ class Tag(PageElement):
513 a Unicode string will be returned.
514 :param formatter: A Formatter object, or a string naming one of
515 the standard formatters.
516- :return: A Unicode string (if encoding==None) or a bytestring
517+ :return: A Unicode string (if encoding==None) or a bytestring
518 (otherwise).
519 """
520 if encoding is None:
521@@ -1826,7 +1820,7 @@ class Tag(PageElement):
522 if pretty_print and not preserve_whitespace:
523 s.append("\n")
524 return ''.join(s)
525-
526+
527 def encode_contents(
528 self, indent_level=None, encoding=DEFAULT_OUTPUT_ENCODING,
529 formatter="minimal"):
530@@ -1948,16 +1942,13 @@ class Tag(PageElement):
531 Beautiful Soup will use the prefixes it encountered while
532 parsing the document.
533
534- :param kwargs: Keyword arguments to be passed into SoupSieve's
535+ :param kwargs: Keyword arguments to be passed into Soup Sieve's
536 soupsieve.select() method.
537
538 :return: A Tag.
539 :rtype: bs4.element.Tag
540 """
541- value = self.select(selector, namespaces, 1, **kwargs)
542- if value:
543- return value[0]
544- return None
545+ return self.css.select_one(selector, namespaces, **kwargs)
546
547 def select(self, selector, namespaces=None, limit=None, **kwargs):
548 """Perform a CSS selection operation on the current element.
549@@ -1973,27 +1964,18 @@ class Tag(PageElement):
550
551 :param limit: After finding this number of results, stop looking.
552
553- :param kwargs: Keyword arguments to be passed into SoupSieve's
554+ :param kwargs: Keyword arguments to be passed into SoupSieve's
555 soupsieve.select() method.
556
557 :return: A ResultSet of Tags.
558 :rtype: bs4.element.ResultSet
559 """
560- if namespaces is None:
561- namespaces = self._namespaces
562-
563- if limit is None:
564- limit = 0
565- if soupsieve is None:
566- raise NotImplementedError(
567- "Cannot execute CSS selectors because the soupsieve package is not installed."
568- )
569-
570- results = soupsieve.select(selector, self, namespaces, limit, **kwargs)
571+ return self.css.select(selector, namespaces, limit, **kwargs)
572
573- # We do this because it's more consistent and because
574- # ResultSet.__getattr__ has a helpful error message.
575- return ResultSet(None, results)
576+ @property
577+ def css(self):
578+ """Return an interface to the CSS selector API."""
579+ return CSS(self)
580
581 # Old names for backwards compatibility
582 def childGenerator(self):
583@@ -2038,7 +2020,7 @@ class SoupStrainer(object):
584 :param attrs: A dictionary of filters on attribute values.
585 :param string: A filter for a NavigableString with specific text.
586 :kwargs: A dictionary of filters on attribute values.
587- """
588+ """
589 if string is None and 'text' in kwargs:
590 string = kwargs.pop('text')
591 warnings.warn(
592@@ -2137,7 +2119,7 @@ class SoupStrainer(object):
593 # looking at a tag with a different name.
594 if markup and not markup.prefix and self.name != markup.name:
595 return False
596-
597+
598 call_function_with_tag_data = (
599 isinstance(self.name, Callable)
600 and not isinstance(markup_name, Tag))
601@@ -2223,7 +2205,7 @@ class SoupStrainer(object):
602 if self._matches(' '.join(markup), match_against):
603 return True
604 return False
605-
606+
607 if match_against is True:
608 # True matches any non-None value.
609 return markup is not None
610@@ -2267,11 +2249,11 @@ class SoupStrainer(object):
611 return True
612 else:
613 return False
614-
615+
616 # Beyond this point we might need to run the test twice: once against
617 # the tag's name and once against its prefixed name.
618 match = False
619-
620+
621 if not match and isinstance(match_against, str):
622 # Exact string match
623 match = markup == match_against
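The net effect of the element.py hunks above is that Tag.select() and Tag.select_one() become thin wrappers over the new .css property. A short sketch of the behavior they preserve (toy markup, invented for illustration):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<a id="x">x</a><a id="y">y</a>', 'html.parser')

# Tag.select() still returns a ResultSet, now built by CSS._rs().
results = soup.select('a')
assert type(results).__name__ == 'ResultSet'
assert [a['id'] for a in results] == ['x', 'y']

# ResultSet.__getattr__ still raises the helpful error mentioned in _rs()
# when a list of results is treated as a single result.
raised = False
try:
    results.get_text()
except AttributeError as e:
    raised = 'ResultSet' in str(e)
assert raised

# limit=None, the old Tag.select() default, is still treated as "no limit".
assert len(soup.select('a', limit=None)) == 2
```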
624diff --git a/bs4/tests/test_css.py b/bs4/tests/test_css.py
625new file mode 100644
626index 0000000..cf73831
627--- /dev/null
628+++ b/bs4/tests/test_css.py
629@@ -0,0 +1,493 @@
630+import pytest
631+import types
632+from unittest.mock import MagicMock
633+
634+from bs4 import (
635+ CSS,
636+ BeautifulSoup,
637+ ResultSet,
638+)
639+
640+from . import (
641+ SoupTest,
642+ SOUP_SIEVE_PRESENT,
643+)
644+
645+if SOUP_SIEVE_PRESENT:
646+ from soupsieve import SelectorSyntaxError
647+
648+
649+@pytest.mark.skipif(not SOUP_SIEVE_PRESENT, reason="Soup Sieve not installed")
650+class TestCSSSelectors(SoupTest):
651+ """Test basic CSS selector functionality.
652+
653+ This functionality is implemented in soupsieve, which has a much
654+ more comprehensive test suite, so this is basically an extra check
655+ that soupsieve works as expected.
656+ """
657+
658+ HTML = """
659+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
660+"http://www.w3.org/TR/html4/strict.dtd">
661+<html>
662+<head>
663+<title>The title</title>
664+<link rel="stylesheet" href="blah.css" type="text/css" id="l1">
665+</head>
666+<body>
667+<custom-dashed-tag class="dashed" id="dash1">Hello there.</custom-dashed-tag>
668+<div id="main" class="fancy">
669+<div id="inner">
670+<h1 id="header1">An H1</h1>
671+<p>Some text</p>
672+<p class="onep" id="p1">Some more text</p>
673+<h2 id="header2">An H2</h2>
674+<p class="class1 class2 class3" id="pmulti">Another</p>
675+<a href="http://bob.example.org/" rel="friend met" id="bob">Bob</a>
676+<h2 id="header3">Another H2</h2>
677+<a id="me" href="http://simonwillison.net/" rel="me">me</a>
678+<span class="s1">
679+<a href="#" id="s1a1">span1a1</a>
680+<a href="#" id="s1a2">span1a2 <span id="s1a2s1">test</span></a>
681+<span class="span2">
682+<a href="#" id="s2a1">span2a1</a>
683+</span>
684+<span class="span3"></span>
685+<custom-dashed-tag class="dashed" id="dash2"/>
686+<div data-tag="dashedvalue" id="data1"/>
687+</span>
688+</div>
689+<x id="xid">
690+<z id="zida"/>
691+<z id="zidab"/>
692+<z id="zidac"/>
693+</x>
694+<y id="yid">
695+<z id="zidb"/>
696+</y>
697+<p lang="en" id="lang-en">English</p>
698+<p lang="en-gb" id="lang-en-gb">English UK</p>
699+<p lang="en-us" id="lang-en-us">English US</p>
700+<p lang="fr" id="lang-fr">French</p>
701+</div>
702+
703+<div id="footer">
704+</div>
705+"""
706+
707+ def setup_method(self):
708+ self.soup = BeautifulSoup(self.HTML, 'html.parser')
709+
710+ def assert_selects(self, selector, expected_ids, **kwargs):
711+ results = self.soup.select(selector, **kwargs)
712+ assert isinstance(results, ResultSet)
713+ el_ids = [el['id'] for el in results]
714+ el_ids.sort()
715+ expected_ids.sort()
716+ assert expected_ids == el_ids, "Selector %s, expected [%s], got [%s]" % (
717+ selector, ', '.join(expected_ids), ', '.join(el_ids)
718+ )
719+
720+ assertSelect = assert_selects
721+
722+ def assert_select_multiple(self, *tests):
723+ for selector, expected_ids in tests:
724+ self.assert_selects(selector, expected_ids)
725+
726+ def test_one_tag_one(self):
727+ els = self.soup.select('title')
728+ assert len(els) == 1
729+ assert els[0].name == 'title'
730+ assert els[0].contents == ['The title']
731+
732+ def test_one_tag_many(self):
733+ els = self.soup.select('div')
734+ assert len(els) == 4
735+ for div in els:
736+ assert div.name == 'div'
737+
738+ el = self.soup.select_one('div')
739+ assert 'main' == el['id']
740+
741+ def test_select_one_returns_none_if_no_match(self):
742+ match = self.soup.select_one('nonexistenttag')
743+ assert None == match
744+
745+
746+ def test_tag_in_tag_one(self):
747+ els = self.soup.select('div div')
748+ self.assert_selects('div div', ['inner', 'data1'])
749+
750+ def test_tag_in_tag_many(self):
751+ for selector in ('html div', 'html body div', 'body div'):
752+ self.assert_selects(selector, ['data1', 'main', 'inner', 'footer'])
753+
754+
755+ def test_limit(self):
756+ self.assert_selects('html div', ['main'], limit=1)
757+ self.assert_selects('html body div', ['inner', 'main'], limit=2)
758+ self.assert_selects('body div', ['data1', 'main', 'inner', 'footer'],
759+ limit=10)
760+
761+ def test_tag_no_match(self):
762+ assert len(self.soup.select('del')) == 0
763+
764+ def test_invalid_tag(self):
765+ with pytest.raises(SelectorSyntaxError):
766+ self.soup.select('tag%t')
767+
768+ def test_select_dashed_tag_ids(self):
769+ self.assert_selects('custom-dashed-tag', ['dash1', 'dash2'])
770+
771+ def test_select_dashed_by_id(self):
772+ dashed = self.soup.select('custom-dashed-tag[id=\"dash2\"]')
773+ assert dashed[0].name == 'custom-dashed-tag'
774+ assert dashed[0]['id'] == 'dash2'
775+
776+ def test_dashed_tag_text(self):
777+ assert self.soup.select('body > custom-dashed-tag')[0].text == 'Hello there.'
778+
779+ def test_select_dashed_matches_find_all(self):
780+ assert self.soup.select('custom-dashed-tag') == self.soup.find_all('custom-dashed-tag')
781+
782+ def test_header_tags(self):
783+ self.assert_select_multiple(
784+ ('h1', ['header1']),
785+ ('h2', ['header2', 'header3']),
786+ )
787+
788+ def test_class_one(self):
789+ for selector in ('.onep', 'p.onep', 'html p.onep'):
790+ els = self.soup.select(selector)
791+ assert len(els) == 1
792+ assert els[0].name == 'p'
793+ assert els[0]['class'] == ['onep']
794+
795+ def test_class_mismatched_tag(self):
796+ els = self.soup.select('div.onep')
797+ assert len(els) == 0
798+
799+ def test_one_id(self):
800+ for selector in ('div#inner', '#inner', 'div div#inner'):
801+ self.assert_selects(selector, ['inner'])
802+
803+ def test_bad_id(self):
804+ els = self.soup.select('#doesnotexist')
805+ assert len(els) == 0
806+
807+ def test_items_in_id(self):
808+ els = self.soup.select('div#inner p')
809+ assert len(els) == 3
810+ for el in els:
811+ assert el.name == 'p'
812+ assert els[1]['class'] == ['onep']
813+ assert not els[0].has_attr('class')
814+
815+ def test_a_bunch_of_emptys(self):
816+ for selector in ('div#main del', 'div#main div.oops', 'div div#main'):
817+ assert len(self.soup.select(selector)) == 0
818+
819+ def test_multi_class_support(self):
820+ for selector in ('.class1', 'p.class1', '.class2', 'p.class2',
821+ '.class3', 'p.class3', 'html p.class2', 'div#inner .class2'):
822+ self.assert_selects(selector, ['pmulti'])
823+
824+ def test_multi_class_selection(self):
825+ for selector in ('.class1.class3', '.class3.class2',
826+ '.class1.class2.class3'):
827+ self.assert_selects(selector, ['pmulti'])
828+
829+ def test_child_selector(self):
830+ self.assert_selects('.s1 > a', ['s1a1', 's1a2'])
831+ self.assert_selects('.s1 > a span', ['s1a2s1'])
832+
833+ def test_child_selector_id(self):
834+ self.assert_selects('.s1 > a#s1a2 span', ['s1a2s1'])
835+
836+ def test_attribute_equals(self):
837+ self.assert_select_multiple(
838+ ('p[class="onep"]', ['p1']),
839+ ('p[id="p1"]', ['p1']),
840+ ('[class="onep"]', ['p1']),
841+ ('[id="p1"]', ['p1']),
842+ ('link[rel="stylesheet"]', ['l1']),
843+ ('link[type="text/css"]', ['l1']),
844+ ('link[href="blah.css"]', ['l1']),
845+ ('link[href="no-blah.css"]', []),
846+ ('[rel="stylesheet"]', ['l1']),
847+ ('[type="text/css"]', ['l1']),
848+ ('[href="blah.css"]', ['l1']),
849+ ('[href="no-blah.css"]', []),
850+ ('p[href="no-blah.css"]', []),
851+ ('[href="no-blah.css"]', []),
852+ )
853+
854+ def test_attribute_tilde(self):
855+ self.assert_select_multiple(
856+ ('p[class~="class1"]', ['pmulti']),
857+ ('p[class~="class2"]', ['pmulti']),
858+ ('p[class~="class3"]', ['pmulti']),
859+ ('[class~="class1"]', ['pmulti']),
860+ ('[class~="class2"]', ['pmulti']),
861+ ('[class~="class3"]', ['pmulti']),
862+ ('a[rel~="friend"]', ['bob']),
863+ ('a[rel~="met"]', ['bob']),
864+ ('[rel~="friend"]', ['bob']),
865+ ('[rel~="met"]', ['bob']),
866+ )
867+
868+ def test_attribute_startswith(self):
869+ self.assert_select_multiple(
870+ ('[rel^="style"]', ['l1']),
871+ ('link[rel^="style"]', ['l1']),
872+ ('notlink[rel^="notstyle"]', []),
873+ ('[rel^="notstyle"]', []),
874+ ('link[rel^="notstyle"]', []),
875+ ('link[href^="bla"]', ['l1']),
876+ ('a[href^="http://"]', ['bob', 'me']),
877+ ('[href^="http://"]', ['bob', 'me']),
878+ ('[id^="p"]', ['pmulti', 'p1']),
879+ ('[id^="m"]', ['me', 'main']),
880+ ('div[id^="m"]', ['main']),
881+ ('a[id^="m"]', ['me']),
882+ ('div[data-tag^="dashed"]', ['data1'])
883+ )
884+
885+ def test_attribute_endswith(self):
886+ self.assert_select_multiple(
887+ ('[href$=".css"]', ['l1']),
888+ ('link[href$=".css"]', ['l1']),
889+ ('link[id$="1"]', ['l1']),
890+ ('[id$="1"]', ['data1', 'l1', 'p1', 'header1', 's1a1', 's2a1', 's1a2s1', 'dash1']),
891+ ('div[id$="1"]', ['data1']),
892+ ('[id$="noending"]', []),
893+ )
894+
895+ def test_attribute_contains(self):
896+ self.assert_select_multiple(
897+ # From test_attribute_startswith
898+ ('[rel*="style"]', ['l1']),
899+ ('link[rel*="style"]', ['l1']),
900+ ('notlink[rel*="notstyle"]', []),
901+ ('[rel*="notstyle"]', []),
902+ ('link[rel*="notstyle"]', []),
903+ ('link[href*="bla"]', ['l1']),
904+ ('[href*="http://"]', ['bob', 'me']),
905+ ('[id*="p"]', ['pmulti', 'p1']),
906+ ('div[id*="m"]', ['main']),
907+ ('a[id*="m"]', ['me']),
908+ # From test_attribute_endswith
909+ ('[href*=".css"]', ['l1']),
910+ ('link[href*=".css"]', ['l1']),
911+ ('link[id*="1"]', ['l1']),
912+ ('[id*="1"]', ['data1', 'l1', 'p1', 'header1', 's1a1', 's1a2', 's2a1', 's1a2s1', 'dash1']),
913+ ('div[id*="1"]', ['data1']),
914+ ('[id*="noending"]', []),
915+ # New for this test
916+ ('[href*="."]', ['bob', 'me', 'l1']),
917+ ('a[href*="."]', ['bob', 'me']),
918+ ('link[href*="."]', ['l1']),
919+ ('div[id*="n"]', ['main', 'inner']),
920+ ('div[id*="nn"]', ['inner']),
921+ ('div[data-tag*="edval"]', ['data1'])
922+ )
923+
924+ def test_attribute_exact_or_hypen(self):
925+ self.assert_select_multiple(
926+ ('p[lang|="en"]', ['lang-en', 'lang-en-gb', 'lang-en-us']),
927+ ('[lang|="en"]', ['lang-en', 'lang-en-gb', 'lang-en-us']),
928+ ('p[lang|="fr"]', ['lang-fr']),
929+ ('p[lang|="gb"]', []),
930+ )
931+
932+ def test_attribute_exists(self):
933+ self.assert_select_multiple(
934+ ('[rel]', ['l1', 'bob', 'me']),
935+ ('link[rel]', ['l1']),
936+ ('a[rel]', ['bob', 'me']),
937+ ('[lang]', ['lang-en', 'lang-en-gb', 'lang-en-us', 'lang-fr']),
938+ ('p[class]', ['p1', 'pmulti']),
939+ ('[blah]', []),
940+ ('p[blah]', []),
941+ ('div[data-tag]', ['data1'])
942+ )
943+
944+ def test_quoted_space_in_selector_name(self):
945+ html = """<div style="display: wrong">nope</div>
946+ <div style="display: right">yes</div>
947+ """
948+ soup = BeautifulSoup(html, 'html.parser')
949+ [chosen] = soup.select('div[style="display: right"]')
950+ assert "yes" == chosen.string
951+
952+ def test_unsupported_pseudoclass(self):
953+ with pytest.raises(NotImplementedError):
954+ self.soup.select("a:no-such-pseudoclass")
955+
956+ with pytest.raises(SelectorSyntaxError):
957+ self.soup.select("a:nth-of-type(a)")
958+
959+ def test_nth_of_type(self):
960+ # Try to select first paragraph
961+ els = self.soup.select('div#inner p:nth-of-type(1)')
962+ assert len(els) == 1
963+ assert els[0].string == 'Some text'
964+
965+ # Try to select third paragraph
966+ els = self.soup.select('div#inner p:nth-of-type(3)')
967+ assert len(els) == 1
968+ assert els[0].string == 'Another'
969+
970+ # Try to select (non-existent!) fourth paragraph
971+ els = self.soup.select('div#inner p:nth-of-type(4)')
972+ assert len(els) == 0
973+
974+ # Zero will select no tags.
975+ els = self.soup.select('div p:nth-of-type(0)')
976+ assert len(els) == 0
977+
978+ def test_nth_of_type_direct_descendant(self):
979+ els = self.soup.select('div#inner > p:nth-of-type(1)')
980+ assert len(els) == 1
981+ assert els[0].string == 'Some text'
982+
983+ def test_id_child_selector_nth_of_type(self):
984+ self.assert_selects('#inner > p:nth-of-type(2)', ['p1'])
985+
986+ def test_select_on_element(self):
987+ # Other tests operate on the tree; this operates on an element
988+ # within the tree.
989+ inner = self.soup.find("div", id="main")
990+ selected = inner.select("div")
991+ # The <div id="inner"> tag was selected. The <div id="footer">
992+ # tag was not.
993+ self.assert_selects_ids(selected, ['inner', 'data1'])
994+
995+ def test_overspecified_child_id(self):
996+ self.assert_selects(".fancy #inner", ['inner'])
997+ self.assert_selects(".normal #inner", [])
998+
999+ def test_adjacent_sibling_selector(self):
1000+ self.assert_selects('#p1 + h2', ['header2'])
1001+ self.assert_selects('#p1 + h2 + p', ['pmulti'])
1002+ self.assert_selects('#p1 + #header2 + .class1', ['pmulti'])
1003+ assert [] == self.soup.select('#p1 + p')
1004+
1005+ def test_general_sibling_selector(self):
1006+ self.assert_selects('#p1 ~ h2', ['header2', 'header3'])
1007+ self.assert_selects('#p1 ~ #header2', ['header2'])
1008+ self.assert_selects('#p1 ~ h2 + a', ['me'])
1009+ self.assert_selects('#p1 ~ h2 + [rel="me"]', ['me'])
1010+ assert [] == self.soup.select('#inner ~ h2')
1011+
1012+ def test_dangling_combinator(self):
1013+ with pytest.raises(SelectorSyntaxError):
1014+ self.soup.select('h1 >')
1015+
1016+ def test_sibling_combinator_wont_select_same_tag_twice(self):
1017+ self.assert_selects('p[lang] ~ p', ['lang-en-gb', 'lang-en-us', 'lang-fr'])
1018+
1019+ # Test the selector grouping operator (the comma)
1020+ def test_multiple_select(self):
1021+ self.assert_selects('x, y', ['xid', 'yid'])
1022+
1023+ def test_multiple_select_with_no_space(self):
1024+ self.assert_selects('x,y', ['xid', 'yid'])
1025+
1026+ def test_multiple_select_with_more_space(self):
1027+ self.assert_selects('x, y', ['xid', 'yid'])
1028+
1029+ def test_multiple_select_duplicated(self):
1030+ self.assert_selects('x, x', ['xid'])
1031+
1032+ def test_multiple_select_sibling(self):
1033+ self.assert_selects('x, y ~ p[lang=fr]', ['xid', 'lang-fr'])
1034+
1035+ def test_multiple_select_tag_and_direct_descendant(self):
1036+ self.assert_selects('x, y > z', ['xid', 'zidb'])
1037+
1038+ def test_multiple_select_direct_descendant_and_tags(self):
1039+ self.assert_selects('div > x, y, z', ['xid', 'yid', 'zida', 'zidb', 'zidab', 'zidac'])
1040+
1041+ def test_multiple_select_indirect_descendant(self):
1042+ self.assert_selects('div x,y, z', ['xid', 'yid', 'zida', 'zidb', 'zidab', 'zidac'])
1043+
1044+ def test_invalid_multiple_select(self):
1045+ with pytest.raises(SelectorSyntaxError):
1046+ self.soup.select(',x, y')
1047+ with pytest.raises(SelectorSyntaxError):
1048+ self.soup.select('x,,y')
1049+
1050+ def test_multiple_select_attrs(self):
1051+ self.assert_selects('p[lang=en], p[lang=en-gb]', ['lang-en', 'lang-en-gb'])
1052+
1053+ def test_multiple_select_ids(self):
1054+ self.assert_selects('x, y > z[id=zida], z[id=zidab], z[id=zidb]', ['xid', 'zidb', 'zidab'])
1055+
1056+ def test_multiple_select_nested(self):
1057+ self.assert_selects('body > div > x, y > z', ['xid', 'zidb'])
1058+
1059+ def test_select_duplicate_elements(self):
1060+ # When markup contains duplicate elements, a multiple select
1061+ # will find all of them.
1062+ markup = '<div class="c1"/><div class="c2"/><div class="c1"/>'
1063+ soup = BeautifulSoup(markup, 'html.parser')
1064+ selected = soup.select(".c1, .c2")
1065+ assert 3 == len(selected)
1066+
1067+ # Verify that find_all finds the same elements, though because
1068+ # of an implementation detail it finds them in a different
1069+ # order.
1070+ for element in soup.find_all(class_=['c1', 'c2']):
1071+ assert element in selected
1072+
1073+ def test_closest(self):
1074+ inner = self.soup.find("div", id="inner")
1075+ closest = inner.css.closest("div[id=main]")
1076+ assert closest == self.soup.find("div", id="main")
1077+
1078+ def test_match(self):
1079+ inner = self.soup.find("div", id="inner")
1080+ main = self.soup.find("div", id="main")
1081+ assert inner.css.match("div[id=main]") == False
1082+ assert main.css.match("div[id=main]") == True
1083+
1084+ def test_iselect(self):
1085+ gen = self.soup.css.iselect("h2")
1086+ assert isinstance(gen, types.GeneratorType)
1087+ [header2, header3] = gen
1088+ assert header2['id'] == 'header2'
1089+ assert header3['id'] == 'header3'
1090+
1091+ def test_filter(self):
1092+ inner = self.soup.find("div", id="inner")
1093+ results = inner.css.filter("h2")
1094+        assert len(results) == 2
1095+
1096+ results = inner.css.filter("h2[id=header3]")
1097+ assert isinstance(results, ResultSet)
1098+ [result] = results
1099+ assert result['id'] == 'header3'
1100+
1101+ def test_escape(self):
1102+ m = self.soup.css.escape
1103+ assert m(".foo#bar") == '\\.foo\\#bar'
1104+ assert m("()[]{}") == '\\(\\)\\[\\]\\{\\}'
1105+ assert m(".foo") == self.soup.css.escape(".foo")
1106+
1107+ def test_api_replacement(self):
1108+ # You can pass in another object to act as a drop-in
1109+ # replacement for the soupsieve module.
1110+        class Mock():
1111+            pass
1112+
1113+ mock_soupsieve = Mock()
1114+ mock_soupsieve.escape = MagicMock()
1115+
1116+ # If an unknown method turns out to be present in Soup Sieve,
1117+ # we may still be able to call it.
1118+ css = CSS(self.soup, api=mock_soupsieve)
1119+ css.escape("identifier")
1120+        mock_soupsieve.escape.assert_called_with(
1121+            "identifier"
1122+        )
1123diff --git a/bs4/tests/test_pageelement.py b/bs4/tests/test_pageelement.py
1124index 6674dad..a94280f 100644
1125--- a/bs4/tests/test_pageelement.py
1126+++ b/bs4/tests/test_pageelement.py
1127@@ -6,16 +6,13 @@ import pytest
1128 from bs4 import BeautifulSoup
1129 from bs4.element import (
1130 Comment,
1131+ ResultSet,
1132 SoupStrainer,
1133 )
1134 from . import (
1135 SoupTest,
1136- SOUP_SIEVE_PRESENT,
1137 )
1138
1139-if SOUP_SIEVE_PRESENT:
1140- from soupsieve import SelectorSyntaxError
1141-
1142 class TestEncoding(SoupTest):
1143 """Test the ability to encode objects into strings."""
1144
1145@@ -216,429 +213,6 @@ class TestFormatters(SoupTest):
1146 assert soup.contents[0].name == 'pre'
1147
1148
1149-@pytest.mark.skipif(not SOUP_SIEVE_PRESENT, reason="Soup Sieve not installed")
1150-class TestCSSSelectors(SoupTest):
1151- """Test basic CSS selector functionality.
1152-
1153- This functionality is implemented in soupsieve, which has a much
1154- more comprehensive test suite, so this is basically an extra check
1155- that soupsieve works as expected.
1156- """
1157-
1158- HTML = """
1159-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
1160-"http://www.w3.org/TR/html4/strict.dtd">
1161-<html>
1162-<head>
1163-<title>The title</title>
1164-<link rel="stylesheet" href="blah.css" type="text/css" id="l1">
1165-</head>
1166-<body>
1167-<custom-dashed-tag class="dashed" id="dash1">Hello there.</custom-dashed-tag>
1168-<div id="main" class="fancy">
1169-<div id="inner">
1170-<h1 id="header1">An H1</h1>
1171-<p>Some text</p>
1172-<p class="onep" id="p1">Some more text</p>
1173-<h2 id="header2">An H2</h2>
1174-<p class="class1 class2 class3" id="pmulti">Another</p>
1175-<a href="http://bob.example.org/" rel="friend met" id="bob">Bob</a>
1176-<h2 id="header3">Another H2</h2>
1177-<a id="me" href="http://simonwillison.net/" rel="me">me</a>
1178-<span class="s1">
1179-<a href="#" id="s1a1">span1a1</a>
1180-<a href="#" id="s1a2">span1a2 <span id="s1a2s1">test</span></a>
1181-<span class="span2">
1182-<a href="#" id="s2a1">span2a1</a>
1183-</span>
1184-<span class="span3"></span>
1185-<custom-dashed-tag class="dashed" id="dash2"/>
1186-<div data-tag="dashedvalue" id="data1"/>
1187-</span>
1188-</div>
1189-<x id="xid">
1190-<z id="zida"/>
1191-<z id="zidab"/>
1192-<z id="zidac"/>
1193-</x>
1194-<y id="yid">
1195-<z id="zidb"/>
1196-</y>
1197-<p lang="en" id="lang-en">English</p>
1198-<p lang="en-gb" id="lang-en-gb">English UK</p>
1199-<p lang="en-us" id="lang-en-us">English US</p>
1200-<p lang="fr" id="lang-fr">French</p>
1201-</div>
1202-
1203-<div id="footer">
1204-</div>
1205-"""
1206-
1207- def setup_method(self):
1208- self.soup = BeautifulSoup(self.HTML, 'html.parser')
1209-
1210- def assert_selects(self, selector, expected_ids, **kwargs):
1211- el_ids = [el['id'] for el in self.soup.select(selector, **kwargs)]
1212- el_ids.sort()
1213- expected_ids.sort()
1214- assert expected_ids == el_ids, "Selector %s, expected [%s], got [%s]" % (
1215- selector, ', '.join(expected_ids), ', '.join(el_ids)
1216- )
1217-
1218- assertSelect = assert_selects
1219-
1220- def assert_select_multiple(self, *tests):
1221- for selector, expected_ids in tests:
1222- self.assert_selects(selector, expected_ids)
1223-
1224- def test_one_tag_one(self):
1225- els = self.soup.select('title')
1226- assert len(els) == 1
1227- assert els[0].name == 'title'
1228- assert els[0].contents == ['The title']
1229-
1230- def test_one_tag_many(self):
1231- els = self.soup.select('div')
1232- assert len(els) == 4
1233- for div in els:
1234- assert div.name == 'div'
1235-
1236- el = self.soup.select_one('div')
1237- assert 'main' == el['id']
1238-
1239- def test_select_one_returns_none_if_no_match(self):
1240- match = self.soup.select_one('nonexistenttag')
1241- assert None == match
1242-
1243-
1244- def test_tag_in_tag_one(self):
1245- els = self.soup.select('div div')
1246- self.assert_selects('div div', ['inner', 'data1'])
1247-
1248- def test_tag_in_tag_many(self):
1249- for selector in ('html div', 'html body div', 'body div'):
1250- self.assert_selects(selector, ['data1', 'main', 'inner', 'footer'])
1251-
1252-
1253- def test_limit(self):
1254- self.assert_selects('html div', ['main'], limit=1)
1255- self.assert_selects('html body div', ['inner', 'main'], limit=2)
1256- self.assert_selects('body div', ['data1', 'main', 'inner', 'footer'],
1257- limit=10)
1258-
1259- def test_tag_no_match(self):
1260- assert len(self.soup.select('del')) == 0
1261-
1262- def test_invalid_tag(self):
1263- with pytest.raises(SelectorSyntaxError):
1264- self.soup.select('tag%t')
1265-
1266- def test_select_dashed_tag_ids(self):
1267- self.assert_selects('custom-dashed-tag', ['dash1', 'dash2'])
1268-
1269- def test_select_dashed_by_id(self):
1270- dashed = self.soup.select('custom-dashed-tag[id=\"dash2\"]')
1271- assert dashed[0].name == 'custom-dashed-tag'
1272- assert dashed[0]['id'] == 'dash2'
1273-
1274- def test_dashed_tag_text(self):
1275- assert self.soup.select('body > custom-dashed-tag')[0].text == 'Hello there.'
1276-
1277- def test_select_dashed_matches_find_all(self):
1278- assert self.soup.select('custom-dashed-tag') == self.soup.find_all('custom-dashed-tag')
1279-
1280- def test_header_tags(self):
1281- self.assert_select_multiple(
1282- ('h1', ['header1']),
1283- ('h2', ['header2', 'header3']),
1284- )
1285-
1286- def test_class_one(self):
1287- for selector in ('.onep', 'p.onep', 'html p.onep'):
1288- els = self.soup.select(selector)
1289- assert len(els) == 1
1290- assert els[0].name == 'p'
1291- assert els[0]['class'] == ['onep']
1292-
1293- def test_class_mismatched_tag(self):
1294- els = self.soup.select('div.onep')
1295- assert len(els) == 0
1296-
1297- def test_one_id(self):
1298- for selector in ('div#inner', '#inner', 'div div#inner'):
1299- self.assert_selects(selector, ['inner'])
1300-
1301- def test_bad_id(self):
1302- els = self.soup.select('#doesnotexist')
1303- assert len(els) == 0
1304-
1305- def test_items_in_id(self):
1306- els = self.soup.select('div#inner p')
1307- assert len(els) == 3
1308- for el in els:
1309- assert el.name == 'p'
1310- assert els[1]['class'] == ['onep']
1311- assert not els[0].has_attr('class')
1312-
1313- def test_a_bunch_of_emptys(self):
1314- for selector in ('div#main del', 'div#main div.oops', 'div div#main'):
1315- assert len(self.soup.select(selector)) == 0
1316-
1317- def test_multi_class_support(self):
1318- for selector in ('.class1', 'p.class1', '.class2', 'p.class2',
1319- '.class3', 'p.class3', 'html p.class2', 'div#inner .class2'):
1320- self.assert_selects(selector, ['pmulti'])
1321-
1322- def test_multi_class_selection(self):
1323- for selector in ('.class1.class3', '.class3.class2',
1324- '.class1.class2.class3'):
1325- self.assert_selects(selector, ['pmulti'])
1326-
1327- def test_child_selector(self):
1328- self.assert_selects('.s1 > a', ['s1a1', 's1a2'])
1329- self.assert_selects('.s1 > a span', ['s1a2s1'])
1330-
1331- def test_child_selector_id(self):
1332- self.assert_selects('.s1 > a#s1a2 span', ['s1a2s1'])
1333-
1334- def test_attribute_equals(self):
1335- self.assert_select_multiple(
1336- ('p[class="onep"]', ['p1']),
1337- ('p[id="p1"]', ['p1']),
1338- ('[class="onep"]', ['p1']),
1339- ('[id="p1"]', ['p1']),
1340- ('link[rel="stylesheet"]', ['l1']),
1341- ('link[type="text/css"]', ['l1']),
1342- ('link[href="blah.css"]', ['l1']),
1343- ('link[href="no-blah.css"]', []),
1344- ('[rel="stylesheet"]', ['l1']),
1345- ('[type="text/css"]', ['l1']),
1346- ('[href="blah.css"]', ['l1']),
1347- ('[href="no-blah.css"]', []),
1348- ('p[href="no-blah.css"]', []),
1349- ('[href="no-blah.css"]', []),
1350- )
1351-
1352- def test_attribute_tilde(self):
1353- self.assert_select_multiple(
1354- ('p[class~="class1"]', ['pmulti']),
1355- ('p[class~="class2"]', ['pmulti']),
1356- ('p[class~="class3"]', ['pmulti']),
1357- ('[class~="class1"]', ['pmulti']),
1358- ('[class~="class2"]', ['pmulti']),
1359- ('[class~="class3"]', ['pmulti']),
1360- ('a[rel~="friend"]', ['bob']),
1361- ('a[rel~="met"]', ['bob']),
1362- ('[rel~="friend"]', ['bob']),
1363- ('[rel~="met"]', ['bob']),
1364- )
1365-
1366- def test_attribute_startswith(self):
1367- self.assert_select_multiple(
1368- ('[rel^="style"]', ['l1']),
1369- ('link[rel^="style"]', ['l1']),
1370- ('notlink[rel^="notstyle"]', []),
1371- ('[rel^="notstyle"]', []),
1372- ('link[rel^="notstyle"]', []),
1373- ('link[href^="bla"]', ['l1']),
1374- ('a[href^="http://"]', ['bob', 'me']),
1375- ('[href^="http://"]', ['bob', 'me']),
1376- ('[id^="p"]', ['pmulti', 'p1']),
1377- ('[id^="m"]', ['me', 'main']),
1378- ('div[id^="m"]', ['main']),
1379- ('a[id^="m"]', ['me']),
1380- ('div[data-tag^="dashed"]', ['data1'])
1381- )
1382-
1383- def test_attribute_endswith(self):
1384- self.assert_select_multiple(
1385- ('[href$=".css"]', ['l1']),
1386- ('link[href$=".css"]', ['l1']),
1387- ('link[id$="1"]', ['l1']),
1388- ('[id$="1"]', ['data1', 'l1', 'p1', 'header1', 's1a1', 's2a1', 's1a2s1', 'dash1']),
1389- ('div[id$="1"]', ['data1']),
1390- ('[id$="noending"]', []),
1391- )
1392-
1393- def test_attribute_contains(self):
1394- self.assert_select_multiple(
1395- # From test_attribute_startswith
1396- ('[rel*="style"]', ['l1']),
1397- ('link[rel*="style"]', ['l1']),
1398- ('notlink[rel*="notstyle"]', []),
1399- ('[rel*="notstyle"]', []),
1400- ('link[rel*="notstyle"]', []),
1401- ('link[href*="bla"]', ['l1']),
1402- ('[href*="http://"]', ['bob', 'me']),
1403- ('[id*="p"]', ['pmulti', 'p1']),
1404- ('div[id*="m"]', ['main']),
1405- ('a[id*="m"]', ['me']),
1406- # From test_attribute_endswith
1407- ('[href*=".css"]', ['l1']),
1408- ('link[href*=".css"]', ['l1']),
1409- ('link[id*="1"]', ['l1']),
1410- ('[id*="1"]', ['data1', 'l1', 'p1', 'header1', 's1a1', 's1a2', 's2a1', 's1a2s1', 'dash1']),
1411- ('div[id*="1"]', ['data1']),
1412- ('[id*="noending"]', []),
1413- # New for this test
1414- ('[href*="."]', ['bob', 'me', 'l1']),
1415- ('a[href*="."]', ['bob', 'me']),
1416- ('link[href*="."]', ['l1']),
1417- ('div[id*="n"]', ['main', 'inner']),
1418- ('div[id*="nn"]', ['inner']),
1419- ('div[data-tag*="edval"]', ['data1'])
1420- )
1421-
1422- def test_attribute_exact_or_hypen(self):
1423- self.assert_select_multiple(
1424- ('p[lang|="en"]', ['lang-en', 'lang-en-gb', 'lang-en-us']),
1425- ('[lang|="en"]', ['lang-en', 'lang-en-gb', 'lang-en-us']),
1426- ('p[lang|="fr"]', ['lang-fr']),
1427- ('p[lang|="gb"]', []),
1428- )
1429-
1430- def test_attribute_exists(self):
1431- self.assert_select_multiple(
1432- ('[rel]', ['l1', 'bob', 'me']),
1433- ('link[rel]', ['l1']),
1434- ('a[rel]', ['bob', 'me']),
1435- ('[lang]', ['lang-en', 'lang-en-gb', 'lang-en-us', 'lang-fr']),
1436- ('p[class]', ['p1', 'pmulti']),
1437- ('[blah]', []),
1438- ('p[blah]', []),
1439- ('div[data-tag]', ['data1'])
1440- )
1441-
1442- def test_quoted_space_in_selector_name(self):
1443- html = """<div style="display: wrong">nope</div>
1444- <div style="display: right">yes</div>
1445- """
1446- soup = BeautifulSoup(html, 'html.parser')
1447- [chosen] = soup.select('div[style="display: right"]')
1448- assert "yes" == chosen.string
1449-
1450- def test_unsupported_pseudoclass(self):
1451- with pytest.raises(NotImplementedError):
1452- self.soup.select("a:no-such-pseudoclass")
1453-
1454- with pytest.raises(SelectorSyntaxError):
1455- self.soup.select("a:nth-of-type(a)")
1456-
1457- def test_nth_of_type(self):
1458- # Try to select first paragraph
1459- els = self.soup.select('div#inner p:nth-of-type(1)')
1460- assert len(els) == 1
1461- assert els[0].string == 'Some text'
1462-
1463- # Try to select third paragraph
1464- els = self.soup.select('div#inner p:nth-of-type(3)')
1465- assert len(els) == 1
1466- assert els[0].string == 'Another'
1467-
1468- # Try to select (non-existent!) fourth paragraph
1469- els = self.soup.select('div#inner p:nth-of-type(4)')
1470- assert len(els) == 0
1471-
1472- # Zero will select no tags.
1473- els = self.soup.select('div p:nth-of-type(0)')
1474- assert len(els) == 0
1475-
1476- def test_nth_of_type_direct_descendant(self):
1477- els = self.soup.select('div#inner > p:nth-of-type(1)')
1478- assert len(els) == 1
1479- assert els[0].string == 'Some text'
1480-
1481- def test_id_child_selector_nth_of_type(self):
1482- self.assert_selects('#inner > p:nth-of-type(2)', ['p1'])
1483-
1484- def test_select_on_element(self):
1485- # Other tests operate on the tree; this operates on an element
1486- # within the tree.
1487- inner = self.soup.find("div", id="main")
1488- selected = inner.select("div")
1489- # The <div id="inner"> tag was selected. The <div id="footer">
1490- # tag was not.
1491- self.assert_selects_ids(selected, ['inner', 'data1'])
1492-
1493- def test_overspecified_child_id(self):
1494- self.assert_selects(".fancy #inner", ['inner'])
1495- self.assert_selects(".normal #inner", [])
1496-
1497- def test_adjacent_sibling_selector(self):
1498- self.assert_selects('#p1 + h2', ['header2'])
1499- self.assert_selects('#p1 + h2 + p', ['pmulti'])
1500- self.assert_selects('#p1 + #header2 + .class1', ['pmulti'])
1501- assert [] == self.soup.select('#p1 + p')
1502-
1503- def test_general_sibling_selector(self):
1504- self.assert_selects('#p1 ~ h2', ['header2', 'header3'])
1505- self.assert_selects('#p1 ~ #header2', ['header2'])
1506- self.assert_selects('#p1 ~ h2 + a', ['me'])
1507- self.assert_selects('#p1 ~ h2 + [rel="me"]', ['me'])
1508- assert [] == self.soup.select('#inner ~ h2')
1509-
1510- def test_dangling_combinator(self):
1511- with pytest.raises(SelectorSyntaxError):
1512- self.soup.select('h1 >')
1513-
1514- def test_sibling_combinator_wont_select_same_tag_twice(self):
1515- self.assert_selects('p[lang] ~ p', ['lang-en-gb', 'lang-en-us', 'lang-fr'])
1516-
1517- # Test the selector grouping operator (the comma)
1518- def test_multiple_select(self):
1519- self.assert_selects('x, y', ['xid', 'yid'])
1520-
1521- def test_multiple_select_with_no_space(self):
1522- self.assert_selects('x,y', ['xid', 'yid'])
1523-
1524- def test_multiple_select_with_more_space(self):
1525- self.assert_selects('x, y', ['xid', 'yid'])
1526-
1527- def test_multiple_select_duplicated(self):
1528- self.assert_selects('x, x', ['xid'])
1529-
1530- def test_multiple_select_sibling(self):
1531- self.assert_selects('x, y ~ p[lang=fr]', ['xid', 'lang-fr'])
1532-
1533- def test_multiple_select_tag_and_direct_descendant(self):
1534- self.assert_selects('x, y > z', ['xid', 'zidb'])
1535-
1536- def test_multiple_select_direct_descendant_and_tags(self):
1537- self.assert_selects('div > x, y, z', ['xid', 'yid', 'zida', 'zidb', 'zidab', 'zidac'])
1538-
1539- def test_multiple_select_indirect_descendant(self):
1540- self.assert_selects('div x,y, z', ['xid', 'yid', 'zida', 'zidb', 'zidab', 'zidac'])
1541-
1542- def test_invalid_multiple_select(self):
1543- with pytest.raises(SelectorSyntaxError):
1544- self.soup.select(',x, y')
1545- with pytest.raises(SelectorSyntaxError):
1546- self.soup.select('x,,y')
1547-
1548- def test_multiple_select_attrs(self):
1549- self.assert_selects('p[lang=en], p[lang=en-gb]', ['lang-en', 'lang-en-gb'])
1550-
1551- def test_multiple_select_ids(self):
1552- self.assert_selects('x, y > z[id=zida], z[id=zidab], z[id=zidb]', ['xid', 'zidb', 'zidab'])
1553-
1554- def test_multiple_select_nested(self):
1555- self.assert_selects('body > div > x, y > z', ['xid', 'zidb'])
1556-
1557- def test_select_duplicate_elements(self):
1558- # When markup contains duplicate elements, a multiple select
1559- # will find all of them.
1560- markup = '<div class="c1"/><div class="c2"/><div class="c1"/>'
1561- soup = BeautifulSoup(markup, 'html.parser')
1562- selected = soup.select(".c1, .c2")
1563- assert 3 == len(selected)
1564-
1565- # Verify that find_all finds the same elements, though because
1566- # of an implementation detail it finds them in a different
1567- # order.
1568- for element in soup.find_all(class_=['c1', 'c2']):
1569- assert element in selected
1570-
1571-
1572 class TestPersistence(SoupTest):
1573 "Testing features like pickle and deepcopy."
1574
1575diff --git a/doc.ko/index.html b/doc.ko/index.html
1576index c474071..2f08f77 100644
1577--- a/doc.ko/index.html
1578+++ b/doc.ko/index.html
1579@@ -89,7 +89,7 @@
1580 <span class="c"># Lacie</span>
1581 <span class="c"># &lt;/a&gt;</span>
1582 <span class="c"># and</span>
1583-<span class="c"># &lt;a class="sister" href="http://example.com/tillie" id="link2"&gt;</span>
1584+<span class="c"># &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;</span>
1585 <span class="c"># Tillie</span>
1586 <span class="c"># &lt;/a&gt;</span>
1587 <span class="c"># ; and they lived at the bottom of a well.</span>
1588diff --git a/doc.ptbr/Makefile b/doc.ptbr/Makefile
1589new file mode 100644
1590index 0000000..8c833d2
1591--- /dev/null
1592+++ b/doc.ptbr/Makefile
1593@@ -0,0 +1,130 @@
1594+# Makefile for Sphinx documentation
1595+#
1596+
1597+# You can set these variables from the command line.
1598+SPHINXOPTS =
1599+SPHINXBUILD = sphinx-build
1600+PAPER =
1601+BUILDDIR = build
1602+
1603+# Internal variables.
1604+PAPEROPT_a4 = -D latex_paper_size=a4
1605+PAPEROPT_letter = -D latex_paper_size=letter
1606+ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source
1607+
1608+.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest
1609+
1610+help:
1611+ @echo "Please use \`make <target>' where <target> is one of"
1612+ @echo " html to make standalone HTML files"
1613+ @echo " dirhtml to make HTML files named index.html in directories"
1614+ @echo " singlehtml to make a single large HTML file"
1615+ @echo " pickle to make pickle files"
1616+ @echo " json to make JSON files"
1617+ @echo " htmlhelp to make HTML files and a HTML help project"
1618+ @echo " qthelp to make HTML files and a qthelp project"
1619+ @echo " devhelp to make HTML files and a Devhelp project"
1620+ @echo " epub to make an epub"
1621+ @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
1622+ @echo " latexpdf to make LaTeX files and run them through pdflatex"
1623+ @echo " text to make text files"
1624+ @echo " man to make manual pages"
1625+ @echo " changes to make an overview of all changed/added/deprecated items"
1626+ @echo " linkcheck to check all external links for integrity"
1627+ @echo " doctest to run all doctests embedded in the documentation (if enabled)"
1628+
1629+clean:
1630+ -rm -rf $(BUILDDIR)/*
1631+
1632+html:
1633+ $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
1634+ @echo
1635+ @echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
1636+
1637+dirhtml:
1638+ $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
1639+ @echo
1640+ @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
1641+
1642+singlehtml:
1643+ $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
1644+ @echo
1645+ @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
1646+
1647+pickle:
1648+ $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
1649+ @echo
1650+ @echo "Build finished; now you can process the pickle files."
1651+
1652+json:
1653+ $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
1654+ @echo
1655+ @echo "Build finished; now you can process the JSON files."
1656+
1657+htmlhelp:
1658+ $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
1659+ @echo
1660+ @echo "Build finished; now you can run HTML Help Workshop with the" \
1661+ ".hhp project file in $(BUILDDIR)/htmlhelp."
1662+
1663+qthelp:
1664+ $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
1665+ @echo
1666+ @echo "Build finished; now you can run "qcollectiongenerator" with the" \
1667+ ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
1668+ @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/BeautifulSoup.qhcp"
1669+ @echo "To view the help file:"
1670+ @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/BeautifulSoup.qhc"
1671+
1672+devhelp:
1673+ $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
1674+ @echo
1675+ @echo "Build finished."
1676+ @echo "To view the help file:"
1677+ @echo "# mkdir -p $$HOME/.local/share/devhelp/BeautifulSoup"
1678+ @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/BeautifulSoup"
1679+ @echo "# devhelp"
1680+
1681+epub:
1682+ $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
1683+ @echo
1684+ @echo "Build finished. The epub file is in $(BUILDDIR)/epub."
1685+
1686+latex:
1687+ $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
1688+ @echo
1689+ @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
1690+ @echo "Run \`make' in that directory to run these through (pdf)latex" \
1691+ "(use \`make latexpdf' here to do that automatically)."
1692+
1693+latexpdf:
1694+ $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
1695+ @echo "Running LaTeX files through pdflatex..."
1696+ make -C $(BUILDDIR)/latex all-pdf
1697+ @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
1698+
1699+text:
1700+ $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
1701+ @echo
1702+ @echo "Build finished. The text files are in $(BUILDDIR)/text."
1703+
1704+man:
1705+ $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
1706+ @echo
1707+ @echo "Build finished. The manual pages are in $(BUILDDIR)/man."
1708+
1709+changes:
1710+ $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
1711+ @echo
1712+ @echo "The overview file is in $(BUILDDIR)/changes."
1713+
1714+linkcheck:
1715+ $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
1716+ @echo
1717+ @echo "Link check complete; look for any errors in the above output " \
1718+ "or in $(BUILDDIR)/linkcheck/output.txt."
1719+
1720+doctest:
1721+ $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
1722+ @echo "Testing of doctests in the sources finished, look at the " \
1723+ "results in $(BUILDDIR)/doctest/output.txt."
1724diff --git a/doc.ptbr/source/index.rst b/doc.ptbr/source/index.rst
1725index f5eb849..89ae5be 100644
1726--- a/doc.ptbr/source/index.rst
1727+++ b/doc.ptbr/source/index.rst
1728@@ -90,7 +90,7 @@ de dados aninhada::
1729 # Lacie
1730 # </a>
1731 # and
1732- # <a class="sister" href="http://example.com/tillie" id="link2">
1733+ # <a class="sister" href="http://example.com/tillie" id="link3">
1734 # Tillie
1735 # </a>
1736 # ; and they lived at the bottom of a well.
1737diff --git a/doc.zh/source/index.rst b/doc.zh/source/index.rst
1738index 990053f..f716768 100644
1739--- a/doc.zh/source/index.rst
1740+++ b/doc.zh/source/index.rst
1741@@ -79,7 +79,7 @@ Beautiful Soup 4.4.0 文档
1742 # Lacie
1743 # </a>
1744 # and
1745- # <a class="sister" href="http://example.com/tillie" id="link2">
1746+ # <a class="sister" href="http://example.com/tillie" id="link3">
1747 # Tillie
1748 # </a>
1749 # ; and they lived at the bottom of a well.
1750diff --git a/doc/source/index.rst b/doc/source/index.rst
1751index 007e75f..f1be0c5 100644
1752--- a/doc/source/index.rst
1753+++ b/doc/source/index.rst
1754@@ -36,7 +36,7 @@ Beautiful Soup users:
1755 * `이 문서는 한국어 번역도 가능합니다. <https://www.crummy.com/software/BeautifulSoup/bs4/doc.ko/>`_
1756 * `Este documento também está disponível em Português do Brasil. <https://www.crummy.com/software/BeautifulSoup/bs4/doc.ptbr>`_
1757 * `Эта документация доступна на русском языке. <https://www.crummy.com/software/BeautifulSoup/bs4/doc.ru/>`_
1758-
1759+
1760 Getting help
1761 ------------
1762
1763@@ -47,6 +47,9 @@ your problem involves parsing an HTML document, be sure to mention
1764 :ref:`what the diagnose() function says <diagnose>` about
1765 that document.
1766
1767+When reporting an error in this documentation, please mention which
1768+translation you're reading.
1769+
1770 Quick Start
1771 ===========
1772
1773@@ -1670,126 +1673,188 @@ that show up earlier in the document than the one we started with. A
1774 <p> tag that contains an <a> tag must have shown up before the <a>
1775 tag it contains.
1776
1777-CSS selectors
1778--------------
1779-
1780-``BeautifulSoup`` has a ``.select()`` method which uses the `SoupSieve
1781-<https://facelessuser.github.io/soupsieve/>`_ package to run a CSS
1782-selector against a parsed document and return all the matching
1783-elements. ``Tag`` has a similar method which runs a CSS selector
1784-against the contents of a single tag.
1785+CSS selectors through the ``.css`` property
1786+-------------------------------------------
1787
1788-(The SoupSieve integration was added in Beautiful Soup 4.7.0. Earlier
1789-versions also have the ``.select()`` method, but only the most
1790-commonly-used CSS selectors are supported. If you installed Beautiful
1791-Soup through ``pip``, SoupSieve was installed at the same time, so you
1792-don't have to do anything extra.)
1793+``BeautifulSoup`` and ``Tag`` objects support CSS selectors through
1794+their ``.css`` property. The actual selector implementation is handled
1795+by the `Soup Sieve <https://facelessuser.github.io/soupsieve/>`_
1796+package, available on PyPI as ``soupsieve``. If you installed
1797+Beautiful Soup through ``pip``, Soup Sieve was installed at the same
1798+time, so you don't have to do anything extra.
1799
1800-The SoupSieve `documentation
1801-<https://facelessuser.github.io/soupsieve/>`_ lists all the currently
1802-supported CSS selectors, but here are some of the basics:
1803+The Soup Sieve documentation lists `all the currently supported CSS
1804+selectors <https://facelessuser.github.io/soupsieve/selectors/>`_, but
1805+here are some of the basics. You can find tags::
1806
1807-You can find tags::
1808-
1809- soup.select("title")
1810+ soup.css.select("title")
1811 # [<title>The Dormouse's story</title>]
1812
1813- soup.select("p:nth-of-type(3)")
1814+ soup.css.select("p:nth-of-type(3)")
1815 # [<p class="story">...</p>]
1816
1817 Find tags beneath other tags::
1818
1819- soup.select("body a")
1820+ soup.css.select("body a")
1821 # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
1822 # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
1823 # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
1824
1825- soup.select("html head title")
1826+ soup.css.select("html head title")
1827 # [<title>The Dormouse's story</title>]
1828
1829 Find tags `directly` beneath other tags::
1830
1831- soup.select("head > title")
1832+ soup.css.select("head > title")
1833 # [<title>The Dormouse's story</title>]
1834
1835- soup.select("p > a")
1836+ soup.css.select("p > a")
1837 # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
1838 # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
1839 # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
1840
1841- soup.select("p > a:nth-of-type(2)")
1842+ soup.css.select("p > a:nth-of-type(2)")
1843 # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
1844
1845- soup.select("p > #link1")
1846+ soup.css.select("p > #link1")
1847 # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
1848
1849- soup.select("body > a")
1850+ soup.css.select("body > a")
1851 # []
1852
1853 Find the siblings of tags::
1854
1855- soup.select("#link1 ~ .sister")
1856+ soup.css.select("#link1 ~ .sister")
1857 # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
1858 # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
1859
1860- soup.select("#link1 + .sister")
1861+ soup.css.select("#link1 + .sister")
1862 # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
1863
1864 Find tags by CSS class::
1865
1866- soup.select(".sister")
1867+ soup.css.select(".sister")
1868 # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
1869 # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
1870 # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
1871
1872- soup.select("[class~=sister]")
1873+ soup.css.select("[class~=sister]")
1874 # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
1875 # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
1876 # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
1877
1878 Find tags by ID::
1879
1880- soup.select("#link1")
1881+ soup.css.select("#link1")
1882 # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
1883
1884- soup.select("a#link2")
1885+ soup.css.select("a#link2")
1886 # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
1887
1888 Find tags that match any selector from a list of selectors::
1889
1890- soup.select("#link1,#link2")
1891+ soup.css.select("#link1,#link2")
1892 # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
1893 # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
1894
1895 Test for the existence of an attribute::
1896
1897- soup.select('a[href]')
1898+ soup.css.select('a[href]')
1899 # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
1900 # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
1901 # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
1902
1903 Find tags by attribute value::
1904
1905- soup.select('a[href="http://example.com/elsie"]')
1906+ soup.css.select('a[href="http://example.com/elsie"]')
1907 # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
1908
1909- soup.select('a[href^="http://example.com/"]')
1910+ soup.css.select('a[href^="http://example.com/"]')
1911 # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
1912 # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
1913 # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
1914
1915- soup.select('a[href$="tillie"]')
1916+ soup.css.select('a[href$="tillie"]')
1917 # [<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
1918
1919- soup.select('a[href*=".com/el"]')
1920+ soup.css.select('a[href*=".com/el"]')
1921 # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
1922
1923 There's also a method called ``select_one()``, which finds only the
1924 first tag that matches a selector::
1925
1926+ soup.css.select_one(".sister")
1927+ # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
1928+
1929+As a convenience, you can call ``select()`` and ``select_one()``
1930+directly on the ``BeautifulSoup`` or ``Tag`` object, omitting the
1931+``.css`` property::
1932+
1933+ soup.select('a[href$="tillie"]')
1934+ # [<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
1935+
1936 soup.select_one(".sister")
1937 # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
1938
1939+CSS selector support is a convenience for people who already know the
1940+CSS selector syntax. You can do all of this with the Beautiful Soup
1941+API. If CSS selectors are all you need, you should skip Beautiful Soup
1942+altogether and parse the document with ``lxml``: it's a lot
1943+faster. But Soup Sieve lets you `combine` CSS selectors with the
1944+Beautiful Soup API.
1945+
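To illustrate the combination, here is a short sketch (the markup is a trimmed-down version of the "three sisters" document used throughout this documentation) that narrows a search with a CSS selector and then finishes the job with the Beautiful Soup API:

```python
from bs4 import BeautifulSoup

html = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters;
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>.
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# Narrow the search with a CSS selector, then switch to the Beautiful
# Soup API. Keep only the links whose text is exactly five characters
# long -- a test no CSS selector can express.
story = soup.select_one("p.story")
links = story.find_all("a", string=lambda s: s is not None and len(s) == 5)

print([a["id"] for a in links])
# ['link1', 'link2']
```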
1946+Advanced Soup Sieve features
1947+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1948+
1949+Soup Sieve offers a substantial API beyond the ``select()`` and
1950+``select_one()`` methods, and you can access most of that API through
1951+the ``.css`` attribute of ``Tag`` or ``BeautifulSoup``. What follows
1952+is just a list of the supported methods; see `the Soup Sieve
1953+documentation <https://facelessuser.github.io/soupsieve/>`_ for full
1954+documentation.
1955+
1956+The ``iselect()`` method works the same as ``select()``, but it
1957+returns a generator instead of a list::
1958+
1959+ [tag['id'] for tag in soup.css.iselect(".sister")]
1960+ # ['link1', 'link2', 'link3']
1961+
1962+The ``closest()`` method returns the nearest parent of a given ``Tag``
1963+that matches a CSS selector, similar to Beautiful Soup's
1964+``find_parent()`` method::
1965+
1966+ elsie = soup.css.select_one(".sister")
1967+ elsie.css.closest("p.story")
1968+ # <p class="story">Once upon a time there were three little sisters; and their names were
1969+ # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
1970+ # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
1971+ # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
1972+ # and they lived at the bottom of a well.</p>
1973+
1974+The ``match()`` method returns a boolean depending on whether or not a
1975+specific ``Tag`` matches a selector::
1976+
1977+ elsie.css.match("#link1")
1978+ # True
1979+
1980+ elsie.css.match("#link2")
1981+ # False
1982+
1983+The ``filter()`` method returns the subset of a tag's direct children
1984+that match a selector::
1985+
1986+ [tag.string for tag in soup.find('p', 'story').css.filter('a')]
1987+ # ['Elsie', 'Lacie', 'Tillie']
1988+
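The difference between ``filter()`` (direct children only) and ``select()`` (all descendants) can be seen with the standalone ``soupsieve`` package that backs this API; the markup below is a made-up example:

```python
import soupsieve as sv
from bs4 import BeautifulSoup

html = '<div><a id="outer">x</a><p><a id="inner">y</a></p></div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.div

# filter() only considers the tag's direct children, so the <a>
# nested inside the <p> is skipped.
print([tag["id"] for tag in sv.filter("a", div)])
# ['outer']

# select() searches all descendants.
print([tag["id"] for tag in sv.select("a", div)])
# ['outer', 'inner']
```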
1989+The ``escape()`` method escapes CSS identifiers that would otherwise
1990+be invalid::
1991+
1992+ soup.css.escape("1-strange-identifier")
1993+ # '\\31 -strange-identifier'
1994+
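Escaping matters when you build a selector out of unusual or untrusted strings. Here is a sketch using the ``soupsieve.escape()`` function directly; the ``1-strange-identifier`` id is the same made-up example as above:

```python
import soupsieve
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<div id="1-strange-identifier">Found it</div>', "html.parser"
)

# "#1-strange-identifier" is not a valid selector on its own, because
# a CSS identifier can't start with a bare digit; escaping makes it safe.
selector = "#" + soupsieve.escape("1-strange-identifier")

print(soup.select_one(selector).get_text())
# Found it
```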
1995+Namespaces in CSS selectors
1996+^^^^^^^^^^^^^^^^^^^^^^^^^^^
1997+
1998 If you've parsed XML that defines namespaces, you can use them in CSS
1999 selectors.::
2000
2001@@ -1798,28 +1863,33 @@ selectors.::
2002 <ns1:child>I'm in namespace 1</ns1:child>
2003 <ns2:child>I'm in namespace 2</ns2:child>
2004 </tag> """
2005- soup = BeautifulSoup(xml, "xml")
2006+ namespace_soup = BeautifulSoup(xml, "xml")
2007
2008- soup.select("child")
2009+ namespace_soup.css.select("child")
2010 # [<ns1:child>I'm in namespace 1</ns1:child>, <ns2:child>I'm in namespace 2</ns2:child>]
2011
2012- soup.select("ns1|child")
2013+ namespace_soup.css.select("ns1|child")
2014 # [<ns1:child>I'm in namespace 1</ns1:child>]
2015-
2016-When handling a CSS selector that uses namespaces, Beautiful Soup
2017-always tries to use namespace prefixes that make sense based on what
2018-it saw while parsing the document. You can always provide your own
2019-dictionary of abbreviations::
2020+
2021+Beautiful Soup tries to use namespace prefixes that make sense based
2022+on what it saw while parsing the document, but you can always provide
2023+your own dictionary of abbreviations::
2024
2025 namespaces = dict(first="http://namespace1/", second="http://namespace2/")
2026- soup.select("second|child", namespaces=namespaces)
2027+ namespace_soup.css.select("second|child", namespaces=namespaces)
2028+ # [<ns2:child>I'm in namespace 2</ns2:child>]
2029+
2030+History of CSS selector support
2031+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2032+
2033+The ``.css`` property was added in Beautiful Soup 4.12.0. Prior to this,
2034+only the ``.select()`` and ``.select_one()`` convenience methods were
2035+supported.
2036+
2037+The Soup Sieve integration was added in Beautiful Soup 4.7.0. Earlier
2038+versions had the ``.select()`` method, but only the most commonly-used
2039+CSS selectors were supported.
2040
2041-All this CSS selector stuff is a convenience for people who already
2042-know the CSS selector syntax. You can do all of this with the
2043-Beautiful Soup API. And if CSS selectors are all you need, you should
2044-parse the document with lxml: it's a lot faster. But this lets you
2045-`combine` CSS selectors with the Beautiful Soup API.
2046
2047 Modifying the tree
2048 ==================
