Merge beautifulsoup:more-modular-soupstrainers into beautifulsoup:4.13
Proposed by Leonard Richardson
Status: Merged
Merged at revision: c23dd48ebea467fcf028e14287f07d2c51e62975
Proposed branch: beautifulsoup:more-modular-soupstrainers
Merge into: beautifulsoup:4.13
Diff against target: 2064 lines (+710/-262), 18 files modified
- CHANGELOG (+18/-1)
- bs4/__init__.py (+131/-84)
- bs4/_typing.py (+19/-1)
- bs4/builder/__init__.py (+8/-8)
- bs4/builder/_html5lib.py (+123/-67)
- bs4/builder/_htmlparser.py (+12/-2)
- bs4/builder/_lxml.py (+1/-1)
- bs4/diagnose.py (+27/-15)
- bs4/element.py (+24/-20)
- bs4/filter.py (+167/-36)
- bs4/tests/__init__.py (+1/-1)
- bs4/tests/test_filter.py (+125/-8)
- bs4/tests/test_html5lib.py (+2/-2)
- bs4/tests/test_lxml.py (+1/-1)
- bs4/tests/test_pageelement.py (+1/-1)
- bs4/tests/test_soup.py (+2/-2)
- bs4/tests/test_tree.py (+1/-1)
- doc/index.rst (+47/-11)
Related bugs:
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Leonard Richardson | | | Pending |
Review via email: mp+459082@code.launchpad.net
Commit message
Description of the change
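Among the smaller cleanups in the diff below, `BeautifulSoup._markup_is_url` is reworked so the `bytes` and `str` branches are fully separate, which lets each branch type-check on its own. A minimal standalone sketch of that heuristic (the function name and message here are illustrative, not the bs4 internals):

```python
import warnings
from typing import Union

def markup_is_url(markup: Union[str, bytes]) -> bool:
    """Return True if `markup` looks like a bare URL rather than a document.

    The heuristic: input that starts with http:/https: and contains no
    spaces was probably a URL pasted in where markup was expected.
    """
    space: Union[str, bytes]
    if isinstance(markup, bytes):
        prefixes = (b"http:", b"https:")
        space = b" "
    elif isinstance(markup, str):
        prefixes = ("http:", "https:")
        space = " "
    else:
        return False

    problem = markup.startswith(prefixes) and space not in markup
    if problem:
        warnings.warn(
            "The input looks more like a URL than markup. You may want to "
            "fetch the document behind the URL with an HTTP client first.",
            stacklevel=2,
        )
    return problem
```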
Preview Diff
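One behavior change called out in the CHANGELOG hunk below: passing `limit=0` to a find() method now returns zero results instead of all of them. The old bug was a truthiness test on `limit`, under which `0` behaved like "no limit"; the new semantics reserve that meaning for `None`. A generic sketch of the corrected logic (`find_matches` is a hypothetical stand-in, not the bs4 implementation):

```python
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")

def find_matches(
    items: Iterable[T],
    predicate: Callable[[T], bool],
    limit: Optional[int] = None,
) -> List[T]:
    """Collect items satisfying `predicate`, up to `limit` of them.

    Only limit=None means "unlimited". A check like
    `if limit and len(results) >= limit` would let limit=0 fall
    through and return every match; the explicit None test does not.
    """
    results: List[T] = []
    for item in items:
        if limit is not None and len(results) >= limit:
            break
        if predicate(item):
            results.append(item)
    return results
```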
1 | diff --git a/CHANGELOG b/CHANGELOG |
2 | index 69f238d..162e3dc 100644 |
3 | --- a/CHANGELOG |
4 | +++ b/CHANGELOG |
5 | @@ -1,5 +1,7 @@ |
6 | = 4.13.0 (Unreleased) |
7 | |
8 | +TODO: we could stand to put limit inside ResultSet |
9 | + |
10 | * This version drops support for Python 3.6. The minimum supported |
11 | major Python version for Beautiful Soup is now Python 3.7. |
12 | |
13 | @@ -31,6 +33,13 @@ |
14 | you, since you probably use HTMLParserTreeBuilder, not |
15 | BeautifulSoupHTMLParser directly. |
16 | |
17 | +* The TreeBuilderForHtml5lib methods fragmentClass and getFragment |
18 | + now raise NotImplementedError. These methods are called only by |
19 | + html5lib's HTMLParser.parseFragment() method, which Beautiful Soup |
20 | + doesn't use, so they were untested and should have never been called. |
21 | + The getFragment() implementation was also slightly incorrect in a way |
22 | + that should have caused obvious problems for anyone using it. |
23 | + |
24 | * If Tag.get_attribute_list() is used to access an attribute that's not set, |
25 | the return value is now an empty list rather than [None]. |
26 | |
27 | @@ -47,6 +56,10 @@ |
28 | empty list was treated the same as None and False, and you would have |
29 | found the tags which did not have that attribute set at all. [bug=2045469] |
30 | |
31 | +* For similar reasons, if you pass in limit=0 to a find() method for some |
32 | + reason, you will now get zero results. Previously, you would get all |
33 | + matching results. |
34 | + |
35 | * When using one of the find() methods or creating a SoupStrainer, |
36 | if you specify the same attribute value in ``attrs`` and the |
37 | keyword arguments, you'll end up with two different ways to match that |
38 | @@ -88,7 +101,7 @@ |
39 | changed to match the arguments to the superclass, |
40 | TreeBuilder.prepare_markup. Specifically, document_declared_encoding |
41 | now appears before exclude_encodings, not after. If you were calling |
42 | - this method yourself, I recomment switching to using keyword |
43 | + this method yourself, I recommend switching to using keyword |
44 | arguments instead. |
45 | |
46 | * Fixed an error in the lookup table used when converting |
47 | @@ -101,8 +114,12 @@ New deprecations in 4.13.0: |
48 | |
49 | * The SAXTreeBuilder class, which was never officially supported or tested. |
50 | |
51 | +* The private class method BeautifulSoup._decode_markup(), which has not |
52 | + been used inside Beautiful Soup for many years. |
53 | + |
54 | * The first argument to BeautifulSoup.decode has been changed from a bool |
55 | `pretty_print` to an int `indent_level`, to match the signature of Tag.decode. |
56 | + Using a bool will still work but will give you a DeprecationWarning. |
57 | |
58 | * SoupStrainer.text and SoupStrainer.string are both deprecated |
59 | since a single item can't capture all the possibilities of a SoupStrainer |
60 | diff --git a/bs4/__init__.py b/bs4/__init__.py |
61 | index 347cb38..95bd48d 100644 |
62 | --- a/bs4/__init__.py |
63 | +++ b/bs4/__init__.py |
64 | @@ -15,7 +15,7 @@ documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/ |
65 | """ |
66 | |
67 | __author__ = "Leonard Richardson (leonardr@segfault.org)" |
68 | -__version__ = "4.12.3" |
69 | +__version__ = "4.13.0" |
70 | __copyright__ = "Copyright (c) 2004-2024 Leonard Richardson" |
71 | # Use of this source code is governed by the MIT license. |
72 | __license__ = "MIT" |
73 | @@ -42,10 +42,13 @@ from .builder import ( |
74 | ) |
75 | from .builder._htmlparser import HTMLParserTreeBuilder |
76 | from .dammit import UnicodeDammit |
77 | +from .css import ( |
78 | + CSS |
79 | +) |
80 | +from ._deprecation import _deprecated |
81 | from .element import ( |
82 | CData, |
83 | Comment, |
84 | - CSS, |
85 | DEFAULT_OUTPUT_ENCODING, |
86 | Declaration, |
87 | Doctype, |
88 | @@ -60,7 +63,10 @@ from .element import ( |
89 | TemplateString, |
90 | ) |
91 | from .formatter import Formatter |
92 | -from .strainer import SoupStrainer |
93 | +from .filter import ( |
94 | + ElementFilter, |
95 | + SoupStrainer, |
96 | +) |
97 | from typing import ( |
98 | Any, |
99 | cast, |
100 | @@ -70,6 +76,7 @@ from typing import ( |
101 | List, |
102 | Sequence, |
103 | Optional, |
104 | + Tuple, |
105 | Type, |
106 | TYPE_CHECKING, |
107 | Union, |
108 | @@ -81,6 +88,7 @@ from bs4._typing import ( |
109 | _Encoding, |
110 | _Encodings, |
111 | _IncomingMarkup, |
112 | + _RawMarkup, |
113 | ) |
114 | |
115 | # Define some custom warnings. |
116 | @@ -144,20 +152,21 @@ class BeautifulSoup(Tag): |
117 | NO_PARSER_SPECIFIED_WARNING: str = "No parser was explicitly specified, so I'm using the best available %(markup_type)s parser for this system (\"%(parser)s\"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.\n\nThe code that caused this warning is on line %(line_number)s of the file %(filename)s. To get rid of this warning, pass the additional argument 'features=\"%(parser)s\"' to the BeautifulSoup constructor.\n" |
118 | |
119 | # FUTURE PYTHON: |
120 | - element_classes:Dict[Type[PageElement], Type[Any]] #: :meta private: |
121 | + element_classes:Dict[Type[PageElement], Type[PageElement]] #: :meta private: |
122 | builder:TreeBuilder #: :meta private: |
123 | is_xml: bool |
124 | known_xml: Optional[bool] |
125 | parse_only: Optional[SoupStrainer] #: :meta private: |
126 | |
127 | # These members are only used while parsing markup. |
128 | - markup:Optional[Union[str,bytes]] #: :meta private: |
129 | + markup:Optional[_RawMarkup] #: :meta private: |
130 | current_data:List[str] #: :meta private: |
131 | currentTag:Optional[Tag] #: :meta private: |
132 | tagStack:List[Tag] #: :meta private: |
133 | open_tag_counter:CounterType[str] #: :meta private: |
134 | preserve_whitespace_tag_stack:List[Tag] #: :meta private: |
135 | string_container_stack:List[Tag] #: :meta private: |
136 | + _most_recent_element:Optional[PageElement] #: :meta private: |
137 | |
138 | #: Beautiful Soup's best guess as to the character encoding of the |
139 | #: original document. |
140 | @@ -182,7 +191,7 @@ class BeautifulSoup(Tag): |
141 | parse_only:Optional[SoupStrainer]=None, |
142 | from_encoding:Optional[_Encoding]=None, |
143 | exclude_encodings:Optional[_Encodings]=None, |
144 | - element_classes:Optional[Dict[Type[PageElement], Type[Any]]]=None, |
145 | + element_classes:Optional[Dict[Type[PageElement], Type[PageElement]]]=None, |
146 | **kwargs:Any |
147 | ): |
148 | """Constructor. |
149 | @@ -271,7 +280,7 @@ class BeautifulSoup(Tag): |
150 | "features='lxml' for HTML and features='lxml-xml' for " |
151 | "XML.") |
152 | |
153 | - def deprecated_argument(old_name, new_name): |
154 | + def deprecated_argument(old_name:str, new_name:str) -> Optional[Any]: |
155 | if old_name in kwargs: |
156 | warnings.warn( |
157 | 'The "%s" argument to the BeautifulSoup constructor ' |
158 | @@ -284,13 +293,14 @@ class BeautifulSoup(Tag): |
159 | |
160 | parse_only = parse_only or deprecated_argument( |
161 | "parseOnlyThese", "parse_only") |
162 | - if (parse_only is not None |
163 | - and parse_only.string_rules and |
164 | - (parse_only.name_rules or parse_only.attribute_rules)): |
165 | - warnings.warn( |
166 | - f"Value for parse_only will exclude everything, since it puts restrictions on both tags and strings: {parse_only}", |
167 | - UserWarning, stacklevel=3 |
168 | - ) |
169 | + if parse_only is not None: |
170 | + # Issue a warning if we can tell in advance that |
171 | + # parse_only will exclude the entire tree. |
172 | + if parse_only.excludes_everything: |
173 | + warnings.warn( |
174 | + f"The given value for parse_only will exclude everything: {parse_only}", |
175 | + UserWarning, stacklevel=3 |
176 | + ) |
177 | |
178 | from_encoding = from_encoding or deprecated_argument( |
179 | "fromEncoding", "from_encoding") |
180 | @@ -323,7 +333,7 @@ class BeautifulSoup(Tag): |
181 | "Couldn't find a tree builder with the features you " |
182 | "requested: %s. Do you need to install a parser library?" |
183 | % ",".join(features)) |
184 | - builder_class = cast(Type[TreeBuilder], possible_builder_class) |
185 | + builder_class = possible_builder_class |
186 | |
187 | # At this point either we have a TreeBuilder instance in |
188 | # builder, or we have a builder_class that we can instantiate |
189 | @@ -399,7 +409,7 @@ class BeautifulSoup(Tag): |
190 | |
191 | # At this point we know markup is a string or bytestring. If |
192 | # it was a file-type object, we've read from it. |
193 | - markup = cast(Union[str,bytes], markup) |
194 | + markup = cast(_RawMarkup, markup) |
195 | |
196 | rejections = [] |
197 | success = False |
198 | @@ -428,7 +438,7 @@ class BeautifulSoup(Tag): |
199 | self.markup = None |
200 | self.builder.soup = None |
201 | |
202 | - def _clone(self): |
203 | + def _clone(self) -> "BeautifulSoup": |
204 | """Create a new BeautifulSoup object with the same TreeBuilder, |
205 | but not associated with any markup. |
206 | |
207 | @@ -441,7 +451,7 @@ class BeautifulSoup(Tag): |
208 | clone.original_encoding = self.original_encoding |
209 | return clone |
210 | |
211 | - def __getstate__(self): |
212 | + def __getstate__(self) -> dict[str, Any]: |
213 | # Frequently a tree builder can't be pickled. |
214 | d = dict(self.__dict__) |
215 | if 'builder' in d and d['builder'] is not None and not self.builder.picklable: |
216 | @@ -457,7 +467,7 @@ class BeautifulSoup(Tag): |
217 | del d['_most_recent_element'] |
218 | return d |
219 | |
220 | - def __setstate__(self, state): |
221 | + def __setstate__(self, state: dict[str, Any]) -> None: |
222 | # If necessary, restore the TreeBuilder by looking it up. |
223 | self.__dict__ = state |
224 | if isinstance(self.builder, type): |
225 | @@ -469,15 +479,16 @@ class BeautifulSoup(Tag): |
226 | self.builder.soup = self |
227 | self.reset() |
228 | self._feed() |
229 | - return state |
230 | |
231 | |
232 | @classmethod |
233 | - def _decode_markup(cls, markup): |
234 | - """Ensure `markup` is bytes so it's safe to send into warnings.warn. |
235 | + @_deprecated(replaced_by="nothing (private method, will be removed)", version="4.13.0") |
236 | + def _decode_markup(cls, markup:_RawMarkup) -> str: |
237 | + """Ensure `markup` is Unicode so it's safe to send into warnings.warn. |
238 | |
239 | - TODO: warnings.warn had this problem back in 2010 but it might not |
240 | - anymore. |
241 | + warnings.warn had this problem back in 2010 but fortunately |
242 | + not anymore. This has not been used for a long time; I just |
243 | + noticed that fact while working on 4.13.0. |
244 | """ |
245 | if isinstance(markup, bytes): |
246 | decoded = markup.decode('utf-8', 'replace') |
247 | @@ -486,56 +497,76 @@ class BeautifulSoup(Tag): |
248 | return decoded |
249 | |
250 | @classmethod |
251 | - def _markup_is_url(cls, markup): |
252 | + def _markup_is_url(cls, markup:_RawMarkup) -> bool: |
253 | """Error-handling method to raise a warning if incoming markup looks |
254 | like a URL. |
255 | |
256 | - :param markup: A string. |
257 | - :return: Whether or not the markup resembles a URL |
258 | - closely enough to justify a warning. |
259 | + :param markup: A string of markup. |
260 | + :return: Whether or not the markup resembled a URL |
261 | + closely enough to justify issuing a warning. |
262 | """ |
263 | + problem: bool = False |
264 | if isinstance(markup, bytes): |
265 | - space = b' ' |
266 | - cant_start_with = (b"http:", b"https:") |
267 | + cant_start_with_b: Tuple[bytes, bytes] = (b"http:", b"https:") |
268 | + problem = ( |
269 | + any( |
270 | + markup.startswith(prefix) for prefix in |
271 | + (b"http:", b"https:") |
272 | + ) |
273 | + and not b' ' in markup |
274 | + ) |
275 | elif isinstance(markup, str): |
276 | - space = ' ' |
277 | - cant_start_with = ("http:", "https:") |
278 | + problem = ( |
279 | + any( |
280 | + markup.startswith(prefix) for prefix in |
281 | + ("http:", "https:") |
282 | + ) |
283 | + and not ' ' in markup |
284 | + ) |
285 | else: |
286 | return False |
287 | |
288 | - if any(markup.startswith(prefix) for prefix in cant_start_with): |
289 | - if not space in markup: |
290 | - warnings.warn( |
291 | - 'The input looks more like a URL than markup. You may want to use' |
292 | - ' an HTTP client like requests to get the document behind' |
293 | - ' the URL, and feed that document to Beautiful Soup.', |
294 | - MarkupResemblesLocatorWarning, |
295 | - stacklevel=3 |
296 | - ) |
297 | - return True |
298 | - return False |
299 | + if not problem: |
300 | + return False |
301 | + warnings.warn( |
302 | + 'The input looks more like a URL than markup. You may want to use' |
303 | + ' an HTTP client like requests to get the document behind' |
304 | + ' the URL, and feed that document to Beautiful Soup.', |
305 | + MarkupResemblesLocatorWarning, |
306 | + stacklevel=3 |
307 | + ) |
308 | + return True |
309 | |
310 | @classmethod |
311 | - def _markup_resembles_filename(cls, markup): |
312 | - """Error-handling method to raise a warning if incoming markup |
313 | + def _markup_resembles_filename(cls, markup:_RawMarkup) -> bool: |
314 | + """Error-handling method to issue a warning if incoming markup |
315 | resembles a filename. |
316 | |
317 | - :param markup: A bytestring or string. |
318 | - :return: Whether or not the markup resembles a filename |
319 | - closely enough to justify a warning. |
320 | + :param markup: A string of markup. |
321 | + :return: Whether or not the markup resembled a filename |
322 | + closely enough to justify issuing a warning. |
323 | """ |
324 | - path_characters = '/\\' |
325 | - extensions = ['.html', '.htm', '.xml', '.xhtml', '.txt'] |
326 | - if isinstance(markup, bytes): |
327 | - path_characters = path_characters.encode("utf8") |
328 | - extensions = [x.encode('utf8') for x in extensions] |
329 | + path_characters_b = b'/\\' |
330 | + path_characters_s = '/\\' |
331 | + extensions_b = [b'.html', b'.htm', b'.xml', b'.xhtml', b'.txt'] |
332 | + extensions_s = ['.html', '.htm', '.xml', '.xhtml', '.txt'] |
333 | + |
334 | filelike = False |
335 | - if any(x in markup for x in path_characters): |
336 | - filelike = True |
337 | + if isinstance(markup, bytes): |
338 | + if any(x in markup for x in path_characters_b): |
339 | + filelike = True |
340 | + else: |
341 | + lower_b = markup.lower() |
342 | + if any(lower_b.endswith(ext) for ext in extensions_b): |
343 | + filelike = True |
344 | else: |
345 | - lower = markup.lower() |
346 | - if any(lower.endswith(ext) for ext in extensions): |
347 | + if any(x in markup for x in path_characters_s): |
348 | filelike = True |
349 | + else: |
350 | + lower_s = markup.lower() |
351 | + if any(lower_s.endswith(ext) for ext in extensions_s): |
352 | + filelike = True |
353 | + |
354 | if filelike: |
355 | warnings.warn( |
356 | 'The input looks more like a filename than markup. You may' |
357 | @@ -546,20 +577,22 @@ class BeautifulSoup(Tag): |
358 | return True |
359 | return False |
360 | |
361 | - def _feed(self): |
362 | + def _feed(self) -> None: |
363 | """Internal method that parses previously set markup, creating a large |
364 | number of Tag and NavigableString objects. |
365 | """ |
366 | # Convert the document to Unicode. |
367 | self.builder.reset() |
368 | |
369 | - self.builder.feed(self.markup) |
370 | + if self.markup is not None: |
371 | + self.builder.feed(self.markup) |
372 | # Close out any unfinished strings and close all the open tags. |
373 | self.endData() |
374 | - while self.currentTag.name != self.ROOT_TAG_NAME: |
375 | + while (self.currentTag is not None and |
376 | + self.currentTag.name != self.ROOT_TAG_NAME): |
377 | self.popTag() |
378 | |
379 | - def reset(self): |
380 | + def reset(self) -> None: |
381 | """Reset this object to a state as though it had never parsed any |
382 | markup. |
383 | """ |
384 | @@ -585,7 +618,7 @@ class BeautifulSoup(Tag): |
385 | sourcepos:Optional[int]=None, |
386 | string:Optional[str]=None, |
387 | **kwattrs:_AttributeValue, |
388 | - ): |
389 | + ) -> Tag: |
390 | """Create a new Tag associated with this BeautifulSoup object. |
391 | |
392 | :param name: The name of the new Tag. |
393 | @@ -603,10 +636,16 @@ class BeautifulSoup(Tag): |
394 | |
395 | """ |
396 | kwattrs.update(attrs) |
397 | - tag = self.element_classes.get(Tag, Tag)( |
398 | + tag_class = self.element_classes.get(Tag, Tag) |
399 | + |
400 | + # Assume that this is either Tag or a subclass of Tag. If not, |
401 | + # the user brought type-unsafety upon themselves. |
402 | + tag_class = cast(Type[Tag], tag_class) |
403 | + tag = tag_class( |
404 | None, self.builder, name, namespace, nsprefix, kwattrs, |
405 | sourceline=sourceline, sourcepos=sourcepos |
406 | ) |
407 | + |
408 | if string is not None: |
409 | tag.string = string |
410 | return tag |
411 | @@ -622,9 +661,11 @@ class BeautifulSoup(Tag): |
412 | """ |
413 | container = base_class or NavigableString |
414 | |
415 | - # There may be a general override of NavigableString. |
416 | - container = self.element_classes.get( |
417 | - container, container |
418 | + # The user may want us to use some other class (hopefully a |
419 | + # custom subclass) instead of the one we'd use normally. |
420 | + container = cast( |
421 | + type[NavigableString], |
422 | + self.element_classes.get(container, container) |
423 | ) |
424 | |
425 | # On top of that, we may be inside a tag that needs a special |
426 | @@ -728,9 +769,8 @@ class BeautifulSoup(Tag): |
427 | self.current_data = [] |
428 | |
429 | # Should we add this string to the tree at all? |
430 | - if self.parse_only and len(self.tagStack) <= 1 and \ |
431 | - (not self.parse_only.string_rules or \ |
432 | - not self.parse_only.allow_string_creation(current_data)): |
433 | + if (self.parse_only and len(self.tagStack) <= 1 and |
434 | + (not self.parse_only.allow_string_creation(current_data))): |
435 | return |
436 | |
437 | containerClass = self.string_container(containerClass) |
438 | @@ -739,17 +779,16 @@ class BeautifulSoup(Tag): |
439 | |
440 | def object_was_parsed( |
441 | self, o:PageElement, parent:Optional[Tag]=None, |
442 | - most_recent_element:Optional[PageElement]=None): |
443 | + most_recent_element:Optional[PageElement]=None) -> None: |
444 | """Method called by the TreeBuilder to integrate an object into the |
445 | parse tree. |
446 | |
447 | - |
448 | - |
449 | :meta private: |
450 | """ |
451 | if parent is None: |
452 | parent = self.currentTag |
453 | assert parent is not None |
454 | + previous_element: Optional[PageElement] |
455 | if most_recent_element is not None: |
456 | previous_element = most_recent_element |
457 | else: |
458 | @@ -774,12 +813,12 @@ class BeautifulSoup(Tag): |
459 | if fix: |
460 | self._linkage_fixer(parent) |
461 | |
462 | - def _linkage_fixer(self, el): |
463 | + def _linkage_fixer(self, el:Tag) -> None: |
464 | """Make sure linkage of this fragment is sound.""" |
465 | |
466 | first = el.contents[0] |
467 | child = el.contents[-1] |
468 | - descendant = child |
469 | + descendant:PageElement = child |
470 | |
471 | if child is first and el.parent is not None: |
472 | # Parent should be linked to first child |
473 | @@ -797,14 +836,18 @@ class BeautifulSoup(Tag): |
474 | |
475 | # This index is a tag, dig deeper for a "last descendant" |
476 | if isinstance(child, Tag) and child.contents: |
477 | - descendant = child._last_descendant(False) |
478 | + # _last_decendant is typed as returning Optional[PageElement], |
479 | + # but the value can't be None here, because el is a Tag |
480 | + # which we know has contents. |
481 | + descendant = cast(PageElement, child._last_descendant(False)) |
482 | |
483 | # As the final step, link last descendant. It should be linked |
484 | # to the parent's next sibling (if found), else walk up the chain |
485 | # and find a parent with a sibling. It should have no next sibling. |
486 | descendant.next_element = None |
487 | descendant.next_sibling = None |
488 | - target = el |
489 | + |
490 | + target:Optional[Tag] = el |
491 | while True: |
492 | if target is None: |
493 | break |
494 | @@ -814,7 +857,7 @@ class BeautifulSoup(Tag): |
495 | break |
496 | target = target.parent |
497 | |
498 | - def _popToTag(self, name, nsprefix=None, inclusivePop=True) -> Optional[Tag]: |
499 | + def _popToTag(self, name:str, nsprefix:Optional[str]=None, inclusivePop:bool=True) -> Optional[Tag]: |
500 | """Pops the tag stack up to and including the most recent |
501 | instance of the given tag. |
502 | |
503 | @@ -851,7 +894,7 @@ class BeautifulSoup(Tag): |
504 | |
505 | def handle_starttag( |
506 | self, name:str, namespace:Optional[str], |
507 | - nsprefix:Optional[str], attrs:Optional[Dict[str,str]], |
508 | + nsprefix:Optional[str], attrs:_AttributeValues, |
509 | sourceline:Optional[int]=None, sourcepos:Optional[int]=None, |
510 | namespaces:Optional[Dict[str, str]]=None) -> Optional[Tag]: |
511 | """Called by the tree builder when a new tag is encountered. |
512 | @@ -867,7 +910,7 @@ class BeautifulSoup(Tag): |
513 | currently in scope in the document. |
514 | |
515 | If this method returns None, the tag was rejected by an active |
516 | - SoupStrainer. You should proceed as if the tag had not occurred |
517 | + `ElementFilter`. You should proceed as if the tag had not occurred |
518 | in the document. For instance, if this was a self-closing tag, |
519 | don't call handle_endtag. |
520 | |
521 | @@ -877,11 +920,14 @@ class BeautifulSoup(Tag): |
522 | self.endData() |
523 | |
524 | if (self.parse_only and len(self.tagStack) <= 1 |
525 | - and (self.parse_only.string_rules |
526 | - or not self.parse_only.allow_tag_creation(nsprefix, name, attrs))): |
527 | + and not self.parse_only.allow_tag_creation(nsprefix, name, attrs)): |
528 | return None |
529 | |
530 | - tag = self.element_classes.get(Tag, Tag)( |
531 | + tag_class = self.element_classes.get(Tag, Tag) |
532 | + # Assume that this is either Tag or a subclass of Tag. If not, |
533 | + # the user brought type-unsafety upon themselves. |
534 | + tag_class = cast(Type[Tag], tag_class) |
535 | + tag = tag_class( |
536 | self, self.builder, name, namespace, nsprefix, attrs, |
537 | self.currentTag, self._most_recent_element, |
538 | sourceline=sourceline, sourcepos=sourcepos, |
539 | @@ -918,7 +964,8 @@ class BeautifulSoup(Tag): |
540 | def decode(self, indent_level:Optional[int]=None, |
541 | eventual_encoding:_Encoding=DEFAULT_OUTPUT_ENCODING, |
542 | formatter:Union[Formatter,str]="minimal", |
543 | - iterator:Optional[Iterable]=None, **kwargs) -> str: |
544 | + iterator:Optional[Iterable[PageElement]]=None, |
545 | + **kwargs:Any) -> str: |
546 | """Returns a string representation of the parse tree |
547 | as a full HTML or XML document. |
548 | |
549 | @@ -989,7 +1036,7 @@ _soup = BeautifulSoup |
550 | class BeautifulStoneSoup(BeautifulSoup): |
551 | """Deprecated interface to an XML parser.""" |
552 | |
553 | - def __init__(self, *args, **kwargs): |
554 | + def __init__(self, *args:Any, **kwargs:Any): |
555 | kwargs['features'] = 'xml' |
556 | warnings.warn( |
557 | 'The BeautifulStoneSoup class was deprecated in version 4.0.0. Instead of using ' |
558 | diff --git a/bs4/_typing.py b/bs4/_typing.py |
559 | index fed804a..ab8f7a0 100644 |
560 | --- a/bs4/_typing.py |
561 | +++ b/bs4/_typing.py |
562 | @@ -7,6 +7,8 @@ |
563 | # * In 3.10, x|y is an accepted shorthand for Union[x,y]. |
564 | # * In 3.10, TypeAlias gains capabilities that can be used to |
565 | # improve the tree matching types (I don't remember what, exactly). |
566 | +# * 3.8 defines the Protocol type, which can be used to do duck typing |
567 | +# in a statically checkable way. |
568 | |
569 | import re |
570 | from typing_extensions import TypeAlias |
571 | @@ -15,13 +17,14 @@ from typing import ( |
572 | Dict, |
573 | IO, |
574 | Iterable, |
575 | + Optional, |
576 | Pattern, |
577 | TYPE_CHECKING, |
578 | Union, |
579 | ) |
580 | |
581 | if TYPE_CHECKING: |
582 | - from bs4.element import Tag |
583 | + from bs4.element import PageElement, Tag |
584 | |
585 | # Aliases for markup in various stages of processing. |
586 | # |
587 | @@ -52,6 +55,10 @@ _InvertedNamespaceMapping:TypeAlias = Dict[_NamespaceURL, _NamespacePrefix] |
588 | _AttributeValue: TypeAlias = Union[str, Iterable[str]] |
589 | _AttributeValues: TypeAlias = Dict[str, _AttributeValue] |
590 | |
591 | +# The most common form in which attribute values are passed in from a |
592 | +# parser. |
593 | +_RawAttributeValues: TypeAlias = dict[str, str] |
594 | + |
595 | # Aliases to represent the many possibilities for matching bits of a |
596 | # parse tree. |
597 | # |
598 | @@ -60,6 +67,17 @@ _AttributeValues: TypeAlias = Dict[str, _AttributeValue] |
599 | # of the arguments to the SoupStrainer constructor and (more |
600 | # familiarly to Beautiful Soup users) the find* methods. |
601 | |
602 | +# A function that takes a PageElement and returns a yes-or-no answer. |
603 | +_PageElementMatchFunction:TypeAlias = Callable[['PageElement'], bool] |
604 | + |
605 | +# A function that takes the raw parsed ingredients of a markup tag |
606 | +# and returns a yes-or-no answer. |
607 | +_AllowTagCreationFunction:TypeAlias = Callable[[Optional[str], str, Optional[_RawAttributeValues]], bool] |
608 | + |
609 | +# A function that takes the raw parsed ingredients of a markup string node |
610 | +# and returns a yes-or-no answer. |
611 | +_AllowStringCreationFunction:TypeAlias = Callable[[Optional[str]], bool] |
612 | + |
613 | # A function that takes a Tag and returns a yes-or-no answer. |
614 | # A TagNameMatchRule expects this kind of function, if you're |
615 | # going to pass it a function. |
616 | diff --git a/bs4/builder/__init__.py b/bs4/builder/__init__.py |
617 | index fa2b939..b59513e 100644 |
618 | --- a/bs4/builder/__init__.py |
619 | +++ b/bs4/builder/__init__.py |
620 | @@ -277,7 +277,7 @@ class TreeBuilder(object): |
621 | return True |
622 | return tag_name in self.empty_element_tags |
623 | |
624 | - def feed(self, markup:str) -> None: |
625 | + def feed(self, markup:_RawMarkup) -> None: |
626 | """Run some incoming markup through some parsing process, |
627 | populating the `BeautifulSoup` object in `TreeBuilder.soup` |
628 | """ |
629 | @@ -598,8 +598,8 @@ class DetectsXMLParsedAsHTML(object): |
630 | |
631 | # This is typed as str, not `ProcessingInstruction`, because this |
632 | # check may be run before any Beautiful Soup objects are created. |
633 | - _first_processing_instruction: Optional[str] |
634 | - _root_tag: Optional[Tag] |
635 | + _first_processing_instruction: Optional[str] #: :meta private: |
636 | + _root_tag_name: Optional[str] #: :meta private: |
637 | |
638 | @classmethod |
639 | def warn_if_markup_looks_like_xml(cls, markup:Optional[_RawMarkup], stacklevel:int=3) -> bool: |
640 | @@ -648,14 +648,14 @@ class DetectsXMLParsedAsHTML(object): |
641 | def _initialize_xml_detector(self) -> None: |
642 | """Call this method before parsing a document.""" |
643 | self._first_processing_instruction = None |
644 | - self._root_tag = None |
645 | + self._root_tag_name = None |
646 | |
647 | def _document_might_be_xml(self, processing_instruction:str): |
648 | """Call this method when encountering an XML declaration, or a |
649 | "processing instruction" that might be an XML declaration. |
650 | """ |
651 | if (self._first_processing_instruction is not None |
652 | - or self._root_tag is not None): |
653 | + or self._root_tag_name is not None): |
654 | # The document has already started. Don't bother checking |
655 | # anymore. |
656 | return |
657 | @@ -665,18 +665,18 @@ class DetectsXMLParsedAsHTML(object): |
658 | # We won't know until we encounter the first tag whether or |
659 | # not this is actually a problem. |
660 | |
661 | - def _root_tag_encountered(self, name): |
662 | + def _root_tag_encountered(self, name:str) -> None: |
663 | """Call this when you encounter the document's root tag. |
664 | |
665 | This is where we actually check whether an XML document is |
666 | being incorrectly parsed as HTML, and issue the warning. |
667 | """ |
668 | - if self._root_tag is not None: |
669 | + if self._root_tag_name is not None: |
670 | # This method was incorrectly called multiple times. Do |
671 | # nothing. |
672 | return |
673 | |
674 | - self._root_tag = name |
675 | + self._root_tag_name = name |
676 | if (name != 'html' and self._first_processing_instruction is not None |
677 | and self._first_processing_instruction.lower().startswith('xml ')): |
678 | # We encountered an XML declaration and then a tag other |
679 | diff --git a/bs4/builder/_html5lib.py b/bs4/builder/_html5lib.py |
680 | index b7d2924..2ea556c 100644 |
681 | --- a/bs4/builder/_html5lib.py |
682 | +++ b/bs4/builder/_html5lib.py |
683 | @@ -6,6 +6,9 @@ __all__ = [ |
684 | ] |
685 | |
686 | from typing import ( |
687 | + Any, |
688 | + cast, |
689 | + Dict, |
690 | Iterable, |
691 | List, |
692 | Optional, |
693 | @@ -14,8 +17,11 @@ from typing import ( |
694 | Union, |
695 | ) |
696 | from bs4._typing import ( |
697 | + _AttributeValue, |
698 | + _AttributeValues, |
699 | _Encoding, |
700 | _Encodings, |
701 | + _NamespaceURL, |
702 | _RawMarkup, |
703 | ) |
704 | |
705 | @@ -30,6 +36,7 @@ from bs4.builder import ( |
706 | ) |
707 | from bs4.element import ( |
708 | NamespacedAttribute, |
709 | + PageElement, |
710 | nonwhitespace_re, |
711 | ) |
712 | import html5lib |
713 | @@ -42,7 +49,9 @@ from bs4.element import ( |
714 | Doctype, |
715 | NavigableString, |
716 | Tag, |
717 | - ) |
718 | +) |
719 | +if TYPE_CHECKING: |
720 | + from bs4 import BeautifulSoup |
721 | |
722 | from html5lib.treebuilders import base as treebuilder_base |
723 | |
724 | @@ -71,7 +80,9 @@ class HTML5TreeBuilder(HTMLTreeBuilder): |
725 | #: html5lib can tell us which line number and position in the |
726 | #: original file is the source of an element. |
727 | TRACKS_LINE_NUMBERS:bool = True |
728 | - |
729 | + |
730 | + underlying_builder:'TreeBuilderForHtml5lib' #: :meta private: |
731 | + |
732 | def prepare_markup(self, markup:_RawMarkup, |
733 | user_specified_encoding:Optional[_Encoding]=None, |
734 | document_declared_encoding:Optional[_Encoding]=None, |
735 | @@ -102,20 +113,31 @@ class HTML5TreeBuilder(HTMLTreeBuilder): |
736 | yield (markup, None, None, False) |
737 | |
738 | # These methods are defined by Beautiful Soup. |
739 | - def feed(self, markup): |
740 | + def feed(self, markup:_RawMarkup) -> None: |
741 | """Run some incoming markup through some parsing process, |
742 | populating the `BeautifulSoup` object in `HTML5TreeBuilder.soup`. |
743 | """ |
744 | - if self.soup.parse_only is not None: |
745 | + if self.soup is not None and self.soup.parse_only is not None: |
746 | warnings.warn( |
747 | "You provided a value for parse_only, but the html5lib tree builder doesn't support parse_only. The entire document will be parsed.", |
748 | stacklevel=4 |
749 | ) |
750 | + |
751 | + # self.underlying_parser is probably None now, but it'll be set |
752 | + # when self.create_treebuilder is called by html5lib. |
753 | + # |
754 | + # TODO-TYPING: typeshed stubs are incorrect about the return |
755 | + # value of HTMLParser.__init__; it is HTMLParser, not None. |
756 | parser = html5lib.HTMLParser(tree=self.create_treebuilder) |
757 | + assert self.underlying_builder is not None |
758 | self.underlying_builder.parser = parser |
759 | extra_kwargs = dict() |
760 | if not isinstance(markup, str): |
761 | + # kwargs, specifically override_encoding, will eventually |
762 | + # be passed in to html5lib's |
763 | + # HTMLBinaryInputStream.__init__. |
764 | extra_kwargs['override_encoding'] = self.user_specified_encoding |
765 | + |
766 | doc = parser.parse(markup, **extra_kwargs) |
767 | |
768 | # Set the character encoding detected by the tokenizer. |
769 | @@ -131,10 +153,12 @@ class HTML5TreeBuilder(HTMLTreeBuilder): |
770 | doc.original_encoding = original_encoding |
771 | self.underlying_builder.parser = None |
772 | |
773 | - def create_treebuilder(self, namespaceHTMLElements): |
774 | + def create_treebuilder(self, namespaceHTMLElements:bool) -> 'TreeBuilderForHtml5lib': |
775 | """Called by html5lib to instantiate the kind of class it |
776 | calls a 'TreeBuilder'. |
777 | - |
778 | + |
779 | + :param namespaceHTMLElements: Whether or not to namespace HTML elements. |
780 | + |
781 | :meta private: |
782 | """ |
783 | self.underlying_builder = TreeBuilderForHtml5lib( |
784 | @@ -143,15 +167,18 @@ class HTML5TreeBuilder(HTMLTreeBuilder): |
785 | ) |
786 | return self.underlying_builder |
787 | |
788 | - def test_fragment_to_document(self, fragment): |
789 | + def test_fragment_to_document(self, fragment:str) -> str: |
790 | """See `TreeBuilder`.""" |
791 | return '<html><head></head><body>%s</body></html>' % fragment |
792 | |
793 | |
794 | class TreeBuilderForHtml5lib(treebuilder_base.TreeBuilder): |
795 | - |
796 | - def __init__(self, namespaceHTMLElements, soup=None, |
797 | - store_line_numbers=True, **kwargs): |
798 | + |
799 | + soup:'BeautifulSoup' #: :meta private: |
800 | + |
801 | + def __init__(self, namespaceHTMLElements:bool, |
802 | + soup:Optional['BeautifulSoup']=None, |
803 | + store_line_numbers:bool=True, **kwargs:Any): |
804 | if soup: |
805 | self.soup = soup |
806 | else: |
807 | @@ -172,65 +199,68 @@ class TreeBuilderForHtml5lib(treebuilder_base.TreeBuilder): |
808 | self.parser = None |
809 | self.store_line_numbers = store_line_numbers |
810 | |
811 | - def documentClass(self): |
812 | + def documentClass(self) -> 'Element': |
813 | self.soup.reset() |
814 | return Element(self.soup, self.soup, None) |
815 | |
816 | - def insertDoctype(self, token): |
817 | - name = token["name"] |
818 | - publicId = token["publicId"] |
819 | - systemId = token["systemId"] |
820 | + def insertDoctype(self, token:Dict[str, Any]) -> None: |
821 | + name:str = cast(str, token["name"]) |
822 | + publicId:Optional[str] = cast(Optional[str], token["publicId"]) |
823 | + systemId:Optional[str] = cast(Optional[str], token["systemId"]) |
824 | |
825 | doctype = Doctype.for_name_and_ids(name, publicId, systemId) |
826 | self.soup.object_was_parsed(doctype) |
827 | |
828 | - def elementClass(self, name, namespace): |
829 | - kwargs = {} |
830 | + def elementClass(self, name:str, namespace:str) -> 'Element': |
831 | + sourceline:Optional[int] = None |
832 | + sourcepos:Optional[int] = None |
833 | if self.parser and self.store_line_numbers: |
834 | # This represents the point immediately after the end of the |
835 | # tag. We don't know when the tag started, but we do know |
836 | # where it ended -- the character just before this one. |
837 | sourceline, sourcepos = self.parser.tokenizer.stream.position() |
838 | - kwargs['sourceline'] = sourceline |
839 | - kwargs['sourcepos'] = sourcepos-1 |
840 | - tag = self.soup.new_tag(name, namespace, **kwargs) |
841 | + sourcepos = sourcepos-1 |
842 | + tag = self.soup.new_tag( |
843 | + name, namespace, sourceline=sourceline, sourcepos=sourcepos |
844 | + ) |
845 | |
846 | return Element(tag, self.soup, namespace) |
847 | |
848 | - def commentClass(self, data): |
849 | + def commentClass(self, data:str) -> 'TextNode': |
850 | return TextNode(Comment(data), self.soup) |
851 | |
852 | - def fragmentClass(self): |
853 | - from bs4 import BeautifulSoup |
854 | - # TODO: Why is the parser 'html.parser' here? To avoid an |
855 | - # infinite loop? |
856 | - self.soup = BeautifulSoup("", "html.parser") |
857 | - self.soup.name = "[document_fragment]" |
858 | - return Element(self.soup, self.soup, None) |
859 | + def fragmentClass(self) -> 'Element': |
860 | + """This is only used by html5lib HTMLParser.parseFragment(), |
861 | + which is never used by Beautiful Soup.""" |
862 | + raise NotImplementedError() |
863 | + |
864 | + def getFragment(self) -> 'Element': |
865 | + """This is only used by html5lib HTMLParser.parseFragment(), |
866 | + which is never used by Beautiful Soup.""" |
867 | + raise NotImplementedError() |
868 | |
869 | - def appendChild(self, node): |
870 | - # XXX This code is not covered by the BS4 tests. |
871 | + def appendChild(self, node:'Element') -> None: |
872 | + # TODO: This code is not covered by the BS4 tests. |
873 | self.soup.append(node.element) |
874 | |
875 | - def getDocument(self): |
876 | + def getDocument(self) -> 'BeautifulSoup': |
877 | return self.soup |
878 | |
879 | - def getFragment(self): |
880 | - return treebuilder_base.TreeBuilder.getFragment(self).element |
881 | - |
882 | - def testSerializer(self, element): |
883 | + # TODO-TYPING: typeshed stubs are incorrect about this; |
884 | + # testSerializer returns a str, not None. |
885 | + def testSerializer(self, element:'Element') -> str: |
886 | from bs4 import BeautifulSoup |
887 | rv = [] |
888 | doctype_re = re.compile(r'^(.*?)(?: PUBLIC "(.*?)"(?: "(.*?)")?| SYSTEM "(.*?)")?$') |
889 | |
890 | - def serializeElement(element, indent=0): |
891 | + def serializeElement(element:Union['Element', PageElement], indent:int=0) -> None: |
892 | if isinstance(element, BeautifulSoup): |
893 | pass |
894 | if isinstance(element, Doctype): |
895 | m = doctype_re.match(element) |
896 | - if m: |
897 | + if m is not None: |
898 | name = m.group(1) |
899 | - if m.lastindex > 1: |
900 | + if m.lastindex is not None and m.lastindex > 1: |
901 | publicId = m.group(2) or "" |
902 | systemId = m.group(3) or m.group(4) or "" |
903 | rv.append("""|%s<!DOCTYPE %s "%s" "%s">""" % |
904 | @@ -243,7 +273,7 @@ class TreeBuilderForHtml5lib(treebuilder_base.TreeBuilder): |
905 | rv.append("|%s<!-- %s -->" % (' ' * indent, element)) |
906 | elif isinstance(element, NavigableString): |
907 | rv.append("|%s\"%s\"" % (' ' * indent, element)) |
908 | - else: |
909 | + elif isinstance(element, Element): |
910 | if element.namespace: |
911 | name = "%s %s" % (prefixes[element.namespace], |
912 | element.name) |
913 | @@ -269,12 +299,19 @@ class TreeBuilderForHtml5lib(treebuilder_base.TreeBuilder): |
914 | return "\n".join(rv) |
915 | |
916 | class AttrList(object): |
917 | - def __init__(self, element): |
918 | + """Represents a Tag's attributes in a way compatible with html5lib.""" |
919 | + |
920 | + element:Tag |
921 | + attrs:_AttributeValues |
922 | + |
923 | + def __init__(self, element:Tag): |
924 | self.element = element |
925 | self.attrs = dict(self.element.attrs) |
926 | - def __iter__(self): |
927 | + |
928 | + def __iter__(self) -> Iterable[Tuple[str, _AttributeValue]]: |
929 | return list(self.attrs.items()).__iter__() |
930 | - def __setitem__(self, name, value): |
931 | + |
932 | + def __setitem__(self, name:str, value:_AttributeValue) -> None: |
933 | # If this attribute is a multi-valued attribute for this element, |
934 | # turn its value into a list. |
935 | list_attr = self.element.cdata_list_attributes or {} |
936 | @@ -282,40 +319,52 @@ class AttrList(object): |
937 | or (self.element.name in list_attr |
938 | and name in list_attr.get(self.element.name, []))): |
939 | # A node that is being cloned may have already undergone |
940 | - # this procedure. |
941 | + # this procedure. Check for this and skip it. |
942 | if not isinstance(value, list): |
943 | + assert isinstance(value, str) |
944 | value = nonwhitespace_re.findall(value) |
945 | self.element[name] = value |
946 | - def items(self): |
947 | + |
948 | + def items(self) -> Iterable[Tuple[str, _AttributeValue]]: |
949 | return list(self.attrs.items()) |
950 | - def keys(self): |
951 | + |
952 | + def keys(self) -> Iterable[str]: |
953 | return list(self.attrs.keys()) |
954 | - def __len__(self): |
955 | + |
956 | + def __len__(self) -> int: |
957 | return len(self.attrs) |
958 | - def __getitem__(self, name): |
959 | + |
960 | + def __getitem__(self, name:str) -> _AttributeValue: |
961 | return self.attrs[name] |
962 | - def __contains__(self, name): |
963 | + |
964 | + def __contains__(self, name:str) -> bool: |
965 | return name in list(self.attrs.keys()) |
966 | |
967 | |
968 | class Element(treebuilder_base.Node): |
969 | - def __init__(self, element, soup, namespace): |
970 | + |
971 | + element:Tag |
972 | + soup:'BeautifulSoup' |
973 | + namespace:Optional[_NamespaceURL] |
974 | + |
975 | + def __init__(self, element:Tag, soup:'BeautifulSoup', |
976 | + namespace:Optional[_NamespaceURL]): |
977 | treebuilder_base.Node.__init__(self, element.name) |
978 | self.element = element |
979 | self.soup = soup |
980 | self.namespace = namespace |
981 | |
982 | - def appendChild(self, node): |
983 | + def appendChild(self, node:'Element') -> None: |
984 | string_child = child = None |
985 | if isinstance(node, str): |
986 | # Some other piece of code decided to pass in a string |
987 | # instead of creating a TextElement object to contain the |
988 | - # string. |
989 | + # string. This should never happen. |
990 | string_child = child = node |
991 | elif isinstance(node, Tag): |
992 | # Some other piece of code decided to pass in a Tag |
993 | # instead of creating an Element object to contain the |
994 | - # Tag. |
995 | + # Tag. This should never happen. |
996 | child = node |
997 | elif node.element.__class__ == NavigableString: |
998 | string_child = child = node.element |
999 | @@ -324,7 +373,7 @@ class Element(treebuilder_base.Node): |
1000 | child = node.element |
1001 | node.parent = self |
1002 | |
1003 | - if not isinstance(child, str) and child.parent is not None: |
1004 | + if not isinstance(child, str) and child is not None and child.parent is not None: |
1005 | node.element.extract() |
1006 | |
1007 | if (string_child is not None and self.element.contents |
1008 | @@ -359,14 +408,13 @@ class Element(treebuilder_base.Node): |
1009 | child, parent=self.element, |
1010 | most_recent_element=most_recent_element) |
1011 | |
1012 | - def getAttributes(self): |
1013 | + def getAttributes(self) -> AttrList: |
1014 | if isinstance(self.element, Comment): |
1015 | return {} |
1016 | return AttrList(self.element) |
1017 | |
1018 | - def setAttributes(self, attributes): |
1019 | + def setAttributes(self, attributes:Optional[Dict]) -> None: |
1020 | if attributes is not None and len(attributes) > 0: |
1021 | - converted_attributes = [] |
1022 | for name, value in list(attributes.items()): |
1023 | if isinstance(name, tuple): |
1024 | new_name = NamespacedAttribute(*name) |
1025 | @@ -386,14 +434,14 @@ class Element(treebuilder_base.Node): |
1026 | self.soup.builder.set_up_substitutions(self.element) |
1027 | attributes = property(getAttributes, setAttributes) |
1028 | |
1029 | - def insertText(self, data, insertBefore=None): |
1030 | + def insertText(self, data:str, insertBefore:Optional['Element']=None) -> None: |
1031 | text = TextNode(self.soup.new_string(data), self.soup) |
1032 | if insertBefore: |
1033 | self.insertBefore(text, insertBefore) |
1034 | else: |
1035 | self.appendChild(text) |
1036 | |
1037 | - def insertBefore(self, node, refNode): |
1038 | + def insertBefore(self, node:'Element', refNode:'Element') -> None: |
1039 | index = self.element.index(refNode.element) |
1040 | if (node.element.__class__ == NavigableString and self.element.contents |
1041 | and self.element.contents[index-1].__class__ == NavigableString): |
1042 | @@ -405,10 +453,10 @@ class Element(treebuilder_base.Node): |
1043 | self.element.insert(index, node.element) |
1044 | node.parent = self |
1045 | |
1046 | - def removeChild(self, node): |
1047 | + def removeChild(self, node:'Element') -> None: |
1048 | node.element.extract() |
1049 | |
1050 | - def reparentChildren(self, new_parent): |
1051 | + def reparentChildren(self, new_parent:'Element') -> None: |
1052 | """Move all of this tag's children into another tag.""" |
1053 | # print("MOVE", self.element.contents) |
1054 | # print("FROM", self.element) |
1055 | @@ -424,6 +472,10 @@ class Element(treebuilder_base.Node): |
1056 | if len(new_parent_element.contents) > 0: |
1057 | # The new parent already contains children. We will be |
1058 | # appending this tag's children to the end. |
1059 | + |
1060 | + # We can make this assertion since we know new_parent has |
1061 | + # children. |
1062 | + assert new_parents_last_descendant is not None |
1063 | new_parents_last_child = new_parent_element.contents[-1] |
1064 | new_parents_last_descendant_next_element = new_parents_last_descendant.next_element |
1065 | else: |
1066 | @@ -474,17 +526,21 @@ class Element(treebuilder_base.Node): |
1067 | # print("FROM", self.element) |
1068 | # print("TO", new_parent_element) |
1069 | |
1070 | - def cloneNode(self): |
1071 | + # TODO-TYPING: typeshed stubs are incorrect about this; |
1072 | + # cloneNode returns a new Node, not None. |
1073 | + def cloneNode(self) -> treebuilder_base.Node: |
1074 | tag = self.soup.new_tag(self.element.name, self.namespace) |
1075 | node = Element(tag, self.soup, self.namespace) |
1076 | for key,value in self.attributes: |
1077 | node.attributes[key] = value |
1078 | return node |
1079 | |
1080 | - def hasContent(self): |
1081 | - return self.element.contents |
1082 | + # TODO-TYPING: typeshed stubs are incorrect about this; |
1083 | + # hasContent returns a boolean, not None. |
1084 | + def hasContent(self) -> bool: |
1085 | + return len(self.element.contents) > 0 |
1086 | |
1087 | - def getNameTuple(self): |
1088 | + def getNameTuple(self) -> Tuple[str, str]: |
1089 | if self.namespace == None: |
1090 | return namespaces["html"], self.name |
1091 | else: |
1092 | @@ -493,10 +549,10 @@ class Element(treebuilder_base.Node): |
1093 | nameTuple = property(getNameTuple) |
1094 | |
1095 | class TextNode(Element): |
1096 | - def __init__(self, element, soup): |
1097 | + def __init__(self, element:PageElement, soup:'BeautifulSoup'): |
1098 | treebuilder_base.Node.__init__(self, None) |
1099 | self.element = element |
1100 | self.soup = soup |
1101 | |
1102 | - def cloneNode(self): |
1103 | - raise NotImplementedError |
1104 | + def cloneNode(self) -> treebuilder_base.Node: |
1105 | + raise NotImplementedError() |
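The Element and TextNode classes in this file adapt Beautiful Soup's Tag objects to the Node interface that html5lib's treebuilder drives, including the detail that adjacent strings are merged into one text node rather than stored separately. A dependency-free sketch of that adapter idea (all names here are illustrative, not part of bs4):

```python
# A minimal sketch of the adapter pattern used by TreeBuilderForHtml5lib:
# html5lib drives a generic Node-style interface, and each adapter wraps
# the real tree object underneath. SketchElement is our name, not bs4's.

class SketchElement:
    """Wraps an underlying element behind a small Node-style interface."""

    def __init__(self, name):
        self.name = name
        self.contents = []  # children: SketchElement objects or plain strings

    def appendChild(self, node):
        # Mirror the string-merging behavior: if both the last child and
        # the incoming node are plain strings, merge them into a single
        # text node instead of appending a second one.
        if (self.contents and isinstance(self.contents[-1], str)
                and isinstance(node, str)):
            self.contents[-1] = self.contents[-1] + node
        else:
            self.contents.append(node)

    def hasContent(self):
        # Like the typed hasContent() above, return a real boolean.
        return len(self.contents) > 0


root = SketchElement("p")
root.appendChild("Hello, ")
root.appendChild("world")            # merged with the previous string
root.appendChild(SketchElement("b"))
print(root.contents[0])              # the two strings arrive as one node
```

The merge step is the same reason insertBefore() in the real code checks whether the preceding sibling is a NavigableString before inserting.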
1106 | diff --git a/bs4/builder/_htmlparser.py b/bs4/builder/_htmlparser.py |
1107 | index 291f6c6..91cecf7 100644 |
1108 | --- a/bs4/builder/_htmlparser.py |
1109 | +++ b/bs4/builder/_htmlparser.py |
1110 | @@ -188,7 +188,7 @@ class BeautifulSoupHTMLParser(HTMLParser, DetectsXMLParsedAsHTML): |
1111 | # later on. If so, we want to ignore it. |
1112 | self.already_closed_empty_element.append(name) |
1113 | |
1114 | - if self._root_tag is None: |
1115 | + if self._root_tag_name is None: |
1116 | self._root_tag_encountered(name) |
1117 | |
1118 | def handle_endtag(self, name:str, check_already_closed:bool=True) -> None: |
1119 | @@ -422,13 +422,23 @@ class HTMLParserTreeBuilder(HTMLTreeBuilder): |
1120 | dammit.declared_html_encoding, |
1121 | dammit.contains_replacement_characters) |
1122 | |
1123 | - def feed(self, markup:str): |
1124 | + def feed(self, markup:_RawMarkup) -> None: |
1125 | args, kwargs = self.parser_args |
1126 | + |
1127 | + # HTMLParser.feed will only handle str, but |
1128 | + # BeautifulSoup.markup is allowed to be _RawMarkup, because |
1129 | + # it's set by the yield value of |
1130 | + # TreeBuilder.prepare_markup. Fortunately, |
1131 | + # HTMLParserTreeBuilder.prepare_markup always yields a str |
1132 | + # (UnicodeDammit.unicode_markup). |
1133 | + assert isinstance(markup, str) |
1134 | + |
1135 | # We know BeautifulSoup calls TreeBuilder.initialize_soup |
1136 | # before calling feed(), so we can assume self.soup |
1137 | # is set. |
1138 | assert self.soup is not None |
1139 | parser = BeautifulSoupHTMLParser(self.soup, *args, **kwargs) |
1140 | + |
1141 | try: |
1142 | parser.feed(markup) |
1143 | parser.close() |
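The assert added to HTMLParserTreeBuilder.feed() is a standard type-narrowing idiom: a checker such as mypy treats code after `assert isinstance(x, str)` as if `x` were declared str. A minimal stdlib illustration (the function name and signature are ours, not bs4's):

```python
from typing import Union

def feed_sketch(markup: Union[str, bytes]) -> int:
    # Callers in this sketch always pass str; the assert documents that
    # invariant at runtime and lets a static checker narrow the type
    # from Union[str, bytes] to str for the rest of the function.
    assert isinstance(markup, str)
    return len(markup.splitlines())

lines = feed_sketch("first\nsecond")
print(lines)  # → 2
```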
1144 | diff --git a/bs4/builder/_lxml.py b/bs4/builder/_lxml.py |
1145 | index ba87e87..3dfe88a 100644 |
1146 | --- a/bs4/builder/_lxml.py |
1147 | +++ b/bs4/builder/_lxml.py |
1148 | @@ -269,7 +269,7 @@ class LXMLTreeBuilderForXML(TreeBuilder): |
1149 | for encoding in detector.encodings: |
1150 | yield (detector.markup, encoding, document_declared_encoding, False) |
1151 | |
1152 | - def feed(self, markup:Union[bytes,str]) -> None: |
1153 | + def feed(self, markup:_RawMarkup) -> None: |
1154 | io: IO |
1155 | if isinstance(markup, bytes): |
1156 | io = BytesIO(markup) |
1157 | diff --git a/bs4/diagnose.py b/bs4/diagnose.py |
1158 | index 201b879..c2202ad 100644 |
1159 | --- a/bs4/diagnose.py |
1160 | +++ b/bs4/diagnose.py |
1161 | @@ -9,7 +9,15 @@ from html.parser import HTMLParser |
1162 | import bs4 |
1163 | from bs4 import BeautifulSoup, __version__ |
1164 | from bs4.builder import builder_registry |
1165 | -from typing import TYPE_CHECKING |
1166 | +from typing import ( |
1167 | + Any, |
1168 | + IO, |
1169 | + List, |
1170 | + Optional, |
1171 | + Tuple, |
1172 | + TYPE_CHECKING, |
1173 | +) |
1174 | + |
1175 | if TYPE_CHECKING: |
1176 | from bs4._typing import _IncomingMarkup |
1177 | |
1178 | @@ -78,7 +86,7 @@ def diagnose(data:_IncomingMarkup) -> None: |
1179 | |
1180 | print(("-" * 80)) |
1181 | |
1182 | -def lxml_trace(data, html:bool=True, **kwargs) -> None: |
1183 | +def lxml_trace(data:_IncomingMarkup, html:bool=True, **kwargs:Any) -> None: |
1184 | """Print out the lxml events that occur during parsing. |
1185 | |
1186 | This lets you see how lxml parses a document when no Beautiful |
1187 | @@ -94,7 +102,8 @@ def lxml_trace(data, html:bool=True, **kwargs) -> None: |
1188 | recover = kwargs.pop('recover', True) |
1189 | if isinstance(data, str): |
1190 | data = data.encode("utf8") |
1191 | - reader = BytesIO(data) |
1192 | + # bytes (produced above if data was a str) get wrapped; |
1193 | + reader = BytesIO(data) if isinstance(data, bytes) else data |
1194 | for event, element in etree.iterparse( |
1195 | reader, html=html, recover=recover, **kwargs |
1196 | ): |
1197 | @@ -108,37 +117,40 @@ class AnnouncingParser(HTMLParser): |
1198 | document. The easiest way to do this is to call `htmlparser_trace`. |
1199 | """ |
1200 | |
1201 | - def _p(self, s): |
1202 | + def _p(self, s:str) -> None: |
1203 | print(s) |
1204 | |
1205 | - def handle_starttag(self, name, attrs): |
1206 | + def handle_starttag( |
1207 | + self, name:str, attrs:List[Tuple[str, Optional[str]]], |
1208 | + handle_empty_element:bool=True |
1209 | + ) -> None: |
1210 | self._p(f"{name} {attrs} START") |
1211 | |
1212 | - def handle_endtag(self, name): |
1213 | + def handle_endtag(self, name:str, check_already_closed:bool=True) -> None: |
1214 | self._p("%s END" % name) |
1215 | |
1216 | - def handle_data(self, data): |
1217 | + def handle_data(self, data:str) -> None: |
1218 | self._p("%s DATA" % data) |
1219 | |
1220 | - def handle_charref(self, name): |
1221 | + def handle_charref(self, name:str) -> None: |
1222 | self._p("%s CHARREF" % name) |
1223 | |
1224 | - def handle_entityref(self, name): |
1225 | + def handle_entityref(self, name:str) -> None: |
1226 | self._p("%s ENTITYREF" % name) |
1227 | |
1228 | - def handle_comment(self, data): |
1229 | + def handle_comment(self, data:str) -> None: |
1230 | self._p("%s COMMENT" % data) |
1231 | |
1232 | - def handle_decl(self, data): |
1233 | + def handle_decl(self, data:str) -> None: |
1234 | self._p("%s DECL" % data) |
1235 | |
1236 | - def unknown_decl(self, data): |
1237 | + def unknown_decl(self, data:str) -> None: |
1238 | self._p("%s UNKNOWN-DECL" % data) |
1239 | |
1240 | - def handle_pi(self, data): |
1241 | + def handle_pi(self, data:str) -> None: |
1242 | self._p("%s PI" % data) |
1243 | |
1244 | -def htmlparser_trace(data): |
1245 | +def htmlparser_trace(data:str) -> None: |
1246 | """Print out the HTMLParser events that occur during parsing. |
1247 | |
1248 | This lets you see how HTMLParser parses a document when no |
1249 | @@ -226,7 +238,7 @@ def benchmark_parsers(num_elements:int=100000) -> None: |
1250 | b = time.time() |
1251 | print(("Raw html5lib parsed the markup in %.2fs." % (b-a))) |
1252 | |
1253 | -def profile(num_elements:int=100000, parser:str="lxml"): |
1254 | +def profile(num_elements:int=100000, parser:str="lxml") -> None: |
1255 | """Use Python's profiler on a randomly generated document.""" |
1256 | filehandle = tempfile.NamedTemporaryFile() |
1257 | filename = filehandle.name |
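AnnouncingParser above simply prints each html.parser event as it fires. This stdlib-only variant records the events in a list instead, showing the same tracing idea in an assertable form; the class name is ours, not part of bs4.diagnose:

```python
from html.parser import HTMLParser

class EventCollectingParser(HTMLParser):
    """Collects parser events rather than printing them."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_starttag(self, name, attrs):
        self.events.append(("START", name, dict(attrs)))

    def handle_endtag(self, name):
        self.events.append(("END", name))

    def handle_data(self, data):
        self.events.append(("DATA", data))

parser = EventCollectingParser()
parser.feed('<p class="x">hi</p>')
parser.close()
print(parser.events)
# → [('START', 'p', {'class': 'x'}), ('DATA', 'hi'), ('END', 'p')]
```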
1258 | diff --git a/bs4/element.py b/bs4/element.py |
1259 | index 83f4882..f4ab89c 100644 |
1260 | --- a/bs4/element.py |
1261 | +++ b/bs4/element.py |
1262 | @@ -44,6 +44,7 @@ if TYPE_CHECKING: |
1263 | from bs4 import BeautifulSoup |
1264 | from bs4.builder import TreeBuilder |
1265 | from bs4.dammit import _Encoding |
1266 | + from bs4.filter import ElementFilter |
1267 | from bs4.formatter import ( |
1268 | _EntitySubstitutionFunction, |
1269 | _FormatterOrName, |
1270 | @@ -901,7 +902,7 @@ class PageElement(object): |
1271 | limit:Optional[int], |
1272 | generator:Iterator[PageElement], |
1273 | _stacklevel:int=3, |
1274 | - **kwargs:_StrainableAttribute) -> ResultSet[PageElement]: |
1275 | + **kwargs:_StrainableAttribute) -> ResultSet[PageElement]: |
1276 | """Iterates over a generator looking for things that match.""" |
1277 | results: ResultSet[PageElement] |
1278 | |
1279 | @@ -912,11 +913,11 @@ class PageElement(object): |
1280 | DeprecationWarning, stacklevel=_stacklevel |
1281 | ) |
1282 | |
1283 | - from bs4.strainer import SoupStrainer |
1284 | - if isinstance(name, SoupStrainer): |
1285 | - strainer = name |
1286 | + from bs4.filter import ElementFilter |
1287 | + if isinstance(name, ElementFilter): |
1288 | + matcher = name |
1289 | else: |
1290 | - strainer = SoupStrainer(name, attrs, string, **kwargs) |
1291 | + matcher = SoupStrainer(name, attrs, string, **kwargs) |
1292 | |
1293 | result: Iterable[PageElement] |
1294 | if string is None and not limit and not attrs and not kwargs: |
1295 | @@ -924,7 +925,7 @@ class PageElement(object): |
1296 | # Optimization to find all tags. |
1297 | result = (element for element in generator |
1298 | if isinstance(element, Tag)) |
1299 | - return ResultSet(strainer, result) |
1300 | + return ResultSet(matcher, result) |
1301 | elif isinstance(name, str): |
1302 | # Optimization to find all tags with a given name. |
1303 | if name.count(':') == 1: |
1304 | @@ -945,22 +946,25 @@ class PageElement(object): |
1305 | ) |
1306 | ): |
1307 | result.append(element) |
1308 | - return ResultSet(strainer, result) |
1309 | + return ResultSet(matcher, result) |
1310 | + return self.match(generator, matcher, limit) |
1311 | + |
1312 | + def match(self, generator:Iterator[PageElement], matcher:ElementFilter, limit:Optional[int]=None) -> ResultSet[PageElement]: |
1313 | + """The most generic search method offered by Beautiful Soup. |
1314 | |
1315 | - results = ResultSet(strainer) |
1316 | + You can pass in your own technique for iterating over the tree, and your own |
1317 | + technique for matching items. |
1318 | + """ |
1319 | + results:ResultSet = ResultSet(matcher) |
1320 | while True: |
1321 | try: |
1322 | i = next(generator) |
1323 | except StopIteration: |
1324 | break |
1325 | if i: |
1326 | - # TODO: SoupStrainer.search is a confusing method |
1327 | - # that needs to be redone, and this is where |
1328 | - # it's being used. |
1329 | - found = strainer.search(i) |
1330 | - if found: |
1331 | - results.append(found) |
1332 | - if limit and len(results) >= limit: |
1333 | + if matcher.match(i): |
1334 | + results.append(i) |
1335 | + if limit is not None and len(results) >= limit: |
1336 | break |
1337 | return results |
1338 | |
1339 | @@ -1254,7 +1258,7 @@ class Declaration(PreformattedString): |
1340 | class Doctype(PreformattedString): |
1341 | """A `document type declaration <https://www.w3.org/TR/REC-xml/#dt-doctype>`_.""" |
1342 | @classmethod |
1343 | - def for_name_and_ids(cls, name:str, pub_id:str, system_id:str) -> Doctype: |
1344 | + def for_name_and_ids(cls, name:str, pub_id:Optional[str], system_id:Optional[str]) -> Doctype: |
1345 | """Generate an appropriate document type declaration for a given |
1346 | public ID and system ID. |
1347 | |
1348 | @@ -2503,12 +2507,12 @@ class Tag(PageElement): |
1349 | _PageElementT = TypeVar("_PageElementT", bound=PageElement) |
1350 | class ResultSet(List[_PageElementT], Generic[_PageElementT]): |
1351 | """A ResultSet is a list of `PageElement` objects, gathered as the result |
1352 | - of matching a `SoupStrainer` against a parse tree. Basically, a list of |
1353 | + of matching an `ElementFilter` against a parse tree. Basically, a list of |
1354 | search results. |
1355 | """ |
1356 | - source: Optional[SoupStrainer] |
1357 | + source: Optional[ElementFilter] |
1358 | |
1359 | - def __init__(self, source:Optional[SoupStrainer], result: Iterable[_PageElementT]=()) -> None: |
1360 | + def __init__(self, source:Optional[ElementFilter], result: Iterable[_PageElementT]=()) -> None: |
1361 | super(ResultSet, self).__init__(result) |
1362 | self.source = source |
1363 | |
1364 | @@ -2522,4 +2526,4 @@ class ResultSet(List[_PageElementT], Generic[_PageElementT]): |
1365 | # import SoupStrainer itself into this module to preserve the |
1366 | # backwards compatibility of anyone who imports |
1367 | # bs4.element.SoupStrainer. |
1368 | -from bs4.strainer import SoupStrainer |
1369 | +from bs4.filter import SoupStrainer |
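The new PageElement.match() boils down to a small control loop: pull items from any iterator, keep the ones the matcher accepts, and stop early at an optional limit. A dependency-free sketch of that loop (match_sketch is an illustrative name, not a bs4 API):

```python
def match_sketch(generator, match, limit=None):
    """Collect items the match callable accepts, up to limit.

    A sketch of the control flow in PageElement.match(); the real
    method also skips falsy items before applying the matcher.
    """
    results = []
    for item in generator:
        if match(item):
            results.append(item)
            if limit is not None and len(results) >= limit:
                break
    return results

odds = match_sketch(iter(range(20)), lambda n: n % 2 == 1, limit=3)
print(odds)  # → [1, 3, 5]
```

Note the `limit is not None` check, which matches the corrected comparison in the diff: a limit of 0 no longer means "no limit".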
1370 | diff --git a/bs4/strainer.py b/bs4/filter.py |
1371 | similarity index 60% |
1372 | rename from bs4/strainer.py |
1373 | rename to bs4/filter.py |
1374 | index 15b289c..74e26d9 100644 |
1375 | --- a/bs4/strainer.py |
1376 | +++ b/bs4/filter.py |
1377 | @@ -25,6 +25,10 @@ from bs4._deprecation import _deprecated |
1378 | from bs4.element import NavigableString, PageElement, Tag |
1379 | from bs4._typing import ( |
1380 | _AttributeValue, |
1381 | + _AttributeValues, |
1382 | + _AllowStringCreationFunction, |
1383 | + _AllowTagCreationFunction, |
1384 | + _PageElementMatchFunction, |
1385 | _TagMatchFunction, |
1386 | _StringMatchFunction, |
1387 | _StrainableElement, |
1388 | @@ -33,13 +37,96 @@ from bs4._typing import ( |
1389 | _StrainableString, |
1390 | ) |
1391 | |
1392 | + |
1393 | +class ElementFilter(object): |
1394 | + """ElementFilters encapsulate the logic necessary to decide: |
1395 | + |
1396 | + 1. whether a PageElement (a tag or a string) matches a |
1397 | + user-specified query. |
1398 | + |
1399 | + 2. whether a given sequence of markup found during initial parsing |
1400 | + should be turned into a PageElement, or simply discarded. |
1401 | + |
1402 | + The base class is the simplest ElementFilter. By default, it |
1403 | + matches everything and allows all PageElements to be created. You |
1404 | + can make it more selective by passing in user-defined functions. |
1405 | + |
1406 | + Most users of Beautiful Soup will never need to use |
1407 | + ElementFilter, or its more capable subclass |
1408 | + SoupStrainer. Instead, they will use the find_* methods, which |
1409 | + will convert their arguments into SoupStrainer objects and run them |
1410 | + against the tree. |
1411 | + """ |
1412 | + match_function: Optional[_PageElementMatchFunction] |
1413 | + allow_tag_creation_function: Optional[_AllowTagCreationFunction] |
1414 | + allow_string_creation_function: Optional[_AllowStringCreationFunction] |
1415 | + |
1416 | + def __init__( |
1417 | + self, match_function:Optional[_PageElementMatchFunction]=None, |
1418 | + allow_tag_creation_function:Optional[_AllowTagCreationFunction]=None, |
1419 | + allow_string_creation_function:Optional[_AllowStringCreationFunction]=None): |
1420 | + self.match_function = match_function |
1421 | + self.allow_tag_creation_function = allow_tag_creation_function |
1422 | + self.allow_string_creation_function = allow_string_creation_function |
1423 | + |
1424 | + @property |
1425 | + def excludes_everything(self) -> bool: |
1426 | + """Does this ElementFilter obviously exclude everything? If |
1427 | + so, Beautiful Soup will issue a warning if you try to use it |
1428 | + when parsing a document. |
1429 | + |
1430 | + The ElementFilter might turn out to exclude everything even |
1431 | + if this returns False, but it won't do so in an obvious way. |
1432 | + |
1433 | + The default ElementFilter excludes *nothing*, and we don't |
1434 | + have any way of answering questions about more complex |
1435 | + ElementFilters without running their hook functions, so the |
1436 | + base implementation always returns False. |
1437 | + """ |
1438 | + return False |
1439 | + |
1440 | + def match(self, element:PageElement) -> bool: |
1441 | + """Does the given PageElement match the rules set down by this |
1442 | + ElementFilter? |
1443 | + |
1444 | + The base implementation delegates to the function passed in to |
1445 | + the constructor. |
1446 | + """ |
1447 | + if not self.match_function: |
1448 | + return True |
1449 | + return self.match_function(element) |
1450 | + |
1451 | + def allow_tag_creation( |
1452 | + self, nsprefix:Optional[str], name:str, |
1453 | + attrs:Optional[_AttributeValues] |
1454 | + ) -> bool: |
1455 | + """Based on the name and attributes of a tag, see whether this |
1456 | + ElementFilter will allow a Tag object to even be created. |
1457 | + |
1458 | + :param nsprefix: The namespace prefix of the prospective tag. |
1459 | + :param name: The name of the prospective tag. |
1460 | + :param attrs: The attributes of the prospective tag. |
1461 | + if not self.allow_tag_creation_function: |
1462 | + return True |
1463 | + return self.allow_tag_creation_function(nsprefix, name, attrs) |
1464 | + |
1465 | + def allow_string_creation(self, string:str) -> bool: |
1466 | + if not self.allow_string_creation_function: |
1467 | + return True |
1468 | + return self.allow_string_creation_function(string) |
1469 | + |
1470 | + |
1471 | class MatchRule(object): |
1472 | + """Each MatchRule encapsulates the logic behind a single argument |
1473 | + passed in to one of the Beautiful Soup find* methods. |
1474 | + """ |
1475 | + |
1476 | string: Optional[str] |
1477 | pattern: Optional[Pattern[str]] |
1478 | present: Optional[bool] |
1479 | - |
1480 | - # All MatchRule objects also have an attribute ``function``, but |
1481 | - # the type of the function depends on the subclass. |
1482 | + # TODO-TYPING: All MatchRule objects also have an attribute |
1483 | + # ``function``, but the type of the function depends on the |
1484 | + # subclass. |
1485 | |
1486 | def __init__( |
1487 | self, |
1488 | @@ -72,7 +159,7 @@ class MatchRule(object): |
1489 | "At most one of string, pattern, function and present must be provided." |
1490 | ) |
1491 | |
1492 | - def _base_match(self, string:str) -> Optional[bool]: |
1493 | + def _base_match(self, string:Optional[str]) -> Optional[bool]: |
1494 | """Run the 'cheap' portion of a match, trying to get an answer without |
1495 | calling a potentially expensive custom function. |
1496 | |
1497 | @@ -101,7 +188,7 @@ class MatchRule(object): |
1498 | |
1499 | return None |
1500 | |
1501 | - def matches_string(self, string:str) -> bool: |
1502 | + def matches_string(self, string:Optional[str]) -> bool: |
1503 | _base_result = self._base_match(string) |
1504 | if _base_result is not None: |
1505 | # No need to invoke the test function. |
1506 | @@ -125,6 +212,7 @@ class MatchRule(object): |
1507 | ) |
1508 | |
1509 | class TagNameMatchRule(MatchRule): |
1510 | + """A MatchRule implementing the rules for matches against tag name.""" |
1511 | function: Optional[_TagMatchFunction] |
1512 | |
1513 | def matches_tag(self, tag:Tag) -> bool: |
1514 | @@ -140,19 +228,25 @@ class TagNameMatchRule(MatchRule): |
1515 | return False |
1516 | |
1517 | class AttributeValueMatchRule(MatchRule): |
1518 | + """A MatchRule implementing the rules for matches against attribute value.""" |
1519 | function: Optional[_StringMatchFunction] |
1520 | |
1521 | class StringMatchRule(MatchRule): |
1522 | + """A MatchRule implementing the rules for matches against a NavigableString.""" |
1523 | function: Optional[_StringMatchFunction] |
1524 | |
1525 | -class SoupStrainer(object): |
1526 | - """Encapsulates a number of ways of matching a markup element (a tag |
1527 | - or a string). |
1528 | +class SoupStrainer(ElementFilter): |
1529 | + """The ElementFilter subclass used internally by Beautiful Soup. |
1530 | |
1531 | - These are primarily created internally and used to underpin the |
1532 | - find_* methods, but you can create one yourself and pass it in as |
1533 | - ``parse_only`` to the `BeautifulSoup` constructor, to parse a |
1534 | - subset of a large document. |
1535 | + A SoupStrainer encapsulates the logic necessary to perform the |
1536 | + kind of matches supported by the find_* methods. SoupStrainers are |
1537 | + primarily created internally, but you can create one yourself and |
1538 | + pass it in as ``parse_only`` to the `BeautifulSoup` constructor, |
1539 | + to parse a subset of a large document. |
1540 | + |
1541 | + Internally, SoupStrainer objects work by converting the |
1542 | + constructor arguments into MatchRule objects. Incoming |
1543 | + tags/markup are matched against those rules. |
1544 | |
1545 | :param name: One or more restrictions on the tags found in a |
1546 | document. |
1547 | @@ -226,6 +320,17 @@ class SoupStrainer(object): |
1548 | self.__string = string |
1549 | |
1550 | @property |
1551 | + def excludes_everything(self) -> bool: |
1552 | + """Check whether the provided rules will obviously exclude |
1553 | + everything. (They might exclude everything even if this returns False, |
1554 | + but not in an obvious way.) |
1555 | + """ |
1556 | + return True if ( |
1557 | + self.string_rules and |
1558 | + (self.name_rules or self.attribute_rules) |
1559 | + ) else False |
1560 | + |
1561 | + @property |
1562 | def string(self) -> Optional[_StrainableString]: |
1563 | ":meta private:" |
1564 | warnings.warn(f"Access to deprecated property string. (Look at .string_rules instead) -- Deprecated since version 4.13.0.", DeprecationWarning, stacklevel=2) |
1565 | @@ -262,6 +367,15 @@ class SoupStrainer(object): |
1566 | yield rule_class(function=obj) |
1567 | elif isinstance(obj, Pattern): |
1568 | yield rule_class(pattern=obj) |
1569 | + elif hasattr(obj, 'search'): |
1570 | + # We do a little duck typing here to detect usage of the |
1571 | + # third-party regex library, whose pattern objects doesn't |
1572 | + # derive from re.Pattern. |
1573 | + # |
1574 | + # TODO-TYPING: Once we drop support for Python 3.7, we |
1575 | + # might be able to address this by defining an appropriate |
1576 | + # Protocol. |
1577 | + yield rule_class(pattern=obj) |
1578 | elif hasattr(obj, '__iter__'): |
1579 | for o in obj: |
1580 | if not isinstance(o, (bytes, str)) and hasattr(o, '__iter__'): |
1581 | @@ -358,7 +472,7 @@ class SoupStrainer(object): |
1582 | else: |
1583 | attr_values = [cast(str, attr_value)] |
1584 | |
1585 | - def _match_attribute_value_helper(attr_values:Sequence[Optional[str]]): |
1586 | + def _match_attribute_value_helper(attr_values:Sequence[Optional[str]]) -> bool: |
1587 | for rule in rules: |
1588 | for attr_value in attr_values: |
1589 | if rule.matches_string(attr_value): |
1590 | @@ -382,8 +496,8 @@ class SoupStrainer(object): |
1591 | [joined_attr_value] |
1592 | ) |
1593 | return this_attr_match |
1594 | - |
1595 | - def allow_tag_creation(self, nsprefix:Optional[str], name:str, attrs:Optional[dict[str, str]]) -> bool: |
1596 | + |
1597 | + def allow_tag_creation(self, nsprefix:Optional[str], name:str, attrs:Optional[_AttributeValues]) -> bool: |
1598 | """Based on the name and attributes of a tag, see whether this |
1599 | SoupStrainer will allow a Tag object to even be created. |
1600 | |
1601 | @@ -423,17 +537,25 @@ class SoupStrainer(object): |
1602 | return True |
1603 | |
1604 | def allow_string_creation(self, string:str) -> bool: |
1605 | + """Based on the content of a markup string, see whether this |
1606 | + SoupStrainer will allow it to be instantiated as a |
1607 | + NavigableString object, or whether it should be ignored. |
1608 | + """ |
1609 | if self.name_rules or self.attribute_rules: |
1610 | # A SoupStrainer that has name or attribute rules won't |
1611 | # match any strings; it's designed to match tags with |
1612 | # certain properties. |
1613 | return False |
1614 | + if not self.string_rules: |
1615 | + # A SoupStrainer with no string rules will match |
1616 | + # all strings. |
1617 | + return True |
1618 | if not self.matches_any_string_rule(string): |
1619 | return False |
1620 | return True |
1621 | |
1622 | def matches_any_string_rule(self, string:str) -> bool: |
1623 | - """See whether the content of a string, matches any of |
1624 | + """See whether the content of a string matches any of |
1625 | this SoupStrainer's string rules. |
1626 | """ |
1627 | if not self.string_rules: |
1628 | @@ -442,28 +564,37 @@ class SoupStrainer(object): |
1629 | if string_rule.matches_string(string): |
1630 | return True |
1631 | return False |
1632 | - |
1633 | - |
1634 | + |
1635 | + def match(self, element:PageElement) -> bool: |
1636 | + """Does the given PageElement match the rules set down by this |
1637 | + SoupStrainer? |
1638 | + |
1639 | + The find_* methods rely heavily on this method to find matches. |
1640 | + |
1641 | + :param element: A PageElement. |
1642 | + :return: True if the element matches this SoupStrainer's rules; False otherwise. |
1643 | + """ |
1644 | + if isinstance(element, Tag): |
1645 | + return self.matches_tag(element) |
1646 | + assert isinstance(element, NavigableString) |
1647 | + if not (self.name_rules or self.attribute_rules): |
1648 | + # A NavigableString can only match a SoupStrainer that |
1649 | + # does not define any name or attribute restrictions. |
1650 | + for rule in self.string_rules: |
1651 | + if rule.matches_string(element): |
1652 | + return True |
1653 | + return False |
1654 | + |
1655 | @_deprecated("allow_tag_creation", "4.13.0") |
1656 | - def search_tag(self, name, attrs): |
1657 | + def search_tag(self, name:str, attrs:Optional[_AttributeValues]) -> bool: |
1658 | + """A less elegant version of allow_tag_creation().""" |
1659 | ":meta private:" |
1660 | return self.allow_tag_creation(None, name, attrs) |
1661 | |
1662 | - def search(self, element:PageElement): |
1663 | - # TODO: This method needs to be removed or redone. It is |
1664 | - # very confusing but it's used everywhere. |
1665 | - match = None |
1666 | - if isinstance(element, Tag): |
1667 | - match = self.matches_tag(element) |
1668 | - else: |
1669 | - assert isinstance(element, NavigableString) |
1670 | - match = False |
1671 | - if not (self.name_rules or self.attribute_rules): |
1672 | - # A NavigableString can only match a SoupStrainer that |
1673 | - # does not define any name or attribute restrictions. |
1674 | - for rule in self.string_rules: |
1675 | - if rule.matches_string(element): |
1676 | - match = True |
1677 | - break |
1678 | - return element if match else False |
1679 | + @_deprecated("match", "4.13.0") |
1680 | + def search(self, element:PageElement) -> Optional[PageElement]: |
1681 | + """A less elegant version of match(). |
1682 | |
1683 | + :meta private: |
1684 | + """ |
1685 | + return element if self.match(element) else None |
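The MatchRule changes above split each check into a cheap `_base_match` pass (string, pattern, or presence) and a fallback call to a possibly expensive custom function. Here is a self-contained sketch of that precedence; it is illustrative only, with names and the unconfigured-rule default being assumptions rather than the bs4 source:

```python
import re
from typing import Callable, Optional, Pattern, Union

class MiniMatchRule:
    """Illustrative sketch of the MatchRule precedence in bs4/filter.py.

    At most one of string, pattern, function, or present may be set.
    """
    def __init__(
        self,
        string: Optional[str] = None,
        pattern: Union[str, Pattern[str], None] = None,
        function: Optional[Callable[[Optional[str]], bool]] = None,
        present: Optional[bool] = None,
    ):
        self.string = string
        # Accept either a precompiled pattern or a pattern source string.
        self.pattern = re.compile(pattern) if isinstance(pattern, str) else pattern
        self.function = function
        self.present = present
        if sum(x is not None for x in (string, pattern, function, present)) > 1:
            raise ValueError(
                "At most one of string, pattern, function and present must be provided."
            )

    def _base_match(self, s: Optional[str]) -> Optional[bool]:
        # The 'cheap' portion: try to answer without calling the function.
        if self.present is not None:
            return (s is not None) == self.present
        if self.string is not None:
            return s == self.string
        if self.pattern is not None:
            return s is not None and self.pattern.search(s) is not None
        return None  # No cheap answer; defer to the function, if any.

    def matches_string(self, s: Optional[str]) -> bool:
        base = self._base_match(s)
        if base is not None:
            return base
        if self.function is not None:
            return bool(self.function(s))
        # Assumption for this sketch: a rule with no criteria matches everything.
        return True
```

The two-phase structure is the point: `matches_string` only invokes the user's function when none of the cheap criteria can decide the match.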
1686 | diff --git a/bs4/tests/__init__.py b/bs4/tests/__init__.py |
1687 | index 2ef7fd8..3ef999d 100644 |
1688 | --- a/bs4/tests/__init__.py |
1689 | +++ b/bs4/tests/__init__.py |
1690 | @@ -20,7 +20,7 @@ from bs4.element import ( |
1691 | Stylesheet, |
1692 | Tag |
1693 | ) |
1694 | -from bs4.strainer import SoupStrainer |
1695 | +from bs4.filter import SoupStrainer |
1696 | from bs4.builder import ( |
1697 | DetectsXMLParsedAsHTML, |
1698 | XMLParsedAsHTMLWarning, |
1699 | diff --git a/bs4/tests/test_strainer.py b/bs4/tests/test_filter.py |
1700 | similarity index 56% |
1701 | rename from bs4/tests/test_strainer.py |
1702 | rename to bs4/tests/test_filter.py |
1703 | index 4de03f0..8d5da70 100644 |
1704 | --- a/bs4/tests/test_strainer.py |
1705 | +++ b/bs4/tests/test_filter.py |
1706 | @@ -6,20 +6,108 @@ from . import ( |
1707 | SoupTest, |
1708 | ) |
1709 | from bs4.element import Tag |
1710 | -from bs4.strainer import ( |
1711 | +from bs4.filter import ( |
1712 | AttributeValueMatchRule, |
1713 | + ElementFilter, |
1714 | MatchRule, |
1715 | SoupStrainer, |
1716 | StringMatchRule, |
1717 | TagNameMatchRule, |
1718 | ) |
1719 | |
1720 | -class TestMatchrule(SoupTest): |
1721 | +class TestElementFilter(SoupTest): |
1722 | + |
1723 | + def test_default_behavior(self): |
1724 | + # An unconfigured ElementFilter matches absolutely everything. |
1725 | + selector = ElementFilter() |
1726 | + assert not selector.excludes_everything |
1727 | + soup = self.soup("<a>text</a>") |
1728 | + tag = soup.a |
1729 | + string = tag.string |
1730 | + assert True == selector.match(soup) |
1731 | + assert True == selector.match(tag) |
1732 | + assert True == selector.match(string) |
1733 | + assert soup.find(selector).name == "a" |
1734 | + |
1735 | + # And allows any incoming markup to be turned into PageElements. |
1736 | + assert True == selector.allow_tag_creation(None, "tag", None) |
1737 | + assert True == selector.allow_string_creation("some string") |
1738 | + |
1739 | + def test_match(self): |
1740 | + def m(pe): |
1741 | + return (pe.string == "allow" or ( |
1742 | + isinstance(pe, Tag) and pe.name=="allow")) |
1743 | + |
1744 | + soup = self.soup("<allow>deny</allow>allow<deny>deny</deny>") |
1745 | + allow_tag = soup.allow |
1746 | + allow_string = soup.find(string="allow") |
1747 | + deny_tag = soup.deny |
1748 | + deny_string = soup.find(string="deny") |
1749 | + |
1750 | + selector = ElementFilter(match_function=m) |
1751 | + assert True == selector.match(allow_tag) |
1752 | + assert True == selector.match(allow_string) |
1753 | + assert False == selector.match(deny_tag) |
1754 | + assert False == selector.match(deny_string) |
1755 | + |
1756 | + # Since only the match function was provided, there is |
1757 | + # no effect on tag or string creation. |
1758 | + soup = self.soup("<a>text</a>", parse_only=selector) |
1759 | + assert "text" == soup.a.string |
1760 | + |
1761 | + def test_allow_tag_creation(self): |
1762 | + def m(nsprefix, name, attrs): |
1763 | + return nsprefix=="allow" or name=="allow" or "allow" in attrs |
1764 | + selector = ElementFilter(allow_tag_creation_function=m) |
1765 | + f = selector.allow_tag_creation |
1766 | + assert True == f("allow", "ignore", {}) |
1767 | + assert True == f("ignore", "allow", {}) |
1768 | + assert True == f(None, "ignore", {"allow": "1"}) |
1769 | + assert False == f("no", "no", {"no" : "nope"}) |
1770 | + |
1771 | + # Test the ElementFilter as a value for parse_only. |
1772 | + soup = self.soup( |
1773 | + "<deny>deny</deny> <allow>deny</allow> allow", |
1774 | + parse_only=selector |
1775 | + ) |
1776 | |
1777 | - def _tuple(self, rule): |
1778 | - if isinstance(rule.pattern, str): |
1779 | - import pdb; pdb.set_trace() |
1780 | + # The <deny> tag was filtered out, but there was no effect on |
1781 | + # the strings, since only allow_tag_creation_function was |
1782 | + # defined. |
1783 | + assert 'deny <allow>deny</allow> allow' == soup.decode() |
1784 | + |
1785 | + # Similarly, since match_function was not defined, this |
1786 | + # ElementFilter matches everything. |
1787 | + assert soup.find(selector) == "deny" |
1788 | + |
1789 | + def test_allow_string_creation(self): |
1790 | + def m(s): |
1791 | + return s=="allow" |
1792 | + selector = ElementFilter(allow_string_creation_function=m) |
1793 | + f = selector.allow_string_creation |
1794 | + assert True == f("allow") |
1795 | + assert False == f("deny") |
1796 | + assert False == f("please allow") |
1797 | + |
1798 | + # Test the ElementFilter as a value for parse_only. |
1799 | + soup = self.soup( |
1800 | + "<deny>deny</deny> <allow>deny</allow> allow", |
1801 | + parse_only=selector |
1802 | + ) |
1803 | + |
1804 | + # All incoming strings other than "allow" (even whitespace) |
1805 | + # were filtered out, but there was no effect on the tags, |
1806 | + # since only allow_string_creation_function was defined. |
1807 | + assert '<deny>deny</deny><allow>deny</allow>' == soup.decode() |
1808 | + |
1809 | + # Similarly, since match_function was not defined, this |
1810 | + # ElementFilter matches everything. |
1811 | + assert soup.find(selector).name == "deny" |
1812 | |
1813 | + |
1814 | +class TestMatchRule(SoupTest): |
1815 | + |
1816 | + def _tuple(self, rule): |
1817 | return ( |
1818 | rule.string, |
1819 | rule.pattern.pattern if rule.pattern else None, |
1820 | @@ -155,6 +243,28 @@ class TestSoupStrainer(SoupTest): |
1821 | assert w2.filename == __file__ |
1822 | assert msg == "Access to deprecated property text. (Look at .string_rules instead) -- Deprecated since version 4.13.0." |
1823 | |
1824 | + def test_search_tag_deprecated(self): |
1825 | + strainer = SoupStrainer(name="a") |
1826 | + with warnings.catch_warnings(record=True) as w: |
1827 | + assert False == strainer.search_tag("b", {}) |
1828 | + [w1] = w |
1829 | + msg = str(w1.message) |
1830 | + assert w1.filename == __file__ |
1831 | + assert msg == "Call to deprecated method search_tag. (Replaced by allow_tag_creation) -- Deprecated since version 4.13.0." |
1832 | + |
1833 | + def test_search_deprecated(self): |
1834 | + strainer = SoupStrainer(name="a") |
1835 | + soup = self.soup("<a></a><b></b>") |
1836 | + with warnings.catch_warnings(record=True) as w: |
1837 | + assert soup.a == strainer.search(soup.a) |
1838 | + assert None == strainer.search(soup.b) |
1839 | + [w1, w2] = w |
1840 | + msg = str(w1.message) |
1841 | + assert msg == str(w2.message) |
1842 | + assert w1.filename == __file__ |
1843 | + assert msg == "Call to deprecated method search. (Replaced by match) -- Deprecated since version 4.13.0." |
1844 | + |
1845 | + # Dummy function used within tests. |
1846 | def _match_function(x): |
1847 | pass |
1848 | |
1849 | @@ -213,7 +323,7 @@ class TestSoupStrainer(SoupTest): |
1850 | ) |
1851 | |
1852 | def test_constructor_with_overlapping_attributes(self): |
1853 | - # If you specify the same attribute in arts and **kwargs, you end up |
1854 | + # If you specify the same attribute in args and **kwargs, you end up |
1855 | # with two different AttributeValueMatchRule objects. |
1856 | |
1857 | # This happens whether you use the 'class' shortcut on attrs... |
1858 | @@ -437,17 +547,24 @@ class TestSoupStrainer(SoupTest): |
1859 | # because the string restrictions can't be evaluated during |
1860 | # the parsing process, and the tag restrictions eliminate |
1861 | # any strings from consideration. |
1862 | + # |
1863 | + # We can detect this ahead of time, and warn about it, |
1864 | + # thanks to SoupStrainer.excludes_everything |
1865 | markup = "<a><b>one string<div>another string</div></b></a>" |
1866 | |
1867 | with warnings.catch_warnings(record=True) as w: |
1868 | +            assert True == soupstrainer.excludes_everything
1869 | assert "" == self.soup(markup, parse_only=soupstrainer).decode() |
1870 | [warning] = w |
1871 | msg = str(warning.message) |
1872 | assert warning.filename == __file__ |
1873 | assert str(warning.message).startswith( |
1874 | - "Value for parse_only will exclude everything, since it puts restrictions on both tags and strings:" |
1875 | + "The given value for parse_only will exclude everything:" |
1876 | ) |
1877 | - |
1878 | + |
1879 | + # The average SoupStrainer has excludes_everything=False |
1880 | + assert not SoupStrainer().excludes_everything |
1881 | + |
1882 | def test_documentation_examples(self): |
1883 | """Medium-weight real-world tests based on the Beautiful Soup |
1884 | documentation. |
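TestElementFilter above pins down the default behavior: an unconfigured filter matches every element and permits every tag and string to be created. A minimal self-contained model of that three-hook dispatch (an illustrative sketch, not the bs4 implementation; the method and argument names follow the diff):

```python
from typing import Any, Callable, Dict, Optional

class MiniElementFilter:
    """Sketch of ElementFilter's three hooks: each one falls back to
    True when its corresponding function is not configured."""
    def __init__(
        self,
        match_function: Optional[Callable[[Any], bool]] = None,
        allow_tag_creation_function: Optional[
            Callable[[Optional[str], str, Optional[Dict[str, str]]], bool]
        ] = None,
        allow_string_creation_function: Optional[Callable[[str], bool]] = None,
    ):
        self.match_function = match_function
        self.allow_tag_creation_function = allow_tag_creation_function
        self.allow_string_creation_function = allow_string_creation_function

    def match(self, element: Any) -> bool:
        # Used by the find_* methods; True means the element is a match.
        if not self.match_function:
            return True
        return self.match_function(element)

    def allow_tag_creation(
        self, nsprefix: Optional[str], name: str, attrs: Optional[Dict[str, str]]
    ) -> bool:
        # Used during parsing (parse_only) to veto Tag creation.
        if not self.allow_tag_creation_function:
            return True
        return self.allow_tag_creation_function(nsprefix, name, attrs)

    def allow_string_creation(self, string: str) -> bool:
        # Used during parsing (parse_only) to veto NavigableString creation.
        if not self.allow_string_creation_function:
            return True
        return self.allow_string_creation_function(string)
```

Because the three hooks are independent, configuring only `allow_string_creation_function` leaves tag creation and matching untouched, which is exactly what `test_allow_string_creation` asserts.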
1885 | diff --git a/bs4/tests/test_html5lib.py b/bs4/tests/test_html5lib.py |
1886 | index b0f4384..9f6dfa1 100644 |
1887 | --- a/bs4/tests/test_html5lib.py |
1888 | +++ b/bs4/tests/test_html5lib.py |
1889 | @@ -4,7 +4,7 @@ import pytest |
1890 | import warnings |
1891 | |
1892 | from bs4 import BeautifulSoup |
1893 | -from bs4.strainer import SoupStrainer |
1894 | +from bs4.filter import SoupStrainer |
1895 | from . import ( |
1896 | HTML5LIB_PRESENT, |
1897 | HTML5TreeBuilderSmokeTest, |
1898 | @@ -24,7 +24,7 @@ class TestHTML5LibBuilder(SoupTest, HTML5TreeBuilderSmokeTest): |
1899 | return HTML5TreeBuilder |
1900 | |
1901 | def test_soupstrainer(self): |
1902 | - # The html5lib tree builder does not support SoupStrainers. |
1903 | + # The html5lib tree builder does not support parse_only. |
1904 | strainer = SoupStrainer("b") |
1905 | markup = "<p>A <b>bold</b> statement.</p>" |
1906 | with warnings.catch_warnings(record=True) as w: |
1907 | diff --git a/bs4/tests/test_lxml.py b/bs4/tests/test_lxml.py |
1908 | index d450740..9fc04e0 100644 |
1909 | --- a/bs4/tests/test_lxml.py |
1910 | +++ b/bs4/tests/test_lxml.py |
1911 | @@ -14,7 +14,7 @@ from bs4 import ( |
1912 | BeautifulStoneSoup, |
1913 | ) |
1914 | from bs4.element import Comment, Doctype |
1915 | -from bs4.strainer import SoupStrainer |
1916 | +from bs4.filter import SoupStrainer |
1917 | from . import ( |
1918 | HTMLTreeBuilderSmokeTest, |
1919 | XMLTreeBuilderSmokeTest, |
1920 | diff --git a/bs4/tests/test_pageelement.py b/bs4/tests/test_pageelement.py |
1921 | index 19b4d63..7dfdc22 100644 |
1922 | --- a/bs4/tests/test_pageelement.py |
1923 | +++ b/bs4/tests/test_pageelement.py |
1924 | @@ -10,7 +10,7 @@ from bs4.element import ( |
1925 | Comment, |
1926 | ResultSet, |
1927 | ) |
1928 | -from bs4.strainer import SoupStrainer |
1929 | +from bs4.filter import SoupStrainer |
1930 | from . import ( |
1931 | SoupTest, |
1932 | ) |
1933 | diff --git a/bs4/tests/test_soup.py b/bs4/tests/test_soup.py |
1934 | index 4f8ee1a..c95f380 100644 |
1935 | --- a/bs4/tests/test_soup.py |
1936 | +++ b/bs4/tests/test_soup.py |
1937 | @@ -27,7 +27,7 @@ from bs4.element import ( |
1938 | Tag, |
1939 | NavigableString, |
1940 | ) |
1941 | -from bs4.strainer import SoupStrainer |
1942 | +from bs4.filter import SoupStrainer |
1943 | |
1944 | from . import ( |
1945 | default_builder, |
1946 | @@ -293,7 +293,7 @@ class TestWarnings(SoupTest): |
1947 | soup = self.soup("<a><b></b></a>", parse_only=strainer) |
1948 | warning = self._assert_warning(w, UserWarning) |
1949 | msg = str(warning.message) |
1950 | - assert msg.startswith("Value for parse_only will exclude everything, since it puts restrictions on both tags and strings:") |
1951 | + assert msg.startswith("The given value for parse_only will exclude everything:") |
1952 | |
1953 | def test_parseOnlyThese_renamed_to_parse_only(self): |
1954 | with warnings.catch_warnings(record=True) as w: |
1955 | diff --git a/bs4/tests/test_tree.py b/bs4/tests/test_tree.py |
1956 | index 606525f..43afb29 100644 |
1957 | --- a/bs4/tests/test_tree.py |
1958 | +++ b/bs4/tests/test_tree.py |
1959 | @@ -26,7 +26,7 @@ from bs4.element import ( |
1960 | Tag, |
1961 | TemplateString, |
1962 | ) |
1963 | -from bs4.strainer import SoupStrainer |
1964 | +from bs4.filter import SoupStrainer |
1965 | from . import ( |
1966 | SoupTest, |
1967 | ) |
1968 | diff --git a/doc/index.rst b/doc/index.rst |
1969 | index 7beff36..a414830 100755 |
1970 | --- a/doc/index.rst |
1971 | +++ b/doc/index.rst |
1972 | @@ -20,7 +20,7 @@ with examples. I show you what the library is good for, how it works, |
1973 | how to use it, how to make it do what you want, and what to do when it |
1974 | violates your expectations. |
1975 | |
1976 | -This document covers Beautiful Soup version 4.12.2. The examples in |
1977 | +This document covers Beautiful Soup version 4.13.0. The examples in |
1978 | this documentation were written for Python 3.8. |
1979 | |
1980 | You might be looking for the documentation for `Beautiful Soup 3 |
1981 | @@ -2577,6 +2577,11 @@ the human-visible content of the page.* |
1982 | either return the object itself, or nothing, so the only reason to do |
1983 | this is when you're iterating over a mixed list.* |
1984 | |
1985 | +*As of Beautiful Soup version 4.13.0, you can call .string on a |
1986 | +NavigableString object. It will return the object itself, so again, |
1987 | +the only reason to do this is when you're iterating over a mixed |
1988 | +list.* |
1989 | + |
1990 | Specifying the parser to use |
1991 | ============================ |
1992 | |
1993 | @@ -2604,8 +2609,9 @@ specifying one of the following: |
1994 | |
1995 | The section `Installing a parser`_ contrasts the supported parsers. |
1996 | |
1997 | -If you don't have an appropriate parser installed, Beautiful Soup will |
1998 | -ignore your request and pick a different parser. Right now, the only |
1999 | +If you ask for a parser that isn't installed, Beautiful Soup will |
2000 | +raise an exception so that you don't inadvertently parse a document |
2001 | +under an unknown set of rules. For example, right now, the only |
2002 | supported XML parser is lxml. If you don't have lxml installed, asking |
2003 | for an XML parser won't give you one, and asking for "lxml" won't work |
2004 | either. |
2005 | @@ -3018,6 +3024,44 @@ been called on it:: |
2006 | This is because two different :py:class:`Tag` objects can't occupy the same |
2007 | space at the same time. |
2008 | |
2009 | +Advanced search techniques |
2010 | +========================== |
2011 | + |
2012 | +Almost everyone who uses Beautiful Soup to extract information from a |
2013 | +document can get what they need using the methods described in |
2014 | +`Searching the tree`_. However, there's a lower-level interface--the |
2015 | +:py:class:`ElementSelector` class-- which lets you define any matching |
2016 | +behavior whatsoever. |
2017 | + |
2018 | +To use :py:class:`ElementFilter`, define a function that takes a
2019 | +:py:class:`PageElement` object (that is, it might be either a |
2020 | +:py:class:`Tag` or a :py:class:`NavigableString`) and returns ``True``
2021 | +(if the element matches your custom criteria) or ``False`` (if it |
2022 | +doesn't):: |
2023 | + |
2024 | + [example goes here] |
2025 | + |
2026 | +Then, pass the function into an :py:class:`ElementFilter`::
2027 | + |
2028 | +    from bs4.filter import ElementFilter
2029 | +    selector = ElementFilter(match_function=f)
2030 | + |
2031 | +You can then pass the :py:class:`ElementFilter` object as the first
2032 | +argument to any of the `Searching the tree`_ methods:: |
2033 | + |
2034 | + [examples go here] |
2035 | + |
2036 | +Every potential match will be run through your function, and the only |
2037 | +:py:class:`PageElement` objects returned will be the ones for which your
2038 | +function returned ``True``. |
2039 | + |
2040 | +Note that this is different from simply passing `a function`_ as the |
2041 | +first argument to one of the search methods. That's an easy way to |
2042 | +find a tag, but *only* tags will be considered. With an
2043 | +:py:class:`ElementFilter` you can write a single function that makes
2044 | +decisions about both tags and strings. |
2045 | + |
2046 | + |
2047 | Advanced parser customization |
2048 | ============================= |
2049 | |
2050 | @@ -3111,14 +3155,6 @@ The :py:class:`SoupStrainer` behavior is as follows: |
2051 | * When a tag does not match, the tag itself is not kept, but parsing continues |
2052 | into its contents to look for other tags that do match. |
2053 | |
2054 | -You can also pass a :py:class:`SoupStrainer` into any of the methods covered |
2055 | -in `Searching the tree`_. This probably isn't terribly useful, but I |
2056 | -thought I'd mention it:: |
2057 | - |
2058 | - soup = BeautifulSoup(html_doc, 'html.parser') |
2059 | - soup.find_all(only_short_strings) |
2060 | - # ['\n\n', '\n\n', 'Elsie', ',\n', 'Lacie', ' and\n', 'Tillie', |
2061 | - # '\n\n', '...', '\n'] |
2062 | |
2063 | Customizing multi-valued attributes |
2064 | ----------------------------------- |
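The parse_only warning exercised in test_soup.py and test_filter.py reduces to a simple predicate: string rules combined with name or attribute rules can never match, because a string has no name or attributes to satisfy those rules with. A hedged sketch of that check, where plain rule lists stand in for the MatchRule collections (not the bs4 source):

```python
from typing import Any, Sequence

def excludes_everything(
    string_rules: Sequence[Any],
    name_rules: Sequence[Any],
    attribute_rules: Sequence[Any],
) -> bool:
    """Mirrors the logic of SoupStrainer.excludes_everything: string
    restrictions can't be evaluated during the parsing process, and tag
    restrictions eliminate all strings from consideration, so combining
    the two obviously excludes everything."""
    return bool(string_rules and (name_rules or attribute_rules))
```

As the property's docstring notes, this only catches the obvious case: a strainer can still exclude everything in non-obvious ways (for example, via a function rule that always returns False) without this predicate detecting it.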