Comment 5 for bug 2052936

Chris Papademetrious (chrispitude) wrote:

For the self-and-* matching functionality, consider the following command:

====
tag.find_parent(True, {'data-base-uri': True}, include_self=True)
====

What would the tag.self_and() version of this command be?
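Until a self-inclusive search is available, the same lookup can be approximated with existing calls: check the tag itself, then fall back to find_parent(). This is a minimal sketch that assumes the only criterion is the attribute-presence test from the command above. The helper name find_self_or_parent_with_attr is hypothetical and is not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup


def find_self_or_parent_with_attr(tag, attr):
    """Hypothetical helper: return `tag` if it has `attr`, otherwise the
    nearest ancestor that has it, otherwise None."""
    if tag.has_attr(attr):
        return tag
    # Same search as the original command, minus the check on the tag itself.
    return tag.find_parent(True, {attr: True})


soup = BeautifulSoup(
    '<div data-base-uri="a"><p><span>x</span></p></div>', "html.parser"
)
print(find_self_or_parent_with_attr(soup.span, "data-base-uri").name)  # div
```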

For the self-only matching functionality, I really hope you will consider implementing the matches() method. It would be extremely useful for our processing code. I understand your concerns about runtime, but runtime is not a factor for our application, whereas code clarity and maintainability are.
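As a stopgap, something close to the proposed matches() method can be approximated today by reusing find_all()'s filter rules: search the tag's parent non-recursively and check whether the tag itself is among the results. This is a sketch under stated assumptions, not Beautiful Soup's API. The module-level function matches() below is hypothetical, and it does extra work per call (it scans all siblings), which the comment above says is acceptable.

```python
import re
from bs4 import BeautifulSoup


def matches(tag, name=None, attrs=None, **kwargs):
    """Hypothetical stand-in for the proposed Tag.matches() method.

    Applies find_all()'s filtering to the parent's direct children only,
    then checks whether this exact Tag object was among the results.
    """
    parent = tag.parent
    if parent is None:
        # A detached tag has no parent to search from.
        return False
    found = parent.find_all(name, attrs or {}, recursive=False, **kwargs)
    # Compare by identity: Tag equality compares markup, so two identical
    # siblings would otherwise be confused with each other.
    return any(candidate is tag for candidate in found)


html = ('<body><div class="abstract">A</div><div class="other">B</div>'
        '<a href="page.html">link</a></body>')
soup = BeautifulSoup(html, "html.parser")
first_div, second_div = soup.find_all("div")
link = soup.a

print(matches(first_div, ["div", "body"], class_=["abstract", "summary"]))  # True
print(matches(second_div, ["div", "body"], class_=["abstract", "summary"]))  # False
print(matches(link, True, href=re.compile(r"\.html$")))  # True
```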

For example, we have code that processes HTML hierarchically like this:

====
for tag in reversed(soup.find_all(True)):
    if tag.matches(True, href=re.compile(...)):
        ...  # do some stuff
    elif tag.matches(['div', 'body'], class_=['abstract', 'summary']):
        ...  # do some stuff
    elif tag.matches(...):
        ...  # ...
    # elif ...
====

and the clarity and ease of modifying the filtering tests are paramount. The actual implementation of this code is much more complicated and harder to follow than the code above.

On a side note, the reason I iterate through the find_all() tags in reverse is that it allows me to modify the contents of a tag as much as I want without inadvertently breaking the "for" iteration (because I never traverse downward into the stuff I just modified). This code pattern works very well for hierarchical processing!
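A small demonstration of the reverse-order pattern: find_all() returns tags in document order, so each tag appears before its descendants. Reversing that list therefore visits every tag only after all of its descendants have been handled, so a tag's contents can be rewritten without those new contents being visited. The markup and the transformations (upper-casing paragraph text, unwrapping the section) are illustrative assumptions only.

```python
from bs4 import BeautifulSoup

html = "<div><section><p>one</p><p>two</p></section></div>"
soup = BeautifulSoup(html, "html.parser")

visit_order = []
# find_all(True) returns every tag as a list in document order (parents
# before children); reversing it visits children before their parents.
for tag in reversed(soup.find_all(True)):
    visit_order.append(tag.name)
    if tag.name == "p":
        # Replace the paragraph's contents; nothing inside it is left to visit.
        tag.string = tag.get_text().upper()
    elif tag.name == "section":
        # Restructure the tree: move the section's children up into its parent.
        tag.unwrap()

print(visit_order)  # ['p', 'p', 'section', 'div']
print(soup)         # <div><p>ONE</p><p>TWO</p></div>
```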