Comment 1 for bug 1924908

Leonard Richardson (leonardr) wrote : Re: Comma removed when printing some html entities

RightArrowLeftArrow is a named entity in HTML5, so the simplest way to solve your problem is to use the html5lib parser, which understands that entity:

from bs4 import BeautifulSoup  # the "html5lib" parser needs the html5lib package installed

soup = BeautifulSoup("<div>&RightArrowLeftArrow;</div>", "html5lib")
soup.decode(formatter=None)
# '<html><head></head><body><div>⇄</div></body></html>'

I spent some time looking into whether it's possible to make the other parsers able to handle HTML5 named entities, and it does seem possible in almost all cases.
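As a rough illustration of why this looks feasible: Python's standard library already ships a table of HTML5 named entities in html.entities.html5, alongside the older HTML 4 table in html.entities.name2codepoint. The sketch below only shows that the lookup data exists; it is not how Beautiful Soup or any of its parsers resolve entities, and resolve_html5_entity is a hypothetical helper named for this example.

```python
import html
from html.entities import html5, name2codepoint


def resolve_html5_entity(name):
    """Return the character(s) for an HTML5 entity name, or None if unknown.

    Hypothetical helper for illustration. Keys in the html5 table
    include the trailing semicolon, so it is appended before the lookup.
    """
    return html5.get(name + ";")


# The HTML5 table knows the entity from this bug report.
arrow = resolve_html5_entity("RightArrowLeftArrow")

# The HTML 4 table does not list this name.
in_html4_table = "RightArrowLeftArrow" in name2codepoint

# html.unescape() applies the HTML5 table to a whole string.
unescaped = html.unescape("<div>&RightArrowLeftArrow;</div>")

print(repr(arrow), in_html4_table, unescaped)
```

A parser that falls back to a table like this for names it does not recognize could, in principle, convert such entities itself instead of passing them through unchanged.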