Explicitly set the HTML parser so that no extra tags get added.

BeautifulSoup supports multiple HTML parsers. Some of those parsers
try to make the HTML valid by adding or removing tags[1]. This can leave
unwanted <html>, <head> and <body> tags in the final document. Explicitly
setting the parser to 'html.parser' avoids this behaviour.

[1] http://www.crummy.com/software/BeautifulSoup/bs4/doc/#differences-between-parsers
bas smit
2013-05-24 11:12:51 +02:00
parent e11c18bf48
commit 8d0e643637

@@ -14,7 +14,7 @@ from pelican import signals, readers, contents
 def extract_toc(content):
     if isinstance(content, contents.Static):
         return
-    soup = BeautifulSoup(content._content)
+    soup = BeautifulSoup(content._content,'html.parser')
     filename = content.source_path
     extension = path.splitext(filename)[1][1:]
     toc = ''
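
The effect described in the commit message can be checked with a small sketch. It assumes the `bs4` package is installed; the `fragment` string and the `has_wrappers` name are made up for illustration and are not part of the plugin. With 'html.parser' a bare fragment should round-trip unchanged, whereas the BeautifulSoup documentation notes that lxml and html5lib wrap fragments in extra document-level tags.

```python
from bs4 import BeautifulSoup

# Hypothetical article fragment, standing in for content._content.
fragment = "<p>Table of contents goes here</p>"

# Name the parser explicitly instead of letting BeautifulSoup pick
# whichever parser happens to be installed.
soup = BeautifulSoup(fragment, "html.parser")
rendered = str(soup)

# Check whether any document-level wrapper tags were added.
has_wrappers = any(
    soup.find(tag) is not None for tag in ("html", "head", "body")
)

print(rendered)
print(has_wrappers)
```

Passing the parser name as the second argument also keeps the output the same across machines, since the default choice depends on which parsers are installed.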