Explicitly set the HTML parser to make sure no extra tags get added.

BeautifulSoup supports multiple HTML parsers. When none is named, it picks the best one installed (lxml, if available), so the result depends on the environment. Some of those parsers try to make the HTML valid by adding or removing tags[1], which can leave unwanted <html>, <head> and <body> tags in the final document. Explicitly setting the parser to 'html.parser' avoids this behaviour.

[1] http://www.crummy.com/software/BeautifulSoup/bs4/doc/#differences-between-parsers
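BeautifulSoup's 'html.parser' backend is built on Python's standard-library html.parser module, which tokenizes the input as written instead of repairing it into a full document. The following is a minimal stdlib-only sketch of that property. The class TagCollector, the function collect_tags and the sample fragment are hypothetical, for illustration only. The sketch shows that an article fragment produces only the tags it actually contains:

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: record every start tag the parser reports, in order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names lowercased; attrs are ignored here.
        self.tags.append(tag)


def collect_tags(fragment):
    """Return the start tags found in an HTML fragment.

    html.parser does not wrap a fragment in <html>, <head> or <body>
    the way a repairing parser would; it only reports what is there.
    """
    parser = TagCollector()
    parser.feed(fragment)
    parser.close()
    return parser.tags


# Hypothetical article body, similar to what the plugin receives as content._content.
fragment = '<h2 id="intro">Intro</h2><p>Some text.</p>'
print(collect_tags(fragment))  # ['h2', 'p']
```

According to the parser comparison cited in [1], lxml and html5lib may wrap the same kind of input in <html>/<body> (html5lib also adds <head>), while 'html.parser' leaves it as a fragment. This is why the table of contents extracted from the article stays free of document-level wrappers.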
2013-05-24 11:12:51 +02:00
parent e11c18bf48
commit 8d0e643637
1 changed file with 1 addition and 1 deletion
--- a/extract_toc/extract_toc.py
+++ b/extract_toc/extract_toc.py
@@ -14,7 +14,7 @@ from pelican import signals, readers, contents
 def extract_toc(content):
    if isinstance(content, contents.Static):
        return
-    soup = BeautifulSoup(content._content)
+    soup = BeautifulSoup(content._content, 'html.parser')
    filename = content.source_path
    extension = path.splitext(filename)[1][1:]
    toc = ''