comparison mupdf-source/thirdparty/gumbo-parser/python/gumbo/__init__.py @ 2:b50eed0cc0ef upstream

ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4. The directory name has changed: no version number in the expanded directory now.
author Franz Glasner <fzglas.hg@dom66.de>
date Mon, 15 Sep 2025 11:43:07 +0200
1:1d09e1dec1d9 2:b50eed0cc0ef
"""Gumbo HTML parser.

These are the Python bindings for Gumbo. All public API classes and functions
are exported from this module. They include:

- CTypes representations of all structs and enums defined in gumbo.h. The
  naming convention is to take the C name and strip off the "Gumbo" prefix.

- A low-level wrapper around the gumbo_parse function, returning the classes
  exposed above. Usage:

    import gumbo
    with gumbo.parse(text, **options) as output:
        do_stuff_with_doctype(output.document)
        do_stuff_with_parse_tree(output.root)

- Higher-level bindings that mimic the API provided by html5lib. Usage:

    from gumbo import html5lib

  This requires that html5lib be installed (it uses their treebuilders), and is
  intended as a drop-in replacement.

- Similarly, higher-level bindings that mimic BeautifulSoup and return
  BeautifulSoup objects. For this, use:

    import gumbo
    soup = gumbo.soup_parse(text, **options)

  It will give you back a soup object like BeautifulSoup.BeautifulSoup(text).
"""

# Re-export the low-level ctypes API (structs, enums, parse) at package level.
from gumbo.gumboc import *

# The higher-level adapters are optional: each one is exposed only if the
# third-party library it depends on can be imported.
try:
    from gumbo import html5lib_adapter as html5lib
except ImportError:
    # html5lib not installed
    pass

try:
    from gumbo.soup_adapter import parse as soup_parse
except ImportError:
    # BeautifulSoup not installed
    pass
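
The module above exposes its html5lib and BeautifulSoup adapters only when the underlying library can be imported. The following is a minimal, standalone sketch of that optional-import pattern, not part of the gumbo package. The helper names `optional_import` and `available_features` are hypothetical and chosen purely for illustration.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Dependency not installed: signal absence instead of failing.
        return None


def available_features(requirements):
    """Map each feature name to True if its required module imports.

    `requirements` is a dict of feature name -> module name
    (a hypothetical structure used only in this sketch).
    """
    return {
        feature: optional_import(module) is not None
        for feature, module in requirements.items()
    }
```

A caller could then check `available_features({"html5lib": "html5lib"})` before relying on an adapter. Unlike the `try`/`except ImportError: pass` form used in the module, this variant makes the absence of a dependency explicit rather than leaving a name undefined.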