comparison mupdf-source/thirdparty/gumbo-parser/DEBUGGING.md @ 2:b50eed0cc0ef upstream

ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4. The directory name has changed: no version number in the expanded directory now.
author Franz Glasner <fzglas.hg@dom66.de>
date Mon, 15 Sep 2025 11:43:07 +0200
1:1d09e1dec1d9 2:b50eed0cc0ef
These are a couple of debugging notes that may be helpful for anyone developing
Gumbo or trying to diagnose a tricky problem. They will probably not be
necessary for normal clients of this library - Gumbo is relatively stable, and
bugs are often rare and obscure. However, they're handy to have as a reference,
and may also provide useful Google fodder to people searching for these tools.

Standard disclaimer: I use all of these techniques on my Ubuntu 14.04 computer
with gcc 4.8.2, clang 3.4, and gtest 1.6.0, but make no warranty about them
working on other systems. In particular, they're almost certain not to work on
Windows.

Debug output
============

Gumbo has a compile-time switch to dump lots of debug output onto stdout.
Compile with the GUMBO_DEBUG define enabled:

```bash
$ make CFLAGS='-DGUMBO_DEBUG'
```

Note that this spits out *a lot* of debug information to the console and makes
the program run significantly slower, so it's usually helpful to isolate only
the specific HTML file or fragment that causes the bug. It does let us trace the
operation of each of the tokenizer's and parser's state machines in depth,
though.
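
One way to keep the volume manageable is to save the offending fragment to its
own file and capture the debug output in a log. The sketch below assumes a
debug build of the `clean_text` example program in the Gumbo root; the
`GUMBO_BIN` variable, the file names, and the sample markup are placeholders
made up for illustration, so substitute whatever binary and input reproduce
your bug.

```bash
# Save the smallest fragment that still triggers the bug (sample markup only).
printf '%s\n' '<table><b><tr><td>x</b></td></tr></table>' > fragment.html

# Placeholder: any binary linked against a -DGUMBO_DEBUG build of Gumbo.
GUMBO_BIN=${GUMBO_BIN:-./clean_text}

if [ -x "$GUMBO_BIN" ]; then
  # Debug output goes to stdout; keep it in a file instead of the terminal.
  "$GUMBO_BIN" fragment.html > debug.log 2>&1
  wc -l < debug.log
else
  echo "$GUMBO_BIN not found; build with CFLAGS='-DGUMBO_DEBUG' first" >&2
fi
```

The log can then be searched or paged at leisure, for example with
`less debug.log`.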

Unit tests
==========

As mentioned in the README, Gumbo relies on [googletest][] for unit tests.
Unzip the gtest ZIP distribution inside the Gumbo root and rename the extracted
directory to 'gtest'. 'make check' runs the tests, as normal.

```bash
$ make check
$ cat test-suite.log
```

If you need to debug a core dump, you'll probably want to run the test binary
directly:

```bash
$ ulimit -c unlimited
$ make check
$ .libs/lt-gumbo_test
$ gdb .libs/lt-gumbo_test core
```

The same goes for core dumps in other example binaries.

To run only a single unit test, pass the --gtest_filter='TestName' flag to the
lt-gumbo_test binary.
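
If you are not sure of the exact test name, googletest can list the registered
tests first, and the filter accepts `*` wildcards. A minimal sketch, assuming
the test binary has already been built by 'make check'; `TEST_BIN` and `FILTER`
are illustrative variables, and the pattern shown is only an example, so pick a
real name from the listing.

```bash
TEST_BIN=${TEST_BIN:-.libs/lt-gumbo_test}
# Example pattern only: SuiteName.TestName, with '*' as a wildcard.
FILTER="${FILTER:-GumboParserTest.*}"

if [ -x "$TEST_BIN" ]; then
  "$TEST_BIN" --gtest_list_tests          # print every suite and test name
  "$TEST_BIN" --gtest_filter="$FILTER"    # run only the matching tests
else
  echo "$TEST_BIN not found; run 'make check' first" >&2
fi
```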

Assertions
==========

Gumbo relies pretty heavily on assertions. By default they're enabled at
run-time; to turn them off, define NDEBUG:

```bash
$ make CFLAGS='-DNDEBUG'
```

ASAN
====

Google's [address-sanitizer][] is a helpful tool that lets you find memory
errors with relatively low overhead, low enough that you can often run it in
production. Enabling it for C/C++ binaries is pretty standard and described on
the ASAN documentation pages. It requires Clang >= 3.1 or GCC >= 4.8.

```bash
$ make \
    CFLAGS='-fsanitize=address -fno-omit-frame-pointer -fno-inline' \
    LDFLAGS='-fsanitize=address'
```

ASAN can also be used when Gumbo is compiled as a shared library and linked into
a scripting language via FFI, but this use-case is unsupported by the ASAN
authors. To do it, use LD_PRELOAD to ensure the ASAN runtime support is
included in the process:

```bash
$ LD_PRELOAD=libasan.so.0 python -c 'import gumbo; gumbo.parse(problem_text)'
```
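
The `libasan.so.0` name matches the runtime shipped with GCC 4.8; other
compiler versions install the library under a different version suffix. As a
sketch, assuming GCC is the compiler that built Gumbo, you can ask it for the
path instead of hard-coding the name (`ASAN_LIB` is just an illustrative
variable):

```bash
if command -v gcc >/dev/null 2>&1; then
  # Prints the full path when the library is found, else echoes the bare name.
  ASAN_LIB=$(gcc -print-file-name=libasan.so)
else
  ASAN_LIB=libasan.so.0   # fall back to the name used above
fi
echo "ASAN runtime: $ASAN_LIB"

# Then preload it as before, for example:
#   LD_PRELOAD="$ASAN_LIB" python -c 'import gumbo; gumbo.parse(problem_text)'
```

On some distributions the unversioned `libasan.so` may not be directly
loadable; in that case point LD_PRELOAD at the versioned file next to it.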

Getting clean stack traces from this requires the use of the llvm-symbolizer
binary, included with clang:

```bash
$ export ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer-3.4
$ export ASAN_OPTIONS=symbolize=1
$ LD_PRELOAD=libasan.so.0 python -c \
    'import gumbo; gumbo.parse(problem_text)' 2>&1 | head -100
$ killall llvm-symbolizer-3.4
$ killall llvm-symbolizer-3.4
$ killall llvm-symbolizer-3.4
```

This use case is even less officially supported than using it with dynamic
shared objects; on my machine, it led to a recursive ASAN error about a
use-after-free in llvm-symbolizer, effectively fork-bombing the machine. Have
the killalls ready, and avoid anything that lets the process run for too long
(e.g. piping it to 'less').
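
The repeated killalls above can also be written as a bounded loop, which keeps
retrying while matching processes are still being found. This is only a sketch:
the `SYMBOLIZER` variable, the retry cap of 10 and the one-second pause are
arbitrary choices, not values from the ASAN documentation.

```bash
SYMBOLIZER=${SYMBOLIZER:-llvm-symbolizer-3.4}
attempts=0

# killall exits non-zero once no matching process is left, ending the loop.
while [ "$attempts" -lt 10 ] && killall "$SYMBOLIZER" 2>/dev/null; do
  attempts=$((attempts + 1))
  sleep 1
done
echo "killall found processes on $attempts pass(es)"
```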

[googletest]: https://code.google.com/p/googletest/
[address-sanitizer]: https://code.google.com/p/address-sanitizer/