comparison mupdf-source/thirdparty/harfbuzz/docs/repacker.md @ 2:b50eed0cc0ef upstream

ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4. The directory name has changed: no version number in the expanded directory now.
author Franz Glasner <fzglas.hg@dom66.de>
date Mon, 15 Sep 2025 11:43:07 +0200
parents
children
comparison
equal deleted inserted replaced
1:1d09e1dec1d9 2:b50eed0cc0ef
1 # Introduction
2
3 Several tables in the opentype format are formed internally by a graph of subtables. Parent node's
4 reference their children through the use of positive offsets, which are typically 16 bits wide.
5 Since offsets are always positive this forms a directed acyclic graph. For storage in the font file
6 the graph must be given a topological ordering and then the subtables packed in serial according to
7 that ordering. Since 16 bit offsets have a maximum value of 65,535 if the distance between a parent
8 subtable and a child is more then 65,535 bytes then it's not possible for the offset to encode that
9 edge.
10
11 For many fonts with complex layout rules (such as Arabic) it's not unusual for the tables containing
12 layout rules ([GSUB/GPOS](https://docs.microsoft.com/en-us/typography/opentype/spec/gsub)) to be
13 larger than 65kb. As a result these types of fonts are susceptible to offset overflows when
14 serializing to the binary font format.
15
16 Offset overflows can happen for a variety of reasons and require different strategies to resolve:
17 * Simple overflows can often be resolved with a different topological ordering.
18 * If a subtable has many parents this can result in the link from furthest parent(s)
19 being at risk for overflows. In these cases it's possible to duplicate the shared subtable which
20 allows it to be placed closer to it's parent.
21 * If subtables exist which are themselves larger than 65kb it's not possible for any offsets to point
22 past them. In these cases the subtable can usually be split into two smaller subtables to allow
23 for more flexibility in the ordering.
24 * In GSUB/GPOS overflows from Lookup subtables can be resolved by changing the Lookup to an extension
25 lookup which uses a 32 bit offset instead of 16 bit offset.
26
27 In general there isn't a simple solution to produce an optimal topological ordering for a given graph.
28 Finding an ordering which doesn't overflow is a NP hard problem. Existing solutions use heuristics
29 which attempt a combination of the above strategies to attempt to find a non-overflowing configuration.
30
31 The harfbuzz subsetting library
32 [includes a repacking algorithm](https://github.com/harfbuzz/harfbuzz/blob/main/src/hb-repacker.hh)
33 which is used to resolve offset overflows that are present in the subsetted tables it produces. This
34 document provides a deep dive into how the harfbuzz repacking algorithm works.
35
36 Other implementations exist, such as in
37 [fontTools](https://github.com/fonttools/fonttools/blob/7af43123d49c188fcef4e540fa94796b3b44e858/Lib/fontTools/ttLib/tables/otBase.py#L72), however these are not covered in this document.
38
39 # Foundations
40
41 There's four key pieces to the harfbuzz approach:
42
43 * Subtable Graph: a table's internal structure is abstracted out into a lightweight graph
44 representation where each subtable is a node and each offset forms an edge. The nodes only need
45 to know how many bytes the corresponding subtable occupies. This lightweight representation can
46 be easily modified to test new ordering's and strategies as the repacking algorithm iterates.
47
48 * [Topological sorting algorithm](https://en.wikipedia.org/wiki/Topological_sorting): an algorithm
49 which given a graph gives a linear sorting of the nodes such that all offsets will be positive.
50
51 * Overflow check: given a graph and a topological sorting it checks if there will be any overflows
52 in any of the offsets. If there are overflows it returns a list of (parent, child) tuples that
53 will overflow. Since the graph has information on the size of each subtable it's straightforward
54 to calculate the final position of each subtable and then check if any offsets to it will
55 overflow.
56
57 * Content Aware Preprocessing: if the overflow resolver is aware of the format of the underlying
58 tables (eg. GSUB, GPOS) then in some cases preprocessing can be done to increase the chance of
59 successfully packing the graph. For example for GSUB and GPOS we can preprocess the graph and
60 promote lookups to extension lookups (upgrades a 16 bit offset to 32 bits) or split large lookup
61 subtables into two or more pieces.
62
63 * Offset resolution strategies: given a particular occurrence of an overflow these strategies
64 modify the graph to attempt to resolve the overflow.
65
66 # High Level Algorithm
67
68 ```
69 def repack(graph):
70 graph.topological_sort()
71
72 if (graph.will_overflow())
73 preprocess(graph)
74 assign_spaces(graph)
75 graph.topological_sort()
76
77 while (overflows = graph.will_overflow()):
78 for overflow in overflows:
79 apply_offset_resolution_strategy (overflow, graph)
80 graph.topological_sort()
81 ```
82
83 The actual code for this processing loop can be found in the function hb_resolve_overflows () of
84 [hb-repacker.hh](https://github.com/harfbuzz/harfbuzz/blob/main/src/hb-repacker.hh).
85
86 # Topological Sorting Algorithms
87
88 The harfbuzz repacker uses two different algorithms for topological sorting:
89 * [Kahn's Algorithm](https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm)
90 * Sorting by shortest distance
91
92 Kahn's algorithm is approximately twice as fast as the shortest distance sort so that is attempted
93 first (only on the first topological sort). If it fails to eliminate overflows then shortest distance
94 sort will be used for all subsequent topological sorting operations.
95
96 ## Shortest Distance Sort
97
98 This algorithm orders the nodes based on total distance to each node. Nodes with a shorter distance
99 are ordered first.
100
101 The "weight" of an edge is the sum of the size of the sub-table being pointed to plus 2^16 for a 16 bit
102 offset and 2^32 for a 32 bit offset.
103
104 The distance of a node is the sum of all weights along the shortest path from the root to that node
105 plus a priority modifier (used to change where nodes are placed by moving increasing or
106 decreasing the effective distance). Ties between nodes with the same distance are broken based
107 on the order of the offset in the sub table bytes.
108
109 The shortest distance to each node is determined using
110 [Djikstra's algorithm](https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm). Then the topological
111 ordering is produce by applying a modified version of Kahn's algorithm that uses a priority queue
112 based on the shortest distance to each node.
113
114 ## Optimizing the Sorting
115
116 The topological sorting operation is the core of the repacker and is run on each iteration so it needs
117 to be as fast as possible. There's a few things that are done to speed up subsequent sorting
118 operations:
119
120 * The number of incoming edges to each node is cached. This is required by the Kahn's algorithm
121 portion of both sorts. Where possible when the graph is modified we manually update the cached
122 edge counts of affected nodes.
123
124 * The distance to each node is cached. Where possible when the graph is modified we manually update
125 the cached distances of any affected nodes.
126
127 Caching these values allows the repacker to avoid recalculating them for the full graph on each
128 iteration.
129
130 The other important factor to speed is a fast priority queue which is a core datastructure to
131 the topological sorting algorithm. Currently a basic heap based queue is used. Heap based queue's
132 don't support fast priority decreases, but that can be worked around by just adding redundant entries
133 to the priority queue and filtering the older ones out when poppping off entries. This is based
134 on the recommendations in
135 [a study of the practical performance of priority queues in Dijkstra's algorithm](https://www3.cs.stonybrook.edu/~rezaul/papers/TR-07-54.pdf)
136
137 ## Special Handling of 32 bit Offsets
138
139 If a graph contains multiple 32 bit offsets then the shortest distance sorting will be likely be
140 suboptimal. For example consider the case where a graph contains two 32 bit offsets that each point
141 to a subgraph which are not connected to each other. The shortest distance sort will interleave the
142 subtables of the two subgraphs, potentially resulting in overflows. Since each of these subgraphs are
143 independent of each other, and 32 bit offsets can point extremely long distances a better strategy is
144 to pack the first subgraph in it's entirety and then have the second subgraph packed after with the 32
145 bit offset pointing over the first subgraph. For example given the graph:
146
147
148 ```
149 a--- b -- d -- f
150 \
151 \_ c -- e -- g
152 ```
153
154 Where the links from a to b and a to c are 32 bit offsets, the shortest distance sort would be:
155
156 ```
157 a, b, c, d, e, f, g
158
159 ```
160
161 If nodes d and e have a combined size greater than 65kb then the offset from d to f will overflow.
162 A better ordering is:
163
164 ```
165 a, b, d, f, c, e, g
166 ```
167
168 The ability for 32 bit offsets to point long distances is utilized to jump over the subgraph of
169 b which gives the remaining 16 bit offsets a better chance of not overflowing.
170
171 The above is an ideal situation where the subgraphs are disconnected from each other, in practice
172 this is often not this case. So this idea can be generalized as follows:
173
174 If there is a subgraph that is only reachable from one or more 32 bit offsets, then:
175 * That subgraph can be treated as an independent unit and all nodes of the subgraph packed in isolation
176 from the rest of the graph.
177 * In a table that occupies less than 4gb of space (in practice all fonts), that packed independent
178 subgraph can be placed anywhere after the parent nodes without overflowing the 32 bit offsets from
179 the parent nodes.
180
181 The sorting algorithm incorporates this via a "space" modifier that can be applied to nodes in the
182 graph. By default all nodes are treated as being in space zero. If a node is given a non-zero space, n,
183 then the computed distance to the node will be modified by adding `n * 2^32`. This will cause that
184 node and it's descendants to be packed between all nodes in space n-1 and space n+1. Resulting in a
185 topological sort like:
186
187 ```
188 | space 0 subtables | space 1 subtables | .... | space n subtables |
189 ```
190
191 The assign_spaces() step in the high level algorithm is responsible for identifying independent
192 subgraphs and assigning unique spaces to each one. More information on the space assignment can be
193 found in the next section.
194
195 # Graph Preprocessing
196
197 For certain table types we can preprocess and modify the graph structure to reduce the occurences
198 of overflows. Currently the repacker implements preprocessing only for GPOS and GSUB tables.
199
200 ## GSUB/GPOS Table Splitting
201
202 The GSUB/GPOS preprocessor scans each lookup subtable and determines if the subtable's children are
203 so large that no overflow resolution is possible (for example a single subtable that exceeds 65kb
204 cannot be pointed over). When such cases are detected table splitting is invoked:
205
206 * The subtable is first analyzed to determine the smallest number of split points that will allow
207 for successful offset overflow resolution.
208
209 * Then the subtable in the graph representation is modified to actually perform the split at the
210 previously computed split points. At a high level splits are done by inserting new subtables
211 which contain a subset of the data of the original subtable and then shrinking the original subtable.
212
213 Table splitting must be aware of the underlying format of each subtable type and thus needs custom
214 code for each subtable type. Currently subtable splitting is only supported for GPOS subtable types.
215
216 ## GSUB/GPOS Extension Lookup Promotion
217
218 In GSUB/GPOS tables lookups can be regular lookups which use 16 bit offsets to the children subtables
219 or extension lookups which use 32 bit offsets to the children subtables. If the sub graph of all
220 regular lookups is too large then it can be difficult to find an overflow free configuration. This
221 can be remedied by promoting one or more regular lookups to extension lookups.
222
223 During preprocessing the graph is scanned to determine the size of the subgraph of regular lookups.
224 If the graph is found to be too big then the analysis finds a set of lookups to promote to reduce
225 the subgraph size. Lastly the graph is modified to convert those lookups to extension lookups.
226
227 # Offset Resolution Strategies
228
229 ## Space Assignment
230
231 The goal of space assignment is to find connected subgraphs that are only reachable via 32 bit offsets
232 and then assign each such subgraph to a unique non-zero space. The algorithm is roughly:
233
234 1. Collect the set, `S`, of nodes that are children of 32 bit offsets.
235
236 2. Do a directed traversal from each node in `S` and collect all encountered nodes into set `T`.
237 Mark all nodes in the graph that are not in `T` as being in space 0.
238
239 3. Set `next_space = 1`.
240
241 4. While set `S` is not empty:
242
243 a. Pick a node `n` in set `S` then perform an undirected graph traversal and find the set `Q` of
244 nodes that are reachable from `n`.
245
246 b. During traversal if a node, `m`, has a edge to a node in space 0 then `m` must be duplicated
247 to disconnect it from space 0.
248
249 d. Remove all nodes in `Q` from `S` and assign all nodes in `Q` to `next_space`.
250
251
252 c. Increment `next_space` by one.
253
254
255 ## Manual Iterative Resolutions
256
257 For each overflow in each iteration the algorithm will attempt to apply offset overflow resolution
258 strategies to eliminate the overflow. The type of strategy applied is dependent on the characteristics
259 of the overflowing link:
260
261 * If the overflowing offset is inside a space other than space 0 and the subgraph space has more
262 than one 32 bit offset pointing into the subgraph then subdivide the space by moving subgraph
263 from one of the 32 bit offsets into a new space via the duplication of shared nodes.
264
265 * If the overflowing offset is pointing to a subtable with more than one incoming edge: duplicate
266 the node so that the overflowing offset is pointing at it's own copy of that node.
267
268 * Otherwise, attempt to move the child subtable closer to it's parent. This is accomplished by
269 raising the priority of all children of the parent. Next time the topological sort is run the
270 children will be ordered closer to the parent.
271
272 # Test Cases
273
274 The harfbuzz repacker has tests defined using generic graphs: https://github.com/harfbuzz/harfbuzz/blob/main/src/test-repacker.cc
275
276 # Future Improvements
277
278 Currently for GPOS tables the repacker implementation is sufficient to handle both subsetting and the
279 general case of font compilation repacking. However for GSUB the repacker is only sufficient for
280 subsetting related overflows. To enable general case repacking of GSUB, support for splitting of
281 GSUB subtables will need to be added. Other table types such as COLRv1 shouldn't require table
282 splitting due to the wide use of 24 bit offsets throughout the table.
283
284 Beyond subtable splitting there are a couple of "nice to have" improvements, but these are not required
285 to support the general case:
286
287 * Extension demotion: currently extension promotion is supported but in some cases if the non-extension
288 subgraph is underfilled then packed size can be reduced by demoting extension lookups back to regular
289 lookups.
290
291 * Currently only children nodes are moved to resolve offsets. However, in many cases moving a parent
292 node closer to it's children will have less impact on the size of other offsets. Thus the algorithm
293 should use a heuristic (based on parent and child subtable sizes) to decide if the children's
294 priority should be increased or the parent's priority decreased.