comparison mupdf-source/thirdparty/harfbuzz/docs/usermanual-getting-started.xml @ 2:b50eed0cc0ef upstream
ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
The directory name has changed: no version number in the expanded directory now.
| author | Franz Glasner <fzglas.hg@dom66.de> |
|---|---|
| date | Mon, 15 Sep 2025 11:43:07 +0200 |
| parents | |
| children | |
| 1:1d09e1dec1d9 | 2:b50eed0cc0ef |
|---|---|
| 1 <?xml version="1.0"?> | |
| 2 <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN" | |
| 3 "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [ | |
| 4 <!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'"> | |
| 5 <!ENTITY version SYSTEM "version.xml"> | |
| 6 ]> | |
| 7 <chapter id="getting-started"> | |
| 8 <title>Getting started with HarfBuzz</title> | |
| 9 <section id="an-overview-of-the-harfbuzz-shaping-api"> | |
| 10 <title>An overview of the HarfBuzz shaping API</title> | |
| 11 <para> | |
| 12 The core of the HarfBuzz shaping API is the function | |
| 13 <function>hb_shape()</function>. This function takes a font, a | |
| 14 buffer containing a string of Unicode codepoints and | |
| 15 (optionally) a list of font features as its input. It replaces | |
| 16 the codepoints in the buffer with the corresponding glyphs from | |
| 17 the font, correctly ordered and positioned, and with any of the | |
| 18 optional font features applied. | |
| 19 </para> | |
| 20 <para> | |
| 21 In addition to holding the pre-shaping input (the Unicode | |
| 22 codepoints that comprise the input string) and the post-shaping | |
| 23 output (the glyphs and positions), a HarfBuzz buffer has several | |
| 24 properties that affect shaping. The most important are the | |
| 25 text-flow direction (e.g., left-to-right, right-to-left, | |
| 26 top-to-bottom, or bottom-to-top), the script tag, and the | |
| 27 language tag. | |
| 28 </para> | |
| 29 | |
| 30 <para> | |
| 31 For input string buffers, flags are available to denote when the | |
| 32 buffer represents the beginning or end of a paragraph, to | |
| 33 indicate whether or not to visibly render Unicode <literal>Default | |
| 34 Ignorable</literal> codepoints, and to modify the cluster-merging | |
| 35 behavior for the buffer. For shaped output buffers, the | |
| 36 individual X and Y offsets and <literal>advances</literal> | |
| 37 (the logical dimensions) of each glyph are | |
| 38 accessible. HarfBuzz also flags glyphs as | |
| 39 <literal>UNSAFE_TO_BREAK</literal> if breaking the string at | |
| 40 that glyph (e.g., in a line-breaking or hyphenation process) | |
| 41 would require re-shaping the text. | |
| 42 </para> | |
| 43 | |
| 44 <para> | |
| 45 HarfBuzz also provides methods to compare the contents of | |
| 46 buffers, join buffers, normalize buffer contents, and handle | |
| 47 invalid codepoints, as well as to determine the state of a | |
| 48 buffer (e.g., input codepoints or output glyphs). Buffer | |
| 49 lifecycles are managed and all buffers are reference-counted. | |
| 50 </para> | |
| 51 | |
| 52 <para> | |
| 53 Although the default <function>hb_shape()</function> function is | |
| 54 sufficient for most use cases, a variant is also provided that | |
| 55 lets you specify which of HarfBuzz's shapers to use on a buffer. | |
| 56 </para> | |
| 57 | |
| 58 <para> | |
| 59 HarfBuzz can read TrueType fonts, TrueType collections, OpenType | |
| 60 fonts, and OpenType collections. Functions are provided to query | |
| 61 font objects about metrics, Unicode coverage, available tables and | |
| 62 features, and variation selectors. Individual glyphs can also be | |
| 63 queried for metrics, variations, and glyph names. OpenType | |
| 64 variable fonts are supported, and HarfBuzz allows you to set | |
| 65 variation-axis coordinates on font objects. | |
| 66 </para> | |
| 67 | |
| 68 <para> | |
| 69 HarfBuzz provides glue code to integrate with various other | |
| 70 libraries, including FreeType, GObject, and CoreText. Support | |
| 71 for integrating with Uniscribe and DirectWrite is experimental | |
| 72 at present. | |
| 73 </para> | |
| 74 </section> | |
| 75 | |
| 76 <section id="terminology"> | |
| 77 <title>Terminology</title> | |
| 78 <para> | |
| 79 | |
| 80 </para> | |
| 81 <variablelist> | |
| 82 <?dbfo list-presentation="blocks"?> | |
| 83 <varlistentry> | |
| 84 <term>script</term> | |
| 85 <listitem> | |
| 86 <para> | |
| 87 In text shaping, a <emphasis>script</emphasis> is a | |
| 88 writing system: a set of symbols, rules, and conventions | |
| 89 that is used to represent a language or multiple | |
| 90 languages. | |
| 91 </para> | |
| 92 <para> | |
| 93 In general computing lingo, the word "script" can also | |
| 94 be used to mean an executable program (usually one | |
| 95 written in a human-readable programming language). For | |
| 96 the sake of clarity, HarfBuzz documents will always use | |
| 97 more specific terminology when referring to this | |
| 98 meaning, such as "Python script" or "shell script." In | |
| 99 all other instances, "script" refers to a writing system. | |
| 100 </para> | |
| 101 <para> | |
| 102 For developers using HarfBuzz, it is important to note | |
| 103 the distinction between a script and a language. Most | |
| 104 scripts are used to write a variety of different | |
| 105 languages, and many languages may be written in more | |
| 106 than one script. | |
| 107 </para> | |
| 108 </listitem> | |
| 109 </varlistentry> | |
| 110 | |
| 111 <varlistentry> | |
| 112 <term>shaper</term> | |
| 113 <listitem> | |
| 114 <para> | |
| 115 In HarfBuzz, a <emphasis>shaper</emphasis> is a | |
| 116 handler for a specific script-shaping model. HarfBuzz | |
| 117 implements separate shapers for Indic, Arabic, Thai and | |
| 118 Lao, Khmer, Myanmar, Tibetan, Hangul, Hebrew, the | |
| 119 Universal Shaping Engine (USE), and a default shaper for | |
| 120 scripts with no script-specific shaping model. | |
| 121 </para> | |
| 122 </listitem> | |
| 123 </varlistentry> | |
| 124 | |
| 125 <varlistentry> | |
| 126 <term>cluster</term> | |
| 127 <listitem> | |
| 128 <para> | |
| 129 In text shaping, a <emphasis>cluster</emphasis> is a | |
| 130 sequence of codepoints that must be treated as an | |
| 131 indivisible unit. Clusters can include code-point | |
| 132 sequences that form a ligature or base-and-mark | |
| 133 sequences. Tracking and preserving clusters is important | |
| 134 when shaping operations might separate or reorder | |
| 135 code points. | |
| 136 </para> | |
| 137 <para> | |
| 138 HarfBuzz provides three cluster | |
| 139 <emphasis>levels</emphasis> that implement different | |
| 140 approaches to the problem of preserving clusters during | |
| 141 shaping operations. | |
| 142 </para> | |
| 143 </listitem> | |
| 144 </varlistentry> | |
| 145 | |
| 146 <varlistentry> | |
| 147 <term>grapheme</term> | |
| 148 <listitem> | |
| 149 <para> | |
| 150 In linguistics, a <emphasis>grapheme</emphasis> is one | |
| 151 of the indivisible units that make up a writing system or | |
| 152 script. Often, graphemes are individual symbols (letters, | |
| 153 numbers, punctuation marks, logograms, etc.) but, | |
| 154 depending on the writing system, a particular grapheme | |
| 155 might correspond to a sequence of several Unicode code | |
| 156 points. | |
| 157 </para> | |
| 158 <para> | |
| 159 In practice, HarfBuzz and other text-shaping engines | |
| 160 are not generally concerned with graphemes. However, it | |
| 161 is important for developers using HarfBuzz to recognize | |
| 162 that there is a difference between graphemes and shaping | |
| 163 clusters (see above). The two concepts may overlap | |
| 164 frequently, but there is no guarantee that they will be | |
| 165 identical. | |
| 166 </para> | |
| 167 </listitem> | |
| 168 </varlistentry> | |
| 169 | |
| 170 <varlistentry> | |
| 171 <term>syllable</term> | |
| 172 <listitem> | |
| 173 <para> | |
| 174 In linguistics, a <emphasis>syllable</emphasis> is | |
| 175 a sequence of sounds that makes up a building block of a | |
| 176 particular language. Every language has its own set of | |
| 177 rules describing what constitutes a valid syllable. | |
| 178 </para> | |
| 179 <para> | |
| 180 For text-shaping purposes, the various definitions of | |
| 181 "syllable" are important because script-specific shaping | |
| 182 operations may be applied at the syllable level. For | |
| 183 example, a reordering rule might specify that a vowel | |
| 184 mark be reordered to the beginning of the syllable. | |
| 185 </para> | |
| 186 <para> | |
| 187 Syllables will consist of one or more Unicode code | |
| 188 points. The definition of a syllable for a particular | |
| 189 writing system might correspond to how HarfBuzz | |
| 190 identifies clusters (see above) for the same writing | |
| 191 system. However, it is important for developers using | |
| 192 HarfBuzz to recognize that there is a difference between | |
| 193 syllables and shaping clusters. The two concepts may | |
| 194 overlap frequently, but there is no guarantee that they | |
| 195 will be identical. | |
| 196 </para> | |
| 197 </listitem> | |
| 198 </varlistentry> | |
| 199 </variablelist> | |
| 200 | |
| 201 </section> | |
| 202 | |
| 203 | |
| 204 <section id="a-simple-shaping-example"> | |
| 205 <title>A simple shaping example</title> | |
| 206 | |
| 207 <para> | |
| 208 Below is the simplest HarfBuzz shaping example possible. | |
| 209 </para> | |
| 210 <orderedlist numeration="arabic"> | |
| 211 <listitem> | |
| 212 <para> | |
| 213 Create a buffer and put your text in it. | |
| 214 </para> | |
| 215 </listitem> | |
| 216 </orderedlist> | |
| 217 <programlisting language="C"> | |
| 218 #include &lt;hb.h&gt; | |
| 219 | |
| 220 hb_buffer_t *buf; | |
| 221 buf = hb_buffer_create(); | |
| 222 hb_buffer_add_utf8(buf, text, -1, 0, -1); | |
| 223 </programlisting> | |
| 224 <orderedlist numeration="arabic"> | |
| 225 <listitem override="2"> | |
| 226 <para> | |
| 227 Set the script, language and direction of the buffer. | |
| 228 </para> | |
| 229 </listitem> | |
| 230 </orderedlist> | |
| 231 <programlisting language="C"> | |
| 232 hb_buffer_set_direction(buf, HB_DIRECTION_LTR); | |
| 233 hb_buffer_set_script(buf, HB_SCRIPT_LATIN); | |
| 234 hb_buffer_set_language(buf, hb_language_from_string("en", -1)); | |
| 235 </programlisting> | |
| 236 <orderedlist numeration="arabic"> | |
| 237 <listitem override="3"> | |
| 238 <para> | |
| 239 Create a face and a font from a font file. | |
| 240 </para> | |
| 241 </listitem> | |
| 242 </orderedlist> | |
| 243 <programlisting language="C"> | |
| 244 hb_blob_t *blob = hb_blob_create_from_file(filename); /* or hb_blob_create_from_file_or_fail() */ | |
| 245 hb_face_t *face = hb_face_create(blob, 0); | |
| 246 hb_font_t *font = hb_font_create(face); | |
| 247 </programlisting> | |
| 248 <orderedlist numeration="arabic"> | |
| 249 <listitem override="4"> | |
| 250 <para> | |
| 251 Shape! | |
| 252 </para> | |
| 253 </listitem> | |
| 254 </orderedlist> | |
| 255 <programlisting language="C"> | |
| 256 hb_shape(font, buf, NULL, 0); | |
| 257 </programlisting> | |
| 258 <orderedlist numeration="arabic"> | |
| 259 <listitem override="5"> | |
| 260 <para> | |
| 261 Get the glyph and position information. | |
| 262 </para> | |
| 263 </listitem> | |
| 264 </orderedlist> | |
| 265 <programlisting language="C"> | |
| 266 unsigned int glyph_count; | |
| 267 hb_glyph_info_t *glyph_info = hb_buffer_get_glyph_infos(buf, &amp;glyph_count); | |
| 268 hb_glyph_position_t *glyph_pos = hb_buffer_get_glyph_positions(buf, &amp;glyph_count); | |
| 269 </programlisting> | |
| 270 <orderedlist numeration="arabic"> | |
| 271 <listitem override="6"> | |
| 272 <para> | |
| 273 Iterate over each glyph. | |
| 274 </para> | |
| 275 </listitem> | |
| 276 </orderedlist> | |
| 277 <programlisting language="C"> | |
| 278 hb_position_t cursor_x = 0; | |
| 279 hb_position_t cursor_y = 0; | |
| 280 for (unsigned int i = 0; i &lt; glyph_count; i++) { | |
| 281 hb_codepoint_t glyphid = glyph_info[i].codepoint; | |
| 282 hb_position_t x_offset = glyph_pos[i].x_offset; | |
| 283 hb_position_t y_offset = glyph_pos[i].y_offset; | |
| 284 hb_position_t x_advance = glyph_pos[i].x_advance; | |
| 285 hb_position_t y_advance = glyph_pos[i].y_advance; | |
| 286 /* draw_glyph(glyphid, cursor_x + x_offset, cursor_y + y_offset); */ | |
| 287 cursor_x += x_advance; | |
| 288 cursor_y += y_advance; | |
| 289 } | |
| 290 </programlisting> | |
| 291 <orderedlist numeration="arabic"> | |
| 292 <listitem override="7"> | |
| 293 <para> | |
| 294 Tidy up. | |
| 295 </para> | |
| 296 </listitem> | |
| 297 </orderedlist> | |
| 298 <programlisting language="C"> | |
| 299 hb_buffer_destroy(buf); | |
| 300 hb_font_destroy(font); | |
| 301 hb_face_destroy(face); | |
| 302 hb_blob_destroy(blob); | |
| 303 </programlisting> | |
| 304 | |
| 305 <para> | |
| 306 This example shows enough to get us started using HarfBuzz. In | |
| 307 the sections that follow, we will use the remainder of | |
| 308 HarfBuzz's API to refine and extend the example and improve its | |
| 309 text-shaping capabilities. | |
| 310 </para> | |
| 311 </section> | |
| 312 </chapter> |
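
The glyph loop in step 6 of the shaping example can be exercised without a font file. The following is a minimal sketch, not HarfBuzz code: `sketch_glyph_pos_t`, `sketch_point_t`, and `sketch_layout_run()` are hypothetical stand-ins that mirror only the four `hb_glyph_position_t` fields the example reads. It shows that a glyph is drawn at the pen position plus its offsets, while only the advances move the pen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for hb_glyph_position_t: only the four fields
 * that step 6 of the example reads. Not part of the HarfBuzz API. */
typedef struct {
    int32_t x_advance;
    int32_t y_advance;
    int32_t x_offset;
    int32_t y_offset;
} sketch_glyph_pos_t;

/* Hypothetical 2D point used for pen positions and draw origins. */
typedef struct {
    int32_t x;
    int32_t y;
} sketch_point_t;

/* Walks the glyph positions in order. For each glyph, the draw origin is
 * the current pen position plus the glyph's offsets; the pen itself is
 * moved only by the advances. If origins is non-NULL, it receives one
 * draw origin per glyph. Returns the pen position after the last glyph. */
sketch_point_t sketch_layout_run(const sketch_glyph_pos_t *pos,
                                 size_t count,
                                 sketch_point_t *origins)
{
    sketch_point_t pen = {0, 0};

    assert(pos != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (origins != NULL) {
            origins[i].x = pen.x + pos[i].x_offset;
            origins[i].y = pen.y + pos[i].y_offset;
        }
        pen.x += pos[i].x_advance;
        pen.y += pos[i].y_advance;
    }
    return pen;
}
```

For example, given three glyphs with advances 600, 0, and 550, where the middle glyph is a mark with offsets (-50, 120), the mark is drawn at (550, 120), the third glyph at (600, 0), and the pen ends at (1150, 0). The mark's offsets shift where it is drawn but do not affect the placement of later glyphs.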
