Python2/PyMuPDF: mupdf-source/thirdparty/harfbuzz/docs/usermanual-what-is-harfbuzz.xml comparison

comparison mupdf-source/thirdparty/harfbuzz/docs/usermanual-what-is-harfbuzz.xml @ 2:b50eed0cc0ef upstream

ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4. The directory name has changed: no version number in the expanded directory now.

author	Franz Glasner <fzglas.hg@dom66.de>
date	Mon, 15 Sep 2025 11:43:07 +0200

-:1d09e1dec1d9
+:b50eed0cc0ef
+<?xml version="1.0"?>
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
+"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
+<!ENTITY % local.common.attrib "xmlns:xi  CDATA  #FIXED 'http://www.w3.org/2003/XInclude'">
+<!ENTITY version SYSTEM "version.xml">
+]>
+<chapter id="what-is-harfbuzz">
+<title>What is HarfBuzz?</title>
+<para>
+HarfBuzz is a <emphasis>text-shaping engine</emphasis>. If you
+give HarfBuzz a font and a string containing a sequence of Unicode
+codepoints, HarfBuzz selects and positions the corresponding
+glyphs from the font, applying all of the necessary layout rules
+and font features. HarfBuzz then returns the string to you as a
+sequence of positioned glyphs, correctly arranged for the language
+and writing system.
+</para>
+<para>
+HarfBuzz can properly shape all of the world's major writing
+systems. It runs on all major operating systems and software
+platforms and it supports the major font formats in use
+today.
+</para>
+<section id="what-is-text-shaping">
+<title>What is text shaping?</title>
+<para>
+Text shaping is the process of translating a string of character
+codes (such as Unicode codepoints) into a properly arranged
+sequence of glyphs that can be rendered onto a screen or into
+final output form for inclusion in a document.
+</para>
+<para>
+The shaping process is dependent on the input string, the active
+font, the script (or writing system) that the string is in, and
+the language that the string is in.
+</para>
+<para>
+Modern software systems generally only deal with strings in the
+Unicode encoding scheme (although legacy systems and documents may
+involve other encodings).
+</para>
+<para>
+There are several font formats that a program might
+encounter, each of which has a set of standard text-shaping
+rules.
+</para>
+<para>The dominant format is <ulink
+url="http://www.microsoft.com/typography/otspec/">OpenType</ulink>. The
+OpenType specification defines a series of <ulink url="https://github.com/n8willis/opentype-shaping-documents">shaping models</ulink> for
+various scripts from around the world. These shaping models depend on
+the font incorporating certain features as
+<emphasis>lookups</emphasis> in its <literal>GSUB</literal>
+and <literal>GPOS</literal> tables.
+</para>
+<para>
+Alternatively, OpenType fonts can include shaping features for
+the <ulink url="https://graphite.sil.org/">Graphite</ulink> shaping model.
+</para>
+<para>
+TrueType fonts can also include OpenType shaping
+features. Alternatively, TrueType fonts can include <ulink url="https://developer.apple.com/fonts/TrueType-Reference-Manual/RM09/AppendixF.html">Apple
+Advanced Typography</ulink> (AAT) tables to implement shaping
+support. AAT fonts are generally only found on macOS and iOS systems.
+</para>
+<para>
+Text strings will usually be tagged with script and language
+tags that provide the context needed to perform text shaping
+correctly.  The necessary <ulink
+url="https://docs.microsoft.com/en-us/typography/opentype/spec/scripttags">script</ulink>
+and <ulink
+url="https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags">language</ulink>
+tags are defined by OpenType.
+</para>
+</section>
+<section id="why-do-i-need-a-shaping-engine">
+<title>Why do I need a shaping engine?</title>
+<para>
+Text shaping is an integral part of preparing text for
+display. Before a Unicode sequence can be rendered, the
+codepoints in the sequence must be mapped to the corresponding
+glyphs provided in the font, and those glyphs must be positioned
+correctly relative to each other. For many of the scripts
+supported in Unicode, these steps involve script-specific layout
+rules, including complex joining, reordering, and positioning
+behavior. Implementing these rules is the job of the shaping engine.
+</para>
+<para>
+Text shaping is a fairly low-level operation. HarfBuzz is
+used directly by text-handling libraries like <ulink
+url="https://www.pango.org/">Pango</ulink>, as well as by the layout
+engines in Firefox, LibreOffice, and Chromium. Unless you are
+<emphasis>writing</emphasis> one of these layout engines
+yourself, you will probably not need to use HarfBuzz: normally,
+a layout engine, toolkit, or other library will turn text into
+glyphs for you.
+</para>
+<para>
+However, if you <emphasis>are</emphasis> writing a layout engine
+or graphics library yourself, then you will need to perform text
+shaping, and this is where HarfBuzz can help you.
+</para>
+<para>
+Here are some specific scenarios where a text-shaping engine
+like HarfBuzz helps you:
+</para>
+<itemizedlist>
+<listitem>
+<para>
+OpenType fonts contain a set of glyphs (that is, shapes
+	  to represent the letters, numbers, punctuation marks, and
+	  all other symbols), which are indexed by a <literal>glyph ID</literal>.
+	</para>
+	<para>
+A particular glyph ID within the font does not necessarily
+	  correlate to a predictable Unicode codepoint. For instance,
+	  some fonts have the letter &quot;a&quot; as glyph ID 1, but
+	  many others do not. In order to retrieve the right glyph
+	  from the font to display &quot;a&quot;, you need to consult
+	  the table inside the font (the <literal>cmap</literal>
+	  table) that maps Unicode codepoints to glyph IDs. In other
+	  words, <emphasis>text shaping turns codepoints into glyph
+	  IDs</emphasis>.
+</para>
+</listitem>
+<listitem>
+<para>
+Many OpenType fonts contain ligatures: combinations of
+characters that are rendered as a single unit. For instance,
+	  it is common for the &quot;f, i&quot; letter
+	  sequence to appear in print as the single ligature glyph
+	  &quot;ﬁ&quot;.
+	</para>
+	<para>
+	  Whether you should render an &quot;f, i&quot; sequence
+	  as <literal>fi</literal> or as &quot;ﬁ&quot; does not
+depend on the input text. Instead, it depends on whether
+	  or not the font includes an &quot;ﬁ&quot; glyph and on the
+	  level of ligature application you wish to perform. The font
+	  and the amount of ligature application used are under your
+	  control. In other words, <emphasis>text shaping involves
+	  querying the font's ligature tables and determining what
+	  substitutions should be made</emphasis>.
+</para>
+</listitem>
+<listitem>
+<para>
+While ligatures like &quot;ﬁ&quot; are optional typographic
+refinements, some languages <emphasis>require</emphasis> certain
+substitutions to be made in order to display text correctly.
+</para>
+	<para>
+	  For example, in Tamil, when the letter &quot;TTA&quot; (ட)
+	  is followed by the vowel sign &quot;U&quot; (ு), the pair
+	  must be replaced by the single glyph &quot;டு&quot;. The
+	  sequence of Unicode characters &quot;ட,ு&quot; needs to be
+	  substituted with a single &quot;டு&quot; glyph from the
+	  font.
+	</para>
+	<para>
+	  But &quot;டு&quot; does not have a Unicode codepoint. To
+	  find this glyph, you need to consult the table inside
+	  the font (the <literal>GSUB</literal> table) that contains
+	  substitution information. In other words, <emphasis>text shaping
+	  chooses the correct glyph for a sequence of characters
+	  provided</emphasis>.
+</para>
+</listitem>
+<listitem>
+<para>
+Similarly, an Arabic letter can have up to four different variants
+	  corresponding to the different positions it might appear in
+	  within a sequence. Inside a font, there will be separate
+	  glyphs for the initial, medial, final, and isolated forms of
+	  each letter, each at a different glyph ID.
+	</para>
+	<para>
+	  Unicode only assigns one codepoint per character, so a
+	  Unicode string will not tell you which glyph variant to use
+	  for each character. To decide, you need to analyze the whole
+	  string and determine the appropriate glyph for each character
+	  based on its position. In other words, <emphasis>text
+	  shaping chooses the correct form of the letter by its
+	  position and returns the correct glyph from the font</emphasis>.
+</para>
+</listitem>
+<listitem>
+<para>
+Other languages involve marks and accents that need to be
+rendered in specific positions relative to a base character. For
+instance, the Moldovan language includes the Cyrillic letter
+&quot;zhe&quot; (ж) with a breve accent, like so: &quot;ӂ&quot;.
+	</para>
+	<para>
+	  Some fonts will provide this character as a single
+	  zhe-with-breve glyph, but other fonts will not and, instead,
+	  will expect the rendering engine to form the character by
+superimposing the separate &quot;ж&quot; and &quot;˘&quot;
+	  glyphs.
+	</para>
+	<para>
+	  But exactly where you should draw the breve depends on the
+	  height and width of the preceding zhe glyph. To find the
+	  right position, you need to consult the table inside
+	  the font (the <literal>GPOS</literal> table) that contains
+	  positioning information.
+In other words, <emphasis>text shaping tells you whether you
+	  have a precomposed glyph within your font or if you need to
+	  compose a glyph yourself out of combining marks&mdash;and,
+	  if so, where to position those marks.</emphasis>
+</para>
+</listitem>
+</itemizedlist>
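+<para>
+The first two items above can be sketched with a toy font. This is
+only an illustration of the idea, not how HarfBuzz or any real font
+stores its data: the <literal>CMAP</literal> and
+<literal>LIGATURES</literal> dictionaries, the helper names, and
+every glyph ID are invented for the example.
+</para>
+<programlisting>
```python
# Toy font tables. All glyph IDs are made up for illustration.
CMAP = {ord("f"): 10, ord("i"): 11, ord("n"): 12}  # codepoint -> glyph ID
LIGATURES = {(10, 11): 20}                          # glyph pair -> ligature glyph


def map_codepoints(text, cmap):
    """Look up each codepoint in the cmap; 0 stands for a missing glyph."""
    return [cmap.get(ord(ch), 0) for ch in text]


def apply_ligatures(glyphs, ligatures):
    """Replace adjacent glyph pairs that have a ligature entry, left to right."""
    out = []
    i = 0
    while i != len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in ligatures:
            out.append(ligatures[pair])  # two input glyphs become one
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


glyphs = map_codepoints("fin", CMAP)         # [10, 11, 12]
shaped = apply_ligatures(glyphs, LIGATURES)  # [20, 12]
print(glyphs, shaped)
```
+</programlisting>
+<para>
+The three codepoints of &quot;fin&quot; first become three glyph IDs,
+and the substitution step then merges the first two into a single
+ligature glyph. Dropping the entry from <literal>LIGATURES</literal>
+leaves the glyph sequence unchanged, which mirrors the point that
+ligature use depends on the font rather than on the input text.
+</para>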
+<para>
+If tasks like these are something that you need to do, then you
+need a text shaping engine. You could use Uniscribe if you are
+writing Windows software; you could use CoreText on macOS; or
+you could use HarfBuzz.
+</para>
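+<para>
+The Arabic case in the list above can be sketched in the same
+spirit. The joining table below is a small hand-picked subset, the
+function name is hypothetical, and the rule is deliberately
+simplified: it ignores marks, joining controls, and everything else a
+real shaper has to handle. The labels <literal>D</literal> and
+<literal>R</literal> follow the dual-joining and right-joining
+classes used in the Unicode Arabic shaping data.
+</para>
+<programlisting>
```python
# Joining behaviour for a few Arabic letters (not a complete table):
# "D" joins on both sides, "R" joins only to the preceding letter.
JOINING = {
    "\u0628": "D",  # BEH
    "\u0633": "D",  # SEEN
    "\u0644": "D",  # LAM
    "\u0645": "D",  # MEEM
    "\u0627": "R",  # ALEF
    "\u062F": "R",  # DAL
    "\u0631": "R",  # REH
}


def positional_forms(text):
    """Return one form name per character, for text in logical order."""
    forms = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i else None
        nxt = text[i + 1] if i + 1 != len(text) else None
        # A letter connects backwards only if the previous letter joins forward.
        joins_prev = ch in JOINING and JOINING.get(prev) == "D"
        # Only dual-joining letters connect forwards, and only to a joining letter.
        joins_next = JOINING.get(ch) == "D" and nxt in JOINING
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


# BEH, ALEF, BEH: the ALEF does not join forward, so the last BEH stands alone.
print(positional_forms("\u0628\u0627\u0628"))
```
+</programlisting>
+<para>
+The same codepoint (BEH) ends up with two different forms in one
+word, which is why the choice cannot be read off the Unicode string
+one character at a time.
+</para>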
+<note>
+<para>
+	In the rest of this manual, the text will assume that the reader
+	is the implementor of a text-layout engine.
+</para>
+</note>
+</section>
+<section id="what-does-harfbuzz-do">
+<title>What does HarfBuzz do?</title>
+<para>
+HarfBuzz provides text shaping through a cross-platform
+C API that accepts sequences of Unicode codepoints as input. Currently,
+the following OpenType shaping models are supported:
+</para>
+<itemizedlist>
+<listitem>
+	<para>
+	  Indic (covering Devanagari, Bengali, Gujarati,
+	  Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu)
+	</para>
+</listitem>
+<listitem>
+	<para>
+	  Arabic (covering Arabic, N'Ko, Syriac, and Mongolian)
+	</para>
+</listitem>
+<listitem>
+	<para>
+	  Thai and Lao
+	</para>
+</listitem>
+<listitem>
+	<para>
+	  Khmer
+	</para>
+</listitem>
+<listitem>
+	<para>
+	  Myanmar
+	</para>
+</listitem>
+<listitem>
+	<para>
+	  Tibetan
+	</para>
+</listitem>
+<listitem>
+	<para>
+	  Hangul
+	</para>
+</listitem>
+<listitem>
+	<para>
+	  Hebrew
+	</para>
+</listitem>
+<listitem>
+	<para>
+	  The Universal Shaping Engine or <emphasis>USE</emphasis>
+	  (covering complex scripts not covered by the above shaping
+	  models)
+	</para>
+</listitem>
+<listitem>
+	<para>
+	  A default shaping model for non-complex scripts
+	  (covering Latin, Cyrillic, Greek, Armenian, Georgian, Tifinagh,
+	  and many others)
+	</para>
+</listitem>
+<listitem>
+	<para>
+	  Emoji (including emoji modifier sequences, flag sequences,
+	  and ZWJ sequences)
+	</para>
+</listitem>
+</itemizedlist>
+<para>
+In addition to OpenType shaping, HarfBuzz supports the latest
+version of Graphite shaping (the "Graphite 2" model) and AAT
+shaping.
+</para>
+<para>
+HarfBuzz can read and understand TrueType fonts (.ttf), TrueType
+collections (.ttc), and OpenType fonts (.otf, including those
+fonts that contain TrueType-style outlines and those that
+contain PostScript CFF or CFF2 outlines).
+</para>
+<para>
+HarfBuzz is designed and tested to run on top of the FreeType
+font renderer. It can run on Linux, Android, Windows, macOS, and
+iOS systems.
+</para>
+<para>
+In addition to its core shaping functionality, HarfBuzz provides
+functions for accessing other font features, including optional
+GSUB and GPOS OpenType features, as well as
+all color-font formats (<literal>CBDT</literal>,
+<literal>sbix</literal>, <literal>COLR/CPAL</literal>, and
+<literal>SVG-OT</literal>) and OpenType variable fonts. HarfBuzz
+also includes a font-subsetting feature. HarfBuzz can perform
+some low-level math-shaping operations, although it does not
+currently perform full shaping for mathematical typesetting.
+</para>
+<para>
+A suite of command-line utilities is also provided in the
+source-code tree, designed to help users test and debug
+HarfBuzz's features on real-world fonts and input.
+</para>
+</section>
+<section id="what-harfbuzz-doesnt-do">
+<title>What HarfBuzz doesn't do</title>
+<para>
+HarfBuzz will take a Unicode string, shape it, and give you the
+information required to lay it out correctly on a single
+horizontal (or vertical) line using the font provided. That is the
+extent of HarfBuzz's responsibility.
+</para>
+<para>
+It is important to note that if you are implementing a complete
+text-layout engine you may have other responsibilities that
+HarfBuzz will <emphasis>not</emphasis> help you with. For example:
+</para>
+<itemizedlist>
+<listitem>
+<para>
+HarfBuzz won't help you with bidirectionality. If you want to
+lay out text that includes a mix of Hebrew and English, you
+	  will need to ensure that each buffer provided to HarfBuzz
+	  has all of its characters in the same order and that the
+	  directionality of the buffer is set correctly. This may mean
+	  segmenting the text before it is placed into HarfBuzz buffers. For
+example, the user might hit the keys in the following
+sequence:
+</para>
+<programlisting>
+	  A B C [space] ג ב א [space] D E F
+</programlisting>
+<para>
+but will expect to see in the output:
+</para>
+<programlisting>
+	  ABC אבג DEF
+</programlisting>
+<para>
+This reordering is called <emphasis>bidi processing</emphasis>
+(&quot;bidi&quot; is short for bidirectional). The Unicode
+Bidirectional Algorithm, published as Unicode Standard Annex #9,
+specifies how to process a string of mixed directionality.
+Before sending your string to HarfBuzz, you may need to apply the
+bidi algorithm to it. Libraries such as <ulink
+	  url="http://icu-project.org/">ICU</ulink> and <ulink
+	  url="http://fribidi.org/">fribidi</ulink> can do this for you.
+</para>
+</listitem>
+<listitem>
+<para>
+HarfBuzz won't help you with text that contains different font
+properties. For instance, if you have the string &quot;a
+<emphasis>huge</emphasis> breakfast&quot;, and you expect
+&quot;huge&quot; to be italic, then you will need to send three
+strings to HarfBuzz: <literal>a</literal>, in your Roman font;
+<literal>huge</literal> using your italic font; and
+<literal>breakfast</literal> using your Roman font again.
+	</para>
+	<para>
+Similarly, if you change the font, font size, script,
+	  language, or direction within your string, then you will
+	  need to shape each run independently and output them
+	  independently. HarfBuzz expects to shape a run of characters
+	  that all share the same properties.
+</para>
+</listitem>
+<listitem>
+<para>
+HarfBuzz won't help you with line breaking, hyphenation, or
+justification. As mentioned above, HarfBuzz lays out the string
+along a <emphasis>single line</emphasis> of, notionally,
+infinite length. If you want to find out where the potential
+word, sentence and line break points are in your text, you
+could use the ICU library's break iterator functions.
+</para>
+<para>
+HarfBuzz can tell you how wide a shaped piece of text is, which is
+useful input to a justification algorithm, but it knows nothing
+about paragraphs, lines or line lengths. Nor will it adjust the
+space between words to fit them proportionally into a line.
+</para>
+</listitem>
+</itemizedlist>
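+<para>
+The run splitting described in the second item above might look
+roughly like the following sketch. It assumes the caller already
+knows the font for every character; the
+<literal>split_runs</literal> helper and the font names
+<literal>Roman</literal> and <literal>Italic</literal> are
+placeholders invented for the example.
+</para>
+<programlisting>
```python
from itertools import groupby

text = "a huge breakfast"
# One font label per character: "a ", then "huge", then " breakfast".
fonts = ["Roman"] * 2 + ["Italic"] * 4 + ["Roman"] * 10


def split_runs(text, fonts):
    """Group consecutive characters that share the same font into runs."""
    if len(text) != len(fonts):
        raise ValueError("need exactly one font label per character")
    runs = []
    for font, group in groupby(zip(text, fonts), key=lambda pair: pair[1]):
        runs.append((font, "".join(ch for ch, _ in group)))
    return runs


# Each resulting run would be shaped on its own.
for font, run in split_runs(text, fonts):
    print(font, repr(run))
```
+</programlisting>
+<para>
+A layout engine would normally extend the grouping key beyond the
+font to cover font size, script, language, and direction, so that
+each run shares all of those properties.
+</para>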
+<para>
+HarfBuzz will help you, as a layout-engine implementor, with the
+interface between your text and your font, and that's something
+that you'll need. What you then do with the glyphs that your font
+returns is up to you.
+</para>
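+<para>
+As one example of what remains the layout engine's job, the sketch
+below fills lines greedily from word widths of the kind a shaper
+reports. The <literal>break_lines</literal> function, the width
+values, and the units are all made up for illustration; it is not
+part of HarfBuzz and is far simpler than a production line breaker.
+</para>
+<programlisting>
```python
def break_lines(words, widths, space_width, line_width):
    """Greedy line filling: start a new line when the next word will not fit."""
    lines = []
    current = []
    used = 0
    for word, width in zip(words, widths):
        # Width of the line if this word were added to it.
        needed = width if not current else used + space_width + width
        if current and needed > line_width:
            lines.append(current)
            current = [word]
            used = width
        else:
            # An overlong word on an empty line is kept rather than split.
            current.append(word)
            used = needed
    if current:
        lines.append(current)
    return lines


words = ["a", "huge", "breakfast"]
widths = [500, 2200, 4800]  # hypothetical advance widths in font units
print(break_lines(words, widths, space_width=250, line_width=3000))
```
+</programlisting>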
+</section>
+<section id="why-is-it-called-harfbuzz">
+<title>Why is it called HarfBuzz?</title>
+<para>
+HarfBuzz began its life as text-shaping code within the FreeType
+project (and you will see references to the FreeType authors
+within the source code copyright declarations), but was then
+extracted into its own project. This project is maintained by
+Behdad Esfahbod, who named it HarfBuzz. Originally, it was a
+shaping engine for OpenType fonts&mdash;&quot;HarfBuzz&quot; is
+the Persian for &quot;open type&quot;.
+</para>
+</section>
+</chapter>
