view mupdf-source/docs/reference/c/fitz/xml.md @ 2:b50eed0cc0ef upstream

ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4. The directory name has changed: no version number in the expanded directory now.
author Franz Glasner <fzglas.hg@dom66.de>
date Mon, 15 Sep 2025 11:43:07 +0200
parents
children
line wrap: on
line source

# XML Parser

We have a rudimentary XML parser that handles well formed XML. It does not do
any namespace processing, and it does not validate the XML syntax.

The parser supports `UTF-8`, `UTF-16`, `iso-8859-1`, `iso-8859-7`, `koi8`,
`windows-1250`, `windows-1251`, and `windows-1252` encoded input.

If `preserve_white` is *false*, we will discard all *whitespace-only* text
elements. This is useful for parsing non-text documents such as XPS and SVG.
Preserving whitespace is useful for parsing XHTML.

	typedef struct { opaque } fz_xml_doc;
	typedef struct { opaque } fz_xml;

	fz_xml_doc *fz_parse_xml(fz_context *ctx, fz_buffer *buf, int preserve_white);
	void fz_drop_xml(fz_context *ctx, fz_xml_doc *xml);
	fz_xml *fz_xml_root(fz_xml_doc *xml);

	fz_xml *fz_xml_prev(fz_xml *item);
	fz_xml *fz_xml_next(fz_xml *item);
	fz_xml *fz_xml_up(fz_xml *item);
	fz_xml *fz_xml_down(fz_xml *item);

`int fz_xml_is_tag(fz_xml *item, const char *name);`
:	Returns *true* if the element is a tag with the given name.

`char *fz_xml_tag(fz_xml *item);`
:	Returns the tag name if the element is a tag, otherwise `NULL`.

`char *fz_xml_att(fz_xml *item, const char *att);`
:	Returns the value of the tag element's attribute, or `NULL` if not a tag or missing.

`char *fz_xml_text(fz_xml *item);`
:	Returns the `UTF-8` text of the text element, or `NULL` if not a text element.

`fz_xml *fz_xml_find(fz_xml *item, const char *tag);`
:	Find the next element with the given tag name. Returns the element
	itself if it matches, or the first sibling if it doesn't. Returns
	`NULL` if there is no sibling with that tag name.

`fz_xml *fz_xml_find_next(fz_xml *item, const char *tag);`
:	Find the next sibling element with the given tag name, or `NULL` if none.

`fz_xml *fz_xml_find_down(fz_xml *item, const char *tag);`
:	Find the first child element with the given tag name, or `NULL` if none.