
diff mupdf-source/docs/reference/c/fitz/xml.md @ 2:b50eed0cc0ef upstream
ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4. The directory name has changed: no version number in the expanded directory now.
author: Franz Glasner <fzglas.hg@dom66.de>
date: Mon, 15 Sep 2025 11:43:07 +0200
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/mupdf-source/docs/reference/c/fitz/xml.md	Mon Sep 15 11:43:07 2025 +0200
@@ -0,0 +1,46 @@
+# XML Parser
+
+We have a rudimentary XML parser that handles well-formed XML. It does not do
+any namespace processing, and it does not validate the XML syntax.
+
+The parser supports `UTF-8`, `UTF-16`, `iso-8859-1`, `iso-8859-7`, `koi8`,
+`windows-1250`, `windows-1251`, and `windows-1252` encoded input.
+
+If `preserve_white` is *false*, we will discard all *whitespace-only* text
+elements. This is useful for parsing non-text documents such as XPS and SVG.
+Preserving whitespace is useful for parsing XHTML.
+
+	typedef struct { opaque } fz_xml_doc;
+	typedef struct { opaque } fz_xml;
+
+	fz_xml_doc *fz_parse_xml(fz_context *ctx, fz_buffer *buf, int preserve_white);
+	void fz_drop_xml(fz_context *ctx, fz_xml_doc *xml);
+	fz_xml *fz_xml_root(fz_xml_doc *xml);
+
+	fz_xml *fz_xml_prev(fz_xml *item);
+	fz_xml *fz_xml_next(fz_xml *item);
+	fz_xml *fz_xml_up(fz_xml *item);
+	fz_xml *fz_xml_down(fz_xml *item);
+
+`int fz_xml_is_tag(fz_xml *item, const char *name);`
+:	Returns *true* if the element is a tag with the given name.
+
+`char *fz_xml_tag(fz_xml *item);`
+:	Returns the tag name if the element is a tag, otherwise `NULL`.
+
+`char *fz_xml_att(fz_xml *item, const char *att);`
+:	Returns the value of the named attribute, or `NULL` if the element is not a tag or the attribute is missing.
+
+`char *fz_xml_text(fz_xml *item);`
+:	Returns the `UTF-8` text of the text element, or `NULL` if not a text element.
+
+`fz_xml *fz_xml_find(fz_xml *item, const char *tag);`
+:	Find the first element with the given tag name, starting at `item`. Returns
+	`item` itself if it matches, otherwise the first following sibling that
+	does. Returns `NULL` if neither `item` nor a following sibling matches.
+
+`fz_xml *fz_xml_find_next(fz_xml *item, const char *tag);`
+:	Find the next sibling element with the given tag name, or `NULL` if none.
+
+`fz_xml *fz_xml_find_down(fz_xml *item, const char *tag);`
+:	Find the first child element with the given tag name, or `NULL` if none.
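
Note on the lookup functions documented above: `fz_xml_find` and `fz_xml_find_next` differ only in where the search starts. The sketch below illustrates that difference on a hypothetical stand-in sibling list. The `sketch_node` type and both `sketch_` functions are assumptions made for illustration; they are not MuPDF's `fz_xml` layout or implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one level of siblings; NOT MuPDF's fz_xml. */
typedef struct sketch_node {
	const char *name;          /* tag name, or NULL for a text node */
	struct sketch_node *next;  /* following sibling, or NULL */
} sketch_node;

/* Mirrors the documented fz_xml_find: the search starts at item itself. */
sketch_node *sketch_find(sketch_node *item, const char *tag)
{
	while (item) {
		if (item->name && strcmp(item->name, tag) == 0)
			return item;
		item = item->next;
	}
	return NULL;
}

/* Mirrors the documented fz_xml_find_next: the search starts after item. */
sketch_node *sketch_find_next(sketch_node *item, const char *tag)
{
	return item ? sketch_find(item->next, tag) : NULL;
}
```

Under these assumptions, visiting every sibling with a given tag would pair the two calls: start with `sketch_find(first, "b")`, then advance with `sketch_find_next(n, "b")` until it returns `NULL`.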