diff mupdf-source/docs/reference/c/fitz/strings.md @ 2:b50eed0cc0ef upstream

ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4. The directory name has changed: no version number in the expanded directory now.
author Franz Glasner <fzglas.hg@dom66.de>
date Mon, 15 Sep 2025 11:43:07 +0200
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/mupdf-source/docs/reference/c/fitz/strings.md	Mon Sep 15 11:43:07 2025 +0200
@@ -0,0 +1,76 @@
+# Strings
+
+All text strings in MuPDF use the `UTF-8` encoding.
+
+## Unicode
+
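+The following functions encode and decode a single `UTF-8` character, and return
+the number of bytes used by the `UTF-8` character (at most `FZ_UTFMAX`).
+`fz_chartorune` decodes the character at `str` into `*rune`; `fz_runetochar`
+encodes `rune` into `str`, which needs room for up to `FZ_UTFMAX` bytes.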
+
+	int fz_chartorune(int *rune, const char *str);
+	int fz_runetochar(char *str, int rune);
+
+## Locale Independent
+
+Since many of the C string functions are locale-dependent, we also provide our
+own locale-independent versions of these functions. We also have a couple of
+semi-standard functions like `strsep` and `strlcpy` that we can't rely on the
+system providing. These should be self-explanatory:
+
+	char *fz_strdup(fz_context *ctx, const char *s);
+	float fz_strtof(const char *s, char **es);
+	char *fz_strsep(char **stringp, const char *delim);
+	size_t fz_strlcpy(char *dst, const char *src, size_t n);
+	size_t fz_strlcat(char *dst, const char *src, size_t n);
+	void *fz_memmem(const void *haystack, size_t haystacklen, const void *needle, size_t needlelen);
+	int fz_strcasecmp(const char *a, const char *b);
+
+There are also a couple of functions to process filenames and URLs:
+
+`char *fz_cleanname(char *path);`
+:	Rewrite path in-place to the shortest string that names the same path.
+	Eliminates multiple and trailing slashes, and interprets "." and "..".
+
+`void fz_dirname(char *dir, const char *path, size_t dir_size);`
+:	Extract the directory component of `path` into `dir`, a buffer of `dir_size` bytes.
+
+`char *fz_urldecode(char *url);`
+:	Decode URL escapes in-place.
+
+## Formatting
+
+Our `printf` family handles the common `printf` formatting characters, with a
+few minor differences, and also supports several non-standard formatting
+characters. The `printf` functions in the I/O module use the same syntax.
+
+	size_t fz_vsnprintf(char *buffer, size_t space, const char *fmt, va_list args);
+	size_t fz_snprintf(char *buffer, size_t space, const char *fmt, ...);
+	char *fz_asprintf(fz_context *ctx, const char *fmt, ...);
+
+`%%`, `%c`, `%e`, `%f`, `%p`, `%x`, `%d`, `%u`, `%s`
+:	These behave as usual, but accept only the padding flags (`+`, `0`, space), a width, and a precision.
+
+`%g float`
+:	Prints the `float` in the shortest form that does not lose precision, except that `NaN` prints as `0`, `+Inf` as `FLT_MAX`, and `-Inf` as `-FLT_MAX`.
+
+`%M fz_matrix*`
+:	Prints all 6 coefficients in the matrix as `%g` separated by spaces.
+
+`%R fz_rect*`
+:	Prints the rectangle's `x0`, `y0`, `x1`, `y1` coordinates as `%g`, separated by spaces.
+
+`%P fz_point*`
+:	Prints the point's `x` and `y` coordinates as `%g`, separated by a space.
+
+`%C int`
+:	Formats a Unicode character (code point) as `UTF-8`. Useful for printing Unicode text.
+
+`%q char*`
+:	Formats a string in double quotes, using C escapes.
+
+`%( char*`
+:	Formats a string in parentheses, using PostScript escapes.
+
+`%n char*`
+:	Formats a string as a PDF name, with a `/` prefix and PDF name hex escapes (`#xx`).