view mupdf-source/docs/reference/swig.rst @ 7:5ab937c03c27
Apply full RELRO to all generated binaries.
Also strip the generated binaries.
| author | Franz Glasner <fzglas.hg@dom66.de> |
|---|---|
| date | Tue, 16 Sep 2025 12:37:32 +0200 |
| parents | b50eed0cc0ef |
| children |
.. Copyright (C) 2001-2025 Artifex Software, Inc.
.. All Rights Reserved.

.. meta::
   :description: MuPDF documentation
   :keywords: MuPDF, pdf, epub


C++, Python, and C#
===============================================================

..
   We define crude substitutions that implement simple expand/contract blocks
   in html. Unfortunately it doesn't seem possible to pass parameters to
   substitutions so we can't specify text to be shown next to html's details
   triangle.

.. |expand_begin| raw:: html

   <details>
   <summary><strong>Show/hide</strong></summary>

.. |expand_end| raw:: html

   </details>


Overview
---------------------------------------------------------------

Auto-generated abstracted :title:`C++`, :title:`Python` and :title:`C#` versions of the :title:`MuPDF C API` are available.

* The C++ API is machine-generated from the C API header files and adds various abstractions such as automatic contexts and automatic reference counting.

* The Python and C# APIs are generated from the C++ API using SWIG, so automatically include the C++ API's abstractions.

.. graphviz::

   digraph
   {
     size="4,4";
     labeljust=l;

     "MuPDF C API" [shape="rectangle"]
     "MuPDF C++ API" [shape="rectangle"]
     "SWIG" [shape="oval"]
     "MuPDF Python API" [shape="rectangle"]
     "MuPDF C# API" [shape="rectangle"]

     "MuPDF C API" -> "MuPDF C++ API" [label=" Parse C headers with libclang,\l generate abstractions.\l"]
     "MuPDF C++ API" -> "SWIG" [label=" Parse C++ headers with SWIG."]
     "SWIG" -> "MuPDF Python API"
     "SWIG" -> "MuPDF C# API"
   }


The C++ MuPDF API
---------------------------------------------------------------

Basics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Auto-generated from the MuPDF C API's header files.

* Everything is in C++ namespace ``mupdf``.

* No function or method takes an ``fz_context*`` argument. (Automatically-generated per-thread contexts are used internally.)

* All MuPDF ``setjmp()``/``longjmp()``-based exceptions are converted into C++ exceptions.

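The idea of per-thread contexts that callers never see can be pictured with a small Python analogy. This is only an illustrative sketch: the generated bindings implement the mechanism in C++, and the names ``_Context``, ``_get_context()``, ``context_owner()`` and ``owner_in_new_thread()`` are hypothetical and not part of MuPDF.

.. code-block:: python

    import threading


    class _Context:
        """Hypothetical stand-in for a per-thread ``fz_context``."""

        def __init__(self):
            # Remember which thread created this context.
            self.owner = threading.get_ident()


    # Thread-local storage: each thread sees its own ``ctx`` attribute.
    _tls = threading.local()


    def _get_context():
        """Return the calling thread's context, creating it on first use."""
        ctx = getattr(_tls, "ctx", None)
        if ctx is None:
            ctx = _tls.ctx = _Context()
        return ctx


    def context_owner():
        """Wrapper-style function: takes no context argument and looks one up internally."""
        return _get_context().owner


    def owner_in_new_thread():
        """Call ``context_owner()`` from a fresh thread and return its result."""
        result = []
        thread = threading.Thread(target=lambda: result.append(context_owner()))
        thread.start()
        thread.join()
        return result[0]

Calling ``context_owner()`` repeatedly in one thread reuses the same context, while ``owner_in_new_thread()`` reports a different owner because the new thread gets its own.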
Low-level C++ API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The MuPDF C API is provided as low-level C++ functions with ``ll_`` prefixes.

* No ``fz_context*`` arguments.

* MuPDF exceptions are converted into C++ exceptions.


Class-aware C++ API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

C++ wrapper classes wrap most ``fz_*`` and ``pdf_*`` C structs:

* Class names are camel-case versions of the wrapped struct's name, for example ``fz_document``'s wrapper class is ``mupdf::FzDocument``.

* Classes automatically handle reference counting of the underlying C structs, so there is no need for manual calls to ``fz_keep_*()`` and ``fz_drop_*()``, and class instances can be treated as values and copied arbitrarily.

Class-aware functions and methods take and return wrapper class instances instead of MuPDF C structs:

* No ``fz_context*`` arguments.

* MuPDF exceptions are converted into C++ exceptions.

* Class-aware functions have the same names as the underlying C API function.

* Args that are pointers to a MuPDF struct will be changed to take a reference to the corresponding wrapper class.

* Where a MuPDF function returns a pointer to a struct, the class-aware C++ wrapper will return a wrapper class instance by value.

* Class-aware functions that have a C++ wrapper class as their first parameter are also provided as a member function of the wrapper class, with the same name as the class-aware function.

* Wrapper classes are defined in ``mupdf/platform/c++/include/mupdf/classes.h``.

* Class-aware functions are declared in ``mupdf/platform/c++/include/mupdf/classes2.h``.

* Wrapper classes for reference-counted MuPDF structs:

  * The C++ wrapper classes will have a public ``m_internal`` member that is a pointer to the underlying MuPDF struct.

  * If a MuPDF C function returns a null pointer to a MuPDF struct, the class-aware C++ wrapper will return an instance of the wrapper class with a null ``m_internal`` member.

  * The C++ wrapper class will have an ``operator bool()`` that returns true if the ``m_internal`` member is non-null.

    [Introduced 2024-07-08.]

Usually it is more convenient to use the class-aware C++ API rather than the low-level C++ API.


C++ Exceptions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

C++ exceptions use classes for each ``FZ_ERROR_*`` enum, all derived from a class ``mupdf::FzErrorBase`` which in turn derives from ``std::exception``.

For example if MuPDF C code does ``fz_throw(ctx, FZ_ERROR_GENERIC, "something failed")``, this will appear as a C++ exception with type ``mupdf::FzErrorGeneric``. Its ``what()`` method will return ``code=2: something failed``, and it will have a public member ``m_code`` set to ``FZ_ERROR_GENERIC``.


Example wrappers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The MuPDF C API function ``fz_new_buffer_from_page()`` is available as these C++ functions/methods:

.. code-block:: c++

    // MuPDF C function.
    fz_buffer *fz_new_buffer_from_page(fz_context *ctx, fz_page *page, const fz_stext_options *options);

    // MuPDF C++ wrappers.
    namespace mupdf
    {
        // Low-level wrapper:
        ::fz_buffer *ll_fz_new_buffer_from_page(::fz_page *page, const ::fz_stext_options *options);

        // Class-aware wrapper:
        FzBuffer fz_new_buffer_from_page(const FzPage& page, FzStextOptions& options);

        // Method in wrapper class FzPage:
        struct FzPage
        {
            ...
            FzBuffer fz_new_buffer_from_page(FzStextOptions& options);
            ...
        };
    }


Extensions beyond the basic C API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Some generated classes have extra ``begin()`` and ``end()`` methods to allow standard C++ iteration:

  |expand_begin|

  .. code-block:: c++

      #include "mupdf/classes.h"
      #include "mupdf/functions.h"

      #include <iostream>

      void show_stext(mupdf::FzStextPage& page)
      {
          // A range-based for loop yields the dereferenced elements
          // (blocks, lines, chars), not the iterators themselves.
          for (mupdf::FzStextBlock block: page)
          {
              for (mupdf::FzStextLine line: block)
              {
                  for (mupdf::FzStextChar stextchar: line)
                  {
                      fz_stext_char* c = stextchar.m_internal;
                      // Bring the operator<< overloads for POD structs into scope.
                      using namespace mupdf;
                      std::cout << "FzStextChar("
                              << "c=" << c->c
                              << " color=" << c->color
                              << " origin=" << c->origin
                              << " quad=" << c->quad
                              << " size=" << c->size
                              << " font_name=" << c->font->name
                              << "\n";
                  }
              }
          }
      }

  |expand_end|

* There are various custom class methods and constructors.

* There are extra functions for generating a text representation of 'POD' (plain old data) structs and their C++ wrapper classes.

  For example for ``fz_rect`` we provide these functions:

  .. code-block:: c++

      std::ostream& operator<< (std::ostream& out, const fz_rect& rhs);
      std::ostream& operator<< (std::ostream& out, const FzRect& rhs);
      std::string to_string_fz_rect(const fz_rect& s);
      std::string to_string(const fz_rect& s);
      std::string FzRect::to_string() const;

  These each generate text such as: ``(x0=90.51 y0=160.65 x1=501.39 y1=1215.6)``


Runtime environmental variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All builds
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

* **MUPDF_mt_ctx**

  Controls support for multi-threading on startup.

  * If set with value ``0``, a single ``fz_context*`` is used for all threads; this might give a small performance increase in single-threaded programmes, but will be unsafe in multi-threaded programmes.

  * Otherwise each thread has its own ``fz_context*``.

  One can instead call ``mupdf::reinit_singlethreaded()`` on startup to force single-threaded mode. This should be done before any other use of MuPDF.

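As a minimal sketch of using ``MUPDF_mt_ctx`` from a launcher script, the helper below prepares the environment for a child process that will use MuPDF. The helper name ``mupdf_env()`` is hypothetical; only the variable name and the meaning of the value ``0`` are taken from the description above.

.. code-block:: python

    import os


    def mupdf_env(single_threaded, base=None):
        """Return a copy of an environment with MUPDF_mt_ctx set or cleared.

        ``base`` defaults to the current process environment.
        """
        env = dict(os.environ if base is None else base)
        if single_threaded:
            # "0" selects a single shared fz_context for all threads.
            env["MUPDF_mt_ctx"] = "0"
        else:
            # Any other state (including unset) gives per-thread contexts.
            env.pop("MUPDF_mt_ctx", None)
        return env

The returned dictionary can be passed as ``env=`` to ``subprocess.run()`` when starting the program, so the setting is in place before MuPDF is first used.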
Debug builds only
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Debug builds contain diagnostics/checking code that is activated via these environmental variables:

* **MUPDF_check_refs**

  If ``1``, generated code checks MuPDF struct reference counts at runtime.

* **MUPDF_check_error_stack**

  If ``1``, generated code outputs a diagnostic if a MuPDF function changes the current ``fz_context``'s error stack depth.

* **MUPDF_trace**

  If ``1`` or ``2``, class-aware code outputs a diagnostic each time it calls a MuPDF function (apart from keep/drop functions).

  If ``2``, low-level wrappers output a diagnostic each time they are called. We also show arg POD and pointer values.

* **MUPDF_trace_director**

  If ``1``, generated code outputs a diagnostic when doing special handling of MuPDF structs containing function pointers.

* **MUPDF_trace_exceptions**

  If ``1``, generated code outputs diagnostics when it converts MuPDF ``setjmp()``/``longjmp()`` exceptions into C++ exceptions.

* **MUPDF_trace_keepdrop**

  If ``1``, generated code outputs diagnostics for calls to ``*_keep_*()`` and ``*_drop_*()``.


Limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Global instances of C++ wrapper classes are not supported.

  This is because:

  * C++ wrapper class destructors generally call MuPDF functions (for example ``fz_drop_*()``).

  * The C++ bindings use internal thread-local objects to allow per-thread ``fz_context``'s to be efficiently obtained for use with underlying MuPDF functions.

  * C++ globals are destructed *after* thread-local objects are destructed.

  So if a global instance of a C++ wrapper class is created, its destructor will attempt to get a ``fz_context*`` using internal thread-local objects which will have already been destroyed.

  We attempt to display a diagnostic when this happens, but this cannot be relied on as behaviour is formally undefined.


The Python and C# MuPDF APIs
---------------------------------------------------------------

* A Python module called ``mupdf``.

* A C# namespace called ``mupdf``.

* Auto-generated from the C++ MuPDF API using SWIG, so inherits the abstractions of the C++ API:

  * No ``fz_context*`` arguments.

  * Automatic reference counting, so no need to call ``fz_keep_*()`` or ``fz_drop_*()``, and we have value-semantics for class instances.

  * Native Python and C# exceptions.

* Output parameters are returned as tuples.

  For example MuPDF C function ``fz_read_best()`` has prototype::

      fz_buffer *fz_read_best(fz_context *ctx, fz_stream *stm, size_t initial, int *truncated);

  The class-aware Python wrapper is::

      mupdf.fz_read_best(stm, initial)

  and returns ``(buffer, truncated)``, where ``buffer`` is a SWIG proxy for a ``mupdf::FzBuffer`` instance and ``truncated`` is an integer.

* Allows implementation of mutool in Python - see `mupdf:scripts/mutool.py <https://git.ghostscript.com/?p=mupdf.git;a=blob;f=scripts/mutool.py>`_ and `mupdf:scripts/mutool_draw.py <https://git.ghostscript.com/?p=mupdf.git;a=blob;f=scripts/mutool_draw.py>`_.

* Provides text representation of simple 'POD' structs:

  .. code-block:: python

      rect = mupdf.FzRect(...)
      print(rect) # Will output text such as: (x0=90.51 y0=160.65 x1=501.39 y1=215.6)

  * This works for classes where the C++ API defines a ``to_string()`` method as described above.

    * Python classes will have a ``__str__()`` method, and an identical ``__repr__()`` method.

    * C# classes will have a ``ToString()`` method.

* Uses SWIG Director classes to allow C function pointers in MuPDF structs to call Python code.


Installing the Python mupdf module using ``pip``
---------------------------------------------------------------

The Python ``mupdf`` module is available on the `Python Package Index (PyPI) website <https://pypi.org/>`_.

* Install with ``pip install mupdf``.

* Pre-built Wheels (binary Python packages) are provided for Windows and Linux.

* For more information on the latest release, see changelog below and: https://pypi.org/project/mupdf/


Doxygen/Pydoc API documentation
---------------------------------------------------------------

Auto-generated documentation for the C, C++ and Python APIs is available at: https://ghostscript.com/~julian/mupdf-bindings/

* All content is generated from the comments in MuPDF header files.

* This documentation is generated from an internal development tree, so may contain features that are not yet publicly available.

* It is updated only intermittently.


Example client code
---------------------------------------------------------------

Using the Python API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Minimal Python code that uses the ``mupdf`` module::

    import mupdf
    document = mupdf.FzDocument('foo.pdf')

A simple example Python test script (run by ``scripts/mupdfwrap.py -t``) is:

* `scripts/mupdfwrap_test.py <https://git.ghostscript.com/?p=mupdf.git;a=blob;f=scripts/mupdfwrap_test.py>`_

More detailed usage of the Python API can be found in:

* `scripts/mutool.py <https://git.ghostscript.com/?p=mupdf.git;a=blob;f=scripts/mutool.py>`_

* `scripts/mutool_draw.py <https://git.ghostscript.com/?p=mupdf.git;a=blob;f=scripts/mutool_draw.py>`_

**Example Python code that shows all available information about a document's Stext blocks, lines and characters**:

|expand_begin|

::

    #!/usr/bin/env python3

    import mupdf

    def show_stext(document):
        '''
        Shows all available information about Stext blocks, lines and characters.
        '''
        for p in range(document.fz_count_pages()):
            page = document.fz_load_page(p)
            stextpage = mupdf.FzStextPage(page, mupdf.FzStextOptions())
            for block in stextpage:
                # m_internal gives access to the underlying C struct's fields.
                block_ = block.m_internal
                print(f'block: type={block_.type} bbox={block_.bbox}')
                for line in block:
                    line_ = line.m_internal
                    print(f'    line: wmode={line_.wmode}'
                            + f' dir={line_.dir}'
                            + f' bbox={line_.bbox}'
                            )
                    for char in line:
                        char_ = char.m_internal
                        print(f'        char: {chr(char_.c)!r} c={char_.c:4} color={char_.color}'
                                + f' origin={char_.origin}'
                                + f' quad={char_.quad}'
                                + f' size={char_.size:6.2f}'
                                + f' font=('
                                    + f'is_mono={char_.font.flags.is_mono}'
                                    + f' is_bold={char_.font.flags.is_bold}'
                                    + f' is_italic={char_.font.flags.is_italic}'
                                    + f' ft_substitute={char_.font.flags.ft_substitute}'
                                    + f' ft_stretch={char_.font.flags.ft_stretch}'
                                    + f' fake_bold={char_.font.flags.fake_bold}'
                                    + f' fake_italic={char_.font.flags.fake_italic}'
                                    + f' has_opentype={char_.font.flags.has_opentype}'
                                    + f' invalid_bbox={char_.font.flags.invalid_bbox}'
                                    + f' name={char_.font.name}'
                                    + f')'
                                )

    document = mupdf.FzDocument('foo.pdf')
    show_stext(document)

|expand_end|


Basic PDF viewers written in Python and C#
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* `scripts/mupdfwrap_gui.py <https://git.ghostscript.com/?p=mupdf.git;a=blob;f=scripts/mupdfwrap_gui.py>`_

* `scripts/mupdfwrap_gui.cs <https://git.ghostscript.com/?p=mupdf.git;a=blob;f=scripts/mupdfwrap_gui.cs>`_

* Build and run with:

  * ``./scripts/mupdfwrap.py -b all --test-python-gui``

  * ``./scripts/mupdfwrap.py -b --csharp all --test-csharp-gui``


Building the C++, Python and C# MuPDF APIs from source
---------------------------------------------------------------

General requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Windows, Linux, MacOS or OpenBSD.

* Build should take place inside a Python `venv <https://docs.python.org/3.8/library/venv.html>`_.

* `libclang Python interface onto <https://libclang.readthedocs.io/en/latest/index.html>`_ the `clang C/C++ parser <https://clang.llvm.org/>`_.

* `swig <https://swig.org/>`_, for Python and C# bindings.

* `Mono <https://www.mono-project.com/>`_, for C# bindings on platforms other than Windows.


Setting up
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Windows only
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

* Install Python.

  * Use the Python Windows installer from the python.org website: http://www.python.org/downloads

  * Don't use other installers such as the Microsoft Store Python package.

    * If Microsoft Store Python is already installed, leave it in place and install from python.org on top of it - uninstalling before running the python.org installer has been known to cause problems.

  * A default installation is sufficient.

  * Debug binaries are required for debug builds of the MuPDF Python API.

  * If "Customize Installation" is chosen, make sure to include "py launcher" so that the ``py`` command will be available.

  * Also see: https://docs.python.org/3/using/windows.html

* Install Visual Studio 2019. Later versions may not work with MuPDF's solution and build files.


All platforms
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

* Get the latest version of MuPDF in git.

  .. code-block:: shell

      git clone --recursive git://git.ghostscript.com/mupdf.git

* Create and enter a `Python venv <https://docs.python.org/3.8/library/venv.html>`_ and upgrade pip.

  * Windws.

    .. code-block:: bat

        py -m venv pylocal
        .\pylocal\Scripts\activate
        python -m pip install --upgrade pip

  * Linux, MacOS, OpenBSD

    .. code-block:: shell

        python3 -m venv pylocal
        . pylocal/bin/activate
        python -m pip install --upgrade pip


General build flags
~~~~~~~~~~~~~~~~~~~

In all of the commands below, one can set environmental variables to control the build of the underlying MuPDF C API, for example ``USE_SYSTEM_LIBJPEG=yes``.

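As a small illustration, a build script might prefix such settings to one of the build commands using the POSIX shell ``NAME=value command`` form. The helper ``with_build_flags()`` is hypothetical and only builds the command line as text; it does not run anything.

.. code-block:: python

    import shlex


    def with_build_flags(command, flags):
        """Return a POSIX shell command line with NAME=value assignments prefixed."""
        assignments = [f"{name}={shlex.quote(value)}" for name, value in flags.items()]
        return " ".join(assignments + [shlex.join(command)])


    line = with_build_flags(
            ["python", "scripts/mupdfwrap.py", "-b", "all"],
            {"USE_SYSTEM_LIBJPEG": "yes"},
            )
    print(line)

On Windows the equivalent would be a separate ``set`` command before running the build.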
In addition, ``XCXXFLAGS`` can be used to set additional C++ compiler flags when building the C++ and Python bindings (the name is analogous to the ``XCFLAGS`` used by MuPDF's makefile when compiling the core library).


Building and installing the Python bindings using ``pip``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Windows, Linux, MacOS.

  .. code-block:: shell

      cd mupdf && pip install -vv .

* OpenBSD.

  Building using ``pip`` is not supported because ``libclang`` is not available from pypi.org so pip will fail to install prerequisites from ``pyproject.toml``.

  Instead one can run ``setup.py`` directly:

  .. code-block:: shell

      cd mupdf && python setup.py install


Building the Python bindings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Windows, Linux, MacOS.

  .. code-block:: shell

      pip install libclang swig setuptools
      cd mupdf && python scripts/mupdfwrap.py -b all

* OpenBSD.

  ``libclang`` is not available from pypi.org, but we can instead use the system ``py3-llvm`` package.

  .. code-block:: shell

      sudo pkg_add py3-llvm
      pip install swig setuptools
      cd mupdf && python scripts/mupdfwrap.py -b all


Building the C++ bindings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Windows, Linux, MacOS.

  .. code-block:: shell

      pip install libclang setuptools
      cd mupdf && python scripts/mupdfwrap.py -b m01

* OpenBSD.

  ``libclang`` is not available from pypi.org, but we can instead use the system ``py3-llvm`` package.

  .. code-block:: shell

      sudo pkg_add py3-llvm
      pip install setuptools
      cd mupdf && python scripts/mupdfwrap.py -b m01


Building the C# bindings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Windows.

  .. code-block:: shell

      pip install libclang swig setuptools
      cd mupdf && python scripts/mupdfwrap.py -b --csharp all

* Linux.

  .. code-block:: shell

      sudo apt install mono-devel
      pip install libclang swig
      cd mupdf && python scripts/mupdfwrap.py -b --csharp all

* MacOS.

  Building the C# bindings on MacOS is not currently supported.

* OpenBSD.

  .. code-block:: shell

      sudo pkg_add py3-llvm mono
      pip install swig setuptools
      cd mupdf && python scripts/mupdfwrap.py -b --csharp all


Using the bindings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To use the bindings, one has to tell the OS where to find the MuPDF runtime files.

* C++ and C# bindings:

  * Windows.

    .. code-block:: shell

        set PATH=.../mupdf/build/shared-release-x64-py3.11;%PATH%

    * Replace ``x64`` with ``x32`` if using 32-bit.

    * Replace ``3.11`` with the appropriate python version number.

  * Linux, OpenBSD.

    .. code-block:: shell

        LD_LIBRARY_PATH=.../mupdf/build/shared-release

    (``LD_LIBRARY_PATH`` must be an absolute path.)

  * MacOS.

    .. code-block:: shell

        DYLD_LIBRARY_PATH=.../mupdf/build/shared-release

* Python bindings:

  If the bindings have been built and installed using ``pip install``, they will already be available within the venv.

  Otherwise:

  * Windows.

    .. code-block:: shell

        PYTHONPATH=.../mupdf/build/shared-release-x64-py3.11

    * Replace ``x64`` with ``x32`` if using 32-bit.

    * Replace ``3.11`` with the appropriate python version number.

  * Linux, MacOS, OpenBSD.

    .. code-block:: shell

        PYTHONPATH=.../mupdf/build/shared-release


Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Running tests.

  Basic tests can be run by appending args to the ``scripts/mupdfwrap.py`` command.

  This will also demonstrate how to set environment variables such as ``PYTHONPATH`` or ``LD_LIBRARY_PATH`` to the MuPDF build directory.

  * Python tests.

    * ``--test-python``
    * ``--test-python-gui``

  * C# tests.

    * ``--test-csharp``
    * ``--test-csharp-gui``

  * C++ tests.

    * ``--test-cpp``

* C++ bindings and ``NDEBUG``.

  When building client code that uses the C++ bindings, ``NDEBUG`` must be defined/undefined to match how the C++ bindings were built. By default the C++ bindings are a release build with ``NDEBUG`` defined, so usually client code must also be built with ``NDEBUG`` defined. Otherwise there will be build errors for missing C++ destructors, for example ``mupdf::FzMatrix::~FzMatrix()``.

  [This is because we define some destructors in debug builds only; this allows internal reference counting checks.]

* Specifying the location of Visual Studio's ``devenv.com`` on Windows.

  ``scripts/mupdfwrap.py`` looks for Visual Studio's ``devenv.com`` in standard locations; this can be overridden with:

  .. code-block:: shell

      python scripts/mupdfwrap.py -b --devenv <devenv.com-location> ...

* Specifying compilers.

  On non-Windows, we use ``cc`` and ``c++`` as default C and C++ compilers; override by setting environment variables ``$CC`` and ``$CXX``.

* OpenBSD ``libclang``.

  * ``libclang`` cannot be installed with pip on OpenBSD - wheels are not available and building from source fails.

    However unlike on other platforms, the system python-clang package (``py3-llvm``) is integrated with the system's libclang and can be used directly.

    So the above examples use ``pkg_add py3-llvm``.

* Alternatives to Python package ``libclang`` generally do not work.

  For example pypi.org's `clang <https://pypi.org/project/clang/>`_, or Debian's `python-clang <https://packages.debian.org/search?keywords=python+clang&searchon=names&suite=stable&section=all>`_.

  These are inconvenient to use because they require explicit setting of ``LD_LIBRARY_PATH`` to point to the correct libclang dynamic library.

* Debug builds.

  One can specify a debug build using the ``-d <build-directory>`` arg before ``-b``.

  .. code-block:: shell

      python ./scripts/mupdfwrap.py -d build/shared-debug -b ...

  * Debug builds of the Python and C# bindings on Windows have not been tested. There may be issues with requiring a debug version of the Python interpreter, for example ``python311_d.lib``.

* C# build failure: ``cstring.i not implemented for this target`` and/or ``Unknown directive '%cstring_output_allocate'``.

  This is probably because SWIG does not include support for C#. This has been seen in the past but as of 2023-07-19 pypi.org's default swig seems ok.

  A possible solution is to install SWIG using the system package manager, for example ``sudo apt install swig`` on Linux, or use ``./scripts/mupdfwrap.py --swig-windows-auto ...`` on Windows.

* More information about running ``scripts/mupdfwrap.py``.

  * Run ``python ./scripts/mupdfwrap.py -h``.

  * Read the doc-string at beginning of ``scripts/wrap/__main__.py``.


How ``scripts/mupdfwrap.py`` builds the APIs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Building the MuPDF C API
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

* On Unix, runs ``make`` on MuPDF's ``Makefile`` with ``shared=yes``.

* On Windows, runs ``devenv.com`` on ``.sln`` and ``.vcxproj`` files within MuPDF's `platform/win32/ <https://git.ghostscript.com/?p=mupdf.git;a=tree;f=platform/win32>`_ directory.


Generation of the MuPDF C++ API
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

* Uses clang-python to parse MuPDF's C API.

* Generates C++ code that wraps the basic C interface, converting MuPDF ``setjmp()``/``longjmp()`` exceptions into C++ exceptions and automatically handling ``fz_context``'s internally.

* Generates C++ wrapper classes for each ``fz_*`` and ``pdf_*`` struct, and uses various heuristics to define constructors, methods and static methods that call ``fz_*()`` and ``pdf_*()`` functions. These classes' constructors and destructors automatically handle reference counting so class instances can be copied arbitrarily.

* C header file comments are copied into the generated C++ header files.

* Compile and link the generated C++ code to create shared libraries.


Generation of the MuPDF Python and C# APIs
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

* Uses SWIG to parse the previously-generated C++ headers and generate C++, Python and C# code.

* Defines some custom-written Python and C# functions and methods, for example so that out-params are returned as tuples.

* If SWIG is version 4+, C++ comments are converted into Python doc-comments.

* Compile and link the SWIG-generated C++ code to create shared libraries.


Building auto-generated MuPDF API documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Build HTML documentation for the C, C++ and Python APIs (using Doxygen and pydoc):

.. code-block:: shell

    python ./scripts/mupdfwrap.py --doc all

This will generate the following tree:

.. code-block:: text

    mupdf/docs/generated/
        index.html
        c/
        c++/
        python/

All content is ultimately generated from the MuPDF C header file comments.

As of 2022-2-5, it looks like ``swig -doxygen`` (swig-4.0.2) ignores single-line ``/** ... */`` comments, so the generated Python code (and hence also Pydoc documentation) is missing information.


Generated files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All generated files are within the MuPDF checkout.

* C++ headers for the MuPDF C++ API are in ``platform/c++/include/``.

* Files required at runtime are in ``build/shared-release/``.

**Details**

.. code-block:: text

    mupdf/
        build/
            shared-release/    [Unix runtime files.]
                libmupdf.so    [MuPDF C API, not MacOS.]
                libmupdf.dylib [MuPDF C API, MacOS.]
                libmupdfcpp.so [MuPDF C++ API.]
                mupdf.py       [MuPDF Python API.]
                _mupdf.so      [MuPDF Python API internals.]
                mupdf.cs       [MuPDF C# API.]
                mupdfcsharp.so [MuPDF C# API internals.]

            shared-debug/
                [as shared-release but debug build.]

            shared-release-x*-py*/   [Windows runtime files.]
                mupdfcpp.dll         [MuPDF C and C++ API, x32.]
                mupdfcpp64.dll       [MuPDF C and C++ API, x64.]
                mupdf.py             [MuPDF Python API.]
                _mupdf.pyd           [MuPDF Python API internals.]
                mupdf.cs             [MuPDF C# API.]
                mupdfcsharp.dll      [MuPDF C# API internals.]

        platform/
            c++/
                include/    [MuPDF C++ API header files.]
                    mupdf/
                        classes.h
                        classes2.h
                        exceptions.h
                        functions.h
                        internal.h

                implementation/ [MuPDF C++ implementation source files.]
                    classes.cpp
                    classes2.cpp
                    exceptions.cpp
                    functions.cpp
                    internal.cpp

                generated.pickle    [Information from clang parse step, used by later stages.]
                windows_mupdf.def   [List of MuPDF public global data, used when linking mupdfcpp.dll.]

            python/ [SWIG Python files.]
                mupdfcpp_swig.i     [SWIG input file.]
                mupdfcpp_swig.i.cpp [SWIG output file.]

            csharp/  [SWIG C# files.]
                mupdf.cs            [SWIG output file, no out-params helpers.]
                mupdfcpp_swig.i     [SWIG input file.]
                mupdfcpp_swig.i.cpp [SWIG output file.]

            win32/
                Release/    [Windows 32-bit .dll, .lib, .exp, .pdb etc.]
                x64/
                    Release/    [Windows 64-bit .dll, .lib, .exp, .pdb etc.]
                        mupdfcpp64.dll  [Copied to build/shared-release*/mupdfcpp64.dll]
                        mupdfpyswig.dll [Copied to build/shared-release*/_mupdf.pyd]
                        mupdfcpp64.lib
                        mupdfpyswig.lib

            win32-vs-upgrade/   [used instead of win32/ if PYMUPDF_SETUP_MUPDF_VS_UPGRADE is '1'.]


Windows-specifics
---------------------------------------------------------------

Required predefined macros
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Code that will use the MuPDF DLL must be built with ``FZ_DLL_CLIENT`` predefined.

The MuPDF DLL itself is built with ``FZ_DLL`` predefined.


DLLs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is no separate C library, instead the C and C++ APIs are both in ``mupdfcpp.dll``, which is built by running devenv on ``platform/win32/mupdf.sln``.

The Python SWIG library is called ``_mupdf.pyd`` which, despite the name, is a standard Windows DLL, built from ``platform/python/mupdfcpp_swig.i.cpp``.


DLL export of functions and data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On Windows, ``include/mupdf/fitz/export.h`` defines ``FZ_FUNCTION`` and ``FZ_DATA`` to ``__declspec(dllexport)`` and/or ``__declspec(dllimport)`` depending on whether ``FZ_DLL`` or ``FZ_DLL_CLIENT`` are defined.

All MuPDF C headers prefix declarations of public global data with ``FZ_DATA``.

In generated C++ code:

* Data declarations and definitions are prefixed with ``FZ_DATA``.

* Function declarations and definitions are prefixed with ``FZ_FUNCTION``.

* Class method declarations and definitions are prefixed with ``FZ_FUNCTION``.

When building ``mupdfcpp.dll`` on Windows we link with the auto-generated ``platform/c++/windows_mupdf.def`` file; this lists all C public global data.

For reasons that are not fully understood, we don't seem to need to tag C functions with ``FZ_FUNCTION``, but this is required for C++ functions otherwise we get unresolved symbols when building MuPDF client code.


Building the DLLs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We build Windows binaries by running ``devenv.com`` directly.

Building ``_mupdf.pyd`` is tricky because it needs to be built with a specific ``Python.h`` and linked with a specific ``python.lib``. This is done by setting environmental variables ``MUPDF_PYTHON_INCLUDE_PATH`` and ``MUPDF_PYTHON_LIBRARY_PATH`` when running ``devenv.com``, which are referenced by ``platform/win32/mupdfpyswig.vcxproj``. Thus one cannot easily build ``_mupdf.pyd`` directly from the Visual Studio GUI.

[In the git history there is code that builds ``_mupdf.pyd`` by running the Windows compiler and linker ``cl.exe`` and ``link.exe`` directly, which avoids the complications of going via devenv, at the expense of needing to know where ``cl.exe`` and ``link.exe`` are.]


C++ bindings details
---------------------------------------------------------------

Wrapper functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Wrappers for a MuPDF function ``fz_foo()`` are available in multiple forms:

* Functions in the ``mupdf`` namespace.

  * ``mupdf::ll_fz_foo()``

    * Low-level wrapper:

      * Does not take ``fz_context*`` arg.

      * Translates MuPDF exceptions into C++ exceptions.

      * Takes/returns pointers to MuPDF structs.

      * Code that uses these functions will need to make explicit calls to ``fz_keep_*()`` and ``fz_drop_*()``.

  * ``mupdf::fz_foo()``

    * High-level class-aware wrapper:

      * Does not take ``fz_context*`` arg.

      * Translates MuPDF exceptions into C++ exceptions.

      * Takes references to C++ wrapper class instances instead of pointers to MuPDF structs.

      * Where applicable, returns C++ wrapper class instances instead of pointers to MuPDF structs.

      * Code that uses these functions does not need to call ``fz_keep_*()`` and ``fz_drop_*()`` - C++ wrapper class instances take care of reference counting internally.

* Class methods

  * Where ``fz_foo()`` has a first arg (ignoring any ``fz_context*`` arg) that takes a pointer to a MuPDF struct ``foo_bar``, it is generally available as a member function of the wrapper class ``mupdf::FooBar``:

    * ``mupdf::FooBar::fz_foo()``

  * Apart from being a member function, this is identical to class-aware wrapper ``mupdf::fz_foo()``, for example taking references to wrapper classes instead of pointers to MuPDF structs.


Constructors using MuPDF functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Wrapper class constructors are created for each MuPDF function that returns an instance of a MuPDF struct.

Sometimes two such functions do not have different arg types so C++ overloading cannot distinguish between them as constructors (because C++ constructors do not have names).

We cope with this in two ways:

* Create a static method that returns a new instance of the wrapper class by value.

  * This is not possible if the underlying MuPDF struct is not copyable - i.e. not reference counted and not POD.

* Define an enum within the wrapper class, and provide a constructor that takes an instance of this enum to specify which MuPDF function to use.


Default constructors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All wrapper classes have a default constructor.


* For POD classes each member is set to a default value with
  ``this->foo = {};``. Arrays are initialised by setting all bytes to zero
  using ``memset()``.
* For non-POD classes, class member ``m_internal`` is set to ``nullptr``.
* Some classes' default constructors are customized, for example:

  * The default constructor for ``fz_color_params`` wrapper
    ``mupdf::FzColorParams`` sets state to a copy of
    ``fz_default_color_params``.
  * The default constructor for ``fz_md5`` wrapper ``mupdf::FzMd5`` sets state
    using ``fz_md5_init()``.
  * These are described in class definition comments in
    ``platform/c++/include/mupdf/classes.h``.


Raw constructors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Many wrapper classes have constructors that take a pointer to the underlying
MuPDF C struct. These are usually for internal use only. They do not call
``fz_keep_*()`` - it is expected that any supplied MuPDF struct is already
owned.


POD wrapper classes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Class wrappers for MuPDF structs default to having an ``m_internal`` member
which points to an instance of the wrapped struct. This works well for MuPDF
structs which support reference counting, because we can automatically create
copy constructors, ``operator=`` functions and destructors that call the
associated ``fz_keep_*()`` and ``fz_drop_*()`` functions.

However where a MuPDF struct does not support reference counting and contains
simple data, it is not safe to copy a pointer to the struct, so the class
wrapper will be a POD class. This is done in one of two ways:

* ``m_internal`` is an instance of the MuPDF struct, not a pointer.

  * Sometimes we provide members that give direct access to fields in
    ``m_internal``.

* An 'inline' POD - there is no ``m_internal`` member; instead the wrapper
  class contains the same members as the MuPDF struct. This can be a little
  more convenient to use.

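The ownership rules above can be illustrated with a small toy model. The sketch below is not MuPDF code and does not use the ``mupdf`` module; ``CStruct``, ``keep()``, ``drop()``, ``RefWrapper`` and ``PodPoint`` are hypothetical stand-ins for a reference-counted C struct, ``fz_keep_*()``, ``fz_drop_*()``, a non-POD wrapper and an 'inline' POD wrapper. An explicit ``close()`` method stands in for the C++ destructor so that the behaviour is deterministic.

```python
import copy
from dataclasses import dataclass


class CStruct:
    """Hypothetical stand-in for a reference-counted MuPDF C struct."""

    def __init__(self):
        self.refs = 1        # The creator owns the initial reference.
        self.freed = False


def keep(s):
    """Models fz_keep_*(): take an extra reference."""
    s.refs += 1
    return s


def drop(s):
    """Models fz_drop_*(): release one reference, freeing at zero."""
    assert not s.freed, "drop() called after the struct was freed"
    s.refs -= 1
    if s.refs == 0:
        s.freed = True


class RefWrapper:
    """Models a non-POD wrapper whose m_internal points at the struct."""

    def __init__(self, m_internal=None):
        # Raw constructor: adopts an already-owned reference, no keep().
        self.m_internal = m_internal

    def copy(self):
        # Models the generated copy constructor: share the struct and keep().
        if self.m_internal is None:
            return RefWrapper()
        return RefWrapper(keep(self.m_internal))

    def close(self):
        # Models the generated destructor: drop() our reference.
        if self.m_internal is not None:
            drop(self.m_internal)
            self.m_internal = None


@dataclass
class PodPoint:
    """Models an 'inline' POD wrapper: plain fields, copied by value."""

    x: float = 0.0
    y: float = 0.0


def demo():
    s = CStruct()
    a = RefWrapper(s)            # a adopts the initial reference.
    b = a.copy()                 # The copy keeps a second reference.
    refs_after_copy = s.refs
    a.close()                    # One reference left, struct still alive.
    alive_after_first_close = not s.freed
    b.close()                    # Last reference dropped, struct freed.

    p = PodPoint(1.0, 2.0)
    q = copy.copy(p)             # Independent value copy, no ref counting.
    q.x = 5.0
    return refs_after_copy, alive_after_first_close, s.freed, p.x, q.x
```

In this model a wrapper built with the raw constructor from a pointer that something else still owns would drop a reference it never took, which is why the raw constructors are described as being for internal use.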

Extra static methods
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Where relevant, wrapper classes can have static methods that wrap selected
MuPDF functions. For example ``FzMatrix`` does this for ``fz_concat()``,
``fz_scale()`` etc, because these return the result by value rather than
modifying a ``fz_matrix`` instance.


Miscellaneous custom wrapper classes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The wrapper for ``fz_outline_item`` does not contain a ``fz_outline_item`` by
value or pointer. Instead it defines C++-style member equivalents to
``fz_outline_item``'s fields, to simplify usage from C++ and Python/C#.

The fields are initialised from a ``fz_outline_item`` when the wrapper class is
constructed. In this particular case there is no need to hold on to a
``fz_outline_item``, and the use of ``std::string`` ensures that value
semantics can work.


Extra functions in C++, Python and C#
---------------------------------------------------------------

[These functions are available as low-level functions, class-aware functions
and class methods.]

.. code-block:: c++

    /**
    C++ alternative to ``fz_lookup_metadata()`` that returns a ``std::string``
    or calls ``fz_throw()`` if not found.
    */
    FZ_FUNCTION std::string fz_lookup_metadata2(fz_context* ctx, fz_document* doc, const char* key);

    /**
    C++ alternative to ``pdf_lookup_metadata()`` that returns a ``std::string``
    or calls ``fz_throw()`` if not found.
    */
    FZ_FUNCTION std::string pdf_lookup_metadata2(fz_context* ctx, pdf_document* doc, const char* key);

    /**
    C++ alternative to ``fz_md5_pixmap()`` that returns the digest by value.
    */
    FZ_FUNCTION std::vector<unsigned char> fz_md5_pixmap2(fz_context* ctx, fz_pixmap* pixmap);

    /**
    C++ alternative to fz_md5_final() that returns the digest by value.
    */
    FZ_FUNCTION std::vector<unsigned char> fz_md5_final2(fz_md5* md5);

    /**
    */
    FZ_FUNCTION long long fz_pixmap_samples_int(fz_context* ctx, fz_pixmap* pixmap);

    /**
    Provides simple (but slow) access to pixmap data from Python and C#.
    */
    FZ_FUNCTION int fz_samples_get(fz_pixmap* pixmap, int offset);

    /**
    Provides simple (but slow) write access to pixmap data from Python and
    C#.
    */
    FZ_FUNCTION void fz_samples_set(fz_pixmap* pixmap, int offset, int value);

    /**
    C++ alternative to fz_highlight_selection() that returns quads in a
    std::vector.
    */
    FZ_FUNCTION std::vector<fz_quad> fz_highlight_selection2(fz_context* ctx, fz_stext_page* page, fz_point a, fz_point b, int max_quads);

    struct fz_search_page2_hit
    {
        fz_quad quad;
        int mark;
    };

    /**
    C++ alternative to fz_search_page() that returns information in a
    std::vector.
    */
    FZ_FUNCTION std::vector<fz_search_page2_hit> fz_search_page2(fz_context* ctx, fz_document* doc, int number, const char* needle, int hit_max);

    /**
    C++ alternative to fz_string_from_text_language() that returns
    information in a std::string.
    */
    FZ_FUNCTION std::string fz_string_from_text_language2(fz_text_language lang);

    /**
    C++ alternative to fz_get_glyph_name() that returns information in a
    std::string.
    */
    FZ_FUNCTION std::string fz_get_glyph_name2(fz_context* ctx, fz_font* font, int glyph);

    /**
    Extra struct containing fz_install_load_system_font_funcs()'s args,
    which we wrap with virtual_fnptrs set to allow use from Python/C# via
    Swig Directors.
    */
    typedef struct fz_install_load_system_font_funcs_args
    {
        fz_load_system_font_fn* f;
        fz_load_system_cjk_font_fn* f_cjk;
        fz_load_system_fallback_font_fn* f_fallback;
    } fz_install_load_system_font_funcs_args;

    /**
    Alternative to fz_install_load_system_font_funcs() that takes args in a
    struct, to allow use from Python/C# via Swig Directors.
    */
    FZ_FUNCTION void fz_install_load_system_font_funcs2(fz_context* ctx, fz_install_load_system_font_funcs_args* args);

    /** Internal singleton state to allow Swig Director class to find
    fz_install_load_system_font_funcs_args class wrapper instance. */
    FZ_DATA extern void* fz_install_load_system_font_funcs2_state;

    /** Helper for calling ``fz_document_handler::open`` function pointer via
    Swig from Python/C#. */
    FZ_FUNCTION fz_document* fz_document_handler_open(fz_context* ctx, const fz_document_handler *handler, fz_stream* stream, fz_stream* accel, fz_archive* dir, void* recognize_state);

    /** Helper for calling a ``fz_document_handler::recognize`` function
    pointer via Swig from Python/C#. */
    FZ_FUNCTION int fz_document_handler_recognize(fz_context* ctx, const fz_document_handler *handler, const char *magic);

    /** Swig-friendly wrapper for pdf_choice_widget_options(), returns the
    options directly in a vector. */
    FZ_FUNCTION std::vector<std::string> pdf_choice_widget_options2(fz_context* ctx, pdf_annot* tw, int exportval);

    /** Swig-friendly wrapper for fz_new_image_from_compressed_buffer(),
    uses specified ``decode`` and ``colorkey`` if they are not null (in which
    case we assert that they have size ``2*fz_colorspace_n(colorspace)``). */
    FZ_FUNCTION fz_image* fz_new_image_from_compressed_buffer2(
            fz_context* ctx,
            int w,
            int h,
            int bpc,
            fz_colorspace* colorspace,
            int xres,
            int yres,
            int interpolate,
            int imagemask,
            const std::vector<float>& decode,
            const std::vector<int>& colorkey,
            fz_compressed_buffer* buffer,
            fz_image* mask
            );

    /** Swig-friendly wrapper for pdf_rearrange_pages(). */
    void pdf_rearrange_pages2(
            fz_context* ctx,
            pdf_document* doc,
            const std::vector<int>& pages,
            pdf_clean_options_structure structure
            );

    /** Swig-friendly wrapper for pdf_subset_fonts(). */
    void pdf_subset_fonts2(fz_context *ctx, pdf_document *doc, const std::vector<int>& pages);

    /** Swig-friendly and typesafe way to do fz_snprintf(fmt, value).
    ``fmt`` must end with one of 'efg' otherwise we throw an exception. */
    std::string fz_format_double(fz_context* ctx, const char* fmt, double value);

    struct fz_font_ucs_gid
    {
        unsigned long ucs;
        unsigned int gid;
    };

    /** SWIG-friendly wrapper for fz_enumerate_font_cmap(). */
    std::vector<fz_font_ucs_gid> fz_enumerate_font_cmap2(fz_context* ctx, fz_font* font);

    /** SWIG-friendly wrapper for pdf_set_annot_callout_line(). */
    void pdf_set_annot_callout_line2(fz_context *ctx, pdf_annot *annot, std::vector<fz_point>& callout);

    /** SWIG-friendly wrapper for fz_decode_barcode_from_display_list(),
    avoiding leak of the returned string. */
    std::string fz_decode_barcode_from_display_list2(fz_context *ctx, fz_barcode_type *type, fz_display_list *list, fz_rect subarea, int rotate);

    /** SWIG-friendly wrapper for fz_decode_barcode_from_pixmap(), avoiding
    leak of the returned string. */
    std::string fz_decode_barcode_from_pixmap2(fz_context *ctx, fz_barcode_type *type, fz_pixmap *pix, int rotate);

    /** SWIG-friendly wrapper for fz_decode_barcode_from_page(), avoiding
    leak of the returned string. */
    std::string fz_decode_barcode_from_page2(fz_context *ctx, fz_barcode_type *type, fz_page *page, fz_rect subarea, int rotate);


Python/C# bindings details
---------------------------------------------------------------

Extra Python functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Access to raw C arrays
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

The following functions can be used from Python to get access to raw data:

* ``mupdf.bytes_getitem(array, index)``: Gives access to individual items in an
  array of ``unsigned char`` values, for example in the data returned by
  ``mupdf::FzPixmap``'s ``samples()`` method.

* ``mupdf.floats_getitem(array, index)``: Gives access to individual items in
  an array of ``float`` values, for example in ``fz_stroke_state``'s
  ``float dash_list[32]`` array.
  Generated with SWIG code ``carrays.i`` and
  ``array_functions(float, floats);``.

* ``mupdf.python_buffer_data(b)``: returns a SWIG wrapper for a
  ``const unsigned char*`` pointing to a Python buffer instance's raw data. For
  example ``b`` can be a Python ``bytes`` or ``bytearray`` instance.

* ``mupdf.python_mutable_buffer_data(b)``: returns a SWIG wrapper for an
  ``unsigned char*`` pointing to a Python buffer instance's raw data. For
  example ``b`` can be a Python ``bytearray`` instance.

[These functions are implemented internally using SWIG's ``carrays.i`` and
``pybuffer.i``.]


Python differences from C API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[The functions described below are also available as class methods.]


Custom methods
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Python and C# code does not easily handle functions that return raw data, for
example as an ``unsigned char*`` that is not a zero-terminated string. Sometimes
we provide a C++ method that returns a ``std::vector`` by value, so that Python
and C# code can wrap it in a systematic way.

For example ``FzMd5::fz_md5_final2()``.

For all functions described below, there is also a ``ll_*`` variant that
takes/returns raw MuPDF structs instead of wrapper classes.


New functions
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

* ``fz_buffer_extract_copy()``: Returns a copy of the buffer data as a Python
  ``bytes``.
* ``fz_buffer_storage_memoryview(buffer, writable)``: Returns a
  readonly/writable Python ``memoryview`` onto ``buffer``. Relies on ``buffer``
  existing and not changing size while the memory view is used.
* ``fz_pixmap_samples_memoryview()``: Returns a Python ``memoryview`` onto
  ``fz_pixmap`` data.
* ``fz_lookup_metadata2(fzdocument, key)``: Returns the key value or raises an
  exception if not found.
* ``pdf_lookup_metadata2(pdfdocument, key)``: Returns the key value or raises
  an exception if not found.


Implemented in Python
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

* ``fz_format_output_path()``
* ``fz_story_positions()``
* ``pdf_dict_getl()``
* ``pdf_dict_putl()``


Non-standard API or implementation
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

* ``fz_buffer_extract()``: Returns a *copy* of the original buffer data as a
  Python ``bytes``. Still clears the buffer.
* ``fz_buffer_storage()``: Returns ``(size, data)`` where ``data`` is a
  low-level SWIG representation of the buffer's storage.
* ``fz_convert_color()``: No ``float* fv`` param, instead returns
  ``(rgb0, rgb1, rgb2, rgb3)``.
* ``fz_fill_text()``: ``color`` arg is a tuple/list of 1-4 floats.
* ``fz_lookup_metadata(fzdocument, key)``: Returns the key value or ``None`` if
  not found.
* ``fz_new_buffer_from_copied_data()``: Takes a Python ``bytes`` (or other
  Python buffer) instance.
* ``fz_set_error_callback()``: Takes a Python callable; no ``void* user`` arg.
* ``fz_set_warning_callback()``: Takes a Python callable; no ``void* user``
  arg.
* ``fz_warn()``: Takes a single Python ``str`` arg.
* ``pdf_dict_putl_drop()``: Always raises an exception because it is not useful
  with automatic ref-counts.
* ``pdf_load_field_name()``: Uses extra C++ function
  ``pdf_load_field_name2()`` which returns ``std::string`` by value.
* ``pdf_lookup_metadata(pdfdocument, key)``: Returns the key value or ``None``
  if not found.
* ``pdf_set_annot_color()``: Takes a single ``color`` arg which must be a float
  or a tuple of 1-4 floats.
* ``pdf_set_annot_interior_color()``: Takes a single ``color`` arg which must
  be a float or a tuple of 1-4 floats.
* ``fz_install_load_system_font_funcs()``: Takes Python callbacks with no
  ``ctx`` arg, which can return ``None``, ``fz_font*`` or a ``mupdf.FzFont``.

  Example usage (from
  ``scripts/mupdfwrap_test.py:test_install_load_system_font()``)::

      def font_f(name, bold, italic, needs_exact_metrics):
          print(f'font_f(): Looking for font: {name=} {bold=} {italic=} {needs_exact_metrics=}.')
          return mupdf.fz_new_font_from_file(...)
      def f_cjk(name, ordering, serif):
          print(f'f_cjk(): Looking for font: {name=} {ordering=} {serif=}.')
          return None
      def f_fallback(script, language, serif, bold, italic):
          print(f'f_fallback(): looking for font: {script=} {language=} {serif=} {bold=} {italic=}.')
          return None
      mupdf.fz_install_load_system_font_funcs(font_f, f_cjk, f_fallback)


Making MuPDF function pointers call Python code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Overview
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

For MuPDF structs with function pointers, we provide a second C++ wrapper
class for use by the Python bindings.

* The second wrapper class has a ``2`` suffix, for example
  ``PdfFilterOptions2``.

* This second wrapper class has a virtual method for each function pointer, so
  it can be used as a `SWIG Director class
  <https://swig.org/Doc4.0/SWIGDocumentation.html#SWIGPlus_target_language_callbacks>`_.

* Overriding a virtual method in Python results in the Python method being
  called when MuPDF C code calls the corresponding function pointer.

* One needs to activate the use of a Python method as a callback by calling the
  special method ``use_virtual_<method-name>()``. [It might be possible in
  future to remove the need to do this.]

* It may be possible to use similar techniques in C# but this has not been
  tried.


Callback args
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Python callbacks have args that are more low-level than in the rest of the
Python API:

* Callbacks generally have a first arg that is a SWIG representation of a MuPDF
  ``fz_context*``.

* Where the underlying MuPDF function pointer has an arg that is a pointer to a
  MuPDF struct, unlike elsewhere in the MuPDF bindings we do not translate this
  into an instance of the corresponding wrapper class. Instead Python callbacks
  will see a SWIG representation of the low-level C pointer.

  * It is not safe to construct a Python wrapper class instance directly from
    such a SWIG representation of a C pointer, because it will break MuPDF's
    reference counting - Python/C++ constructors that take a raw pointer to a
    MuPDF struct do not call ``fz_keep_*()`` but the corresponding Python/C++
    destructor will call ``fz_drop_*()``.

  * It might be safe to create a wrapper class instance using an explicit call
    to ``mupdf.fz_keep_*()``, but this has not been tried.

* As of 2023-02-03, exceptions from Python callbacks are propagated back
  through the Python, C++, C, C++ and Python layers. The resulting Python
  exception will have the original exception text, but the original Python
  backtrace is lost.


Exceptions in callbacks
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Python exceptions in Director callbacks are propagated back through the
language layers (from Python to C++ to C, then back to C++ and finally to
Python).

For convenience we add a text representation of the original Python backtrace
to the exception text, but the C layer's fz_try/catch exception handling only
holds 256 characters of exception text, so this backtrace information may be
truncated by the time the exception reaches the original Python code's
``except ...`` block.


Example
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Here is an example PDF filter written in Python that removes alternating items:

**Details**

|expand_begin|

.. code-block:: python

    import mupdf

    def test_filter(path):
        class MyFilter(mupdf.PdfFilterOptions2):
            def __init__(self):
                super().__init__()
                # Activate the Python override of the text_filter callback.
                self.use_virtual_text_filter()
                self.recurse = 1
                self.sanitize = 1
                self.state = 1
                self.ascii = True
            def text_filter(self, ctx, ucsbuf, ucslen, trm, ctm, bbox):
                print(f'text_filter(): ctx={ctx} ucsbuf={ucsbuf} ucslen={ucslen} trm={trm} ctm={ctm} bbox={bbox}')
                # Remove every other item.
                self.state = 1 - self.state
                return self.state

        filter_ = MyFilter()

        document = mupdf.PdfDocument(path)
        for p in range(document.pdf_count_pages()):
            page = document.pdf_load_page(p)
            print(f'Running document.pdf_filter_page_contents on page {p}')
            document.pdf_begin_operation('test filter')
            document.pdf_filter_page_contents(page, filter_)
            document.pdf_end_operation()

        document.pdf_save_document('foo.pdf', mupdf.PdfWriteOptions())

|expand_end|

.. External links
