diff docs/lexer-algpseudocode.rst @ 168:bff8b900713a

REFACTOR: All documentation pages refactored: merge intro and details for lexers and filters
author Franz Glasner <fzglas.hg@dom66.de>
date Mon, 11 May 2026 01:31:12 +0200
parents docs/details-algpseudocode.rst@88f872c50aae
children 3c517c22df9c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/docs/lexer-algpseudocode.rst	Mon May 11 01:31:12 2026 +0200
@@ -0,0 +1,577 @@
+.. -*- coding: utf-8; indent-tabs-mode: nil; -*-
+
+
+*************************************
+ AlgPseudocode and Language Variants
+*************************************
+
+These lexers are heavily inspired by CTAN’s `Algpseudocodex`_.
+They recognize expressions and, additionally, all sorts of comments and
+commands that are modeled on `Algpseudocodex`_ commands.
+
+They may be used in `Sphinx`_ by their aliases:
+
+.. code-block:: none
+
+   .. code-block:: algpseudocode
+
+      \PROGRAM {The Pseudoprogram} \IS
+
+      \END PROGRAM {The Pseudoprogram}
+
+It will be rendered as:
+
+.. code-block:: algpseudocode
+
+   \PROGRAM {The Pseudoprogram} \IS
+
+   \END PROGRAM {The Pseudoprogram}
+
+And the same with the German variant
+(using ``.. code-block:: algpseudocode-de`` as language alias):
+
+.. code-block:: algpseudocode-de
+
+   \PROGRAM {The Pseudoprogram} \IS
+
+   \END PROGRAM {The Pseudoprogram}
+
+The AlgPseudocode lexer and its language variants AlgPseudocodeDE and
+AlgPseudocodeFR basically work in three states: `default`,
+`expression` and `text`.
+
+  In expressions it automatically recognizes:
+
+  - Strings (single-quote, double-quote, triple-single-quote,
+    triple-double-quote, `Python`_ style)
+  - Numbers (also `Python`_ style)
+  - (Mathematical) operators and symbols
+  - ``\TEXT{...}``
+
+    Switches to a text mode that suppresses automatic expression
+    highlighting.
+
+    A closing curly brace can be quoted with ``\}`` to not end the
+    text mode prematurely.
+
+  - ``\NAME``, ``\CALL`` and ``\GETS``
+
+  - ``\REM`` and ``\REMARK`` for remarks (aka comments)
+
+  - Names (`Name.Entity`)
+
+  - :ref:`explicit-token-types`
+
+  In the default mode it recognizes expressions and additionally all
+  sorts of comments and commands that look somewhat like `Algpseudocodex`_
+  commands.
+
+  In texts it recognizes:
+
+  - ``\EXPR`` or ``\EXPRESSION``
+
+    Switches to expression mode.
+
+    A closing curly brace can be quoted with ``\}`` to not end the expression
+    mode prematurely.
+
+  - ``\REM`` and ``\REMARK`` for remarks (aka comments)
+
+  - :ref:`explicit-token-types`
+
+
+.. rubric:: Some Examples
+
+A synthetic example with many features:
+
+.. literalinclude:: examples/example-1.pseudocode
+   :language: algpseudocode
+   :lines: 2-
+
+The same example with a customized `AlgPseudocodeLexer` whose `no_end`
+option is set to ``True``:
+
+.. literalinclude:: examples/example-1.pseudocode
+   :language: NoEndAlgPseudocode
+   :lines: 2-
+
+This is Wikipedia's description of *Dinic's Algorithm*
+(see https://en.wikipedia.org/wiki/Dinic%27s_algorithm):
+
+.. literalinclude:: examples/algorithm-dinic.description
+   :language: algpseudocode
+   :lines: 2-
+
+This is Wikipedia's pseudocode of the *Ford–Fulkerson Algorithm*
+(see https://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm):
+
+.. literalinclude:: examples/algorithm-ford-fulkerson.pseudocode
+   :language: algpseudocode
+   :lines: 2-
+
+This is Wikipedia's pseudocode of the *Edmonds–Karp Algorithm*
+(see https://en.wikipedia.org/wiki/Edmonds%E2%80%93Karp_algorithm)
+with a custom lexer that skips all ``ENDxxx`` keywords:
+
+.. literalinclude:: examples/algorithm-edmonds-karp.pseudocode
+   :language: NoEndAlgPseudocode
+   :lines: 2-
+
+And now the *Edmonds–Karp Algorithm* with French keywords:
+
+.. literalinclude:: examples/algorithm-edmonds-karp.pseudocode
+   :language: algpseudocode-fr
+   :lines: 2-
+
+And again the *Edmonds–Karp Algorithm* with German keywords:
+
+.. literalinclude:: examples/algorithm-edmonds-karp.pseudocode
+   :language: algpseudocode-de
+   :lines: 2-
+
+You will find more details :ref:`here <details-algpseudocode>`.
+
+
+.. _details-algpseudocode:
+
+Lexer Options
+=============
+
+  .. describe:: prohibit_raiseonerror_filter
+
+     **Type:** :py:class:`bool`
+
+     **Default:** `False`
+
+     If ``True``, `Sphinx`_ is not allowed to apply the `raiseonerror`
+     filter when it calls :py:meth:`Lexer.add_filter`.
+
+     This setting does not apply to filters that are set by the standard
+     lexer option `filters`.
+
+  .. describe:: no_end
+
+     **Type:** :py:class:`bool`
+
+     **Default:** `False`
+
+     If ``True``, all ``\ENDxxx`` commands are skipped and yield
+     nothing.
+
+  .. describe:: gets
+
+     **Type:** :py:class:`str` or :py:obj:`None`
+
+     **Default:** `None` (yields ``←``)
+
+     The operator symbol to be printed by the command ``\GETS``.
+
+     An often used alternative is ``:=``.
+
+  .. describe:: remark
+
+     **Type:** :py:class:`str` or :py:obj:`None`
+
+     **Default:** `None` (yields ``▷``)
+
+     The symbol printed at the start of comments introduced with
+     ``\REMARK`` or ``\REM``.
+
+  To use a lexer with non-default options in `Sphinx`_, see section
+  :ref:`customized-sphinx-lexers`.
+
+
+Comments
+========
+
+- with the ``\REMARK`` or ``\REM`` keywords (this includes a leading symbol)
+- multi-line comments with ``/* ... */``; they can be **nested**
+- multi-line comments with ``(* ... *)``; they can be **nested**
+- single-line comments with ``//`` or ``#`` (until the end of the line)
+
+.. code-block:: algpseudocode
+
+   /*
+    * A single multiline comment
+    */
+
+   /*
+    * A multiline comment
+    *
+    * /* This is a nested multi-line comment */
+    *
+    */
+
+   (*
+    * A multiline comment
+    *
+    * (* This is a nested multi-line comment *)
+    *
+    *)
+
+   // A single-line comment
+
+   # A single-line comment
+
+   \REM A remark has a leading symbol
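+
+Nesting means that a comment ends only when every opener has been matched
+by a closer. The following is a minimal sketch of that rule using a depth
+counter; it is not the lexer's actual implementation, and the function name
+``find_comment_end`` is hypothetical:
+
+.. code-block:: python
+
+   def find_comment_end(text, start=0, opener="/*", closer="*/"):
+       """Return the index just past the closer that matches the opener
+       at ``start``, or -1 if the comment is never closed."""
+       if not text.startswith(opener, start):
+           raise ValueError("no comment opener at start position")
+       depth = 0
+       pos = start
+       while pos < len(text):
+           if text.startswith(opener, pos):
+               depth += 1              # one level deeper
+               pos += len(opener)
+           elif text.startswith(closer, pos):
+               depth -= 1              # one level closed
+               pos += len(closer)
+               if depth == 0:
+                   return pos          # outermost comment is complete
+           else:
+               pos += 1
+       return -1                       # unterminated comment
+
+   source = "/* outer /* inner */ still outer */ code"
+   comment = source[:find_comment_end(source)]
+
+The same function covers the ``(* ... *)`` form by passing
+``opener="(*"`` and ``closer="*)"``.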
+
+
+Literals
+========
+
+Strings and numbers follow `Python`_ syntax. The string prefixes ``r``, ``f``
+and ``t`` are not supported; ``u`` and ``b`` are.
+
+To yield single or double quotes that do not delimit a string, escape them
+with ``\'`` or ``\"``. This is required to typeset something such as
+:algpseudocode:`f\\'(x) = 0`.
+
+.. code-block:: algpseudocode
+
+   0  0xdead 0b100001 0o720  2.7 2.7e-54
+
+   "A string with an escaped double-quote \" "
+
+   'Another string with an escaped single-quote \' '
+
+   """A multiline
+   string
+   """
+
+   '''Another multiline string
+
+   '''
+
+   b"A \x20 byte string"
+
+   u'An explicit Unicode \u1234 string'
+
+   \"  a non string
+
+   \'  a non string also
+
+
+(Mathematical) Symbols and Operators
+====================================
+
+Some ASCII symbol combinations are recognized and replaced by a
+Unicode symbol:
+
+.. code-block:: algpseudocode
+
+   \TEXT{<=>}    <=>
+   \TEXT{<->}    <->
+   \TEXT{<-}     <-
+   \TEXT{->}     ->
+   \TEXT{=>}     =>
+   \TEXT{<=}     <=
+   \TEXT{>=}     >=
+   \TEXT{<>}     <>
+   \TEXT{!=}     !=
+   \TEXT{:=}     :=
+   \TEXT{=:}     =:
+   \TEXT{?=}     ?=
+
+Unicode codepoints with property ``Sm`` are recognized as mathematical symbols
+and highlighted accordingly.
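+
+Whether a character has the general category ``Sm`` can be checked with
+Python's standard :py:mod:`unicodedata` module. A small sketch; the helper
+name ``is_math_symbol`` is hypothetical and only illustrates the property,
+not how the lexer is implemented:
+
+.. code-block:: python
+
+   import unicodedata
+
+   def is_math_symbol(ch):
+       """Return True if the single character has Unicode category Sm."""
+       return unicodedata.category(ch) == "Sm"
+
+   # Characters with category Sm, for example the element-of sign
+   math_chars = [ch for ch in "a∈b + c - d" if is_math_symbol(ch)]
+
+Note that the ASCII hyphen-minus ``-`` has category ``Pd`` and therefore
+does not fall under this rule.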
+
+
+Punctuation
+===========
+
+Runs of dots ``.``, ``..``, ``...``, ``....``, ... are handled
+properly in expressions and yield a punctuation token.
+They are not replaced by corresponding Unicode symbols.
+
+
+Keywords
+========
+
+Explicit Keywords
+-----------------
+
+- Start with a backslash character ``\``
+- Case-insensitive
+- Translated if a translation is found
+
+Parameter handling is as follows:
+
+- Parameters are enclosed in curly braces ``{`` and ``}``
+- Escaping within the braces is possible using the backslash ``\``
+- Parameters are separated from the keyword/command by a (possibly empty) run
+  of space or TAB characters.
+  This is true for required and optional parameters.
+
+.. todo:: Escaping
+
+          A single backslash is a Generic.Error token
+
+
+With Required Parameters
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: algpseudocode
+
+   \TEXT{\PROGRAM {A Program\}  or  \PROG {A Program\}}                                    \PROGRAM {A Program}
+   \TEXT{\ALGORITHM{An Algorithm\}  or  \ALGO{An Algorithm\}}                              \ALGORITHM{An Algorithm}
+   \TEXT{\PROCEDURE{A Procedure\}  or  \PROC{A Procedure\}}                                \PROCEDURE{A Procedure}
+   \TEXT{\FUNCTION{A Function\}  or  \FUNC{A Function\}  or  \FN{A Function\}}              \FUNCTION{A Function}
+   \TEXT{\CLASS{A Class\}}                                                                \CLASS{A Class}
+
+   \TEXT{\STATEMENT{the expression\}  \STATE{the expression\}  \BLOCK{the expression\}}     \STATEMENT{the expression}
+
+   \TEXT{expr1: \\EXPRESSION{expression a in b\}   expr2: \\EXPR{expression b in a\}}        \TEXT{expr1: \EXPRESSION{expression a in b}   expr2: \EXPR{expression b in a}}
+
+   \TEXT{\TEXTSTATEMENT{the text\}  \TEXTSTATE{the text\}  \TSTATEMENT{the text\}  \TSTATE{the text\}  \TEXTBLOCK{the text\}  \TBLOCK{the text\}}             \TEXTSTATEMENT{the text}
+
+   \TEXT{\INPUT{Input 1\}}                            \INPUT{Input 1}
+   \TEXT{\INPUTS{Input 2\}}                           \INPUTS{Input 2}
+
+   \TEXT{\OUTPUT{Output 1\}}                          \OUTPUT{Output 1}
+   \TEXT{\OUTPUTS{Output 2\}}                         \OUTPUTS{Output 2}
+
+   \TEXT{\ENSURE{Whatever should be ensured!\}}       \ENSURE{Whatever should be ensured!}
+
+   \TEXT{\REQUIRE{Whatever should be required.\}}     \REQUIRE{Whatever should be required.}
+
+   \TEXT{\RETURNS{Return 2\}}                         \RETURNS{Return 2}
+
+   \TEXT{\CALL{a function\}(p1, p2)}                  \CALL{a function}(p1, p2)
+
+   \TEXT{\NAME{an entity name\}}                      \NAME{an entity name}
+
+
+With Optional Parameters
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some ``END``-keywords have optional parameters:
+
+.. code-block:: algpseudocode
+
+   \TEXT{\ENDPROGRAM  \ENDPROG}              \ENDPROGRAM
+   \TEXT{\ENDALGORITHM  \ENDALGO}            \ENDALGORITHM
+   \TEXT{\ENDPROCEDURE  \ENDPROC}            \ENDPROCEDURE
+   \TEXT{\ENDFUNCTION  \ENDFUNC  \ENDFN}     \ENDFUNCTION
+   \TEXT{\ENDCLASS}                          \ENDCLASS
+
+They are used like this:
+
+.. code-block:: algpseudocode
+
+   \TEXT{\CLASS{Foo Bar Class\} ... \END CLASS {Foo Bar Class\}}   \TEXT{yields}   \CLASS{Foo Bar Class} ... \END CLASS {Foo Bar Class}
+
+.. seealso:: Syntax variants: `END-Keywords`_
+
+
+Without Parameters
+~~~~~~~~~~~~~~~~~~
+
+"Normal" Keywords
+'''''''''''''''''
+
+.. code-block:: algpseudocode
+
+   \TEXT{\IF}                                \IF
+   \TEXT{\THEN}                              \THEN
+   \TEXT{\ELSE}                              \ELSE
+   \TEXT{\ELSEIF or \ELSIF  or  \ELIF}       \ELSEIF \text{or} \ELSIF \text{or} \ELIF
+   \TEXT{\DO}                                \DO
+   \TEXT{\WHILE}                             \WHILE
+   \TEXT{\FORALL}                            \FORALL
+   \TEXT{\FOR}                               \FOR
+   \TEXT{\FROM}                              \FROM
+   \TEXT{\TO}                                \TO
+   \TEXT{\STEP}                              \STEP
+   \TEXT{\IN}                                \IN
+   \TEXT{\LOOP}                              \LOOP
+   \TEXT{\REPEAT}                            \REPEAT
+   \TEXT{\UNTIL}                             \UNTIL
+
+   \TEXT{\RETURN}                            \RETURN
+
+   \TEXT{\BEGIN}                             \BEGIN
+   \TEXT{\END}                               \END
+
+   \TEXT{\IS}                                \IS
+   \TEXT{\WITH}                              \WITH
+
+   \TEXT{\GETS}                              \GETS
+
+   \TEXT{\\REMARK   or   \\REM}                \REMARK A comment with a leading symbol
+
+``\REMARK`` or ``\REM`` is special: all characters up to the end of the
+line are taken as the comment; curly braces are not needed, and in fact
+they are interpreted as part of the comment.
+
+
+END-Keywords
+''''''''''''
+
+The separator character can be empty, a run of ASCII spaces, a run of TAB characters,
+a single underscore ``_`` or a single hyphen ``-`` like:
+
+  ``\ENDIF``, ``\END IF``, ``\END-IF``, ``\END_IF`` or ``\END IF``
+
+
+.. code-block:: algpseudocode
+
+   \text{\ENDIF}             \ENDIF     \rem empty
+
+   \text{\END IF}            \END IF     \rem a single space
+
+   \text{\END  IF}           \END  IF     \rem two spaces
+
+   \text{\END-IF}            \END-IF     \rem a single hyphen
+
+   \text{\END_IF}            \END_IF     \rem a single underscore
+
+   \text{\END   IF}          \END       IF     \rem a single TAB character
+
+The list of END-keywords (here always just with ``-`` as separator):
+
+.. code-block:: algpseudocode
+
+   \text{\END-PROGRAM  \END-PROG}              \END-PROGRAM
+   \text{\END-ALGORITHM  \END-ALGO}            \END-ALGORITHM
+   \text{\END-PROCEDURE  \END-PROC}            \END-PROCEDURE
+   \text{\END-FUNCTION  \END-FUNC  \END-FN}    \END-FUNCTION
+   \text{\END-CLASS}                           \END-CLASS
+   \text{\END-IF}                              \END-IF
+   \text{\END-WHILE}                           \END-WHILE
+   \text{\END-FOR}                             \END-FOR
+   \text{\END-FORALL}                          \END-FORALL
+   \text{\END-LOOP}                            \END-LOOP
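+
+The separator rule described above can be approximated with a regular
+expression. This is a sketch for the single keyword ``\ENDIF`` only and is
+not taken from the lexer's source; the names ``END_IF`` and ``is_end_if``
+are hypothetical:
+
+.. code-block:: python
+
+   import re
+
+   # Optional separator: a run of spaces, a run of TABs, one "_" or one "-".
+   # Mixed runs of spaces and TABs are not covered by the description and
+   # are rejected in this sketch. Keywords are case-insensitive.
+   END_IF = re.compile(r"\\END(?: +|\t+|_|-)?IF", re.IGNORECASE)
+
+   def is_end_if(text):
+       """Return True if the whole text is one spelling of \\ENDIF."""
+       return END_IF.fullmatch(text) is not None
+
+   variants = ["\\ENDIF", "\\END IF", "\\END  IF", "\\END-IF",
+               "\\END_IF", "\\END\tIF"]
+   accepted = [v for v in variants if is_end_if(v)]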
+
+
+Names and Entities
+==================
+
+In an expression context all other words are interpreted as entity
+names (token type :py:class:`pygments.token.Token.Name.Entity`).
+
+Allowed characters in the words follow the corresponding `Python`_ rules.
+As such, many Unicode characters are allowed.
+
+To highlight entity names that contain whitespace or other "special"
+characters, use the ``\NAME`` command.
+
+.. code-block:: algpseudocode
+
+   \TEXT{entity_name_1}          entity_name_1
+
+   \TEXT{entity_name_2}          entity_name_2
+
+   \TEXT{\NAME{entity-name 3\}}   \NAME{entity-name 3}
+
+   \TEXT{München}                München
+
+   \TEXT{Genève}                 Genève
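+
+Assuming that "the corresponding Python rules" are the rules for Python
+identifiers, :py:meth:`str.isidentifier` gives a quick way to see which
+words need the ``\NAME`` command. The helper name ``needs_name_command``
+is hypothetical:
+
+.. code-block:: python
+
+   def needs_name_command(word):
+       """Return True if the word is not a valid Python identifier."""
+       return not word.isidentifier()
+
+   words = ["entity_name_1", "München", "Genève", "entity-name 3"]
+   explicit = [w for w in words if needs_name_command(w)]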
+
+.. _explicit-token-types:
+
+Explicit Token Types
+====================
+
+Handle keywords and operators that are not handled by default or change
+the default handling of some expressions.
+
+`XX` represents a `value` in the :py:data:`pygments.token.STANDARD_TYPES`
+dict.
+Its corresponding token type (the associated `key` in this `dict`) is
+used as token type.
+
+``\\tt-XX/SINGLE-CHAR``
+
+  no escaping needed
+
+  `SINGLE-CHAR` is a single character and can be *every* character
+  (including a carriage-return or line-feed)
+
+``\\ttx-XX{CHARACTERS}``
+
+``\\ttx-XX(CHARACTERS)``
+
+``\\ttx-XX[CHARACTERS]``
+
+``\\ttx-XX<CHARACTERS>``
+
+``\\ttx-XX<SEP>CHARACTERS<SEP>``
+
+  No escaping possible! There are enough alternatives available!
+
+  `SEP` is one of ``/:|=*+!\$~``.
+
+
+Examples:
+
+.. code-block:: algpseudocode
+
+   \text{• \\tt-kc/C}      \tt-kc/C            \rem C as Keyword.Constant
+   \text{• \\tt-ow/∈}      \tt-ow/∈            \rem ∈ as Operator.Word
+   \text{• \\ttx-kc{A New Constant Keyword\}}    \ttx-kc{A New Constant Keyword}  \rem As a new Keyword.Constant
+   \text{• \\ttx-nv{A New Variable Name\}}       \ttx-nv{A New Variable Name}     \rem An explicit Name.Variable
+   \text{• \\ttx-k(∈ ∌)}   \ttx-k(∈ ∌)         \rem ∈ and ∌ as (ordinary) Keywords
+   \text{• \\ttx-o<∈ ∌>}   \ttx-o<∈ ∌>         \rem ∈ and ∌ as (ordinary) Operators
+     /*
+      * The line below has ∈_∌ as (peculiar) function name.
+      * Their params are automatic (i.e. a normal expression).
+      */
+   \text{• \\ttx-nf<∈_∌>(p1, p2)}                \ttx-nf<∈_∌>(p1, p2)
+     /*
+      * The line below has ∈_∌ as (peculiar) decorator name (as used in Python).
+      * Their params are automatic (i.e. a normal expression).
+      */
+   \text{• \\ttx-nd[∈_∌](p1, p2)}                \ttx-nd[∈_∌](p1, p2)
+     /*
+      * This is a non-existing token type: you get some generic error marking
+      * with a Generic.Error token and no expansion.
+      */
+   \text{• \\ttx-NON-EXISTING[∈_∌](p1, p2)}      \ttx-NON_EXISTING[∈_∌](p1, p2)
+
+.. note:: Explicit token types are **case-sensitive**.
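+
+The mapping from ``XX`` to a token type can be inspected with Pygments
+itself. A minimal sketch, assuming Pygments is installed; the names
+``SHORT_NAMES`` and ``token_type_for`` are hypothetical and not part of the
+lexer:
+
+.. code-block:: python
+
+   from pygments.token import STANDARD_TYPES, Token
+
+   # Invert the dict: short name (the value) -> token type (the key).
+   SHORT_NAMES = {short: ttype for ttype, short in STANDARD_TYPES.items()}
+
+   def token_type_for(short_name):
+       """Return the token type for a short name, or None if unknown.
+
+       The lookup is case-sensitive, like the explicit token types."""
+       return SHORT_NAMES.get(short_name)
+
+   constant = token_type_for("kc")        # used by \tt-kc/C
+   unknown = token_type_for("NON-EXISTING")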
+
+
+.. _customized-sphinx-lexers:
+
+Customized Lexers in Sphinx
+===========================
+
+Defining lexers with non-default options in `Sphinx`_ can be done in its
+configuration file :file:`conf.py`.
+
+The first option is to set the Sphinx config value ``highlight_options``,
+which passes options to an existing lexer.
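+
+A possible :file:`conf.py` fragment for this first option is sketched below.
+It assumes a Sphinx version in which ``highlight_options`` maps lexer names
+to option dicts, and it assumes that ``algpseudocode`` is the name under
+which the lexer is looked up; the option values are only examples:
+
+.. code-block:: python
+
+   # conf.py (fragment)
+   highlight_options = {
+       # Options passed to the lexer used for "algpseudocode" blocks
+       "algpseudocode": {
+           "no_end": True,   # skip all \ENDxxx commands
+           "gets": ":=",     # print ":=" for \GETS
+       },
+   }
+
+These options apply to every block highlighted with that lexer, which is
+why a separately named lexer is the more flexible route.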
+
+A more flexible alternative is to define a new lexer in the Sphinx
+application. The very same lexer class can be used with different options:
+
+.. code-block:: python
+
+   from functools import partial
+   from pygments_lexer_pseudocode2.lexers.algpseudocode import AlgPseudocodeLexer
+
+   def setup(app):
+
+       #
+       # Add a custom lexer: AlgPseudocodeLexer with custom init
+       # option "no_end".
+       #
+       # In modern Sphinx versions the given lexer must be callable and
+       # must not be a lexer instance. So use an indirection with
+       # "partial" here.
+       #
+       app.add_lexer("noend-algpseudocode",
+                     partial(AlgPseudocodeLexer, no_end=True))
+
+Custom styles and filters work similarly.
+
+.. note:: Lexers in Sphinx are instantiated with the `raiseonerror` filter
+          applied by default.
+          This is also true for custom lexers that are added by
+          :py:meth:`Sphinx.add_lexer`.
+
+          Lexer *instances* that are added directly to
+          :py:data:`sphinx.highlighting.lexers` are taken as is by
+          Sphinx and are not augmented with any default filters.
+
+For older Sphinx versions your mileage may vary.