diff docs/lexer-algpseudocode.rst @ 168:bff8b900713a
REFACTOR: All documentation pages refactored: merge intro and details for lexers and filters
| author | Franz Glasner <fzglas.hg@dom66.de> |
|---|---|
| date | Mon, 11 May 2026 01:31:12 +0200 |
| parents | docs/details-algpseudocode.rst@88f872c50aae |
| children | 3c517c22df9c |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/docs/lexer-algpseudocode.rst Mon May 11 01:31:12 2026 +0200 @@ -0,0 +1,577 @@ +.. -*- coding: utf-8; indent-tabs-mode: nil; -*- + + +************************************* + AlgPseudocode and Language Variants +************************************* + +These lexers are heavily inspired by CTAN’s `Algpseudocodex`_. +They recognize expressions and additionally all sorts of comments and +commands that are inspired by `Algpseudocodex`_. + +They may be used in `Sphinx`_ by their aliases: + +.. code-block:: none + + .. code-block:: algpseudocode + + \PROGRAM {The Pseudoprogram} \IS + + \END PROGRAM {The Pseudoprogram} + +It will be rendered as: + +.. code-block:: algpseudocode + + \PROGRAM {The Pseudoprogram} \IS + + \END PROGRAM {The Pseudoprogram} + +And the same with the German variant +(using ``.. code-block:: algpseudocode-de`` as language alias): + +.. code-block:: algpseudocode-de + + \PROGRAM {The Pseudoprogram} \IS + + \END PROGRAM {The Pseudoprogram} + +The AlgPseudocode lexer and its language variants AlgPseudocodeDE and +AlgPseudocodeFR basically work in three states: `default`, +`expression` and `text`. + + In expressions it automatically recognizes: + + - Strings (single-quote, double-quote, triple-single-quote, + triple-double-quote, `Python`_ style) + - Numbers (also `Python`_ style) + - (Mathematical) operators and symbols + - ``\TEXT{...}`` + + Switches to a text mode that prohibits automatic expression + highlighting. + + A closing curly brace can be escaped as ``\}`` so that it does not end the + text mode prematurely. + + - ``\NAME``, ``\CALL`` and ``\GETS`` + + - ``\REM`` and ``\REMARK`` for remarks (aka comments) + + - Names (`Name.Entity`) + + - :ref:`explicit-token-types` + + In the default mode it recognizes expressions and additionally all + sorts of comments and commands that look somewhat like `Algpseudocodex`_ + commands.
+ + In texts it recognizes: + + - ``\EXPR`` or ``\EXPRESSION`` + + Switches to expression mode. + + A closing curly brace can be escaped as ``\}`` so that it does not end the expression + mode prematurely. + + - ``\REM`` and ``\REMARK`` for remarks (aka comments) + + - :ref:`explicit-token-types` + + +.. rubric:: Some Examples + +A synthetic example with many features: + +.. literalinclude:: examples/example-1.pseudocode + :language: algpseudocode + :lines: 2- + +The same example with a customized `AlgPseudocodeLexer` whose `no_end` +option is set to ``True``: + +.. literalinclude:: examples/example-1.pseudocode + :language: NoEndAlgPseudocode + :lines: 2- + +This is Wikipedia's description of *Dinic's Algorithm* +(see https://en.wikipedia.org/wiki/Dinic%27s_algorithm): + +.. literalinclude:: examples/algorithm-dinic.description + :language: algpseudocode + :lines: 2- + +This is Wikipedia's pseudocode of the *Ford–Fulkerson Algorithm* +(see https://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm): + +.. literalinclude:: examples/algorithm-ford-fulkerson.pseudocode + :language: algpseudocode + :lines: 2- + +This is Wikipedia's pseudocode of the *Edmonds–Karp Algorithm* +(see https://en.wikipedia.org/wiki/Edmonds%E2%80%93Karp_algorithm) +with a custom lexer that skips all ``ENDxxx`` keywords: + +.. literalinclude:: examples/algorithm-edmonds-karp.pseudocode + :language: NoEndAlgPseudocode + :lines: 2- + +And now the *Edmonds–Karp Algorithm* with French keywords: + +.. literalinclude:: examples/algorithm-edmonds-karp.pseudocode + :language: algpseudocode-fr + :lines: 2- + +And again the *Edmonds–Karp Algorithm* with German keywords: + +.. literalinclude:: examples/algorithm-edmonds-karp.pseudocode + :language: algpseudocode-de + :lines: 2- + +More details can be found :ref:`here <details-algpseudocode>`. + + +.. _details-algpseudocode: + +Lexer Options +============= + + ..
describe:: prohibit_raiseonerror_filter + + **Type:** :py:class:`bool` + + **Default:** `False` + + If ``True``, the `raiseonerror` filter is not allowed to be applied by + `Sphinx`_ when :py:meth:`Lexer.add_filter` is called. + + This setting does not apply to filters that are set by the standard + lexer option `filters`. + + .. describe:: no_end + + **Type:** :py:class:`bool` + + **Default:** `False` + + If ``True``, all the ``\ENDxxx`` commands will be skipped and yield + nothing. + + .. describe:: gets + + **Type:** :py:class:`str` or :py:obj:`None` + + **Default:** `None` (yields ``←``) + + The operator symbol to be printed by the command ``\GETS``. + + A frequently used alternative is ``:=``. + + .. describe:: remark + + **Type:** :py:class:`str` or :py:obj:`None` + + **Default:** `None` (yields ``▷``) + + The symbol to be printed when starting comments with + ``\REMARK`` or ``\REM``. + + To use a lexer with non-default options in `Sphinx`_, see section + :ref:`customized-sphinx-lexers`. + + +Comments +======== + +- with the ``\REMARK`` or ``\REM`` keywords (this includes a leading symbol) +- multi-line comments with ``/* ... */``; they can be **nested** +- multi-line comments with ``(* ... *)``; they can be **nested** +- single-line comments with ``//`` or ``#`` (until the end of the line) + +.. code-block:: algpseudocode + + /* + * A single multiline comment + */ + + /* + * A multiline comment + * + * /* This is a nested multi-line comment */ + * + */ + + (* + * A multiline comment + * + * (* This is a nested multi-line comment *) + * + *) + + // A single-line comment + + # A single-line comment + + \REM A remark has a leading symbol + + +Literals +======== + +Strings and numbers are written as in `Python`_. String prefixes ``r``, ``f`` and ``t`` +are not supported -- ``u`` and ``b`` are. + +To yield non-string-delimiting single- and double-quotes you have to escape them +using ``\'`` or ``\"``. This must be used to typeset something like +:algpseudocode:`f\\'(x) = 0`. + +..
code-block:: algpseudocode + + 0 0xdead 0b100001 0o720 2.7 2.7e-54 + + "A string with an escaped double-quote \" " + + 'Another string with an escaped single-quote \' ' + + """A multiline + string + """ + + '''Another multiline string + + ''' + + b"A \x20 byte string" + + u'An explicit Unicode \u1234 string' + + \" a non string + + \' a non string also + + +(Mathematical) Symbols and Operators +==================================== + +Some ASCII symbol combinations are recognized and replaced by a +Unicode symbol: + +.. code-block:: algpseudocode + + \TEXT{<=>} <=> + \TEXT{<->} <-> + \TEXT{<-} <- + \TEXT{->} -> + \TEXT{=>} => + \TEXT{<=} <= + \TEXT{>=} >= + \TEXT{<>} <> + \TEXT{!=} != + \TEXT{:=} := + \TEXT{=:} =: + \TEXT{?=} ?= + +Unicode codepoints with property ``Sm`` are recognized as mathematical symbols +and highlighted accordingly. + + +Punctuation +=========== + +Runs of dots ``.``, ``..``, ``...``, ``....``, ... are handled +properly in expressions and yield a punctuation token. +They are not replaced by corresponding Unicode symbols. + + +Keywords +======== + +Explicit Keywords +----------------- + +- Start with a backslash character ``\`` +- Case-insensitive +- Translated if a translation is found + +Parameter handling is as follows: + +- Parameters are enclosed in curly braces ``{`` and ``}`` +- Escaping within the braces is possible using the backslash ``\`` +- Parameters are separated from the keyword/command by a (possibly empty) run + of space or TAB characters. + This is true for required and optional parameters. + +.. todo:: Escaping + + A single backslash is a Generic.Error token + + +With Required Parameters +~~~~~~~~~~~~~~~~~~~~~~~~ + +..
code-block:: algpseudocode + + \TEXT{\PROGRAM {A Program\} or \PROG {A Program\}} \PROGRAM {A Program} + \TEXT{\ALGORITHM{An Algorithm\} or \ALGO{An Algorithm\}} \ALGORITHM{An Algorithm} + \TEXT{\PROCEDURE{A Procedure\} or \PROC{A Procedure\}} \PROCEDURE{A Procedure} + \TEXT{\FUNCTION{A Function\} or \FUNC{A Function\} or \FN{A Function\}} \FUNCTION{A Function} + \TEXT{\CLASS{A Class\}} \CLASS{A Class} + + \TEXT{\STATEMENT{the expression\} \STATE{the expression\} \BLOCK{the expression\}} \STATEMENT{the expression} + + \TEXT{expr1: \\EXPRESSION{expression a in b\} expr2: \\EXPR{expression b in a\}} \TEXT{expr1: \EXPRESSION{expression a in b} expr2: \EXPR{expression b in a}} + + \TEXT{\TEXTSTATEMENT{the text\} \TEXTSTATE{the text\} \TSTATEMENT{the text\} \TSTATE{the text\} \TEXTBLOCK{the text\} \TBLOCK{the text\}} \TEXTSTATEMENT{the text} + + \TEXT{\INPUT{Input 1\}} \INPUT{Input 1} + \TEXT{\INPUTS{Input 2\}} \INPUTS{Input 2} + + \TEXT{\OUTPUT{Output 1\}} \OUTPUT{Output 1} + \TEXT{\OUTPUTS{Output 2\}} \OUTPUTS{Output 2} + + \TEXT{\ENSURE{Whatever should be ensured!\}} \ENSURE{Whatever should be ensured!} + + \TEXT{\REQUIRE{Whatever should be required.\}} \REQUIRE{Whatever should be required.} + + \TEXT{\RETURNS{Return 2\}} \RETURNS{Return 2} + + \TEXT{\CALL{a function\}(p1, p2)} \CALL{a function}(p1, p2) + + \TEXT{\NAME{an entity name\}} \NAME{an entity name} + + +With Optional Parameters +~~~~~~~~~~~~~~~~~~~~~~~~ + +Some ``END``-keywords have optional parameters: + +.. code-block:: algpseudocode + + \TEXT{\ENDPROGRAM \ENDPROG} \ENDPROGRAM + \TEXT{\ENDALGORITHM \ENDALGO} \ENDALGORITHM + \TEXT{\ENDPROCEDURE \ENDPROC} \ENDPROCEDURE + \TEXT{\ENDFUNCTION \ENDFUNC \ENDFN} \ENDFUNCTION + \TEXT{\ENDCLASS} \ENDCLASS + +They are used like this: + +.. code-block:: algpseudocode + + \TEXT{\CLASS{Foo Bar Class\} ... \END CLASS {Foo Bar Class\}} \TEXT{yields} \CLASS{Foo Bar Class} ... \END CLASS {Foo Bar Class} + +.. 
seealso:: Syntax variants: `END-Keywords`_ + + +Without Parameters +~~~~~~~~~~~~~~~~~~ + +"Normal" Keywords +''''''''''''''''' + +.. code-block:: algpseudocode + + \TEXT{\IF} \IF + \TEXT{\THEN} \THEN + \TEXT{\ELSE} \ELSE + \TEXT{\ELSEIF or \ELSIF or \ELIF} \ELSEIF \text{or} \ELSIF \text{or} \ELIF + \TEXT{\DO} \DO + \TEXT{\WHILE} \WHILE + \TEXT{\FORALL} \FORALL + \TEXT{\FOR} \FOR + \TEXT{\FROM} \FROM + \TEXT{\TO} \TO + \TEXT{\STEP} \STEP + \TEXT{\IN} \IN + \TEXT{\LOOP} \LOOP + \TEXT{\REPEAT} \REPEAT + \TEXT{\UNTIL} \UNTIL + + \TEXT{\RETURN} \RETURN + + \TEXT{\BEGIN} \BEGIN + \TEXT{\END} \END + + \TEXT{\IS} \IS + \TEXT{\WITH} \WITH + + \TEXT{\GETS} \GETS + + \TEXT{\\REMARK or \\REM} \REMARK A comment with a leading symbol + +``\REMARK`` or ``\REM`` is special: all characters to the end of the +line are taken as comment; curly braces are not needed---in fact: +they are interpreted to be part of the comment. + + +END-Keywords +'''''''''''' + +The separator character can be empty, a run of ASCII spaces, a run of TAB characters, +a single underscore ``_`` or a single hyphen ``-`` like: + + ``\ENDIF``, ``\END IF``, ``\END-IF``, ``\END_IF`` or ``\END IF`` + + +.. code-block:: algpseudocode + + \text{\ENDIF} \ENDIF \rem empty + + \text{\END IF} \END IF \rem a single space + + \text{\END IF} \END IF \rem two spaces + + \text{\END-IF} \END-IF \rem a single hyphen + + \text{\END_IF} \END_IF \rem a single underscore + + \text{\END IF} \END IF \rem a single TAB character + +The list of END-keywords (here always just with ``-`` as separator): + +.. 
code-block:: algpseudocode + + \text{\END-PROGRAM \END-PROG} \END-PROGRAM + \text{\END-ALGORITHM \END-ALGO} \END-ALGORITHM + \text{\END-PROCEDURE \END-PROC} \END-PROCEDURE + \text{\END-FUNCTION \END-FUNC \END-FN} \END-FUNCTION + \text{\END-CLASS} \END-CLASS + \text{\END-IF} \END-IF + \text{\END-WHILE} \END-WHILE + \text{\END-FOR} \END-FOR + \text{\END-FORALL} \END-FORALL + \text{\END-LOOP} \END-LOOP + + +Names and Entities +================== + +In an expression context all other words are interpreted as entity +names (token type :py:class:`pygments.token.Token.Name.Entity`). + +Allowed characters in the words follow the corresponding `Python`_ rules. +As such, many Unicode characters are allowed. + +To highlight entity names that contain whitespace or other "special" characters, +use the ``\NAME`` command. + +.. code-block:: algpseudocode + + \TEXT{entity_name_1} entity_name_1 + + \TEXT{entity_name_2} entity_name_2 + + \TEXT{\NAME{entity-name 3\}} \NAME{entity-name 3} + + \TEXT{München} München + + \TEXT{Genève} Genève + +.. _explicit-token-types: + +Explicit Token Types +==================== + +Explicit token types handle keywords and operators that are not handled by default, or change +the default handling of some expressions. + +`XX` represents a `value` in the :py:data:`pygments.token.STANDARD_TYPES` +dict. +Its corresponding token type (the associated `key` in this `dict`) is +used as token type. + +``\\tt-XX/SINGLE-CHAR`` + + No escaping needed. + + `SINGLE-CHAR` is a single character and can be *any* character + (including a carriage-return or line-feed) + +``\\ttx-XX{CHARACTERS}`` + +``\\ttx-XX(CHARACTERS)`` + +``\\ttx-XX[CHARACTERS]`` + +``\\ttx-XX<CHARACTERS>`` + +``\\ttx-XX<SEP>CHARACTERS<SEP>`` + + No escaping possible! There are enough alternatives available! + + `SEP` is one of ``/:|=*+!\$~``. + + +Examples: + +..
code-block:: algpseudocode + + \text{• \\tt-kc/C} \tt-kc/C \rem C as Keyword.Constant + \text{• \\tt-ow/∈} \tt-ow/∈ \rem ∈ as Operator.Word + \text{• \\ttx-kc{A New Constant Keyword\}} \ttx-kc{A New Constant Keyword} \rem As a new Keyword.Constant + \text{• \\ttx-nv{A New Variable Name\}} \ttx-nv{A New Variable Name} \rem An explicit Name.Variable + \text{• \\ttx-k(∈ ∌)} \ttx-k(∈ ∌) \rem ∈ and ∌ as (ordinary) Keywords + \text{• \\ttx-o<∈ ∌>} \ttx-o<∈ ∌> \rem ∈ and ∌ as (ordinary) Operators + /* + * The line below has ∈_∌ as a (peculiar) function name. + * Its params are automatic (i.e. a normal expression). + */ + \text{• \\ttx-nf<∈_∌>(p1, p2)} \ttx-nf<∈_∌>(p1, p2) + /* + * The line below has ∈_∌ as a (peculiar) decorator name (as used in Python). + * Its params are automatic (i.e. a normal expression). + */ + \text{• \\ttx-nd[∈_∌](p1, p2)} \ttx-nd[∈_∌](p1, p2) + /* + * This is a non-existing token type: you get some generic error marking + * with a Generic.Error token and no expansion. + */ + \text{• \\ttx-NON-EXISTING[∈_∌](p1, p2)} \ttx-NON_EXISTING[∈_∌](p1, p2) + +.. note:: Explicit token types are **case-sensitive**. + + +.. _customized-sphinx-lexers: + +Customized Lexers in Sphinx +=========================== + +Defining lexers with non-default options in `Sphinx`_ can be done in its +configuration file :file:`conf.py`. + +The first option is to set the Sphinx config value ``highlight_options``, +which customizes an existing lexer through its options. + +A more flexible alternative is to define a new lexer in the Sphinx +application. The very same lexer class can be used with different options: + +.. code-block:: python + + from functools import partial + from pygments_lexer_pseudocode2.lexers.algpseudocode import AlgPseudocodeLexer + + def setup(app): + + # + # Add a custom lexer: AlgPseudocodeLexer with custom init + # option "no_end". + # + # In modern Sphinx versions the given lexer must be callable and must + # not be a lexer instance.
So use an indirection with "partial" + # here. + # + app.add_lexer("noend-algpseudocode", + partial(AlgPseudocodeLexer, no_end=True)) + +Custom styles and filters work similarly. + +.. note:: Lexers in Sphinx are instantiated with the `raiseonerror` filter + applied by default. + This is also true for custom lexers that are added by + :py:meth:`Sphinx.add_lexer`. + + Lexer *instances* that are somehow added to + :py:data:`sphinx.highlighting.lexers` are taken as is by + Sphinx and are not augmented with any default filters. + +For older Sphinx versions your mileage may vary.
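
The ``highlight_options`` route mentioned above can be sketched as follows. This is a minimal, hedged example for :file:`conf.py` and not the project's own configuration. It assumes a Sphinx version in which ``highlight_options`` maps lexer names to option dictionaries (Sphinx 3.5 or later), and it assumes that the alias ``algpseudocode`` is the name under which Sphinx looks up the lexer. The option values are illustrative only.

```python
# conf.py (fragment): a sketch under the assumptions stated above.
#
# Each key is a lexer name as used in ".. code-block:: <name>"; each value
# is a dict of keyword options passed to that lexer when it is created.
highlight_options = {
    "algpseudocode": {
        "no_end": True,   # skip all \ENDxxx commands
        "gets": ":=",     # print ":=" for \GETS instead of the default arrow
    },
}
```

Unlike the ``add_lexer`` approach, this setting applies to every block that uses the alias, so it cannot offer two differently configured variants of the same lexer side by side.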
