view docs/lexer-algpseudocode.rst @ 297:cd22c2e8390d
===== signature for changeset d8442a5b5718
| author | Franz Glasner <fzglas.hg@dom66.de> |
|---|---|
| date | Sat, 23 May 2026 12:40:50 +0200 |
| parents | a094305c5708 |
.. -*- coding: utf-8; indent-tabs-mode: nil; -*-

.. _lexer-algpseudocode:

*************************************
 AlgPseudocode and Language Variants
*************************************

.. only:: html

   .. contents::

.. only:: html

   .. hint:: The reST source of this documentation page can be found
      `here <_sources/lexer-algpseudocode.rst.txt>`_.

These lexers are heavily inspired by CTAN’s `Algpseudocodex`_.
They recognize all sorts of single- and multi-line comments in addition to
expressions and commands that are inspired by `Algpseudocodex`_.

They are used in `Sphinx`_ using their aliases. The code block:

.. code-block:: none

   .. code-block:: algpseudocode

      \PROGRAM {The Pseudoprogram} \IS
      \END PROGRAM {The Pseudoprogram}

will be rendered as:

.. code-block:: algpseudocode

   \PROGRAM {The Pseudoprogram} \IS
   \END PROGRAM {The Pseudoprogram}

And the same code block with the German variant (using
``.. code-block:: algpseudocode-de`` as language alias):

.. code-block:: algpseudocode-de

   \PROGRAM {The Pseudoprogram} \IS
   \END PROGRAM {The Pseudoprogram}

States
======

The AlgPseudocode lexer and its language variants AlgPseudocodeDE and
AlgPseudocodeFR basically work in three states (aka modes or contexts):
`default`, `expression` and `text`.

In `expressions` it automatically recognizes:

- Strings (single-quote, double-quote, triple-single-quote,
  triple-double-quote, `Python`_ style)
- Numbers (also `Python`_ style)
- (Mathematical) operators and symbols
- ``\TEXT{...}`` or ``\T{...}``

  Used to switch to a text-mode that prohibits automatic expression
  highlighting. A closing curly brace can be quoted with ``\}`` to not end
  the text mode prematurely.

- ``\EXPR``, ``\EXPRESSION`` or ``\E`` as nested construct
- ``\NAME``, ``\CALL`` and ``\GETS``
- ``\REM`` and ``\REMARK`` for remarks (aka comments)
- Names (`Name.Entity`)
- :ref:`explicit-token-types`

The `default`-mode is an extension of `expression`: in addition to
`expressions` it recognizes all sorts of single- and multi-line comments and
commands that are inspired by `Algpseudocodex`_.

In `text` context it recognizes:

- ``\EXPRESSION``, ``\EXPR`` or ``\E``

  Used to switch to expression-mode. A closing curly brace can be quoted
  with ``\}`` to not end the expression mode prematurely.

- ``\TEXT`` (aka ``\T``) as nested construct
- ``\REM`` and ``\REMARK`` for remarks (aka comments)
- :ref:`explicit-token-types`

Lexer Options
=============

.. describe:: prohibit_raiseonerror_filter

   **Type:** :py:class:`bool`

   **Default:** :py:obj:`None`

   If :py:obj:`True` the `raiseonerror` filter is not allowed to be applied
   by `Sphinx`_ when :py:meth:`Lexer.add_filter` is called.

   This setting does not apply to filters that are set by the standard
   lexer option `filters`.

.. describe:: no_end

   **Type:** :py:class:`bool`

   **Default:** :py:obj:`False`

   If :py:obj:`True` all the ``\ENDxxx`` commands will be skipped and yield
   no output.

.. describe:: strict_tokentype

   **Type:** :py:class:`bool`

   **Default:** :py:obj:`True`

   Control whether `Explicit Token Types`_ with an unknown token type name
   yield :py:class:`pygments.token.Token.Generic.Error` tokens (when
   ``True``, this is the default) or a token type that is synthesized on
   the fly by :py:func:`pygments.token.string_to_tokentype` (when
   ``False``).

.. describe:: gets

   **Type:** :py:class:`str` or :py:obj:`None`

   **Default:** :py:obj:`None` (yields ``⟵``)

   The operator symbol to be printed by the command ``\GETS``.
   An often used alternative is ``:=``.

.. describe:: remark

   **Type:** :py:class:`str` or :py:obj:`None`

   **Default:** :py:obj:`None` (yields ``▷``)

   The symbol to be printed when starting comments with ``\REMARK`` or
   ``\REM``.

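
The difference between the two ``strict_tokentype`` settings can be sketched
with plain `Pygments`_ helpers. The function ``resolve_tokentype`` below is
hypothetical and only illustrates the documented behaviour; it is not the
lexer's actual implementation, and it assumes that an "existing" token type
is one that is listed in :py:data:`pygments.token.STANDARD_TYPES`:

.. code-block:: python

   from pygments.token import STANDARD_TYPES, Token, string_to_tokentype

   # Reverse lookup: short name (e.g. "kc") -> token type
   # (e.g. Token.Keyword.Constant).
   SHORT_NAMES = {short: ttype for ttype, short in STANDARD_TYPES.items()}


   def resolve_tokentype(name, strict=True):
       """Hypothetical helper: map a token type name to a token type."""
       if name in SHORT_NAMES:
           return SHORT_NAMES[name]
       # Dotted names such as "Keyword.Constant" are resolved by Pygments;
       # unknown names are synthesized on the fly by this call.
       ttype = string_to_tokentype(name)
       if strict and ttype not in STANDARD_TYPES:
           # Assumption: only the standard token types count as "existing".
           return Token.Generic.Error
       return ttype

With this sketch, ``resolve_tokentype("kc")`` and
``resolve_tokentype("Keyword.Constant")`` both give
``Token.Keyword.Constant``, while an unknown name gives
``Token.Generic.Error`` unless ``strict=False`` is passed.
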
To use a lexer with non-default options in `Sphinx`_ see section
:ref:`customized-sphinx-lexers`.

Comments
========

- with the ``\REMARK`` or ``\REM`` keywords (until the end of the line; the
  output includes a leading symbol, by default ``▷``)
- multi-line comments with ``/* ... */``; they can be **nested**
- multi-line comments with ``(* ... *)``; they can be **nested**
- single-line comments with ``//`` or ``#`` (until the end of the line)

.. code-block:: algpseudocode

   /*
    * A single multiline comment
    */

   /*
    * A multiline comment
    *
    * /* This is a nested multi-line comment */
    *
    */

   (*
    * A multiline comment
    *
    * (* This is a nested multi-line comment *)
    *
    *)

   // A single-line comment
   # A single-line comment
   \REM A remark is a single-line comment with a leading symbol

Literals
========

Strings and numbers as in `Python`_.

String prefixes ``u`` and ``b`` are supported; the prefixes ``r``, ``f`` and
``t`` are not supported.

To have non-string-delimiting single- and double-quotes in the output you
have to escape them using ``\'`` or ``\"``. This must be used to typeset
something like :algpseudocode:`f\'(x) = 0`.

.. code-block:: algpseudocode

   0
   1234567890
   0xdead
   0b100001
   0o720
   2.7
   2.7e-54
   "A string with an escaped double-quote \" "
   'Another string with an escaped single-quote \' '
   """A multiline
   string
   """
   '''Another multiline
   string
   '''
   b"A \x20 byte string"
   u'An explicit Unicode \u1234 string'
   \" a non string
   \' a non string also

(Mathematical) Symbols and Operators
====================================

Some ASCII symbol combinations are recognized and replaced by a Unicode
symbol:

.. code-block:: algpseudocode

   \TEXT{<=>} <=>
   \TEXT{<->} <->
   \TEXT{<-} <-
   \TEXT{->} ->
   \TEXT{=>} =>
   \TEXT{<=} <=
   \TEXT{>=} >=
   \TEXT{<>} <>
   \TEXT{!=} !=
   \TEXT{:=} :=
   \TEXT{=:} =:
   \TEXT{?=} ?=

Unicode code points with the general category ``Sm`` are recognized as
mathematical symbols and highlighted accordingly.

Punctuation
===========

Runs of dots ``.``, ``..``, ``...``, ``....``, ...
are handled properly in expressions and yield a punctuation token.
They are not replaced by corresponding Unicode symbols.

Commands
========

- Start with a backslash character ``\``
- Case-insensitive
- Yield mostly the :py:class:`pygments.token.Token.Keyword` token type
- Translated if a translation is found
- Depending on the command, they may have required or optional parameters

  Parameter handling is as follows:

  * Parameters are enclosed in curly braces ``{`` and ``}``
  * Escaping within the braces is possible using the backslash ``\`` as
    escape character
  * Parameters are separated from the keyword/command by a (possibly empty)
    run of space or TAB characters. This is true for required and optional
    parameters.

- Unrecognized commands typically result in a
  :py:class:`pygments.token.Token.Error` token.

More on the escaping rules can be found in
:ref:`this chapter <escaping-rules>`.

Commands With Required Parameters
---------------------------------

.. code-block:: algpseudocode

   \TEXT{\PROGRAM {A Program\} or \PROG {A Program\}} \PROGRAM {A Program}
   \TEXT{\ALGORITHM{An Algorithm\} or \ALGO{An Algorithm\}} \ALGORITHM{An Algorithm}
   \TEXT{\PROCEDURE{A Procedure\} or \PROC{A Procedure\}} \PROCEDURE{A Procedure}
   \TEXT{\FUNCTION{A Function\} or \FUNC{A Function\} or \FN{A Function\}} \FUNCTION{A Function}
   \TEXT{\CLASS{A Class\}} \CLASS{A Class}
   \TEXT{\STATEMENT{the expression\} \STATE{the expression\} \BLOCK{the expression\}} \STATEMENT{the expression}
   \TEXT{expr1: \\EXPRESSION{expression a in b\} expr2: \\EXPR{expression b in a\}} \TEXT{expr1: \EXPRESSION{expression a in b} expr2: \EXPR{expression b in a}}
   \TEXT{\TEXTSTATEMENT{the text\} \TEXTSTATE{the text\} \TSTATEMENT{the text\} \TSTATE{the text\} \TEXTBLOCK{the text\} \TBLOCK{the text\}} \TEXTSTATEMENT{the text}
   \TEXT{\INPUT{Input 1\}} \INPUT{Input 1}
   \TEXT{\INPUTS{Input 2\}} \INPUTS{Input 2}
   \TEXT{\OUTPUT{Output 1\}} \OUTPUT{Output 1}
   \TEXT{\OUTPUTS{Output 2\}} \OUTPUTS{Output 2}
   \TEXT{\ENSURE{Whatever should be ensured!\}} \ENSURE{Whatever should be ensured!}
   \TEXT{\REQUIRE{Whatever should be required.\}} \REQUIRE{Whatever should be required.}
   \TEXT{\RETURNS{Return 2\}} \RETURNS{Return 2}
   \TEXT{\CALL{a function\}(p1, p2)} \CALL{a function}(p1, p2)
   \TEXT{\NAME{an entity name\}} \NAME{an entity name}

For a special command with two required parameters see
`Explicit Token Types`_.

Commands With Optional Parameters
---------------------------------

Some ``END``-commands have optional parameters:

.. code-block:: algpseudocode

   \TEXT{\ENDPROGRAM \ENDPROG} \ENDPROGRAM
   \TEXT{\ENDALGORITHM \ENDALGO} \ENDALGORITHM
   \TEXT{\ENDPROCEDURE \ENDPROC} \ENDPROCEDURE
   \TEXT{\ENDFUNCTION \ENDFUNC \ENDFN} \ENDFUNCTION
   \TEXT{\ENDCLASS} \ENDCLASS

They are used like this:

.. code-block:: algpseudocode

   \TEXT{\CLASS{Foo Bar Class\} ... \END CLASS {Foo Bar Class\}}
   \TEXT{yields}
   \CLASS{Foo Bar Class} ... \END CLASS {Foo Bar Class}

   \TEXT{\CLASS{Foo Bar Class\} ... \END CLASS}
   \TEXT{yields}
   \CLASS{Foo Bar Class} ... \END CLASS

.. seealso:: For other syntax variants concerning `END` see also section
   `END-Commands`_.

Commands Without Parameters
---------------------------

"Normal" Commands
~~~~~~~~~~~~~~~~~

.. code-block:: algpseudocode

   \TEXT{\IF} \IF
   \TEXT{\THEN} \THEN
   \TEXT{\ELSE} \ELSE
   \TEXT{\ELSEIF or \ELSIF or \ELIF} \ELSEIF \text{or} \ELSIF \text{or} \ELIF
   \TEXT{\DO} \DO
   \TEXT{\WHILE} \WHILE
   \TEXT{\FORALL} \FORALL
   \TEXT{\FOR} \FOR
   \TEXT{\FROM} \FROM
   \TEXT{\TO} \TO
   \TEXT{\STEP} \STEP
   \TEXT{\IN} \IN
   \TEXT{\LOOP} \LOOP
   \TEXT{\REPEAT} \REPEAT
   \TEXT{\UNTIL} \UNTIL
   \TEXT{\RETURN} \RETURN
   \TEXT{\BEGIN} \BEGIN
   \TEXT{\END} \END
   \TEXT{\IS} \IS
   \TEXT{\WITH} \WITH
   \TEXT{\GETS} \GETS
   \TEXT{\\REMARK or \\REM} \REMARK A comment with a leading symbol

``\REMARK`` or ``\REM`` is special: all characters to the end of the line
are taken as the comment. Curly braces are not needed; in fact, they are
interpreted as part of the comment.

END-Commands
~~~~~~~~~~~~

The separator between ``\END`` and the keyword that follows can be empty, a
run of ASCII spaces, a run of TAB characters, a single underscore ``_`` or a
single hyphen ``-``.

All of the following examples are equally valid and result in the same
output: ``\ENDIF``, ``\END IF``, ``\END-IF``, ``\END_IF`` or ``\END  IF``

.. code-block:: algpseudocode

   \text{\ENDIF} \ENDIF \rem empty
   \text{\END IF} \END IF \rem a single space
   \text{\END  IF} \END  IF \rem two spaces
   \text{\END-IF} \END-IF \rem a single hyphen
   \text{\END_IF} \END_IF \rem a single underscore
   \text{\END	IF} \END	IF \rem a single TAB character

The list of END-commands (here always just with ``-`` as separator):

.. code-block:: algpseudocode

   \text{\END-PROGRAM \END-PROG} \END-PROGRAM
   \text{\END-ALGORITHM \END-ALGO} \END-ALGORITHM
   \text{\END-PROCEDURE \END-PROC} \END-PROCEDURE
   \text{\END-FUNCTION \END-FUNC \END-FN} \END-FUNCTION
   \text{\END-CLASS} \END-CLASS
   \text{\END-IF} \END-IF
   \text{\END-WHILE} \END-WHILE
   \text{\END-FOR} \END-FOR
   \text{\END-FORALL} \END-FORALL
   \text{\END-LOOP} \END-LOOP

.. note:: The output of these END-commands can be suppressed by setting the
   lexer option ``no_end`` to :py:obj:`True`.

Names and Entities
==================

In an `expression` context all other words are interpreted as entity names
(token type :py:class:`pygments.token.Token.Name.Entity`).

Allowed characters in the words follow the corresponding `Python`_ rules.
As such, many Unicode characters are allowed.

To highlight entity names that contain whitespace or other "special"
characters use the ``NAME`` command.

.. code-block:: algpseudocode

   \TEXT{entity_name_1} entity_name_1
   \TEXT{entity_name_2} entity_name_2
   \TEXT{\NAME{entity-name 3\}} \NAME{entity-name 3}
   \TEXT{München} München
   \TEXT{Genève} Genève

.. note:: Should you want to change the token type and the associated
   highlighting you may want to have a look at :ref:`tokenreplacefilter`.

.. _explicit-token-types:

Explicit Token Types
====================

Explicit token types make it possible to handle keywords and operators that
are not recognized by default. They also allow the user to explicitly
highlight some input text with a low-level command.

They are implemented with the ``\ttX{ARG1}{ARG2}`` command.

This command has two required parameters:

#. The content of the first argument `ARG1` can be one of

   - A `value` in the :py:data:`pygments.token.STANDARD_TYPES` dict.
     Its corresponding token type (the associated `key` in this dictionary)
     will be used as token type for the token.

   - A string representation of an existing token type without the
     ``Token.`` prefix (e.g. ``String``, ``Generic``,
     ``Generic.EmphStrong``, ``Text``, ``Text.Multiline``).

   If a corresponding token type is not found the lexer's behaviour depends
   on the lexer option ``strict_tokentype`` (see `Lexer Options`_):

   If ``True`` (the default) the command yields a
   :py:class:`pygments.token.Token.Generic.Error` token type for the given
   command's content.

   If ``False`` then the `Pygments`_ function
   :py:func:`pygments.token.string_to_tokentype` will be called. This
   function returns either an existing token type or synthesizes a new one
   on the fly. The highlighting associated with freshly created token types
   may not be well defined in the output.

   For this argument escaping is neither needed nor supported.

#. The content of the second argument is given the token type named by the
   first argument.

   Standard `Escaping Rules`_ apply to this argument!

.. note:: The command for explicit token types is **case-sensitive**.

.. rubric:: Examples:

.. code-block:: algpseudocode

   \text{• \\ttX{\}{token\}} \ttX{}{token} \rem just a base "Token"
   \text{• \\ttX{kc\}{C\}} \ttX{kc}{C} \rem C as Keyword.Constant
   \text{• \\ttX{Keyword.Constant\}{C\}} \ttX{Keyword.Constant}{C} \rem C as Keyword.Constant
   \text{• \\ttX{ow\}{∈\}} \ttX{ow}{∈} \rem ∈ as Operator.Word
   \text{• \\ttX{Operator.Word\}{∈\}} \ttX{Operator.Word}{∈} \rem ∈ as Operator.Word
   \text{• \\ttX{kc\}{A Constant Keyword\}} \ttX{kc}{A Constant Keyword} \rem An explicit Keyword.Constant
   \text{• \\ttX{nv\}{A Variable Name\}} \ttX{nv}{A Variable Name} \rem An explicit Name.Variable
   \text{• \\ttX{ni\}{An Entity*Name\}} \ttX{ni}{An Entity*Name} \rem An explicit Name.Entity
   \text{• \\ttX{k\}{∈ ∌\}} \ttX{k}{∈ ∌} \rem ∈ and ∌ as (ordinary) Keywords
   \text{• \\ttX{o\}{∈ ∌\}} \ttX{o}{∈ ∌} \rem ∈ and ∌ as (ordinary) Operators

   /*
    * The lines below have ∈_∌ as (peculiar) function name.
    * Its params are automatic (i.e. a normal expression).
    */
   \text{• \\ttX{nf\}{∈_∌\}(p1, p2)} \ttX{nf}{∈_∌}(p1, p2)
   \text{• \\ttX{Name.Function\}{∈_∌\}(p1, p2)} \ttX{Name.Function}{∈_∌}(p1, p2)

   /*
    * The lines below have ∈_∌ as (peculiar) decorator name (as used in Python).
    * Its params are automatic (i.e. a normal expression).
    */
   \text{• \\ttX{nd\}{∈_∌\}(p1, p2)} \ttX{nd}{∈_∌}(p1, p2)
   \text{• \\ttX{Name.Decorator\}{∈_∌\}(p1, p2)} \ttX{Name.Decorator}{∈_∌}(p1, p2)

   /*
    * Normal emphasis ("strong")
    */
   \text{• \\ttX{gs\}{this is strong\}} \ttX{gs}{this is strong}
   \text{• \\ttX{Generic.Strong\}{this is strong\}} \ttX{Generic.Strong}{this is strong}

   /*
    * A strong emphasis.
    */
   \text{• \\ttX{ges\}{A Strong Emphasis!\}} \ttX{ges}{A Strong Emphasis!}
   \text{• \\ttX{Generic.EmphStrong\}{A Strong Emphasis!\}} \ttX{Generic.EmphStrong}{A Strong Emphasis!}

   /*
    * Escaping is allowed and needed for the closing brace!
    * The example token type is a "String".
    */
   \text{• \\ttX{s\}{Escaping brace \\\} and backslash \\\\!\}} \ttX{s}{Escaping brace \} and backslash \\!}

   /*
    * This is a non-existing token type:
    * by default you get some generic error markup with a Generic.Error
    * token and no expansion.
    * See also `Lexer Options` and `strict_tokentype`.
    */
   \text{• \\ttX{NON-EXISTING\}{∈_∌\}(p1, p2)} \ttX{NON-EXISTING}{∈_∌}(p1, p2)

An example with a lexer that has ``strict_tokentype=False`` (with the
templates used here, the synthesized token type is rendered like standard
text):

.. code-block:: nonstrict-algpseudocode

   \text{• \\ttX{Generic.Not.Yet.Existing\}{∈_∌\}(p1, p2)} \ttX{Generic.Not.Yet.Existing}{∈_∌}(p1, p2)

.. note:: Explicit token types work in all `expression` and `text` contexts.

.. note:: Nesting of explicit token types is *not supported*.

.. _escaping-rules:

Escaping Rules
==============

The escape character is a backslash ``\``.

A backslash can be escaped with ``\\`` and yields a single backslash token.

Within parameters a closing curly brace ``}`` ends the current parameter
environment. It must be escaped using ``\}`` if a closing curly brace is
part of the argument content.

A single backslash yields a
:py:class:`pygments.token.Token.Generic.Error` token when in `default` and
`expression` states (and also in `Explicit Token Types`_).

In contrast, in `text` contexts a single backslash character that does not
introduce a command yields a normal text token.

In all contexts a backslash that would normally introduce a known command
must be escaped if the content should not be recognized as a command.

Single- and double-quotes must also be escaped (``\"`` or ``\'``) in
`default` and `expression` contexts when they should not introduce a string
token.

.. _customized-sphinx-lexers:

Customized Lexers in Sphinx
===========================

Defining lexers with non-default options in `Sphinx`_ can be done in its
configuration file :file:`conf.py`.

The first option is to set the Sphinx config value ``highlight_options``
appropriately.
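
A minimal sketch of this first option in :file:`conf.py`. It assumes a
Sphinx version in which ``highlight_options`` maps lexer aliases to option
dictionaries (Sphinx 3.5 or later) and that the options are handed to the
lexer unchanged; the option values shown are only examples:

.. code-block:: python

   # conf.py
   highlight_options = {
       # Applied to every ``.. code-block:: algpseudocode``.
       "algpseudocode": {"no_end": True, "gets": ":="},
   }
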
With this value an existing lexer can be customized by options.

A more flexible alternative is to define a new lexer in the Sphinx
application. The very same lexer class can be used with different options:

.. code-block:: python

   from functools import partial

   from pygments_lexer_pseudocode2.lexers.algpseudocode import AlgPseudocodeLexer


   def setup(app):
       #
       # Add a custom lexer: AlgPseudocodeLexer with custom init
       # option "no_end".
       #
       # In modern Sphinx versions the given lexer must be callable and
       # must not be a lexer instance. So use an indirection with
       # "partial" here.
       #
       app.add_lexer("noend-algpseudocode",
                     partial(AlgPseudocodeLexer, no_end=True))

.. _note-raiseonerror-filter:

.. note:: Lexers in Sphinx are instantiated with the `raiseonerror` filter
   applied by default. This is also true for custom lexers that are added by
   :py:meth:`Sphinx.add_lexer`.

   Using the `filters` option the user can associate custom filters with a
   lexer. These filters have precedence over the default `raiseonerror`
   filter.

   Lexer *instances* that are added to
   :py:data:`sphinx.highlighting.lexers` directly are taken as is by Sphinx
   and are not augmented with any default filters.

   See also chapter :ref:`filters`.

   For older Sphinx versions your mileage may vary.

Some Examples
=============

.. rubric:: Synthetic Example

The first example is a synthetic example with many features.

.. only:: builder_html

   Its source code is in :download:`examples/example-1.pseudocode`.

.. raw:: latex

   Its source code can be found at \url{example-1.pseudocode}.

.. literalinclude:: examples/example-1.pseudocode
   :language: algpseudocode
   :lines: 2-

The highlighted output with a customized `AlgPseudocodeLexer` and its
`no_end` option set to :py:obj:`True`:

.. literalinclude:: examples/example-1.pseudocode
   :language: NoEndAlgPseudocode
   :lines: 2-

.. rubric:: Dinic's Algorithm

The second example is Wikipedia's description of *Dinic's Algorithm* (see
https://en.wikipedia.org/wiki/Dinic%27s_algorithm).

.. only:: builder_html

   Its source code is in :download:`examples/algorithm-dinic.pseudocode`.

.. raw:: latex

   Its source code can be found at \url{algorithm-dinic.pseudocode}.

.. literalinclude:: examples/algorithm-dinic.pseudocode
   :language: algpseudocode
   :lines: 2-

.. rubric:: Ford–Fulkerson Algorithm

The third example is Wikipedia's pseudocode of the *Ford–Fulkerson
Algorithm* (see
https://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm).

.. only:: builder_html

   Its source code is in
   :download:`examples/algorithm-ford-fulkerson.pseudocode`.

.. raw:: latex

   Its source code can be found at \url{algorithm-ford-fulkerson.pseudocode}.

.. literalinclude:: examples/algorithm-ford-fulkerson.pseudocode
   :language: algpseudocode
   :lines: 2-

.. rubric:: Edmonds–Karp Algorithm

The fourth example is Wikipedia's pseudocode of the *Edmonds–Karp
Algorithm* (see
https://en.wikipedia.org/wiki/Edmonds%E2%80%93Karp_algorithm) with a custom
lexer which skips all ``ENDxxx`` keywords.

.. only:: builder_html

   Its source code is in
   :download:`examples/algorithm-edmonds-karp.pseudocode`.

.. raw:: latex

   Its source code can be found at \url{algorithm-edmonds-karp.pseudocode}.

.. literalinclude:: examples/algorithm-edmonds-karp.pseudocode
   :language: NoEndAlgPseudocode
   :lines: 2-

And now the *Edmonds–Karp Algorithm* with **French** keywords:

.. literalinclude:: examples/algorithm-edmonds-karp.pseudocode
   :language: algpseudocode-fr
   :lines: 2-

And again the *Edmonds–Karp Algorithm* with **German** keywords:

.. literalinclude:: examples/algorithm-edmonds-karp.pseudocode
   :language: algpseudocode-de
   :lines: 2-
