view docs/lexer-algpseudocode.rst @ 198:6aa1d89cd869

Typo
author Franz Glasner <fzglas.hg@dom66.de>
date Wed, 13 May 2026 12:34:41 +0200
parents 403b500e0ed4
children b6a959c31bed

.. -*- coding: utf-8; indent-tabs-mode: nil; -*-


*************************************
 AlgPseudocode and Language Variants
*************************************

These lexers are heavily inspired by CTAN’s `Algpseudocodex`_.
They recognize expressions as well as a variety of comments and
commands modeled on those of `Algpseudocodex`_.

They may be used in `Sphinx`_ by their aliases.
The code-block:

.. code-block:: none

   .. code-block:: algpseudocode

      \PROGRAM {The Pseudoprogram} \IS

      \END PROGRAM {The Pseudoprogram}

will be rendered as:

.. code-block:: algpseudocode

   \PROGRAM {The Pseudoprogram} \IS

   \END PROGRAM {The Pseudoprogram}

And the same code-block with the German variant
(using ``.. code-block:: algpseudocode-de`` as language alias):

.. code-block:: algpseudocode-de

   \PROGRAM {The Pseudoprogram} \IS

   \END PROGRAM {The Pseudoprogram}


States
======

The AlgPseudocode lexer and its language variants AlgPseudocodeDE and
AlgPseudocodeFR basically work in three states: `default`,
`expression` and `text`.

  In expressions it automatically recognizes:

  - Strings (single-quote, double-quote, triple-single-quote,
    triple-double-quote, `Python`_ style)
  - Numbers (also `Python`_ style)
  - (Mathematical) operators and symbols
  - ``\TEXT{...}``

    Switches to a text mode that suppresses automatic expression
    highlighting.

    A closing curly brace can be quoted with ``\}`` to not end the
    text mode prematurely.

  - ``\NAME``, ``\CALL`` and ``\GETS``

  - ``\REM`` and ``\REMARK`` for remarks (aka comments)

  - Names (`Name.Entity`)

  - :ref:`explicit-token-types`

  In the default-mode it recognizes expressions and additionally all
  sorts of comments and commands that look somewhat like `Algpseudocodex`_
  commands.

  In texts it recognizes:

  - ``\EXPR`` or ``\EXPRESSION``

    Switches to expression mode.

    A closing curly brace can be quoted with ``\}`` to not end the expression
    mode prematurely.

  - ``\REM`` and ``\REMARK`` for remarks (aka comments)

  - :ref:`explicit-token-types`


Lexer Options
=============

  .. describe:: prohibit_raiseonerror_filter

     **Type:** :py:class:`bool`

     **Default:** :py:obj:`None`

     If :py:obj:`True` the `raiseonerror` filter is not allowed to be
     applied by `Sphinx`_ when :py:meth:`Lexer.add_filter` is called.

     This setting does not apply to filters that are set by the standard
     lexer option `filters`.

  .. describe:: no_end

     **Type:** :py:class:`bool`

     **Default:** :py:obj:`False`

     If :py:obj:`True` all the ``\ENDxxx`` commands will be skipped and yield
     no output.

  .. describe:: gets

     **Type:** :py:class:`str` or :py:obj:`None`

     **Default:** :py:obj:`None` (yields ``←``)

     The operator symbol to be printed by the command ``\GETS``.

     An often used alternative is ``:=``.

  .. describe:: remark

     **Type:** :py:class:`str` or :py:obj:`None`

     **Default:** :py:obj:`None` (yields ``▷``)

     The symbol to be printed when starting comments with
     ``\REMARK`` or ``\REM``.

  To use a lexer with non-default options in `Sphinx`_ see section
  :ref:`customized-sphinx-lexers`.


Comments
========

- with the ``\REMARK`` or ``\REM`` keywords (this includes a leading symbol)
- multi-line comments with ``/* ... */``; they can be **nested**
- multi-line comments with ``(* ... *)``; they can be **nested**
- single-line comments with ``//`` or ``#`` (until the end of the line)

.. code-block:: algpseudocode

   /*
    * A single multiline comment
    */

   /*
    * A multiline comment
    *
    * /* This is a nested multi-line comment */
    *
    */

   (*
    * A multiline comment
    *
    * (* This is a nested multi-line comment *)
    *
    *)

   // A single-line comment

   # A single-line comment

   \REM A remark has a leading symbol
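
The nesting rule for multi-line comments can be illustrated with a simple
depth counter. The following is a minimal sketch in plain Python and not the
lexer's actual implementation; the function name ``skip_nested_comment`` is
hypothetical:

.. code-block:: python

   def skip_nested_comment(text, start, opener="/*", closer="*/"):
       """Return the index just past the comment that opens at ``start``.

       Raises ValueError if no comment opens at ``start`` or if the
       comment is never closed.
       """
       if not text.startswith(opener, start):
           raise ValueError("no comment opener at the given position")
       depth = 0
       pos = start
       while pos < len(text):
           if text.startswith(opener, pos):
               depth += 1              # one level deeper
               pos += len(opener)
           elif text.startswith(closer, pos):
               depth -= 1              # one level closed
               pos += len(closer)
               if depth == 0:
                   return pos          # outermost comment is complete
           else:
               pos += 1
       raise ValueError("unterminated comment")


   source = "/* outer /* inner */ still outer */ x"
   end = skip_nested_comment(source, 0)
   print(source[:end])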


Literals
========

Strings and numbers as in `Python`_. String prefixes ``r``, ``f`` and ``t``
are not supported -- ``u`` and ``b`` are.

To yield non-string-delimiting single- and double-quotes you have to escape them
using ``\'`` or ``\"``. This is required to typeset something like
:algpseudocode:`f\\'(x) = 0`.

.. code-block:: algpseudocode

   0  0xdead 0b100001 0o720  2.7 2.7e-54

   "A string with an escaped double-quote \" "

   'Another string with an escaped single-quote \' '

   """A multiline
   string
   """

   '''Another multiline string

   '''

   b"A \x20 byte string"

   u'An explicit Unicode \u1234 string'

   \"  a non string

   \'  a non string also


(Mathematical) Symbols and Operators
====================================

Some ASCII symbol combinations are recognized and replaced by a
Unicode symbol:

.. code-block:: algpseudocode

   \TEXT{<=>}    <=>
   \TEXT{<->}    <->
   \TEXT{<-}     <-
   \TEXT{->}     ->
   \TEXT{=>}     =>
   \TEXT{<=}     <=
   \TEXT{>=}     >=
   \TEXT{<>}     <>
   \TEXT{!=}     !=
   \TEXT{:=}     :=
   \TEXT{=:}     =:
   \TEXT{?=}     ?=

Unicode code points with the general category ``Sm`` are recognized as
mathematical symbols and highlighted accordingly.
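
Whether a given character falls into this category can be checked with
Python's standard :py:mod:`unicodedata` module. This only illustrates the
``Sm`` category and is not the lexer's own code; the helper name
``is_math_symbol`` is hypothetical:

.. code-block:: python

   import unicodedata


   def is_math_symbol(char):
       """Return True if ``char`` has the Unicode general category ``Sm``."""
       return unicodedata.category(char) == "Sm"


   # Print the category of a few sample characters.
   for char in "∈+-a":
       print(repr(char), unicodedata.category(char), is_math_symbol(char))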


Punctuation
===========

Runs of dots ``.``, ``..``, ``...``, ``....``, ... are handled
properly in expressions and yield a punctuation token.
They are not replaced by corresponding Unicode symbols.


Commands
========

- Start with a backslash character ``\``
- Case-insensitive
- Mostly yield :py:class:`pygments.token.Token.Keyword` tokens
- Translated if a translation is found
- May have required or optional parameters, depending on the command

  Parameter handling is as follows:

  * Parameters are enclosed in curly braces ``{`` and ``}``
  * Escaping within the braces is possible using the backslash ``\`` as
    escape character
  * Parameters are separated from the keyword/command by a (possibly empty) run
    of space or TAB characters.
    This is true for required and optional parameters.

    .. todo:: Escaping

              A single backslash yields a Generic.Error token when in
              `default` and `expression` states.
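
The parameter rules listed above can be sketched in plain Python. This is an
illustration under assumptions, not the lexer's implementation: the function
names are hypothetical, and the sketch assumes that a backslash simply makes
the following character literal.

.. code-block:: python

   def read_parameter(text, pos):
       """Parse ``{...}`` starting at ``text[pos]``.

       Returns ``(value, next_pos)``. A backslash makes the following
       character literal, so an escaped closing brace does not end
       the parameter.
       """
       if pos >= len(text) or text[pos] != "{":
           raise ValueError("expected '{'")
       chars = []
       pos += 1
       while pos < len(text):
           char = text[pos]
           if char == "\\" and pos + 1 < len(text):
               chars.append(text[pos + 1])   # keep the escaped character
               pos += 2
           elif char == "}":
               return "".join(chars), pos + 1
           else:
               chars.append(char)
               pos += 1
       raise ValueError("unterminated parameter")


   def read_command_parameter(text, pos):
       """Skip a (possibly empty) run of spaces or TABs, then read ``{...}``."""
       while pos < len(text) and text[pos] in " \t":
           pos += 1
       return read_parameter(text, pos)


   line = r"\PROGRAM {A \{Pseudo\} program} \IS"
   value, rest = read_command_parameter(line, len(r"\PROGRAM"))
   print(value)
   print(line[rest:])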


Commands With Required Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: algpseudocode

   \TEXT{\PROGRAM {A Program\}  or  \PROG {A Program\}}                                    \PROGRAM {A Program}
   \TEXT{\ALGORITHM{An Algorithm\}  or  \ALGO{An Algorithm\}}                              \ALGORITHM{An Algorithm}
   \TEXT{\PROCEDURE{A Procedure\}  or  \PROC{A Procedure\}}                                \PROCEDURE{A Procedure}
   \TEXT{\FUNCTION{A Function\}  or  \FUNC{A Function\}  or  \FN{A Function\}}              \FUNCTION{A Function}
   \TEXT{\CLASS{A Class\}}                                                                \CLASS{A Class}

   \TEXT{\STATEMENT{the expression\}  \STATE{the expression\}  \BLOCK{the expression\}}     \STATEMENT{the expression}

   \TEXT{expr1: \\EXPRESSION{expression a in b\}   expr2: \\EXPR{expression b in a\}}        \TEXT{expr1: \EXPRESSION{expression a in b}   expr2: \EXPR{expression b in a}}

   \TEXT{\TEXTSTATEMENT{the text\}  \TEXTSTATE{the text\}  \TSTATEMENT{the text\}  \TSTATE{the text\}  \TEXTBLOCK{the text\}  \TBLOCK{the text\}}             \TEXTSTATEMENT{the text}

   \TEXT{\INPUT{Input 1\}}                            \INPUT{Input 1}
   \TEXT{\INPUTS{Input 2\}}                           \INPUTS{Input 2}

   \TEXT{\OUTPUT{Output 1\}}                          \OUTPUT{Output 1}
   \TEXT{\OUTPUTS{Output 2\}}                         \OUTPUTS{Output 2}

   \TEXT{\ENSURE{Whatever should be ensured!\}}       \ENSURE{Whatever should be ensured!}

   \TEXT{\REQUIRE{Whatever should be required.\}}     \REQUIRE{Whatever should be required.}

   \TEXT{\RETURNS{Return 2\}}                         \RETURNS{Return 2}

   \TEXT{\CALL{a function\}(p1, p2)}                  \CALL{a function}(p1, p2)

   \TEXT{\NAME{an entity name\}}                      \NAME{an entity name}


Commands With Optional Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some ``END``-commands have optional parameters:

.. code-block:: algpseudocode

   \TEXT{\ENDPROGRAM  \ENDPROG}              \ENDPROGRAM
   \TEXT{\ENDALGORITHM  \ENDALGO}            \ENDALGORITHM
   \TEXT{\ENDPROCEDURE  \ENDPROC}            \ENDPROCEDURE
   \TEXT{\ENDFUNCTION  \ENDFUNC  \ENDFN}     \ENDFUNCTION
   \TEXT{\ENDCLASS}                          \ENDCLASS

They are used like this:

.. code-block:: algpseudocode

   \TEXT{\CLASS{Foo Bar Class\} ... \END CLASS {Foo Bar Class\}}   \TEXT{yields}   \CLASS{Foo Bar Class} ... \END CLASS {Foo Bar Class}

   \TEXT{\CLASS{Foo Bar Class\} ... \END CLASS}                   \TEXT{yields}   \CLASS{Foo Bar Class} ... \END CLASS

.. seealso:: For other syntax variants concerning `END` see also section
             `END-Commands`_.


Commands Without Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~

"Normal" Commands
'''''''''''''''''

.. code-block:: algpseudocode

   \TEXT{\IF}                                \IF
   \TEXT{\THEN}                              \THEN
   \TEXT{\ELSE}                              \ELSE
   \TEXT{\ELSEIF or \ELSIF  or  \ELIF}       \ELSEIF \text{or} \ELSIF \text{or} \ELIF
   \TEXT{\DO}                                \DO
   \TEXT{\WHILE}                             \WHILE
   \TEXT{\FORALL}                            \FORALL
   \TEXT{\FOR}                               \FOR
   \TEXT{\FROM}                              \FROM
   \TEXT{\TO}                                \TO
   \TEXT{\STEP}                              \STEP
   \TEXT{\IN}                                \IN
   \TEXT{\LOOP}                              \LOOP
   \TEXT{\REPEAT}                            \REPEAT
   \TEXT{\UNTIL}                             \UNTIL

   \TEXT{\RETURN}                            \RETURN

   \TEXT{\BEGIN}                             \BEGIN
   \TEXT{\END}                               \END

   \TEXT{\IS}                                \IS
   \TEXT{\WITH}                              \WITH

   \TEXT{\GETS}                              \GETS

   \TEXT{\\REMARK   or   \\REM}                \REMARK A comment with a leading symbol

``\REMARK`` or ``\REM`` is special: all characters up to the end of the
line are taken as the comment. Curly braces are not needed; if present,
they are interpreted as part of the comment.


END-Commands
''''''''''''

The separator character can be empty, a run of ASCII spaces, a run of
TAB characters, a single underscore ``_`` or a single hyphen ``-``
like:

  ``\ENDIF``, ``\END IF``, ``\END-IF`` or ``\END_IF``


.. code-block:: algpseudocode

   \text{\ENDIF}             \ENDIF     \rem empty

   \text{\END IF}            \END IF     \rem a single space

   \text{\END  IF}           \END  IF     \rem two spaces

   \text{\END-IF}            \END-IF     \rem a single hyphen

   \text{\END_IF}            \END_IF     \rem a single underscore

   \text{\END   IF}          \END       IF     \rem a single TAB character
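
The separator rule can also be written as a regular expression. The following
is a sketch with Python's :py:mod:`re` module based only on the description
above; the names ``END_SEPARATOR`` and ``end_command_pattern`` are
hypothetical, and the lexer's real patterns may differ:

.. code-block:: python

   import re

   # Separator: empty, a run of spaces, a run of TABs, one "_" or one "-".
   END_SEPARATOR = r"(?: +|\t+|[_-])?"


   def end_command_pattern(keyword):
       """Build a case-insensitive pattern for an END-command of ``keyword``."""
       return re.compile(r"\\END" + END_SEPARATOR + re.escape(keyword),
                         re.IGNORECASE)


   END_IF = end_command_pattern("IF")

   # Print which spellings the sketch accepts.
   for candidate in [r"\ENDIF", r"\END IF", r"\END  IF", r"\END-IF",
                     r"\END_IF", "\\END\tIF", r"\end if", r"\END--IF"]:
       print(repr(candidate), bool(END_IF.fullmatch(candidate)))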

The list of END-commands (here always just with ``-`` as separator):

.. code-block:: algpseudocode

   \text{\END-PROGRAM  \END-PROG}              \END-PROGRAM
   \text{\END-ALGORITHM  \END-ALGO}            \END-ALGORITHM
   \text{\END-PROCEDURE  \END-PROC}            \END-PROCEDURE
   \text{\END-FUNCTION  \END-FUNC  \END-FN}    \END-FUNCTION
   \text{\END-CLASS}                           \END-CLASS
   \text{\END-IF}                              \END-IF
   \text{\END-WHILE}                           \END-WHILE
   \text{\END-FOR}                             \END-FOR
   \text{\END-FORALL}                          \END-FORALL
   \text{\END-LOOP}                            \END-LOOP

.. note:: The output of END-commands can be suppressed by setting the
          lexer option ``no_end`` to :py:obj:`True`.


Names and Entities
==================

In an expression context all other words are interpreted as entity
names (token type :py:class:`pygments.token.Token.Name.Entity`).

Allowed characters in the words follow the corresponding `Python`_ rules.
As such, many Unicode characters are allowed.

To highlight entity names with whitespace or other "special" characters in them
use the ``\NAME`` command.

.. code-block:: algpseudocode

   \TEXT{entity_name_1}          entity_name_1

   \TEXT{entity_name_2}          entity_name_2

   \TEXT{\NAME{entity-name 3\}}   \NAME{entity-name 3}

   \TEXT{München}                München

   \TEXT{Genève}                 Genève


.. _explicit-token-types:

Explicit Token Types
====================

Handle keywords and operators that are not handled by default or change
the default handling of some expressions.

`XX` represents a `value` in the :py:data:`pygments.token.STANDARD_TYPES`
dict.
Its corresponding token type (the associated `key` in this `dict`) is
used as token type.
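
The relation between short names and token types can be inspected directly in
Pygments. This is a small sketch that assumes Pygments is installed;
``SHORT_NAME_TO_TOKEN`` and ``token_for`` are hypothetical names, and the
fallback to ``Generic.Error`` merely mirrors the behaviour described for
non-existing token types in this section:

.. code-block:: python

   from pygments.token import STANDARD_TYPES, Token

   # Invert the mapping: short name (the dict value) -> token type (the key).
   SHORT_NAME_TO_TOKEN = {short: ttype
                          for ttype, short in STANDARD_TYPES.items()}


   def token_for(short):
       """Return the token type for ``short``, or Generic.Error if unknown."""
       return SHORT_NAME_TO_TOKEN.get(short, Token.Generic.Error)


   # Lookups are case-sensitive, like the dict keys themselves.
   for short in ("kc", "ow", "nv", "nf", "nd", "NON_EXISTING"):
       print(short, token_for(short))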

``\\tt-XX/SINGLE-CHAR``

  no escaping needed

  `SINGLE-CHAR` is a single character and can be *every* character
  (including a carriage-return or line-feed)

``\\ttx-XX{CHARACTERS}``

``\\ttx-XX(CHARACTERS)``

``\\ttx-XX[CHARACTERS]``

``\\ttx-XX<CHARACTERS>``

``\\ttx-XX<SEP>CHARACTERS<SEP>``

  No escaping possible! There are enough alternatives available!

  `SEP` is one of ``/:|=*+!\$~``.


Examples:

.. code-block:: algpseudocode

   \text{• \\tt-kc/C}      \tt-kc/C            \rem C as Keyword.Constant
   \text{• \\tt-ow/∈}      \tt-ow/∈            \rem ∈ as Operator.Word
   \text{• \\ttx-kc{A New Constant Keyword\}}    \ttx-kc{A New Constant Keyword}  \rem As a new Keyword.Constant
   \text{• \\ttx-nv{A New Variable Name\}}       \ttx-nv{A New Variable Name}     \rem An explicit Name.Variable
   \text{• \\ttx-k(∈ ∌)}   \ttx-k(∈ ∌)         \rem ∈ and ∌ as (ordinary) Keywords
   \text{• \\ttx-o<∈ ∌>}   \ttx-o<∈ ∌>         \rem ∈ and ∌ as (ordinary) Operators
     /*
      * The line below has ∈_∌ as (peculiar) function name.
      * Its parameters are automatic (i.e. a normal expression).
      */
   \text{• \\ttx-nf<∈_∌>(p1, p2)}                \ttx-nf<∈_∌>(p1, p2)
     /*
      * The line below has ∈_∌ as (peculiar) decorator name (as used in Python).
      * Its parameters are automatic (i.e. a normal expression).
      */
   \text{• \\ttx-nd[∈_∌](p1, p2)}                \ttx-nd[∈_∌](p1, p2)
     /*
      * This is a non-existing token type: you get some generic error marking
      * with a Generic.Error token and no expansion.
      */
   \text{• \\ttx-NON-EXISTING[∈_∌](p1, p2)}      \ttx-NON_EXISTING[∈_∌](p1, p2)

.. note:: Explicit token types are **case-sensitive**.


.. _customized-sphinx-lexers:

Customized Lexers in Sphinx
===========================

Defining lexers with non-default options in `Sphinx`_ can be done in its
configuration file :file:`conf.py`.

The first option is to set the Sphinx config value ``highlight_options``,
which passes options to an existing lexer.
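
As a rough illustration of ``highlight_options``, the fragment below assumes a
Sphinx version that accepts the per-language form (a dict that maps lexer
names to option dicts) and that the lexer is addressed by its alias
``algpseudocode``. The option values are only examples:

.. code-block:: python

   # conf.py (fragment)
   #
   # Assumption: per-language form of "highlight_options"; the values
   # below are examples for the lexer options "gets" and "no_end".
   highlight_options = {
       "algpseudocode": {
           "gets": ":=",      # print ":=" for \GETS
           "no_end": True,    # suppress all \ENDxxx commands
       },
   }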

A more flexible alternative is to define a new lexer in the Sphinx
application. The very same lexer class can be used with different options:

.. code-block:: python

   from functools import partial
   from pygments_lexer_pseudocode2.lexers.algpseudocode import AlgPseudocodeLexer

   def setup(app):

       #
       # Add a custom lexer: AlgPseudocodeLexer with custom init
       # option "no_end".
       #
       # In modern Sphinx versions given lexer must be callable and may
       # not be a lexer instance. So use an indirection with "partial"
       # here.
       #
       app.add_lexer("noend-algpseudocode",
                     partial(AlgPseudocodeLexer, no_end=True))

Custom styles and filters work similarly.

.. note:: Lexers in Sphinx are instantiated with the `raiseonerror` filter
          applied by default.
          This is also true for custom lexers that are added by
          :py:meth:`Sphinx.add_lexer`.

          Lexer *instances* that are added directly to
          :py:data:`sphinx.highlighting.lexers` are taken as is by
          Sphinx and are not augmented with any default filters.

For older Sphinx versions your mileage may vary.


Some Examples
=============

A synthetic example with many features.

.. only:: builder_html

   Its source code is in :download:`examples/example-1.pseudocode`.

.. raw:: latex

   Its source code can be found at \url{example-1.pseudocode}.

.. literalinclude:: examples/example-1.pseudocode
   :language: algpseudocode
   :lines: 2-

With a customized `AlgPseudocodeLexer` and its `no_end`
option set to :py:obj:`True`.

.. literalinclude:: examples/example-1.pseudocode
   :language: NoEndAlgPseudocode
   :lines: 2-

The second example is Wikipedia's description of *Dinic's Algorithm*
(see https://en.wikipedia.org/wiki/Dinic%27s_algorithm).

.. only:: builder_html

   Its source code is in :download:`examples/algorithm-dinic.pseudocode`.

.. raw:: latex

   Its source code can be found at \url{algorithm-dinic.pseudocode}.

.. literalinclude:: examples/algorithm-dinic.pseudocode
   :language: algpseudocode
   :lines: 2-

The third example is Wikipedia's pseudocode of the *Ford–Fulkerson Algorithm*
(see https://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm).

.. only:: builder_html

   Its source code is in
   :download:`examples/algorithm-ford-fulkerson.pseudocode`.

.. raw:: latex

   Its source code can be found at \url{algorithm-ford-fulkerson.pseudocode}.

.. literalinclude:: examples/algorithm-ford-fulkerson.pseudocode
   :language: algpseudocode
   :lines: 2-

The fourth example is Wikipedia's pseudocode of the *Edmonds–Karp Algorithm*
(see https://en.wikipedia.org/wiki/Edmonds%E2%80%93Karp_algorithm)
with a custom lexer which skips all ``\ENDxxx`` commands.

.. only:: builder_html

   Its source code is in
   :download:`examples/algorithm-edmonds-karp.pseudocode`.

.. raw:: latex

   Its source code can be found at \url{algorithm-edmonds-karp.pseudocode}.

.. literalinclude:: examples/algorithm-edmonds-karp.pseudocode
   :language: NoEndAlgPseudocode
   :lines: 2-

And now the *Edmonds–Karp Algorithm* with **French** keywords:

.. literalinclude:: examples/algorithm-edmonds-karp.pseudocode
   :language: algpseudocode-fr
   :lines: 2-

And again the *Edmonds–Karp Algorithm* with **German** keywords:

.. literalinclude:: examples/algorithm-edmonds-karp.pseudocode
   :language: algpseudocode-de
   :lines: 2-