view docs/details-algpseudocode.rst @ 160:b4028838e0c8

Implement lexer option "prohibit_raiseonerror_filter". Sphinx raises by default when an Error token is seen (by means of the "raiseonerror" filter that is applied by default to lexers in Sphinx). This option skips this and allows error locations to be seen and highlighted properly. While there convert most Generic.Error tokens to Error tokens because now they can be handled by a lexer with "prohibit_raiseonerror_filter=True".
author Franz Glasner <fzglas.hg@dom66.de>
date Fri, 08 May 2026 17:46:28 +0200

.. -*- coding: utf-8; indent-tabs-mode: nil; -*-


.. _details-algpseudocode:

***************
 AlgPseudocode
***************

Lexer Options
=============

  .. describe:: prohibit_raiseonerror_filter

     **Type:** `bool`

     **Default:** `False`

     If ``True``, the lexer refuses the `raiseonerror` filter that
     `Sphinx`_ applies by default through :py:meth:`Lexer.add_filter`.
     Error tokens are then highlighted at their location instead of
     raising an exception.

     This setting does not affect filters that are set by the standard
     lexer option `filters`.

  .. describe:: no_end

     **Type:** `bool`

     **Default:** `False`

     If ``True``, all ``\ENDxxx`` commands are skipped and yield
     nothing.

  .. describe:: gets

     **Type:** `str` or `None`

     **Default:** `None` (yields ``←``)

     The operator symbol to be printed by the command ``\GETS``.

     A common alternative is ``:=``.

  .. describe:: remark

     **Type:** `str` or `None`

     **Default:** `None` (yields ``▷``)

     The symbol printed at the start of comments introduced with
     ``\REMARK`` or ``\REM``.

  To use a lexer with non-default options in `Sphinx`_ see section
  :ref:`customized-sphinx-lexers`.


Comments
========

The following comment styles are supported:

- remarks with the ``\REMARK`` or ``\REM`` keywords (these print a leading
  symbol)
- multi-line comments with ``/* ... */``; they can be **nested**
- multi-line comments with ``(* ... *)``; they can be **nested**
- single-line comments with ``//`` or ``#`` (until the end of the line)

.. code-block:: algpseudocode

   /*
    * A single multiline comment
    */

   /*
    * A multiline comment
    *
    * /* This is a nested multi-line comment */
    *
    */

   (*
    * A multiline comment
    *
    * (* This is a nested multi-line comment *)
    *
    *)

   // A single-line comment

   # A single-line comment

   \REM A remark has a leading symbol
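
Nesting means that an inner closing delimiter does not end the outer comment.
The following Python function is a minimal sketch of that rule using a depth
counter. It only illustrates the idea and is not the lexer's actual
implementation; the name ``find_comment_end`` is hypothetical.

.. code-block:: python

   def find_comment_end(text, start=0, opener="/*", closer="*/"):
       """Return the index just past the nested comment opening at ``start``.

       Returns -1 if the comment is not terminated.
       """
       if not text.startswith(opener, start):
           raise ValueError("no comment opener at the given position")
       depth = 0
       pos = start
       while pos < len(text):
           if text.startswith(opener, pos):
               # Every opener increases the nesting depth.
               depth += 1
               pos += len(opener)
           elif text.startswith(closer, pos):
               # Only the closer that brings the depth back to 0 ends the comment.
               depth -= 1
               pos += len(closer)
               if depth == 0:
                   return pos
           else:
               pos += 1
       return -1

   source = "/* outer /* inner */ still outer */ x := 1"
   comment = source[:find_comment_end(source)]

The same function covers the ``(* ... *)`` style when called with
``opener="(*"`` and ``closer="*)"``.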


Literals
========

Strings and numbers follow `Python`_ syntax. The string prefixes ``r``, ``f``
and ``t`` are not supported; ``u`` and ``b`` are.

To yield single and double quotes that do not delimit a string, escape them
using ``\'`` or ``\"``. This is required to typeset something like
:algpseudocode:`f\\'(x) = 0`.

.. code-block:: algpseudocode

   0  0xdead 0b100001 0o720  2.7 2.7e-54

   "A string with an escaped double-quote \" "

   'Another string with an escaped single-quote \' '

   """A multiline
   string
   """

   '''Another multiline string

   '''

   b"A \x20 byte string"

   u'An explicit Unicode \u1234 string'

   \"  a non string

   \'  a non string also


(Mathematical) Symbols and Operators
====================================

Some ASCII symbol combinations are recognized and replaced by a
Unicode symbol:

.. code-block:: algpseudocode

   \TEXT{<=>}    <=>
   \TEXT{<->}    <->
   \TEXT{<-}     <-
   \TEXT{->}     ->
   \TEXT{=>}     =>
   \TEXT{<=}     <=
   \TEXT{>=}     >=
   \TEXT{<>}     <>
   \TEXT{!=}     !=
   \TEXT{:=}     :=
   \TEXT{=:}     =:
   \TEXT{?=}     ?=

Unicode codepoints with property ``Sm`` are recognized as mathematical symbols
and highlighted accordingly.
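
To check whether a given character has the ``Sm`` property, the general
category can be queried with the standard library module ``unicodedata``.
This is a small sketch for exploring characters; the helper name
``is_math_symbol`` is hypothetical and not part of the lexer.

.. code-block:: python

   import unicodedata

   def is_math_symbol(char):
       """Return True if ``char`` has the Unicode general category ``Sm``."""
       return unicodedata.category(char) == "Sm"

   # "+" and "∈" are in category Sm; "-" (hyphen-minus) and "a" are not.
   checks = {char: is_math_symbol(char) for char in "+∈-a"}

Note that some common ASCII operator characters, such as the hyphen-minus,
are not in category ``Sm``.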


Punctuation
===========

Runs of dots ``.``, ``..``, ``...``, ``....``, ... are handled
properly in expressions and yield a punctuation token.
They are not replaced by corresponding Unicode symbols.


Keywords
========

Explicit Keywords
-----------------

- Start with a backslash character ``\``
- Case-insensitive
- Translated if a translation is found

Parameter handling is as follows:

- Parameters are enclosed in curly braces ``{`` and ``}``
- Escaping within the braces is possible using the backslash ``\``
- Parameters are separated from the keyword/command by a (possibly empty) run
  of space or TAB characters.
  This is true for required and optional parameters.
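
The parameter rules above can be sketched with a regular expression. The
pattern and the function ``parse_command`` are illustrative assumptions, not
the lexer's real rules: keywords are taken to be ASCII letters only, and an
escaped character is returned without its backslash.

.. code-block:: python

   import re

   _COMMAND = re.compile(
       r"\\(?P<keyword>[A-Za-z]+)"          # backslash plus keyword
       r"[ \t]*"                            # possibly empty run of space/TAB
       r"\{(?P<param>(?:\\.|[^\\}])*)\}"    # braces, with backslash escapes
   )

   def parse_command(text):
       """Return ``(KEYWORD, parameter)`` or None if ``text`` does not match."""
       match = _COMMAND.match(text)
       if match is None:
           return None
       # Drop the escaping backslash from sequences like "\}".
       param = re.sub(r"\\(.)", r"\1", match.group("param"))
       # Keywords are case-insensitive, so normalize them to upper case.
       return match.group("keyword").upper(), param

   result = parse_command(r"\PROGRAM {A Program}")

Here ``result`` is ``("PROGRAM", "A Program")``; an input such as
``\NAME{a \} b}`` yields the parameter ``a } b``.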


With Required Parameters
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: algpseudocode

   \TEXT{\PROGRAM {A Program\}  or  \PROG {A Program\}}                                    \PROGRAM {A Program}
   \TEXT{\ALGORITHM{An Algorithm\}  or  \ALGO{An Algorithm\}}                              \ALGORITHM{An Algorithm}
   \TEXT{\PROCEDURE{A Procedure\}  or  \PROC{A Procedure\}}                                \PROCEDURE{A Procedure}
   \TEXT{\FUNCTION{A Function\}  or  \FUNC{A Function\}  or  \FN{A Function\}}              \FUNCTION{A Function}
   \TEXT{\CLASS{A Class\}}                                                                \CLASS{A Class}

   \TEXT{\STATEMENT{the expression\}  \STATE{the expression\}  \BLOCK{the expression\}}     \STATEMENT{the expression}

   \TEXT{expr1: \\EXPRESSION{expression a in b\}   expr2: \\EXPR{expression b in a\}}        \TEXT{expr1: \EXPRESSION{expression a in b}   expr2: \EXPR{expression b in a}}

   \TEXT{\TEXTSTATEMENT{the text\}  \TEXTSTATE{the text\}  \TSTATEMENT{the text\}  \TSTATE{the text\}  \TEXTBLOCK{the text\}  \TBLOCK{the text\}}             \TEXTSTATEMENT{the text}

   \TEXT{\INPUT{Input 1\}}                            \INPUT{Input 1}
   \TEXT{\INPUTS{Input 2\}}                           \INPUTS{Input 2}

   \TEXT{\OUTPUT{Output 1\}}                          \OUTPUT{Output 1}
   \TEXT{\OUTPUTS{Output 2\}}                         \OUTPUTS{Output 2}

   \TEXT{\ENSURE{Whatever should be ensured!\}}       \ENSURE{Whatever should be ensured!}

   \TEXT{\REQUIRE{Whatever should be required.\}}     \REQUIRE{Whatever should be required.}

   \TEXT{\RETURNS{Return 2\}}                         \RETURNS{Return 2}

   \TEXT{\CALL{a function\}(p1, p2)}                  \CALL{a function}(p1, p2)

   \TEXT{\NAME{an entity name\}}                      \NAME{an entity name}


With Optional Parameters
~~~~~~~~~~~~~~~~~~~~~~~~

Some ``END``-keywords have optional parameters:

.. code-block:: algpseudocode

   \TEXT{\ENDPROGRAM  \ENDPROG}              \ENDPROGRAM
   \TEXT{\ENDALGORITHM  \ENDALGO}            \ENDALGORITHM
   \TEXT{\ENDPROCEDURE  \ENDPROC}            \ENDPROCEDURE
   \TEXT{\ENDFUNCTION  \ENDFUNC  \ENDFN}     \ENDFUNCTION
   \TEXT{\ENDCLASS}                          \ENDCLASS

They are used like this:

.. code-block:: algpseudocode

   \TEXT{\CLASS{Foo Bar Class\} ... \END CLASS {Foo Bar Class\}}   \TEXT{yields}   \CLASS{Foo Bar Class} ... \END CLASS {Foo Bar Class}

.. seealso:: Syntax variants: `END-Keywords`_


Without Parameters
~~~~~~~~~~~~~~~~~~

"Normal" Keywords
'''''''''''''''''

.. code-block:: algpseudocode

   \TEXT{\IF}                                \IF
   \TEXT{\THEN}                              \THEN
   \TEXT{\ELSE}                              \ELSE
   \TEXT{\ELSEIF or \ELSIF  or  \ELIF}       \ELSEIF \text{or} \ELSIF \text{or} \ELIF
   \TEXT{\DO}                                \DO
   \TEXT{\WHILE}                             \WHILE
   \TEXT{\FORALL}                            \FORALL
   \TEXT{\FOR}                               \FOR
   \TEXT{\FROM}                              \FROM
   \TEXT{\TO}                                \TO
   \TEXT{\STEP}                              \STEP
   \TEXT{\IN}                                \IN
   \TEXT{\LOOP}                              \LOOP
   \TEXT{\REPEAT}                            \REPEAT
   \TEXT{\UNTIL}                             \UNTIL

   \TEXT{\RETURN}                            \RETURN

   \TEXT{\BEGIN}                             \BEGIN
   \TEXT{\END}                               \END

   \TEXT{\IS}                                \IS
   \TEXT{\WITH}                              \WITH

   \TEXT{\GETS}                              \GETS

   \TEXT{\\REMARK   or   \\REM}                \REMARK A comment with a leading symbol

``\REMARK`` or ``\REM`` is special: all characters up to the end of the
line are taken as the comment. Curly braces are not needed; if present,
they are interpreted as part of the comment.


END-Keywords
''''''''''''

The separator between ``\END`` and the keyword can be empty, a run of ASCII
spaces, a run of TAB characters, a single underscore ``_`` or a single
hyphen ``-``, like:

  ``\ENDIF``, ``\END IF``, ``\END-IF`` or ``\END_IF``


.. code-block:: algpseudocode

   \text{\ENDIF}             \ENDIF     \rem empty

   \text{\END IF}            \END IF     \rem a single space

   \text{\END  IF}           \END  IF     \rem two spaces

   \text{\END-IF}            \END-IF     \rem a single hyphen

   \text{\END_IF}            \END_IF     \rem a single underscore

   \text{\END   IF}          \END       IF     \rem a single TAB character
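
The separator rule can be expressed as one regular expression. The sketch
below is an assumption about how such a rule could look, not the lexer's own
pattern. It reads "a run of spaces" and "a run of TABs" as alternatives that
cannot be mixed, and the name ``end_keyword`` is hypothetical.

.. code-block:: python

   import re

   _END = re.compile(
       r"\\END"
       r"(?: +|\t+|[_-])?"    # empty, spaces, TABs, one "_" or one "-"
       r"(?P<keyword>PROGRAM|PROG|ALGORITHM|ALGO|PROCEDURE|PROC"
       r"|FUNCTION|FUNC|FN|CLASS|IF|WHILE|FORALL|FOR|LOOP)",
       re.IGNORECASE,         # explicit keywords are case-insensitive
   )

   def end_keyword(text):
       """Return the upper-cased keyword of an END command, else None."""
       match = _END.fullmatch(text)
       return match.group("keyword").upper() if match else None

   variants = [r"\ENDIF", r"\END IF", r"\END  IF", r"\END-IF", r"\END_IF",
               "\\END\tIF"]
   keywords = [end_keyword(variant) for variant in variants]

All six variants resolve to ``IF``, while an input with two hyphens such as
``\END--IF`` is rejected.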

The list of END-keywords (here always just with ``-`` as separator):

.. code-block:: algpseudocode

   \text{\END-PROGRAM  \END-PROG}              \END-PROGRAM
   \text{\END-ALGORITHM  \END-ALGO}            \END-ALGORITHM
   \text{\END-PROCEDURE  \END-PROC}            \END-PROCEDURE
   \text{\END-FUNCTION  \END-FUNC  \END-FN}    \END-FUNCTION
   \text{\END-CLASS}                           \END-CLASS
   \text{\END-IF}                              \END-IF
   \text{\END-WHILE}                           \END-WHILE
   \text{\END-FOR}                             \END-FOR
   \text{\END-FORALL}                          \END-FORALL
   \text{\END-LOOP}                            \END-LOOP


Names and Entities
==================

In an expression context all other words are interpreted as entity
names (token type :py:class:`pygments.token.Token.Name.Entity`).

Allowed characters in the words follow the corresponding `Python`_ rules.
As such, many Unicode characters are allowed.

To highlight entity names that contain whitespace or other "special"
characters, use the ``\NAME`` command.

.. code-block:: algpseudocode

   \TEXT{entity_name_1}          entity_name_1

   \TEXT{entity_name_2}          entity_name_2

   \TEXT{\NAME{entity-name 3\}}   \NAME{entity-name 3}

   \TEXT{München}                München

   \TEXT{Genève}                 Genève

.. _explicit-token-types:

Explicit Token Types
====================

Explicit token types handle keywords and operators that are not recognized
by default, or change the default handling of some expressions.

`XX` represents a `value` in the :py:data:`pygments.token.STANDARD_TYPES`
dict.
Its corresponding token type (the associated `key` in this `dict`) is
used as token type.

``\\tt-XX/SINGLE-CHAR``

  no escaping needed

  `SINGLE-CHAR` is a single character and can be *any* character
  (including a carriage-return or line-feed)

``\\ttx-XX{CHARACTERS}``

``\\ttx-XX(CHARACTERS)``

``\\ttx-XX[CHARACTERS]``

``\\ttx-XX<CHARACTERS>``

``\\ttx-XX<SEP>CHARACTERS<SEP>``

  No escaping possible! There are enough alternatives available!

  `SEP` is one of ``/:|=*+!\$~``.


Examples:

.. code-block:: algpseudocode

   \text{• \\tt-kc/C}      \tt-kc/C            \rem C as Keyword.Constant
   \text{• \\tt-ow/∈}      \tt-ow/∈            \rem ∈ as Operator.Word
   \text{• \\ttx-kc{A New Constant Keyword\}}    \ttx-kc{A New Constant Keyword}  \rem As a new Keyword.Constant
   \text{• \\ttx-nv{A New Variable Name\}}       \ttx-nv{A New Variable Name}     \rem An explicit Name.Variable
   \text{• \\ttx-k(∈ ∌)}   \ttx-k(∈ ∌)         \rem ∈ and ∌ as (ordinary) Keywords
   \text{• \\ttx-o<∈ ∌>}   \ttx-o<∈ ∌>         \rem ∈ and ∌ as (ordinary) Operators
     /*
      * The line below has ∈_∌ as (peculiar) function name.
      * Its parameters are handled automatically (i.e. as a normal expression).
      */
   \text{• \\ttx-nf<∈_∌>(p1, p2)}                \ttx-nf<∈_∌>(p1, p2)
     /*
      * The line below has ∈_∌ as (peculiar) decorator name (as used in Python).
      * Its parameters are handled automatically (i.e. as a normal expression).
      */
   \text{• \\ttx-nd[∈_∌](p1, p2)}                \ttx-nd[∈_∌](p1, p2)
     /*
      * This is a non-existing token type: you get some generic error marking
      * with a Generic.Error token and no expansion.
      */
   \text{• \\ttx-NON_EXISTING[∈_∌](p1, p2)}      \ttx-NON_EXISTING[∈_∌](p1, p2)

.. note:: Explicit token types are **case-sensitive**.
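
The following sketch shows how the two command forms could be split into a
token type and its characters. It is an illustration under stated
assumptions, not the lexer's implementation: ``SHORT_NAMES`` is a hand-copied
excerpt that mirrors a few entries of
:py:data:`pygments.token.STANDARD_TYPES`, the short name is assumed to
consist of ASCII letters, and ``parse_explicit`` is a hypothetical name.

.. code-block:: python

   import re

   # Excerpt only: short name -> token type, as used in the examples above.
   SHORT_NAMES = {
       "k": "Keyword", "kc": "Keyword.Constant",
       "o": "Operator", "ow": "Operator.Word",
       "nv": "Name.Variable", "nf": "Name.Function", "nd": "Name.Decorator",
   }
   CLOSERS = {"{": "}", "(": ")", "[": "]", "<": ">"}
   SEPARATORS = set("/:|=*+!\\$~")

   def parse_explicit(text):
       """Return ``(token_type, characters)`` or None if nothing matches."""
       match = re.match(r"\\(ttx?)-([A-Za-z]+)", text)
       if match is None:
           return None
       kind, short = match.groups()
       rest = text[match.end():]
       if short not in SHORT_NAMES or not rest:
           return None          # lookup is case-sensitive
       if kind == "tt":
           # \tt-XX/SINGLE-CHAR: exactly one character after the slash
           if rest[0] != "/" or len(rest) < 2:
               return None
           chars = rest[1]
       else:
           # \ttx-XX with a bracket pair or a repeated separator character
           closer = CLOSERS.get(rest[0], rest[0] if rest[0] in SEPARATORS else None)
           end = rest.find(closer, 1) if closer else -1
           if end < 0:
               return None
           chars = rest[1:end]
       return SHORT_NAMES[short], chars

   constant = parse_explicit(r"\tt-kc/C")

Here ``constant`` is ``("Keyword.Constant", "C")``. Because the lookup is
case-sensitive, ``\tt-KC/C`` is not recognized by this sketch.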


.. _customized-sphinx-lexers:

Customized Lexers in Sphinx
===========================

Defining lexers with non-default options in `Sphinx`_ can be done in its
configuration file :file:`conf.py`.

The first option is to set the Sphinx config value ``highlight_options``,
which passes options to an existing lexer.
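
A minimal sketch of such a setting in :file:`conf.py` could look as follows.
It assumes that the lexer is registered under the name ``algpseudocode`` (the
name used in the ``code-block`` directives of this document) and that the
installed Sphinx version accepts the per-language dictionary form of
``highlight_options``; the option values are only examples.

.. code-block:: python

   # conf.py (excerpt)
   # Maps a lexer name to the options passed to that lexer.
   highlight_options = {
       "algpseudocode": {
           "gets": ":=",      # print ":=" for \GETS
           "no_end": True,    # skip all \ENDxxx commands
       },
   }

With this approach the options apply to every block highlighted with that
lexer name.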

A more flexible alternative is to define a new lexer in the Sphinx
application. The very same lexer class can be used with different options:

.. code-block:: python

   from functools import partial
   from pygments_lexer_pseudocode2.algpseudocode import AlgPseudocodeLexer

   def setup(app):

       #
       # Add a custom lexer: AlgPseudocodeLexer with custom init
       # option "no_end".
       #
       # In modern Sphinx versions the given lexer must be callable and
       # may not be a lexer instance. So use an indirection with
       # "partial" here.
       #
       app.add_lexer("noend-algpseudocode",
                     partial(AlgPseudocodeLexer, no_end=True))

Custom styles and filters work similarly.

.. note:: Lexers in Sphinx are instantiated with the `raiseonerror` filter
          applied by default.
          This is also true for custom lexers that are added by
          :py:meth:`Sphinx.add_lexer`.

          Lexer *instances* that are added directly to
          :py:data:`sphinx.highlighting.lexers` are taken as is by
          Sphinx and are not augmented with any default filters.

For older Sphinx versions your mileage may vary.