comparison docs/lexer-algpseudocode.rst @ 168:bff8b900713a

REFACTOR: All documentation pages refactored: merge intro and details for lexers and filters
author Franz Glasner <fzglas.hg@dom66.de>
date Mon, 11 May 2026 01:31:12 +0200
parents docs/details-algpseudocode.rst@88f872c50aae
children 3c517c22df9c
167:ddefcc20367c 168:bff8b900713a
1 .. -*- coding: utf-8; indent-tabs-mode: nil; -*-
2
3
4 *************************************
5 AlgPseudocode and Language Variants
6 *************************************
7
8 These lexers are heavily inspired by CTAN’s `Algpseudocodex`_.
9 They recognize expressions and additionally all sorts of comments and
10 commands that are inspired by `Algpseudocodex`_.
11
12 They may be used in `Sphinx`_ by their aliases:
13
14 .. code-block:: none
15
16 .. code-block:: algpseudocode
17
18 \PROGRAM {The Pseudoprogram} \IS
19
20 \END PROGRAM {The Pseudoprogram}
21
22 It will be rendered as:
23
24 .. code-block:: algpseudocode
25
26 \PROGRAM {The Pseudoprogram} \IS
27
28 \END PROGRAM {The Pseudoprogram}
29
30 And the same with the German variant
31 (using ``.. code-block:: algpseudocode-de`` as language alias):
32
33 .. code-block:: algpseudocode-de
34
35 \PROGRAM {The Pseudoprogram} \IS
36
37 \END PROGRAM {The Pseudoprogram}
38
39 The AlgPseudocode lexer and its language variants AlgPseudocodeDE and
40 AlgPseudocodeFR basically work in three states: `default`,
41 `expression` and `text`.
42
43 In expressions it automatically recognizes:
44
45 - Strings (single-quote, double-quote, triple-single-quote,
46 triple-double-quote, `Python`_ style)
47 - Numbers (also `Python`_ style)
48 - (Mathematical) operators and symbols
49 - ``\TEXT{...}``
50
51 To switch to a text-mode that suppresses automatic expression
52 highlighting.
53
54 A closing curly brace can be quoted with ``\}`` to not end the
55 text mode prematurely.
56
57 - ``\NAME``, ``\CALL`` and ``\GETS``
58
59 - ``\REM`` and ``\REMARK`` for remarks (aka comments)
60
61 - Names (`Name.Entity`)
62
63 - :ref:`explicit-token-types`
64
65 In the default-mode it recognizes expressions and additionally all
66 sorts of comments and commands that look somewhat like `Algpseudocodex`_
67 commands.
68
69 In texts it recognizes:
70
71 - ``\EXPR`` or ``\EXPRESSION``
72
73 To switch to expression-mode.
74
75 A closing curly brace can be quoted with ``\}`` to not end the expression
76 mode prematurely.
77
78 - ``\REM`` and ``\REMARK`` for remarks (aka comments)
79
80 - :ref:`explicit-token-types`
81
82
83 .. rubric:: Some Examples
84
85 A synthetic example with many features:
86
87 .. literalinclude:: examples/example-1.pseudocode
88 :language: algpseudocode
89 :lines: 2-
90
91 With a customized `AlgPseudocodeLexer` and its `no_end`
92 option set to ``True``.
93
94 .. literalinclude:: examples/example-1.pseudocode
95 :language: NoEndAlgPseudocode
96 :lines: 2-
97
98 This is Wikipedia's description of *Dinic's Algorithm*
99 (see https://en.wikipedia.org/wiki/Dinic%27s_algorithm):
100
101 .. literalinclude:: examples/algorithm-dinic.description
102 :language: algpseudocode
103 :lines: 2-
104
105 This is Wikipedia's pseudocode of the *Ford–Fulkerson Algorithm*
106 (see https://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm):
107
108 .. literalinclude:: examples/algorithm-ford-fulkerson.pseudocode
109 :language: algpseudocode
110 :lines: 2-
111
112 This is Wikipedia's pseudocode of the *Edmonds–Karp Algorithm*
113 (see https://en.wikipedia.org/wiki/Edmonds%E2%80%93Karp_algorithm)
114 with a custom lexer that skips all ``ENDxxx`` keywords:
115
116 .. literalinclude:: examples/algorithm-edmonds-karp.pseudocode
117 :language: NoEndAlgPseudocode
118 :lines: 2-
119
120 And now the *Edmonds–Karp Algorithm* with French keywords:
121
122 .. literalinclude:: examples/algorithm-edmonds-karp.pseudocode
123 :language: algpseudocode-fr
124 :lines: 2-
125
126 And again the *Edmonds–Karp Algorithm* with German keywords:
127
128 .. literalinclude:: examples/algorithm-edmonds-karp.pseudocode
129 :language: algpseudocode-de
130 :lines: 2-
131
132 You will find more details :ref:`here <details-algpseudocode>`.
133
134
135 .. _details-algpseudocode:
136
137 Lexer Options
138 =============
139
140 .. describe:: prohibit_raiseonerror_filter
141
142 **Type:** :py:class:`bool`
143
144 **Default:** `False`
145
146 If ``True`` the `raiseonerror` filter is not allowed to be applied by
147 `Sphinx`_ when :py:meth:`Lexer.add_filter` is called.
148
149 This setting does not apply to filters that are set by the standard
150 lexer option `filters`.
151
152 .. describe:: no_end
153
154 **Type:** :py:class:`bool`
155
156 **Default:** `False`
157
158 If ``True`` all the ``\ENDxxx`` commands will be skipped and yield
159 nothing.
160
161 .. describe:: gets
162
163 **Type:** :py:class:`str` or :py:obj:`None`
164
165 **Default:** `None` (yields ``←``)
166
167 The operator symbol to be printed by the command ``\GETS``.
168
169 An often used alternative is ``:=``.
170
171 .. describe:: remark
172
173 **Type:** :py:class:`str` or :py:obj:`None`
174
175 **Default:** `None` (yields ``▷``)
176
177 The symbol to be printed when starting comments with
178 ``\REMARK`` or ``\REM``.
179
180 To use a lexer with non-default options in `Sphinx`_ see section
181 :ref:`customized-sphinx-lexers`.
182
183
184 Comments
185 ========
186
187 - with the ``\REMARK`` or ``\REM`` keywords (this includes a leading symbol)
188 - multi-line comments with ``/* ... */``; they can be **nested**
189 - multi-line comments with ``(* ... *)``; they can be **nested**
190 - single-line comments with ``//`` or ``#`` (until the end of the line)
191
192 .. code-block:: algpseudocode
193
194 /*
195 * A single multiline comment
196 */
197
198 /*
199 * A multiline comment
200 *
201 * /* This is a nested multi-line comment */
202 *
203 */
204
205 (*
206 * A multiline comment
207 *
208 * (* This is a nested multi-line comment *)
209 *
210 *)
211
212 // A single-line comment
213
214 # A single-line comment
215
216 \REM A remark has a leading symbol
217
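The nesting rule for block comments can be illustrated with a depth counter.
The following is only a minimal sketch of the idea, not the lexer's actual
implementation; the helper name ``skip_nested_comment`` is hypothetical.

```python
def skip_nested_comment(text, start, opener="/*", closer="*/"):
    """Return the index just past the comment that opens at ``start``.

    Raises ValueError if there is no opener at ``start`` or if the
    comment is never closed.
    """
    if not text.startswith(opener, start):
        raise ValueError("no comment opener at this position")
    depth = 0
    pos = start
    while pos < len(text):
        if text.startswith(opener, pos):
            depth += 1              # entering one more nesting level
            pos += len(opener)
        elif text.startswith(closer, pos):
            depth -= 1              # leaving one nesting level
            pos += len(closer)
            if depth == 0:
                return pos          # the outermost comment is closed
        else:
            pos += 1
    raise ValueError("unterminated comment")


source = "/* outer /* inner */ still outer */ x \\GETS 1"
end = skip_nested_comment(source, 0)
print(repr(source[end:]))
```

The same helper handles the ``(* ... *)`` form by passing ``"(*"`` and
``"*)"`` as opener and closer.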
218
219 Literals
220 ========
221
222 Strings and numbers as in `Python`_. String prefixes ``r``, ``f`` and ``t``
223 are not supported -- ``u`` and ``b`` are.
224
225 To yield non-string-delimiting single and double quotes you have to escape them
226 using ``\'`` or ``\"``. This is needed to typeset something such as
227 :algpseudocode:`f\\'(x) = 0`.
228
229 .. code-block:: algpseudocode
230
231 0 0xdead 0b100001 0o720 2.7 2.7e-54
232
233 "A string with an escaped double-quote \" "
234
235 'Another string with an escaped single-quote \' '
236
237 """A multiline
238 string
239 """
240
241 '''Another multiline string
242
243 '''
244
245 b"A \x20 byte string"
246
247 u'An explicit Unicode \u1234 string'
248
249 \" a non string
250
251 \' a non string also
252
253
254 (Mathematical) Symbols and Operators
255 ====================================
256
257 Some ASCII symbol combinations are recognized and replaced by a
258 Unicode symbol:
259
260 .. code-block:: algpseudocode
261
262 \TEXT{<=>} <=>
263 \TEXT{<->} <->
264 \TEXT{<-} <-
265 \TEXT{->} ->
266 \TEXT{=>} =>
267 \TEXT{<=} <=
268 \TEXT{>=} >=
269 \TEXT{<>} <>
270 \TEXT{!=} !=
271 \TEXT{:=} :=
272 \TEXT{=:} =:
273 \TEXT{?=} ?=
274
275 Unicode codepoints with the general category ``Sm`` are recognized as mathematical symbols
276 and highlighted accordingly.
277
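Whether a character belongs to the ``Sm`` category can be checked with
Python's standard :py:mod:`unicodedata` module. This is a small sketch for
experimenting with individual characters; the helper name ``is_math_symbol``
is hypothetical, and how the lexer performs this test internally is not shown
here.

```python
import unicodedata


def is_math_symbol(char):
    """Return True if ``char`` has the Unicode general category ``Sm``."""
    return unicodedata.category(char) == "Sm"


# Print the category of a few sample characters.
for char in "∈≤+a-":
    print(char, unicodedata.category(char), is_math_symbol(char))
```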
278
279 Punctuation
280 ===========
281
282 Runs of dots ``.``, ``..``, ``...``, ``....``, ... are handled
283 properly in expressions and yield a punctuation token.
284 They are not replaced by corresponding Unicode symbols.
285
286
287 Keywords
288 ========
289
290 Explicit Keywords
291 -----------------
292
293 - Start with a backslash character ``\``
294 - Case-insensitive
295 - Translated if a translation is found
296
297 Parameter handling is as follows:
298
299 - Parameters are enclosed in curly braces ``{`` and ``}``
300 - Escaping within the braces is possible using the backslash ``\``
301 - Parameters are separated from the keyword/command by a (possibly empty) run
302 of space or TAB characters.
303 This is true for required and optional parameters.
304
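The parameter rules above can be approximated with a regular expression.
This is a hedged sketch only: ``COMMAND_RE`` and ``parse_command`` are
hypothetical names, the keyword is assumed to consist of ASCII letters, and
removing the escaping backslashes from the result is an assumption of this
sketch rather than documented lexer behavior.

```python
import re

COMMAND_RE = re.compile(
    r"""
    \\(?P<name>[A-Za-z]+)            # backslash plus keyword name
    [ \t]*                           # possibly empty run of spaces or TABs
    \{(?P<param>(?:\\.|[^\\}])*)\}   # braces with backslash escapes inside
    """,
    re.VERBOSE | re.DOTALL,
)


def parse_command(text):
    """Return ``(NAME, parameter)`` or None if ``text`` does not match."""
    match = COMMAND_RE.match(text)
    if match is None:
        return None
    # Keywords are case-insensitive, so normalize the name.
    name = match.group("name").upper()
    # Assumption: drop the escaping backslashes from the parameter text.
    param = re.sub(r"\\(.)", r"\1", match.group("param"), flags=re.DOTALL)
    return name, param


print(parse_command(r"\PROGRAM {The Pseudoprogram}"))
print(parse_command("\\proc\t{A \\} brace}"))
```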
305 .. todo:: Escaping
306
307 A single backslash is a Generic.Error token
308
309
310 With Required Parameters
311 ~~~~~~~~~~~~~~~~~~~~~~~~
312
313 .. code-block:: algpseudocode
314
315 \TEXT{\PROGRAM {A Program\} or \PROG {A Program\}} \PROGRAM {A Program}
316 \TEXT{\ALGORITHM{An Algorithm\} or \ALGO{An Algorithm\}} \ALGORITHM{An Algorithm}
317 \TEXT{\PROCEDURE{A Procedure\} or \PROC{A Procedure\}} \PROCEDURE{A Procedure}
318 \TEXT{\FUNCTION{A Function\} or \FUNC{A Function\} or \FN{A Function\}} \FUNCTION{A Function}
319 \TEXT{\CLASS{A Class\}} \CLASS{A Class}
320
321 \TEXT{\STATEMENT{the expression\} \STATE{the expression\} \BLOCK{the expression\}} \STATEMENT{the expression}
322
323 \TEXT{expr1: \\EXPRESSION{expression a in b\} expr2: \\EXPR{expression b in a\}} \TEXT{expr1: \EXPRESSION{expression a in b} expr2: \EXPR{expression b in a}}
324
325 \TEXT{\TEXTSTATEMENT{the text\} \TEXTSTATE{the text\} \TSTATEMENT{the text\} \TSTATE{the text\} \TEXTBLOCK{the text\} \TBLOCK{the text\}} \TEXTSTATEMENT{the text}
326
327 \TEXT{\INPUT{Input 1\}} \INPUT{Input 1}
328 \TEXT{\INPUTS{Input 2\}} \INPUTS{Input 2}
329
330 \TEXT{\OUTPUT{Output 1\}} \OUTPUT{Output 1}
331 \TEXT{\OUTPUTS{Output 2\}} \OUTPUTS{Output 2}
332
333 \TEXT{\ENSURE{Whatever should be ensured!\}} \ENSURE{Whatever should be ensured!}
334
335 \TEXT{\REQUIRE{Whatever should be required.\}} \REQUIRE{Whatever should be required.}
336
337 \TEXT{\RETURNS{Return 2\}} \RETURNS{Return 2}
338
339 \TEXT{\CALL{a function\}(p1, p2)} \CALL{a function}(p1, p2)
340
341 \TEXT{\NAME{an entity name\}} \NAME{an entity name}
342
343
344 With Optional Parameters
345 ~~~~~~~~~~~~~~~~~~~~~~~~
346
347 Some ``END``-keywords have optional parameters:
348
349 .. code-block:: algpseudocode
350
351 \TEXT{\ENDPROGRAM \ENDPROG} \ENDPROGRAM
352 \TEXT{\ENDALGORITHM \ENDALGO} \ENDALGORITHM
353 \TEXT{\ENDPROCEDURE \ENDPROC} \ENDPROCEDURE
354 \TEXT{\ENDFUNCTION \ENDFUNC \ENDFN} \ENDFUNCTION
355 \TEXT{\ENDCLASS} \ENDCLASS
356
357 They are used like this:
358
359 .. code-block:: algpseudocode
360
361 \TEXT{\CLASS{Foo Bar Class\} ... \END CLASS {Foo Bar Class\}} \TEXT{yields} \CLASS{Foo Bar Class} ... \END CLASS {Foo Bar Class}
362
363 .. seealso:: Syntax variants: `END-Keywords`_
364
365
366 Without Parameters
367 ~~~~~~~~~~~~~~~~~~
368
369 "Normal" Keywords
370 '''''''''''''''''
371
372 .. code-block:: algpseudocode
373
374 \TEXT{\IF} \IF
375 \TEXT{\THEN} \THEN
376 \TEXT{\ELSE} \ELSE
377 \TEXT{\ELSEIF or \ELSIF or \ELIF} \ELSEIF \text{or} \ELSIF \text{or} \ELIF
378 \TEXT{\DO} \DO
379 \TEXT{\WHILE} \WHILE
380 \TEXT{\FORALL} \FORALL
381 \TEXT{\FOR} \FOR
382 \TEXT{\FROM} \FROM
383 \TEXT{\TO} \TO
384 \TEXT{\STEP} \STEP
385 \TEXT{\IN} \IN
386 \TEXT{\LOOP} \LOOP
387 \TEXT{\REPEAT} \REPEAT
388 \TEXT{\UNTIL} \UNTIL
389
390 \TEXT{\RETURN} \RETURN
391
392 \TEXT{\BEGIN} \BEGIN
393 \TEXT{\END} \END
394
395 \TEXT{\IS} \IS
396 \TEXT{\WITH} \WITH
397
398 \TEXT{\GETS} \GETS
399
400 \TEXT{\\REMARK or \\REM} \REMARK A comment with a leading symbol
401
402 ``\REMARK`` or ``\REM`` is special: all characters to the end of the
403 line are taken as comment; curly braces are not needed. In fact,
404 they are interpreted as part of the comment.
405
406
407 END-Keywords
408 ''''''''''''
409
410 The separator can be empty, a run of ASCII spaces, a run of TAB characters,
411 a single underscore ``_`` or a single hyphen ``-``, as in:
412
413 ``\ENDIF``, ``\END IF``, ``\END-IF``, ``\END_IF`` or ``\END IF``
414
415
416 .. code-block:: algpseudocode
417
418 \text{\ENDIF} \ENDIF \rem empty
419
420 \text{\END IF} \END IF \rem a single space
421
422 \text{\END IF} \END IF \rem two spaces
423
424 \text{\END-IF} \END-IF \rem a single hyphen
425
426 \text{\END_IF} \END_IF \rem a single underscore
427
428 \text{\END IF} \END IF \rem a single TAB character
429
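The separator rule can be written down as a regular expression. The sketch
below reads the rule literally (a run of spaces *or* a run of TABs, not a
mix of both) and uses the END-keywords listed in this section; ``END_RE``
and ``end_block`` are hypothetical names and not part of the lexer's API.

```python
import re

# Block names that may follow \END, taken from the keyword lists in this section.
END_BLOCKS = [
    "PROGRAM", "PROG", "ALGORITHM", "ALGO", "PROCEDURE", "PROC",
    "FUNCTION", "FUNC", "FN", "CLASS", "IF", "WHILE", "FORALL", "FOR", "LOOP",
]

# Separator: empty, a run of spaces, a run of TABs, one "_" or one "-".
END_RE = re.compile(
    r"\\END(?: +|\t+|[_-])?(?P<block>" + "|".join(END_BLOCKS) + r")",
    re.IGNORECASE,
)


def end_block(text):
    """Return the upper-cased block name of an END-keyword, or None."""
    match = END_RE.fullmatch(text)
    return match.group("block").upper() if match else None


for sample in ["\\ENDIF", "\\END IF", "\\END\tIF", "\\END-IF", "\\END_IF"]:
    print(repr(sample), end_block(sample))
```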
430 The list of END-keywords (here always just with ``-`` as separator):
431
432 .. code-block:: algpseudocode
433
434 \text{\END-PROGRAM \END-PROG} \END-PROGRAM
435 \text{\END-ALGORITHM \END-ALGO} \END-ALGORITHM
436 \text{\END-PROCEDURE \END-PROC} \END-PROCEDURE
437 \text{\END-FUNCTION \END-FUNC \END-FN} \END-FUNCTION
438 \text{\END-CLASS} \END-CLASS
439 \text{\END-IF} \END-IF
440 \text{\END-WHILE} \END-WHILE
441 \text{\END-FOR} \END-FOR
442 \text{\END-FORALL} \END-FORALL
443 \text{\END-LOOP} \END-LOOP
444
445
446 Names and Entities
447 ==================
448
449 In an expression context all other words are interpreted as entity
450 names (token type :py:class:`pygments.token.Token.Name.Entity`).
451
452 Allowed characters in the words follow the corresponding `Python`_ rules.
453 As such, many Unicode characters are allowed.
454
455 To highlight entity names that contain whitespace or other "special" characters,
456 use the ``\NAME`` command.
457
458 .. code-block:: algpseudocode
459
460 \TEXT{entity_name_1} entity_name_1
461
462 \TEXT{entity_name_2} entity_name_2
463
464 \TEXT{\NAME{entity-name 3\}} \NAME{entity-name 3}
465
466 \TEXT{München} München
467
468 \TEXT{Genève} Genève
469
470 .. _explicit-token-types:
471
472 Explicit Token Types
473 ====================
474
475 Explicit token types handle keywords and operators that are not handled by default, or change
476 the default handling of some expressions.
477
478 `XX` represents a `value` in the :py:data:`pygments.token.STANDARD_TYPES`
479 dict.
480 Its corresponding token type (the associated `key` in this `dict`) is
481 used as token type.
482
483 ``\\tt-XX/SINGLE-CHAR``
484
485 no escaping needed
486
487 `SINGLE-CHAR` is a single character and can be *every* character
488 (including a carriage-return or line-feed)
489
490 ``\\ttx-XX{CHARACTERS}``
491
492 ``\\ttx-XX(CHARACTERS)``
493
494 ``\\ttx-XX[CHARACTERS]``
495
496 ``\\ttx-XX<CHARACTERS>``
497
498 ``\\ttx-XX<SEP>CHARACTERS<SEP>``
499
500 No escaping possible! There are enough alternatives available!
501
502 `SEP` is one of ``/:|=*+!\$~``.
503
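The forms listed above can be told apart with a few lines of Python. This is
a sketch under stated assumptions, not the lexer's code: ``parse_explicit``
is a hypothetical helper, `XX` is assumed to consist of ASCII letters and
digits, and the backslash in the `SEP` list is assumed to be a separator of
its own. Mapping `XX` to a token type via
:py:data:`pygments.token.STANDARD_TYPES` is left out to keep the sketch free
of dependencies.

```python
import re

# Bracket pairs accepted by the \ttx- form.
CLOSERS = {"{": "}", "(": ")", "[": "]", "<": ">"}
# Assumption: the backslash in the documented SEP list is itself a separator.
SEPARATORS = "/:|=*+!\\$~"


def parse_explicit(text):
    """Return ``(short_name, content, rest)`` or None if nothing matches."""
    # \tt-XX/SINGLE-CHAR: exactly one character of any kind after the slash.
    match = re.match(r"\\tt-([A-Za-z0-9]+)/(.)", text, re.DOTALL)
    if match:
        return match.group(1), match.group(2), text[match.end():]
    # \ttx-XX followed by an opening bracket or a separator character.
    match = re.match(r"\\ttx-([A-Za-z0-9]+)(.)", text, re.DOTALL)
    if match is None:
        return None
    opener = match.group(2)
    if opener in CLOSERS:
        closer = CLOSERS[opener]
    elif opener in SEPARATORS:
        closer = opener
    else:
        return None
    # No escaping is possible: the first closing delimiter ends the content.
    end = text.find(closer, match.end())
    if end < 0:
        return None
    return match.group(1), text[match.end():end], text[end + 1:]


print(parse_explicit(r"\tt-ow/∈"))
print(parse_explicit(r"\ttx-nf<∈_∌>(p1, p2)"))
```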
504
505 Examples:
506
507 .. code-block:: algpseudocode
508
509 \text{• \\tt-kc/C} \tt-kc/C \rem C as Keyword.Constant
510 \text{• \\tt-ow/∈} \tt-ow/∈ \rem ∈ as Operator.Word
511 \text{• \\ttx-kc{A New Constant Keyword\}} \ttx-kc{A New Constant Keyword} \rem As a new Keyword.Constant
512 \text{• \\ttx-nv{A New Variable Name\}} \ttx-nv{A New Variable Name} \rem An explicit Name.Variable
513 \text{• \\ttx-k(∈ ∌)} \ttx-k(∈ ∌) \rem ∈ and ∌ as (ordinary) Keywords
514 \text{• \\ttx-o<∈ ∌>} \ttx-o<∈ ∌> \rem ∈ and ∌ as (ordinary) Operators
515 /*
516 * The line below has ∈_∌ as (peculiar) function name.
517 * Its params are handled automatically (i.e. as a normal expression).
518 */
519 \text{• \\ttx-nf<∈_∌>(p1, p2)} \ttx-nf<∈_∌>(p1, p2)
520 /*
521 * The line below has ∈_∌ as (peculiar) decorator name (as used in Python).
522 * Its params are handled automatically (i.e. as a normal expression).
523 */
524 \text{• \\ttx-nd[∈_∌](p1, p2)} \ttx-nd[∈_∌](p1, p2)
525 /*
526 * This is a non-existing token type: you get some generic error marking
527 * with a Generic.Error token and no expansion.
528 */
529 \text{• \\ttx-NON-EXISTING[∈_∌](p1, p2)} \ttx-NON_EXISTING[∈_∌](p1, p2)
530
531 .. note:: Explicit token types are **case-sensitive**.
532
533
534 .. _customized-sphinx-lexers:
535
536 Customized Lexers in Sphinx
537 ===========================
538
539 Defining lexers with non-default options in `Sphinx`_ can be done in its
540 configuration file :file:`conf.py`.
541
542 The first option is to set the Sphinx config value ``highlight_options``,
543 which customizes an existing lexer by passing options to it.
544
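A minimal :file:`conf.py` fragment for the ``highlight_options`` approach
might look as follows. It assumes a Sphinx version that accepts the
per-language form of ``highlight_options`` (a dict that maps language names
to lexer options) and uses the lexer options ``no_end`` and ``gets``
described in the section on lexer options; the chosen values are only
examples.

```python
# conf.py (fragment)

# Options passed to the lexer whenever a block uses the "algpseudocode" alias.
highlight_options = {
    "algpseudocode": {
        "no_end": True,   # skip all \ENDxxx commands
        "gets": ":=",     # print ":=" for \GETS instead of the default symbol
    },
}
```

Because these options are tied to the alias, every ``algpseudocode`` block
in the project is affected.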
545 A more flexible alternative is to define a new lexer in the Sphinx
546 application. The very same lexer class can be used with different options:
547
548 .. code-block:: python
549
550 from functools import partial
551 from pygments_lexer_pseudocode2.lexers.algpseudocode import AlgPseudocodeLexer
552
553 def setup(app):
554
555 #
556 # Add a custom lexer: AlgPseudocodeLexer with custom init
557 # option "no_end".
558 #
559 # In modern Sphinx versions the given lexer must be a callable and may
560 # not be a lexer instance. So use an indirection with "partial"
561 # here.
562 #
563 app.add_lexer("noend-algpseudocode",
564 partial(AlgPseudocodeLexer, no_end=True))
565
566 It works similarly for custom styles and filters.
567
568 .. note:: Lexers in Sphinx are instantiated with the `raiseonerror` filter
569 applied by default.
570 This is also true for custom lexers that are added by
571 :py:meth:`Sphinx.add_lexer`.
572
573 Lexer *instances* that are added to
574 :py:data:`sphinx.highlighting.lexers` are taken as is by
575 Sphinx and are not augmented with any default filters.
576
577 For older Sphinx versions your mileage may vary.