annotate mupdf-source/thirdparty/tesseract/doc/wordlist2dawg.1.asc @ 22:d77477b4e151

Let _int_rc() also handle (i.e. ignore) a local version suffix
author Franz Glasner <fzglas.hg@dom66.de>
date Fri, 19 Sep 2025 12:05:57 +0200
parents b50eed0cc0ef
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
2
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
1 WORDLIST2DAWG(1)
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
2 ================
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
3 :doctype: manpage
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
4
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
5 NAME
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
6 ----
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
7 wordlist2dawg - convert a wordlist to a DAWG for Tesseract
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
8
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
9 SYNOPSIS
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
10 --------
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
11 *wordlist2dawg* 'WORDLIST' 'DAWG' 'lang.unicharset'
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
12
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
13 *wordlist2dawg* -t 'WORDLIST' 'DAWG' 'lang.unicharset'
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
14
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
15 *wordlist2dawg* -r 1 'WORDLIST' 'DAWG' 'lang.unicharset'
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
16
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
17 *wordlist2dawg* -r 2 'WORDLIST' 'DAWG' 'lang.unicharset'
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
18
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
19 *wordlist2dawg* -l <short> <long> 'WORDLIST' 'DAWG' 'lang.unicharset'
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
20
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
21 DESCRIPTION
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
22 -----------
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
23 wordlist2dawg(1) converts a wordlist to a Directed Acyclic Word Graph
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
24 (DAWG) for use with Tesseract. A DAWG is a compressed, space and time
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
25 efficient representation of a word list.
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
26
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
27 OPTIONS
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
28 -------
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
29 -t
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
30 Verify that a given dawg file is equivalent to a given wordlist.
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
31
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
32 -r 1
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
33 Reverse a word if it contains an RTL character.
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
34
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
35 -r 2
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
36 Reverse all words.
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
37
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
38 -l <short> <long>
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
39 Produce a file with several dawgs in it, one each for words
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
40 of length <short>, <short+1>,... <long>
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
41
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
42 ARGUMENTS
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
43 ---------
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
44
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
45 'WORDLIST'
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
46 A plain text file in UTF-8, one word per line.
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
47
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
48 'DAWG'
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
49 The output DAWG to write.
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
50
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
51 'lang.unicharset'
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
52 The unicharset of the language. This is the unicharset
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
53 generated by mftraining(1).
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
54
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
55 SEE ALSO
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
56 --------
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
57 tesseract(1), combine_tessdata(1), dawg2wordlist(1)
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
58
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
59 <https://tesseract-ocr.github.io/tessdoc/Training-Tesseract.html>
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
60
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
61 COPYING
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
62 -------
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
63 Copyright \(C) 2006 Google, Inc.
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
64 Licensed under the Apache License, Version 2.0
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
65
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
66 AUTHOR
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
67 ------
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
68 The Tesseract OCR engine was written by Ray Smith and his research groups
b50eed0cc0ef ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
Franz Glasner <fzglas.hg@dom66.de>
parents:
diff changeset
69 at Hewlett Packard (1985-1995) and Google (2006-2018).