comparison mupdf-source/thirdparty/tesseract/doc/classifier_tester.1.asc @ 2:b50eed0cc0ef upstream

ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4. The directory name has changed: no version number in the expanded directory now.
author Franz Glasner <fzglas.hg@dom66.de>
date Mon, 15 Sep 2025 11:43:07 +0200

CLASSIFIER_TESTER(1)
====================

NAME
----
classifier_tester - test a character classifier for the *legacy tesseract* engine

SYNOPSIS
--------
*classifier_tester* -U 'unicharset_file' -F 'font_properties_file' -X 'xheights_file' -classifier 'x' -lang 'lang' [-output_trainer 'trainer'] *.tr
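
The SYNOPSIS can also be assembled as an argument vector in a script. This avoids shell-quoting problems and makes the shell's `*.tr` expansion explicit. Below is a minimal Python sketch, not part of Tesseract. The helper name `build_classifier_tester_argv` and all example file names are hypothetical. The flag names are taken from the SYNOPSIS. When the list is actually run, the sketch assumes `classifier_tester` is on `PATH`.

```python
import glob


def build_classifier_tester_argv(unicharset, font_properties, xheights,
                                 classifier="full", lang="eng",
                                 tr_glob="*.tr", output_trainer=None):
    """Return the argument vector for one classifier_tester run.

    Hypothetical helper: flag names follow the SYNOPSIS, and the
    .tr files are found with glob instead of relying on the shell.
    """
    # -classifier accepts one of "pruner" or "full" (see OPTIONS).
    if classifier not in ("pruner", "full"):
        raise ValueError("classifier must be 'pruner' or 'full'")
    argv = ["classifier_tester",
            "-U", unicharset,
            "-F", font_properties,
            "-X", xheights,
            "-classifier", classifier,
            "-lang", lang]
    # -output_trainer is optional, so add it only when requested.
    if output_trainer is not None:
        argv += ["-output_trainer", output_trainer]
    # The .tr files go last, as in the SYNOPSIS; sorting gives a stable order.
    argv += sorted(glob.glob(tr_glob))
    return argv
```

To run it, pass the list to `subprocess.run(argv, check=True)`. With `check=True`, a non-zero exit status raises `subprocess.CalledProcessError`.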

DESCRIPTION
-----------
classifier_tester(1) runs Tesseract in a special mode.
It takes a list of .tr files and tests a character classifier
on data in the format used for training.
The test data need not be the same as the training data.

IN/OUT ARGUMENTS
----------------

A list of .tr files (Input): the character samples, in training-data format, on which the classifier is tested.

OPTIONS
-------
-lang 'lang'::
(Input) Three-character language code; the default value is 'eng'.

-classifier 'x'::
(Input) The classifier to test: one of "pruner" or "full".

-U 'unicharset_file'::
(Input) The unicharset for the language.

-F 'font_properties_file'::
(Input) Font properties file. Each line has the following form, where every field other than the font name is 0 or 1:

*font_name* *italic* *bold* *fixed_pitch* *serif* *fraktur*

-X 'xheights_file'::
(Input) X-heights file. Each line has the following form, where 'xheight' is the x-height, in pixels, of a character drawn at 32pt on 300 dpi. In other words, if base x-height + ascenders + descenders = 133 pixels (32 × 300 / 72 ≈ 133), 'xheight' is the part of that height taken by the x-height.

*font_name* *xheight*

-output_trainer 'trainer'::
(Output, Optional) File name for the output trainer.
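
Both input files described above are plain text with one font per line. The minimal Python sketch below is not part of Tesseract. It checks a font_properties line against the documented field layout. It also produces an xheights line by scaling an x-height measured at some other point size and resolution to the 32pt, 300 dpi reference. The helper names are hypothetical. The sketch assumes fields are separated by whitespace, so font names contain no spaces. It also assumes the xheights file expects whole pixels.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72


@dataclass
class FontProperties:
    name: str
    italic: int
    bold: int
    fixed_pitch: int
    serif: int
    fraktur: int


def parse_font_properties_line(line):
    """Parse '<font_name> <italic> <bold> <fixed_pitch> <serif> <fraktur>'."""
    fields = line.split()  # assumption: no spaces inside font names
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    name, flags = fields[0], fields[1:]
    # Every field other than the font name must be 0 or 1.
    if any(f not in ("0", "1") for f in flags):
        raise ValueError(f"flags must be 0 or 1: {line!r}")
    return FontProperties(name, *(int(f) for f in flags))


def pixels_at(point_size, dpi):
    """Height in pixels of a length of point_size points rendered at dpi."""
    return point_size * dpi / POINTS_PER_INCH


def xheight_line(font_name, measured_xheight_px, measured_point_size, measured_dpi):
    """Scale a measured x-height to 32pt at 300 dpi and format an xheights line.

    Glyph dimensions scale linearly with point size and resolution.
    Rounding to a whole pixel is an assumption about the file format.
    """
    scale = pixels_at(32, 300) / pixels_at(measured_point_size, measured_dpi)
    return f"{font_name} {round(measured_xheight_px * scale)}"
```

For example, an x-height of 30 pixels measured at 16pt and 150 dpi is scaled by a factor of 4, giving the line `Arial 120`.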

SEE ALSO
--------
tesseract(1)

COPYING
-------
Copyright \(C) 2012 Google, Inc.
Licensed under the Apache License, Version 2.0

AUTHOR
------
The Tesseract OCR engine was written by Ray Smith and his research groups
at Hewlett Packard (1985-1995) and Google (2006-2018).