diff mupdf-source/thirdparty/tesseract/doc/classifier_tester.1.asc @ 2:b50eed0cc0ef upstream
ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
The directory name has changed: no version number in the expanded directory now.
| author | Franz Glasner <fzglas.hg@dom66.de> |
|---|---|
| date | Mon, 15 Sep 2025 11:43:07 +0200 |
| parents | |
| children | |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/mupdf-source/thirdparty/tesseract/doc/classifier_tester.1.asc Mon Sep 15 11:43:07 2025 +0200
@@ -0,0 +1,61 @@
+CLASSIFIER_TESTER(1)
+====================
+
+NAME
+----
+classifier_tester - for *legacy tesseract* engine.
+
+SYNOPSIS
+--------
+*classifier_tester* -U 'unicharset_file' -F 'font_properties_file' -X 'xheights_file' -classifier 'x' -lang 'lang' [-output_trainer trainer] *.tr
+
+DESCRIPTION
+-----------
+classifier_tester(1) runs Tesseract in a special mode.
+It takes a list of .tr files and tests a character classifier
+on data as formatted for training,
+but it doesn't have to be the same as the training data.
+
+IN/OUT ARGUMENTS
+----------------
+
+a list of .tr files
+
+OPTIONS
+-------
+-l 'lang'::
+ (Input) three character language code; default value 'eng'.
+
+-classifier 'x'::
+ (Input) One of "pruner", "full".
+
+
+-U 'unicharset'::
+ (Input) The unicharset for the language.
+
+-F 'font_properties_file'::
+ (Input) font properties file, each line is of the following form, where each field other than the font name is 0 or 1:
+
+ *font_name* *italic* *bold* *fixed_pitch* *serif* *fraktur*
+
+-X 'xheights_file'::
+ (Input) x heights file, each line is of the following form, where xheight is calculated as the pixel x height of a character drawn at 32pt on 300 dpi. [ That is, if base x height + ascenders + descenders = 133, how much is x height? ]
+
+ *font_name* *xheight*
+
+-output_trainer 'trainer'::
+ (Output, Optional) Filename for output trainer.
+
+SEE ALSO
+--------
+tesseract(1)
+
+COPYING
+-------
+Copyright \(C) 2012 Google, Inc.
+Licensed under the Apache License, Version 2.0
+
+AUTHOR
+------
+The Tesseract OCR engine was written by Ray Smith and his research groups
+at Hewlett Packard (1985-1995) and Google (2006-2018).
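The `-F` and `-X` options in the added man page describe two simple line-oriented file formats. The following is a minimal Python sketch of how one line of a font properties file might be read, based only on the field order given above. The names `FontProperties`, `parse_font_properties_line` and `em_height_pixels` are hypothetical and are not part of Tesseract. The sketch also assumes that the figure 133 in the `-X` description is the height in pixels of 32pt text at 300 dpi (1pt = 1/72 inch); the man page does not say so explicitly.

```python
from typing import NamedTuple


class FontProperties(NamedTuple):
    """One line of a font properties file (hypothetical helper type)."""
    font_name: str
    italic: bool
    bold: bool
    fixed_pitch: bool
    serif: bool
    fraktur: bool


def parse_font_properties_line(line: str) -> FontProperties:
    """Parse 'font_name italic bold fixed_pitch serif fraktur'.

    Splitting from the right keeps a font name intact even if it
    contains spaces; that tolerance is a choice made for this sketch.
    """
    fields = line.rsplit(None, 5)
    if len(fields) != 6:
        raise ValueError(f"expected a font name and 5 flags: {line!r}")
    name, *flags = fields
    if any(flag not in ("0", "1") for flag in flags):
        raise ValueError(f"every flag must be 0 or 1: {line!r}")
    return FontProperties(name, *(flag == "1" for flag in flags))


def em_height_pixels(point_size: float = 32.0, dpi: float = 300.0) -> float:
    """Convert a point size to pixels, using 1pt = 1/72 inch."""
    return point_size / 72.0 * dpi
```

With the defaults, `em_height_pixels()` gives about 133.33, which agrees with the 133 quoted for 32pt at 300 dpi if the assumption above holds.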
