comparison mupdf-source/thirdparty/tesseract/src/textord/cjkpitch.h @ 2:b50eed0cc0ef upstream
ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
The directory name has changed: no version number in the expanded directory now.
| author | Franz Glasner <fzglas.hg@dom66.de> |
|---|---|
| date | Mon, 15 Sep 2025 11:43:07 +0200 |
| parents | |
| children | |
| 1:1d09e1dec1d9 | 2:b50eed0cc0ef |
|---|---|
| 1 /////////////////////////////////////////////////////////////////////// | |
| 2 // File: cjkpitch.h | |
| 3 // Description: Code to determine fixed pitchness and the pitch if fixed, | |
| 4 // for CJK text. | |
| 5 // Copyright 2011 Google Inc. All Rights Reserved. | |
| 6 // Author: takenaka@google.com (Hiroshi Takenaka) | |
| 7 // Created: Mon Jun 27 12:48:35 JST 2011 | |
| 8 // | |
| 9 // Licensed under the Apache License, Version 2.0 (the "License"); | |
| 10 // you may not use this file except in compliance with the License. | |
| 11 // You may obtain a copy of the License at | |
| 12 // http://www.apache.org/licenses/LICENSE-2.0 | |
| 13 // Unless required by applicable law or agreed to in writing, software | |
| 14 // distributed under the License is distributed on an "AS IS" BASIS, | |
| 15 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |
| 16 // See the License for the specific language governing permissions and | |
| 17 // limitations under the License. | |
| 18 // | |
| 19 /////////////////////////////////////////////////////////////////////// | |
| 20 #ifndef CJKPITCH_H_ | |
| 21 #define CJKPITCH_H_ | |
| 22 | |
| 23 #include "blobbox.h" | |
| 24 | |
| 25 namespace tesseract { | |
| 26 | |
| 27 // Function to test the "fixed-pitchness" of the input text and to estimate | |
| 28 // character pitch parameters for it, based on a CJK fixed-pitch layout | |
| 29 // model. | |
| 30 // | |
| 31 // This function assumes that a fixed-pitch CJK text has the following | |
| 32 // characteristics: | |
| 33 // | |
| 34 // - Most glyphs are designed to fit within the same sized square | |
| 35 // (imaginary body). Also they are aligned to the center of their | |
| 36 // imaginary bodies. | |
| 37 // - The imaginary body is always a regular rectangle. | |
| 38 // - There may be some extra space between character bodies | |
| 39 // (tracking). | |
| 40 // - There may be some extra space after punctuation. | |
| 41 // - The text is *not* space-delimited. Thus spaces are rare. | |
| 42 // - A character may consist of multiple unconnected blobs. | |
| 43 // | |
| 44 // The function works in two passes. In pass 1, it looks for "good" | |
| 45 // blobs that have the same pitch on both sides and look like | |
| 46 // complete CJK characters. It then estimates the character pitch | |
| 47 // for every row, based on those good blobs. If not enough good blobs | |
| 48 // are found for a row, the pitch is estimated from other rows with | |
| 49 // similar character height instead. | |
| 50 // | |
| 51 // Pass 2 is an iterative process to fit the blobs into fixed-pitch | |
| 52 // character cells. Once we have estimated the character pitch, blobs | |
| 53 // that are almost as large as the pitch can be considered to be | |
| 54 // complete characters. And once we know that some blobs are | |
| 55 // complete characters, we can estimate the region occupied by their | |
| 56 // neighbors. And so on. | |
| 57 // | |
| 58 // We repeat the process until all ambiguities are resolved. Then we make | |
| 59 // the final decision about fixed-pitchness of each row and compute | |
| 60 // pitch and spacing parameters. | |
| 61 // | |
| 62 // (If a row is considered to be proportional, pitch_decision for the | |
| 63 // row is set to PITCH_CORR_PROP and the later phase | |
| 64 // (i.e. Textord::to_spacing()) should determine its spacing | |
| 65 // parameters) | |
| 66 // | |
| 67 // This function doesn't provide all information required by | |
| 68 // fixed_pitch_words() and the rows need to be processed with | |
| 69 // make_prop_words() even if they are fixed pitched. | |
| 70 void compute_fixed_pitch_cjk(ICOORD page_tr, // top right | |
| 71 TO_BLOCK_LIST *port_blocks); // input list | |
| 72 | |
| 73 } // namespace tesseract | |
| 74 | |
| 75 #endif // CJKPITCH_H_ |
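
The header comment describes pass 1 only in prose, so the following is a minimal sketch of that idea and not Tesseract's actual implementation. It assumes each blob is reduced to a horizontal span, treats a blob as "good" when the center-to-center distances to both neighbors agree and the blob is nearly as wide as that distance, and takes the median of the pitches seen at good blobs. `BlobSpan`, `EstimateRowPitch`, and the parameters `tolerance`, `min_fill`, and `min_good` are hypothetical names and thresholds chosen for illustration; none of them appear in `cjkpitch.h`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a blob's horizontal extent (not Tesseract's BLOBNBOX).
struct BlobSpan {
  double left;
  double right;
  double center() const { return (left + right) / 2.0; }
  double width() const { return right - left; }
};

// Pass-1-style pitch estimate for one row (illustrative only).
// `blobs` must be sorted left to right. A blob counts as "good" when
//   - the distances to its left and right neighbors differ by at most
//     tolerance * pitch, and
//   - its width lies in [min_fill * pitch, pitch], i.e. it looks like a
//     complete character filling most of its cell.
// Writes the median pitch over good blobs to *pitch_out and returns true,
// or returns false when fewer than `min_good` good blobs exist.
bool EstimateRowPitch(const std::vector<BlobSpan>& blobs, double* pitch_out,
                      double tolerance = 0.1, double min_fill = 0.6,
                      std::size_t min_good = 3) {
  assert(pitch_out != nullptr && tolerance >= 0.0 && min_fill > 0.0);
  std::vector<double> pitches;
  for (std::size_t i = 1; i + 1 < blobs.size(); ++i) {
    const double prev = blobs[i].center() - blobs[i - 1].center();
    const double next = blobs[i + 1].center() - blobs[i].center();
    const double pitch = (prev + next) / 2.0;
    if (pitch <= 0.0) continue;  // overlapping or unsorted input
    const bool same_pitch = std::abs(prev - next) <= tolerance * pitch;
    const bool complete = blobs[i].width() >= min_fill * pitch &&
                          blobs[i].width() <= pitch;
    if (same_pitch && complete) pitches.push_back(pitch);
  }
  if (pitches.size() < min_good) return false;
  std::sort(pitches.begin(), pitches.end());
  *pitch_out = pitches[pitches.size() / 2];  // median (upper one if even)
  return true;
}
```

When the function returns false, a caller following the header comment would fall back to the estimate from other rows with similar character height. That fallback, and all of pass 2, are outside this sketch.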
