comparison mupdf-source/thirdparty/leptonica/src/recogdid.c @ 2:b50eed0cc0ef upstream
ADD: MuPDF v1.26.7: the MuPDF source as downloaded by a default build of PyMuPDF 1.26.4.
The directory name has changed: no version number in the expanded directory now.
| author | Franz Glasner <fzglas.hg@dom66.de> |
|---|---|
| date | Mon, 15 Sep 2025 11:43:07 +0200 |
| parents | |
| children | |
| 1:1d09e1dec1d9 | 2:b50eed0cc0ef |
|---|---|
| 1 /*====================================================================* | |
| 2 - Copyright (C) 2001 Leptonica. All rights reserved. | |
| 3 - | |
| 4 - Redistribution and use in source and binary forms, with or without | |
| 5 - modification, are permitted provided that the following conditions | |
| 6 - are met: | |
| 7 - 1. Redistributions of source code must retain the above copyright | |
| 8 - notice, this list of conditions and the following disclaimer. | |
| 9 - 2. Redistributions in binary form must reproduce the above | |
| 10 - copyright notice, this list of conditions and the following | |
| 11 - disclaimer in the documentation and/or other materials | |
| 12 - provided with the distribution. | |
| 13 - | |
| 14 - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS | |
| 15 - ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT | |
| 16 - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR | |
| 17 - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL ANY | |
| 18 - CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, | |
| 19 - EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, | |
| 20 - PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR | |
| 21 - PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY | |
| 22 - OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | |
| 23 - NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | |
| 24 - SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |
| 25 *====================================================================*/ | |
| 26 | |
| 27 /*! | |
| 28 * \file recogdid.c | |
| 29 * <pre> | |
| 30 * | |
| 31 * Top-level identification | |
| 32 * BOXA *recogDecode() | |
| 33 * | |
| 34 * Generate decoding arrays | |
| 35 * static l_int32 recogPrepareForDecoding() | |
| 36 * static l_int32 recogMakeDecodingArray() | |
| 37 * | |
| 38 * Dynamic programming for best path | |
| 39 * static l_int32 recogRunViterbi() | |
| 40 * static l_int32 recogRescoreDidResult() | |
| 41 * static PIX *recogShowPath() | |
| 42 * | |
| 43 * Create/destroy temporary DID data | |
| 44 * l_int32 recogCreateDid() | |
| 45 * l_int32 recogDestroyDid() | |
| 46 * | |
| 47 * Various helpers | |
| 48 * l_int32 recogDidExists() | |
| 49 * L_RDID *recogGetDid() | |
| 50 * static l_int32 recogGetWindowedArea() | |
| 51 * l_int32 recogSetChannelParams() | |
| 52 * static l_int32 recogTransferRchToDid() | |
| 53 * | |
| 54 * See recogbasic.c for examples of training a recognizer, which is | |
| 55 * required before it can be used for document image decoding. | |
| 56 * | |
| 57 * Gary Kopec pioneered this hidden Markov approach to "Document Image | |
| 58 * Decoding" (DID) in the early 1990s. It is based on estimation | |
| 59 * using a generative model of the image generation process, and | |
| 60 * provides the most likely decoding of an image if the model is correct. | |
| 61 * Given the model, it finds the maximum a posteriori (MAP) "message" | |
| 62 * given the observed image. The model describes how to generate | |
| 63 * an image from a message, and the MAP message is derived from the | |
| 64 * observed image using Bayes' theorem. This approach can also be used | |
| 65 * to build the model, using the iterative expectation/maximization | |
| 66 * method from labeled but errorful data. | |
| 67 * | |
| 68 * In a little more detail: The model comprises three things: the ideal | |
| 69 * printed character templates, the independent bit-flip noise model, and | |
| 70 * the character setwidths. When a character is printed, the setwidth | |
| 71 * is the distance in pixels that you move forward before being able | |
| 72 * to print the next character. It is typically slightly less than the | |
| 73 * width of the character template: if too small, an extra character can be | |
| 74 * hallucinated; if too large, it will not be able to match the next | |
| 75 * character template on the line. The model assumes that the probabilities | |
| 76 * of bit flip depend only on the assignment of the pixel to background | |
| 77 * or template foreground. The multilevel templates have different | |
| 78 * bit flip probabilities for each level. Because a character image | |
| 79 * is composed of many pixels, each of which can be independently flipped, | |
| 80 * the actual probability of seeing any rendering is exceedingly small, | |
| 81 * being composed of the product of the probabilities for each pixel. | |
| 82 * The log likelihood is used both to avoid numeric underflow and, | |
| 83 * more importantly, because it results in a summation of independent | |
| 84 * pixel log probabilities. That summation can be shown, in Kopec's | |
| 85 * original paper, to consist of a sum of two terms: (a) the number of | |
| 86 * fg pixels in the bit-and of the observed image with the ideal | |
| 87 * template and (b) the number of fg pixels in the template. Each | |
| 88 * has a coefficient that depends only on the bit-flip probabilities | |
| 89 * for the fg and bg. A beautiful result, and computationally simple! | |
| 90 * One nice feature of this approach is that the result of the decoding | |
| 91 * is not very sensitive to the values used for the bit flip probabilities. | |
| 92 * | |
| 93 * The procedure for finding the best decoding (MAP) for a given image goes | |
| 94 * under several names: Viterbi, dynamic programming, hidden Markov model. | |
| 95 * It is called a "hidden Markov model" because the templates are assumed | |
| 96 * to be printed serially and we don't know what they are -- the identity | |
| 97 * of the templates must be inferred from the observed image. | |
| 98 * The possible decodings form a dense trellis over the pixel positions, | |
| 99 * where at each pixel position you have the possibility of having any | |
| 100 * of the characters printed there (with some reference point) or having | |
| 101 * a single pixel wide space inserted there. Thus, before the trellis | |
| 102 * can be traversed, we must do the work of finding the log probability, | |
| 103 * at each pixel location, that each of the templates was printed there. | |
| 104 * Armed with those arrays of data, the dynamic programming procedure | |
| 105 * moves from left to right, one pixel at a time, recursively finding | |
| 106 * the path with the highest log probability that gets to that pixel | |
| 107 * position (and noting which template was printed to arrive there). | |
| 108 * After reaching the right side of the image, we can simply backtrack | |
| 109 * along the path, jumping over each template that lies on the highest | |
| 110 * scoring path. This best path thus only goes through a few of the | |
| 111 * pixel positions. | |
| 112 * | |
| 113 * There are two refinements to the original Kopec paper. In the first, | |
| 114 * one uses multiple, non-overlapping fg templates, each with its own | |
| 115 * bit flip probability. This makes sense, because the probability | |
| 116 * that a fg boundary pixel flips to bg is greater than that of a fg | |
| 117 * pixel not on the boundary. And the flip probability of a fg boundary | |
| 118 * pixel is smaller than that of a bg boundary pixel, which in turn | |
| 119 * is greater than that of a bg pixel not on a boundary (the latter | |
| 120 * is taken to be the true background). Then the simplest realistic | |
| 121 * multiple template model has three templates that are not background. | |
| 122 * | |
| 123 * In the second refinement, a heuristic (strict upper bound) is used | |
| 124 * iteratively in the Viterbi process to compute the log probabilities. | |
| 125 * Using the heuristic, you find the best path, and then score all nodes | |
| 126 * on that path with the actual probability, which is guaranteed to | |
| 127 * be a smaller number. You run this iteratively, rescoring just the best | |
| 128 * found path each time. After each rescoring, the path may change because | |
| 129 * the local scores have been reduced. However, the process converges | |
| 130 * rapidly, and when it doesn't change, it must be the best path because | |
| 131 * it is properly scored (even if neighboring paths are heuristically | |
| 132 * scored). The heuristic score is found column-wise by assuming | |
| 133 * that all the fg pixels in the template are on fg pixels in the image -- | |
| 134 * we just take the minimum of the number of pixels in the template | |
| 135 * and image column. This can easily give a 10-fold reduction in | |
| 136 * computation because the heuristic score can be computed much faster | |
| 137 * than the exact score. | |
| 138 * | |
| 139 * For reference, the classic paper on the approach by Kopec is: | |
| 140 * * "Document Image Decoding Using Markov Source Models", IEEE Trans. | |
| 141 * PAMI, Vol 16, No. 6, June 1994, pp 602-617. | |
| 142 * A refinement of the method for multilevel templates by Kopec is: | |
| 143 * * "Multilevel Character Templates for Document Image Decoding", | |
| 144 * Proc. SPIE 3027, Document Recognition IV, p. 168ff, 1997. | |
| 145 * Further refinements for more efficient decoding are given in these | |
| 146 * two papers, which are both stored on leptonica.org: | |
| 147 * * "Document Image Decoding using Iterated Complete Path Search", Minka, | |
| 148 * Bloomberg and Popat, Proc. SPIE Vol 4307, p. 250-258, Document | |
| 149 * Recognition and Retrieval VIII, San Jose, CA 2001. | |
| 150 * * "Document Image Decoding using Iterated Complete Path Search with | |
| 151 * Subsampled Heuristic Scoring", Bloomberg, Minka and Popat, ICDAR 2001, | |
| 152 * p. 344-349, Sept. 2001, Seattle. | |
| 153 * </pre> | |
| 154 */ | |
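The column-wise heuristic described in the comment above (take, in each column, the minimum of the template and image fg counts) can be sketched in plain C. This is an illustration under stated assumptions, not Leptonica code: `heuristic_and_bound()` is a hypothetical name, and the per-column fg counts are assumed to be precomputed (for the image, that is the role of `did->nasum` in recogPrepareForDecoding()).

```c
#include <assert.h>

/* Hypothetical helper (not Leptonica API).
 * tcol[k]: fg pixel count in column k of a template of width tw.
 * icol[c]: fg pixel count in column c of an image of width iw.
 * Returns an upper bound on the bit-and count for the template placed
 * with its left edge at image column x, for any vertical shift: in each
 * column the AND cannot contain more fg pixels than either input. */
int heuristic_and_bound(const int *tcol, int tw, const int *icol, int iw,
                        int x)
{
    int k, bound = 0;

    assert(x >= 0 && x + tw <= iw);
    for (k = 0; k < tw; k++) {
        int t = tcol[k];
        int m = icol[x + k];
        bound += (t < m) ? t : m;   /* per-column minimum */
    }
    return bound;
}
```

Because the bound never falls below the exact bit-and count, a score built from it with a positive coefficient on the count can only overestimate a placement, which is the property the iterative rescoring relies on.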
| 155 | |
| 156 #ifdef HAVE_CONFIG_H | |
| 157 #include <config_auto.h> | |
| 158 #endif /* HAVE_CONFIG_H */ | |
| 159 | |
| 160 #include <string.h> | |
| 161 #include <math.h> | |
| 162 #include "allheaders.h" | |
| 163 | |
| 164 static l_int32 recogPrepareForDecoding(L_RECOG *recog, PIX *pixs, | |
| 165 l_int32 debug); | |
| 166 static l_int32 recogMakeDecodingArray(L_RECOG *recog, l_int32 index, | |
| 167 l_int32 debug); | |
| 168 static l_int32 recogRunViterbi(L_RECOG *recog, PIX **ppixdb); | |
| 169 static l_int32 recogRescoreDidResult(L_RECOG *recog, PIX **ppixdb); | |
| 170 static PIX *recogShowPath(L_RECOG *recog, l_int32 select); | |
| 171 static l_int32 recogGetWindowedArea(L_RECOG *recog, l_int32 index, | |
| 172 l_int32 x, l_int32 *pdely, l_int32 *pwsum); | |
| 173 static l_int32 recogTransferRchToDid(L_RECOG *recog, l_int32 x, l_int32 y); | |
| 174 | |
| 175 /* Parameters for modeling the decoding */ | |
| 176 static const l_float32 SetwidthFraction = 0.95f; | |
| 177 static const l_int32 MaxYShift = 1; | |
| 178 | |
| 179 /* Channel parameters. alpha[0] is the probability that a bg pixel | |
| 180 * is OFF. alpha[1] is the probability that level 1 fg is ON. | |
| 181 * The actual values are not too critical, but they must be larger | |
| 182 * than 0.5 and smaller than 1.0. For more accuracy in template | |
| 183 * matching, use a 4-level template, where levels 2 and 3 are | |
| 184 * boundary pixels in the fg and bg, respectively. */ | |
| 185 static const l_float32 DefaultAlpha2[] = {0.95f, 0.9f}; | |
| 186 static const l_float32 DefaultAlpha4[] = {0.95f, 0.9f, 0.75f, 0.25f}; | |
| 187 | |
| 188 | |
| 189 /*------------------------------------------------------------------------* | |
| 190 * Top-level identification * | |
| 191 *------------------------------------------------------------------------*/ | |
| 192 /*! | |
| 193 * \brief recogDecode() | |
| 194 * | |
| 195 * \param[in] recog with LUTs pre-computed | |
| 196 * \param[in] pixs typically of multiple touching characters, 1 bpp | |
| 197 * \param[in] nlevels of templates; 2 for now | |
| 198 * \param[out] ppixdb [optional] debug result; can be null | |
| 199 * \return boxa segmentation of pixs into characters, or NULL on error | |
| 200 * | |
| 201 * <pre> | |
| 202 * Notes: | |
| 203 * (1) The input pixs has been filtered so that it is likely to be | |
| 204 * composed of more than one touching character. Specifically, | |
| 205 * its height can only slightly exceed that of the tallest | |
| 206 * unscaled template, the width is somewhat larger than the | |
| 207 * width of the widest unscaled template, and the w/h aspect ratio | |
| 208 * is bounded by max_wh_ratio. | |
| 209 * (2) This uses the DID mechanism with labeled templates to | |
| 210 * segment the input %pixs. The resulting segmentation is | |
| 211 * returned as a copy of did->boxa. | |
| 212 * (3) In debug mode, the Viterbi path is rescored based on all | |
| 213 * the templates. In non-debug mode, the same procedure is | |
| 214 * carried out by recogIdentifyPix() on the result of the | |
| 215 * segmentation. | |
| 216 * </pre> | |
| 217 */ | |
| 218 BOXA * | |
| 219 recogDecode(L_RECOG *recog, | |
| 220 PIX *pixs, | |
| 221 l_int32 nlevels, | |
| 222 PIX **ppixdb) | |
| 223 { | |
| 224 l_int32 debug; | |
| 225 PIX *pix1; | |
| 226 PIXA *pixa; | |
| 227 | |
| 228 if (ppixdb) *ppixdb = NULL; | |
| 229 if (!recog) | |
| 230 return (BOXA *)ERROR_PTR("recog not defined", __func__, NULL); | |
| 231 if (!pixs || pixGetDepth(pixs) != 1) | |
| 232 return (BOXA *)ERROR_PTR("pixs undefined or not 1 bpp", __func__, NULL); | |
| 233 if (!recog->train_done) | |
| 234 return (BOXA *)ERROR_PTR("training not finished", __func__, NULL); | |
| 235 if (nlevels != 2) | |
| 236 return (BOXA *)ERROR_PTR("nlevels != 2 (for now)", __func__, NULL); | |
| 237 | |
| 238 debug = (ppixdb) ? 1 : 0; | |
| 239 if (recogPrepareForDecoding(recog, pixs, debug)) | |
| 240 return (BOXA *)ERROR_PTR("error making arrays", __func__, NULL); | |
| 241 recogSetChannelParams(recog, nlevels); | |
| 242 | |
| 243 /* Normal path; just run Viterbi */ | |
| 244 if (!debug) { | |
| 245 if (recogRunViterbi(recog, NULL) == 0) | |
| 246 return boxaCopy(recog->did->boxa, L_COPY); | |
| 247 else | |
| 248 return (BOXA *)ERROR_PTR("error in Viterbi", __func__, NULL); | |
| 249 } | |
| 250 | |
| 251 /* Debug path */ | |
| 252 if (recogRunViterbi(recog, &pix1)) | |
| 253 return (BOXA *)ERROR_PTR("error in Viterbi", __func__, NULL); | |
| 254 pixa = pixaCreate(2); | |
| 255 pixaAddPix(pixa, pix1, L_INSERT); | |
| 256 if (recogRescoreDidResult(recog, &pix1)) { | |
| 257 pixaDestroy(&pixa); | |
| 258 return (BOXA *)ERROR_PTR("error in rescoring", __func__, NULL); | |
| 259 } | |
| 260 pixaAddPix(pixa, pix1, L_INSERT); | |
| 261 *ppixdb = pixaDisplayTiledInRows(pixa, 32, 2 * pixGetWidth(pix1) + 100, | |
| 262 1.0, 0, 30, 2); | |
| 263 pixaDestroy(&pixa); | |
| 264 return boxaCopy(recog->did->boxa, L_COPY); | |
| 265 } | |
| 266 | |
| 267 | |
| 268 /*------------------------------------------------------------------------* | |
| 269 * Generate decoding arrays * | |
| 270 *------------------------------------------------------------------------*/ | |
| 271 /*! | |
| 272 * \brief recogPrepareForDecoding() | |
| 273 * | |
| 274 * \param[in] recog with LUTs pre-computed | |
| 275 * \param[in] pixs typically of multiple touching characters, 1 bpp | |
| 276 * \param[in] debug 1 for debug output; 0 otherwise | |
| 277 * \return 0 if OK, 1 on error | |
| 278 * | |
| 279 * <pre> | |
| 280 * Notes: | |
| 281 * (1) Binarizes and crops input %pixs. | |
| 282 * (2) Removes previous L_RDID struct and makes a new one. | |
| 283 * (3) Generates the bit-and sum arrays for each character template | |
| 284 * at each pixel position in %pixs. These are used in the | |
| 285 * Viterbi dynamic programming step. | |
| 286 * (4) The values are saved in the scoring arrays at the left edge | |
| 287 * of the template. They are used in the Viterbi process | |
| 288 * at the setwidth position (which is near the RHS of the template | |
| 289 * as it is positioned on pixs) in the generated trellis. | |
| 290 * </pre> | |
| 291 */ | |
| 292 static l_int32 | |
| 293 recogPrepareForDecoding(L_RECOG *recog, | |
| 294 PIX *pixs, | |
| 295 l_int32 debug) | |
| 296 { | |
| 297 l_int32 i, ret; | |
| 298 PIX *pix1; | |
| 299 L_RDID *did; | |
| 300 | |
| 301 if (!recog) | |
| 302 return ERROR_INT("recog not defined", __func__, 1); | |
| 303 if (!pixs || pixGetDepth(pixs) != 1) | |
| 304 return ERROR_INT("pixs not defined or not 1 bpp", __func__, 1); | |
| 305 if (!recog->train_done) | |
| 306 return ERROR_INT("training not finished", __func__, 1); | |
| 307 | |
| 308 if (!recog->ave_done) { | |
| 309 ret = recogAverageSamples(recog, 0); | |
| 310 if (ret) | |
| 311 return ERROR_INT("averaging of samples failed", __func__, 1); | |
| 312 } | |
| 313 | |
| 314 /* Binarize and crop to foreground if necessary */ | |
| 315 if ((pix1 = recogProcessToIdentify(recog, pixs, 0)) == NULL) | |
| 316 return ERROR_INT("pix1 not made", __func__, 1); | |
| 317 | |
| 318 /* Remove any existing RecogDID and set up a new one */ | |
| 319 recogDestroyDid(recog); | |
| 320 if (recogCreateDid(recog, pix1)) { | |
| 321 pixDestroy(&pix1); | |
| 322 return ERROR_INT("decoder not made", __func__, 1); | |
| 323 } | |
| 324 | |
| 325 /* Compute vertical sum and first moment arrays */ | |
| 326 did = recogGetDid(recog); /* owned by recog */ | |
| 327 did->nasum = pixCountPixelsByColumn(pix1); | |
| 328 did->namoment = pixGetMomentByColumn(pix1, 1); | |
| 329 | |
| 330 /* Generate the arrays */ | |
| 331 for (i = 0; i < recog->did->narray; i++) | |
| 332 recogMakeDecodingArray(recog, i, debug); | |
| 333 | |
| 334 pixDestroy(&pix1); | |
| 335 return 0; | |
| 336 } | |
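The two arrays computed above, `did->nasum` and `did->namoment`, are per-column fg counts and first moments; recogMakeDecodingArray() divides one by the other to get a y-centroid. A plain-C analog on a row-major 0/1 array is sketched here, assuming the first moment of a column is the sum of the row indices of its fg pixels (consistent with that division). `column_sums_and_moments()` is a hypothetical name, not a Leptonica function.

```c
#include <assert.h>

/* Hypothetical analog of pixCountPixelsByColumn() and
 * pixGetMomentByColumn(pix, 1) for a row-major w x h array of 0/1.
 *   sum[x]    = number of fg pixels in column x
 *   moment[x] = sum of the row indices y of those fg pixels
 * so moment[x] / sum[x] is the column's y-centroid when sum[x] > 0. */
void column_sums_and_moments(const unsigned char *img, int w, int h,
                             int *sum, int *moment)
{
    int x, y;

    assert(img && sum && moment && w > 0 && h > 0);
    for (x = 0; x < w; x++) {
        sum[x] = 0;
        moment[x] = 0;
    }
    for (y = 0; y < h; y++) {
        for (x = 0; x < w; x++) {
            if (img[y * w + x]) {
                sum[x] += 1;
                moment[x] += y;
            }
        }
    }
}
```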
| 337 | |
| 338 | |
| 339 /*! | |
| 340 * \brief recogMakeDecodingArray() | |
| 341 * | |
| 342 * \param[in] recog | |
| 343 * \param[in] index of averaged template | |
| 344 * \param[in] debug 1 for debug output; 0 otherwise | |
| 345 * \return 0 if OK, 1 on error | |
| 346 * | |
| 347 * <pre> | |
| 348 * Notes: | |
| 349 * (1) Generates the bit-and sum array for a character template along pixs. | |
| 350 * (2) The values are saved in the scoring arrays at the left edge | |
| 351 * of the template as it is positioned on pixs. | |
| 352 * </pre> | |
| 353 */ | |
| 354 static l_int32 | |
| 355 recogMakeDecodingArray(L_RECOG *recog, | |
| 356 l_int32 index, | |
| 357 l_int32 debug) | |
| 358 { | |
| 359 l_int32 i, j, w1, h1, w2, h2, nx, ycent2, count, maxcount, maxdely; | |
| 360 l_int32 sum, moment, dely, shifty; | |
| 361 l_int32 *counta, *delya, *ycent1, *arraysum, *arraymoment, *sumtab; | |
| 362 NUMA *nasum, *namoment; | |
| 363 PIX *pix1, *pix2, *pix3; | |
| 364 L_RDID *did; | |
| 365 | |
| 366 if (!recog) | |
| 367 return ERROR_INT("recog not defined", __func__, 1); | |
| 368 if ((did = recogGetDid(recog)) == NULL) | |
| 369 return ERROR_INT("did not defined", __func__, 1); | |
| 370 if (index < 0 || index >= did->narray) | |
| 371 return ERROR_INT("invalid index", __func__, 1); | |
| 372 | |
| 373 /* Check that pix1 is large enough for this template. */ | |
| 374 pix1 = did->pixs; /* owned by did; do not destroy */ | |
| 375 pixGetDimensions(pix1, &w1, &h1, NULL); | |
| 376 pix2 = pixaGetPix(recog->pixa_u, index, L_CLONE); | |
| 377 pixGetDimensions(pix2, &w2, &h2, NULL); | |
| 378 if (w1 < w2) { | |
| 379 L_INFO("w1 = %d < w2 = %d for index %d\n", __func__, w1, w2, index); | |
| 380 pixDestroy(&pix2); | |
| 381 return 0; | |
| 382 } | |
| 383 | |
| 384 nasum = did->nasum; | |
| 385 namoment = did->namoment; | |
| 386 ptaGetIPt(recog->pta_u, index, NULL, &ycent2); | |
| 387 sumtab = recog->sumtab; | |
| 388 counta = did->counta[index]; | |
| 389 delya = did->delya[index]; | |
| 390 | |
| 391 /* Set up the array for ycent1. This gives the y-centroid location | |
| 392 * for a window of width w2, starting at location i. */ | |
| 393 nx = w1 - w2 + 1; /* number of positions w2 can be placed in w1 */ | |
| 394 ycent1 = (l_int32 *)LEPT_CALLOC(nx, sizeof(l_int32)); | |
| 395 arraysum = numaGetIArray(nasum); | |
| 396 arraymoment = numaGetIArray(namoment); | |
| 397 for (i = 0, sum = 0, moment = 0; i < w2; i++) { | |
| 398 sum += arraysum[i]; | |
| 399 moment += arraymoment[i]; | |
| 400 } | |
| 401 for (i = 0; i < nx - 1; i++) { | |
| 402 ycent1[i] = (sum == 0) ? ycent2 : (l_float32)moment / (l_float32)sum; | |
| 403 sum += arraysum[w2 + i] - arraysum[i]; | |
| 404 moment += arraymoment[w2 + i] - arraymoment[i]; | |
| 405 } | |
| 406 ycent1[nx - 1] = (sum == 0) ? ycent2 : (l_float32)moment / (l_float32)sum; | |
| 407 | |
| 408 /* Compute the bit-and sum between the template pix2 and pix1, at | |
| 409 * locations where the left side of pix2 goes from 0 to nx - 1 | |
| 410 * in pix1. Do this around the vertical alignment of the pix2 | |
| 411 * centroid and the windowed pix1 centroid. | |
| 412 * (1) Start with pix3 cleared, with the width of pix2 and the height of pix1. | |
| 413 * (2) Blit the y-shifted pix2 onto pix3. | |
| 414 * (3) AND the window of pix1 starting at column i with pix3. Then all | |
| 415 * ON pixels are within the intersection of pix1 and the shifted pix2. */ | |
| 416 pix3 = pixCreate(w2, h1, 1); | |
| 417 for (i = 0; i < nx; i++) { | |
| 418 shifty = (l_int32)(ycent1[i] - ycent2 + 0.5); | |
| 419 maxcount = 0; | |
| 420 maxdely = 0; | |
| 421 for (j = -MaxYShift; j <= MaxYShift; j++) { | |
| 422 pixClearAll(pix3); | |
| 423 dely = shifty + j; /* amount pix2 is shifted relative to pix1 */ | |
| 424 pixRasterop(pix3, 0, dely, w2, h2, PIX_SRC, pix2, 0, 0); | |
| 425 pixRasterop(pix3, 0, 0, w2, h1, PIX_SRC & PIX_DST, pix1, i, 0); | |
| 426 pixCountPixels(pix3, &count, sumtab); | |
| 427 if (count > maxcount) { | |
| 428 maxcount = count; | |
| 429 maxdely = dely; | |
| 430 } | |
| 431 } | |
| 432 counta[i] = maxcount; | |
| 433 delya[i] = maxdely; | |
| 434 } | |
| 435 did->fullarrays = TRUE; | |
| 436 | |
| 437 pixDestroy(&pix2); | |
| 438 pixDestroy(&pix3); | |
| 439 LEPT_FREE(ycent1); | |
| 440 LEPT_FREE(arraysum); | |
| 441 LEPT_FREE(arraymoment); | |
| 442 return 0; | |
| 443 } | |
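The search inside recogMakeDecodingArray() can be sketched on plain row-major 0/1 arrays instead of PIX. This is a hypothetical reimplementation for illustration (`and_count()` and `best_shifted_count()` are not Leptonica functions): it counts the fg pixels of the AND between the image and the template placed at column x with its top row at dely, clips template rows that fall outside the image as a rasterop would, and keeps the first shift with the largest count, as the loop over j does with MaxYShift.

```c
#include <assert.h>

/* img is w1 x h1, tpl is w2 x h2 (row-major, values 0/1).  Returns the
 * number of fg pixels in the AND of img with tpl placed so that its
 * left edge is at column x and its row 0 is at image row dely. */
int and_count(const unsigned char *img, int w1, int h1,
              const unsigned char *tpl, int w2, int h2, int x, int dely)
{
    int i, j, count = 0;

    assert(w2 <= w1 && x >= 0 && x + w2 <= w1);
    for (i = 0; i < h2; i++) {
        int y = i + dely;                 /* image row under template row i */
        if (y < 0 || y >= h1) continue;   /* clipped */
        for (j = 0; j < w2; j++)
            count += img[y * w1 + x + j] & tpl[i * w2 + j];
    }
    return count;
}

/* Try shifts shifty - maxshift .. shifty + maxshift; keep the first one
 * with the largest count.  Stores that shift in *pdely (0 if every
 * count is 0) and returns the count. */
int best_shifted_count(const unsigned char *img, int w1, int h1,
                       const unsigned char *tpl, int w2, int h2,
                       int x, int shifty, int maxshift, int *pdely)
{
    int j, count, maxcount = 0, maxdely = 0;

    for (j = -maxshift; j <= maxshift; j++) {
        count = and_count(img, w1, h1, tpl, w2, h2, x, shifty + j);
        if (count > maxcount) {
            maxcount = count;
            maxdely = shifty + j;
        }
    }
    *pdely = maxdely;
    return maxcount;
}
```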
| 444 | |
| 445 | |
| 446 /*------------------------------------------------------------------------* | |
| 447 * Dynamic programming for best path | |
| 448 *------------------------------------------------------------------------*/ | |
| 449 /*! | |
| 450 * \brief recogRunViterbi() | |
| 451 * | |
| 452 * \param[in] recog with LUTs pre-computed | |
| 453 * \param[out] ppixdb [optional] debug result; can be null | |
| 454 * \return 0 if OK, 1 on error | |
| 455 * | |
| 456 * <pre> | |
| 457 * Notes: | |
| 458 * (1) This can be used when the templates are unscaled. It works by | |
| 459 * matching the average, unscaled templates of each class to | |
| 460 * all positions. | |
| 461 * (2) It is recursive, in that | |
| 462 * (a) we compute the score successively at all pixel positions x, | |
| 463 * (b) to compute the score at x in the trellis, for each | |
| 464 * template we look backwards to (x - setwidth) to get the | |
| 465 * score if that template were to be printed with its | |
| 466 * setwidth location at x. We save at x the template and | |
| 467 * score that maximizes the sum of the score at (x - setwidth) | |
| 468 * and the log-likelihood for the template to be printed with | |
| 469 * its LHS there. | |
| 470 * (3) The primary output is a boxa of the locations for splitting | |
| 471 * the input image. These locations are used later to split the | |
| 472 * image and send the pieces individually for recognition. | |
| 473 * This can be done in either recogIdentifyMultiple(), or | |
| 474 * for debugging in recogRescoreDidResult(). | |
| 475 * </pre> | |
| 476 */ | |
| 477 static l_int32 | |
| 478 recogRunViterbi(L_RECOG *recog, | |
| 479 PIX **ppixdb) | |
| 480 { | |
| 481 l_int32 i, w1, w2, h1, xnz, x, narray, minsetw; | |
| 482 l_int32 first, templ, xloc, dely, counts, area1; | |
| 483 l_int32 besttempl, spacetempl; | |
| 484 l_int32 *setw, *didtempl; | |
| 485 l_int32 *area2; /* must be freed */ | |
| 486 l_float32 prevscore, matchscore, maxscore, correl; | |
| 487 l_float32 *didscore; | |
| 488 BOX *box; | |
| 489 PIX *pix1; | |
| 490 L_RDID *did; | |
| 491 | |
| 492 if (ppixdb) *ppixdb = NULL; | |
| 493 if (!recog) | |
| 494 return ERROR_INT("recog not defined", __func__, 1); | |
| 495 if ((did = recogGetDid(recog)) == NULL) | |
| 496 return ERROR_INT("did not defined", __func__, 1); | |
| 497 if (did->fullarrays == 0) | |
| 498 return ERROR_INT("did full arrays not made", __func__, 1); | |
| 499 | |
| 500 /* Compute the minimum setwidth. Bad templates with very small | |
| 501 * width can cause havoc because the setwidth is too small. */ | |
| 502 w1 = did->size; | |
| 503 narray = did->narray; | |
| 504 spacetempl = narray; | |
| 505 setw = did->setwidth; | |
| 506 minsetw = 100000; | |
| 507 for (i = 0; i < narray; i++) { | |
| 508 if (setw[i] < minsetw) | |
| 509 minsetw = setw[i]; | |
| 510 } | |
| 511 if (minsetw <= 2) | |
| 512 return ERROR_INT("minsetw <= 2; bad templates", __func__, 1); | |
| 513 | |
| 514 /* The score array is initialized to 0.0. As we proceed to | |
| 515 * the right, the log likelihood for the partial paths goes | |
| 516 * negative, and we prune for the max (least negative) path. | |
| 517 * No matches will be computed until we reach x = min(setwidth); | |
| 518 * until then first == TRUE after looping over templates. */ | |
| 519 didscore = did->trellisscore; | |
| 520 didtempl = did->trellistempl; | |
| 521 area2 = numaGetIArray(recog->nasum_u); | |
| 522 besttempl = 0; /* just tells compiler it is initialized */ | |
| 523 maxscore = 0.0; /* ditto */ | |
| 524 for (x = minsetw; x < w1; x++) { /* will always get a score */ | |
| 525 first = TRUE; | |
| 526 for (i = 0; i < narray; i++) { | |
| 527 if (x - setw[i] < 0) continue; | |
| 528 matchscore = didscore[x - setw[i]] + | |
| 529 did->gamma[1] * did->counta[i][x - setw[i]] + | |
| 530 did->beta[1] * area2[i]; | |
| 531 if (first) { | |
| 532 maxscore = matchscore; | |
| 533 besttempl = i; | |
| 534 first = FALSE; | |
| 535 } else { | |
| 536 if (matchscore > maxscore) { | |
| 537 maxscore = matchscore; | |
| 538 besttempl = i; | |
| 539 } | |
| 540 } | |
| 541 } | |
| 542 | |
| 543 /* We can also put down a single pixel space, with no cost | |
| 544 * because all pixels are bg. */ | |
| 545 prevscore = didscore[x - 1]; | |
| 546 if (prevscore > maxscore) { /* 1 pixel space is best */ | |
| 547 maxscore = prevscore; | |
| 548 besttempl = spacetempl; | |
| 549 } | |
| 550 didscore[x] = maxscore; | |
| 551 didtempl[x] = besttempl; | |
| 552 } | |
| 553 | |
| 554 /* Backtrack to get the best path. | |
| 555 * Skip over (i.e., ignore) all single pixel spaces. */ | |
| 556 for (x = w1 - 1; x >= 0; x--) { | |
| 557 if (didtempl[x] != spacetempl) break; | |
| 558 } | |
| 559 h1 = pixGetHeight(did->pixs); | |
| 560 while (x > 0) { | |
| 561 if (didtempl[x] == spacetempl) { /* skip over spaces */ | |
| 562 x--; | |
| 563 continue; | |
| 564 } | |
| 565 templ = didtempl[x]; | |
| 566 xloc = x - setw[templ]; | |
| 567 if (xloc < 0) break; | |
| 568 counts = did->counta[templ][xloc]; /* bit-and counts */ | |
| 569 recogGetWindowedArea(recog, templ, xloc, &dely, &area1); | |
| 570 correl = ((l_float32)(counts) * counts) / | |
| 571 (l_float32)(area2[templ] * area1); | |
| 572 pix1 = pixaGetPix(recog->pixa_u, templ, L_CLONE); | |
| 573 w2 = pixGetWidth(pix1); | |
| 574 numaAddNumber(did->natempl, templ); | |
| 575 numaAddNumber(did->naxloc, xloc); | |
| 576 numaAddNumber(did->nadely, dely); | |
| 577 numaAddNumber(did->nawidth, pixGetWidth(pix1)); | |
| 578 numaAddNumber(did->nascore, correl); | |
| 579 xnz = L_MAX(xloc, 0); | |
| 580 box = boxCreate(xnz, dely, w2, h1); | |
| 581 boxaAddBox(did->boxa, box, L_INSERT); | |
| 582 pixDestroy(&pix1); | |
| 583 x = xloc; | |
| 584 } | |
| 585 | |
| 586 if (ppixdb) { | |
| 587 numaWriteStderr(did->natempl); | |
| 588 numaWriteStderr(did->naxloc); | |
| 589 numaWriteStderr(did->nadely); | |
| 590 numaWriteStderr(did->nawidth); | |
| 591 numaWriteStderr(did->nascore); | |
| 592 boxaWriteStderr(did->boxa); | |
| 593 *ppixdb = recogShowPath(recog, 0); | |
| 594 } | |
| 595 | |
| 596 LEPT_FREE(area2); | |
| 597 return 0; | |
| 598 } | |
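The recursion and backtrack of recogRunViterbi() reduce to a one-dimensional dynamic program over columns. The sketch below is a hypothetical, self-contained version: `viterbi_1d()` and its flattened `match` array (ntempl rows of w scores, standing in for `gamma[1] * counta + beta[1] * area2`) are assumptions, and columns left of the minimum setwidth are simply marked as spaces. As in the original, a 1-pixel space replaces the best template only when its score is strictly larger.

```c
#include <assert.h>

/* Hypothetical sketch (not Leptonica API).
 *   ntempl        number of templates; index ntempl means "1-pixel space"
 *   setw[t]       setwidth of template t
 *   match         match[t * w + x]: log-likelihood gain for template t
 *                 with its left edge at column x
 *   score, templ  trellis arrays of length w (outputs)
 *   xlocs, tpls   best path, right to left (outputs, length >= w)
 * Returns the number of templates on the best path. */
int viterbi_1d(int ntempl, const int *setw, const double *match, int w,
               double *score, int *templ, int *xlocs, int *tpls)
{
    int x, t, first, n = 0, best = ntempl, minsetw = w;
    double s, maxscore = 0.0;

    for (t = 0; t < ntempl; t++)
        if (setw[t] < minsetw)
            minsetw = setw[t];
    assert(ntempl > 0 && minsetw >= 1);

    /* Partial-path scores start at 0.0. */
    for (x = 0; x < w; x++) {
        score[x] = 0.0;
        templ[x] = ntempl;
    }

    /* Forward pass, left to right. */
    for (x = minsetw; x < w; x++) {
        first = 1;
        for (t = 0; t < ntempl; t++) {
            if (x - setw[t] < 0) continue;
            s = score[x - setw[t]] + match[t * w + (x - setw[t])];
            if (first || s > maxscore) {
                maxscore = s;
                best = t;
                first = 0;
            }
        }
        /* A 1-pixel space costs nothing; it wins only if strictly better. */
        if (score[x - 1] > maxscore) {
            maxscore = score[x - 1];
            best = ntempl;
        }
        score[x] = maxscore;
        templ[x] = best;
    }

    /* Backtrack from the right edge, skipping 1-pixel spaces and
     * jumping back over each template by its setwidth. */
    x = w - 1;
    while (x > 0) {
        if (templ[x] == ntempl) {
            x--;
            continue;
        }
        t = templ[x];
        if (x - setw[t] < 0) break;
        xlocs[n] = x - setw[t];
        tpls[n] = t;
        n++;
        x -= setw[t];
    }
    return n;
}
```

For example, with setwidths {3, 4}, a strong match for template 0 at column 0 and for template 1 at column 3, the backtrack over an 8-column trellis returns template 1 at xloc 3 and then template 0 at xloc 0.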
| 599 | |
| 600 | |
| 601 /*! | |
| 602 * \brief recogRescoreDidResult() | |
| 603 * | |
| 604 * \param[in] recog with LUTs pre-computed | |
| 605 * \param[out] ppixdb [optional] debug result; can be null | |
| 606 * \return 0 if OK, 1 on error | |
| 607 * | |
| 608 * <pre> | |
| 609 * Notes: | |
| 610 * (1) This does correlation matching with all unscaled templates, | |
| 611 * using the character segmentation determined by the Viterbi path. | |
| 612 * </pre> | |
| 613 */ | |
| 614 static l_int32 | |
| 615 recogRescoreDidResult(L_RECOG *recog, | |
| 616 PIX **ppixdb) | |
| 617 { | |
| 618 l_int32 i, n, sample, x, dely, index; | |
| 619 char *text = NULL; | |
| 620 l_float32 score; | |
| 621 BOX *box1; | |
| 622 PIX *pixs, *pix1; | |
| 623 L_RDID *did; | |
| 624 | |
| 625 if (ppixdb) *ppixdb = NULL; | |
| 626 if (!recog) | |
| 627 return ERROR_INT("recog not defined", __func__, 1); | |
| 628 if ((did = recogGetDid(recog)) == NULL) | |
| 629 return ERROR_INT("did not defined", __func__, 1); | |
| 630 if (did->fullarrays == 0) | |
| 631 return ERROR_INT("did full arrays not made", __func__, 1); | |
| 632 if ((n = numaGetCount(did->naxloc)) == 0) | |
| 633 return ERROR_INT("no elements in path", __func__, 1); | |
| 634 | |
| 635 pixs = did->pixs; | |
| 636 for (i = 0; i < n; i++) { | |
| 637 box1 = boxaGetBox(did->boxa, i, L_COPY); | |
| 638 boxGetGeometry(box1, &x, &dely, NULL, NULL); | |
| 639 pix1 = pixClipRectangle(pixs, box1, NULL); | |
| 640 recogIdentifyPix(recog, pix1, NULL); | |
| 641 recogTransferRchToDid(recog, x, dely); | |
| 642 if (ppixdb) { | |
| 643 rchExtract(recog->rch, &index, &score, &text, | |
| 644 &sample, NULL, NULL, NULL); | |
| 645 lept_stderr("text = %s, index = %d, sample = %d," | |
| 646 " score = %5.3f\n", text, index, sample, score); | |
| 647 } | |
| 648 pixDestroy(&pix1); | |
| 649 boxDestroy(&box1); | |
| 650 LEPT_FREE(text); | |
| 651 } | |
| 652 | |
| 653 if (ppixdb) | |
| 654 *ppixdb = recogShowPath(recog, 1); | |
| 655 | |
| 656 return 0; | |
| 657 } | |
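In debug mode, recogDecode() shows the Viterbi-path scores next to the rescored ones. The Viterbi-path score recorded in `did->nascore` during the backtrack is a normalized correlation, counts² / (area2 · area1), where area2 is the template fg count and area1 the fg count in the image window under it. A small hypothetical helper (`correl_score()` is not a Leptonica function) makes the range explicit; unlike the original expression, it guards against a zero denominator and multiplies the areas in double precision.

```c
#include <assert.h>

/* Hypothetical helper: correl = count^2 / (area_t * area_i), where
 * count is the bit-and count, area_t the template fg count and area_i
 * the fg count of the image window under the template.  The result is
 * 1.0 for a perfect match and 0.0 when nothing overlaps. */
double correl_score(int count, int area_t, int area_i)
{
    assert(count >= 0 && count <= area_t && count <= area_i);
    if (area_t == 0 || area_i == 0)
        return 0.0;   /* guard against division by zero */
    return ((double)count * count) / ((double)area_t * area_i);
}
```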
| 658 | |
| 659 | |
| 660 /*! | |
| 661 * \brief recogShowPath() | |
| 662 * | |
| 663 * \param[in] recog with LUTs pre-computed | |
| 664 * \param[in] select 0 for Viterbi; 1 for rescored | |
| 665 * \return pix debug output, or NULL on error | |
| 666 */ | |
| 667 static PIX * | |
| 668 recogShowPath(L_RECOG *recog, | |
| 669 l_int32 select) | |
| 670 { | |
| 671 char textstr[16]; | |
| 672 l_int32 i, j, n, index, xloc, dely; | |
| 673 l_float32 score; | |
| 674 L_BMF *bmf; | |
| 675 NUMA *natempl_s, *nasample_s = NULL, *nascore_s, *naxloc_s, *nadely_s; | |
| 676 PIX *pixs, *pix0, *pix1, *pix2, *pix3, *pix4, *pix5; | |
| 677 L_RDID *did; | |
| 678 | |
| 679 if (!recog) | |
| 680 return (PIX *)ERROR_PTR("recog not defined", __func__, NULL); | |
| 681 if ((did = recogGetDid(recog)) == NULL) | |
| 682 return (PIX *)ERROR_PTR("did not defined", __func__, NULL); | |
| 683 | |
| 684 bmf = bmfCreate(NULL, 8); | |
| 685 pixs = pixScale(did->pixs, 4.0, 4.0); | |
| 686 pix0 = pixAddBorderGeneral(pixs, 0, 0, 0, 40, 0); | |
| 687 pix1 = pixConvertTo32(pix0); | |
| 688 if (select == 0) { /* Viterbi */ | |
| 689 natempl_s = did->natempl; | |
| 690 nascore_s = did->nascore; | |
| 691 naxloc_s = did->naxloc; | |
| 692 nadely_s = did->nadely; | |
| 693 } else { /* rescored */ | |
| 694 natempl_s = did->natempl_r; | |
| 695 nasample_s = did->nasample_r; | |
| 696 nascore_s = did->nascore_r; | |
| 697 naxloc_s = did->naxloc_r; | |
| 698 nadely_s = did->nadely_r; | |
| 699 } | |
| 700 | |
| 701 n = numaGetCount(natempl_s); | |
| 702 for (i = 0; i < n; i++) { | |
| 703 numaGetIValue(natempl_s, i, &index); | |
| 704 if (select == 0) { | |
| 705 pix2 = pixaGetPix(recog->pixa_u, index, L_CLONE); | |
| 706 } else { | |
| 707 numaGetIValue(nasample_s, i, &j); | |
| 708 pix2 = pixaaGetPix(recog->pixaa_u, index, j, L_CLONE); | |
| 709 } | |
| 710 pix3 = pixScale(pix2, 4.0, 4.0); | |
| 711 pix4 = pixErodeBrick(NULL, pix3, 5, 5); | |
| 712 pixXor(pix4, pix4, pix3); | |
| 713 numaGetFValue(nascore_s, i, &score); | |
| 714 snprintf(textstr, sizeof(textstr), "%5.3f", score); | |
| 715 pix5 = pixAddTextlines(pix4, bmf, textstr, 1, L_ADD_BELOW); | |
| 716 numaGetIValue(naxloc_s, i, &xloc); | |
| 717 numaGetIValue(nadely_s, i, &dely); | |
| 718 pixPaintThroughMask(pix1, pix5, 4 * xloc, 4 * dely, 0xff000000); | |
| 719 pixDestroy(&pix2); | |
| 720 pixDestroy(&pix3); | |
| 721 pixDestroy(&pix4); | |
| 722 pixDestroy(&pix5); | |
| 723 } | |
| 724 pixDestroy(&pixs); | |
| 725 pixDestroy(&pix0); | |
| 726 bmfDestroy(&bmf); | |
| 727 return pix1; | |
| 728 } | |
| 729 | |
| 730 | |
| 731 /*------------------------------------------------------------------------* | |
| 732 * Create/destroy temporary DID data * | |
| 733 *------------------------------------------------------------------------*/ | |
| 734 /*! | |
| 735 * \brief recogCreateDid() | |
| 736 * | |
| 737 * \param[in] recog | |
| 738  * \param[in] pixs 1 bpp image to match | |
| 739 * \return 0 if OK, 1 on error | |
| 740 */ | |
| 741 l_ok | |
| 742 recogCreateDid(L_RECOG *recog, | |
| 743 PIX *pixs) | |
| 744 { | |
| 745 l_int32 i; | |
| 746 PIX *pix1; | |
| 747 L_RDID *did; | |
| 748 | |
| 749 if (!recog) | |
| 750 return ERROR_INT("recog not defined", __func__, 1); | |
| 751 if (!pixs) | |
| 752 return ERROR_INT("pixs not defined", __func__, 1); | |
| 753 | |
| 754 recogDestroyDid(recog); | |
| 755 | |
| 756 did = (L_RDID *)LEPT_CALLOC(1, sizeof(L_RDID)); | |
| 757 recog->did = did; | |
| 758 did->pixs = pixClone(pixs); | |
| 759 did->narray = recog->setsize; | |
| 760 did->size = pixGetWidth(pixs); | |
| 761 did->natempl = numaCreate(5); | |
| 762 did->naxloc = numaCreate(5); | |
| 763 did->nadely = numaCreate(5); | |
| 764 did->nawidth = numaCreate(5); | |
| 765 did->boxa = boxaCreate(5); | |
| 766 did->nascore = numaCreate(5); | |
| 767 did->natempl_r = numaCreate(5); | |
| 768 did->nasample_r = numaCreate(5); | |
| 769 did->naxloc_r = numaCreate(5); | |
| 770 did->nadely_r = numaCreate(5); | |
| 771 did->nawidth_r = numaCreate(5); | |
| 772 did->nascore_r = numaCreate(5); | |
| 773 | |
| 774 /* Make the arrays */ | |
| 775 did->setwidth = (l_int32 *)LEPT_CALLOC(did->narray, sizeof(l_int32)); | |
| 776 did->counta = (l_int32 **)LEPT_CALLOC(did->narray, sizeof(l_int32 *)); | |
| 777 did->delya = (l_int32 **)LEPT_CALLOC(did->narray, sizeof(l_int32 *)); | |
| 778 did->beta = (l_float32 *)LEPT_CALLOC(5, sizeof(l_float32)); | |
| 779 did->gamma = (l_float32 *)LEPT_CALLOC(5, sizeof(l_float32)); | |
| 780 did->trellisscore = (l_float32 *)LEPT_CALLOC(did->size, sizeof(l_float32)); | |
| 781 did->trellistempl = (l_int32 *)LEPT_CALLOC(did->size, sizeof(l_int32)); | |
| 782 for (i = 0; i < did->narray; i++) { | |
| 783 did->counta[i] = (l_int32 *)LEPT_CALLOC(did->size, sizeof(l_int32)); | |
| 784 did->delya[i] = (l_int32 *)LEPT_CALLOC(did->size, sizeof(l_int32)); | |
| 785 } | |
| 786 | |
| 787 /* Populate the setwidth array */ | |
| 788 for (i = 0; i < did->narray; i++) { | |
| 789 pix1 = pixaGetPix(recog->pixa_u, i, L_CLONE); | |
| 790 did->setwidth[i] = (l_int32)(SetwidthFraction * pixGetWidth(pix1)); | |
| 791 pixDestroy(&pix1); | |
| 792 } | |
| 793 | |
| 794 return 0; | |
| 795 } | |
| 796 | |
| 797 | |
| 798 /*! | |
| 799 * \brief recogDestroyDid() | |
| 800 * | |
| 801 * \param[in] recog | |
| 802 * \return 0 if OK, 1 on error | |
| 803 * | |
| 804 * <pre> | |
| 805 * Notes: | |
| 806  * (1) As the signature indicates, the did is owned by the recog, and can | |
| 807  * only be destroyed using this function. | |
| 808 * </pre> | |
| 809 */ | |
| 810 l_ok | |
| 811 recogDestroyDid(L_RECOG *recog) | |
| 812 { | |
| 813 l_int32 i; | |
| 814 L_RDID *did; | |
| 815 | |
| 816 if (!recog) | |
| 817 return ERROR_INT("recog not defined", __func__, 1); | |
| 818 | |
| 819 if ((did = recog->did) == NULL) return 0; | |
| 820 if (!did->counta || !did->delya) | |
| 821 return ERROR_INT("ptr array is null; shouldn't happen!", __func__, 1); | |
| 822 | |
| 823 for (i = 0; i < did->narray; i++) { | |
| 824 LEPT_FREE(did->counta[i]); | |
| 825 LEPT_FREE(did->delya[i]); | |
| 826 } | |
| 827 LEPT_FREE(did->setwidth); | |
| 828 LEPT_FREE(did->counta); | |
| 829 LEPT_FREE(did->delya); | |
| 830 LEPT_FREE(did->beta); | |
| 831 LEPT_FREE(did->gamma); | |
| 832 LEPT_FREE(did->trellisscore); | |
| 833 LEPT_FREE(did->trellistempl); | |
| 834 pixDestroy(&did->pixs); | |
| 835 numaDestroy(&did->nasum); | |
| 836 numaDestroy(&did->namoment); | |
| 837 numaDestroy(&did->natempl); | |
| 838 numaDestroy(&did->naxloc); | |
| 839 numaDestroy(&did->nadely); | |
| 840 numaDestroy(&did->nawidth); | |
| 841 boxaDestroy(&did->boxa); | |
| 842 numaDestroy(&did->nascore); | |
| 843 numaDestroy(&did->natempl_r); | |
| 844 numaDestroy(&did->nasample_r); | |
| 845 numaDestroy(&did->naxloc_r); | |
| 846 numaDestroy(&did->nadely_r); | |
| 847 numaDestroy(&did->nawidth_r); | |
| 848 numaDestroy(&did->nascore_r); | |
| 849 LEPT_FREE(did); | |
| 850 recog->did = NULL; | |
| 851 return 0; | |
| 852 } | |
| 853 | |
| 854 | |
| 855 /*------------------------------------------------------------------------* | |
| 856 * Various helpers * | |
| 857 *------------------------------------------------------------------------*/ | |
| 858 /*! | |
| 859 * \brief recogDidExists() | |
| 860 * | |
| 861 * \param[in] recog | |
| 862 * \return 1 if recog->did exists; 0 if not or on error. | |
| 863 */ | |
| 864 l_int32 | |
| 865 recogDidExists(L_RECOG *recog) | |
| 866 { | |
| 867 if (!recog) | |
| 868 return ERROR_INT("recog not defined", __func__, 0); | |
| 869 return (recog->did) ? 1 : 0; | |
| 870 } | |
| 871 | |
| 872 | |
| 873 /*! | |
| 874 * \brief recogGetDid() | |
| 875 * | |
| 876 * \param[in] recog | |
| 877 * \return did still owned by the recog, or NULL on error | |
| 878 * | |
| 879 * <pre> | |
| 880 * Notes: | |
| 881 * (1) This also makes sure the arrays are defined. | |
| 882 * </pre> | |
| 883 */ | |
| 884 L_RDID * | |
| 885 recogGetDid(L_RECOG *recog) | |
| 886 { | |
| 887 l_int32 i; | |
| 888 L_RDID *did; | |
| 889 | |
| 890 if (!recog) | |
| 891 return (L_RDID *)ERROR_PTR("recog not defined", __func__, NULL); | |
| 892 if ((did = recog->did) == NULL) | |
| 893 return (L_RDID *)ERROR_PTR("did not defined", __func__, NULL); | |
| 894 if (!did->counta || !did->delya) | |
| 895 return (L_RDID *)ERROR_PTR("did array ptrs not defined", | |
| 896 __func__, NULL); | |
| 897 for (i = 0; i < did->narray; i++) { | |
| 898 if (!did->counta[i] || !did->delya[i]) | |
| 899 return (L_RDID *)ERROR_PTR("did arrays not defined", | |
| 900 __func__, NULL); | |
| 901 } | |
| 902 | |
| 903 return did; | |
| 904 } | |
| 905 | |
| 906 | |
| 907 /*! | |
| 908 * \brief recogGetWindowedArea() | |
| 909 * | |
| 910 * \param[in] recog | |
| 911 * \param[in] index of template | |
| 912  * \param[in] x pixel position of left-hand edge of template | |
| 913 * \param[out] pdely y shift of template relative to pix1 | |
| 914 * \param[out] pwsum number of fg pixels in window of pixs | |
| 915 * \return 0 if OK, 1 on error | |
| 916 * | |
| 917 * <pre> | |
| 918 * Notes: | |
| 919 * (1) This is called after the best path has been found through | |
| 920 * the trellis, in order to produce a correlation that can be used | |
| 921 * to evaluate the confidence we have in the identification. | |
| 922 * The correlation is |1 & 2|^2 / (|1| * |2|). | |
| 923 * |1 & 2| is given by the count array, |2| is found from | |
| 924 * nasum_u[], and |1| is wsum returned from this function. | |
| 925 * </pre> | |
| 926 */ | |
| 927 static l_int32 | |
| 928 recogGetWindowedArea(L_RECOG *recog, | |
| 929 l_int32 index, | |
| 930 l_int32 x, | |
| 931 l_int32 *pdely, | |
| 932 l_int32 *pwsum) | |
| 933 { | |
| 934 l_int32 w1, h1, w2, h2; | |
| 935 PIX *pix1, *pix2, *pixt; | |
| 936 L_RDID *did; | |
| 937 | |
| 938 if (pdely) *pdely = 0; | |
| 939 if (pwsum) *pwsum = 0; | |
| 940 if (!pdely || !pwsum) | |
| 941 return ERROR_INT("&dely and &wsum not both defined", __func__, 1); | |
| 942 if (!recog) | |
| 943 return ERROR_INT("recog not defined", __func__, 1); | |
| 944 if ((did = recogGetDid(recog)) == NULL) | |
| 945 return ERROR_INT("did not defined", __func__, 1); | |
| 946 if (index < 0 || index >= did->narray) | |
| 947 return ERROR_INT("invalid index", __func__, 1); | |
| 948 pix1 = did->pixs; | |
| 949 pixGetDimensions(pix1, &w1, &h1, NULL); | |
| 950 if (x >= w1) | |
| 951 return ERROR_INT("invalid x position", __func__, 1); | |
| 952 | |
| 953 pix2 = pixaGetPix(recog->pixa_u, index, L_CLONE); | |
| 954 pixGetDimensions(pix2, &w2, &h2, NULL); | |
| 955 if (w1 < w2) { | |
| 956 L_INFO("template %d wider than pixs\n", __func__, index); | |
| 957 pixDestroy(&pix2); | |
| 958 return 0; | |
| 959 } | |
| 960 | |
| 961 *pdely = did->delya[index][x]; | |
| 962 pixt = pixCreate(w2, h1, 1); | |
| 963 pixRasterop(pixt, 0, *pdely, w2, h2, PIX_SRC, pix2, 0, 0); | |
| 964 pixRasterop(pixt, 0, 0, w2, h1, PIX_SRC & PIX_DST, pix1, x, 0); | |
| 965 pixCountPixels(pixt, pwsum, recog->sumtab); | |
| 966 pixDestroy(&pix2); | |
| 967 pixDestroy(&pixt); | |
| 968 return 0; | |
| 969 } | |
| 970 | |
| 971 | |
| 972 /*! | |
| 973 * \brief recogSetChannelParams() | |
| 974 * | |
| 975 * \param[in] recog | |
| 976 * \param[in] nlevels | |
| 977 * \return 0 if OK, 1 on error | |
| 978 * | |
| 979 * <pre> | |
| 980 * Notes: | |
| 981 * (1) This converts the independent bit-flip probabilities in the | |
| 982 * "channel" into log-likelihood coefficients on image sums. | |
| 983 * These coefficients are only defined for the non-background | |
| 984 * template levels. Thus for nlevels = 2 (one fg, one bg), | |
| 985 * only beta[1] and gamma[1] are used. For nlevels = 4 (three | |
| 986 * fg templates), we use beta[1-3] and gamma[1-3]. | |
| 987 * </pre> | |
| 988 */ | |
| 989 l_ok | |
| 990 recogSetChannelParams(L_RECOG *recog, | |
| 991 l_int32 nlevels) | |
| 992 { | |
| 993 l_int32 i; | |
| 994 const l_float32 *da; | |
| 995 L_RDID *did; | |
| 996 | |
| 997 if (!recog) | |
| 998 return ERROR_INT("recog not defined", __func__, 1); | |
| 999 if ((did = recogGetDid(recog)) == NULL) | |
| 1000 return ERROR_INT("did not defined", __func__, 1); | |
| 1001 if (nlevels == 2) | |
| 1002 da = DefaultAlpha2; | |
| 1003 else if (nlevels == 4) | |
| 1004 da = DefaultAlpha4; | |
| 1005 else | |
| 1006 return ERROR_INT("nlevels not 2 or 4", __func__, 1); | |
| 1007 | |
| 1008 for (i = 1; i < nlevels; i++) { | |
| 1009 did->beta[i] = log((1.0 - da[i]) / da[0]); | |
| 1010 did->gamma[i] = log(da[0] * da[i] / ((1.0 - da[0]) * (1.0 - da[i]))); | |
| 1011 /* lept_stderr("beta[%d] = %7.3f, gamma[%d] = %7.3f\n", | |
| 1012 i, did->beta[i], i, did->gamma[i]); */ | |
| 1013 } | |
| 1014 | |
| 1015 return 0; | |
| 1016 } | |
| 1017 | |
| 1018 | |
| 1019 /*! | |
| 1020 * \brief recogTransferRchToDid() | |
| 1021 * | |
| 1022 * \param[in] recog with rch and did defined | |
| 1023 * \param[in] x left edge of extracted region, relative to decoded line | |
| 1024 * \param[in] y top edge of extracted region, relative to input image | |
| 1025 * \return 0 if OK, 1 on error | |
| 1026 * | |
| 1027 * <pre> | |
| 1028 * Notes: | |
| 1029 * (1) This is used to transfer the results for a single character match | |
| 1030 * to the rescored did arrays. | |
| 1031 * </pre> | |
| 1032 */ | |
| 1033 static l_int32 | |
| 1034 recogTransferRchToDid(L_RECOG *recog, | |
| 1035 l_int32 x, | |
| 1036 l_int32 y) | |
| 1037 { | |
| 1038 L_RDID *did; | |
| 1039 L_RCH *rch; | |
| 1040 | |
| 1041 if (!recog) | |
| 1042 return ERROR_INT("recog not defined", __func__, 1); | |
| 1043 if ((did = recogGetDid(recog)) == NULL) | |
| 1044 return ERROR_INT("did not defined", __func__, 1); | |
| 1045 if ((rch = recog->rch) == NULL) | |
| 1046 return ERROR_INT("rch not defined", __func__, 1); | |
| 1047 | |
| 1048 numaAddNumber(did->natempl_r, rch->index); | |
| 1049 numaAddNumber(did->nasample_r, rch->sample); | |
| 1050 numaAddNumber(did->naxloc_r, rch->xloc + x); | |
| 1051 numaAddNumber(did->nadely_r, rch->yloc + y); | |
| 1052 numaAddNumber(did->nawidth_r, rch->width); | |
| 1053 numaAddNumber(did->nascore_r, rch->score); | |
| 1054 return 0; | |
| 1055 } |
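
The notes to recogGetWindowedArea() give the confidence measure as the correlation |1 & 2|^2 / (|1| * |2|), where |1 & 2| comes from the count array, |2| from nasum_u[], and |1| is the windowed foreground count wsum. The C sketch below shows only that arithmetic, assuming the three counts have already been computed as integers; correlation_score() is a hypothetical helper, not a Leptonica function.

```c
#include <assert.h>

/* Hypothetical helper (not part of Leptonica): normalized correlation
 *     |1 & 2|^2 / (|1| * |2|)
 * between an image window (1) and a template (2), from pixel counts:
 *   and_count    |1 & 2|  fg pixels common to window and template
 *   wsum         |1|      fg pixels in the image window
 *   templ_count  |2|      fg pixels in the template
 * Returns 0.0 if either count in the denominator is zero. */
static double
correlation_score(long and_count, long wsum, long templ_count)
{
    assert(and_count >= 0);
    if (wsum <= 0 || templ_count <= 0)
        return 0.0;
    /* Convert to double before multiplying to avoid integer overflow. */
    return ((double)and_count * (double)and_count) /
           ((double)wsum * (double)templ_count);
}
```

With and_count = 80 and wsum = templ_count = 100 the score is 6400 / 10000 = 0.64; a window whose three counts are equal and nonzero scores exactly 1.0.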
