comparison changes.txt @ 1:1d09e1dec1d9 upstream

ADD: PyMuPDF v1.26.4: the original sdist. It does not yet contain MuPDF. This normally will be downloaded when building PyMuPDF.
author Franz Glasner <fzglas.hg@dom66.de>
date Mon, 15 Sep 2025 11:37:51 +0200
parents
children a6bc019ac0b2
-1:000000000000 1:1d09e1dec1d9
1 Change Log
2 ==========
3
4
5 **Changes in version 1.26.4**
6
7 * Use MuPDF-1.26.7.
8
9 * Fixed issues:
10
11 * **Fixed** `3806 <https://github.com/pymupdf/PyMuPDF/issues/3806>`_: pdf to image rendering ignore optional content offs
12 * **Fixed** `4388 <https://github.com/pymupdf/PyMuPDF/issues/4388>`_: Incorrect PixMap from page due to cached data from other PDF
13 * **Fixed** `4457 <https://github.com/pymupdf/PyMuPDF/issues/4457>`_: Wrong characters displayed after font subsetting (w/ native method)
14 * **Fixed** `4462 <https://github.com/pymupdf/PyMuPDF/issues/4462>`_: delete_pages() does not accept a single int
15 * **Fixed** `4533 <https://github.com/pymupdf/PyMuPDF/issues/4533>`_: Open PDF error segmentation fault
16 * **Fixed** `4565 <https://github.com/pymupdf/PyMuPDF/issues/4565>`_: MacOS uses Tesseract and not Tesseract-OCR
17 * **Fixed** `4571 <https://github.com/pymupdf/PyMuPDF/issues/4571>`_: Broken merged pdfs.
18 * **Fixed** `4590 <https://github.com/pymupdf/PyMuPDF/issues/4590>`_: TypeError in utils.py scrub(): annot.update_file(buffer=...) is invalid
19 * **Fixed** `4614 <https://github.com/pymupdf/PyMuPDF/issues/4614>`_: Intercept bad widgets when inserting to another PDF
20 * **Fixed** `4639 <https://github.com/pymupdf/PyMuPDF/issues/4639>`_: pymupdf.mupdf.FzErrorGeneric: code=1: Director error: <class 'AttributeError'>: 'JM_new_bbox_device_Device' object has no attribute 'layer_name'
21
22 * Other:
23
24 * Check that #4392 `Segfault when running with pytest and -Werror` is fixed if PyMuPDF is built with swig>=4.4.
25 * Add `Page.clip_to_rect()`.
26 * Improved search for Tesseract data.
27 * Retrospectively mark #4496 as fixed in 1.26.1.
28 * Retrospectively mark #4503 as fixed in 1.26.3.
29 * Added experimental support for Graal.
30
31
32 **Changes in version 1.26.3 (2025-07-02)**
33
34 * Use MuPDF-1.26.3.
35
36 * Fixed issues:
37
38 * **Fixed** `4462 <https://github.com/pymupdf/PyMuPDF/issues/4462>`_: delete_pages() does not accept a single int
39 * **Fixed** `4503 <https://github.com/pymupdf/PyMuPDF/issues/4503>`_: Undetected character styles
40 * **Fixed** `4527 <https://github.com/pymupdf/PyMuPDF/issues/4527>`_: Rect.intersects() is much slower than necessary
41 * **Fixed** `4564 <https://github.com/pymupdf/PyMuPDF/issues/4564>`_: Possible encoding issue in PDF metadata
42 * **Fixed** `4575 <https://github.com/pymupdf/PyMuPDF/issues/4575>`_: Bug with IRect contains method
43
44 * Other:
45
46 * Class Shape is now available as pymupdf.Shape.
47 * Added table cell markdown support.
48
49
50 **Changes in version 1.26.2**
51
52 [Skipped.]
53
54
55 **Changes in version 1.26.1 (2025-06-11)**
56
57 * Use MuPDF-1.26.2.
58
59 * Fixed issues:
60
61 * **Fixed** `4520 <https://github.com/pymupdf/PyMuPDF/issues/4520>`_: show_pdf_page does not like empty pages created by new_page
62 * **Fixed** `4524 <https://github.com/pymupdf/PyMuPDF/issues/4524>`_: fitz.get_text ignores 'pages' kwarg
63 * **Fixed** `4412 <https://github.com/pymupdf/PyMuPDF/issues/4412>`_: Regression? Spurious error? in insert_pdf in v1.25.4
64 * **Fixed** `4496 <https://github.com/pymupdf/PyMuPDF/issues/4496>`_: pymupdf4llm with pymupdfpro
65
66 * Other:
67
68 * Partial fix for `4503 <https://github.com/pymupdf/PyMuPDF/issues/4503>`_: Undetected character styles
69 * New method `Document.rewrite_images()`, useful for reducing file size, changing image formats, or converting color spaces.
70 * `Page.get_text()`: restrict positional args to match docs.
71 * Removed bogus definition of class `Shape`.
72 * Removed release date from module, docs and changelog.
73 * `pymupdf.pymupdf_date` and `pymupdf.VersionDate` are now both None.
74 * They will be removed in a future release.
75
76
77 **Changes in version 1.26.0 (2025-05-22)**
78
79 * Use MuPDF-1.26.1.
80
81 * Fixed issues:
82
83 * **Fixed** `4324 <https://github.com/pymupdf/PyMuPDF/issues/4324>`_: cluster_drawings() fails to cluster horizontal and vertical thin lines
84 * **Fixed** `4363 <https://github.com/pymupdf/PyMuPDF/issues/4363>`_: Trouble with searching
85 * **Fixed** `4404 <https://github.com/pymupdf/PyMuPDF/issues/4404>`_: IndexError in page.get_links()
86 * **Fixed** `4412 <https://github.com/pymupdf/PyMuPDF/issues/4412>`_: Regression? Spurious error? in insert_pdf in v1.25.4
87 * **Fixed** `4423 <https://github.com/pymupdf/PyMuPDF/issues/4423>`_: pymupdf.mupdf.FzErrorFormat: code=7: cannot find object in xref error encountered after version 1.25.3
88 * **Fixed** `4435 <https://github.com/pymupdf/PyMuPDF/issues/4435>`_: get_pixmap method stuck on one page
89 * **Fixed** `4439 <https://github.com/pymupdf/PyMuPDF/issues/4439>`_: New Xml class from data does not work - bug in code
90 * **Fixed** `4445 <https://github.com/pymupdf/PyMuPDF/issues/4445>`_: Broken XREF table incorrectly repaired
91 * **Fixed** `4447 <https://github.com/pymupdf/PyMuPDF/issues/4447>`_: Stroke color of annotations cannot be correctly set
92 * **Fixed** `4479 <https://github.com/pymupdf/PyMuPDF/issues/4479>`_: set_layer_ui_config() toggles all layers rather than just one
93 * **Fixed** `4505 <https://github.com/pymupdf/PyMuPDF/issues/4505>`_: Follow Widget flag values up its parent structure
94
95 * Other:
96
97 * Partial fix for `4457 <https://github.com/pymupdf/PyMuPDF/issues/4457>`_: Wrong characters displayed after font subsetting (w/ native method)
98 * Support image stamp annotations.
99 * Support recoloring pages.
100 * Added example of using Django's file storage API to open files with pymupdf.
101 * Clarified FreeText annotation color options.
102 We now raise an exception if an attempt is made to set attributes that cannot be supported.
103 * Fixed potential segv in Pixmap.is_unicolor().
104 * Added runtime assert that PyMuPDF and MuPDF were built with compatible
105 NDEBUG settings (related to `4390 <https://github.com/pymupdf/PyMuPDF/issues/4390>`_).
106 * Simplified handling of filename/filetype when opening documents.
107 * Removed PDF linearization support.
108 * Calls to `Document.save()` with `linear` set to true will now raise an exception.
109 * See https://artifex.com/blog/mupdf-removes-linearisation for more information.
110
111 **Changes in version 1.25.5 (2025-03-31)**
112
113 * Fixed issues:
114
115 * **Fixed** `4372 <https://github.com/pymupdf/PyMuPDF/issues/4372>`_: Text insertion fails due to missing /Resources object
116 * **Fixed** `4400 <https://github.com/pymupdf/PyMuPDF/issues/4400>`_: Infinite loop in fill_textbox
117 * **Fixed** `4403 <https://github.com/pymupdf/PyMuPDF/issues/4403>`_: Unable to get_text() - layer/clip nesting too deep
118 * **Fixed** `4415 <https://github.com/pymupdf/PyMuPDF/issues/4415>`_: PDF page is mirrored, origin is at bottom-left
119
120 * Other:
121
122 * Use MuPDF-1.25.6.
123 * Fixed MuPDF SEGV on MacOS with particular fonts.
124 * Fixed `Annot.get_textpage()`'s `clip` arg.
125 * Fixed Python-3.14 (pre-release) build error.
126
127
128 **Changes in version 1.25.4 (2025-03-14)**
129
130 * Use MuPDF-1.25.5.
131
132 * Fixed issues:
133
134 * **Fixed** `4079 <https://github.com/pymupdf/PyMuPDF/issues/4079>`_: Unexpected result for apply_redactions()
135 * **Fixed** `4224 <https://github.com/pymupdf/PyMuPDF/issues/4224>`_: MuPDF error: format error: negative code in 1d faxd
136 * **Fixed** `4303 <https://github.com/pymupdf/PyMuPDF/issues/4303>`_: page.get_image_info() returns outdated cached results after replacing image
137 * **Fixed** `4309 <https://github.com/pymupdf/PyMuPDF/issues/4309>`_: FzErrorFormat Error When Deleting First Page
138 * **Fixed** `4336 <https://github.com/pymupdf/PyMuPDF/issues/4336>`_: Major Performance Regression: pix.color_count is 150x slower in version 1.25.3 compared to 1.23.8
139 * **Fixed** `4341 <https://github.com/pymupdf/PyMuPDF/issues/4341>`_: Invalid label retrieval when /Kids is an array of multiple /Nums
140
141 * Other:
142
143 * Fixed handling of duplicate widget names when joining PDFs (PR #4347).
144 * Improved Pyodide build.
145 * Avoid SWIG-related build errors with Python-3.13 by disabling PY_LIMITED_API.
146
147
148 **Changes in version 1.25.3 (2025-02-06)**
149
150 * Use MuPDF-1.25.4.
151
152 * Fixed issues:
153
154 * **Fixed** `4139 <https://github.com/pymupdf/PyMuPDF/issues/4139>`_: Text color numbers change between 1.24.14 and 1.25.0
155 * **Fixed** `4141 <https://github.com/pymupdf/PyMuPDF/issues/4141>`_: Some insertion methods fails for pages without a /Resources object
156 * **Fixed** `4180 <https://github.com/pymupdf/PyMuPDF/issues/4180>`_: Search problems
157 * **Fixed** `4182 <https://github.com/pymupdf/PyMuPDF/issues/4182>`_: Text coordinate extraction error
158 * **Fixed** `4245 <https://github.com/pymupdf/PyMuPDF/issues/4245>`_: Highlighting issue distorted on recent versions
159 * **Fixed** `4254 <https://github.com/pymupdf/PyMuPDF/issues/4254>`_: add_freetext_annot is drawing text outside the annotation box
160
161 * Other:
162
163 * In annotations:
164 * Added support for subtype FreeTextCallout.
165 * Added support for rich text.
166 * Added miter_limit arg to insert_text*() to allow suppression of spikes caused by long miters.
167 * Add Widget Support to `Document.insert_pdf()`.
168 * Add `bibi` to span dicts.
169 * Add `synthetic` to char dict.
170 * Fixed Pyodide builds.
171
172
173 **Changes in version 1.25.2 (2025-01-17)**
174
175 * Fixed issues:
176
177 * **Fixed** `4055 <https://github.com/pymupdf/PyMuPDF/issues/4055>`_: "Yes" for all checkboxes does not work for all PDF rendering engines.
178 * **Fixed** `4155 <https://github.com/pymupdf/PyMuPDF/issues/4155>`_: samples_mv is unsafe
179 * **Fixed** `4162 <https://github.com/pymupdf/PyMuPDF/issues/4162>`_: Got AttributeError, when tried to add Signature field
180 * **Fixed** `4186 <https://github.com/pymupdf/PyMuPDF/issues/4186>`_: Incorrect handling of JPEG with color space CMYK image extraction
181 * **Fixed** `4195 <https://github.com/pymupdf/PyMuPDF/issues/4195>`_: Pixmaps that are inverted and have an alpha channel are not rendered properly
182 * **Fixed** `4225 <https://github.com/pymupdf/PyMuPDF/issues/4225>`_: pixmap.pil_save() fails due to colorspace definition
183 * **Fixed** `4232 <https://github.com/pymupdf/PyMuPDF/issues/4232>`_: Incorrect Font style and Size
184
185 * Other:
186
187 * Use Python's built-in glyphname <> unicode conversion.
188 * Improve speed of pixmap color inversion.
189 * Add new `char_flags` member to span dictionary; for example, this allows detection of invisible text.
190 * Detect image masks in TextPage output.
191 * Added `Pixmap.pil_image()`.
192
193
194 **Changes in version 1.25.1 (2024-12-11)**
195
196 * Use MuPDF-1.25.2.
197
198 * Fixed issues:
199
200 * **Fixed** `4125 <https://github.com/pymupdf/PyMuPDF/issues/4125>`_: memory leak while convert Pixmap's colorspace
201 * **Fixed** `4034 <https://github.com/pymupdf/PyMuPDF/issues/4034>`_: Possible regression in pdf cleaning during save.
202
203
204 **Changes in version 1.25.0 (2024-12-05)**
205
206 * Use MuPDF-1.25.1.
207
208 * Fixed issues:
209
210 * **Fixed** `4026 <https://github.com/pymupdf/PyMuPDF/issues/4026>`_: page.get_text('blocks') output two piece of very similar text with different bbox
211 * **Fixed** `4004 <https://github.com/pymupdf/PyMuPDF/issues/4004>`_: Segmentation Fault When Updating PDF Form Field Value
212 * **Fixed** `3887 <https://github.com/pymupdf/PyMuPDF/issues/3887>`_: Subset Fonts problem using Fallback Font
213 * **Fixed** `3886 <https://github.com/pymupdf/PyMuPDF/issues/3886>`_: Another issue with destroying PDF when inserting html
214 * **Fixed** `3751 <https://github.com/pymupdf/PyMuPDF/issues/3751>`_: apply_redactions causes part of the page content to be hidden / transparent
215
216
217 .. codespell:ignore-begin
218
219 **Changes in version 1.24.14 (2024-11-19)**
220
221 * Use MuPDF-1.24.11.
222
223 * Fixed issues:
224
225 * **Fixed** `3448 <https://github.com/pymupdf/PyMuPDF/issues/3448>`_: get_pixmap function removes the table and leaves just the content behind
226 * **Fixed** `3758 <https://github.com/pymupdf/PyMuPDF/issues/3758>`_: Got "malloc(): unaligned tcache chunk detected Aborted (core dumped)" while using add_redact_annot/apply_redactions
227 * **Fixed** `3813 <https://github.com/pymupdf/PyMuPDF/issues/3813>`_: Stories: Ordered list count broken with nested unordered list
228 * **Fixed** `3933 <https://github.com/pymupdf/PyMuPDF/issues/3933>`_: font.valid_codepoints() - malfunction
229 * **Fixed** `4018 <https://github.com/pymupdf/PyMuPDF/issues/4018>`_: PyMuPDF hangs when iterating over zero page PDF pages backwards
230 * **Fixed** `4043 <https://github.com/pymupdf/PyMuPDF/issues/4043>`_: fullcopypage bug
231 * **Fixed** `4047 <https://github.com/pymupdf/PyMuPDF/issues/4047>`_: Segmentation Fault in add_redact_annot
232 * **Fixed** `4050 <https://github.com/pymupdf/PyMuPDF/issues/4050>`_: Content of dict returned by doc.embfile_info() does not fit to documentation
233
234 * Other:
235
236 * Ensure that words from `Page.get_text()` never contain RTL/LTR char mixtures.
237 * Fix building with system MuPDF.
238 * Add dot product for points and vectors.
239
240
241 **Changes in version 1.24.13 (2024-10-29)**
242
243 * Fixed issues:
244
245 * **Fixed** `3848 <https://github.com/pymupdf/PyMuPDF/issues/3848>`_: Piximap program crash
246 * **Fixed** `3950 <https://github.com/pymupdf/PyMuPDF/issues/3950>`_: Unable to consistently extract field labels from PDFs
247 * **Fixed** `3981 <https://github.com/pymupdf/PyMuPDF/issues/3981>`_: PyMuPDF 1.24.12 with pyinstaller throws error.
248 * **Fixed** `3994 <https://github.com/pymupdf/PyMuPDF/issues/3994>`_: pix.color_topusage raise Segmentation fault (core dumped)
249
250
251 **Changes in version 1.24.12 (2024-10-21)**
252
253 * Fixed issues:
254
255 * **Fixed** `3914 <https://github.com/pymupdf/PyMuPDF/issues/3914>`_: Ability to print MuPDF errors to logging instead of stdout
256 * **Fixed** `3916 <https://github.com/pymupdf/PyMuPDF/issues/3916>`_: insert_htmlbox error: int too large to convert to float
257 * **Fixed** `3950 <https://github.com/pymupdf/PyMuPDF/issues/3950>`_: Unable to consistently extract field labels from PDFs
258
259 * Supported Python versions are now 3.9-3.13.
260
261 * Dropped support for Python-3.8 because it has reached end-of-life.
262 * Added support for Python-3.13 now that it has been released.
263 * See: https://devguide.python.org/versions/
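The supported range stated above can be checked at runtime before the package is imported. The following is only a sketch: the bounds are copied from this changelog entry, and the helper name `python_supported` is hypothetical and not part of PyMuPDF.

```python
import sys

# Bounds taken from the 1.24.12 changelog entry (Python 3.9-3.13).
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 13)


def python_supported(version=None):
    """Return True if (major, minor) lies within the supported range.

    Hypothetical helper for illustration; not a PyMuPDF function.
    """
    if version is None:
        version = sys.version_info[:2]
    # Compare only major and minor, ignoring the micro version.
    return MIN_SUPPORTED <= tuple(version[:2]) <= MAX_SUPPORTED


print("Interpreter within supported range:", python_supported())
```

Comparing tuples keeps the check correct for two-digit minor versions, which a string comparison would get wrong.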
264
265
266 **Changes in version 1.24.11 (2024-10-03)**
267
268 * Use MuPDF-1.24.10.
269
270 * Fixed issues:
271
272 * **Fixed** `3624 <https://github.com/pymupdf/PyMuPDF/issues/3624>`_: Pdf file transform to image have a black block
273 * **Fixed** `3859 <https://github.com/pymupdf/PyMuPDF/issues/3859>`_: doc.need_appearances() fails with "AttributeError: module 'pymupdf.mupdf' has no attribute 'PDF_TRUE' "
274 * **Fixed** `3863 <https://github.com/pymupdf/PyMuPDF/issues/3863>`_: apply_redactions() does not work as expected
275 * **Fixed** `3905 <https://github.com/pymupdf/PyMuPDF/issues/3905>`_: open stream can raise a FzErrorFormat error instead of FileDataError
276
277 * Wheels now use the Python Stable ABI:
278
279 * There is one PyMuPDF wheel for each platform.
280 * Each wheel works with all supported Python versions.
281 * Each wheel is built using the oldest supported Python version (currently 3.8).
282 * There is no PyMuPDFb wheel.
283
284 * Other:
285
286 * Improvements to get_text_words() with sort=True.
287 * Tests now always get the latest versions of required Python packages.
288 * Removed dependency on setuptools.
289 * Added item to PyMuPDF-1.24.10 changes below - fix of #3630.
290
291
292 **Changes in version 1.24.10 (2024-09-02)**
293
294 * Use MuPDF-1.24.9.
295
296 * Fixed issues:
297
298 * **Fixed** `3450 <https://github.com/pymupdf/PyMuPDF/issues/3450>`_: get_pixmap function takes too long to process
299 * **Fixed** `3569 <https://github.com/pymupdf/PyMuPDF/issues/3569>`_: Invalid OCGs not ignored by SVG image creation
300 * **Fixed** `3603 <https://github.com/pymupdf/PyMuPDF/issues/3603>`_: ObjStm compression and PDF linearization doesn't work together
301 * **Fixed** `3650 <https://github.com/pymupdf/PyMuPDF/issues/3650>`_: Linebreak inserted between each letter
302 * **Fixed** `3661 <https://github.com/pymupdf/PyMuPDF/issues/3661>`_: Update Document to check the /XYZ len
303 * **Fixed** `3698 <https://github.com/pymupdf/PyMuPDF/issues/3698>`_: documentation issue - old code in the annotations documentation
304 * **Fixed** `3705 <https://github.com/pymupdf/PyMuPDF/issues/3705>`_: Document.select() behaves weirdly in some particular kind of pdf files
305 * **Fixed** `3706 <https://github.com/pymupdf/PyMuPDF/issues/3706>`_: extend Document.__getitem__ type annotation to reflect that the method also accepts slices
306 * **Fixed** `3727 <https://github.com/pymupdf/PyMuPDF/issues/3727>`_: Method get_pixmap() make the program exit without any exceptions or messages
307 * **Fixed** `3767 <https://github.com/pymupdf/PyMuPDF/issues/3767>`_: Cannot get Tessdata with Tesseract-OCR 5
308 * **Fixed** `3773 <https://github.com/pymupdf/PyMuPDF/issues/3773>`_: Link.set_border gives TypeError: '<' not supported between instances of 'NoneType' and 'int'
309 * **Fixed** `3774 <https://github.com/pymupdf/PyMuPDF/issues/3774>`_: `fitz.__version__` does not work anymore
310 * **Fixed** `3789 <https://github.com/pymupdf/PyMuPDF/issues/3789>`_: ValueError: not enough values to unpack (expected 3, got 2) is thrown when call insert_pdf
311 * **Fixed** `3820 <https://github.com/pymupdf/PyMuPDF/issues/3820>`_: class improves namedDest handling
312
313 * **Fixed** `3630 <https://github.com/pymupdf/PyMuPDF/issues/3630>`_: page.apply_redactions gives unwanted black rectangle
314
315 * Other:
316
317 * Object streams and linearization cannot be used together; attempting to do
318 so will raise an exception. (#3603)
319 * Fixed handling of non-existing /Contents object.
320
321
322 **Changes in version 1.24.9 (2024-07-24)**
323
324 * Use MuPDF-1.24.8.
325
326
327 **Changes in version 1.24.8 (2024-07-22)**
328
329 * Fixed issues:
330
331 * **Fixed** `3636 <https://github.com/pymupdf/PyMuPDF/issues/3636>`_: API documentation for the open function is not obvious to find.
332 * **Fixed** `3654 <https://github.com/pymupdf/PyMuPDF/issues/3654>`_: docx parsing was broken in 1.24.7
333 * **Fixed** `3677 <https://github.com/pymupdf/PyMuPDF/issues/3677>`_: Unable to extract subset font name using the newer versions of PyMuPDF : 1.24.6 and 1.24.7.
334 * **Fixed** `3687 <https://github.com/pymupdf/PyMuPDF/issues/3687>`_: Page.get_text results in AssertionError for epub files
335
336 * Other:
337
338 * Fixed various spelling mistakes spotted by codespell.
339 * Improved how we modify MuPDF's default configuration on Windows.
340 * Make text search work with ligatures.
341
342
343 **Changes in version 1.24.7 (2024-06-26)**
344
345 * Fixed issues:
346
347 * **Fixed** `3615 <https://github.com/pymupdf/PyMuPDF/issues/3615>`_: Document.pagemode or Document.pagelayout crashes for epub files
348 * **Fixed** `3616 <https://github.com/pymupdf/PyMuPDF/issues/3616>`_: not last version reported
349
350
351 **Changes in version 1.24.6 (2024-06-25)**
352
353 * Use MuPDF-1.24.4
354
355 * Fixed issues:
356
357 * **Fixed** `3599 <https://github.com/pymupdf/PyMuPDF/issues/3599>`_: Story.fit_width() has a weird line
358 * **Fixed** `3594 <https://github.com/pymupdf/PyMuPDF/issues/3594>`_: Garbled extraction for Amazon Sustainability Report
359 * **Fixed** `3591 <https://github.com/pymupdf/PyMuPDF/issues/3591>`_: 'width' in Page.get_drawings() returns width equal as 0
360 * **Fixed** `3561 <https://github.com/pymupdf/PyMuPDF/issues/3561>`_: ZeroDivisionError: float division by zero with page.apply_redactions()
361 * **Fixed** `3559 <https://github.com/pymupdf/PyMuPDF/issues/3559>`_: SegFault 11 when empty H1 H2 H3 H4 etc element is used in insert_htmlbox
362 * **Fixed** `3539 <https://github.com/pymupdf/PyMuPDF/issues/3539>`_: Add dotted gridline detection to table recognition
363 * **Fixed** `3519 <https://github.com/pymupdf/PyMuPDF/issues/3519>`_: get_toc(simple=False) AttributeError: 'Outline' object has no attribute 'rect'
364 * **Fixed** `3510 <https://github.com/pymupdf/PyMuPDF/issues/3510>`_: page.get_label() gets wrong label on the first page of doc
365 * **Fixed** `3494 <https://github.com/pymupdf/PyMuPDF/issues/3494>`_: 1.24.2/1.24.3: spurious characters introduced when using subset_fonts and insert_pdf
366 * **Fixed** `3470 <https://github.com/pymupdf/PyMuPDF/issues/3470>`_: subset_fonts error exit without exception/warning
367 * **Fixed** `3400 <https://github.com/pymupdf/PyMuPDF/issues/3400>`_: set_toc alters link coordinates for some rotated pages on pymupdf 1.24.2
368 * **Fixed** `3347 <https://github.com/pymupdf/PyMuPDF/issues/3347>`_: Incorrect links to points on pages having different heights
369 * **Fixed** `3237 <https://github.com/pymupdf/PyMuPDF/issues/3237>`_: Set_metadata() does not work
370 * **Fixed** `3493 <https://github.com/pymupdf/PyMuPDF/discussions/3493>`_: Isolate PyMuPDF from other libraries; issues when PyMuPDF is loaded with other libraries like GdkPixbuf
371
372 * Other:
373
374 * Fixed concurrent use of PyMuPDF caused by use of constant temporary filenames.
375
376 * Add musllinux x86_64 wheels to release.
377
378 * Added clearer version information:
379
380 * `pymupdf.pymupdf_version`.
381 * `pymupdf.mupdf_version`.
382 * `pymupdf.pymupdf_date`.
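The three attributes listed above can be read defensively, as in the sketch below. It assumes nothing beyond the attribute names given in this entry; the helper name `pymupdf_versions` is hypothetical. Missing attributes are reported as None, because the 1.26.1 entry states that `pymupdf.pymupdf_date` is now None and will be removed in a future release.

```python
import importlib


def pymupdf_versions():
    """Collect the version attributes named in the 1.24.6 entry.

    Returns None if PyMuPDF is not installed. Hypothetical helper,
    not part of PyMuPDF.
    """
    try:
        pymupdf = importlib.import_module("pymupdf")
    except ImportError:
        return None
    names = ("pymupdf_version", "mupdf_version", "pymupdf_date")
    # getattr with a default tolerates attributes that a later
    # release has removed.
    return {name: getattr(pymupdf, name, None) for name in names}


print(pymupdf_versions())
```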
383
384
385 **Changes in version 1.24.5 (2024-05-30)**
386
387 * Fixed issues:
388
389 * **Fixed** `3479 <https://github.com/pymupdf/PyMuPDF/issues/3479>`_: regression: fill_textbox: IndexError: pop from empty list
390 * **Fixed** `3488 <https://github.com/pymupdf/PyMuPDF/issues/3488>`_: set_toc method error
391
392 * Other:
393
394 * Some more fixes to use MuPDF floating formatting.
395 * Removed/disabled some unnecessary diagnostics.
396 * Fixed utils.do_links() crash.
397 * Experimental new functions `pymupdf.apply_pages()` and `pymupdf.get_text()`.
398 * Addresses wrong label generation for label styles "a" and "A".
399
400
401 **Changes in version 1.24.4 (2024-05-16)**
402
403 * **Fixed** `3418 <https://github.com/pymupdf/PyMuPDF/issues/3418>`_: Re-introduced bug, text align add_redact_annot
404 * **Fixed** `3472 <https://github.com/pymupdf/PyMuPDF/issues/3472>`_: insert_pdf gives SystemError
405
406 * Other:
407
408 * Fixed sysinstall test failing to remove all of prior installation before
409 new install.
410 * Fixed `utils.do_links()` crash.
411 * Correct `TextPage` creation code.
412 * Unified various diagnostics.
413 * Fix bug in `page_merge()`.
414
415
416 **Changes in version 1.24.3 (2024-05-09)**
417
418 *
419 The Python module is now called `pymupdf`. `fitz` is still supported for
420 backwards compatibility.
421
422 * Use MuPDF-1.24.2.
423
424 * Fixed issues:
425
426 * **Fixed** `3357 <https://github.com/pymupdf/PyMuPDF/issues/3357>`_: PyMuPDF==1.24.0 will hanging when using page.get_text("text")
427 * **Fixed** `3376 <https://github.com/pymupdf/PyMuPDF/issues/3376>`_: Redacting results are not as expected in 1.24.x.
428 * **Fixed** `3379 <https://github.com/pymupdf/PyMuPDF/issues/3379>`_: Documentation mismatch for get_text_blocks return value order.
429 * **Fixed** `3381 <https://github.com/pymupdf/PyMuPDF/issues/3381>`_: Contents stream contains floats in scientific notation
430 * **Fixed** `3402 <https://github.com/pymupdf/PyMuPDF/issues/3402>`_: Cannot add Widgets containing inter-field-calculation JavaScript
431 * **Fixed** `3414 <https://github.com/pymupdf/PyMuPDF/issues/3414>`_: missing attribute set_dpi()
432 * **Fixed** `3430 <https://github.com/pymupdf/PyMuPDF/issues/3430>`_: page.get_text() cause process freeze with certain pdf on v1.24.2
433
434 * Other:
435
436 * New/modified methods:
437
438 * `Page.remove_rotation()`: new, set page rotation to zero while keeping appearance.
439
440 * Fixed some problems when checking for PDF properties.
441 * Fixed pip builds from sdist
442 (see discussion `3360 <https://github.com/pymupdf/PyMuPDF/discussions/3360>`_:
443 Alpine linux docker build failing "No matching distribution found for pymupdfb==1.24.1").
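Code that must run against releases from both before and after the rename noted at the top of this entry can try the new module name first and fall back to the legacy one. This is a minimal sketch; the helper name `load_pymupdf` is hypothetical and not part of PyMuPDF.

```python
import importlib


def load_pymupdf():
    """Return the PyMuPDF module, preferring the new name `pymupdf`.

    Falls back to the legacy name `fitz`; returns None if neither
    can be imported. Hypothetical helper, not part of PyMuPDF.
    """
    for name in ("pymupdf", "fitz"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


mupdf_module = load_pymupdf()
if mupdf_module is None:
    print("PyMuPDF is not installed")
else:
    print("Loaded PyMuPDF as", mupdf_module.__name__)
```

Trying `pymupdf` first means the legacy name is only imported on installations that predate the rename.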
444
445
446 **Changes in version 1.24.2 (2024-04-17)**
447
448 * Removed obsolete classic implementation from releases
449 (previously available as module `fitz_old`).
450
451 * Fixed issues:
452
453 * **Fixed** `3331 <https://github.com/pymupdf/PyMuPDF/issues/3331>`_: Document.pages() is incorrectly type-hinted
454 * **Fixed** `3354 <https://github.com/pymupdf/PyMuPDF/issues/3354>`_: PyMuPDF==1.24.1: AttributeError: property 'metadata' of 'Document' object has no setter
455
456 * Other:
457
458 * New/modified methods:
459
460 * `Document.bake()`: new, make annotations / fields permanent content.
461 * `Page.cluster_drawings()`: new, identifies drawing items
462 (i.e. vector graphics or line-art)
463 that belong together based on their geometrical vicinity.
464 * `Page.apply_redactions()`: added new parameter `text`.
465 * `Document.subset_fonts()`: use MuPDF's `pdf_subset_fonts()` instead of PyMuPDF code.
466
467 * The `Document` class now supports page numbers specified as slices.
468 * Avoid causing MuPDF warnings.
469
470
471 **Changes in version 1.24.1 (2024-04-02)**
472
473 * Fixed issues:
474
475 * **Fixed** `3278 <https://github.com/pymupdf/PyMuPDF/issues/3278>`_: apply_redactions moves some unredacted text
476 * **Fixed** `3301 <https://github.com/pymupdf/PyMuPDF/issues/3301>`_: Be more permissive when classifying links as kind LINK_URI
477 * **Fixed** `3306 <https://github.com/pymupdf/PyMuPDF/issues/3306>`_: Text containing capital 'ET' not appearing as annotation
478
479 * Other:
480
481 * Use MuPDF-1.24.1.
482 * Support ObjStm Compression.
483 Methods `Document.save()`, `Document.ez_save()` and `Document.write()`
484 now support new parameters `use_objstm`, `compression_effort` and
485 `preserve_metadata`.
486
487
488 **Changes in version 1.24.0 (2024-03-21)**
489
490 * Fixed issues:
491
492 * **Fixed** `3281 <https://github.com/pymupdf/PyMuPDF/issues/3281>`_: Preparing metadata (pyproject.toml) did not run successfully
493 * **Fixed** `3279 <https://github.com/pymupdf/PyMuPDF/issues/3279>`_: PyMuPDF no longer builds in Alpine Linux
494 * **Fixed** `3257 <https://github.com/pymupdf/PyMuPDF/issues/3257>`_: apply_redactions() deleting text outside of annoted box
495 * **Fixed** `3216 <https://github.com/pymupdf/PyMuPDF/issues/3216>`_: AttributeError: 'Annot' object has no attribute '__del__'
496 * **Fixed** `3207 <https://github.com/pymupdf/PyMuPDF/issues/3207>`_: get_drawings's items is missing line from h path operator
497 * **Fixed** `3201 <https://github.com/pymupdf/PyMuPDF/issues/3201>`_: Memory leaks when merging PDFs
498 * **Fixed** `3197 <https://github.com/pymupdf/PyMuPDF/issues/3197>`_: page.get_text() returns hexadecimal text for some characters
499 * **Fixed** `3196 <https://github.com/pymupdf/PyMuPDF/issues/3196>`_: Remove text not working in 1.23.25 version vs 1.20.2
500 * **Fixed** `3172 <https://github.com/pymupdf/PyMuPDF/issues/3172>`_: PDF's 45º lines dissapearing in png conversion
501 * **Fixed** `3135 <https://github.com/pymupdf/PyMuPDF/issues/3135>`_: Do not log warnings to stdout
502 * **Fixed** `3125 <https://github.com/pymupdf/PyMuPDF/issues/3125>`_: get_pixmap method stuck on one page and runs forever
503 * **Fixed** `2964 <https://github.com/pymupdf/PyMuPDF/issues/2964>`_: There is an issue with the image generated by the page.get_pixmap() function
504
505 * Other:
506
507 * Use MuPDF-1.24.0.
508 * Add support for redacting vector graphics.
509 * Several fixes for table module
510
511 * Add new method for outputting the table as a markdown string.
512
513 * Address errors in computing the table header object:
514
515 We now allow None as the cell value, because this will be resolved where
516 needed (e.g. in the pandas DataFrame).
517
518 We previously tried to enforce rect-like tuples in all header cell
519 bboxes, however this fails for tables with all-None columns. This fix
520 enables this and constructs an empty string in the corresponding cell
521 string.
522
523 We now correctly include start / stop points of lines in the bbox of the
524 clustered graphic. We previously joined the line's rectangle - which had
525 no effect because this is always empty.
526
527 * Improved exception text if we fail to open document.
528 * Fixed build with new libclang 18.
529
530
531 **Changes in version 1.23.26 (2024-02-29)**
532
533 * Fixed issues:
534
535 * **Fixed** `3199 <https://github.com/pymupdf/PyMuPDF/issues/3199>`_: Add entry_points to setuptools configuration to provide command-line console scripts
536 * **Fixed** `3209 <https://github.com/pymupdf/PyMuPDF/issues/3209>`_: Empty vertices in ink annotation
537
538 * Other:
539
540 * Improvements to table detection:
541
542 * Improved check for empty tables, fixes bugs when determining table headers.
543 * Improved computation of enveloping vector graphic rectangles.
544 * Ignore more meaningless "pseudo" tables
545
546 * Install command-line 'pymupdf' command that runs fitz/__main__.py.
547 * Don't overwrite MuPDF's config.h when building on non-Windows.
548 * Fix `Story` constructor's `archive` arg to match docs - now accepts a single `Archive` constructor arg.
549 * Do not include MuPDF source in sdist; will be downloaded automatically when building.
550
551
552 **Changes in version 1.23.25 (2024-02-20)**
553
554 * Fixed issues:
555
556 * **Fixed** `3182 <https://github.com/pymupdf/PyMuPDF/issues/3182>`_: Pixmap.invert_irect argument type error
557 * **Fixed** `3186 <https://github.com/pymupdf/PyMuPDF/issues/3186>`_: extractText() extracts broken text from pdf
558 * **Fixed** `3191 <https://github.com/pymupdf/PyMuPDF/issues/3191>`_: Error on .find_tables()
559
560 * Other:
561
562 * When building, be able to specify python-config directly, with environment
563 variable `PIPCL_PYTHON_CONFIG`.
564
565
566 **Changes in version 1.23.24 (2024-02-19)**
567
568 * Fixed issues:
569
570 * **Fixed** `3148 <https://github.com/pymupdf/PyMuPDF/issues/3148>`_: Table extraction - vertical text not handled correctly
571 * **Fixed** `3179 <https://github.com/pymupdf/PyMuPDF/issues/3179>`_: Table Detection: Incorrect Separation of Vector Graphics Clusters
572 * **Fixed** `3180 <https://github.com/pymupdf/PyMuPDF/issues/3180>`_: Cannot show optional content group: AttributeError: module 'fitz.mupdf' has no attribute 'pdf_array_push_drop'
573
574 * Other:
575
576 * Be able to test system install using `sudo pip install` instead of a venv.
577
578
579 **Changes in version 1.23.23 (2024-02-18)**
580
581 * Fixed issues:
582
583 * **Fixed** `3126 <https://github.com/pymupdf/PyMuPDF/issues/3126>`_: Initialising Archive with a pathlib.Path fails.
584 * **Fixed** `3131 <https://github.com/pymupdf/PyMuPDF/issues/3131>`_: Calling the next attribute of an Annot raises a "No attribute .parent" warning
585 * **Fixed** `3134 <https://github.com/pymupdf/PyMuPDF/issues/3134>`_: Using an IRect as clip parameter in Page.get_pixmap no longer works since 1.23.9
586 * **Fixed** `3140 <https://github.com/pymupdf/PyMuPDF/issues/3140>`_: PDF document stays in use after closing
587 * **Fixed** `3150 <https://github.com/pymupdf/PyMuPDF/issues/3150>`_: doc.select() hangs on this doc.
588 * **Fixed** `3163 <https://github.com/pymupdf/PyMuPDF/issues/3163>`_: AssertionError on using fitz.IRect
589 * **Fixed** `3177 <https://github.com/pymupdf/PyMuPDF/issues/3177>`_: fitz.Pixmap(None, pix) Unrecognised args for constructing Pixmap
590
591 * Other:
592
593 *
594 Improved `Document.select()` by using new MuPDF function
595 `pdf_rearrange_pages()`. This is a more complete (and faster)
596 implementation of what needs to be done here in that not only pages will
597 be rearranged, but also consequential changes will be made to the table
598 of contents, links to removed pages and affected entries in the Optional
599 Content definitions.
600 * `TextWriter.appendv()`: added `small_caps` arg.
601 * Fixed some valgrind errors with MuPDF master.
602 * Fixed `Document.insert_image()` when built with MuPDF master.
603
604
605 **Changes in version 1.23.22 (2024-02-12)**
606
607 * Fixed issues:
608
609 * **Fixed** `3143 <https://github.com/pymupdf/PyMuPDF/issues/3143>`_: Difference in decoding of OCGs names between doc.get_ocgs() and page.get_drawings()
610
611 * **Fixed** `3139 <https://github.com/pymupdf/PyMuPDF/issues/3139>`_: Pixmap resizing needs positional arg "clip" - even if None.
612
613 * Other:
614
615 * Removed the use of MuPDF function `fz_image_size()` from PyMuPDF.
616
617
618 **Changes in version 1.23.21 (2024-02-01)**
619
620 * Fixed issues:
621
622 * Other:
623
624 * Fixed bug in set_xml_metadata(), PR `3112 <https://github.com/pymupdf/PyMuPDF/pull/3112>`_: Fix pdf_add_stream metadata error
625 * Fixed lack of `.parent` member in `TextPage` from `Annot.get_textpage()`.
626 * Fixed bug in `Page.add_widget()`.
627
628
629 **Changes in version 1.23.20 (2024-01-29)**
630
631 * Bug fixes:
632
633 * **Fixed** `3100 <https://github.com/pymupdf/PyMuPDF/issues/3100>`_: Wrong internal property accessed in get_xml_metadata
634
635 * Other:
636
637 * Significantly improved speed of `Document.get_toc()`.
638
639
640 **Changes in version 1.23.19 (2024-01-25)**
641
642 * Bug fixes:
643
644 * **Fixed** `3087 <https://github.com/pymupdf/PyMuPDF/issues/3087>`_: Exception in insert_image with mask specified
645 * **Fixed** `3094 <https://github.com/pymupdf/PyMuPDF/issues/3094>`_: TypeError: '<' not supported between instances of 'FzLocation' and 'int' in doc.delete_pages
646
647 * Other:
648
649 * When finding tables:
650
651 * Allow addition of user-defined "virtual" vector graphics when finding tables.
652 * Confirm that the enveloping bboxes of vector graphics are inside the clip rectangle.
653 * Avoid slow finding of rectangle intersections.
654
655 * Added `Font.bbox` property.
656
657
658 **Changes in version 1.23.18 (2024-01-23)**
659
660 * Bug fixes:
661
662 * **Fixed** `3081 <https://github.com/pymupdf/PyMuPDF/issues/3081>`_: doc.close() not closing the document
663
664 * Other:
665
666 * Reduced size of sdist to fit on pypi.org (by reducing size of two test files).
667 * Fix `Annot.file_info()` if no `Desc` item.
668
669
670 **Changes in version 1.23.17 (2024-01-22)**
671
672 * Bug fixes:
673
674 * **Fixed** `3062 <https://github.com/pymupdf/PyMuPDF/issues/3062>`_: page_rotation_reset does not return page to original rotation
675 * **Fixed** `3070 <https://github.com/pymupdf/PyMuPDF/issues/3070>`_: update_link(): AttributeError: 'Page' object has no attribute 'super'
676
677 * Other:
678
679 * Fixed bug in `Page.links()` (PR #3075).
680 * Fixed bug in `Page.get_bboxlog()` with layers.
681 * Add support for timeouts in scripts/ and tests/run_compound.py.
682
683
684 **Changes in version 1.23.16 (2024-01-18)**
685
686 * Bug fixes:
687
688 * **Fixed** `3058 <https://github.com/pymupdf/PyMuPDF/issues/3058>`_: Pixmap created from CMYK JPEG delivers RGB format
689
690 * Other:
691
692 * In table detection strategy "lines_strict", exclude fill-only vector graphics.
693 * Fixed sysinstall test failure.
694 * In documentation, update feature matrix with item about text writing.
695
696
697 **Changes in version 1.23.15 (2024-01-16)**
698
699 * Bug fixes:
700
701 * **Fixed** `3050 <https://github.com/pymupdf/PyMuPDF/issues/3050>`_: python3.9 pix.set_pixel has something wrong in c.append( ord(i))
702
703 * Other:
704
705 * Improved docs for Page.find_tables().
706
707
708 **Changes in version 1.23.14 (2024-01-15)**
709
710 * Bug fixes:
711
712 * **Fixed** `3038 <https://github.com/pymupdf/PyMuPDF/issues/3038>`_: JM_pixmap_from_display_list > Assertion Error : Checking for wrong type
713 * **Fixed** `3039 <https://github.com/pymupdf/PyMuPDF/issues/3039>`_: Issue with doc.close() not closing the document in PyMuPDF
714
715 * Other:
716
717 * Ensure valid "re" rectangles in `Page.get_drawings()` with derotated pages.
718
719
720 **Changes in version 1.23.13 (2024-01-15)**
721
722 * Bug fixes:
723
724 * **Fixed** `2979 <https://github.com/pymupdf/PyMuPDF/issues/2979>`_: list index out of range in to_pandas()
725 * **Fixed** `3001 <https://github.com/pymupdf/PyMuPDF/issues/3001>`_: Calling find_tables() on one document alters the bounding boxes of a subsequent document
726
727 * Other:
728
729 * Fixed `Rect.height` and `Rect.width` to never return negative values.
730 * Fixed `TextPage.extractIMGINFO()`'s returned `dictkey_yres` value.
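
The `Rect.height` / `Rect.width` item above can be pictured with a small sketch. This is not PyMuPDF's code: `SketchRect` is a hypothetical stand-in class, and clamping at zero is only one plausible way to guarantee non-negative results for an inverted rectangle.

```python
from dataclasses import dataclass


@dataclass
class SketchRect:
    """Hypothetical stand-in for a rectangle; not PyMuPDF's Rect class."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        # Clamp at zero so an inverted rectangle (x1 < x0) reports no width.
        return max(0.0, self.x1 - self.x0)

    @property
    def height(self) -> float:
        # Same clamping for the vertical extent (y1 < y0).
        return max(0.0, self.y1 - self.y0)


normal = SketchRect(10, 20, 110, 70)
inverted = SketchRect(110, 70, 10, 20)
print(normal.width, normal.height)
print(inverted.width, inverted.height)
```

With this approach, callers that multiply width by height never see a negative extent, even when the corner coordinates are swapped.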
731
732
733 **Changes in version 1.23.12 (2024-01-12)**
734
735 * * **Fixed** `3027 <https://github.com/pymupdf/PyMuPDF/issues/3027>`_: Page.get_text throws Attribute Error for 'parent'
736
737
738 **Changes in version 1.23.11 (2024-01-12)**
739
740 * Fixed some Pixmap construction bugs.
741 * Fixed Pixmap.yres().
742
743
744 **Changes in version 1.23.10 (2024-01-12)**
745
746 * Bug fixes:
747
748 * **Fixed** `3020 <https://github.com/pymupdf/PyMuPDF/issues/3020>`_: Can't resize a PixMap
749
750 * Other:
751
752 * Fixed Page.delete_image().
753
754
755 **Changes in version 1.23.9 (2024-01-11)**
756
757 * Default to new "rebased" implementation.
758
759 * The old "classic" implementation is available with `import fitz_old as fitz`.
760 * For more information about why we are changing to the rebased implementation,
761 see: https://github.com/pymupdf/PyMuPDF/discussions/2680
762
763 * Use MuPDF-1.23.9.
764
765 * Bug fixes (rebased implementation only):
766
767 * **Fixed** `2911 <https://github.com/pymupdf/PyMuPDF/issues/2911>`_: Page.derotation_matrix returns a tuple instead of a Matrix with rebased implementation
768 * **Fixed** `2919 <https://github.com/pymupdf/PyMuPDF/issues/2919>`_: Rebased version: KeyError in resolve_names when merging pdfs
769 * **Fixed** `2922 <https://github.com/pymupdf/PyMuPDF/issues/2922>`_: New feature that allows inserting named-destination links doesn't work
770 * **Fixed** `2943 <https://github.com/pymupdf/PyMuPDF/issues/2943>`_: ZeroDivisionError: float division by zero when use apply_redactions()
771 * **Fixed** `2950 <https://github.com/pymupdf/PyMuPDF/issues/2950>`_: Shelling out to pip during tests is problematic
772 * **Fixed** `2954 <https://github.com/pymupdf/PyMuPDF/issues/2954>`_: Replacement unicode character in text extraction
773 * **Fixed** `2957 <https://github.com/pymupdf/PyMuPDF/issues/2957>`_: apply_redactions() moving text
774 * **Fixed** `2961 <https://github.com/pymupdf/PyMuPDF/issues/2961>`_: Passing a string as a page number raises IndexError instead of TypeError.
775 * **Fixed** `2969 <https://github.com/pymupdf/PyMuPDF/issues/2969>`_: annot.next throws AttributeError
776 * **Fixed** `2978 <https://github.com/pymupdf/PyMuPDF/issues/2978>`_: 1.23.9rc1: module 'fitz.mupdf' has no attribute 'fz_copy_pixmap_rect'
777
778 * **Fixed** `2907 <https://github.com/pymupdf/PyMuPDF/issues/2907>`_: segfault trying to call clean_contents on certain pdfs with python 3.12
779 * **Fixed** `2905 <https://github.com/pymupdf/PyMuPDF/issues/2905>`_: SystemError: <built-in function TextPage_extractIMGINFO> returned a result with an exception set
780 * **Fixed** `2742 <https://github.com/pymupdf/PyMuPDF/issues/2742>`_: Segmentation Fault when inserting three (but not two) copies of the same source page into one destination page
781
782 * Other:
783
784 * Add optional setting of opacity to `Page.insert_htmlbox()`.
785 * Fixed issue with add_redact_annot() mentioned in #2934.
786 * Fixed `Page.rotation()` to return 0 for non-PDF documents instead of raising an exception.
787 * Fixed internal quad detection to cope with any Python sequence.
788 * Fixed rebased `fitz.pymupdf_version_tuple` - was previously set to mupdf version.
789 * Improved support for Linux system installs, including adding regular testing on Github.
790 * Add missing `flake8` to `scripts/gh_release.py:test_packages`.
791 * Use newly public functions in MuPDF-1.23.8.
792 * Improved `scripts/test.py` to help investigation of MuPDF issues.
793
794
795 **Changes in version 1.23.8 (2023-12-19)**
796
797 * Bug fixes (rebased implementation only):
798
799 * **Fixed** `2634 <https://github.com/pymupdf/PyMuPDF/issues/2634>`_: get_toc and set_toc do not behave consistently for rotated pages
800 * **Fixed** `2861 <https://github.com/pymupdf/PyMuPDF/issues/2861>`_: AttributeError in getLinkDict during PDF Merge
801 * **Fixed** `2871 <https://github.com/pymupdf/PyMuPDF/issues/2871>`_: KeyError in getLinkDict during PDF merge
802 * **Fixed** `2886 <https://github.com/pymupdf/PyMuPDF/issues/2886>`_: Error in Skeleton for Named Link Destinations
803
804 * Bug fixes (rebased and classic implementations):
805
806 * **Fixed** `2885 <https://github.com/pymupdf/PyMuPDF/issues/2885>`_: pymupdf find tables too slow
807
808 * Other:
809
810 * Rebased implementation:
811
812 * `Page.insert_htmlbox()`: new, much more powerful alternative to `Page.insert_textbox()` or `TextWriter.fill_textbox()`, using `Story`.
813 * `Story.fit*()`: new methods for fitting a Story into an expanded rect.
814 * `Story.write_with_links()`: add support for external links.
815 * `Document.language()`: fixed to use MuPDF's new `mupdf.fz_string_from_text_language2()`.
816 * `Document.subset_fonts()` - fixed.
817 * Fixed internal `Archive._add_treeitem()` method.
818 * Fixed `fitz_new.__doc__` to contain PyMuPDF and Python version information, and OS name.
819 * Removed use of `(*args, **kwargs)` in API, we now specify keyword args explicitly.
820 * Work with new MuPDF Python exception classes.
821
822 * Fixed bug where `button_states()` returns None when `/AP` points to an indirect object.
823 * Fixed pillow test to not ignore all errors, and install pillow when testing.
824 * Added test for `fitz.css_for_pymupdf_font()` (uses package `pymupdf-fonts`).
825 * Simplified Github Actions test specifications.
826 * Updated `tests/README.md`.
827
828
829 **Changes in version 1.23.7 (2023-11-30)**
830
831 * Bug fixes in rebased implementation, not fixed in classic implementation:
832
833 * **Fixed** `2232 <https://github.com/pymupdf/PyMuPDF/issues/2232>`_: Geometry helper classes should support keyword arguments
834 * **Fixed** `2788 <https://github.com/pymupdf/PyMuPDF/issues/2788>`_: Problem with get_toc in pymupdf 1.23.6
835 * **Fixed** `2791 <https://github.com/pymupdf/PyMuPDF/issues/2791>`_: Experiencing small memory leak in save()
836
837 * Bug fixes (rebased and classic implementations):
838
839 * **Fixed** `2736 <https://github.com/pymupdf/PyMuPDF/issues/2736>`_: Failure when set cropbox with mediabox negative value
840 * **Fixed** `2749 <https://github.com/pymupdf/PyMuPDF/issues/2749>`_: RuntimeError: cycle in structure tree
841 * **Fixed** `2753 <https://github.com/pymupdf/PyMuPDF/issues/2753>`_: Story.write_with_links will ignore everything after the first "page break" in the HTML.
842 * **Fixed** `2812 <https://github.com/pymupdf/PyMuPDF/issues/2812>`_: find_tables on landscape page generates reversed text
843 * **Fixed** `2829 <https://github.com/pymupdf/PyMuPDF/issues/2829>`_: [cannot create /Annot for kind] is still printed despite #2345 is closed.
844 * **Fixed** `2841 <https://github.com/pymupdf/PyMuPDF/issues/2841>`_: Unexpected KeyError when using scrub with fitz_new
845
846 * Use MuPDF-1.23.7.
847
848 * Other:
849
850 * Rebased implementation:
851
852 * Added flake8 code checking to test suite, and made various fixes.
853 * Disable diagnostics during Document constructor to match classic implementation.
854
855 * Additional fix to `2553 <https://github.com/pymupdf/PyMuPDF/issues/2553>`_: Invalid characters in versions >= 1.22
856 * Fixed `MuPDF Bug 707324 <https://bugs.ghostscript.com/show_bug.cgi?id=707324>`_: Story: HTML table row background color repeated incorrectly
857 * Added `scripts/test.py`, for simple build+test of PyMuPDF git checkout.
858 * Added `fitz.pymupdf_version_tuple`, e.g. `(1, 23, 6)`.
859 * Restored mistakenly-reverted fix for `2345 <https://github.com/pymupdf/PyMuPDF/issues/2345>`_: Turn off print statements in utils.py
860 * Include any trailing `... repeated <N> times...` text in warnings returned by `mupdf_warnings()` (rebased only).
861
862
863
864 **Changes in version 1.23.6 (2023-11-06)**
865
866 * Bug fixes:
867
868 * **Fixed** `2553 <https://github.com/pymupdf/PyMuPDF/issues/2553>`_: Invalid characters in versions >= 1.22
869 * **Fixed** `2608 <https://github.com/pymupdf/PyMuPDF/issues/2608>`_: Incorrect utf32 text extraction (high & low surrogates are split)
870 * **Fixed** `2710 <https://github.com/pymupdf/PyMuPDF/issues/2710>`_: page.rect and text location wrong / differing from older version
871 * **Fixed** `2774 <https://github.com/pymupdf/PyMuPDF/issues/2774>`_: wrong encoding for "\?" character when sort=True
872 * **Fixed** `2775 <https://github.com/pymupdf/PyMuPDF/issues/2775>`_: fitz_new does not work with python3.10 or earlier
873 * **Fixed** `2777 <https://github.com/pymupdf/PyMuPDF/issues/2777>`_: With fitz_new, wrong type for Page.mediabox
874
875 * Other:
876
877 * Use MuPDF-1.23.5.
878 * Added Document.resolve_names() (rebased implementation only).
879
880
881 **Changes in version 1.23.5 (2023-10-11)**
882
883 * Bug fixes:
884
885 * **Fixed** `2341 <https://github.com/pymupdf/PyMuPDF/issues/2341>`_: Handling negative values in the zoom section for LINK_GOTO in linkDest
886 * **Fixed** `2522 <https://github.com/pymupdf/PyMuPDF/issues/2522>`_: Typo in set_layer() - NameError: name 'f' is not defined
887 * **Fixed** `2548 <https://github.com/pymupdf/PyMuPDF/issues/2548>`_: Fitz freezes on some PDFs when calling the fitz.Page.get_text_blocks method.
888 * **Fixed** `2596 <https://github.com/pymupdf/PyMuPDF/issues/2596>`_: save(garbage=3) breaks get_pixmap() with side effect
889 * **Fixed** `2635 <https://github.com/pymupdf/PyMuPDF/issues/2635>`_: "clean=True" makes objects invisible in the pdf
890 * **Fixed** `2637 <https://github.com/pymupdf/PyMuPDF/issues/2637>`_: Page.insert_textbox incorrectly handles the last word if it starts a new line
891 * **Fixed** `2699 <https://github.com/pymupdf/PyMuPDF/issues/2699>`_: extract paragraph with below table
892 * **Fixed** `2703 <https://github.com/pymupdf/PyMuPDF/issues/2703>`_: Wrong fontsize calculation in corner cases ("page.get_texttrace()")
893 * **Fixed** `2710 <https://github.com/pymupdf/PyMuPDF/issues/2710>`_: page.rect and text location wrong / differing from older version
894 * **Fixed** `2723 <https://github.com/pymupdf/PyMuPDF/issues/2723>`_: When will a Python 3.12 wheel be available?
895 * **Fixed** `2730 <https://github.com/pymupdf/PyMuPDF/issues/2730>`_: persistent get_text() formatting
896
897 * Other:
898
899 * Use MuPDF-1.23.4.
900 * Fix optimisation flags with system installs.
901 * Fixed the problem that the clip parameter does not take effect during table recognition
902 * Support Pillow mode "RGBa"
903 * Support extra word delimiters
904 * Support checking valid PDF name objects
905
906
907 **Changes in version 1.23.4 (2023-09-26)**
908
909 * Improved build instructions.
910 * Fixed Tesseract in rebased implementation.
911 * Improvements to build/install with system MuPDF.
912 * Fixed Pyodide builds.
913 * Fixed rebased bug in _insert_image().
914
915 * Bug fixes:
916
917 * **Fixed** `2556 <https://github.com/pymupdf/PyMuPDF/issues/2556>`_: Segmentation fault at caling get_cdrawings(extended=True)
918 * **Fixed** `2637 <https://github.com/pymupdf/PyMuPDF/issues/2637>`_: Page.insert_textbox incorrectly handles the last word if it starts a new line
919 * **Fixed** `2683 <https://github.com/pymupdf/PyMuPDF/issues/2683>`_: Windows sdist build failure - non-quoting of path and using UNIX which command
920 * **Fixed** `2691 <https://github.com/pymupdf/PyMuPDF/issues/2691>`_: Page.get_textpage_ocr() bug in rebased fitz_new version
921 * **Fixed** `2692 <https://github.com/pymupdf/PyMuPDF/issues/2692>`_: Page.get_pixmap(clip=Rect()) bug in rebased fitz_new version
922
923
924 **Changes in version 1.23.3 (2023-08-31)**
925
926 * Fixed use of Tesseract for OCR.
927
928
929 **Changes in version 1.23.2 (2023-08-28)**
930
931 * **Fixed** `#2613 <https://github.com/pymupdf/PyMuPDF/issues/2613>`_: release 1.23.0 not MacOS-arm64 compatible
932
933
934 **Changes in version 1.23.1 (2023-08-24)**
935
936 * Updated README and package summary description.
937
938 *
939 Fixed a problem on some Linux installations with Python-3.10
940 (and possibly earlier versions) where `import fitz` failed with
941 `ImportError: libcrypt.so.2: cannot open shared object file: No such
942 file or directory`.
943
944 *
945 Fixed `incompatible architecture` error on MacOS arm64.
946
947 *
948 Fixed installation warning from Poetry about missing entry in wheels'
949 RECORD files.
950
951
952 **Changes in version 1.23.0 (2023-08-22)**
953
954 * Add method `find_tables()` to the `Page` object.
955
956 This allows locating tables on any supported document page, and
957 extracting table content by cell.
958
959 * New "rebased" implementation of PyMuPDF.
960
961 The rebased implementation is available as Python module
962 `fitz_new`. It can be used as a drop-in replacement with `import
963 fitz_new as fitz`.
964
965 *
966 Python-independent MuPDF libraries are now in a second wheel called
967 `PyMuPDFb` that will be automatically installed by pip.
968
969 This is to save space on pypi.org - a full release only needs one
970 `PyMuPDFb` wheel for each OS.
971
972 * Bug fixes:
973
974 * **Fixed** `#2542 <https://github.com/pymupdf/PyMuPDF/issues/2542>`_: fitz.utils.scrub AttributeError Annot object has no attribute fileUpd inside
975 * **Fixed** `#2533 <https://github.com/pymupdf/PyMuPDF/issues/2533>`_: get_texttrace returned a incorrect character bbox
976 * **Fixed** `#2537 <https://github.com/pymupdf/PyMuPDF/issues/2537>`_: Validation when setting a grouped RadioButton throws a RuntimeError: path to 'V' has indirects
977
978 * Other changes:
979
980 * Dropped support for Python-3.7.
981
982 * Fix for wrong page / annot `/Contents` cleaning.
983
984 We need to set `pdf_filter_options::no_update` to zero.
985
986 * Added new function get_tessdata().
987
988 * Cope with problem `/Annots` arrays.
989
990 When copying page annotations in method Document.insert_pdf we
991 previously did not check the validity of members of the `/Annots`
992 array. For faulty members (like null or non-dictionary items) this
993 could cause unnecessary exceptions. This fix implements more checks
994 and skips such array items.
995
996 * Additional annotation type checks.
997
998 We did not previously check for annotation type when getting /
999 setting annotation border properties. This is now checked in
1000 accordance with MuPDF.
1001
1002 * Increase fault tolerance.
1003
1004 Avoid exceptions in method `insert_pdf()` when source pages contain
1005 invalid items in the `/Annots` array.
1006
1007 * Return empty border dict for applicable annots.
1008
1009 We previously were returning a non-empty border dictionary even for
1010 non-applicable annotation types. We now return the empty dictionary
1011 `{}` in these cases. This requires some corresponding changes in the
1012 annotation `.update()` method, namely for dashes and border width.
1013
1014 * Restrict `set_rect` to applicable annot types.
1015
1016 We were insufficiently excluding non-applicable annotation types
1017 from `set_rect()` method. We now let MuPDF catch unsupported
1018 annotations and return `False` in these cases.
1019
1020 * Wrong fontsize computation in `page.get_texttrace()`.
1021
1022 When computing the font size we were using the final text
1023 transformation matrix, where we should have taken `span->trm`
1024 instead. This is corrected here.
1025
1026 * Updates to cope with changes to latest MuPDF.
1027
1028 `pdf_lookup_anchor()` has been removed.
1029
1030 * Update fill_textbox to better respect rect.width
1031
1032 The function norm_words in fill_textbox had a bug in its last
1033 loop, appending n+1 characters when actually measuring width of n
1034 characters. It led to a bug in fill_textbox when you tried to write
1035 a single word mostly composed of "wide" letters (M,m, W, w...),
1036 causing the written text to exceed the given rect.
1037
1038 The fix was just to replace n+1 by n.
1039
1040 * Add `script_focus` and `script_blur` options to widget.
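
The `fill_textbox` item above (measuring n characters but appending n+1) can be illustrated with a minimal sketch. This is not the actual `norm_words` code: `split_wide_word`, `toy_width` and the widths used are hypothetical, chosen only to show why the appended slice must match the measured slice.

```python
def split_wide_word(word, max_width, char_width):
    """Break `word` into pieces whose measured width never exceeds max_width."""
    pieces = []
    while word:
        n = len(word)
        # Shrink n until the first n characters fit (always keep at least one).
        while n > 1 and sum(char_width(c) for c in word[:n]) > max_width:
            n -= 1
        # Append exactly the n characters that were measured.
        # Taking word[:n + 1] here would be the off-by-one described above.
        pieces.append(word[:n])
        word = word[n:]
    return pieces


def toy_width(ch):
    # Hypothetical metric: "wide" letters cost twice as much as the rest.
    return 10.0 if ch in "MmWw" else 5.0


pieces = split_wide_word("MWMWMWM", 35.0, toy_width)
print(pieces)
```

Because each piece is exactly the slice that was measured, no piece can exceed the available width, which is the property the fix restores for words made mostly of wide letters.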
1041
1042
1043
1044 **Changes in version 1.22.5 (2023-06-21)**
1045
1046 * This release uses ``MuPDF-1.22.2``.
1047
1048 * Bug fixes:
1049
1050 * **Fixed** `#2365 <https://github.com/pymupdf/PyMuPDF/issues/2365>`_: Incorrect dictionary values for type "fs" drawings.
1051 * **Fixed** `#2391 <https://github.com/pymupdf/PyMuPDF/issues/2391>`_: Check box automatically uncheck when we update same checkbox more than 1 times.
1052 * **Fixed** `#2400 <https://github.com/pymupdf/PyMuPDF/issues/2400>`_: Gaps within text of same line not filled with spaces.
1053 * **Fixed** `#2404 <https://github.com/pymupdf/PyMuPDF/issues/2404>`_: Blacklining an image in PDF won't remove underlying content in version 1.22.X.
1054 * **Fixed** `#2430 <https://github.com/pymupdf/PyMuPDF/issues/2430>`_: Incorrectly reducing ref count of Py_None.
1055 * **Fixed** `#2450 <https://github.com/pymupdf/PyMuPDF/issues/2450>`_: Empty fill color and fill opacity for paths with fill and stroke operations with 1.22.*
1056 * **Fixed** `#2462 <https://github.com/pymupdf/PyMuPDF/issues/2462>`_: Error at "get_drawing(extended=True )"
1057 * **Fixed** `#2468 <https://github.com/pymupdf/PyMuPDF/issues/2468>`_: Decode error when trying to get drawings
1058 * **Fixed** `#2710 <https://github.com/pymupdf/PyMuPDF/issues/2710>`_: page.rect and text location wrong / differing from older version
1059 * **Fixed** `#2723 <https://github.com/pymupdf/PyMuPDF/issues/2723>`_: When will a Python 3.12 wheel be available?
1060
1061 * New features:
1062
1063 * **Changed** Annotations now support "cloudy" borders.
1064 The :attr:`Annot.border` property has the new item `clouds`,
1065 and method :meth:`Annot.set_border` supports the corresponding `clouds` argument.
1066
1067 * **Changed** Radio button widgets in the same RB group
1068 are now consistently updated **if the group is defined in the standard way**.
1069
1070 * **Added** Support for the `/Locked` key in PDF Optional Content.
1071 This array inside the catalog entry `/OCProperties` can now be extracted and set.
1072
1073 * **Added** Support for new parameter `tessdata` in OCR functions.
1074 New function :meth:`get_tessdata` locates the language support folder if Tesseract is installed.
1075
1076
1077
1078 **Changes in version 1.22.3 (2023-05-10)**
1079
1080 * This release uses ``MuPDF-1.22.0``.
1081
1082 * Bug fixes:
1083
1084 * **Fixed** `#2333 <https://github.com/pymupdf/PyMuPDF/issues/2333>`_: Unable to set any of button radio group in form
1085
1086
1087 **Changes in version 1.22.2 (2023-04-26)**
1088
1089 * This release uses ``MuPDF-1.22.0``.
1090
1091 * Bug fixes:
1092
1093 * **Fixed** `#2369 <https://github.com/pymupdf/PyMuPDF/issues/2369>`_: Image extraction bugs with newer versions
1094
1095
1096 **Changes in version 1.22.1 (2023-04-18)**
1097
1098 * This release uses ``MuPDF-1.22.0``.
1099
1100 * Bug fixes:
1101
1102 * **Fixed** `#2345 <https://github.com/pymupdf/PyMuPDF/issues/2345>`_: Turn off print statements in utils.py
1103 * **Fixed** `#2348 <https://github.com/pymupdf/PyMuPDF/issues/2348>`_: extract_image returns an extension "flate" instead of "png"
1104 * **Fixed** `#2350 <https://github.com/pymupdf/PyMuPDF/issues/2350>`_: Can not make widget (checkbox) to read-only by adding flags PDF_FIELD_IS_READ_ONLY
1105 * **Fixed** `#2355 <https://github.com/pymupdf/PyMuPDF/issues/2355>`_: 1.22.0 error when using get_toc (AttributeError: 'SwigPyObject' object has no attribute)
1106
1107
1108 **Changes in version 1.22.0 (2023-04-14)**
1109
1110 * This release uses ``MuPDF-1.22.0``.
1111
1112 * Behavioural changes:
1113
1114 * Text extraction now includes glyphs that overlap with clip rect; previously
1115 they were included only if they were entirely contained within the clip
1116 rect.
1117
1118 * Bug fixes:
1119
1120 * **Fixed** `#1763 <https://github.com/pymupdf/PyMuPDF/issues/1763>`_: Interactive(smartform) form PDF calculation not working in pymupdf
1121 * **Fixed** `#1995 <https://github.com/pymupdf/PyMuPDF/issues/1995>`_: RuntimeError: image is too high for a long paged pdf file when trying
1122 * **Fixed** `#2093 <https://github.com/pymupdf/PyMuPDF/issues/2093>`_: Image in pdf changes color after applying redactions
1123 * **Fixed** `#2108 <https://github.com/pymupdf/PyMuPDF/issues/2108>`_: Redaction removing more text than expected
1124 * **Fixed** `#2141 <https://github.com/pymupdf/PyMuPDF/issues/2141>`_: Failed to read JPX header when trying to get blocks
1125 * **Fixed** `#2144 <https://github.com/pymupdf/PyMuPDF/issues/2144>`_: Replace image throws an error
1126 * **Fixed** `#2146 <https://github.com/pymupdf/PyMuPDF/issues/2146>`_: Wrong Handling of Reference Count of "None" Object
1127 * **Fixed** `#2161 <https://github.com/pymupdf/PyMuPDF/issues/2161>`_: Support adding images as pages directly
1128 * **Fixed** `#2168 <https://github.com/pymupdf/PyMuPDF/issues/2168>`_: ``page.add_highlight_annot(start=pointa, stop=pointb)`` not working
1129 * **Fixed** `#2173 <https://github.com/pymupdf/PyMuPDF/issues/2173>`_: Double free of ``Colorspace`` used in ``Pixmap``
1130 * **Fixed** `#2179 <https://github.com/pymupdf/PyMuPDF/issues/2179>`_: Incorrect documentation for ``pixmap.tint_with()``
1131 * **Fixed** `#2208 <https://github.com/pymupdf/PyMuPDF/issues/2208>`_: Pushbutton widget appears as check box
1132 * **Fixed** `#2210 <https://github.com/pymupdf/PyMuPDF/issues/2210>`_: ``apply_redactions()`` move pdf text to right after redaction
1133 * **Fixed** `#2220 <https://github.com/pymupdf/PyMuPDF/issues/2220>`_: ``Page.delete_image()`` | object has no attribute ``is_image``
1134 * **Fixed** `#2228 <https://github.com/pymupdf/PyMuPDF/issues/2228>`_: open some pdf cost too much time
1135 * **Fixed** `#2238 <https://github.com/pymupdf/PyMuPDF/issues/2238>`_: Bug - can not extract data from file in the newest version 1.21.1
1136 * **Fixed** `#2242 <https://github.com/pymupdf/PyMuPDF/issues/2242>`_: Python quits silently in ``Story.element_positions()`` if callback function prototype is wrong
1137 * **Fixed** `#2246 <https://github.com/pymupdf/PyMuPDF/issues/2246>`_: TextWriter write text in a wrong position
1138 * **Fixed** `#2248 <https://github.com/pymupdf/PyMuPDF/issues/2248>`_: After redacting the content, the position of the remaining text changes
1139 * **Fixed** `#2250 <https://github.com/pymupdf/PyMuPDF/issues/2250>`_: docs: unclear or broken link in page.rst
1140 * **Fixed** `#2251 <https://github.com/pymupdf/PyMuPDF/issues/2251>`_: mupdf_display_errors does not apply to Pixmap when loading broken image
1141 * **Fixed** `#2270 <https://github.com/pymupdf/PyMuPDF/issues/2270>`_: ``Annot.get_text("words")`` - doesn't return the first line of words
1142 * **Fixed** `#2275 <https://github.com/pymupdf/PyMuPDF/issues/2275>`_: insert_image: document that rotations are counterclockwise
1143 * **Fixed** `#2278 <https://github.com/pymupdf/PyMuPDF/issues/2278>`_: Can not make widget (checkbox) to read-only by adding flags PDF_FIELD_IS_READ_ONLY
1144 * **Fixed** `#2290 <https://github.com/pymupdf/PyMuPDF/issues/2290>`_: Different image format/data from Page.get_text("dict") and Fitz.get_page_images()
1145 * **Fixed** `#2293 <https://github.com/pymupdf/PyMuPDF/issues/2293>`_: 68 failed tests when installing from sdist on my box
1146 * **Fixed** `#2300 <https://github.com/pymupdf/PyMuPDF/issues/2300>`_: Too much recursion in tree (parents), makes program terminate
1147 * **Fixed** `#2322 <https://github.com/pymupdf/PyMuPDF/issues/2322>`_: add_highlight_annot using clip generates "A Number is Out of Range" error in PDF
1148
1149 * Other:
1150
1151 * Add key "/AS (Yes)" to the underlying annot object of a selected button form field.
1152
1153 * Remove unused ``Document`` methods ``has_xref_streams()`` and
1154 ``has_old_style_xrefs()`` as MuPDF equivalents have been removed.
1155
1156 * Add new ``Document`` methods and properties for getting/setting
1157 ``/PageMode``, ``/PageLayout`` and ``/MarkInfo``.
1158
1159 * New ``Document`` property ``version_count``, which contains the number of
1160 incremental saves plus one.
1161
1162 * New ``Document`` property ``is_fast_webaccess`` which tells whether the
1163 document is linearized.
1164
1165 * ``DocumentWriter`` is now a context manager.
1166
1167 * Add support for ``Pixmap`` JPEG output.
1168
1169 * Add support for drawing rectangles with rounded corners.
1170
1171 * ``get_drawings()``: added optional ``extended`` arg.
1172
1173 * Fixed issue where trace devices' state was not being initialised
1174 correctly; data returned from things like ``fitz.Page.get_texttrace()``
1175 might be slightly altered, e.g. ``linewidth`` values.
1176
1177 * Output warning to ``stderr`` if it looks like we are being used with
1178 current directory containing an invalid ``fitz/`` directory, because
1179 this can break import of ``fitz`` module. For example this happens
1180 if one attempts to use ``fitz`` when current directory is a PyMuPDF
1181 checkout.
1182
1183 * Documentation:
1184
1185 * General rework:
1186
1187 * Introduces a new home page and new table of contents.
1188 * Structural update to include new About section.
1189 * Comparison & performance graphing.
1190 * Includes performance methodology in appendix.
1191 * Updates conf.py to understand single back-ticks as code.
1192 * Converts double back-ticks to single back-ticks.
1193 * Removes redundant files.
1194
1195 * Improve ``insert_file()`` documentation.
1196
1197 * ``get_bboxlog()``: added optional ``layers`` arg.
1198 * ``Page.get_texttrace()``: add new dictionary key ``layer``, name of Optional Content Group.
1199
1200 * Mention use of Python venv in installation documentation.
1201
1202 * Added missing fix for #2057 to release 1.21.1's changelog.
1203
1204 * Fixes many links to the PyMuPDF-Utilities repo scripts.
1205
1206 * Avoid duplication of ``changes.txt`` and ``docs/changes.rst``.
1207
1208 * Build:
1209
1210 * Added ``pyproject.toml`` file to improve builds using pip etc.
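
The behavioural change listed for this release (text extraction now includes glyphs that overlap the clip rect, rather than only glyphs entirely inside it) can be pictured with a small sketch. The helper names, the tuple layout `(x0, y0, x1, y1)` and the sample boxes are assumptions for illustration; this is not how MuPDF or PyMuPDF implement the check.

```python
def contains(clip, box):
    """Old rule: the glyph box must lie entirely inside the clip."""
    cx0, cy0, cx1, cy1 = clip
    x0, y0, x1, y1 = box
    return cx0 <= x0 and cy0 <= y0 and x1 <= cx1 and y1 <= cy1


def overlaps(clip, box):
    """New rule: the glyph box only needs to intersect the clip."""
    cx0, cy0, cx1, cy1 = clip
    x0, y0, x1, y1 = box
    return x0 < cx1 and cx0 < x1 and y0 < cy1 and cy0 < y1


clip = (0, 0, 100, 20)
glyphs = {
    "A": (10, 2, 20, 18),    # fully inside the clip
    "B": (95, 2, 105, 18),   # straddles the right edge of the clip
    "C": (120, 2, 130, 18),  # fully outside the clip
}

old_rule = [ch for ch, box in glyphs.items() if contains(clip, box)]
new_rule = [ch for ch, box in glyphs.items() if overlaps(clip, box)]
print(old_rule, new_rule)
```

In this sketch the straddling glyph "B" is dropped under the old rule and kept under the new one, so code that relied on tight clip rectangles may see extra characters at the edges after upgrading.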
1211
1212
1213
1214 **Changes in Version 1.21.1 (2022-12-13)**
1215
1216 * This release uses ``MuPDF-1.21.1``.
1217
1218 * Bug fixes:
1219
1220 * **Fixed** `#2110 <https://github.com/pymupdf/PyMuPDF/issues/2110>`_: Fully embedded font is extracted only partially if it occupies more than one object
1221 * **Fixed** `#2094 <https://github.com/pymupdf/PyMuPDF/issues/2094>`_: Rectangle Detection Logic
1222 * **Fixed** `#2088 <https://github.com/pymupdf/PyMuPDF/issues/2088>`_: Destination point not set for named links in toc
1223 * **Fixed** `#2087 <https://github.com/pymupdf/PyMuPDF/issues/2087>`_: Image with Filter "[/FlateDecode/JPXDecode]" not extracted
1224 * **Fixed** `#2086 <https://github.com/pymupdf/PyMuPDF/issues/2086>`_: Document.save() owner_pw & user_pw has buffer overflow bug
1225 * **Fixed** `#2076 <https://github.com/pymupdf/PyMuPDF/issues/2076>`_: Segfault in fitz.py
1226 * **Fixed** `#2057 <https://github.com/pymupdf/PyMuPDF/issues/2057>`_: Document.save garbage parameter not working in PyMuPDF 1.21.0
1227 * **Fixed** `#2051 <https://github.com/pymupdf/PyMuPDF/issues/2051>`_: Missing DPI Parameter
1228 * **Fixed** `#2048 <https://github.com/pymupdf/PyMuPDF/issues/2048>`_: Invalid size of TextPage and bbox with newest version 1.21.0
1229 * **Fixed** `#2045 <https://github.com/pymupdf/PyMuPDF/issues/2045>`_: SystemError: <built-in function Page_get_texttrace> returned a result with an error set
1230 * **Fixed** `#2039 <https://github.com/pymupdf/PyMuPDF/issues/2039>`_: 1.21.0 fails to build against system libmupdf
1231 * **Fixed** `#2036 <https://github.com/pymupdf/PyMuPDF/issues/2036>`_: Archive::Archive defined twice
1232
1233 * Other
1234
1235 * Swallow "&zoom=nan" in link uri strings.
1236 * Add new Page utility methods ``Page.replace_image()`` and ``Page.delete_image()``.
1237
1238 * Documentation:
1239
1240 * `#2040 <https://github.com/pymupdf/PyMuPDF/issues/2040>`_: Added note about test failure with non-default build of MuPDF, to ``tests/README.md``.
1241 * `#2037 <https://github.com/pymupdf/PyMuPDF/issues/2037>`_: In ``docs/installation.rst``, mention incompatibility with chocolatey.org on Windows.
1242 * `#2061 <https://github.com/pymupdf/PyMuPDF/issues/2061>`_: Fixed description of ``Annot.file_info``.
1243 * `#2065 <https://github.com/pymupdf/PyMuPDF/issues/2065>`_: Show how to insert internal PDF link.
1244 * Improved description of building from source without an sdist.
1245 * Added information about running tests.
1246 * `#2084 <https://github.com/pymupdf/PyMuPDF/issues/2084>`_: Fixed broken link to PyMuPDF-Utilities.
1247
1248
1249 **Changes in Version 1.21.0 (2022-11-08)**
1250
1251 * This release uses ``MuPDF-1.21.0``.
1252
1253 * New feature: Stories.
1254
1255 * Added wheels for Python-3.11.
1256
1257 * Bug fixes:
1258
1259 * **Fixed** `#1701 <https://github.com/pymupdf/PyMuPDF/issues/1701>`_: Broken custom image insertion.
1260 * **Fixed** `#1854 <https://github.com/pymupdf/PyMuPDF/issues/1854>`_: `Document.delete_pages()` declines keyword arguments.
1261 * **Fixed** `#1868 <https://github.com/pymupdf/PyMuPDF/issues/1868>`_: Access Violation Error at `page.apply_redactions()`.
1262 * **Fixed** `#1909 <https://github.com/pymupdf/PyMuPDF/issues/1909>`_: Adding text with `fontname="Helvetica"` can silently fail.
1263 * **Fixed** `#1913 <https://github.com/pymupdf/PyMuPDF/issues/1913>`_: `draw_rect()`: does not respect width if color is not specified.
1264 * **Fixed** `#1917 <https://github.com/pymupdf/PyMuPDF/issues/1917>`_: `subset_fonts()`: make it possible to silence the stdout.
1265 * **Fixed** `#1936 <https://github.com/pymupdf/PyMuPDF/issues/1936>`_: Rectangle detection can be incorrect producing wrong output.
1266 * **Fixed** `#1945 <https://github.com/pymupdf/PyMuPDF/issues/1945>`_: Segmentation fault when saving with `clean=True`.
1267 * **Fixed** `#1965 <https://github.com/pymupdf/PyMuPDF/issues/1965>`_: `pdfocr_save()` Hard Crash.
1268 * **Fixed** `#1971 <https://github.com/pymupdf/PyMuPDF/issues/1971>`_: Segmentation fault when using `get_drawings()`.
1269 * **Fixed** `#1946 <https://github.com/pymupdf/PyMuPDF/issues/1946>`_: `block_no` and `block_type` switched in `get_text()` docs.
1270 * **Fixed** `#2013 <https://github.com/pymupdf/PyMuPDF/issues/2013>`_: AttributeError: 'Widget' object has no attribute '_annot' in delete widget.
1271
1272 * Misc changes to core code:
1273
1274 * Fixed various compiler warnings and a sequence-point bug.
1275 * Added support for Memento builds.
1276 * Fixed leaks detected by Memento in test suite.
1277 * Fixed handling of exceptions in set_name() and set_rect().
1278 * Allow build with latest MuPDF, for regular testing of PyMuPDF master.
1279 * Cope with new MuPDF exceptions when setting rect for some Annot types.
1280 * Reduced cosmetic differences between MuPDF's config.h and PyMuPDF's _config.h.
1281 * Cope with various changes to MuPDF API.
1282
1283 * Other:
1284
1285 * Fixed various broken links and typos in docs.
1286 * Mention install of `swig-python` on MacOS for #875.
1287 * Added (untested) wheels for macos-arm64.
1288
1289
1290
1291
1292 **Changes in Version 1.20.2**
1293
1294 * This release uses ``MuPDF-1.20.3``.
1295
1296 * **Fixed** `#1787 <https://github.com/pymupdf/PyMuPDF/issues/1787>`_.
1297 Fix linking issues on Unix systems.
1298
1299 * **Fixed** `#1824 <https://github.com/pymupdf/PyMuPDF/issues/1824>`_.
1300 SegFault when applying redactions overlapping a transparent image. (Fixed
1301 in ``MuPDF-1.20.3``.)
1302
1303 * Improvements to documentation:
1304
1305 * Improved information about building from source in ``docs/installation.rst``.
1306 * Clarified memory allocation setting ``JM_MEMORY`` in ``docs/tools.rst``.
1307 * Fixed link to PDF Reference manual in ``docs/app3.rst``.
1308 * Fixed building of html documentation on OpenBSD.
1309 * Moved old ``docs/faq.rst`` into separate ``docs/recipes-*`` files.
1310
1311 * Removed some unused files and directories:
1312
1313 * ``installation/``
1314 * ``docs/wheelnames.txt``
1315
1316
1317 **Changes in Version 1.20.1**
1318
1319 * **Fixed** `#1724 <https://github.com/pymupdf/PyMuPDF/issues/1724>`_.
1320 Fix for building on FreeBSD.
1321
1322 * **Fixed** `#1771 <https://github.com/pymupdf/PyMuPDF/issues/1771>`_.
1323 `linkDest()` had a broken call to `re.match()`, introduced in 1.20.0.
1324
1325 * **Fixed** `#1751 <https://github.com/pymupdf/PyMuPDF/issues/1751>`_.
1326 `get_drawings()` and `get_cdrawings()` previously always returned with `closePath=False`.
1327
1328 * **Fixed** `#1645 <https://github.com/pymupdf/PyMuPDF/issues/1645>`_.
1329 Default FreeText annotation text color is now black.
1330
1331 * Improvements to sphinx-generated documentation:
1332
1333 * Use readthedocs theme with enhancements.
1334 * Renamed the `.txt` files to have `.rst` suffixes.
1335
1336 ------
1337
1338 **Changes in Version 1.20.0**
1339
1340 This release uses ``MuPDF-1.20.0``, released 2022-06-15.
1341
1342 * Cope with new MuPDF link uri format, changed from ``#<int>,<int>,<int>`` to ``#page=<int>&zoom=<float>,<float>,<float>``.
1343
1344 * In ``tests/test_insertpdf.py``, use new reference output ``joined-1.20.pdf``. We also check that new output values are approximately the same as the old ones.
1345
1346 * **Fixed** `#1738 <https://github.com/pymupdf/PyMuPDF/issues/1738>`_. Leak of `pdf_graft_map`.
1347 Also fixed a SEGV issue that this seemed to expose, caused by incorrect freeing of underlying fz_document.
1348
1349 * **Fixed** `#1733 <https://github.com/pymupdf/PyMuPDF/issues/1733>`_. Fixed ownership of `Annotation.get_pixmap()`.
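
The link uri change listed above can be illustrated with a small parser. This is only a sketch: ``parse_link_fragment`` is a hypothetical helper, not a PyMuPDF function, and it deliberately does not interpret the meaning of the individual numbers. It only recognizes the two textual layouts quoted in the entry.

```python
import re

# Hypothetical helper for illustration only; not part of PyMuPDF.
# New layout quoted in the change log: #page=<int>&zoom=<float>,<float>,<float>
_NEW_STYLE = re.compile(r"^#page=(\d+)&zoom=([^,&]+),([^,&]+),([^,&]+)$")
# Old layout quoted in the change log: #<int>,<int>,<int>
_OLD_STYLE = re.compile(r"^#(\d+),(\d+),(\d+)$")


def parse_link_fragment(fragment):
    """Return (style, values) for a recognized fragment, else None."""
    match = _NEW_STYLE.match(fragment)
    if match:
        try:
            numbers = tuple(float(g) for g in match.group(2, 3, 4))
        except ValueError:
            # One of the zoom fields is not a number.
            return None
        return ("new", (int(match.group(1)),) + numbers)
    match = _OLD_STYLE.match(fragment)
    if match:
        return ("old", tuple(int(g) for g in match.groups()))
    return None
```

Code that has to read links written by both older and newer versions could try the new layout first and fall back to the old one, as done here.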
1350
1351 Changes to build/release process:
1352
1353 * If pip builds from source because an appropriate wheel is not available, we no longer require MuPDF to be pre-installed. Instead the required MuPDF source is embedded in the sdist and automatically built into PyMuPDF.
1354
1355 * Various changes to ``setup.py`` to download the required MuPDF release as required. See comments at start of setup.py for details.
1356
1357 * Added ``.github/workflows/build_wheels.yml`` to control building of wheels on Github.
1358
1359 ------
1360
1361 **Changes in Version 1.19.6**
1362
1363 * **Fixed** `#1620 <https://github.com/pymupdf/PyMuPDF/issues/1620>`_. The :ref:`TextPage` created by :meth:`Page.get_textpage` will now be freed correctly (removed memory leak).
1364 * **Fixed** `#1601 <https://github.com/pymupdf/PyMuPDF/issues/1601>`_. Document open errors should now be more concise and easier to interpret. In the course of this, two PyMuPDF-specific Python exceptions have been **added:**
1365
1366 - ``EmptyFileError`` -- raised when trying to create a :ref:`Document` (``fitz.open()``) from an empty file or zero-length memory.
1367 - ``FileDataError`` -- raised when MuPDF encounters irrecoverable document structure issues.
1368
1369 * **Added** :meth:`Page.load_widget` given a PDF field's xref.
1370
1371 * **Added** dictionary :attr:`pdfcolor`, which provides about 500 colors as PDF color values, keyed by the lower-case color name.
1372
1373 * **Added** algebra functionality to the :ref:`Quad` class. These objects can now also be added and subtracted among themselves, and be multiplied by numbers and matrices.
1374
1375 * **Added** new constants defining the default text extraction flags for more comfortable handling. Their naming convention is like :data:`TEXTFLAGS_WORDS` for ``page.get_text("words")``. See :ref:`text_extraction_flags`.
1376
1377 * **Changed** :meth:`Page.annots` and :meth:`Page.widgets` to detect and prevent reloading the page (illegally) inside the iterator loops via :meth:`Document.reload_page`. Doing this brings down the interpreter. Documented clean ways to do annotation and widget mass updates within properly designed loops.
1378
1379 * **Changed** several internal utility functions to become standalone ("SWIG inline") as opposed to being part of the :ref:`Tools` class. This, among other things, increases the performance of geometry object creation.
1380
1381 * **Changed** :meth:`Document.update_stream` to always accept stream updates - whether or not the dictionary object behind the xref already is a stream. Thus the former ``new`` parameter is now ignored and will be removed in v1.20.0.
1382
1383
1384 ------
1385
1386 **Changes in Version 1.19.5**
1387
1388 * **Fixed** `#1518 <https://github.com/pymupdf/PyMuPDF/issues/1518>`_. A limited "fix": in some cases, rectangles and quads were not correctly encoded to support re-drawing by :ref:`Shape`.
1389
1390 * **Fixed** `#1521 <https://github.com/pymupdf/PyMuPDF/issues/1521>`_. This had the same root cause as issue #1510.
1391
1392 * **Fixed** `#1513 <https://github.com/pymupdf/PyMuPDF/issues/1513>`_. Some Optional Content functions did not support non-ASCII characters.
1393
1394 * **Fixed** `#1510 <https://github.com/pymupdf/PyMuPDF/issues/1510>`_. Support more soft-mask image subtypes.
1395
1396 * **Fixed** `#1507 <https://github.com/pymupdf/PyMuPDF/issues/1507>`_. Immunize against items in the outlines chain, that are ``"null"`` objects.
1397
1398 * **Fixed** re-opened `#1417 <https://github.com/pymupdf/PyMuPDF/issues/1417>`_. ("too many open files"). This was due to insufficient calls to MuPDF's ``fz_drop_document()``. This also fixes `#1550 <https://github.com/pymupdf/PyMuPDF/issues/1550>`_.
1399
1400 * **Fixed** several undocumented issues in relation to incorrectly setting the text span origin :data:`point_like`.
1401
1402 * **Fixed** undocumented error computing the character bbox in method :meth:`Page.get_texttrace` when text is **flipped** (as opposed to just rotated).
1403
1404 * **Added** items to the dictionary returned by :meth:`image_properties`: ``orientation`` and ``transform`` report the natural image orientation (EXIF data).
1405
1406 * **Added** method :meth:`Document.xref_copy`. It will make a given target PDF object an exact copy of a source object.
1407
1408
1409 ------
1410
1411 **Changes in Version 1.19.4**
1412
1413
1414 * **Fixed** `#1505 <https://github.com/pymupdf/PyMuPDF/issues/1505>`_. Immunize against circular outline items.
1415
1416 * **Fixed** `#1484 <https://github.com/pymupdf/PyMuPDF/issues/1484>`_. Correct CropBox coordinates are now returned in all situations.
1417
1418 * **Fixed** `#1479 <https://github.com/pymupdf/PyMuPDF/issues/1479>`_.
1419
1420 * **Fixed** `#1474 <https://github.com/pymupdf/PyMuPDF/issues/1474>`_. TextPage objects are now properly deleted again.
1421
1422 * **Added** :ref:`Page` methods and attributes for PDF ``/ArtBox``, ``/BleedBox``, ``/TrimBox``.
1423
1424 * **Added** global attribute :attr:`TESSDATA_PREFIX` for easy checking of OCR support.
1425
1426 * **Changed** :meth:`Document.xref_set_key` such that dictionary keys will physically be removed if set to value ``"null"``.
1427
1428 * **Changed** :meth:`Document.extract_font` to optionally return a dictionary (instead of a tuple).
1429
1430 ------
1431
1432 **Changes in Version 1.19.3**
1433
1434 This patch version implements minor improvements for :ref:`Pixmap` and also some important fixes.
1435
1436 * **Fixed** `#1351 <https://github.com/pymupdf/PyMuPDF/discussions/1351>`_. Reverted code that introduced the memory growth in v1.18.15.
1437
1438 * **Fixed** `#1417 <https://github.com/pymupdf/PyMuPDF/discussions/1417>`_. Developed a workaround for the growth of open file handles when using :meth:`Document.insert_pdf`.
1439
1440 * **Fixed** `#1418 <https://github.com/pymupdf/PyMuPDF/discussions/1418>`_. Developed a workaround for memory growth when using :meth:`Document.insert_pdf`.
1441
1442 * **Fixed** `#1430 <https://github.com/pymupdf/PyMuPDF/discussions/1430>`_. Developed a workaround for mass pixmap generation of document pages.
1443
1444 * **Fixed** `#1433 <https://github.com/pymupdf/PyMuPDF/discussions/1433>`_. Solves a bbox error for some Type 3 font in PyMuPDF text processing.
1445
1446 * **Added** :meth:`Pixmap.color_topusage` to determine the share of the most frequently used color. Solves `#1397 <https://github.com/pymupdf/PyMuPDF/discussions/1397>`_.
1447
1448 * **Added** :meth:`Pixmap.warp` which makes a new pixmap from a given arbitrary convex quad inside the pixmap.
1449
1450 * **Added** :attr:`Annot.irt_xref` and :meth:`Annot.set_irt_xref` to inquire or set the `/IRT` ("In Reply To") property of an annotation. Implements `#1450 <https://github.com/pymupdf/PyMuPDF/discussions/1450>`_.
1451
1452 * **Added** :meth:`Rect.torect` and :meth:`IRect.torect` which compute a matrix that transforms to a given other rectangle.
1453
1454 * **Changed** :meth:`Pixmap.color_count` to also return the count of each color.
1455 * **Changed** :meth:`Page.get_texttrace` to also return correct span and character bboxes if ``span["dir"] != (1, 0)``.
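
The idea behind :meth:`Rect.torect` (a matrix that maps one rectangle onto another) can be sketched with plain tuples. This is an illustration of the underlying arithmetic only, not PyMuPDF's implementation: ``rect_to_rect_matrix`` and ``apply_matrix`` are hypothetical names, rectangles are assumed to be ``(x0, y0, x1, y1)`` tuples, and matrices use the six-number PDF form ``(a, b, c, d, e, f)``.

```python
def rect_to_rect_matrix(src, dst):
    """Return (a, b, c, d, e, f) mapping rectangle src onto rectangle dst.

    Hypothetical helper; both rectangles are (x0, y0, x1, y1) tuples.
    """
    sx0, sy0, sx1, sy1 = src
    dx0, dy0, dx1, dy1 = dst
    if sx1 == sx0 or sy1 == sy0:
        raise ValueError("source rectangle must have non-zero width and height")
    a = (dx1 - dx0) / (sx1 - sx0)  # horizontal scale factor
    d = (dy1 - dy0) / (sy1 - sy0)  # vertical scale factor
    e = dx0 - sx0 * a              # horizontal shift after scaling
    f = dy0 - sy0 * d              # vertical shift after scaling
    return (a, 0.0, 0.0, d, e, f)


def apply_matrix(matrix, point):
    """Transform point (x, y) using the PDF convention for 6-number matrices."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)
```

Applying the resulting matrix to the corners of ``src`` yields the corresponding corners of ``dst``, which is a convenient self-check.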
1456
1457 ------
1458
1459 **Changes in Version 1.19.2**
1460
1461 This patch version implements minor improvements for :meth:`Page.get_drawings` and also some important fixes.
1462
1463 * **Fixed** `#1388 <https://github.com/pymupdf/PyMuPDF/discussions/1388>`_. Fixed intermittent memory corruption when inserting or updating annotations.
1464
1465 * **Fixed** `#1375 <https://github.com/pymupdf/PyMuPDF/discussions/1375>`_. Inconsistencies between line numbers as returned by the "words" and the "dict" options of :meth:`Page.get_text` have been corrected.
1466
1467 * **Fixed** `#1364 <https://github.com/pymupdf/PyMuPDF/issues/1342>`_. The check for being a ``"rawdict"`` span in :meth:`recover_span_quad` now works correctly.
1468
1469 * **Fixed** `#1342 <https://github.com/pymupdf/PyMuPDF/issues/1364>`_. Corrected the check for rectangle infiniteness in :meth:`Page.show_pdf_page`.
1470
1471 * **Changed** :meth:`Page.get_drawings`, :meth:`Page.get_cdrawings` to return an indicator on the area orientation covered by a rectangle. This implements `#1355 <https://github.com/pymupdf/PyMuPDF/issues/1355>`_. Also, the recognition rate for rectangles and quads has been significantly improved.
1472
1473 * **Changed** all text search and extraction methods to set the new ``flags`` option ``TEXT_MEDIABOX_CLIP`` to ON by default. That bit causes the automatic suppression of all characters that are completely outside a page's mediabox (insofar as that notion is supported for a document type). This eliminates the need for using ``clip=page.rect`` or similar for omitting text outside the visible area.
1474
1475 * **Added** parameter ``"dpi"`` to :meth:`Page.get_pixmap` and :meth:`Annot.get_pixmap`. When given, parameter ``"matrix"`` is ignored, and a :ref:`Pixmap` with the desired dots per inch is created.
1476
1477 * **Added** attributes :attr:`Pixmap.is_monochrome` and :attr:`Pixmap.is_unicolor` allowing fast checks of pixmap properties. Addresses `#1397 <https://github.com/pymupdf/PyMuPDF/discussions/1397>`_.
1478
1479 * **Added** method :meth:`Pixmap.color_count` to determine the unique colors in the pixmap.
1480
1481 * **Added** boolean parameter ``"compress"`` to PDF document method :meth:`Document.update_stream`. Addresses / enables solution for `#1408 <https://github.com/pymupdf/PyMuPDF/discussions/1408>`_.
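
The relationship between a ``dpi`` value and a scaling factor can be sketched as follows. This assumes the default PDF unit of 1/72 inch per point; ``zoom_for_dpi`` is a hypothetical helper, and whether the library computes its scaling in exactly this way is an assumption.

```python
POINTS_PER_INCH = 72  # default PDF user space unit is 1/72 inch


def zoom_for_dpi(dpi):
    """Return the uniform zoom factor that yields the requested resolution.

    Hypothetical helper for illustration; not part of PyMuPDF.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return dpi / POINTS_PER_INCH
```

Under that assumption, a resolution of 144 dpi corresponds to a zoom factor of 2 in both directions.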
1482
1483 ------
1484
1485 **Changes in Version 1.19.1**
1486
1487 This is the first patch version to support MuPDF v1.19.0. Apart from one bug fix, it includes important improvements for OCR support and the option to **sort extracted text** to the standard reading order "from top-left to bottom-right".
1488
1489 * **Fixed** `#1328 <https://github.com/pymupdf/PyMuPDF/issues/1328>`_. "words" text extraction again returns correct ``(x0, y0)`` coordinates.
1490
1491 * **Changed** :meth:`Page.get_textpage_ocr`: it now supports parameter ``dpi`` to control OCR quality. It is also possible to choose whether the **full page** should be OCRed or **only the images displayed** by the page.
1492
1493 * **Changed** :meth:`Page.get_drawings` and :meth:`Page.get_cdrawings` to automatically convert colors to RGB color tuples. Implements `#1332 <https://github.com/pymupdf/PyMuPDF/discussions/1332>`_. Similar change was applied to :meth:`Page.get_texttrace`.
1494
1495 * **Changed** :meth:`Page.get_text` to support a parameter ``sort``. If set to ``True`` the output is conveniently sorted.
1496
1497
1498 ------
1499
1500 **Changes in Version 1.19.0**
1501
1502 This is the first version supporting MuPDF 1.19.*, published 2021-10-05. It introduces many new features compared to the previous version 1.18.*.
1503
1504 PyMuPDF has now picked up integrated Tesseract OCR support, which was already present in MuPDF v1.18.0.
1505
1506 * Supported images can be OCRed via their :ref:`Pixmap` which results in a 1-page PDF with a text layer.
1507 * All supported document pages (i.e. not only PDFs), can be OCRed using specialized text extraction methods. The result is a mixture of standard and OCR text (depending on which part of the page was deemed to require OCRing) that can be searched and extracted without restrictions.
1508 * All this requires an independent installation of Tesseract. MuPDF actually (only) needs the location of Tesseract's ``"tessdata"`` folder, where its language support data are stored. This location must be available as environment variable ``TESSDATA_PREFIX``.
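
Whether the environment variable mentioned above is usable can be checked with the standard library alone. The sketch below is not PyMuPDF code: ``tessdata_dir`` is a hypothetical helper that only tests whether ``TESSDATA_PREFIX`` is set and names an existing folder. It does not verify that language data are actually present there.

```python
import os


def tessdata_dir(environ=None):
    """Return the folder named by TESSDATA_PREFIX if it exists, else None.

    Hypothetical helper; pass a mapping to check something other than
    the real process environment.
    """
    env = os.environ if environ is None else environ
    path = env.get("TESSDATA_PREFIX")
    if path and os.path.isdir(path):
        return path
    return None
```

Calling ``tessdata_dir()`` before attempting OCR gives an early, readable failure instead of an error from deep inside the OCR machinery.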
1509
1510 A new MuPDF feature is **journalling PDF updates**, which is also supported by this PyMuPDF version. Changes may be logged, rolled back or replayed, allowing a whole new level of control over PDF document integrity -- similar to functions present in modern database systems.
1511
1512 A third feature (unrelated to the new MuPDF version) includes the ability to detect when page **objects cover or hide each other**. It is now e.g. possible to see that text is covered by a drawing or an image.
1513
1514 * **Changed** terminology and meaning of important geometry concepts: Rectangles are now characterized as *finite*, *valid* or *empty*, while the definitions of these terms have also changed. Rectangles specifically are now thought of as being "open": not all corners and sides are considered part of the rectangle. Please do read the :ref:`Rect` section for details.
1515
1516 * **Added** new parameter `"no_new_id"` to :meth:`Document.save` / :meth:`Document.tobytes` methods. Use it to suppress updating the second item of the document ``/ID`` which in PDF indicates that the original file has been updated. If the PDF has no ``/ID`` at all yet, then no new one will be created either.
1517
1518 * **Added** a **journalling facility** for PDF updates. This allows logging changes, undoing or redoing them, or saving the journal for later use. Refer to :meth:`Document.journal_enable` and friends.
1519
1520 * **Added** new :ref:`Pixmap` methods :meth:`Pixmap.pdfocr_save` and :meth:`Pixmap.pdfocr_tobytes`, which generate a 1-page PDF containing the pixmap as PNG image with OCR text layer.
1521
1522 * **Added** :meth:`Page.get_textpage_ocr` which executes optical character recognition for the page, then extracts the results and stores them together with "normal" page content in a :ref:`TextPage`. Use or reuse this object in subsequent text extractions and text searches to avoid multiple efforts. The existing text search and text extraction methods have been extended to support a separately created textpage -- see next item.
1523
1524 * **Added** a new parameter ``textpage`` to text extraction and text search methods. This allows reuse of a previously created :ref:`TextPage` and thus achieves significant runtime benefits -- which is especially important for the new OCR features. But "normal" text extractions can definitely also benefit.
1525
1526 * **Added** :meth:`Page.get_texttrace`, a technical method delivering low-level text character properties. It was present before as a private method, but the author felt it now is mature enough to be officially available. It specifically includes a "sequence number" which indicates the page appearance build operation that painted the text.
1527
1528 * **Added** :meth:`Page.get_bboxlog` which delivers the list of rectangles of page objects like text, images or drawings. Its significance lies in its sequence: rectangles intersecting areas with a lower index are covering or hiding them.
1529
1530 * **Changed** methods :meth:`Page.get_drawings` and :meth:`Page.get_cdrawings` to include a "sequence number" indicating the page appearance build operation that created the drawing.
1531
1532 * **Fixed** `#1311 <https://github.com/pymupdf/PyMuPDF/issues/1311>`_. Field values in comboboxes should now be handled correctly.
1533 * **Fixed** `#1290 <https://github.com/pymupdf/PyMuPDF/issues/1290>`_. Error was caused by incorrect rectangle emptiness check, which is fixed due to new geometry logic of this version.
1534 * **Fixed** `#1286 <https://github.com/pymupdf/PyMuPDF/issues/1286>`_. Text alignment for redact annotations is working again.
1535 * **Fixed** `#1287 <https://github.com/pymupdf/PyMuPDF/issues/1287>`_. Infinite loop issue for non-Windows systems when applying some redactions has been resolved.
1536 * **Fixed** `#1284 <https://github.com/pymupdf/PyMuPDF/issues/1284>`_. Text layout destruction after applying redactions in some cases has been resolved.
1537
1538 ------
1539
1540 **Changes in Version 1.18.18 / 1.18.19**
1541
1542 * **Fixed** issue `#1266 <https://github.com/pymupdf/PyMuPDF/issues/1266>`_. Failure to set :attr:`Pixmap.samples` in important cases, was hotfixed in a new version 1.18.19.
1543
1544 * **Fixed** issue `#1257 <https://github.com/pymupdf/PyMuPDF/issues/1257>`_. Removing the read-only flag from PDF fields is now possible.
1545
1546 * **Fixed** issue `#1252 <https://github.com/pymupdf/PyMuPDF/issues/1252>`_. Now correctly specifying the ``zoom`` value for PDF link annotations.
1547
1548 * **Fixed** issue `#1244 <https://github.com/pymupdf/PyMuPDF/issues/1244>`_. Now correctly computing the transform matrix in :meth:`Page.get_image_bbox`.
1549
1550 * **Fixed** issue `#1241 <https://github.com/pymupdf/PyMuPDF/issues/1241>`_. Prevent returning artifact characters in :meth:`Page.get_textbox`, which happened in certain constellations.
1551
1552 * **Fixed** issue `#1234 <https://github.com/pymupdf/PyMuPDF/issues/1234>`_. Avoid creating infinite rectangles in corner cases -- :meth:`Page.get_drawings`, :meth:`Page.get_cdrawings`.
1553
1554 * **Added** test data and test scripts to the PyPI source distribution.
1555
1556 ------
1557
1558 **Changes in Version 1.18.17**
1559
1560 The focus of this version is on major performance improvements of selected functions.
1561
1562 * **Fixed** issue `#1199 <https://github.com/pymupdf/PyMuPDF/issues/1199>`_. Using a non-existing page number in :meth:`Document.get_page_images` and friends will no longer lead to segfaults.
1563
1564 * **Changed** :meth:`Page.get_drawings` to now differentiate between "stroke", "fill" and combined paths. Paths containing more than one rectangle (i.e. "re" items) are now supported. Extracting "clipped" paths is now available as an option.
1565
1566 * **Added** :meth:`Page.get_cdrawings`, performance-optimized version of :meth:`Page.get_drawings`.
1567
1568 * **Added** :attr:`Pixmap.samples_mv`, *memoryview* of a pixmap's pixel area. Does not copy and thus always accesses the current state of that area.
1569
1570 * **Added** :attr:`Pixmap.samples_ptr`, Python "pointer" to a pixmap's pixel area. Allows much faster creation (factor 800+) of Qt images.
1571
1572
1573
1574 ------
1575
1576 **Changes in Version 1.18.16**
1577
1578 * **Fixed** issue `#1184 <https://github.com/pymupdf/PyMuPDF/issues/1184>`_. Existing PDF widget fonts in a PDF are now accepted (i.e. not forcibly changed to a Base-14 font).
1579
1580 * **Fixed** issue `#1154 <https://github.com/pymupdf/PyMuPDF/issues/1154>`_. Text search hits should now be correct when ``clip`` is specified.
1581
1582 * **Fixed** issue `#1152 <https://github.com/pymupdf/PyMuPDF/issues/1152>`_.
1583
1584 * **Fixed** issue `#1146 <https://github.com/pymupdf/PyMuPDF/issues/1146>`_.
1585
1586 * **Added** :attr:`Link.flags` and :meth:`Link.set_flags` to the :ref:`Link` class. Implements enhancement request `#1187 <https://github.com/pymupdf/PyMuPDF/issues/1187>`_.
1587
1588 * **Added** option to *simulate* :meth:`TextWriter.fill_textbox` output for predicting the number of lines that a given text would occupy in the textbox.
1589
1590 * **Added** text output support as subcommand `gettext` to the ``fitz`` CLI module. Most importantly, original **physical text layout** reproduction is now supported.
1591
1592
1593 ------
1594
1595 **Changes in Version 1.18.15**
1596
1597 * **Fixed** issue `#1088 <https://github.com/pymupdf/PyMuPDF/issues/1088>`_. Removing an annotation's fill color should now work again both ways, using the ``fill_color=[]`` argument in :meth:`Annot.update` as well as ``fill=[]`` in :meth:`Annot.set_colors`.
1598
1599 * **Fixed** issue `#1081 <https://github.com/pymupdf/PyMuPDF/issues/1081>`_. :meth:`Document.subset_fonts`: fixed an error which created wrong character widths for some fonts.
1600
1601 * **Fixed** issue `#1078 <https://github.com/pymupdf/PyMuPDF/issues/1078>`_. :meth:`Page.get_text` and other methods related to text extraction: changed the default value of the :ref:`TextPage` ``flags`` parameter. All whitespace and :data:`ligatures` are now preserved.
1602
1603 * **Fixed** issue `#1085 <https://github.com/pymupdf/PyMuPDF/issues/1085>`_. The old *snake_cased* alias of ``fitz.getTextlength`` is now defined correctly.
1604
1605 * **Changed** :meth:`Document.subset_fonts` will now correctly prefix font subsets with an appropriate six letter uppercase tag, complying with the PDF specification.
1606
1607 * **Added** new method :meth:`Widget.button_states` which returns the possible values that a button-type field can have when being set to "on" or "off".
1608
1609 * **Added** support of text with **Small Capital** letters to the :ref:`Font` and :ref:`TextWriter` classes. This is reflected by an additional bool parameter ``small_caps`` in various of their methods.
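
The subset tag mentioned for :meth:`Document.subset_fonts` follows the PDF convention of six uppercase letters and a plus sign in front of the base font name. The sketch below shows how such a name could be recognized and split; ``split_subset_tag`` is a hypothetical helper and not part of PyMuPDF.

```python
import re

# Six uppercase ASCII letters, a plus sign, then the base font name.
_SUBSET_NAME = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_tag(fontname):
    """Return (tag, basename); tag is None if the name carries no subset tag.

    Hypothetical helper for illustration only.
    """
    match = _SUBSET_NAME.match(fontname)
    if match:
        return match.group(1), match.group(2)
    return None, fontname
```

A helper like this is also handy when comparing font names from text extraction output, where the tag may or may not be reported.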
1610
1611
1612 ------
1613
1614 **Changes in Version 1.18.14**
1615
1616 * **Finished** implementing new, "snake_cased" names for methods and properties that were "camelCased" and awkward in many aspects. At the end of this documentation, there is section :ref:`Deprecated` with more background and a mapping of old to new names.
1617
1618 * **Fixed** issue `#1053 <https://github.com/pymupdf/PyMuPDF/issues/1053>`_. :meth:`Page.insert_image`: when given, include image mask in the hash computation.
1619
1620 * **Fixed** issue `#1043 <https://github.com/pymupdf/PyMuPDF/issues/1043>`_. Added ``Pixmap.getPNGdata`` to the aliases of :meth:`Pixmap.tobytes`.
1621
1622 * **Fixed** an internal error when computing the enveloping rectangle of drawn paths as returned by :meth:`Page.get_drawings`.
1623
1624 * **Fixed** an internal error occasionally causing loops when outputting text via :meth:`TextWriter.fill_textbox`.
1625
1626 * **Added** :meth:`Font.char_lengths`, which returns a tuple of character widths of a string.
1627
1628 * **Added** more ways to specify pages in :meth:`Document.delete_pages`. Now a sequence (list, tuple or range) can be specified, and the Python ``del`` statement can be used. In the latter case, Python ``slices`` are also accepted.
1629
1630 * **Changed** :meth:`Document.del_toc_item`, which disables a single item of the TOC: previously, the title text was removed. Instead, now the complete item will be shown grayed-out by supporting viewers.
1631
1632
1633 ------
1634
1635 **Changes in Version 1.18.13**
1636
1637 * **Fixed** issue `#1014 <https://github.com/pymupdf/PyMuPDF/issues/1014>`_.
1638 * **Fixed** an internal memory leak when computing image bboxes -- :meth:`Page.get_image_bbox`.
1639 * **Added** support for low-level access and modification of the PDF trailer. Applies to :meth:`Document.xref_get_keys`, :meth:`Document.xref_get_key`, and :meth:`Document.xref_set_key`.
1640 * **Added** documentation for maintaining private entries in PDF metadata.
1641 * **Added** documentation for handling transparent image insertions, :meth:`Page.insert_image`.
1642 * **Added** :meth:`Page.get_image_rects`, an improved version of :meth:`Page.get_image_bbox`.
1643 * **Changed** :meth:`Document.delete_pages` to support various ways of specifying pages to delete. Implements `#1042 <https://github.com/pymupdf/PyMuPDF/issues/1042>`_.
1644 * **Changed** :meth:`Page.insert_image` to also accept the xref of an existing image in the file. This allows "copying" images between pages, and extremely fast multiple insertions.
1645 * **Changed** :meth:`Page.insert_image` to also accept the integer parameter ``alpha``. To be used for performance improvements.
1646 * **Changed** :meth:`Pixmap.set_alpha` to support new parameters for pre-multiplying colors with their alpha values and setting a specific color to fully transparent (e.g. white).
1647 * **Changed** :meth:`Document.embfile_add` to automatically set creation and modification date-time. Correspondingly, :meth:`Document.embfile_upd` automatically maintains modification date-time (``/ModDate`` PDF key), and :meth:`Document.embfile_info` correspondingly reports these data. In addition, the embedded file's associated "collection item" is included via its :data:`xref`. This supports the development of PDF portfolio applications.
1648
1649 ------
1650
1651 **Changes in Version 1.18.11 / 1.18.12**
1652
1653 * **Fixed** issue `#972 <https://github.com/pymupdf/PyMuPDF/issues/972>`_. Improved layout of source distribution material.
1654 * **Fixed** issue `#962 <https://github.com/pymupdf/PyMuPDF/issues/962>`_. Stabilized Linux distribution detection for generating PyMuPDF from sources.
1655 * **Added:** :meth:`Page.get_xobjects` delivers the result of :meth:`Document.get_page_xobjects`.
1656 * **Added:** :meth:`Page.get_image_info` delivers meta information for all images shown on the page.
1657 * **Added:** :meth:`Tools.mupdf_display_warnings` allows setting on / off the display of MuPDF-generated warnings. The default is off.
1658 * **Added:** :meth:`Document.ez_save` convenience alias of :meth:`Document.save` with some different defaults.
1659 * **Changed:** Image extractions of document pages now also contain the image's **transformation matrix**. This concerns :meth:`Page.get_image_bbox` and the DICT, JSON, RAWDICT, and RAWJSON variants of :meth:`Page.get_text`.
1660
1661
1662 ------
1663
1664 **Changes in Version 1.18.10**
1665
1666 * **Fixed** issue `#941 <https://github.com/pymupdf/PyMuPDF/issues/941>`_. Added old aliases for :meth:`DisplayList.get_pixmap` and :meth:`DisplayList.get_textpage`.
1667 * **Fixed** issue `#929 <https://github.com/pymupdf/PyMuPDF/issues/929>`_. Stabilized removal of JavaScript objects with :meth:`Document.scrub`.
1668 * **Fixed** issue `#927 <https://github.com/pymupdf/PyMuPDF/issues/927>`_. Removed a loop in the reworked :meth:`TextWriter.fill_textbox`.
1669 * **Changed** :meth:`Document.xref_get_keys` and :meth:`Document.xref_get_key` to also allow accessing the PDF trailer dictionary. This can be done by using `-1` as the xref number argument.
1670 * **Added** a number of functions for reconstructing the quads for text lines, spans and characters extracted by :meth:`Page.get_text` options "dict" and "rawdict". See :meth:`recover_quad` and friends.
1671 * **Added** :meth:`Tools.unset_quad_corrections` to suppress character quad corrections (occasionally required for erroneous fonts).
1672
1673 ------
1674
1675 **Changes in Version 1.18.9**
1676
1677
1678 * **Fixed** issue `#888 <https://github.com/pymupdf/PyMuPDF/issues/888>`_. Removed ambiguous statements concerning PyMuPDF's license, which is now clearly stated to be GNU AGPL V3.
1679 * **Fixed** issue `#895 <https://github.com/pymupdf/PyMuPDF/issues/895>`_.
1680 * **Fixed** issue `#896 <https://github.com/pymupdf/PyMuPDF/issues/896>`_. Since v1.17.6 PyMuPDF suppresses the font subset tags and only reports the base fontname in text extraction outputs "dict" / "json" / "rawdict" / "rawjson". Now a new global parameter can request the old behaviour, :meth:`Tools.set_subset_fontnames`.
1681 * **Fixed** issue `#885 <https://github.com/pymupdf/PyMuPDF/issues/885>`_. Pixmap creation now also works with filenames given as ``pathlib.Paths``.
1682 * **Changed** :meth:`Document.subset_fonts`: Text is **not rewritten** any more and should therefore **retain all its original properties** -- like being hidden or being controlled by Optional Content mechanisms.
1683 * **Changed** :ref:`TextWriter` output to also accept text in right to left mode (Arabic, Hebrew): :meth:`TextWriter.fill_textbox`, :meth:`TextWriter.append`. These methods now accept a new boolean parameter `right_to_left`, which is *False* by default. Implements `#897 <https://github.com/pymupdf/PyMuPDF/issues/897>`_.
1684 * **Changed** :meth:`TextWriter.fill_textbox` to return all lines of text that did not fit in the given rectangle. Also changed the default of the ``warn`` parameter to no longer print a warning message in overflow situations.
1685 * **Added** a utility function :meth:`recover_quad`, which computes the quadrilateral of a span. This function can be used for correctly marking text extracted with the "dict" or "rawdict" options of :meth:`Page.get_text`.
1686
1687 ------
1688
1689 **Changes in Version 1.18.8**
1690
1691
1692 This is a bug fix version only. We are publishing early because the affected functions are potentially widely used.
1693
1694 * **Fixed** issue `#881 <https://github.com/pymupdf/PyMuPDF/issues/881>`_. Fixed a memory leak in :meth:`Page.insert_image` when inserting images from files or memory.
1695 * **Fixed** issue `#878 <https://github.com/pymupdf/PyMuPDF/issues/878>`_. ``pathlib.Path`` objects should now correctly handle file path hierarchies.
1696
1697
1698 ------
1699
1700 **Changes in Version 1.18.7**
1701
1702
1703 * **Added** an experimental :meth:`Document.subset_fonts` which reduces the size of eligible fonts based on their use by text in the PDF. Implements `#855 <https://github.com/pymupdf/PyMuPDF/discussions/855>`_.
1704 * **Implemented** request `#870 <https://github.com/pymupdf/PyMuPDF/pull/870>`_: :meth:`Document.convert_to_pdf` now also supports PDF documents.
1705 * **Renamed** ``Document.write`` to :meth:`Document.tobytes` for greater clarity. But the deprecated name remains available for some time.
1706 * **Implemented** request `#843 <https://github.com/pymupdf/PyMuPDF/Discussions/843>`_: :meth:`Document.tobytes` now supports linearized PDF output. :meth:`Document.save` now also supports writing to Python **file objects**. In addition, the open function now also supports Python file objects.
1707 * **Fixed** issue `#844 <https://github.com/pymupdf/PyMuPDF/issues/844>`_.
1708 * **Fixed** issue `#838 <https://github.com/pymupdf/PyMuPDF/issues/838>`_.
1709 * **Fixed** issue `#823 <https://github.com/pymupdf/PyMuPDF/issues/823>`_. More logic for better support of OCRed text output (Tesseract, ABBYY).
1710 * **Fixed** issue `#818 <https://github.com/pymupdf/PyMuPDF/issues/818>`_.
1711 * **Fixed** issue `#814 <https://github.com/pymupdf/PyMuPDF/issues/814>`_.
1712 * **Added** :meth:`Document.get_page_labels` which returns a list of page label definitions of a PDF.
1713 * **Added** :meth:`Document.has_annots` and :meth:`Document.has_links` to check whether these object types are present anywhere in a PDF.
1714 * **Added** expert low-level functions to simplify inquiry and modification of PDF object sources: :meth:`Document.xref_get_keys` lists the keys of object :data:`xref`, :meth:`Document.xref_get_key` returns type and content of a key, and :meth:`Document.xref_set_key` modifies the key's value.
1715 * **Added** parameter ``thumbnails`` to :meth:`Document.scrub` to also allow removing page thumbnail images.
1716 * **Improved** documentation for how to add valid text marker annotations for non-horizontal text.
1717
1718 We continued the process of renaming methods and properties from *"mixedCase"* to *"snake_case"*. Documentation usually mentions the new names only, but old, deprecated names remain available for some time.
1719
1720
1721
1722 ------
1723
1724 **Changes in Version 1.18.6**
1725
1726 * **Fixed** issue `#812 <https://github.com/pymupdf/PyMuPDF/issues/812>`_.
1727 * **Fixed** issue `#793 <https://github.com/pymupdf/PyMuPDF/issues/793>`_. Invalid document metadata previously prevented opening some documents at all. This error has been removed.
1728 * **Fixed** issue `#792 <https://github.com/pymupdf/PyMuPDF/issues/792>`_. Text search and text extraction will make no rectangle containment checks at all if the default ``clip=None`` is used.
1729 * **Fixed** issue `#785 <https://github.com/pymupdf/PyMuPDF/issues/785>`_.
1730 * **Fixed** issue `#780 <https://github.com/pymupdf/PyMuPDF/issues/780>`_. Corrected a parameter check error.
1731 * **Fixed** issue `#779 <https://github.com/pymupdf/PyMuPDF/issues/779>`_. Fixed typo
1732 * **Added** an option to set the desired line height for text boxes. Implements `#804 <https://github.com/pymupdf/PyMuPDF/issues/804>`_.
1733 * **Changed** text position retrieval to better cope with Tesseract's glyphless font. Implements `#803 <https://github.com/pymupdf/PyMuPDF/issues/803>`_.
1734 * **Added** an option to choose the prefix of new annotations, fields and links for providing unique annotation ids. Implements request `#807 <https://github.com/pymupdf/PyMuPDF/issues/807>`_.
1735 * **Added** getting and setting color and text properties for Table of Contents items for PDFs. Implements `#779 <https://github.com/pymupdf/PyMuPDF/issues/779>`_.
1736 * **Added** PDF page label handling: :meth:`Page.get_label()` returns the page label, :meth:`Document.get_page_numbers` returns all page numbers having a specified label, and :meth:`Document.set_page_labels` adds or updates a PDF's page label definition.
1737
1738
1739
1740 .. note::
1741 This version introduces **Python type hinting**. The goal is to provide each parameter and the return value of all functions and methods with type information. This is still work in progress, although the majority of functions have already been handled.
1742
1743
1744 ------
1745
1746 **Changes in Version 1.18.5**
1747
1748 Apart from several fixes, this version also focusses on several minor, but important feature improvements. Among the latter is a more precise computation of proper line heights and insertion points for writing / inserting text. As opposed to using font-agnostic constants, these values are now taken from the font's properties.
1749
1750 Also note that this is the first version that no longer provides pregenerated wheels for Python versions older than 3.6. PIP also discontinues support for these by the end of 2020.
1751
1752 * **Fixed** issue `#771 <https://github.com/pymupdf/PyMuPDF/issues/771>`_. By using "small glyph heights" option, the full page text can be extracted.
1753 * **Fixed** issue `#768 <https://github.com/pymupdf/PyMuPDF/issues/768>`_.
1754 * **Fixed** issue `#750 <https://github.com/pymupdf/PyMuPDF/issues/750>`_.
1755 * **Fixed** issue `#739 <https://github.com/pymupdf/PyMuPDF/issues/739>`_. The "dict", "rawdict" and corresponding JSON output variants now have two new *span* keys: ``"ascender"`` and ``"descender"``. These floats represent special font properties which can be used to compute bboxes of spans or characters of **exactly fontsize height** (as opposed to the default line height). An example algorithm is shown in section "Span Dictionary" `here <https://pymupdf.readthedocs.io/en/latest/textpage.html#dictionary-structure-of-extractdict-and-extractrawdict>`_. Also improved the detection and correction of ill-specified ascender / descender values encountered in some fonts.
1756 * **Added** a new, experimental :meth:`Tools.set_small_glyph_heights` -- also in response to issue `#739 <https://github.com/pymupdf/PyMuPDF/issues/739>`_. This method sets or unsets a global parameter to **always compute bboxes with fontsize height**. If "on", text searching and all text extractions will return rectangles, bboxes and quads with a smaller height.
1757 * **Fixed** issue `#728 <https://github.com/pymupdf/PyMuPDF/issues/728>`_.
1758 * **Changed** fill color logic of 'Polyline' annotations: this parameter now only pertains to line end symbols -- the annotation itself can no longer have a fill color. Also addresses issue `#727 <https://github.com/pymupdf/PyMuPDF/issues/727>`_.
1759 * **Changed** :meth:`Page.getImageBbox` to also compute the bbox if the image is contained in an XObject.
1760 * **Changed** :meth:`Shape.insertTextbox`, resp. :meth:`Page.insertTextbox`, resp. :meth:`TextWriter.fillTextbox` to respect font's properties "ascender" / "descender" when computing line height and insertion point. This should no longer lead to line overlaps for multi-line output. These methods used to ignore font specifics and used constant values instead.
1761
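The ``"ascender"`` and ``"descender"`` span keys described above can be combined with the span's ``"origin"`` and ``"size"`` to derive a bbox of exactly fontsize height. The following is a minimal, library-free sketch of that arithmetic. The function name ``fontsize_bbox`` and the sample span values are hypothetical. The sketch assumes that ascender and descender are fractions of the font size, that the descender is negative, and that the y-value of ``"origin"`` is the baseline.

```python
def fontsize_bbox(span):
    """Return (x0, y0, x1, y1) whose height equals the span's font size.

    `span` is assumed to be a dict shaped like a "dict" / "rawdict" span.
    """
    a = span["ascender"]
    d = span["descender"]  # assumed negative: share below the baseline
    x0, _, x1, _ = span["bbox"]
    baseline_y = span["origin"][1]
    # Put the bottom edge below the baseline by the descender's share
    # of the font size, then go up by exactly one font size.
    y1 = baseline_y - span["size"] * d / (a - d)
    y0 = y1 - span["size"]
    return (x0, y0, x1, y1)


# Hypothetical span values, chosen for easy arithmetic.
sample_span = {
    "ascender": 0.75,
    "descender": -0.25,
    "size": 12.0,
    "origin": (50.0, 100.0),
    "bbox": (50.0, 88.0, 80.0, 106.0),
}
tight = fontsize_bbox(sample_span)
```

Here the original bbox is 18 units high, while ``tight`` spans exactly 12 units (the font size) around the same baseline.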
1762
1763 ------
1764
1765 **Changes in Version 1.18.4**
1766
1767 This version adds several features to support PDF Optional Content. Among other things, this includes OCMDs (Optional Content Membership Dictionaries) with the full scope of *"visibility expressions"* (PDF key ``/VE``), text insertions (including the :ref:`TextWriter` class) and drawings.
1768
1769 * **Fixed** issue `#727 <https://github.com/pymupdf/PyMuPDF/issues/727>`_. Freetext annotations now support an uncolored rectangle when ``fill_color=None``.
1770 * **Fixed** issue `#726 <https://github.com/pymupdf/PyMuPDF/issues/726>`_. UTF-8 encoding errors are now handled for HTML / XML :meth:`Page.getText` output.
1771 * **Fixed** issue `#724 <https://github.com/pymupdf/PyMuPDF/issues/724>`_. Empty values are no longer stored in the PDF /Info metadata dictionary.
1772 * **Added** new methods :meth:`Document.set_oc` and :meth:`Document.get_oc` to set or get optional content references for **existing** image and form XObjects. These methods are similar to the same-named methods of :ref:`Annot`.
1773 * **Added** :meth:`Document.set_ocmd`, :meth:`Document.get_ocmd` for handling OCMDs.
1774 * **Added** **Optional Content** support for text insertion and drawing.
1775 * **Added** new method :meth:`Page.deleteWidget`, which deletes a form field from a page. This is analogous to deleting annotations.
1776 * **Added** support for Popup annotations. This includes defining the Popup rectangle and setting the Popup to open or closed. Methods / attributes :meth:`Annot.set_popup`, :meth:`Annot.set_open`, :attr:`Annot.has_popup`, :attr:`Annot.is_open`, :attr:`Annot.popup_rect`, :attr:`Annot.popup_xref`.
1777
1778 Other changes:
1779
1780 * The **naming of methods and attributes** in PyMuPDF is far from being satisfactory: we have *CamelCases*, *mixedCases* and *lower_case_with_underscores* all over the place. With the :ref:`Annot` as the first candidate, we have started an activity to clean this up step by step, converting to lower case with underscores for methods and attributes while keeping UPPERCASE for the constants.
1781
1782 - Old names will remain available to prevent code breaks, but they will no longer be mentioned in the documentation.
1783 - New methods and attributes of all classes will be named according to the new standard.
1784
1785 ------
1786
1787 **Changes in Version 1.18.3**
1788
1789 As a major new feature, this version introduces support for PDF's **Optional Content** concept.
1790
1791 * **Fixed** issue `#714 <https://github.com/pymupdf/PyMuPDF/issues/714>`_.
1792 * **Fixed** issue `#711 <https://github.com/pymupdf/PyMuPDF/issues/711>`_.
1793 * **Fixed** issue `#707 <https://github.com/pymupdf/PyMuPDF/issues/707>`_: if a PDF user password, but no owner password is supplied nor present, then the user password is also used as the owner password.
1794 * **Fixed** ``expand`` and ``deflate`` parameters of methods :meth:`Document.save` and :meth:`Document.write`. Individual image and font compression should now finally work. Addresses issue `#713 <https://github.com/pymupdf/PyMuPDF/issues/713>`_.
1795 * **Added** support for PDF optional content. This includes several new :ref:`Document` methods for inquiring and setting optional content status and adding optional content configurations and groups. In addition, images, form XObjects and annotations now can be bound to optional content specifications. **Resolved** issue `#709 <https://github.com/pymupdf/PyMuPDF/issues/709>`_.
1796
1797
1798
1799 ------
1800
1801 **Changes in Version 1.18.2**
1802
1803 This version contains some interesting improvements for text searching: any number of search hits is now returned and the **hit_max** parameter was removed. In addition, the new **clip** parameter allows restricting the search area. Searching now detects hyphenations at line breaks and accordingly finds hyphenated words.
1804
1805 * **Fixed** issue `#575 <https://github.com/pymupdf/PyMuPDF/issues/575>`_: if using ``quads=False`` in text searching, then overlapping rectangles on the same line are joined. Previously, parts of the search string, which belonged to different "marked content" items, each generated their own rectangle -- just as if occurring on separate lines.
1806 * **Added** :attr:`Document.isRepaired`, which is true if the PDF was repaired on open.
1807 * **Added** :meth:`Document.setXmlMetadata` which either updates or creates PDF XML metadata. Implements issue `#691 <https://github.com/pymupdf/PyMuPDF/issues/691>`_.
1808 * **Added** :meth:`Document.getXmlMetadata` returns PDF XML metadata.
1809 * **Changed** creation of PDF documents: they will now always carry a PDF identification (``/ID`` field) in the document trailer. Implements issue `#691 <https://github.com/pymupdf/PyMuPDF/issues/691>`_.
1810 * **Changed** :meth:`Page.searchFor`: a new parameter ``clip`` is accepted to restrict the search to this rectangle. Correspondingly, the attribute :attr:`TextPage.rect` is now respected by :meth:`TextPage.search`.
1811 * **Changed** parameter ``hit_max`` in :meth:`Page.searchFor` and :meth:`TextPage.search` is now obsolete: methods will return all hits.
1812 * **Changed** character **selection criteria** in :meth:`Page.getText`: a character is now considered to be part of a ``clip`` if its bbox is fully contained. Before this, a non-empty intersection was sufficient.
1813 * **Changed** :meth:`Document.scrub` to support a new option `redact_images`. This addresses issue `#697 <https://github.com/pymupdf/PyMuPDF/issues/697>`_.
1814
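The changed character selection criterion for ``clip`` (full containment instead of a non-empty intersection) can be illustrated with plain rectangle tuples. This is only a sketch of the two tests, not PyMuPDF's internal code; the helper names ``intersects`` and ``contains`` and the sample rectangles are hypothetical.

```python
def intersects(bbox, clip):
    """Old criterion: bbox and clip share a non-empty area."""
    x0 = max(bbox[0], clip[0])
    y0 = max(bbox[1], clip[1])
    x1 = min(bbox[2], clip[2])
    y1 = min(bbox[3], clip[3])
    return x0 < x1 and y0 < y1


def contains(clip, bbox):
    """New criterion: bbox lies fully inside clip."""
    return (
        clip[0] <= bbox[0]
        and clip[1] <= bbox[1]
        and bbox[2] <= clip[2]
        and bbox[3] <= clip[3]
    )


clip = (0, 0, 100, 100)
inside = (10, 10, 20, 20)      # fully within the clip
straddling = (90, 10, 110, 20)  # crosses the right clip border

old_hits = [b for b in (inside, straddling) if intersects(b, clip)]
new_hits = [b for b in (inside, straddling) if contains(clip, b)]
```

Under the old rule both boxes would be selected, while under the new rule only ``inside`` qualifies, so characters cut by the clip border are no longer extracted.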
1815
1816 ------
1817
1818 **Changes in Version 1.18.1**
1819
1820 * **Fixed** issue `#692 <https://github.com/pymupdf/PyMuPDF/issues/692>`_. PyMuPDF now detects and recovers from more cyclic resource dependencies in PDF pages and for the first time reports them in the MuPDF warnings store.
1821 * **Fixed** issue `#686 <https://github.com/pymupdf/PyMuPDF/issues/686>`_.
1822 * **Added** opacity options for the :ref:`Shape` class: Stroke and fill colors can now be set to some transparency value. This means that all :ref:`Page` draw methods, methods :meth:`Page.insertText`, :meth:`Page.insertTextbox`, :meth:`Shape.finish`, :meth:`Shape.insertText`, and :meth:`Shape.insertTextbox` support two new parameters: *stroke_opacity* and *fill_opacity*.
1823 * **Added** new parameter ``mask`` to :meth:`Page.insertImage` for optionally providing an external image mask. Resolves issue `#685 <https://github.com/pymupdf/PyMuPDF/issues/685>`_.
1824 * **Added** :meth:`Annot.soundGet` for extracting the sound of an audio annotation.
1825
1826 ------
1827
1828 **Changes in Version 1.18.0**
1829
1830 This is the first PyMuPDF version supporting MuPDF v1.18. The focus here is on extending PyMuPDF's own functionality -- apart from bug fixing. Subsequent PyMuPDF patches may address features new in MuPDF.
1831
1832 * **Fixed** issue `#519 <https://github.com/pymupdf/PyMuPDF/issues/519>`_. This upstream bug occurred occasionally for some pages only and seems to be fixed now: page layout should no longer be ruined in these cases.
1833
1834 * **Fixed** issue `#675 <https://github.com/pymupdf/PyMuPDF/issues/675>`_.
1835
1836 - Unsuccessful storage allocations should now always lead to exceptions (circumvention of an upstream bug intermittently crashing the interpreter).
1837 - :ref:`Pixmap` size is now based on ``size_t`` instead of ``int`` in C and should be correct even for extremely large pixmaps.
1838
1839 * **Fixed** issue `#668 <https://github.com/pymupdf/PyMuPDF/issues/668>`_. Specification of dashes for PDF drawing insertion should now correctly reflect the PDF spec.
1840 * **Fixed** issue `#669 <https://github.com/pymupdf/PyMuPDF/issues/669>`_. A major source of memory leakage in :meth:`Page.insert_pdf` has been removed.
1841 * **Added** keyword *"images"* to :meth:`Page.apply_redactions` for fine-controlling the handling of images.
1842 * **Added** :meth:`Annot.getText` and :meth:`Annot.getTextbox`, which offer the same functionality as the :ref:`Page` versions.
1843 * **Added** key *"number"* to the block dictionaries of :meth:`Page.getText` / :meth:`Annot.getText` for options "dict" and "rawdict".
1844 * **Added** :meth:`glyph_name_to_unicode` and :meth:`unicode_to_glyph_name`. Both functions do not really connect to a specific font and are now independently available, too. The data are now based on the `Adobe Glyph List <https://github.com/adobe-type-tools/agl-aglfn/blob/master/glyphlist.txt>`_.
1845 * **Added** convenience functions :meth:`adobe_glyph_names` and :meth:`adobe_glyph_unicodes` which return the respective available data.
1846 * **Added** :meth:`Page.getDrawings` which returns details of drawing operations on a document page. Works for all document types.
1847 * Improved performance of :meth:`Document.insert_pdf`. Multiple object copies are now also suppressed across multiple separate insertions from the same source. This saves time, memory and target file size. Previously this mechanism was only active within each single method execution. The feature can also be suppressed with the new method bool parameter *final=1*, which is the default.
1848 * For PNG images created from pixmaps, the resolution (dpi) is now automatically set from the respective :attr:`Pixmap.xres` and :attr:`Pixmap.yres` values.
1849
1850
1851 ------
1852
1853 **Changes in Version 1.17.7**
1854
1855 * **Fixed** issue `#651 <https://github.com/pymupdf/PyMuPDF/issues/651>`_. An upstream bug causing interpreter crashes in corner case redaction processings was fixed by backporting MuPDF changes from their development repo.
1856 * **Fixed** issue `#645 <https://github.com/pymupdf/PyMuPDF/issues/645>`_. Pixmap top-left coordinates can be set (again) by their own method, :meth:`Pixmap.set_origin`.
1857 * **Fixed** issue `#622 <https://github.com/pymupdf/PyMuPDF/issues/622>`_. :meth:`Page.insertImage` again accepts a :data:`rect_like` parameter.
1858 * **Added** several new methods to improve and speed-up table of contents (TOC) handling. Among other things, TOC items can now be changed or deleted individually -- without always replacing the complete TOC. Furthermore, access to some PDF page attributes is now possible without first **loading** the page. This has a very significant impact on the performance of TOC manipulation.
1859 * **Added** an option to :meth:`Document.insert_pdf` which allows displaying progress messages. Addresses `#640 <https://github.com/pymupdf/PyMuPDF/issues/640>`_.
1860 * **Added** :meth:`Page.getTextbox` which extracts text contained in a rectangle. In many cases, this should obsolete writing your own script for this type of thing.
1861 * **Added** new ``clip`` parameter to :meth:`Page.getText` to simplify and speed up text extraction of page sub areas.
1862 * **Added** :meth:`TextWriter.appendv` to add text in **vertical write mode**. Addresses issue `#653 <https://github.com/pymupdf/PyMuPDF/issues/653>`_
1863
1864
1865 ------
1866
1867 **Changes in Version 1.17.6**
1868
1869 * **Fixed** issue `#605 <https://github.com/pymupdf/PyMuPDF/issues/605>`_
1870 * **Fixed** issue `#600 <https://github.com/pymupdf/PyMuPDF/issues/600>`_ -- text should now be correctly positioned also for pages with a CropBox smaller than MediaBox.
1871 * **Added** text span dictionary key ``origin`` which contains the lower left coordinate of the first character in that span.
1872 * **Added** attribute :attr:`Font.buffer`, a *bytes* copy of the font file.
1873 * **Added** parameter *sanitize* to :meth:`Page.cleanContents`. Allows switching off sanitization, so only syntax cleaning will be done.
1874
1875 ------
1876
1877 **Changes in Version 1.17.5**
1878
1879 * **Fixed** issue `#561 <https://github.com/pymupdf/PyMuPDF/issues/561>`_ -- second go: certain :ref:`TextWriter` usages with many alternating fonts did not work correctly.
1880 * **Fixed** issue `#566 <https://github.com/pymupdf/PyMuPDF/issues/566>`_.
1881 * **Fixed** issue `#568 <https://github.com/pymupdf/PyMuPDF/issues/568>`_.
1882 * **Fixed** -- opacity is now correctly taken from the :ref:`TextWriter` object, if not given in :meth:`TextWriter.writeText`.
1883 * **Added** a new global attribute :attr:`fitz_fontdescriptors`. Contains information about usable fonts from repository `pymupdf-fonts <https://github.com/pymupdf/pymupdf-fonts>`_.
1884 * **Added** :meth:`Font.valid_codepoints` which returns an array of unicode codepoints for which the font has a glyph.
1885 * **Added** option ``text_as_path`` to :meth:`Page.getSVGimage`. This implements `#580 <https://github.com/pymupdf/PyMuPDF/issues/580>`_. Generates much smaller SVG files with parseable text if set to *False*.
1886
1887
1888 ------
1889
1890 **Changes in Version 1.17.4**
1891
1892 * **Fixed** issue `#561 <https://github.com/pymupdf/PyMuPDF/issues/561>`_. Handling of more than 10 :ref:`Font` objects on one page should now work correctly.
1893 * **Fixed** issue `#562 <https://github.com/pymupdf/PyMuPDF/issues/562>`_. Annotation pixmaps are no longer derived from the page pixmap, thus avoiding unintended inclusion of page content.
1894 * **Fixed** issue `#559 <https://github.com/pymupdf/PyMuPDF/issues/559>`_. This **MuPDF** bug is being temporarily fixed with a pre-version of MuPDF's next release.
1895 * **Added** utility function :meth:`repair_mono_font` for correcting displayed character spacing for some mono-spaced fonts.
1896 * **Added** utility method :meth:`Document.need_appearances` for fine-controlling Form PDF behavior. Addresses issue `#563 <https://github.com/pymupdf/PyMuPDF/issues/563>`_.
1897 * **Added** utility function :meth:`sRGB_to_pdf` to recover the PDF color triple for a given color integer in sRGB format.
1898 * **Added** utility function :meth:`sRGB_to_rgb` to recover the (R, G, B) color triple for a given color integer in sRGB format.
1899 * **Added** utility function :meth:`make_table` which delivers table cells for a given rectangle and desired numbers of columns and rows.
1900 * **Added** support for optional fonts in repository `pymupdf-fonts <https://github.com/pymupdf/pymupdf-fonts>`_.
1901
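The two sRGB utilities listed above convert a color integer into component triples. The sketch below shows the underlying arithmetic in plain Python. It assumes the integer is laid out as ``0xRRGGBB`` and that a "PDF color triple" means floats in the range 0 to 1. The function names ``srgb_int_to_rgb`` and ``srgb_int_to_pdf`` are hypothetical stand-ins, not the library functions themselves.

```python
def srgb_int_to_rgb(srgb):
    """Split an integer assumed to be 0xRRGGBB into (R, G, B), each 0..255."""
    r = (srgb >> 16) & 0xFF
    g = (srgb >> 8) & 0xFF
    b = srgb & 0xFF
    return (r, g, b)


def srgb_int_to_pdf(srgb):
    """Scale the components to floats in 0..1, as PDF color operators expect."""
    r, g, b = srgb_int_to_rgb(srgb)
    return (r / 255.0, g / 255.0, b / 255.0)


orange_rgb = srgb_int_to_rgb(0xFF8000)
red_pdf = srgb_int_to_pdf(0xFF0000)
```

Color integers of this kind appear, for example, in the ``color`` key of text spans, which is where such a conversion is typically useful.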
1902 ------
1903
1904 **Changes in Version 1.17.3**
1905
1906 * **Fixed** an undocumented issue, which prevented fully cleaning a PDF page when using :meth:`Page.cleanContents`.
1907 * **Fixed** issue `#540 <https://github.com/pymupdf/PyMuPDF/issues/540>`_. Text extraction for EPUB should again work correctly.
1908 * **Fixed** issue `#548 <https://github.com/pymupdf/PyMuPDF/issues/548>`_. Documentation now includes ``LINK_NAMED``.
1909 * **Added** new parameter to control start of text in :meth:`TextWriter.fillTextbox`. Implements `#549 <https://github.com/pymupdf/PyMuPDF/issues/549>`_.
1910 * **Changed** documentation of :meth:`Page.add_redact_annot` to explain the usage of non-builtin fonts.
1911
1912 ------
1913
1914 **Changes in Version 1.17.2**
1915
1916 * **Fixed** issue `#533 <https://github.com/pymupdf/PyMuPDF/issues/533>`_.
1917 * **Added** options to modify 'Redact' annotation appearance. Implements `#535 <https://github.com/pymupdf/PyMuPDF/issues/535>`_.
1918
1919
1920 ------
1921
1922 **Changes in Version 1.17.1**
1923
1924 * **Fixed** issue `#520 <https://github.com/pymupdf/PyMuPDF/issues/520>`_.
1925 * **Fixed** issue `#525 <https://github.com/pymupdf/PyMuPDF/issues/525>`_. Vertices for 'Ink' annots should now be correct.
1926 * **Fixed** issue `#524 <https://github.com/pymupdf/PyMuPDF/issues/524>`_. It is now possible to query and set rotation for applicable annotation types.
1927
1928 Also significantly improved inline documentation for better support of interactive help.
1929
1930 ------
1931
1932 **Changes in Version 1.17.0**
1933
1934 This version is based on MuPDF v1.17. Following are highlights of new and changed features:
1935
1936 * **Added** extended language support for annotations and widgets: a mixture of Latin, Greek, Russian, Chinese, Japanese and Korean characters can now be used in 'FreeText' annotations and text widgets. No special arrangement is required to use it.
1937
1938 * Faster page access is implemented for documents supporting a "chapter" structure. This applies to EPUB documents currently. This comes with several new :ref:`Document` methods and changes for :meth:`Document.loadPage` and the "indexed" page access *doc[n]*: In addition to specifying a page number as before, a tuple *(chapter, pno)* can be specified to identify the desired page.
1939
1940 * **Changed:** Improved support of redaction annotations: images overlapped by redactions are **permanently modified** by erasing the overlap areas. Also links are removed if overlapped by redactions. This is now fully in sync with PDF specifications.
1941
1942 Other changes:
1943
1944 * **Changed** :meth:`TextWriter.writeText` to support the *"morph"* parameter.
1945 * **Added** methods :meth:`Rect.morph`, :meth:`IRect.morph`, and :meth:`Quad.morph`, which return a new :ref:`Quad`.
1946 * **Changed** :meth:`Page.add_freetext_annot` to support text alignment via a new *"align"* parameter.
1947 * **Fixed** issue `#508 <https://github.com/pymupdf/PyMuPDF/issues/508>`_. Improved image rectangle calculation to hopefully deliver correct values in most if not all cases.
1948 * **Fixed** issue `#502 <https://github.com/pymupdf/PyMuPDF/issues/502>`_.
1949 * **Fixed** issue `#500 <https://github.com/pymupdf/PyMuPDF/issues/500>`_. :meth:`Document.convertToPDF` should no longer cause memory leaks.
1950 * **Fixed** issue `#496 <https://github.com/pymupdf/PyMuPDF/issues/496>`_. Annotations and widgets / fields are now added or modified using the coordinates of the **unrotated page**. This behavior is now in sync with other methods modifying PDF pages.
1951 * **Added** :attr:`Page.rotationMatrix` and :attr:`Page.derotationMatrix` to support coordinate transformations between the rotated and the original versions of a PDF page.
1952
1953 Potential code breaking changes:
1954
1955 * The private method ``Page._getTransformation()`` has been removed. Use the public :attr:`Page.transformationMatrix` instead.
1956
1957
1958 ------
1959
1960 **Changes in Version 1.16.18**
1961
1962 This version introduces several new features around PDF text output. The motivation is to simplify this task, while at the same time offering extended features.
1963
1964 One major achievement is using MuPDF's capability to dynamically choose fallback fonts whenever a character cannot be found in the current one. This seamlessly works for Base-14 fonts in combination with CJK fonts (China, Japan, Korea). So a text may contain **any combination of characters** from the Latin, Greek, Russian, Chinese, Japanese and Korean languages.
1965
1966 * **Fixed** issue `#493 <https://github.com/pymupdf/PyMuPDF/issues/493>`_. ``Pixmap(doc, xref)`` should now again correctly resemble the loaded image object.
1967 * **Fixed** issue `#488 <https://github.com/pymupdf/PyMuPDF/issues/488>`_. Widget names are now modifiable.
1968 * **Added** new class :ref:`Font` which represents a font.
1969 * **Added** new class :ref:`TextWriter` which serves as a container for text to be written on a page.
1970 * **Added** :meth:`Page.writeText` to write one or more :ref:`TextWriter` objects to the page.
1971
1972
1973 ------
1974
1975 **Changes in Version 1.16.17**
1976
1977
1978 * **Fixed** issue `#479 <https://github.com/pymupdf/PyMuPDF/issues/479>`_. PyMuPDF should now more correctly report image resolutions. This applies to both images (either from image files or extracted from PDF documents) and pixmaps created from images.
1979 * **Added** :meth:`Pixmap.set_dpi` which sets the image resolution in x and y directions.
1980
1981 ------
1982
1983 **Changes in Version 1.16.16**
1984
1985
1986 * **Fixed** issue `#477 <https://github.com/pymupdf/PyMuPDF/issues/477>`_.
1987 * **Fixed** issue `#476 <https://github.com/pymupdf/PyMuPDF/issues/476>`_.
1988 * **Changed** annotation line end symbol coloring and fixed an error coloring the interior of 'Polyline' / 'Polygon' annotations.
1989
1990 ------
1991
1992 **Changes in Version 1.16.14**
1993
1994
1995 * **Changed** text marker annotations to accept parameters beyond just quadrilaterals such that now **text lines between two given points can be marked**.
1996
1997 * **Added** :meth:`Document.scrub` which **removes potentially sensitive data** from a PDF. Implements `#453 <https://github.com/pymupdf/PyMuPDF/issues/453>`_.
1998
1999 * **Added** :meth:`Annot.blendMode` which returns the **blend mode** of annotations.
2000
2001 * **Added** :meth:`Annot.setBlendMode` to set the annotation's blend mode. This resolves issue `#416 <https://github.com/pymupdf/PyMuPDF/issues/416>`_.
2002 * **Changed** :meth:`Annot.update` to accept additional parameters for setting blend mode and opacity.
2003 * **Added** advanced graphics features to **control the anti-aliasing values**, :meth:`Tools.set_aa_level`. Resolves `#467 <https://github.com/pymupdf/PyMuPDF/issues/467>`_
2004
2005 * **Fixed** issue `#474 <https://github.com/pymupdf/PyMuPDF/issues/474>`_.
2006 * **Fixed** issue `#466 <https://github.com/pymupdf/PyMuPDF/issues/466>`_.
2007
2008
2009
2010 ------
2011
2012 **Changes in Version 1.16.13**
2013
2014
2015 * **Added** :meth:`Document.getPageXObjectList` which returns a list of **Form XObjects** of the page.
2016 * **Added** :meth:`Page.setMediaBox` for changing the physical PDF page size.
2017 * **Added** :ref:`Page` methods which have been internal before: :meth:`Page.cleanContents` (= :meth:`Page._cleanContents`), :meth:`Page.getContents` (= :meth:`Page._getContents`), :meth:`Page.getTransformation` (= :meth:`Page._getTransformation`).
2018
2019
2020
2021 ------
2022
2023 **Changes in Version 1.16.12**
2024
2025 * **Fixed** issue `#447 <https://github.com/pymupdf/PyMuPDF/issues/447>`_
2026 * **Fixed** issue `#461 <https://github.com/pymupdf/PyMuPDF/issues/461>`_.
2027 * **Fixed** issue `#397 <https://github.com/pymupdf/PyMuPDF/issues/397>`_.
2028 * **Fixed** issue `#463 <https://github.com/pymupdf/PyMuPDF/issues/463>`_.
2029 * **Added** JavaScript support to PDF form fields, thereby fixing `#454 <https://github.com/pymupdf/PyMuPDF/issues/454>`_.
2030 * **Added** a new annotation method :meth:`Annot.delete_responses`, which removes 'Popup' and response annotations referring to the current one. Mainly serves data protection purposes.
2031 * **Added** a new form field method :meth:`Widget.reset`, which resets the field value to its default.
2032 * **Changed** and extended handling of redactions: images and XObjects are removed if *contained* in a redaction rectangle. Partial overlaps will just be covered by the redaction background color. Now an *overlay* text can be specified to be inserted in the rectangle area to **take the place of the deleted original** text. This resolves `#434 <https://github.com/pymupdf/PyMuPDF/issues/434>`_.
2033
2034 ------
2035
2036 **Changes in Version 1.16.11**
2037
2038 * **Added** Support for redaction annotations via method :meth:`Page.add_redact_annot` and :meth:`Page.apply_redactions`.
2039 * **Fixed** issue #426 ("PolygonAnnotation in 1.16.10 version").
2040 * **Fixed** documentation only issues `#443 <https://github.com/pymupdf/PyMuPDF/issues/443>`_ and `#444 <https://github.com/pymupdf/PyMuPDF/issues/444>`_.
2041
2042 ------
2043
2044 **Changes in Version 1.16.10**
2045
2046 * **Fixed** issue #421 ("annot.set_rect(rect) has no effect on text Annotation")
2047 * **Fixed** issue #417 ("Strange behavior for page.deleteAnnot on 1.16.9 compare to 1.13.20")
2048 * **Fixed** issue #415 ("Annot.setOpacity throws mupdf warnings")
2049 * **Changed** all "add annotation / widget" methods to store a unique name in the */NM* PDF key.
2050 * **Changed** :meth:`Annot.setInfo` to also accept direct parameters in addition to a dictionary.
2051 * **Changed** :attr:`Annot.info` to now also show the annotation's unique id (*/NM* PDF key) if present.
2052 * **Added** :meth:`Page.annot_names` which returns a list of all annotation names (*/NM* keys).
2053 * **Added** :meth:`Page.load_annot` which loads an annotation given its unique id (*/NM* key).
2054 * **Added** :meth:`Document.reload_page` which provides a new copy of a page after finishing any pending updates to it.
2055
2056
2057 ------
2058
2059 **Changes in Version 1.16.9**
2060
2061 * **Fixed** #412 ("Feature Request: Allow controlling whether TOC entries should be collapsed")
2062 * **Fixed** #411 ("Seg Fault with page.firstWidget")
2063 * **Fixed** #407 ("Annot.setOpacity trouble")
2064 * **Changed** methods :meth:`Annot.setBorder`, :meth:`Annot.setColors`, :meth:`Link.setBorder`, and :meth:`Link.setColors` to also accept direct parameters, and not just cumbersome dictionaries.
2065
2066 ------
2067
2068 **Changes in Version 1.16.8**
2069
2070 * **Added** several new methods to the :ref:`Document` class, which make dealing with PDF low-level structures easier. I also decided to provide them as "normal" methods (as opposed to private ones starting with an underscore "_"). These are :meth:`Document.xrefObject`, :meth:`Document.xrefStream`, :meth:`Document.xrefStreamRaw`, :meth:`Document.PDFTrailer`, :meth:`Document.PDFCatalog`, :meth:`Document.metadataXML`, :meth:`Document.updateObject`, :meth:`Document.updateStream`.
2071 * **Added** :meth:`Tools.mupdf_display_errors` which sets the display of mupdf errors on *sys.stderr*.
2072 * **Added** a commandline facility. This is a major new feature: you can now invoke several utility functions via *"python -m fitz ..."*. It should obsolete the need for many of the most trivial scripts. Please refer to :ref:`Module`.
2073
2074
2075 ------
2076
2077 **Changes in Version 1.16.7**
2078
2079 Minor changes to better synchronize the binary image streams of :ref:`TextPage` image blocks and :meth:`Document.extractImage` images.
2080
2081 * **Fixed** issue #394 ("PyMuPDF Segfaults when using TOOLS.mupdf_warnings()").
2082 * **Changed** redirection of MuPDF error messages: apart from writing them to Python *sys.stderr*, they are now also stored with the MuPDF warnings.
2083 * **Changed** :meth:`Tools.mupdf_warnings` to automatically empty the store (if not deactivated via a parameter).
2084 * **Changed** :meth:`Page.getImageBbox` to return an **infinite rectangle** if the image could not be located on the page -- instead of raising an exception.
2085
2086
2087 ------
2088
2089 **Changes in Version 1.16.6**
2090
2091 * **Fixed** issue #390 ("Incomplete deletion of annotations").
2092 * **Changed** :meth:`Page.searchFor` / :meth:`Document.searchPageFor` to also support the *flags* parameter, which controls the data included in a :ref:`TextPage`.
2093 * **Changed** :meth:`Document.getPageImageList`, :meth:`Document.getPageFontList` and their :ref:`Page` counterparts to support a new parameter *full*. If true, the returned items will contain the :data:`xref` of the *Form XObject* where the font or image is referenced.
2094
2095 ------
2096
2097 **Changes in Version 1.16.5**
2098
2099 More performance improvements for text extraction.
2100
2101 * **Fixed** second part of issue #381 (see item in v1.16.4).
2102 * **Added** :meth:`Page.getTextPage`, so it is no longer required to create an intermediate display list for text extractions. Page level wrappers for text extraction and text searching are now based on this, which should improve performance by ca. 5%.
2103
2104 ------
2105
2106 **Changes in Version 1.16.4**
2107
2108
2109 * **Fixed** issue #381 ("TextPage.extractDICT ... failed ... after upgrading ... to 1.16.3")
2110 * **Added** method :meth:`Document.pages` which delivers a generator iterator over a page range.
2111 * **Added** method :meth:`Page.links` which delivers a generator iterator over the links of a page.
2112 * **Added** method :meth:`Page.annots` which delivers a generator iterator over the annotations of a page.
2113 * **Added** method :meth:`Page.widgets` which delivers a generator iterator over the form fields of a page.
2114 * **Changed** :attr:`Document.is_form_pdf` to now contain the number of widgets, and *False* if not a PDF or this number is zero.
2115
2116
2117 ------
2118
2119 **Changes in Version 1.16.3**
2120
2121 Minor changes compared to version 1.16.2. The code of the "dict" and "rawdict" variants of :meth:`Page.getText` has been ported to C which has greatly improved their performance. This improvement is mostly noticeable with text-oriented documents, where they now should execute almost two times faster.
2122
2123 * **Fixed** issue #369 ("mupdf: cmsCreateTransform failed") by removing ICC colorspace support.
2124 * **Changed** :meth:`Page.getText` to accept additional keywords "blocks" and "words". These will deliver the results of :meth:`Page.getTextBlocks` and :meth:`Page.getTextWords`, respectively. So all text extraction methods are now available via a uniform API. Correspondingly, there are now new methods :meth:`TextPage.extractBLOCKS` and :meth:`TextPage.extractWords`.
2125 * **Changed** :meth:`Page.getText` to default bit indicator *TEXT_INHIBIT_SPACES* to **off**. Insertion of additional spaces is **not suppressed** by default.
2126
2127 ------
2128
2129 **Changes in Version 1.16.2**
2130
2131 * **Changed** text extraction methods of :ref:`Page` to allow detailed control of the amount of extracted data.
2132 * **Added** :meth:`planish_line` which maps a given line (defined as a pair of points) to the x-axis.
2133 * **Fixed** an issue (w/o Github number) which brought down the interpreter when encountering certain non-UTF-8 encodable characters while using :meth:`Page.getText` with the "dict" option.
2134 * **Fixed** issue #362 ("Memory Leak with getText('rawDICT')").
2135
2136 ------
2137
2138 **Changes in Version 1.16.1**
2139
2140 * **Added** property :attr:`Quad.is_convex` which checks whether a line is contained in the quad if it connects two points of it.
2141 * **Changed** :meth:`Document.insert_pdf` to now allow dropping or including links and annotations independently during the copy. Fixes issue #352 ("Corrupt PDF data and ..."), which seemed to intermittently occur when using the method for some problematic PDF files.
2142 * **Fixed** a bug which, in matrix division using the syntax *"m1/m2"*, caused matrix *"m1"* to be **replaced** by the result instead of delivering a new matrix.
2143 * **Fixed** issue #354 ("SyntaxWarning with Python 3.8"). We now always use *"=="* for literals (instead of the *"is"* Python keyword).
2144 * **Fixed** issue #353 ("mupdf version check"), to no longer refuse the import when there are only patch level deviations from MuPDF.
2145
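The matrix division fix above concerns the difference between an operator that mutates its left operand and one that returns a new object. The following is a minimal sketch of the corrected behavior, using a hypothetical immutable ``Mat`` class written only for illustration; it is not PyMuPDF's :ref:`Matrix` implementation, and the six-value affine layout is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mat:
    """Hypothetical 2D affine matrix (a, b, c, d, e, f); illustration only."""
    a: float = 1.0
    b: float = 0.0
    c: float = 0.0
    d: float = 1.0
    e: float = 0.0
    f: float = 0.0

    def __mul__(self, o):
        # Row-vector convention: the result applies self first, then o.
        return Mat(
            self.a * o.a + self.b * o.c,
            self.a * o.b + self.b * o.d,
            self.c * o.a + self.d * o.c,
            self.c * o.b + self.d * o.d,
            self.e * o.a + self.f * o.c + o.e,
            self.e * o.b + self.f * o.d + o.f,
        )

    def inverse(self):
        det = self.a * self.d - self.b * self.c
        if det == 0:
            raise ZeroDivisionError("matrix is not invertible")
        ia, ib = self.d / det, -self.b / det
        ic, id_ = -self.c / det, self.a / det
        return Mat(ia, ib, ic, id_,
                   -(self.e * ia + self.f * ic),
                   -(self.e * ib + self.f * id_))

    def __truediv__(self, o):
        # Always builds a new object; neither operand is modified.
        return self * o.inverse()


m1 = Mat(2, 0, 0, 2, 10, 20)
m2 = Mat(2, 0, 0, 2, 0, 0)
q = m1 / m2  # m1 keeps its original values
```

Declaring the class frozen is one way to rule out this class of bug by construction, because any attempt to assign to a field of ``m1`` raises an error.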
2146
2147
2148 ------
2149
2150 **Changes in Version 1.16.0**
2151
2152 This major new version of MuPDF comes with several nice new or changed features. Some of them imply programming API changes, however. This is a synopsis of what has changed:
2153
2154 * PDF document encryption and decryption is now **fully supported**. This includes setting **permissions**, **passwords** (user and owner passwords) and the desired encryption method.
2155 * In response to the new encryption features, PyMuPDF returns an integer (i.e. a combination of bits) for document permissions, and no longer a dictionary.
2156 * Redirection of MuPDF errors and warnings is now natively supported. PyMuPDF redirects error messages from MuPDF to *sys.stderr* and no longer buffers them. Warnings continue to be buffered and will not be displayed. Functions exist to access and reset the warnings buffer.
2157 * Annotations are now **only supported for PDF**.
2158 * Annotations and widgets (form fields) are now **separate object chains** on a page (although widgets technically still **are** PDF annotations). This means that you will **never encounter widgets** when using :attr:`Page.firstAnnot` or :meth:`Annot.next`. You must use :attr:`Page.firstWidget` and :meth:`Widget.next` to access form fields.
2159 * As part of MuPDF's changes regarding widgets, only the following four fonts are supported, when **adding** or **changing** form fields: **Courier, Helvetica, Times-Roman** and **ZapfDingBats**.
2160
2161 List of change details:
2162
2163 * **Added** :meth:`Document.can_save_incrementally` which checks conditions that are preventing use of option *incremental=True* of :meth:`Document.save`.
2164 * **Added** :attr:`Page.firstWidget` which points to the first field on a page.
2165 * **Added** :meth:`Page.getImageBbox` which returns the rectangle occupied by an image shown on the page.
2166 * **Added** :meth:`Annot.setName` which lets you change the (icon) name field.
2167 * **Added** outputting the text color in :meth:`Page.getText`: the *"dict"*, *"rawdict"* and *"xml"* options now also show the color in sRGB format.
2168 * **Changed** :attr:`Document.permissions` to now contain an integer of bool indicators -- was a dictionary before.
2169 * **Changed** :meth:`Document.save`, :meth:`Document.write`, which now fully support password-based decryption and encryption of PDF files.
2170 * **Changed the names of all Python constants** related to annotations and widgets. Please make sure to consult the **Constants and Enumerations** chapter if your script is dealing with these two classes. This decision goes back to the dropped support for non-PDF annotations. The **old names** (starting with "ANNOT_*" or "WIDGET_*") will be available as deprecated synonyms.
2171 * **Changed** font support for widgets: only *Cour* (Courier), *Helv* (Helvetica, default), *TiRo* (Times-Roman) and *ZaDb* (ZapfDingBats) are accepted when **adding or changing** form fields. Only the plain versions are possible -- not their italic or bold variations. **Reading** a widget, however, will show its original font.
2172 * **Changed** the name of the warnings buffer to :meth:`Tools.mupdf_warnings` and the function to empty this buffer is now called :meth:`Tools.reset_mupdf_warnings`.
2173 * **Changed** :meth:`Page.getPixmap`, :meth:`Document.get_page_pixmap`: a new bool argument *annots* can now be used to **suppress the rendering of annotations** on the page.
2174 * **Changed** :meth:`Page.add_file_annot` and :meth:`Page.add_text_annot` to enable setting an icon.
2175 * **Removed** widget-related methods and attributes from the :ref:`Annot` object.
2176 * **Removed** :ref:`Document` attributes *openErrCode*, *openErrMsg*, and :ref:`Tools` attributes / methods *stderr*, *reset_stderr*, *stdout*, and *reset_stdout*.
2177 * **Removed** **thirdparty zlib** dependency in PyMuPDF: there are now compression functions available in MuPDF. Source installers of PyMuPDF may now omit this extra installation step.
2178
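Since document permissions are now an integer of bit indicators rather than a dictionary, individual permissions are tested with bitwise operations. The sketch below shows the general idea with a locally defined ``Perm`` enumeration. The names are hypothetical, and the bit values (4, 8, 16, 32) are an assumption taken from the permission bit positions in the PDF specification; consult the **Constants and Enumerations** chapter for the constants PyMuPDF actually provides.

```python
from enum import IntFlag


class Perm(IntFlag):
    """Hypothetical names for four PDF permission bits (assumed values)."""
    PRINT = 4
    MODIFY = 8
    COPY = 16
    ANNOTATE = 32


def allowed(perms):
    """Return the names of all permission bits that are set in perms."""
    return [flag.name for flag in Perm if perms & flag]


# An integer such as the one a permissions attribute might contain.
perms = Perm.PRINT | Perm.COPY
can_modify = bool(perms & Perm.MODIFY)
```

Because Python integers behave like two's complement values under ``&``, the same test also works for negative permission integers.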
2179 **No version published for MuPDF v1.15.0**
2180
2181
2182 ------
2183
2184 **Changes in Version 1.14.20 / 1.14.21**
2185
2186 * **Changed** text marker annotations to support multiple rectangles / quadrilaterals. This fixes issue #341 ("Question : How to addhighlight so that a string spread across more than a line is covered by one highlight?") and similar (#285).
2187 * **Fixed** issue #331 ("Importing PyMuPDF changes warning filtering behaviour globally").
2188
2189
2190 ------
2191
2192 **Changes in Version 1.14.19**
2193
2194 * **Fixed** issue #319 ("InsertText function error when use custom font").
2195 * **Added** new method :meth:`Document.get_sigflags` which returns information on whether a PDF is signed. Resolves issue #326 ("How to detect signature in a form pdf?").
2196
2197
2198 ------
2199
2200 **Changes in Version 1.14.17**
2201
2202 * **Added** :meth:`Document.fullcopyPage` to make full page copies within a PDF (not just copied references as :meth:`Document.copyPage` does).
2203 * **Changed** :meth:`Page.getPixmap`, :meth:`Document.get_page_pixmap` now use *alpha=False* as default.
2204 * **Changed** text extraction: the span dictionary now (again) contains its rectangle under the *bbox* key.
2205 * **Changed** :meth:`Document.movePage` and :meth:`Document.copyPage` to use direct functions instead of wrapping :meth:`Document.select` -- similar to :meth:`Document.delete_page` in v1.14.16.
2206
2207 ------
2208
2209 **Changes in Version 1.14.16**
2210
2211 * **Changed** :ref:`Document` methods around PDF */EmbeddedFiles* to no longer use MuPDF's "portfolio" functions. That support will be dropped in MuPDF v1.15 -- therefore another solution was required.
2212 * **Changed** :meth:`Document.embfile_Count` to be a function (was an attribute).
2213 * **Added** new method :meth:`Document.embfile_Names` which returns a list of names of embedded files.
2214 * **Changed** :meth:`Document.delete_page` and :meth:`Document.delete_pages` to internally no longer use :meth:`Document.select`, but instead use functions to perform the deletion directly. As it has turned out, the :meth:`Document.select` method yields invalid outline trees (tables of content) for very complex PDFs and sophisticated use of annotations.
2215
2216
2217 ------
2218
2219 **Changes in Version 1.14.15**
2220
2221 * **Fixed** issues #301 ("Line cap and Line join"), #300 ("How to draw a shape without outlines") and #298 ("utils.updateRect exception"). These bugs pertain to drawing shapes with PyMuPDF. Drawing shapes without any border is fully supported. Line cap styles and line join styles are now differentiated and support all possible PDF values (0, 1, 2) instead of just being a bool. The previous parameter *roundCap* is deprecated in favor of *lineCap* and *lineJoin* and will be deleted in the next release.
2222 * **Fixed** issue #290 ("Memory Leak with getText('rawDICT')"). This bug caused memory not being (completely) freed after invoking the "dict", "rawdict" and "json" versions of :meth:`Page.getText`.
2223
2224
2225 ------
2226
2227 **Changes in Version 1.14.14**
2228
2229 * **Added** new low-level function :meth:`ImageProperties` to determine a number of characteristics for an image.
2230 * **Added** new low-level function :meth:`Document.is_stream`, which checks whether an object is of stream type.
2231 * **Changed** low-level functions :meth:`Document._getXrefString` and :meth:`Document._getTrailerString` now by default return object definitions in a formatted form which makes parsing easy.
2232
2233 ------
2234
2235 **Changes in Version 1.14.13**
2236
2237 * **Changed** methods working with binary input: while they have always supported bytes and bytearray objects, they now also accept *io.BytesIO* input, using its *getvalue()* method. This pertains to document creation, embedded files, FileAttachment annotations, pixmap creation and others. Fixes issue #274 ("Segfault when using BytesIO as a stream for insertImage").
2238 * **Fixed** issue #278 ("Is insertImage(keep_proportion=True) broken?"). Images are now correctly presented when keeping aspect ratio.
2239
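The binary input handling described above can be pictured as a small normalization step applied before the data reaches the C level. This is only a sketch of the idea: the helper name ``as_bytes`` is hypothetical and is not part of PyMuPDF.

```python
import io


def as_bytes(data):
    """Normalize bytes, bytearray or io.BytesIO input to a bytes object."""
    if isinstance(data, (bytes, bytearray)):
        return bytes(data)
    if isinstance(data, io.BytesIO):
        # getvalue() returns the whole buffer, regardless of the
        # current read position of the stream.
        return data.getvalue()
    raise TypeError("expected bytes, bytearray or io.BytesIO")


buffer = io.BytesIO(b"%PDF-1.7")
buffer.read()  # moving the read position does not affect getvalue()
payload = as_bytes(buffer)
```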
2240
2241 ------
2242
2243 **Changes in Version 1.14.12**
2244
2245 * **Changed** the draw methods of :ref:`Page` and :ref:`Shape` to support not only RGB, but also GRAY and CMYK colorspaces. This solves issue #270 ("Is there a way to use CMYK color to draw shapes?"). This change also applies to text insertion methods of :ref:`Shape`, resp. :ref:`Page`.
2246 * **Fixed** issue #269 ("AttributeError in Document.insert_page()"), which occurred when using :meth:`Document.insert_page` with text insertion.
2247
2248
2249 ------
2250
2251 **Changes in Version 1.14.11**
2252
2253 * **Changed** :meth:`Page.show_pdf_page` to always position the source rectangle centered in the target. This method now also supports **rotation by arbitrary angles**. The argument *reuse_xref* has been deprecated: prevention of duplicates is now **handled internally**.
2254 * **Changed** :meth:`Page.insertImage` to support rotated display of the image and keeping the aspect ratio. Only rotations by multiples of 90 degrees are supported here.
2255 * **Fixed** issue #265 ("TypeError: insertText() got an unexpected keyword argument 'idx'"). This issue only occurred when using :meth:`Document.insert_page` together with text insertion.
2256
2257 ------
2258
2259 **Changes in Version 1.14.10**
2260
2261 * **Changed** :meth:`Page.show_pdf_page` to support rotation of the source rectangle. Fixes #261 ("Cannot rotate insterted pages").
2262 * **Fixed** a bug in :meth:`Page.insertImage` which prevented insertion of multiple images provided as streams.
2263
2264
2265 ------
2266
2267 **Changes in Version 1.14.9**
2268
2269 * **Added** new low-level method :meth:`Document._getTrailerString`, which returns the trailer object of a PDF. This is much like :meth:`Document._getXrefString` except that the PDF trailer has no / needs no :data:`xref` to identify it.
2270 * **Added** new parameters for text insertion methods. You can now set stroke and fill colors of glyphs (text characters) independently, as well as the thickness of the glyph border. A new parameter *render_mode* controls the use of these colors, and whether the text should be visible at all.
2271 * **Fixed** issue #258 ("Copying image streams to new PDF without size increase"): For JPX images embedded in a PDF, :meth:`Document.extractImage` will now return them in their original format. Previously, the MuPDF base library was used, which returns them in PNG format (entailing a massive size increase).
2272 * **Fixed** issue #259 ("Morphing text to fit inside rect"). Clarified use of :meth:`get_text_length` and removed extra line breaks for long words.
2273
2274 ------
2275
2276 **Changes in Version 1.14.8**
2277
2278 * **Added** :meth:`Pixmap.set_rect` to change the pixel values in a rectangle. This is also an alternative to setting the color of a complete pixmap (:meth:`Pixmap.clear_with`).
2279 * **Fixed** an image extraction issue with JBIG2 (monochrome) encoded PDF images. The issue occurred in :meth:`Page.getText` (parameters "dict" and "rawdict") and in :meth:`Document.extractImage` methods.
2280 * **Fixed** an issue with not correctly clearing a non-alpha :ref:`Pixmap` (:meth:`Pixmap.clear_with`).
2281 * **Fixed** an issue with not correctly inverting colors of a non-alpha :ref:`Pixmap` (:meth:`Pixmap.invert_irect`).
2282
2283 ------
2284
2285 **Changes in Version 1.14.7**
2286
2287 * **Added** :meth:`Pixmap.set_pixel` to change one pixel value.
2288 * **Added** documentation for image conversion in the :ref:`FAQ`.
2289 * **Added** new function :meth:`get_text_length` to determine the string length for a given font.
2290 * **Added** Postscript image output (changed :meth:`Pixmap.save` and :meth:`Pixmap.tobytes`).
2291 * **Changed** :meth:`Pixmap.save` and :meth:`Pixmap.tobytes` to ensure valid combinations of colorspace, alpha and output format.
2292 * **Changed** :meth:`Pixmap.save`: the desired format is now inferred from the filename.
2293 * **Changed** FreeText annotations can now have a transparent background - see :meth:`Annot.update`.
2294
2295 ------
2296
2297 **Changes in Version 1.14.5**
2298
2299 * **Changed:** :ref:`Shape` methods now strictly use the transformation matrix of the :ref:`Page` -- instead of "manually" calculating locations.
2300 * **Added** method :meth:`Pixmap.pixel` which returns the pixel value (a list) for given pixel coordinates.
2301 * **Added** method :meth:`Pixmap.tobytes` which returns a bytes object representing the pixmap in a variety of formats. Previously, this could be done for PNG outputs only (:meth:`Pixmap.tobytes`).
2302 * **Changed:** output of methods :meth:`Pixmap.save` and (the new) :meth:`Pixmap.tobytes` may now also be PSD (Adobe Photoshop Document).
2303 * **Added** method :meth:`Shape.drawQuad` which draws a :ref:`Quad`. This actually is a shorthand for a :meth:`Shape.drawPolyline` with the edges of the quad.
2304 * **Changed** method :meth:`Shape.drawOval`: the argument can now be **either** a rectangle (:data:`rect_like`) **or** a quadrilateral (:data:`quad_like`).
2305
2306 ------
2307
2308 **Changes in Version 1.14.4**
2309
2310 * **Fixed** issue #239 ("Annotation coordinate consistency").
2311
2312
2313 ------
2314
2315 **Changes in Version 1.14.3**
2316
2317 This patch version contains minor bug fixes and CJK font output support.
2318
2319 * **Added** support for the four CJK fonts as PyMuPDF generated text output. This pertains to methods :meth:`Page.insertFont`, :meth:`Shape.insertText`, :meth:`Shape.insertTextbox`, and corresponding :ref:`Page` methods. The new fonts are available under "reserved" fontnames "china-t" (traditional Chinese), "china-s" (simplified Chinese), "japan" (Japanese), and "korea" (Korean).
2320 * **Added** full support for the built-in fonts 'Symbol' and 'Zapfdingbats'.
2321 * **Changed:** The 14 standard fonts can now each be referenced by a 4-letter abbreviation.
2322
2323 ------
2324
2325 **Changes in Version 1.14.1**
2326
2327 This patch version contains minor performance improvements.
2328
2329 * **Added** support for :ref:`Document` filenames given as *pathlib* object by using the Python *str()* function.
2330
2331
2332 ------
2333
2334 **Changes in Version 1.14.0**
2335
2336 To support MuPDF v1.14.0, massive changes were required in PyMuPDF -- most of them purely technical, with little visibility to developers. But there are also quite a lot of interesting new and improved features. Following are the details:
2337
2338 * **Added** "ink" annotation.
2339 * **Added** "rubber stamp" annotation.
2340 * **Added** "squiggly" text marker annotation.
2341 * **Added** new class :ref:`Quad` (quadrilateral or tetragon) -- which represents a general four-sided shape in the plane. The special subtype of rectangular, non-empty tetragons is used in text marker annotations and as returned objects in text search methods.
2342 * **Added** a new option "decrypt" to :meth:`Document.save` and :meth:`Document.write`. Now you can **keep encryption** when saving a password protected PDF.
2343 * **Added** suppression and redirection of unsolicited messages issued by the underlying C-library MuPDF. Consult :ref:`RedirectMessages` for details.
2344 * **Changed:** Changes to annotations now **always require** :meth:`Annot.update` to become effective.
2345 * **Changed** free text annotations to support the full Latin character set and range of appearance options.
2346 * **Changed** text searching, :meth:`Page.searchFor`, to optionally return :ref:`Quad` instead of :ref:`Rect` objects surrounding each search hit.
2347 * **Changed** plain text output: we now add a *\n* to each line if it does not itself end with this character.
2348 * **Fixed** issue #211 ("Something wrong in the doc").
2349 * **Fixed** issue #213 ("Rewritten outline is displayed only by mupdf-based applications").
2350 * **Fixed** issue #214 ("PDF decryption GONE!").
2351 * **Fixed** issue #215 ("Formatting of links added with pyMuPDF").
2352 * **Fixed** issue #217 ("extraction through json is failing for my pdf").
2353
2354 Behind the curtain, we have changed the implementation of geometry objects: they now purely exist in Python and no longer have "shadow" twins on the C-level (in MuPDF). This has improved processing speed in that area by more than a factor of two.
2355
2356 For the same reason, most methods involving geometry parameters now also accept the corresponding Python sequence. For example, in method *"page.show_pdf_page(rect, ...)"* parameter *rect* may now be any :data:`rect_like` sequence.
2357
2358 We also invested considerable effort to further extend and improve the :ref:`FAQ` chapter.
2359
2360
2361 ------
2362
2363 **Changes in Version 1.13.19**
2364
2365 This version contains some technical / performance improvements and bug fixes.
2366
2367 * **Changed** memory management: for Python 3 builds, Python memory management is exclusively used across all C-level code (i.e. no more native *malloc()* in MuPDF code or PyMuPDF interface code). This leads to improved memory usage profiles and also some runtime improvements: we have seen > 2% shorter runtimes for text extractions and pixmap creations (on Windows machines only to date).
2368 * **Fixed** an error occurring in Python 2.7, which crashed the interpreter when using :meth:`TextPage.extractRAWDICT` (= *Page.getText("rawdict")*).
2369 * **Fixed** an error occurring in Python 2.7, when creating link destinations.
2370 * **Extended** the :ref:`FAQ` chapter with more examples.
2371
2372 ------
2373
2374 **Changes in Version 1.13.18**
2375
2376 * **Added** method :meth:`TextPage.extractRAWDICT`, and a corresponding new string parameter "rawdict" to method :meth:`Page.getText`. It extracts text and images from a page in Python *dict* form like :meth:`TextPage.extractDICT`, but with the detail level of :meth:`TextPage.extractXML`, which is position information down to each single character.
2377
2378 ------
2379
2380 **Changes in Version 1.13.17**
2381
2382 * **Fixed** an error that intermittently caused an exception in :meth:`Page.show_pdf_page`, when pages from many different source PDFs were shown.
2383 * **Changed** method :meth:`Document.extractImage` to now return more meta information about the extracted image. Also, its performance has been greatly improved. Several demo scripts have been changed to make use of this method.
2384 * **Changed** method :meth:`Document._getXrefStream` to now return *None* if the object is not a stream, instead of raising an exception.
2385 * **Added** method :meth:`Document._deleteObject` which deletes a PDF object identified by its :data:`xref`. Only to be used by the experienced PDF expert.
2386 * **Added** a method :meth:`paper_rect` which returns a :ref:`Rect` for a supplied paper format string. Example: *fitz.paper_rect("letter") = fitz.Rect(0.0, 0.0, 612.0, 792.0)*.
2387 * **Added** a :ref:`FAQ` chapter to this document.
2388
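The ``paper_rect`` example above amounts to a lookup of page dimensions in points (1/72 inch). A minimal sketch of such a lookup follows. The function name, the returned plain tuple, the error handling and the table contents are assumptions made for illustration; only the "letter" values are taken from the entry above, while the A4 and legal sizes are the commonly used values rounded to whole points.

```python
# Width and height in points (1/72 inch); hypothetical table for illustration.
PAPER_SIZES = {
    "a4": (595, 842),
    "letter": (612, 792),
    "legal": (612, 1008),
}


def paper_rect_sketch(name):
    """Return (x0, y0, x1, y1) for a paper format name, origin at (0, 0)."""
    try:
        width, height = PAPER_SIZES[name.lower()]
    except KeyError:
        # Choice of this sketch only; not a statement about PyMuPDF.
        raise ValueError("unknown paper format: %r" % name)
    return (0.0, 0.0, float(width), float(height))


letter = paper_rect_sketch("Letter")
```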
2389 ------
2390
2391 **Changes in Version 1.13.16**
2392
2393 * **Added** support for correctly setting transparency (opacity) for certain annotation types.
2394 * **Added** a tool property (:attr:`Tools.fitz_config`) showing the configuration of this PyMuPDF version.
2395 * **Fixed** issue #193 ('insertText(overlay=False) gives "cannot resize a buffer with shared storage" error') by avoiding read-only buffers.
2396
2397 ------
2398
2399 **Changes in Version 1.13.15**
2400
2401 * **Fixed** issue #189 ("cannot find builtin CJK font"), so we are supporting builtin CJK fonts now (CJK = China, Japan, Korea). This should lead to correctly generated pixmaps for documents using these languages. This change has consequences for our binary file size: it will now range between 8 and 10 MB, depending on the OS.
2402 * **Fixed** issue #191 ("Jupyter notebook kernel dies after ca. 40 pages"), which occurred when modifying the contents of an annotation.
2403
2404 ------
2405
2406 **Changes in Version 1.13.14**
2407
2408 This patch version contains several improvements, mainly for annotations.
2409
2410 * **Changed** :attr:`Annot.lineEnds` is now a list of two integers representing the line end symbols. Previously was a *dict* of strings.
2411 * **Added** support of line end symbols for applicable annotations. PyMuPDF now can generate these annotations including the line end symbols.
2412 * **Added** :meth:`Annot.setLineEnds` adds line end symbols to applicable annotation types ('Line', 'PolyLine', 'Polygon').
2413 * **Changed** technical implementation of :meth:`Page.insertImage` and :meth:`Page.show_pdf_page`: they now create their own contents objects, thereby avoiding changes of potentially large streams with consequential compression / decompression efforts and high change volumes with incremental updates.
2414
2415 ------
2416
2417 **Changes in Version 1.13.13**
2418
2419 This patch version contains several improvements for embedded files and file attachment annotations.
2420
2421 * **Added** :meth:`Document.embfile_Upd` which allows changing **file content and metadata** of an embedded file. It supersedes the old method :meth:`Document.embfile_SetInfo` (which will be deleted in a future version). Content is automatically compressed and metadata may be unicode.
2422 * **Changed** :meth:`Document.embfile_Add` to now automatically compress file content. Accompanying metadata can now be unicode (had to be ASCII in the past).
2423 * **Changed** :meth:`Document.embfile_Del` to now automatically delete **all entries** having the supplied identifying name. The return code is now an integer count of the removed entries (was *None* previously).
2424 * **Changed** embedded file methods to now also accept or show the PDF unicode filename as additional parameter *ufilename*.
2425 * **Added** :meth:`Page.add_file_annot` which adds a new file attachment annotation.
2426 * **Changed** :meth:`Annot.fileUpd` (file attachment annot) to now also accept the PDF unicode *ufilename* parameter. The description parameter *desc* correctly works with unicode. Furthermore, **all** parameters are optional, so metadata may be changed without also replacing the file content.
2427 * **Changed** :meth:`Annot.fileInfo` (file attachment annot) to now also show the PDF unicode filename as parameter *ufilename*.
2428 * **Fixed** issue #180 ("page.getText(output='dict') return invalid bbox") to now also work for vertical text.
2429 * **Fixed** issue #185 ("Can't render the annotations created by PyMuPDF"). The issue's cause was the minimalistic MuPDF approach when creating annotations. Several annotation types have no */AP* ("appearance") object when created by MuPDF functions. MuPDF, SumatraPDF and hence also PyMuPDF cannot render annotations without such an object. This fix now ensures that an appearance object is always created together with the annotation itself. We still do not support line end styles.
2430
2431 ------
2432
2433 **Changes in Version 1.13.12**
2434
2435 * **Fixed** issue #180 ("page.getText(output='dict') return invalid bbox"). Note that this is a circumvention of an MuPDF error, which generates zero-height character rectangles in some cases. When this happens, this fix ensures a bbox height of at least fontsize.
2436 * **Changed** for ListBox and ComboBox widgets, the attribute list of selectable values has been renamed to :attr:`Widget.choice_values`.
2437 * **Changed** when adding widgets, any missing of the :ref:`Base-14-Fonts` is automatically added to the PDF. Widget text fonts can now also be chosen from existing widget fonts. Any specified field values are now honored and lead to a field with a preset value.
2438 * **Added** :meth:`Annot.updateWidget` which allows changing existing form fields -- including the field value.
2439
2440 ------
2441
2442 **Changes in Version 1.13.11**
2443
2444 While the preceding patch subversions only contained various fixes, this version again introduces major new features:
2445
2446 * **Added** basic support for PDF widget annotations. You can now add PDF form fields of types Text, CheckBox, ListBox and ComboBox. Where necessary, the PDF is transformed to a Form PDF with the first added widget.
2447 * **Fixed** issues #176 ("wrong file embedding"), #177 ("segment fault when invoking page.getText()") and #179 ("Segmentation fault using page.getLinks() on encrypted PDF").
2448
2449
2450 ------
2451
2452 **Changes in Version 1.13.7**
2453
2454 * **Added** support of variable page sizes for reflowable documents (e-books, HTML, etc.): new parameters *rect* and *fontsize* in :ref:`Document` creation (open), and as a separate method :meth:`Document.layout`.
2455 * **Added** :ref:`Annot` creation of many annotation types: sticky notes, free text, circle, rectangle, line, polygon, polyline and text markers.
2456 * **Added** support of annotation transparency (:attr:`Annot.opacity`, :meth:`Annot.setOpacity`).
2457 * **Changed** :attr:`Annot.vertices`: point coordinates are now grouped as pairs of floats (no longer as separate floats).
2458 * **Changed** annotation colors dictionary: the two keys are now named *"stroke"* (formerly *"common"*) and *"fill"*.
2459 * **Added** :attr:`Document.isDirty` which is *True* if a PDF has been changed in this session. Reset to *False* on each :meth:`Document.save` or :meth:`Document.write`.
2460
2461 ------
2462
2463 **Changes in Version 1.13.6**
2464
2465 * Fix #173: for memory-resident documents, ensure the stream object will not be garbage-collected by Python before document is closed.
2466
2467 ------
2468
2469 **Changes in Version 1.13.5**
2470
2471 * New low-level method :meth:`Page._setContents` defines an object given by its :data:`xref` to serve as the :data:`contents` object.
2472 * Changed and extended PDF form field support: the attribute *widget_text* has been renamed to :attr:`Annot.widget_value`. Values of all form field types (except signatures) are now supported. A new attribute :attr:`Annot.widget_choices` contains the selectable values of listboxes and comboboxes. All these attributes now contain *None* if no value is present.
2473
2474 ------
2475
2476 **Changes in Version 1.13.4**
2477
2478 * :meth:`Document.convertToPDF` now supports page ranges, reverted page sequences and page rotation. If the document already is a PDF, an exception is raised.
2479 * Fixed a bug (introduced with v1.13.0) that prevented :meth:`Page.insertImage` for transparent images.
2480
2481 ------
2482
2483 **Changes in Version 1.13.3**
2484
2485 Introduces a way to convert **any MuPDF supported document** to a PDF. If you ever wanted PDF versions of your XPS, EPUB, CBZ or FB2 files -- here is a way to do this.
2486
2487 * :meth:`Document.convertToPDF` returns a Python *bytes* object in PDF format. Can be opened like normal in PyMuPDF, or be written to disk with the *".pdf"* extension.
2488
2489 ------
2490
2491 **Changes in Version 1.13.2**
2492
2493 The major enhancement is PDF form field support. Form fields are annotations of type *(19, 'Widget')*. There is a new document method to check whether a PDF is a form. The :ref:`Annot` class has new properties describing field details.
2494
2495 * :attr:`Document.is_form_pdf` is true if object type */AcroForm* and at least one form field exists.
2496 * :attr:`Annot.widget_type`, :attr:`Annot.widget_text` and :attr:`Annot.widget_name` contain the details of a form field (i.e. a "Widget" annotation).
2497
2498 ------
2499
2500 **Changes in Version 1.13.1**
2501
2502 * :meth:`TextPage.extractDICT` is a new method to extract the contents of a document page (text and images). All document types are supported as with the other :ref:`TextPage` *extract*()* methods. The returned object is a dictionary of nested lists and other dictionaries, and **exactly equal** to the JSON-deserialization of the old :meth:`TextPage.extractJSON`. The difference is that the result is created directly -- no JSON module is used. Because the user needs no JSON module to interpret the information, it should be easier to use. It should also perform better, because it contains images in their original **binary format** -- they need not be base64-decoded.
2503 * :meth:`Page.getText` correspondingly supports the new parameter value *"dict"* to invoke the above method.
2504 * :meth:`TextPage.extractJSON` (resp. *Page.getText("json")*) is still supported for convenience, but its use is expected to decline.
2505
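The point about binary image data can be illustrated without PyMuPDF. The following is a minimal sketch, not the library's implementation: the image block and its keys (*type*, *ext*, *image*) are hypothetical and chosen only for illustration. It shows that JSON cannot carry *bytes* directly, so a JSON round trip needs base64 encoding and decoding, whereas a plain dictionary holds the bytes unchanged.

```python
import base64
import json

# Hypothetical image block; the key names are assumptions for illustration.
block = {"type": 1, "ext": "png", "image": b"\x89PNG\r\n\x1a\n"}

# json.dumps() rejects bytes values, which is why a JSON variant
# has to base64-encode image data.
try:
    json.dumps(block)
    json_rejects_bytes = False
except TypeError:
    json_rejects_bytes = True


def to_json_text(blk):
    """Serialize a block to JSON text, base64-encoding the image bytes."""
    safe = dict(blk)
    safe["image"] = base64.b64encode(blk["image"]).decode("ascii")
    return json.dumps(safe)


def from_json_text(text):
    """Parse JSON text and restore the image bytes from base64."""
    blk = json.loads(text)
    blk["image"] = base64.b64decode(blk["image"])
    return blk


# The JSON route needs an encode and a decode step to get back the same data
# that the dictionary already contains.
round_tripped = from_json_text(to_json_text(block))
```

With a dictionary result, the consumer can use ``block["image"]`` as it is; with JSON, every image costs an extra decode step.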
2506 ------
2507
2508 **Changes in Version 1.13.0**
2509
2510 This version is based on MuPDF v1.13.0. This release is "primarily a bug fix release".
2511
2512 In PyMuPDF, we are also doing some bug fixes while introducing minor enhancements. There are only very minimal changes to the user's API.
2513
2514 * :ref:`Document` construction is more flexible: the new *filetype* parameter allows setting the document type. If specified, any extension in the filename will be ignored. More completely addresses `issue #156 <https://github.com/pymupdf/PyMuPDF/issues/156>`_. As part of this, the documentation has been reworked.
2515
2516 * Changes to :ref:`Pixmap` constructors:
2517 - Colorspace conversion no longer allows dropping the alpha channel: source and target **alpha will now always be the same**. We have seen exceptions and even interpreter crashes when using *alpha = 0*.
2518 - As a replacement, the simple pixmap copy lets you choose the target alpha.
2519
2520 * :meth:`Document.save` again offers the full garbage collection range 0 thru 4. Because of a bug in :data:`xref` maintenance, we had to temporarily enforce *garbage > 1*. Finally resolves `issue #148 <https://github.com/pymupdf/PyMuPDF/issues/148>`_.
2521
2522 * :meth:`Document.save` now offers to "prettify" PDF source via an additional argument.
2523 * :meth:`Page.insertImage` has the additional *stream* \-parameter, specifying a memory area holding an image.
2524
2525 * Issue with garbled PNGs on Linux systems has been resolved (`"Problem writing PNG" #133 <https://github.com/pymupdf/PyMuPDF/issues/133>`_).
2526
2527
2528 ------
2529
2530 **Changes in Version 1.12.4**
2531
2532 This is an extension of 1.12.3.
2533
2534 * Fix of `issue #147 <https://github.com/pymupdf/PyMuPDF/issues/147>`_: methods :meth:`Document.getPageFontlist` and :meth:`Document.getPageImagelist` now also show fonts and images contained in :data:`resources` nested via "Form XObjects".
2535 * Temporary fix of `issue #148 <https://github.com/pymupdf/PyMuPDF/issues/148>`_: Saving to new PDF files will now automatically use *garbage = 2* if a lower value is given. Final fix is to be expected with MuPDF's next version. At that point we will remove this circumvention.
2536 * Preventive fix of illegally using stencil / image mask pixmaps in some methods.
2537 * Method :meth:`Document.getPageFontlist` now includes the encoding name for each font in the list.
2538 * Method :meth:`Document.getPageImagelist` now includes the decode method name for each image in the list.
2539
2540 ------
2541
2542 **Changes in Version 1.12.3**
2543
2544 This is an extension of 1.12.2.
2545
2546 * Many functions now return *None* instead of *0*, if the result has no other meaning than just indicating successful execution (:meth:`Document.close`, :meth:`Document.save`, :meth:`Document.select`, :meth:`Pixmap.save` and many others).
2547
2548 ------
2549
2550 **Changes in Version 1.12.2**
2551
2552 This is an extension of 1.12.1.
2553
2554 * Method :meth:`Page.show_pdf_page` now accepts the new *clip* argument. This specifies an area of the source page to which the display should be restricted.
2555
2556 * New :attr:`Page.CropBox` and :attr:`Page.MediaBox` have been included for convenience.
2557
2558
2559 ------
2560
2561 **Changes in Version 1.12.1**
2562
2563 This is an extension of version 1.12.0.
2564
2565 * New method :meth:`Page.show_pdf_page` displays a page of another PDF. This is a **vector** image and therefore remains precise across zooming. Both involved documents must be PDF.
2566
2567 * New method :meth:`Page.getSVGimage` creates an SVG image from the page. In contrast to the raster image of a pixmap, this is a vector image format. The return is a unicode text string, which can be saved in a *.svg* file.
2568
2569 * Method :meth:`Page.getTextBlocks` now accepts an additional bool parameter "images". If set to true (default is false), image blocks (metadata only) are included in the produced list and thus allow detecting areas with rendered images.
2570
2571 * Minor bug fixes.
2572
2573 * "text" result of :meth:`Page.getText` concatenates all lines within a block using a single space character. MuPDF's original uses "\\n" instead, producing a rather ragged output.
2574
2575 * New properties of :ref:`Page` objects :attr:`Page.MediaBoxSize` and :attr:`Page.CropBoxPosition` provide more information about a page's dimensions. For non-PDF files (and for most PDF files, too) these will be equal to :attr:`Page.rect.bottom_right`, resp. :attr:`Page.rect.top_left`. For example, class :ref:`Shape` makes use of them to correctly position its items.
2576
2577 ------
2578
2579 **Changes in Version 1.12.0**
2580
2581 This version is based on and requires MuPDF v1.12.0. The new MuPDF version contains quite a number of changes -- most of them around text extraction. Some of the changes impact the programmer's API.
2582
2583 * :meth:`Outline.saveText` and :meth:`Outline.saveXML` have been deleted without replacement. You probably haven't used them much anyway. But if you are looking for a replacement: the output of :meth:`Document.get_toc` can easily be used to produce something equivalent.
2584
2585 * Class *TextSheet* no longer exists.
2586
2587 * Text "spans" (one of the hierarchy levels of :ref:`TextPage`) no longer contain positioning information (i.e. no "bbox" key). Instead, spans now provide the font information for their text. This impacts our JSON output variant.
2588
2589 * HTML output has improved very much: it now creates valid documents which can be displayed by browsers to produce a similar view as the original document.
2590
2591 * There is a new output format XHTML, which provides text and images in a browser-readable format. The difference from HTML output is that no effort is made to reproduce the original layout.
2592
2593 * All output formats of :meth:`Page.getText` now support creating complete, valid documents, by wrapping them with appropriate header and trailer information. If you are interested in using the HTML output, please make sure to read :ref:`HTMLQuality`.
2594
2595 * To support finding text positions, we have added special methods that don't need detours like :meth:`TextPage.extractJSON` or :meth:`TextPage.extractXML`: use :meth:`Page.getTextBlocks` or :meth:`Page.getTextWords` to create lists of text blocks or words, respectively, each accompanied by its rectangle. This should be much faster than the standard text extraction methods and also avoids using additional packages for interpreting their output.
2596
2597
2598 ------
2599
2600 **Changes in Version 1.11.2**
2601
2602 This is an extension of v1.11.1.
2603
2604 * New :meth:`Page.insertFont` creates a PDF */Font* object and returns its object number.
2605
2606 * New :meth:`Document.extractFont` extracts the content of an embedded font given its object number.
2607
2608 * Methods **FontList(...)** items no longer contain the PDF generation number. This value never had any significance. Instead, the font file extension is included (e.g. "pfa" for a "PostScript Font for ASCII"), which is more valuable information.
2609
2610 * Fonts other than "simple fonts" (Type1) are now also supported.
2611
2612 * New options to change :ref:`Pixmap` size:
2613
2614 * Method :meth:`Pixmap.shrink` reduces the pixmap proportionally in place.
2615
2616 * A new :ref:`Pixmap` copy constructor allows scaling via setting target width and height.
2617
2618
2619 ------
2620
2621 **Changes in Version 1.11.1**
2622
2623 This is an extension of v1.11.0.
2624
2625 * New class *Shape*. It facilitates and extends the creation of image shapes on PDF pages. It contains multiple methods for creating elementary shapes like lines, rectangles or circles, which can be combined into more complex ones and be given common properties like line width or colors. Combined shapes are handled as a unit and can e.g. be "morphed" together. The class can accumulate multiple complex shapes and put them all in the page's foreground or background -- thus also reducing the number of updates to the page's :data:`contents` object.
2626
2627 * All *Page* draw methods now use the new *Shape* class.
2628
2629 * Text insertion methods *insertText()* and *insertTextBox()* now support morphing in addition to text rotation. They have become part of the *Shape* class and thus allow text to be freely combined with graphics.
2630
2631 * A new *Pixmap* constructor allows creating pixmap copies with an added alpha channel. A new method also allows directly manipulating alpha values.
2632
2633 * Binary algebraic operations with geometry objects (matrices, rectangles and points) now generally also support lists or tuples as the second operand. You can add a tuple *(x, y)* of numbers to a :ref:`Point`. In this context, such sequences are called ":data:`point_like`" (resp. :data:`matrix_like`, :data:`rect_like`).
2634
2635 * Geometry objects now fully support in-place operators. For example, *p /= m* replaces point p with *p * 1/m* for a number, or *p * ~m* for a :data:`matrix_like` object *m*. Similarly, if *r* is a rectangle, then *r |= (3, 4)* is the new rectangle that also includes *fitz.Point(3, 4)*, and *r &= (1, 2, 3, 4)* is its intersection with *fitz.Rect(1, 2, 3, 4)*.
2636
2637 ------
2638
2639 **Changes in Version 1.11.0**
2640
2641 This version is based on and requires MuPDF v1.11.
2642
2643 Though MuPDF has declared it as being mostly a bug fix version, one major new feature is indeed contained: support of embedded files -- also called portfolios or collections. We have extended PyMuPDF functionality to embrace this up to an extent just a little beyond the *mutool* utility as follows.
2644
2645 * The *Document* class now supports embedded files with several new methods and one new property:
2646
2647 - *embfile_Info()* returns metadata information about an entry in the list of embedded files. This is more than *mutool* currently provides: it shows all the information that was used to embed the file (not just the entry's name).
2648 - *embfile_Get()* retrieves the (decompressed) content of an entry into a *bytes* buffer.
2649 - *embfile_Add(...)* inserts new content into the PDF portfolio. We (in contrast to *mutool*) **restrict** this to entries with a **new name** (no duplicate names allowed).
2650 - *embfile_Del(...)* deletes an entry from the portfolio (function not offered in MuPDF).
2651 - *embfile_SetInfo()* -- changes filename or description of an embedded file.
2652 - *embfile_Count* -- contains the number of embedded files.
2653
2654 * Several enhancements deal with streamlining geometry objects. These are not connected to the new MuPDF version and most of them are also reflected in PyMuPDF v1.10.0. Among them are new properties to identify the corners of rectangles by name (e.g. *Rect.bottom_right*) and new methods to deal with set-theoretic questions like *Rect.contains(x)* or *IRect.intersects(x)*. Special effort focussed on supporting more "Pythonic" language constructs: *if x in rect ...* is equivalent to *rect.contains(x)*.
2655
2656 * The :ref:`Rect` chapter now has more background on empty and infinite rectangles and how we handle them. The handling itself was also updated for more consistency in this area.
2657
2658 * We have started basic support for **generation** of PDF content:
2659
2660 - *Document.insert_page()* adds a new page into a PDF, optionally containing some text.
2661 - *Page.insertImage()* places a new image on a PDF page.
2662 - *Page.insertText()* puts new text on an existing page.
2663
2664 * For **FileAttachment** annotations, content and name of the attached file can be extracted and changed.
2665
2666 ------
2667
2668 **Changes in Version 1.10.0**
2669
2670 **MuPDF v1.10 Impact**
2671
2672 MuPDF version 1.10 has a significant impact on our bindings. Some of the changes also affect the API -- in other words, **you** as a PyMuPDF user.
2673
2674 * Link destination information has been reduced. Several properties of the *linkDest* class no longer contain valuable information. In fact, this class as a whole has been deleted from MuPDF's library and we in PyMuPDF only maintain it to provide compatibility with existing code.
2675
2676 * In an effort to minimize memory requirements, several improvements have been built into MuPDF v1.10:
2677
2678 - A new *config.h* file can be used to de-select unwanted features in the C base code. Using this feature we have been able to reduce the size of our binary *_fitz.o* / *_fitz.pyd* by about 50% (from 9 MB to 4.5 MB). When UPX-ing this, the size goes even further down to a very handy 2.3 MB.
2679
2680 - The alpha (transparency) channel for pixmaps is now optional. Letting alpha default to *False* significantly reduces pixmap sizes (by 20% -- CMYK, 25% -- RGB, 50% -- GRAY). Many *Pixmap* constructors therefore now accept an *alpha* boolean to control inclusion of this channel. Other pixmap constructors (e.g. those for file and image input) create pixmaps with no alpha altogether. On the downside, save methods for pixmaps no longer accept a *savealpha* option: this channel will always be saved when present. To minimize code breaks, we have left this parameter in the call patterns -- it will just be ignored.
2681
2682 * *DisplayList* and *TextPage* class constructors now **require the mediabox** of the page they are referring to (i.e. the *page.bound()* rectangle). There is no way to construct this information from other sources, therefore a source code change cannot be avoided in these cases. We assume however, that not many users are actually employing these rather low level classes explicitly. So the impact of that change should be minor.
2683
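The pixmap size reductions quoted above (20% for CMYK, 25% for RGB, 50% for GRAY) follow from simple arithmetic. The sketch below assumes one byte per component per pixel, so a pixel with *n* color components occupies *n + 1* bytes with alpha and *n* bytes without. The function name is hypothetical and not part of PyMuPDF.

```python
def alpha_saving(n_color_components):
    """Fraction of pixel storage saved by dropping the alpha byte.

    Assumes one byte per component: n color bytes plus one alpha byte.
    """
    return 1 / (n_color_components + 1)


# Number of color components per colorspace.
components = {"GRAY": 1, "RGB": 3, "CMYK": 4}

savings = {name: alpha_saving(n) for name, n in components.items()}
```

Here ``savings`` maps GRAY to 0.5, RGB to 0.25 and CMYK to 0.2, which matches the percentages given in the list.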
2684 **Other Changes compared to Version 1.9.3**
2685
2686 * The new :ref:`Document` method *write()* writes an opened PDF to memory (as opposed to a file, like *save()* does).
2687 * An annotation can now be scaled and moved around on its page. This is done by modifying its rectangle.
2688 * Annotations can now be deleted. :ref:`Page` contains the new method *deleteAnnot()*.
2689 * Various annotation attributes can now be modified, e.g. content, dates, title (= author), border, colors.
2690 * Method *Document.insert_pdf()* now also copies annotations of source pages.
2691 * The *Pages* class has been deleted. As documents can now be accessed with page numbers as indices (like *doc[n] = doc.loadPage(n)*), and document objects can be used as iterators, the benefit of this class was too low to maintain it. See the following comments.
2692 * *loadPage(n)* / *doc[n]* now accept arbitrary integers to specify a page number, as long as *n < pageCount*. So, e.g. *doc[-500]* is always valid and will load page *(-500) % pageCount*.
2693 * A document can now also be used as an iterator like this: *for page in doc: ...<do something with "page"> ...*. This will yield all pages of *doc* as *page*.
2694 * The :ref:`Pixmap` method *getSize()* has been replaced with property *size*. As before *Pixmap.size == len(Pixmap)* is true.
2695 * In response to transparency (alpha) being optional, several new parameters and properties have been added to :ref:`Pixmap` and :ref:`Colorspace` classes to support determining their characteristics.
2696 * The :ref:`Page` class now contains new properties *firstAnnot* and *firstLink* to provide starting points to the respective class chains, where *firstLink* is just a mnemonic synonym to method *loadLinks()* which continues to exist. Similarly, the new property *rect* is a synonym for method *bound()*, which also continues to exist.
2697 * :ref:`Pixmap` methods *samplesRGB()* and *samplesAlpha()* have been deleted because pixmaps can now be created without transparency.
2698 * :ref:`Rect` now has a property *irect* which is a synonym of method *round()*. Likewise, :ref:`IRect` now has property *rect* to deliver a :ref:`Rect` which has the same coordinates as float values.
2699 * Document has the new method *searchPageFor()* to search for a text string. It works exactly like the corresponding *Page.searchFor()* with page number as additional parameter.
2700
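The page number rule stated in the list above (any integer *n < pageCount* is accepted, and *doc[-500]* loads page *(-500) % pageCount*) relies on Python's modulo operator, which returns a non-negative result for a positive divisor. The following is a minimal sketch of that rule only; the helper name is hypothetical and this is not PyMuPDF's actual code.

```python
def resolve_page_number(n, page_count):
    """Map an arbitrary integer n < page_count to a valid 0-based page number."""
    if n >= page_count:
        raise IndexError("page number must be less than the page count")
    # For a positive divisor, Python's % always yields 0 <= result < page_count.
    return n % page_count


# Examples for a document with 12 pages.
last_page = resolve_page_number(-1, 12)
wrapped_page = resolve_page_number(-500, 12)
same_page = resolve_page_number(7, 12)
```

For a 12-page document, ``-1`` resolves to page 11 (the last page) and ``-500`` resolves to page 4, because ``-500 == -42 * 12 + 4``.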
2701
2702 ------
2703
2704 **Changes in Version 1.9.3**
2705
2706 This version is also based on MuPDF v1.9a. Changes compared to version 1.9.2:
2707
2708 * As a major enhancement, annotations are now supported in a similar way as links. Annotations can be displayed (as pixmaps) and their properties can be accessed.
2709 * In addition to the document *select()* method, some simpler methods can now be used to manipulate a PDF:
2710
2711 - *copyPage()* copies a page within a document.
2712 - *movePage()* is similar, but deletes the original.
2713 - *delete_page()* deletes a page.
2714 - *delete_pages()* deletes a page range.
2715
2716 * *rotation* or *setRotation()* access or change a PDF page's rotation, respectively.
2717 * Available but undocumented before, :ref:`IRect`, :ref:`Rect`, :ref:`Point` and :ref:`Matrix` support the *len()* method and their coordinate properties can be accessed via indices, e.g. *IRect.x1 == IRect[2]*.
2718 * For convenience, documents now support simple indexing: *doc.loadPage(n) == doc[n]*. The index may however be in range *-pageCount < n < pageCount*, such that *doc[-1]* is the last page of the document.
2719
2720 ------
2721
2722 **Changes in Version 1.9.2**
2723
2724 This version is also based on MuPDF v1.9a. Changes compared to version 1.9.1:
2725
2726 * *fitz.open()* (no parameters) creates a new empty **PDF** document, i.e. if saved afterwards, it must be given a *.pdf* extension.
2727 * :ref:`Document` now accepts all of the following formats (*Document* and *open* are synonyms):
2728
2729 - *open()*,
2730 - *open(filename)* (equivalent to *open(filename, None)*),
2731 - *open(filetype, area)* (equivalent to *open(filetype, stream = area)*).
2732
2733 Type of memory area *stream* may be *bytes* or *bytearray*. Thus, e.g. *area = open("file.pdf", "rb").read()* may be used directly (without first converting it to bytearray).
2734 * New method *Document.insert_pdf()* (PDFs only) inserts a range of pages from another PDF.
2735 * *Document* objects doc now support the *len()* function: ``len(doc) == doc.pageCount``.
2736 * New method *Document.getPageImageList()* creates a list of images used on a page.
2737 * New method *Document.getPageFontList()* creates a list of fonts referenced by a page.
2738 * New pixmap constructor *fitz.Pixmap(doc, xref)* creates a pixmap based on an opened PDF document and an :data:`xref` number of the image.
2739 * New pixmap constructor *fitz.Pixmap(cspace, spix)* creates a pixmap as a copy of another one *spix* with the colorspace converted to *cspace*. This works for all colorspace combinations.
2740 * Pixmap constructor *fitz.Pixmap(colorspace, width, height, samples)* now allows *samples* to also be *bytes*, not only *bytearray*.
2741
2742
2743 ------
2744
2745 **Changes in Version 1.9.1**
2746
2747 This version of PyMuPDF is based on MuPDF library source code version 1.9a published on April 21, 2016.
2748
2749 Please have a look at MuPDF's website to see which changes and enhancements are contained herein.
2750
2751 Changes in version 1.9.1 compared to version 1.8.0 are the following:
2752
2753 * New methods *get_area()* for both *fitz.Rect* and *fitz.IRect*
2754 * Pixmaps can now be created directly from files using the new constructor *fitz.Pixmap(filename)*.
2755 * The Pixmap constructor *fitz.Pixmap(image)* has been extended accordingly.
2756 * *fitz.Rect* can now be created with all possible combinations of points and coordinates.
2757 * PyMuPDF classes and methods now all contain __doc__ strings, most of them created by SWIG automatically. While the PyMuPDF documentation certainly is more detailed, this feature should help a lot when programming in Python-aware IDEs.
2758 * A new document method of *getPermits()* returns the permissions associated with the current access to the document (print, edit, annotate, copy), as a Python dictionary.
2759 * The identity matrix *fitz.Identity* is now **immutable**.
2760 * The new document method *select(list)* removes all pages from a document that are not contained in the list. Pages can also be duplicated and re-arranged.
2761 * Various improvements and new members in our demo and examples collections. Perhaps most prominently: *PDF_display* now supports scrolling with the mouse wheel, and there is a new example program *wxTableExtract* which allows graphically identifying and extracting table data in documents.
2762 * *fitz.open()* is now an alias of *fitz.Document()*.
2763 * New pixmap method *tobytes()* which will return a bytearray formatted as a PNG image of the pixmap.
2764 * New pixmap method *samplesRGB()* providing a *samples* version with alpha bytes stripped off (RGB colorspaces only).
2765 * New pixmap method *samplesAlpha()* providing the alpha bytes only of the *samples* area.
2766 * New iterator *fitz.Pages(doc)* over a document's set of pages.
2767 * New matrix methods *invert()* (calculate inverted matrix), *concat()* (calculate matrix product), *pretranslate()* (perform a shift operation).
2768 * New *IRect* methods *intersect()* (intersection with another rectangle), *translate()* (perform a shift operation).
2769 * New *Rect* methods *intersect()* (intersection with another rectangle), *transform()* (transformation with a matrix), *include_point()* (enlarge rectangle to also contain a point), *include_rect()* (enlarge rectangle to also contain another one).
2770 * Documented *Point.transform()* (transform a point with a matrix).
2771 * *Matrix*, *IRect*, *Rect* and *Point* classes now support compact, algebraic formulations for manipulating such objects.
2772 * Incremental saves for changes are possible now using the call pattern *doc.save(doc.name, incremental=True)*.
2773 * A PDF's metadata can now be deleted, set or changed by document method *set_metadata()*. Supports incremental saves.
2774 * A PDF's bookmarks (or table of contents) can now be deleted, set or changed with the entries of a list using document method *set_toc(list)*. Supports incremental saves.
2775
2776 .. codespell:ignore-end