fix: manage PDFium backend resource lifecycles to avoid SIGSEGV/SIGTRAP crashes (#3180)
* fix: explicitly close PdfBitmap after copy in both PDF backends

pypdfium2's to_pil() shares native buffer memory for RGBA/RGBX/L formats via frombuffer(). The chained render().to_pil().resize() pattern allowed the PdfBitmap to reach refcount 0 mid-expression, causing GC to invoke FPDFBitmap_Destroy and free the native buffer while PIL still held a dangling pointer to it, resulting in non-deterministic SIGSEGV crashes in concurrent scenarios.

Fix: store the bitmap explicitly, copy the PIL image to detach it from the shared native buffer, then close the bitmap under the lock before proceeding with the resize on the independent copy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* upgrade uv.lock

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: managed PDFium backend lifecycle with explicit native close and live-page tracking

Introduces ManagedPdfiumDocumentBackend / ManagedPdfiumPageBackend base classes that both PDF backends now inherit from. Key changes:

- Live pages are tracked in a set on the document; document unload waits for all pages to be released before tearing down native handles.
- Page and document unload now call explicit .close() on native PDFium objects under the lock, rather than just nulling Python references. This makes teardown deterministic rather than relying on GC finalizers, which can fire from any thread without the lock.
- text_page is explicitly closed before _ppage to respect the PDFium parent/child handle hierarchy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: strip dead live-page tracking from managed PDFium backend

The Condition, Lock, _live_pages set, _closing flag, and owner back-ref on pages were remnants of the Group-3b pipeline defensive shutdown that was not included here. The pipeline always unloads page backends before calling document.unload(), so _close_live_pages() was always a no-op and notify_all() had zero waiters.

Reduced ManagedPdfiumDocumentBackend/ManagedPdfiumPageBackend to just a _closed guard and the abstract _close_native_* dispatch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* DCO Remediation Commit for Christoph Auer <cau@zurich.ibm.com>

I, Christoph Auer <cau@zurich.ibm.com>, hereby add my Signed-off-by to this commit: b3f4e6692d81c324d301e4f6b79681e763ea9217
I, Christoph Auer <cau@zurich.ibm.com>, hereby add my Signed-off-by to this commit: 79b18945a8c954381ce9ffb043fc142e26d3cde5
I, Christoph Auer <cau@zurich.ibm.com>, hereby add my Signed-off-by to this commit: b389c82456d6d09ad16623f7d361f8c82d621df9
I, Christoph Auer <cau@zurich.ibm.com>, hereby add my Signed-off-by to this commit: 5e3510f80f7eb34db0d6b6da9bfff9e5f96f30ed

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* downgrade mkdocs-jupyter to <0.26 because it breaks docs gen

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
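
Illustrative note (not part of the commit): the reduced design described above, a _closed guard plus an abstract _close_native_* hook invoked under the lock, with text_page closed before _ppage, might look roughly like the sketch below. This is not the actual diff. FakeNativeHandle, ManagedPageSketch, PageSketch, and _pdfium_lock are hypothetical stand-ins so the example runs without pypdfium2; only the _closed, text_page, and _ppage names are taken from the commit message.

```python
import threading
from abc import ABC, abstractmethod

# Hypothetical module-level lock standing in for the backend's PDFium lock.
_pdfium_lock = threading.Lock()


class FakeNativeHandle:
    """Stand-in for a native PDFium object; records the order of close() calls."""

    def __init__(self, name, log):
        self.name = name
        self._log = log

    def close(self):
        self._log.append(self.name)


class ManagedPageSketch(ABC):
    """Hypothetical base class: a _closed guard plus an abstract close hook."""

    def __init__(self):
        self._closed = False

    def unload(self):
        # Close native handles deterministically under the lock instead of
        # leaving teardown to GC finalizers running on an arbitrary thread.
        with _pdfium_lock:
            if self._closed:
                return  # repeated unload() calls are no-ops
            self._close_native_page()
            self._closed = True

    @abstractmethod
    def _close_native_page(self):
        """Close the native handles owned by this page."""


class PageSketch(ManagedPageSketch):
    def __init__(self, log):
        super().__init__()
        self.text_page = FakeNativeHandle("text_page", log)
        self._ppage = FakeNativeHandle("page", log)

    def _close_native_page(self):
        # Child handle (text page) first, then its parent page.
        if self.text_page is not None:
            self.text_page.close()
            self.text_page = None
        if self._ppage is not None:
            self._ppage.close()
            self._ppage = None


log = []
page = PageSketch(log)
page.unload()
page.unload()  # second call hits the _closed guard
print(log)
```

In this sketch the second unload() returns early, so each stand-in handle is closed exactly once and in child-before-parent order.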
Christoph Auer committed
a0fc3c9d731c29f896680b17fa6df5549e2dfc5d
Parent: 1c74a9b
Committed by GitHub <noreply@github.com>
on 3/24/2026, 4:59:55 PM