microsoft / markitdown

Python tool for converting files and office documents to Markdown.

[MS] Update PDF table extraction to support aligned Markdown (#1499)

* Added PDF table extraction feature with aligned Markdown (#1419)

* Add PDF test files and enhance extraction tests

- Added a medical report scan PDF for testing scanned PDF handling.
- Included a retail purchase receipt PDF to validate receipt extraction functionality.
- Introduced a multipage invoice PDF to test extraction of complex invoice structures.
- Added a borderless table PDF for testing inventory reconciliation report extraction.
- Implemented comprehensive tests for PDF table extraction, ensuring proper structure and data integrity.
- Enhanced existing tests to validate the order and presence of extracted content across various PDF types.

* fix: update dependencies for PDF processing and improve table extraction logic

* Bumped version of pdfminer.six
---------

Authored-by: Ashok <ashh010101@gmail.com>
lesyk committed
251dddcf0cffec467d72837fc5dd0b10f08df98e
Parent: dde250a
Committed by GitHub <noreply@github.com> on 1/8/2026, 12:38:45 AM
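
For readers unfamiliar with the term, "aligned Markdown" in the commit title refers to a table whose columns are padded to a common width so the raw text lines up. The following is a minimal sketch of that idea only. It is not the markitdown implementation, and the function name `render_aligned_table` is hypothetical. It assumes the rows have already been extracted from the PDF as lists of strings, with the first row acting as the header.

```python
def render_aligned_table(rows):
    """Render rows (a list of lists of strings) as a padded Markdown table.

    Illustrative sketch only; not taken from the markitdown code base.
    """
    if not rows:
        return ""

    ncols = max(len(row) for row in rows)

    # Normalize cells: stringify, trim, escape pipes, and pad short rows.
    norm = [
        [str(cell).strip().replace("|", "\\|") for cell in row]
        + [""] * (ncols - len(row))
        for row in rows
    ]

    # Each column is as wide as its longest cell, with a minimum of 3
    # so the separator row always has at least three dashes.
    widths = [max(3, max(len(row[i]) for row in norm)) for i in range(ncols)]

    def fmt(row):
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        return "| " + " | ".join(cells) + " |"

    header, *body = norm
    separator = "| " + " | ".join("-" * width for width in widths) + " |"
    return "\n".join([fmt(header), separator] + [fmt(row) for row in body])


if __name__ == "__main__":
    print(render_aligned_table([["Item", "Qty"], ["Widget", "2"], ["Gear", "10"]]))
```

Padding every cell with `ljust` keeps the pipes in vertical alignment, which makes the table readable as plain text while it remains valid Markdown.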