COMMITS
June 3, 2025
feat: simplify dependencies, switch to uv (#1700)
Michele Dolfi committed
test: mark flaky test (#1698)
Panos Vagenas committed
June 2, 2025
feat: new vlm-models support (#1570)
Peter W. J. Staar committed
chore: bump version to 2.35.0 [skip ci]
github-actions[bot] committed
docs: fix typo in index.md (#1676)
Edgar Hipp committed
test: ensure utf-8 in test data utils (#1691)
Panos Vagenas committed
fix: guess HTML content starting with script tag (#1673)
Cesar Berrospi Ramis committed
May 28, 2025
chore: fix or ignore runtime and deprecation warnings (#1660)
Cesar Berrospi Ramis committed
chore: exclude data from GH Linguist (#1671)
Panos Vagenas committed
test: add missing ground truth files (#1667)
Cesar Berrospi Ramis committed
feat: Add visualization of bbox on page with html export. (#1663)
Peter W. J. Staar committed
May 27, 2025
May 22, 2025
chore: bump version to 2.34.0 [skip ci]
github-actions[bot] committed
fix: fix ZeroDivisionError for cell_bbox.area() (#1636)
Said Gürbüz committed
May 21, 2025
feat(ocr): auto-detect rotated pages in Tesseract (#1167)
Clément Doumouro committed
feat: Establish confidence estimation for document and pages (#1313)
Christoph Auer committed
fix(integration): update the Apify Actor integration (#1619)
Václav Vančura committed
May 20, 2025
chore: bump version to 2.33.0 [skip ci]
github-actions[bot] committed
fix: Fix issue with detecting docx files, and files with upper case extensions (#1609)
MoheyElDin Badr committed
fix: load_from_doctags static usage (#1617)
Said Gürbüz committed
fix: incorrect force_backend_text behaviour for VLM DocTag pipelines (#1371)
Krishnan committed
May 19, 2025
fix(pypdfium): resolve overlapping text when merging bounding boxes (#1549)
Pedro Ribeiro committed
feat: add textbox content extraction in msword_backend (#1538)
AndrewTsai0406 committed
May 16, 2025
chore: fix chunking example data link (#1596)
Panos Vagenas committed
May 14, 2025
chore: bump version to 2.32.0 [skip ci]
github-actions[bot] committed
feat: Improve parallelization for remote services API calls (#1548)
Vinay R Damodaran committed
fix(ocr): orig field in TesseractOcrCliModel as str (#1553)
jimkarag02 committed
docs: add advanced chunking & serialization example (#1589)
Panos Vagenas committed
fix(settings): fix nested settings load via environment variables (#1551)
Alex Sokolov committed
feat: support image/webp file type (#1415)
Elwin committed