fix(documents): seed Part I for 10-Q TOC walks so pre-header items aren't bare (edgartools-3usf)

A 10-Q's items repeat across parts (Part I Item 1 = Financial Statements,
Part II Item 1 = Legal Proceedings), so the document-order TOC walk tracked
current_part=None until it hit a Part header — leaving Part I items, which
appear before any header, with bare "Item N" keys. Downstream this was a
correctness bug, not just a key inconsistency: TenQ.__getitem__('Item 1')
tried part_i_item_1, missed (the key was bare), then fell through to
part_ii_item_1 — returning Legal Proceedings instead of Financial Statements.
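
The fall-through can be reproduced with a minimal sketch. lookup_item below is
a hypothetical stand-in for the part-aware lookup, not the real
TenQ.__getitem__; the section dicts are illustrative and only the key shapes
come from this commit.

```python
from typing import Optional


def lookup_item(sections: dict[str, str], item: str) -> Optional[str]:
    """Hypothetical part-aware lookup: try Part I, then Part II, then the bare key."""
    suffix = item.lower().replace(" ", "_")  # "Item 1" -> "item_1"
    for part in ("part_i", "part_ii"):
        key = f"{part}_{suffix}"
        if key in sections:
            return sections[key]
    return sections.get(item)


# Before the fix: the pre-header Part I item carries a bare key.
before = {"Item 1": "Financial Statements", "part_ii_item_1": "Legal Proceedings"}
# After the fix: the same item is keyed under Part I.
after = {"part_i_item_1": "Financial Statements", "part_ii_item_1": "Legal Proceedings"}

wrong = lookup_item(before, "Item 1")  # misses part_i_item_1, hits part_ii_item_1
right = lookup_item(after, "Item 1")
```

With the bare key, the Part I probe misses and the Part II probe answers first,
so the bare entry is never consulted.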

A 10-Q always opens with Part I, so the items lacking part context are exactly
those before the first Part header. Seed the walk with Part I for repeating-item
forms: FormSchema gains repeating_parts=("I","II") on the 10-Q schema and a
seed_part property (first repeating part, else None); the six document-order
walks in TOCAnalyzer start current_part = self.schema.seed_part. Detected Part
headers still win — the seed only fills pre-header items. _make_section_key and
10-K number-inference are untouched (seed_part is None for 10-K).
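
A minimal sketch of the seeding idea, under stated assumptions: FormSchema here
is a simplified stand-in with only the two attributes named above, and walk,
PART_RE, and ITEM_RE are hypothetical helpers, not the TOCAnalyzer code.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormSchema:
    """Simplified stand-in; the real schema has more fields."""
    form: str
    repeating_parts: tuple[str, ...] = ()

    @property
    def seed_part(self) -> Optional[str]:
        # First repeating part, else None (e.g. for a 10-K).
        return self.repeating_parts[0] if self.repeating_parts else None


PART_RE = re.compile(r"^part\s+(i{1,3}|iv)\b", re.IGNORECASE)
ITEM_RE = re.compile(r"^item\s+(\d+[a-z]?)\b", re.IGNORECASE)


def walk(headings: list[str], schema: FormSchema) -> list[str]:
    """Document-order walk that starts from the schema's seed part."""
    current_part = schema.seed_part
    keys = []
    for heading in headings:
        text = heading.strip()
        part = PART_RE.match(text)
        if part:
            current_part = part.group(1).upper()  # a detected header wins over the seed
            continue
        item = ITEM_RE.match(text)
        if item:
            number = item.group(1).lower()
            if current_part:
                keys.append(f"part_{current_part.lower()}_item_{number}")
            else:
                keys.append(f"Item {number.upper()}")  # bare key when no part context
    return keys


TEN_Q = FormSchema("10-Q", repeating_parts=("I", "II"))
TEN_K = FormSchema("10-K")

ten_q_keys = walk(["Item 1", "Item 2", "Part II", "Item 1"], TEN_Q)
ten_k_keys = walk(["Item 1", "Item 1A"], TEN_K)
```

In this sketch the 10-Q headings before "Part II" pick up the seeded Part I,
while the 10-K walk keeps bare keys because its seed_part is None.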

The jnj and pg 10-Qs are now fully part-prefixed (part_i_item_1..4, part_ii_item_*),
with part_i_item_4 retained. Tests: seed_part schema units, synthetic no-Part-I-header
walk, and jnj/pg ground-truth (Part I Item 1 = Financial Statements, Part II
Item 1 = Legal Proceedings). A fully header-less 10-Q (no Part I AND no Part II
header) still collides Item 1 across parts — same latent risk as before, deferred.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Dwight Gunning committed
60ee8e1a95bc496003e403efcd21716993085944
Parent: 9d49278