fix(docx): split multiple OMML equations into separate formula items (#3123)
* fix(msword): split multiple OMML equations into separate formula items

When a DOCX paragraph contains multiple sibling <m:oMath> elements (e.g. separate equations on one line), the converter previously concatenated them into a single LaTeX string because element.iter() walks all descendants depth-first.

Fix: iterate direct children of the paragraph element first to correctly identify sibling <m:oMath> elements, converting each independently. Falls back to deep iteration only when oMath elements are nested inside wrapper elements.

Also splits standalone multi-equation paragraphs into individual FORMULA document items instead of merging them into one.

Closes #3121

Signed-off-by: giulio-leone <giulio.leone@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

* test(msword): add multi-equation paragraph test document

Add a minimal DOCX file containing two separate oMath elements in one paragraph with a text separator, along with groundtruth output files for markdown, json, and plain text export.

Requested-by: @dolfim-ibm
Signed-off-by: Giulio Leone <giulioleone10@gmail.com>
Signed-off-by: giulio-leone <giulio.leone@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

* test(msword): regenerate multi-equation indented-text snapshot

Signed-off-by: giulio-leone <giulio.leone@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

* test: replace test doc with issue #3121 attachment

Use the real Word document from the issue reporter (smroels) instead of the minimal programmatic fixture. The new document contains three sibling <m:oMath> elements in one paragraph, matching the exact failing shape described in #3121. Regenerate groundtruth to match the richer document structure.

Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

* test: regenerate groundtruth for omml_multi_equation_paragraph

Re-run document conversion with current code to update .itxt and .json groundtruth files. The .itxt had stale structure from the previous programmatic fixture; the new real-document conversion produces the correct output with three separate formula items.

Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

* style(docx): rerun ruff formatter for msword backend

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor(docx): drop unused tag_name binding

Remove the unused local in the direct oMath iteration path so the code reads clearly and the outstanding review comment is fully addressed without changing equation-handling behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

* DCO Remediation Commit for giulio-leone <giulio97.leone@gmail.com>

I, giulio-leone <giulio97.leone@gmail.com>, hereby add my Signed-off-by to this commit: 84cc70b55e96804f32590215a3eab31a0c280586

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

* test(docx): cover equation paragraph branches

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

* test(docx): reuse backend fixture in msword tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

---------

Signed-off-by: giulio-leone <giulio.leone@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>
Signed-off-by: Giulio Leone <giulioleone10@gmail.com>
Co-authored-by: giulio-leone <giulio.leone@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
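
The direct-children-first lookup described in the first commit of this message can be sketched roughly as follows. This is an illustration only, not the backend's code: it assumes the standard library's `xml.etree.ElementTree` instead of whatever parser the msword backend uses, `find_equations` and `equation_text` are hypothetical names, and `equation_text` merely joins `<m:t>` text as a stand-in for the real OMML-to-LaTeX conversion.

```python
import xml.etree.ElementTree as ET

# Standard OOXML namespaces for math (m:) and WordprocessingML (w:).
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
OMATH = f"{{{M_NS}}}oMath"
MATH_TEXT = f"{{{M_NS}}}t"


def find_equations(paragraph):
    """Return the oMath elements of a paragraph, one per equation (hypothetical helper)."""
    # Direct children first, so sibling equations stay separate.
    direct = [child for child in paragraph if child.tag == OMATH]
    if direct:
        return direct
    # Fallback: oMath nested inside a wrapper element such as m:oMathPara.
    return list(paragraph.iter(OMATH))


def equation_text(omath):
    """Stand-in for OMML-to-LaTeX conversion: join the m:t text runs."""
    return "".join(t.text or "" for t in omath.iter(MATH_TEXT))


SAMPLE = f"""<w:p xmlns:w="{W_NS}" xmlns:m="{M_NS}">
  <m:oMath><m:r><m:t>a=b</m:t></m:r></m:oMath>
  <w:r><w:t> and </w:t></w:r>
  <m:oMath><m:r><m:t>c=d</m:t></m:r></m:oMath>
</w:p>"""

paragraph = ET.fromstring(SAMPLE)

# Walking every descendant of the paragraph merges both equations.
merged = "".join(t.text or "" for t in paragraph.iter(MATH_TEXT))

# Converting each sibling oMath on its own keeps them apart.
equations = [equation_text(e) for e in find_equations(paragraph)]

print(merged)     # a=bc=d
print(equations)  # ['a=b', 'c=d']
```

Each entry of `equations` would then become its own formula item, which is the behavior the commit message describes for standalone multi-equation paragraphs.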
Giulio Leone committed
90d6dd4e87d96167aced588249dcb2e0f47cd68f
Parent: fdf5e20
Committed by GitHub <noreply@github.com>
on 3/24/2026, 8:42:16 AM