fix(docx): split multiple OMML equations into separate formula items (#3123)

* fix(msword): split multiple OMML equations into separate formula items

When a DOCX paragraph contains multiple sibling <m:oMath> elements
(e.g. separate equations on one line), the converter previously
concatenated them into a single LaTeX string because element.iter()
walks all descendants depth-first.

Fix: iterate direct children of the paragraph element first to
correctly identify sibling <m:oMath> elements, converting each
independently. Falls back to deep iteration only when oMath
elements are nested inside wrapper elements.
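
The children-first lookup described above can be sketched as follows. This is
a minimal illustration, not the backend's actual code: it uses the standard
library's xml.etree.ElementTree rather than whatever the msword backend uses,
find_equations and equation_text are hypothetical helper names, and
equation_text is only a stand-in for the real OMML-to-LaTeX conversion.

```python
import xml.etree.ElementTree as ET

M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
OMATH = f"{{{M_NS}}}oMath"


def find_equations(paragraph):
    """Return one oMath element per equation in the paragraph (hypothetical helper)."""
    # Direct children first: sibling equations stay separate.
    direct = [child for child in paragraph if child.tag == OMATH]
    if direct:
        return direct
    # Fallback: oMath nested inside a wrapper such as m:oMathPara.
    return list(paragraph.iter(OMATH))


def equation_text(omath):
    """Stand-in for OMML-to-LaTeX conversion: join the m:t text runs."""
    return "".join(t.text or "" for t in omath.iter(f"{{{M_NS}}}t"))


# Two sibling equations separated by ordinary text.
sibling_xml = f"""<w:p xmlns:w="{W_NS}" xmlns:m="{M_NS}">
<m:oMath><m:r><m:t>a=b</m:t></m:r></m:oMath>
<w:r><w:t> and </w:t></w:r>
<m:oMath><m:r><m:t>c=d</m:t></m:r></m:oMath>
</w:p>"""

# One equation nested inside a wrapper element.
nested_xml = f"""<w:p xmlns:w="{W_NS}" xmlns:m="{M_NS}">
<m:oMathPara><m:oMath><m:r><m:t>e=f</m:t></m:r></m:oMath></m:oMathPara>
</w:p>"""

sibling_equations = [equation_text(eq) for eq in find_equations(ET.fromstring(sibling_xml))]
nested_equations = [equation_text(eq) for eq in find_equations(ET.fromstring(nested_xml))]
```

Each element returned by find_equations is converted on its own, so the two
sibling equations yield two strings instead of one concatenated string.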

Also splits standalone multi-equation paragraphs into individual
FORMULA document items instead of merging them into one.

Closes #3121

Signed-off-by: giulio-leone <giulio.leone@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

* test(msword): add multi-equation paragraph test document

Add a minimal DOCX file containing two separate oMath elements
in one paragraph with a text separator, along with groundtruth
output files for markdown, json, and plain text export.

Requested-by: @dolfim-ibm
Signed-off-by: Giulio Leone <giulioleone10@gmail.com>
Signed-off-by: giulio-leone <giulio.leone@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

* test(msword): regenerate multi-equation indented-text snapshot

Signed-off-by: giulio-leone <giulio.leone@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

* test: replace test doc with issue #3121 attachment

Use the real Word document from the issue reporter (smroels)
instead of the minimal programmatic fixture. The new document
contains three sibling <m:oMath> elements in one paragraph,
matching the exact failing shape described in #3121.

Regenerate groundtruth to match the richer document structure.

Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

* test: regenerate groundtruth for omml_multi_equation_paragraph

Re-run document conversion with current code to update .itxt and .json
groundtruth files. The .itxt had stale structure from the previous
programmatic fixture; the new real-document conversion produces the
correct output with three separate formula items.

Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

* style(docx): rerun ruff formatter for msword backend

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor(docx): drop unused tag_name binding

Remove the unused local in the direct oMath iteration path so the code
reads clearly and the outstanding review comment is fully addressed
without changing equation-handling behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

* DCO Remediation Commit for giulio-leone <giulio97.leone@gmail.com>

I, giulio-leone <giulio97.leone@gmail.com>, hereby add my Signed-off-by to this commit: 84cc70b55e96804f32590215a3eab31a0c280586

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

* test(docx): cover equation paragraph branches

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

* test(docx): reuse backend fixture in msword tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

---------

Signed-off-by: giulio-leone <giulio.leone@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>
Signed-off-by: Giulio Leone <giulioleone10@gmail.com>
Co-authored-by: giulio-leone <giulio.leone@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Giulio Leone committed
90d6dd4e87d96167aced588249dcb2e0f47cd68f
Parent: fdf5e20
Committed by GitHub <noreply@github.com> on 3/24/2026, 8:42:16 AM