SIGN IN SIGN UP
apache / superset UNCLAIMED

Apache Superset is a Data Visualization and Data Exploration Platform

0 0 150 TypeScript

feat(versioning): change records + diff engine

Adds a structured per-field change log alongside the foundational
shadow tables. Each save flush emits zero or more ``version_changes``
rows describing what changed relative to the previous version, with
shape ``[{kind, path, from_value, to_value, sequence}]`` keyed to
``version_transaction.id`` (FR-016 .. FR-021).

**Schema** — ``version_changes`` table, FK to ``version_transaction``
with ``ON DELETE CASCADE`` so retention drops dependent records
without explicit cleanup. Composite unique index on
``(transaction_id, entity_kind, entity_id, sequence)`` so the
listener can write monotonically and downstream readers see a
deterministic order.

**Diff engine** (``superset/versioning/diff.py``) — pure-function
diffing of pre-/post-state pairs:

- ``diff_scalar_fields`` for ordinary columns; emits one record per
  changed field with JSON-safe ``from_value`` / ``to_value``.
- ``diff_json_field`` for ``json_metadata`` and ``params``, walking
  the parsed structure and emitting per-sub-key records. Honours
  an ``exclude_keys`` set
  (``_DASHBOARD_JSON_METADATA_AUDIT_KEYS``: ``chart_configuration``,
  ``global_chart_configuration``, ``map_label_colors``,
  ``show_chart_timestamps``, ``color_namespace``;
  ``_CHART_PARAMS_AUDIT_KEYS``) so frontend-stamped sub-keys that
  mutate on every save don't dominate the change log (FR-022).
- ``diff_dashboard_layout`` walks ``position_json`` structurally
  and emits ``[verb, kind, id]`` records (verbs ``add``, ``remove``,
  ``move``, ``edit``; kinds from a ``CHART``/``ROW``/``COLUMN``/etc.
  → english map) so a UI can render "Added chart 'Foo'" without
  re-parsing JSON. ``HEADER_ID`` is suppressed because it duplicates
  the ``dashboard_title`` scalar record.
- ``fold_dashboard_layout_with_chart_changes`` deduplicates layout
  records against M2M / chart-membership records by UUID so an
  add-and-attach doesn't appear twice.
- ``_values_equivalent`` treats ``None`` and ``""`` as equal; this
  matches the save path's habit of normalising nullable strings to
  the empty string.

**Listener** — ``superset/versioning/changes.py`` registers a
``before_flush`` listener that captures pre-state for each dirty
entity and an ``after_flush`` listener that runs the diff engine
against the post-state and writes ``version_changes`` rows under
the resolved ``transaction_id``. Tracks processed transaction ids
on ``session.info`` so re-firings within a single transaction
(autoflush triggered by mid-commit queries) don't double-insert and
trip the unique constraint. Reads child rows via raw SELECT against
``table_columns`` / ``sql_metrics`` rather than ``dataset.columns``
because the live collection is stale during the restore path's raw
DELETE+INSERT cycle.

**Endpoint surface** — ``VersionDAO.list_change_records_batch``
batches the lookup across multiple transactions with a single
``WHERE transaction_id IN (...)`` query so the version-list
endpoint avoids N+1 round-trips. ``list_versions`` / ``get_version``
return entries with a populated ``changes`` array (empty for
``operation_type=0`` baseline rows).

**Tests** — ``test_diff.py`` covers the diff engine shape (39
unit cases across scalar, JSON, layout, child-collection, and
fold paths). ``change_records_tests.py`` exercises the listener
end-to-end with realistic save flows. ``perf_validation_tests.py``
is the T044 harness for SC-002/3/4 (list endpoint p95 < 1s,
restore < 3s, save overhead < 50ms).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
M
Mike Bridge committed
f7d73e2e1bd18089ffca573bef43ae761005e6ff
Parent: be01e45