feat(ci): add RunnerContext and RegressionError for experiment GH action (#1635)

* feat(ci): add RunnerContext and RegressionError for experiment GH action

Adds the SDK-side primitives consumed by the upcoming
`langfuse/experiment-action` GitHub Action (LFE-9241):

- `RunnerContext` wraps `Langfuse.run_experiment` with action-injected
  defaults (data, dataset_version, name, run_name, metadata). Users can
  override any default at the call site; metadata is merged, with
  user-supplied keys winning on collision.
- `RegressionError` lets users signal a CI gate failure and optionally
  pass structured `metric`/`value`/`threshold` fields so the action can
  render a callout in the PR comment.
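
The metadata merge rule above can be illustrated with a minimal sketch.
`merge_metadata` is a hypothetical helper written for illustration and is
not taken from the SDK; it only shows the "user keys win on collision"
behavior using plain dict unpacking.

```python
from typing import Any, Dict, Optional


def merge_metadata(
    defaults: Optional[Dict[str, Any]],
    user: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Merge action-injected defaults with user-supplied metadata.

    Hypothetical helper: later keys overwrite earlier ones, so values
    from `user` win whenever both dicts define the same key.
    """
    return {**(defaults or {}), **(user or {})}


# Example values are made up for illustration.
merged = merge_metadata(
    {"commit": "abc123", "branch": "main"},
    {"branch": "feature-x", "owner": "alice"},
)
# "branch" comes from the user dict; "commit" survives from the defaults.
```

Building a new dict rather than mutating either input keeps the
action-injected defaults reusable across several calls.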

Both live in a dedicated `langfuse/ci.py` module so the CI surface stays
isolated from the general experiment API.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(experiment): move RunnerContext and RegressionError into experiment module

Relocates the CI-action primitives from the standalone `langfuse/ci.py`
module into `langfuse/experiment.py` alongside the other experiment
types. Deletes `langfuse/ci.py` and renames the tests accordingly.

The public import paths (`from langfuse import RunnerContext,
RegressionError`) are unchanged.

`CompositeEvaluatorFunction` is imported under `TYPE_CHECKING` to avoid
a circular import with `langfuse.batch_evaluation`. The
signature-drift guard now resolves the forward reference via
`typing.get_type_hints(..., localns=...)`.
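
As a rough sketch of the pattern described above: a name imported only
under `TYPE_CHECKING` does not exist at runtime, so a string annotation
that mentions it has to be resolved by handing the name to
`typing.get_type_hints` through `localns`. The module path `some_pkg.batch`,
the name `CompositeEvaluator`, and the function `run` are placeholders
chosen for illustration, not the SDK's actual definitions.

```python
import typing
from typing import TYPE_CHECKING, Callable, Optional

if TYPE_CHECKING:
    # Seen only by type checkers; never executed at runtime, which is
    # what breaks the import cycle. Placeholder module path.
    from some_pkg.batch import CompositeEvaluator


def run(composite: Optional["CompositeEvaluator"] = None) -> None:
    """Placeholder function whose annotation is a forward reference."""


# Stand-in for the real type, supplied only when resolving hints.
composite_type = Callable[..., dict]

# localns provides the name that is missing from the module globals.
hints = typing.get_type_hints(
    run, localns={"CompositeEvaluator": composite_type}
)
```

Without the `localns` argument, resolving the hints for `run` would fail
because `CompositeEvaluator` is not defined in the module namespace at
runtime.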

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: rename test_runner_context.py to test_experiment.py

Mirrors the module name now that RunnerContext and RegressionError
live in `langfuse.experiment`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(experiment): tighten RunnerContext + RegressionError public surface

- RunnerContext no longer carries `name` or `run_name` as context-level
  defaults. `name` is now required on every `run_experiment` call
  (supports the action's directory-of-experiments mode where each
  script must name itself). `run_name` passes straight through to
  `Langfuse.run_experiment`.
- RegressionError gains three typed `@overload` signatures (minimal,
  free-form message, structured metric/value/threshold) so type
  checkers enforce that `metric` and `value` are supplied together.
  At runtime, partial structured input falls back to the default
  message instead of rendering misleading `None` placeholders in PR
  comments.
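
The overload and fallback behavior described above could look roughly
like the following sketch. `RegressionErrorSketch`, its default message,
and the message format are assumptions made for illustration; the
actual signatures and wording in `langfuse.experiment` may differ.

```python
from typing import Optional, overload


class RegressionErrorSketch(Exception):
    """Illustrative stand-in, not the SDK's RegressionError."""

    # Assumed wording; the real default message is not given in this commit.
    DEFAULT_MESSAGE = "Experiment regression detected"

    @overload
    def __init__(self) -> None: ...  # minimal form

    @overload
    def __init__(self, message: str) -> None: ...  # free-form message

    @overload
    def __init__(
        self, *, metric: str, value: float, threshold: Optional[float] = None
    ) -> None: ...  # structured form: metric and value go together

    def __init__(
        self,
        message: Optional[str] = None,
        *,
        metric: Optional[str] = None,
        value: Optional[float] = None,
        threshold: Optional[float] = None,
    ) -> None:
        self.metric = metric
        self.value = value
        self.threshold = threshold
        if message is None:
            if metric is not None and value is not None:
                message = f"{metric} = {value}"
                if threshold is not None:
                    message += f" (threshold {threshold})"
            else:
                # Partial structured input: use the default text instead
                # of rendering "None" placeholders.
                message = self.DEFAULT_MESSAGE
        super().__init__(message)
```

A type checker would reject `RegressionErrorSketch(metric="accuracy")`
because no overload matches it, while at runtime the same call simply
produces the default message.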

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tobias Wochinger committed
3166cb8bd6667e4a49ab11e3f049719723c88552
Parent: 5ef17a0
Committed by GitHub <noreply@github.com> on 5/4/2026, 7:32:10 AM