feat(ci): add RunnerContext and RegressionError for experiment GH action (#1635)
* feat(ci): add RunnerContext and RegressionError for experiment GH action

  Adds the SDK-side primitives consumed by the upcoming `langfuse/experiment-action` GitHub Action (LFE-9241):

  - `RunnerContext` wraps `Langfuse.run_experiment` with action-injected defaults (data, dataset_version, name, run_name, metadata). Users can override any default on the call site; metadata is merged with user-supplied keys winning on collision.
  - `RegressionError` lets users signal a CI gate failure and optionally pass structured `metric`/`value`/`threshold` fields so the action can render a callout in the PR comment.

  Both live in a dedicated `langfuse/ci.py` module so the CI surface stays isolated from the general experiment API.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(experiment): move RunnerContext and RegressionError into experiment module

  Relocates the CI-action primitives from the standalone `langfuse/ci.py` module into `langfuse/experiment.py` alongside the other experiment types. Deletes `langfuse/ci.py` and renames the tests accordingly. The public import paths (`from langfuse import RunnerContext, RegressionError`) are unchanged.

  `CompositeEvaluatorFunction` is imported under `TYPE_CHECKING` to avoid a circular import with `langfuse.batch_evaluation`. The signature-drift guard now resolves the forward reference via `typing.get_type_hints(..., localns=...)`.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: rename test_runner_context.py to test_experiment.py

  Mirrors the module name now that RunnerContext and RegressionError live in `langfuse.experiment`.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(experiment): tighten RunnerContext + RegressionError public surface

  - RunnerContext no longer carries `name` or `run_name` as context-level defaults. `name` is now required on every `run_experiment` call (supports the action's directory-of-experiments mode where each script must name itself). `run_name` passes straight through to `Langfuse.run_experiment`.
  - RegressionError gains three typed `@overload` signatures (minimal, free-form message, structured metric/value/threshold) so type checkers enforce that `metric` and `value` are supplied together. At runtime, partial structured input falls back to the default message instead of rendering misleading `None` placeholders in PR comments.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
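
Illustrative sketch of the two runtime behaviours described above: metadata merging with user-supplied keys winning on collision, and the fallback to a default message when structured regression input is only partial. This is not the SDK code. `merge_metadata`, `RegressionErrorSketch`, the `DEFAULT_MESSAGE` text, the rendered message format, and treating `threshold` as optional are all assumptions made for illustration; the real `RunnerContext` and `RegressionError` in `langfuse.experiment` may differ.

```python
from typing import Any, Dict, Optional


def merge_metadata(
    defaults: Optional[Dict[str, Any]],
    user: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical helper: merge action-injected defaults with user metadata.

    User-supplied keys win on collision, as the commit message describes.
    """
    merged = dict(defaults or {})
    merged.update(user or {})
    return merged


class RegressionErrorSketch(Exception):
    """Hypothetical stand-in for RegressionError (not the SDK class)."""

    # Assumed wording; the SDK's actual default message is not given here.
    DEFAULT_MESSAGE = "Experiment regression detected"

    def __init__(
        self,
        message: Optional[str] = None,
        *,
        metric: Optional[str] = None,
        value: Optional[float] = None,
        threshold: Optional[float] = None,
    ) -> None:
        self.metric = metric
        self.value = value
        self.threshold = threshold

        if message is not None:
            # Free-form message form.
            text = message
        elif metric is not None and value is not None:
            # Structured form: metric and value supplied together.
            text = f"Regression on `{metric}`: {value}"
            if threshold is not None:
                text += f" (threshold {threshold})"
        else:
            # Minimal form, or partial structured input: use the default
            # message rather than rendering "None" placeholders.
            text = self.DEFAULT_MESSAGE
        super().__init__(text)
```

With these assumptions, `merge_metadata({"branch": "main"}, {"branch": "feature"})` keeps the user's `"feature"` value, and `RegressionErrorSketch(metric="accuracy")` (no `value`) produces the default message instead of a string containing `None`.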
Tobias Wochinger committed
3166cb8bd6667e4a49ab11e3f049719723c88552
Parent: 5ef17a0
Committed by GitHub <noreply@github.com>
on 5/4/2026, 7:32:10 AM