
feat: configurable embedder indexing_params/query_params + curated defaults (#150)

Users can now set `indexing_params` and `query_params` under `embedding:` in
`global_settings.yml` to pass extra kwargs to the embedder, separately for
indexing and for queries. This supports asymmetric retrieval models (Cohere v3,
Voyage, Nvidia NIM, Gemini, nomic-ai code/text models, Snowflake arctic,
etc.).
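As a rough illustration of the indexing/query split, the sketch below forwards one set of kwargs when embedding chunks and another when embedding a query. `StubEncoder`, `embed_documents`, and `embed_query` are hypothetical stand-ins (the stub mimics the `prompt_name=` keyword that sentence-transformers' `encode` accepts), and the dict only assumes what a loaded `embedding:` section might look like; the project's actual call path may differ.

```python
from typing import Any


class StubEncoder:
    """Hypothetical stand-in for a sentence-transformers model.

    Records the kwargs of each encode() call and returns a dummy
    one-dimensional vector per text (its length), so no model is needed.
    """

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def encode(self, texts: list[str], **kwargs: Any) -> list[list[float]]:
        self.calls.append(kwargs)
        return [[float(len(t))] for t in texts]


# Assumed shape of the `embedding:` section after loading global_settings.yml.
embedding_cfg: dict[str, Any] = {
    "model": "nomic-ai/CodeRankEmbed",
    "indexing_params": {},                     # documents: no prompt
    "query_params": {"prompt_name": "query"},  # queries: "query" prompt
}


def embed_documents(encoder: StubEncoder, chunks: list[str], cfg: dict[str, Any]) -> list[list[float]]:
    # Indexing path: only indexing_params are forwarded.
    return encoder.encode(chunks, **(cfg.get("indexing_params") or {}))


def embed_query(encoder: StubEncoder, query: str, cfg: dict[str, Any]) -> list[float]:
    # Query path: only query_params are forwarded.
    return encoder.encode([query], **(cfg.get("query_params") or {}))[0]


if __name__ == "__main__":
    enc = StubEncoder()
    embed_documents(enc, ["def add(a, b): return a + b"], embedding_cfg)
    embed_query(enc, "function that adds two numbers", embedding_cfg)
    print(enc.calls)  # [{}, {'prompt_name': 'query'}]
```

Keeping the two paths as separate functions means a symmetric model simply leaves both dicts empty, while an asymmetric one differs only in configuration.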

- `ccc init` auto-populates these from a curated table of known models and
  prints the applied defaults; unknown models get a commented-out template
  for the accepted keys (`prompt_name` for sentence-transformers;
  `input_type`, `dimensions` for litellm).
- Daemon validates the effective params at startup; invalid keys fail fast
  with a clear error.
- Backward compat: configs for `nomic-ai/CodeRankEmbed` /
  `nomic-ai/nomic-embed-code` that predate this feature keep the previous
  hardcoded `prompt_name=query` behavior, and a one-time handshake warning
  asks users to make the setting explicit. Setting `query_params` to any
  non-None value (including `{}`) suppresses the warning.
- `ccc doctor` now tests indexing and query separately so asymmetric
  misconfigurations surface independently.
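A minimal sketch of the startup validation and the split doctor check, assuming the accepted keys are exactly the template keys listed for each backend (the daemon's real allow-list may differ). `EmbedderParamsError`, `validate_params`, `doctor_check`, and `fake_embed` are hypothetical names; `doctor_check` calls the embedder once per parameter set so a broken query configuration is reported even when indexing works.

```python
from typing import Any, Callable, Optional

# Assumption: accepted keys per backend, taken from the template keys above.
_ACCEPTED_KEYS: dict[str, set[str]] = {
    "sentence-transformers": {"prompt_name"},
    "litellm": {"input_type", "dimensions"},
}


class EmbedderParamsError(ValueError):
    """Hypothetical error raised at startup for invalid embedder params."""


def validate_params(backend: str, field: str, params: dict[str, Any]) -> None:
    """Fail fast if `params` contains keys the backend does not accept."""
    allowed = _ACCEPTED_KEYS.get(backend)
    if allowed is None:
        raise EmbedderParamsError(f"unknown embedder backend {backend!r}")
    unknown = sorted(set(params) - allowed)
    if unknown:
        raise EmbedderParamsError(
            f"embedding.{field}: unsupported key(s) {unknown} for {backend}; "
            f"accepted: {sorted(allowed)}"
        )


def doctor_check(
    embed: Callable[..., list[float]],
    indexing_params: dict[str, Any],
    query_params: dict[str, Any],
) -> dict[str, str]:
    """Embed a sample once per side and report each outcome independently."""
    results: dict[str, str] = {}
    for side, params in (("indexing", indexing_params), ("query", query_params)):
        try:
            vec = embed("def hello(): pass", **params)
            results[side] = f"ok (dim={len(vec)})"
        except Exception as exc:  # record the failure, keep checking the other side
            results[side] = f"FAILED: {exc!r}"
    return results


def fake_embed(text: str, prompt_name: Optional[str] = None) -> list[float]:
    """Hypothetical embedder that only knows the "query" prompt."""
    if prompt_name not in (None, "query"):
        raise ValueError(f"unknown prompt_name {prompt_name!r}")
    return [0.0] * 4


if __name__ == "__main__":
    validate_params("litellm", "query_params", {"input_type": "search_query"})
    # Indexing passes; the misspelled query prompt is reported on its own.
    print(doctor_check(fake_embed, {}, {"prompt_name": "qurey"}))
```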

Drops the legacy `shared.query_prompt_name` module variable and
`_QUERY_PROMPT_MODELS` set; the new resolution path is centralized in
`embedder_params.resolve_embedder_params` and the curated defaults live in
`embedder_defaults._DEFAULT_PARAMS`.

Also enables `litellm.drop_params = True` so provider-specific kwargs that
a particular model doesn't accept are silently dropped instead of failing.
Jiangzhou committed
ee3515fa57e7fdd058d61d3f8922d6fc41163e43
Parent: da886d5
Committed by GitHub <noreply@github.com> on 4/24/2026, 11:08:20 PM