feat: configurable embedder indexing_params/query_params + curated defaults (#150)

Users can now set `indexing_params` and `query_params` under `embedding:` in
`global_settings.yml` to pass extra kwargs to the embedder separately for
indexing vs. query — supporting asymmetric retrieval models (Cohere v3,
Voyage, Nvidia NIM, Gemini, nomic-ai code/text models, Snowflake arctic,
etc.).
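
As a rough illustration of the split, the sketch below assumes the `embedding:` section has already been parsed into a dict. The helper `kwargs_for`, the model name, and the `input_type` values are hypothetical and chosen for illustration; they are not taken from the project's code.

```python
from typing import Any

# Hypothetical parsed form of the `embedding:` section of global_settings.yml.
# The model name and parameter values are illustrative only.
embedding_cfg: dict[str, Any] = {
    "model": "example/asymmetric-model",
    "indexing_params": {"input_type": "search_document"},
    "query_params": {"input_type": "search_query"},
}


def kwargs_for(cfg: dict[str, Any], mode: str) -> dict[str, Any]:
    """Return the extra embedder kwargs for 'indexing' or 'query'."""
    if mode not in ("indexing", "query"):
        raise ValueError(f"unknown mode: {mode!r}")
    # A missing or None entry is treated as "no extra kwargs".
    return dict(cfg.get(f"{mode}_params") or {})


# Documents and queries get different kwargs for the same model.
indexing_kwargs = kwargs_for(embedding_cfg, "indexing")
query_kwargs = kwargs_for(embedding_cfg, "query")
```

The point of keeping the two dicts separate is that an asymmetric model can embed documents and queries with different instructions while sharing one model setting.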
- `ccc init` auto-populates these from a curated table of known models and
prints the applied defaults; unknown models get a commented-out template
for the accepted keys (`prompt_name` for sentence-transformers;
`input_type`, `dimensions` for litellm).
- Daemon validates the effective params at startup; invalid keys fail fast
with a clear error.
- Backward compat: configs for `nomic-ai/CodeRankEmbed` /
`nomic-ai/nomic-embed-code` that predate this feature keep the previous
hardcoded `prompt_name=query` behavior, and a one-time handshake warning
asks users to make the setting explicit. Setting `query_params` to any
non-None value (including `{}`) suppresses the warning.
- `ccc doctor` now tests indexing and query separately so asymmetric
misconfigurations surface independently.
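
The fail-fast validation mentioned above could look roughly like the following. This is a sketch under assumptions: the accepted keys are the ones listed in this message, but `ACCEPTED_KEYS`, `validate_params`, and keying the check on a provider string are hypothetical names and design choices, not the daemon's actual code.

```python
from typing import Any, Optional

# Accepted keys as listed in this commit message; the table layout is assumed.
ACCEPTED_KEYS: dict[str, set[str]] = {
    "sentence-transformers": {"prompt_name"},
    "litellm": {"input_type", "dimensions"},
}


def validate_params(
    provider: str,
    indexing_params: Optional[dict[str, Any]],
    query_params: Optional[dict[str, Any]],
) -> None:
    """Raise ValueError naming any key the provider does not accept."""
    allowed = ACCEPTED_KEYS[provider]
    for label, params in (
        ("indexing_params", indexing_params),
        ("query_params", query_params),
    ):
        # None is treated like an empty dict: nothing to validate.
        unknown = sorted(set(params or {}) - allowed)
        if unknown:
            raise ValueError(
                f"{label}: unsupported key(s) {unknown} for {provider}; "
                f"accepted keys are {sorted(allowed)}"
            )
```

Checking both dicts separately means the error message says which side (indexing or query) is misconfigured.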

Drops the legacy `shared.query_prompt_name` module variable and
`_QUERY_PROMPT_MODELS` set; the new resolution path is centralized in
`embedder_params.resolve_embedder_params` and the curated defaults live in
`embedder_defaults._DEFAULT_PARAMS`.
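
The backward-compat rule for the two nomic-ai models can be sketched as below. The real signature of `resolve_embedder_params` is not shown in this message, so `resolve_query_params_sketch` and `LEGACY_QUERY_PROMPT_MODELS` are hypothetical stand-ins that cover only the query side, and the warning text is invented for illustration.

```python
import warnings
from typing import Any, Optional

# Models this message names as having had a hardcoded query prompt.
LEGACY_QUERY_PROMPT_MODELS = {
    "nomic-ai/CodeRankEmbed",
    "nomic-ai/nomic-embed-code",
}


def resolve_query_params_sketch(
    model: str, query_params: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Return the effective query kwargs for a model."""
    # Any non-None value, including {}, counts as an explicit choice
    # and therefore skips both the legacy fallback and the warning.
    if query_params is not None:
        return dict(query_params)
    if model in LEGACY_QUERY_PROMPT_MODELS:
        warnings.warn(
            f"{model}: falling back to prompt_name=query; "
            "set query_params explicitly to silence this warning",
            stacklevel=2,
        )
        return {"prompt_name": "query"}
    return {}
```

With this shape, an old config that omits `query_params` keeps its previous behavior, while `query_params: {}` opts out of the implicit prompt.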

Also enables `litellm.drop_params = True` so provider-specific kwargs that
a particular model doesn't accept are silently dropped instead of failing.
Jiangzhou committed
ee3515fa57e7fdd058d61d3f8922d6fc41163e43
Parent: da886d5
Committed by GitHub <noreply@github.com>
on 4/24/2026, 11:08:20 PM