feat: add new scorers and tools for enhanced evaluation
- Introduced `factualityScorer` to evaluate claims for factual accuracy and supporting evidence.
- Added `financialDataScorer` to assess the integrity of financial analysis outputs by checking for required fields and data sanity.
- Implemented `keywordCoverageScorer` to measure how many of the required keywords appear in outputs.
- Created `taskCompleteScorer` to verify whether research and writing tasks are fully completed, based on content structure and context.
- Developed `discordWebhookTool` for posting messages to a Discord webhook, with input validation and error handling.
- Added tests for the new supervisor scorers covering routing discipline, evidence grounding, request coverage, actionability, uncertainty handling, and conciseness.
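For illustration, a keyword-coverage score can be computed as the fraction of required keywords that appear in an output. This is a hypothetical sketch, not the committed `keywordCoverageScorer`. The function name `keywordCoverage`, the case-insensitive substring matching, and the choice to score an empty keyword list as 1 are all assumptions.

```typescript
// Hypothetical sketch of keyword coverage; NOT the committed keywordCoverageScorer.
interface KeywordCoverageResult {
  score: number; // fraction of unique required keywords found, in [0, 1]
  matched: string[];
  missing: string[];
}

function keywordCoverage(output: string, requiredKeywords: string[]): KeywordCoverageResult {
  // Assumption: case-insensitive substring matching (the real scorer may tokenize or stem).
  const haystack = output.toLowerCase();
  // Normalize keywords, drop blanks, and de-duplicate so repeats do not skew the score.
  const unique = Array.from(
    new Set(requiredKeywords.map((k) => k.trim().toLowerCase()).filter((k) => k.length > 0)),
  );
  const matched: string[] = [];
  const missing: string[] = [];
  for (const keyword of unique) {
    (haystack.includes(keyword) ? matched : missing).push(keyword);
  }
  // Assumption: with no required keywords, coverage is treated as complete.
  const score = unique.length === 0 ? 1 : matched.length / unique.length;
  return { score, matched, missing };
}

const result = keywordCoverage(
  "Revenue grew 12% while operating margin narrowed.",
  ["revenue", "margin", "EBITDA"],
);
console.log(result.score.toFixed(2), result.missing);
```

Returning the missing keywords alongside the score makes a low result easier to explain than a bare number.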
ssdeanx committed
64e3c8d8b6a65ccc0197ee4bf9e9a272fef66f0a
Parent: df0df84