| Name | Size |
| --- | --- |
| evals-2 | |
| legacy | |
| bedrock_tracing_and_evals_tutorial.ipynb | 13.4 KB |
| build_benchmark_dataset_and_custom_evaluator.ipynb | 20.9 KB |
| CoT_explanations_simple_vs_complex_evals.ipynb | 45.1 KB |
| evals_quickstart.ipynb | 10.9 KB |
| evaluate_agent_parameter_extraction_classifications.ipynb | 951.3 KB |
| evaluate_agent_tool_calling_classifications.ipynb | 202.4 KB |
| evaluate_agent_tool_selection_classifications.ipynb | 892.3 KB |
| evaluate_agent.ipynb | 83.3 KB |
| evaluate_code_functionality_classifications.ipynb | 203.0 KB |
| evaluate_code_readability_classifications.ipynb | 166.8 KB |
| evaluate_hallucination_classifications.ipynb | 141.5 KB |
| evaluate_human_vs_ai_classifications.ipynb | 159.6 KB |
| evaluate_QA_classifications.ipynb | 123.2 KB |
| evaluate_rag_haystack.ipynb | 32.2 KB |
| evaluate_rag.ipynb | 44.5 KB |
| evaluate_reference_link_correctness_classifications.ipynb | 1.2 MB |
| evaluate_relevance_classifications.ipynb | 181.7 KB |
| evaluate_summarization_classifications.ipynb | 187.6 KB |
| evaluate_toxicity_classifications.ipynb | 112.7 KB |
| evaluate_user_frustration_classifications.ipynb | 1021.8 KB |
| evaluations_with_error_handling.ipynb | 55.1 KB |
| openai_agents_cookbook.ipynb | 21.3 KB |
| optimizing_llm_as_a_judge_prompts.ipynb | 29.7 KB |
| pydantic-evals.ipynb | 19.5 KB |
| session_level_evals.ipynb | 24.3 KB |
| trace_level_evals.ipynb | 16.2 KB |