Mandoline helps developers evaluate and improve LLM applications in ways that matter to users.
Create custom metrics that align with your specific use case, evaluate LLM performance in real-world scenarios, and track improvements over time.
- Multimodal Evaluation: Evaluate LLMs Across Text and Vision Tasks
- Model Selection: Comparing LLMs for Creative Tasks
- Prompt Engineering: Reduce Unwanted LLM Behaviors