Python client library for improving your LLM app accuracy
A framework for building scenario-simulation projects in which human and LLM-based agents can participate, with a user-friendly web UI for visualizing simulations and support for automatic evaluation at the agent-action level.
Practice and evaluation for the IELTS listening, speaking, reading, and writing modules, with IELTS band calculation based on speech and text analysis.
This library implements various metrics (including Kaggle competition and medical metrics) for evaluating ML, DL, and AI models and algorithms. 📐📊📈📉📏
NLP tool for wide-ranging model reliability evaluations
3cb: Catastrophic Cyber Capabilities Benchmarking of Large Language Models
You can build a robust opinion-mining and website-evaluation system on AWS. The combination of data collection, preprocessing, sentiment analysis, and rating calculation lets you efficiently analyze user feedback and generate meaningful insights for evaluating websites (a rough sketch of the rating step follows below).
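The repository's actual pipeline is not shown here; purely as an illustration of the rating-calculation step it describes, here is a minimal sketch (hypothetical helper name) that maps averaged per-review sentiment scores in [-1, 1] to a 1-5 website rating, assuming an upstream sentiment-analysis step has already scored each review.

```python
# Hypothetical sketch of the rating-calculation step, not the repo's actual code.
# Assumes each review already has a sentiment score in [-1.0, 1.0] from an
# upstream sentiment-analysis step.
from statistics import mean

def rating_from_sentiment(scores: list[float]) -> float:
    """Map the average sentiment of a website's reviews to a 1-5 rating."""
    if not scores:
        return 0.0  # no feedback collected yet
    avg = mean(scores)                          # average sentiment in [-1, 1]
    return round(1.0 + 2.0 * (avg + 1.0), 1)    # linear rescale to [1, 5]

# Example: mostly positive feedback for one website
print(rating_from_sentiment([0.9, 0.5, -0.2, 0.8]))  # -> 4.0
```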
This repo contains code for localized LLM evaluation metrics via a framework using Ollama and edge resources, along with novel derived metrics.