Python client library for improving your LLM app accuracy
A framework for building scenario-simulation projects in which human and LLM-based agents can participate, with a user-friendly web UI for visualizing simulations and support for automatic evaluation at the agent-action level.
Practice and evaluation for the IELTS listening, speaking, reading, and writing modules, with IELTS band calculation based on speech and text analysis.
This library implements various metrics (including Kaggle competition and medical metrics) for evaluating ML, DL, and AI models and algorithms. 📐📊📈📉📏
NLP tool for wide-ranging model reliability evaluations
3cb: Catastrophic Cyber Capabilities Benchmarking of Large Language Models
You can build a robust opinion-mining and website-evaluation system on AWS. The combination of data collection, preprocessing, sentiment analysis, and rating calculation lets you efficiently analyze user feedback and generate meaningful insights for evaluating websites (a rough sketch of the rating step follows below).
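The repository's actual pipeline is not shown here; purely as an illustration of the rating-calculation step it describes, here is a minimal sketch (hypothetical helper name) that maps averaged per-review sentiment scores in [-1, 1] to a 1-5 website rating, assuming an upstream sentiment-analysis step has already scored each review.

```python
# Hypothetical sketch of the rating-calculation step, not the repo's actual code.
# Assumes each review already has a sentiment score in [-1.0, 1.0] from an
# upstream sentiment-analysis step.
from statistics import mean

def rating_from_sentiment(scores: list[float]) -> float:
    """Map the average sentiment of a website's reviews to a 1-5 rating."""
    if not scores:
        return 0.0  # no feedback collected yet
    avg = mean(scores)                          # average sentiment in [-1, 1]
    return round(1.0 + 2.0 * (avg + 1.0), 1)    # linear rescale to [1, 5]

# Example: mostly positive feedback for one website
print(rating_from_sentiment([0.9, 0.5, -0.2, 0.8]))  # -> 4.0
```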
This repo contains code for localized LLM evaluation metrics via a framework using Ollama and edge resources, along with novel derived metrics.