muchori/terminal-bench

Python
Apache License 2.0

Terminal-Bench is a benchmark for testing AI agents in real terminal environments. It evaluates how well an agent can autonomously complete end-to-end tasks such as compiling code, training models, and setting up servers. It consists of two parts: a dataset of tasks and an execution harness that connects a language model to a sandboxed terminal. It is intended for developers, researchers, and engineers who build or benchmark LLM agents. The project is currently in beta with ~100 tasks and aims to become a comprehensive testbed for AI agents in text-based environments.


Top contributors

alexgshaw: 159 contributions
TheMikeMerrill: 123 contributions
SifatTanvirTuring: 63 contributions
ShoaibTuring: 49 contributions
Nwokedi100: 43 contributions
carlini: 36 contributions
li-boxuan: 14 contributions
ibercovich: 11 contributions
harshraj172: 10 contributions
Jaydeep-Turing: 10 contributions
