Terminal-Bench is a benchmark for testing AI agents in real terminal environments. It evaluates their ability to autonomously handle end-to-end tasks such as compiling code, training models, and setting up servers. It consists of a dataset of tasks and an execution harness that connects language models to a sandboxed terminal, and it is intended for developers, researchers, and engineers building or benchmarking LLM agents. The project is currently in beta with about 100 tasks and aims to become a comprehensive testbed for AI agents in text-based environments.
How the donated funds are distributed
Kivach runs on the Obyte network, so you can track every donation.