A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
KTransformers is a Python framework that extends Hugging Face Transformers with advanced kernel optimizations and placement/parallelism strategies, bringing cutting-edge LLM inference techniques to local deployments. It offers a flexible, extensible platform for experimenting with optimizations such as CPU/GPU offloading of quantized models, MoE expert offloading, sparse attention, and high-performance kernels from Llamafile and Marlin.
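To illustrate the placement idea, here is a minimal, hypothetical sketch of a rule-based device map: MoE expert weights are routed to CPU memory while the rest of the model stays on GPU. The function and module names below are illustrative assumptions, not the actual KTransformers API.

```python
# Hypothetical sketch of a placement strategy (illustrative only, not the
# real KTransformers API): MoE expert weights are offloaded to CPU, while
# attention and other dense modules remain on the GPU.

def plan_placement(module_names):
    """Map each module name to a device string based on simple rules."""
    placement = {}
    for name in module_names:
        if ".experts." in name:
            # Large, sparsely-activated expert weights live in host memory.
            placement[name] = "cpu"
        else:
            # Dense, frequently-used modules stay on the GPU.
            placement[name] = "cuda:0"
    return placement

# Example module names (hypothetical, modeled on common MoE layouts):
modules = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.experts.0.w1",
    "model.layers.0.mlp.gate",
]
print(plan_placement(modules))
```

In a real deployment, such a map would drive where each weight tensor is loaded and which kernel (CPU or GPU) executes it.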