A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
KTransformers is a Python framework that extends Hugging Face Transformers with advanced kernel optimizations and placement/parallelism strategies, bringing cutting-edge LLM inference techniques to local deployments. It offers a flexible, extensible platform for experimenting with optimizations such as CPU/GPU offloading of quantized models, MoE expert offloading, sparse attention, and high-performance kernels from Llamafile and Marlin.
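To illustrate the placement idea, here is a minimal, hypothetical sketch of a rule-based device map: MoE expert weights are routed to CPU memory while the rest of the model stays on GPU. The function and module names below are illustrative assumptions, not the actual KTransformers API.

```python
# Hypothetical sketch of a placement strategy (illustrative only, not the
# real KTransformers API): MoE expert weights are offloaded to CPU, while
# attention and other dense modules remain on the GPU.

def plan_placement(module_names):
    """Map each module name to a device string based on simple rules."""
    placement = {}
    for name in module_names:
        if ".experts." in name:
            # Large, sparsely-activated expert weights live in host memory.
            placement[name] = "cpu"
        else:
            # Dense, frequently-used modules stay on the GPU.
            placement[name] = "cuda:0"
    return placement

# Example module names (hypothetical, modeled on common MoE layouts):
modules = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.experts.0.w1",
    "model.layers.0.mlp.gate",
]
print(plan_placement(modules))
```

In a real deployment, such a map would drive where each weight tensor is loaded and which kernel (CPU or GPU) executes it.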