blossomin/ktransformers

Python

Apache License 2.0

KTransformers is a Python framework that extends Hugging Face Transformers with optimized kernels and placement/parallelism strategies for local LLM inference. Its template-based injection system lets users swap in optimized modules, which speeds up inference on resource-constrained devices through GPU/CPU offloading and support for quantized models.
