Provides unified APIs for SOTA model compression techniques, such as low-precision (INT8/INT4/FP4/NF4) quantization, sparsity, pruning, and knowledge distillation, on mainstream AI frameworks such as TensorFlow, PyTorch, and ONNX Runtime.
Intel® Neural Compressor is an open-source Python library that provides unified APIs for model compression techniques like quantization, pruning, and distillation across TensorFlow, PyTorch, ONNX Runtime, and MXNet. It is designed for developers and researchers optimizing deep learning models for Intel hardware and other platforms, and offers features such as automatic accuracy-driven quantization and support for popular models from hubs like Hugging Face and Torch Vision.
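To illustrate the core idea behind INT8 quantization (independent of any particular library API), here is a minimal sketch of symmetric per-tensor post-training quantization: a scale factor maps the largest absolute value onto the INT8 range, and dequantization multiplies back. The function names are illustrative, not part of the Neural Compressor API.

```python
def quantize_int8(values):
    # Symmetric per-tensor quantization: map max |value| to 127,
    # then round each value and clamp to the signed INT8 range.
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from INT8 codes.
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)   # q == [50, -127, 3, 100]
restored = dequantize(q, scale)     # close to the original weights
```

Accuracy-driven tools such as Neural Compressor automate choices this sketch hardcodes (per-tensor vs. per-channel scales, calibration data, which ops to leave in higher precision) and validate the result against an accuracy target.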