[DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"
HybriMoE is a system designed to improve the efficiency of Mixture-of-Experts (MoE) inference by optimizing CPU-GPU scheduling and cache management. It introduces hybrid scheduling to balance workloads, impact-driven prefetching to prioritize expert loading, and MoE-specialized caching to reduce cache misses. This project is intended for researchers and developers working on large-scale MoE models, particularly those seeking to optimize inference performance and resource utilization.
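To make the caching idea concrete, below is a minimal, illustrative Python sketch of impact-driven cache management, where a router-derived "impact" score decides which experts are prefetched to the GPU and which are evicted when the cache is full. The class name, the scoring rule, and the `load_fn` callback are assumptions for illustration only, not HybriMoE's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class ExpertCache:
    """Toy GPU-side expert cache keyed by an impact score (illustrative only)."""
    capacity: int                                              # max experts resident on GPU
    resident: Dict[int, float] = field(default_factory=dict)   # expert_id -> impact score

    def prefetch(self, expert_id: int, impact: float,
                 load_fn: Callable[[int], None]) -> None:
        """Load a high-impact expert, evicting the lowest-impact one if full."""
        if expert_id in self.resident:
            # Already resident: just refresh its impact score.
            self.resident[expert_id] = max(self.resident[expert_id], impact)
            return
        if len(self.resident) >= self.capacity:
            victim = min(self.resident, key=self.resident.__getitem__)
            if self.resident[victim] >= impact:
                return  # candidate is no more impactful than any resident expert
            del self.resident[victim]    # evict (real code would free GPU weights)
        load_fn(expert_id)               # e.g., copy expert weights CPU -> GPU
        self.resident[expert_id] = impact

# Example: cache holds 2 experts; the third prefetch evicts the lowest-impact one.
cache = ExpertCache(capacity=2)
load = lambda e: print(f"loading expert {e} to GPU")
cache.prefetch(0, impact=0.2, load_fn=load)
cache.prefetch(1, impact=0.8, load_fn=load)
cache.prefetch(2, impact=0.5, load_fn=load)  # evicts expert 0 (impact 0.2)
```

In a real hybrid CPU-GPU setting, the impact scores would come from the router's gate weights and the eviction policy would also account for transfer cost and expected reuse; this sketch only shows the prioritization skeleton.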