Quick Review: ZeroQuant-FP
Author: XD / Published: December 7, 2023 00:32 / Updated: December 7, 2023 00:56 / Research & Learning / Views: 1087
ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats
- Paper: ZeroQuant-FP on arXiv
- Code: ZeroQuant-FP on GitHub
- Organization: Microsoft
Highlights:
- FP4 Weight Quantization: Implements 4-bit floating-point (FP4) quantization for model weights.
- FP8 Activation Quantization: Uses 8-bit floating-point (FP8) quantization for activations, balancing accuracy against compute and memory cost (see the sketch after this list).
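To make the FP4 highlight concrete, below is a minimal NumPy sketch of simulated (quantize-dequantize) FP4 weight quantization with a per-channel scale. The E2M1 value grid, the per-channel max-scaling scheme, and the function name are illustrative assumptions, not the paper's actual code; FP8 activation quantization can be simulated the same way with the larger E4M3 value grid.

```python
import numpy as np

# Representable non-negative values of FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
FP4_E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fp4_quantize_dequantize(w: np.ndarray) -> np.ndarray:
    """Simulated ("fake") per-channel FP4 quantize-dequantize of a weight matrix.

    Each row is scaled so its max magnitude maps to the largest FP4 value (6.0),
    magnitudes are rounded to the nearest grid point, then rescaled back.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / FP4_E2M1_GRID[-1]
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero rows
    w_scaled = w / scale
    # Round |w| to the nearest representable FP4 magnitude, then restore the sign.
    idx = np.abs(np.abs(w_scaled)[..., None] - FP4_E2M1_GRID).argmin(axis=-1)
    return np.sign(w_scaled) * FP4_E2M1_GRID[idx] * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8)).astype(np.float32)
W_fq = fp4_quantize_dequantize(W)
print("max abs error:", np.abs(W - W_fq).max())
```

Rounding to a fixed minifloat grid is what distinguishes floating-point quantization from uniform integer quantization: the representable values are denser near zero, which suits the roughly bell-shaped distributions of LLM weights and activations.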