EADST

Quick Review: ZeroQuant-FP

ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats

Highlights:

  • FP4 Weight Quantization: Implements 4-bit floating-point (FP4) quantization for model weights.
  • FP8 Activation Quantization: Uses 8-bit floating-point (FP8) quantization for activations, balancing inference efficiency against numerical precision.
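
To make the W4A8 idea concrete, here is a minimal NumPy sketch that simulates ("fake-quantizes") FP4 weights and FP8 activations. It is an illustration under stated assumptions, not the paper's implementation: the E2M1 layout for FP4, the E4M3 layout for FP8, per-row weight scales, and a single per-tensor activation scale are all assumed here, and the function names are hypothetical.

```python
import numpy as np

# Assumption: FP4 uses the E2M1 layout, whose non-negative values are listed below.
FP4_E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def quantize_fp4_weights(w):
    """Fake-quantize a 2-D weight matrix to FP4 with one scale per row (assumed scheme)."""
    w = np.asarray(w, dtype=np.float64)
    max_abs = np.max(np.abs(w), axis=1, keepdims=True)
    # Map the largest magnitude in each row onto the largest FP4 value (6.0).
    scale = np.where(max_abs > 0, max_abs / FP4_E2M1_GRID[-1], 1.0)
    scaled = np.abs(w) / scale
    # Snap every scaled magnitude to the nearest representable FP4 value.
    idx = np.argmin(np.abs(scaled[..., None] - FP4_E2M1_GRID), axis=-1)
    return np.sign(w) * FP4_E2M1_GRID[idx] * scale


def round_to_fp8_e4m3(x):
    """Round values to the nearest FP8 number, assuming E4M3 (3 mantissa bits, max 448)."""
    x = np.clip(np.asarray(x, dtype=np.float64), -448.0, 448.0)
    abs_x = np.abs(x)
    # Exponent of each value; anything below 2**-6 falls into the subnormal range.
    exp = np.floor(np.log2(np.where(abs_x > 0, abs_x, 1.0)))
    exp = np.maximum(exp, -6.0)
    step = 2.0 ** (exp - 3)  # spacing between neighbouring FP8 values
    return np.round(x / step) * step


def quantize_fp8_activations(a):
    """Fake-quantize activations to FP8 with a single per-tensor scale (assumed scheme)."""
    a = np.asarray(a, dtype=np.float64)
    max_abs = np.max(np.abs(a))
    scale = max_abs / 448.0 if max_abs > 0 else 1.0
    return round_to_fp8_e4m3(a / scale) * scale


def w4a8_linear(a, w):
    """Simulated W4A8 linear layer: FP8 activations times FP4 weights."""
    return quantize_fp8_activations(a) @ quantize_fp4_weights(w).T


rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 16))      # toy activations: batch of 4, hidden size 16
weights = rng.normal(size=(8, 16))   # toy weights: 8 output features
reference = acts @ weights.T
simulated = w4a8_linear(acts, weights)
print("mean abs error vs. full precision:", np.mean(np.abs(simulated - reference)))
```

The sketch keeps everything in float64 and only restricts values to the chosen grids, so it shows the rounding error of the formats rather than any speed or memory benefit.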