EADST

GGML Q4_0 Quantization Analysis in llama.cpp

The LLAMA7B model analyzed here contains 387 tensors, a mix of weights and biases: token_embd.weight, 32 blocks of attention and feed-forward tensors (attn_norm.weight, attn_q.weight, attn_k.weight, attn_v.weight, attn_q.bias, attn_k.bias, attn_v.bias, attn_output.weight, ffn_norm.weight, ffn_up.weight, ffn_gate.weight, ffn_down.weight), output_norm.weight, and output.weight. With 12 tensors per block, that is 1 + 32 × 12 + 2 = 387.

Quantization Details:

  • Total Tensors for Quantization: 226 (1 + 32 × 7 + 1), consisting of:
  • token_embd.weight
  • 32 sets of: attn_q.weight, attn_k.weight, attn_v.weight, attn_output.weight, ffn_up.weight, ffn_gate.weight, ffn_down.weight
  • output.weight

Tensor Breakdown:

  • llama_model_loader:
      • f32 type: 161 tensors
      • f16 type: 226 tensors
  • llama_model_quantize_internal:
      • Meta size: 6162784 bytes
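The counts above (387 tensors in total, 226 stored as f16 and quantized, 161 kept as f32) can be reproduced by enumerating the tensor names. The following is a minimal sketch that assumes the per-block layout shown in the log for blk.0 repeats unchanged for all 32 blocks; the helper name `list_tensors` is hypothetical and not part of llama.cpp.

```python
N_BLOCKS = 32

# Per-block tensors in the order they appear in the log, with their source type.
# f16 tensors are the ones the log shows as "quantizing to ..."; f32 ones are kept.
PER_BLOCK = [
    ("attn_norm.weight", "f32"),
    ("attn_q.weight", "f16"),
    ("attn_k.weight", "f16"),
    ("attn_v.weight", "f16"),
    ("attn_q.bias", "f32"),
    ("attn_k.bias", "f32"),
    ("attn_v.bias", "f32"),
    ("attn_output.weight", "f16"),
    ("ffn_norm.weight", "f32"),
    ("ffn_up.weight", "f16"),
    ("ffn_gate.weight", "f16"),
    ("ffn_down.weight", "f16"),
]


def list_tensors(n_blocks=N_BLOCKS):
    """Return (name, source_type) pairs for the whole model (hypothetical helper)."""
    tensors = [("token_embd.weight", "f16")]
    for i in range(n_blocks):
        tensors += [(f"blk.{i}.{suffix}", dtype) for suffix, dtype in PER_BLOCK]
    tensors += [("output_norm.weight", "f32"), ("output.weight", "f16")]
    return tensors


tensors = list_tensors()
n_f16 = sum(1 for _, dtype in tensors if dtype == "f16")
n_f32 = sum(1 for _, dtype in tensors if dtype == "f32")
print(len(tensors), n_f16, n_f32)  # 387 226 161
```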

Example Tensors:

  • [ 1/ 387] token_embd.weight - [ 4096, 151851, 1, 1], type = f16, quantizing to q4_0 .. size = 1186.34 MB -> 333.66 MB | hist: 0.036 0.016 0.025 0.039 0.057 0.077 0.096 0.111 0.117 0.111 0.096 0.077 0.057 0.039 0.025 0.021

[ 2/ 387] blk.0.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 3/ 387] blk.0.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.015 0.025 0.038 0.056 0.076 0.097 0.113 0.120 0.113 0.097 0.076 0.056 0.038 0.025 0.020

  • [ 4/ 387] blk.0.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.015 0.024 0.037 0.055 0.075 0.097 0.115 0.123 0.115 0.097 0.076 0.055 0.037 0.024 0.020

  • [ 5/ 387] blk.0.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.016 0.025 0.039 0.056 0.076 0.096 0.112 0.119 0.112 0.096 0.076 0.056 0.039 0.025 0.021

[ 6/ 387] blk.0.attn_q.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

[ 7/ 387] blk.0.attn_k.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

[ 8/ 387] blk.0.attn_v.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 9/ 387] blk.0.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.015 0.025 0.039 0.056 0.077 0.096 0.112 0.118 0.112 0.096 0.077 0.056 0.039 0.025 0.021

[ 10/ 387] blk.0.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 11/ 387] blk.0.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, quantizing to q4_0 .. size = 86.00 MB -> 24.19 MB | hist: 0.037 0.016 0.025 0.039 0.057 0.077 0.096 0.111 0.116 0.111 0.096 0.077 0.057 0.039 0.025 0.021

  • [ 12/ 387] blk.0.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, quantizing to q4_0 .. size = 86.00 MB -> 24.19 MB | hist: 0.037 0.016 0.026 0.039 0.057 0.077 0.096 0.110 0.116 0.110 0.096 0.077 0.057 0.040 0.026 0.021

  • [ 13/ 387] blk.0.ffn_down.weight - [11008, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 86.00 MB -> 24.19 MB | hist: 0.036 0.016 0.025 0.039 0.057 0.077 0.096 0.111 0.117 0.111 0.096 0.077 0.057 0.039 0.025 0.021

...and so on for tensors [ 14/ 387] through [ 385/ 387]: blocks blk.1 through blk.31 repeat the same 12-tensor pattern as blk.0.
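The per-tensor sizes in the log follow from simple arithmetic. The sketch below assumes the Q4_0 block layout as I understand it from ggml (32 weights per block, stored as one fp16 scale plus 16 bytes of packed 4-bit codes, so 18 bytes per block or 4.5 bits per weight) and that "MB" in the log means 1024 × 1024 bytes. The function names are hypothetical.

```python
MIB = 1024 * 1024
QK4_0 = 32                          # weights per Q4_0 block (assumed ggml layout)
Q4_0_BLOCK_BYTES = 2 + QK4_0 // 2   # fp16 scale + 16 bytes of 4-bit codes = 18


def n_elements(shape):
    """Number of weights in a tensor with the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n


def f16_size_mb(shape):
    """Size of the tensor stored as f16 (2 bytes per weight)."""
    return n_elements(shape) * 2 / MIB


def q4_0_size_mb(shape):
    """Size of the tensor stored as Q4_0 blocks."""
    return n_elements(shape) // QK4_0 * Q4_0_BLOCK_BYTES / MIB


for name, shape in [
    ("token_embd.weight", (4096, 151851)),
    ("blk.0.attn_q.weight", (4096, 4096)),
    ("blk.0.ffn_up.weight", (4096, 11008)),
]:
    print(f"{name}: {f16_size_mb(shape):.2f} MB -> {q4_0_size_mb(shape):.2f} MB")
```

Under these assumptions the three lines printed match the log: 1186.34 MB -> 333.66 MB, 32.00 MB -> 9.00 MB, and 86.00 MB -> 24.19 MB.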

[ 386/ 387] output_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 387/ 387] output.weight - [ 4096, 151851, 1, 1], type = f16, quantizing to q6_K .. size = 1186.34 MB -> 486.58 MB | hist:

llama_model_quantize_internal: model size = 14727.19 MB

llama_model_quantize_internal: quant size = 4296.76 MB
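The two totals can be checked by summing over all 387 tensors. This is a rough sketch, assuming 18 bytes per 32 weights for q4_0 and 210 bytes per 256 weights for q6_K (my reading of the ggml block layouts, not stated in the log), with the f32 norm and bias tensors left unquantized.

```python
MIB = 1024 * 1024
N_BLOCKS = 32
N_EMBD, N_FF, N_VOCAB = 4096, 11008, 151851

# Bytes per weight for each storage type (block bytes / weights per block).
# The q4_0 and q6_K figures are assumptions about the ggml block layouts.
BYTES_PER_WEIGHT = {
    "f32": 4.0,
    "f16": 2.0,
    "q4_0": 18 / 32,
    "q6_K": 210 / 256,
}

# (tensor count, weights per tensor, source type, target type)
TENSOR_GROUPS = [
    (1, N_EMBD * N_VOCAB, "f16", "q4_0"),            # token_embd.weight
    (4 * N_BLOCKS, N_EMBD * N_EMBD, "f16", "q4_0"),  # attn_q/k/v/output.weight
    (3 * N_BLOCKS, N_EMBD * N_FF, "f16", "q4_0"),    # ffn_up/gate/down.weight
    (5 * N_BLOCKS + 1, N_EMBD, "f32", "f32"),        # norms and biases, unchanged
    (1, N_EMBD * N_VOCAB, "f16", "q6_K"),            # output.weight
]

model_mb = sum(c * n * BYTES_PER_WEIGHT[src] for c, n, src, _ in TENSOR_GROUPS) / MIB
quant_mb = sum(c * n * BYTES_PER_WEIGHT[dst] for c, n, _, dst in TENSOR_GROUPS) / MIB

print(f"model size = {model_mb:.2f} MB")  # 14727.19 MB
print(f"quant size = {quant_mb:.2f} MB")  # 4296.76 MB
```

The quantized file is therefore roughly 29% of the original size, slightly above the 28% that pure Q4_0 would give, because output.weight uses q6_K and the f32 tensors are kept as they are.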

llama_model_quantize_internal: hist: 0.036 0.016 0.025 0.039 0.056 0.077 0.096 0.111 0.117 0.111 0.096 0.077 0.057 0.039 0.025 0.021
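The 16 hist values are the relative frequencies of the 16 possible 4-bit codes. To illustrate where their shape comes from, here is a simplified pure-Python sketch modeled on my understanding of ggml's reference Q4_0 routine: each block of 32 weights is scaled by d = max / -8, where max is the signed value with the largest magnitude, and each weight is mapped to a code in 0..15. The function names and the Gaussian test data are assumptions for illustration, and the real implementation stores d as fp16 and packs two codes per byte.

```python
import random

QK4_0 = 32  # weights per block


def quantize_block_q4_0(block):
    """Quantize 32 floats to (scale d, list of codes in 0..15)."""
    vmax = max(block, key=abs)       # largest-magnitude value, sign kept
    d = vmax / -8.0                  # scale: vmax itself maps to code 0
    inv_d = 1.0 / d if d != 0.0 else 0.0
    codes = [min(15, int(x * inv_d + 8.5)) for x in block]
    return d, codes


def dequantize_block_q4_0(d, codes):
    """Reconstruct approximate weights from the scale and the codes."""
    return [(c - 8) * d for c in codes]


def histogram(n_blocks=2000, seed=0):
    """Code frequencies for Gaussian-distributed toy weights."""
    rng = random.Random(seed)
    counts = [0] * 16
    for _ in range(n_blocks):
        block = [rng.gauss(0.0, 1.0) for _ in range(QK4_0)]
        _, codes = quantize_block_q4_0(block)
        for c in codes:
            counts[c] += 1
    total = n_blocks * QK4_0
    return [c / total for c in counts]


hist = histogram()
print(" ".join(f"{h:.3f}" for h in hist))
```

In this scheme every non-zero block has at least one weight at code 0 (the largest-magnitude element), which is one plausible reason the first bucket in the log (0.036) is larger than its neighbours, while code 8 corresponds to zero and sits at the peak of the bell shape.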
