EADST

GGML Q4_0 Quantization Analysis in llama.cpp

For the LLaMA-7B model there are 387 tensors in total, consisting of various weights and biases. These comprise token_embd.weight, 32 blocks of attention and feed-forward network weights and biases (attn_norm.weight, attn_q.weight, attn_k.weight, attn_v.weight, attn_q.bias, attn_k.bias, attn_v.bias, attn_output.weight, ffn_norm.weight, ffn_up.weight, ffn_gate.weight, ffn_down.weight), plus output_norm.weight and output.weight.

Quantization Details:

  • Total tensors quantized: 226 (1 + 32 × 7 + 1)
  • token_embd.weight
  • 32 blocks of: attn_q.weight, attn_k.weight, attn_v.weight, attn_output.weight, ffn_up.weight, ffn_gate.weight, ffn_down.weight
  • output.weight

Tensor Breakdown:

  • llama_model_loader:
    • f32 type: 161 tensors
    • f16 type: 226 tensors
  • llama_model_quantize_internal:
    • meta size: 6162784 bytes
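The f16 → q4_0 size reductions shown in the log below can be predicted directly from the Q4_0 storage layout: each block of 32 weights stores one fp16 scale (2 bytes) plus 32 packed 4-bit codes (16 bytes), i.e. 18 bytes per block, or 4.5 bits per weight. A minimal sketch, using tensor shapes taken from the log:

```python
# Q4_0 storage: one block = 32 weights = 2-byte fp16 scale + 16 bytes of nibbles.
BLOCK = 32
BLOCK_BYTES = 2 + BLOCK // 2   # 18 bytes per block, i.e. 4.5 bits/weight

def q4_0_bytes(n_weights):
    assert n_weights % BLOCK == 0
    return n_weights // BLOCK * BLOCK_BYTES

def mb(n_bytes):
    return n_bytes / (1024 * 1024)

n_attn = 4096 * 4096            # e.g. blk.0.attn_q.weight
n_ffn = 4096 * 11008            # e.g. blk.0.ffn_up.weight
print(f"attn: f16 {mb(2 * n_attn):.2f} MB -> q4_0 {mb(q4_0_bytes(n_attn)):.2f} MB")
print(f"ffn:  f16 {mb(2 * n_ffn):.2f} MB -> q4_0 {mb(q4_0_bytes(n_ffn)):.2f} MB")
# attn: f16 32.00 MB -> q4_0 9.00 MB   (matches the log)
# ffn:  f16 86.00 MB -> q4_0 24.19 MB  (matches the log)
```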

Example Tensors:

  • [ 1/ 387] token_embd.weight - [ 4096, 151851, 1, 1], type = f16, quantizing to q4_0 .. size = 1186.34 MB -> 333.66 MB | hist: 0.036 0.016 0.025 0.039 0.057 0.077 0.096 0.111 0.117 0.111 0.096 0.077 0.057 0.039 0.025 0.021

  • [ 2/ 387] blk.0.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 3/ 387] blk.0.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.015 0.025 0.038 0.056 0.076 0.097 0.113 0.120 0.113 0.097 0.076 0.056 0.038 0.025 0.020

  • [ 4/ 387] blk.0.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.015 0.024 0.037 0.055 0.075 0.097 0.115 0.123 0.115 0.097 0.076 0.055 0.037 0.024 0.020

  • [ 5/ 387] blk.0.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.016 0.025 0.039 0.056 0.076 0.096 0.112 0.119 0.112 0.096 0.076 0.056 0.039 0.025 0.021

  • [ 6/ 387] blk.0.attn_q.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 7/ 387] blk.0.attn_k.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 8/ 387] blk.0.attn_v.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 9/ 387] blk.0.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.015 0.025 0.039 0.056 0.077 0.096 0.112 0.118 0.112 0.096 0.077 0.056 0.039 0.025 0.021

  • [ 10/ 387] blk.0.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 11/ 387] blk.0.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, quantizing to q4_0 .. size = 86.00 MB -> 24.19 MB | hist: 0.037 0.016 0.025 0.039 0.057 0.077 0.096 0.111 0.116 0.111 0.096 0.077 0.057 0.039 0.025 0.021

  • [ 12/ 387] blk.0.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, quantizing to q4_0 .. size = 86.00 MB -> 24.19 MB | hist: 0.037 0.016 0.026 0.039 0.057 0.077 0.096 0.110 0.116 0.110 0.096 0.077 0.057 0.040 0.026 0.021

  • [ 13/ 387] blk.0.ffn_down.weight - [11008, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 86.00 MB -> 24.19 MB | hist: 0.036 0.016 0.025 0.039 0.057 0.077 0.096 0.111 0.117 0.111 0.096 0.077 0.057 0.039 0.025 0.021

*...and so on for tensors [ 14/ 387] through [ 385/ 387]*

The remaining 31 blocks (blk.1.* through blk.31.*) follow the same pattern as blk.0.*.
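The per-tensor "hist" values are the normalized counts of the 16 possible 4-bit codes; the peak at bucket 8 (~0.117) is the code that zero maps to. A rough sketch of the Q4_0 block quantizer, following the ggml reference formula (scale d = max/-8, code = round(x/d) + 8; rounding details are simplified here):

```python
import numpy as np

def quantize_q4_0_block(x):
    """Quantize one block of 32 floats into an fp16 scale + 4-bit codes."""
    assert x.shape == (32,)
    max_val = x[np.argmax(np.abs(x))]   # signed value with the largest magnitude
    d = max_val / -8.0                  # scale, stored as fp16 in the block
    inv_d = 0.0 if d == 0 else 1.0 / d
    q = np.clip(np.round(x * inv_d) + 8, 0, 15).astype(np.uint8)
    return np.float16(d), q             # 2 + 16 bytes once nibbles are packed

def hist_of_codes(codes):
    """Normalized 16-bin histogram, as printed in the quantization log."""
    return np.bincount(codes, minlength=16) / codes.size

rng = np.random.default_rng(0)
weights = rng.standard_normal(32 * 1024).astype(np.float32)
codes = np.concatenate([quantize_q4_0_block(b)[1]
                        for b in weights.reshape(-1, 32)])
print(hist_of_codes(codes))   # bell-shaped, peaking near bucket 8
```

Running this on Gaussian-distributed weights reproduces the bell-shaped histograms seen in the log above.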

  • [ 386/ 387] output_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 387/ 387] output.weight - [ 4096, 151851, 1, 1], type = f16, quantizing to q6_K .. size = 1186.34 MB -> 486.58 MB | hist:

llama_model_quantize_internal: model size = 14727.19 MB

llama_model_quantize_internal: quant size = 4296.76 MB

llama_model_quantize_internal: hist: 0.036 0.016 0.025 0.039 0.056 0.077 0.096 0.111 0.117 0.111 0.096 0.077 0.057 0.039 0.025 0.021
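From the two totals above, the overall compression and effective bit-width can be checked directly (the original model is mostly 16-bit, so roughly 16 bits per weight before quantization):

```python
# Totals taken from the quantization log above.
model_mb, quant_mb = 14727.19, 4296.76
ratio = model_mb / quant_mb
print(f"compression: {ratio:.2f}x")
# Pure Q4_0 is 4.5 bits/weight; the f32 norm tensors and the q6_K
# output.weight pull the effective average slightly above that.
print(f"~{16 / ratio:.2f} effective bits per weight")
```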

