GGML Q4_0 Quantization Analysis in llama.cpp

For the LLaMA-7B model, the model file contains 387 tensors consisting of various weights and biases: token_embd.weight; 32 blocks of attention and feed-forward weights and biases (attn_norm.weight, attn_q.weight, attn_k.weight, attn_v.weight, attn_q.bias, attn_k.bias, attn_v.bias, attn_output.weight, ffn_norm.weight, ffn_up.weight, ffn_gate.weight, ffn_down.weight); output_norm.weight; and output.weight.
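The log below comes from a quantization pass with llama.cpp's quantize tool, e.g. `./llama-quantize model-f16.gguf model-q4_0.gguf q4_0` (the binary is named `quantize` in older builds; the file names here are placeholders, not the ones used for this run).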

Quantization Details:

  • Total tensors selected for quantization: 226
    • token_embd.weight (1 tensor)
    • 32 blocks × 7 weights each: attn_q.weight, attn_k.weight, attn_v.weight, attn_output.weight, ffn_up.weight, ffn_gate.weight, ffn_down.weight (224 tensors)
    • output.weight (1 tensor, quantized to q6_K rather than q4_0)
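To make "quantizing to q4_0" concrete, here is a simplified, self-contained C++ sketch of the Q4_0 block layout and reference row quantizer, adapted from ggml's quantize_row_q4_0. The names mirror ggml, but the scale is stored as a float here instead of fp16 purely to keep the sketch dependency-free; the real block_q4_0 uses a 2-byte fp16 scale, so a block of 32 weights occupies 18 bytes.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

constexpr int QK4_0 = 32;
struct block_q4_0 {
    float   d;              // per-block scale (delta); fp16 in real ggml
    uint8_t qs[QK4_0 / 2];  // 32 weights packed as 4-bit codes, two per byte
};

// Quantize k floats (k must be a multiple of 32) into Q4_0 blocks,
// following ggml's reference row quantizer.
void quantize_row_q4_0(const float* x, block_q4_0* y, int k) {
    const int nb = k / QK4_0;
    for (int i = 0; i < nb; ++i) {
        // Find the value with the largest magnitude in this block.
        float amax = 0.0f, maxv = 0.0f;
        for (int j = 0; j < QK4_0; ++j) {
            const float v = x[i * QK4_0 + j];
            if (std::fabs(v) > amax) { amax = std::fabs(v); maxv = v; }
        }
        // Scale so block values fall in roughly [-8, 8];
        // the largest-magnitude value maps to -8 (code 0).
        const float d  = maxv / -8.0f;
        const float id = (d != 0.0f) ? 1.0f / d : 0.0f;
        y[i].d = d;
        for (int j = 0; j < QK4_0 / 2; ++j) {
            // First half of the block goes into the low nibbles,
            // second half into the high nibbles.
            const int q0 = std::min(15, (int)(x[i * QK4_0 + j]             * id + 8.5f));
            const int q1 = std::min(15, (int)(x[i * QK4_0 + QK4_0 / 2 + j] * id + 8.5f));
            y[i].qs[j] = (uint8_t)(q0 | (q1 << 4));
        }
    }
}
```

Each 4-bit code q in [0, 15] then represents the value d * (q - 8); the matching dequantization step is sketched near the end of this post.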

Tensor Breakdown:

  • llama_model_loader:
    • f32 type: 161 tensors (32 × 5 per-block norm/bias tensors + output_norm.weight)
    • f16 type: 226 tensors (the weights selected for quantization above)
  • llama_model_quantize_internal:
    • meta size = 6162784 bytes

Example Tensors:

  • [ 1/ 387] token_embd.weight - [ 4096, 151851, 1, 1], type = f16, quantizing to q4_0 .. size = 1186.34 MB -> 333.66 MB | hist: 0.036 0.016 0.025 0.039 0.057 0.077 0.096 0.111 0.117 0.111 0.096 0.077 0.057 0.039 0.025 0.021

  • [ 2/ 387] blk.0.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 3/ 387] blk.0.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.015 0.025 0.038 0.056 0.076 0.097 0.113 0.120 0.113 0.097 0.076 0.056 0.038 0.025 0.020

  • [ 4/ 387] blk.0.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.015 0.024 0.037 0.055 0.075 0.097 0.115 0.123 0.115 0.097 0.076 0.055 0.037 0.024 0.020

  • [ 5/ 387] blk.0.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.016 0.025 0.039 0.056 0.076 0.096 0.112 0.119 0.112 0.096 0.076 0.056 0.039 0.025 0.021

  • [ 6/ 387] blk.0.attn_q.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 7/ 387] blk.0.attn_k.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 8/ 387] blk.0.attn_v.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 9/ 387] blk.0.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.015 0.025 0.039 0.056 0.077 0.096 0.112 0.118 0.112 0.096 0.077 0.056 0.039 0.025 0.021

  • [ 10/ 387] blk.0.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 11/ 387] blk.0.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, quantizing to q4_0 .. size = 86.00 MB -> 24.19 MB | hist: 0.037 0.016 0.025 0.039 0.057 0.077 0.096 0.111 0.116 0.111 0.096 0.077 0.057 0.039 0.025 0.021

  • [ 12/ 387] blk.0.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, quantizing to q4_0 .. size = 86.00 MB -> 24.19 MB | hist: 0.037 0.016 0.026 0.039 0.057 0.077 0.096 0.110 0.116 0.110 0.096 0.077 0.057 0.040 0.026 0.021

  • [ 13/ 387] blk.0.ffn_down.weight - [11008, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 86.00 MB -> 24.19 MB | hist: 0.036 0.016 0.025 0.039 0.057 0.077 0.096 0.111 0.117 0.111 0.096 0.077 0.057 0.039 0.025 0.021

...and so on for tensors [ 14/ 387] through [ 385/ 387]: the remaining 31 blocks (blk.1 through blk.31) follow the same pattern as blk.0.
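The size reductions in the log follow directly from the Q4_0 layout: each group of 32 f16 weights (64 bytes) becomes an 18-byte block (2-byte fp16 scale + 16 bytes of nibbles), a ratio of 18/64 = 0.28125. A small sanity check against the sizes reported above:

```cpp
#include <cstdio>

int main() {
    // Q4_0: 18 bytes per 32 weights vs 64 bytes in f16.
    const double ratio = 18.0 / 64.0;                 // 0.28125
    const double f16_mb[] = {32.00, 86.00, 1186.34};  // f16 sizes from the log above
    for (double mb : f16_mb) {
        std::printf("%.2f MB -> %.2f MB\n", mb, mb * ratio);
    }
    // Prints 9.00 MB, 24.19 MB, and 333.66 MB, matching the quantized sizes.
    return 0;
}
```

output.weight is the exception: it is quantized to q6_K (about 6.5625 bits per weight, i.e. 210 bytes per 256 weights), and 1186.34 MB × 6.5625/16 ≈ 486.58 MB matches the final tensor in the log.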

  • [ 386/ 387] output_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 387/ 387] output.weight - [ 4096, 151851, 1, 1], type = f16, quantizing to q6_K .. size = 1186.34 MB -> 486.58 MB | hist:

llama_model_quantize_internal: model size = 14727.19 MB

llama_model_quantize_internal: quant size = 4296.76 MB

llama_model_quantize_internal: hist: 0.036 0.016 0.025 0.039 0.056 0.077 0.096 0.111 0.117 0.111 0.096 0.077 0.057 0.039 0.025 0.021
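The 16 hist values printed per tensor (and in the summary line above) are the relative frequencies of the 16 possible 4-bit codes q = 0..15 over the quantized elements; the roughly symmetric peak around the middle bins reflects weights centered near zero. Below is a sketch of the matching dequantization step, reusing the block_q4_0 and QK4_0 definitions from the earlier sketch (ggml's own dequantize_row_q4_0 is equivalent, operating on the fp16 scale):

```cpp
// Map each 4-bit code q in [0, 15] back to d * (q - 8).
// k must be a multiple of QK4_0 (32).
void dequantize_row_q4_0(const block_q4_0* x, float* y, int k) {
    const int nb = k / QK4_0;
    for (int i = 0; i < nb; ++i) {
        const float d = x[i].d;
        for (int j = 0; j < QK4_0 / 2; ++j) {
            const int q0 = x[i].qs[j] & 0x0F;  // low nibble  -> first half of block
            const int q1 = x[i].qs[j] >> 4;    // high nibble -> second half of block
            y[i * QK4_0 + j]             = d * (q0 - 8);
            y[i * QK4_0 + QK4_0 / 2 + j] = d * (q1 - 8);
        }
    }
}
```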
