GGML Q4_0 Quantization Analysis in llama.cpp

The LLAMA7B model contains 387 tensors of weights and biases: token_embd.weight, 32 blocks of 12 attention and feed-forward tensors each (attn_norm.weight, attn_q.weight, attn_k.weight, attn_v.weight, attn_q.bias, attn_k.bias, attn_v.bias, attn_output.weight, ffn_norm.weight, ffn_up.weight, ffn_gate.weight, ffn_down.weight), output_norm.weight, and output.weight. That gives 1 + 32 × 12 + 2 = 387.

Quantization Details:

  • Total tensors selected for quantization: 226 (1 + 32 × 7 + 1)
  • token_embd.weight
  • 32 sets of: attn_q.weight, attn_k.weight, attn_v.weight, attn_output.weight, ffn_up.weight, ffn_gate.weight, ffn_down.weight
  • output.weight (quantized to q6_K rather than q4_0 in the log below)

Tensor Breakdown:

  • llama_model_loader:
      • f32 type: 161 tensors
      • f16 type: 226 tensors
  • llama_model_quantize_internal:
      • Meta size: 6162784 bytes
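The counts above can be cross-checked with a few lines of arithmetic. This is only a sketch: the per-block tensor names are taken from the lists above, while the variable names are hypothetical and chosen for illustration.

```python
# Recompute the tensor counts from the per-block tensor lists.
N_LAYER = 32  # number of blk.* blocks

# Per-block tensors stored as f16 and selected for quantization.
QUANTIZED_PER_BLOCK = [
    "attn_q.weight", "attn_k.weight", "attn_v.weight", "attn_output.weight",
    "ffn_up.weight", "ffn_gate.weight", "ffn_down.weight",
]
# Per-block tensors kept as f32 (norms and biases).
F32_PER_BLOCK = [
    "attn_norm.weight", "attn_q.bias", "attn_k.bias", "attn_v.bias",
    "ffn_norm.weight",
]

# token_embd.weight and output.weight are f16; output_norm.weight is f32.
f16_count = 1 + N_LAYER * len(QUANTIZED_PER_BLOCK) + 1
f32_count = N_LAYER * len(F32_PER_BLOCK) + 1
total_count = f16_count + f32_count

print(f"f16 (quantized): {f16_count}")
print(f"f32 (kept):      {f32_count}")
print(f"total:           {total_count}")
```

In this model every f16 tensor is quantized and every f32 tensor is left untouched, which is why the f16 count and the quantization count are the same number.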

Example Tensors:

  • [ 1/ 387] token_embd.weight - [ 4096, 151851, 1, 1], type = f16, quantizing to q4_0 .. size = 1186.34 MB -> 333.66 MB | hist: 0.036 0.016 0.025 0.039 0.057 0.077 0.096 0.111 0.117 0.111 0.096 0.077 0.057 0.039 0.025 0.021

[ 2/ 387] blk.0.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 3/ 387] blk.0.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.015 0.025 0.038 0.056 0.076 0.097 0.113 0.120 0.113 0.097 0.076 0.056 0.038 0.025 0.020

  • [ 4/ 387] blk.0.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.015 0.024 0.037 0.055 0.075 0.097 0.115 0.123 0.115 0.097 0.076 0.055 0.037 0.024 0.020

  • [ 5/ 387] blk.0.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.016 0.025 0.039 0.056 0.076 0.096 0.112 0.119 0.112 0.096 0.076 0.056 0.039 0.025 0.021

[ 6/ 387] blk.0.attn_q.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

[ 7/ 387] blk.0.attn_k.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

[ 8/ 387] blk.0.attn_v.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 9/ 387] blk.0.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.015 0.025 0.039 0.056 0.077 0.096 0.112 0.118 0.112 0.096 0.077 0.056 0.039 0.025 0.021

[ 10/ 387] blk.0.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 11/ 387] blk.0.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, quantizing to q4_0 .. size = 86.00 MB -> 24.19 MB | hist: 0.037 0.016 0.025 0.039 0.057 0.077 0.096 0.111 0.116 0.111 0.096 0.077 0.057 0.039 0.025 0.021

  • [ 12/ 387] blk.0.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, quantizing to q4_0 .. size = 86.00 MB -> 24.19 MB | hist: 0.037 0.016 0.026 0.039 0.057 0.077 0.096 0.110 0.116 0.110 0.096 0.077 0.057 0.040 0.026 0.021

  • [ 13/ 387] blk.0.ffn_down.weight - [11008, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 86.00 MB -> 24.19 MB | hist: 0.036 0.016 0.025 0.039 0.057 0.077 0.096 0.111 0.117 0.111 0.096 0.077 0.057 0.039 0.025 0.021

...and so on for tensors [14/387] through [385/387]. The remaining 31 blocks (blk.1.* through blk.31.*) follow the same pattern as blk.0.*.

[ 386/ 387] output_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 387/ 387] output.weight - [ 4096, 151851, 1, 1], type = f16, quantizing to q6_K .. size = 1186.34 MB -> 486.58 MB | hist:

llama_model_quantize_internal: model size = 14727.19 MB

llama_model_quantize_internal: quant size = 4296.76 MB
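The sizes in the log follow from the storage cost per weight of each type. The sketch below assumes the usual GGML block layouts: a Q4_0 block holds 32 weights in 18 bytes (one fp16 scale plus 16 bytes of 4-bit codes, so 4.5 bits per weight), and a Q6_K block holds 256 weights in 210 bytes (6.5625 bits per weight). The helper `size_mb` and the tensor table are hypothetical names used only for this illustration. The shapes come from the log above.

```python
# Reproduce the per-tensor and total sizes from bytes-per-weight figures.
MIB = 1024 * 1024

# Assumed storage cost per weight, in bytes.
BYTES_PER_WEIGHT = {
    "f32": 4.0,
    "f16": 2.0,
    "q4_0": 18 / 32,    # 18-byte block per 32 weights
    "q6_K": 210 / 256,  # 210-byte block per 256 weights
}

def size_mb(shape, dtype):
    """Size in MB (2**20 bytes) of a tensor with the given shape and type."""
    n = 1
    for dim in shape:
        n *= dim
    return n * BYTES_PER_WEIGHT[dtype] / MIB

N_EMBD, N_FF, N_VOCAB, N_LAYER = 4096, 11008, 151851, 32

# (shape, source type, type after quantization)
block = (
    [((N_EMBD,), "f32", "f32")]                      # attn_norm.weight
    + [((N_EMBD, N_EMBD), "f16", "q4_0")] * 3        # attn_q/k/v.weight
    + [((N_EMBD,), "f32", "f32")] * 3                # attn_q/k/v.bias
    + [((N_EMBD, N_EMBD), "f16", "q4_0")]            # attn_output.weight
    + [((N_EMBD,), "f32", "f32")]                    # ffn_norm.weight
    + [((N_EMBD, N_FF), "f16", "q4_0")] * 3          # ffn_up/gate/down.weight
)
tensors = (
    [((N_EMBD, N_VOCAB), "f16", "q4_0")]             # token_embd.weight
    + block * N_LAYER
    + [((N_EMBD,), "f32", "f32")]                    # output_norm.weight
    + [((N_EMBD, N_VOCAB), "f16", "q6_K")]           # output.weight
)

model_size = sum(size_mb(shape, src) for shape, src, _ in tensors)
quant_size = sum(size_mb(shape, dst) for shape, _, dst in tensors)

print(f"attn_q.weight: {size_mb((N_EMBD, N_EMBD), 'f16'):.2f} MB -> "
      f"{size_mb((N_EMBD, N_EMBD), 'q4_0'):.2f} MB")
print(f"model size = {model_size:.2f} MB")
print(f"quant size = {quant_size:.2f} MB")
```

Under these assumptions the totals match the model size and quant size reported above, and the overall ratio of about 0.29 is close to 4.5/16 because almost all weights end up in Q4_0.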

llama_model_quantize_internal: hist: 0.036 0.016 0.025 0.039 0.056 0.077 0.096 0.111 0.117 0.111 0.096 0.077 0.057 0.039 0.025 0.021
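Each hist line lists the fraction of weights that land in each of the 16 possible 4-bit codes. The sketch below illustrates where such a histogram comes from. It is modeled on my understanding of the reference Q4_0 routine in llama.cpp and is not copied from it: the scale is kept as a Python float instead of fp16, the function names are hypothetical, and the Gaussian weights are synthetic stand-ins for real model weights.

```python
# A minimal Q4_0-style block quantizer and its code histogram.
import random

QK4_0 = 32  # weights per block

def quantize_block_q4_0(block):
    """Quantize 32 floats to one scale d and 32 codes in 0..15."""
    assert len(block) == QK4_0
    # Signed value with the largest magnitude in the block.
    vmax = max(block, key=abs)
    d = vmax / -8.0
    inv_d = 1.0 / d if d != 0.0 else 0.0
    # Shift by 8 so codes are unsigned, round to nearest, clamp to 15.
    codes = [min(15, int(x * inv_d + 8.5)) for x in block]
    return d, codes

def dequantize_block_q4_0(d, codes):
    """Map codes back to approximate weights."""
    return [(q - 8) * d for q in codes]

def histogram(weights):
    """Fraction of weights assigned to each of the 16 codes."""
    counts = [0] * 16
    for i in range(0, len(weights), QK4_0):
        _, codes = quantize_block_q4_0(weights[i:i + QK4_0])
        for q in codes:
            counts[q] += 1
    total = sum(counts)
    return [c / total for c in counts]

rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(QK4_0 * 2000)]
hist = histogram(weights)
print("hist: " + " ".join(f"{h:.3f}" for h in hist))
```

In this sketch the largest-magnitude weight of every block maps to code 0, so the first bin can never fall below 1/32. That may be why the first bin in the log (0.036) sits above its neighbors, while the rest of the histogram is bell-shaped around code 8, which represents zero.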
