QWEN7B to LLAMA GPTQ model structure
Author: XD / Published: 2023-11-13 21:32 / Updated: 2023-11-13 21:32
Here is the GPTQ model structure in Markdown format, detailing each layer and component.

## GPTQ Model Structure

The GPTQ model consists of the following layers and components:
### Embedding Layer

- `model.embed_tokens.weight`: torch.Size([151851, 4096])
### Layers

Each layer in the model has the following components.

#### Layer 0 to Layer 31

Each layer (`model.layers.[0-31]`) includes the components below; a short sketch of the shape arithmetic behind the quantized tensors follows the list.
- `input_layernorm.weight`: torch.Size([4096])
- Self-Attention Sublayer:
  - `k_proj`:
    - `qweight`: torch.Size([512, 4096])
    - `qzeros`: torch.Size([32, 512])
    - `scales`: torch.Size([32, 4096])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([4096])
  - `o_proj`:
    - `qweight`: torch.Size([512, 4096])
    - `qzeros`: torch.Size([32, 512])
    - `scales`: torch.Size([32, 4096])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([4096])
  - `q_proj`:
    - `qweight`: torch.Size([512, 4096])
    - `qzeros`: torch.Size([32, 512])
    - `scales`: torch.Size([32, 4096])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([4096])
  - `v_proj`:
    - `qweight`: torch.Size([512, 4096])
    - `qzeros`: torch.Size([32, 512])
    - `scales`: torch.Size([32, 4096])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([4096])
- MLP (Multi-Layer Perceptron) Sublayer:
  - `down_proj`:
    - `qweight`: torch.Size([1376, 4096])
    - `qzeros`: torch.Size([86, 512])
    - `scales`: torch.Size([86, 4096])
    - `g_idx`: torch.Size([11008])
    - `bias`: torch.Size([4096])
  - `gate_proj`:
    - `qweight`: torch.Size([512, 11008])
    - `qzeros`: torch.Size([32, 1376])
    - `scales`: torch.Size([32, 11008])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([11008])
  - `up_proj`:
    - `qweight`: torch.Size([512, 11008])
    - `qzeros`: torch.Size([32, 1376])
    - `scales`: torch.Size([32, 11008])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([11008])
- `post_attention_layernorm.weight`: torch.Size([4096])
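Every quantized projection above follows the same shape rule used by common GPTQ `QuantLinear` implementations (e.g. AutoGPTQ): `qweight` packs eight 4-bit values per int32 along the input dimension, `qzeros` packs along the output dimension, and `qzeros`/`scales` have one row per quantization group. The 4-bit width and group size of 128 are inferred from the shapes, not stated in the dump. A minimal sketch of the arithmetic:

```python
# A minimal sketch of the shape rule behind the quantized tensors listed above.
# The bit width (4) and group size (128) are inferred from the shapes, not
# stated in the dump; the packing convention matches common GPTQ QuantLinear
# implementations such as AutoGPTQ.

def gptq_linear_shapes(in_features: int, out_features: int,
                       bits: int = 4, group_size: int = 128):
    pack = 32 // bits  # values packed into each int32 (8 for 4-bit)
    return {
        "qweight": (in_features // pack, out_features),
        "qzeros": (in_features // group_size, out_features // pack),
        "scales": (in_features // group_size, out_features),
        "g_idx": (in_features,),
        "bias": (out_features,),
    }

# Attention projections (q/k/v/o_proj): 4096 -> 4096
print(gptq_linear_shapes(4096, 4096))
# {'qweight': (512, 4096), 'qzeros': (32, 512), 'scales': (32, 4096), ...}

# MLP down_proj: 11008 -> 4096
print(gptq_linear_shapes(11008, 4096))
# {'qweight': (1376, 4096), 'qzeros': (86, 512), ...}

# MLP gate_proj / up_proj: 4096 -> 11008
print(gptq_linear_shapes(4096, 11008))
# {'qweight': (512, 11008), 'qzeros': (32, 1376), ...}
```

Running it reproduces the [512, 4096] / [32, 512] pattern for the 4096-to-4096 attention projections and the [1376, 4096] / [86, 512] pattern for `down_proj`.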
### Final Layer Normalization and Output

- `model.norm.weight`: torch.Size([4096])
- `lm_head.weight`: torch.Size([151851, 4096])
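A listing like the one above can be produced by iterating over the tensors in the quantized checkpoint. A minimal sketch, assuming the checkpoint is a single PyTorch `.bin` file (the path below is hypothetical):

```python
import torch

# Hypothetical path to the converted GPTQ checkpoint.
state_dict = torch.load("qwen7b-llama-gptq/pytorch_model.bin", map_location="cpu")

# Print every tensor name with its shape, matching the structure listed above.
for name, tensor in state_dict.items():
    print(f"{name}: {tuple(tensor.shape)}")
```

If the checkpoint is stored as safetensors instead, `safetensors.torch.load_file` returns the same name-to-tensor mapping.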