# QWEN7B to LLAMA7B Model Structure
Author: XD / Published: 2023-11-13 21:00 / Updated: 2023-11-13 21:06 / Research Notes
Below is the structure of a QWEN7B checkpoint laid out in the LLAMA7B format, in markdown, detailing each layer and component. Note that the 151851-entry vocabulary dimension and the q/k/v projection biases are carried over from QWEN7B; a stock LLAMA7B uses a 32000-token vocabulary and no attention biases.

## LLAMA7B Model Structure

The LLAMA7B model consists of the following layers and components:
### Embedding Layer

- `model.embed_tokens.weight`: `torch.Size([151851, 4096])`
### Layers

Each layer in the model has the following components.

#### Layer 0 to Layer 31

Each layer (`model.layers.[0-31]`) includes the following parameters (a runnable sketch of one layer follows the list):
- `input_layernorm.weight`: `torch.Size([4096])`
- Self-Attention Sublayer:
  - `q_proj.weight`: `torch.Size([4096, 4096])`
  - `k_proj.weight`: `torch.Size([4096, 4096])`
  - `v_proj.weight`: `torch.Size([4096, 4096])`
  - `q_proj.bias`: `torch.Size([4096])`
  - `k_proj.bias`: `torch.Size([4096])`
  - `v_proj.bias`: `torch.Size([4096])`
  - `o_proj.weight`: `torch.Size([4096, 4096])`
- `post_attention_layernorm.weight`: `torch.Size([4096])`
- MLP (Multi-Layer Perceptron) Sublayer:
  - `up_proj.weight`: `torch.Size([11008, 4096])`
  - `gate_proj.weight`: `torch.Size([11008, 4096])`
  - `down_proj.weight`: `torch.Size([4096, 11008])`
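To make the per-layer layout concrete, here is a minimal PyTorch sketch of one such decoder layer, assuming the standard LLaMA building blocks (RMSNorm, causal self-attention, SwiGLU MLP). The head count of 32 and the norm epsilon are assumptions, since the listing only records shapes, and rotary position embeddings and KV caching are omitted; in the checkpoint itself the attention and MLP tensors sit under `self_attn.` and `mlp.` prefixes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN = 4096          # hidden size, from the [4096, ...] shapes above
INTERMEDIATE = 11008   # MLP intermediate size, from the [11008, 4096] shapes
N_HEADS = 32           # assumption: 32 heads of dim 128 (not recorded in the listing)


class RMSNorm(nn.Module):
    """LLaMA-style RMSNorm: a single [4096] weight vector, no bias."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)


class DecoderLayer(nn.Module):
    """One of the 32 layers, mirroring the parameter shapes listed above."""

    def __init__(self):
        super().__init__()
        self.input_layernorm = RMSNorm(HIDDEN)
        # q/k/v carry biases (a Qwen trait); o_proj does not.
        self.q_proj = nn.Linear(HIDDEN, HIDDEN, bias=True)
        self.k_proj = nn.Linear(HIDDEN, HIDDEN, bias=True)
        self.v_proj = nn.Linear(HIDDEN, HIDDEN, bias=True)
        self.o_proj = nn.Linear(HIDDEN, HIDDEN, bias=False)
        self.post_attention_layernorm = RMSNorm(HIDDEN)
        # SwiGLU MLP: gate/up weights are [11008, 4096], down is [4096, 11008].
        self.gate_proj = nn.Linear(HIDDEN, INTERMEDIATE, bias=False)
        self.up_proj = nn.Linear(HIDDEN, INTERMEDIATE, bias=False)
        self.down_proj = nn.Linear(INTERMEDIATE, HIDDEN, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-norm self-attention with a residual connection.
        h = self.input_layernorm(x)
        b, t, _ = h.shape
        q, k, v = (proj(h).view(b, t, N_HEADS, -1).transpose(1, 2)
                   for proj in (self.q_proj, self.k_proj, self.v_proj))
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.o_proj(attn.transpose(1, 2).reshape(b, t, HIDDEN))
        # Pre-norm SwiGLU MLP with a residual connection.
        h = self.post_attention_layernorm(x)
        return x + self.down_proj(F.silu(self.gate_proj(h)) * self.up_proj(h))
```

A quick shape check: `DecoderLayer()(torch.randn(1, 8, HIDDEN)).shape` returns `torch.Size([1, 8, 4096])`, matching the residual stream width above.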
### Final Layer Normalization and Output

- `model.norm.weight`: `torch.Size([4096])`
- `lm_head.weight`: `torch.Size([151851, 4096])`
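Given the title, the structure above is presumably the result of renaming a QWEN7B checkpoint into the LLAMA7B layout. A hedged sketch of that remapping follows; every Qwen-side key name (`transformer.wte`, `attn.c_attn`, `mlp.w1`/`mlp.w2`, and so on) is an assumption to verify against the real checkpoint, as are the `w1`/`w2` to `up_proj`/`gate_proj` pairing and whether the q/k slices need a rotary-layout permutation. What is certain from the shapes above is that a fused attention projection must be split into separate q/k/v tensors.

```python
def convert_qwen_to_llama(qwen_state: dict, num_layers: int = 32) -> dict:
    """Rename Qwen-7B checkpoint keys to the LLaMA layout listed above."""
    out = {
        "model.embed_tokens.weight": qwen_state["transformer.wte.weight"],
        "model.norm.weight": qwen_state["transformer.ln_f.weight"],
        "lm_head.weight": qwen_state["lm_head.weight"],
    }
    for i in range(num_layers):
        src, dst = f"transformer.h.{i}.", f"model.layers.{i}."
        # Qwen fuses q/k/v into one [3*4096, 4096] weight (and [3*4096] bias);
        # split each into thirds along dim 0.
        for kind in ("weight", "bias"):
            q, k, v = qwen_state[f"{src}attn.c_attn.{kind}"].chunk(3, dim=0)
            out[f"{dst}self_attn.q_proj.{kind}"] = q
            out[f"{dst}self_attn.k_proj.{kind}"] = k
            out[f"{dst}self_attn.v_proj.{kind}"] = v
        out[f"{dst}self_attn.o_proj.weight"] = qwen_state[f"{src}attn.c_proj.weight"]
        out[f"{dst}input_layernorm.weight"] = qwen_state[f"{src}ln_1.weight"]
        out[f"{dst}post_attention_layernorm.weight"] = qwen_state[f"{src}ln_2.weight"]
        # Assumed pairing: w1 -> up_proj, w2 -> gate_proj (verify against Qwen's MLP code).
        out[f"{dst}mlp.up_proj.weight"] = qwen_state[f"{src}mlp.w1.weight"]
        out[f"{dst}mlp.gate_proj.weight"] = qwen_state[f"{src}mlp.w2.weight"]
        out[f"{dst}mlp.down_proj.weight"] = qwen_state[f"{src}mlp.c_proj.weight"]
    return out
```

A listing like the one above can then be reproduced by printing every converted tensor's name and shape:

```python
import torch

# Placeholder path: load the original Qwen-7B weights however they are stored.
qwen_state = torch.load("qwen7b_state_dict.pt", map_location="cpu")
llama_state = convert_qwen_to_llama(qwen_state)

# Reproduce the listing above: every parameter name with its shape.
for name, tensor in llama_state.items():
    print(f"{name}: {tuple(tensor.shape)}")
```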