QWEN7B to LLAMA7B Model Structure | 东毅居士

Author: XD / Published: November 13, 2023 21:00 / Updated: November 13, 2023 21:06 / Category: Research and Study / Views: 1545

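This post lists the parameter names and tensor shapes of a QWEN7B checkpoint laid out in the LLAMA7B structure, layer by layer. The vocabulary size of 151851 and the bias terms on the q/k/v projections carry over from QWEN7B; a stock LLAMA7B checkpoint has a 32000-token vocabulary and no attention biases.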

LLAMA7B Model Structure

The LLAMA7B model consists of the following layers and components:

Embedding Layer

model.embed_tokens.weight: torch.Size([151851, 4096])

Layers

Each layer in the model has the following components:

Layer 0 to Layer 31

Each layer (model.layers.[0-31]) includes:

input_layernorm.weight: torch.Size([4096])
Self-Attention Sublayer:
- q_proj.weight: torch.Size([4096, 4096])
- k_proj.weight: torch.Size([4096, 4096])
- v_proj.weight: torch.Size([4096, 4096])
- q_proj.bias: torch.Size([4096])
- k_proj.bias: torch.Size([4096])
- v_proj.bias: torch.Size([4096])
- o_proj.weight: torch.Size([4096, 4096])
post_attention_layernorm.weight: torch.Size([4096])
MLP (Multi-Layer Perceptron) Sublayer:
- up_proj.weight: torch.Size([11008, 4096])
- gate_proj.weight: torch.Size([11008, 4096])
- down_proj.weight: torch.Size([4096, 11008])
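As a quick sanity check on the shapes listed for one layer, the parameter count per layer can be worked out by hand. The following is a minimal sketch in plain Python; the constant names `HIDDEN` and `INTERMEDIATE` and the helper `numel` are illustrative and do not come from any library.

```python
# Shapes of the tensors in one decoder layer, copied from the list above.
HIDDEN = 4096          # hidden size
INTERMEDIATE = 11008   # MLP intermediate size

layer_shapes = {
    "input_layernorm.weight": (HIDDEN,),
    "q_proj.weight": (HIDDEN, HIDDEN),
    "k_proj.weight": (HIDDEN, HIDDEN),
    "v_proj.weight": (HIDDEN, HIDDEN),
    "q_proj.bias": (HIDDEN,),
    "k_proj.bias": (HIDDEN,),
    "v_proj.bias": (HIDDEN,),
    "o_proj.weight": (HIDDEN, HIDDEN),
    "post_attention_layernorm.weight": (HIDDEN,),
    "up_proj.weight": (INTERMEDIATE, HIDDEN),
    "gate_proj.weight": (INTERMEDIATE, HIDDEN),
    "down_proj.weight": (HIDDEN, INTERMEDIATE),
}


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


params_per_layer = sum(numel(shape) for shape in layer_shapes.values())
print(f"tensors per layer: {len(layer_shapes)}")
print(f"parameters per layer: {params_per_layer:,}")
```

Most of the per-layer parameters sit in the three MLP projections (3 x 11008 x 4096); the q/k/v biases add only 3 x 4096 values.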

Final Layer Normalization and Output

model.norm.weight: torch.Size([4096])
lm_head.weight: torch.Size([151851, 4096])
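The full set of tensors can be written down as a name-to-shape table and used to check a converted checkpoint. The sketch below is plain Python and makes one assumption not stated in the listing: that the attention and MLP tensors carry the `self_attn.` and `mlp.` prefixes used by Hugging Face LLaMA checkpoints (for example `model.layers.0.self_attn.q_proj.weight`). The function names `expected_shapes` and `find_mismatches` are hypothetical helpers, not part of any library.

```python
from math import prod

VOCAB, HIDDEN, INTERMEDIATE, NUM_LAYERS = 151851, 4096, 11008, 32


def expected_shapes():
    """Build the name -> shape table described in this post.

    Assumption: full key names follow the Hugging Face LLaMA convention
    (self_attn. / mlp. prefixes), which the listing above abbreviates.
    """
    shapes = {"model.embed_tokens.weight": (VOCAB, HIDDEN)}
    for i in range(NUM_LAYERS):
        p = f"model.layers.{i}."
        shapes[p + "input_layernorm.weight"] = (HIDDEN,)
        for name in ("q_proj", "k_proj", "v_proj"):
            shapes[p + f"self_attn.{name}.weight"] = (HIDDEN, HIDDEN)
            shapes[p + f"self_attn.{name}.bias"] = (HIDDEN,)
        shapes[p + "self_attn.o_proj.weight"] = (HIDDEN, HIDDEN)
        shapes[p + "post_attention_layernorm.weight"] = (HIDDEN,)
        shapes[p + "mlp.up_proj.weight"] = (INTERMEDIATE, HIDDEN)
        shapes[p + "mlp.gate_proj.weight"] = (INTERMEDIATE, HIDDEN)
        shapes[p + "mlp.down_proj.weight"] = (HIDDEN, INTERMEDIATE)
    shapes["model.norm.weight"] = (HIDDEN,)
    shapes["lm_head.weight"] = (VOCAB, HIDDEN)
    return shapes


def find_mismatches(actual):
    """Compare a name -> shape mapping against the expected table."""
    expected = expected_shapes()
    problems = []
    for name, shape in expected.items():
        if name not in actual:
            problems.append(f"missing: {name}")
        elif tuple(actual[name]) != shape:
            problems.append(f"shape mismatch: {name}: {tuple(actual[name])} != {shape}")
    for name in actual:
        if name not in expected:
            problems.append(f"unexpected: {name}")
    return problems


shapes = expected_shapes()
total_params = sum(prod(shape) for shape in shapes.values())
print(f"tensors: {len(shapes)}")
print(f"total parameters: {total_params:,}")
```

With PyTorch, the `actual` mapping could be built from a loaded checkpoint as `{k: tuple(v.shape) for k, v in state_dict.items()}` and passed to `find_mismatches`. Under these shapes the model has roughly 7.72 billion parameters, of which the embedding and `lm_head` matrices account for about 1.24 billion because of the large QWEN vocabulary.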

Author: XD. Please cite the source when reposting: http://www.eadst.com/blog/216

This site is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
