EADST

QWEN7B to LLAMA7B Model Structure

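The tensor names and shapes of the LLAMA7B-format model are listed below, layer by layer. The vocabulary size of 151851 and the bias terms on q_proj, k_proj and v_proj reflect the QWEN7B origin; the original LLaMA 7B uses a 32000-token vocabulary and has no attention biases.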


LLAMA7B Model Structure

The LLAMA7B model consists of the following layers and components:

Embedding Layer

  • model.embed_tokens.weight: torch.Size([151851, 4096])

Layers

Layer 0 to Layer 31

Each of the 32 layers (model.layers.[0-31]) includes:

  • input_layernorm.weight: torch.Size([4096])

  • Self-Attention Sublayer:

    • q_proj.weight: torch.Size([4096, 4096])

    • k_proj.weight: torch.Size([4096, 4096])

    • v_proj.weight: torch.Size([4096, 4096])

    • q_proj.bias: torch.Size([4096])

    • k_proj.bias: torch.Size([4096])

    • v_proj.bias: torch.Size([4096])

    • o_proj.weight: torch.Size([4096, 4096])

  • post_attention_layernorm.weight: torch.Size([4096])

  • MLP (Multi-Layer Perceptron) Sublayer:

    • up_proj.weight: torch.Size([11008, 4096])

    • gate_proj.weight: torch.Size([11008, 4096])

    • down_proj.weight: torch.Size([4096, 11008])
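The per-layer listing above can be turned into a small lookup table, which also gives the parameter count of one layer. This is only a sketch: the `self_attn.` and `mlp.` prefixes are an assumption based on the usual Hugging Face LLaMA key naming (the list above gives only the leaf names), and `layer_shapes` and `count_params` are hypothetical helper names.

```python
from math import prod

HIDDEN = 4096         # hidden size, taken from the shapes listed above
INTERMEDIATE = 11008  # MLP width, taken from the shapes listed above


def layer_shapes(i: int) -> dict[str, tuple[int, ...]]:
    """Expected tensor shapes for model.layers.<i>.

    Assumption: the "self_attn." and "mlp." prefixes follow the usual
    Hugging Face LLaMA naming; the post itself lists only leaf names.
    """
    p = f"model.layers.{i}."
    shapes = {
        p + "input_layernorm.weight": (HIDDEN,),
        p + "post_attention_layernorm.weight": (HIDDEN,),
        p + "self_attn.o_proj.weight": (HIDDEN, HIDDEN),
        p + "mlp.up_proj.weight": (INTERMEDIATE, HIDDEN),
        p + "mlp.gate_proj.weight": (INTERMEDIATE, HIDDEN),
        p + "mlp.down_proj.weight": (HIDDEN, INTERMEDIATE),
    }
    # q/k/v each carry a weight and a bias (the bias comes from the Qwen side).
    for name in ("q_proj", "k_proj", "v_proj"):
        shapes[p + f"self_attn.{name}.weight"] = (HIDDEN, HIDDEN)
        shapes[p + f"self_attn.{name}.bias"] = (HIDDEN,)
    return shapes


def count_params(shapes: dict[str, tuple[int, ...]]) -> int:
    """Total number of scalar parameters across all listed tensors."""
    return sum(prod(shape) for shape in shapes.values())


if __name__ == "__main__":
    layer0 = layer_shapes(0)
    print(len(layer0), "tensors per layer")
    print(count_params(layer0), "parameters per layer")
```

Each layer holds 12 tensors, and the three MLP projections account for roughly two thirds of a layer's parameters.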

Final Layer Normalization and Output

  • model.norm.weight: torch.Size([4096])
  • lm_head.weight: torch.Size([151851, 4096])
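
To sanity-check a converted checkpoint against the full listing, one option is to build the expected name-to-shape table and compare it with the shapes actually found in the file. The sketch below is plain Python and makes a few assumptions: the `self_attn.` and `mlp.` key prefixes follow the usual Hugging Face LLaMA naming, and `expected_shapes` and `shape_mismatches` are hypothetical helpers, not part of any library.

```python
from math import prod

# All four values are taken from the shapes listed in this post.
VOCAB, HIDDEN, INTERMEDIATE, NUM_LAYERS = 151851, 4096, 11008, 32


def expected_shapes() -> dict[str, tuple[int, ...]]:
    """Name -> shape table for the whole model.

    Assumption: "self_attn." / "mlp." prefixes as in Hugging Face LLaMA keys.
    """
    shapes = {
        "model.embed_tokens.weight": (VOCAB, HIDDEN),
        "model.norm.weight": (HIDDEN,),
        "lm_head.weight": (VOCAB, HIDDEN),
    }
    for i in range(NUM_LAYERS):
        p = f"model.layers.{i}."
        shapes[p + "input_layernorm.weight"] = (HIDDEN,)
        shapes[p + "post_attention_layernorm.weight"] = (HIDDEN,)
        for name in ("q_proj", "k_proj", "v_proj"):
            shapes[p + f"self_attn.{name}.weight"] = (HIDDEN, HIDDEN)
            shapes[p + f"self_attn.{name}.bias"] = (HIDDEN,)
        shapes[p + "self_attn.o_proj.weight"] = (HIDDEN, HIDDEN)
        shapes[p + "mlp.up_proj.weight"] = (INTERMEDIATE, HIDDEN)
        shapes[p + "mlp.gate_proj.weight"] = (INTERMEDIATE, HIDDEN)
        shapes[p + "mlp.down_proj.weight"] = (HIDDEN, INTERMEDIATE)
    return shapes


def shape_mismatches(expected, actual) -> list[str]:
    """Names that are missing from `actual` or have a different shape."""
    return sorted(
        name for name, shape in expected.items()
        if tuple(actual.get(name, ())) != shape
    )


if __name__ == "__main__":
    table = expected_shapes()
    total = sum(prod(shape) for shape in table.values())
    print(f"{len(table)} tensors, {total:,} parameters")
    print(f"about {total * 2 / 2**30:.2f} GiB at 2 bytes per parameter")
```

With a loaded PyTorch state dict, `actual` could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`; an empty list from `shape_mismatches` then means every listed tensor is present with the expected shape. The total works out to about 7.72 billion parameters.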