EADST

QWEN7B to LLAMA7B Model Structure

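Below is the LLAMA7B-format model structure obtained from QWEN7B, listing each layer and component with its tensor shape. Because the weights come from QWEN7B, the vocabulary size (151851) and the q/k/v projection biases differ from the original LLaMA-7B, which uses a 32000-token vocabulary and no attention biases.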
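Since the title describes moving QWEN7B weights into this layout, the renaming step can be sketched as below. This is a minimal sketch, not the author's conversion script. The Qwen-side key names (transformer.wte, transformer.h.N.attn.c_attn, mlp.w1/w2/c_proj and so on) are an assumption based on the original Qwen-7B checkpoint layout. The LLaMA-side names follow the Hugging Face convention (self_attn.*, mlp.*). The helpers convert_key and split_qkv_shape are hypothetical. The w1 → up_proj / w2 → gate_proj pairing assumes Qwen's MLP computes w1(x) * silu(w2(x)).

```python
import re

# Assumed Qwen-7B key names -> LLaMA-style key names (hypothetical mapping).
SIMPLE = {
    "transformer.wte.weight": "model.embed_tokens.weight",
    "transformer.ln_f.weight": "model.norm.weight",
    "lm_head.weight": "lm_head.weight",
}

# Per-layer renames; the layer index is handled separately by LAYER_RE.
LAYER_RENAMES = {
    "ln_1.weight": "input_layernorm.weight",
    "ln_2.weight": "post_attention_layernorm.weight",
    "attn.c_proj.weight": "self_attn.o_proj.weight",
    "mlp.w1.weight": "mlp.up_proj.weight",    # assumed: Qwen computes w1(x) * silu(w2(x))
    "mlp.w2.weight": "mlp.gate_proj.weight",  # the SiLU-gated branch
    "mlp.c_proj.weight": "mlp.down_proj.weight",
}

LAYER_RE = re.compile(r"^transformer\.h\.(\d+)\.(.+)$")


def convert_key(qwen_key):
    """Return the LLaMA-style key(s) that an (assumed) Qwen key maps to.

    The fused attn.c_attn tensor ([3*4096, 4096] weight, [3*4096] bias) is
    split along dim 0 into q, k, v in that order, so it maps to three keys.
    """
    if qwen_key in SIMPLE:
        return [SIMPLE[qwen_key]]
    m = LAYER_RE.match(qwen_key)
    if m is None:
        raise KeyError(f"unrecognized key: {qwen_key}")
    idx, rest = m.groups()
    prefix = f"model.layers.{idx}."
    if rest in ("attn.c_attn.weight", "attn.c_attn.bias"):
        kind = rest.rsplit(".", 1)[1]  # "weight" or "bias"
        return [f"{prefix}self_attn.{p}_proj.{kind}" for p in ("q", "k", "v")]
    if rest in LAYER_RENAMES:
        return [prefix + LAYER_RENAMES[rest]]
    raise KeyError(f"unrecognized key: {qwen_key}")


def split_qkv_shape(shape, hidden=4096):
    """Shape of each of q/k/v after splitting the fused tensor along dim 0."""
    if shape[0] != 3 * hidden:
        raise ValueError(f"expected leading dim {3 * hidden}, got {shape[0]}")
    return (hidden,) + tuple(shape[1:])
```

For example, convert_key("transformer.h.3.ln_2.weight") returns ["model.layers.3.post_attention_layernorm.weight"], and split_qkv_shape((12288, 4096)) returns (4096, 4096). With PyTorch, the fused tensor itself would be split with torch.chunk(w, 3, dim=0). Every other tensor is copied unchanged under its new name.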


LLAMA7B Model Structure

The LLAMA7B model consists of the following layers and components:

Embedding Layer

  • model.embed_tokens.weight: torch.Size([151851, 4096])

Layers

The model stacks 32 decoder layers with identical structure.

Layer 0 to Layer 31

Each layer (model.layers.[0-31]) includes:

  • input_layernorm.weight: torch.Size([4096])

  • Self-Attention Sublayer (self_attn.*):

    • q_proj.weight: torch.Size([4096, 4096])

    • k_proj.weight: torch.Size([4096, 4096])

    • v_proj.weight: torch.Size([4096, 4096])

    • q_proj.bias: torch.Size([4096])

    • k_proj.bias: torch.Size([4096])

    • v_proj.bias: torch.Size([4096])

    • o_proj.weight: torch.Size([4096, 4096])

  • post_attention_layernorm.weight: torch.Size([4096])

  • MLP (Multi-Layer Perceptron) Sublayer (mlp.*):

    • up_proj.weight: torch.Size([11008, 4096])

    • gate_proj.weight: torch.Size([11008, 4096])

    • down_proj.weight: torch.Size([4096, 11008])
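The per-layer listing above can be reproduced from two numbers: hidden size 4096 and MLP intermediate size 11008. The plain-Python sketch below needs no PyTorch. It builds the expected name-to-shape table for one layer and sums the parameters. It assumes the Hugging Face LLaMA key prefixes (self_attn., mlp.), and layer_shapes and param_count are hypothetical helpers.

```python
from math import prod


def layer_shapes(i, hidden=4096, intermediate=11008):
    """Expected tensor shapes for decoder layer i (HF LLaMA-style names assumed)."""
    p = f"model.layers.{i}."
    shapes = {p + "input_layernorm.weight": (hidden,)}
    for name in ("q_proj", "k_proj", "v_proj"):
        shapes[p + f"self_attn.{name}.weight"] = (hidden, hidden)
        shapes[p + f"self_attn.{name}.bias"] = (hidden,)  # biases carried over from QWEN
    shapes[p + "self_attn.o_proj.weight"] = (hidden, hidden)  # no bias
    shapes[p + "post_attention_layernorm.weight"] = (hidden,)
    # nn.Linear stores weights as [out_features, in_features]
    shapes[p + "mlp.up_proj.weight"] = (intermediate, hidden)
    shapes[p + "mlp.gate_proj.weight"] = (intermediate, hidden)
    shapes[p + "mlp.down_proj.weight"] = (hidden, intermediate)
    return shapes


def param_count(shapes):
    """Total number of scalars across all tensors in a name -> shape dict."""
    return sum(prod(s) for s in shapes.values())


shapes = layer_shapes(0)
for name, shape in shapes.items():
    print(f"{name}: {shape}")
print("parameters per layer:", param_count(shapes))
```

For these sizes it reports 202,395,648 parameters per layer. About 67.1M of those are in self-attention and 135.3M in the MLP, so the MLP holds roughly two thirds of each layer.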

Final Layer Normalization and Output

  • model.norm.weight: torch.Size([4096])
  • lm_head.weight: torch.Size([151851, 4096])
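The total parameter count is the sum of the embedding, the 32 layers, the final norm and the output head. The sketch below assumes lm_head is stored as a separate (untied) tensor, as the listing suggests. total_params is a hypothetical helper, and its per-layer formula mirrors the shapes listed above.

```python
def total_params(vocab=151851, hidden=4096, intermediate=11008,
                 layers=32, tied_embeddings=False):
    """Parameter count for the LLaMA-format layout listed above."""
    attn = 4 * hidden * hidden + 3 * hidden   # q/k/v/o weights + q/k/v biases
    mlp = 3 * intermediate * hidden           # up, gate, down projections
    norms = 2 * hidden                        # input + post-attention norm weights
    per_layer = attn + mlp + norms
    embed = vocab * hidden                    # model.embed_tokens.weight
    head = 0 if tied_embeddings else vocab * hidden  # lm_head.weight
    final_norm = hidden                       # model.norm.weight
    return embed + layers * per_layer + final_norm + head


print(f"{total_params():,}")
```

With the sizes above this prints 7,720,628,224, or about 7.7B parameters. The two 151851 × 4096 vocabulary matrices account for roughly 1.24B of that total.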