EADST

QWEN7B to LLAMA7B Model Structure

This post lists the LLAMA7B model structure, with the tensor shape of each layer and component.


LLAMA7B Model Structure

The LLAMA7B model consists of the following layers and components:

Embedding Layer

  • model.embed_tokens.weight: torch.Size([151851, 4096])

Layers

Layer 0 to Layer 31

Each of the 32 layers (model.layers.[0-31]) includes:

  • input_layernorm.weight: torch.Size([4096])

  • Self-Attention Sublayer:

    • q_proj.weight: torch.Size([4096, 4096])

    • k_proj.weight: torch.Size([4096, 4096])

    • v_proj.weight: torch.Size([4096, 4096])

    • q_proj.bias: torch.Size([4096])

    • k_proj.bias: torch.Size([4096])

    • v_proj.bias: torch.Size([4096])

    • o_proj.weight: torch.Size([4096, 4096])

  • post_attention_layernorm.weight: torch.Size([4096])

  • MLP (Multi-Layer Perceptron) Sublayer:

    • up_proj.weight: torch.Size([11008, 4096])

    • gate_proj.weight: torch.Size([11008, 4096])

    • down_proj.weight: torch.Size([4096, 11008])
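As a quick sanity check on the per-layer shapes above, the following sketch tallies the parameters in one layer. The `self_attn.` and `mlp.` prefixes follow the usual Hugging Face Llama naming and are an assumption here, since the list above omits them; `LAYER_SHAPES` and `numel` are illustrative names only.

```python
from math import prod

# Shapes copied from the list above. The "self_attn." / "mlp." prefixes are
# assumed (Hugging Face Llama convention); the post does not spell them out.
LAYER_SHAPES = {
    "input_layernorm.weight": (4096,),
    "self_attn.q_proj.weight": (4096, 4096),
    "self_attn.k_proj.weight": (4096, 4096),
    "self_attn.v_proj.weight": (4096, 4096),
    "self_attn.q_proj.bias": (4096,),
    "self_attn.k_proj.bias": (4096,),
    "self_attn.v_proj.bias": (4096,),
    "self_attn.o_proj.weight": (4096, 4096),
    "post_attention_layernorm.weight": (4096,),
    "mlp.up_proj.weight": (11008, 4096),
    "mlp.gate_proj.weight": (11008, 4096),
    "mlp.down_proj.weight": (4096, 11008),
}


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    return prod(shape)


# Sum the element counts of every tensor in a single layer.
layer_params = sum(numel(shape) for shape in LAYER_SHAPES.values())
print(f"tensors per layer: {len(LAYER_SHAPES)}")
print(f"parameters per layer: {layer_params:,}")
```

Most of the count comes from the three MLP projections; the q/k/v biases and the two layer norms contribute only a few thousand values each.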

Final Layer Normalization and Output

  • model.norm.weight: torch.Size([4096])
  • lm_head.weight: torch.Size([151851, 4096])
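
The full structure can also be written down as a name-to-shape table and compared against a real checkpoint. The sketch below is one way to do that under stated assumptions: the `self_attn.` and `mlp.` key prefixes follow the usual Hugging Face Llama naming (the list above omits them), and `expected_shapes` and `diff_shapes` are hypothetical helper names, not part of any library.

```python
from math import prod


def expected_shapes(num_layers=32, hidden=4096, intermediate=11008, vocab=151851):
    """Return {parameter name: shape} for the structure listed in this post."""
    shapes = {"model.embed_tokens.weight": (vocab, hidden)}
    for i in range(num_layers):
        p = f"model.layers.{i}."
        shapes[p + "input_layernorm.weight"] = (hidden,)
        # q/k/v projections carry a bias in this structure; o_proj does not.
        for name in ("q_proj", "k_proj", "v_proj"):
            shapes[p + f"self_attn.{name}.weight"] = (hidden, hidden)
            shapes[p + f"self_attn.{name}.bias"] = (hidden,)
        shapes[p + "self_attn.o_proj.weight"] = (hidden, hidden)
        shapes[p + "post_attention_layernorm.weight"] = (hidden,)
        shapes[p + "mlp.up_proj.weight"] = (intermediate, hidden)
        shapes[p + "mlp.gate_proj.weight"] = (intermediate, hidden)
        shapes[p + "mlp.down_proj.weight"] = (hidden, intermediate)
    shapes["model.norm.weight"] = (hidden,)
    shapes["lm_head.weight"] = (vocab, hidden)
    return shapes


def diff_shapes(expected, actual):
    """List (name, expected shape, actual shape) for every mismatch.

    A missing or unexpected tensor is reported with None on the absent side.
    """
    problems = []
    for name, shape in expected.items():
        if name not in actual:
            problems.append((name, shape, None))
        elif tuple(actual[name]) != shape:
            problems.append((name, shape, tuple(actual[name])))
    for name in actual:
        if name not in expected:
            problems.append((name, None, tuple(actual[name])))
    return problems


expected = expected_shapes()
total_params = sum(prod(shape) for shape in expected.values())
print(f"{len(expected)} tensors, {total_params:,} parameters")
```

With PyTorch available, the `actual` mapping could be built from a loaded model as `{k: tuple(v.shape) for k, v in model.state_dict().items()}` and passed to `diff_shapes`; an empty result would mean the checkpoint matches the structure above.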