EADST

QWEN7B to LLAMA7B Model Structure

The listing below gives the parameter names and tensor shapes of the LLAMA7B model structure, layer by layer.


LLAMA7B Model Structure

The LLAMA7B model consists of the following layers and components:

Embedding Layer

  • model.embed_tokens.weight: torch.Size([151851, 4096])

Layers

Layer 0 to Layer 31

Each of the 32 layers (model.layers.[0-31]) includes the following components:

  • input_layernorm.weight: torch.Size([4096])

  • Self-Attention Sublayer:

    • q_proj.weight: torch.Size([4096, 4096])

    • k_proj.weight: torch.Size([4096, 4096])

    • v_proj.weight: torch.Size([4096, 4096])

    • q_proj.bias: torch.Size([4096])

    • k_proj.bias: torch.Size([4096])

    • v_proj.bias: torch.Size([4096])

    • o_proj.weight: torch.Size([4096, 4096])

  • post_attention_layernorm.weight: torch.Size([4096])

  • MLP (Multi-Layer Perceptron) Sublayer:

    • up_proj.weight: torch.Size([11008, 4096])

    • gate_proj.weight: torch.Size([11008, 4096])

    • down_proj.weight: torch.Size([4096, 11008])
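As a quick sanity check on the shapes listed above, the parameter count of a single layer can be worked out directly from them. This is only a sketch: the constant names `HIDDEN` and `INTERMEDIATE` and the dictionary `layer_shapes` are illustrative, not taken from any library.

```python
from math import prod

# Dimensions read off the listing above (names are illustrative).
HIDDEN = 4096
INTERMEDIATE = 11008

# Shapes of every tensor in one decoder layer, as listed in this post.
layer_shapes = {
    "input_layernorm.weight": (HIDDEN,),
    "q_proj.weight": (HIDDEN, HIDDEN),
    "k_proj.weight": (HIDDEN, HIDDEN),
    "v_proj.weight": (HIDDEN, HIDDEN),
    "q_proj.bias": (HIDDEN,),
    "k_proj.bias": (HIDDEN,),
    "v_proj.bias": (HIDDEN,),
    "o_proj.weight": (HIDDEN, HIDDEN),
    "post_attention_layernorm.weight": (HIDDEN,),
    "up_proj.weight": (INTERMEDIATE, HIDDEN),
    "gate_proj.weight": (INTERMEDIATE, HIDDEN),
    "down_proj.weight": (HIDDEN, INTERMEDIATE),
}

# Number of elements in each tensor, summed over the layer.
per_layer_params = sum(prod(shape) for shape in layer_shapes.values())
print(f"{per_layer_params:,}")  # 202,395,648
```

The three MLP projections account for about two thirds of each layer (135,266,304 of the 202,395,648 parameters).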

Final Layer Normalization and Output

  • model.norm.weight: torch.Size([4096])
  • lm_head.weight: torch.Size([151851, 4096])
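
The full structure can also be written out as a table of expected parameter names and shapes, which is handy for checking a converted checkpoint. The sketch below is a minimal version under stated assumptions: the `self_attn.` and `mlp.` prefixes follow the usual Hugging Face LLaMA naming and are not spelled out in the listing above, and the helpers `expected_shapes` and `diff_shapes` are hypothetical names introduced for illustration.

```python
from math import prod

VOCAB, HIDDEN, INTERMEDIATE, NUM_LAYERS = 151851, 4096, 11008, 32

def expected_shapes():
    """Build {parameter name: shape} for the structure listed in this post."""
    shapes = {"model.embed_tokens.weight": (VOCAB, HIDDEN)}
    for i in range(NUM_LAYERS):
        p = f"model.layers.{i}"
        shapes[f"{p}.input_layernorm.weight"] = (HIDDEN,)
        # Assumption: attention tensors live under a "self_attn." prefix.
        for name in ("q_proj", "k_proj", "v_proj"):
            shapes[f"{p}.self_attn.{name}.weight"] = (HIDDEN, HIDDEN)
            shapes[f"{p}.self_attn.{name}.bias"] = (HIDDEN,)
        shapes[f"{p}.self_attn.o_proj.weight"] = (HIDDEN, HIDDEN)
        shapes[f"{p}.post_attention_layernorm.weight"] = (HIDDEN,)
        # Assumption: MLP tensors live under an "mlp." prefix.
        shapes[f"{p}.mlp.up_proj.weight"] = (INTERMEDIATE, HIDDEN)
        shapes[f"{p}.mlp.gate_proj.weight"] = (INTERMEDIATE, HIDDEN)
        shapes[f"{p}.mlp.down_proj.weight"] = (HIDDEN, INTERMEDIATE)
    shapes["model.norm.weight"] = (HIDDEN,)
    shapes["lm_head.weight"] = (VOCAB, HIDDEN)
    return shapes

def diff_shapes(actual, expected):
    """Return {name: (actual shape or None, expected shape or None)} for mismatches."""
    names = set(actual) | set(expected)
    return {
        n: (actual.get(n), expected.get(n))
        for n in names
        if actual.get(n) != expected.get(n)
    }

shapes = expected_shapes()
total_params = sum(prod(s) for s in shapes.values())
print(len(shapes))            # 387 tensors
print(f"{total_params:,}")    # 7,720,628,224
```

Given a loaded PyTorch `state_dict`, something like `diff_shapes({k: tuple(v.shape) for k, v in state_dict.items()}, expected_shapes())` should return an empty dictionary when the checkpoint matches the listing, provided the prefix assumptions above hold for that checkpoint.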