EADST

QWEN7B to LLAMA7B Model Structure

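This post lists the parameter names and tensor shapes of a QWEN7B checkpoint laid out in the LLAMA7B structure. The vocabulary size (151851) and the q_proj, k_proj, and v_proj biases are carried over from QWEN7B; the original LLAMA7B has a 32000-token vocabulary and no attention biases.

LLAMA7B Model Structure

The converted model consists of the following layers and components: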

Embedding Layer

  • model.embed_tokens.weight: torch.Size([151851, 4096])

Layers

Layer 0 to Layer 31

Each of the 32 layers (model.layers.[0-31]) has the same components:

  • input_layernorm.weight: torch.Size([4096])

  • Self-Attention Sublayer (self_attn):

    • q_proj.weight: torch.Size([4096, 4096])

    • k_proj.weight: torch.Size([4096, 4096])

    • v_proj.weight: torch.Size([4096, 4096])

    • q_proj.bias: torch.Size([4096])

    • k_proj.bias: torch.Size([4096])

    • v_proj.bias: torch.Size([4096])

    • o_proj.weight: torch.Size([4096, 4096])

  • post_attention_layernorm.weight: torch.Size([4096])

  • MLP (Multi-Layer Perceptron) Sublayer (mlp):

    • up_proj.weight: torch.Size([11008, 4096])

    • gate_proj.weight: torch.Size([11008, 4096])

    • down_proj.weight: torch.Size([4096, 11008])
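
As a rough check on the per-layer list above, the shapes can be written out as a dictionary and summed. This is a minimal sketch in plain Python: the `self_attn.` and `mlp.` prefixes follow the usual Hugging Face LLaMA key naming and are an assumption here, and `HIDDEN`, `INTERMEDIATE`, `layer_shapes`, and `params_per_layer` are illustrative names that do not come from the original post.

```python
import math

# Sizes copied from the shapes listed above.
HIDDEN = 4096
INTERMEDIATE = 11008


def layer_shapes(i):
    """Return {parameter name: shape} for layer i.

    The "self_attn." and "mlp." prefixes are assumed (Hugging Face LLaMA naming).
    """
    p = f"model.layers.{i}."
    return {
        p + "input_layernorm.weight": (HIDDEN,),
        p + "self_attn.q_proj.weight": (HIDDEN, HIDDEN),
        p + "self_attn.k_proj.weight": (HIDDEN, HIDDEN),
        p + "self_attn.v_proj.weight": (HIDDEN, HIDDEN),
        p + "self_attn.q_proj.bias": (HIDDEN,),
        p + "self_attn.k_proj.bias": (HIDDEN,),
        p + "self_attn.v_proj.bias": (HIDDEN,),
        p + "self_attn.o_proj.weight": (HIDDEN, HIDDEN),
        p + "post_attention_layernorm.weight": (HIDDEN,),
        p + "mlp.up_proj.weight": (INTERMEDIATE, HIDDEN),
        p + "mlp.gate_proj.weight": (INTERMEDIATE, HIDDEN),
        p + "mlp.down_proj.weight": (HIDDEN, INTERMEDIATE),
    }


shapes = layer_shapes(0)
# Number of elements in each tensor is the product of its dimensions.
params_per_layer = sum(math.prod(s) for s in shapes.values())

for name, shape in shapes.items():
    print(name, shape)
print("parameters per layer:", params_per_layer)
```

Each layer holds 12 tensors, and the three MLP projections account for roughly two thirds of the layer's parameters.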

Final Layer Normalization and Output

  • model.norm.weight: torch.Size([4096])
  • lm_head.weight: torch.Size([151851, 4096])
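
Adding up every shape listed in this post gives the total parameter count. The sketch below does the arithmetic in plain Python; the variable names are illustrative, and the 2 bytes per parameter used for the size estimate assumes FP16 or BF16 storage, which the post does not state.

```python
import math

# Sizes copied from the shapes listed in this post.
VOCAB = 151851
HIDDEN = 4096
INTERMEDIATE = 11008
NUM_LAYERS = 32  # model.layers.0 through model.layers.31

# Shapes of the 12 tensors in one layer.
per_layer_shapes = [
    (HIDDEN,),                 # input_layernorm.weight
    (HIDDEN, HIDDEN),          # q_proj.weight
    (HIDDEN, HIDDEN),          # k_proj.weight
    (HIDDEN, HIDDEN),          # v_proj.weight
    (HIDDEN,),                 # q_proj.bias
    (HIDDEN,),                 # k_proj.bias
    (HIDDEN,),                 # v_proj.bias
    (HIDDEN, HIDDEN),          # o_proj.weight
    (HIDDEN,),                 # post_attention_layernorm.weight
    (INTERMEDIATE, HIDDEN),    # up_proj.weight
    (INTERMEDIATE, HIDDEN),    # gate_proj.weight
    (HIDDEN, INTERMEDIATE),    # down_proj.weight
]

layer_params = sum(math.prod(s) for s in per_layer_shapes)
embed_params = VOCAB * HIDDEN      # model.embed_tokens.weight
head_params = VOCAB * HIDDEN       # lm_head.weight
norm_params = HIDDEN               # model.norm.weight

total_params = NUM_LAYERS * layer_params + embed_params + head_params + norm_params

# Assumption: 2 bytes per parameter (FP16 or BF16 storage).
size_gib = total_params * 2 / 2**30

print(f"total parameters: {total_params:,}")
print(f"approximate size at 2 bytes per parameter: {size_gib:.2f} GiB")
```

The total comes to about 7.72 billion parameters, of which the embedding and lm_head matrices together contribute about 1.24 billion because of the large vocabulary.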