EADST

QWEN7B to LLAMA7B Model Structure

The listing below gives the parameter names and tensor shapes of a QWEN7B checkpoint laid out in the LLAMA7B structure, layer by layer.


LLAMA7B Model Structure

The LLAMA7B model consists of the following layers and components:

Embedding Layer

  • model.embed_tokens.weight: torch.Size([151851, 4096])

Layers

Layer 0 to Layer 31

Each of the 32 layers (model.layers.[0-31]) has the same components:

  • input_layernorm.weight: torch.Size([4096])

  • Self-Attention Sublayer:

    • q_proj.weight: torch.Size([4096, 4096])

    • k_proj.weight: torch.Size([4096, 4096])

    • v_proj.weight: torch.Size([4096, 4096])

    • q_proj.bias: torch.Size([4096])

    • k_proj.bias: torch.Size([4096])

    • v_proj.bias: torch.Size([4096])

    • o_proj.weight: torch.Size([4096, 4096])

  • post_attention_layernorm.weight: torch.Size([4096])

  • MLP (Multi-Layer Perceptron) Sublayer:

    • up_proj.weight: torch.Size([11008, 4096])

    • gate_proj.weight: torch.Size([11008, 4096])

    • down_proj.weight: torch.Size([4096, 11008])
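The nested bullets above correspond to flat keys in the checkpoint's state dict. The sketch below spells out those keys for one layer. It assumes the usual Hugging Face LLaMA naming, where attention tensors sit under `self_attn.` and MLP tensors under `mlp.`, while both layer norms sit directly on the layer; the helper name `layer_shapes` and the constants are hypothetical and only the shapes come from the listing.

```python
# Sizes copied from the listing above.
HIDDEN = 4096         # model width
INTERMEDIATE = 11008  # MLP inner width


def layer_shapes(layer_idx):
    """Return {state_dict key: shape} for one decoder layer.

    Assumption: keys follow the Hugging Face LLaMA convention
    (model.layers.N.self_attn.* and model.layers.N.mlp.*).
    """
    prefix = f"model.layers.{layer_idx}."
    shapes = {
        prefix + "input_layernorm.weight": (HIDDEN,),
        prefix + "post_attention_layernorm.weight": (HIDDEN,),
        prefix + "self_attn.o_proj.weight": (HIDDEN, HIDDEN),
        prefix + "mlp.up_proj.weight": (INTERMEDIATE, HIDDEN),
        prefix + "mlp.gate_proj.weight": (INTERMEDIATE, HIDDEN),
        prefix + "mlp.down_proj.weight": (HIDDEN, INTERMEDIATE),
    }
    # q/k/v each carry a weight and a bias in this listing; o_proj has no bias.
    for name in ("q_proj", "k_proj", "v_proj"):
        shapes[prefix + f"self_attn.{name}.weight"] = (HIDDEN, HIDDEN)
        shapes[prefix + f"self_attn.{name}.bias"] = (HIDDEN,)
    return shapes


if __name__ == "__main__":
    for key, shape in sorted(layer_shapes(0).items()):
        print(key, shape)
```

Comparing this mapping with the keys of a loaded checkpoint is a quick way to spot a missing or misnamed tensor after conversion.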

Final Layer Normalization and Output

  • model.norm.weight: torch.Size([4096])
  • lm_head.weight: torch.Size([151851, 4096])
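As a sanity check, the shapes listed in this post can be summed to get the total parameter count. This is a minimal sketch in plain Python; the variable names are hypothetical and it assumes `embed_tokens` and `lm_head` are stored as separate tensors, as the listing shows.

```python
from math import prod

NUM_LAYERS = 32

# (shape, how many tensors of that shape) inside a single decoder layer
per_layer = [
    ((4096,), 2),        # input_layernorm, post_attention_layernorm
    ((4096, 4096), 4),   # q_proj, k_proj, v_proj, o_proj weights
    ((4096,), 3),        # q_proj, k_proj, v_proj biases
    ((11008, 4096), 2),  # up_proj, gate_proj
    ((4096, 11008), 1),  # down_proj
]

# Tensors that appear once, outside the decoder layers
outside_layers = [
    (151851, 4096),  # model.embed_tokens.weight
    (4096,),         # model.norm.weight
    (151851, 4096),  # lm_head.weight
]

layer_params = sum(prod(shape) * count for shape, count in per_layer)
total_params = NUM_LAYERS * layer_params + sum(prod(s) for s in outside_layers)

print(f"per layer: {layer_params:,}")  # per layer: 202,395,648
print(f"total:     {total_params:,}")  # total:     7,720,628,224
```

The total comes to roughly 7.72 billion parameters, of which the two 151851 x 4096 vocabulary matrices account for about 1.24 billion.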