EADST

QWEN7B to LLAMA7B Model Structure

Below is the QWEN7B model laid out in the LLAMA7B structure, listing each layer, its components, and the shape of every parameter:


LLAMA7B Model Structure

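The converted model consists of the following layers and components. Two details come from QWEN7B rather than the original LLaMA-7B: the 151851-token vocabulary (LLaMA-7B uses 32000) and the biases on q_proj, k_proj, and v_proj (LLaMA-7B has none).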

Embedding Layer

  • model.embed_tokens.weight: torch.Size([151851, 4096])
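The embedding matrix stores one 4096-dimensional row per token id, so embedding a token is plain row indexing. A minimal pure-Python sketch of that lookup, using a small toy table in place of the real [151851, 4096] tensor (the toy sizes and the helper name `embed` are illustrative assumptions, not part of the original model code):

```python
# Shape of model.embed_tokens.weight as listed above: [vocab_size, hidden_size].
VOCAB_SIZE = 151851
HIDDEN_SIZE = 4096


def embed(table, token_ids):
    """Hypothetical helper: return one row of `table` per token id (plain row indexing)."""
    for t in token_ids:
        if not 0 <= t < len(table):
            raise IndexError(f"token id {t} is outside a vocabulary of size {len(table)}")
    return [table[t] for t in token_ids]


# Toy table with 5 tokens and hidden size 3 (stand-in for the real 151851 x 4096 matrix).
toy_table = [[float(10 * i + j) for j in range(3)] for i in range(5)]
print(embed(toy_table, [2, 0]))  # [[20.0, 21.0, 22.0], [0.0, 1.0, 2.0]]

# Number of parameters in the real embedding matrix.
print(VOCAB_SIZE * HIDDEN_SIZE)  # 621981696
```

Any id at or above the vocabulary size has no row, which is why the sketch rejects it instead of returning garbage.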

Layers

The model stacks 32 decoder layers, all with the same components and shapes.

Layer 0 to Layer 31

Each layer (model.layers.[0-31]) includes:

  • input_layernorm.weight: torch.Size([4096])

  • Self-Attention Sublayer (parameters under self_attn.*):

    • q_proj.weight: torch.Size([4096, 4096])

    • k_proj.weight: torch.Size([4096, 4096])

    • v_proj.weight: torch.Size([4096, 4096])

    • q_proj.bias: torch.Size([4096])

    • k_proj.bias: torch.Size([4096])

    • v_proj.bias: torch.Size([4096])

    • o_proj.weight: torch.Size([4096, 4096])

  • post_attention_layernorm.weight: torch.Size([4096]) (normalizes the hidden state before the MLP)

  • MLP (Multi-Layer Perceptron) Sublayer (parameters under mlp.*):

    • up_proj.weight: torch.Size([11008, 4096])

    • gate_proj.weight: torch.Size([11008, 4096])

    • down_proj.weight: torch.Size([4096, 11008])
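The weight shapes follow the PyTorch `nn.Linear` convention [out_features, in_features], which is why up_proj and gate_proj are [11008, 4096] (4096 in, 11008 out) while down_proj is [4096, 11008]. The sketch below assumes the standard LLaMA gated MLP, down_proj(silu(gate_proj(x)) * up_proj(x)), and also shows a biased projection like q_proj/k_proj/v_proj. It is pure Python with toy sizes; the helper names `linear`, `silu`, and `swiglu_mlp` are illustrative, not a library API.

```python
import math


def linear(weight, x, bias=None):
    """Compute y = W x + b, with W stored as [out_features, in_features]."""
    y = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    if bias is not None:
        y = [yi + bi for yi, bi in zip(y, bias)]
    return y


def silu(v):
    """SiLU (swish) activation v * sigmoid(v), written to avoid overflow for large |v|."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def swiglu_mlp(x, gate_w, up_w, down_w):
    """Gated MLP: down_proj(silu(gate_proj(x)) * up_proj(x)); no biases, matching the listing."""
    gate = linear(gate_w, x)                     # hidden -> intermediate
    up = linear(up_w, x)                         # hidden -> intermediate
    h = [silu(g) * u for g, u in zip(gate, up)]  # elementwise gating
    return linear(down_w, h)                     # intermediate -> hidden


# A projection with a bias, as for q_proj/k_proj/v_proj (toy 2x2 weight).
q = linear([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], bias=[0.5, -0.5])
print(q)  # [3.5, 6.5]

# Toy MLP: hidden size 2, intermediate size 3 (the real model uses 4096 and 11008).
gate_w = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # [3, 2], like gate_proj [11008, 4096]
up_w = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]    # [3, 2], like up_proj [11008, 4096]
down_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # [2, 3], like down_proj [4096, 11008]
out = swiglu_mlp([1.0, 0.0], gate_w, up_w, down_w)
print(out)  # first entry is silu(1.0), about 0.731; second entry is 0.0
```

In the real model the same three matrix products run on 4096-dimensional hidden states, with the gate and up branches expanding to 11008 before down_proj projects back.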

Final Layer Normalization and Output

  • model.norm.weight: torch.Size([4096])
  • lm_head.weight: torch.Size([151851, 4096])
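Summing the listed shapes gives the size of the converted checkpoint. A short sketch, assuming model.embed_tokens.weight and lm_head.weight are stored as two separate matrices (untied), as their separate listing suggests:

```python
import math

# Sizes read off the shapes listed above.
VOCAB = 151851
HIDDEN = 4096
INTERMEDIATE = 11008
NUM_LAYERS = 32

# Parameters of one decoder layer, using the names from the listing.
LAYER_SHAPES = {
    "input_layernorm.weight": (HIDDEN,),
    "q_proj.weight": (HIDDEN, HIDDEN),
    "k_proj.weight": (HIDDEN, HIDDEN),
    "v_proj.weight": (HIDDEN, HIDDEN),
    "q_proj.bias": (HIDDEN,),
    "k_proj.bias": (HIDDEN,),
    "v_proj.bias": (HIDDEN,),
    "o_proj.weight": (HIDDEN, HIDDEN),
    "post_attention_layernorm.weight": (HIDDEN,),
    "up_proj.weight": (INTERMEDIATE, HIDDEN),
    "gate_proj.weight": (INTERMEDIATE, HIDDEN),
    "down_proj.weight": (HIDDEN, INTERMEDIATE),
}

# Parameters outside the decoder layers (embedding and lm_head assumed untied).
TOP_LEVEL_SHAPES = {
    "model.embed_tokens.weight": (VOCAB, HIDDEN),
    "model.norm.weight": (HIDDEN,),
    "lm_head.weight": (VOCAB, HIDDEN),
}

per_layer = sum(math.prod(shape) for shape in LAYER_SHAPES.values())
top_level = sum(math.prod(shape) for shape in TOP_LEVEL_SHAPES.values())
total = NUM_LAYERS * per_layer + top_level

print(f"{per_layer:,}")  # 202,395,648
print(f"{total:,}")      # 7,720,628,224
```

That is roughly 7.72 billion parameters, about 1.24 billion of which sit in the two vocabulary-sized matrices; at 2 bytes per parameter (FP16 or BF16) the weights take about 15.4 GB.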