EADST

QWEN7B to LLAMA7B Model Structure

LLAMA7B Model Structure

The LLAMA7B model consists of the following layers and components:

Embedding Layer

  • model.embed_tokens.weight: torch.Size([151851, 4096])

Layers

Layer 0 to Layer 31

All 32 layers (model.layers.0 through model.layers.31) have the same components:

  • input_layernorm.weight: torch.Size([4096])

  • Self-Attention Sublayer:

    • q_proj.weight: torch.Size([4096, 4096])

    • k_proj.weight: torch.Size([4096, 4096])

    • v_proj.weight: torch.Size([4096, 4096])

    • q_proj.bias: torch.Size([4096])

    • k_proj.bias: torch.Size([4096])

    • v_proj.bias: torch.Size([4096])

    • o_proj.weight: torch.Size([4096, 4096])

  • post_attention_layernorm.weight: torch.Size([4096])

  • MLP (Multi-Layer Perceptron) Sublayer:

    • up_proj.weight: torch.Size([11008, 4096])

    • gate_proj.weight: torch.Size([11008, 4096])

    • down_proj.weight: torch.Size([4096, 11008])
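The per-layer listing above can be written down as a small lookup of parameter names and shapes, which is handy for checking a converted checkpoint. This is a minimal sketch: the `self_attn` and `mlp` name prefixes follow the usual Hugging Face LLaMA naming and are an assumption here, since the listing abbreviates them, and `layer_param_shapes` is a hypothetical helper, not part of any library.

```python
# Sketch: expected parameter names and shapes for one decoder layer.
# Assumption: Hugging Face LLaMA-style key names ("self_attn", "mlp").
HIDDEN = 4096         # hidden size from the listing
INTERMEDIATE = 11008  # MLP intermediate size from the listing


def layer_param_shapes(layer_idx, hidden=HIDDEN, intermediate=INTERMEDIATE):
    """Return {parameter name: shape tuple} for decoder layer `layer_idx`."""
    prefix = f"model.layers.{layer_idx}"
    shapes = {
        f"{prefix}.input_layernorm.weight": (hidden,),
        f"{prefix}.post_attention_layernorm.weight": (hidden,),
        f"{prefix}.self_attn.o_proj.weight": (hidden, hidden),
        # Linear weights are stored as (out_features, in_features).
        f"{prefix}.mlp.up_proj.weight": (intermediate, hidden),
        f"{prefix}.mlp.gate_proj.weight": (intermediate, hidden),
        f"{prefix}.mlp.down_proj.weight": (hidden, intermediate),
    }
    # q, k and v each carry a weight and a bias in this listing.
    for name in ("q_proj", "k_proj", "v_proj"):
        shapes[f"{prefix}.self_attn.{name}.weight"] = (hidden, hidden)
        shapes[f"{prefix}.self_attn.{name}.bias"] = (hidden,)
    return shapes


if __name__ == "__main__":
    for key, shape in sorted(layer_param_shapes(0).items()):
        print(f"{key}: {list(shape)}")
```

Comparing this dictionary against the keys and shapes of a loaded state dict is one way to spot missing biases or transposed projections after conversion.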

Final Layer Normalization and Output

  • model.norm.weight: torch.Size([4096])

  • lm_head.weight: torch.Size([151851, 4096])
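As a sanity check on the shapes listed above, the total parameter count can be computed directly from them. The sketch below uses only the sizes given in the listing and assumes that the embedding and `lm_head` are stored as separate (untied) tensors, as the two separate entries suggest.

```python
# Sketch: total parameter count derived from the listed shapes.
from math import prod

VOCAB, HIDDEN, INTERMEDIATE, NUM_LAYERS = 151851, 4096, 11008, 32

per_layer_shapes = [
    (HIDDEN,),               # input_layernorm.weight
    (HIDDEN, HIDDEN),        # q_proj.weight
    (HIDDEN, HIDDEN),        # k_proj.weight
    (HIDDEN, HIDDEN),        # v_proj.weight
    (HIDDEN,),               # q_proj.bias
    (HIDDEN,),               # k_proj.bias
    (HIDDEN,),               # v_proj.bias
    (HIDDEN, HIDDEN),        # o_proj.weight
    (HIDDEN,),               # post_attention_layernorm.weight
    (INTERMEDIATE, HIDDEN),  # up_proj.weight
    (INTERMEDIATE, HIDDEN),  # gate_proj.weight
    (HIDDEN, INTERMEDIATE),  # down_proj.weight
]

global_shapes = [
    (VOCAB, HIDDEN),  # model.embed_tokens.weight
    (HIDDEN,),        # model.norm.weight
    (VOCAB, HIDDEN),  # lm_head.weight (assumed untied from the embedding)
]


def count(shapes):
    """Sum the number of elements over a list of shape tuples."""
    return sum(prod(shape) for shape in shapes)


per_layer = count(per_layer_shapes)
total = NUM_LAYERS * per_layer + count(global_shapes)

print(f"{per_layer:,} parameters per layer")
print(f"{total:,} parameters in total")
```

With these sizes the total comes to roughly 7.72 billion parameters, of which about 1.24 billion sit in the embedding and output head because of the large vocabulary.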