QWEN7B to LLAMA7B Model Structure

This post lists the parameter names and tensor shapes of a Qwen-7B checkpoint laid out in the LLaMA-7B structure.

The model consists of the following layers and components:

Embedding Layer

  • model.embed_tokens.weight: torch.Size([151851, 4096])

Layers

The model has 32 decoder layers (model.layers.[0-31]). Each layer includes:

  • input_layernorm.weight: torch.Size([4096])

  • Self-Attention Sublayer:

    • q_proj.weight: torch.Size([4096, 4096])

    • k_proj.weight: torch.Size([4096, 4096])

    • v_proj.weight: torch.Size([4096, 4096])

    • q_proj.bias: torch.Size([4096])

    • k_proj.bias: torch.Size([4096])

    • v_proj.bias: torch.Size([4096])

    • o_proj.weight: torch.Size([4096, 4096])

  • post_attention_layernorm.weight: torch.Size([4096])

  • MLP (Multi-Layer Perceptron) Sublayer:

    • up_proj.weight: torch.Size([11008, 4096])

    • gate_proj.weight: torch.Size([11008, 4096])

    • down_proj.weight: torch.Size([4096, 11008])
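
The per-layer listing above can be turned into a lookup of expected shapes, which is handy for sanity-checking a converted checkpoint. The following is a minimal sketch, not the conversion script itself: the `self_attn.` and `mlp.` key prefixes are an assumption based on Hugging Face Llama-style checkpoints, and `layer_shapes` and `all_layer_shapes` are hypothetical helper names.

```python
from __future__ import annotations

# Sizes taken from the listing above.
HIDDEN = 4096
INTERMEDIATE = 11008
NUM_LAYERS = 32


def layer_shapes(index: int) -> dict[str, tuple[int, ...]]:
    """Return the expected parameter shapes for decoder layer `index`.

    Assumption: attention weights live under "self_attn." and MLP weights
    under "mlp.", as in Hugging Face Llama-style checkpoints.
    """
    prefix = f"model.layers.{index}."
    shapes: dict[str, tuple[int, ...]] = {
        "input_layernorm.weight": (HIDDEN,),
        "post_attention_layernorm.weight": (HIDDEN,),
        "self_attn.o_proj.weight": (HIDDEN, HIDDEN),  # no bias listed for o_proj
        "mlp.up_proj.weight": (INTERMEDIATE, HIDDEN),
        "mlp.gate_proj.weight": (INTERMEDIATE, HIDDEN),
        "mlp.down_proj.weight": (HIDDEN, INTERMEDIATE),
    }
    # q/k/v projections each carry a weight and a bias in this layout.
    for name in ("q_proj", "k_proj", "v_proj"):
        shapes[f"self_attn.{name}.weight"] = (HIDDEN, HIDDEN)
        shapes[f"self_attn.{name}.bias"] = (HIDDEN,)
    return {prefix + key: shape for key, shape in shapes.items()}


def all_layer_shapes() -> dict[str, tuple[int, ...]]:
    """Collect the expected shapes for every decoder layer."""
    expected: dict[str, tuple[int, ...]] = {}
    for i in range(NUM_LAYERS):
        expected.update(layer_shapes(i))
    return expected


if __name__ == "__main__":
    for key, shape in sorted(layer_shapes(0).items()):
        print(key, shape)
```

Given a loaded state dict, comparing `tuple(tensor.shape)` for each key against this mapping would flag missing or mis-shaped tensors. Note that the q/k/v biases are present in this layout, whereas o_proj has no bias.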

Final Layer Normalization and Output

  • model.norm.weight: torch.Size([4096])
  • lm_head.weight: torch.Size([151851, 4096])
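
As a cross-check on the shapes listed above, the total parameter count can be computed directly from them. This is a small sketch in plain Python; it assumes `model.embed_tokens.weight` and `lm_head.weight` are separate (untied) tensors, since both appear in the listing, and the constant and function names are chosen here for illustration.

```python
from math import prod

# Sizes taken from the listing above.
VOCAB = 151851
HIDDEN = 4096
INTERMEDIATE = 11008
NUM_LAYERS = 32

# Shapes of every tensor inside one decoder layer.
PER_LAYER_SHAPES = [
    (HIDDEN,),               # input_layernorm.weight
    (HIDDEN, HIDDEN),        # q_proj.weight
    (HIDDEN, HIDDEN),        # k_proj.weight
    (HIDDEN, HIDDEN),        # v_proj.weight
    (HIDDEN,),               # q_proj.bias
    (HIDDEN,),               # k_proj.bias
    (HIDDEN,),               # v_proj.bias
    (HIDDEN, HIDDEN),        # o_proj.weight
    (HIDDEN,),               # post_attention_layernorm.weight
    (INTERMEDIATE, HIDDEN),  # up_proj.weight
    (INTERMEDIATE, HIDDEN),  # gate_proj.weight
    (HIDDEN, INTERMEDIATE),  # down_proj.weight
]

# Shapes of the tensors outside the decoder layers.
# Assumption: embedding and lm_head are counted separately (untied).
GLOBAL_SHAPES = [
    (VOCAB, HIDDEN),  # model.embed_tokens.weight
    (HIDDEN,),        # model.norm.weight
    (VOCAB, HIDDEN),  # lm_head.weight
]


def count_params(shapes):
    """Sum the number of elements over a list of tensor shapes."""
    return sum(prod(shape) for shape in shapes)


per_layer_params = count_params(PER_LAYER_SHAPES)
global_params = count_params(GLOBAL_SHAPES)
total_params = NUM_LAYERS * per_layer_params + global_params

if __name__ == "__main__":
    print(f"per layer: {per_layer_params:,}")
    print(f"total:     {total_params:,}")
```

With these shapes each layer holds 202,395,648 parameters and the whole model 7,720,628,224, of which about 16% sit in the embedding and lm_head matrices because of the large vocabulary.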