EADST

QWEN7B to LLAMA7B Model Structure

Below is the model structure in LLAMA7B format, listing the name and tensor shape of every parameter in each layer and component:


LLAMA7B Model Structure

The LLAMA7B model consists of the following layers and components:

Embedding Layer

  • model.embed_tokens.weight: torch.Size([151851, 4096])

Layers

Layer 0 to Layer 31

All 32 layers (model.layers.0 to model.layers.31) share the same components:

  • input_layernorm.weight: torch.Size([4096])

  • Self-Attention Sublayer:

    • q_proj.weight: torch.Size([4096, 4096])

    • k_proj.weight: torch.Size([4096, 4096])

    • v_proj.weight: torch.Size([4096, 4096])

    • q_proj.bias: torch.Size([4096])

    • k_proj.bias: torch.Size([4096])

    • v_proj.bias: torch.Size([4096])

    • o_proj.weight: torch.Size([4096, 4096])

  • post_attention_layernorm.weight: torch.Size([4096])

  • MLP (Multi-Layer Perceptron) Sublayer:

    • up_proj.weight: torch.Size([11008, 4096])

    • gate_proj.weight: torch.Size([11008, 4096])

    • down_proj.weight: torch.Size([4096, 11008])
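The three MLP matrices are not applied one after another. As a minimal sketch, assuming the converted model uses the standard LLaMA SwiGLU MLP (in Hugging Face's `LlamaMLP` this is `down_proj(act_fn(gate_proj(x)) * up_proj(x))` with SiLU as the activation), the shapes compose as shown below. Plain Python lists with tiny hypothetical sizes stand in for the real 4096/11008 tensors, and `matvec`, `silu`, and `llama_mlp` are illustrative helpers, not part of any library.

```python
import math


def matvec(weight, x):
    """Compute weight @ x, with weight stored as [out_features, in_features] like nn.Linear."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weight]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def llama_mlp(x, gate_proj, up_proj, down_proj):
    """SwiGLU MLP: down_proj(silu(gate_proj(x)) * up_proj(x)).

    gate_proj, up_proj: [intermediate, hidden]  (11008 x 4096 in the listing)
    down_proj:          [hidden, intermediate]  (4096 x 11008 in the listing)
    """
    gate = [silu(g) for g in matvec(gate_proj, x)]  # [intermediate]
    up = matvec(up_proj, x)                         # [intermediate]
    hidden = [g * u for g, u in zip(gate, up)]      # elementwise product
    return matvec(down_proj, hidden)                # back to [hidden]


if __name__ == "__main__":
    # Toy sizes (hypothetical): hidden = 2, intermediate = 3.
    x = [1.0, 2.0]
    gate_proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # [3, 2]
    up_proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # [3, 2]
    down_proj = [[1.0, 1.0, 1.0], [0.0, 0.0, 1.0]]    # [2, 3]
    print(llama_mlp(x, gate_proj, up_proj, down_proj))  # two values, back at hidden size
```

Because weights are stored as [out_features, in_features], gate_proj and up_proj expand 4096 to 11008 and down_proj maps 11008 back to 4096. Unlike the q/k/v projections, none of the MLP projections in the listing carry a bias.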

Final Layer Normalization and Output

  • model.norm.weight: torch.Size([4096])
  • lm_head.weight: torch.Size([151851, 4096])
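Since every shape is listed, the parameter count follows directly from them. The sketch below rebuilds the full name-to-shape map in plain Python and sums it. The `self_attn.` and `mlp.` key prefixes are assumed from Hugging Face LLaMA naming (the listing above omits them), and the helper names are hypothetical.

```python
# Sizes taken from the listing above.
VOCAB = 151851
HIDDEN = 4096
INTERMEDIATE = 11008
NUM_LAYERS = 32


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n


def layer_shapes():
    """Per-layer parameters; prefixes self_attn./mlp. are an assumption (HF LLaMA naming)."""
    return {
        "input_layernorm.weight": (HIDDEN,),
        "self_attn.q_proj.weight": (HIDDEN, HIDDEN),
        "self_attn.k_proj.weight": (HIDDEN, HIDDEN),
        "self_attn.v_proj.weight": (HIDDEN, HIDDEN),
        "self_attn.q_proj.bias": (HIDDEN,),
        "self_attn.k_proj.bias": (HIDDEN,),
        "self_attn.v_proj.bias": (HIDDEN,),
        "self_attn.o_proj.weight": (HIDDEN, HIDDEN),
        "post_attention_layernorm.weight": (HIDDEN,),
        "mlp.up_proj.weight": (INTERMEDIATE, HIDDEN),
        "mlp.gate_proj.weight": (INTERMEDIATE, HIDDEN),
        "mlp.down_proj.weight": (HIDDEN, INTERMEDIATE),
    }


def model_shapes():
    """Full name -> shape map: embedding, 32 layers, final norm, untied lm_head."""
    shapes = {"model.embed_tokens.weight": (VOCAB, HIDDEN)}
    for i in range(NUM_LAYERS):
        for name, shape in layer_shapes().items():
            shapes[f"model.layers.{i}.{name}"] = shape
    shapes["model.norm.weight"] = (HIDDEN,)
    shapes["lm_head.weight"] = (VOCAB, HIDDEN)
    return shapes


def count_params(shapes):
    return sum(numel(s) for s in shapes.values())


if __name__ == "__main__":
    shapes = model_shapes()
    print(f"tensors: {len(shapes)}")                           # tensors: 387
    print(f"per layer: {count_params(layer_shapes()):,}")      # per layer: 202,395,648
    print(f"total: {count_params(shapes):,}")                  # total: 7,720,628,224
```

With these shapes each layer holds 202,395,648 parameters and the whole model about 7.72 billion, of which the separate embed_tokens and lm_head matrices account for roughly 1.24 billion. Comparing this map against a converted checkpoint's state dict is a quick way to catch a missing or misnamed tensor.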