EADST

QWEN7B to LLAMA7B Model Structure

This post lists the parameter names and tensor shapes of the LLAMA7B model structure, layer by layer.


LLAMA7B Model Structure

The LLAMA7B model consists of the following layers and components:

Embedding Layer

  • model.embed_tokens.weight: torch.Size([151851, 4096])

Layers

The model has 32 layers with identical structure, indexed 0 to 31.

Layer 0 to Layer 31

Each layer (model.layers.[0-31]) includes:

  • input_layernorm.weight: torch.Size([4096])

  • Self-Attention Sublayer:

    • q_proj.weight: torch.Size([4096, 4096])

    • k_proj.weight: torch.Size([4096, 4096])

    • v_proj.weight: torch.Size([4096, 4096])

    • q_proj.bias: torch.Size([4096])

    • k_proj.bias: torch.Size([4096])

    • v_proj.bias: torch.Size([4096])

    • o_proj.weight: torch.Size([4096, 4096])

  • post_attention_layernorm.weight: torch.Size([4096])

  • MLP (Multi-Layer Perceptron) Sublayer:

    • up_proj.weight: torch.Size([11008, 4096])

    • gate_proj.weight: torch.Size([11008, 4096])

    • down_proj.weight: torch.Size([4096, 11008])
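The shapes listed for one layer are enough to work out how many parameters each layer holds. The following is a minimal sketch using only the Python standard library; the dictionary `LAYER_SHAPES` and the helper `count_params` are names made up for this illustration, and the shapes are copied from the listing above.

```python
from math import prod

# Shapes copied from the per-layer listing above (one of the 32 layers).
# LAYER_SHAPES and count_params are illustrative names, not part of any library.
LAYER_SHAPES = {
    "input_layernorm.weight": (4096,),
    "q_proj.weight": (4096, 4096),
    "k_proj.weight": (4096, 4096),
    "v_proj.weight": (4096, 4096),
    "q_proj.bias": (4096,),
    "k_proj.bias": (4096,),
    "v_proj.bias": (4096,),
    "o_proj.weight": (4096, 4096),
    "post_attention_layernorm.weight": (4096,),
    "up_proj.weight": (11008, 4096),
    "gate_proj.weight": (11008, 4096),
    "down_proj.weight": (4096, 11008),
}


def count_params(shapes):
    """Sum the element counts of all tensors in a name -> shape mapping."""
    return sum(prod(shape) for shape in shapes.values())


per_layer = count_params(LAYER_SHAPES)
print(f"parameters per layer:  {per_layer:,}")
print(f"parameters, 32 layers: {32 * per_layer:,}")
```

Most of the per-layer total comes from the three MLP projections; the q/k/v bias vectors add only 3 × 4096 values.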

Final Layer Normalization and Output

  • model.norm.weight: torch.Size([4096])

  • lm_head.weight: torch.Size([151851, 4096])
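A listing like this can also serve as a checklist when inspecting a checkpoint. The sketch below rebuilds the full name-to-shape table and compares it with a mapping supplied by the caller. It rests on some assumptions: the `self_attn.` and `mlp.` key prefixes follow the usual Hugging Face LLaMA naming and do not appear in the listing above, and `expected_shapes` and `find_mismatches` are hypothetical helper names introduced here.

```python
from math import prod

# Sizes taken from the listing above.
VOCAB, HIDDEN, FFN, NUM_LAYERS = 151851, 4096, 11008, 32


def expected_shapes():
    """Build the full parameter-name -> shape table for the listed structure.

    Assumption: sublayer keys carry the `self_attn.` / `mlp.` prefixes used by
    Hugging Face LLaMA checkpoints; the listing itself omits those prefixes.
    """
    shapes = {"model.embed_tokens.weight": (VOCAB, HIDDEN)}
    for i in range(NUM_LAYERS):
        p = f"model.layers.{i}."
        shapes[p + "input_layernorm.weight"] = (HIDDEN,)
        for name in ("q_proj", "k_proj", "v_proj"):
            shapes[p + f"self_attn.{name}.weight"] = (HIDDEN, HIDDEN)
            shapes[p + f"self_attn.{name}.bias"] = (HIDDEN,)
        shapes[p + "self_attn.o_proj.weight"] = (HIDDEN, HIDDEN)
        shapes[p + "post_attention_layernorm.weight"] = (HIDDEN,)
        shapes[p + "mlp.up_proj.weight"] = (FFN, HIDDEN)
        shapes[p + "mlp.gate_proj.weight"] = (FFN, HIDDEN)
        shapes[p + "mlp.down_proj.weight"] = (HIDDEN, FFN)
    shapes["model.norm.weight"] = (HIDDEN,)
    shapes["lm_head.weight"] = (VOCAB, HIDDEN)
    return shapes


def find_mismatches(actual_shapes):
    """Return a list of human-readable differences from the expected table."""
    expected = expected_shapes()
    problems = []
    for name, shape in expected.items():
        got = actual_shapes.get(name)
        if got is None:
            problems.append(f"missing: {name}")
        elif tuple(got) != shape:
            problems.append(f"shape mismatch: {name}: expected {shape}, got {tuple(got)}")
    for name in actual_shapes:
        if name not in expected:
            problems.append(f"unexpected: {name}")
    return problems


shapes = expected_shapes()
total = sum(prod(s) for s in shapes.values())
print(f"{len(shapes)} tensors, {total:,} parameters")
```

With PyTorch installed, the input mapping could be built from a loaded model as `{k: tuple(v.shape) for k, v in model.state_dict().items()}` and passed to `find_mismatches`; an empty result means every listed tensor is present with the listed shape.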