EADST

QWEN7B to LLAMA7B Model Structure

Here is the model structure in the LLAMA7B layout, listing each layer and component with its tensor shape:


LLAMA7B Model Structure

The LLAMA7B model consists of the following layers and components:

Embedding Layer

  • model.embed_tokens.weight: torch.Size([151851, 4096])
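The embedding matrix is a lookup table. Row i holds the 4096-dimensional vector for token id i, so the first dimension equals the vocabulary size (151851). Below is a minimal NumPy sketch using toy sizes. The vocabulary of 10 and hidden size of 4 are placeholders, not the model's real values.

```python
import numpy as np

# Toy stand-ins for the real sizes (151851 tokens, hidden size 4096).
vocab_size, hidden_size = 10, 4

rng = np.random.default_rng(0)
# Same layout as model.embed_tokens.weight: [vocab_size, hidden_size]
embed_tokens = rng.standard_normal((vocab_size, hidden_size)).astype(np.float32)


def embed(token_ids):
    """Look up one row per token id; the result has shape [seq_len, hidden_size]."""
    return embed_tokens[np.asarray(token_ids)]


hidden = embed([3, 7, 3])
print(hidden.shape)  # (3, 4)
```

A repeated token id maps to the same row, so positions 0 and 2 above get identical vectors. Position information is added later, inside the attention layers.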

Decoder Layers (Layer 0 to Layer 31)

The model has 32 decoder layers, model.layers.0 through model.layers.31. Every layer has the same components:

  • input_layernorm.weight: torch.Size([4096])

  • Self-Attention Sublayer:

    • q_proj.weight: torch.Size([4096, 4096])

    • k_proj.weight: torch.Size([4096, 4096])

    • v_proj.weight: torch.Size([4096, 4096])

    • q_proj.bias: torch.Size([4096])

    • k_proj.bias: torch.Size([4096])

    • v_proj.bias: torch.Size([4096])

    • o_proj.weight: torch.Size([4096, 4096])

  • post_attention_layernorm.weight: torch.Size([4096])

  • MLP (Multi-Layer Perceptron) Sublayer:

    • up_proj.weight: torch.Size([11008, 4096])

    • gate_proj.weight: torch.Size([11008, 4096])

    • down_proj.weight: torch.Size([4096, 11008])
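The following minimal NumPy sketch shows how one layer's components fit together in a forward pass. It assumes the standard LLaMA pre-norm layout: RMSNorm, causal self-attention, residual add, RMSNorm, SwiGLU MLP, residual add.

The sketch makes several simplifying assumptions:

- It uses toy sizes.
- It picks 2 attention heads because the head count is not listed above.
- It assumes an RMSNorm epsilon of 1e-6.
- It omits rotary position embeddings, batching, and the KV cache.

Treat it as a data-flow illustration, not a drop-in implementation.

Weight matrices follow PyTorch's nn.Linear convention of [out_features, in_features]. That is why up_proj is stored as [11008, 4096] and down_proj as [4096, 11008]. Note also that the q/k/v biases in this listing are not part of the original LLaMA-7B, which uses no projection biases.

```python
import numpy as np

# Toy sizes standing in for hidden=4096 and intermediate=11008.
# num_heads=2 is a hypothetical choice for this sketch only.
hidden_size, inter_size, num_heads = 8, 16, 2
head_dim = hidden_size // num_heads
rng = np.random.default_rng(0)


def rand(*shape):
    return (rng.standard_normal(shape) * 0.1).astype(np.float32)


# Same names as the listing; Linear weights are stored [out_features, in_features].
params = {
    "input_layernorm.weight": np.ones(hidden_size, np.float32),
    "q_proj.weight": rand(hidden_size, hidden_size), "q_proj.bias": rand(hidden_size),
    "k_proj.weight": rand(hidden_size, hidden_size), "k_proj.bias": rand(hidden_size),
    "v_proj.weight": rand(hidden_size, hidden_size), "v_proj.bias": rand(hidden_size),
    "o_proj.weight": rand(hidden_size, hidden_size),
    "post_attention_layernorm.weight": np.ones(hidden_size, np.float32),
    "up_proj.weight": rand(inter_size, hidden_size),
    "gate_proj.weight": rand(inter_size, hidden_size),
    "down_proj.weight": rand(hidden_size, inter_size),
}


def linear(x, weight, bias=None):
    """y = x @ W^T (+ b), matching torch.nn.Linear."""
    y = x @ weight.T
    return y if bias is None else y + bias


def rms_norm(x, weight, eps=1e-6):  # eps value is an assumption
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight


def silu(x):
    return x / (1.0 + np.exp(-x))


def self_attention(x):
    seq_len = x.shape[0]

    def project(name):
        # [seq, hidden] -> [heads, seq, head_dim]; q/k/v have biases, o_proj does not.
        y = linear(x, params[f"{name}.weight"], params[f"{name}.bias"])
        return y.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = project("q_proj"), project("k_proj"), project("v_proj")
    # Rotary position embeddings would be applied to q and k here; omitted.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # [heads, seq, seq]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)  # causal mask
    scores = scores - scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    out = (probs @ v).transpose(1, 0, 2).reshape(seq_len, hidden_size)
    return linear(out, params["o_proj.weight"])


def mlp(x):
    # SwiGLU: down_proj(silu(gate_proj(x)) * up_proj(x))
    gate = silu(linear(x, params["gate_proj.weight"]))
    up = linear(x, params["up_proj.weight"])
    return linear(gate * up, params["down_proj.weight"])


def decoder_layer(x):
    # Pre-norm residual blocks: attention first, then the MLP.
    h = x + self_attention(rms_norm(x, params["input_layernorm.weight"]))
    return h + mlp(rms_norm(h, params["post_attention_layernorm.weight"]))


x = rng.standard_normal((5, hidden_size)).astype(np.float32)
print(decoder_layer(x).shape)  # (5, 8)
```

Because of the causal mask, changing only the last token leaves the outputs for all earlier positions unchanged. This makes a quick check that the mask is wired correctly.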

Final Layer Normalization and Output

  • model.norm.weight: torch.Size([4096])
  • lm_head.weight: torch.Size([151851, 4096])
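To sanity-check the shapes, you can tally the parameter count directly from the listing. The sketch below builds the full state-dict key names. It assumes the Hugging Face LlamaForCausalLM naming, in which attention weights sit under self_attn. and MLP weights under mlp.; the tensor shapes themselves are the ones listed in this post.

```python
from math import prod

# Sizes taken from the shapes listed in this post.
HIDDEN, INTER, VOCAB, NUM_LAYERS = 4096, 11008, 151851, 32

# Per-layer tensors; the self_attn./mlp. prefixes assume Hugging Face LLaMA naming.
LAYER_SHAPES = {
    "input_layernorm.weight": (HIDDEN,),
    "self_attn.q_proj.weight": (HIDDEN, HIDDEN),
    "self_attn.k_proj.weight": (HIDDEN, HIDDEN),
    "self_attn.v_proj.weight": (HIDDEN, HIDDEN),
    "self_attn.q_proj.bias": (HIDDEN,),
    "self_attn.k_proj.bias": (HIDDEN,),
    "self_attn.v_proj.bias": (HIDDEN,),
    "self_attn.o_proj.weight": (HIDDEN, HIDDEN),
    "post_attention_layernorm.weight": (HIDDEN,),
    "mlp.up_proj.weight": (INTER, HIDDEN),
    "mlp.gate_proj.weight": (INTER, HIDDEN),
    "mlp.down_proj.weight": (HIDDEN, INTER),
}


def build_shapes():
    """Return {state_dict_key: shape} for the whole model."""
    shapes = {"model.embed_tokens.weight": (VOCAB, HIDDEN)}
    for i in range(NUM_LAYERS):
        for name, shape in LAYER_SHAPES.items():
            shapes[f"model.layers.{i}.{name}"] = shape
    shapes["model.norm.weight"] = (HIDDEN,)
    shapes["lm_head.weight"] = (VOCAB, HIDDEN)
    return shapes


shapes = build_shapes()
per_layer = sum(prod(s) for s in LAYER_SHAPES.values())
total = sum(prod(s) for s in shapes.values())

print(len(shapes))       # 387
print(f"{per_layer:,}")  # 202,395,648
print(f"{total:,}")      # 7,720,628,224
```

The total comes to about 7.72 billion parameters. Roughly 1.24 billion of these sit in the two separate 151851 × 4096 vocabulary matrices (embed_tokens and lm_head).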