EADST

QWEN7B to LLAMA7B Model Structure

The structure below lists every layer and component of the LLAMA7B model, with the shape of each tensor:


LLAMA7B Model Structure

The LLAMA7B model consists of the following layers and components:

Embedding Layer

  • model.embed_tokens.weight: torch.Size([151851, 4096])

Layers

The model stacks 32 layers, all with the same components:

Layer 0 to Layer 31

Each layer (model.layers.[0-31]) includes:

  • input_layernorm.weight: torch.Size([4096])

  • Self-Attention Sublayer:

    • q_proj.weight: torch.Size([4096, 4096])

    • k_proj.weight: torch.Size([4096, 4096])

    • v_proj.weight: torch.Size([4096, 4096])

    • q_proj.bias: torch.Size([4096])

    • k_proj.bias: torch.Size([4096])

    • v_proj.bias: torch.Size([4096])

    • o_proj.weight: torch.Size([4096, 4096])

  • post_attention_layernorm.weight: torch.Size([4096])

  • MLP (Multi-Layer Perceptron) Sublayer:

    • up_proj.weight: torch.Size([11008, 4096])

    • gate_proj.weight: torch.Size([11008, 4096])

    • down_proj.weight: torch.Size([4096, 11008])
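
The per-layer shapes listed above can be turned into a quick parameter count. The following is a minimal sketch in plain Python, not part of any conversion script. The `self_attn.` and `mlp.` prefixes are an assumption based on the Hugging Face LLaMA naming convention, and `LAYER_SHAPES` and `count_params` are illustrative names.

```python
from math import prod

HIDDEN = 4096         # hidden size, from the shapes listed above
INTERMEDIATE = 11008  # MLP intermediate size, from the shapes listed above

# Leaf names and shapes are copied from the list above.
# The "self_attn." / "mlp." prefixes are assumed (Hugging Face LLaMA naming).
LAYER_SHAPES = {
    "input_layernorm.weight": (HIDDEN,),
    "self_attn.q_proj.weight": (HIDDEN, HIDDEN),
    "self_attn.k_proj.weight": (HIDDEN, HIDDEN),
    "self_attn.v_proj.weight": (HIDDEN, HIDDEN),
    "self_attn.q_proj.bias": (HIDDEN,),
    "self_attn.k_proj.bias": (HIDDEN,),
    "self_attn.v_proj.bias": (HIDDEN,),
    "self_attn.o_proj.weight": (HIDDEN, HIDDEN),
    "post_attention_layernorm.weight": (HIDDEN,),
    "mlp.up_proj.weight": (INTERMEDIATE, HIDDEN),
    "mlp.gate_proj.weight": (INTERMEDIATE, HIDDEN),
    "mlp.down_proj.weight": (HIDDEN, INTERMEDIATE),
}


def count_params(shapes):
    """Sum the number of elements over a name -> shape mapping."""
    return sum(prod(shape) for shape in shapes.values())


layer_params = count_params(LAYER_SHAPES)
print(f"{len(LAYER_SHAPES)} tensors, {layer_params:,} parameters per layer")
```

Under these shapes, each layer holds 12 tensors and 202,395,648 parameters, about two thirds of which sit in the three MLP projections.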

Final Layer Normalization and Output

  • model.norm.weight: torch.Size([4096])

  • lm_head.weight: torch.Size([151851, 4096])
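
To check a converted checkpoint against this structure, one option is to build the expected name-to-shape map and compare it with the actual one. The sketch below uses only plain Python. The full key names (`model.layers.{i}.self_attn.…`, `model.layers.{i}.mlp.…`) are an assumption following the Hugging Face LLaMA naming convention, and `expected_shapes` and `diff_shapes` are hypothetical helper names.

```python
from math import prod

# Sizes taken from the structure listed above.
VOCAB, HIDDEN, INTERMEDIATE, NUM_LAYERS = 151851, 4096, 11008, 32


def expected_shapes():
    """Build the expected name -> shape map for the whole model."""
    shapes = {"model.embed_tokens.weight": (VOCAB, HIDDEN)}
    for i in range(NUM_LAYERS):
        p = f"model.layers.{i}."
        shapes[p + "input_layernorm.weight"] = (HIDDEN,)
        # "self_attn." and "mlp." prefixes are assumed, not shown in the list.
        for name in ("q_proj", "k_proj", "v_proj"):
            shapes[p + f"self_attn.{name}.weight"] = (HIDDEN, HIDDEN)
            shapes[p + f"self_attn.{name}.bias"] = (HIDDEN,)
        shapes[p + "self_attn.o_proj.weight"] = (HIDDEN, HIDDEN)
        shapes[p + "post_attention_layernorm.weight"] = (HIDDEN,)
        shapes[p + "mlp.up_proj.weight"] = (INTERMEDIATE, HIDDEN)
        shapes[p + "mlp.gate_proj.weight"] = (INTERMEDIATE, HIDDEN)
        shapes[p + "mlp.down_proj.weight"] = (HIDDEN, INTERMEDIATE)
    shapes["model.norm.weight"] = (HIDDEN,)
    shapes["lm_head.weight"] = (VOCAB, HIDDEN)
    return shapes


def diff_shapes(actual, expected):
    """Return (missing, unexpected, mismatched) between two name -> shape maps."""
    missing = sorted(set(expected) - set(actual))
    unexpected = sorted(set(actual) - set(expected))
    mismatched = {
        name: (tuple(actual[name]), expected[name])
        for name in expected
        if name in actual and tuple(actual[name]) != expected[name]
    }
    return missing, unexpected, mismatched


EXPECTED = expected_shapes()
total_params = sum(prod(shape) for shape in EXPECTED.values())
print(f"{len(EXPECTED)} tensors, {total_params:,} parameters")
```

With a loaded PyTorch model, the actual map could be built as `{k: tuple(v.shape) for k, v in model.state_dict().items()}` and passed to `diff_shapes` together with `EXPECTED`. Any extra entries in the state dict, such as buffers, would show up in the `unexpected` list.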