
llama.cpp: Efficient 6-bit Data Packing in an 8-bit Array

This code snippet, adapted from llama.cpp by ggerganov, demonstrates how to pack 6-bit values efficiently into an array of 8-bit (uint8) elements. Each value is scaled, rounded, and clamped to 6 bits, then split with bitwise operations so that sixteen 6-bit values occupy 12 bytes instead of 16.

// Inverse scale factor: multiplying by it maps max_scale to -32, the low end of the signed 6-bit range [-32, 31].
float iscale = -32.f/max_scale;
// QK_K = 256, so the loop runs over all QK_K/16 = 16 entries of the scales array.
for (int j = 0; j < QK_K/16; ++j) {
    // Scale and round the j-th element of the scales array to the nearest integer.
    int8_t l = nearest_int(iscale * scales[j]);

    // Clamp the value of l to the range [-32, 31] and normalize it to [0, 63].
    l = MAX(-32, MIN(31, l)) + 32;

    // For the first eight scales (j = 0..7), store the low 4 bits of l in the low nibble of y[i].scales[j].
    if (j < 8) {
        y[i].scales[j] = l & 0xF;
    }
    // For the last eight scales (j = 8..15), store the low 4 bits of l in the high nibble of y[i].scales[j-8].
    else {
        y[i].scales[j-8] |= ((l & 0xF) << 4);
    }

    // Drop the low 4 bits, leaving the high 2 bits of the 6-bit value in l.
    l >>= 4;

    // Store those 2 bits in bytes 8..11 of y[i].scales, four 2-bit fields per byte:
    // j % 4 selects the byte and 2 * (j / 4) is the bit offset within it.
    // Because this uses |=, bytes 8..11 must have been zeroed beforehand (not shown in this snippet).
    y[i].scales[j % 4 + 8] |= (l << (2 * (j / 4)));
}
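
To make the snippet easier to try in isolation, here is a minimal self-contained sketch of the same packing scheme. It is an illustration under stated assumptions, not the llama.cpp source: the names pack_scales_6bit and nearest_int_sketch are hypothetical, the rounding helper is a simple stand-in for llama.cpp's nearest_int, and the output is a plain 12-byte array rather than the y[i].scales field.

```c
#include <stdint.h>
#include <string.h>

#define NUM_SCALES   16  /* QK_K/16 with QK_K = 256 */
#define PACKED_BYTES 12  /* 16 values * 6 bits = 96 bits = 12 bytes */

/* Hypothetical stand-in for nearest_int(): round half away from zero. */
static int nearest_int_sketch(float x) {
    return (x >= 0.0f) ? (int)(x + 0.5f) : (int)(x - 0.5f);
}

/* Quantize 16 float scales to 6 bits each and pack them into 12 bytes. */
static void pack_scales_6bit(const float *scales, float max_scale,
                             uint8_t out[PACKED_BYTES]) {
    /* The |= operations below rely on a zeroed destination. */
    memset(out, 0, PACKED_BYTES);
    if (max_scale == 0.0f) {
        return;  /* assumption: nothing to encode, and avoids dividing by zero */
    }

    float iscale = -32.0f / max_scale;
    for (int j = 0; j < NUM_SCALES; ++j) {
        int q = nearest_int_sketch(iscale * scales[j]);
        if (q < -32) q = -32;
        if (q > 31)  q = 31;
        uint8_t l = (uint8_t)(q + 32);  /* now in [0, 63] */

        /* Low 4 bits: low nibble of bytes 0..7 for j < 8, high nibble for j >= 8. */
        if (j < 8) {
            out[j] |= (uint8_t)(l & 0xF);
        } else {
            out[j - 8] |= (uint8_t)((l & 0xF) << 4);
        }

        /* High 2 bits: byte 8 + j % 4, at bit offset 2 * (j / 4). */
        out[8 + j % 4] |= (uint8_t)((l >> 4) << (2 * (j / 4)));
    }
}
```

Using an unsigned temporary for the shifted value avoids depending on how a signed int8_t behaves under shifts; since l is already in [0, 63] at that point, the result is the same as in the original loop.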

The key aspects of this code include:

  • Scaling and Normalization: Multiplies each value by iscale, rounds it, clamps it to [-32, 31], and adds 32 so that it fits in 6 unsigned bits ([0, 63]).
  • Bitwise Operations: Uses masking (&), shifting (<<, >>), and bitwise OR (|=) to split each 6-bit value into a 4-bit part and a 2-bit part and place them in shared bytes.
  • Data Optimization: Sixteen 6-bit values take 12 bytes rather than the 16 bytes needed at one byte per value, a 25% saving in memory and in the amount of data that has to be moved.
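
The bit layout is easiest to check by reading a value back out. The following is a hedged sketch of the inverse operation, written to mirror the packing loop rather than copied from llama.cpp; the function name unpack_scale_6bit is hypothetical, and the input is assumed to be a 12-byte array laid out as described above.

```c
#include <stdint.h>

/* Recover the k-th quantized value (k in 0..15) from a 12-byte packed array. */
static int unpack_scale_6bit(const uint8_t packed[12], int k) {
    /* Low 4 bits: low nibble of byte k for k < 8, high nibble of byte k-8 otherwise. */
    int low = (k < 8) ? (packed[k] & 0xF) : (packed[k - 8] >> 4);

    /* High 2 bits: byte 8 + k % 4, starting at bit 2 * (k / 4). */
    int high = (packed[8 + k % 4] >> (2 * (k / 4))) & 0x3;

    /* Reassemble the value in [0, 63], then undo the +32 offset to get [-32, 31]. */
    return ((high << 4) | low) - 32;
}
```

Each of bytes 0..7 carries two low nibbles and each of bytes 8..11 carries four 2-bit high parts, which is how 16 × 6 = 96 bits fit exactly into 12 bytes.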

This approach is particularly useful in scenarios where memory optimization is crucial, such as in embedded systems or when dealing with large datasets.
