EADST

llama.cpp: Efficient 6-bit Data Packing in an 8-bit Array

This code snippet, adapted from llama.cpp by ggerganov, shows how sixteen 6-bit quantization scales are packed into twelve bytes of a uint8_t array. It combines scaling, rounding, clamping, and bitwise manipulation to compress the scale data, which saves memory in quantized model formats.

// Compute the inverse scale factor; -32.f maps the largest scale onto the bottom of the signed 6-bit range [-32, 31].
float iscale = -32.f/max_scale;
// QK_K = 256. Iterate over the QK_K/16 = 16 sub-block scales.
for (int j = 0; j < QK_K/16; ++j) {
    // Scale and round the j-th element of the scales array to the nearest integer.
    int8_t l = nearest_int(iscale * scales[j]);

    // Clamp the value of l to the range [-32, 31] and normalize it to [0, 63].
    l = MAX(-32, MIN(31, l)) + 32;

    // For the first eight scales (j = 0..7), store the lower 4 bits of l in y[i].scales[j].
    if (j < 8) {
        y[i].scales[j] = l & 0xF;
    } 
    // For the remaining scales (j = 8..15), store the lower 4 bits of l in the upper 4 bits of y[i].scales[j - 8].
    else {
        y[i].scales[j-8] |= ((l & 0xF) << 4);
    }

    // Shift right by 4 so that only the top 2 bits of the 6-bit value remain.
    l >>= 4;

    // Pack those top 2 bits into bytes 8..11 of y[i].scales:
    // byte index j % 4 + 8, starting at bit position 2 * (j / 4).
    y[i].scales[j % 4 + 8] |= (l << (2 * (j / 4)));
}

The key aspects of this code include:

  • Scaling and Normalization: Adjusts the data values to a suitable range for bit manipulation.
  • Bitwise Operations: Utilizes masking (&), shifting (<<, >>), and bitwise OR (|=) to pack data efficiently.
  • Data Optimization: Sixteen 6-bit values occupy twelve bytes instead of sixteen, a 25% saving that compounds across the many sub-blocks of a quantized model.

This approach is particularly useful in scenarios where memory optimization is crucial, such as in embedded systems or when dealing with large datasets.
