EADST

llama.cpp: Efficient 6-bit Data Packing in an 8-bit Array

This code snippet, adapted from llama.cpp by ggerganov, demonstrates how sixteen 6-bit quantized scale values are packed into a `uint8_t` array. Each scale is mapped into the range [0, 63] by scaling, rounding, and clamping, and its lower 4 bits and upper 2 bits are then stored separately with bitwise operations, compressing the data for memory-constrained processing.

// Compute the inverse scale factor: the -32 maps max_scale onto the low end of the signed 6-bit range [-32, 31].
float iscale = -32.f/max_scale;
// QK_K = 256, so the loop runs over QK_K/16 = 16 sub-block scales.
for (int j = 0; j < QK_K/16; ++j) {
    // Scale and round the j-th element of the scales array to the nearest integer.
    int8_t l = nearest_int(iscale * scales[j]);

    // Clamp l to [-32, 31], then add 32 to shift it into the unsigned 6-bit range [0, 63].
    l = MAX(-32, MIN(31, l)) + 32;

    // For the first 8 scales (j = 0..7), store the lower 4 bits of l in the low nibble of y[i].scales[j].
    if (j < 8) {
        y[i].scales[j] = l & 0xF;
    } 
    // For scales 8..15, store the lower 4 bits of l in the high nibble of y[i].scales[j-8].
    else {
        y[i].scales[j-8] |= ((l & 0xF) << 4);
    }

    // Discard the lower 4 bits; since l <= 63, only its top 2 bits remain.
    l >>= 4;

    // Pack the remaining 2 bits into bytes 8..11 of y[i].scales: byte (j % 4) + 8, at bit
    // offset 2 * (j / 4). Each of these four bytes thus collects the top 2 bits of four scales.
    y[i].scales[j % 4 + 8] |= (l << (2 * (j / 4)));
}
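As a sanity check, the bit layout can be exercised in isolation. The sketch below is a self-contained approximation, not the original llama.cpp code; `pack6` and `unpack6` are hypothetical names. It packs sixteen 6-bit values into twelve bytes using the same nibble/2-bit layout as the loop above, then reverses the process:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack 16 six-bit values (each in [0, 63]) into 12 bytes:
   low 4 bits go into the low/high nibbles of bytes 0..7,
   top 2 bits go into bytes 8..11, four values per byte. */
static void pack6(const uint8_t vals[16], uint8_t packed[12]) {
    memset(packed, 0, 12);
    for (int j = 0; j < 16; ++j) {
        uint8_t l = vals[j];                     /* assumed in [0, 63] */
        if (j < 8) packed[j]      = l & 0xF;     /* low nibble */
        else       packed[j - 8] |= (uint8_t)((l & 0xF) << 4); /* high nibble */
        l >>= 4;                                 /* top 2 bits remain */
        packed[j % 4 + 8] |= (uint8_t)(l << (2 * (j / 4)));
    }
}

/* Recover the j-th 6-bit value by reuniting its nibble and its 2 high bits. */
static uint8_t unpack6(const uint8_t packed[12], int j) {
    uint8_t lo = (j < 8) ? (uint8_t)(packed[j] & 0xF)
                         : (uint8_t)(packed[j - 8] >> 4);
    uint8_t hi = (uint8_t)((packed[j % 4 + 8] >> (2 * (j / 4))) & 0x3);
    return (uint8_t)(lo | (hi << 4));
}
```

A round trip over arbitrary 6-bit inputs confirms the layout is lossless: packing followed by unpacking returns every original value.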

The key aspects of this code include:

  • Scaling and Normalization: Adjusts the data values to a suitable range for bit manipulation.
  • Bitwise Operations: Utilizes masking (&), shifting (<<, >>), and bitwise OR (|=) to pack data efficiently.
  • Data Optimization: Sixteen 6-bit scales occupy 12 bytes instead of 16, reducing memory use and potentially speeding up processing.

This approach is particularly useful in scenarios where memory optimization is crucial, such as in embedded systems or when dealing with large datasets.
