llama.cpp: Efficient 6-bit Data Packing in an 8-bit Array

This code snippet, adapted from llama.cpp by ggerganov, packs 16 six-bit values into 12 bytes of a uint8 array: the low 4 bits of each value fill the low or high nibble of the first 8 bytes, and the top 2 bits are packed four per byte into the last 4 bytes. Each value is first scaled, rounded, and clamped into the 6-bit range, giving a compact representation suited to memory-constrained processing.

// Compute the inverse scale factor: -32.f maps max_scale to the minimum of the signed 6-bit range.
float iscale = -32.f/max_scale;
// QK_K = 256, so the loop runs QK_K/16 = 16 times, once per scale value.
for (int j = 0; j < QK_K/16; ++j) {
    // Scale the j-th element of the scales array and round to the nearest integer.
    int8_t l = nearest_int(iscale * scales[j]);

    // Clamp l to the signed 6-bit range [-32, 31], then add 32 to shift it into [0, 63].
    l = MAX(-32, MIN(31, l)) + 32;

    // For the first 8 values, store the lower 4 bits of l in the low nibble of y[i].scales[j].
    if (j < 8) {
        y[i].scales[j] = l & 0xF;
    }
    // For the last 8 values, store the lower 4 bits of l in the high nibble of y[i].scales[j-8].
    else {
        y[i].scales[j-8] |= ((l & 0xF) << 4);
    }

    // Shift out the 4 bits just stored, leaving only the top 2 bits of l.
    l >>= 4;

    // Pack the remaining 2 bits into bytes 8-11: each of those bytes holds four 2-bit fields,
    // with j % 4 selecting the byte and 2 * (j / 4) the bit offset within it.
    y[i].scales[j % 4 + 8] |= (l << (2 * (j / 4)));
}
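
To make the bit layout concrete, here is a sketch of the inverse operation: recovering the j-th six-bit value from the 12 packed bytes. The helper name get_scale_6bit is hypothetical, for illustration only, and is not a function in llama.cpp.

#include <stdint.h>

// Hypothetical helper (not from llama.cpp): recover the j-th 6-bit value
// from the 12-byte layout produced by the loop above.
static inline uint8_t get_scale_6bit(const uint8_t * scales, int j) {
    // Low 4 bits: low nibble of bytes 0-7 for j < 8, high nibble for j >= 8.
    uint8_t lo = j < 8 ? (scales[j] & 0xF) : (scales[j-8] >> 4);
    // Top 2 bits: four 2-bit fields per byte in bytes 8-11.
    uint8_t hi = (scales[j % 4 + 8] >> (2 * (j / 4))) & 0x3;
    // Recombine into the original value in [0, 63].
    return lo | (hi << 4);
}

Subtracting 32 from the returned value restores the signed range [-32, 31] that the values occupied before packing.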

The key aspects of this code include:

  • Scaling and Normalization: Maps each value into the signed 6-bit range [-32, 31], then shifts it by 32 into [0, 63] for unsigned storage.
  • Bitwise Operations: Uses masking (&), shifting (<<, >>), and bitwise OR (|=) to split each 6-bit value across nibble and 2-bit fields.
  • Data Optimization: Packs 16 six-bit values into 12 bytes instead of 16, reducing memory use and bandwidth; a self-contained sketch of the loop follows below.
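
As a concrete illustration of these points, here is a minimal, runnable sketch of the packing loop. The stand-ins are assumptions rather than llama.cpp code: nearest_int is approximated with roundf, MAX/MIN are plain macros, the example scale values are arbitrary, and a local packed array replaces y[i].scales.

#include <stdio.h>
#include <stdint.h>
#include <math.h>

#define QK_K 256
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define MIN(a, b) ((a) < (b) ? (a) : (b))

// Assumed stand-in for llama.cpp's nearest_int: round to the nearest integer.
static inline int nearest_int(float f) {
    return (int) roundf(f);
}

int main(void) {
    float scales[QK_K/16];     // 16 block scales to pack (example values)
    uint8_t packed[12] = {0};  // 16 x 6 bits = 96 bits = 12 bytes

    float max_scale = 8.0f;
    for (int j = 0; j < QK_K/16; ++j) scales[j] = max_scale - 0.5f * j;

    // The packing loop from above, with a local array in place of y[i].scales.
    float iscale = -32.f/max_scale;
    for (int j = 0; j < QK_K/16; ++j) {
        int8_t l = nearest_int(iscale * scales[j]);
        l = MAX(-32, MIN(31, l)) + 32;
        if (j < 8) packed[j] = l & 0xF;
        else       packed[j-8] |= ((l & 0xF) << 4);
        l >>= 4;
        packed[j % 4 + 8] |= (l << (2 * (j / 4)));
    }

    // Print the 12 packed bytes in hex.
    for (int j = 0; j < 12; ++j) printf("%02x ", packed[j]);
    printf("\n");
    return 0;
}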

This approach is particularly useful where memory is at a premium, such as quantized model weights in llama.cpp, embedded systems, or large datasets.
