EADST

llama.cpp: Efficient 6-bit Data Packing in an 8-bit Array

This code snippet, adapted from llama.cpp by ggerganov, demonstrates how 6-bit quantization scales are packed into a uint8_t array. Scaling, clamping, and bitwise manipulation compress each block's scales into fewer bytes, which saves memory and suits hardware-friendly data layouts.

// Inverse scale factor: maps max_scale to -32, the most negative signed 6-bit value.
float iscale = -32.f/max_scale;
// QK_K = 256, so QK_K/16 = 16 scales are packed per block.
for (int j = 0; j < QK_K/16; ++j) {
    // Quantize the j-th scale to the nearest integer.
    int8_t l = nearest_int(iscale * scales[j]);

    // Clamp l to the signed 6-bit range [-32, 31], then offset it into [0, 63].
    l = MAX(-32, MIN(31, l)) + 32;

    // Scales 0-7: store the lower 4 bits of l in the low nibble of y[i].scales[j].
    if (j < 8) {
        y[i].scales[j] = l & 0xF;
    }
    // Scales 8-15: store the lower 4 bits of l in the high nibble of y[i].scales[j-8].
    else {
        y[i].scales[j-8] |= ((l & 0xF) << 4);
    }

    // Drop the 4 bits just stored; l now holds only its top 2 bits (0..3).
    l >>= 4;

    // Pack the remaining 2 bits into bytes 8-11: byte j%4+8 collects four 2-bit
    // fields, and 2*(j/4) selects the field's bit offset within that byte.
    y[i].scales[j % 4 + 8] |= (l << (2 * (j / 4)));
}

The key aspects of this code include:

  • Scaling and Normalization: Adjusts the data values to a suitable range for bit manipulation.
  • Bitwise Operations: Utilizes masking (&), shifting (<<, >>), and bitwise OR (|=) to pack data efficiently.
  • Data Optimization: 16 six-bit scales occupy 12 bytes rather than 16, reducing memory footprint and bandwidth.

This approach is particularly useful in scenarios where memory optimization is crucial, such as in embedded systems or when dealing with large datasets.
