EADST

llama.cpp: Efficient 6-bit Data Packing in an 8-bit Array

This code snippet, adapted from ggerganov's llama.cpp, demonstrates how sixteen 6-bit quantization scales are packed into a 12-byte uint8_t array. It combines scaling, clamping, and bitwise manipulation to store the scales with no wasted bits, which keeps quantized model blocks compact in memory.

// Compute the inverse scale factor that maps max_scale (the largest-magnitude scale) to -32, the most negative 6-bit value.
float iscale = -32.f/max_scale;
// QK_K = 256, so this loop runs over all QK_K/16 = 16 block scales.
for (int j = 0; j < QK_K/16; ++j) {
    // Multiply the j-th scale by iscale and round to the nearest integer.
    int8_t l = nearest_int(iscale * scales[j]);

    // Clamp l to the signed 6-bit range [-32, 31], then add 32 to shift it into [0, 63].
    l = MAX(-32, MIN(31, l)) + 32;

    // For scales 0-7, store the lower 4 bits of l in the low nibble of y[i].scales[j].
    if (j < 8) {
        y[i].scales[j] = l & 0xF;
    } 
    // For scales 8-15, store the lower 4 bits of l in the high nibble of y[i].scales[j-8].
    else {
        y[i].scales[j-8] |= ((l & 0xF) << 4);
    }

    // Shift out the low nibble, leaving only the top 2 bits of the 6-bit value.
    l >>= 4;

    // Pack those 2 bits into the upper four bytes y[i].scales[8..11]:
    // byte j % 4 + 8 holds the 2-bit fields of scales j, j+4, j+8, and j+12, at bit offset 2 * (j / 4).
    y[i].scales[j % 4 + 8] |= (l << (2 * (j / 4)));
}

The key aspects of this code include:

  • Scaling and Normalization: Maps each float scale to an unsigned 6-bit integer in [0, 63] via iscale, rounding, clamping, and a +32 offset.
  • Bitwise Operations: Utilizes masking (&), shifting (<<, >>), and bitwise OR (|=) to pack data efficiently.
  • Data Optimization: Sixteen 6-bit scales occupy 16 × 6 = 96 bits, i.e., 12 bytes instead of 16, saving memory and bandwidth in every quantized block.

This approach is particularly useful in scenarios where memory optimization is crucial, such as in embedded systems or when dealing with large datasets.
