EADST

llama.cpp: Efficient 6-bit Data Packing in an 8-bit Array

This code snippet, adapted from llama.cpp by ggerganov, shows how to pack sixteen 6-bit values into a 12-byte uint8 array (16 × 6 bits = 96 bits = 12 bytes). Each value is scaled, rounded, and clamped to the signed 6-bit range, then split in two: its low 4 bits share a byte with the low 4 bits of one other value, and its high 2 bits share a byte with the high 2 bits of three other values.

// Inverse scale factor: chosen so that max_scale maps to -32, the low end of the signed 6-bit range [-32, 31].
float iscale = -32.f/max_scale;
// QK_K = 256, so QK_K/16 = 16: the loop visits 16 scales, one per iteration.
for (int j = 0; j < QK_K/16; ++j) {
    // Scale and round the j-th element of the scales array to the nearest integer.
    int8_t l = nearest_int(iscale * scales[j]);

    // Clamp l to the signed 6-bit range [-32, 31], then add 32 to shift it into the unsigned range [0, 63].
    l = MAX(-32, MIN(31, l)) + 32;

    // Scales 0-7: store the low 4 bits of l in the low nibble of y[i].scales[j].
    if (j < 8) {
        y[i].scales[j] = l & 0xF;
    }
    // Scales 8-15: store the low 4 bits of l in the high nibble of y[i].scales[j-8].
    else {
        y[i].scales[j-8] |= ((l & 0xF) << 4);
    }

    // Drop the low 4 bits. Since l <= 63, what remains is the top 2 bits of the 6-bit value (0..3).
    l >>= 4;

    // Store those top 2 bits in bytes 8..11 of y[i].scales.
    // Byte j % 4 + 8 holds the top bits of scales j % 4, j % 4 + 4, j % 4 + 8 and j % 4 + 12,
    // at bit offsets 0, 2, 4 and 6 (2 * (j / 4)). Because of |=, bytes 8..11 must be zero before the loop.
    y[i].scales[j % 4 + 8] |= (l << (2 * (j / 4)));
}

The key aspects of this code include:

  • Scaling and Normalization: Multiplying by iscale, rounding, and clamping maps each scale to an integer in [-32, 31]; adding 32 turns it into an unsigned 6-bit value in [0, 63].
  • Bitwise Operations: Masking (&), shifting (<<, >>), and bitwise OR (|=) split each 6-bit value into a 4-bit part and a 2-bit part and merge those parts into shared bytes.
  • Data Optimization: Sixteen 6-bit values occupy 12 bytes instead of 16, a 25% saving with no unused bits.
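
The scaling and clamping step can also be sketched in isolation. This is a rough illustration under stated assumptions: quantize_scale_6bit, dequantize_scale_6bit, and round_nearest are hypothetical names, round_nearest is a simple stand-in for llama.cpp's nearest_int (its tie-breaking may differ), and the reconstruction formula (q - 32) / iscale is inferred from the definition of iscale rather than taken from the original code.

```c
#include <stdint.h>

/* Stand-in for nearest_int: rounds half away from zero (assumption). */
static int round_nearest(float x) {
    return (int)(x + (x >= 0.0f ? 0.5f : -0.5f));
}

/* Map a float scale to an unsigned 6-bit code in [0, 63].
 * iscale is expected to be -32.f / max_scale, as in the snippet above. */
static uint8_t quantize_scale_6bit(float scale, float iscale) {
    int l = round_nearest(iscale * scale);
    if (l < -32) l = -32;   /* clamp to the signed 6-bit range */
    if (l >  31) l =  31;
    return (uint8_t)(l + 32);
}

/* Approximate inverse of the mapping above (assumed, not from the source). */
static float dequantize_scale_6bit(uint8_t q, float iscale) {
    return (float)((int)q - 32) / iscale;
}
```

For example, with max_scale = 2.0f the factor is iscale = -16.0f, so a scale of 2.0f becomes code 0 and a scale of -1.0f becomes code 48. A scale of -2.0f would map to 32 before clamping, so it is clamped to 31 and stored as 63, which shows that the range is not symmetric around zero.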

This approach is particularly useful where memory is tight, such as in embedded systems or when storing large numbers of quantized values, because the packed layout wastes no bits and can be decoded with a few shifts and masks.
