EADST

llama.cpp: Efficient 6-bit Data Packing in an 8-bit Array

This code snippet, adapted from llama.cpp by ggerganov, demonstrates how 16 six-bit quantization scale values are packed into 12 bytes of a uint8 array. Each scale is quantized, clamped to the 6-bit range, and then split across the array with bitwise operations, saving memory compared with storing one byte per scale.

// Inverse scale: chosen so that the largest-magnitude scale (max_scale) maps to -32,
// the most negative value representable in 6 signed bits.
float iscale = -32.f/max_scale;
// QK_K = 256, so this loop runs QK_K/16 = 16 times, once per 16-element sub-block.
for (int j = 0; j < QK_K/16; ++j) {
    // Quantize the j-th scale: multiply by the inverse scale and round to the nearest integer.
    int8_t l = nearest_int(iscale * scales[j]);

    // Clamp l to the signed 6-bit range [-32, 31], then offset into the unsigned range [0, 63].
    l = MAX(-32, MIN(31, l)) + 32;

    // First 8 scales: store the low 4 bits of l in the low nibble of y[i].scales[j].
    if (j < 8) {
        y[i].scales[j] = l & 0xF;
    }
    // Remaining 8 scales: store the low 4 bits of l in the high nibble of y[i].scales[j-8].
    else {
        y[i].scales[j-8] |= ((l & 0xF) << 4);
    }

    // Discard the 4 bits just stored, leaving only the top 2 bits of the 6-bit value.
    l >>= 4;

    // Pack those 2 bits into bytes 8..11: byte index j % 4 + 8, at bit offset 2 * (j / 4).
    // Each of these four bytes collects the high bits of four different scales.
    y[i].scales[j % 4 + 8] |= (l << (2 * (j / 4)));
}

The key aspects of this code include:

  • Scaling and Normalization: maps each floating-point scale into the signed 6-bit range [-32, 31], then offsets it to [0, 63] for unsigned storage.
  • Bitwise Operations: uses masking (&), shifting (<<, >>), and bitwise OR (|=) to split each 6-bit value across nibbles and 2-bit fields.
  • Data Packing: 16 six-bit scales occupy 12 bytes (96 bits) instead of 16, a 25% saving over storing one byte per scale.

This approach is particularly useful in scenarios where memory optimization is crucial, such as in embedded systems or when dealing with large datasets.
