EADST

llama.cpp: Efficient 6-bit Data Packing in an 8-bit Array

This code snippet, adapted from llama.cpp by ggerganov, shows how to pack sixteen 6-bit values into a 12-byte uint8 array. Each scale is multiplied by an inverse scale factor, rounded, clamped to [-32, 31], and offset to [0, 63]. Bitwise operations then split each 6-bit result into a 4-bit piece and a 2-bit piece, so that no bits in the array are wasted.

// Inverse scale factor: maps max_scale to -32, the low end of the signed 6-bit range. max_scale must be nonzero.
float iscale = -32.f/max_scale;
// QK_K = 256, so the loop visits all QK_K/16 = 16 scales.
// y[i].scales (12 bytes) must be zeroed beforehand, because the |= operations below only add bits.
for (int j = 0; j < QK_K/16; ++j) {
    // Scale and round the j-th element of the scales array to the nearest integer.
    int8_t l = nearest_int(iscale * scales[j]);

    // Clamp the value of l to the range [-32, 31] and normalize it to [0, 63].
    l = MAX(-32, MIN(31, l)) + 32;

    // For the first 8 scales (j = 0..7), store the low 4 bits of l in the low nibble of y[i].scales[j].
    if (j < 8) {
        y[i].scales[j] = l & 0xF;
    } 
    // For the last 8 scales (j = 8..15), store the low 4 bits of l in the high nibble of y[i].scales[j-8].
    else {
        y[i].scales[j-8] |= ((l & 0xF) << 4);
    }

    // Drop the low 4 bits; l now holds only the top 2 bits of the 6-bit value.
    l >>= 4;

    // Store those 2 bits in one of bytes 8..11 of y[i].scales.
    // The byte is selected by j % 4 + 8 and the bit offset inside it by 2 * (j / 4), so each of these bytes holds the top 2 bits of four scales.
    y[i].scales[j % 4 + 8] |= (l << (2 * (j / 4)));
}
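
To experiment with the layout outside of llama.cpp, the loop can be pulled into a standalone function. The following is a minimal sketch, not llama.cpp's own code: the names quantize_scale, pack_scales, and N_SCALES are hypothetical, a plain array stands in for y[i].scales, and a simple round-half-away-from-zero cast stands in for nearest_int.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_SCALES 16 /* QK_K / 16 with QK_K = 256 */

/* Hypothetical helper: map one float scale to a 6-bit code in [0, 63]. */
static uint8_t quantize_scale(float scale, float max_scale) {
    assert(max_scale != 0.0f);                 /* iscale would be undefined otherwise */
    float iscale = -32.0f / max_scale;
    float x = iscale * scale;
    /* Assumption: simple rounding used here in place of nearest_int. */
    int l = (int)(x >= 0.0f ? x + 0.5f : x - 0.5f);
    if (l < -32) l = -32;                      /* clamp to the signed 6-bit range */
    if (l > 31) l = 31;
    return (uint8_t)(l + 32);                  /* offset to [0, 63] */
}

/* Hypothetical helper: pack 16 six-bit codes into 12 bytes.
 * Bytes 0..7 hold the low 4 bits of two codes each (j and j + 8).
 * Bytes 8..11 hold the top 2 bits of four codes each. */
static void pack_scales(const uint8_t codes[N_SCALES], uint8_t packed[12]) {
    memset(packed, 0, 12);                     /* bytes 8..11 are only ever ORed into */
    for (int j = 0; j < N_SCALES; ++j) {
        uint8_t l = codes[j] & 0x3F;           /* keep 6 bits */
        if (j < 8) {
            packed[j] = l & 0xF;
        } else {
            packed[j - 8] |= (uint8_t)((l & 0xF) << 4);
        }
        packed[j % 4 + 8] |= (uint8_t)((l >> 4) << (2 * (j / 4)));
    }
}
```

With max_scale = 2.0, a scale of 2.0 maps to code 0, a scale of 0.0 to code 32, and a scale of -2.0 to code 63 (after clamping 32 down to 31).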

The key aspects of this code include:

  • Scaling and Normalization: Multiplies each scale by iscale, rounds it, clamps it to [-32, 31], and adds 32 so that every value fits in 6 unsigned bits.
  • Bitwise Operations: Utilizes masking (&), shifting (<<, >>), and bitwise OR (|=) to pack data efficiently.
  • Data Optimization: Sixteen 6-bit values occupy 12 bytes instead of 16, a 25% saving over storing one value per byte.
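
Reading a value back reverses the same masks and shifts. The sketch below is a straightforward per-element inverse written for illustration, not the routine llama.cpp itself uses when dequantizing; unpack_scale and scale_code_to_signed are hypothetical names, and packed stands in for y[i].scales.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: recover 6-bit code j (0..15) from the 12 packed bytes. */
static uint8_t unpack_scale(const uint8_t packed[12], int j) {
    assert(j >= 0 && j < 16);
    /* Low 4 bits: low nibble of byte j for j < 8, high nibble of byte j - 8 otherwise. */
    uint8_t lo = (j < 8) ? (uint8_t)(packed[j] & 0xF) : (uint8_t)(packed[j - 8] >> 4);
    /* Top 2 bits: 2-bit field number j / 4 inside byte 8 + j % 4. */
    uint8_t hi = (uint8_t)((packed[8 + j % 4] >> (2 * (j / 4))) & 0x3);
    return (uint8_t)(lo | (hi << 4));
}

/* Undo the +32 offset to get the signed value in [-32, 31] back. */
static int scale_code_to_signed(uint8_t code) {
    return (int)code - 32;
}
```

Extracting a single value touches two bytes: one for the nibble and one for the 2-bit field.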

This approach is particularly useful in scenarios where memory optimization is crucial, such as in embedded systems or when dealing with large datasets.
