EADST

llama.cpp: Efficient 6-bit Data Packing in an 8-bit Array

This code snippet, adapted from llama.cpp by ggerganov, shows how sixteen 6-bit values are packed into 12 bytes of a uint8 array. Each floating-point scale is multiplied by an inverse scale factor, rounded, clamped to [-32, 31], and shifted to [0, 63]. The low 4 bits and the high 2 bits of each result are then stored separately with bitwise operations, so no bits of the array are wasted.

// Inverse scale factor: maps the scale with the largest magnitude (max_scale) to -32.
float iscale = -32.f/max_scale;
// QK_K = 256, so the loop visits QK_K/16 = 16 scales.
// Note: the |= writes into bytes 8-11 below assume those bytes were zeroed before this loop (not shown in this snippet).
for (int j = 0; j < QK_K/16; ++j) {
    // Scale the j-th entry and round it to the nearest integer.
    int8_t l = nearest_int(iscale * scales[j]);

    // Clamp l to [-32, 31], then add 32 to obtain an unsigned 6-bit value in [0, 63].
    l = MAX(-32, MIN(31, l)) + 32;

    // Scales 0-7: store the low 4 bits of l in the low nibble of y[i].scales[j].
    if (j < 8) {
        y[i].scales[j] = l & 0xF;
    }
    // Scales 8-15: store the low 4 bits of l in the high nibble of y[i].scales[j-8].
    else {
        y[i].scales[j-8] |= ((l & 0xF) << 4);
    }

    // Drop the low 4 bits; l now holds only the top 2 bits of the 6-bit value.
    l >>= 4;

    // Store those 2 bits in bytes 8-11: j % 4 selects the byte,
    // and j / 4 selects the 2-bit slot (bit offset 2 * (j / 4)) within it.
    y[i].scales[j % 4 + 8] |= (l << (2 * (j / 4)));
}
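
To check that the layout is reversible, the packing step can be isolated and paired with a matching unpack routine. The following is a minimal sketch, not code from llama.cpp: the names pack_scales_6bit and unpack_scale_6bit are hypothetical, the input is assumed to be already normalized to [0, 63], and the output buffer is zeroed explicitly because the high-bit bytes are built up with |=.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { N_SCALES = 16, N_PACKED = 12 };

/* Pack 16 values in [0, 63] into 12 bytes using the same layout as the loop above.
 * Hypothetical helper for illustration only. */
static void pack_scales_6bit(const uint8_t in[N_SCALES], uint8_t out[N_PACKED]) {
    memset(out, 0, N_PACKED);  /* required: bytes 8-11 are filled with |= */
    for (int j = 0; j < N_SCALES; ++j) {
        uint8_t l = in[j];
        assert(l < 64);  /* each value must fit in 6 bits */
        if (j < 8) {
            out[j] = l & 0xF;                             /* low nibble */
        } else {
            out[j - 8] |= (uint8_t)((l & 0xF) << 4);      /* high nibble */
        }
        /* top 2 bits go to byte 8 + j % 4, at bit offset 2 * (j / 4) */
        out[j % 4 + 8] |= (uint8_t)((l >> 4) << (2 * (j / 4)));
    }
}

/* Recover the j-th 6-bit value from the packed bytes. */
static uint8_t unpack_scale_6bit(const uint8_t packed[N_PACKED], int j) {
    uint8_t low  = (j < 8) ? (uint8_t)(packed[j] & 0xF) : (uint8_t)(packed[j - 8] >> 4);
    uint8_t high = (uint8_t)((packed[j % 4 + 8] >> (2 * (j / 4))) & 0x3);
    return (uint8_t)(low | (high << 4));
}
```

Packing any 16 values below 64 and unpacking them index by index should return the original values, which confirms that the nibble and 2-bit slots never overlap.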

The key aspects of this code include:

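  • Scaling and Normalization: Multiplying by iscale = -32/max_scale maps the scale with the largest magnitude to -32; clamping to [-32, 31] and adding 32 then yields an unsigned 6-bit value in [0, 63].
  • Bitwise Operations: Uses masking (&), shifting (<<, >>), and bitwise OR (|=) to place the low 4 bits and the high 2 bits of each value in separate bytes.
  • Data Optimization: Sixteen 6-bit values fit in 12 bytes instead of the 16 bytes needed at one byte per value, a 25% saving in the space used by the scales.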
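
The scaling and normalization step can also be tried in isolation. The sketch below is an illustration under stated assumptions: quantize_scale_6bit, dequantize_scale_6bit, and round_to_int are hypothetical names, round_to_int rounds halves away from zero as a stand-in for nearest_int (whose exact rounding behaviour is not shown in the snippet), and the dequantization is simply the algebraic inverse of the mapping rather than code taken from llama.cpp.

```c
#include <assert.h>
#include <stdint.h>

/* Round to the nearest integer, halves away from zero.
 * Stand-in for nearest_int; the real helper may round differently. */
static int round_to_int(float x) {
    return (int)(x + (x >= 0.0f ? 0.5f : -0.5f));
}

/* Map a float scale to an unsigned 6-bit code in [0, 63]. */
static uint8_t quantize_scale_6bit(float scale, float max_scale) {
    assert(max_scale != 0.0f);           /* avoid division by zero */
    float iscale = -32.0f / max_scale;   /* same factor as in the loop above */
    int l = round_to_int(iscale * scale);
    if (l < -32) l = -32;                /* clamp to [-32, 31] */
    if (l > 31)  l = 31;
    return (uint8_t)(l + 32);            /* shift to [0, 63] */
}

/* Approximate inverse: recover a float from the 6-bit code. */
static float dequantize_scale_6bit(uint8_t code, float max_scale) {
    float d = -max_scale / 32.0f;        /* equals 1 / iscale */
    return d * (float)((int)code - 32);
}
```

With max_scale = 2.0, a scale of 2.0 maps to code 0, 1.0 maps to code 16, and -2.0 would map to +32 before clamping, so it is stored as 63. The clamp therefore introduces a small error at that end of the range.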

This approach is particularly useful in scenarios where memory optimization is crucial, such as in embedded systems or when dealing with large datasets.
