EADST

llama.cpp: Efficient 6-bit Data Packing in an 8-bit Array

This code snippet, adapted from llama.cpp by ggerganov, shows how sixteen 6-bit values can be packed into a 12-byte uint8 array. Each floating-point scale is scaled, rounded, and clamped to 6 bits, then split with bitwise operations so that every bit of the output bytes is used.

// Inverse scale factor: maps max_scale to -32, the lower end of the signed 6-bit range [-32, 31].
float iscale = -32.f/max_scale;
// QK_K = 256, so the loop visits all QK_K/16 = 16 entries of the scales array.
for (int j = 0; j < QK_K/16; ++j) {
    // Scale and round the j-th element of the scales array to the nearest integer.
    int8_t l = nearest_int(iscale * scales[j]);

    // Clamp l to the signed 6-bit range [-32, 31], then add 32 to shift it to [0, 63].
    l = MAX(-32, MIN(31, l)) + 32;

    // For scales 0-7, store the lower 4 bits of l in the low nibble of y[i].scales[j].
    // The plain assignment also clears the high nibble of that byte.
    if (j < 8) {
        y[i].scales[j] = l & 0xF;
    }
    // For scales 8-15, store the lower 4 bits of l in the high nibble of y[i].scales[j-8].
    else {
        y[i].scales[j-8] |= ((l & 0xF) << 4);
    }

    // Shift right by 4 so that l holds only the upper 2 bits of the 6-bit value.
    l >>= 4;

    // OR those 2 bits into one of the bytes y[i].scales[8..11]: j % 4 selects the byte,
    // and 2 * (j / 4) selects the bit offset (0, 2, 4, or 6) within it.
    // These four bytes are only ever written with |=, so they must be zero before the loop starts.
    y[i].scales[j % 4 + 8] |= (l << (2 * (j / 4)));
}
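
To make the byte layout easier to check, here is a minimal, self-contained sketch of the same packing together with its inverse. The names pack_6bit, unpack_6bit, N_SCALES, and N_PACKED are hypothetical and do not come from llama.cpp; the input is assumed to be already clamped to [0, 63], and the unpacking routine is simply the bit layout above read backwards, not a copy of the library's own dequantization code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_SCALES 16 /* QK_K/16 with QK_K = 256 */
#define N_PACKED 12 /* 16 values * 6 bits = 96 bits = 12 bytes */

/* Pack 16 values in [0, 63] into 12 bytes (hypothetical helper).
 *   bytes 0..7 : low nibble  = lower 4 bits of value j
 *                high nibble = lower 4 bits of value j + 8
 *   bytes 8..11: four 2-bit fields holding the upper 2 bits of
 *                values k, k + 4, k + 8, k + 12 (k = byte index - 8)
 */
static void pack_6bit(const uint8_t in[N_SCALES], uint8_t out[N_PACKED]) {
    memset(out, 0, N_PACKED); /* needed because fields are merged with |= */
    for (int j = 0; j < N_SCALES; ++j) {
        assert(in[j] < 64); /* caller must pass 6-bit values */
        uint8_t lo = in[j] & 0xF; /* lower 4 bits */
        uint8_t hi = in[j] >> 4;  /* upper 2 bits */
        if (j < 8) {
            out[j] |= lo;
        } else {
            out[j - 8] |= (uint8_t)(lo << 4);
        }
        out[j % 4 + 8] |= (uint8_t)(hi << (2 * (j / 4)));
    }
}

/* Inverse of pack_6bit: recover the 16 values from the 12 packed bytes. */
static void unpack_6bit(const uint8_t in[N_PACKED], uint8_t out[N_SCALES]) {
    for (int j = 0; j < N_SCALES; ++j) {
        uint8_t lo = (j < 8) ? (uint8_t)(in[j] & 0xF) : (uint8_t)(in[j - 8] >> 4);
        uint8_t hi = (uint8_t)((in[j % 4 + 8] >> (2 * (j / 4))) & 0x3);
        out[j] = (uint8_t)(lo | (hi << 4));
    }
}
```

Packing any 16 in-range values and unpacking them again should give back the original array, which is a quick way to confirm that no bits overlap. The explicit memset plays the role of whatever zero-initialization the original code relies on before its loop.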

The key aspects of this code include:

  • Scaling and Normalization: Multiplies each scale by iscale, rounds it, clamps it to [-32, 31], and shifts it to the unsigned range [0, 63] so that it fits in 6 bits.
  • Bitwise Operations: Utilizes masking (&), shifting (<<, >>), and bitwise OR (|=) to split each value into a 4-bit and a 2-bit part and pack them.
  • Data Optimization: Sixteen 6-bit values occupy 12 bytes instead of the 16 bytes they would need at one byte each, a 25% saving in memory.
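
The scaling and normalization step from the first bullet can be sketched on its own as follows. The helpers quantize_scale, dequantize_scale, and round_nearest are hypothetical names, not llama.cpp functions; round_nearest is a simple stand-in for nearest_int and may treat exact ties differently, and max_scale is assumed to be nonzero. The dequantization shown is just the algebraic inverse of the mapping, given as an assumption about how such codes would be decoded.

```c
#include <stdint.h>

/* Stand-in for nearest_int (assumption): round half away from zero. */
static int round_nearest(float x) {
    return (int)(x >= 0.0f ? x + 0.5f : x - 0.5f);
}

/* Map one float scale to a 6-bit code in [0, 63].
 * max_scale must be nonzero; it is mapped to the code 0 (that is, -32 + 32). */
static uint8_t quantize_scale(float scale, float max_scale) {
    float iscale = -32.f / max_scale;
    int l = round_nearest(iscale * scale);
    if (l < -32) l = -32; /* clamp to the signed 6-bit range */
    if (l > 31) l = 31;
    return (uint8_t)(l + 32); /* shift to [0, 63] */
}

/* Approximate inverse: undo the +32 offset and multiply by 1/iscale. */
static float dequantize_scale(uint8_t code, float max_scale) {
    float d = max_scale / -32.f; /* equals 1/iscale */
    return d * (float)((int)code - 32);
}
```

With max_scale = 2.0f, a scale of 1.0f becomes the code 16 and decodes back to 1.0f, while a scale of -2.0f would map to 32 and is clamped to 31 (code 63), so the extreme of opposite sign loses a little precision.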

This approach is particularly useful in scenarios where memory optimization is crucial, such as in embedded systems or when dealing with large datasets.
