Merge Safetensors to Bin File
Author: XD / Published: February 6, 2024 / Programming Notes / Views: 2300
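A sharded Hugging Face checkpoint usually ships as several safetensors files (typically named like `model-00001-of-00002.safetensors`), while some older tooling still expects a single `pytorch_model.bin`. Since each safetensors shard is just a flat mapping of tensor names to tensors, merging comes down to loading every shard into one state dict and saving it with `torch.save`. Below is a minimal sketch; it assumes the shards sit in the current directory and that the merged checkpoint fits in host memory.

```python
import glob

import torch
from safetensors.torch import load_file

merged_state_dict = {}

# Shards are typically named like model-00001-of-00002.safetensors.
# sorted() keeps the iteration order deterministic, though the final
# dict is the same either way because tensor names are unique across shards.
for shard_path in sorted(glob.glob("*.safetensors")):
    # load_file returns an ordinary dict of tensor name -> torch.Tensor
    merged_state_dict.update(load_file(shard_path))

# Save in PyTorch's pickle-based format expected by .bin loaders
torch.save(merged_state_dict, "pytorch_model.bin")
```

If the checkpoint also ships a `model.safetensors.index.json`, its `weight_map` lists which shard owns each tensor, so you can cross-check that the merged dict contains every expected key. Reloading the result with `torch.load` before deleting the shards is cheap insurance.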