llama.cpp: Definitions of Q2_K, Q3_K, Q4_K, Q5_K, Q6_K, and Q8_K Structures
Author: XD / Published: January 25, 2024, 01:05 / Programming Notes / Views: 1328
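To give a concrete picture of the structures named in the title, here is a sketch of two of them, Q2_K and Q8_K, expressed as NumPy structured dtypes that mirror the C structs in llama.cpp's k_quants.h as of early 2024; field names and sizes follow those C definitions, but the exact layout may differ between llama.cpp versions.

```python
import numpy as np

QK_K = 256  # k-quant super-block size in llama.cpp

# block_q2_K: 2-bit quants; each 16-element sub-block has a 4-bit scale
# and a 4-bit min packed into one byte of `scales`.
block_q2_K = np.dtype([
    ("scales", np.uint8, QK_K // 16),  # 16 bytes: 4-bit scale + 4-bit min per sub-block
    ("qs",     np.uint8, QK_K // 4),   # 64 bytes: 2-bit quants, four per byte
    ("d",      np.float16),            # super-block scale for the quantized scales
    ("dmin",   np.float16),            # super-block scale for the quantized mins
])

# block_q8_K: plain 8-bit quants with a float scale and per-sub-block sums.
block_q8_K = np.dtype([
    ("d",     np.float32),             # delta (scale)
    ("qs",    np.int8,  QK_K),         # 256 signed 8-bit quants
    ("bsums", np.int16, QK_K // 16),   # sums of quants over groups of 16
])

print(block_q2_K.itemsize)  # 84 bytes per 256 weights (~2.6 bits per weight)
print(block_q8_K.itemsize)  # 292 bytes per 256 weights
```

Each super-block covers QK_K = 256 weights; the remaining K-quant types (Q3_K, Q4_K, Q5_K, Q6_K) follow the same pattern with different packings of quants, scales, and high bits.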
Related posts:
llama.cpp: Efficient 6-bit Data Packing in an 8-bit Array
PyTorch Q4_1 Quantize and Dequantize Aligning with llama.cpp
PyTorch Q4_0 Quantize and Dequantize Aligning with llama.cpp
QWEN7B to LLAMA GPTQ Model Structure
QWEN7B to LLAMA7B Model Structure
GGML Q4_0 Quantize Analysis in llama.cpp
Save the LLAMA Model with LoRA to One Model
Save Hugging Face Model with One Bin
max_shard_size (int or str, optional, defaults to "10GB") — Only applicable for models. The maximum size for a checkpoint before it is sharded; each resulting shard will be smaller than this size. If expressed as a string, it must be a number followed by a unit (like "5MB"). See the usage sketch after this list.
LLAMA Model Save with INT8 Format
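As a minimal sketch of how the max_shard_size parameter of Hugging Face's save_pretrained() is used (the model path and output directory below are placeholders):

```python
from transformers import AutoModelForCausalLM

# max_shard_size controls when the checkpoint is split into multiple files.
# A value larger than the model itself (e.g. "100GB") keeps the whole
# checkpoint in a single bin file. Paths here are placeholders.
model = AutoModelForCausalLM.from_pretrained("path/to/llama")
model.save_pretrained("path/to/output", max_shard_size="100GB")
```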