Gemma Model Structure Annotations
Author: XD / Published: March 12, 2024, 23:50 / Programming Notes
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Paper: https://arxiv.org/abs/2211.10438
Code: https://github.com/mit-han-lab/smoothquant
Organization: MIT
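SmoothQuant migrates quantization difficulty from activations to weights through a per-channel smoothing factor s_j = max|X_j|^α / max|W_j|^(1−α): activations are divided by s and s is folded into the following linear layer, which leaves the product mathematically unchanged. A minimal PyTorch sketch of the smoothing step (the function name is illustrative; α = 0.5 is the paper's default):

```python
import torch

def smooth(act_max: torch.Tensor, weight: torch.Tensor, alpha: float = 0.5):
    """Compute per-input-channel smoothing scales and fold them into the weight.

    act_max: per-channel max abs of the activation X, shape [in_features]
    weight:  the following linear layer's weight, shape [out_features, in_features]
    """
    w_max = weight.abs().amax(dim=0)  # per-input-channel weight magnitude
    s = (act_max.pow(alpha) / w_max.pow(1 - alpha)).clamp(min=1e-5)
    # At inference: X' = X / s (foldable into the preceding LayerNorm),
    # W' = W * s, so that X' @ W'.T == X @ W.T mathematically.
    smoothed_weight = weight * s
    return s, smoothed_weight
```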
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Paper: https://arxiv.org/abs/2306.00978
Code: https://github.com/mit-han-lab/llm-awq/
Organization: MIT
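AWQ is weight-only quantization: it protects salient weight channels by scaling them up according to activation magnitude before rounding, and finds the scale exponent with a small grid search against the float output. A rough sketch under those assumptions (the group size, grid resolution, and function names are illustrative, not the paper's exact settings):

```python
import torch

def pseudo_quantize(w: torch.Tensor, n_bits: int = 4, group_size: int = 128):
    """Simulated round-to-nearest asymmetric quantization per group (illustrative)."""
    assert w.numel() % group_size == 0
    orig_shape = w.shape
    w = w.reshape(-1, group_size)
    w_max, w_min = w.amax(dim=1, keepdim=True), w.amin(dim=1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-5) / (2**n_bits - 1)
    zero = (-w_min / scale).round()
    w_q = ((w / scale).round() + zero).clamp(0, 2**n_bits - 1)
    return ((w_q - zero) * scale).reshape(orig_shape)

def search_awq_scale(x: torch.Tensor, weight: torch.Tensor, n_grid: int = 20):
    """Grid-search the activation-aware scale exponent alpha in [0, 1)."""
    act_max = x.abs().amax(dim=0)       # per-input-channel activation magnitude
    ref = x @ weight.T                  # float reference output
    best_err, best_s = float("inf"), None
    for i in range(n_grid):
        alpha = i / n_grid
        s = act_max.pow(alpha).clamp(min=1e-4)
        s = s / (s.max() * s.min()).sqrt()   # keep the scale range centered
        w_q = pseudo_quantize(weight * s) / s
        err = (x @ w_q.T - ref).pow(2).mean().item()
        if err < best_err:
            best_err, best_s = err, s
    return best_s
```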
ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats
Paper: https://arxiv.org/abs/2307.09782
Code: https://github.com/microsoft/DeepSpeed
Organization: Microsoft
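ZeroQuant-FP argues for floating-point formats (FP4 weights, FP8 activations) in a W4A8 scheme. As a loose illustration, simulated FP4 weight quantization amounts to snapping scaled weights onto the small grid of magnitudes an E2M1 format can represent; the grid below and the per-row scaling are assumptions for the sketch, not the paper's exact recipe:

```python
import torch

# Magnitudes representable by a sign + E2M1 FP4 format (assumed grid for this sketch;
# the paper also uses FP8 for activations, which is not shown here).
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fp4_quantize(w: torch.Tensor) -> torch.Tensor:
    """Simulate per-output-channel FP4 weight quantization (round to nearest grid point)."""
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / FP4_GRID.max()
    mag = (w / scale).abs()
    # Round each scaled magnitude to the nearest representable FP4 value.
    idx = (mag.unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
    return FP4_GRID[idx] * w.sign() * scale
```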
QUIK: Towards End-to-end 4-Bit Inference on Generative Large Language Models
Paper: https://arxiv.org/abs/2310.09259
Code: https://github.com/IST-DASLab/QUIK
Organization: ETH Zurich
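QUIK pushes both weights and activations to 4 bits by keeping a small set of outlier feature columns, chosen by activation magnitude, in full precision. A sketch of the weight-side split, with illustrative names and a fixed outlier budget:

```python
import torch

def quik_split(weight: torch.Tensor, act_max: torch.Tensor, n_outlier: int = 256):
    """Split a linear layer into a 4-bit base part plus full-precision outlier columns.

    Columns (input features) with the largest activation magnitude stay in fp16;
    the remainder are quantized to int4 (sketch-level version of the paper's split).
    """
    outlier_idx = act_max.topk(n_outlier).indices
    base_mask = torch.ones(weight.shape[1], dtype=torch.bool)
    base_mask[outlier_idx] = False
    w_base = weight[:, base_mask]        # to be quantized to int4
    w_outlier = weight[:, outlier_idx]   # kept in fp16
    # Symmetric per-row int4 quantization of the base part.
    scale = w_base.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7
    w_base_q = (w_base / scale).round().clamp(-8, 7)
    return w_base_q, scale, w_outlier, outlier_idx, base_mask
```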
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
Paper: https://arxiv.org/abs/2306.03078
Code: https://github.com/Vahe1994/SpQR
Organization: University of Washington
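SpQR quantizes weights in small groups at roughly 3-4 bits while storing the hardest-to-quantize weights as a sparse high-precision component. The sketch below uses a simplified outlier criterion (quantization error versus the group average); the paper's actual criterion is sensitivity-based, and it additionally quantizes the per-group scales and zero points:

```python
import torch

def spqr_sketch(weight: torch.Tensor, threshold: float = 3.0, n_bits: int = 3,
                group_size: int = 16):
    """Group-quantize a weight matrix, keeping high-error weights as fp16 outliers."""
    out_f, in_f = weight.shape
    w = weight.reshape(-1, group_size)
    w_min, w_max = w.amin(dim=1, keepdim=True), w.amax(dim=1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / (2**n_bits - 1)
    zero = (-w_min / scale).round()
    w_q = (((w / scale).round() + zero).clamp(0, 2**n_bits - 1) - zero) * scale
    err = (w - w_q).abs()
    # Weights with error far above the group average become sparse outliers.
    outliers = err > threshold * err.mean(dim=1, keepdim=True)
    w_q[outliers] = w[outliers]          # stored separately in fp16 in practice
    return w_q.reshape(out_f, in_f), outliers.reshape(out_f, in_f)
```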
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of Large Language Models
Paper: https://arxiv.org/abs/2309.05516
Code: https://github.com/intel/neural-compressor
Organization: Intel
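The SignRound idea: instead of rounding to nearest, learn a per-weight rounding perturbation V ∈ [−0.5, 0.5] by signed gradient descent so the quantized layer reproduces the float output. A minimal single-layer sketch using a straight-through estimator (the paper tunes whole blocks with more machinery; hyperparameters here are illustrative):

```python
import torch

def sign_round(weight: torch.Tensor, x: torch.Tensor, n_bits: int = 4,
               steps: int = 200, lr: float = 5e-3):
    """Learn rounding perturbations V with signed gradient descent (sketch)."""
    scale = weight.abs().amax(dim=1, keepdim=True) / (2**(n_bits - 1) - 1)
    ref = x @ weight.T
    v = torch.zeros_like(weight, requires_grad=True)
    for _ in range(steps):
        w_int = weight / scale + v
        # Straight-through estimator: round in forward, identity in backward.
        w_q = (w_int.round() - w_int).detach() + w_int
        loss = (x @ (w_q * scale).T - ref).pow(2).mean()
        loss.backward()
        with torch.no_grad():
            v -= lr * v.grad.sign()      # signed gradient descent step
            v.clamp_(-0.5, 0.5)
            v.grad.zero_()
    qmin, qmax = -2**(n_bits - 1), 2**(n_bits - 1) - 1
    return ((weight / scale + v).round().clamp(qmin, qmax) * scale).detach()
```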
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
Paper: https://arxiv.org/abs/2309.02784
Code: None
Organization: Meituan
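Norm Tweaking leaves the quantized weights untouched and fine-tunes only the LayerNorm affine parameters so the quantized layer's outputs track the float model. A bare-bones sketch assuming both layers are nn.Modules and quantization is simulated (differentiable); the paper uses LLM-generated calibration text and a channel-wise distribution loss rather than the plain MSE shown here:

```python
import torch

def norm_tweak(float_layer, quant_layer, calib_batches, steps: int = 100, lr: float = 1e-5):
    """Fine-tune only LayerNorm parameters of a quantized layer toward the float layer."""
    ln_params = [p for m in quant_layer.modules()
                 if isinstance(m, torch.nn.LayerNorm) for p in m.parameters()]
    for p in quant_layer.parameters():
        p.requires_grad_(False)          # freeze everything...
    for p in ln_params:
        p.requires_grad_(True)           # ...except the LayerNorm affine params
    opt = torch.optim.Adam(ln_params, lr=lr)
    for _ in range(steps):
        for x in calib_batches:
            with torch.no_grad():
                target = float_layer(x)  # float model output as the target
            loss = (quant_layer(x) - target).pow(2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return quant_layer
```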