东毅居士

The CLAP Model: Cross-Modal Alignment of Audio and Text

Author: XD / Published: August 26, 2025, 03:31 / Research & Study / Views: 1428

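Structure of the CLAP model

Text input → text encoder → projection layer → shared semantic space
Audio input → audio encoder → projection layer → shared semantic space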
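The two-branch structure above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the actual CLAP implementation: the encoder outputs are replaced by random vectors, the projection layers are random (untrained) linear maps, the dimensions are made up, and comparing the two embeddings by cosine similarity is an assumption that the post itself does not state. All names (`make_linear`, `project`, `similarity`) are hypothetical.

```python
import math
import random


def make_linear(in_dim, out_dim, rng):
    """Random (out_dim x in_dim) matrix standing in for a learned projection layer."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights, features):
    """Apply the linear projection, then L2-normalize the result."""
    out = [sum(w * x for w, x in zip(row, features)) for row in weights]
    norm = math.sqrt(sum(v * v for v in out)) or 1.0
    return [v / norm for v in out]


def similarity(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


rng = random.Random(0)

# Hypothetical sizes: each encoder has its own output width,
# but both projections map into the same shared dimension.
TEXT_DIM, AUDIO_DIM, SHARED_DIM = 6, 10, 4

# Stand-ins for the outputs of the text encoder and the audio encoder.
text_features = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]
audio_features = [rng.gauss(0.0, 1.0) for _ in range(AUDIO_DIM)]

# One projection layer per branch.
text_proj = make_linear(TEXT_DIM, SHARED_DIM, rng)
audio_proj = make_linear(AUDIO_DIM, SHARED_DIM, rng)

# Both inputs end up as vectors in the same shared space.
text_emb = project(text_proj, text_features)
audio_emb = project(audio_proj, audio_features)

score = similarity(text_emb, audio_emb)
print(len(text_emb), len(audio_emb), round(score, 3))
```

The point of the sketch is the shape of the data flow: the encoders may produce vectors of different sizes, while the projection layers bring both into one space where a text vector and an audio vector can be compared directly. With untrained random weights the score carries no meaning.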

