EADST

Convert PDFs to Images

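Use Python and the pdf2image library to convert every PDF in a folder (including its subfolders) into JPEG images, one image per page. pdf2image relies on the Poppler utilities (pdftoppm, pdfinfo), which must be installed separately.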

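```python
import os

from pdf2image import convert_from_path


def convert_pdf_to_images(pdf_path, output_folder):
    # Create the output folder if needed (no error if it already exists).
    os.makedirs(output_folder, exist_ok=True)

    # pdf2image renders every page of the PDF to a PIL image.
    images = convert_from_path(pdf_path)

    # Save the pages as page_1.jpg, page_2.jpg, ...
    for i, image in enumerate(images, start=1):
        image_path = os.path.join(output_folder, f"page_{i}.jpg")
        image.save(image_path, "JPEG")


def process_all_pdfs(pdf_folder):
    # Walk the folder tree and convert every .pdf file found.
    for root, _dirs, files in os.walk(pdf_folder):
        for file in files:
            if file.lower().endswith(".pdf"):
                pdf_path = os.path.join(root, file)
                # Images go into a folder next to the PDF, named after it (without extension).
                output_folder = os.path.join(root, os.path.splitext(file)[0])
                convert_pdf_to_images(pdf_path, output_folder)


if __name__ == "__main__":
    pdf_folder = "/your_folder_path/"  # replace with your own folder
    process_all_pdfs(pdf_folder)
```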
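For long PDFs, `convert_from_path` loads every page into memory at once, and names like `page_10.jpg` sort before `page_2.jpg` in a plain string sort (e.g. Python's `sorted()`). The following is a minimal sketch that addresses both, assuming pdf2image and Poppler are installed: it reads the page count with `pdfinfo_from_path`, renders one page per call via the `first_page`/`last_page` parameters, and zero-pads the file names. The names `page_filename` and `convert_pdf_page_by_page` are hypothetical helpers, not part of the original script; the `dpi=200` default matches pdf2image's own default.

```python
import os


def page_filename(page_number, page_count, ext="jpg"):
    """Hypothetical helper: zero-pad the page number (e.g. page_007.jpg)
    so that a plain string sort keeps the pages in order."""
    width = len(str(page_count))
    return f"page_{page_number:0{width}d}.{ext}"


def convert_pdf_page_by_page(pdf_path, output_folder, dpi=200):
    """Hypothetical variant of convert_pdf_to_images that renders one page
    at a time, so memory use does not grow with the length of the PDF."""
    # Imported here so page_filename() works even without pdf2image installed.
    from pdf2image import convert_from_path, pdfinfo_from_path

    os.makedirs(output_folder, exist_ok=True)
    page_count = pdfinfo_from_path(pdf_path)["Pages"]

    for page_number in range(1, page_count + 1):
        # first_page/last_page limit Poppler to a single page per call.
        images = convert_from_path(
            pdf_path, dpi=dpi, first_page=page_number, last_page=page_number
        )
        image_path = os.path.join(output_folder, page_filename(page_number, page_count))
        # JPEG has no alpha channel, so make sure the image is plain RGB.
        images[0].convert("RGB").save(image_path, "JPEG")
```

Because it takes the same two required arguments, `convert_pdf_page_by_page` could stand in for `convert_pdf_to_images` inside `process_all_pdfs` without other changes.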