EADST

Convert PDFs to Images

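This Python script walks a folder tree, converts every PDF it finds into images page by page, and saves the pages as JPEG files in a folder named after each PDF. It uses the pdf2image library, which in turn needs the Poppler utilities installed on the system.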

import os
from pdf2image import convert_from_path

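def convert_pdf_to_images(pdf_path, output_folder):
    # Create the output folder if needed (no error if it already exists)
    os.makedirs(output_folder, exist_ok=True)

    # Render every page of the PDF to a PIL image (pdf2image defaults to 200 dpi)
    images = convert_from_path(pdf_path)

    # Save the pages as page_1.jpg, page_2.jpg, ...
    for i, image in enumerate(images, start=1):
        image_path = os.path.join(output_folder, f"page_{i}.jpg")
        image.save(image_path, 'JPEG')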

def process_all_pdfs(pdf_folder):
    # Walk the folder tree and convert every .pdf file found
    for root, dirs, files in os.walk(pdf_folder):
        for file in files:
            if file.lower().endswith('.pdf'):
                pdf_path = os.path.join(root, file)
                # Images go into a folder named after the PDF, next to the PDF itself
                output_folder = os.path.join(root, os.path.splitext(file)[0])
                convert_pdf_to_images(pdf_path, output_folder)

if __name__ == "__main__":
    pdf_folder = '/your_folder_path/'
    process_all_pdfs(pdf_folder)
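The script above loads every page of a PDF into memory at once, and names like page_10.jpg sort before page_2.jpg in most file browsers. The following is a minimal sketch of one way to address both, assuming pdf2image's `first_page`/`last_page` arguments to `convert_from_path` and its `pdfinfo_from_path` helper (which reads the page count via Poppler). The helper names `page_batches`, `page_filename`, and `convert_pdf_in_batches`, and the `batch_size` parameter, are hypothetical and chosen only for illustration.

```python
import os


def page_batches(num_pages, batch_size):
    """Yield 1-based (first_page, last_page) ranges that cover num_pages pages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, num_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, num_pages)


def page_filename(page_number, num_pages, ext="jpg"):
    """Zero-pad page numbers so files sort in page order (page_02 before page_10)."""
    width = len(str(num_pages))
    return f"page_{page_number:0{width}d}.{ext}"


def convert_pdf_in_batches(pdf_path, output_folder, batch_size=10, dpi=200):
    # Hypothetical helper: imported lazily so the pure helpers above work
    # even where pdf2image/Poppler is not installed.
    from pdf2image import convert_from_path, pdfinfo_from_path

    os.makedirs(output_folder, exist_ok=True)
    num_pages = int(pdfinfo_from_path(pdf_path)["Pages"])
    for first, last in page_batches(num_pages, batch_size):
        # Only pages first..last are rendered and held in memory at a time
        images = convert_from_path(pdf_path, dpi=dpi, first_page=first, last_page=last)
        for page_number, image in enumerate(images, start=first):
            name = page_filename(page_number, num_pages)
            image.save(os.path.join(output_folder, name), "JPEG")
```

With `batch_size=10`, a 25-page PDF is rendered as pages 1 to 10, 11 to 20, and 21 to 25, so at most ten page images are alive at once. To use it, call `convert_pdf_in_batches` in place of `convert_pdf_to_images` inside `process_all_pdfs`.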