EADST

Convert PDFs to Images

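Use Python to convert PDF documents into images, one JPEG per page. The script below relies on the pdf2image package, which needs Poppler installed on the system.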

from pdf2image import convert_from_path
import os

def convert_pdf_to_images(pdf_path, output_folder):
    # Create the output folder if it does not exist yet.
    os.makedirs(output_folder, exist_ok=True)

    # Render every page of the PDF as a PIL image.
    images = convert_from_path(pdf_path)

    # Save each page as page_1.jpg, page_2.jpg, ...
    for i, image in enumerate(images):
        image_path = os.path.join(output_folder, f"page_{i+1}.jpg")
        image.save(image_path, 'JPEG')

def process_all_pdfs(pdf_folder):
    # Walk the folder tree and convert every PDF found.
    for root, _dirs, files in os.walk(pdf_folder):
        for file in files:
            if file.lower().endswith('.pdf'):
                pdf_path = os.path.join(root, file)
                # Images go into a folder next to the PDF, named after it.
                output_folder = os.path.join(root, os.path.splitext(file)[0])
                convert_pdf_to_images(pdf_path, output_folder)

# Replace with the folder that contains your PDFs.
pdf_folder = '/your_folder_path/'
process_all_pdfs(pdf_folder)
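
For large PDFs, rendering every page at once can use a lot of memory. The variant below is a minimal sketch of one way around that, not part of the original script: it reads the page count with pdf2image's pdfinfo_from_path and converts a few pages at a time using the first_page and last_page arguments of convert_from_path. The names page_batches and convert_pdf_in_batches, and the default dpi and batch_size values, are assumptions chosen for illustration. File names are zero-padded so they sort in page order.

```python
import os


def page_batches(total_pages, batch_size):
    """Yield (first_page, last_page) pairs, 1-based and inclusive."""
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)


def convert_pdf_in_batches(pdf_path, output_folder, dpi=150, batch_size=10):
    # Imported here so page_batches stays usable without pdf2image installed.
    from pdf2image import convert_from_path, pdfinfo_from_path

    os.makedirs(output_folder, exist_ok=True)

    # pdfinfo_from_path returns a dict of PDF metadata, including the page count.
    total_pages = pdfinfo_from_path(pdf_path)["Pages"]
    width = len(str(total_pages))  # zero-pad so files sort in page order

    for first, last in page_batches(total_pages, batch_size):
        # Only the pages in this batch are held in memory at once.
        images = convert_from_path(
            pdf_path, dpi=dpi, first_page=first, last_page=last
        )
        for offset, image in enumerate(images):
            name = f"page_{first + offset:0{width}d}.jpg"
            image.save(os.path.join(output_folder, name), "JPEG")
```

To use it, call convert_pdf_in_batches in place of convert_pdf_to_images inside process_all_pdfs. A lower dpi gives smaller files, a higher one gives sharper images.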