EADST

Convert PDFs to Images

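Use Python and the pdf2image library to convert PDF documents into images, one JPEG per page. The script below walks a folder recursively and saves the pages of each PDF it finds into a subfolder named after that PDF. pdf2image is a wrapper around Poppler's command-line tools, so Poppler must be installed as well.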
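pdf2image does not render PDFs itself. By default it calls Poppler's `pdftoppm` program, so a missing Poppler install is a common cause of errors on the first run. The following is a minimal pre-flight check, assuming Poppler is expected on the PATH. `poppler_available` is a hypothetical helper, not part of pdf2image.

```python
import shutil


def poppler_available() -> bool:
    """Hypothetical helper: True if Poppler's pdftoppm (used by pdf2image) is on PATH."""
    return shutil.which("pdftoppm") is not None


if __name__ == "__main__":
    if poppler_available():
        print("Poppler found; pdf2image should be able to render pages.")
    else:
        # convert_from_path also accepts poppler_path= pointing at Poppler's bin folder
        print("pdftoppm not found: install Poppler or pass poppler_path= to convert_from_path.")
```

If Poppler is installed but is not on the PATH (common on Windows), you can pass its `bin` directory to `convert_from_path` through the `poppler_path` argument instead.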

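import os

from pdf2image import convert_from_path

def convert_pdf_to_images(pdf_path, output_folder):
    """Save each page of pdf_path as page_<n>.jpg inside output_folder."""
    # Create the output folder (no error if it already exists)
    os.makedirs(output_folder, exist_ok=True)

    # Render every page to a PIL image (pdf2image calls Poppler under the hood)
    images = convert_from_path(pdf_path)

    for i, image in enumerate(images):
        # Page numbers in file names start at 1
        image_path = os.path.join(output_folder, f"page_{i+1}.jpg")
        image.save(image_path, 'JPEG')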

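def process_all_pdfs(pdf_folder):
    """Convert every PDF under pdf_folder (searched recursively) into per-page JPEGs."""
    for root, dirs, files in os.walk(pdf_folder):
        for file in files:
            if file.lower().endswith('.pdf'):
                pdf_path = os.path.join(root, file)
                # e.g. report.pdf -> a "report" folder next to the PDF
                output_folder = os.path.join(root, os.path.splitext(file)[0])
                convert_pdf_to_images(pdf_path, output_folder)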

if __name__ == "__main__":
    pdf_folder = '/your_folder_path/'  # replace with the folder that contains your PDFs
    process_all_pdfs(pdf_folder)
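With the `f"page_{i+1}.jpg"` pattern, an alphabetical sort places `page_10.jpg` before `page_2.jpg`, and many file browsers and tools sort this way. The following is a minimal sketch of zero-padded names that keep alphabetical order identical to page order. `page_image_path` is a hypothetical helper, not part of pdf2image.

```python
import os


def page_image_path(output_folder: str, page_index: int, total_pages: int) -> str:
    """Hypothetical helper: path for a 0-based page index, zero-padded to the
    number of digits in total_pages (e.g. page_007.jpg for a 120-page PDF)."""
    width = len(str(total_pages))
    return os.path.join(output_folder, f"page_{page_index + 1:0{width}d}.jpg")


if __name__ == "__main__":
    names = [os.path.basename(page_image_path("out", i, 12)) for i in range(12)]
    print(names[:3])
    print(sorted(names) == names)  # lexicographic order now matches page order
```

To use it inside `convert_pdf_to_images`, replace the `image_path` line with `image_path = page_image_path(output_folder, i, len(images))`.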