EADST

Convert PDFs to Images

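Use Python and the pdf2image library to convert PDF documents into JPEG images, one image per page. pdf2image relies on Poppler, so Poppler must be installed on the system.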

from pdf2image import convert_from_path
import os

def convert_pdf_to_images(pdf_path, output_folder):
    # Create the output folder if it does not already exist
    os.makedirs(output_folder, exist_ok=True)

    # Render every page of the PDF to a PIL image
    images = convert_from_path(pdf_path)

    # Save each page as a JPEG, numbering pages from 1
    for i, image in enumerate(images):
        image_path = os.path.join(output_folder, f"page_{i+1}.jpg")
        image.save(image_path, 'JPEG')

def process_all_pdfs(pdf_folder):
    # Walk the folder tree; each PDF gets an output folder named after the file, next to it
    for root, dirs, files in os.walk(pdf_folder):
        for file in files:
            if file.lower().endswith('.pdf'):
                pdf_path = os.path.join(root, file)
                output_folder = os.path.join(root, os.path.splitext(file)[0])
                convert_pdf_to_images(pdf_path, output_folder)

# Replace with the folder that contains your PDFs
pdf_folder = '/your_folder_path/'
process_all_pdfs(pdf_folder)
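
The script above loads every page of a PDF into memory at once and names files page_1.jpg, page_2.jpg, and so on, which means page_10.jpg sorts before page_2.jpg in most file browsers. The following is a minimal sketch of one possible variation, not part of the original script: it converts a few pages at a time and zero-pads the page numbers. The helper names (page_filename, page_batches, convert_pdf_in_batches) and the defaults dpi=150 and batch_size=10 are assumptions chosen for illustration; dpi, first_page and last_page are parameters of pdf2image's convert_from_path, and pdfinfo_from_path is used to read the page count.

```python
import os


def page_filename(page_number, total_pages, ext="jpg"):
    """Return a zero-padded name such as page_007.jpg so files sort in page order."""
    # Pad to at least 3 digits, or more if the document has 1000+ pages
    width = max(3, len(str(total_pages)))
    return f"page_{page_number:0{width}d}.{ext}"


def page_batches(total_pages, batch_size):
    """Yield inclusive, 1-indexed (first_page, last_page) ranges covering all pages."""
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)


def convert_pdf_in_batches(pdf_path, output_folder, dpi=150, batch_size=10):
    """Convert a PDF a few pages at a time to limit memory use (hypothetical helper)."""
    # Imported here so the two helpers above work without pdf2image installed
    from pdf2image import convert_from_path, pdfinfo_from_path

    os.makedirs(output_folder, exist_ok=True)

    # Read the page count up front so batches and padding can be computed
    total_pages = int(pdfinfo_from_path(pdf_path)["Pages"])

    for first, last in page_batches(total_pages, batch_size):
        # Only the pages in this range are rendered and held in memory
        images = convert_from_path(
            pdf_path, dpi=dpi, first_page=first, last_page=last
        )
        for offset, image in enumerate(images):
            name = page_filename(first + offset, total_pages)
            image.save(os.path.join(output_folder, name), "JPEG")
```

To try it, call convert_pdf_in_batches(pdf_path, output_folder) in place of convert_pdf_to_images inside process_all_pdfs. A lower dpi gives smaller files and faster conversion; a higher value gives sharper images at the cost of memory and time.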