EADST

Convert PDFs to Images

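Use Python and the pdf2image library to convert PDF documents into JPEG images, one file per page. pdf2image wraps the Poppler command-line tools, so Poppler must be installed and available on the PATH. The script below walks a folder recursively and saves the pages of each PDF into a subfolder named after that PDF.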

from pdf2image import convert_from_path
import os

def convert_pdf_to_images(pdf_path, output_folder):
    # Create the output folder if it does not exist yet
    os.makedirs(output_folder, exist_ok=True)

    # Render every page of the PDF to a PIL image with pdf2image
    images = convert_from_path(pdf_path)

    for i, image in enumerate(images):
        image_path = os.path.join(output_folder, f"page_{i+1}.jpg")
        image.save(image_path, 'JPEG')

def process_all_pdfs(pdf_folder):
    for root, dirs, files in os.walk(pdf_folder):
        for file in files:
            if file.lower().endswith('.pdf'):
                pdf_path = os.path.join(root, file)
                output_folder = os.path.join(root, os.path.splitext(file)[0])
                convert_pdf_to_images(pdf_path, output_folder)

if __name__ == "__main__":
    pdf_folder = '/your_folder_path/'  # root folder to search recursively for PDFs
    process_all_pdfs(pdf_folder)
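
The function above loads every page of a PDF into memory before saving anything, which can get heavy for long documents. One possible variation, shown here only as a minimal sketch, is to convert a few pages at a time using the `first_page` and `last_page` parameters of `convert_from_path`, with the page count read through `pdfinfo_from_path`. The names `page_batches` and `convert_pdf_in_batches`, and the defaults `batch_size=10` and `dpi=200`, are assumptions made for illustration and are not part of the original script.

```python
import os


def page_batches(total_pages, batch_size):
    """Yield 1-based, inclusive (first_page, last_page) ranges covering the PDF."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)


def convert_pdf_in_batches(pdf_path, output_folder, batch_size=10, dpi=200):
    # Hypothetical helper: imported lazily so page_batches() can be used
    # (and tested) on machines without pdf2image or Poppler installed.
    from pdf2image import convert_from_path, pdfinfo_from_path

    os.makedirs(output_folder, exist_ok=True)

    # pdfinfo_from_path returns a dict of PDF metadata, including "Pages"
    total_pages = int(pdfinfo_from_path(pdf_path)["Pages"])

    for first, last in page_batches(total_pages, batch_size):
        # Only the pages in this range are rendered and held in memory
        images = convert_from_path(
            pdf_path, dpi=dpi, first_page=first, last_page=last
        )
        for page_number, image in enumerate(images, start=first):
            image_path = os.path.join(output_folder, f"page_{page_number}.jpg")
            image.save(image_path, 'JPEG')


# The batching logic alone, for a 23-page PDF processed 10 pages at a time
print(list(page_batches(23, 10)))  # [(1, 10), (11, 20), (21, 23)]
```

To try it, `convert_pdf_in_batches` could be called in place of `convert_pdf_to_images` inside `process_all_pdfs`. The output file names stay the same (`page_1.jpg`, `page_2.jpg`, and so on); a smaller `batch_size` lowers peak memory at the cost of more Poppler invocations.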