EADST

Convert PDFs to Images

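Use Python and the pdf2image library to convert PDF documents into JPEG images, page by page. pdf2image relies on Poppler, so Poppler must be installed and on your PATH. The script below walks a folder, finds every PDF, and writes its pages into a folder named after the PDF.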

from pdf2image import convert_from_path
import os

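def convert_pdf_to_images(pdf_path, output_folder):
    # Create the output folder if it does not already exist
    os.makedirs(output_folder, exist_ok=True)

    # Render every page of the PDF as a PIL image (pdf2image calls Poppler)
    images = convert_from_path(pdf_path)

    # Save each page as a JPEG, numbering pages from 1
    for i, image in enumerate(images, start=1):
        image_path = os.path.join(output_folder, f"page_{i}.jpg")
        image.save(image_path, 'JPEG')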

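def process_all_pdfs(pdf_folder):
    # Walk the folder tree and convert every PDF found
    for root, _dirs, files in os.walk(pdf_folder):
        for file in files:
            if file.lower().endswith('.pdf'):
                pdf_path = os.path.join(root, file)
                # Images go into a folder next to the PDF, named after it
                output_folder = os.path.join(root, os.path.splitext(file)[0])
                convert_pdf_to_images(pdf_path, output_folder)

if __name__ == '__main__':
    pdf_folder = '/your_folder_path/'  # replace with the folder that holds your PDFs
    process_all_pdfs(pdf_folder)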
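
The script above loads every page of a PDF into memory before saving, which can be heavy for long documents. One possible workaround is to convert a few pages at a time. The sketch below is only an illustration: the names `page_ranges` and `convert_pdf_in_chunks`, the chunk size of 10, and the DPI of 200 are assumptions rather than part of the original script. It uses pdf2image's `pdfinfo_from_path` to read the page count and the `first_page` / `last_page` arguments of `convert_from_path` to limit each call.

```python
import os


def page_ranges(total_pages, chunk_size):
    """Yield (first_page, last_page) pairs, 1-based and inclusive."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)


def convert_pdf_in_chunks(pdf_path, output_folder, chunk_size=10, dpi=200):
    """Hypothetical variant of convert_pdf_to_images that limits memory use."""
    # Imported here so page_ranges can be used without pdf2image installed
    from pdf2image import convert_from_path, pdfinfo_from_path

    os.makedirs(output_folder, exist_ok=True)

    # pdfinfo_from_path returns a dict of PDF metadata, including the page count
    total_pages = pdfinfo_from_path(pdf_path)["Pages"]

    for first, last in page_ranges(total_pages, chunk_size):
        # Only the pages in this range are rendered and held in memory
        images = convert_from_path(
            pdf_path, dpi=dpi, first_page=first, last_page=last
        )
        for page_number, image in enumerate(images, start=first):
            image_path = os.path.join(output_folder, f"page_{page_number}.jpg")
            image.save(image_path, "JPEG")
```

For a 25-page PDF with the default chunk size, `list(page_ranges(25, 10))` gives `[(1, 10), (11, 20), (21, 25)]`, so the file is converted in three calls. To try it, call `convert_pdf_in_chunks` instead of `convert_pdf_to_images` inside `process_all_pdfs`.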