EADST

Extract Webpage Information with Python

The following Python program extracts information from a saved web page with BeautifulSoup and appends each record to a CSV file.

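from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

# Local copy of the page to parse; the file:// scheme lets urlopen read it from disk.
url = 'file:///Users/xd/Desktop/ieee/Region_5_Student_Branch_Counselors_and_Chairs.htm'
# Output file name, without the .csv extension.
save_file = 'ieee_info_1'
html = urllib.request.urlopen(url).read()

soup = BeautifulSoup(html, "html.parser")

# Each university name and its roster block sit in separate <div> elements,
# matched here by the exact value of their class attribute.
universities = soup.find_all('div', class_='spoName bullet pad-t15')
people = soup.find_all('div', class_='roster-results')

# Column order of each row written to the CSV file.
list_name = ['university', 'name', 'title', 'address', 'email']

# Pair each university with its roster block by position.
for u, p in zip(universities, people):
    info = p.find_all('p')
    university = u.get_text()
    name = info[0].get_text()
    # Skip branches that have no person assigned.
    if name == 'Position Vacant':
        continue
    title = info[2].get_text()
    # The address is split across two <p> elements.
    address = info[3].get_text() + ', ' + info[4].get_text()
    # Drop the first 7 characters (the label in front of the address) from the last <p>.
    email = info[-1].get_text()[7:]

    content = [[university, name, title, address, email]]
    data = pd.DataFrame(columns=list_name, data=content)
    # Append one row per person; no header row is written.
    data.to_csv("{}.csv".format(save_file), mode='a', index=False, header=False, encoding='utf-8')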
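
The script above appends one row per loop iteration, so running it twice duplicates the data, and the file never gets a header row. As a rough sketch of an alternative, the same extraction can collect all rows first and write the file once. The sample markup, the names `extract_rows` and `to_frame`, and the `Email: ` label are assumptions made for illustration; only the two class names and the field positions come from the script above.

```python
from bs4 import BeautifulSoup
import pandas as pd

COLUMNS = ['university', 'name', 'title', 'address', 'email']

# Hypothetical markup that mimics the structure the original script expects.
SAMPLE_HTML = """
<div class="spoName bullet pad-t15">Example University</div>
<div class="roster-results">
  <p>Jane Doe</p>
  <p>Member</p>
  <p>Counselor</p>
  <p>1 Main St</p>
  <p>Springfield, XX</p>
  <p>Email: jane@example.edu</p>
</div>
<div class="spoName bullet pad-t15">Vacant College</div>
<div class="roster-results">
  <p>Position Vacant</p>
</div>
"""


def extract_rows(html):
    """Return one [university, name, title, address, email] list per person."""
    soup = BeautifulSoup(html, 'html.parser')
    universities = soup.find_all('div', class_='spoName bullet pad-t15')
    people = soup.find_all('div', class_='roster-results')

    rows = []
    for u, p in zip(universities, people):
        info = p.find_all('p')
        # Skip vacant positions and blocks too short to hold all fields.
        if len(info) < 6 or info[0].get_text(strip=True) == 'Position Vacant':
            continue
        email = info[-1].get_text(strip=True)
        # Assumed label; the original script simply drops the first 7 characters.
        if email.startswith('Email: '):
            email = email[len('Email: '):]
        rows.append([
            u.get_text(strip=True),
            info[0].get_text(strip=True),
            info[2].get_text(strip=True),
            info[3].get_text(strip=True) + ', ' + info[4].get_text(strip=True),
            email,
        ])
    return rows


def to_frame(rows):
    """Build a DataFrame with named columns from the collected rows."""
    return pd.DataFrame(rows, columns=COLUMNS)
```

Calling `to_frame(extract_rows(SAMPLE_HTML)).to_csv('ieee_info_1.csv', index=False, encoding='utf-8')` would then write the header and all rows in a single pass, overwriting any earlier file instead of appending to it.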