EADST

Extract Webpage Information with Python

The following Python program uses BeautifulSoup to extract contact information from a locally saved webpage and pandas to append the results to a CSV file.

from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

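# Local copy of the page, opened through a file:// URL
url = 'file:///Users/xd/Desktop/ieee/Region_5_Student_Branch_Counselors_and_Chairs.htm'
# Output file name without the .csv extension
save_file = 'ieee_info_1'
html = urllib.request.urlopen(url).read()

soup = BeautifulSoup(html, "html.parser")

# Each university block is paired, in page order, with one roster block
universities = soup.find_all('div', class_='spoName bullet pad-t15')
people = soup.find_all('div', class_='roster-results')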

list_name = ['university', 'name', 'title', 'address', 'email']
content = []

for u, p in zip(universities, people):
    info = p.find_all('p')
    university = u.get_text()
    name = info[0].get_text()
    # Skip branches that have no one in the position
    if name == 'Position Vacant':
        continue
    title = info[2].get_text()
    # The address is split across two paragraphs
    address = info[3].get_text() + ', ' + info[4].get_text()
    # Drop the first 7 characters (the label before the address) from the last paragraph
    email = info[-1].get_text()[7:]
    content.append([university, name, title, address, email])

# Append all rows in one write; no header row is written, as before
data = pd.DataFrame(columns=list_name, data=content)
data.to_csv("{}.csv".format(save_file), mode='a', index=False, header=False, encoding='utf-8')
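Since the program reads a file stored on the author's machine, the sketch below shows the same extraction logic on a small inline HTML string so it can be tried anywhere. The markup in `SAMPLE_HTML`, the helper name `extract_rows`, and the `"Email: "` prefix are assumptions made for illustration; the real page may be structured differently. It also skips roster blocks with fewer than five paragraphs, which the original program does not do.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the structure of the real page is assumed, not known.
SAMPLE_HTML = """
<div class="spoName bullet pad-t15">Example University</div>
<div class="roster-results">
  <p>Jane Doe</p>
  <p>Member</p>
  <p>Branch Counselor</p>
  <p>123 Example St</p>
  <p>Springfield, TX</p>
  <p>Email: jane@example.edu</p>
</div>
<div class="spoName bullet pad-t15">Sample College</div>
<div class="roster-results">
  <p>Position Vacant</p>
</div>
"""

EMAIL_PREFIX = "Email: "  # assumed label; 7 characters, matching the [7:] slice


def extract_rows(html):
    """Return one [university, name, title, address, email] list per filled position."""
    soup = BeautifulSoup(html, "html.parser")
    universities = soup.find_all("div", class_="spoName bullet pad-t15")
    people = soup.find_all("div", class_="roster-results")

    rows = []
    for u, p in zip(universities, people):
        info = [tag.get_text(strip=True) for tag in p.find_all("p")]
        # Skip vacant positions and blocks too short to index safely.
        if len(info) < 5 or info[0] == "Position Vacant":
            continue
        email = info[-1]
        if email.startswith(EMAIL_PREFIX):
            email = email[len(EMAIL_PREFIX):]
        address = info[3] + ", " + info[4]
        rows.append([u.get_text(strip=True), info[0], info[2], address, email])
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

The returned list can be passed to `pd.DataFrame(columns=..., data=rows)` and written with `to_csv` in the same way as the program above.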