EADST

Extract Webpage Information with Python

The following Python program extracts information from a webpage with BeautifulSoup and saves the data to a CSV file.

from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

# Local copy of the page to parse, and the output file name (without extension)
url = 'file:///Users/xd/Desktop/ieee/Region_5_Student_Branch_Counselors_and_Chairs.htm'
save_file = 'ieee_info_1'
html = urllib.request.urlopen(url).read()

soup = BeautifulSoup(html, "html.parser")

# University names and roster blocks are matched by their div class attributes
universities = soup.find_all('div', class_='spoName bullet pad-t15')
people = soup.find_all('div', class_='roster-results')

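# Column order used for every row written to the CSV file
list_name = ['university', 'name', 'title', 'address', 'email']

# Pair each university with its roster block, in page order
for u, p in zip(universities, people):
    info = p.find_all('p')
    university = u.get_text()
    name = info[0].get_text()
    # Skip entries with no one assigned to the position
    if name == 'Position Vacant':
        continue
    title = info[2].get_text()
    # The address spans two paragraphs; join them into one field
    address = info[3].get_text() + ', ' + info[4].get_text()
    # Drop the first 7 characters of the last paragraph (assumed to be a label before the address)
    email = info[-1].get_text()[7:]

    # Append one row per person; no header row is written
    content = [[university, name, title, address, email]]
    data = pd.DataFrame(columns=list_name, data=content)
    data.to_csv("{}.csv".format(save_file), mode='a', index=False, header=False, encoding='utf-8')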
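
The loop above indexes each roster block by fixed positions and appends one row to the CSV file per iteration. As a minimal sketch of an alternative, assuming the same paragraph order (name first, email last), the rows can be collected first and written once together with a header row. The helpers build_row and write_rows are hypothetical names, the sample entries are made up, and the standard-library csv module is swapped in for pandas so the sketch runs without extra dependencies.

```python
import csv
import io

COLUMNS = ['university', 'name', 'title', 'address', 'email']


def build_row(university, fields):
    """Return one CSV row, or None if the entry should be skipped.

    `fields` holds the paragraph texts of one roster block, in the order
    the original loop assumes (name first, email last).
    """
    # Skip vacant positions and blocks too short to index safely.
    if len(fields) < 5 or fields[0].strip() == 'Position Vacant':
        return None
    name = fields[0].strip()
    title = fields[2].strip()
    address = fields[3].strip() + ', ' + fields[4].strip()
    email = fields[-1][7:].strip()  # same 7-character slice as the original
    return [university.strip(), name, title, address, email]


def write_rows(rows, out):
    """Write the header once, then every collected row."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(rows)


# Made-up sample data, for illustration only.
entries = [
    ('Example University',
     ['Jane Doe', 'Member', 'Counselor', '1 Main St', 'Springfield',
      'Email: jane@example.edu']),
    ('Vacant College', ['Position Vacant']),
]

rows = [r for r in (build_row(u, f) for u, f in entries) if r is not None]

# An in-memory buffer stands in for the output file here.
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with open(path, 'w', newline='', encoding='utf-8') to write_rows. Writing once with a header also avoids duplicated rows when the script is run more than once, which append mode would otherwise produce.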