EADST

Extract Webpage Information with Python

Here is a Python program that extracts webpage information with BeautifulSoup and saves the data to a CSV file.

from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

# Local copy of the saved roster page, and the base name of the output CSV
url = 'file:///Users/xd/Desktop/ieee/Region_5_Student_Branch_Counselors_and_Chairs.htm'
save_file = 'ieee_info_1'
html = urllib.request.urlopen(url).read()

soup = BeautifulSoup(html, "html.parser")

# University names and roster entries sit in separate divs, matched by class
universities = soup.find_all('div', class_='spoName bullet pad-t15')
people = soup.find_all('div', class_='roster-results')

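# Column order for each CSV row (no header row is written, since header=False)
list_name = ['university', 'name', 'title', 'address', 'email']

for u, p in zip(universities, people):
    info = p.find_all('p')
    university = u.get_text()
    name = info[0].get_text()
    # Skip branches whose position is unfilled
    if name == 'Position Vacant':
        continue
    title = info[2].get_text()
    # The address is split across two <p> tags
    address = info[3].get_text() + ', ' + info[4].get_text()
    # Drop the first 7 characters (presumably a label prefix) from the last <p>
    email = info[-1].get_text()[7:]

    # Append this record as one row to the CSV file
    content = [[university, name, title, address, email]]
    data = pd.DataFrame(columns=list_name, data=content)
    data.to_csv("{}.csv".format(save_file), mode='a', index=False, header=False, encoding='utf-8')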
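
The same extraction can be tried without the original page. The sketch below is only an illustration: SAMPLE_HTML is a made-up fragment that assumes the roster markup looks the way the class names and indexes above suggest, the helper names extract_rows and rows_to_csv are hypothetical, and the "Email: " label is a guess at the 7-character prefix. It also swaps the per-row pandas append for the standard csv module, collecting all rows first and writing them once with a header row.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample page: the real roster's markup is assumed, not verified.
SAMPLE_HTML = """
<div class="spoName bullet pad-t15">Example University</div>
<div class="roster-results">
  <p>Jane Doe</p><p>Member</p><p>Branch Counselor</p>
  <p>1 Main St</p><p>Springfield, TX</p>
  <p>Email: jane@example.edu</p>
</div>
<div class="spoName bullet pad-t15">Vacant College</div>
<div class="roster-results"><p>Position Vacant</p></div>
"""

COLUMNS = ['university', 'name', 'title', 'address', 'email']


def extract_rows(html):
    """Return one [university, name, title, address, email] list per filled position."""
    soup = BeautifulSoup(html, "html.parser")
    universities = soup.find_all('div', class_='spoName bullet pad-t15')
    people = soup.find_all('div', class_='roster-results')

    rows = []
    for u, p in zip(universities, people):
        info = [tag.get_text(strip=True) for tag in p.find_all('p')]
        # Skip vacant positions and entries too short to index safely
        if len(info) < 5 or info[0] == 'Position Vacant':
            continue
        address = info[3] + ', ' + info[4]
        # Assumes the last <p> starts with the 7-character label "Email: "
        email = info[-1][7:]
        rows.append([u.get_text(strip=True), info[0], info[2], address, email])
    return rows


def rows_to_csv(rows):
    """Serialize the header and all rows to CSV text in a single pass."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_rows(SAMPLE_HTML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Writing once at the end avoids duplicated rows when the script is rerun, which can happen with mode='a'. The length check guards against entries that have fewer paragraphs than the indexes expect.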