EADST

Extract Webpage Information with Python

Here is a Python program that extracts information from a webpage with BeautifulSoup and saves the data to a CSV file.

from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

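# The page was saved locally; urlopen also accepts file:// URLs.
url = 'file:///Users/xd/Desktop/ieee/Region_5_Student_Branch_Counselors_and_Chairs.htm'
save_file = 'ieee_info_1'  # output file name, without the .csv extension
html = urllib.request.urlopen(url).read()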

soup = BeautifulSoup(html, "html.parser")

universities = soup.find_all('div', class_='spoName bullet pad-t15')
people = soup.find_all('div', class_='roster-results')

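# Column order of each CSV row.
list_name = ['university', 'name', 'title', 'address', 'email']

# Each university heading is paired with the roster block that follows it.
for u, p in zip(universities, people):
    info = p.find_all('p')
    university = u.get_text()
    name = info[0].get_text()
    # Skip branches whose position is not filled.
    if name == 'Position Vacant':
        continue
    title = info[2].get_text()
    # The address is split across two paragraphs.
    address = info[3].get_text() + ', ' + info[4].get_text()
    # The last paragraph holds the email; skip its 7-character prefix.
    email = info[-1].get_text()[7:]

    # Append one row per person. No header row is written, and running
    # the script again adds duplicate rows to the same file.
    content = [[university, name, title, address, email]]
    data = pd.DataFrame(columns=list_name, data=content)
    data.to_csv("{}.csv".format(save_file), mode='a', index=False, header=False, encoding='utf-8')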
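
The program above appends one row at a time and never writes a header. As a possible variation, the sketch below collects all rows first and builds a single DataFrame, so the CSV gets a header row and reruns do not pile up duplicates. It is only a sketch under assumptions: `SAMPLE_HTML` is made-up markup that reuses the same class names, the `Email: ` label in the last paragraph is a guess at what the 7-character prefix is, and `extract_rows` is a hypothetical helper name. It also strips surrounding whitespace and skips roster blocks with fewer than five paragraphs, which the original does not do.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample markup that mimics the class names used above.
SAMPLE_HTML = """
<div class="spoName bullet pad-t15">Example University</div>
<div class="roster-results">
  <p>Jane Doe</p><p>Member</p><p>Counselor</p>
  <p>1 Main St</p><p>Springfield, TX</p><p>Email: jane@example.edu</p>
</div>
<div class="spoName bullet pad-t15">Sample College</div>
<div class="roster-results"><p>Position Vacant</p></div>
"""

COLUMNS = ['university', 'name', 'title', 'address', 'email']


def extract_rows(html):
    """Return one [university, name, title, address, email] list per person."""
    soup = BeautifulSoup(html, "html.parser")
    universities = soup.find_all('div', class_='spoName bullet pad-t15')
    people = soup.find_all('div', class_='roster-results')

    rows = []
    for u, p in zip(universities, people):
        # Text of every paragraph in the roster block, whitespace stripped.
        info = [tag.get_text(strip=True) for tag in p.find_all('p')]
        if not info or info[0] == 'Position Vacant':
            continue
        if len(info) < 5:
            continue  # too few paragraphs to index safely
        rows.append([
            u.get_text(strip=True),
            info[0],
            info[2],
            info[3] + ', ' + info[4],
            info[-1][7:],  # drop the assumed 7-character "Email: " label
        ])
    return rows


rows = extract_rows(SAMPLE_HTML)
data = pd.DataFrame(rows, columns=COLUMNS)

# With no path, to_csv returns the CSV text, header row included.
csv_text = data.to_csv(index=False)
print(csv_text)
```

To write a file instead of a string, pass a path, for example `data.to_csv('ieee_info_1.csv', index=False, encoding='utf-8')`. This overwrites the file on each run rather than appending to it.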