EADST

Extract Webpage Information with Python

Here is a Python program that extracts information from a webpage with BeautifulSoup and saves the data to a CSV file.

from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

# Local copy of the saved roster page, and the output file name (without extension).
url = 'file:///Users/xd/Desktop/ieee/Region_5_Student_Branch_Counselors_and_Chairs.htm'
save_file = 'ieee_info_1'
html = urllib.request.urlopen(url).read()

soup = BeautifulSoup(html, "html.parser")

# University headings and roster blocks are paired by position in the page.
universities = soup.find_all('div', class_='spoName bullet pad-t15')
people = soup.find_all('div', class_='roster-results')

# Column order of each CSV row.
list_name = ['university', 'name', 'title', 'address', 'email']

for u, p in zip(universities, people):
    info = p.find_all('p')
    university = u.get_text()
    name = info[0].get_text()
    # Skip entries whose position is not filled.
    if name == 'Position Vacant':
        continue
    title = info[2].get_text()
    address = info[3].get_text() + ', ' + info[4].get_text()
    # Drop the first 7 characters of the last paragraph (presumably a label before the address).
    email = info[-1].get_text()[7:]

    # Append one row per person; no header row is written.
    content = [[university, name, title, address, email]]
    data = pd.DataFrame(columns=list_name, data=content)
    data.to_csv("{}.csv".format(save_file), mode='a', index=False, header=False, encoding='utf-8')
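
Because the loop above appends to the file one row at a time, running the script twice duplicates every row, and the CSV never gets a header line. One possible variation is to collect the rows in a list first and write them once, header included. The sketch below shows only that saving step. It uses the standard-library csv module instead of pandas, and the helper name write_rows and the sample row are hypothetical, made up for illustration.

```python
import csv
import io

# Same column order as in the program above.
COLUMNS = ['university', 'name', 'title', 'address', 'email']


def write_rows(rows, out):
    """Write a header line followed by all rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(rows)


# Hypothetical sample data; in the real script each row would be built
# inside the loop and appended to a list instead of being written at once.
sample_rows = [
    ['Example University', 'Jane Doe', 'Counselor',
     '1 Main St, Springfield', 'jane@example.edu'],
]

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_rows(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, the same function can be given a file opened with open('ieee_info_1.csv', 'w', newline='', encoding='utf-8'). Opening in 'w' mode overwrites the previous output, so repeated runs do not pile up duplicate rows.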