EADST

Extract Webpage Information with Python

Here is a Python program that uses BeautifulSoup to extract student branch counselor and chair information from a saved IEEE Region 5 roster page, then saves the data to a CSV file.

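```python
from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

# Saved copy of the roster page, opened through a local file:// URL
url = 'file:///Users/xd/Desktop/ieee/Region_5_Student_Branch_Counselors_and_Chairs.htm'
save_file = 'ieee_info_1'

with urllib.request.urlopen(url) as response:
    html = response.read()

soup = BeautifulSoup(html, "html.parser")

# University names and roster blocks appear in the same order,
# so the two result lists can be paired with zip()
universities = soup.find_all('div', class_='spoName bullet pad-t15')
people = soup.find_all('div', class_='roster-results')

rows = []
for u, p in zip(universities, people):
    info = p.find_all('p')
    university = u.get_text().strip()
    name = info[0].get_text().strip()
    if name == 'Position Vacant':
        continue  # skip branches with nobody in the role
    title = info[2].get_text().strip()
    address = info[3].get_text().strip() + ', ' + info[4].get_text().strip()
    # The last <p> starts with "Email: "; drop that 7-character prefix
    email = info[-1].get_text()[7:].strip()
    rows.append([university, name, title, address, email])

# Build the DataFrame once and write the CSV in one call, with a header row,
# instead of appending a one-row DataFrame per iteration
list_name = ['university', 'name', 'title', 'address', 'email']
data = pd.DataFrame(rows, columns=list_name)
data.to_csv("{}.csv".format(save_file), index=False, encoding='utf-8')
```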
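The script reads a page saved on the author's machine, so it cannot be run as-is elsewhere. The sketch below reproduces the extraction step on a small hand-written HTML snippet so the `find_all` pairing, the "Position Vacant" skip and the email slicing can be tried directly. The snippet's markup (the order of `<p>` tags inside `roster-results`, the `Email: ` prefix, and all names and addresses) is a hypothetical stand-in inferred from the indices the script uses, not the real IEEE page. For a dependency-light demo it also swaps pandas for the standard-library `csv` module, writing to an in-memory buffer.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure the script assumes:
# one "spoName" div per university, followed by its "roster-results" div.
SAMPLE_HTML = """
<div class="spoName bullet pad-t15">Example University</div>
<div class="roster-results">
  <p>Jane Doe</p>
  <p>Counselor</p>
  <p>Associate Professor</p>
  <p>123 Main St</p>
  <p>Springfield, IL</p>
  <p>Email: jane.doe@example.edu</p>
</div>
<div class="spoName bullet pad-t15">Sample College</div>
<div class="roster-results">
  <p>Position Vacant</p>
</div>
"""

COLUMNS = ['university', 'name', 'title', 'address', 'email']


def extract_rows(html):
    """Return one [university, name, title, address, email] list per filled position."""
    soup = BeautifulSoup(html, "html.parser")
    universities = soup.find_all('div', class_='spoName bullet pad-t15')
    people = soup.find_all('div', class_='roster-results')
    rows = []
    for u, p in zip(universities, people):
        info = p.find_all('p')
        name = info[0].get_text().strip()
        if name == 'Position Vacant':
            continue  # vacant positions have no further details
        rows.append([
            u.get_text().strip(),
            name,
            info[2].get_text().strip(),
            info[3].get_text().strip() + ', ' + info[4].get_text().strip(),
            info[-1].get_text()[7:].strip(),  # drop the "Email: " prefix
        ])
    return rows


def rows_to_csv(rows):
    """Serialize rows, preceded by a header line, to a CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_rows(SAMPLE_HTML)
print(rows_to_csv(rows))
```

Running it should print a header line and a single row for the hypothetical Jane Doe. The vacant position is skipped, and `csv.writer` quotes the address because it contains commas.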
About Me
XD