EADST

Extract Webpage Information with Python

The following Python program extracts information from a webpage with BeautifulSoup and saves the data to a CSV file.

from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

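# Local copy of the saved roster page; urlopen also accepts file:// URLs.
url = 'file:///Users/xd/Desktop/ieee/Region_5_Student_Branch_Counselors_and_Chairs.htm'
# Output file name, without the .csv extension.
save_file = 'ieee_info_1'
html = urllib.request.urlopen(url).read()

soup = BeautifulSoup(html, "html.parser")

# One div per university name and one per roster entry, matched by CSS class.
universities = soup.find_all('div', class_='spoName bullet pad-t15')
people = soup.find_all('div', class_='roster-results')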

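# Column order for each CSV row.
list_name = ['university', 'name', 'title', 'address', 'email']
rows = []

# Assumes the two lists line up one-to-one, in page order.
for u, p in zip(universities, people):
    info = p.find_all('p')
    university = u.get_text()
    name = info[0].get_text()
    # Skip entries whose position is unfilled.
    if name == 'Position Vacant':
        continue
    title = info[2].get_text()
    address = info[3].get_text() + ', ' + info[4].get_text()
    # Drop the first 7 characters of the last paragraph (the label before the address).
    email = info[-1].get_text()[7:]
    rows.append([university, name, title, address, email])

# Append all rows in a single write; no header row is written.
data = pd.DataFrame(columns=list_name, data=rows)
data.to_csv("{}.csv".format(save_file), mode='a', index=False, header=False, encoding='utf-8')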
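Because the script appends with header=False, the resulting CSV has no header row. If a header is wanted, one possible approach is to write it only when the file is new or empty. The sketch below uses the standard csv module instead of pandas; the helper name append_rows and the COLUMNS constant are hypothetical and not part of the original script.

```python
import csv
import os

# Hypothetical column list, mirroring the order used in the script above.
COLUMNS = ['university', 'name', 'title', 'address', 'email']


def append_rows(path, rows, columns=COLUMNS):
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline='' lets the csv module control line endings itself.
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(columns)
        writer.writerows(rows)
```

Calling append_rows('ieee_info_1.csv', rows) repeatedly, with rows as a list of five-item lists, should leave a single header line at the top of the file, followed by every appended row.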