EADST

Extract Webpage Information with Python

Here is a Python program that extracts information from a webpage with BeautifulSoup and saves the data to a CSV file.

from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

# Local copy of the roster page; urlopen() also accepts file:// URLs.
url = 'file:///Users/xd/Desktop/ieee/Region_5_Student_Branch_Counselors_and_Chairs.htm'
save_file = 'ieee_info_1'
with urllib.request.urlopen(url) as response:
    html = response.read()

soup = BeautifulSoup(html, "html.parser")

# Each university heading is paired with one roster block, in page order.
universities = soup.find_all('div', class_='spoName bullet pad-t15')
people = soup.find_all('div', class_='roster-results')

columns = ['university', 'name', 'title', 'address', 'email']
rows = []
for u, p in zip(universities, people):
    info = p.find_all('p')
    university = u.get_text()
    name = info[0].get_text()
    if name == 'Position Vacant':
        continue  # skip branches with no one listed
    title = info[2].get_text()
    address = info[3].get_text() + ', ' + info[4].get_text()
    email = info[-1].get_text()[7:]  # drop the 7-character label before the address
    rows.append([university, name, title, address, email])

# Write every row in one pass with a header row, so re-running the script
# overwrites the file instead of appending duplicate rows.
data = pd.DataFrame(rows, columns=columns)
data.to_csv("{}.csv".format(save_file), index=False, encoding='utf-8')
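The fixed indices (`info[2]`, `info[3]`, `info[4]`) and the `[7:]` slice only work when every roster block has the same layout. As a minimal sketch of a more defensive version, the field extraction can be moved into a helper that works on plain strings. The helper names `parse_entry` and `rows_to_csv`, the label text `"Email: "`, and the sample values are all assumptions made for illustration; they do not come from the page itself.

```python
import csv
import io

COLUMNS = ["university", "name", "title", "address", "email"]
EMAIL_LABEL = "Email: "  # assumed label; the script above slices off 7 characters


def parse_entry(university, paragraphs):
    """Build one row dict from a block's paragraph texts, or None to skip it."""
    if not paragraphs or paragraphs[0].strip() == "Position Vacant":
        return None
    if len(paragraphs) < 5:
        return None  # layout differs from what the fixed indices expect
    email = paragraphs[-1].strip()
    if email.startswith(EMAIL_LABEL):
        email = email[len(EMAIL_LABEL):]
    return {
        "university": university.strip(),
        "name": paragraphs[0].strip(),
        "title": paragraphs[2].strip(),
        "address": paragraphs[3].strip() + ", " + paragraphs[4].strip(),
        "email": email,
    }


def rows_to_csv(rows):
    """Serialize row dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample data, standing in for [p.get_text() for p in info].
sample = ["Jane Doe", "(unused)", "Counselor", "123 Main St",
          "Springfield", "Email: jane@example.org"]
entries = [parse_entry("Example University", sample),
           parse_entry("Other University", ["Position Vacant"])]
rows = [entry for entry in entries if entry is not None]
csv_text = rows_to_csv(rows)
```

In the script above, the helper would be called as `parse_entry(u.get_text(), [x.get_text() for x in info])`. Keeping the parsing separate from the HTML lookup makes it possible to check the field logic without the saved page.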