EADST

Extract Webpage Information with Python

Here is a Python program that extracts webpage information with BeautifulSoup and saves the data to a CSV file.

from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

url = 'file:///Users/xd/Desktop/ieee/Region_5_Student_Branch_Counselors_and_Chairs.htm'
save_file = 'ieee_info_1'
html = urllib.request.urlopen(url).read()

soup = BeautifulSoup(html, "html.parser")

universities = soup.find_all('div', class_='spoName bullet pad-t15')
people = soup.find_all('div', class_='roster-results')

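# Column order for each CSV row (the file is written without a header line).
list_name = ['university', 'name', 'title', 'address', 'email']

# Each university block is paired with the roster block at the same position.
for u, p in zip(universities, people):
    info = p.find_all('p')
    university = u.get_text()
    name = info[0].get_text()
    # Skip entries that have no person assigned.
    if name == 'Position Vacant':
        continue
    title = info[2].get_text()
    # The address spans two paragraphs; join them with a comma.
    address = info[3].get_text() + ', ' + info[4].get_text()
    # The last paragraph holds the email; drop its first 7 characters (the label).
    email = info[-1].get_text()[7:]

    # Append this row to the CSV file.
    content = [[university, name, title, address, email]]
    data = pd.DataFrame(columns=list_name, data=content)
    data.to_csv("{}.csv".format(save_file), mode='a', index=False, header=False, encoding='utf-8')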
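
The row-building step above can also be sketched on its own, without BeautifulSoup or pandas, which makes it easier to test. This is only a sketch under some assumptions: the helper names `build_row` and `write_rows` are hypothetical, the sample data is made up, the email label is assumed to be "Email: " (the original simply drops 7 characters), and the standard `csv` module is swapped in for pandas so that all rows are written once, with a header, instead of being appended row by row.

```python
import csv
import io

COLUMNS = ["university", "name", "title", "address", "email"]
EMAIL_PREFIX = "Email: "  # assumed label; the original just slices off 7 characters


def build_row(university, paragraphs):
    """Return one CSV row from paragraph texts, or None if the entry is skipped."""
    texts = [t.strip() for t in paragraphs]
    # Skip empty or vacant entries, as the original loop does for vacant ones.
    if not texts or texts[0] == "Position Vacant":
        return None
    # The layout needs indexes 0, 2, 3 and 4, so guard against short entries.
    if len(texts) < 5:
        return None
    email = texts[-1]
    if email.startswith(EMAIL_PREFIX):
        email = email[len(EMAIL_PREFIX):]
    address = texts[3] + ", " + texts[4]
    return [university.strip(), texts[0], texts[2], address, email]


def write_rows(rows, out):
    """Write the header and all rows to an open text file object."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(rows)


# Made-up sample data standing in for the get_text() results of each block.
sample = [
    ("Example University",
     ["Jane Doe", "Counselor", "Professor", "1 Main St", "Springfield",
      "Email: jane@example.edu"]),
    ("Vacant College", ["Position Vacant"]),
]

rows = []
for university, paragraphs in sample:
    row = build_row(university, paragraphs)
    if row is not None:
        rows.append(row)

buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="", encoding="utf-8")` to `write_rows`. Writing everything in one pass also avoids duplicated rows when the script is run more than once.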