EADST

Extract Webpage Information with Python

Here is a Python program that extracts information from a webpage with BeautifulSoup and saves the data to a CSV file.

from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

# Local copy of the roster page; urlopen also accepts file:// URLs.
url = 'file:///Users/xd/Desktop/ieee/Region_5_Student_Branch_Counselors_and_Chairs.htm'
save_file = 'ieee_info_1'
html = urllib.request.urlopen(url).read()

soup = BeautifulSoup(html, "html.parser")

# zip pairs the two lists by position, which assumes one roster block per heading.
universities = soup.find_all('div', class_='spoName bullet pad-t15')
people = soup.find_all('div', class_='roster-results')

list_name = ['university', 'name', 'title', 'address', 'email']
content = []
for u, p in zip(universities, people):
    info = p.find_all('p')
    university = u.get_text()
    name = info[0].get_text()
    # Skip branches whose position is not filled.
    if name == 'Position Vacant':
        continue
    title = info[2].get_text()
    address = info[3].get_text() + ', ' + info[4].get_text()
    # The last paragraph holds the email; drop its 7-character prefix.
    email = info[-1].get_text()[7:]
    content.append([university, name, title, address, email])

# Append all rows in a single write (no header row) instead of one write per row.
data = pd.DataFrame(columns=list_name, data=content)
data.to_csv("{}.csv".format(save_file), mode='a', index=False, header=False, encoding='utf-8')
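
The parsing loop above indexes straight into each roster block, so it is easy to check against a small inline page before running it on the real file. The sketch below does that. The class names come from the script, but the sample HTML, the paragraph layout, the "Email: " prefix, and the helper names `parse_roster` and `rows_to_csv` are all assumptions made for illustration. It also uses the standard `csv` module rather than pandas, which is a swap and not the method used above.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup: the class names come from the script above, but the
# <p> layout and the "Email: " prefix are assumptions made for this sketch.
SAMPLE_HTML = """
<div class="spoName bullet pad-t15">Example University</div>
<div class="roster-results">
  <p>Jane Doe</p><p>Member</p><p>Counselor</p>
  <p>1 Main St</p><p>Springfield</p><p>Email: jane@example.edu</p>
</div>
<div class="spoName bullet pad-t15">Vacant College</div>
<div class="roster-results"><p>Position Vacant</p></div>
"""

FIELDS = ["university", "name", "title", "address", "email"]


def parse_roster(html):
    """Return one dict per filled position found in the HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    universities = soup.find_all("div", class_="spoName bullet pad-t15")
    people = soup.find_all("div", class_="roster-results")
    rows = []
    for u, p in zip(universities, people):
        info = [tag.get_text(strip=True) for tag in p.find_all("p")]
        # Skip vacant positions and records too short to index (0 to 4 are used).
        if len(info) < 5 or info[0] == "Position Vacant":
            continue
        rows.append({
            "university": u.get_text(strip=True),
            "name": info[0],
            "title": info[2],
            "address": info[3] + ", " + info[4],
            "email": info[-1][7:],  # drop the assumed 7-character prefix
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text, including a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_roster(SAMPLE_HTML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Keeping the parsing in a function that takes an HTML string makes it possible to test the field indices without touching the network or the local file, and the length check avoids an `IndexError` on records with fewer paragraphs than expected.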