Train XGBoost Model with Pandas Input| 东毅居士

Author: XD / Published: September 28, 2022 03:58 / Updated: September 28, 2022 03:58 / Programming Notes / Views: 1973

import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import numpy as np
import xgboost as xgb
from sklearn.metrics import classification_report

# Load the training and test sets
train = pd.read_csv('./train.csv')
test = pd.read_csv('./test.csv')


# Load the auxiliary table and inspect it
info = pd.read_csv('info.csv')
print(info.head())  # first rows, including the column names
print(info.shape)   # (rows, columns)
new_info = info.drop_duplicates(subset=['id'])  # keep only the first row for each id
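# Left-join the 'number' column onto the training data; ids missing from info become NaN, then 0
train2 = pd.merge(train, new_info[['id', 'number']], how='left', on='id').fillna(0)
# Apply the same join to the test set so its feature columns match train2
# (assumes test.csv, like train.csv, has no 'number' column of its own)
test = pd.merge(test, new_info[['id', 'number']], how='left', on='id').fillna(0)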

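# Split labels from features; 'uaid', 'result' and 'others' are not used as features
train_y = train2['result']
train_x = train2.drop(columns=['uaid', 'result', 'others'])
test_id = test['id']  # test-set ids (not used below)
test_y = test['result']
test_x = test.drop(columns=['uaid', 'result', 'others'])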


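# Train a classifier with default hyperparameters
model = xgb.XGBClassifier()
model.fit(train_x, train_y)
# Report metrics on the training set (not a held-out estimate)
train_predict_y = model.predict(train_x)
print(classification_report(train_y, train_predict_y))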


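# Predict class probabilities for the test set (one column per class)
result = model.predict_proba(test_x)
# Put the true labels next to the predicted probabilities and save them
result = pd.concat([test_y, pd.DataFrame(result)], axis=1)
result.to_csv('./test_result.csv')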
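
The join step and the result table in the script above can be tried without the CSV files. The sketch below is only an illustration on made-up toy data: the helper `attach_info`, the `proba_*` column names, and all values are assumptions, not part of the original script. Unlike the script, it fills missing values only in the joined column rather than the whole frame, and it uses hard-coded probabilities in place of `model.predict_proba(test_x)`.

```python
import numpy as np
import pandas as pd


def attach_info(df, info, key="id", cols=("number",)):
    """Left-join selected info columns onto df, keeping one info row per key."""
    unique_info = info.drop_duplicates(subset=[key])  # first row per key wins
    merged = pd.merge(df, unique_info[[key, *cols]], how="left", on=key)
    # Fill only the joined columns so other missing values stay visible.
    merged[list(cols)] = merged[list(cols)].fillna(0)
    return merged


# Toy stand-ins for train.csv, test.csv and info.csv (made-up values).
train = pd.DataFrame({"id": [1, 2, 3], "result": [0, 1, 0]})
test = pd.DataFrame({"id": [2, 4], "result": [1, 0]})
info = pd.DataFrame({"id": [1, 2, 2], "number": [10, 20, 99]})

# Apply the same join to both sets so their columns match.
train2 = attach_info(train, info)
test2 = attach_info(test, info)
print(train2)
print(test2)

# Made-up probabilities standing in for model.predict_proba(test_x).
proba = np.array([[0.2, 0.8], [0.9, 0.1]])
classes = [0, 1]  # assumed class labels; a fitted classifier would supply these

# Name one probability column per class and align it with the test rows.
proba_df = pd.DataFrame(
    proba, columns=[f"proba_{c}" for c in classes], index=test2.index
)
result = pd.concat([test2[["id", "result"]], proba_df], axis=1)
print(result)
```

Named probability columns and an explicit `id` column make the saved file easier to read back later; whether that layout suits the real data is a judgment call.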

Author: XD. Please cite the source when reposting: http://www.eadst.com/blog/132

This site is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
