1. Create a Jupyter notebook in which to perform all your work.
2. Pick any business-related dataset from Kaggle. In one paragraph, describe the dataset
and the “business use” you intend it for.
3. Using pandas and numpy, wrangle the data to make it manageable.
4. Run k-means and hierarchical clustering on the dataset. Describe the
results, their meaning, and what you learned from the data.
5. Use 5-fold cross-validation and perform classification on the dataset using
Decision Tree and Random Forest classifiers. Describe the results, what you learned
about the data, and how well your classifiers performed in testing.
6. Repeat problem 5, but this time perform regression on your data using SVM and
XGBoost. Again describe your results, what you learned about the data, and how
accurate the regression was.
7. Using the Python visualization library of your choice, create relevant visualizations for the
above tasks and to otherwise help interpret your data. Provide a functional
description of each visualization and what can be deduced from it.
8. Submit all of the above in one Jupyter notebook along with your datasets.
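
The steps above could be laid out in a notebook roughly as follows. This is only a minimal sketch, assuming scikit-learn is installed: the data is a synthetic stand-in generated with make_classification rather than a real Kaggle dataset, and the column names (feature_0 ... feature_5, label), the choice of 3 clusters, and the regression target are all hypothetical placeholders. XGBoost is left out to keep the sketch dependency-free; its XGBRegressor follows the scikit-learn estimator interface and could be passed to cross_val_score in the same way as the SVR pipeline.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a Kaggle table (hypothetical column names).
X_raw, y_raw = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)
df = pd.DataFrame(X_raw, columns=[f"feature_{i}" for i in range(6)])
df["label"] = y_raw

# Step 3: basic wrangling with pandas (drop duplicate and incomplete rows).
df = df.drop_duplicates().dropna()
features = df.drop(columns="label")
y = df["label"].to_numpy()

# Step 4: cluster on standardized features; 3 clusters is an arbitrary choice.
X_scaled = StandardScaler().fit_transform(features)
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X_scaled)

# Step 5: 5-fold cross-validated classification accuracy.
classifiers = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
cv_accuracy = {
    name: cross_val_score(model, features, y, cv=5, scoring="accuracy")
    for name, model in classifiers.items()
}

# Step 6: 5-fold cross-validated regression with an SVM regressor.
# The target (feature_0) is a placeholder for a real numeric business column.
target = features["feature_0"].to_numpy()
predictors = features.drop(columns="feature_0")
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
svr_r2 = cross_val_score(svr, predictors, target, cv=5, scoring="r2")

for name, scores in cv_accuracy.items():
    print(f"{name}: mean accuracy over 5 folds = {scores.mean():.3f}")
print(f"SVR: mean R^2 over 5 folds = {svr_r2.mean():.3f}")
```

Scaling is wrapped in a pipeline for the SVR so that it is refit inside each fold; the tree-based classifiers are insensitive to feature scale and are given the raw features.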