Predicting Rental Price
Predict rental price using data on the host, property, and reviews.
People interested in renting out an apartment or home share information about themselves and their property
on Airbnb. Those who end up renting the property share their experiences through reviews. The dataset
contains information on 90 variables related to the property, host, and reviews for over 35,000 Airbnb rentals
in New York.
Construct a model using the supplied dataset and use it to predict the price of the Airbnb rentals included in scoringData.csv.
Submissions will be evaluated on RMSE (root mean squared error) (Wikipedia). The lower the RMSE,
the better the model.
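RMSE is the square root of the average squared difference between actual and predicted prices, so large misses are penalized more than small ones. A minimal Python sketch of the metric, useful for scoring a model on held-out data before submitting (the function name `rmse` and the example prices are illustrative, not part of the competition materials):

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((actual - predicted) ** 2))."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("need at least one observation")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical prices, not taken from the dataset.
print(rmse([100.0, 150.0, 200.0], [110.0, 140.0, 200.0]))
```

Because errors are squared before averaging, one prediction that is off by 100 raises RMSE more than ten predictions that are each off by 10.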
The submission file should be a text file in CSV format (.csv) with exactly two columns, id and price. The price
column must contain the predicted price; the number of decimal places is up to you. The file should contain a
header row, id,price, followed by one row per rental.
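A possible way to produce a file in this format, sketched in Python with the standard `csv` module. The function name `write_submission`, the example ids, and the choice of two decimal places are assumptions for illustration; in practice the ids would come from scoringData.csv and the prices from your model.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_submission(rows: Iterable[Tuple[int, float]], out: TextIO) -> None:
    """Write (id, predicted price) pairs as CSV with an id,price header row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "price"])  # header row required by the rules
    for rental_id, price in rows:
        # Rounding to 2 decimals is an arbitrary choice; any precision is allowed.
        writer.writerow([rental_id, round(price, 2)])


# Hypothetical ids and predictions, written to an in-memory buffer.
buffer = io.StringIO()
write_submission([(1001, 149.5), (1002, 87.25)], buffer)
print(buffer.getvalue())
```

To write a real file, open it with `open("submission.csv", "w", newline="")` and pass the file object instead of the buffer.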
Here is an illustration in R of how you can create a model, apply it to scoringData.csv and prepare a
submission file (sample_submission.csv).
• analysisData.csv: Data for building a model
• scoringData.csv: Data to which the model is applied to generate predictions (scoring)
• sample_submission.csv: Sample submission file in the correct format
Your submission should be in CSV format. You can upload it in a zip/gz/rar/7z archive if you prefer.
Number of Predictions
We expect the solution file to have 9210 prediction rows plus a header row. Please see the
sample submission file on the data page.
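Before uploading, it can help to check the file against these requirements. The following Python sketch (the name `check_submission` is hypothetical) verifies the id,price header, the expected row count of 9210, two columns per row, and a numeric price; it does not check that the ids match scoringData.csv.

```python
import csv
import io
from typing import List, TextIO

EXPECTED_ROWS = 9210  # prediction rows stated in the competition rules


def check_submission(f: TextIO, expected_rows: int = EXPECTED_ROWS) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems: List[str] = []
    reader = csv.reader(f)
    header = next(reader, None)
    if header != ["id", "price"]:
        problems.append(f"header should be ['id', 'price'], got {header}")
    rows = list(reader)
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(rows)}")
    # Line numbers start at 2 because line 1 is the header.
    for line_no, row in enumerate(rows, start=2):
        if len(row) != 2:
            problems.append(f"line {line_no}: expected 2 columns, got {len(row)}")
            continue
        try:
            float(row[1])
        except ValueError:
            problems.append(f"line {line_no}: price {row[1]!r} is not numeric")
    return problems


# Hypothetical two-row file with one bad price.
print(check_submission(io.StringIO("id,price\n1,10\n2,abc\n"), expected_rows=2))
```

For a real file, call it as `check_submission(open("submission.csv", newline=""))` with the default row count.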
1. Remember to use what we have learned in class so far: splitting the data, exploring variables to
understand them, picking good variables/features and applying the right technique.
2. Here are a few things you can do to make your models more predictive:
• Transform the data: For example, collapse levels of a factor, convert non-numeric variables to numeric format, and impute missing values.
• Identify predictors: Examine correlations (cor), construct tables (tapply), or make charts (ggplot2) to
look for relevant variables, i.e., good predictors of price. Experiment with reasonable non-linear
transformations of variables (e.g., the square or cube of a predictor).
• Use a good model: We have examined models such as regressions, trees, and forests, all of which
can be used to model this data. You are also free to find more efficient packages for running
these models, e.g., ranger and xgboost.
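Splitting the data, the first step in item 1, lets you estimate RMSE on rows the model never saw. A minimal Python sketch of a plain random split with a fixed seed; the 70/30 ratio, the seed value, and the name `train_test_split` are illustrative assumptions (a split stratified on price is a common alternative).

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    rows: Sequence[T], test_fraction: float = 0.3, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Randomly split rows into (train, test); the fixed seed makes it reproducible."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # private RNG leaves global state untouched
    test_idx = set(indices[: round(len(rows) * test_fraction)])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


# Hypothetical stand-in for the rows of analysisData.csv.
train, test = train_test_split(list(range(10)), test_fraction=0.3)
print(len(train), len(test))
```

Fit candidate models on `train` only and compare them by RMSE on `test`; the same seed always reproduces the same split.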
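Two of the transformations listed above, collapsing factor levels and imputing missing values, can be sketched in Python as follows. The helper names, the "keep the k most frequent levels" rule, and median imputation are assumptions chosen for illustration; the column values are made up rather than read from analysisData.csv.

```python
from collections import Counter
from statistics import median
from typing import List, Optional, Sequence


def collapse_levels(values: Sequence[str], keep: int, other: str = "Other") -> List[str]:
    """Keep the `keep` most frequent levels and map every other level to `other`."""
    # Counter.most_common breaks ties by first occurrence.
    top = {level for level, _ in Counter(values).most_common(keep)}
    return [v if v in top else other for v in values]


def impute_median(values: Sequence[Optional[float]]) -> List[float]:
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)  # raises StatisticsError if every value is missing
    return [fill if v is None else v for v in values]


# Hypothetical column values.
rooms = ["Entire home/apt", "Private room", "Entire home/apt", "Shared room", "Hotel room"]
print(collapse_levels(rooms, keep=2))
print(impute_median([1.0, None, 3.0, 2.0]))
```

Compute the kept levels and the median on the training data only, then apply the same mapping and fill value to scoringData.csv so both files are transformed consistently.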
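Screening predictors by correlation, together with a squared term, might look like this Python sketch. `pearson` computes the Pearson coefficient, which is the default method of R's cor(); `rank_predictors` and the column names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


def rank_predictors(
    features: Dict[str, Sequence[float]], price: Sequence[float]
) -> List[Tuple[str, float]]:
    """Sort candidate predictors by absolute correlation with price, strongest first."""
    scores = {name: pearson(col, price) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data in which price grows faster than linearly in accommodates.
accommodates = [1, 2, 3, 4, 5, 6]
price = [50, 60, 80, 110, 150, 200]
features = {
    "accommodates": accommodates,
    "accommodates_sq": [a * a for a in accommodates],  # non-linear transformation
    "noise": [3, 1, 4, 1, 5, 9],  # made-up, weakly related column
}
for name, r in rank_predictors(features, price):
    print(f"{name}: {r:.3f}")
```

Correlation only measures linear association, so a variable that ranks low here can still help a tree or forest; treat the ranking as a starting point rather than a final selection.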