This task uses R to predict rental prices from data on renters, properties, and reviews.

Predicting Rental Price
Predict rental price using data on renter, property and reviews.
Description
People interested in renting out an apartment or home share information about themselves and their property on Airbnb. Those who end up renting the property share their experiences through reviews. The dataset contains 90 variables related to the property, host, and reviews for over 35,000 Airbnb rentals in New York.
Goal
Construct a model using the dataset supplied and use it to predict the price of a set of Airbnb rentals included in scoringData.csv.
Metric
Submissions will be evaluated based on RMSE (root mean squared error) (Wikipedia). The lower the RMSE, the better the model.
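RMSE is the square root of the mean squared difference between actual and predicted prices. A minimal sketch in R follows; the helper name rmse and the toy prices are illustrative assumptions, not part of the assignment materials.

```r
# Root mean squared error between observed and predicted prices.
# `rmse` is a hypothetical helper name used only for illustration.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Toy example with made-up prices (not taken from the dataset)
actual    <- c(100, 150, 200)
predicted <- c(110, 140, 205)
rmse(actual, predicted)  # sqrt(75), roughly 8.66
```

Because the errors are squared before averaging, a few large misses on expensive listings can dominate the score.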
Submission File
The submission file should be in text format (.csv) with only two columns, id and price. The price column must contain predicted price. Number of decimal places to use is up to you. The file should contain a header and have the following format:
Sample Code
Here is an illustration in R of how you can create a model, apply it to scoringData.csv and prepare a submission file (sample_submission.csv).
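A minimal sketch of such a workflow is given below. It assumes predictor columns named accommodates and bedrooms, which are illustrative guesses rather than confirmed column names, and it builds small stand-in data frames so the code runs on its own. In practice the two data frames would come from read.csv() on analysisData.csv and scoringData.csv.

```r
# Stand-in data so the sketch runs by itself. In practice, replace with:
#   analysis    <- read.csv("analysisData.csv")
#   scoringData <- read.csv("scoringData.csv")
set.seed(1)
analysis <- data.frame(
  id           = 1:200,
  accommodates = sample(1:6, 200, replace = TRUE),  # assumed column name
  bedrooms     = sample(0:3, 200, replace = TRUE)   # assumed column name
)
# Made-up price relationship, used only to give the model something to fit
analysis$price <- 50 + 30 * analysis$accommodates +
  20 * analysis$bedrooms + rnorm(200, sd = 10)

scoringData <- data.frame(
  id           = 201:250,
  accommodates = sample(1:6, 50, replace = TRUE),
  bedrooms     = sample(0:3, 50, replace = TRUE)
)

# Fit a simple linear regression on the analysis data
model <- lm(price ~ accommodates + bedrooms, data = analysis)

# Predict prices for the scoring data
pred <- predict(model, newdata = scoringData)

# Build the two-column submission (id, price) and write it with a header row
submission <- data.frame(id = scoringData$id, price = pred)
write.csv(submission, "sample_submission.csv", row.names = FALSE)
```

Setting row.names = FALSE keeps the file to exactly the two required columns, id and price.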
File description
• analysisData.csv: Data for building a model
• scoringData.csv: Data for applying predictions (scoring)
• sample_submission.csv: Sample submission file in the correct format
File Format
Your submission should be in CSV format. You can upload this in a zip/gz/rar/7z archive, if you prefer.
Number of Predictions
We expect the solution file to have 9210 prediction rows. This file should have a header row. Please see sample submission file on the data page.
Notes:
1. Remember to use what we have learned in class so far: splitting the data, exploring variables to understand them, picking good variables/features and applying the right technique.
2. Here are a few things you can do to make your models more predictive:
• Transform the data: This includes approaches such as collapsing levels of a factor, converting nonnumeric variables to numeric format, and imputing missing values.
• Identify predictors: Examine correlations (cor), construct tables (tapply), or make charts (ggplot2) to look for relevant variables, i.e., good predictors of price. Experiment with reasonable non-linear transformations of variables (e.g., the square or cube of a predictor).
• Use a good model: We have examined models such as regressions, trees, and forests, all of which can be used to model this data. Furthermore, you are free to identify more efficient packages to run these models, e.g., ranger or xgboost.
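The notes above can be sketched in base R as follows. This is an illustration under stated assumptions, not a recommended solution: the data are synthetic stand-ins for analysisData.csv, the column names (accommodates, beds, property_type) are guesses, and the 70/30 split and 5% rarity threshold are arbitrary choices.

```r
set.seed(617)
n <- 500
# Synthetic stand-in for analysisData.csv; column names are assumptions
analysis <- data.frame(
  accommodates  = sample(1:8, n, replace = TRUE),
  beds          = sample(c(1:4, NA), n, replace = TRUE),
  property_type = sample(c("Apartment", "House", "Loft", "Boat", "Tent"), n,
                         replace = TRUE, prob = c(0.6, 0.25, 0.1, 0.03, 0.02)),
  stringsAsFactors = FALSE
)
analysis$price <- 40 + 25 * analysis$accommodates +
  ifelse(analysis$property_type == "Loft", 30, 0) + rnorm(n, sd = 15)

# Split the data: 70% train, 30% held-out test
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- analysis[train_idx, ]
test  <- analysis[-train_idx, ]

# Impute missing values using the median learned from the training set
beds_median <- median(train$beds, na.rm = TRUE)
train$beds[is.na(train$beds)] <- beds_median
test$beds[is.na(test$beds)]   <- beds_median

# Collapse factor levels: anything rare (or unseen) in train becomes "Other"
counts <- table(train$property_type)
keep   <- names(counts)[counts >= 0.05 * nrow(train)]
collapse <- function(x) {
  x[!(x %in% keep)] <- "Other"
  factor(x, levels = c(keep, "Other"))
}
train$property_type <- collapse(train$property_type)
test$property_type  <- collapse(test$property_type)

# Identify predictors: correlation and a table of mean price by group
cor(train$price, train$accommodates)
tapply(train$price, train$property_type, mean)

# Fit a regression with a squared term as a non-linear transformation
model <- lm(price ~ accommodates + I(accommodates^2) + beds + property_type,
            data = train)

# Evaluate on the held-out data with RMSE
pred      <- predict(model, newdata = test)
rmse_test <- sqrt(mean((test$price - pred)^2))
rmse_test
```

The imputation value and the set of kept levels are computed from the training rows only, so the held-out RMSE is not flattered by information from the test rows. The same train and test data frames could be passed to a tree or forest package such as ranger in place of lm.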

 
