This North American statistics assignment mainly involves implementing a handwritten digit recognition model in R.

Stat 432 Homework 11

## Instructions

- You are required to submit two files:
    - Your `.rmd` RMarkdown (or Python) file, which should be saved as `HWx_yourNetID.Rmd`. For example, `HW1_rqzhu.Rmd`.
    - The result of knitting your RMarkdown file as `HW1_yourNetID.pdf`. For example, `HW1_rqzhu.pdf`. Please note that this must be a `.pdf` file. `.html` format cannot be accepted.

- Include your Name and NetID in your report.
- If you use this file or the example homework `.Rmd` file as a template, be sure to remove this instruction section.
- Your `.Rmd` file should be written such that, if it is placed in a folder with any data you utilize, it will knit properly without modification.
- Make sure that you set the seed properly so that the results can be replicated.
- For some questions, there will be restrictions on what packages you can use. Please read the requirements carefully.

## Question 1 [100 Points] Boosting

We will use the handwritten digit recognition data again from the `ElemStatLearn` package. We only consider the train-test split, with the pre-defined `zip.train` and `zip.test`. We again only consider two digits: 2 and 4. We will use cross-validation to choose the best tuning using the training data, then evaluate the final model on the testing data. For this question, use the `xgboost` package (available in both R and Python), which is a fast implementation of the boosting algorithm. For example, the `xgb.cv()` function implements a cross-validated version and can be used to tune parameters. For more details, you need to read the documentation of the package.
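As a rough sketch (not the required solution), the data could be prepared along the lines below. This assumes the train/test roles defined in the code chunk at the end of this question (`zip.test` as training, `zip.train` as testing) and the `ElemStatLearn` layout in which column 1 holds the digit and the remaining columns hold the pixel values:

```{r}
# Sketch: data preparation for digits 2 and 4 (assumed approach, not required)
library(ElemStatLearn)
library(xgboost)

# per the data chunk at the end of this question
train = zip.test
test  = zip.train

# column 1 is the digit; keep only 2s and 4s
train = train[train[, 1] %in% c(2, 4), ]
test  = test[test[, 1] %in% c(2, 4), ]

# binary labels: 1 = digit 4, 0 = digit 2
ytrain = as.numeric(train[, 1] == 4)
ytest  = as.numeric(test[, 1] == 4)

# xgboost works with its own matrix format
dtrain = xgb.DMatrix(data = train[, -1], label = ytrain)
dtest  = xgb.DMatrix(data = test[, -1], label = ytest)
```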

When completing this question, you must consider the following:

* This is a classification problem, so you need to specify the appropriate model for this question.
* Use two different base learners: linear and tree. For the tree base learner, you should tune the maximum depth (choose two different values). These tuning parameters should be specified using the argument `params`, which should be a list. You may find [this document] useful.
* Also tune the learning rate by choosing three different values.
* Another thing you need to consider is the criterion for selecting the best tuning. We normally use the misclassification error as the criterion. However, for this question, let's use the AUC criterion, which we practiced before. Again, this can be specified in the `xgb.cv()` function; one possible setup is sketched after this list.
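One possible tuning setup is sketched below, assuming the `dtrain` object from the data-preparation sketch above. The specific `max_depth`, `eta`, `nrounds`, and `nfold` values are illustrative placeholders, not the values the assignment requires:

```{r}
# Sketch: cross-validated tuning with xgb.cv() (illustrative values only)
set.seed(432)

eta_grid   = c(0.1, 0.3, 0.5)  # three candidate learning rates (assumed values)
depth_grid = c(2, 6)           # two candidate maximum depths (assumed values)

cv_auc = list()

# tree base learner: tune max_depth and eta jointly
for (d in depth_grid) {
  for (e in eta_grid) {
    params = list(booster = "gbtree",
                  objective = "binary:logistic",
                  eval_metric = "auc",
                  max_depth = d,
                  eta = e)
    fit = xgb.cv(params = params, data = dtrain,
                 nrounds = 100, nfold = 5, verbose = FALSE)
    cv_auc[[paste("tree", d, e, sep = "_")]] = max(fit$evaluation_log$test_auc_mean)
  }
}

# linear base learner: tune eta only (max_depth does not apply)
for (e in eta_grid) {
  params = list(booster = "gblinear",
                objective = "binary:logistic",
                eval_metric = "auc",
                eta = e)
  fit = xgb.cv(params = params, data = dtrain,
               nrounds = 100, nfold = 5, verbose = FALSE)
  cv_auc[[paste("linear", e, sep = "_")]] = max(fit$evaluation_log$test_auc_mean)
}

# pick the setting with the largest cross-validated AUC
unlist(cv_auc)
```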

After fitting the model, report the prediction accuracy of your final model on the testing data.

```{r}
# Handwritten Digit Recognition Data
library(ElemStatLearn)

# this is the training data!
dim(zip.test)
train = zip.test

# this is the testing data!
dim(zip.train)
test = zip.train
```
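Once the cross-validation sketch above has identified a best setting, the final fit and the test-set accuracy might look like the sketch below. Here `best_params` uses placeholder values that should be replaced by whatever your own tuning selects, and `dtrain`, `dtest`, and `ytest` come from the earlier data-preparation sketch:

```{r}
# Sketch: final model and test accuracy (placeholder tuning values)
set.seed(432)

best_params = list(booster = "gbtree",
                   objective = "binary:logistic",
                   eval_metric = "auc",
                   max_depth = 6,   # placeholder: use your CV choice
                   eta = 0.3)       # placeholder: use your CV choice

final_fit = xgb.train(params = best_params, data = dtrain, nrounds = 100)

# predicted probabilities of being a 4 on the testing data
phat = predict(final_fit, dtest)
yhat = as.numeric(phat > 0.5)

# prediction accuracy on the testing data
mean(yhat == ytest)
```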

