STAT 440 Spring 2021 Final Project

The final project, which is a data management tool, must be completed and submitted to your individual
student repo by 11:59 pm Sunday, May 09, 2021. You will create a single user-defined function, called
a data management tool, in your preferred programming language (either R or Python). Your function must
be your own code based on the concepts and ideas covered in this course. Your tool should accept three
inputs: i. the data file location (a URL), ii. the type of data validation strategy or data cleaning approach
that needs to be applied to the data, and iii. the column number to which the data validation strategy or
data cleaning approach should be applied.
Your tool should be able to import data from a URL with various file structures, delimiters, and file
extensions.
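As a minimal sketch in Python (one of the two languages the assignment allows), the import step could download the file with the standard library. It would take the delimiter from the extension when that is unambiguous (.csv, .tsv) and otherwise guess it with `csv.Sniffer`. This assumes a delimited text file with a header row. Spreadsheet formats such as .xlsx or JSON would need extra branches. The function name `read_table` and the candidate delimiter set are this sketch's own choices, not part of the assignment.

```python
import csv
import io
import urllib.request
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Delimiters this sketch is willing to guess (an assumption; extend as needed).
CANDIDATE_DELIMITERS = ",\t;|"


def read_table(url):
    """Download a delimited text file from `url` and return (header, rows).

    Assumes plain text with a header row. The delimiter comes from the file
    extension when it is unambiguous (.csv, .tsv) and is otherwise guessed
    with csv.Sniffer, falling back to a comma.
    """
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8-sig")  # "-sig" strips a BOM if present

    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix == ".csv":
        delimiter = ","
    elif suffix == ".tsv":
        delimiter = "\t"
    else:
        try:
            dialect = csv.Sniffer().sniff(text[:4096], delimiters=CANDIDATE_DELIMITERS)
            delimiter = dialect.delimiter
        except csv.Error:
            delimiter = ","  # single-column or undetectable: treat as comma-separated

    # csv.reader yields [] for blank lines; drop them.
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if not rows:
        raise ValueError(f"no data found at {url}")
    return rows[0], rows[1:]
```

`urllib.request.urlopen` also accepts `file://` URLs. The same function can therefore be tried on local copies of earlier course datasets before pointing it at a web address.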
Your tool should also allow the data validation strategies (1-3) and data cleaning approaches (1-6) to be
executed by name and number.
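One reading of "by name and number", assumed here, is that labels such as “Strategy 1” should be accepted in a forgiving way and normalized to a (kind, number) pair. That pair can then act as a lookup key. Several parts of this sketch are hypothetical choices rather than anything the assignment prescribes:

- the word “Approach” for the cleaning methods,
- the function name `parse_method`,
- the case- and space-insensitive matching.

```python
import re

# Upper bounds from the assignment: validation strategies 1-3, cleaning approaches 1-6.
LIMITS = {"strategy": 3, "approach": 6}


def parse_method(method):
    """Normalise labels like "Strategy 1", "strategy1" or " APPROACH 4 " to (kind, number).

    "approach" as the label for cleaning methods is an assumption of this sketch.
    """
    match = re.fullmatch(r"\s*(strategy|approach)\s*(\d+)\s*", str(method), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised method {method!r}; expected e.g. 'Strategy 1' or 'Approach 3'")
    kind, number = match.group(1).lower(), int(match.group(2))
    if not 1 <= number <= LIMITS[kind]:
        raise ValueError(f"{kind} {number} is out of range 1-{LIMITS[kind]}")
    # The (kind, number) pair can then serve as a key in a dict of handler functions.
    return kind, number
```

Raising `ValueError` on unknown labels, rather than silently doing nothing, makes a mistyped argument visible immediately.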
For example, an argument of your function should accept “Strategy 1” as input, which then filters and
arranges the imported data on one of its numeric columns (specified in input iii.). Exactly which values or
levels of the specified column (input iii.) are filtered is up to you, and so is the sort order (ascending or
descending). Regardless, the specified column must be filtered, and the data must be arranged (sorted) by
that column. The column number is given as a single value, and your function should accept any single
digit as that input.
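The filter-and-arrange step for “Strategy 1” might look like the following minimal sketch. It works on a header plus a list of string rows, as a CSV reader would produce. The column number is treated as 1-based, since the assignment's example maps 2 to the second column, so the accepted digits are 1 to 9. The filtering rule and the sort order are the student's choice. This sketch assumes its own: drop missing or non-numeric values, keep rows at or above the column's median, and sort in descending order. `strategy_1` is a hypothetical name.

```python
import math
import statistics


def strategy_1(header, rows, column):
    """Sketch of "Strategy 1": filter and arrange on one numeric column.

    `column` is 1-based (1 = first column), matching the assignment's example.
    Assumed filter: drop rows whose value is missing or non-numeric, then keep
    rows at or above that column's median. Assumed order: descending.
    """
    if not (isinstance(column, int) and 1 <= column <= 9):
        raise ValueError("column must be a single digit from 1 to 9")
    if column > len(header):
        raise ValueError(f"column {column} requested, but the data has {len(header)} columns")
    idx = column - 1

    def as_number(value):
        # Returns a finite float, or None for blanks, "NA", "nan", text, etc.
        try:
            x = float(value)
        except ValueError:
            return None
        return x if math.isfinite(x) else None

    pairs = []
    for row in rows:
        x = as_number(row[idx]) if idx < len(row) else None
        if x is not None:
            pairs.append((x, row))
    if not pairs:
        return header, []

    cutoff = statistics.median(x for x, _ in pairs)
    kept = [(x, row) for x, row in pairs if x >= cutoff]  # filter
    kept.sort(key=lambda pair: pair[0], reverse=True)     # arrange: largest first
    return header, [row for _, row in kept]
```

Sorting on the parsed float instead of the raw string matters. As strings, "95" < "100" is false, so a text sort would misplace values with different numbers of digits.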
To assess the tools that students create, the course staff will apply 4 different datasets of increasing
difficulty; the grade rises with each dataset for which the tool successfully returns the desired output. For
example, if a student’s tool takes the URL of dataset1, “Strategy 2”, and 2 as inputs and successfully returns
a frequency table for the second column of dataset1, then the student’s final project score is 10 points out of
40 points.
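The example above expects “Strategy 2” to return a frequency table for a given column. In R, `table()` is the usual tool for this. A minimal Python sketch with `collections.Counter` follows, again assuming a header plus string rows and a 1-based column number. The name `strategy_2` and the "(missing)" label for blank cells are hypothetical choices of this sketch.

```python
from collections import Counter


def strategy_2(header, rows, column):
    """Sketch of "Strategy 2": frequency table of the 1-based `column`.

    Returns (value, count) pairs, most frequent first; ties keep the order in
    which values first appear. Blank or absent cells are counted under
    "(missing)", a label chosen for this sketch.
    """
    if not (isinstance(column, int) and 1 <= column <= len(header)):
        raise ValueError(f"column must be between 1 and {len(header)}")
    idx = column - 1
    values = ((row[idx].strip() if idx < len(row) else "") or "(missing)" for row in rows)
    return Counter(values).most_common()
```

For example, a grade column holding A, C, A, a blank, and B would yield `[("A", 2), ("C", 1), ("(missing)", 1), ("B", 1)]`.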
Datasets will increase in complexity, and the column number need not be the same from one dataset to
the next.
It would be wise to use the datasets covered earlier in this course as test data for your data management
tool. Students in this section, both undergraduate and graduate, may work together in groups of 2 or
individually. Groups of 3 or more are not allowed.
The Week 14 Notes on tool-making and user-defined functions will be helpful. If you cannot wait that long,
you are welcome to check the following references:
• STAT 385 R Notes “Vectorization and User-defined Functions”
• Chapter 2 “The Very Basics” of G. Grolemund’s Hands-On Programming with R
• User-defined Functions in Python