This assignment uses R to analyse Facebook news data.

FIT1043 Assignment 3

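Task A: Investigating Facebook data using shell commands
Download the file FB_Dataset.csv.zip from the link above. Use the Unix shell to manipulate the file and answer the following questions.
1. Decompress the file. How big is it?
2. What delimiter is used to separate the columns in the file? The second column is the unique identifier of a Facebook post. Print the names of the other columns in your output.
3. How many unique pages are there?
4. What is the date range of the Facebook posts in this file? (Assume the data is ordered.)
5. How many times does the term “Donald Trump” (ignore case) appear in the post names? What is the first mention of “Donald Trump” (ignore case)? What are the post ID and post name? Looking at columns other than the post ID and post name, what can you say about this post? Was the reaction to this post positive or negative? How many reactions can you see for this post?
6. Select the post ID and like count of the posts whose name contains the term “Trump” (ignore mentions in the post content) and whose like count is greater than 100. Sort the data by like_count (descending order) and save it in a file named “trump.txt”. (You need to include a screenshot of your output, showing the first 5 rows and the column headers, in your report.)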
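The selection and sorting logic of question 6 can be cross-checked outside the shell. The following is a minimal Python sketch on a few made-up rows, not the required shell solution. It assumes a comma-delimited file with a header row and hypothetical `page_id` and `name` columns. Only `post_id` and `like_count` are names taken from the question, and the real delimiter and column names have to be confirmed from the dataset (question 2). Matching “Trump” case-insensitively is also an assumption carried over from question 5.

```python
import csv
import io

# Made-up sample rows. The comma delimiter and the "page_id" and "name"
# columns are assumptions; check the real header of FB_Dataset.csv first.
SAMPLE = """page_id,post_id,name,like_count
abc-news,1,Donald Trump wins primary,250
abc-news,2,Weather update,900
cnn,3,TRUMP rally draws crowd,120
cnn,4,Trump tweets again,80
"""


def select_trump_posts(handle, min_likes=100):
    """Return (post_id, like_count) pairs for posts whose name mentions
    'trump' (case-insensitive) and whose like_count is greater than
    min_likes, sorted by like_count in descending order."""
    selected = [
        (row["post_id"], int(row["like_count"]))
        for row in csv.DictReader(handle)
        if "trump" in row["name"].lower() and int(row["like_count"]) > min_likes
    ]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


result = select_trump_posts(io.StringIO(SAMPLE))
for post_id, likes in result:
    print(post_id, likes)
```

Comparing the first rows produced this way with the first rows of “trump.txt” is a quick sanity check on the shell pipeline.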
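Task B
Task B1: Analysing the Wall Street Journal using shell commands and R
In this question, we want to look at one specific type of content that affects engagement on Facebook. To simplify the task, we will look specifically at the number of comments for each post type (event, link, photo, status and video) published by “the-wall-street-journal”. Extract the required information for the posts published by “the-wall-street-journal” using shell commands and save the result as a CSV file named “the-wall-street-journal.csv”.
7. “the-wall-street-journal” has asked you to focus your analysis on posts with fewer than 4000 comments. You need to read the “the-wall-street-journal.csv” file generated above into R, filter it as requested by the Wall Street Journal, and draw box plots showing the distribution of the number of comments for each post type (event, link, photo, status and video). You need to present a single plot containing a separate box plot for each post type. What can you infer from this plot? Can you identify the most engaging post type? Make sure your plot has appropriate labels and a title.
8. You may have noticed that the presence of outliers affects the readability and interpretation of the data in the box plot. Redraw the box plot after filtering out values (comments_count) greater than 1000.
9. On average, which type of post (event, link, photo, status or video) is the most engaging in “the-wall-street-journal.csv”? In other words, which post_type has the highest median number of comments?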
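The grouping behind questions 7 and 9 (filter by a comment threshold, then take the median per post type) can be illustrated as follows. This is a hedged sketch in Python on made-up numbers, intended only as a cross-check for the R analysis. The column names `post_type` and `comments_count` come from the questions; the sample values and the helper name `median_comments_by_type` are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Made-up sample data; the real values come from the extracted CSV file.
SAMPLE = """post_type,comments_count
link,10
link,30
photo,5
photo,7
photo,4200
video,50
video,70
status,2
"""


def median_comments_by_type(handle, max_comments=None):
    """Group comments_count by post_type and return the median per group.

    Rows with comments_count >= max_comments are dropped when a limit is
    given (question 7 keeps posts with fewer than 4000 comments)."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        count = int(row["comments_count"])
        if max_comments is not None and count >= max_comments:
            continue
        groups[row["post_type"]].append(count)
    return {post_type: median(counts) for post_type, counts in groups.items()}


medians = median_comments_by_type(io.StringIO(SAMPLE), max_comments=4000)
best_type = max(medians, key=medians.get)  # post type with the highest median
print(medians, best_type)
```

The median is used here because question 9 asks for it explicitly, and it is much less sensitive than the mean to the outliers mentioned in question 8.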
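Task B2: Analysing “abc-news” posts with the tools of your choice
In this task, you can use R, Python or shell commands, or a combination of Python, R and shell, to answer the questions.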
‘abc-news’ has asked you to help analyse the reactions of Facebook users to the posts
it published about “Donald Trump” (ignore case) by completing the following tasks.
10. Create a bar chart showing, for each day of the week (‘Monday’, ‘Tuesday’, ‘Wednesday’,
‘Thursday’, ‘Friday’, ‘Saturday’, ‘Sunday’), the total number of reactions to the posts
published by “abc-news” whose message contains the term “Donald Trump” (ignore case).
Make sure the bars are ordered by weekday, as shown in the screenshot below. Deciding
what should count as a reaction is part of the answer to this question; you can work it
out by checking the different columns of your dataset. State and justify the criterion
you choose to define a reaction to a post. (Please note that the sample plot does not
show real values; it was created with fake data only to show what the output should
look like.)
Figure 1: Sample output for question 10
11. Based on the bar chart from question 10, name the two days on which users showed
the most reactions to the posts. Is there any difference between the number of
reactions on weekdays and at weekends?
12. We need to take a closer look at the total reactions on the two days on which users
showed the most reactions. Create two bar charts showing the hourly total reactions for
each of the two days. At what time did the most reactions occur on each day? Is there any
similarity between the hourly reactions on these two days? (Please note that the sample
plot given below does not show real values; it was created with fake data only to show
how the output should be presented for this question.)
Figure 2: Sample output for question 12
13. Based on your exploration of the reactions on different days and at different times
for the term “Donald Trump” in posts by “abc-news”, answer the following questions.
a) On which day and at what time did the maximum number of reactions occur?
b) Do you think it is a good idea to recommend publishing a general post about
Trump on the days you found in question 11 and at the peak hours you found in
question 12? What is your suggestion based on the analysis you did in this
task? Justify your answer.
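The aggregation behind questions 10 to 13 (total reactions per weekday, then per hour within a weekday) could be sketched as follows. This is an illustration on made-up rows under several assumptions, not a reference solution. The column names `posted_at`, `message` and `share_count`, the timestamp format, and the choice of likes plus comments plus shares as the definition of a “reaction” are all hypothetical. Question 10 asks you to choose and justify your own criterion from the real columns.

```python
import csv
import io
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Assumption: a "reaction" is the sum of these columns. The real dataset may
# have other or additional columns; this choice must be justified.
REACTION_COLUMNS = ["like_count", "comments_count", "share_count"]

# Made-up rows; column names and timestamp format are assumptions.
SAMPLE = """posted_at,message,like_count,comments_count,share_count
2016-11-07 09:15:00,Donald Trump speaks in Ohio,100,20,5
2016-11-07 21:40:00,donald trump rally tonight,300,50,10
2016-11-08 21:05:00,DONALD TRUMP wins,1000,400,200
2016-11-09 08:00:00,Markets open lower,50,5,1
"""


def reaction_totals(handle, term="donald trump"):
    """Return (totals per weekday, totals per (weekday, hour)) for posts
    whose message contains `term`, ignoring case."""
    by_weekday = Counter()
    by_weekday_hour = Counter()
    for row in csv.DictReader(handle):
        if term not in row["message"].lower():
            continue
        posted = datetime.strptime(row["posted_at"], "%Y-%m-%d %H:%M:%S")
        reactions = sum(int(row[col]) for col in REACTION_COLUMNS)
        day = WEEKDAYS[posted.weekday()]  # weekday() returns 0 for Monday
        by_weekday[day] += reactions
        by_weekday_hour[(day, posted.hour)] += reactions
    return by_weekday, by_weekday_hour


by_weekday, by_weekday_hour = reaction_totals(io.StringIO(SAMPLE))
# Bars in calendar order, with 0 for days that have no matching posts.
ordered = [(day, by_weekday[day]) for day in WEEKDAYS]
print(ordered)
```

Building the list from the fixed `WEEKDAYS` order, rather than sorting the totals, keeps the bars in Monday-to-Sunday order as question 10 requires. The `(weekday, hour)` counter can then be filtered to the two busiest days for the hourly charts of question 12.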