This assignment uses R to analyse Facebook news data.

FIT1043 Assignment 3

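Task A: Investigating Facebook Data Using Shell Commands
Download the file FB_Dataset.csv.zip from the link above. Use a Unix shell to work with
the file and answer the following questions.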
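1. Decompress the file. How big is it?
2. What delimiter is used to separate the columns in the file? The second column is the
unique identifier of the Facebook post. What are the names of the other columns in the
printout?
3. How many unique pages are there?
4. What is the date range of the Facebook posts in this file? (Assume the data is ordered.)
5. How many times does the term “Donald Trump” (ignore the case) appear in the post
name? What is the first mention of “Donald Trump” (ignore the case)? What are its
post ID and post name? Considering the columns other than the post ID and post name,
what can you say about this post? Is the reaction to this post positive or negative?
How many reactions can you see for this post?
6. Select the post ID and the number of likes for the posts in which the term “Trump”
(ignore the case) is mentioned in the post content and the number of likes is greater
than 100. Sort the data by like_count (in descending order) and save it in a file named
“trump.txt”. (You need to add a screenshot of the output, including the first 5 rows and
the column headers, to your report.)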
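The shell steps in questions 1 to 6 can be sketched roughly as follows. This is only an
illustration on a tiny made-up file, not the expected answer. The column names page_id,
name and message, the column positions, and the helper function names are all assumptions;
only like_count and the post ID sitting in column 2 come from the brief. The sketch also
assumes that no field contains an embedded comma, which may not hold for the real
FB_Dataset.csv, where a CSV-aware tool could be needed.

```shell
#!/bin/sh
# Sketch only: miniature stand-in for FB_Dataset.csv. Column names and
# positions are assumptions, except that column 2 holds the post ID.
workdir=$(mktemp -d)
sample="$workdir/sample.csv"
cat > "$sample" <<'EOF'
page_id,post_id,name,message,like_count
abc-news,p1,Donald Trump speaks,Trump addresses crowd,250
cnn,p2,Weather update,Sunny skies ahead,90
abc-news,p3,Rally coverage,donald trump rally tonight,120
cnn,p4,DONALD TRUMP interview,An interview with TRUMP,300
cnn,p5,Markets,trump comments move markets,80
EOF

# Q1: after `unzip FB_Dataset.csv.zip`, ls -lh reports the file size.
ls -lh "$sample"

# Q2: the header row reveals the delimiter and the column names.
head -n 1 "$sample"

# Q3: number of distinct values in column 1 (assumed to identify the page).
unique_pages() {
  tail -n +2 "$1" | cut -d',' -f1 | sort -u | wc -l | tr -d ' '
}

# Q5: case-insensitive occurrences of a phrase in column 3 (assumed post name).
count_mentions() {
  tail -n +2 "$1" | cut -d',' -f3 | grep -o -i "$2" | wc -l | tr -d ' '
}

# Q5: post ID and name of the first row whose name mentions the phrase.
first_mention() {
  awk -F',' -v phrase="$2" \
    'NR > 1 && index(tolower($3), tolower(phrase)) { print $2 "," $3; exit }' "$1"
}

# Q6: post ID and like_count where the message (column 4) mentions "trump"
# and like_count > 100, sorted by like_count in descending order.
trump_posts() {
  awk -F',' 'NR > 1 && tolower($4) ~ /trump/ && $5 + 0 > 100 { print $2 "," $5 }' "$1" |
    sort -t',' -k2,2nr
}

unique_pages "$sample"
count_mentions "$sample" "donald trump"
first_mention "$sample" "donald trump"
{ echo "post_id,like_count"; trump_posts "$sample"; } > "$workdir/trump.txt"
head -n 6 "$workdir/trump.txt"
```

Writing the header line explicitly before the sorted rows keeps the column titles at the
top of trump.txt, since sorting the header together with the data would move it.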
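Task B
Task B1: Analysing The Wall Street Journal Using Shell Commands and R
In this question, we want to see whether a specific type of content affects engagement on
Facebook. To simplify this task, we will look specifically at the number of comments made
on each type of post (event, link, photo, status and video) published by
“the-wall-street-journal”. Extract the required information for the posts published by
“the-wall-street-journal” using shell commands and save the result as a CSV file named
“the-wall-street-journal.csv”.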
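One possible shape for the extraction step is sketched below on a made-up miniature input.
The column positions, the page_id column name and the extract_page helper are assumptions
for illustration only; post_type and comments_count are the names used in the brief. As
before, the sketch assumes plain comma-separated fields with no embedded commas.

```shell
#!/bin/sh
# Sketch only: hypothetical miniature input; real column positions may differ.
workdir=$(mktemp -d)
sample="$workdir/sample.csv"
cat > "$sample" <<'EOF'
page_id,post_id,post_type,comments_count
the-wall-street-journal,w1,link,12
abc-news,a1,video,40
the-wall-street-journal,w2,video,4500
the-wall-street-journal,w3,photo,7
EOF

# Keep the header plus the rows whose column 1 equals the page name, and
# print only post_type (column 3) and comments_count (column 4).
extract_page() {
  awk -F',' -v page="$2" 'NR == 1 || $1 == page { print $3 "," $4 }' "$1"
}

extract_page "$sample" "the-wall-street-journal" > "$workdir/the-wall-street-journal.csv"
cat "$workdir/the-wall-street-journal.csv"
```

Keeping the header row in the output means the file can later be read with named columns.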
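7. “the-wall-street-journal” has asked you to focus your analysis on posts with fewer
than 4000 comments. You need to read the “the-wall-street-journal.csv” file generated as
described above into R, filter it according to The Wall Street Journal's requirement, and
draw box plots to show the distribution of the number of comments for each type of post
(event, link, photo, status and video). You need to present one plot that contains a
separate box plot for each post type. What can you infer from this plot? Can you identify
the most engaging post type? Make sure your plot has appropriate labels and a title.
8. You may have noticed that the presence of outliers affects the readability and
interpretation of the data in the box plots. Redraw the box plots after filtering out
values (comments_count) greater than 1000.
9. On average, which type of post (event, link, photo, status or video) is the most
effective in “the-wall-street-journal.csv”? In other words, which post_type has the
highest median number of comments?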
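The brief asks for R in questions 7 to 9. Purely as a rough cross-check of the filtering
and per-type median logic, the same computation can be sketched in shell, which is a
swapped-in tool here and not the requested method. The sample data and the median_by_type
helper are made up for illustration, and the input is assumed to have exactly the two
columns post_type and comments_count.

```shell
#!/bin/sh
# Sketch only: hypothetical two-column file laid out like the-wall-street-journal.csv.
workdir=$(mktemp -d)
wsj="$workdir/wsj-sample.csv"
cat > "$wsj" <<'EOF'
post_type,comments_count
link,10
video,500
link,30
photo,5000
video,100
link,20
photo,40
video,300
EOF

# Median comments_count per post_type, keeping only rows below a threshold.
# Rows are sorted by type and then numerically, so each group arrives in order.
median_by_type() {
  tail -n +2 "$1" |
    awk -F',' -v max="$2" '$2 + 0 < max + 0' |
    sort -t',' -k1,1 -k2,2n |
    awk -F',' '
      function emit() {
        if (n == 0) return
        m = (n % 2) ? v[(n + 1) / 2] : (v[n / 2] + v[n / 2 + 1]) / 2
        print type "," m
      }
      $1 != type { emit(); type = $1; n = 0 }
      { v[++n] = $2 }
      END { emit() }
    '
}

median_by_type "$wsj" 4000
# Post type with the highest median among the filtered rows.
median_by_type "$wsj" 4000 | sort -t',' -k2,2nr | head -n 1
```

In this toy data the photo row with 5000 comments is dropped by the threshold, which shows
how a single outlier can change which post type appears to lead.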
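Task B2: Analysing “abc-news” Posts Based on Your Preference
In this task, you can use R, Python or shell commands, or a combination of Python, R and
shell, to answer the questions.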
‘abc-news’ has asked you to help analyse the reactions of Facebook users to the posts
it published about “Donald Trump” (ignore the case) by completing the following tasks.
10. Create a bar chart which shows the total number of reactions to the posts published by
“abc-news”, in which the posted message contains the term “Donald Trump”(ignore
the case) for each day of the week (‘Monday’, ‘Tuesday’, ‘Wednesday’, ‘Thursday’,
‘Friday’, ‘Saturday’, ‘Sunday’). Make sure the bar chart is sorted based on the weekdays
as shown in the screenshot below. Understanding what should be considered as a
reaction is a part of the answer to this question which you can figure out by checking
different columns of your dataset. You need to mention and justify the criterion which
you choose to define a reaction to a post. (Please note that the plot does not
show real values; it was created with fake data just to show you what the output
should look like.)
Figure 1: Sample output for question 10
11. Considering the bar chart created in question 10, name the two days on which users
showed the most reactions to the posts. Is there any difference between the number of
reactions on weekdays and at weekends?
12. We need to take a closer look at the total reactions on the two days on which users
showed the most reactions. Create two bar charts to show the hourly total reactions for
each of the two days. At what time did the most reactions happen on each day? Is there any
similarity between the hourly reaction counts on these two days? (Please note that the
sample plot given below does not show real values; it was created with fake data just to
show you how the output should be presented for this question.)
Figure 2: Sample output for question 12
13. Considering your exploration about the reactions in different days/times for the term
“Donald Trump” in posts by “abc-news”, answer the following questions.
a) What was the day and time which had the maximum number of reactions?
b) Do you think it is a good idea to recommend publishing a general post about
Trump on the days which you found in question 11 and at the peak hours which
you found in question 12? What is your suggestion based on the analysis which
you did in this task? Justify your answer.
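The per-weekday totals behind the bar chart in question 10 could be computed along the
lines of the sketch below, which only produces the numbers and does not draw the chart.
Several things here are assumptions made for illustration: the miniature data, the column
positions, the posted_at column and its "YYYY-MM-DD HH:MM:SS" format, and the
weekday_totals helper. Summing like_count and comments_count is only a placeholder
definition of a reaction; choosing and justifying the real criterion is part of the
question. The weekday is derived with Sakamoto's algorithm so that the sketch does not
depend on a particular `date` implementation.

```shell
#!/bin/sh
# Sketch only: hypothetical miniature data; posted_at and its format are assumed.
workdir=$(mktemp -d)
posts="$workdir/posts.csv"
cat > "$posts" <<'EOF'
page_id,post_id,message,like_count,comments_count,posted_at
abc-news,p1,Donald Trump wins primary,100,20,2016-11-07 09:15:00
abc-news,p2,donald trump speech,50,5,2016-11-08 20:00:00
cnn,p3,Donald Trump rally,999,1,2016-11-08 10:00:00
abc-news,p4,Weather report,70,7,2016-11-09 08:00:00
abc-news,p5,DONALD TRUMP interview,10,2,2016-11-14 12:00:00
EOF

# Total placeholder reactions per weekday for one page and one phrase,
# printed Monday first so a chart built from the output is already ordered.
weekday_totals() {
  awk -F',' -v page="$2" -v phrase="$3" '
    function dow(y, m, d) {          # Sakamoto: 0 = Sunday ... 6 = Saturday
      if (m < 3) y -= 1
      return (y + int(y / 4) - int(y / 100) + int(y / 400) + t[m] + d) % 7
    }
    BEGIN {
      split("0 3 2 5 0 3 5 1 4 6 2 4", t, " ")
      split("Sunday Monday Tuesday Wednesday Thursday Friday Saturday", name, " ")
    }
    NR > 1 && $1 == page && index(tolower($3), tolower(phrase)) {
      split($6, ts, /[- :]/)         # year, month, day, hour, minute, second
      total[dow(ts[1] + 0, ts[2] + 0, ts[3] + 0)] += $4 + $5
    }
    END {
      for (i = 1; i <= 7; i++) {
        k = i % 7
        print name[k + 1] "," (total[k] + 0)
      }
    }
  ' "$1"
}

weekday_totals "$posts" "abc-news" "donald trump"
```

Grouping by ts[4] (the hour) instead of the weekday would give the hourly totals needed
for question 12 under the same assumptions.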

