This homework uses R to study climate change coverage in the news.

T81-576 Spring 2020, Week 12 Homework Assignment, Case Study 1
Analytics Applications
Homework Assignment – Case Study
Climate Change Coverage in the News
Prof. Farmer

Background
People often complain about important subjects being covered too little in the news. One such subject is
climate change. The scientific consensus is that this is an important problem, and it stands to reason that the
more people are aware of it, the better our chances may be of solving it. But how can we assess how widely
covered climate change is by various media outlets? Specifically, in this case study, the objective is to try to
answer some questions about which news outlets are giving climate change the most coverage.
Data
The data was scraped using BeautifulSoup and stored in SQLite. It has been split into three separate
CSVs for this case study, as the entire SQLite database came to about 1.5 GB.
The publications include the New York Times, Breitbart, CNN, Business Insider, the Atlantic, Fox News,
Talking Points Memo, Buzzfeed News, National Review, New York Post, the Guardian, NPR, Reuters, Vox,
and the Washington Post.
Sampling was not scientific: publications were chosen based on my familiarity with them and with the domain,
and I tried to get a range of political alignments, as well as a mix of print and digital publications. By count,
most articles date from 2016 through July 2017, although there is a not-insignificant number of articles
from 2015, and a possibly insignificant number from before then.
articles1.csv – 50,000 news articles (Articles 1-50,000)
articles2.csv – 49,999 news articles (Articles 50,001-100,000)
articles3.csv – Articles 100,001+
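As a starting point, the three files could be read as one stream of articles. The following is a minimal sketch using only the Python standard library. It assumes the CSVs are UTF-8 encoded and have a header row; the function name read_articles is a placeholder, not part of the assignment.

```python
import csv
import sys
from pathlib import Path


def read_articles(paths):
    """Yield one dict per article from each CSV file, in the order given."""
    # Long article bodies can exceed the csv module's default field size
    # limit, so raise it before reading. The cap keeps the call portable.
    csv.field_size_limit(min(sys.maxsize, 2**31 - 1))
    for path in paths:
        with Path(path).open(newline="", encoding="utf-8") as handle:
            yield from csv.DictReader(handle)


if __name__ == "__main__":
    files = ["articles1.csv", "articles2.csv", "articles3.csv"]
    existing = [name for name in files if Path(name).exists()]
    total = sum(1 for _ in read_articles(existing))
    print(f"Read {total} articles from {len(existing)} file(s)")
```

Streaming rows instead of loading everything at once keeps memory use low, which may matter given the size of the full data set.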
Homework Assignment
For this homework assignment, we are going to find the correlation, if any, between the characteristics of these
news outlets and the proportion of climate-change-related articles they publish. Some interesting
characteristics we could look at include ownership (independent, non-profit, or corporate) and political
leanings, if any. Below, I’ve done some preliminary research, collecting information from Wikipedia and the
providers’ own web pages:

Atlantic:
Owner: Atlantic Media; majority stake recently sold to Emerson Collective, a non-profit
founded by Laurene Powell Jobs, widow of Steve Jobs
Lean Left
Breitbart:
Owner: Breitbart News Network, LLC
Founded by a conservative commentator
Right
Business Insider:
Owner: Axel Springer SE (publishing house in Europe)
Center / left-center
Buzzfeed News:
Private; Jonah Peretti, CEO, and Kenneth Lerer, executive chair (the latter also a co-founder of the
Huffington Post)
Lean left
CNN:
Turner Broadcasting System, mass media
TBS itself is owned by Time Warner
Lean left
Fox News:
Fox Entertainment Group, mass media
Lean right / right
Guardian:
Guardian Media Group (UK), mass media
Owned by Scott Trust Limited
Lean left
National Review:
National Review Institute, a non-profit
Founded by William F Buckley Jr
Right
New York Post:
News Corp, mass media
Right / right center
New York Times:
NY Times Company
Lean Left
NPR:
Non-profit
Center / left-center
Reuters:
Thomson Reuters Corporation (Canadian multinational mass media)
Center
Talking Points Memo:
Josh Marshall, independent
Left
Washington Post:
Nash Holdings LLC, controlled by J. Bezos
Lean left
Vox:
Vox Media, multinational
Lean left / left
For example, looking this over, we might hypothesize that right-leaning Breitbart would have a lower
proportion of climate-related articles than, say, NPR. We can turn this into a formal hypothesis statement and
apply what we have learned about text analytics and NLP to answer the question at hand.
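One possible way to formalize such a hypothesis, offered as a sketch rather than a required method, is a one-sided two-proportion z-test. The null hypothesis is that both outlets publish the same proportion of climate-related articles; the alternative is that outlet A publishes a lower proportion than outlet B. The function name and the hit counts in the usage lines are hypothetical. The article totals are taken from the sample table in this handout.

```python
import math


def two_proportion_z_test(hits_a, total_a, hits_b, total_b):
    """Return (z, p) for the one-sided test H1: proportion A < proportion B.

    Assumes the pooled proportion is strictly between 0 and 1.
    """
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    # Pooled proportion under the null hypothesis of equal proportions.
    pooled = (hits_a + hits_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    # Lower-tail probability of the standard normal distribution, P(Z <= z).
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hit counts (200 and 300) are made-up placeholders, not results.
    z, p = two_proportion_z_test(200, 23781, 300, 11992)
    print(f"z = {z:.2f}, one-sided p = {p:.4g}")
```

A small p-value would be evidence against equal proportions. Because sampling was not scientific, any such result should be read as descriptive of this data set only.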
Guidelines and Deliverables
At a minimum, the following should be provided in the homework assignment submission:
All R/Python code developed or used to complete the assignment
Any supplemental data sets beyond what was provided
Either an annotated Jupyter notebook/R Markdown file, or a written report, outlining the descriptive
analytics insights (data exploration), the techniques applied, the findings, and any recommendations
for improvements if you had more time to work on this
A sample output for the final analysis may look like the table below. This table includes counts
after various techniques (e.g., tokenization) have been applied, where cc_wds is the count of instances
in which climate change, or related wording, was included in an article by the specific media outlet.

publication             id  title  author   date   year  month    url  content  tokenized  num_wds  uniq_wds  cc_wds
Atlantic              7178   7178    6198   7178   7178   7178      0     7178       7178     7178      7178    7178
Breitbart            23781  23781   23781  23781  23781  23781      0    23781      23781    23781     23781   23781
Business Insider      6695   6695    4926   6695   6695   6695      0     6695       6695     6695      6695    6695
Buzzfeed News         4835   4835    4834   4835   4835   4835   4835     4835       4835     4835      4835    4835
CNN                  11485  11485    7024  11485  11485  11485      0    11485      11485    11485     11485   11485
Fox News              4351   4351    1117   4349   4349   4349   4348     4351       4351     4351      4351    4351
Guardian              8680   8680    7249   8640   8640   8640   8680     8680       8680     8680      8680    8680
NPR                  11992  11992   11654  11992  11992  11992  11992    11992      11992    11992     11992   11992
National Review       6195   6195    6195   6195   6195   6195   6195     6195       6195     6195      6195    6195
New York Post        17493  17493   17485  17493  17493  17493  17493    17493      17493    17493     17493   17493
New York Times        7803   7803    7767   7803   7803   7803      0     7803       7803     7803      7803    7803
Reuters              10710  10709   10710  10710  10710  10710  10710    10710      10710    10710     10710   10710
Talking Points Memo   5214   5213    1676   2615   2615   2615   5214     5214       5214     5214      5214    5214
Vox                   4947   4947    4947   4947   4947   4947   4947     4947       4947     4947      4947    4947
Washington Post      11114  11114   11077  11114  11114  11114  11114    11114      11114    11114     11114   11114
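
The derived columns in the table (tokenized, num_wds, uniq_wds, cc_wds) could be computed per article roughly as follows. This is a sketch under assumptions: the tokenizer is a simple regular expression rather than a full NLP tokenizer, and the term list CLIMATE_TERMS is a hypothetical stand-in for whatever you decide counts as "related wording".

```python
import re

# Hypothetical term list; extend it to cover the wording you want to count.
CLIMATE_TERMS = ("climate change", "global warming", "greenhouse gas")

WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return WORD_RE.findall(text.lower())


def article_counts(text):
    """Return num_wds, uniq_wds and cc_wds for one article body."""
    tokens = tokenize(text)
    # Rejoin tokens with single spaces so that multi-word terms match even
    # when the original text used hyphens or line breaks between the words.
    normalized = " ".join(tokens)
    cc_wds = sum(
        len(re.findall(rf"\b{re.escape(term)}\b", normalized))
        for term in CLIMATE_TERMS
    )
    return {
        "num_wds": len(tokens),
        "uniq_wds": len(set(tokens)),
        "cc_wds": cc_wds,
    }


if __name__ == "__main__":
    print(article_counts("Global warming is one aspect of climate change."))
```

The word-boundary match is exact, so plural forms such as "greenhouse gases" are not counted unless they are added to the list. Stemming or lemmatization would be one way to handle such variants.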

Another example output might show the proportion of articles by political bias.
These examples are meant to be illustrative and to help you generate ideas.
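A breakdown by political bias could be computed along these lines. The leaning labels are copied from the outlet list in this handout; the function name and the input format (climate-related article count and total article count per outlet) are assumptions made for illustration.

```python
from collections import defaultdict

# Political leanings as listed in this handout.
LEANING = {
    "Atlantic": "Lean left",
    "Breitbart": "Right",
    "Business Insider": "Center / left-center",
    "Buzzfeed News": "Lean left",
    "CNN": "Lean left",
    "Fox News": "Lean right / right",
    "Guardian": "Lean left",
    "National Review": "Right",
    "New York Post": "Right / right center",
    "New York Times": "Lean left",
    "NPR": "Center / left-center",
    "Reuters": "Center",
    "Talking Points Memo": "Left",
    "Washington Post": "Lean left",
    "Vox": "Lean left / left",
}


def proportion_by_leaning(per_outlet):
    """Aggregate climate coverage by leaning.

    per_outlet maps outlet name -> (climate_articles, total_articles).
    Returns a dict mapping leaning -> proportion of climate articles.
    """
    climate = defaultdict(int)
    total = defaultdict(int)
    for outlet, (cc_articles, n_articles) in per_outlet.items():
        group = LEANING[outlet]
        climate[group] += cc_articles
        total[group] += n_articles
    # Skip groups with no articles to avoid dividing by zero.
    return {g: climate[g] / total[g] for g in total if total[g] > 0}
```

Several labels cover only one or two outlets, so collapsing them into coarser groups (for example left, center, right) may give more stable proportions.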
Good luck and please let me know if you have any questions!

