The Data: You are going to be working with a subset of deidentified (anonymous), publicly accessible data from a study known as the Panel Study of Income Dynamics (PSID). For more than 50 years the PSID has been surveying thousands of families in the United States to learn about poverty and income, but also things like health, well-being, childhood, and intergenerational mobility (to name just a few of the many variables included in the study). In short, the PSID encompasses many different studies, and researchers use it to answer many different questions.
This particular subset of data was created for our use in class from the 2017 survey of families, meaning that one member of the family (referred to as the ‘reference person’ in the official documentation) was interviewed on behalf of an entire family unit. Here, you have a small selection of variables created from survey questions on family income, (the reference person’s) education, family composition, and a major policy issue: food insecurity. The original dataset includes more than 5,000 variables and nearly 10,000 valid responses, and the original codebook is more than 2,000 pages long. For that reason, a very short version of the codebook is included below.
- Please note that you may only use this subset for our classwork. If you want to learn more about the survey or to access the full data for any work beyond this project, visit the PSID website and create an account here: https://psidonline.isr.umich.edu/default.aspx
- I have already cleaned the data to handle missing values, so you will not need to recode them.
- This survey is designed to be representative of the United States; each of you will work with one of four subsets of the data that are representative not of the entire country but rather of the four Census regions within the country (East, Central, South, West). That means that while your analysis will be similar, your specific answers will vary based on region. Find your name below and make sure you work with the appropriate data file on R Studio Cloud. (Or download the appropriate data file if you choose to work on your laptop.)
o Respondents who live in the noncontiguous states of Hawaii and Alaska are not included in the dataset.
Main Requirements: First, calculate and interpret univariate descriptive statistics to describe family income, food insecurity, and the number of families in the study who have children in their home. Generate at least two figures to assist with your interpretation (one bar plot and one histogram). Second, how do families with and without children compare to one another in terms of food insecurity, poverty, and income? Estimate a cross tabulation and difference of means test as appropriate. Note: you will not be analyzing all of the variables listed here. Instead, your job is to read the codebook and choose from among the variables provided to answer the following questions to the best of your ability:
- 1. Descriptive Statistics
o Income & Poverty: Generate a frequency table for poverty. What percent of families in your region fall below the poverty line? Calculate descriptive statistics for family income in your assigned region; be sure to include measures of central tendency and dispersion. Then, generate a histogram that describes the distribution of family income in that region. Finally, write a few sentences explaining what you learn by analyzing these two variables.
- Bonus: Include in your discussion of income mention of the 25th and 75th percentiles, and produce a boxplot for income.
o Food Insecurity: In the year before they were surveyed, what proportion of families in your region report that they worried about food running out before they got more money? Generate a frequency table to answer this question. Then examine the overall level of food security in your region by generating a frequency table that shows how many families are highly secure, somewhat secure, and so forth.
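As a rough illustration of the descriptive steps above, here is a minimal base R sketch. It runs on simulated data, and every variable name in it (`fam_income`, `poverty`, `food_worry`) and the helper `freq_table()` are hypothetical stand-ins, so substitute the names from the codebook. The functions in the Fogarty scripts would work just as well; this is only one way to get the same numbers.

```r
# Simulated stand-in for one regional file. All variable names and category
# labels are hypothetical; use the ones listed in the codebook.
set.seed(2017)
n <- 500
psid <- data.frame(
  fam_income = round(rlnorm(n, meanlog = 10.9, sdlog = 0.8)),
  poverty = factor(
    sample(c("At or above poverty line", "Below poverty line"),
           n, replace = TRUE, prob = c(0.87, 0.13))
  ),
  food_worry = factor(
    sample(c("Never true", "Sometimes true", "Often true"),
           n, replace = TRUE, prob = c(0.80, 0.14, 0.06)),
    levels = c("Never true", "Sometimes true", "Often true")
  )
)

# Hypothetical helper: counts and percentages for one categorical variable.
freq_table <- function(x) {
  counts <- table(x, useNA = "no")
  data.frame(
    category = names(counts),
    n        = as.vector(counts),
    percent  = round(100 * as.vector(counts) / sum(counts), 1)
  )
}

poverty_freq <- freq_table(psid$poverty)
worry_freq   <- freq_table(psid$food_worry)
print(poverty_freq)
print(worry_freq)

# Central tendency and dispersion for income, plus the 25th and 75th
# percentiles mentioned in the bonus.
income_stats <- c(
  mean   = mean(psid$fam_income),
  median = median(psid$fam_income),
  sd     = sd(psid$fam_income),
  min    = min(psid$fam_income),
  max    = max(psid$fam_income),
  quantile(psid$fam_income, probs = c(0.25, 0.75)),
  IQR    = IQR(psid$fam_income)
)
print(round(income_stats))
```

Because income is usually right-skewed, comparing the mean with the median in `income_stats` is a quick check on which measure of central tendency to emphasize in your write-up.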
- 2. Bivariate Analysis
o Difference of Means: Are there differences, on average, in income between families with and without children in these data? Evaluate whether or not any differences you find are statistically significant. This will mean estimating a two-sample t-test, presenting the results in a table, and writing a few sentences explaining your results. (There are two types of two-sample t-tests you can estimate in R; pay attention to the distinction between independent-samples and paired-samples t-tests.)
o Cross Tabulation: Are families with children more or less likely to fall below the poverty line? To be food insecure, on average? Conduct cross-tabulation analyses in which you report whether or not any differences you observe are statistically significant, as well as whether or not they are substantively important (the size of the relationship). Present your results in a table, and write a few sentences explaining your analysis.
- Bonus: Produce side-by-side, polished bar plots of overall food insecurity, with one plot for families with children and one for families without, using ggplot2. Write a few sentences describing what you discover about this policy problem.
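The two bivariate analyses above can be sketched in base R as follows. The data are simulated and the variable names (`has_children`, `fam_income`, `below_poverty`) are hypothetical, so treat this as an outline under those assumptions, not as the expected answer. The sketch uses `t.test()`, `table()`, and `chisq.test()` from base R; the functions in the Fogarty scripts may report the same results in a different layout.

```r
# Simulated stand-in data; all variable names and labels are hypothetical.
set.seed(42)
n <- 600
has_children <- factor(
  sample(c("No children", "Children"), n, replace = TRUE),
  levels = c("No children", "Children")
)
fam_income <- round(rlnorm(
  n,
  meanlog = ifelse(has_children == "Children", 10.8, 11.0),
  sdlog   = 0.8
))
below_poverty <- factor(
  ifelse(runif(n) < ifelse(has_children == "Children", 0.18, 0.10),
         "Below", "At or above"),
  levels = c("At or above", "Below")
)
psid <- data.frame(has_children, fam_income, below_poverty)

# Independent-samples t-test: the two groups contain different families.
# R's default is the Welch version, which does not assume equal variances.
tt <- t.test(fam_income ~ has_children, data = psid)
print(tt)

# Cross tabulation with column percentages, so each group of families
# (with and without children) sums to 100%.
xt <- table(poverty = psid$below_poverty, children = psid$has_children)
col_pct <- round(100 * prop.table(xt, margin = 2), 1)
print(col_pct)

# Chi-square test of independence for statistical significance.
chi <- chisq.test(xt, correct = FALSE)
print(chi)

# Cramer's V as one measure of the size of the relationship (0 to 1).
cramers_v <- sqrt(unname(chi$statistic) / (sum(xt) * (min(dim(xt)) - 1)))
print(round(cramers_v, 3))
```

The p-values in `tt` and `chi` speak to statistical significance, while the gap in column percentages and Cramer's V speak to substantive importance; the write-up should address both.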
- Submit: In Google Classroom, upload and submit a Word document that includes your tables and analysis, with your script included as an appendix. If you were working as an analyst or researcher, you wouldn’t simply turn in your code or raw output; you would take the time to format it neatly and interpret your results for your client or the public. That’s what should be in the Word document. If you prefer, you may append an R Markdown file instead of the script, but you must still submit a short, polished memo.
- Formatting your data report:
o Executive Summary: Preview your analysis, including any context or background, and summarize your major takeaway(s). Briefly outline the contents of the report.
o Analytic Sections: Include tables, figures, and text to explain your method and present your findings for each piece of analysis. Break these up by issue or question. You can follow the sections above, or organize the analysis in your own way.
o Conclusions: Summarize your findings with more detail than was provided in the executive summary. What future questions need answering? How pressing an issue is food insecurity for families in your assigned region of the country?
o Appendix: Attach your script. You may choose to include extra tables or figures that provide additional detail but seemed a little extraneous to you.
o Excluding your appendix, 3-5 pages should be enough to accomplish these aims!
A Few Recommended Steps and Resources:
- Open R Studio Cloud; create and save a new R script file; and import your data in order to get started with this assignment. The data are in Stata (.dta) format, so you’ll need to import the file with that in mind. Remember to test your code and save the script as you go.
o If you choose to complete the assignment in R Studio on your own laptop, you will need to download the appropriate dataset from Canvas before getting started, then install and load the appropriate packages.
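A minimal sketch of the import step, assuming you use the foreign package from the list of packages for this assignment. So that it runs from start to finish, the example first writes a tiny made-up data set to a temporary Stata file; the file name and variable names are hypothetical. With the real data you would skip straight to the `read.dta()` call and point it at your region's file.

```r
library(foreign)  # included with standard R installations

# Build a tiny stand-in data set and save it as a Stata file so the sketch
# is self-contained. The file and variable names are hypothetical.
toy <- data.frame(
  fam_income = c(42000, 18500, 97000, 61000),
  children   = factor(c("Yes", "No", "Yes", "No"))
)
dta_path <- file.path(tempdir(), "psid_region_example.dta")
write.dta(toy, dta_path)

# Import the Stata file. Stata value labels come back as R factors.
psid <- read.dta(dta_path, convert.factors = TRUE)

# Quick checks that the import worked: dimensions, types, first rows.
str(psid)
head(psid)
```

Note that `read.dta()` only reads files saved in Stata versions 5 through 12. If it rejects your file for that reason, `haven::read_dta()` is the usual alternative.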
- Chapters 7 and 8 in Fogarty, as well as the scripts that accompany them, should provide most of the guidance you will need in order to conduct the descriptive component of your analysis and to generate basic plots.
- Chapter 9 in Fogarty and the accompanying code include all that you need to estimate a difference of means test (two-sample t-test). Chapter 10 and the script from that chapter include all that you need in order to conduct a cross-tabulation analysis.
- When you have the code correct for your plots, export and save the figures as .png files. Use the table templates provided on Canvas to prepare the results of your analysis in Word.
- Throughout the assignments, the following packages may be needed: foreign, haven, car, plyr, ggplot2, descr, gridExtra, DescTools, tibble, readxl, ltm, scales, and rmarkdown. They have all been installed on the R Studio Cloud server.
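For the step of exporting figures as .png files, one possible approach in base R is to open a `png()` device, draw the plot, and close the device with `dev.off()`. The sketch below uses simulated data, and the object names, labels, and file names are hypothetical; a ggplot2 figure would typically be saved with `ggsave()` instead.

```r
# Simulated stand-in data; object names, labels, and file names are hypothetical.
set.seed(1)
income <- round(rlnorm(300, meanlog = 10.9, sdlog = 0.8))
food_security <- factor(
  sample(c("High", "Marginal", "Low", "Very low"), 300, replace = TRUE,
         prob = c(0.70, 0.13, 0.10, 0.07)),
  levels = c("High", "Marginal", "Low", "Very low")
)

hist_path <- file.path(tempdir(), "income_histogram.png")
bar_path  <- file.path(tempdir(), "food_security_barplot.png")

# Histogram of family income, written straight to a .png file.
png(filename = hist_path, width = 1600, height = 1200, res = 200)
hist(income, breaks = 30,
     main = "Family income (simulated)",
     xlab = "Total family income (dollars)")
dev.off()

# Bar plot of food security categories, saved the same way.
png(filename = bar_path, width = 1600, height = 1200, res = 200)
barplot(table(food_security),
        main = "Food security (simulated)",
        ylab = "Number of families")
dev.off()

# Confirm that both files were written.
print(file.exists(c(hist_path, bar_path)))
```

Setting `width`, `height`, and `res` explicitly keeps the figures sharp when you insert them into Word.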