This UK assignment is an Excel-based statistics homework.

This translates into a certain amount of storage at the end of the season, which then has to be the supply for the rest of the year. You could compute the deficits in the two periods separately and add them up. The second approach would be to just use the annual rain and the annual demand and compute the deficit. Will these two give you the same answer, or different answers? Use only one, and justify. Will your answer to this question be different if you consider (a) an open reservoir with evaporation, and (b) the monthly data instead of the aggregate February to May data? Justify your answer with a mini-example.

(c) Working again with the deficit data you calculated in part (b), compute the worst cumulative deficit over multiple years, and report the number of years of continuous deficit, as well as the magnitude of the cumulative deficit. Given this information, how would you re-evaluate your choice for the size of the storage tank? What are we now trying to account for, and why? If you choose to increase the size of the storage tank, provide the new value, and indicate the number of years in which the householder may run out of water if the tank were increased to this size.
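The bookkeeping asked for in part (c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `worst_deficit_run` is hypothetical, the deficits are assumed to be stored one value per year with a positive number meaning demand exceeded supply, and the sample values are made up rather than taken from the assignment data.

```python
def worst_deficit_run(deficits):
    """Return (years, cumulative_deficit) for the worst run of consecutive deficit years.

    Assumption: one value per year, positive when demand exceeded supply,
    zero or negative otherwise.
    """
    worst_len, worst_sum = 0, 0.0
    run_len, run_sum = 0, 0.0
    for d in deficits:
        if d > 0:
            # Still inside a run of deficit years: extend it.
            run_len += 1
            run_sum += d
            if run_sum > worst_sum:
                worst_len, worst_sum = run_len, run_sum
        else:
            # A non-deficit year ends the current run.
            run_len, run_sum = 0, 0.0
    return worst_len, worst_sum


# Made-up annual deficits (same units as the tank size), not the assignment data.
example = [0.0, 4.0, 6.0, 0.0, 3.0, 5.0, 7.0, 0.0]
print(worst_deficit_run(example))  # (3, 15.0)
```

Here "worst" is taken to mean the run with the largest accumulated deficit; a different reading (for example, the longest run regardless of magnitude) would need a different comparison inside the loop.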

What does this imply about the reliability of the new storage tank vs. the old one? What is the chance that such a sequence of events will unfold at least once in the 20 years we are planning for?

(d) At this point you may have recognized that the cumulative deficit computed in part (c) translates into the amount of water we have to store to meet the demand (assuming we don’t ration) during a drought, AND that if we consider the “supply” time series to be random, then we may get a different estimate of the storage needed if we consider different likely 27-year sequences. So the storage needed to meet the demand, as calculated from the 27 years of data we have, may be relatively large (e.g., if 5 dry years happened to come in a sequence by chance) or small (e.g., if the exceptionally dry years are interspersed with wet years). The goal of part (d) is to find the amount of storage we should plan for if the desired reliability is p% (e.g., 90%). We’ll pursue this goal by generating a number (e.g., 100) of scenarios and computing the storage needed for each of them. Each of the resulting (100) storage estimates is equally likely.

First recognize that if we want the specified demand to be met in 90% of the supply scenarios that we generate, then this specification is equivalent to the 90th percentile of the estimated storage values. Why? Explain briefly. Now, let’s pursue the generation of the scenarios. Consider that each year’s rainfall is random, has the same causal structure and does not depend on prior years, i.e., it is independent and identically distributed. Then we can use the bootstrap (randomly sample with replacement from the original sample) to generate the scenarios. Since the policy directive is to plan for the next 20 years, we’ll draw 20 years of rainfall at random from the 27 years of data. Also, since we are primarily interested in the storage needed, and not the duration of the “drought” at this point, we’ll just record the storage needed from each scenario. Perform the analysis and report the storage needed if we were to specify reliabilities of 50% and 90%, respectively. Compare with your previous assessments.
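The bootstrap procedure described above might look roughly like the following Python sketch. Several things here are assumptions rather than part of the assignment: the storage rule is a simple sequent-peak style calculation (accumulated shortfall, reset to zero once surpluses refill the tank), which may differ from the procedure in storage.xls; rainfall and demand are assumed to be expressed in the same units; the percentile uses the nearest-rank rule; and all function names and sample numbers are hypothetical.

```python
import math
import random


def storage_needed(rainfall, demand):
    """Largest accumulated shortfall over a sequence of years.

    Assumption: the tank starts full, surplus refills it, and any
    surplus beyond a full tank spills (sequent-peak style rule).
    """
    shortfall, peak = 0.0, 0.0
    for r in rainfall:
        shortfall = max(0.0, shortfall + demand - r)
        peak = max(peak, shortfall)
    return peak


def bootstrap_storage(history, demand, years=20, n_scenarios=100, seed=0):
    """Sorted storage requirements from n_scenarios bootstrap scenarios."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_scenarios):
        scenario = rng.choices(history, k=years)  # sample with replacement
        results.append(storage_needed(scenario, demand))
    return sorted(results)


def percentile(sorted_values, p):
    """Nearest-rank percentile: smallest value with at least p% of values at or below it."""
    k = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[k - 1]


# Made-up rainfall history and demand, not the assignment data.
history = [80, 95, 60, 110, 70, 100, 55, 90]
storages = bootstrap_storage(history, demand=85)
print(percentile(storages, 50), percentile(storages, 90))
```

Fixing the seed makes a run repeatable, which helps when comparing the 50% and 90% values; in the spreadsheet each recalculation produces a fresh scenario instead.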

To compute a cumulative deficit, see the example in the storage.xls spreadsheet attached.

Recall that the definition of the probability of an event is the number of times that event occurs as a fraction of the total number of events. So, if you got a 5-year dry sequence with an accumulated deficit of 30 cm, this particular event occurred once in 27 years. So it has a chance of occurring once in 27 years, based on the raw data that we have. This suggests a probability of 1/27 in any given year. So, one way to calculate the probability of at least one such “drought” occurring in the 20-year period is 1 - (26/27)^{20} ≈ 0.53. The reasoning is that in any year there is a 26/27 chance that such an event will not occur. If we assume that each year is random and independent, then the chance that the deficit does not occur in any of the 20 years is (26/27)^{20}. Since occurring at least once in 20 years is the complement of not occurring at all, the chance of it occurring at least once is as above. A little reflection shows that this analysis is not quite right. We cannot quite talk about a 5-year-long drought in a given year. There are two factors that need to be considered in the description of such a “drought”. The first is the duration (e.g., 5 years), and the second is the severity (e.g., the accumulated deficit). So, the question asked about the chance in 20 years is paraphrased as the chance that a drought at least this long and with a severity at least this high may occur. This is a more daunting task than the analysis presented above in this footnote, and is dealt with in the next part of this problem.
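The simple calculation at the start of this footnote (the one the footnote then qualifies) can be checked with a few lines of Python. The variable names are illustrative only.

```python
# Chance of "at least once in 20 years" for an event with a 1/27 chance per year,
# assuming the years are independent.
p_year = 1 / 27
years = 20

p_never = (1 - p_year) ** years   # (26/27)^20
p_at_least_once = 1 - p_never

print(round(p_at_least_once, 2))  # 0.53
```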

For an illustration of the process go to the spreadsheet storage.xls and look at the bootstrap sheet. Notes: (a) The storage needed is not the same as the cumulative deficit computed earlier. In the previous problem we did not consider the capacity to carry water in storage from one year to the next; here we will, so the procedure changes. Check it. (b) In the example we draw 20 years of values from 20 years; in your case you’ll draw 20 years of values from 27 years. (c) You’ll need to record the storage needed from each scenario by typing it in a new cell, and each time you type a number and hit return, a new scenario will be generated. (d) Generate 50 to 100 scenarios and then sort the resulting storage values to get the percentiles and report the reliability (we learned how to do something like this in part (b) of the problem). To draw a bootstrap sample from the original data, we apply the idea discussed in class. Let’s say the data can be divided into k classes (e.g., k=3: 0<x<0.4, 0.4<x<0.9, 0.9<x<1), and that we have estimated their probabilities (by counting the % of historical values in each class) as p_{1}, p_{2}, …, p_{k} (e.g., 0.3, 0.3, 0.4). Now if we generate a random number between 0 and 1 (e.g., 0.6), we can sample a data value from the class that corresponds to the appropriate value of the cumulative probability distribution (e.g., 0.3, 0.6, 1) for the k classes. For our example, this would be class 2.
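The class-lookup idea in this note can be sketched in Python as follows. The function name `sample_class` is hypothetical, and the choice to assign a random number that lands exactly on a class boundary to the lower class is an assumption made so that the note's own example (0.6 giving class 2) is reproduced.

```python
import bisect
from itertools import accumulate


def sample_class(probs, u):
    """Return the 1-based class whose cumulative-probability interval contains u.

    Assumption: a value exactly on a boundary belongs to the lower class.
    """
    cumulative = list(accumulate(probs))      # e.g. [0.3, 0.6, 1.0]
    idx = bisect.bisect_left(cumulative, u)   # first class with cumulative >= u
    return min(idx, len(probs) - 1) + 1       # guard against rounding at the top end


probs = [0.3, 0.3, 0.4]                       # class probabilities from the note
print(sample_class(probs, 0.6))               # 2
```

In practice `u` would come from a uniform random number generator (for example `random.random()`), and a data value would then be taken from the selected class.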

Since we treat each of the historical observations as equally likely, we could consider sorting them in ascending order, and assigning an equal probability (e.g., 1/28 for 27 data points that imply 28 intervals) to each class of values that is generated in this way. Then if we generate a random number (e.g., 0.88) between 0 and 1, we can look up its position (it is between 24/28 and 25/28) in the cumulative distribution of these intervals and draw the corresponding ranked value (rank 25) from the historical data set. We repeat this process the desired number of times (20) to generate a bootstrap sample or scenario.
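A minimal Python sketch of this rank-based draw is given below. It follows the 1/(n+1) interval convention of the note, so a random number of 0.88 with 27 data points maps to rank 25. One detail is an assumption: the note does not say what to do when the random number falls in the topmost interval (above 27/28), so this sketch simply clamps to the largest observation. The function names and the placeholder data are hypothetical.

```python
import math
import random


def draw_ranked(sorted_history, u):
    """Map a uniform random number u in [0, 1) to (rank, value) from sorted data.

    Uses n + 1 equal intervals for n data points, as in the note.
    Assumption: draws landing in the top interval are clamped to rank n.
    """
    n = len(sorted_history)
    rank = math.floor(u * (n + 1)) + 1   # 1-based rank
    rank = min(rank, n)                  # clamp the top interval (assumption)
    return rank, sorted_history[rank - 1]


def bootstrap_sample(history, years=20, seed=0):
    """Draw one scenario of `years` values, sampling with replacement."""
    rng = random.Random(seed)
    ordered = sorted(history)
    return [draw_ranked(ordered, rng.random())[1] for _ in range(years)]


# Placeholder data: 27 made-up values standing in for the rainfall record.
history = list(range(1, 28))
print(draw_ranked(sorted(history), 0.88))  # (25, 25)
print(len(bootstrap_sample(history)))      # 20
```

Because of the clamp, the largest observation is slightly more likely to be drawn than the others; dividing the unit interval into n equal parts instead would make every observation exactly equally likely.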