Example

We now reconstruct the demo chp.11.1 from start to finish to illustrate the capabilities of the YOURCAST software and provide further illustration to the user on how to take advantage of them.

Most users will not have their data in a format easily readable by yourcast(). Thus for this example we will start with the raw .txt files and take advantage of the yourprep() software designed to help users construct the dataobj list easily.

We have stored the original files we used to create the dataobj returned by typing data(chp.11.1) in YOURCAST's data folder. You can view these files, which have the extension .txt, by typing:

> dir(paste(.libPaths()[1],"/YourCast/data",sep=""))
 [1] "adjacency.txt"    "chp.11.10.RData"  "chp.11.11.RData"  "chp.11.12.RData"
 [5] "chp.11.13.RData"  "chp.11.1.RData"   "chp.11.2.RData"   "chp.11.3.RData"
 [9] "chp.11.4.RData"   "chp.11.5.RData"   "chp.11.7.1.RData" "chp.11.7.2.RData"
[13] "chp.11.8.1.RData" "chp.11.8.2.RData" "chp.11.8.3.RData" "chp.11.9.1.RData"
[17] "chp.11.9.2.RData" "chp.2.6.1.RData"  "chp.2.6.2.RData"  "chp.2.7.1.RData"
[21] "chp.2.7.2.RData"  "chp.2.7.3.RData"  "cntry.codes.txt"  "cs2s204545.txt"
[25] "csid204500.txt"   "csid204505.txt"   "csid204510.txt"   "csid204515.txt"
[29] "csid204520.txt"   "csid204525.txt"   "csid204530.txt"   "csid204535.txt"
[33] "csid204540.txt"   "csid204545.txt"   "csid204550.txt"   "csid204555.txt"
[37] "csid204560.txt"   "csid204565.txt"   "csid204570.txt"   "csid204575.txt"
[41] "csid204580.txt"

We could load these files using the data() command, but for this example we will pretend they are files loaded into our working directory that we want yourcast() to be able to read.

The function yourprep() in the YOURCAST package is designed to help you turn these raw files into a dataobj that yourcast() can read. The yourprep() function works by scanning either the working directory or another directory you specify for files beginning with the tag csid. In the data folder we scanned above, there are several files whose names consist of the csid tag and a CSID code in the format we will specify to the function. These are the labels yourprep() needs to be able to recognize and process the files. All files should have an extension so that yourprep() knows how to read them; currently the function supports fixed-width .txt files, comma-separated value files, and Stata .dta files.

Let's examine the first of these cross section text files, csid204500.txt. As we can see below, this file contains all the years from the first observed year to the last predicted year, with missing values replaced by NAs. Because it was created in the software, this file already has the years written in as rownames in a way that can read (and for this reason has only three column labels).

"rspi2" "popu2" "time"
"1950" NA 5457 1920
"1951" NA 6319 1921
"1952" NA 7009 1922
"1953" NA 7553 1923
"1954" NA 7978 1924
...
"2026" NA NA 1996
"2027" NA NA 1997
"2028" NA NA 1998
"2029" NA NA 1999
"2030" NA NA 2000

However, we expect that most users will have input that looks like the next file in the directory, csid204505.txt. Below we can see that the observation year is an extra variable rather than a rowname.

year rspi2 popu2 time
1950 NA 3978 1920
1951 NA 4091 1921
1952 NA 4306 1922
1953 NA 4583 1923
1954 NA 4889 1924
...
2026 NA NA 1996
2027 NA NA 1997
2028 NA NA 1998
2029 NA NA 1999
2030 NA NA 2000

If this is the case, you should set the argument year.var to TRUE in yourprep(); this will automatically convert the year variable to a rowname as long as it is labeled year.

The data directory also includes some of the optional files that we included in our dataobj for the chp.11.1 demo. The first is adjacency.txt, a list of proximity scores for all the possible pairs of the geographic units. The second is cntry.codes.txt, a list of all the CSID codes for the geographic units and their respective labels. We will load these using arguments in the yourprep() function.

We're now ready to run the yourprep() function. Since the function already grabs all files beginning with csid tag, we only need to specify the directory where the files are stored and the names of the optional files, G.names and adjacency. Note that we have set year.var=TRUE since one of our files has the observation year as a separate variable rather than as the rowname:

dta <- yourprep(dpath=paste(.libPaths()[1],"/YourCast/data",sep=""),
                year.var=TRUE, sample.frame=c(1950,2000,2001,2030),
                G.names="cntry.codes.txt", adjacency="adjacency.txt",
                verbose=TRUE)

We have now created a dataobj called dta. Examining the dataobj, we can see that it includes the two required elements, data and index.code, as well as two optional elements.

> names(dta)
[1] "data"       "index.code" "G.names"    "adjacency"

Examining the data element, we can see that it includes all the cross section files that were in the dpath:

> names(dta$data)
 [1] "204500" "204505" "204510" "204515" "204520" "204525" "204530" "204535"
 [9] "204540" "204545" "204550" "204555" "204560" "204565" "204570" "204575"
[17] "204580"

We're now ready for a run of yourcast(). The first run of the program in the chp.11.1 demo file uses the Lee-Carter model. This model uses few of the capacities of the YOURCAST package since it does no smoothing, but is good for a quick run of the function. Use of the smoothing options can be seen in many of the demos and is explained the yourcast() documentation. The code below produces an output object called ylc that is of class yourcast:

ylc <- yourcast(formula=log(rspi2/popu2) ~ time, dataobj=dta, model="LC")

The main output from the yourcast() function is the yhat element of the output list, which contains the observed and predicted values for every cross section. This output is difficult to appreciate without graphics, but we can get a quick summary of our run of the function by typing summary(ylc):

> summary(ylc)
Model: LC 
Number of cross sections: 17 
Formula: log(rspi2/popu2) ~ time 

Observed period: 1950-2000
Forecast period: 2001-2030

Smoothing parameters:
 Ha.sigma  Ht.sigma Hat.sigma 
      0.3       0.3       0.2 

Geo units included:
[1] "2045"

See 'help(plot.yourcast)' for instructions on how to plot observed and predicted 'y' values

Here we can see basic information about the output object. More information not printed automatically is available by typing names(summary(ylc)).

We're now ready to plot the observed and predicted values to study the model output. This can be done simply by typing plot(ylc), but we have added a few arguments here to enhance the graphical output. The argument dvlabel gives a title for the plots by describing the dependent variable. The second argument, age.insamp.predict, tells the function not to plot predicted yhat values in sample in the age plots. You can see more of these options by typing help(plot.yourcast).

plot(ylc, dvlabel="Respiratory Infections", age.insamp.predict=FALSE)

Since we did not specify which type of plot we wanted, the program gave us the default combination of age and time plots. However, the plotting function can also do either of these plots separately, as well as three dimensional plots. To see these, we need to use the family argument. For example, here is a call for the three dimensional plot:

plot(ylc, dvlabel="Respiratory Infections", family="threedim")

Finally, if your analysis includes a large number of geographic areas such that viewing output sequentially on the device is inconvenient, there an option in the plotting function to save the output for each geographic code as a .pdf file in the working directory rather than printing it to the device window. Just set print="pdf".

This ends the example section of the users guide. Please visit the help files for individual functions or send an email to the YourCast listserv if you have problems.

Gary King 2010-09-14