The 1970 Draft Lottery in R

In November of 1969, with the Vietnam War raging, President Nixon signed an executive order instructing the Selective Service to hold a draft lottery. The order stipulated that the selection be a random process based on the birthdays of men born between January 1, 1944 and December 31, 1950. The order did not specifically state how the birthdays should be selected.

The Selective Service placed the days of the year, including February 29, into 366 plastic capsules. These capsules were placed in a glass container, mixed, then drawn one at a time. The first capsule drawn contained the date September 14. This date was assigned the number 1. The significance of this assignment was that all men born on September 14 in the years 1944 through 1950 would be the first selected to serve in Vietnam. Once all men born on this date were inducted into the service, the Selective Service would begin to induct men born on the date contained in the second capsule drawn, and so on.

Figure 1. Rep. Alexander Pirnie, R-NY, draws the first capsule in the lottery drawing held on Dec. 1, 1969. The capsule contained the date, Sept. 14.

The last capsule drawn contained the date June 8. The Pentagon estimated that men with draft numbers in the last third, numbers 200 to 366, would escape the draft entirely. In fact, no man with a draft number higher than 195 was called to duty.

The fairness of the draft lottery was immediately debated. Critics contended that the process was not truly random. A New York Times article quoted a White House source as saying "discussions that the lottery was not random are purely speculative." In that same New York Times article, Senator Edward Kennedy was quoted as asking the National Academy of Sciences to look into the "apparent lack of randomness" in the selection.

The Data

The data is publicly available on the internet. One source is the Data and Story Library. The draft lottery data is located at the following URL:

http://lib.stat.cmu.edu/DASL/Datafiles/DraftLottery.html

If you have not imported data into R from external sources, you might want to first work through the activity Importing Data in R.

One technique, as explained in Importing Data in R, suggests copying the data into a plain text file. Open a simple text editor (e.g., Notepad on Windows or TextEdit on the Mac). Copy and paste the lottery data from the above URL, including headers (but not the descriptive information above the headers), and save the file as lottery.txt. What follows are the first few lines in our version of lottery.txt:

Day	Month	Mo.Number	Day_of_year	Draft_No.
1	Jan	1	1	305
2	Jan	1	2	159
3	Jan	1	3	251
4	Jan	1	4	215
5	Jan	1	5	101
6	Jan	1	6	224
7	Jan	1	7	306

If you decide to use Microsoft Word to create the file lottery.txt, you must make sure you save the file as a plain text file. Do not save it using Word's proprietary "doc" format.

Importing the Data into R

It is now a simple matter to import the data in lottery.txt into R's workspace.

> lottery=read.table(file=file.choose(), header=TRUE)

Because we use the file.choose() function, this command will pop open a dialog box that allows us to "browse" our system in a familiar manner. Browse to the folder or directory where you saved the file lottery.txt, select lottery.txt in the usual manner for your operating system, then click the Open button. The data from lottery.txt will be imported and stored in the dataframe lottery. Note: If you are not yet familiar with dataframes in R, you might want to try the activity Dataframes in R before continuing.
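If a dialog box is inconvenient (in a script, for example), read.table also accepts a file path, or the data itself through its text argument. The following is a minimal sketch that reads the seven rows listed earlier from an inline string. The names excerpt_text and lottery_excerpt are ours, chosen for illustration, and the result holds only those seven rows, not the full data set.

```r
# The header line and first seven data rows shown earlier, pasted as a string.
# (lottery_excerpt is an illustrative name; it is NOT the full 366-row data set.)
excerpt_text <- "Day Month Mo.Number Day_of_year Draft_No.
1 Jan 1 1 305
2 Jan 1 2 159
3 Jan 1 3 251
4 Jan 1 4 215
5 Jan 1 5 101
6 Jan 1 6 224
7 Jan 1 7 306"

# text= supplies the data directly; header=TRUE uses the first line as column names.
lottery_excerpt <- read.table(text = excerpt_text, header = TRUE)

# With a saved file you would pass its path instead, for example:
# lottery <- read.table(file = "lottery.txt", header = TRUE)

print(lottery_excerpt)
```

Passing a path makes the import repeatable, which helps if you plan to rerun the analysis later.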

To test the success of our data import, execute the following command.

> print(lottery)

We list only the first seven lines (and the last two) of the response, as it is quite extensive. There are 366 lines, one for each day of the year. Aren't you glad we didn't have to type this data in by hand?

    Day Month Mo.Number Day_of_year Draft_No.
1     1   Jan         1           1       305
2     2   Jan         1           2       159
3     3   Jan         1           3       251
4     4   Jan         1           4       215
5     5   Jan         1           5       101
6     6   Jan         1           6       224
7     7   Jan         1           7       306
...
...

365  30   Dec        12         365         3
366  31   Dec        12         366       100

Some descriptive comments are in order.

  1. There is an extra first column that was not in our file lottery.txt. The numbers in this column run from 1 to 366, one for each record in the file.
  2. The headers of the columns are "Day", "Month", "Mo.Number", "Day_of_year", and "Draft_No.". Thus, in the last entry of the response above, we interpret the numbers in that last row as follows:
    1. 366 is the number of the row.
    2. 31 is the day of the month; i.e., December 31.
    3. Dec is the month; i.e., December.
    4. 12 is the number of the month; i.e., December is the 12th month of the year.
    5. 366 is the day of the year; i.e., Dec. 31 is the 366th day of the year.
    6. 100 is the draft number; i.e., men born on December 31 were assigned a draft number of 100. This means that all men with draft numbers 1-99 will be selected prior to any man with a draft number of 100.
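Beyond reading the printout, a few quick checks can confirm that an import is complete. The sketch below wraps them in a hypothetical helper, check_lottery (our name, not part of R), and runs it on a stand-in dataframe with the same column names as lottery. The draft numbers in the stand-in are simulated with sample(), so they are not the real lottery results; in practice you would pass the real lottery dataframe.

```r
# Stand-in with the same columns as lottery. Draft_No. is SIMULATED, not real data.
days_in_month <- c(31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
set.seed(1)
standin <- data.frame(
  Day         = sequence(days_in_month),
  Month       = rep(month.abb, times = days_in_month),
  Mo.Number   = rep(1:12, times = days_in_month),
  Day_of_year = 1:366,
  Draft_No.   = sample(366)
)

# Hypothetical helper: TRUE if df looks like a complete lottery table.
check_lottery <- function(df) {
  nrow(df) == 366 &&                      # one row per day, including Feb. 29
    setequal(df$Draft_No., 1:366) &&      # every draft number from 1 to 366 appears
    anyDuplicated(df$Draft_No.) == 0      # and none appears twice
}

check_lottery(standin)
head(standin, 3)   # first three rows
tail(standin, 2)   # last two rows
```

The head and tail commands give the same kind of abbreviated view we showed above without printing all 366 rows.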

A Scatterplot of Draft Number versus Day of the Year

We can use the names command to determine the column headers of the dataframe lottery.

> names(lottery)
[1] "Day"         "Month"       "Mo.Number"   "Day_of_year"
[5] "Draft_No." 

Now, back to the charge: Was this system used by the Selective Service truly random and fair?

Actually, the question of fairness requires a deep understanding of statistical concepts and techniques. However, with a few simple visualizations (graphs), perhaps we can make a rudimentary judgement as to fairness. For example, if the lottery were truly random, a scatterplot of the draft number versus the day of the year should show no correlation: the points should be scattered evenly, with no tendency for dates late in the year to receive lower (or higher) draft numbers.
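To get a feel for what "no correlation" looks like, we can simulate lotteries that are fair by construction. The sketch below is our own illustration, not part of the original activity: it treats a fair lottery as a random permutation of the numbers 1 to 366 (via sample) and uses cor to measure the correlation with the day of the year. The names day_of_year, fair_draw, and fair_cors are made up for this example.

```r
set.seed(2024)
day_of_year <- 1:366

# One simulated fair lottery: a random permutation of the draft numbers.
fair_draw <- sample(366)
cor(day_of_year, fair_draw)

# Repeat many times to see how far from zero the correlation drifts by chance alone.
fair_cors <- replicate(1000, cor(day_of_year, sample(366)))
summary(fair_cors)
hist(fair_cors,
     main = "Correlations in simulated fair lotteries",
     xlab = "Correlation of draft number with day of year")
```

Once the real data is loaded, cor(lottery$Day_of_year, lottery$Draft_No.) gives the corresponding value for the actual lottery, which can be compared against this histogram.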

To access the data in a column of the dataframe lottery, we suffix lottery with a dollar sign followed by the name or header of the column whose data we need (see Dataframes in R). Thus, lottery$Day_of_year accesses the data in the column of the dataframe with header "Day_of_year". Therefore, obtaining a scatterplot of Draft_No. versus Day_of_year is a simple task.

> plot(lottery$Day_of_year,lottery$Draft_No.)

The result of this command is the scatterplot shown in Figure 2.

Figure 2. A scatterplot of Draft_No. versus Day_of_year.

One could also "attach" the dataframe lottery (see Dataframes in R). When we "attach" a dataframe, we can access the columns without using the dollar notation. Thus, we can plot Draft_No. versus Day_of_year with the following commands.

> attach(lottery)
> plot(Day_of_year,Draft_No.)

It is good practice to "detach" the dataframe when finished.

> detach(lottery)

Readers should check that these commands produce a scatterplot identical to that shown in Figure 2.
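Another option, offered here as a sketch rather than a recommendation from the activity, is R's with function. It evaluates a single expression using the columns of a dataframe, so there is nothing to detach afterwards. To keep the example self-contained we use only the first seven draft numbers from the listing above, stored in a small dataframe we call lottery_excerpt (an illustrative name).

```r
# First seven rows of the lottery data, as listed earlier (two columns only).
lottery_excerpt <- data.frame(
  Day_of_year = 1:7,
  Draft_No.   = c(305, 159, 251, 215, 101, 224, 306)
)

# Columns are visible by name inside with(); nothing is attached to the search path.
with(lottery_excerpt, plot(Day_of_year, Draft_No.))

# The same idea works for any expression, not just plots.
mean_no <- with(lottery_excerpt, mean(Draft_No.))
mean_no
```

With the full data, with(lottery, plot(Day_of_year, Draft_No.)) should draw the same scatterplot as the attach version.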

Efficient Use of Dataframes

R's plot command, coupled with a "model formula," is the most efficient way to produce a scatterplot. Without further explanation, enter the following code. Note: Remember that ~ is a "tilde", not a minus sign. On a standard US keyboard it is typed with Shift and the key to the immediate left of the 1 key, on the second row from the top of your keyboard.

> plot(Draft_No. ~ Day_of_year, data=lottery)

This command will produce the scatterplot shown in Figure 3. Note that it is identical to the scatterplot shown in Figure 2.

Figure 3. A scatterplot of Draft_No. versus Day_of_year.

We used the syntax plot(model formula, data = dataframe) to obtain the scatterplot in Figure 3. Some explanatory comments are in order.

  1. For our "model formula," we used Draft_No. ~ Day_of_year. These columns both contain numeric data (we'll soon see what happens when a column contains categorical data). This forces the plot command to plot the draft number (vertical axis) versus the day of the year (horizontal axis), as seen in Figure 3. If we want a plot of the day of the year versus the draft number, we would simply reverse these variables in the model formula; i.e., plot(Day_of_year ~ Draft_No., data = lottery).
  2. We used data=lottery in our plot command. This forces the plot command to use the columns in the dataframe lottery.

Interpreting the Scatterplot

Is there a correlation between the draft number and the day of the year as witnessed in Figure 3? One might easily surmise that the data is truly random, as did the Selective Service board in 1970. However, let's not be too hasty. Let's try a different visualization before coming to a conclusion.

Side-by-Side Boxplots

When crafting the scatterplot in Figure 3, we used the "model formula" Draft_No. ~ Day_of_year. In that case, both of these columns contained numeric data. Thus, the command plot(Draft_No. ~ Day_of_year, data = lottery) used the columns in the dataframe lottery and produced the scatterplot in Figure 3. What will happen when one of the variables in the "model formula" contains categorical data? We're about to find out.

In the activity Boxplots in R we learned how to use R's boxplot command to produce a boxplot of a data set. To examine the "fairness" of the Selective Service's draft lottery, we will produce "side-by-side" boxplots for each month of the year. That is, we will produce 12 boxplots, one for each month of the year, each summarizing the draft numbers assigned to that month. The following command will produce the "side-by-side" boxplots shown in Figure 4.

> boxplot(Draft_No. ~ Month, data=lottery)

Figure 4. Side-by-side boxplots of draft numbers for each month.

Because the data in Month is categorical (you can see this by typing lottery$Month), the model formula Draft_No. ~ Month causes the boxplot command to group the numerical data in Draft_No. according to the categories in Month. Therefore, the command boxplot(Draft_No. ~ Month, data=lottery) creates 12 boxplots, one for each month. For example, the boxplot for April (see Apr in Figure 4) contains an analysis for only those draft numbers that were assigned to birth-dates in April. Similar comments are in order for the remaining months.

Unfortunately, the months are sorted in alphabetical order (the default behavior). It would be more appropriate if they were sorted in chronological order, January first, February second, etc. One solution would be to boxplot the draft numbers versus the month number.

> boxplot(Draft_No. ~ Mo.Number, data=lottery)

This command produces the side-by-side boxplots shown in Figure 5.

Figure 5. Side-by-side boxplots of draft numbers for each month, 1 representing January, 2 representing February, etc.

The plot in Figure 5 is close to what we want, but it would be a much better plot if we could replace 1, 2, 3, etc., with Jan, Feb, Mar, etc. This is easily done. In the code that follows, each plus sign (+) is R's line continuation character. When you type in the first line of the following code, hit the Enter or Return key. This will produce the plus sign on your screen and you can type in the next line of code and again hit the Enter key. This produces another line continuation character (the plus sign) and you can type in the last line of code and hit Enter.

> boxplot(Draft_No. ~ Mo.Number, data=lottery,
+ names=c("Jan","Feb","Mar","Apr","May","Jun",
+ "Jul","Aug","Sep","Oct","Nov","Dec"))

The names argument labels the side-by-side boxplots shown in Figure 6 with the name of the month that corresponds to the month number.

Figure 6. Side-by-side boxplots of draft numbers sorted by month.
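An alternative to the names argument, sketched here as one possible approach, is to tell R the calendar order of the months directly by converting Month to a factor whose levels are listed in that order. R's built-in constant month.abb already holds "Jan" through "Dec". The dataframe standin below is a stand-in with simulated draft numbers (not the real lottery results), used only so the example runs on its own.

```r
# Stand-in data: real month labels, but Draft_No. is SIMULATED with sample().
days_in_month <- c(31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
set.seed(1)
standin <- data.frame(
  Month     = rep(month.abb, times = days_in_month),
  Draft_No. = sample(366)
)

# Make Month a factor with levels in calendar order instead of alphabetical order.
standin$Month <- factor(standin$Month, levels = month.abb)

# boxplot now draws the groups in the order of the factor levels.
boxplot(Draft_No. ~ Month, data = standin)
```

Applied to the real data, the same conversion on lottery$Month would let boxplot(Draft_No. ~ Month, data=lottery) produce chronologically ordered boxes with month labels, without typing the names by hand.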

Interpretation of Results

The image in Figure 6 is perfect. The months are now sorted in chronological order. But now, what does the image of side-by-side boxplots tell us?

Remember, the heavy horizontal bar in each box is the median of the data set. The median draft number for the month of December is very disconcerting. Remember, the lower the draft number, the more likely you would be inducted to serve in Vietnam. Why does the month of December have a median that is noticeably lower than most of the other months? It seems that the men with birthdays in December are being unfairly selected. Indeed, with the exception of October, the last remaining months of the year all have medians that are noticeably lower than the medians of the previous months. Something strange is going on!
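The medians can also be computed rather than judged by eye. The sketch below uses tapply to apply median to the draft numbers within each month. It runs on a stand-in dataframe whose draft numbers are simulated with sample(), so its output says nothing about the real lottery; the names standin and monthly_medians are ours.

```r
# Stand-in data: Draft_No. is SIMULATED, not the real lottery results.
days_in_month <- c(31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
set.seed(1)
standin <- data.frame(
  Mo.Number = rep(1:12, times = days_in_month),
  Draft_No. = sample(366)
)

# Median draft number within each month (groups come back in order 1, 2, ..., 12).
monthly_medians <- tapply(standin$Draft_No., standin$Mo.Number, median)
names(monthly_medians) <- month.abb   # label the results Jan, Feb, ..., Dec
monthly_medians
```

For the real data, tapply(lottery$Draft_No., lottery$Mo.Number, median) returns the twelve medians drawn as heavy bars in the boxplots.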

One story offers a hint of an explanation. It seems that the capsules containing birthdays for January were placed in a shoe-box, thoroughly mixed, then poured into the glass container shown in Figure 1. Then the same procedure was followed for the capsules containing birthdays in February, stirring them thoroughly in a shoe-box, then pouring them into the glass container. This same procedure was followed for the remaining months. December was the last month processed, or so the story goes.

If true, this is quite disturbing. If capsules were selected from the top of the glass container, they were more likely to contain a December birthday. According to the story, the person making the draws did not always reach deep into the pile of capsules. This may be one explanation for why so many December birthdays were selected early in the process and assigned low draft numbers (which correspond to a higher chance of being drafted).
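A toy simulation can show how this kind of incomplete mixing would tilt the results. Everything in the sketch below is an assumption made for illustration, not a reconstruction of what actually happened: we suppose each capsule's height in the container equals its month number plus random noise (the hypothetical parameter mixing_sd controls how thorough the mixing is), and that capsules are drawn strictly from the top down.

```r
days_in_month <- c(31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
mo_number <- rep(1:12, times = days_in_month)

set.seed(7)
# Toy assumption: later months were poured in last, so they start higher up;
# mixing adds random noise to each capsule's height.
mixing_sd <- 8   # hypothetical value; larger means more thorough mixing
height <- mo_number + rnorm(366, sd = mixing_sd)

# Draw from the top down: the highest capsule receives draft number 1.
toy_draft_no <- rank(-height, ties.method = "first")

# Later months tend to get lower draft numbers under this toy model.
cor(mo_number, toy_draft_no)
toy_medians <- tapply(toy_draft_no, mo_number, median)
toy_medians
boxplot(toy_draft_no ~ mo_number, names = month.abb)
```

Increasing mixing_sd weakens the pattern, which matches the intuition that more thorough mixing would have made the lottery fairer.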

This story may be an oversimplification. Readers are encouraged to explore the reasons why this process failed to be as random as hoped. You might also be interested in exploring what corrections were made to the process in ensuing years.

Enjoy!

We hope you enjoyed this analysis of the 1969 Selective Service lottery using the R system. We encourage you to explore further.