Importing Annual Probation and Parole Surveys into R

28 September 2018

Every year, the Bureau of Justice Statistics collects administrative data from probation and parole agencies in the United States and publishes national and state-level data. These annual surveys can be useful for learning about the people supervised by the criminal justice system and the states that supervise them. This post walks through importing all years of these surveys into R. The output is two .csv files containing all data from all years, one for parole and one for probation.


Because the Institute for Social Research requires a login to download data from these surveys, files for each year from each survey will have to be downloaded manually.

To download these files, navigate to the Annual Probation Survey Series page. Then, under the “Studies” tab, click the link for each survey year of interest. You should now have one page open for each year. On each page, under the “Download” dropdown, choose the “Delimited” option, which downloads a zipped folder containing a tsv file. Note that while some years do have an “R”-specific download option, not all of them do, so for consistency the “R” option is not used here. Do not unzip these files; the code below does that. Create a folder with the path “paroleAndProbation/probation/raw” and drag all the downloaded zip folders into it.
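The folders can also be created from R rather than by hand. The following is a minimal sketch: `baseDir` is a hypothetical location (here a temporary directory, so the snippet runs anywhere) that you would replace with your own, and the folder names follow the `setwd()` calls used in this post.

```r
# Hypothetical base directory; replace tempdir() with your own location.
baseDir <- file.path(tempdir(), "paroleAndProbation")

# One "raw" folder per survey, matching the paths used in the setwd() calls.
rawDirs <- file.path(baseDir, c("probation", "parole"), "raw")

# recursive = TRUE also creates the intermediate folders;
# showWarnings = FALSE keeps the call quiet if a folder already exists.
for (d in rawDirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# Confirm that both folders exist before moving the zip files into them.
dir.exists(rawDirs)
```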

setwd('/YOURDIRECTORYHERE/paroleAndProbation/probation/raw/')

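# Unzip every downloaded archive into the current folder, then delete the archives
files <- list.files(pattern = "\\.zip$")
sapply(files, unzip)
sapply(files, file.remove)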

# Collect the paths of all .tsv data files inside the unzipped folders
dataFiles <- list.files(recursive = TRUE)
dataFiles <- dataFiles[grep("\\.tsv$", dataFiles)]

# Lookup table pairing each survey year with its ICPSR study folder
yearFile <- data.frame(YEAR = 1994:2015, SOURCE = c(
  "ICPSR_29668", "ICPSR_29669", "ICPSR_29670",
  "ICPSR_29671", "ICPSR_29672", "ICPSR_29673",
  "ICPSR_28361", "ICPSR_28362", "ICPSR_28363",
  "ICPSR_28364", "ICPSR_28365", "ICPSR_28366",
  "ICPSR_31323", "ICPSR_31324", "ICPSR_34319",
  "ICPSR_34320", "ICPSR_34321", "ICPSR_34717",
  "ICPSR_35256", "ICPSR_35631", "ICPSR_36343",
  "ICPSR_36618"))

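# Read each file into its own data frame; simplify = FALSE guarantees a named list
probation <- sapply(dataFiles, read.table, sep = '\t', header = TRUE, stringsAsFactors = FALSE, simplify = FALSE)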

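# Each path starts with its study folder (e.g. "ICPSR_29668"); keep that part, then swap it for the survey year
idx <- regexpr("/", names(probation), fixed = TRUE)
names(probation) <- substr(names(probation), 1, idx - 1)
names(probation) <- yearFile$YEAR[match(names(probation), yearFile$SOURCE)]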

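# Add a YEAR column to each year's data frame
probation <- Map(cbind, probation, YEAR = names(probation))

# Stack all years; columns missing from a given year are filled with NA
probation <- Reduce(function(...) merge(..., all = TRUE), probation)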

write.csv(probation, "../probationSurvey.csv", row.names = FALSE)
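The final merge step may look odd for what is essentially stacking files. With `all = TRUE`, `merge()` performs a full outer join on every column name the two data frames share. Because the YEAR value differs between files, no rows ever match, so each year's rows are kept and any column absent from a given year is filled with NA. A toy illustration, with made-up column names and values that are not taken from the surveys:

```r
# Two toy "years" with partly different columns (hypothetical names and values).
y1 <- data.frame(STATE = c("AL", "AK"), TOTBEG = c(10, 20),
                 YEAR = 1994, stringsAsFactors = FALSE)
y2 <- data.frame(STATE = c("AL", "AK"), TOTBEG = c(11, 21),
                 NEWVAR = c(1, 2), YEAR = 1995, stringsAsFactors = FALSE)

# Same pattern as in the post: fold merge() over a list of data frames.
stacked <- Reduce(function(...) merge(..., all = TRUE), list(y1, y2))

# All four rows survive; NEWVAR is NA for the 1994 rows.
stacked
```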

Then navigate to the Annual Parole Survey Series page and repeat the process above, but put the files in a folder with the path “paroleAndProbation/parole/raw” and use the following code:

setwd('/YOURDIRECTORYHERE/paroleAndProbation/parole/raw/')

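# Unzip every downloaded archive into the current folder, then delete the archives
files <- list.files(pattern = "\\.zip$")
sapply(files, unzip)
sapply(files, file.remove)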

# Collect the paths of all .tsv data files inside the unzipped folders
dataFiles <- list.files(recursive = TRUE)
dataFiles <- dataFiles[grep("\\.tsv$", dataFiles)]

# Lookup table pairing each survey year with its ICPSR study folder (newest year first)
yearFile <- data.frame(YEAR = 2015:1994, SOURCE = c(
  "ICPSR_36619", "ICPSR_36320", "ICPSR_35629",
  "ICPSR_35257", "ICPSR_34718", "ICPSR_34382",
  "ICPSR_34381", "ICPSR_34380", "ICPSR_31332",
  "ICPSR_31331", "ICPSR_31330", "ICPSR_31329",
  "ICPSR_31328", "ICPSR_31327", "ICPSR_31326",
  "ICPSR_31325", "ICPSR_29667", "ICPSR_29666",
  "ICPSR_29665", "ICPSR_29664", "ICPSR_29663",
  "ICPSR_29662"))

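# Read each file into its own data frame; simplify = FALSE guarantees a named list
parole <- sapply(dataFiles, read.table, sep = '\t', header = TRUE, stringsAsFactors = FALSE, simplify = FALSE)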

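# Each path starts with its study folder (e.g. "ICPSR_36619"); keep that part, then swap it for the survey year
idx <- regexpr("/", names(parole), fixed = TRUE)
names(parole) <- substr(names(parole), 1, idx - 1)
names(parole) <- yearFile$YEAR[match(names(parole), yearFile$SOURCE)]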

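# Add a YEAR column to each year's data frame
parole <- Map(cbind, parole, YEAR = names(parole))

# Stack all years; columns missing from a given year are filled with NA
parole <- Reduce(function(...) merge(..., all = TRUE), parole)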

write.csv(parole, "../paroleSurvey.csv", row.names = FALSE)

Now you should have the files probationSurvey.csv and paroleSurvey.csv, each containing all annual data for its survey, in the probation and parole subfolders of paroleAndProbation, respectively.
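Since the probation and parole code differ only in the folder, the lookup table, and the output file, the steps could be wrapped in a single function. The sketch below is one possible way to do that; `importSurvey()` and its argument names are hypothetical and not part of the original code, and it uses full paths instead of `setwd()`.

```r
# Hypothetical helper: runs the unzip / read / label / stack / write steps
# for one survey. rawDir holds the downloaded zip files, yearFile is the
# YEAR/SOURCE lookup table, and outFile is the path of the .csv to write.
importSurvey <- function(rawDir, yearFile, outFile) {
  # Unzip every archive into rawDir, then delete the archives.
  zips <- list.files(rawDir, pattern = "\\.zip$", full.names = TRUE)
  for (z in zips) unzip(z, exdir = rawDir)
  file.remove(zips)

  # Paths relative to rawDir; each starts with the study folder name.
  dataFiles <- list.files(rawDir, pattern = "\\.tsv$", recursive = TRUE)

  surveys <- lapply(file.path(rawDir, dataFiles), read.table,
                    sep = "\t", header = TRUE, stringsAsFactors = FALSE)

  # The top-level folder identifies the study; look up its survey year.
  studyIds <- sub("/.*$", "", dataFiles)
  years <- yearFile$YEAR[match(studyIds, yearFile$SOURCE)]

  # Add the YEAR column, stack all years, and write the result.
  surveys <- Map(cbind, surveys, YEAR = years)
  combined <- Reduce(function(...) merge(..., all = TRUE), surveys)
  write.csv(combined, outFile, row.names = FALSE)
  invisible(combined)
}

# Example call (paths are placeholders, as in the setwd() calls above):
# parole <- importSurvey("/YOURDIRECTORYHERE/paroleAndProbation/parole/raw",
#                        yearFile,
#                        "/YOURDIRECTORYHERE/paroleAndProbation/parole/paroleSurvey.csv")
```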

Note: this code works for all years of data currently available for these surveys, but the yearFile lookup tables will have to be extended by hand as new years are added.
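A newly downloaded year that is missing from the lookup table does not raise an error: `match()` returns NA for an unknown folder, so that year's rows would simply end up with a missing YEAR. A small check can catch this before merging. In the sketch below, `unmappedSources()` is a hypothetical helper, and the example paths (including the inner folder layout and the "ICPSR_99999" study) are made up for illustration.

```r
# Hypothetical check: return study folders that have no entry in the lookup table.
unmappedSources <- function(dataFiles, yearFile) {
  # The study folder is everything before the first "/" in each path.
  sources <- unique(sub("/.*$", "", dataFiles))
  setdiff(sources, as.character(yearFile$SOURCE))
}

# Toy example: the second folder is invented and has no year assigned.
exampleFiles <- c("ICPSR_36619/DS0001/36619-0001-Data.tsv",
                  "ICPSR_99999/DS0001/99999-0001-Data.tsv")
exampleLookup <- data.frame(YEAR = 2015, SOURCE = "ICPSR_36619",
                            stringsAsFactors = FALSE)

# Returns the folder names that still need a row in the lookup table.
unmappedSources(exampleFiles, exampleLookup)
```

In the real workflow, the same call with `dataFiles` and `yearFile` should return an empty character vector before the files are read.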
