Importing Surveys of Incarcerated Populations into R

Importing surveys of inmates in local jails, state prisons, and federal prisons from SAS + ASCII format into R.

07 December 2017


The Bureau of Justice Statistics (BJS) has a ton of data on law enforcement and incarceration. For a class research project, I am examining data on the pre-incarceration incomes of incarcerated parents. For this topic, I am using three datasets: the Survey of Inmates in Federal Correctional Facilities (SIFCF), the Survey of Inmates in State Correctional Facilities (SISCF), and the Survey of Inmates in Local Jails (SILJ). I will be downloading the most recent dataset from each, which means 2004 data for the SIFCF and SISCF and 2002 data for the SILJ.

This data, for the most part, is formatted to be accessible in closed-source programs that I would need to purchase a license to use. There are two problems with that: federally funded data should not require the public to purchase software to access it, and I don’t want to use that software. Here’s how to circumvent that and import the data from ASCII + SAS format into R:

#download datasets
#I am not allowed to re-distribute this original dataset
#use "ASCII + SAS setup" links
#unzip and combine the downloads into two folders (jails: "ICPSR_04359", prisons: "ICPSR_04572")

#set the working directory to the folder that contains both data folders

#convert data to R, save in respective folders
#the SAScii package lets R follow the .sas input directions for .txt ASCII files, so long as the user manually specifies the line in the .sas file at which "INPUT" starts
#use a text editor to open the .sas files and find the "INPUT" lines
#define input lines
input04359_DS0001 <- 3681
input04359_DS0002 <- 50
input04572_DS0001 <- 3904
input04572_DS0002 <- 3898
input04572_DS0003 <- 424
input04572_DS0004 <- 424

#define lrecl (logical record length), also found manually in the .sas files
lrcel04359_DS0001 <- 4679
lrcel04359_DS0002 <- 14326
lrcel04572_DS0001 <- 8845
lrcel04572_DS0002 <- 8845
lrcel04572_DS0003 <- 2580
lrcel04572_DS0004 <- 2580

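#load the SAScii package (run install.packages("SAScii") first if needed)
library(SAScii)
#load data. takes a long time.
#note: the second argument to read.SAScii must be the path to the .sas setup file itself; the paths below stop at the folder, so append the setup file's name as it appears in your download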
data04359_DS0001 <- read.SAScii("ICPSR_04359/DS0001/04359-0001-Data.txt", "ICPSR_04359/DS0001/", input04359_DS0001, lrecl = lrcel04359_DS0001)
data04359_DS0002 <- read.SAScii("ICPSR_04359/DS0002/04359-0002-Data.txt", "ICPSR_04359/DS0002/", input04359_DS0002, lrecl = lrcel04359_DS0002)
data04572_DS0001 <- read.SAScii("ICPSR_04572/DS0001/04572-0001-Data.txt", "ICPSR_04572/DS0001/", input04572_DS0001, lrecl = lrcel04572_DS0001)
data04572_DS0002 <- read.SAScii("ICPSR_04572/DS0002/04572-0002-Data.txt", "ICPSR_04572/DS0002/", input04572_DS0002, lrecl = lrcel04572_DS0002)
data04572_DS0003 <- read.SAScii("ICPSR_04572/DS0003/04572-0003-Data.txt", "ICPSR_04572/DS0003/", input04572_DS0003, lrecl = lrcel04572_DS0003) 
data04572_DS0004 <- read.SAScii("ICPSR_04572/DS0004/04572-0004-Data.txt", "ICPSR_04572/DS0004/", input04572_DS0004, lrecl = lrcel04572_DS0004)

#save in "data" folder
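
The save step above is left as a comment. Here is a minimal sketch of one way to do it, assuming R's built-in saveRDS and a folder named "data". The helper name save_dataset is hypothetical, and a small stand-in data frame takes the place of the real survey data (the example writes to a temporary directory so it can run anywhere):

```r
# Hypothetical helper (not from the original post): write one data frame
# to <dir>/<name>.rds, creating the folder if it does not exist yet.
save_dataset <- function(df, name, dir = "data") {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(dir, paste0(name, ".rds"))
  saveRDS(df, path)
  invisible(path)
}

# Stand-in for an imported survey; with the real data this would be
# an object such as data04359_DS0001.
example_df <- data.frame(id = 1:3, income = c(1200, 800, 1500))
saved_path <- save_dataset(example_df, "example_df",
                           dir = file.path(tempdir(), "data"))

# Reload later without repeating the slow import.
reloaded <- readRDS(saved_path)
```

With the real data, the equivalent call would be something like save_dataset(data04359_DS0001, "data04359_DS0001"), repeated for each of the six objects.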

And that’s it. Analyze away.
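
If opening each .sas file in a text editor gets tedious, the "INPUT" line can also be located from within R. This is a rough sketch, assuming the setup file contains a line that begins with the keyword INPUT (possibly indented). The helper name find_input_line is hypothetical, and the tiny setup file written here is made up purely for illustration:

```r
# Hypothetical helper: return the line number(s) of a SAS setup file
# that start with the keyword INPUT, ignoring leading whitespace and case.
find_input_line <- function(sas_path) {
  lines <- readLines(sas_path, warn = FALSE)
  grep("^\\s*INPUT\\b", lines, ignore.case = TRUE, perl = TRUE)
}

# Made-up miniature setup file, for illustration only.
toy_sas <- tempfile(fileext = ".sas")
writeLines(c(
  "* toy setup file;",
  "DATA;",
  "INFILE \"file-specification\" LRECL=12;",
  "INPUT",
  "   V1 1-4",
  "   V2 5-12;",
  "RUN;"
), toy_sas)

input_line <- find_input_line(toy_sas)
```

If more than one line matches in a real setup file, check the file by hand before trusting the result.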

Cross-posted on steemit and my site. Filed under incarceration, statistics, and R.