Social Security Agency Death Master File Extract & Transform with Bash

The DMF from the SSA (via dmf.ntis.gov) is a pretty typical ‘fixed width’, ascii dataset.

It’s an interesting data set; and the only one I know that has it’s own Facebook Page (?!?)

This is a quick (and dirty) shell script to convert it into a more load-friendly CSV file.

This is the *nix version of an earlier post which did the same thing using Powershell.

Use it on the decompressed raw file with –

$ ./csvmake.sh ssdm1 >> ssdm1.csv

Cut and paste this into a text file called csvmaker.sh

#!/bin/sh
# Substitute any offending semicolons and commas with white space
cat $1 | tr ";" " " | tr "," " " |
# Insert new clean semicolon separators
awk '{
ACD=substr($0, 1, 1)
LName=substr($0, 11, 20)
Suffix=substr($0, 31, 4)
FName=substr($0, 35, 15)
MName=substr($0, 50, 15)
PCode=substr($0, 65, 1)
# Reformat SSNs to 11 digit format
SSNa=substr($0, 2, 2)
SSNb=substr($0, 4, 2)
SSNc=substr($0, 7, 2)
# Reformat Date of Death formats
DODDate=substr($0, 66, 2)
DODMonth=substr($0, 68, 2)
DODYear=substr($0, 70, 4)
# Reformat Date of Birth formats
DOBDate=substr($0, 74, 2)
DOBMonth=substr($0, 76, 2)
DOBYear=substr($0, 78, 4)
# Print it all back out
printf ("%s,%s-%s-%s,%s,%s,%s,%s,%s,%s-%s-%s,%s-%s-%s\n", \
ACD, SSNa, SSNb, SSNc, LName, Suffix, FName, MName, PCode, \
DODYear, DODMonth, DODDate, DOBYear, DOBMonth, DOBDate ) \
| tr -d " "
}'
# END
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s