Automatic Import of HealthStream Survey Data

If your healthcare institution is subject to Pay for Performance rules (probably Value Based Purchasing), chances are you are a HealthStream customer. In which case, someone on your staff probably has the unenviable job of using the Insights Online portal to get at your Patient Experience survey data.

The following PowerShell script can be scheduled, to download (via SFTP) your Raw_Extract data and import it into your SQL Server.

For scheduling, you’ll need to use the Windows Task Scheduler rather than the SQL Server Agent Job. For some reason the PowerShell array handler @(..) it does not play nicely with the SQL Server Agent Job.

You’ll also need to install two dependencies (PSFTP and 7-Zip) to make this work.

Also, modify the $fromdate and $todate depending how HealthStream admins setup your exract schedule.


##
## Colin A. White August 2013
##
## PowerShell Script to get weekly Healthstream
## compressed, password protectd archive
##

$psftpPath = "C:\Program Files (x86)\PuTTY\psftp.exe"
$7zpath = "C:\Program Files\7-Zip\7z.exe"

$sftpHost = "hsrftp.healthstreamresearch.com"
$userName = "HealthStreamUserName"
$userPassword = "HealthStreamPassword"

# Used to guess the zip file names
$fromdate = (Get-Date).AddDays(-15).ToString("yyyMMdd");
$todate = (Get-Date).AddDays(-9).ToString("yyyMMdd");

$filename = "RawData_Extract_$($fromdate)_$($todate)"
# Testing stub
# $filename = "RawData_Extract_20130720_20130726"

# Download via SFTP
$cmd = @("lcd D:\Healthstream", "get $($filename).zip", "bye")
$cmd | & $psftpPath -pw $userPassword "$userName@$sftpHost"

# Extract via 7-Zip
& $7zPath e -oD:\Healthstream -y -p"HealthStreamPassword" "D:\Healthstream\$($filename).zip"

# Bulk INSERT via SQL
Import-Module "sqlps" -DisableNameChecking
Invoke-Sqlcmd "BULK INSERT [Healthstream].[dbo].[HealthstreamData] FROM 'D:\Healthstream\$($filename).csv' WITH (FIRSTROW=2, CODEPAGE='Raw', FIELDTERMINATOR=',', ROWTERMINATOR='\n')"

# Remove left over text qualifiers
Invoke-Sqlcmd " UPDATE [Healthstream].[dbo].[HealthstreamData] SET [RespondentID] = REPLACE([RespondentID], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [ClientName] = REPLACE([ClientName],`'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [SurveyType] = REPLACE ([SurveyType], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [UnitDescription] = REPLACE([UnitDescription], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [SutdyStartDate] = REPLACE([SutdyStartDate], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [StudyEndDate] = REPLACE ([SutdyStartDate], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [StudyEndDate] = REPLACE([StudyEndDate], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [DischargeDate] = REPLACE([DischargeDate], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [CollectionDate] = REPLACE ([CollectionDate], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [Question] = REPLACE([Question], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [QuestionDesc] = REPLACE ([QuestionDesc], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [AnswerValue] = REPLACE([AnswerValue], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [AnswerDesc] = REPLACE([AnswerDesc], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [AnswerIncludeInAdjN] = REPLACE ([AnswerIncludeInAdjN], `'`"`', `'`');
"

# Post-processing Clean Up
If ((Test-Path "D:\Healthstream\$($filename).csv") -eq $True) { Remove-Item "D:\Healthstream\$($filename).csv" }
If ((Test-Path "D:\Healthstream\$($filename).zip") -eq $True) { Move-Item -Path "D:\Healthstream\$($filename).zip" -Destination "Microsoft.PowerShell.Core\Filesystem::\\aFileServer\YourFileArchive\Healthstream\$($filename).zip" -Force}

Advertisements

De-duplicating Massive Tables

Challenge

When tables run to tens or even hundreds of millions of rows, removing duplicate entries is no longer a trivial effort. For smaller tables of say, less than 1 million rows, this works great –

SELECT * INTO #temp FROM [dbo].[goodSizedTable]
TRUNCATE TABLE [dbo].[goodSizedTable]
INSERT INTO [dbo].[goodSizedTable]
SELECT DISTINCT * FROM #temp
DROP TABLE #temp

This doesn’t scale too well with humongous tables and limited server RAM or undersized volumes for tempdb. If you’re unfortunate enough to have tempdb hosted on spinning disk (as opposed to SSD), you’ll really notice I/O problems (Buffer Waits).

When you start seeing timeout errors with SqlCmd.ExecuteNonQuery() you’ll need to set SqlCmd.CommandTimeout to something much larger than the default 30 seconds. This might ‘keep the wheels on’ but doesn’t feel like a solution.

Solution

Begin by counting your duplicates.

with myCTE AS(
SELECT [Column1], ROW_NUMBER() OVER (PARTITION BY [Column1], [Column2], [Column3] ORDER BY [Column1]) AS [Row]
FROM [dbo].[humongousTable]
)
SELECT * FROM myCTE WHERE [Row] > 1

If the number is non-trivial then the pattern for your solution lies in there. Though not rigorously tested, this seems to run an order of magnitude faster.

DELETE [dups]
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY [Column1], [Column2], [Column3] ORDER BY [Column1]) AS [SeqID] FROM [dbo].[humongousTable]) AS [dups]
WHERE [SeqID] > 1

Example Excel Spreadsheet Import into SQL Server with TSQL

There are a number of ways to get Excel (.xls) data into SQL Server but this seems to be the most reliable (and simplest!). Particularly when trying to schedule it as a SQL Server Agent Job. It allows you to import from specific Worksheet tabs too.

INSERT INTO [dbo].[MyTable]
SELECT *
FROM OPENDATASOURCE('Microsoft.ACE.OLEDB.12.0','Data Source=\\$host\D$\MyData.xls; Extended Properties=Excel 8.0')...[MyWorkSheetName$]

Extracting Password Protected Archives with PowerShell

Challenge

It appears there is no clean (easy) way in PowerShell to extract a compressed archive that is password protected. Booo!

Solution

After much searching it seems the easiest way involves downloading and installing the free and Open Source 7Zip.

This then makes the solution very easy…

$7-ZipPath = "C:\Program Files\7-Zip\7z.exe"
$zipFile = "passwordprotectedtest.zip"
$zipFilePassword = "Foo"

& $7-ZipPath e -oC:\ -y -tzip -p"$zipFilePassword" "$zipFile"