PowerShell WebClient example to download a password-secured file and email the completion status.

I seem to have got a lot of comments and questions on this from an earlier post, so I promised to update this to include email status alerts. This is mainly for those enterprise folks that haven’t yet moved to PS v3 (though this works on v3 too).

Automating a status email can be done with a simple function which takes two parameters as text strings.

  1. A subject line
  2. A body or detail message
## Define a simple function to send email alerts ##
function sendEmail ($subject, $detail) {
    $mail = New-Object System.Net.Mail.MailMessage
    $mail.From = "sender@yourdomain.com";
    $mail.To.Add("recipient@theirdomain.org");
    $mail.Subject = $subject;
    $mail.Body = $detail;
    $smtp = New-Object System.Net.Mail.SmtpClient -ArgumentList "your.smtpserver.com";
    $smtp.Credentials = New-Object System.Net.NetworkCredential("yourdomain\yourusername", "yourpassword");
    $smtp.Send($mail);
}
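One gotcha when calling the function: PowerShell takes space-separated positional arguments. Wrapping the arguments in parentheses with a comma, C-style, passes a single two-element array as $subject and leaves $detail empty. A minimal usage sketch:

```
## Correct: two space-separated positional arguments
sendEmail "Test alert" "This is only a test."

## Incorrect: passes one array as $subject, leaving $detail empty
# sendEmail ("Test alert", "This is only a test.")
```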

Next, the PowerShell code to perform the web download. I’m using a Try/Catch
block so we can capture any error message into the body of an email that reports failure status.


$date = Get-Date -format yyMMdd;
$mypath = "C:\tmp";

Try{
    ## Define the target URL ##
	$url = "http://colinwhite.net/foo.csv";

    ## Instantiate a WebClient Object from the .Net classes ##
	$request = New-Object System.Net.Webclient

    ## credentials for the proxy
    $request.Proxy.Credentials = [System.Net.CredentialCache]::DefaultNetworkCredentials

    ## Define the website login password ##
	$passwd	= ConvertTo-SecureString "YourPassword" -AsPlainText -Force;

    ## Define the website login username; WebClient.Credentials needs an ICredentials object, so convert the PSCredential ##
	$request.Credentials = (New-Object System.Management.Automation.PSCredential ("YourUserName", $passwd)).GetNetworkCredential();

    ## Make the Get request and pipe output to the destination file ##
	$request.DownloadString($url) | Out-File "$($mypath)\foo $($date).csv" -Force;

    ## Send an email confirming the download completed ##
	sendEmail "Downloaded $($date)" "Downloaded successfully to $($mypath)\foo $($date).csv"

    ### Send a success message to the console. Uncomment this for testing ##
    #Write-Host("Download Finished")

}

Catch [System.Net.WebException], [System.IO.IOException] {
    ## If there's an error send a message to the console ##
	#Write-Host "Nothing to download: (404)";

    ## Send an email indicating a download error ##
	sendEmail "Failed Download on $($date)" "Nothing to download - $($_.Exception.Message)"

    ## Exit gracefully ##
	Exit

	}

Throw this all into one PowerShell script (download.ps1), add it to the Windows Task Scheduler as a Scheduled Task, and you should be good to go.
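If it helps, here’s one way to register the script from an elevated command prompt with schtasks.exe (the task name, script path, and schedule below are placeholders — adjust to taste):

```
schtasks /Create /TN "DailyDownload" /SC DAILY /ST 06:00 ^
    /TR "powershell.exe -ExecutionPolicy Bypass -File C:\scripts\download.ps1"
```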


PowerShell WebClient Example

UPDATE [August 2014] : I made an update to this post here.

 

I’m still amazed that PowerShell has no cmdlet equivalent of WGET or CURL. Even SQL Server Integration Services (SSIS) lacks this.

In the age of RESTful webservices, the official guidance (at the time of writing) is to write your own with custom .Net code. Seriously?!?

It seems they are indeed serious…

So here, for your amusement, is a simple function you can use in your PS1 scripts. It calls the .Net WebClient class.

## Declare a function that takes source and destination arguments
Function Get-Webclient ($url, $out) {

    ## Pick up the system proxy and the current user's credentials for it
    $proxy = [System.Net.WebRequest]::GetSystemWebProxy()
    $proxy.Credentials = [System.Net.CredentialCache]::DefaultCredentials

    $request = New-Object System.Net.WebClient
    $request.UseDefaultCredentials = $true ## Proxy credentials only
    $request.Proxy = $proxy

    $request.DownloadFile($url, $out)
}

## Call the function with our source and destination.
Get-Webclient "http://www.google.com" "C:\tmp\Foo3.txt"

But what if the destination is password protected and you need authentication credentials?

## Declare a function that takes source and destination arguments
Function Get-Webclient ($url, $out) {

    ## Pick up the system proxy and the current user's credentials for it
    $proxy = [System.Net.WebRequest]::GetSystemWebProxy()
    $proxy.Credentials = [System.Net.CredentialCache]::DefaultCredentials

    $request = New-Object System.Net.WebClient
    $request.Proxy = $proxy

    ## Website credentials; WebClient.Credentials needs an ICredentials object, so convert the PSCredential
    $passwd = ConvertTo-SecureString "mypassword" -AsPlainText -Force
    $request.Credentials = (New-Object System.Management.Automation.PSCredential ("username", $passwd)).GetNetworkCredential()

    $request.DownloadFile($url, $out)
}

## Call the function with our source and destination.
Get-Webclient "http://www.securedwebsite.com" "C:\tmp\Foo3.txt"

Automatic Import of HealthStream Survey Data

If your healthcare institution is subject to Pay for Performance rules (probably Value Based Purchasing), chances are you are a HealthStream customer. In which case, someone on your staff probably has the unenviable job of using the Insights Online portal to get at your Patient Experience survey data.

The following PowerShell script can be scheduled, to download (via SFTP) your Raw_Extract data and import it into your SQL Server.

For scheduling, you’ll need to use the Windows Task Scheduler rather than a SQL Server Agent job. For some reason, the PowerShell array handler @(..) does not play nicely with the SQL Server Agent.

You’ll also need to install two dependencies (PuTTY’s psftp and 7-Zip) to make this work.

Also, modify the $fromdate and $todate depending on how HealthStream admins set up your extract schedule.


##
## Colin A. White August 2013
##
## PowerShell Script to get weekly Healthstream
## compressed, password protected archive
##

$psftpPath = "C:\Program Files (x86)\PuTTY\psftp.exe"
$7zpath = "C:\Program Files\7-Zip\7z.exe"

$sftpHost = "hsrftp.healthstreamresearch.com"
$userName = "HealthStreamUserName"
$userPassword = "HealthStreamPassword"

# Used to guess the zip file names
$fromdate = (Get-Date).AddDays(-15).ToString("yyyyMMdd");
$todate = (Get-Date).AddDays(-9).ToString("yyyyMMdd");

$filename = "RawData_Extract_$($fromdate)_$($todate)"
# Testing stub
# $filename = "RawData_Extract_20130720_20130726"

# Download via SFTP
$cmd = @("lcd D:\Healthstream", "get $($filename).zip", "bye")
$cmd | & $psftpPath -pw $userPassword "$userName@$sftpHost"

# Extract via 7-Zip
& $7zPath e -oD:\Healthstream -y -p"HealthStreamPassword" "D:\Healthstream\$($filename).zip"

# Bulk INSERT via SQL
Import-Module "sqlps" -DisableNameChecking
Invoke-Sqlcmd "BULK INSERT [Healthstream].[dbo].[HealthstreamData] FROM 'D:\Healthstream\$($filename).csv' WITH (FIRSTROW=2, CODEPAGE='Raw', FIELDTERMINATOR=',', ROWTERMINATOR='\n')"

# Remove left over text qualifiers
Invoke-Sqlcmd " UPDATE [Healthstream].[dbo].[HealthstreamData] SET [RespondentID] = REPLACE([RespondentID], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [ClientName] = REPLACE([ClientName],`'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [SurveyType] = REPLACE ([SurveyType], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [UnitDescription] = REPLACE([UnitDescription], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [SutdyStartDate] = REPLACE([SutdyStartDate], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [StudyEndDate] = REPLACE([StudyEndDate], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [DischargeDate] = REPLACE([DischargeDate], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [CollectionDate] = REPLACE ([CollectionDate], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [Question] = REPLACE([Question], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [QuestionDesc] = REPLACE ([QuestionDesc], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [AnswerValue] = REPLACE([AnswerValue], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [AnswerDesc] = REPLACE([AnswerDesc], `'`"`', `'`');
UPDATE [Healthstream].[dbo].[HealthstreamData] SET [AnswerIncludeInAdjN] = REPLACE ([AnswerIncludeInAdjN], `'`"`', `'`');
"

# Post-processing Clean Up
If ((Test-Path "D:\Healthstream\$($filename).csv") -eq $True) { Remove-Item "D:\Healthstream\$($filename).csv" }
If ((Test-Path "D:\Healthstream\$($filename).zip") -eq $True) { Move-Item -Path "D:\Healthstream\$($filename).zip" -Destination "Microsoft.PowerShell.Core\Filesystem::\\aFileServer\YourFileArchive\Healthstream\$($filename).zip" -Force}

De-duplicating Massive Tables

Challenge

When tables run to tens or even hundreds of millions of rows, removing duplicate entries is no longer a trivial effort. For smaller tables of, say, less than 1 million rows, this works great –

SELECT * INTO #temp FROM [dbo].[goodSizedTable]
TRUNCATE TABLE [dbo].[goodSizedTable]
INSERT INTO [dbo].[goodSizedTable]
SELECT DISTINCT * FROM #temp
DROP TABLE #temp

This doesn’t scale too well with humongous tables and limited server RAM or undersized volumes for tempdb. If you’re unfortunate enough to have tempdb hosted on spinning disk (as opposed to SSD), you’ll really notice I/O problems (Buffer Waits).

When you start seeing timeout errors with SqlCmd.ExecuteNonQuery() you’ll need to set SqlCmd.CommandTimeout to something much larger than the default 30 seconds. This might ‘keep the wheels on’ but doesn’t feel like a solution.
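For reference, here’s a minimal sketch of raising that timeout from PowerShell via the .Net SqlClient classes (server, database, and statement below are placeholders):

```
## Open a connection to the server holding the humongous table
$conn = New-Object System.Data.SqlClient.SqlConnection("Server=yourserver;Database=yourdb;Integrated Security=True")
$conn.Open()

$cmd = $conn.CreateCommand()
$cmd.CommandText = "DELETE FROM [dbo].[humongousTable] WHERE ..." ## your de-dup statement
$cmd.CommandTimeout = 600 ## Ten minutes, instead of the default 30 seconds

$cmd.ExecuteNonQuery()
$conn.Close()
```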

Solution

Begin by counting your duplicates.

with myCTE AS(
SELECT [Column1], ROW_NUMBER() OVER (PARTITION BY [Column1], [Column2], [Column3] ORDER BY [Column1]) AS [Row]
FROM [dbo].[humongousTable]
)
SELECT * FROM myCTE WHERE [Row] > 1

If the number is non-trivial then the pattern for your solution lies in there. Though not rigorously tested, this seems to run an order of magnitude faster.

DELETE [dups]
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY [Column1], [Column2], [Column3] ORDER BY [Column1]) AS [SeqID] FROM [dbo].[humongousTable]) AS [dups]
WHERE [SeqID] > 1

Downloading Medicare Data with Webclient

Challenge

Automate the download of data sets from data.medicare.gov data mart using their Socrata Open Data API (SODA).

Solution

This version pulls Hospital HCAHPS data in XML but can be easily edited to bring back csv or xls formats.

## Check and correct the .Net CLR environment settings
if ([Environment]::Version.Major -lt 4) {
    $configFile = Join-Path $PSHOME -ChildPath '.\powershell.exe.config'
    if (-not (Test-Path $configFile)) {
        @'
<?xml version="1.0"?>
<configuration>
    <startup useLegacyV2RuntimeActivationPolicy="true">
        <supportedRuntime version="v4.0.30319" />
        <supportedRuntime version="v2.0.50727" />
    </startup>
</configuration>
'@ | Out-File -FilePath $configFile -Encoding UTF8
        "Created $configFile"
    }
    'Restart powershell in order to make it run in .NET 4'
}

# Set the extension to xml, csv, or xls
$xml_source = "https://data.medicare.gov/resource/jp3v-djai.xml";

# Substitute your password for **PWD**
$dest_pwd = ConvertTo-SecureString "**PWD**" -AsPlainText -Force;

# Specify where to put the downloaded file and make sure the extension matches
$xml_destination = "V:\test.xml";

# Use the default user settings for your enterprise Proxy access
$proxy = [System.Net.WebRequest]::DefaultWebProxy;
$proxy.Credentials = [System.Net.CredentialCache]::DefaultNetworkCredentials;

# Instantiate the .net WebClient object and call the proxy and credential methods
$Wc = New-Object System.Net.Webclient;
$Wc.Proxy = $proxy;

# Substitute **UID** for your login User ID; WebClient.Credentials needs an ICredentials object
$Wc.Credentials = (New-Object System.Management.Automation.PSCredential ("**UID**", $dest_pwd)).GetNetworkCredential();

# Call the actual WebCLient DownloadFile method
$Wc.DownloadFile($xml_source,$xml_destination);

# Return a console message for debugging
if (Test-Path $xml_destination) { Write-Host "Downloaded $xml_source successfully" -ForegroundColor Green } else { Write-Host "Unable to access $xml_source" -ForegroundColor Yellow }