Sunday, December 23, 2012

Detect SQL script changes by using .SQL file checksum

I recently spent some time to find out the way of detecting the modification of SQL script file. Why I needed it is not that important, but quick summary is 'there is a application running a SQL script every 15 mins; if there is any change in script it runs the script; otherwsie it exits; to specify the script changes, the script has version number variable (in SQLCMD variable) and compare it with a SQL table (version table containing latest version data).' The downside of using version number in SQL script is that everytime someone changes the script, s/he also should manually increment version number. Sometime people forgot to do it, hence the bothersome problem.
So the solution I bring up is to detect SQL script file change by using the hash checksum of the file.
First step is to read the SQL script file in BLOB format by using OPENROWSET().

:setvar script1 C:\Script1.sql
DECLARE @val BIGINT
SELECT @val = master.sys.fn_repl_hash_binary(BulkColumn) 
FROM OPENROWSET(BULK '$(script1 )', SINGLE_BLOB) AS A

OPENROWSET(BULK ...) uses BULK rowset provider that accepts full file path and read mode such as BLOB, CLOB. The sentance above returns binary data of the file in BulkColumn.

Next step is to hash the binary data. SQL Server has a built-in function called fn_repl_hash_binary() that accepts binary data and returns a hash value.

So by comparing this @val with the version column value of version table, I was able to decide whether the SQL script is modified or not.

One of the problem I ran into though is OPENROWSET(BULK) only accepts FULL file path. I only can specify relative path in my case. So it was a headache... Fortunately the SQL script used SQLCMD mode,so I can get around this problem by using this approach.

(1) First at the begining of main script, I added the following statements. It calls VersionCheck.cmd batch file (see (2)) that will create _ChkSum.sql. Running _ChkSum.sql actually calculates the checksum of SQL script file(s) and save the result to #ChkSum temp table. Once _ChkSum.sql is run, I read the #ChkSum temp table and fetch the chksum value. Then I simply compare the variable with stored SQL table value.

-- Run version check. If there is no script change, exit.
:!! VersionCheck.cmd
:r _ChkSum_.sql
GO 

DECLARE @chk NUMERIC(38)
SELECT TOP 1 @chk = chk FROM #ChkSum

IF EXISTS (SELECT * FROM Master.Sys.Databases WHERE name = 'MyDB') 
   IF EXISTS (SELECT * FROM $(dbName).Sys.Tables WHERE name = 'VersionControl')
      IF EXISTS (SELECT * FROM $(dbName).dbo.VersionControl 
                 WHERE Id=1 AND ChkSum=@chk) 
      BEGIN   
         RAISERROR('No change. Exit now.', 2, 1);   
      END
GO


(2) Here is VersionCheck.cmd. This file contains OPENROWSET() statement that calculates SQL Script file checksum. The example below calculate the checksum of two SQL files and add them up.

@echo off
@echo :setvar script1 script1.sql > _ChkSum_.sql
@echo :setvar script2 script2.sql >> _ChkSum_.sql

@echo SET NOCOUNT ON >> _ChkSum_.sql
@echo DECLARE @val BIGINT >> _ChkSum_.sql
@echo DECLARE @chk NUMERIC(38) >> _ChkSum_.sql

@echo SELECT @val = master.sys.fn_repl_hash_binary(BulkColumn)  >> _ChkSum_.sql
@echo FROM OPENROWSET(BULK '%~dp0$(script1)', SINGLE_BLOB) AS A >> _ChkSum_.sql
@echo SET @chk = CAST(@val as NUMERIC(38)) >> _ChkSum_.sql

@echo SELECT @val = master.sys.fn_repl_hash_binary(BulkColumn)  >> _ChkSum_.sql
@echo FROM OPENROWSET(BULK '%~dp0$(script2)', SINGLE_BLOB) AS A >> _ChkSum_.sql
@echo SET @chk = @chk + CAST(@val as NUMERIC(38)) >> _ChkSum_.sql

@echo CREATE TABLE #ChkSum(chk NUMERIC(38)) >> _ChkSum_.sql
@echo INSERT #ChkSum VALUES (@chk) >> _ChkSum_.sql
@echo GO >> _ChkSum_.sql

  • To specify absolute full file path, I used %~dp0 in OPENROWSET(). This will be replaced by current working directory.
  • Please note that I used @chk variable as NUMERIC(38) because arithmetic overflow can occur if @chk is defined as BIGINT.
  • After sum of each checksum is added up, I saved the value to temp table (#ChkSum). Temp table can last in session so it can read in main script.
As a downside, using this approach might be slow if file is huge since it requires file reading.
But with relatively big file, this approach worked fine to me.

No comments:

Post a Comment