For a new data migration project, we have a SSIS SSDT solution with quite a few projects in it. Visual studio does not all to deploy all projects at once with a single mouse click. So I created a simple powershell script that does the job. It reads the .sln file, grabs each project and then pushes the .ISPAC file into the SSIS catalog. Also, it creates the catalog itself when required. I only use it to deploy to the dev box, but with some extra logging and options it could even be used within a CI/CD pipeline.
T-SQL MERGE ELT with lineage
Disclaimer: The solution in this post relies on the T-SQL merge statement which Azure SQL Data Warehouse does not support.
Combined with the lineage solution, which uses temporal tables, it is advised to use SQL Server 2016+ or Azure SQL V12+.
I’m a big fan of the MERGE statement. Oh yes. That and ELT over ETL. Turns out they go hand in hand 🙂
In this blog I outline my preferred approach to load data from staging into dimension tables. It is based on Alex Whittles’ work.
On my github you can find a complete example solution illustrating:
- SCD1 approach via dbo.prc_SCD1_DimCustomer
- SCD2 approach via dbo.prc_SCD2_DimCustomer
- Simple lineage for both via dbo.prc_UpdateLineage
In ETL even a simple lineage system can help to track and troubleshoot the origins of the data loaded in your model.
Hence it only felt right to include it here as well. Though it might be to simplistic for more complex projects.
This solution is based on SQL temporal tables to keep a history record as the data changes are brought in. Input is taken from the SCD merge statements to know what exact keys were updated or inserted:
- dbo.lineage -> parent table, with fields to indicate what was loaded and how many records it affected.
- dbo.lineagedetails -> contains an updated and inserted column, with xml structured data such as the business keys.
Because both tables are system versioned (temporal) you get 4 tables: the 2 above plus their history tables which you can query for previous ETL runs.
In this first example, the dimension table dbo.DimCustomer gets loaded based on a staging view.
It exploits the core of T-SQL MERGE. Nothing fancy to it, yet very effective. You can control what fields get overwritten by modifying the
"WHEN MATCHED AND" "THEN UPDATE SET" logic.
I prefer to make use of views as a source input. It abstracts the sometimes complex join statements, allows for transformations and name the output fields to match the target table (DimCustomer).
The MERGE output clause is used to feed the lineage procedure.
Note that technically you do not need the extra columns such as “IsRowCurrent” in SCD1 – it is there since I use the same DimCustomer for both SCD examples.
In this case, when data is updated, the old record must be flagged obsolete by setting IsRowCurrent=0 and set the ValidTo datestamp.
Then, to insert the changed data as a new record, a little trick is used.
New records are inserted directly from within the MERGE statements, the updated rows however are first inserted into the temporary table #DimCustomer and the action UPDATE or INSERT is added.
Then, from the temp table it is added to DimCustomer via
"WHERE [Action] = 'UPDATE'". The result is a newly added row into dbo.DimCustomer.
#DimCustomer is used because a. SQL server requires this when a foreign key constraint exists (i.e. from the fact table) and b. because in this case we use it as lineage input.
To test out both approaches git clone the repo, and deploy the database and IS project to your dev instance. Then, in SQL management or Azure data studio you can use example.sql. I welcome your thoughts.
DACPAC compare you say?
DACPAC’s are a nice way to deploy your SQL database projects. I have been using SSDT database projects for a few years now, and could not live without them. Especially handy during a development cycle when you deploy (publish) to a local dev instance multiple times per day. Beyond that, deploying your release to UAT and eventually production could be more challenging. Not all data migrations for instance are not supported out of the box.
Recently I came across the need to recreate all database objects for each release. We figured this was the most stable option, and it was possible as the database -which is used for reporting- is fetching data from another database, and basically only contains a bunch of views and procedures along with some static data. The in house deployment solution was not supporting DACPAC (sqlpackage.exe) based deployments. So it had to be plain SQL.
As I like to have things as automated as possible, I tend to leverage processes such as TFS build. This was the initial approach:
- Build task: Fetch two predefined SQL scripts from source control:
- Script 1: to drop all desired objects – drop database was not possible.
- Script 3: to reinsert the static data, after deploying the DACPAC sql.
- Build task: build the reporting solution (.sln), so it generates a new DACPAC.
- Build task: use sqlpackage.exe to compare the freshly built reporting DACPAC against an empty database, and have it output the SQL script ( /a:script option). This is script 2.
We could then have the deployment tooling execute these 3 scripts in sequence when releasing to UAT/production. Having an empty database hosted up for this is quite an overkill, so I wanted to have the reporting database DACPAC to compare against an “empty DACPAC” during the build process. Sadly SQLpackage.exe does not support this. After some research I was relieved that the underlying DAC framework API’s do support it though. So I ended up creating a simple C# console application, which we are now using during the build process.
If you can use this approach, have a look at: https://github.com/vminds/Daccompare
consoleLogger.WriteInfoLine("Generating deploy script"); string script = DacServices.GenerateDeployScript(sourcePac, targetPac, _targetDatabaseName, _deployOptions); File.WriteAllText(_scriptpath, script);
SSRS powershell deployment
Yet another post on this topic? Well yes, historically deploying -the now called paginated reports- was pretty much relying on the cumbersome RS.exe. Luckily Microsoft started a powershell based ReportingServicesTools github repo back in 2016. Which is quite an improvement, but still I wanted a fully fledged way to deploy reports based on the actual release build of my SSDT reporting solution. And guess what, turns out Tim Abell had already created an awesome project on that. It had a few shortcomings for what I wanted to get done, hence I forked the project, and added a few features such as:
- Logging integration using ScriptLogger module
- Log dump of reportfolder hierarchy
- Caller script (DeploySSRSSolution.ps1) which will call the solution deploy
- Remove of RDL/datasource which are no longer in the project
- Hash compare of RDL files (OverwriteReports param)
- Support for database based credentials on datasources
Eventually I will refactor it to make full use of the Reporting Services Tools from Microsoft.
Check it out here: https://github.com/vminds/ssrs-powershell-deploy