This Intelligent Life

By Mark Wojciechowicz 05 Aug, 2017

One Friday, on the way home from work, it occurred to me that I've probably written some form of this SQL test framework half a dozen times.  I realized that I would save a lot of time for myself, and for others interested in testing ETL code, if I wrote an open source library. So I took it upon myself that weekend to pour all my learnings, from a wealth of mistakes, into a code base that would handle all the boilerplate you need for writing tests efficiently.

My goal was to create something that was easy to use, easy to debug and easy to get started with. SqlTest is meant to be used in a C# test project in conjunction with a test harness like NUnit or MSTest. I’m well aware that most SQL / SSIS developers do not live in .NET and some are downright afraid of it. In fact, another popular testing framework, tSQLt, is an excellent option for many projects because you can write tests in native SQL, plus you get all the goodies that come with it, such as fakes. However, there were several reasons why I chose to go my own way:

  1. tSQLt wraps everything in transactions, so all test behavior is rolled back at the end of each test. This is great for database tests, but no good for SSIS tests, which are going to initiate their own connections and transactions.
  2. NUnit has parameterized tests (via the TestCase attribute). This allows you to create a dozen related tests very quickly, each with different input to the same test. This is tremendously efficient.
  3. Going the route of SqlTest and NUnit allows a consistent approach to testing both the database and SSIS.
  4. Database tests in MSTest are an elaborate hunk of junk.
    1. They don’t allow you to wrap a test in a transaction and roll it back if you use setup / teardown.
    2. They have several layers of setup and teardown methods, one which executes for a given test and another which executes for all.
    3. There are seven different windows: one each for the test, setup, teardown, initialize, cleanup, asserts and assert properties. Never mind that every test is in its own window. Needless to say, it is a nightmare to debug.
    4. Diffs cause eye-pain in source control.
    5. But they do give you syntax highlighting, which is nice.  I’m telling you all this so you know what horrors I’m sparing you from.

The initial release of SqlTest contains the essentials for testing. Methods for:

  • executing any ad hoc SQL statement for setting up and running a test
  • fetching a scalar result from the database
  • executing a package in the catalog
  • faking a table and dropping the fake
  • faking a view and dropping the fake

There are certainly some other features which will be useful, like executing a package from the file system for those poor souls who’ve yet to migrate to the project deployment model. And there are some things which are less useful, like asserts on a dataset, which tend to be very fragile. I plan to add more in time, and I hope to get some suggested features from you.

There’s documentation about how it works here: https://github.com/MarkWojciechowicz/SqlTest/wiki

I encourage you to try it out today; it is an excellent complement to any SSDT database project. Just create a new C# project in the same solution. You can install SqlTest from NuGet by opening the Package Manager Console in Visual Studio and typing Install-Package SqlTest.

A sample comes with the install, but below is a quick example to see how to use it.
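Something along these lines shows the shape of a test. This is a minimal sketch: the NUnit attributes are real, but the SqlTest type and method names here are illustrative assumptions based on the feature list above, so check the wiki for the actual API.

```csharp
using NUnit.Framework;
using SqlTest; // the type and method names below are illustrative, not the confirmed API

[TestFixture]
public class CustomerLoadTests
{
    // Hypothetical helper pointing at the target database and SSIS catalog
    private SqlTestTarget _target;

    [SetUp]
    public void Setup()
    {
        _target = new SqlTestTarget("Server=.;Database=TargetDb;Integrated Security=true;");
        // Fake the target table so the test controls exactly what is in it
        _target.CreateTableFake("dbo.Customer");
    }

    [TearDown]
    public void Teardown()
    {
        // Drop the fake and put the real table back
        _target.DropTableFake("dbo.Customer");
    }

    [TestCase("Alice")]
    [TestCase("Bob")] // parameterized tests make related variants cheap
    public void LoadCustomer_LoadsOneRow(string name)
    {
        // Arrange: seed the staging table with an ad hoc statement
        _target.ExecuteAdhoc($"INSERT dbo.CustomerStaging (Name) VALUES ('{name}');");

        // Act: execute the package from the SSIS catalog
        _target.ExecuteSsisPackage("Demo", "DemoEtl", "LoadCustomer.dtsx");

        // Assert: fetch a scalar result and compare
        var rowCount = _target.GetActual("SELECT COUNT(*) FROM dbo.Customer;");
        Assert.That(rowCount, Is.EqualTo(1));
    }
}
```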

Enjoy!

By Mark Wojciechowicz 08 Jul, 2017
SSIS gives you the ability to force a task to fail in order to test unhappy paths.  It's a bit confusing how to do this, because you could take any of the following approaches:
  • Add an error to a SQL task, like a divide by zero
  • Set the execution result to failure in a script task
  • Change the ForceExecutionResult value in the task's properties
The first two options could work, but to avoid changing code, you can just set the ForceExecutionResult property of the task to Failure, as shown below.  The other execution-result properties (ForceExecutionValue and ForcedExecutionValue) do not do anything for this purpose, and FailPackageOnFailure will not help either.
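If you would rather flip the property from code, say inside a test harness, the SSIS object model exposes the same setting. A minimal sketch, where the package path and task name are placeholders:

```csharp
// Sketch: forcing a task to fail via the SSIS object model.
using Microsoft.SqlServer.Dts.Runtime;

class ForceFailureDemo
{
    static void Main()
    {
        var app = new Application();
        Package package = app.LoadPackage(@"C:\Packages\LoadCustomer.dtsx", null);

        // Find the task and force its result, just like setting the property in the designer
        var task = (TaskHost)package.Executables["Load Customer Task"];
        task.ForceExecutionResult = DTSForcedExecResult.Failure;

        // The task now reports failure, so the unhappy path fires
        DTSExecResult result = package.Execute();
    }
}
```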

By Mark Wojciechowicz 30 Jun, 2017
Jenkins (the CI server) has a myriad of plugins, one of which is for executing PowerShell scripts.  I rely on PowerShell heavily because I find a number of the other add-ins very limiting.  In this particular case, I needed to run NUnit tests and I needed the job to fail when the tests fail.  I'll post more on the NUnit piece at a later date; there is a plugin for that too, but it won't work for database projects.

When the PowerShell script throws an error, the build just writes the error to the log and keeps chugging along.  The magic button to make the build fail is to add this to the top of the script:
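The snippet itself did not survive the move to this page; the standard pattern, and presumably what the original showed, is to make every error terminating so the script exits with a nonzero code that Jenkins treats as a failure:

```powershell
# Turn every error into a terminating error so the script stops immediately
$ErrorActionPreference = "Stop"

# Belt and braces: if anything still escapes, report it and exit nonzero
trap {
    Write-Error $_
    exit 1
}
```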
By Mark Wojciechowicz 17 Jun, 2017
Recently, I had the task of rebuilding a very large table in order to partition it and put it on a columnstore index.  While testing this out in the lower environments, I quickly blew up the log and ran out of disk on the server, so I decided to batch it out instead.  Below is an example of the query I used.
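The query itself did not survive here, but the pattern looks like the sketch below, with assumed table, column and batch-size values: copy rows in keyed batches so each batch commits on its own and the log can be reused between batches.

```sql
-- Illustrative reconstruction only; table, column and batch size are assumptions.
DECLARE @BatchSize INT = 100000,
        @LastId    BIGINT = 0,
        @Rows      INT = 1;

WHILE @Rows > 0
BEGIN
    -- Each INSERT is its own transaction, keeping log growth bounded
    INSERT INTO dbo.BigTable_Partitioned (Id, Col1, Col2)
    SELECT TOP (@BatchSize) Id, Col1, Col2
    FROM dbo.BigTable
    WHERE Id > @LastId
    ORDER BY Id;

    SET @Rows = @@ROWCOUNT;

    -- Advance the watermark to the last key copied
    IF @Rows > 0
        SELECT @LastId = MAX(Id) FROM dbo.BigTable_Partitioned;

    CHECKPOINT; -- under SIMPLE recovery this lets log space be reused
END;
```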
By Mark Wojciechowicz 09 Jun, 2017
One handy thing with Biml, the T4-style templating tool for SSIS, is that you can reference .cs files, which can contain a .NET class with helper methods.  This can help keep your Biml code clean and readable.

In this example, I'll show how to iterate over a list to generate custom dataflow tasks.  Some Biml examples illustrate generating dataflows based on a set of tables.  However, it is not very realistic that the source query and the target table look precisely the same.  This example assumes we have some custom query and a target table.

Previously, I have used a similar method which pulled metadata from a database table.  However, I find that it is difficult to maintain a project across contributors, because everyone needs to ensure that the table data is refreshed before regenerating the Biml.  Also, it's difficult to troubleshoot because the data driving the solution is in a separate place from the code.  This example puts the code and metadata in one place.

First, we will add a .cs file to the project with this silly method:
  1. Right-click the project > Add Biml File
  2. Rename the Biml file to PackageSource.cs
  3. Replace the Biml template with the following code:
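The code block itself was lost along the way, but a sketch of the kind of helper the post describes looks like this, with illustrative class, property and method names:

```csharp
// Illustrative sketch; class, property and method names are assumptions.
using System.Collections.Generic;

public class PackageSource
{
    public string PackageName { get; set; }
    public string SourceQuery { get; set; }
    public string TargetTable { get; set; }

    // The metadata lives right here in code, next to the Biml that consumes it
    public static List<PackageSource> GetSources()
    {
        return new List<PackageSource>
        {
            new PackageSource
            {
                PackageName = "LoadCustomer",
                SourceQuery = "SELECT Id, Name FROM dbo.CustomerSource",
                TargetTable = "dbo.Customer"
            },
            new PackageSource
            {
                PackageName = "LoadOrder",
                SourceQuery = "SELECT Id, Total FROM dbo.OrderSource",
                TargetTable = "dbo.Order"
            }
        };
    }
}
```

The Biml file can then reference the helper and iterate over the list. Again a sketch; double-check the code directive against the BimlScript docs:

```xml
<#@ code file="PackageSource.cs" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <# foreach (var src in PackageSource.GetSources()) { #>
    <Package Name="<#=src.PackageName#>" ConstraintMode="Linear" />
    <# } #>
  </Packages>
</Biml>
```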
By Mark Wojciechowicz 16 May, 2017
In most places I've worked, WFH was an essential option that helped avoid long commutes, allowed me to take care of personal business (like the dentist, or picking the kids up off the bus) and allowed for some sanctuary to code.  Some places I've worked used this mode so often that you never had a clue whether someone was in the office or at home.  This is because they used communication tools like Skype, WebEx and Slack, to name a few, so effectively.  There are some other tools, often classified as project management tools, which were equally essential for communication: Trello, Jira or any flavor of kanban.  And this is one of the key reasons why WFH was even possible.  Businesses that don't have a work-from-home policy or, in worse cases, strictly forbid it have some catching up to do to remain competitive in the marketplace.  I've turned down plenty of offers for this very reason.

Companies that don't have a WFH policy often manage their workforce by the duration that their workers occupy their seats, with stringent hours or even such policies as unpaid lunches.  These desperate measures are in place because the employer is measuring the wrong thing with the wrong tools.  Measuring work input seems like a sensible approach, but it creates incentives to work less efficiently, because there is no incentive to succeed or to complete work.  Measuring output, on the other hand, turns the focus to what's getting done and ignores how that happens.  Should someone have an inspiration in the middle of the night, they can go work on it to their heart's content.  If they are measured by time, there is little point in thinking about a project off hours.

When you begin to measure by stories completed, as in a burndown chart, or monitor the rate of check-ins, you can start to manage to the finish line.  The workforce starts to look like a sprint team, as opposed to a bunch of out-of-shape 6th graders dragging their feet around the track in gym class.  Techniques like pair (or mob) programming become an option, because we don't consider two people working on one task as halving our effort, but as raising the overall quality of the code.  And as the code quality rises, fewer defects are created and the total cost of the code goes down.  Skills and techniques are transferred, so the team as a whole becomes more effective and the cost of future projects drops.

So why would I turn down a job with no work-from-home policy?  Clearly it's a more oppressive environment, one that shuns learning and sharing and smells of desperation and defeat.  I think anyone would.
By Mark Wojciechowicz 18 Apr, 2017
I was setting up an SSDT solution in git at a new client when I discovered an intermittent issue in which my changes did not appear in the Team Explorer window. I knew full well I had made changes, and when I ran git status from the command prompt, the unstaged changes were listed. I thought something was wrong with my software install, either with Git for Windows or Visual Studio, but it turned out to be another issue altogether.  There is a discussion about it here:

http://stackoverflow.com/questions/37704514/visual-studio-2015-database-project-directory-contains-a...

The short answer is that a file with the extension .jfm is used by SSDT for processing and is not essential for source control.  This file gets locked by the ESENT engine (also used in Windows) and blocks git from operating correctly.  The file should be added to .gitignore and removed from the repo.  Once that's done, Team Explorer works correctly again.

For the sake of putting this information somewhere, here is what I have in the .gitignore file (this is targeted at an MSBI application):
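The listing itself was lost along the way; the entries below are a plausible reconstruction for a Visual Studio MSBI solution rather than the exact original:

```
# Visual Studio build output and per-user settings
bin/
obj/
*.user
*.suo
.vs/

# SSDT working files, including the .jfm file that ESENT locks
*.jfm
*.dbmdl

# Test results
TestResults/
```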

By Mark Wojciechowicz 11 Aug, 2016
The above chart was created in Raw, which is a handy web-based tool for creating quick data visualizations.  I came across Raw from a course I started on D3 and Data Visualization, taught by Udacity.  I have taken quite a few online classes, but I am super impressed with the material for this course and how well it's presented.  I think Udacity really has the online interface down and I am excited to take some more.
By Mark Wojciechowicz 03 Jun, 2016
So there are some pretty basic steps for embedding Power BI content into a webpage.
  1. In the report view, click on File > Publish To Web
  2. Create the embed code and hit next
  3. Copy the link for the iFrame
  4. Paste the iFrame link into your webpage code or blog.
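The generated embed code looks roughly like the snippet below, where the src token is a placeholder for the one Power BI gives you:

```html
<!-- Paste into your page; width and height can be adjusted to fit -->
<iframe width="800" height="600"
        src="https://app.powerbi.com/view?r=YOUR_EMBED_TOKEN"
        frameborder="0" allowfullscreen="true"></iframe>
```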
By Mark Wojciechowicz 04 May, 2016
Recently, the company that I work for decided to overhaul my team's BI blog, looking for something a bit more polished.  It's true that our posts were raw and often had grammatical errors.  It's also true that many of the posts had a nominal number of hits, probably amounting to the number of BI developers in the Philly region or, maybe, the number of Twitter followers.  But these posts did take a lot of time and thought and, for me, more importantly, it was a place to document those tidbits that I can never seem to remember and always seem to need to look up.

An additional twist is that my employer is really looking for posts that are less technically oriented.  They want something that an IT manager might read on a long break in the men's room.  Perhaps something that might make the reader think, "maybe if I turn my Oracle database into a data lake, I'll make that next big promotion."  And, while I will help middle management get a bit more middler, my friends and colleagues and I would still like to chronicle our technical wanderings.

This blog will be dedicated to that end.

We come in peace,
Mark Wojciechowicz  