Development

Concatenate multiple row values in SQL Server with CLR

One of the good things about SQL Server (as with most other Microsoft products) is that there are often multiple ways to achieve a single result.  One such example is the concatenation of cell values from multiple rows into a single, delimited string.

Straight off the bat, people will suggest things like:

  • SELECT string concatenation using COALESCE technically returns the first encountered non-NULL value.  However, this can be utilised by selecting values into a variable using the COALESCE function. Perfect example available on Stackoverflow.
  • String concatenation with the FOR XML function. This function is actually used to return a series of selected values as XML.  However, utilising the PATH argument (FOR XML PATH) you can specify the names of the elements you wish to return.  Passing an empty string will result in (effectively) a single string. Again, here’s a great example on Stackoverflow.
  • Concatenating values using a recursive CTE Common table expressions (CTEs) are a an incredibly powerful addition to SQL Server 2008, one that I personally love to use both to simplify T-SQL and increase readability.  The fact that CTEs can be used recursively just adds to their worth. RedGate’s SimpleTalk has a perfect example.
  • Using a Scalar or table-valued function with looping/recursion to concatenate values. User Defined Functions are an integral part of SQL Server development.  These can be designed to deliver pretty much any functionality you want within a single, parameterised function call, much like the built in SQL functions.  The RedGate SimpleTalk link above also has examples of how to use a recursive scalar function or a table-valued function with a WHILE loop to provide string concatenation functionality. 

While these are all perfectly valid solutions, as always with SQL Server it’s important to investigate the optimum solution for your particular need.  After attempting all of the above and encountering some severe performance degradation as number of rows increased, one solution provided the ideal functionality with a huge performance boost (around 1800 times faster): the .NET common language runtime (CLR).

Simply put, CLR integration enables you to write .NET code which can then be called from within User Defined Functions, Stored Procedures and Triggers.  You get access to the .NET Framework’s library of built in functions as well as the benefits of the object oriented model.  Managed code is also much faster at performing complex calculations and logical operations than T-SQL.

Using Andy Novick’s excellent concatenate aggregate function, concatenating a series of values becomes as simple as a single function call in a SQL script.  My requirement was to be able to select the last n number of data points per group, and concatenate these into a single, comma delimited string.  Using this CLR function, it becomes as simple as this:

;WITH cte_Values AS
(
SELECT
D.Id
,MAX(D.Value) as [Value]
,ROW_NUMBER() OVER
(
PARTITION BY D.Id
ORDER BY D.DateStamp DESC
) as [Recency]
FROM
SourceTable D

GROUP BY
D.Id
,D.DateStamp
)

-- select out concatenated data, maximum of 30 data points per item
SELECT
C.Id
,dbo.[agg_concat] (CAST(C.Value as nvarchar(30)),',') as [ValueSeries]
FROM
cte_Values C
WHERE
C.Recency <= 30
GROUP BY
C.Id

A nice, simple solution that saves trying to shoehorn some built in SQL function into doing something it was never really intended to do.  It just shows that even for DBAs, there are times where it’s much better to use a little managed code to achieve your goal than attempt to force your DBMS of choice to do something at which it’s not so good.

Call it a case of selecting the right tools for the job at hand.  Thanks to our resident DB ninja, Ibrahim Naji for finding the path of least resistance.

comments powered by Disqus