SQL-CLR Best Practices

Jul 6, 11:00 pm

Article Author: Simon Robinson
.NET 3.5 Books

Introduction


SQL Server 2005 brings CLR integration, and hence the option to write managed code instead of SQL. You can write managed code for:


  • Procedures

  • Triggers

  • User-defined functions and aggregations

  • User-defined types

The Options Prior to SQL Server 2005


Summary of talk: DAT400, SQLCLR vs T-SQL: Best Practices for Development in the Database, talk by Gert Drapers, 6 July 2005.


Please bear in mind when reading this that I’ve put the TechEd reports together based on notes made and photos taken during each talk. I’ve tried as far as possible to stick to what the speaker has said but it’s inevitable that I’m going to have misremembered or misunderstood a few things, so don’t expect 100% accuracy. (And don’t expect photos, where provided, to have the same quality as real screenshots!). In cases where I wanted to add thoughts of my own, I’ve normally done so in italics.


A talk and a written report are two different things with different requirements. In order to make the article easily readable I’ll have extensively paraphrased the speaker’s comments; I may have added my own headers, or swapped the order of some material. In some cases I’ve omitted material that is not appropriate for the typical ASP Today readership.


Prior to SQL Server 2005, the options for writing code were:


  • ANSI SQL

  • T-SQL

  • Extended stored procedures

  • sp_OA OLE automation interfaces

Of these, extended stored procedures were in most cases the only realistic extensibility option. They are written in C or C++ and for performance reasons were usually in-process. This meant that poorly written code carried the obvious problems of bringing down the process. Not only that but the requirement for C or C++ code meant the bar was set quite high in terms of required developer skills. Stored procedures have now been placed on the deprecation list, meaning the facility may eventually be removed from a future SQL Server version.


sp_OA had a lower bar and was safer, but was very slow since it relies on late-bound COM calls. On performance grounds it is definitely not recommended.


Hence in practice, the only sensible options for writing new code are now T-SQL (of which ANSI SQL is a subset) or managed CLR code.


Benefits of SQL-CLR


The question arises whether you should use T-SQL or the CLR for a given piece of code. Points to consider that would tend to favour SQL-CLR include:


  • T-SQL is interpreted and hence slower.

  • Error handling in T-SQL was frankly abysmal (although it has improved in SQL Server 2005)

  • String handling in T-SQL is unoptimized and slow

  • SQL-CLR is compiled so faster

  • SQL-CLR lets you take advantage of the huge base class library

This all implies that for computation-heavy (rather than data-heavy) processes you’re better off with SQL-CLR, especially if it’s the kind of operation for which the query optimizer won’t be able to help.


Example: A SQL-CLR UDF


Gert presented an example of acouple of user-defined functions (UDFs) that use regular expressions – this is typical of the kind of situation where SQL-CLR is clearly the best choice. Figure 1 shows the code for the UDFs.



Figure 1. SQL-CLR UDFs that do some regular-expression-based processing


Figure 2 and Figure 3 show how this UDF is consumed in T-SQL. Figure 2shows the declarations of the functions, whileFigure 2 shows them being invoked.



Figure 2. Declaring the SQL-CLR UDFs



Figure 3. Consuming the SQL-CLR UDFs


T-SQL is slower for this kind of situation, although still faster than it was in SQL Server 2000. This is particularly important because UDFs are often used in situations where they’ll be invoked repeatedly, eg. for every row in a query.


Returning Data Structures


What if your function doesn’t return a scalar value but needs to return a rowset. This is still easy to do in the CLR as the CLR has a technique for exposing arbitrary data as a rowset. This technique is illustrated in Figure 4.



Figure 4. Converting data to a rowset


The code required has been simplified since beta 1. You simply need two functions, one that returns an IEnumerable instance. This is just the same technique as used in array enumerators. ( Simon adds: It’s clear from the code in Image4.jpg that the SqlFunction attribute plays a role in communicating to SQL Server the nature of the data in the rowset. Unfortunately Gert didn’t make it clear – at least to me – what the role of the other function is. ).


Consuming External Resources


One area that SQL-CLR excels is when you need to obtain data from external resources. Typical such resources include:


  • The File System

  • The Event Log

  • A Web Service

  • The Registry

Gert did mention that although Web services are often quoted as a good data source for using SQL-CLR, he has doubts about this, due to the long latency that’s typically involved when invoking Web services; this has implications for locking SQL resources.


Benefits of T-SQL


Up to now the situations discussed have been ones in which on the whole there is no debate about the benefits of using SQL-CLR. Here’s a couple of reasons why in other cases you may prefer T-SQL.


Syntax Checking


T-SQL will be syntax-checked when you execute eg. the CREATE PROCEDURE or CREATE FUNCTION command. If there’s an error, the procedure won’t load. In contrast SQL-CLR access is dynamic – it behaves in many ways in a similar way to the sp_executesql function. That means some syntax errors might only be detected at runtime.


Optimization


The Sql optimization engine needs to know whether a function is deterministic and precise (a deterministic function returns the same result each time – for example, a call to return the current date-time would prevent a function being deterministic. On the other hand a function that returns a floating point number would not be precise – due to rounding errors). For T-SQL code, SQL Server can tell absolutely whether a function is deterministic or precise, however for SQL-CLR code, the only way it can tell is by looking at the applied CLR SqlFunction attribute that was applied to the code – and of course this attribute could be lying, if the developer simply coded it wrong.


Data Access


T-SQL invokes SQL more efficiently – it shares the stack frame and data buffers. By contrast SQL-CLR is one level removed, with a separate stack frame and requiring an unmanaged-managed transition and marshalling.


Comparing Performance


Gert now presented some performance comparisons. He took two custom UDFs, one of which summed over a long series of numbers, while the other simply returned NULL . Each function was coded separately in both SQL-CLR and T-SQL. He ran T-SQL code which timed various loops each of which invoked one of these functions a large number of times in a loop. The results were surprising:













FunctionTime for T-SQL versionTime for SQL-CLR versionPerformance ratio
Sum series460ms10ms43
Return null5046ms386ms1.4



Why is the function that returns null slower? The reason, Gert claimed, is that the SQL optimizer sees the null returned, assumes some recursion is going on, and therefore lowers the priority. This means it is difficult to get meaningful comparisons, although it’s pretty obvious that for the numeric-intensive function, SQL-CLR is far quicker.


Calls Between Stacks


There are some restrictions to observe when calling between T-SQL and SQL-CLR.


  • You must only use T-SQL supported data types. That means no arrays or non-UDT objects.

  • You can’t use inheritance or polymorphism.

  • Watch your parameter types. T-SQL has input parameters, and OUT and NOUT parameters. SQL-CLR has ByVal (input), ByRef (in effect, in and out), and <Out()> – which has no T-SQL equivalent.

  • .NET has no equivalent of DB_NULL . You can use null for reference types, but not for types like int . This can lead to exceptions being thrown.

  • .NET cannot easily represent either VARCHAR (because .NET strings are Unicode, the equivalent of NVARCHAR ) or TIMESTAMP .

Not strictly a restriction but it’s worth remembering that the CLR decimal type is not the same as SQL_DECIMAL.


Polymorphism: The Zoo Example


Gert now presented an example of SQL-CLR code that involved polymorphism and would thus generate an error if used as a UDF. The example assumes we are declaring a type, Cat , and deriving another type, Tiger , from Cat . Figure 5 shows part of the definition of Tiger.



Figure 5. The Tiger class


Not shown in Figure 5 but relevent is that Cat defines a PawCount property that returns the number of paws (an integer). and a FeedTheCat() method.


Figure 6 shows some SQL code that consumes the Cat class.



Figure 6. Consuming the Cat class


Figure 7 shows the equivalent code but using the Tiger class. And as you can see from this figure an error is generated.



Figure 7. Consuming the Tiger class


The cause of the error is that although PawCount is a valid property in Tiger as far as the CLR is concerned, T-SQL doesn’t recognize it because the property was inherited from Cat , and T-SQL doesn’t understand inheritance or polymorphism. If you do want to run code like that in Figure 7, you’d need to define a separate property in Tiger , which could simply be implemented to invoke the base class property.

Founders at Work

Commenting is closed for this article.