Error Logging in ASP.NET

Feb 25, 11:00 pm

Article Author: Scott Waterhouse

Introduction


Administering a web site and correcting its errors (such as unhandled exceptions thrown while the site is running) can be a tiresome job, as can implementing error logging. When error logging is implemented correctly, application administration becomes much easier and the number of errors declines. I personally feel that this aspect of application development is underestimated.


Thanks to the classes provided within the .NET framework, you can easily log errors using different techniques. You can record them to a simple text file, or send emails or mobile phone text messages to an administrator. You can log them in the Windows Event Log. If you are feeling creative, you could even have a siren with a colored light go off in your office. For serious errors, a combination of these techniques can be used to ensure the administrators are notified as soon as possible. This article will examine the common methods as well as when, how and why each method should be used.


As I see it, error handling can be as complex as the main application itself. The solution to error handling is so complex because one size never fits all. I would quickly question the credentials of anyone in IT who has the "one size fits all" mentality towards any aspect of development. One size can never fit all due to the huge variety of development environments that exist in the IT industry. We do not all work for companies with 50 or more developers, weeks' worth of analysis time and millions of pounds' worth of hardware and software. Some developers work in teams as small as 10 people, and other teams wish they had as many as 10 people! Depending on your background, you may favor some approaches and ignore others.


Company size is not the only factor. The way the company is structured also plays an important part in the error handling strategy. Besides how the errors are logged, you need to establish who should be the recipients. A seasoned database administrator will know how to fix database errors within minutes, especially if they have been involved in the project that has generated the errors. Most types of database errors should be logged and sent to the database administrator(s). Purchase failures (i.e. failing to process the transaction when the user clicked the Buy button) may go to the lead developer, or deployment manager depending on the structure. Similarly, the person or people responsible for the deployment may be able to diagnose such problems in minutes. They are usually put in these positions for their specialist skill and experience. You can see from this how errors can be fixed quicker just by being specific about who gets what errors. This is similar to how project work is distributed in the first place (i.e. the database gurus do the database work).


In a very large environment with lots of developers, the error could be traced back to the developer who wrote the code module that is failing. A simple implementation of this is shown later. However, a system like this could contribute to a blame culture and unwelcome side effects, which will be examined later.


Before I go into the details of the article, I would like to point out that I use the term error handling to refer to both errors and exceptions only as a matter of preference. For those of you with a black and white view on what distinguishes an error from an exception, I make no such distinction. As developers and administrators, we are accountable for anything that goes wrong.


System Requirements


  • SQL Server 2000 (version 7 or 2005 should be fine)

  • Visual Studio .NET 2003 with the .NET framework 1.1 installed

  • IIS running on Windows 2000 or later

Sample Code


Within the sample download you will find the following:


  • Two database scripts – one for the RareMerchandise website database, the other for the error logging

  • A RareMerchandise folder containing the website and Visual Studio .NET 2003 project files

To install the sample application, follow the steps below:


1) Run the RareMerchandise_Script.sql file in SQL Query Analyzer within SQL Server. This will create the RareMerchandise database. Within the database, select the Customer user and reset the password to like2buy. You will also need to set the permissions on the stored procedures that are prefixed with usp_ for this user.


2) Repeat the above step with the RareMerchandiseError_Script.sql file. This time, reset the Logger user password to like2buy. Again, set the permissions on the usp_ stored procedures for the user.


3) Place the RareMerchandise folder within the C:\Inetpub\wwwroot folder on your machine, or wherever you place your web applications, and set this folder to be an application by right-clicking on the folder in IIS, choosing Properties and then clicking on Create.


Logging Methods


As mentioned in the introduction, there are many different ways to log errors and alert administrators to them. The main logging methods are described below in detail.


Database Logging


Databases are very suitable for error logging. Think about it, in commercial applications they log purchases, deliveries, complaints, so why not errors? Errors may be stored quite simply in the database within one database table as shown below:


Table name: tblErrors

ErrorID  ErrorDescription       StackTrace                               LogTime
101      Database is down       Login failed for user 'sa'               2006-08-01 20:30:32
102      Product Search failed  Object reference not set to an instance  2006-08-01 20:34:09
103      Product Search failed  Object reference not set to an instance  2006-08-01 20:41:32



The ErrorDescription field is a custom description. The text I have put in the StackTrace column consists of shortened versions of two messages that are all too familiar to developers. It is always a good idea to log errors in a different database from the one being used for the application. Otherwise, if the database goes down, how would you log this using the database method?


The ErrorID column is an Identity column to uniquely identify each record in the table. The StackTrace column in this example is intended to hold the contents of the StackTrace property of the exception. Obviously, its entire contents cannot fit in the preview here, so I have only included the relevant part within each row.


As you can see from this example, storing the contents of the stack trace for each error may seem like overkill. The Custom Summary Report example given later uses the database logging method as part of a more sophisticated logging strategy. I am therefore saving the detailed examination of using databases until later. You may find that the Custom Summary Report example is too much for your resources and timescales and perhaps the email or text message techniques given below will suffice.
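For the single-table layout above, a minimal sketch of a database logging routine might look like the following. It is only an illustration under stated assumptions: the column types (VarChar(255), Text, DateTime) are guesses, the connection string is supplied by the caller, and an inline INSERT is used because the names of the usp_ stored procedures in the sample download are not shown here. BuildInsertCommand is a hypothetical helper name.

```vbnet
Imports System
Imports System.Data
Imports System.Data.SqlClient

Public Module DatabaseErrorLogger

    ' Builds a parameterized INSERT for tblErrors (hypothetical helper).
    ' ErrorID is an identity column, so it is not supplied.
    ' Column types and sizes are assumptions made for this sketch.
    Public Function BuildInsertCommand(ByVal description As String, _
                                       ByVal ex As Exception, _
                                       ByVal logTime As DateTime) As SqlCommand
        Dim cmd As New SqlCommand( _
            "INSERT INTO tblErrors (ErrorDescription, StackTrace, LogTime) " & _
            "VALUES (@ErrorDescription, @StackTrace, @LogTime)")
        cmd.Parameters.Add("@ErrorDescription", SqlDbType.VarChar, 255).Value = description
        ' ex.ToString() includes the message, the stack trace and any inner exceptions.
        cmd.Parameters.Add("@StackTrace", SqlDbType.Text).Value = ex.ToString()
        cmd.Parameters.Add("@LogTime", SqlDbType.DateTime).Value = logTime
        Return cmd
    End Function

    ' Opens a connection to the separate error-logging database and runs the INSERT.
    ' The connection string is passed in by the caller.
    Public Sub LogErrorToDatabase(ByVal connectionString As String, _
                                  ByVal description As String, _
                                  ByVal ex As Exception)
        Dim conn As New SqlConnection(connectionString)
        Dim cmd As SqlCommand = BuildInsertCommand(description, ex, DateTime.Now)
        Try
            cmd.Connection = conn
            conn.Open()
            cmd.ExecuteNonQuery()
        Finally
            ' Always release the command and connection, even if the INSERT fails.
            cmd.Dispose()
            conn.Dispose()
        End Try
    End Sub

End Module
```

Using parameters rather than string concatenation matters here because error messages and stack traces often contain quote characters.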


Emails


Some web applications send out an email every time an error occurs. I see such examples many times when I glance over ASP.NET textbooks. I personally think sending out an email per error is overkill. I have seen this done within commercial applications, usually where the website originally served a small number of people. When the company expands and more people use the site, the errors become more frequent and what was manageable becomes too much to handle.


You may overload the mail server or simply give the administrators too much to look at. Perhaps many minor problem emails are masking the major problem emails, which is another reason to keep the logging compact. Even if the titles of the email clearly distinguish the major errors from the minor ones, scrolling through the list will be tiresome. Also, the same errors can be occurring from different causes so relying on the titles is not always a good idea. On the other hand, an email per error highlights the impact of the problem and these emails will stop coming in when the problem has been resolved. If you are going to log an email per error then take full advantage and log as much detail as possible. Imagine if your email box is full of errors from the site and inside the email message it just says something like the following:



Object reference not set to an instance of an object.


This has to be the most popular error message of all time for .NET developers. It is all you get if you log only the Message property of the exception. Include the entire stack trace to allow developers to focus on the specific code within the application that caused the error. If you replicate the error on your test system, which I assume contains a build of the application compiled in debug mode, you will also get line numbers in the stack trace. Never run a debug build on a live system, of course.
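As a rough sketch of the difference, the helper below walks an exception and its inner exceptions and records the type, message and stack trace at each level. The names ExceptionFormatter and DescribeException are hypothetical. Calling ex.ToString() gives broadly similar information in a single call; this version simply labels each level explicitly.

```vbnet
Imports System
Imports System.Text

Public Module ExceptionFormatter

    ' Returns the type, message and stack trace of ex and of every inner exception.
    Public Function DescribeException(ByVal ex As Exception) As String
        Dim sb As New StringBuilder()
        Dim current As Exception = ex
        Dim depth As Integer = 0
        While Not current Is Nothing
            sb.Append("Level ").Append(depth).Append(": ")
            sb.Append(current.GetType().FullName).Append(": ")
            sb.Append(current.Message).Append(Environment.NewLine)
            ' StackTrace is Nothing for an exception that was never thrown;
            ' appending Nothing adds no text.
            sb.Append(current.StackTrace).Append(Environment.NewLine)
            current = current.InnerException
            depth += 1
        End While
        Return sb.ToString()
    End Function

End Module
```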


In addition to this, put as much extra information in the email as possible. You may look at the code line that the exception identified as being the cause of the error and be clueless as to why it caused an error. If the website code that caused the error uses items in configuration keys, application state, session state, form variables, query string variables, view state variables or cookies, print the contents of these variables within the error message. By doing so, you could be providing the answer to the problem in many cases. Perhaps something that should never be null is null.


Here is some sample code to print out all of the variables in application state, session state and cookies within the body of an email using HTML format:



' Writes the values to the response as HTML. To build an email body instead,
' append the same strings to a StringBuilder.
Response.Write("Items in Application state: - <br/><br/>")
For i As Integer = 0 To Application.AllKeys.Length - 1
  Response.Write(Application.AllKeys(i) & ": " _
    & Convert.ToString(Application.Contents(i)) & "<br/>")
Next
Response.Write("<br/><br/>")
Response.Write("Items in Session state: - <br/><br/>")
For i As Integer = 0 To Session.Keys.Count - 1
  Response.Write(Session.Keys(i) & ": " _
    & Convert.ToString(Session.Contents(i)) & "<br/>")
Next
Response.Write("<br/><br/>")
Response.Write("Items in cookies: - <br/><br/>")
For i As Integer = 0 To Request.Cookies.Count - 1
  Response.Write(Request.Cookies(i).Name & ": " _
    & Request.Cookies(i).Value & "<br/>")
Next


You may add any other extra information that you feel may be linked to the cause of the error. For example, logging the user’s operating system version and browser version may reveal some interoperability issues on the website you are working on.


You can take further advantage and store extra useful information within these collections that may help diagnose problems. A good example would be a record of the number of active sessions for a website. To implement it, use the following code within the global.asax file of the website for the Application_Start, Session_Start and Session_End events respectively:



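' fires once when the application is started
Sub Application_Start(ByVal sender As Object, ByVal e As EventArgs)
  ' initiate the variable
  Application("ActiveSessionsCount") = 0
End Sub
' fires each time a new session starts
Sub Session_Start(ByVal sender As Object, ByVal e As EventArgs)
  ' increment the count; Lock/UnLock stops concurrent requests losing updates
  Application.Lock()
  Application("ActiveSessionsCount") = CInt(Application("ActiveSessionsCount")) + 1
  Application.UnLock()
End Sub
' fires when a session ends or times out
Sub Session_End(ByVal sender As Object, ByVal e As EventArgs)
  ' decrement the count
  Application.Lock()
  Application("ActiveSessionsCount") = CInt(Application("ActiveSessionsCount")) - 1
  Application.UnLock()
End Sub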


And then to print this information in the email error, just add the following code line at the end of the previous code extract:



Response.Write("<hr />Active sessions: " & Application("ActiveSessionsCount") _
  & "<br/><br/>")


This small extra piece of information can help. Note that a session persists for some time after the user has left the site (20 minutes by default, depending upon the application configuration), so the count is only a good approximation of the number of current users on a website. Also, application state variables are not shared across a web garden (an application hosted by multiple processes on the same server) or a web farm (an application hosted on multiple servers), so in those setups each process keeps its own count.


How is this useful? Assume for a moment that the errors on the web application are database timeouts (i.e. trying to establish a connection to the database either for searching or updating failed because there are no further connections available) or the site is going slow. If the ActiveSessionsCount is a really high number, you can conclude that the high number of concurrent users is responsible for the slow or non-existent service, especially if the website was working fine under lighter loads. You know now that you need to either upgrade your hardware and software, implement some advanced caching system so that fewer database connections are used, or both. All of this extra information comes from just a handful of extra lines of code. However, if the same problems occur under lighter loads, you need to investigate other causes.


Text Messages


Mobile phone text messages can be a good choice when the administrators of the application are off-site and there is a problem. How many people these days go somewhere without their mobile phone? Emails can be as descriptive as necessary and it was previously outlined that you should take full advantage of that. However, text messages should be kept short. Try to keep the whole error on one screen so that the phone user does not need to scroll through the message. Also, do not send a text message for each and every error. Reserve them for the urgent ones and perhaps hourly reports. Examples include the following:



DATABASE IS DOWN!
14:00-15:00 10 bookings failed
15:00-16:00 2 bookings failed
Website is slow!


As the messages need to be kept short, they are limited in the detail they can provide. However, when the person is away from their workplace and cannot access their email, a short message is better than not knowing anything is wrong. To incorporate the use of text messages in your application, refer to www.aspsms.com or www.codeproject.com/aspnet/EasySMTP_package.asp.


Text Files


Text files are a very basic way of logging errors, but they are useful when other mediums fail, provided that the files are maintained and not simply allowed to keep growing and consuming more and more space, which can cause other problems. If the application's own database is also used for error logging (which is not recommended), a problem with that database will stop errors from being logged. However, the same can still happen when a separate database is used for error logging. When database logging fails, you could use text files as a simple secondary medium. Consider the process of buying a product, and assume that the code for logging errors to a database and the code for logging errors to text files is contained within the methods LogErrorToDatabase() and LogErrorToTextFile() respectively. Now examine the example code extract below:



Try
  PurchaseItem(chosenItem)
Catch ex As Exception
  LogError("Product Purchase Failed", ex)
End Try


The LogError() method would look like the following:



Public Sub LogError(strErrorDescription As String, ex As Exception)
  Try
    LogErrorToDatabase(strErrorDescription, ex)
  Catch exDatabaseError As Exception
    LogErrorToTextFile(strErrorDescription, ex)
    ' besides logging the original error (the purchase error)
    ' to the secondary medium (the text file)
    ' you could do something extra here to warn the administrator(s)
    ' of the database logging problem!
    ' send an email or a text message or a
    ' popup message to their screen
    ' refer to the other methods and make a choice!
  End Try
End Sub


The code above is there to allow you to appreciate what is happening, or what should be happening, in the error logging. Instead of writing such code yourself, you can use the Logging block from the Microsoft Application Blocks, which aims to encapsulate common functionality to prevent common code rewriting. They have other blocks for database access and so on. An examination of this is beyond the scope of the article and the blocks do get updated regularly but do check them out.


Do not underestimate the power of text files. Despite all of the features and service packs that ship with enterprise software, Notepad is still a very popular program. Developers have been using it for miscellaneous tasks for a long time and will continue to do so long after Windows Vista and whatever follows it.


A good tactic when using text files is to split them. One massive file full of all the errors is rather useless and will be slow to read from and write to. A high number of users makes this problem worse as concurrency issues come into play. Eventually, errors will result and we don’t want the error handling code producing errors! Why not split the errors into major categories? Examine the following structure:


DatabaseErrors.txt


ProductSearchErrors.txt


BookingErrors.txt


SendingEmailErrors.txt


An administrator could clearly see which category of errors is the most frequent by examining the size of the files. Note that within the example, you may prefer the log extension to the txt extension. These files would need to be truncated regularly, perhaps every week or month, so that drive space is not wasted by storing errors that have probably been addressed.


Another method to split the files would be to have a file per day. The following structure represents text files named using the YYYY_MM_DD date format.



2006_08_10.txt
2006_08_11.txt
2006_08_12.txt


You can combine the two methods above to have something like the following:



2006_08_10/
  ProductSearchErrors.txt
  BookingErrors.txt
  SendingEmailErrors.txt
2006_08_11/
  DatabaseErrors.txt
  ProductSearchErrors.txt


This structure clearly shows that there were no database errors on the 10th of August and there were no booking errors or sending email errors on the 11th. For those present, the size of the files would indicate the magnitude of the errors. By themselves, the files would be hard to query, especially as the number of directories and files grows over time, but you could create a simple Windows or web application with some basic reporting functionality to interpret them more efficiently.
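The combined folder-per-day and file-per-category layout can be sketched as follows. This is only one possible shape for a text file logger: the names TextFileErrorLogger, GetLogFilePath and AppendErrorToTextFile are hypothetical, and the root folder is an assumption supplied by the caller. The SyncLock only serializes writers within a single process, so it would not be enough by itself in a web garden.

```vbnet
Imports System
Imports System.Globalization
Imports System.IO

Public Module TextFileErrorLogger

    ' Serializes writes from concurrent requests within one process.
    Private ReadOnly writeLock As New Object()

    ' Builds a path such as <root>\2006_08_10\BookingErrors.txt.
    Public Function GetLogFilePath(ByVal rootFolder As String, _
                                   ByVal category As String, _
                                   ByVal logDate As DateTime) As String
        Dim dayFolder As String = _
            logDate.ToString("yyyy_MM_dd", CultureInfo.InvariantCulture)
        Return Path.Combine(Path.Combine(rootFolder, dayFolder), category & "Errors.txt")
    End Function

    ' Appends one error entry to the file for today's date and the given category.
    Public Sub AppendErrorToTextFile(ByVal rootFolder As String, _
                                     ByVal category As String, _
                                     ByVal description As String, _
                                     ByVal ex As Exception)
        Dim stamp As DateTime = DateTime.Now
        Dim filePath As String = GetLogFilePath(rootFolder, category, stamp)
        SyncLock writeLock
            ' Creates the day folder if it does not exist yet.
            Directory.CreateDirectory(Path.GetDirectoryName(filePath))
            Dim writer As New StreamWriter(filePath, True) ' True = append
            Try
                writer.WriteLine(stamp.ToString("yyyy-MM-dd HH:mm:ss", _
                    CultureInfo.InvariantCulture) & " " & description)
                writer.WriteLine(ex.ToString())
                writer.WriteLine()
            Finally
                writer.Close()
            End Try
        End SyncLock
    End Sub

End Module
```

Because a new folder starts each day, old errors can be cleared by deleting whole day folders rather than truncating individual files.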


Windows Event Viewer


This is another popular technique and is actually used by many applications already running on Windows machines. However, it does require that the administrators of the application have access to the production server. If it is a large organization and the server is in a different state, this can be a problem when there are networking issues.


SQL Server, firewall programs, anti-virus programs and Windows itself report information in the Event Log, including errors. I cover it here only because I know it to be used for error logging.


As it covers a broader spectrum of events besides errors, it uses categories such as Application, Security and System. For your own application you can create a separate category and have all errors logged within it. Because it does not allow sub-categories within the defined category, and because of the accessibility issue mentioned previously, I do not feel that the Windows Event Log is a good choice by itself for error logging. The whole point of using categories is to break the errors into parts. You cannot easily use the technique outlined previously of splitting the database errors from the product search errors and purchase errors within the Event Viewer.


I suppose the Event Log would be suitable if you built an application that acted as a report interface to the Windows Event Log. The error logging puts the data into the seemingly difficult-to-read Event Log and then the interface application derives reports and statistics from it. I do however believe the other methods and techniques outlined within this article to be better, quicker to implement and more effective in most cases. Another reason for choosing the Event Log as an option is if you have no time to implement the other options. It is only a one-line call within the code when you are using the Microsoft Application Blocks. Perhaps you can do this while you implement a better way for the next deployment.
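Without the Application Blocks, writing to a custom log with the framework's own EventLog class takes only a few lines. The sketch below assumes a source called "RareMerchandise" and a custom log called "RareMerchErrors"; both names are made up for illustration, as are EventLogErrorLogger and BuildEventMessage. Creating an event source requires administrative rights, so in practice that step is usually done once at install time rather than by the web application itself.

```vbnet
Imports System
Imports System.Diagnostics

Public Module EventLogErrorLogger

    ' Hypothetical names chosen for this sketch.
    Private Const SourceName As String = "RareMerchandise"
    Private Const LogName As String = "RareMerchErrors"

    ' Combines the custom description with the full exception text.
    Public Function BuildEventMessage(ByVal description As String, _
                                      ByVal ex As Exception) As String
        Return description & Environment.NewLine & ex.ToString()
    End Function

    ' Writes one Error entry to the custom log.
    Public Sub LogErrorToEventLog(ByVal description As String, ByVal ex As Exception)
        ' Registers the source against the custom log on first use.
        ' This call fails without administrative rights.
        If Not EventLog.SourceExists(SourceName) Then
            EventLog.CreateEventSource(SourceName, LogName)
        End If
        EventLog.WriteEntry(SourceName, BuildEventMessage(description, ex), _
                            EventLogEntryType.Error)
    End Sub

End Module
```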


Other Methods


Having popup messages appear on computer screens is another method of alerting administrators. In some cases, this method can be more suitable than sending an email. Administrators get many email messages and the error may go unnoticed for a while. Popups get all the attention, which is why we have popup blockers in our browsers! Also, if the error is regarding an emailing problem, the alert email may in turn fail to be sent! Popup messages can easily be done by having the software run a Windows net send command. Refer back to the previous methods described and how the errors were split according to category and sent to the relevant people. Do the same here. Ensure that database errors send messages to the database administrator’s screen. Perhaps send the same message to 2 or more specific machines if he or she moves around to specific locations on the network.


You just need to ensure that the net send command is permitted on the network and test the error handling to make sure that specific errors go to the correct computer screens. This can get tricky if IP addresses and usernames change. This method is rather limited and only works if the administrator is at their computer when the error occurs assuming they are required to respond quickly to the error.


I also mentioned the use of a siren in the introduction of this article. Perhaps this is a bit extreme but the idea here is to get you thinking. Perhaps the methods previously outlined would not be best suited to you. No one can anticipate every type of application and working environment out there. What would suit you that we have not covered? What could you do if logging to the database, sending an email, writing to a text file, sending an SMS text message, sending a notice document to a printer and sending a message in a popup all failed?


You can do almost anything these days in IT with all of the programmable USB devices and the classes of the .NET framework, which are being constantly refined and added to. Use your imagination and do some research. I would certainly appreciate any feedback from you if you have any other interesting ways for error logging and alerting.


Factors


Before devising any techniques to form an error handling strategy, there are a few things to consider. This section will discuss the main factors in determining a strategy.


Company Size


Do you really have time to build what seems to be an extra application on top of what has already been developed? This is how you should see error handling: as a separate application, not something that is just bolted on at the end with no prior regard to design. Big companies have more time, money and technical resources in terms of hardware, software, developers, testers and so on. Not only is it possible to handle more in such rich environments, I believe that the resources should be split so that a specific part of the development force is solely responsible for error handling, considering how serious the task is. The developers should rotate so that they are involved with development one week and then error handling/support the next. This way, when developing they know what to look out for and when testing/supporting they know the likely causes of any errors. They already have business analysts, systems analysts, designers, coders, testers, consultants and deployment personnel, so why not error handlers?


However, it is not a dead end for the small guys. Smaller companies generally build smaller applications and putting another application on top to handle the errors may not seem like such a big task for applications that have few features. Even if it does, you could break it down and give priority to sections that need error handling more than others. For example, you may not have time to build error handling to monitor the entire application but you may have time to build something that will handle the purchase/booking errors, which seems to be the most important part of any business application because this is when the end user buys something. Ask anyone! Payment, pricing and purchase errors are the worst. First you need to fix the error, then the clients need to be told that they cannot really have that product at that price. How embarrassing! Other parts of the system can be awarded error handling at a later date when time permits. Admitting you lack resources, if this is the case, is no shame.


Company Culture


Does the company you work for compete strongly with competitor websites? Does your team have a team building spirit where mistakes are openly welcomed and resolved or is it a blame culture? Are developers within your company encouraged or even forced to compete with each other? An extensive error handling system whereby mistakes, and more importantly the frequency of particular mistakes, can be traced to a small group of people or an individual may not be such a good idea for developers working within a company that has a blame culture.


I am totally against working in such environments. There is a fine line between accountability and subjugation. I write this article with the intention of promoting higher error handling standards in applications so that errors are few and fixed quickly. I do not encourage readers to use the methods and strategies to build a system to name and publicly shame developers. We all make mistakes; otherwise there would not be such a thing as error handling! If you are unfortunate enough to work in a blame culture company, I would advise a very general reporting system such as simply sending emails to the relevant people when errors result. Refer back to the Logging Methods section.


Deadlines versus Quality


Different companies have different priorities. Would you be held personally responsible if your extra attention to error handling resulted in a late delivery of the project or would your extra attention to detail and quality be rewarded? If you are in the latter group, consider yourself very lucky. There are some really harsh environments out there. Unfortunately for some of you, this will be all too familiar.


Strategies


This section examines two different approaches to implementing an error handling system based on the methods and factors examined previously.


A Quick Approach


This approach should be favored by those of you that want swift results and have time or resource restrictions. It will allow you to start with basic logging and alerting and gradually phase in a more robust system if the need arises. I am not suggesting that this approach is sub-standard. Parts of this strategy may actually be incorporated into the Custom Summary Report strategy that I’ll examine next.


Using emails is a good and quick-to-build approach to error handling. From day 1, you can have a basic system that simply emails every error to a central email inbox monitored by a few developers and administrators. However, sending an email per error can be tiresome. Instead of sending an email per error, extend the system to send an email each time the error count reaches a specified multiple. For example, for every 10 errors send an email. You therefore alert the administrators without constantly interrupting them.


You can then build upon this by splitting the errors into categories. Database error emails go to the database administrator(s), who might receive one for every 10 database errors that occur, for example. You could then extend this by having different multiple boundaries for different categories of error. You may not want to be informed of every search error so you could have a high multiple for this; say 25. On the other hand, every purchase error is serious. A buyer is more valuable than a looker. In this case, use a multiple of 3, 2, or even 1. The multiple for database errors could lie somewhere in between. However, if you set the database error type multiple to 20 and the database goes offline while there are 20 or more people on the website, 20 errors will result very quickly!
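The per-category multiples described above can be sketched as a small counter that reports when an alert is due. The class and member names (ErrorAlertCounter, SetMultiple, RegisterError) are hypothetical, and the counts live in memory only, so they reset whenever the application restarts.

```vbnet
Imports System
Imports System.Collections

Public Class ErrorAlertCounter

    Private ReadOnly multiples As New Hashtable() ' category -> multiple
    Private ReadOnly counts As New Hashtable()    ' category -> running error count
    Private ReadOnly defaultMultiple As Integer

    ' defaultMultiple applies to any category without its own setting.
    Public Sub New(ByVal defaultMultiple As Integer)
        If defaultMultiple < 1 Then Throw New ArgumentOutOfRangeException("defaultMultiple")
        Me.defaultMultiple = defaultMultiple
    End Sub

    ' Sets the multiple for one category, e.g. 1 for purchases, 25 for searches.
    Public Sub SetMultiple(ByVal category As String, ByVal multiple As Integer)
        If multiple < 1 Then Throw New ArgumentOutOfRangeException("multiple")
        SyncLock counts
            multiples(category) = multiple
        End SyncLock
    End Sub

    ' Records one error and returns True when the running count for the
    ' category reaches a multiple of its setting, i.e. when an alert is due.
    Public Function RegisterError(ByVal category As String) As Boolean
        SyncLock counts
            Dim count As Integer = 1
            If counts.ContainsKey(category) Then
                count = CInt(counts(category)) + 1
            End If
            counts(category) = count

            Dim multiple As Integer = defaultMultiple
            If multiples.ContainsKey(category) Then
                multiple = CInt(multiples(category))
            End If
            Return count Mod multiple = 0
        End SyncLock
    End Function

End Class
```

A logging routine could call RegisterError for each error it stores and send the email only when the result is True. A purchase category with a multiple of 1 would then alert on every error, while a search category set to 25 would alert on every 25th.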


The multiples set could depend upon the company structure and size. In a small environment, if the same people are responsible for the website and the database, they may set the multiples higher so that they are not unnecessarily disturbed. By this, I mean that these people would agree to investigate each error that they are sent under the condition that they are sent a restricted number of messages. They would be unlikely to check as many as a thousand emails in great detail. On the other hand, larger teams, especially teams working on critical applications that need close monitoring, can set lower multiples as they can spare the resources to monitor errors more closely.


Besides reporting errors efficiently, you need to store them efficiently. Set up different email boxes just for the errors to go in. For example, the database administrator should have an inbox for their main duties and an extra inbox just for the database errors. Otherwise the error handling emails will interrupt their other work. In the case of top priority errors, they could go to their main inbox as well. This is why defining the categories of the errors from the outset is important.


You can then build on this and incorporate the good use of mobile phone text messages. A good idea may be to implement a sign-in system for the website administrators. If they are logged into the building, send them an email. When they are logged out, send a text message. This way, you could also have "do not disturb" hours for each administrator, whereby each administrator has a time shift and any errors within that time go to them. Again, this depends on company size and structure. If you are the only database administrator, you may not get away with having "do not disturb" hours.


Next incorporate the use of text files as a secondary medium when the emailing of errors fails for whatever reason. Split and group the text files as outlined in the previous section covering text files. Besides the text files recording the data that the emailing system should do, have the application alert the administrators that the error handling has been switched to the secondary system. Allow for the case where you cannot do this by email, due to email problems preventing the primary error logging system working in the first place; ensure that in that scenario, another reporting method is used, such as a mobile phone text message or popup message on the computer screen.


Custom Summary Report


If time and resources permit and your application needs high quality maintenance, you should consider this approach. It uses database logging with the ability to derive summary information that will allow administrators to identify and fix the most frequent problems quicker. A basic example is given below.


If time permits, this can be extended by defining specific sub-categories underneath the main categories, as you will see in the example within the next section called Custom Summary Report, with developer-signed code so that you can more easily determine where in the application the error came from. This will allow the administrators to replicate the problem and see for themselves what is happening, and hopefully know why. The data in the script file is based on the extended example. Therefore, the records within the tblErrorType and tblError tables and the structure of the tblErrorType table shown below for the briefer version will be different to the data within the script.


Remember to use a different database for error logging from the one used for the production website. Otherwise, database connection issues cannot be logged. Database logging is also easier to maintain. The records can easily be deleted automatically as they become old or deleted selectively from the "error pool" when specific problems identified in there have been fixed.


This approach uses custom defined error types, which are defined by the developers of the application. Examine the database structure and extract records below.


tblCategoryType

CategoryTypeID  CategoryType
100             Database problems
200             Product searching
300             Product purchasing
400             Customer emailing
500             Members section problems
600             General website problems



tblSeverityType

SeverityTypeID  SeverityType
1               Highest
2               High
3               Medium
4               Low



tblErrorType

ErrorTypeID  CategoryTypeID  SeverityTypeID  ErrorType
1            100             1               Database connection failed
2            100             1               Sql command timed out
3            200             2               The search criteria was invalid
4            200             4               No products found in the search
5            300             4               No items in shopping cart for checkout
6            300             3               Chosen product is no longer available
7            400             1               Mail server is down
8            400             2               Html content to be emailed is blank
9            500             3               Unauthorized user was redirected
10           500             4               Login failed, password incorrect
11           600             3               Web page not found



tblError

ErrorID  ErrorTypeID  DayDate     Total
1001     1            2006/08/01  80
1002     3            2006/08/01  15
1003     4            2006/08/01  5
1004     3            2006/08/02  45
1006     6            2006/08/02  3



Category types are defined as 3 digit numbers to allow the administrators to define similar errors within groups. You could extend the example by breaking down each category so that all database errors are 3 digit numbers starting with 1, all product search errors start with 2, and so on. The next example involving these tables includes this extension, which is similar to how HTTP status codes are grouped (client errors, for example, fall within the 400 range). The extract of the tblError records starts at 1001 so that it is easier to see how the records within these tables combine. The error types act as error category sub types and the actual error records are stored within the tblError table. Within the tblError table, each combination of ErrorTypeID and DayDate values is unique.


With the data held in a database like this, you can quickly generate report information on which problems are the most frequent. Using the severity level information, you can avoid alerting administrators to the low priority errors and keep their attention on the high priority ones. You can also derive when problems occur, determine which problems are linked with others, or carry out any other type of analysis on this data.


Look at the data for a moment. There was clearly a major database problem on the 1st of August, assuming that 80 errors represents a large proportion of the people who use the website in any one day. If the website serves 20,000 users a day, declaring a major database problem would be premature. You may also see that there were not many purchase errors. This could be because the database and product search errors prevented many users from getting as far as a purchase. You can see how these problems are linked and why recording the totals like this is useful. Compare this simplistic but effective approach to having a list of 300 exception stack traces in a database table or in your mailbox to read through. When the errors run into the thousands over time, the benefit becomes more apparent.
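
As a rough sketch of the kind of report described above, the following filters summary rows by severity and lists the most frequent problems first. It works on rows already loaded into memory; the ErrorSummaryRow class and the ErrorReport module are hypothetical names, and the rows are assumed to come from joining tblError with tblErrorType.

```vbnet
Imports System
Imports System.Collections.Generic

' One line of the summary: an error type, its severity and its total.
' Hypothetical shape, assumed to be tblError joined with tblErrorType.
Public Class ErrorSummaryRow
    Public ErrorType As String
    Public SeverityTypeID As Integer
    Public Total As Integer

    Public Sub New(ByVal description As String, ByVal severity As Integer, ByVal count As Integer)
        ErrorType = description
        SeverityTypeID = severity
        Total = count
    End Sub
End Class

Public Module ErrorReport

    ' Keeps only rows whose severity is at least as serious as
    ' lowestSeverityToShow (1 = Highest, 4 = Low) and sorts them so that
    ' the highest totals come first.
    Public Function MostFrequent(ByVal rows As IEnumerable(Of ErrorSummaryRow), _
                                 ByVal lowestSeverityToShow As Integer) As List(Of ErrorSummaryRow)
        Dim result As New List(Of ErrorSummaryRow)
        For Each row As ErrorSummaryRow In rows
            If row.SeverityTypeID <= lowestSeverityToShow Then result.Add(row)
        Next
        result.Sort(AddressOf CompareByTotalDescending)
        Return result
    End Function

    Private Function CompareByTotalDescending(ByVal a As ErrorSummaryRow, ByVal b As ErrorSummaryRow) As Integer
        Return b.Total.CompareTo(a.Total)
    End Function

End Module
```

With the three rows recorded for the 1st of August and a lowestSeverityToShow of 2, the result would contain "Database connection failed" (80) followed by "The search criteria was invalid" (15); the low severity "No products found in the search" row would be left out.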


You may prefer to use colors rather than severity levels. A red alert could be the highest; an orange alert is not as bad as a red alert but is worse than a yellow alert. However, there are only a limited number of appropriate colors to use. Also, you may wonder why the CategoryType, SeverityType and ErrorType fields were not just called Type within their respective tables. The reason is that a SQL query retrieving data from more than one of these tables would return several columns with the same name, and you would need to use aliases in the query. Without them, you may get data processing and display bugs because the code processes or shows the contents of a different field from the one you intended.


Getting this strategy to work in the first place can be a very time consuming task. You need to define many types of errors in your application to catch in order to make this solution effective. You also need to define the severity level of each error and change your exception/error handling code to use this new approach. Think about your company size and deadlines before you attempt this. An example error handling block that would use this approach would look like the following:



Dim objConn As New SqlClient.SqlConnection(CStr(Application("ConnectionString")))
Try
  objConn.Open()
Catch ex As SqlClient.SqlException
  ' Record the failure against its custom error type rather than showing it to the user
  LogCustomError(Enumerations.ErrorType.DatabaseConnectionFailed)
End Try
' then use the connection for product purchasing or some other task ...


You may notice here that the exception is being ignored, at least from the perspective of the user. Sometimes you will want to raise the problem to the user, for example if they wish to check out on your website and their shopping basket is empty for some reason. In other cases, you may not wish to alert the user. For example, if a product search query fails to get a connection to the database, ignore the exception and try again against the secondary database. If that succeeds, there is no cause for alarm, but do log the error so that the problem gets resolved. It all depends on how you want to treat your users, and it opens up a whole new can of worms in terms of design. In my experience, excellent web/application design has often come from people who cannot code themselves out of a paper cup.


The LogCustomError() function would accept the error code, calculate today’s date and then record the data in the database. An enum that I used in the example is defined in a class called Enumerations, which is used to pass in the error codes to the LogCustomError() function to make the code easier to read and manage. The Enumerations class looks something like this.



Public Class Enumerations
  Enum ErrorType
    DatabaseConnectionFailed = 1
    SqlCommandTimedOut = 2
    SearchCriteriaInvalid = 3
    ' and so on
  End Enum
End Class


The LogCustomError() function in the above code would then log the error in the database, perhaps using a stored procedure. It would need to check whether an entry for such an error has already been created for the current date. If it has, then it should increment its accumulating error total. If not, it should add a new record with this error code, today’s date and a count of 1. You could use the following stored procedure, which is used in the sample application:



CREATE PROCEDURE usp_LogError
(
  @CategorySubTypeID AS INT -- 201, or 202, etc
)
AS
DECLARE @Today AS SMALLDATETIME
SET @Today = GetDate()
DECLARE @ErrorTypeID AS INT
SET @ErrorTypeID = (SELECT ErrorTypeID FROM tblErrorType WHERE CategorySubTypeID = @CategorySubTypeID)
DECLARE @ErrorID AS INT
SET @ErrorID = (
  SELECT ErrorID FROM tblError
  WHERE Year(DayDate) = Year(@Today) AND Month(DayDate) = Month(@Today) AND Day(DayDate) = Day(@Today)
  AND ErrorTypeID = @ErrorTypeID
)
-- if we have recorded this error today already, add 1 to the total
IF (@ErrorID IS NOT NULL)
BEGIN
  UPDATE tblError SET Total = (Total + 1) WHERE ErrorID = @ErrorID;
END
-- otherwise, add this new error for today
ELSE
BEGIN
  INSERT INTO tblError (ErrorTypeID, DayDate, Total) VALUES (@ErrorTypeID, @Today, 1);
END
RETURN


Note: Although the above code may not seem thread-safe, it is not mission critical to have an absolute count of the errors when the error numbers are high enough to warrant thread-safe processing. In other words, imagine 100 errors that are raised roughly at the same time. If the count was mistakenly 95 or 105, you would know you have a problem to address in either case.
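
For illustration, a LogCustomError() routine that calls the usp_LogError stored procedure might look something like the sketch below. It is not the version from the sample application. The ErrorLogger module and its LoggingConnectionString field are hypothetical, and the sketch assumes that the integer behind the enum value passed in is the code the procedure expects for its CategorySubTypeID parameter.

```vbnet
Imports System
Imports System.Data
Imports System.Data.SqlClient

Public Module ErrorLogger

    ' Hypothetical setting: assumed to be assigned once at application
    ' start-up and to point at the separate error logging database.
    Public LoggingConnectionString As String

    ' Records one occurrence of the given error code by calling usp_LogError.
    ' An enum value can be passed directly because it converts to Integer.
    Public Sub LogCustomError(ByVal errorCode As Integer)
        Try
            Using conn As New SqlConnection(LoggingConnectionString)
                Using cmd As New SqlCommand("usp_LogError", conn)
                    cmd.CommandType = CommandType.StoredProcedure
                    cmd.Parameters.Add("@CategorySubTypeID", SqlDbType.Int).Value = errorCode
                    conn.Open()
                    cmd.ExecuteNonQuery()
                End Using
            End Using
        Catch ex As Exception
            ' The logging itself failed. Do not let this take the page down;
            ' this is the place to fall back to a secondary medium such as
            ' a text file.
        End Try
    End Sub

End Module
```

Catching the exception inside the routine is a deliberate choice in this sketch: a failure in the error logging should not become a second error for the user to see.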


If there is only a small list of error types in the database, this is not an effective solution. In the example, I have only defined 6 main types and a few sub types within tblErrorType for brevity. You need to make the list as extensive as possible. Break them down into more specific error types and add to the list as the application becomes larger and performs more tasks. Try and cover everything you can. Sometimes you may be convinced that a particular error would never happen but cover it anyway in case it does.


When implementing something like this, there are a few things that will help. It is a good idea to delete the errors held in the database when they are more than a few days old. Otherwise the database will grow unnecessarily, which can lead to more problems. Most databases can handle millions of rows but when these databases are backed up regularly, the amount of space consumed can become a gradual problem. The limit on how large you should let your database grow will depend on how often it is backed up, how long these backups are preserved and how much space you have. Entire books are dedicated to database administration. Why hold error information on problems that have since been fixed? Alternatively, upon fixing an error you could delete all entries for that specific error based on its custom-defined ErrorTypeID and then move on to addressing the other errors still left in the "error pool".


It is also a good idea to have some part of the application designated just for raising the exceptions that will cause the errors, in order to test the error logging when this implementation goes live. Also recall the previous database example in the Logging Methods section, which defined a database table that held all information about each individual error. Perhaps you could have this alongside the report-type database defined in this strategy as a source of extra information on specific problems. This solution can also be extended by incorporating some of the basic techniques previously outlined. When error totals reach specific thresholds, alert the administrators by email, text message or screen message popup. Read the previous section for ideas on how to incorporate this. You can also extend the capabilities of this strategy by incorporating the developer-signed code shown next.
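
The two clean-up ideas above could be sketched along these lines. The ErrorPurge module and its parameter names are hypothetical; only the tblError table and its DayDate and ErrorTypeID columns come from the example, and the connection string is assumed to point at the error logging database.

```vbnet
Imports System
Imports System.Data
Imports System.Data.SqlClient

Public Module ErrorPurge

    ' Deletes summary rows dated more than daysToKeep days before today.
    ' Returns the number of rows removed.
    Public Function DeleteOldErrors(ByVal loggingConnectionString As String, _
                                    ByVal daysToKeep As Integer) As Integer
        Using conn As New SqlConnection(loggingConnectionString)
            Using cmd As New SqlCommand("DELETE FROM tblError WHERE DayDate < @Cutoff", conn)
                cmd.Parameters.Add("@Cutoff", SqlDbType.SmallDateTime).Value = DateTime.Today.AddDays(-daysToKeep)
                conn.Open()
                Return cmd.ExecuteNonQuery()
            End Using
        End Using
    End Function

    ' Deletes every row for one error type, for use once the underlying
    ' problem has been fixed. Returns the number of rows removed.
    Public Function DeleteFixedError(ByVal loggingConnectionString As String, _
                                     ByVal errorTypeID As Integer) As Integer
        Using conn As New SqlConnection(loggingConnectionString)
            Using cmd As New SqlCommand("DELETE FROM tblError WHERE ErrorTypeID = @ErrorTypeID", conn)
                cmd.Parameters.Add("@ErrorTypeID", SqlDbType.Int).Value = errorTypeID
                conn.Open()
                Return cmd.ExecuteNonQuery()
            End Using
        End Using
    End Function

End Module
```

How DeleteOldErrors gets run (a scheduled job, an administration page, and so on) is left open here; that depends on your environment.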


Custom Summary Report with Developer-Signed Code


This builds upon the previous strategy and is useful in large development environments. Look again at the Custom Summary Report strategy. In a small environment, one or two developers were probably responsible for the entire product search functionality. When the summary shows high totals of product search errors, these guys are the people that need to deal with it.


In a large environment, it is not so easy. In this case, the custom-defined error types should be defined by the developers that developed specific parts of the software and then signed by those developers as belonging to them. Refer back to the database structure within the previous section. Here the tblErrorType table has been extended and now resembles the structure within the sample code. This extract only contains errors related to product searching but you will see that the category type has been broken down into several specific sub-types. Two new tables have also been added. The extracts are shown below.


tblErrorType

ErrorTypeID  CategorySubTypeID  SeverityTypeID  ErrorType
1            201                2               The search criteria was invalid
2            202                3               The search criteria was not set
3            203                4               No products found in the search
4            204                3               Search returned results but was slow
5            205                1               Results from search were invalid
6            206                3               Special offer search failed
7            207                2               Specific single product search failed



tblDeveloperErrorTypes

ErrorTypeID  DeveloperID
3            1
3            3
1            3
2            1
4            2
5            1
6            2
7            3



tblDeveloper

DeveloperID  Name              Position                   EmailAddress  PhoneNumber
1            Scott Waterhouse  Tester and troubleshooter                55555 444555
2            Mark Hamal        Development Manager                      33333 221101
3            Laura White       Senior Developer                         43434 555789



Note: The tblErrorType table now has a CategorySubTypeID and the values for this are different to the values within the tblCategoryType. In this scenario, tblCategoryType acts as a reference table to group the main types of error in the application.


The search error types have been split into several types. Each of these types links back to some specific product search functionality in the application which can now be traced back to the developer(s). Looking at the database structure, there is no reason why you could not have a task belonging to more than one developer or vice versa. Error type 3 in the example is assigned to both Scott and Laura. Also notice that this extra information can be used to get the contact details of the responsible person if a particular type of error needs fixing quickly, or even ensure that errors marked with a high level of severity are under the control of senior staff.
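
Looking up who to contact for a given error type could be sketched as below. The table and column names come from the extracts in this section, while the DeveloperLookup module, the function name and the connection string parameter are hypothetical. The PhoneNumber column is assumed to hold text.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Data.SqlClient

Public Module DeveloperLookup

    ' Joins the developers to the error types they have signed.
    Private Const ContactQuery As String = _
        "SELECT d.Name, d.PhoneNumber " & _
        "FROM tblDeveloper d " & _
        "INNER JOIN tblDeveloperErrorTypes det ON det.DeveloperID = d.DeveloperID " & _
        "WHERE det.ErrorTypeID = @ErrorTypeID"

    ' Returns one "name, phone number" entry for each developer
    ' responsible for the given error type.
    Public Function GetResponsibleDevelopers(ByVal loggingConnectionString As String, _
                                             ByVal errorTypeID As Integer) As List(Of String)
        Dim contacts As New List(Of String)
        Using conn As New SqlConnection(loggingConnectionString)
            Using cmd As New SqlCommand(ContactQuery, conn)
                cmd.Parameters.Add("@ErrorTypeID", SqlDbType.Int).Value = errorTypeID
                conn.Open()
                Using reader As SqlDataReader = cmd.ExecuteReader()
                    While reader.Read()
                        contacts.Add(reader("Name").ToString() & ", " & reader("PhoneNumber").ToString())
                    End While
                End Using
            End Using
        End Using
        Return contacts
    End Function

End Module
```

Against the extract above, error type 3 would return both Scott Waterhouse and Laura White, since the error type is signed by both of them.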


I must point out that the only reason I am recommending such a solution in a large environment is so that the most relevant developers can be alerted to the problems. We all know that if you wrote something you can probably diagnose and fix it quicker than anybody else. I would not want this strategy to be used to create a blame, name and shame culture whereby you total people’s errors and the like. Remember the points I have made about company culture.


Strategy Guidelines


Treat this section as a brief checklist of things to consider.


If deadlines are tight, consider phasing in error handling features gradually. Consider the example I mentioned earlier in which I included the number of active sessions, using just a few additional lines of code. Try to keep the error handling code as generic as possible so that it is easier to incorporate into other parts of the system later on.


When phasing in error handling, cover the most sensitive parts of the applications first. Assuming the application is an e-commerce site, I would start with building error handling for the parts of the application that handle payments and purchases/bookings. High quality error handling on one sensitive part of the application is preferable to poor quality rushed error handling that covers an entire application.


Don’t go quality mad and run out of time. Don’t spend longer on the error handling for the application than you did on the application itself. Over-checking costs time and resources. This, however, may not apply to critical systems such as those used in banking and online gambling.
