Saturday, 13 October 2007

Sql stored procedure performance trick

I recently had to revisit a stored procedure I had written quite a while ago as its performance had deteriorate significantly.

The stored procedure effectively took in a car registration number and searched against a legacy database to try and find the vehicle. The legacy data for the most part was quite clean, however sometimes there would be a space between the first part of the car reg and the last part (E.g. 'ABC1 12D') and sometimes there wouldn't be.

My original stored procedure (sort of - its been sanitised):
ALTER PROCEDURE spFindVehicle
@VehReg VARCHAR(10)
as

DECLARE @VehID int

DECLARE @CleanReg VARCHAR(10)
SET @CleanReg = REPLACE(LTRIM(RTRIM(LOWER(@VehReg))), ' ', '')

SELECT top 1 @VehID = ID
FROM tblVehicle
WHERE REPLACE(LTRIM(RTRIM(LOWER(VehReg))), ' ', '') = @CleanReg

SELECT @VehID
This worked, and does return the correct vehicle ID however it was externaly slow - requiring nearly 300,000 reads. (You can see how badly performing you procedure is by running profiler and it will tell you how many reads your procedure requires and how long it took, or by adding "SET STATISTICS IO ON" within query analyser).

The database was correctly indexed but since a function is been used in the where clause they where not being used!

I analysed the data and found that less than 800 of the vehicles actually had the space within the registration out of over a million records so to improve the stored procedure I simply did an initial very quick check without using the function, and if this failed I then did the second more costly check:-
ALTER PROCEDURE spFindVehicle
@VehReg VARCHAR(10)
as

DECLARE @VehID int

DECLARE @CleanReg VARCHAR(10)
SET @CleanReg = REPLACE(LTRIM(RTRIM(LOWER(@VehReg))), ' ', '')

SELECT top 1 @VehID = ID
FROM tblVehicle
WHERE VehReg = @CleanReg

IF(@VehID is null)
BEGIN
SELECT top 1 @VehID = ID
FROM tblVehicle
WHERE REPLACE(LTRIM(RTRIM(LOWER(VehReg))), ' ', '') = @CleanReg
END

SELECT @VehID
This now only takes 3 reads for 99% of the checks, and only adds 3 reads to the slower check resulting in a massive performance gain.

Now there are a few other ways this could have been done - firstly I could have added a column to the vehicles table with sterilised data and done the compare on that - this would have very efficient but as the database was legacy I didn't really want to modify it.

Another possible solution would have been to have a second table with a link between vehicle ID and sterilised registration and keep the data in sync through triggers or by modifying the update/create procedures. As there where significantly more inserts then look ups I decided that the above would make more sense and be more maintainable (Triggers bug me for some reason!).

Hope this helps, and sorry for not blogging in a while!

No comments: