Sunday, March 11, 2012

Best practice for conditional insert else select?

I have several places where I need to get the id (primary key) of a resource, inserting a row if the resource does not exist (i.e. an artificial key to be used as an FK for another table). I should probably change this varchar key lookup to use a hash index, but that is beside the point.

So the table is essentially like:

CREATE TABLE MyLookup(id int identity primary key nonclustered, mykey varchar(256));

CREATE CLUSTERED INDEX mylookup_cidx_mykey ON MyLookup(mykey);

I see two main approaches for how I can do my get-id-with-insert-if-needed.

(Approach 1)

DECLARE @.id INT;

SELECT @.id = id FROM MyLookup WHERE mykey = 'some key value';

IF (@.id is null)

BEGIN

INSERT MyLookup ('some key value');

SET @.id = SCOPE_IDENTITY();

END

(Approach 2)

DECLARE @.id INT;

INSERT MyLookup SELECT 'some key value' WHERE NOT EXISTS (SELECT id FROM MyLookup WHERE mykey = 'some key value');

IF (@.@.ROWCOUNT = 0)

SELECT @.id = id FROM MyLookup WHERE mykey = 'some key value';

ELSE

SET @.id = SCOPE_IDENTITY();

From some quick tests in profiler, approach 2 seems to be a bit faster and have lower resource utilization. But I'm not sure if it maybe takes some more aggressive locks even in the unnecessary case where the mykey row value of 'some key value' already exists. Approach 2 also looks cleaner to me, but I don't mind a bit of extra code if it gives me better scalability through less lock contention.

Any tip on what is considered the best practice for a conditional insert like this, or a tip on how to get detailed lock info for a query? The lock info for profiler was all greek to me, it just had a hex value with each lock acquired/released, so I have no idea what it was telling me. Is my only solution to just run exhaustive tests and look at the perf numbers from the black box?

I went ahead and did some testing, no big surprises. Although the tests were single client, so it doesn't give me any info about locking.

I found unsurprisingly that the more sparse the table, the narrower the gap between the two options. But as the likelihood of needing an insert went down, the first approach became more effective. In my production environment I would hazard a guess that I need an insert around 5% of the time, so I'll probably go with approach 1 in general. Although I did change my approach 1 to actually incorporate approach 2 with in - first I do a select, and then if the id was null then I do a conditional insert. There is a slight increase in maintenance cost since I have to duplicate the code for the existential check, but I think it is worthwhile in my cases that are fairly high traffic.

Any other points of view on this?

No comments:

Post a Comment