Saturday, February 25, 2012

Best indexing option?

Hi,
I have to copy some data from one table to another. Its billing
information from an invoice to a Customer table.
The requirement is that if ANY billing information for a customer
changes, a new Customer record must be created. Currently I have this
query to do that.
SELECT ptc.CustomerId, dc.DocumentContactId
FROM dts.Customer AS ptc
INNER JOIN dbo.vDocumentContactDetail AS dc ON
ptc.CustomerName = dc.CompanyName
AND ptc.Line1 = dc.Line1
AND ptc.Line2 = dc.Line2
AND ptc.City = dc.City
AND ptc.State = dc.StateName
AND ptc.Country = dc.CountryName
AND ptc.Email = dc.Email
AND ptc.PhoneNumber = dc.PhoneNumber
AND ptc.FaxNumber = dc.FaxNumber
AND dc.FirstName + ' ' + dc.LastName = ptc.ContactName
This query is used by another view to get the customer id (which is
generated later and stored in the Customer table.
I noticed in the execution plan that there is a hash which is includes
CompanyName, dc.Email, and I believe the last part of the join, the
ContactName, although I'm not sure because the Execution plan
includings [Exp1004] = [Exp1005].
Any other suggestions to make this more performant? Or am I limited?I don't quite understand the requirement. From looking at the SELECT
it appears that it looks for a perfect match on everything EXCEPT the
keys, and returns the two keys. I can't say that I've ever seen
anything like that before except when merging two unrelated data
sources or resolving data redundancy problems.
Making a wild guess, I am wondering if the main thing done with
ptc.CustomerId and dc.DocumentContactId when they are returned is
compare them to see if they are the same so that some action (new
Customer record) can be taken if they are not. If so the entire
process appears to me to be inside out.
What I usually see, and write, when I want to check for changes:
SELECT ptc.CustomerId, dc.DocumentContactId
FROM dts.Customer AS ptc
JOIN dbo.vDocumentContactDetail AS dc
ON ptc.CustomerId = dc.DocumentContactId
WHERE ptc.CustomerName <> dc.CompanyName
OR ptc.Line1 <> dc.Line1
OR ptc.Line2 <> dc.Line2
OR ptc.City <> dc.City
OR ptc.State <> dc.StateName
OR ptc.Country <> dc.CountryName
OR ptc.Email <> dc.Email
OR ptc.PhoneNumber <> dc.PhoneNumber
OR ptc.FaxNumber <> dc.FaxNumber
OR dc.FirstName + ' ' + dc.LastName <> ptc.ContactName
That returns the keys that have differences. I don't know if that is
what you need, but maybe it is.
Roy Harvey
Beacon Falls, CT
On Tue, 13 Nov 2007 06:35:23 -0800, Andy <andyj@.med-associates.com>
wrote:
>Hi,
>I have to copy some data from one table to another. Its billing
>information from an invoice to a Customer table.
>The requirement is that if ANY billing information for a customer
>changes, a new Customer record must be created. Currently I have this
>query to do that.
>SELECT ptc.CustomerId, dc.DocumentContactId
>FROM dts.Customer AS ptc
> INNER JOIN dbo.vDocumentContactDetail AS dc ON
> ptc.CustomerName = dc.CompanyName
> AND ptc.Line1 = dc.Line1
> AND ptc.Line2 = dc.Line2
> AND ptc.City = dc.City
> AND ptc.State = dc.StateName
> AND ptc.Country = dc.CountryName
> AND ptc.Email = dc.Email
> AND ptc.PhoneNumber = dc.PhoneNumber
> AND ptc.FaxNumber = dc.FaxNumber
> AND dc.FirstName + ' ' + dc.LastName = ptc.ContactName
>This query is used by another view to get the customer id (which is
>generated later and stored in the Customer table.
>I noticed in the execution plan that there is a hash which is includes
>CompanyName, dc.Email, and I believe the last part of the join, the
>ContactName, although I'm not sure because the Execution plan
>includings [Exp1004] = [Exp1005].
>Any other suggestions to make this more performant? Or am I limited?|||Andy,
If there is no index on any of the columns in the ON-clause, then
hashing is to be expected.
I suggest you experiment a bit with (compound) indexes. Start with the
column that is most selective, and add (other selective) column until
the total (compound) index has enough selectivity to be fast. I wouldn't
be surprised if it is enough to just index column Email, because it is
probably highly selective.
Just add a lot of indexes on both tables, check the query plan to see
which are used, and drop the unused indexes.
HTH,
Gert-Jan
Andy wrote:
> Hi,
> I have to copy some data from one table to another. Its billing
> information from an invoice to a Customer table.
> The requirement is that if ANY billing information for a customer
> changes, a new Customer record must be created. Currently I have this
> query to do that.
> SELECT ptc.CustomerId, dc.DocumentContactId
> FROM dts.Customer AS ptc
> INNER JOIN dbo.vDocumentContactDetail AS dc ON
> ptc.CustomerName = dc.CompanyName
> AND ptc.Line1 = dc.Line1
> AND ptc.Line2 = dc.Line2
> AND ptc.City = dc.City
> AND ptc.State = dc.StateName
> AND ptc.Country = dc.CountryName
> AND ptc.Email = dc.Email
> AND ptc.PhoneNumber = dc.PhoneNumber
> AND ptc.FaxNumber = dc.FaxNumber
> AND dc.FirstName + ' ' + dc.LastName = ptc.ContactName
> This query is used by another view to get the customer id (which is
> generated later and stored in the Customer table.
> I noticed in the execution plan that there is a hash which is includes
> CompanyName, dc.Email, and I believe the last part of the join, the
> ContactName, although I'm not sure because the Execution plan
> includings [Exp1004] = [Exp1005].
> Any other suggestions to make this more performant? Or am I limited?

No comments:

Post a Comment