Showing posts with label thoughts. Show all posts
Showing posts with label thoughts. Show all posts

Tuesday, March 27, 2012

Best SQL IDE

Anyone have any thoughts on the best IDE for SQL Server development? I'd
like object explorer, drag-and-drop, code formatting, code completion, code
analysis, execution, query builder, the WORKS! I've tried RapidSQL, having
gotten excited by their online demo, but I have not found it well-behaved.
It completely crashed no less than 5 times in as many minutes. D'oh!
TIA!
RobertPersonally, I like Management Studio (SQL Server 2005) with PromptSQL (which
gives you IntelliSense in SSMS, Query Analyzer, etc). I'm pretty religious
about code formatting, and I have no need for a "query builder" (especially
those with bugs or really nasty side effects).
"Best" is going to be pretty subjective, since everyone has their own
opinions, criteria, etc. Like what is the best car, I can't even count how
many variables go into that, and why you get as many answers as there are
people.
A
"Robert Davis" <radbase@.yahoo.com> wrote in message
news:eP5T2VIwFHA.724@.TK2MSFTNGP14.phx.gbl...
> Anyone have any thoughts on the best IDE for SQL Server development? I'd
> like object explorer, drag-and-drop, code formatting, code completion,
> code analysis, execution, query builder, the WORKS! I've tried RapidSQL,
> having gotten excited by their online demo, but I have not found it
> well-behaved. It completely crashed no less than 5 times in as many
> minutes. D'oh!
> TIA!
> Robert
>|||I havn't had time to try this out myself, but there is a 3rd party tool for
enabled T-SQL intellisense style code completion. It claims to work with
Query Analyzer, VS 2003 and 2005 Enterprise Manager.
http://www.promptsql.com/
I don't know about formatting. I have my own style of structured indenting
and comes so natural I don't even think about it when typing.
"Robert Davis" <radbase@.yahoo.com> wrote in message
news:eP5T2VIwFHA.724@.TK2MSFTNGP14.phx.gbl...
> Anyone have any thoughts on the best IDE for SQL Server development? I'd
> like object explorer, drag-and-drop, code formatting, code completion,
> code analysis, execution, query builder, the WORKS! I've tried RapidSQL,
> having gotten excited by their online demo, but I have not found it
> well-behaved. It completely crashed no less than 5 times in as many
> minutes. D'oh!
> TIA!
> Robert
>|||Hi
You may want to look through the Readers choices in the September SQL Server
Magazine
http://www.windowsitpro.com/SQLServ.../786/Index.html
John
"Robert Davis" <radbase@.yahoo.com> wrote in message
news:eP5T2VIwFHA.724@.TK2MSFTNGP14.phx.gbl...
> Anyone have any thoughts on the best IDE for SQL Server development? I'd
> like object explorer, drag-and-drop, code formatting, code completion,
> code analysis, execution, query builder, the WORKS! I've tried RapidSQL,
> having gotten excited by their online demo, but I have not found it
> well-behaved. It completely crashed no less than 5 times in as many
> minutes. D'oh!
> TIA!
> Robert
>|||I've had good success with ApexSQLEdit - has intellsense, execution
plans [not as graphic as SQL Q/A - i.e., in a flowchart view, but not
with icons for ss, scans, etc., but has the text to explain it], and
integration with source control.
www.apexsql.com
They have other good tools as well.
Robert Davis wrote:

>Anyone have any thoughts on the best IDE for SQL Server development? I'd
>like object explorer, drag-and-drop, code formatting, code completion, code
>analysis, execution, query builder, the WORKS! I've tried RapidSQL, having
>gotten excited by their online demo, but I have not found it well-behaved.
>It completely crashed no less than 5 times in as many minutes. D'oh!
>TIA!
>Robert
>
>|||Here is another query builder to keep an eye on:
http://151.100.3.84/technicalpreview/
Pamela

Best SAN configuration for partitioned table

Hi,
I wondered if anyone had any thoughts on the best way to go about
splitting up a san array for a 500gb fact table. Its going to need to
be partitioned to allow for the overnight processing to complete on
time but what is the best way to split the san array for it.
I have about 10-13 san disks available for the table which leaves me
enough space on the other disks for the other database objects and
tempdb, logs etc.
The table will be partitioned into 13 logical weeks but would it be
best to allocate one disk per partition or have a 13 disk raid group
and put all 13 partitions on that and have it striped?
Any thoughts?
Thanks
Ian.
I am *far* from an expert, but here's my thoughts...
For having each partition on a separate disk or spindle, the question is are
you going to be using that partitioned table in parallel? Meaning, are you
going to be accessing or modifying several, if not all 13 weeks, at the same
time? If so, then it would make sense (depending on the processing power of
the SAN - EMC's DMX would be able to handle this all in parallel) to
separate the partitions into distinct spindles.
If you're not sure about parallel operations, then I would probably through
them into a JBOD (Just a Bunch Of Disks) set up like your second
alternative.
Here's my school of thought on this: You separate your TEMPDB (Very
important in SQL Server 2005), your LOGS, your INDEXES, and your TABLES onto
separate sets of spindles. That's usually a good starting point for an EMC
type of SAN. From there, you would then get into further tuning to see if
you would receive any additional benefit of separating objects (Tables,
Partitions, Indexes, etc) onto distinct spindles... remember you're
increasing management of the disk system when you do that, so it's helpful
to determine the benefit, if any.
FYI, I originated from the Oracle School of Thought, so it might clash with
some SQL Server admins ;)
JASON
"ianwr" <ianwrigglesworth@.yahoo.co.uk> wrote in message
news:1186730967.614276.138730@.z24g2000prh.googlegr oups.com...
> Hi,
> I wondered if anyone had any thoughts on the best way to go about
> splitting up a san array for a 500gb fact table. Its going to need to
> be partitioned to allow for the overnight processing to complete on
> time but what is the best way to split the san array for it.
> I have about 10-13 san disks available for the table which leaves me
> enough space on the other disks for the other database objects and
> tempdb, logs etc.
> The table will be partitioned into 13 logical weeks but would it be
> best to allocate one disk per partition or have a 13 disk raid group
> and put all 13 partitions on that and have it striped?
> Any thoughts?
> Thanks
> Ian.
>
|||Hi Ian,
thanks for your question around distributing data across spindles on a SAN.
Based on our experiences with many SQL 2005 data warehousing customers, I
would encourage you to distribute your data across all spindles. This should
give you good IO parallelism in case your query touches only one partition,
and it should also give you similarly high parallelism for a query that
touches many partitions. Depending on how large your data in a single
partition is and how many cores you have in your system, you want to avoid
cases where you exercise only one spindle and have many cores idle waiting
for data from the IO subsystem.
As Jason pointed out, it certainly makes sense to keep log and tempdb on
separate sets of spindles. You might want to experiment with index and data
on the same set of spindles. Usually, this already works sufficiently well as
compared to separate index and data. AS always, further tuning may be needed
depending on the characteristics of your workload.
Hope this makes sense and helps you with your SAN configuration.
Best regards,
Torsten Grabs
Program Manager
Microsoft SQL Server Query Processor
"Jason Fay" wrote:

> I am *far* from an expert, but here's my thoughts...
> For having each partition on a separate disk or spindle, the question is are
> you going to be using that partitioned table in parallel? Meaning, are you
> going to be accessing or modifying several, if not all 13 weeks, at the same
> time? If so, then it would make sense (depending on the processing power of
> the SAN - EMC's DMX would be able to handle this all in parallel) to
> separate the partitions into distinct spindles.
> If you're not sure about parallel operations, then I would probably through
> them into a JBOD (Just a Bunch Of Disks) set up like your second
> alternative.
> Here's my school of thought on this: You separate your TEMPDB (Very
> important in SQL Server 2005), your LOGS, your INDEXES, and your TABLES onto
> separate sets of spindles. That's usually a good starting point for an EMC
> type of SAN. From there, you would then get into further tuning to see if
> you would receive any additional benefit of separating objects (Tables,
> Partitions, Indexes, etc) onto distinct spindles... remember you're
> increasing management of the disk system when you do that, so it's helpful
> to determine the benefit, if any.
> FYI, I originated from the Oracle School of Thought, so it might clash with
> some SQL Server admins ;)
> --
> JASON
>
> "ianwr" <ianwrigglesworth@.yahoo.co.uk> wrote in message
> news:1186730967.614276.138730@.z24g2000prh.googlegr oups.com...
>
>

Best SAN configuration for partitioned table

Hi,
I wondered if anyone had any thoughts on the best way to go about
splitting up a san array for a 500gb fact table. Its going to need to
be partitioned to allow for the overnight processing to complete on
time but what is the best way to split the san array for it.
I have about 10-13 san disks available for the table which leaves me
enough space on the other disks for the other database objects and
tempdb, logs etc.
The table will be partitioned into 13 logical weeks but would it be
best to allocate one disk per partition or have a 13 disk raid group
and put all 13 partitions on that and have it striped?
Any thoughts?
Thanks
Ian.I am *far* from an expert, but here's my thoughts...
For having each partition on a separate disk or spindle, the question is are
you going to be using that partitioned table in parallel? Meaning, are you
going to be accessing or modifying several, if not all 13 weeks, at the same
time? If so, then it would make sense (depending on the processing power of
the SAN - EMC's DMX would be able to handle this all in parallel) to
separate the partitions into distinct spindles.
If you're not sure about parallel operations, then I would probably through
them into a JBOD (Just a Bunch Of Disks) set up like your second
alternative.
Here's my school of thought on this: You separate your TEMPDB (Very
important in SQL Server 2005), your LOGS, your INDEXES, and your TABLES onto
separate sets of spindles. That's usually a good starting point for an EMC
type of SAN. From there, you would then get into further tuning to see if
you would receive any additional benefit of separating objects (Tables,
Partitions, Indexes, etc) onto distinct spindles... remember you're
increasing management of the disk system when you do that, so it's helpful
to determine the benefit, if any.
FYI, I originated from the Oracle School of Thought, so it might clash with
some SQL Server admins ;)
JASON
"ianwr" <ianwrigglesworth@.yahoo.co.uk> wrote in message
news:1186730967.614276.138730@.z24g2000prh.googlegroups.com...
> Hi,
> I wondered if anyone had any thoughts on the best way to go about
> splitting up a san array for a 500gb fact table. Its going to need to
> be partitioned to allow for the overnight processing to complete on
> time but what is the best way to split the san array for it.
> I have about 10-13 san disks available for the table which leaves me
> enough space on the other disks for the other database objects and
> tempdb, logs etc.
> The table will be partitioned into 13 logical weeks but would it be
> best to allocate one disk per partition or have a 13 disk raid group
> and put all 13 partitions on that and have it striped?
> Any thoughts?
> Thanks
> Ian.
>|||Hi Ian,
thanks for your question around distributing data across spindles on a SAN.
Based on our experiences with many SQL 2005 data warehousing customers, I
would encourage you to distribute your data across all spindles. This should
give you good IO parallelism in case your query touches only one partition,
and it should also give you similarly high parallelism for a query that
touches many partitions. Depending on how large your data in a single
partition is and how many cores you have in your system, you want to avoid
cases where you exercise only one spindle and have many cores idle waiting
for data from the IO subsystem.
As Jason pointed out, it certainly makes sense to keep log and tempdb on
separate sets of spindles. You might want to experiment with index and data
on the same set of spindles. Usually, this already works sufficiently well a
s
compared to separate index and data. AS always, further tuning may be needed
depending on the characteristics of your workload.
Hope this makes sense and helps you with your SAN configuration.
Best regards,
Torsten Grabs
Program Manager
Microsoft SQL Server Query Processor
"Jason Fay" wrote:

> I am *far* from an expert, but here's my thoughts...
> For having each partition on a separate disk or spindle, the question is a
re
> you going to be using that partitioned table in parallel? Meaning, are yo
u
> going to be accessing or modifying several, if not all 13 weeks, at the sa
me
> time? If so, then it would make sense (depending on the processing power
of
> the SAN - EMC's DMX would be able to handle this all in parallel) to
> separate the partitions into distinct spindles.
> If you're not sure about parallel operations, then I would probably throug
h
> them into a JBOD (Just a Bunch Of Disks) set up like your second
> alternative.
> Here's my school of thought on this: You separate your TEMPDB (Very
> important in SQL Server 2005), your LOGS, your INDEXES, and your TABLES on
to
> separate sets of spindles. That's usually a good starting point for an EM
C
> type of SAN. From there, you would then get into further tuning to see if
> you would receive any additional benefit of separating objects (Tables,
> Partitions, Indexes, etc) onto distinct spindles... remember you're
> increasing management of the disk system when you do that, so it's helpful
> to determine the benefit, if any.
> FYI, I originated from the Oracle School of Thought, so it might clash wi
th
> some SQL Server admins ;)
> --
> JASON
>
> "ianwr" <ianwrigglesworth@.yahoo.co.uk> wrote in message
> news:1186730967.614276.138730@.z24g2000prh.googlegroups.com...
>
>sql

Best SAN configuration for partitioned table

Hi,
I wondered if anyone had any thoughts on the best way to go about
splitting up a san array for a 500gb fact table. Its going to need to
be partitioned to allow for the overnight processing to complete on
time but what is the best way to split the san array for it.
I have about 10-13 san disks available for the table which leaves me
enough space on the other disks for the other database objects and
tempdb, logs etc.
The table will be partitioned into 13 logical weeks but would it be
best to allocate one disk per partition or have a 13 disk raid group
and put all 13 partitions on that and have it striped?
Any thoughts?
Thanks
Ian.I am *far* from an expert, but here's my thoughts...
For having each partition on a separate disk or spindle, the question is are
you going to be using that partitioned table in parallel? Meaning, are you
going to be accessing or modifying several, if not all 13 weeks, at the same
time? If so, then it would make sense (depending on the processing power of
the SAN - EMC's DMX would be able to handle this all in parallel) to
separate the partitions into distinct spindles.
If you're not sure about parallel operations, then I would probably through
them into a JBOD (Just a Bunch Of Disks) set up like your second
alternative.
Here's my school of thought on this: You separate your TEMPDB (Very
important in SQL Server 2005), your LOGS, your INDEXES, and your TABLES onto
separate sets of spindles. That's usually a good starting point for an EMC
type of SAN. From there, you would then get into further tuning to see if
you would receive any additional benefit of separating objects (Tables,
Partitions, Indexes, etc) onto distinct spindles... remember you're
increasing management of the disk system when you do that, so it's helpful
to determine the benefit, if any.
FYI, I originated from the Oracle School of Thought, so it might clash with
some SQL Server admins ;)
--
JASON
"ianwr" <ianwrigglesworth@.yahoo.co.uk> wrote in message
news:1186730967.614276.138730@.z24g2000prh.googlegroups.com...
> Hi,
> I wondered if anyone had any thoughts on the best way to go about
> splitting up a san array for a 500gb fact table. Its going to need to
> be partitioned to allow for the overnight processing to complete on
> time but what is the best way to split the san array for it.
> I have about 10-13 san disks available for the table which leaves me
> enough space on the other disks for the other database objects and
> tempdb, logs etc.
> The table will be partitioned into 13 logical weeks but would it be
> best to allocate one disk per partition or have a 13 disk raid group
> and put all 13 partitions on that and have it striped?
> Any thoughts?
> Thanks
> Ian.
>|||Hi Ian,
thanks for your question around distributing data across spindles on a SAN.
Based on our experiences with many SQL 2005 data warehousing customers, I
would encourage you to distribute your data across all spindles. This should
give you good IO parallelism in case your query touches only one partition,
and it should also give you similarly high parallelism for a query that
touches many partitions. Depending on how large your data in a single
partition is and how many cores you have in your system, you want to avoid
cases where you exercise only one spindle and have many cores idle waiting
for data from the IO subsystem.
As Jason pointed out, it certainly makes sense to keep log and tempdb on
separate sets of spindles. You might want to experiment with index and data
on the same set of spindles. Usually, this already works sufficiently well as
compared to separate index and data. AS always, further tuning may be needed
depending on the characteristics of your workload.
Hope this makes sense and helps you with your SAN configuration.
Best regards,
Torsten Grabs
Program Manager
Microsoft SQL Server Query Processor
"Jason Fay" wrote:
> I am *far* from an expert, but here's my thoughts...
> For having each partition on a separate disk or spindle, the question is are
> you going to be using that partitioned table in parallel? Meaning, are you
> going to be accessing or modifying several, if not all 13 weeks, at the same
> time? If so, then it would make sense (depending on the processing power of
> the SAN - EMC's DMX would be able to handle this all in parallel) to
> separate the partitions into distinct spindles.
> If you're not sure about parallel operations, then I would probably through
> them into a JBOD (Just a Bunch Of Disks) set up like your second
> alternative.
> Here's my school of thought on this: You separate your TEMPDB (Very
> important in SQL Server 2005), your LOGS, your INDEXES, and your TABLES onto
> separate sets of spindles. That's usually a good starting point for an EMC
> type of SAN. From there, you would then get into further tuning to see if
> you would receive any additional benefit of separating objects (Tables,
> Partitions, Indexes, etc) onto distinct spindles... remember you're
> increasing management of the disk system when you do that, so it's helpful
> to determine the benefit, if any.
> FYI, I originated from the Oracle School of Thought, so it might clash with
> some SQL Server admins ;)
> --
> JASON
>
> "ianwr" <ianwrigglesworth@.yahoo.co.uk> wrote in message
> news:1186730967.614276.138730@.z24g2000prh.googlegroups.com...
> > Hi,
> >
> > I wondered if anyone had any thoughts on the best way to go about
> > splitting up a san array for a 500gb fact table. Its going to need to
> > be partitioned to allow for the overnight processing to complete on
> > time but what is the best way to split the san array for it.
> >
> > I have about 10-13 san disks available for the table which leaves me
> > enough space on the other disks for the other database objects and
> > tempdb, logs etc.
> >
> > The table will be partitioned into 13 logical weeks but would it be
> > best to allocate one disk per partition or have a 13 disk raid group
> > and put all 13 partitions on that and have it striped?
> >
> > Any thoughts?
> >
> > Thanks
> >
> > Ian.
> >
>
>

Monday, March 19, 2012

Best Practice, Stored Procedures & Datasets

Hi,
Would be interested to hear your thoughts on whether is best to minimise or
maximise the use of stored procedures in SQL Reporting. Is it best to have
as much as possible coming from Stored Procedures or is it better to have as
little as possible. At this stage I am more concerned about management
rather than performance. However I'd love to hear arguments from all sides !
Cheers,
JayI prefer to use stored procedures as much as possible. There are two causes
for this: I think it's more efficient to let SQL Server handle the
processing needed to return the correct dataset and just use RS to do the
formatting, and it's usually easier to change the stored procedures later
than your report. But if you know you won't be able to access the SQL server
later, you should put the queries and logic in your RS report.
Kaisa M. Lindahl
"Jay Sanderson" <jay@.REMOVEMEacttiv.com> wrote in message
news:evEMmciJGHA.604@.TK2MSFTNGP14.phx.gbl...
> Hi,
> Would be interested to hear your thoughts on whether is best to minimise
> or maximise the use of stored procedures in SQL Reporting. Is it best to
> have as much as possible coming from Stored Procedures or is it better to
> have as little as possible. At this stage I am more concerned about
> management rather than performance. However I'd love to hear arguments
> from all sides !
> Cheers,
> Jay
>|||Yes, I would recommend using Stored Procedures as much as possible as you can
usually reuse them in other reports. I find it particualarly useful to use
SPs for returning default paramaters as they are usually the same across most
reports I build and if you need to change any logic you just do it once and
you dont even need to republish the report or anything.
If you store the sql query in the report itself this will become
unmanageable as the amount of reports increase. Any small business logic
change will mean you have to trawl through all your reports to edit your
queries and then republsh the reports. SPs will save you all this hassle.
"Kaisa M. Lindahl" wrote:
> I prefer to use stored procedures as much as possible. There are two causes
> for this: I think it's more efficient to let SQL Server handle the
> processing needed to return the correct dataset and just use RS to do the
> formatting, and it's usually easier to change the stored procedures later
> than your report. But if you know you won't be able to access the SQL server
> later, you should put the queries and logic in your RS report.
> Kaisa M. Lindahl
> "Jay Sanderson" <jay@.REMOVEMEacttiv.com> wrote in message
> news:evEMmciJGHA.604@.TK2MSFTNGP14.phx.gbl...
> > Hi,
> >
> > Would be interested to hear your thoughts on whether is best to minimise
> > or maximise the use of stored procedures in SQL Reporting. Is it best to
> > have as much as possible coming from Stored Procedures or is it better to
> > have as little as possible. At this stage I am more concerned about
> > management rather than performance. However I'd love to hear arguments
> > from all sides !
> >
> > Cheers,
> >
> > Jay
> >
>
>

Saturday, February 25, 2012

Best Data Type for a Tracking ID?

Hello,
We're in the table design process and could really use outside thoughts on
our options.
The table in question will grow to several million records. This table will
update and insert depending on what the tracking ID does as it lives out it
day long life span.
Example Stored Proc:
UPDATE tbTracking
SET myCount = myCount +1
WHERE trackingID = @.trackingID AND productID = @.productID
IF (@.@.rowcount = 0)
BEGIN
INSERT INTO tbTracking (...) VALUES (...) ;
END
Everyday at midnight the table will be truncate just afer many other queries
generated reports and store cumulative data into other tables.
Our goal is extreme performance, with that in mind what is the best data
type for trackingID and productID? We have the luxury to make them whatever
is best, as this is a completely new project.
Internally debated NVARCHR(50), INT, Binary, UNIQUEIDENTIFIER and the like,
but we're in need of advise from those with more experience with these
matters.
Thank you for your time.
MarkMark S. wrote:

> Hello,
> We're in the table design process and could really use outside thoughts on
> our options.
> The table in question will grow to several million records. This table wil
l
> update and insert depending on what the tracking ID does as it lives out i
t
> day long life span.
> Example Stored Proc:
> UPDATE tbTracking
> SET myCount = myCount +1
> WHERE trackingID = @.trackingID AND productID = @.productID
> IF (@.@.rowcount = 0)
> BEGIN
> INSERT INTO tbTracking (...) VALUES (...) ;
> END
> Everyday at midnight the table will be truncate just afer many other queri
es
> generated reports and store cumulative data into other tables.
> Our goal is extreme performance, with that in mind what is the best data
> type for trackingID and productID? We have the luxury to make them whateve
r
> is best, as this is a completely new project.
> Internally debated NVARCHR(50), INT, Binary, UNIQUEIDENTIFIER and the like
,
> but we're in need of advise from those with more experience with these
> matters.
> Thank you for your time.
> Mark
This doesn't make much sense to me. As described it isn't clear whether
trackingid is a surrogate key or not. If it isn't then I don't
understand what its purpose is. What is the business meaning of
trackingid? Why isn't the datatype predetermined? What key or keys
exist in this table?
Maybe we are talking surrogate keys here, in which case see:
http://www.aspfaq.com/show.asp?id=2504

> Everyday at midnight the table will be truncate just afer many other queri
es
> generated reports and store cumulative data into other tables.
A guiding in principle in data warehouse applications is usually to
capture the data at the finest possible level of granularity. I expect
you've considered this, but I just thought it worth restating for the
benefit of all.
David Portas
SQL Server MVP
--|||Who or what determines the value of TrackingID?
If this key is to uniquely identify an event (such as a support call), then
perhaps (8 byte) datetime would be the logical choice. It has inherent
meaning, and unless you have hundreds of support calls coming in per minute,
it is unlikely that this key value would be duplicated, and retry logic on
the insert procedure or statement can handle the unlikely event that does.
There may even be a need to purge the tbTracking table based on < a specific
date/time rather than truncating the entire table at the end of the day.
As for ProductID, this should be an existing attribute in your database.
"Mark S." <marks@.yahoo.com> wrote in message
news:Ow6CKQ9CGHA.4004@.tk2msftngp13.phx.gbl...
> Hello,
> We're in the table design process and could really use outside thoughts on
> our options.
> The table in question will grow to several million records. This table
> will update and insert depending on what the tracking ID does as it lives
> out it day long life span.
> Example Stored Proc:
> UPDATE tbTracking
> SET myCount = myCount +1
> WHERE trackingID = @.trackingID AND productID = @.productID
> IF (@.@.rowcount = 0)
> BEGIN
> INSERT INTO tbTracking (...) VALUES (...) ;
> END
> Everyday at midnight the table will be truncate just afer many other
> queries generated reports and store cumulative data into other tables.
> Our goal is extreme performance, with that in mind what is the best data
> type for trackingID and productID? We have the luxury to make them
> whatever is best, as this is a completely new project.
> Internally debated NVARCHR(50), INT, Binary, UNIQUEIDENTIFIER and the
> like, but we're in need of advise from those with more experience with
> these matters.
> Thank you for your time.
> Mark
>|||In my experiences, int data type is the best performer especially when it
comes to indexing, joins, etc. (This comment relates to 32-bit O/S and SQL
Server 2000.)
I would make a meaningless key:
TrackingRowId int, identity, primary key, clustered
ProductRowId int, identity, primary key, clustered
SQL Server takes care of assigning the next number. The clustered index
will always insert data at the end of the data pages which avoids page split
s.
If you want to reset the tracking id each midnight, simply change the reseed
the tables as part of your processing.
Personally...
I would NOT use a character string for a key because its slower to compare
strings.
I would NOT use a UNIQUEIDENTIFIER for a key because it is a pain in the
butt when you have to write a manual query to pull data.
Finally, I have to reiterate David's point of capturing data at the lowest
level. Are you sure you want to summarize and truncate on a nightly basis.
What if a bug is discovered? You won't have the original data to recreate
the totals. What if a new way of summarizing is added? You won't have the
original data to drill into or summarize.
And, to reiterate JT's point of having a datetime column that can be very
handy. It is debateable whether to use it as the key or not. If you're
truncating the tables each night, it may be a waste.
Just my two cents,
Joe
"Mark S." wrote:

> Hello,
> We're in the table design process and could really use outside thoughts on
> our options.
> The table in question will grow to several million records. This table wil
l
> update and insert depending on what the tracking ID does as it lives out i
t
> day long life span.
> Example Stored Proc:
> UPDATE tbTracking
> SET myCount = myCount +1
> WHERE trackingID = @.trackingID AND productID = @.productID
> IF (@.@.rowcount = 0)
> BEGIN
> INSERT INTO tbTracking (...) VALUES (...) ;
> END
> Everyday at midnight the table will be truncate just afer many other queri
es
> generated reports and store cumulative data into other tables.
> Our goal is extreme performance, with that in mind what is the best data
> type for trackingID and productID? We have the luxury to make them whateve
r
> is best, as this is a completely new project.
> Internally debated NVARCHR(50), INT, Binary, UNIQUEIDENTIFIER and the like
,
> but we're in need of advise from those with more experience with these
> matters.
> Thank you for your time.
> Mark
>
>|||Gentleman thank you for your feedback.
Joe, other than the UniqueIndentifier being unfriendly for humans, do you
find it faster than INTs?
As to th other questions and caveats, all that has been considered
previously, and wasn't mentioned in my question in the interests of brevity
and an attempt to focus my question on a single point. Regardless, thank you
for your full consideration.
Cheers,
Mark
"Joe from WI" <JoefromWI@.discussions.microsoft.com> wrote in message
news:50A0324B-1446-4044-A850-706B59A04169@.microsoft.com...
> In my experiences, int data type is the best performer especially when it
> comes to indexing, joins, etc. (This comment relates to 32-bit O/S and
> SQL
> Server 2000.)
> I would make a meaningless key:
> TrackingRowId int, identity, primary key, clustered
> ProductRowId int, identity, primary key, clustered
> SQL Server takes care of assigning the next number. The clustered index
> will always insert data at the end of the data pages which avoids page
> splits.
> If you want to reset the tracking id each midnight, simply change the
> reseed
> the tables as part of your processing.
> Personally...
> I would NOT use a character string for a key because its slower to compare
> strings.
> I would NOT use a UNIQUEIDENTIFIER for a key because it is a pain in the
> butt when you have to write a manual query to pull data.
> Finally, I have to reiterate David's point of capturing data at the lowest
> level. Are you sure you want to summarize and truncate on a nightly
> basis.
> What if a bug is discovered? You won't have the original data to recreate
> the totals. What if a new way of summarizing is added? You won't have
> the
> original data to drill into or summarize.
> And, to reiterate JT's point of having a datetime column that can be very
> handy. It is debateable whether to use it as the key or not. If you're
> truncating the tables each night, it may be a waste.
> Just my two cents,
> Joe
> "Mark S." wrote:
>|||The last line of the article David suggested
http://www.aspfaq.com/show.asp?id=2504
Says alot about GUI() not being optimized, if anyone disagrees, feel free to
speak up:
"the wider datatype leads to a drop in index performance (if clustered, each
insert almost guaranteed to 'dirty' a different page), and an increase in
storage requirements; " and five other cons.
Thank you.|||To maximize INSERT performance: use INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED
WITH FILLFACTOR = 100
This causes every new row to be added at the end of the table--minimizing
index maintenance and eliminating page splits. Use caution, however: since
SQL Server doesn't automatically reorganize indexes, a high volume of DELETE
activity will cause the index to become sparse, which can reduce SELECT
performance, but it will not affect INSERT performance.
The INT datatype matches the word size of most Intel processors (32-bit), so
comparisons require fewer CPU cycles.
The stored procedure is an example of what not to do. It's a recipie for
primary key constraint violations. There's nothing that prevents two
transactions from trying to INSERT the same information at the same time.
Most of the time what will happen is that one connection will succeed with
the INSERT and the other will UPDATE the newly inserted row, but a collision
will occur if identical UPDATE statements occur simultaneously on two
unbound connections followed by (since the row doesn't yet exist)
simultaneous identical INSERT statements.
The correct method is to use something like this:
BEGIN TRAN
IF EXISTS (SELECT WITH(UPDLOCK, HOLDLOCK))
UPDATE
ELSE
INSERT
COMMIT
Some people use the following instead, but I prefer the above method since
it is easier to read and understand
BEGIN TRAN
INSERT...SELECT...WHERE NOT EXISTS(SELECT WITH(UPDLOCK, HOLDLOCK))
IF @.@.ROWCOUNT = 0
UPDATE
COMMIT
Note that there isn't any marked reduction in performance or concurrency
between this and your sample, because UPDLOCK doesn't block SELECTs, and any
blocking that does occur is necessary to maintain integrity. Without the
EXISTS clause, the INSERT or UPDATE will be applying an exclusive lock
anyway which involves reading the index page into memory. With the EXISTS
clause, the SELECT reads the index page and applies an update lock, and the
INSERT or UPDATE simply transition from an update lock to an exclusive lock
in memory--no additional physical read is necessary.
"Mark S." <marks@.yahoo.com> wrote in message
news:Ow6CKQ9CGHA.4004@.tk2msftngp13.phx.gbl...
> Hello,
> We're in the table design process and could really use outside thoughts on
> our options.
> The table in question will grow to several million records. This table
> will update and insert depending on what the tracking ID does as it lives
> out it day long life span.
> Example Stored Proc:
> UPDATE tbTracking
> SET myCount = myCount +1
> WHERE trackingID = @.trackingID AND productID = @.productID
> IF (@.@.rowcount = 0)
> BEGIN
> INSERT INTO tbTracking (...) VALUES (...) ;
> END
> Everyday at midnight the table will be truncate just afer many other
> queries generated reports and store cumulative data into other tables.
> Our goal is extreme performance, with that in mind what is the best data
> type for trackingID and productID? We have the luxury to make them
> whatever is best, as this is a completely new project.
> Internally debated NVARCHR(50), INT, Binary, UNIQUEIDENTIFIER and the
> like, but we're in need of advise from those with more experience with
> these matters.
> Thank you for your time.
> Mark
>|||Thank you very much.
"Brian Selzer" <brian@.selzer-software.com> wrote in message
news:%23oG2PTDDGHA.3980@.TK2MSFTNGP12.phx.gbl...
> To maximize INSERT performance: use INT IDENTITY(1, 1) PRIMARY KEY
> CLUSTERED WITH FILLFACTOR = 100
> This causes every new row to be added at the end of the table--minimizing
> index maintenance and eliminating page splits. Use caution, however:
> since SQL Server doesn't automatically reorganize indexes, a high volume
> of DELETE activity will cause the index to become sparse, which can reduce
> SELECT performance, but it will not affect INSERT performance.
> The INT datatype matches the word size of most Intel processors (32-bit),
> so comparisons require fewer CPU cycles.
> The stored procedure is an example of what not to do. It's a recipie for
> primary key constraint violations. There's nothing that prevents two
> transactions from trying to INSERT the same information at the same time.
> Most of the time what will happen is that one connection will succeed with
> the INSERT and the other will UPDATE the newly inserted row, but a
> collision will occur if identical UPDATE statements occur simultaneously
> on two unbound connections followed by (since the row doesn't yet exist)
> simultaneous identical INSERT statements.
> The correct method is to use something like this:
> BEGIN TRAN
> IF EXISTS (SELECT WITH(UPDLOCK, HOLDLOCK))
> UPDATE
> ELSE
> INSERT
> COMMIT
> Some people use the following instead, but I prefer the above method since
> it is easier to read and understand
> BEGIN TRAN
> INSERT...SELECT...WHERE NOT EXISTS(SELECT WITH(UPDLOCK, HOLDLOCK))
> IF @.@.ROWCOUNT = 0
> UPDATE
> COMMIT
> Note that there isn't any marked reduction in performance or concurrency
> between this and your sample, because UPDLOCK doesn't block SELECTs, and
> any blocking that does occur is necessary to maintain integrity. Without
> the EXISTS clause, the INSERT or UPDATE will be applying an exclusive lock
> anyway which involves reading the index page into memory. With the EXISTS
> clause, the SELECT reads the index page and applies an update lock, and
> the INSERT or UPDATE simply transition from an update lock to an exclusive
> lock in memory--no additional physical read is necessary.
> "Mark S." <marks@.yahoo.com> wrote in message
> news:Ow6CKQ9CGHA.4004@.tk2msftngp13.phx.gbl...
>