Thursday, March 22, 2012

Best practices: GROUP BY clause

I was wondering what the best way to write a GROUP BY clause when there are many (and time consuming) operations in the fields by grouped.

Fictious example:

SELECT DeptNo, AVG(Salary) FROM Department GROUP BY DeptNo;

This will give me the average salary per department. Let's say, however that
I had 10-15 fields being returned (along with the AVG(Salary)) and some fields even had operations being performed on them. Is it better to create a temporary table to calculate the sum per department (or a VIEW) and then
perform a JOIN with the rest of the data?

Fictious example:

SELECT DATENAME(y, StartDate), DATENAME(m, StartDate), DATEPART(d, StartDate), SUBSTR(DeptName, 1, 10), SomeFunction(SomeField), SomeFunction(SomeField), AVG(Salary)
GROUP BY DATENAME(y, StartDate), DATENAME(m, StartDate), DATEPART(d, StartDate), SUBSTR(DeptName, 1, 10), SomeFunction(SomeField), SomeFunction(SomeField);

Am I better off writing my query this way or using a JOIN on some temporary table or view?

ThanksWrite your query this way. One way to maximize the efficiency of a process is to reduce the number of times the server has to scan through the data. By putting all your aggregate functions in a single statement, the server only needs to run through the dataset one time.

But are all those datename and datepart functions necessary? That seems kind of wastefull. You could accomplish the same thing just by sorting by date.|||Bindman...That was just an example I made up. I am not asking this for a particular case right now, but I have in the past had queries that had many fields, and many of those fields had math/string/etc functions performed on them. Most of the time This was to provide formatting for a query that would be dumped into a report or out put to the user. For example, I might format an ID by Left padding with zeroes:
RIGHT(REPLICATE(MyPadChar, MyFieldWidth) + CAST(MyID AS VARCHAR), MyFieldWidth) AS [MyFormattedID]......

so all those fields would appear in my group by clause....I was wondering if this was a good practice.

Thanks|||It's acceptable in my opinion.

Anybody else want to comment on this?|||SELECT * FROM (
SELECT DATENAME(y, StartDate) AS Col1
, DATENAME(m, StartDate) AS Col2
, DATEPART(d, StartDate) AS Col3
, SUBSTRING(DeptName, 1, 10) AS Col4
, SomeFunction(SomeField) AS Col5
, SomeFunction(SomeField) AS Col6
, AVG(Salary) AS Col7
FROM myTable99) AS XXX
GROUP BY Col1, Col2, Col3, Col4, Col5, Col6|||Yeah, I thought about suggesting that. I've used it for clarity of coding before, but can you think of any reason it might or might not be more efficient? I guess the question is, when you include a formula in the output and also specify it in the GROUP BY clause, does the server calculate the formula twice, or is it smart enough to just calculate it once?|||Kaiser. Thanks for the hint...that'll clean things up a whole lot...
As for blindman, yeah, I'd like to know if evaluation takes place twice when the quesy is run.|||I guess the question is, when you include a formula in the output and also specify it in the GROUP BY clause, does the server calculate the formula twice, or is it smart enough to just calculate it once?

It had better be once, since it's a derived table...I never checked, but a SHOWPLAN should tell you what's up.

But again, this is M$, so you never know...

But since I'm a betting kinda guy...

$US1000.00 on Once to Win....|||Kaiser. Thanks for the hint....

Your welcome....Kaiser? Try Brett, x002548, or Ski (get it...)

Where are you in the world?|||Then would you agree that your subquery example would not be more efficient than coding formulas in the WHERE clause, though it scores points for clarity?|||No...would you agree that using formulas would cause non sargable predicates there by invalidating the use of any index?|||Yes, but the use of indexes is lost anyway when you filter on the results of forumulas in the subquery. I don't see how either of these methods would make efficient use of indexes.|||In the subquery, Query Analyser still complains about the subquery you mentioned. If it sees an aggregate function with other fields, you have to GROUP BY the other fields.
Sorry, I read it too fast the first time. I thought I saw:

SELECT ClientID AS Col1, OrderID AS Col2, SUM(Price) AS Col3
GROUP BY Col1, Col2;

I did not see the subquery and thought that what you gave me was similiar to
the ORDER BY 1, 2, 3... clause (instead of using actual column names)|||scores points for clarity? are you talking about post #5?

yes, assuming all the errors were fixed up!|||Yeah, I meant post #5. (the thread is getting a little long).

Um, the subquery won't run because of the aggregate missing a GROUP BY clause.

I also couldn't get the group clause to work on a column alias...|||select col1, col2, col3, col4, col5, col6, col7
from (
select datename(y, startdate) as col1
, datename(m, startdate) as col2
, datepart(d, startdate) as col3
, substring(deptname, 1, 10) as col4
, somefunction(somefield) as col5
, somefunction(somefield) as col6
, avg(salary) as col7
from mytable99
group
by datename(y, startdate)
, datename(m, startdate)
, datepart(d, startdate)
, substring(deptname, 1, 10)
, somefunction(somefield)
, somefunction(somefield)
) as xxx

No comments:

Post a Comment