Friday, February 24, 2012

Best Approach to Refreshing a Large Table

I have a large table that needs to be refreshed on a regular basis. This
data is read-only; it will not be modified. I import the refreshed data in
its entirety (the complete data set is loaded into a fresh table).
I can think of 3 ways to do the refresh but I'm not sure how to evaluate
which option is best:
1. Import the refreshed table.
   Drop the original table.
   Rename the refreshed table to the original table name.
2. Import the refreshed table.
   Truncate the old table.
   Insert all the records from the refresh table into the original table.
3. Import the refreshed table.
   Delete all records in the original table that do not exist in the refresh table.
   Insert all records in the refresh table that do not exist in the original table.
   Update all records in the original table that are not the same as those in the refresh table.
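To be concrete, here is a rough T-SQL sketch of what I mean by each option
(dbo.MyTable, dbo.MyTable_Refresh, ID and Col1 are made-up placeholder names,
not my real schema, and the column list is abbreviated):

-- Option 1: drop and rename
DROP TABLE dbo.MyTable;
EXEC sp_rename 'dbo.MyTable_Refresh', 'MyTable';

-- Option 2: truncate and reload
TRUNCATE TABLE dbo.MyTable;
INSERT INTO dbo.MyTable SELECT * FROM dbo.MyTable_Refresh;

-- Option 3: apply only the differences, keyed on ID
DELETE t
FROM dbo.MyTable AS t
WHERE NOT EXISTS (SELECT 1 FROM dbo.MyTable_Refresh AS r WHERE r.ID = t.ID);

INSERT INTO dbo.MyTable (ID, Col1)
SELECT r.ID, r.Col1
FROM dbo.MyTable_Refresh AS r
WHERE NOT EXISTS (SELECT 1 FROM dbo.MyTable AS t WHERE t.ID = r.ID);

UPDATE t
SET t.Col1 = r.Col1
FROM dbo.MyTable AS t
JOIN dbo.MyTable_Refresh AS r ON r.ID = t.ID
WHERE t.Col1 <> r.Col1;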
Method 1 might lead to "invalid object" errors if users attempt to access
the table during the refresh, so I do not think this is a good choice.
I think methods 2 and 3 need to be evaluated based on the locks they use and
the time they take to execute.
What would happen if a user attempts to access the table during the TRUNCATE
and INSERT? Will it be locked from any SELECTs until after the TRUNCATE?
After the INSERT completes?
What about option 3? What kind of access will a user have during the three
data-modification actions (INSERT, DELETE, and UPDATE)?
As I mentioned, the data is read-only and I need to try to maintain maximum
accessibility for the users.
Any comments are appreciated.
Dave

Hi,
If you need to provide maximum accessibility to users, then go for option 3.
Ensure that whenever you do DML, you do it at the row level.
That way only that record will be locked, and users will be able to view the data
without any problems.
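For example (just a rough sketch with made-up names dbo.MyTable,
dbo.MyTable_Refresh and a key column ID; adjust to your schema), loop through
the keys and touch one row per statement:

DECLARE @ID int;

DECLARE id_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT ID FROM dbo.MyTable_Refresh;

OPEN id_cur;
FETCH NEXT FROM id_cur INTO @ID;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- each statement is filtered on the key, so only that row is locked
    UPDATE t
    SET    t.Col1 = r.Col1
    FROM   dbo.MyTable AS t
    JOIN   dbo.MyTable_Refresh AS r ON r.ID = t.ID
    WHERE  t.ID = @ID;

    FETCH NEXT FROM id_cur INTO @ID;
END;

CLOSE id_cur;
DEALLOCATE id_cur;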
Thanks
Hari
MCDBA
"DaveF" <davef@.comcast.net> wrote in message
news:OEhj0n36DHA.2480@.TK2MSFTNGP10.phx.gbl...

Dave,
If availability is the concern, and the data is truly read-only, then here's
what I would do...
Let's make believe you're refreshing "ORDERS"
A1.) Import fresh data into staging table, say it's called "ORDERS_STAGE".
B1.) Set transaction isolation level serializable.
B2.) Begin transaction.
B3.) Rename existing table, say it's called "ORDERS", to "ORDERS_OLD"
B4.) Rename staging table, say it's called "ORDERS_STAGE" to "ORDERS"
B5.) Commit transaction
C1.) Drop "old" table.
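In T-SQL, steps B1 through C1 come out roughly like this (a bare-bones sketch,
no error handling and no DRI handling):

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;     -- B1

BEGIN TRANSACTION;                                -- B2
    EXEC sp_rename 'dbo.ORDERS', 'ORDERS_OLD';    -- B3
    EXEC sp_rename 'dbo.ORDERS_STAGE', 'ORDERS';  -- B4
COMMIT TRANSACTION;                               -- B5

DROP TABLE dbo.ORDERS_OLD;                        -- C1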
This way, the data is only 'unavailable' for milliseconds, and the side
benefit is that, because of the locks acquired, your users will not receive
the 'invalid object name' errors.
I do this in production environments all the time.
Of course, there's a little more to it if you've got DRI and what-not, but
I'm sure you get the idea.
James Hokes
"DaveF" <davef@.comcast.net> wrote in message
news:OEhj0n36DHA.2480@.TK2MSFTNGP10.phx.gbl...

Hari,
His data is static.
Option 3 would blow chunks in any sort of VLDB situation.
Not to mention that your suggestion of row-level DML goes against the grain
of set-based RDBMS theory.
This is nothing more than a simple table-swap op.
James Hokes

Excellent!
Thanks very much.
"James Hokes" <noemail@.noway.com> wrote in message
news:e4$vP846DHA.452@.TK2MSFTNGP11.phx.gbl...