
Sunday, March 25, 2012

Best practise/architecture question

I need to load a lot of Excel, CSV, ... etc. files. These files have hundreds of columns and I need to validate the data. Some checks are simple range-type checks; some are more complex checks involving multiple columns.

There may be several hundred such rules, and I may need to let the program automatically correct some invalid data in the future.

Where should I implement this in SSIS?
Or should I just load the files without any checking (all columns as text), and do the checking using T-SQL?

(BTW, I don't have biztalk server).

Thanks in advance.


SSIS has several components in the data flow that can help you with data cleansing/transformation: Derived Column, Script components, etc.; so if you have to move the data from point A to point B, you could apply all the transformation rules as part of that process.

Based on the little information I got, I would first try with SSIS.

|||

I will use SSIS for this problem. However, I don't want to "hard code" all these rules using SSIS components because

1. There are so many rules, and the rules may need to be updated.

2. Users will manage the rules. It's not feasible to teach them to use VS.NET to update the SSIS package.

Maybe a Script component could call a C# assembly, which maintains the parse rules in, for example, an XML file...

|||

We have a similar issue - we have about 6-8 flat file types, with 50-80 columns each and one to many rules for each column. The approach I'm taking is a hybrid.

I'm first loading into a catch-all (all characters) staging database, and then using ISDATE, ISNUMERIC, etc. in T-SQL to pull the data back out into SSIS to do some of the things that SSIS is good at. Where ISDATE or ISNUMERIC fails on the extract, I'm putting a generic error value in using a CASE in my select (either -1, or '9999-12-31' for dates) so I can at least get the data cleaned before it heads back into SSIS and the production database. From there I'm doing my multiple-column validation, or other "softer" error handling.

My guess is you will not be able to give the users something to configure absolutely every validation. For us it's going to be a trade-off - I'm going to enforce the high-level/blatant validations up front and do things that aren't going to change often, while giving the user the ability to configure range validations, etc.
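A minimal sketch of that staging extract, with hypothetical staging table and column names (ISDATE, ISNUMERIC, and CASE are real T-SQL; everything else is illustrative):

```sql
-- Pull rows out of an all-varchar staging table, substituting
-- generic error values where the text fails basic type checks.
SELECT
    CASE WHEN ISDATE(order_date) = 1
         THEN CAST(order_date AS DATETIME)
         ELSE CAST('9999-12-31' AS DATETIME)  -- generic "bad date" marker
    END AS order_date,
    CASE WHEN ISNUMERIC(amount) = 1
         THEN CAST(amount AS DECIMAL(18, 2))
         ELSE -1                              -- generic "bad number" marker
    END AS amount
FROM staging_orders;
```

Note that ISNUMERIC accepts some values (currency symbols, scientific notation) that a later CAST may still reject, so the check is approximate.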

|||

Yes, I was thinking of doing the same thing. However, it sounds like it defeats the purpose of using SSIS as an ETL tool.

But it doesn't sound easy to use SSIS as a configurable ETL tool.

|||

ydbn wrote:

Yes, I was thinking of doing the same thing. However, it sounds like it defeats the purpose of using SSIS as an ETL tool.

But it doesn't sound easy to use SSIS as a configurable ETL tool.

I think your scenario represents the same challenge regardless of the ETL tool you choose. Do you have an example of your ideal solution using a different tool? If you shared that with us, I am pretty sure we could help in getting something similar in SSIS terms.

BTW, what about creating a custom task that gets the rules from a table for every file type? You could store the transformation rules in a table where they can be easily updated.

There is more than one way to tackle this problem, but flexibility won't come for free; if you want something robust and reusable, get ready to spend some time at the design table.
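One possible shape for such a rules table; the table and column names here are hypothetical, just a sketch of the idea:

```sql
-- One row per validation rule, keyed by file type and column.
CREATE TABLE ValidationRule (
    RuleID     INT IDENTITY(1, 1) PRIMARY KEY,
    FileType   VARCHAR(50)   NOT NULL,  -- which incoming file the rule applies to
    ColumnName VARCHAR(128)  NOT NULL,  -- which column to check
    RuleType   VARCHAR(20)   NOT NULL,  -- e.g. 'RANGE', 'REGEX', 'LOOKUP'
    MinValue   VARCHAR(50)   NULL,      -- lower bound for RANGE rules
    MaxValue   VARCHAR(50)   NULL,      -- upper bound for RANGE rules
    Expression VARCHAR(4000) NULL       -- free-form rule body for complex checks
);
```

A custom task or Script component could read this table at run time and apply the rules, so users update rows instead of editing the package in VS.NET.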


Thursday, March 22, 2012

Best Practices sFTPing files between servers

Hi-

I have a sql 2005 database that stores weekly time and attendance data for 500 users. I need to send the file on a scheduled basis.

I'd also like to have programmatic control over sending the file via my ASP.NET application's admin pages. (I have roles and security set up on the site.)

My question is two fold:

Can ASP.NET be scripted to manage a data extraction to a file (I gotta HOPE so), and are there any sFTP functions built into ASP.NET?

I know I can do this in SQL 2005, then have a scheduled event fire off the file via sFTP, but like I said, I'd like to have programmatic control over the file sending. (Of course they have to be sent on SUNDAY!)

Thanks in advance for any advice

You can have a button that, when it's clicked, checks the day. If it is Sunday (for example), then call some stored procedures which basically run a DTS/SSIS package which you can configure to take the needed data from the database and dump it in an Excel file. Then you send this Excel file (e.g. Weekly Report) to management through email (as an attached file).

For how to run the DTS/SSIS package from ASP.NET, please have a look at these links:


http://www.sqlteam.com/article/how-to-asynchronously-execute-a-dts-package-from-asp-or-aspnet

http://www.sqldev.net/dts/ExecutePackage.htm#Visual%20Csharp

Good luck.

|||

4Ever-

First, thank you for helping out so much in these forums.

Second, great idea, but I have specific file format constraints as well as an sFTP directory my flat file needs to show up in.

So let me repackage my question.

Is there a way to (stored procedures?) script sql server 2005 to export and import data to and from a sFTP directory on another server? I gotta think I am not the only one that ever had this requirement!

Thanks again for your help.

Dan

|||

Harperator:

4Ever-

First, thank you for helping out so much in these forums.

Thanks Dan.

I think this is a good time to use BizTalk, but there might be a less expensive solution...

What about creating a C# Windows service to do this, or, if you have complex logic, using VBScript (not sure about it)?

So, to make things clear... you have an sFTP server which resides on a different machine than your SQL Server, right?

If this is right, then what is preventing you from using the solution suggested above (my previous post)? If it is the file structure... can you have it in a different format (e.g. CSV) and then organize it into your special structure?

|||

Hum...

here's the deal:

I have asp.net program running on iis on a win server running sql 2005.

I need to SFTP a file weekly, to and from a remote server.

I need to automate the process.

I can do this by creating an SSIS job in SQL, having it output the file, then writing a batch file to send it via sFTP using SecureFX by VanDyke Software.

I am just wondering if there is not a simple Microsoft solution already built into SQL, or IIS, or ASP.NET, or Windows Server.

|||

I do not know if this will help, but let us see...

In your transformation you have a source, which is SQL, and a destination (e.g. an Excel file; select a path that points to your file, which is already on another server).

Do your transformation -> the data will be in Excel.

Let your client get the file (e.g. copy & paste).

Have a batch job which will basically copy data from an empty Excel file to your REAL file -> to have a new Excel file ready to receive the data from SQL Server.

I do not know if this will help... I hope it does.

Good luck.

|||

Thanks!

Dan

(I broke down and bought SQL 2005 Unleashed.)

Best Practices for RS and VSS

We are starting to use VSS2005 for storing projects and rdl files. We are
mainly using it for the versioning capabilities in VSS. Does anyone have any
tips for best practices on how to utilize Visual Studio 2005 and VSS 2005?
IF anyone has a link or two to sites that go over this topic that would be
helpful too.
The problem I have had so far is trying to check out the rdl files through VSS
and then open them in Visual Studio to edit them. Whenever I have checked
out a file in VSS and then opened the file using the options in VSS to get to
Visual Studio (devenv), I either get a message stating that the carriage
return is messed up in the file, or it opens the rdl code... not the designer
window.
If I check out the file in VSS and then open the file through Visual Studio
without using VSS to get to Visual Studio, everything works fine, but that
just seems a little too manual. Thanks in advacne for your help.On Jul 5, 4:50 pm, bsod55 <bso...@.discussions.microsoft.com> wrote:
> We are starting to use VSS2005 for storing projects and rdl files. We are
> mainly using it for the versioning capabilities in VSS. Does anyone have any
> tips for best practices on how to utilize Visual Studio 2005 and VSS 2005?
> IF anyone has a link or two to sites that go over this topic that would be
> helpful too.
> The problem have had so far is trying to check out the rdl files through VSS
> and then open them in Visual Sutdio to edit them. Whenever I have checked
> out a file in VSS and then opened the file using the options in VSS to get to
> Visual Studio (devenv) I either get a message stating that the carriage
> return is messd up in the file or it opens the rdl code... not the designer
> window.
> If I check out the file in VSS and then open the file through Visual Studio
> without using VSS to get to Visual Studio, everything works fine, but that
> just seems a little too manual. Thanks in advacne for your help.
This is a kind-of undocumented territory. This link might be of
assistance.
http://geekswithblogs.net/VROD/archive/2006/11/22/97817.aspx
Also, if VSS 2005 is not suitable, you could try Subversion or some
other Source Control software.
Hope this helps.
Regards,
Enrique Martinez
Sr. Software Consultant

Sunday, March 11, 2012

best practice for ragged delimited text import

I have tab delimited text files which may have optional fields (meaning they can be not present at all) to the right of the required fields that I care about. It would appear that using a Flat File Connection with Delimited Format (tab) set will choke if it is initially configured with a file that has something like:

data\tdata\tdata\r\n

and it then encounters

data\tdata\tdata\toptionaldata\r\n

It chokes. I know this could be parsed line by line, but that seems silly. It seems like there should be a way to ignore columns beyond a certain point (e.g. Format "Delimited Ragged Right").

Is there some way to do this with a directly with a flat file connection?

Thanks,

--Andrew

If you are guaranteed to have at least one more column than the required ones, you can configure the columns you care about with the tab delimiter, and then a final column with the CR/LF delimiter that will get the rest of the row.

If not (and it looks like you can't, from your example), then you'll have to configure the last required column with the CR/LF delimiter. That way the last column will hold the last value that you want, and the rest of the row, if it exists. Then use a Derived Column transform with the FINDSTRING and SUBSTRING functions to get everything up to the tab.
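As a sketch, the Derived Column expression for trimming that last catch-all column might look something like this; the column name `LastCol` is hypothetical:

```
FINDSTRING(LastCol, "\t", 1) == 0
    ? LastCol
    : SUBSTRING(LastCol, 1, FINDSTRING(LastCol, "\t", 1) - 1)
```

If no tab is present the column is already clean; otherwise everything up to (but not including) the first tab is kept.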

Sorry, but I don't think there is an easier solution.

Wednesday, March 7, 2012

best option when master db is not available

I want to set up my database on a new server machine. I have a backup as well as a copy of the database files (data & log), but I lost my master database from the existing server.

So I have two ways of getting my database back:

1. restore from backup

2. attach the data files.

Which is the best option when we start afresh on a new machine with a new master database?

Either method should work. I would prefer the RESTORE.

IF the file location (folder) is different, you may need to add the WITH MOVE option to the RESTORE.

Refer to Books Online, Topic: RESTORE
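A hedged example of such a restore; the logical file names and paths here are hypothetical, so run RESTORE FILELISTONLY against the backup first to get the real logical names:

```sql
-- Inspect the logical file names stored in the backup.
RESTORE FILELISTONLY FROM DISK = N'D:\Backups\MyDB.bak';

-- Restore, relocating the files to the new server's folders.
RESTORE DATABASE MyDB
FROM DISK = N'D:\Backups\MyDB.bak'
WITH MOVE 'MyDB_Data' TO N'E:\SQLData\MyDB.mdf',
     MOVE 'MyDB_Log'  TO N'E:\SQLData\MyDB_log.ldf',
     RECOVERY;
```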

|||

Yes, both options will work. I'd advise you to go with restoring the database from backup, but ensure that you start SQL Server in single-user mode; only then can you perform the restoration. Refer to BOL, it's the best resource.

|||

Thanks for your replies. I have gone through the BOL.

One thing I need to reconfirm before I proceed: with the loss of the master database I lost all the information that the master database holds. In this situation, which option will be best?

|||

I feel that there is some confusion in the requirement, i.e. whether you are trying to restore the master database or a user database. To restore a master database from one machine to another you have many restrictions: the OS / SQL Server version / service pack and configuration (if I remember correctly) should be the same. I have also read somewhere that the backup can only be restored on the same physical machine (I have never tried this).

To transfer the objects from one server to another there are scripts available. You need to transfer logins/jobs/DTS packages. This is possible through scripts.

(a) Install a new instance of SQL Server on the new machine

(b) Transfer the logins; refer: http://support.microsoft.com/default.aspx/kb/246133

(c) Script the jobs and run the script on the destination

(d) Use the Save As option or file object transfer for DTS packages

(e) Use backup/restore for user databases.

Madhu

|||

There should not be any confusion; I want to restore a user database.

Since my old master db is lost, I lost the logins, etc. The logins I can recreate manually, but I am worried there is something else about my user database which was lost along with master that I may not be able to recover.

|||

All of the data in the users databases should be intact.

Any Jobs would be in the msdb database, so if you didn't lose msdb, you still have the jobs.

Most likely, the only thing lost is logins.

best location for the tran log?

I have a SQL 2000 database on RAID 5 storage. The data files and the
tran log are all on the same RAID 5 volume - the E drive. My only other
option for placement of the tran log is a partition of the system drive
- the D drive. The system drive is mirrored. The operating system
resides on the other logical partition of the system drive - the C
drive. I'm concerned that if I move the tran log to the D drive it
might conflict with operating system activity. This is a dedicated SQL
Server so I'm hoping that it won't be too much overhead on the system
drive to place the tran log there as well. Am I better off moving the
tran log to the system drive or keeping it on the E drive with the rest
of the data?
*** Sent via Developersdex http://www.codecomments.com ***
Don't just participate in USENET...get rewarded for it!

There shouldn't be a problem with sharing a mirrored drive with the OS for
the logs if SQL Server is the only app running on the server.
Andrew J. Kelly SQL MVP
"T Dubya" <timber_toes@.bigfoot.com> wrote in message
news:%23Qx5Um0LFHA.3340@.TK2MSFTNGP14.phx.gbl...
>I have a SQL 2000 database on RAID 5 storage. The data files and the
> tran log are all on the same RAID 5 volume - the E drive. My only other
> option for placement of the tran log is a partition of the system drive
> - the D drive. The system drive is mirrored. The operating system
> resides on the other logical partition of the system drive - the C
> drive. I'm concerned that if I move the tran log to the D drive it
> might conflict with operating system activity. This is a dedicated SQL
> Server so I'm hoping that it won't be too much overhead on the system
> drive to place the tran log there as well. Am I better off moving the
> tran log to the system drive or keeping it on the E drive with the rest
> of the data?
>
> *** Sent via Developersdex http://www.codecomments.com ***
> Don't just participate in USENET...get rewarded for it!

|||

I'd recommend that you move the transaction log for a couple of reasons.
1. By having the transaction log and data files on the same drives you risk
a loss on failure, or an extension in time to restore.
2. RAID 5 is not as efficient as RAID 1 for performance of the write
intensive transaction log.
If the server is dedicated to SQL server then there should be very little
competition for the drive from other processes.
"T Dubya" wrote:

> I have a SQL 2000 database on RAID 5 storage. The data files and the
> tran log are all on the same RAID 5 volume - the E drive. My only other
> option for placement of the tran log is a partition of the system drive
> - the D drive. The system drive is mirrored. The operating system
> resides on the other logical partition of the system drive - the C
> drive. I'm concerned that if I move the tran log to the D drive it
> might conflict with operating system activity. This is a dedicated SQL
> Server so I'm hoping that it won't be too much overhead on the system
> drive to place the tran log there as well. Am I better off moving the
> tran log to the system drive or keeping it on the E drive with the rest
> of the data?
>
> *** Sent via Developersdex http://www.codecomments.com ***
> Don't just participate in USENET...get rewarded for it!
>

|||

Thanks, Andrew, for the response. I'll give it a try. The main reason
I had reservations about it is because before I realized that the D
drive was in fact just a logical partition of the system drive, I tried
to dump the entire database to that location. The database dump ran
much slower that way than when I dumped it to the same raid 5 where the
data itself resides - about twice as slow, in fact. I just wanted to be
sure that putting the tran logs on the operating system drive wouldn't
cause a similar slow down. Since it is a dedicated SQL Server hopefully
it won't pose a problem. Thanks again.
*** Sent via Developersdex http://www.codecomments.com ***
Don't just participate in USENET...get rewarded for it!
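For reference, on SQL 2000 the usual way to relocate a transaction log is to detach the database, move the .ldf at the OS level, and re-attach; a sketch with hypothetical database name and paths:

```sql
-- Detach the database (no connections may be active).
EXEC sp_detach_db 'MyDB';

-- ...move E:\SQLData\MyDB_log.ldf to D:\SQLLogs\ in Windows, then:

EXEC sp_attach_db 'MyDB',
    'E:\SQLData\MyDB.mdf',
    'D:\SQLLogs\MyDB_log.ldf';
```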


Friday, February 24, 2012

Best and quickest approach to importing thousands of 2 meg XML files into XMl column?

Hi all,
I intend to use SQL Server 2005 (which I haven't used before) to store hundreds of thousands of XML files this autumn, and I'm trying to figure out which approach is best. These files are very complex (multiple nested elements) and use multiple namespaces (imports). The files will be on the same server as the database. Below is a list of some ideas I have.

1) Use some DTS process to import the XML files, if possible, in SQL Server 2005?
2) Create a stored procedure that iterates through a local directory, opens each XML file, and inserts its content into the database column using the bulkload method (any examples would be appreciated). Does this require the use of some scripting code such as VBScript?
3) Run a server-side script such as PHP that loops through a local directory and stores the content of each XML file into a variable which is passed to a stored procedure that stores the variable in the database column? I have tried this and it seems very unstable and slow, roughly 2 minutes per file (the application server and database server are on different boxes). I'm worried that when I need to loop through thousands of files it will crash the servers!

Any suggestions would be appreciated

Muhi

Muhi,

SQL Server provides a command-line tool called 'BCP' which is one of the good ways to bulk load XML data into an XML column. For instance, you can store your XML instances in a file with a delimiter between each instance (the default delimiter is ',') and use the following command to bulk load your data:

bcp YourDB..YourTable in "XMLData.csv" -T -b 300 -N -h "TABLOCK"

This will insert the instances from 'XMLData.csv' into 'YourTable' in database 'YourDB'. BOL has a lot more information about the command line options in BCP. There are also more hints you can provide to BCP to improve performance. For instance if you have a clustered index on the table and if you know that your input data in the file is ordered on the index column (say id) then you can provide another hint to BCP -h "ORDER(id)" to avoid some additional sorts.

Thanks

Babu

|||

Hi Muhi,

try SQL Server 2005 Books Online

Examples of Bulk Importing and Exporting XML Documents

http://msdn2.microsoft.com/en-us/library/ms191184.aspx

Best regards

Jiri

Sunday, February 12, 2012

Beginner Problem: Is there any way to create the destination file at runtime?

I have to generate flat files from SQL Server tables. It's a backup package for a few tables which will be run daily. I want the names of the destination flat files to be automatically generated at run time.

I created an expression for the Flat file destination connection and appended the date and the time to the flat file name. For today my file name is 'table02-12-2007.txt'. for tomorrow it should be 'table02-13-2007'. When I evaluate the expression it shows the correct value.

But when I run the package, I receive an error indicating that the destination file is not found. The error occurs because the destination file is not present at runtime, and I was hoping that the package would create the file on the fly.

Is there any way to create the destination file at runtime? Or Am I missing a step?

Any help will be greatly appreciated.

Thanks

Hi Don,

You probably need to set the DelayValidation property for the package to True. Do this by clicking on the Control Flow and hitting F4 for properties.

One issue here is the destination file for the Flat File Connection Manager does not exist - by design. You have to configure the package so it will stop looking for it - which is what occurs during validation.

Hope this helps,
Andy
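For the date-stamped file name, the ConnectionString expression on the Flat File Connection Manager might look something like this sketch; the folder and base name are hypothetical:

```
"C:\\Backups\\table"
+ RIGHT("0" + (DT_WSTR, 2) MONTH(GETDATE()), 2) + "-"
+ RIGHT("0" + (DT_WSTR, 2) DAY(GETDATE()), 2) + "-"
+ (DT_WSTR, 4) YEAR(GETDATE())
+ ".txt"
```

This evaluates to something like C:\Backups\table02-12-2007.txt; with DelayValidation set to True, the file not existing yet no longer fails validation.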

|||

Thanks Andy for your quick response.

I changed the delay validation but that didn't work. There were spaces in the destination path "C:\My Folder\My Files" which were not allowed.

Is there any way we can have spaces in the path or connection string for flat file?

Thanks

|||

Hi Don,

I believe you can enclose the file name in quotes.

Hope this helps,
Andy
