Which edition of SQL Server 2005 did you install? If you installed the Express edition, note that it does not include Analysis Services. To access the Data Mining features, you can install an evaluation edition of SQL Server 2005 from http://technet.microsoft.com/en-us/sqlserver/bb462637.aspx
The Data Mining Add-ins need an Analysis Services server to run and cannot run off a local file on your hard drive. You'll need Analysis Services installed to use the add-ins from Office.
|||I have the evaluation version, but I don't know how to set it up so that my own machine acts as the server. Do you know how I can get SQL Server Analysis Services running so that it appears in services.msc?|||While installing the evaluation edition, did you select Analysis Services as an installation option? You can go to Add/Remove Programs, find Microsoft SQL Server 2005, select Modify, and check the list of installed components. Analysis Services should be one of the options.|||Thanks a lot for your help. If I had 50 products and 60,000 customers, each with a different assortment of products (a basket), do you know how Microsoft's association function can help determine the association rules that have the highest probability of occurring? Thanks again.
|||Are you referring to the PredictAssociation function? Or just how to set up the mining model to do a market-basket analysis?|||Market-basket analysis would be great. Is there any way to get it to give the probability of Product C given that Products A and B are also sold, or will it only be able to give the probability of Product B given that A is sold? Thanks
|||
It does both. The Association Rules algorithm uses all rules that "fire" for the input set. For example if you had rules such as
A->C
A,B->C
A,D->C
and you had A, B, and D in your input, all three rules would "fire". The resulting probability is based on the probabilities of the rules that fire, and may not match that of any one rule in particular.
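As a rough illustration of what "firing" means here, the following Python sketch checks which rules have their whole left-hand side contained in the input. It only shows the matching step; it does not reproduce how Analysis Services combines the probabilities of the fired rules, and the names `rules` and `firing_rules` are made up for this example.

```python
# Hypothetical rules from the example above: (left-hand side, right-hand side).
rules = [
    (frozenset({"A"}), "C"),
    (frozenset({"A", "B"}), "C"),
    (frozenset({"A", "D"}), "C"),
]


def firing_rules(basket, rules):
    """Return the rules whose entire left-hand side appears in the basket."""
    items = set(basket)
    # A rule fires when its left-hand side is a subset of the input items.
    return [(lhs, rhs) for lhs, rhs in rules if lhs <= items]


# Input containing A, B, and D, as in the example above.
fired = firing_rules(["A", "B", "D"], rules)
for lhs, rhs in fired:
    print(",".join(sorted(lhs)), "->", rhs)
```

With an input of only A and B, the same check would leave out the A,D->C rule, since D is missing from the input.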
|||So would I just select the 50,000 clients and 50 products and then run the association function? Do you know how long the association or clustering feature would take to run with this amount of data on a local hard drive?|||
Such an operation does not take long. However, rereading your original question about "the association rules with the highest probability of occurring", the answer is simply those rules whose left-hand-side (LHS) itemsets have the highest support. For example, suppose you have the rules
A -> C
A,B -> D
A,B -> C
A, B, C -> E
Say you have 50,000 cases, with support(A) = 5,000, support(A,B) = 2,000, and support(A,B,C) = 1,000.
The probability of each rule firing for any given input case is then the support of its LHS divided by the number of cases:
10% A->C
4% A,B->D
4% A,B->C
2% A,B,C->E
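The arithmetic behind those percentages can be checked with a few lines of Python. This is only a sketch using the figures from the example above; the names `lhs_support` and `firing_probability` are made up for illustration and are not part of any Analysis Services API.

```python
# Figures from the example above.
total_cases = 50_000
lhs_support = {
    ("A",): 5_000,
    ("A", "B"): 2_000,
    ("A", "B", "C"): 1_000,
}

# Rules as (left-hand side, right-hand side).
rules = [
    (("A",), "C"),
    (("A", "B"), "D"),
    (("A", "B"), "C"),
    (("A", "B", "C"), "E"),
]


def firing_probability(lhs, lhs_support, total_cases):
    """Fraction of all cases that contain every item of the rule's LHS."""
    return lhs_support[lhs] / total_cases


probs = [firing_probability(lhs, lhs_support, total_cases) for lhs, _ in rules]

for (lhs, rhs), p in zip(rules, probs):
    print(f"{p:.0%} {','.join(lhs)}->{rhs}")
```

Note that this is the chance that a rule applies to a case at all, not the probability of the right-hand side given the left-hand side.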