Monday, June 9, 2008

Subtle Power of Data Mining Algorithms

Microsoft ships nine different data mining algorithms with SQL Server.
How much data is enough?
- As much as it takes to produce a valid model
- Overtraining (overfitting) is possible with bad parameters
- Too much data can make a bland model
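The overtraining point can be shown with a minimal sketch. It assumes a toy k-nearest-neighbour classifier (not one of the SQL Server algorithms) on synthetic noisy data, and the parameter `k` and helper names are hypothetical. With `k=1` the model memorizes its training set, so its training accuracy is always a perfect 1.00, while the holdout score shows how well it really generalizes.

```python
import random

def make_data(n, noise, rng):
    """Hypothetical 1-D data: label is 1 when x > 0.5, with a fraction of labels flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if rng.random() < noise:
            label = 1 - label  # flip the label to simulate noisy data
        data.append((x, label))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (use an odd k)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, test, k):
    """Fraction of test points the model built on `train` labels correctly."""
    hits = sum(knn_predict(train, x, k) == label for x, label in test)
    return hits / len(test)

rng = random.Random(42)
train = make_data(200, noise=0.2, rng=rng)
holdout = make_data(200, noise=0.2, rng=rng)

# k=1 is the "bad parameter": every training point is its own nearest neighbour.
for k in (1, 15):
    print(f"k={k:2d}  train={accuracy(train, train, k):.2f}  holdout={accuracy(train, holdout, k):.2f}")
```

The gap between the training and holdout columns is the signature of overtraining; a score measured only on the training data says little about the model.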
What is a valid model?
- Accuracy: use lift and profit charts, scatter plots, or a classification matrix
- Reliability: cross-validation
- Usefulness: requires a human eye
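A classification matrix simply counts how often each actual value was predicted as each possible value; the diagonal holds the correct predictions. The sketch below uses made-up "yes"/"no" labels and hypothetical helper names, and one common row/column layout; it is not the SQL Server viewer itself.

```python
from collections import Counter

def classification_matrix(actual, predicted):
    """Count (actual, predicted) pairs; pairs where both match are correct predictions."""
    return Counter(zip(actual, predicted))

def accuracy(matrix):
    """Share of all cases that fall on the diagonal of the matrix."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total

# Hypothetical test cases: what really happened vs. what the model predicted.
actual    = ["yes", "yes", "no", "no", "yes", "no",  "no", "yes"]
predicted = ["yes", "no",  "no", "no", "yes", "yes", "no", "yes"]

m = classification_matrix(actual, predicted)
labels = sorted(set(actual) | set(predicted))
print("actual\\predicted", *labels)
for a in labels:
    print(f"{a:>16}", *(m[(a, p)] for p in labels))
print("accuracy:", accuracy(m))  # 6 of 8 cases correct -> 0.75
```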
How to decide which model predicts most accurately?
- Lift chart: the higher the score, the better
- Scatter plots: need a lot of data
- Clustering: not good for prediction, but good for spotting trends
- Naive Bayes: the oldest, and usually the least accurate
- Decision trees: used for classification and association
- Regression: look for regressors; use it to predict continuous values
- Neural network: finds patterns when all the others fail, but its results are difficult to interpret
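The idea behind a lift chart can be sketched in a few lines: rank the cases by the model's predicted probability, take the top slice, and compare its rate of positive outcomes with the rate you would get by picking cases at random. A lift above 1 means the model beats random selection. The scores, outcomes, and the `lift_at` helper below are hypothetical, and this is one common way to compute the figure, not necessarily the exact score SQL Server reports.

```python
def lift_at(scores, actual, fraction):
    """Lift of the top `fraction` of cases ranked by model score, relative to random selection."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    top_rate = sum(hit for _, hit in ranked[:top_n]) / top_n   # response rate in the top slice
    base_rate = sum(actual) / len(actual)                      # response rate if chosen at random
    return top_rate / base_rate

# Hypothetical predicted probabilities and what actually happened (1 = responded).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
actual = [1,    1,    0,    1,    0,    0,    1,    0,    0,    0]

print(round(lift_at(scores, actual, 0.3), 2))  # top 30%: 2/3 responders vs 4/10 overall -> 1.67
print(lift_at(scores, actual, 1.0))            # the whole population always has lift 1.0
```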
Folds (partitions): you want the models built on each fold to behave as nearly the same as possible
- Split the data into subsets
- Choose the model with the lowest standard deviation across folds
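The fold procedure can be sketched as k-fold cross-validation: deal the rows into k partitions, train on k-1 of them, test on the one left out, and repeat so every partition is tested once. The standard deviation of the per-fold accuracies measures how consistent a model is. Both candidate models (`fit_majority`, `fit_threshold`) and the synthetic data are hypothetical stand-ins, not SQL Server algorithms.

```python
import random
import statistics

def k_fold_splits(n, k, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds (partitions)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, k, fit, seed=0):
    """Train on k-1 folds, test on the held-out fold; return one accuracy per fold."""
    scores = []
    for fold in k_fold_splits(len(data), k, seed):
        held_out = set(fold)
        train = [row for i, row in enumerate(data) if i not in held_out]
        test = [data[i] for i in fold]
        model = fit(train)
        scores.append(sum(model(x) == y for x, y in test) / len(test))
    return scores

def fit_majority(train):
    """Hypothetical baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    guess = max(set(labels), key=labels.count)
    return lambda x: guess

def fit_threshold(train):
    """Hypothetical 1-D model: pick the cut point that best separates the training labels."""
    best = max((x for x, _ in train),
               key=lambda t: sum((x > t) == bool(y) for x, y in train))
    return lambda x: int(x > best)

# Synthetic data: label is 1 when x > 0.5, with 15% of labels flipped as noise.
rng = random.Random(1)
data = []
for _ in range(100):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.15:
        y = 1 - y
    data.append((x, y))

results = {}
for name, fit in [("majority", fit_majority), ("threshold", fit_threshold)]:
    scores = cross_validate(data, k=5, fit=fit)
    results[name] = (statistics.mean(scores), statistics.stdev(scores))
    print(f"{name:9s} mean={results[name][0]:.2f} stdev={results[name][1]:.3f}")

most_stable = min(results, key=lambda name: results[name][1])
print("lowest standard deviation:", most_stable)
```

In practice the standard deviation is best read next to the mean accuracy: a model that is consistently mediocre can still have a small spread.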
Control "Depth of Insight"
- Select based on the most important columns or column values
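One way to decide which columns are "most important" is to rank them by information gain: how much knowing a column reduces the uncertainty (entropy) of the column you want to predict. SQL Server has its own feature-selection settings, which this sketch does not reproduce; it swaps in plain information gain, and the customer rows and column names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, column, target):
    """How much knowing `column` reduces uncertainty about `target`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[column] for r in rows}:
        subset = [r[target] for r in rows if r[column] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical customer rows; "buys" is the column we want to predict.
rows = [
    {"age": "young", "region": "north", "buys": "yes"},
    {"age": "young", "region": "south", "buys": "yes"},
    {"age": "old",   "region": "north", "buys": "no"},
    {"age": "old",   "region": "south", "buys": "no"},
    {"age": "young", "region": "north", "buys": "yes"},
    {"age": "old",   "region": "south", "buys": "yes"},
]

ranking = sorted(["age", "region"], key=lambda c: information_gain(rows, c, "buys"), reverse=True)
for col in ranking:
    print(col, round(information_gain(rows, col, "buys"), 3))
```

Here every "young" row buys, so `age` carries real information, while `region` splits the buyers in exactly the same proportion as the whole table and gains nothing; keeping only the top-ranked columns narrows the model's "depth of insight" to what matters.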
