SSAS Model viewer – mining attributes/features results queries

After you have designed and processed/trained a mining model using data from the underlying mining structure, the mining model is complete and contains mining model content.

You can use this content to make predictions or analyze your data.

To retrieve mining model content, you can create a query against the data mining model.

The easiest way to create a content query is to execute the following DMX statement in SQL Server Management Studio:

SELECT * FROM [<mining model name>].CONTENT

Some of the algorithm specific useful SQL queries are as follows:

Decision_Trees:

Declare @GetNODE_UNIQUE_NAME varchar(max);

Declare @GetNODE_UNIQUE_NAME varchar(max);

Declare @NODE_UNIQUE_NAME varchar(max);

CREATE TABLE #TableNODE_UNIQUE_NAME (NODE_UNIQUE_NAME varchar(max));

set @GetNODE_UNIQUE_NAME = N’SELECT * FROM OPENQUERY(SSASlinkedservername,”SELECT NODE_UNIQUE_NAME FROM [‘ + @ModelName + N’].CONTENT WHERE NODE_TYPE = 2”)’

–Print (@GetNODE_UNIQUE_NAME);

insert into #TableNODE_UNIQUE_NAME

EXEC (@GetNODE_UNIQUE_NAME);
select @NODE_UNIQUE_NAME=NODE_UNIQUE_NAME from TableNODE_UNIQUE_NAME –Print (@NODE_UNIQUE_NAME);
drop table #TableNODE_UNIQUE_NAME

Declare @GetModelResults varchar(max);

Declare @GetModelResults varchar(max);

Declare @GetModelResultsForRootNode varchar(max);

set @GetModelResults = N’SELECT *,getdate(),’+@ModelConfigurationKey+N’ FROM OPENQUERY(SSASlinkedservername,”SELECT FLATTENED [MODEL_NAME], [NODE_CAPTION],[NODE_TYPE], (SELECT ATTRIBUTE_NAME, ATTRIBUTE_VALUE, [SUPPORT], [PROBABILITY] FROM NODE_DISTRIBUTION WHERE VALUETYPE <> 1 ) AS t  from [‘ + @Model + N’].content WHERE IsDescendant(””””,””’+@NODE_UNIQUE_NAME+N””’)”)’

set @GetModelResultsForRootNode = N’SELECT *,getdate(),’+@ModelConfigurationKey+N’ FROM OPENQUERY(SSASlinkedservername,”SELECT FLATTENED [MODEL_NAME], [NODE_CAPTION],[NODE_TYPE], (SELECT ATTRIBUTE_NAME, ATTRIBUTE_VALUE, [SUPPORT], [PROBABILITY] FROM NODE_DISTRIBUTION WHERE VALUETYPE <> 1 ) AS t  FROM [‘ + @Model + N’].CONTENT WHERE NODE_TYPE = 2”)’

Advertisements

SSAS model viewer – Feature or Attribute Characteristics vs Discrimination for Naive Bayes Model

When you open a Naïve Bayes model using Browse, the model is displayed in an interactive viewer with four different panes. Use the viewer to explore correlations, and get information about the model and the underlying data.

The purpose of the viewer is to help you explore the interaction between input and output attributes (inputs and dependent variables) that were discovered by the Microsoft Naive Bayes model.

Dependency Network

The first window you’ll use is the Dependency Network. It shows you at a glance which inputs are closely correlated to the selected outcome.

dependency network in Naive Bayes viewer

Explore the dependency network

  1. First, click the target outcome, Yearly Income, which is represented as a node in the graph.The highlighted nodes surrounding the target variable are those that are statistically correlated with this outcome. Use the legend at the bottom of the viewer to understand the nature of the relationship.
  2. Click the slider at the left of the viewer and drag it downward.This control filters the independent variables, based on the strengths of the dependencies. When you drag the slider down, only the strongest links remain in the graph.
  3. After you have filtered the graph, click the button, Copy Graph View. Then select a worksheet in Excel, and press Ctrl+V.This option copies the view that you have selected, including filters and highlighting.

Attribute Profiles

The Attribute Profiles windows gives you a visual indication of how all other variables are related to the individual outcomes.

Explore the profiles

  1. To hide some values so that you can more easily compare outcomes, click the column heading and drag it under another column.attribute profiles in Naive Bayes viewer
  2. Click in any cell to view the distribution of values in the Mining Legend.  Because the attributes associated with different outcomes are displayed visually, it is easy to spot interesting correlations, such as how incomes are distributed by region.

Attribute Characteristics

The Attribute Characteristics view is useful for in-depth examination of a particular outcome variable and the contributing factors.

attribute characteristics in Naive Bayes viewer

Explore the attribute characteristics

  1. Click Value and select an item from the Value.As you select a target outcome, the graph updates to show the factors that are most strongly associated with the outcome, sorted by importance.Note that if you create a model using the Analyze Key Influencers (Table Analysis Tools for Excel) option, you can create models that have more than one predictable attribute. However, all other wizards in the Data Mining add-ins limit you to one predictable attribute.
  2. Click Copy to Excel to create a table, in a new worksheet, listing the scores for all attributes that are related to the selected target outcome.

Attribute Discrimination

The Attribute Discrimination view helps compare two outcomes, or one outcome vs. all other outcomes.

attribute discrimination in Naive Bayes viewer

Explore attribute discrimination

  1. Use the controls, Value 1 and Value 2, to select the outcomes that you want to compare.For example, in this model there were some interesting attributes in the low income group, so we chose the lowest income group in the first dropdown list, and chose All other states in the second dropdown list.The attributes are sorted by order of importance (calculated based on the training data). Therefore, occupation is the factor most closely correlated with income (for this target group, at least),

    To see the exact figures, click the colored bar and view the Mining Legend.

MSDN Reference

SSAS Mining Structure Data Types

Microsoft Reference

When we create a mining model or a mining structure in Microsoft SQL Server Analysis Services, we must define the data types for each of the columns in the mining structure. The data type tells the analysis engine whether the data in the data source is numerical or text, and how the data should be processed. For example, if our source data contains numerical data, we can specify whether the numbers be treated as integers or by using decimal places.

Analysis Services supports the following data types for mining structure columns:

Data Type Supported Content Types
Text Cyclical, Discrete, Discretized, Key Sequence, Ordered, Sequence
Long Continuous, Cyclical, Discrete, Discretized, Key, Key Sequence, Key Time, Ordered, Sequence, Time

Classified

Boolean Cyclical, Discrete, Ordered
Double Continuous, Cyclical, Discrete, Discretized, Key, Key Sequence, Key Time, Ordered, Sequence, Time

Classified

Date Continuous, Cyclical, Discrete, Discretized, Key, Key Sequence, Key Time, Ordered

You can find details for discrete vs discretized here.

Microsoft SSAS logistic regression mining model node types

For the DMX queries here is the reference of the node types for Microsoft SSAS logistic regression mining model:

NODE_TYPE
A logistic regression model outputs the following node types:

Node Type ID Description
1 Model.
17 Organizer node for the subnetwork.
18 Organizer node for the input layer.
19 Organizer node for the hidden layer. The hidden layer is empty.
20 Organizer node for the output layer.
21 Input attribute node.
23 Output attribute node.
24 Marginal statistics node.

So the DMX query:

SELECT * FROM [Model_Name].Content
WHERE NODE_TYPE = 23

Mining Model Content for Logistic Regression Models

microsoft decision trees algorithm parameters COMPLEXITY_PENALTY SCORE_METHOD SPLIT_METHOD

The Microsoft Decision Trees algorithm supports parameters that affect the performance and accuracy of the resulting mining model. You can also set modeling flags on the mining model columns or mining structure columns to control the way that data is processed.

Setting Algorithm Parameters

The following table describes the parameters that you can use with the Microsoft Decision Trees algorithm.

COMPLEXITY_PENALTY
Controls the growth of the decision tree. A low value increases the number of splits, and a high value decreases the number of splits. The default value is based on the number of attributes for a particular model, as described in the following list:

  • For 1 through 9 attributes, the default is 0.5.
  • For 10 through 99 attributes, the default is 0.9.
  • For 100 or more attributes, the default is 0.99.
FORCE_REGRESSOR
Forces the algorithm to use the specified columns as regressors, regardless of the importance of the columns as calculated by the algorithm. This parameter is only used for decision trees that are predicting a continuous attribute.

SCORE_METHOD

Determines the method that is used to calculate the split score. The following options are available:

ID Name
1 Entropy
3 Bayesian with K2 Prior
4 Bayesian Dirichlet Equivalent (BDE) with uniform prior

(default)

The default is 4, or BDE.

SPLIT_METHOD

Determines the method that is used to split the node. The following options are available:

ID Name
1 Binary: Indicates that regardless of the actual number of values for the attribute, the tree should be split into two branches.
2 Complete: Indicates that the tree can create as many splits as there are attribute values.
3 Both: Specifies that Analysis Services can determine whether a binary or complete split should be used to produce the best results.

The default is 3.

MAXIMUM_INPUT_ATTRIBUTES

Defines the number of input attributes that the algorithm can handle before it invokes feature selection.

The default is 255.

Set this value to 0 to turn off feature selection.

[Available only in some editions of SQL Server]

MAXIMUM_OUTPUT_ATTRIBUTESDefines the number of output attributes that the algorithm can handle before it invokes feature selection.

The default is 255.

Set this value to 0 to turn off feature selection.

[Available only in some editions of SQL Server]

MINIMUM_SUPPORTDetermines the minimum number of leaf cases that is required to generate a split in the decision tree.

The default is 10.

You may need to increase this value if the dataset is very large, to avoid overtraining.

 

Microsoft Decision Trees Algorithm Technical Reference

SSAS DMX Query for Dependency Network pane of the Microsoft Naive Bayes Viewer

SELECT NODE_CAPTION, MSOLAP_NODE_SCORE
FROM TM_NaiveBayes.CONTENT
WHERE NODE_TYPE = 10
ORDER BY MSOLAP_NODE_SCORE DESC

The node caption is used to identify the node, rather than the node name, because the caption shows both the attribute name and attribute value.

The MSOLAP_NODE_SCORE is a special value provided only for the input attribute nodes, and indicates the relative importance of this attribute in the model. You can see much the same information in the Dependency Network pane of the viewer; however, the viewer does not provide scores.

The NODE_TYPE = 10 represents the input attribute, and NODE_TYPE = 11 for each value of the attribute.

Reference: Naive Bayes Model Query Examples

Choosing Microsoft Prediction Model or Algorithm

From Micosoft.. “An algorithm in data mining (or machine learning) is a set of heuristics and calculations that creates a model from data. To create a model, the algorithm first analyzes the data you provide, looking for specific types of patterns or trends. The algorithm uses the results of this analysis over many iterations to find the optimal parameters for creating the mining model. These parameters are then applied across the entire data set to extract actionable patterns and detailed statistics.”

Reference : SSAS has 9 data mining models or algorithms

Choosing an Algorithm by Type

SQL Server Data Mining includes the following algorithm types:

  • Classification algorithms predict one or more discrete variables, based on the other attributes in the dataset.
  • Regression algorithms predict one or more continuous numeric variables, such as profit or loss, based on other attributes in the dataset.
  • Segmentation algorithms divide data into groups, or clusters, of items that have similar properties.
  • Association algorithms find correlations between different attributes in a dataset. The most common application of this kind of algorithm is for creating association rules, which can be used in a market basket analysis.
  • Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a series of clicks in a web site, or a series of log events preceding machine maintenance.

However, there is no reason that you should be limited to one algorithm in your solutions. Experienced analysts will sometimes use one algorithm to determine the most effective inputs (that is, variables), and then apply a different algorithm to predict a specific outcome based on that data. SQL Server Data Mining lets you build multiple models on a single mining structure, so within a single data mining solution you could use a clustering algorithm, a decision trees model, and a Naïve Bayes model to get different views on your data. You might also use multiple algorithms within a single solution to perform separate tasks: for example, you could use regression to obtain financial forecasts, and use a neural network algorithm to perform an analysis of factors that influence forecasts. You can use the Microsoft Decision Trees algorithm not only for prediction, but also as a way to reduce the number of columns in a dataset, because the decision tree can identify columns that do not affect the final mining model.

Choosing an Algorithm by Task

To help you select an algorithm for use with a specific task, the following table provides suggestions for the types of tasks for which each algorithm is traditionally used.

Examples of tasks Microsoft algorithms to use
Predicting a discrete attribute:

Flag the customers in a prospective buyers list as good or poor prospects.

Calculate the probability that a server will fail within the next 6 months.

Categorize patient outcomes and explore related factors.

Microsoft Decision Trees Algorithm

Microsoft Naive Bayes Algorithm

Microsoft Clustering Algorithm

Microsoft Neural Network Algorithm

Predicting a continuous attribute:

Forecast next year’s sales.

Predict site visitors given past historical and seasonal trends.

Generate a risk score given demographics.

Microsoft Decision Trees Algorithm

Microsoft Time Series Algorithm

Microsoft Linear Regression Algorithm

Predicting a sequence:

Perform clickstream analysis of a company’s Web site.

Analyze the factors leading to server failure.

Capture and analyze sequences of activities during outpatient visits, to formulate best practices around common activities.

Microsoft Sequence Clustering Algorithm
Finding groups of common items in transactions:

Use market basket analysis to determine product placement.

Suggest additional products to a customer for purchase.

Analyze survey data from visitors to an event, to find which activities or booths were correlated, to plan future activities.

Microsoft Association Algorithm

Microsoft Decision Trees Algorithm

Finding groups of similar items:

Create patient risk profiles groups based on attributes such as demographics and behaviors.

Analyze users by browsing and buying patterns.

Identify servers that have similar usage characteristics.

Microsoft Clustering Algorithm

Microsoft Sequence Clustering Algorithm

Reference Microsoft