In Microsoft SQL Server Analysis Services, you can define both the physical data type for a column in a mining structure and a logical content type for the column when it is used in a model.
The data type tells the algorithm what kind of data a column holds, and therefore how to process that data when you create mining models. Each data type in Analysis Services supports one or more content types for data mining.
The content type describes the behavior of the content that the column contains. For example, if the content in a column repeats at a specific interval, such as days of the week, you can specify the content type of that column as cyclical.
Some algorithms require specific data types and specific content types to be able to function correctly. For example, the Microsoft Naive Bayes algorithm cannot use continuous columns as input, and cannot predict continuous values.
Discrete means that the column contains a finite number of values with no continuum between values. For example, a gender column is a typical discrete attribute column, in that the data represents a specific number of categories.
The values in a discrete attribute column cannot imply ordering, even if the values are numeric. Moreover, even if the values used for the discrete column are numeric, fractional values cannot be calculated. Telephone area codes are a good example of discrete data that is numeric.
The Discrete content type is supported by all data mining data types.
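The idea that numeric values can still be discrete can be illustrated with a minimal Python sketch. It is not Analysis Services code; the sample list `area_codes` is hypothetical data invented for the illustration. The sketch treats each area code as a category label and counts occurrences, which is meaningful, rather than averaging the codes, which is not.

```python
from collections import Counter

# Hypothetical sample data: telephone area codes stored as integers.
area_codes = [206, 425, 206, 212, 425, 206]

# Treat each code as a category label and count how often each one occurs.
counts = Counter(area_codes)

# The distinct values form a finite set of categories.
# Sorting is only for stable display; the order carries no meaning.
distinct_codes = sorted(counts)

print(counts)
print(distinct_codes)
```

An arithmetic mean of these codes would compute without error, but the result would not correspond to any category, which is why such a column is better modeled as discrete.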
Discretization is the process of putting values of a continuous set of data into buckets so that there are a limited number of possible values. You can discretize only numeric data.
Thus, the discretized content type indicates that the column contains values that represent groups, or buckets, of values that are derived from a continuous column. The buckets are treated as ordered and discrete values.
You can discretize your data manually, to ensure that you get the buckets you want, or you can use the discretization methods provided in SQL Server Analysis Services. Some algorithms perform discretization automatically.
The Discretized content type is supported by the following data types: Date, Double, Long, and Text.
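Manual discretization can be sketched in a few lines of Python. The example below is an assumption-laden illustration, not one of the discretization methods built into Analysis Services: it uses simple equal-width buckets, and the function name `equal_width_buckets` and the sample `ages` list are hypothetical.

```python
def equal_width_buckets(values, bucket_count):
    """Assign each numeric value to one of bucket_count equal-width buckets.

    Returns a list of bucket indexes (0 is the lowest bucket), in the same
    order as the input values.
    """
    if bucket_count < 1:
        raise ValueError("bucket_count must be at least 1")
    if not values:
        return []

    lowest, highest = min(values), max(values)
    if lowest == highest:
        # All values are identical, so everything falls into one bucket.
        return [0] * len(values)

    width = (highest - lowest) / bucket_count
    indexes = []
    for value in values:
        index = int((value - lowest) / width)
        # The maximum value lands exactly on the upper edge; keep it in the last bucket.
        indexes.append(min(index, bucket_count - 1))
    return indexes


# Hypothetical continuous column: customer ages.
ages = [18, 22, 25, 31, 40, 47, 58, 66]
buckets = equal_width_buckets(ages, 4)
print(buckets)
```

The resulting bucket indexes are ordered (bucket 0 holds the lowest values) but limited to a small set of possible states, which matches the description of discretized values as ordered and discrete.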
Continuous means that the column contains values that represent numeric data on a scale that allows interim values. Unlike a discrete column, which represents finite, countable data, a continuous column represents scalable measurements, and it is possible for the data to contain an infinite number of fractional values. A column of temperatures is an example of a continuous attribute column.
When a column contains continuous numeric data, and you know how the data should be distributed, you can potentially improve the accuracy of the analysis by specifying the expected distribution of values. You specify the column distribution at the level of the mining structure. Therefore, the setting applies to all models that are based on the structure.
The Continuous content type is supported by the following data types: Date, Double, and Long.
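The data type support stated for the Discrete, Discretized, and Continuous content types can be collected into a small lookup, sketched here in Python. The names `SUPPORTED_DATA_TYPES` and `is_supported` are hypothetical. The set listed for Discrete assumes that the full list of data mining data types is Text, Long, Boolean, Double, and Date; the text above says only "all data mining data types", so treat that set as an assumption.

```python
# Content type -> data types that support it, as described in the text.
SUPPORTED_DATA_TYPES = {
    # Assumption: these five are taken to be "all data mining data types".
    "Discrete": {"Text", "Long", "Boolean", "Double", "Date"},
    "Discretized": {"Date", "Double", "Long", "Text"},
    "Continuous": {"Date", "Double", "Long"},
}


def is_supported(data_type, content_type):
    """Return True if the given data type supports the given content type."""
    return data_type in SUPPORTED_DATA_TYPES.get(content_type, set())


print(is_supported("Double", "Continuous"))
print(is_supported("Text", "Continuous"))
```

A check like this could be run before defining a mining structure column, to catch a pairing such as a Text column declared as Continuous.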