How to create SQL Server table with computed column of specific data type

The key is to wrap the expression in a CAST (or CONVERT) so the computed column takes the data type you want:

CREATE TABLE dbo.Products
(
ProductID int IDENTITY (1,1) NOT NULL
, QtyAvailable smallint
, UnitPrice money
, InventoryValue AS CAST(QtyAvailable * UnitPrice AS Numeric(4,2))
);

-- Insert values into the table.
INSERT INTO dbo.Products (QtyAvailable, UnitPrice)
VALUES (25, 2.00), (10, 1.5);

-- Display the rows in the table.
SELECT ProductID, QtyAvailable, UnitPrice, InventoryValue
FROM dbo.Products;
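
One thing to watch out for (not part of the original example): numeric(4,2) only allows two digits before the decimal point, so the sample rows above (50.00 and 15.00) fit, but any inventory value over 99.99 raises an arithmetic overflow error as soon as the column is queried. A sketch with a wider precision, using a hypothetical dbo.Products2 name so it does not clash with the table above:

CREATE TABLE dbo.Products2
(
ProductID int IDENTITY (1,1) NOT NULL
, QtyAvailable smallint
, UnitPrice money
, InventoryValue AS CAST(QtyAvailable * UnitPrice AS Numeric(10,2)) -- room for values up to 99999999.99
);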

 


How to create SQL Server table with computed column

The syntax is simply column_name AS expression, with no data type specified.

Example:

CREATE TABLE dbo.Products
(
ProductID int IDENTITY (1,1) NOT NULL
, QtyAvailable smallint
, UnitPrice money
, InventoryValue AS QtyAvailable * UnitPrice
);

-- Insert values into the table.
INSERT INTO dbo.Products (QtyAvailable, UnitPrice)
VALUES (25, 2.00), (10, 1.5);

-- Display the rows in the table.
SELECT ProductID, QtyAvailable, UnitPrice, InventoryValue
FROM dbo.Products;
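
If you query the computed column often, you may want it stored rather than recalculated on every read. A minimal sketch, assuming the same columns as above and a made-up table name: adding the PERSISTED keyword makes SQL Server store the value and refresh it only when QtyAvailable or UnitPrice changes, which also allows the column to be indexed when the expression is deterministic.

CREATE TABLE dbo.ProductsPersisted
(
ProductID int IDENTITY (1,1) NOT NULL
, QtyAvailable smallint
, UnitPrice money
, InventoryValue AS QtyAvailable * UnitPrice PERSISTED -- stored on disk and kept in sync automatically
);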

What is true positive and true negative – confusion matrix

This blog explains the concept beautifully.

Some paragraphs from it:

Confusion matrix from the four outcomes

A confusion matrix is formed from the four outcomes produced as a result of binary classification.

Four outcomes of classification

A binary classifier predicts all data instances of a test dataset as either positive or negative. This classification (or prediction) produces four outcomes – true positive, true negative, false positive and false negative.

  • True positive (TP): correct positive prediction
  • False positive (FP): incorrect positive prediction
  • True negative (TN): correct negative prediction
  • False negative (FN): incorrect negative prediction

Confusion matrix

A confusion matrix for binary classification is a two-by-two table formed by counting the number of each of the four outcomes of a binary classifier. We usually write TP, FP, TN, and FN instead of “the number of true positives”, and so on.

                     Predicted Positive    Predicted Negative
Observed Positive    TP (# of TPs)         FN (# of FNs)
Observed Negative    FP (# of FPs)         TN (# of TNs)
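
Since this post is SQL Server flavoured, here is a minimal T-SQL sketch of how the four counts can be produced, assuming a hypothetical dbo.Predictions table with bit columns Observed (actual label) and Predicted (model output); none of these names come from the blog quoted above.

-- Count the four outcomes of a binary classifier.
SELECT
SUM(CASE WHEN Observed = 1 AND Predicted = 1 THEN 1 ELSE 0 END) AS TP
, SUM(CASE WHEN Observed = 0 AND Predicted = 1 THEN 1 ELSE 0 END) AS FP
, SUM(CASE WHEN Observed = 0 AND Predicted = 0 THEN 1 ELSE 0 END) AS TN
, SUM(CASE WHEN Observed = 1 AND Predicted = 0 THEN 1 ELSE 0 END) AS FN
FROM dbo.Predictions;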

Basic measures derived from the confusion matrix

Various measures can be derived from a confusion matrix.

First two basic measures from the confusion matrix

Error rate (ERR) and accuracy (ACC) are the most common and intuitive measures derived from the confusion matrix.

Error rate

Error rate (ERR) is calculated as the number of all incorrect predictions divided by the total size of the dataset. The best error rate is 0.0, whereas the worst is 1.0.

Error rate is the sum of the two incorrect predictions (FP + FN) divided by the total number of instances (P + N):
  • ERR = (FP + FN) / (TP + TN + FN + FP) = (FP + FN) / (P + N)

Accuracy

Accuracy (ACC) is calculated as the number of all correct predictions divided by the total size of the dataset. The best accuracy is 1.0, whereas the worst is 0.0. It can also be calculated as 1 – ERR.

Accuracy is the sum of the two correct predictions (TP + TN) divided by the total number of instances (P + N):
  • ACC = (TP + TN) / (TP + TN + FN + FP) = (TP + TN) / (P + N)
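
Continuing the hypothetical dbo.Predictions sketch above (again, the names are illustrative), error rate and accuracy fall straight out of the four counts; the CAST to float is only there to avoid integer division.

WITH Outcomes AS
(
SELECT
SUM(CASE WHEN Observed = 1 AND Predicted = 1 THEN 1 ELSE 0 END) AS TP
, SUM(CASE WHEN Observed = 0 AND Predicted = 1 THEN 1 ELSE 0 END) AS FP
, SUM(CASE WHEN Observed = 0 AND Predicted = 0 THEN 1 ELSE 0 END) AS TN
, SUM(CASE WHEN Observed = 1 AND Predicted = 0 THEN 1 ELSE 0 END) AS FN
FROM dbo.Predictions
)
SELECT
CAST(FP + FN AS float) / (TP + TN + FP + FN) AS ErrorRate -- ERR
, CAST(TP + TN AS float) / (TP + TN + FP + FN) AS Accuracy -- ACC = 1 - ERR
FROM Outcomes;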

Other basic measures from the confusion matrix

Error costs of positives and negatives are usually different. For instance, one wants to avoid false negatives more than false positives or vice versa. Other basic measures, such as sensitivity and specificity, are more informative than accuracy and error rate in such cases.

Sensitivity (Recall or True positive rate)

Sensitivity (SN) is calculated as the number of correct positive predictions divided by the total number of positives. It is also called recall (REC) or true positive rate (TPR). The best sensitivity is 1.0, whereas the worst is 0.0.

Sensitivity is the number of correct positive predictions (TP) divided by the total number of positives (P):
  • SN = TP / (TP + FN) = TP / P

Specificity (True negative rate)

Specificity (SP) is calculated as the number of correct negative predictions divided by the total number of negatives. It is also called true negative rate (TNR). The best specificity is 1.0, whereas the worst is 0.0.

Specificity is the number of correct negative predictions (TN) divided by the total number of negatives (N):
  • SP = TN / (TN + FP) = TN / N

Precision (Positive predictive value)

Precision (PREC) is calculated as the number of correct positive predictions divided by the total number of positive predictions. It is also called positive predictive value (PPV). The best precision is 1.0, whereas the worst is 0.0.

Precision is the number of correct positive predictions (TP) divided by the total number of positive predictions (TP + FP):
  • PREC = TP / (TP + FP)

False positive rate

False positive rate (FPR) is calculated as the number of incorrect positive predictions divided by the total number of negatives. The best false positive rate is 0.0 whereas the worst is 1.0. It can also be calculated as 1 – specificity.

False positive rate is the number of incorrect positive predictions (FP) divided by the total number of negatives (N):
  • FPR = FP / (TN + FP) = 1 – SP
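
The remaining measures come out of the same hypothetical Outcomes sketch (table and column names are still illustrative, not from the quoted blog); NULLIF guards against division by zero when one class is absent from the dataset.

WITH Outcomes AS
(
SELECT
SUM(CASE WHEN Observed = 1 AND Predicted = 1 THEN 1 ELSE 0 END) AS TP
, SUM(CASE WHEN Observed = 0 AND Predicted = 1 THEN 1 ELSE 0 END) AS FP
, SUM(CASE WHEN Observed = 0 AND Predicted = 0 THEN 1 ELSE 0 END) AS TN
, SUM(CASE WHEN Observed = 1 AND Predicted = 0 THEN 1 ELSE 0 END) AS FN
FROM dbo.Predictions
)
SELECT
CAST(TP AS float) / NULLIF(TP + FN, 0) AS Sensitivity -- SN, recall, TPR
, CAST(TN AS float) / NULLIF(TN + FP, 0) AS Specificity -- SP, TNR
, CAST(TP AS float) / NULLIF(TP + FP, 0) AS [Precision] -- PREC, PPV
, CAST(FP AS float) / NULLIF(TN + FP, 0) AS FalsePositiveRate -- FPR = 1 - SP
FROM Outcomes;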

IIS logs for SharePoint site

  1. Open Central Administration, click “Manage Web Applications”, and note the name of your web application.
  2. Open IIS Manager and navigate to the site corresponding to your web application.
  3. Click “Advanced Settings” in the Actions pane and note the Site ID, for example 9999999999.
  4. Click the “Logging” icon and find out which folder the IIS logs are kept in.
  5. Open that folder in Windows Explorer and find the subfolder named after your IIS Site ID. In this scenario the folder is W3SVC9999999999.
  6. Open that folder and sort by “Date Modified” in descending order to find the most recent log files.

Reference

Discrete vs Discretized

In Microsoft SQL Server Analysis Services, you can define both the physical data type for a column in a mining structure and a logical content type for the column when it is used in a model.

The data type determines how algorithms process the data in those columns when you create mining models. Defining the data type of a column gives the algorithm information about the type of data in the columns, and how to process the data. Each data type in Analysis Services supports one or more content types for data mining.

The content type describes the behavior of the content that the column contains. For example, if the content in a column repeats in a specific interval, such as days of the week, you can specify the content type of that column as cyclical.

Some algorithms require specific data types and specific content types to be able to function correctly. For example, the Microsoft Naive Bayes algorithm cannot use continuous columns as input, and cannot predict continuous values.

Discrete

Discrete means that the column contains a finite number of values with no continuum between values. For example, a gender column is a typical discrete attribute column, in that the data represents a specific number of categories.

The values in a discrete attribute column cannot imply ordering, even if the values are numeric. Moreover, even if the values used for the discrete column are numeric, fractional values cannot be calculated. Telephone area codes are a good example of discrete data that is numeric.

The Discrete content type is supported by all data mining data types.

Discretized

Discretization is the process of putting values of a continuous set of data into buckets so that there are a limited number of possible values. You can discretize only numeric data.

Thus, the discretized content type indicates that the column contains values that represent groups, or buckets, of values that are derived from a continuous column. The buckets are treated as ordered and discrete values.

You can discretize your data manually, to ensure that you get the buckets you want, or you can use the discretization methods provided in SQL Server Analysis Services. Some algorithms perform discretization automatically.

The Discretized content type is supported by the following data types: Date, Double, Long, and Text.

Continuous

Continuous means that the column contains values that represent numeric data on a scale that allows interim values. Unlike a discrete column, which represents finite, countable data, a continuous column represents scalable measurements, and it is possible for the data to contain an infinite number of fractional values. A column of temperatures is an example of a continuous attribute column.

When a column contains continuous numeric data, and you know how the data should be distributed, you can potentially improve the accuracy of the analysis by specifying the expected distribution of values. You specify the column distribution at the level of the mining structure. Therefore, the setting applies to all models that are based on the structure. The Continuous content type is supported by the following data types: Date, Double, and Long.
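
A quick way to see the difference in practice is in the DMX definition of a mining structure, where each column gets both a data type and a content type. A minimal sketch (the structure and column names are made up for illustration): Gender is DISCRETE, Yearly Income is CONTINUOUS, and Age is DISCRETIZED into five buckets using the EQUAL_AREAS method.

CREATE MINING STRUCTURE [New Customers]
(
[Customer Key] LONG KEY,
[Gender] TEXT DISCRETE,
[Yearly Income] DOUBLE CONTINUOUS,
[Age] LONG DISCRETIZED(EQUAL_AREAS, 5)
)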

Returning Predictions with Probabilities on a trained SSAS model for a new dataset

The following sample query uses the decision tree model that was created in the Basic Data Mining Tutorial. The query passes in a new set of sample data, from the dbo.ProspectiveBuyer table in the AdventureWorksDW2012 database, to predict which of the customers in the new data set will purchase a bike.

The query uses the prediction function PredictHistogram (DMX), which returns a nested table that contains useful information about the probabilities discovered by the model. The final WHERE clause of the query filters the results to return only those customers who are predicted as likely to buy a bike, with a predicted probability greater than 5% (.05).

SELECT
  [TM_DecisionTree].[Bike Buyer],
  PredictHistogram([Bike Buyer]) AS Results
FROM
  [TM_DecisionTree]
PREDICTION JOIN
  OPENQUERY([Adventure Works DW Multidimensional 2012],
    'SELECT
      [FirstName],
      [LastName],
      [MaritalStatus],
      [Gender],
      [YearlyIncome],
      [TotalChildren],
      [NumberChildrenAtHome],
      [HouseOwnerFlag],
      [NumberCarsOwned]
    FROM
      [dbo].[ProspectiveBuyer]
    ') AS t
ON
  [TM_DecisionTree].[First Name] = t.[FirstName] AND
  [TM_DecisionTree].[Last Name] = t.[LastName] AND
  [TM_DecisionTree].[Marital Status] = t.[MaritalStatus] AND
  [TM_DecisionTree].[Gender] = t.[Gender] AND
  [TM_DecisionTree].[Yearly Income] = t.[YearlyIncome] AND
  [TM_DecisionTree].[Total Children] = t.[TotalChildren] AND
  [TM_DecisionTree].[Number Children At Home] = t.[NumberChildrenAtHome] AND
  [TM_DecisionTree].[House Owner Flag] = t.[HouseOwnerFlag] AND
  [TM_DecisionTree].[Number Cars Owned] = t.[NumberCarsOwned]
WHERE [Bike Buyer] = 1
  AND PredictProbability([Bike Buyer]) > '.05'

Microsoft Reference