Discrete vs Discretized

In Microsoft SQL Server Analysis Services, you can define both the physical data type for a column in a mining structure and a logical content type for the column when it is used in a model.

The data type determines how algorithms process the data in those columns when you create mining models. Defining the data type of a column gives the algorithm information about the type of data in the columns, and how to process the data. Each data type in Analysis Services supports one or more content types for data mining.

The content type describes the behavior of the content that the column contains. For example, if the content in a column repeats in a specific interval, such as days of the week, you can specify the content type of that column as cyclical.

Some algorithms require specific data types and specific content types to be able to function correctly. For example, the Microsoft Naive Bayes algorithm cannot use continuous columns as input, and cannot predict continuous values.

Discrete

Discrete means that the column contains a finite number of values with no continuum between values. For example, a gender column is a typical discrete attribute column, in that the data represents a specific number of categories.

The values in a discrete attribute column cannot imply ordering, even if the values are numeric. Moreover, even if the values used for the discrete column are numeric, fractional values cannot be calculated. Telephone area codes are a good example of discrete data that is numeric.

The Discrete content type is supported by all data mining data types.

Discretized

Discretization is the process of putting values of a continuous set of data into buckets so that there are a limited number of possible values. You can discretize only numeric data.

Thus, the discretized content type indicates that the column contains values that represent groups, or buckets, of values that are derived from a continuous column. The buckets are treated as ordered and discrete values.

You can discretize your data manually, to ensure that you get the buckets you want, or you can use the discretization methods provided in SQL Server Analysis Services. Some algorithms perform discretization automatically.
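To make manual discretization concrete, here is a minimal Python sketch that puts continuous values into equal-width buckets. The function name equal_width_buckets and the sample temperatures are made up for illustration, and equal-width bucketing is just one simple scheme; it is not claimed to match any of the discretization methods built into Analysis Services.

```python
def equal_width_buckets(values, bucket_count):
    """Assign each continuous value to one of bucket_count equal-width buckets.

    Returns a list of bucket indexes (0 = lowest range), so the buckets are
    ordered and discrete. Hypothetical helper, for illustration only.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into one bucket.
        return [0 for _ in values]
    width = (hi - lo) / bucket_count
    # The maximum value would land on the upper edge, so clamp it into the last bucket.
    return [min(int((v - lo) / width), bucket_count - 1) for v in values]


# Sample data (assumed): a continuous column of temperatures.
temperatures = [12.5, 18.0, 21.3, 25.9, 30.0, 35.5]
buckets = equal_width_buckets(temperatures, 3)
print(buckets)
```

Each temperature is replaced by the index of its bucket, so a continuous column ends up with only three possible values.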

The Discretized content type is supported by the following data types: Date, Double, Long, and Text.

Continuous

Continuous means that the column contains values that represent numeric data on a scale that allows interim values. Unlike a discrete column, which represents finite, countable data, a continuous column represents scalable measurements, and it is possible for the data to contain an infinite number of fractional values. A column of temperatures is an example of a continuous attribute column.

When a column contains continuous numeric data, and you know how the data should be distributed, you can potentially improve the accuracy of the analysis by specifying the expected distribution of values. You specify the column distribution at the level of the mining structure. Therefore, the setting applies to all models that are based on the structure. The Continuous content type is supported by the following data types: Date, Double, and Long.

Power BI – Group Workspace and Content Pack – Dashboards Reports and Datasets

There are three basic types of content or building blocks in Power BI:

Datasets, Reports and Dashboards

Datasets are reporting models with attributes and measures (calculations) that a user can EXPLORE to build different types of visualizations. Datasets can also be RENAMED, DELETED and REFRESHED.

Reports are collections of visuals.

Dashboards are collections of tiles that are pinned from Reports by hovering over the desired chart and clicking the Push Pin icon. Tiles can be rearranged and resized.

Where do we store Dashboards, Reports and Datasets?

You will have two options for storing your content:

  • My Workspace
  • Group Workspaces

So how do I organize everything in Power BI?

Any content that will need governance or will potentially reach a large number of users should be stored in a group workspace. Content that does not require much collaboration and governance can be stored in your personal workspace. Another thing to consider is that a personal workspace is limited to 10GB in the Pro version, and so is each group workspace. However, every new group gets another 10GB of storage, which makes the group concept even more alluring.

OK, now that I have organized everything, how do I share it?

If you have a dashboard stored in your personal workspace, you can use the Share Dashboard feature.

Only dashboards can be shared (reports and datasets cannot).

The Share Dashboard option is not available for dashboards created in Group workspaces.

Another way to share content is by creating an organizational content pack.

We can create an organizational content pack by clicking the gear icon in the top right-hand corner of the page.

We will then have an option to specify who has access to the content pack, its Title and Description (both are required), and which dashboards, reports and datasets should be included in it.

Then click the Publish button at the bottom of the page to complete the content pack creation process.

After the content pack has been published, users with the required access will have an option to consume it by clicking on Get Data->Get.

The Definitive Guide on Collaboration in Power BI Reference

SharePoint Foundation Web Application Service Stuck at Starting

Connect to the node using remote desktop and open a command prompt as administrator.

Navigate to the bin directory containing stsadm:

cd C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\BIN\

Start the service using:

stsadm -o provisionservice -action start -servicetype spwebservice

You will get an "operation completed successfully" message. Now issue iisreset /noforce.

 

Restarting the Microsoft SharePoint Foundation Web Application service is generally better done through the STSADM command than through Central Admin.

Today we had issues on one of the web nodes and hence we restarted Microsoft SharePoint Foundation Web Application using Central Admin. The service kept showing “Starting”.

We redid our IIS bindings explicitly.

Then we ran the STSADM command stsadm -o provisionservice -action start -servicetype spwebservice from a command prompt in the \Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\BIN directory.

We then issued the iisreset command.

Now the service shows started through Central Admin.

MSDN Reference

SSRS Report is taking forever – OPTION (OPTIMIZE FOR UNKNOWN) hint – Parameter sniffing

Today one of the SSRS reports kept on loading.
On analyzing the stored procedure associated with it, we decided to use OPTION (OPTIMIZE FOR UNKNOWN) hint.
Why?
This stored procedure was creating 5 temp tables. We checked and found that the creation of all 5 temp tables was instantaneous.
The slow part was the query that joined the 5 temp tables with 8 other tables, which was taking forever.
So we added the OPTION (OPTIMIZE FOR UNKNOWN) hint to this query and voilà! The query took less than 1 second.

So what is this parameter sniffing?

Brent Ozar, Turgay Sahtiyan and Gregory Larsen have explained nicely what parameter sniffing is.

SQL Server compiles a stored procedure using (sniffing) the parameters sent the first time the procedure is compiled, and puts the resulting plan in the plan cache (or procedure cache). After that, every time the procedure is executed again, SQL Server retrieves the execution plan from the cache and uses it, regardless of whether different parameters are passed.

The potential problem with this approach is that the parameters that were used when the plan was cached might not produce an optimal plan for all executions of the SP, especially for SPs that return significantly different sets of records depending on the parameters passed. For instance, if you passed parameters that required a large number of records to be read, the plan might decide a table or index scan would be the most efficient method to process the SP. Then, if the same SP was called with a different set of parameters that would only return a specific record, it would use the cached execution plan and perform a table or index scan operation to resolve its query, even if an index seek operation would be more efficient in returning the results for the second execution of the SP.
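The scan-versus-seek scenario above can be imitated with a small toy model in Python. This is only a sketch under loud assumptions: the PlanCache class, the choose_plan function, the 10% threshold and the row counts are all invented for illustration, and none of it reflects how the SQL Server optimizer is actually implemented. It only shows why a plan sniffed from one parameter gets reused for another, and how estimating from average statistics instead of the sniffed value (the idea behind OPTIMIZE FOR UNKNOWN) can change the choice.

```python
def choose_plan(estimated_rows, table_rows, threshold=0.1):
    """Toy cost rule (assumed): scan when many rows are expected, otherwise seek."""
    return "scan" if estimated_rows > threshold * table_rows else "seek"


class PlanCache:
    """Toy stand-in for a procedure cache that holds one plan for one procedure."""

    def __init__(self, value_counts, optimize_for_unknown=False):
        self.value_counts = value_counts            # rows per distinct parameter value
        self.table_rows = sum(value_counts.values())
        self.optimize_for_unknown = optimize_for_unknown
        self.cached_plan = None

    def execute(self, param):
        if self.cached_plan is None:
            if self.optimize_for_unknown:
                # Ignore the actual parameter; estimate from the average rows per value.
                estimate = self.table_rows / len(self.value_counts)
            else:
                # "Sniff" the first parameter and estimate from its row count.
                estimate = self.value_counts.get(param, 0)
            self.cached_plan = choose_plan(estimate, self.table_rows)
        # Every later call reuses the cached plan, whatever the parameter is.
        return self.cached_plan


# Skewed sample data (assumed): one very common value and 100 rare ones.
counts = {"US": 9000}
counts.update({f"C{i:03d}": 10 for i in range(100)})

sniffing = PlanCache(counts)
first_call = sniffing.execute("US")      # plan compiled for the common value
second_call = sniffing.execute("C001")   # rare value still gets the cached plan

unknown = PlanCache(counts, optimize_for_unknown=True)
unknown_call = unknown.execute("US")     # plan based on average density instead

print(first_call, second_call, unknown_call)
```

In this toy setup the sniffed plan is a scan for both calls, while the "unknown" variant settles on a seek because the average value is rare. Whether that trade-off helps a real query depends on the actual data distribution.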

Read here to know more about scans vs seeks.

References

Reference 1

Reference 2

MSDN Execution Plan Caching and Reuse –  SQL Server has a pool of memory that is used to store both execution plans and data buffers. The percentage of the pool allocated to either execution plans or data buffers fluctuates dynamically, depending on the state of the system. The part of the memory pool that is used to store execution plans is referred to as the procedure cache.

SQL Server execution plans have the following main components:

  • Query Plan: The bulk of the execution plan is a re-entrant, read-only data structure used by any number of users. This is referred to as the query plan. No user context is stored in the query plan. There are never more than one or two copies of the query plan in memory: one copy for all serial executions and another for all parallel executions. The parallel copy covers all parallel executions, regardless of their degree of parallelism.

 

  • Execution Context: Each user that is currently executing the query has a data structure that holds the data specific to their execution, such as parameter values. This data structure is referred to as the execution context. The execution context data structures are reused. If a user executes a query and one of the structures is not being used, it is reinitialized with the context for the new user.