Analysis Services SSAS Processing Architecture details and associated XMLA commands

Really nice article by T. K. Anand on SSAS processing

Below are some of the highlights in case, the msdn article stops working.

XMLA Commands for Processing

<Process>

The Process command tells the Analysis server to process an object. It has the following syntax:

<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  
    Adventure Works DW
    Adventure Works DW
    Fact Internet Sales
    Internet_Sales
  
  <Type>ProcessFull</Type>
</Process>

<Batch>

The Batch command is a convenient way to send multiple commands to the server in a single XMLA request. It has the following syntax:

<Batch Transaction="true" xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    
      Adventure Works DW
      Customer
    
    <Type>ProcessFull</Type>
  </Process>
  <Process>
    
      Adventure Works DW
      Product
    
    <Type>ProcessFull</Type>
  </Process>
  <Process>
    
      Adventure Works DW
      Adventure Works DW
    
    <Type>ProcessFull</Type>
  </Process>
</Batch>

<Parallel>

Batch commands are always executed sequentially. However, the Parallel command within a Batch allows you to specify multiple Process commands that are executed in parallel. Parallel is a special command in that it can only appear inside a Batch; it cannot be sent by itself. It cannot contain any command other than Process. It has the following syntax:

Following is an example of a Batch that processes two dimensions in parallel and then two partitions in parallel.

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Parallel>
    <Process>
      
        Adventure Works DW
        Customer
      
      <Type>ProcessFull</Type>
    </Process>
    <Process>
      
        Adventure Works DW
        Product
      
      <Type>ProcessFull</Type>
    </Process>
  </Parallel>
  <Parallel>
    <Process>
      
        Adventure Works DW
        Adventure Works DW
        Fact Internet Sales
        Internet_Sales_2001
      
      <Type>ProcessFull</Type>
    </Process>
    <Process>
      
        Adventure Works DW
        Adventure Works DW
        Fact Internet Sales
        Internet_Sales_2002
      
      <Type>ProcessFull</Type>
    </Process>
  </Parallel>
</Batch>

Processing Options

The Type element in the Process command specifies the type of processing to be done. This is also referred to as the “processing option”. Choosing the right processing option is very important since it typically affects large volumes of data. This section describes the different processing options for each type of object.

Following are the types of Analysis Services objects that can be processed:

  • Database
  • Dimension
  • Cube
  • Measure group
  • Partition
  • Mining structure
  • Mining model

Before we look at the processing options, we first need to understand what exactly processing is doing. Processing generally sends queries to the relational data source and uses the results to populate the Analysis Services object. Following are the storage contents that are built by processing for each object type. A complete discussion of the Analysis server storage format is beyond the scope of this document.

  • Database
    • None (a database is just a container of dimensions, cubes, and mining structures; it has no data storage of its own)
  • Dimension
    • Attribute stores
    • Hierarchy stores
    • Bitmap indexes
  • Cube
    • MDX Script cache
  • Measure group
    • None (a measure group is just a container of partitions; it has no data storage of its own)
  • Partition
    • Fact data
    • Aggregation data
    • Bitmap indexes
  • Mining structure
    • Training data
  • Mining model
    • Algorithm content

When an object is processed, the processing option specifies what part of the storage content of the object is to be built and how. The following table contains the list of all processing options and which objects they apply to.

Processing Option Database Dimension Cube Measure Group Partition Mining Structure Mining Model
ProcessFull X X X X X X X
ProcessClear X X X X X X X
ProcessDefault X X X X X X X
ProcessData X X X X
ProcessIndexes X X X X
ProcessUpdate X
ProcessAdd X X
ProcessStructure X X
ProcessScriptCache X
ProcessClearStructureOnly X

ProcessFull

ProcessFull applies to all objects. It discards the storage contents of the object and and rebuilds them. ProcessFull is recursively applied to all descendants of the object as well.

ProcessClear

ProcessClear applies to all objects. It discards the storage contents of the object. ProcessClear is recursively applied to all descendants of the object as well.

ProcessDefault

ProcessDefault applies to all objects. It does the bare minimum required to bring the object to a fully processed state. In other words, it builds only the storage contents that are not currently built. For example, if a partition has fact and aggregation data, then ProcessDefault will only build the bitmap indexes.

The only exception to the above rule is bindings. When you change the bindings of an object (e.g., KeyColumns of a dimension attribute), the object retains its data, but remembers that the bindings have changed. ProcessDefault on the object will discard all the storage contents and rebuild them.

The server handles ProcessDefault by analyzing the object and dynamically converting it to another processing option such as ProcessFull, ProcessIndexes, etc. ProcessDefault is recursively applied to all descendants of the object as well.

ProcessData

ProcessData applies only to the OLAP objects, i.e. dimension, cube, measure group and partition. It discards the storage contents of the object and rebuilds only the “data”. For dimensions, it builds only the attribute and hierarchy stores. For partitions, it builds only the fact data.

Essentially ProcessData builds the bare minimum required for the object to be available for queries. Indexes are considered optional and affect only the query performance. ProcessData is recursively applied to all descendants of the object as well.

ProcessIndexes

ProcessIndexes applies only to the OLAP objects, i.e., dimension, cube, measure group, and partition. It requires that the object must already have its “data” built; otherwise, it raises an error. ProcessIndexes preserves the data and rebuilds the “indexes”. For dimensions, it builds the bitmap indexes. For partitions, it builds the aggregation data and bitmap indexes. ProcessIndexes is recursively applied to all descendants of the object as well.

ProcessUpdate

ProcessUpdate applies only to dimensions. It is the equivalent of incremental dimension processing in Analysis Services 2000. It sends SQL queries to read the entire dimension table and applies the changes—member updates, additions, deletions.

Since ProcessUpdate reads the entire dimension table, it begs the question, “How is it different from ProcessFull?” The difference is that ProcessUpdate does not discard the dimension storage contents. It applies the changes in a “smart” manner that preserves the fact data in dependent partitions. ProcessFull, on the other hand, does an implicit ProcessClear on all dependent partitions. ProcessUpdate is inherently slower than ProcessFull since it is doing additional work to apply the changes.

Depending on the nature of the changes in the dimension table, ProcessUpdate can affect dependent partitions. If only new members were added, then the partitions are not affected. But if members were deleted or if member relationships changed (e.g., a Customer moved from Redmond to Seattle), then some of the aggregation data and bitmap indexes on the partitions are dropped. The cube is still available for queries, albeit with lower performance.

ProcessAdd

ProcessAdd applies only to dimensions and partitions.

ProcessAdd is a new processing option for dimensions that did not exist in Analysis Services 2000. It essentially optimizes ProcessUpdate for the scenario where only new members are added. ProcessAdd never deletes or updates existing members. It only adds new members. The user can restrict the dimension table so that ProcessAdd reads only the new rows.

ProcessAdd for partitions is the equivalent of incremental partition processing in Analysis Services 2000. The user typically specifies an alternate fact table or a filter condition pointing to the new rows. ProcessAdd internally creates a temporary partition, processes it with the specified fact data, and merges it into the target partition.

 

ProcessStructure

ProcessStructure applies only to cubes and mining structures.

ProcessStructure for cubes is the equivalent of the Analysis Services 2000 processing option, processBuildStructure, in DSO. It discards the storage contents of the cube and its partitions. It implicitly does a ProcessDefault on all dimensions of the cube and marks the cube as processed. At this point, the cube is available to queries but it will not return any fact data. This is supported mostly for backward-compatibility reasons. It was useful in Analysis Services 2000 for parallel processing utilities. Once the “structure” of a cube is processed, its partitions can be processed in parallel by multiple client sessions without running into locking conflicts.

ProcessStructure on a mining structure discards its storage contents (training data) and rebuilds them. It does not affect the contents of the mining models under the mining structure.

ProcessScriptCache

ProcessScriptCache applies only to cubes. The MDX script in a cube can contain CACHE statements. ProcessScriptCache evaluates the script and persists the results for the CACHE statements.

ProcessClearStructureOnly

ProcessClearStructureOnly applies only mining structures. It clears the storage contents (training data) of the mining structure while preserving the contents of its mining models.

Dependencies and Impact Analysis

Processing an object can produce side effects by affecting the storage contents of other objects. To understand this better, we need to first list the dependencies between objects.

  • Database
    • None
  • Dimension
    • A data-mining dimension depends on its source-mining model.
  • Cube
    • A cube depends on the dimensions that it uses.
  • Measure group
    • A measure group depends on the dimensions that it uses (typically a subset of the dimensions used by the parent cube).
    • A linked measure group depends on its source measure group.
  • Partition
    • A partition depends on the same dimensions as its parent measure group.
  • Mining structure
    • An OLAP mining structure depends on its source cube.
  • Mining model
    • An OLAP mining model depends on its source cube.

The following table lists the side effects produced by each processing option for each object type. Since ProcessDefault is dynamically converted to another processing option, no side effects are listed for it.

Object Type Processing Option Objects that it depends on Object that depend on it
Database ProcessFull
ProcessClear
Dimension ProcessFull ProcessDefault ProcessClear
ProcessClear ProcessClear
ProcessData ProcessDefault ProcessClear
ProcessIndexes
ProcessUpdate Clear affected indexes and aggregations on partitions
ProcessAdd
Cube ProcessFull ProcessDefault ProcessClearStructureOnly
ProcessClear ProcessClearStructureOnly
ProcessData ProcessDefault ProcessClearStructureOnly
ProcessIndexes
ProcessStructure ProcessDefault ProcessClearStructureOnly
ProcessScriptCache
Measure Group ProcessFull ProcessDefault ProcessClearStructureOnly / ProcessFull *
ProcessClear ProcessClearStructureOnly / ProcessClear
ProcessData ProcessDefault ProcessClearStructureOnly / ProcessFull
ProcessIndexes
Partition ProcessFull ProcessDefault ProcessClearStructureOnly / ProcessFull
ProcessClear ProcessClearStructureOnly / ProcessFull
ProcessData ProcessDefault ProcessClearStructureOnly / ProcessFull
ProcessIndexes
ProcessAdd ProcessClearStructureOnly / ProcessFull
Mining Structure ProcessFull ProcessDefault
ProcessClear
ProcessStructure ProcessDefault
ProcessClearStructureOnly
Mining Model ProcessFull ProcessDefault ProcessClear
ProcessClear ProcessClear

* ProcessClearStructureOnly is applied to OLAP mining structures. ProcessFull or ProcessClear is applied to linked measure groups.

From the above table, we see that when an object is processed, it can have an adverse effect (loss of data, aggregations, indexes, etc.) on other objects that depend on it. The ProcessAffectedObjects attribute on the Batch command tells the server to reprocess these objects using the ProcessDefault option. This is a convenient way to process an object and ensure that other objects are not adversely affected.

Out of Line Bindings

The Bindings, DataSource and DataSourceView elements in the <Process> and <Batch> commands are collectively referred to as “out of line bindings”. This is an advanced processing feature that provides the ability to specify or override the bindings of an object at processing time. Note that the object does not persist or remember these bindings after the processing has completed. The bindings need to be re-specified in every processing command as appropriate.

Following are some scenarios where out of line bindings are useful.

ProcessAdd

ProcessAdd on a partition is the operation in which new fact table rows are added to the partition. It is important to identify the new rows by specifying an alternate fact table or a filter condition. This is done using out of line bindings. The following command (scripted out from the Process dialog) illustrates this.

<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  
    Adventure Works DW
    Adventure Works DW
    Fact Internet Sales
    Internet_Sales_2004
  
  <Type>ProcessAdd</Type>
  <Bindings>
    <Binding>
      <DatabaseID>Adventure Works DW</DatabaseID>
      <CubeID>Adventure Works DW</CubeID>
      <MeasureGroupID>Fact Internet Sales</MeasureGroupID>
      <PartitionID>Internet_Sales_2004</PartitionID>
      <Source xsi:type="QueryBinding">
        <DataSourceID>Adventure Works DW</DataSourceID>
        <QueryDefinition>SELECT * FROM [dbo].[FactInternetSales] WHERE OrderDateKey &gt; '1280'</QueryDefinition>
      </Source>
    </Binding>
  </Bindings>
</Process>

ProcessAdd on dimensions is similar in concept but more complicated in practice. Unlike partitions, dimensions can be built out of multiple (snowflaked) tables. Hence the Dimension object does not have a single <Source> property that can be overridden in the out of line bindings. The only way to do this is to specify an out of line DataSourceView in which the dimension table is filtered appropriately. The following command illustrates this. Details of the DataSourceView are omitted in the interest of clarity.

<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Type>ProcessAdd</Type>
  
    Adventure Works DW
    Adventure Works DW
    Fact Internet Sales 1
    Internet_Sales_2004
  
  <DataSource xsi:type="PushedDataSource">
    <root Parameter="InputRowset"/>
    <EndOfData Parameter="EndOfInputRowset"/>
  </DataSource>
  <Bindings>
    <Binding>
      <DatabaseID>Adventure Works DW</DatabaseID>
      <CubeID>Adventure Works DW</CubeID>
      <MeasureGroupID>Fact Internet Sales</MeasureGroupID>
      <CubeDimensionID>Dim Product</CubeDimensionID>
      <AttributeID>Product Name</AttributeID>
      <KeyColumns>
        <KeyColumn>
          <Source xsi:type="ColumnBinding">
            <TableID/>
            <ColumnID>ProductKey</ColumnID>
          </Source>
        </KeyColumn>
      </KeyColumns>
    </Binding>
    ...
    <Binding>
      <DatabaseID>Adventure Works DW</DatabaseID>
      <CubeID>Adventure Works DW</CubeID>
      <MeasureGroupID>Fact Internet Sales</MeasureGroupID>
      <MeasureID>Sales Amount</MeasureID>
      <Source xsi:type="ColumnBinding">
        <TableID/>
        <ColumnID>SalesAmount</ColumnID>
      </Source>
    </Binding>
    ...
  </Bindings>
</Process>

Push-Mode Processing

Processing is usually done by Analysis Services in pull-mode, i.e., the Analysis server pulls the data out of the relational data source and populates the cube. Push-mode processing is the opposite where the client pushes the data into the Analysis server. Push-mode processing is done by sending a rowset as a parameter to the <Process> command. The rowset itself can be chunked across multiple <Process> requests. Out of line bindings are required for push-mode processing in order to map the columns of the rowset parameter to the object being processed.

The Integration Services Partition Processing Data Flow Item uses chunked push-mode processing to pump data into a partition. The following command (extracted from Profiler during an Integration Services package execution) illustrates push-mode processing. The PushedDataSource references two parameters, InputRowset and EndOfInputRowset. The InputRowset parameter contains the rowset chunk in the XMLA rowset format. The EndOfInputRowset parameter is a boolean that indicates if this is the last chunk.

<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Type>ProcessAdd</Type>
  
    Adventure Works DW
    Adventure Works DW
    Fact Internet Sales 1
    Internet_Sales_2004
  
  <DataSource xsi:type="PushedDataSource">
    <root Parameter="InputRowset"/>
    <EndOfData Parameter="EndOfInputRowset"/>
  </DataSource>
  <Bindings>
    <Binding>
      <DatabaseID>Adventure Works DW</DatabaseID>
      <CubeID>Adventure Works DW</CubeID>
      <MeasureGroupID>Fact Internet Sales</MeasureGroupID>
      <CubeDimensionID>Dim Product</CubeDimensionID>
      <AttributeID>Product Name</AttributeID>
      <KeyColumns>
        <KeyColumn>
          <Source xsi:type="ColumnBinding">
            <TableID/>
            <ColumnID>ProductKey</ColumnID>
          </Source>
        </KeyColumn>
      </KeyColumns>
    </Binding>
    ...
    <Binding>
      <DatabaseID>Adventure Works DW</DatabaseID>
      <CubeID>Adventure Works DW</CubeID>
      <MeasureGroupID>Fact Internet Sales</MeasureGroupID>
      <MeasureID>Sales Amount</MeasureID>
      <Source xsi:type="ColumnBinding">
        <TableID/>
        <ColumnID>SalesAmount</ColumnID>
      </Source>
    </Binding>
    ...
  </Bindings>
</Process>

Integration Services

SQL Server Integration Services is a high performance ETL (Extraction, Transformation, and Load) component that can be used for a variety of data transformation and movement tasks. Integration Services can be used for automating and scheduling complex data warehouse maintenance operations.

Processing is a typical maintenance operation in a data warehouse and Integration Services offers two mechanisms for processing Analysis Services objects.

Analysis Services Processing Task

The Analysis Services Processing Task connects to the specified Analysis server and processes one or more objects with the specified processing options. It can be included in an Integration Services package workflow along with other data warehouse maintenance tasks such as data cleansing, transformation, etc.

ms345142.sql2k5_asprocarch_4(en-US,SQL.90).gif

Analysis Services Data Flow Items

Integration Services offers three Analysis Services Data Flow Items:

  • Partition Processing
  • Dimension Processing
  • Data Mining Model Training

These items can be included in an Integration Services Data Flow task as a destination. The package designer chooses the destination object and maps the columns of the data flow to it. When the task is executed, the data flows through the pipeline and into the object on the Analysis server. This is accomplished using the chunked push-mode processing mechanism in Analysis Services.

The following pictures show the editor for the partition-processing item. The dimension processing and data-mining model training items are similar.

ms345142.sql2k5_asprocarch_5(en-US,SQL.90).gif

ms345142.sql2k5_asprocarch_6(en-US,SQL.90).gif

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s