User:Stefmol: Difference between revisions

Content deleted Content added

Inline

Revision as of 23:31, 21 November 2010

Aggregates (data warehouse)

Aggregates in a data warehouse is used in dimensional models and is one of the methods that has the most dramatic effect on query time on large sets of datas. At the simplest form an aggregate is a simple summery table that can be derived by performing a Group by SQL query. The most common use of aggregates is to take a dimension and change the granuality of this dimension. When the granularity is of the dimension is changed, the fact table has to be partially summerized to fit the new grain of the new dimention, thus creating new dimensional and fact tables, fitting this new level of grain. Aggregates are sometimes referred to at pre-calculated summary data, since aggregations usually is precomputed, partially summariced data, that is stored in new aggregated tables. So the reason why aggregates can make such an dramatic increase in the performance of the data warehouse is by reducing the number of rows to be accesed when responding to a query.

Kimball is quoted[1] saying:

The single most dramatic way to affect performance in a large data warehouse is to provide a proper set of aggregate (summary) records that coexist with the primary base records. Aggregates can have a very significant effect on performance, in some cases speeding queries by a factor of one hundred or even one thousand. No other means exist to harvest such spectacular gains

Having aggregates and atomic data increses the complexity of the dimensional model. This complexity should be tranparent to the users of the data warehouse, thus when a request is made, the data warehouse should return data from the table with the correct grain. So when requests to the data warehouse are made, aggregate navigator functionality should be implemented, to help determine the correct table with the correct grain. The number of possible aggregations is determined by every possible combination of dimension granularities. Since it would produce a lot of overhead to build all possible aggregations, if not be impossible, it is a good idea to choose a subset of tables on which to make aggregations. The best way to choose this subset and decide which aggregations to build is to monitor queries and design aggregations to match query patterns.

Aggregates in data warehouses helps achieve an extraordinary improvement of the speed of data.

Aggregate navigator Aggregate strategy how to and what to Selecting aggregates

This database-related article is a stub. You can help Wikipedia by expanding it.

@@ Line 1: / Line 1: @@
 ==Aggregates (data warehouse)==
+Aggregates in a data warehouse is used in dimensional models and is one of the methods that has the most dramatic effect on query time on large sets of datas. At the simplest form an aggregate is a simple summery table that can be derived by performing a ''Group by'' SQL query. The most common use of aggregates is to take a dimension and change the granuality of this dimension. When the granularity is of the dimension is changed, the fact table has to be partially summerized to fit the new grain of the new dimention, thus creating new dimensional and fact tables, fitting this new level of grain. Aggregates are sometimes referred to at pre-calculated summary data, since aggregations usually is precomputed, partially summariced data, that is stored in new aggregated tables. So the reason why aggregates can make such an dramatic increase in the performance of the data warehouse is by reducing the number of rows to be accesed when responding to a query.
+[[Ralph Kimball|Kimball]] is quoted[http://www.rkimball.com/html/articles_search/articles1996/9608d54.html] saying:
-[[Materialized view|Oracles materialized view]]
+<blockquote>The single most dramatic way to affect performance in a large data warehouse is to provide a proper set of aggregate (summary) records that coexist with the primary base records. Aggregates can have a very significant effect on performance, in some cases speeding queries by a factor of one hundred or even one thousand. No other means exist to harvest such spectacular gains</blockquote>
-Aggregates in a data warehouse is used in dimensional models and is one of the methods that has the most dramatic effect on query time on large sets of datas. At the simplest form an aggregate is a simple summery table that can be derived by performing a ''Group by'' SQL query. Aggregations in a dimensional model is built from the fact table by changing the granularity on specific dimensions and aggregating up data along these dimensions. Aggregates are sometimes referred to at pre-calculated summary data since aggregations is  partially summarizing the detailed atomic data in a base fact table and storing the result in a new fact table.
+Having aggregates and atomic data increses the complexity of the dimensional model. This complexity should be tranparent to the users of the data warehouse, thus when a request is made, the data warehouse should return data from the table with the correct grain. So when requests to the data warehouse are made, aggregate navigator functionality should be implemented, to help determine the correct table with the correct grain.
+The number of possible aggregations is determined by every possible combination of dimension granularities. Since it would produce a lot of overhead to build all possible aggregations, if not be impossible, it is a good idea to choose a subset of tables on which to make aggregations. The best way to choose this subset and decide which aggregations to build is to monitor queries and design aggregations to match query patterns.
-The result of
-The best way to decide which aggregations to build is to monitor queries and design aggregations to match query patterns.
-<p>Aggregate tables improve data warehouse performance by reducing the number
-of rows the RDBMS must access when responding to a query. At the simplest
-level, this is accomplished by partially summarizing the data in a base
-fact table and storing the result in a new fact table. Some new terminology will
-be necessary to differentiate aggregate tables from those in the original
-schema.</p>
 <p>Aggregates in data warehouses helps achieve an extraordinary improvement of the speed of data.<br/>
-[[Ralph Kimball|Kimball]] is quoted[http://www.rkimball.com/html/articles_search/articles1996/9608d54.html] saying:<br/>
-<blockquote>The single most dramatic way to affect performance in a large data warehouse is to provide a proper set of aggregate (summary) records that coexist with the primary base records. Aggregates can have a very significant effect on performance, in some cases speeding queries by a factor of one hundred or even one thousand. No other means exist to harvest such spectacular gains</blockquote>
 </p>
 Aggregate navigator
-Aggregate strategy
+Aggregate strategy how to and what to
 Selecting aggregates