User:Stefmol: Difference between revisions
No edit summary |
No edit summary |
||
Line 1: | Line 1: | ||
==Aggregates (data warehouse)== |
==Aggregates (data warehouse)== |
||
Aggregates |
Aggregates are used in dimensional models to produce dramatic positive effects on time it takes to query large sets of datas. At the simplest form an aggregate is a simple summery table that can be derived by performing a ''Group by'' SQL query. A more common use of aggregates is to take a dimension and change the granuality of this dimension, when changing the granularity of the dimension the fact table has to be partially summerized to fit the new grain of the new dimention, thus creating new dimensional and fact tables, fitting this new level of grain. Aggregates are sometimes referred to at pre-calculated summary data, since aggregations usually is precomputed, partially summariced data, that is stored in new aggregated tables. So the reason why aggregates can make such an dramatic increase in the performance of the data warehouse is the reduction of the number of rows to be accesed when responding to a query. |
||
[[Ralph Kimball|Kimball]] which is widely regarded as one of the original architects of data warehousing says[http://www.rkimball.com/html/articles_search/articles1996/9608d54.html]: |
[[Ralph Kimball|Kimball]] which is widely regarded as one of the original architects of data warehousing says[http://www.rkimball.com/html/articles_search/articles1996/9608d54.html]: |
||
Line 9: | Line 9: | ||
The number of possible aggregations is determined by every possible combination of dimension granularities. Since it would produce a lot of overhead to build all possible aggregations, if not be impossible, it is a good idea to choose a subset of tables on which to make aggregations. The best way to choose this subset and decide which aggregations to build is to monitor queries and design aggregations to match query patterns. |
The number of possible aggregations is determined by every possible combination of dimension granularities. Since it would produce a lot of overhead to build all possible aggregations, if not be impossible, it is a good idea to choose a subset of tables on which to make aggregations. The best way to choose this subset and decide which aggregations to build is to monitor queries and design aggregations to match query patterns. |
||
⚫ | |||
Having aggregate data in the dimesional model makes the enviroment more complex. to make this extra coplexity transparent to the user, functionality known as aggregate navigation is used to query the dimensional and fact tables with the correct grain level. The aggregate navigation essentily examines the query to see if it can be answered using a smaller, aggregate table. |
|||
Implementations of aggregate navigators can be found in a range of technologies: |
|||
*OLAP engines |
|||
<p>Aggregates in data warehouses helps achieve an extraordinary improvement of the speed of data.<br/> |
|||
*Materialized views |
|||
*Relational OLAP (ROLAP) services |
|||
</p> |
|||
*BI application servers or query tools |
|||
It is gennerally recomented to use either of the first three choises, since the benefits in the latte case is restricted to a simgle fron end BI tool(s 354) |
|||
⚫ | |||
Aggregate strategy |
===Aggregate strategy=== |
||
====how to==== |
|||
====what to==== |
|||
Selecting aggregates |
Selecting aggregates |
||
Revision as of 00:04, 22 November 2010
Aggregates (data warehouse)
Aggregates are used in dimensional models to produce dramatic positive effects on time it takes to query large sets of datas. At the simplest form an aggregate is a simple summery table that can be derived by performing a Group by SQL query. A more common use of aggregates is to take a dimension and change the granuality of this dimension, when changing the granularity of the dimension the fact table has to be partially summerized to fit the new grain of the new dimention, thus creating new dimensional and fact tables, fitting this new level of grain. Aggregates are sometimes referred to at pre-calculated summary data, since aggregations usually is precomputed, partially summariced data, that is stored in new aggregated tables. So the reason why aggregates can make such an dramatic increase in the performance of the data warehouse is the reduction of the number of rows to be accesed when responding to a query.
Kimball which is widely regarded as one of the original architects of data warehousing says[1]:
The single most dramatic way to affect performance in a large data warehouse is to provide a proper set of aggregate (summary) records that coexist with the primary base records. Aggregates can have a very significant effect on performance, in some cases speeding queries by a factor of one hundred or even one thousand. No other means exist to harvest such spectacular gains
Having aggregates and atomic data increses the complexity of the dimensional model. This complexity should be tranparent to the users of the data warehouse, thus when a request is made, the data warehouse should return data from the table with the correct grain. So when requests to the data warehouse are made, aggregate navigator functionality should be implemented, to help determine the correct table with the correct grain. The number of possible aggregations is determined by every possible combination of dimension granularities. Since it would produce a lot of overhead to build all possible aggregations, if not be impossible, it is a good idea to choose a subset of tables on which to make aggregations. The best way to choose this subset and decide which aggregations to build is to monitor queries and design aggregations to match query patterns.
Aggregate navigator
Having aggregate data in the dimesional model makes the enviroment more complex. to make this extra coplexity transparent to the user, functionality known as aggregate navigation is used to query the dimensional and fact tables with the correct grain level. The aggregate navigation essentily examines the query to see if it can be answered using a smaller, aggregate table.
Implementations of aggregate navigators can be found in a range of technologies:
- OLAP engines
- Materialized views
- Relational OLAP (ROLAP) services
- BI application servers or query tools
It is gennerally recomented to use either of the first three choises, since the benefits in the latte case is restricted to a simgle fron end BI tool(s 354)
Aggregate strategy
how to
what to
Selecting aggregates