bigdata - Cube Data

On-Line Analytic Processing(OLAP)

On-Line Analytic Processing generally involves highly complex queries that use one or more aggregations.

OLAP and Data Warehouses

Data from many separate databases may be integrated into the warehouse. The warehouse is usually only updated overnight. Data warehouses play an important role in OLAP applications. First, the warehouse is necessary to organize and centralize data in a way that supports OLAP queries. Second, OLAP queries are usually complex and touching much of the data and take too much time to be executed in a transaction-processing system with high throughput requirements.

A Multidimensional View of OLAP Data

In typical OLAP applications there is a central relation or collection of data called the fact table. Often, it helps to think of the objects in the fact table as arranged in a multidimensional space. Two broad directions that have been taken by specialized systems that support cube-structured data for OLAP: ROLAP and MOLAP. ROLAP, which is Relational OLAP. In this approach, data may be stored in relations with a specialized structure called a “star schema”. MOLAP, which is Multidimensional OLAP. A specialized structure “data cude” is used to hold the data, including its aggregates.

Star Schemas

A star schema consists of the schema for the fact table, which links to several other relations, called “dimension tables”.

Slicing and Dicing

A choice of partition for each dimension “dices” the cude. The result is that the cude is divided into smaller cubes that represent groups of points whose statistics are aggregated by a query that performs this partitioning in its “group by” clause. Through the “where” clause, a query has the option of focusing on particular partitions alone one or more dimensions.(on a particular “slice” of the cube).
Alt text

Alt text

Alt text

Alt text

Data Cubes

The formal data cube precomputes all possible aggregates in a systematic way.

The Cube Operator

Given a fact table F, we can define an augmented table CUBE(F) that adds an additional value, denoted , to each dimension. The has the intuitive meaning “any,” and it represents aggregation along the dimension in which it appears.

The Cube Operator in SQL

SQL gives us a way to apply the cube operator within queries. If we add the term “WITH CUBE” to a group-by clause, then we get not only the tuple for each group, but also the tuples that represent aggregation along one or more of the dimensions along which we have grouped. These tuples appear in the result with NULL where we have used *.
Alt text

Alt text

However, SalesRollup would not contain tuples such as
Alt text

Author

s-serenity

Posted on

2023-12-12

Updated on

2024-02-21

Licensed under

You need to set install_url to use ShareThis. Please set it in _config.yml.
You forgot to set the business or currency_code for Paypal. Please set it in _config.yml.

Comments

You forgot to set the shortname for Disqus. Please set it in _config.yml.