Tags:conceptdatabasestoagestoragemodels Status:🟩


Storage Models

Summary

Storage models define how a database management system (DBMS) organizes and stores data physically on disk and in memory. The choice of model impacts performance and is often tailored to specific workloads like OLTP or OLAP.

  1. NSM (N-ary Storage Model): Row store, optimized for OLTP with fast operations on individual entities.
  2. DSM (Decomposition Storage Model): Column store, optimized for OLAP by scanning specific attributes efficiently.
  3. PAX (Hybrid Storage Model): Combines features of NSM and DSM for balanced performance.

Details

Storage models is how a database management system stores and retrieves data. More specifically the storage model specifies how the DBMS physically organizes tuples on disk and in memory. It can have different performance characteristics based on the target workload. There it is also possible to use a different storage model based on the workload.

There are 3 main types of storage models:

  • NSM: N-ary Storage Model
  • DSM: Decomposition Storage Model
  • PAX: Hybrid Storage Model

NSM

N-ary Storage Model - also known as “row store” because it takes a row one-by-one, serialize it and put it into a file. The DBMS stores almost all attributes for a single tuple contiguously on a single page. It is ideal for OLTP workloads where queries are more likely to access individual entities and execute write-heavy workloads.

Advantages
Fast inserts, updates and deletes.

Good for queries that need the entire tuple (OLTP).

Can use index-oriented physical storage for clustering.
Disadvantages
Not good for scanning large portions of the table and/or a subset of the attributes.

Terrible memory locality in access patterns.

Not ideal for compression because of multiple value domains within a single page.

NSM OLTP Example


To the bottom left we have a hard-disk at the database file that consists of pages.
Withing such a page is, in this case, these byte arrays with rows of a table. Each row has a header and its respective attributes.
If we have an insert. Find a page in the database file that has an empty space for the byte array. If no space is available we have to expand the space of the file.

NSM OLAP Example

Now we have an OLAP type query. If we want to count the number of logins for government people, we would have to look at all pages in the file. Go through all the byte arrays and just take the hostname, filter it, extract month from last login.
For this type of workload NSM might not be a good idea. We dont need userID, userName and userPass.

DSM

Decomposition Storage Model - also known as “column store” because it cut the table column by column and store the columns as binary on disk. This is ideal for OLAP. When scanning a lot of data we can just use columns referenced.

Advantages
Reduces the amount wasted I/O per query because DBMS only read the data that it needs.

Faster query processing because of increased locality and cached data reuse.

Better data compression.
Disadvantages
Slow for point queries, inserts, updates and deletes because of tuple splitting / stitching / reorganization.

This image represents a file storing all the different attributes in a separate page.

DSM OLAP Example

Using DSM for OLAP is much faster than NSM because we only have to load 2 pages instead of all pages.

DSM OLTP Example

DSM is not a good chouse for OLTP queries because:
You load unnecessary information.
You have to look at a lot of different pages.
The cost of reassembling attributes from different pages into a table.