Database Storage II (CMU Database Systems 2019 Fall)

Log=structured file organization

Instead of storing tuples in pages , the DBMS only stores log records.

The system appends log records to the file of how the database was modified:

inserts store the entire tuple
Deletes mark the tuple as deleted
Updates contain the delta of just the attributes that were modified

why?

easy to roll back
fast

Obvious downside: reads…

To read a record, the DBMS scans the log backwards and “recreates” the tuple to find what it needs.

Build indexed to allow it to jump to locations in the log

Periodically compact the log.

Today’s agenda

Data representation
system catalogs
storage models

Tuple storage

The DBMS’s catalogs contain the schema information about tables that the system uses to figure out the tuple’s layout.

INTEGER/BIGINT/SMALLINT/TINYINT

C/C++ Representation

FLOAT/REAL vs. NUMERIC/DECIMAL

IEEE-754 STANDARD / Fixed-point Decimals

VARCHAR/VARBINARY/TEXT/BLOB

Header with length, followed by data bytes.

TIME/DATE/TIMESTAMP

32/64-bit integer of (micro)seconds since Unix epoch

Fixed precision numbers

Numeric data types with arbitrary precision and scale. Used when round errors are unacceptable.

Example: NUMERIC, DECIMAL

Typically stored in a exact, variable-length binary representation with additional meta-data

like a varchar but not stored as a string

Large Values

Most DBMSs don’t allow a tuple to exceed the size of a single page.

To store values that are larger than a page, the DBMS uses separate overflow storage pages.

Postgres: TOAST( > 2 KB)
MySQL: Overflow (> 1/2 size of page)
SQL Server: Overflow( > size of page)

To BLOB or Not To BLOB: Large Object Storage in a Database of a Filesystem?

256 KiB seems to be a division

System Catalogs

A DBMS stores meta-data about database in its internal catalogs

-Tables, columns, indexed, views

Users, permissions

Internal statistics

Almost every DBMS stores their a database’s catalog in itself.

Wrap object abstraction around tuples
Specialized code for “bootstrapping” catalog tables.

You can query the DBMS’s internal INFORMATION_SCHEMA catalog to get info about the database.

ANSI standard stet of read-only views that provide info about all of the tables, views, columns, and procedures in a database.

DBMSs also have non-standard shortcuts to retrieve this information.

1	SELSCT * FROM INFORMATION_SCHEMA.TABLES WHERE table_catalog = '<db name>';

\d;

1 2	SHOW TABLES; DESC <table name>;

1	.schema <table name>;

observation

The relational model does not specify that we have to store all of a tuple’s attributes together in a single page.

This may not actually be the best layout for some workloads…

OLTP

Online Transaction Processing:

Simple queries that read / update a small amount of data that is related to a single entity in the database.

This is usually the kind of application that people build first.

OLAP

Online Analytical Processing

Complex queries that read large portions of the database spanning multiple entities
You execute these workloads on the data you have collected from your OLTP applications(s).

N-ARY STORAGE MODEL

Advantages

Fast inserts, updates, and deletes.
Good for queries that need the entire tuple.

disadvantages

Not good for scanning large portions of the table and / or a subset of the atreibutes./

DECOMPOSITION STORAGE MODEL(DSM)

The DBMS stores the values of a single attribute for all tuples contiguously in a page

Also known as a “column store”.

Tuple Identification

Choice #1: fixed-length Offsets

Each value is the same length for an attribute

Choice #2: Embedded Tuple Ids

Each value is stored with its tuple id in a column

Advantages

Reduces the amount wasted I/O because the DBMS only reads the data that it needs
Better query processing and data compression (more on this later)

Disadvantages

slow for point queries, inserts, updates, and deletes because of tuple splitting/stitching

CONCLUSION

The storage manager is not entirely independent from the rest of the DBMS.

It is important to choose the right storage model for the target workload:

OLTP = Row Store
OLAP = Column Store

![](20221030-DatabaseStorageII.assets/屏幕截图 2022-10-30 185728.png)

How the DBMS manages its memory and move data back-and-forth from disk

MIX OF TWO? engineering overhead too much

hybrid storage systems