DATABASE WORKLOADS

On-Line Transaction Processing

On-Line Analytical Processing

Hybrid Transaction + Analytical Processing

Spatial Control:

  • Where to write pages on disk
  • The goal is to keep pages that are used together often as physically close together as possible on disk.

Temporal Control:

  • When to read pages into memory, and when to write them to disk.
  • The goal is to minimize the number of stalls from having to read data from disk.

TODAY’S AGENDA

Buffer Pool Manager

Replacement Policies

Other Memory Pools

BUFFER POOL MANAGER

Memory region organized as an array of fixed-size pages.

An array entry is called a frame.

When the DBMS requests a page, an exact copy is placed into one of these frames.

The page table keeps track of the pages that are currently in memory.

It also maintains additional metadata per page (see the sketch after this list):

  • Dirty Flag
  • Pin/Reference Counter
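
As a rough illustration, here is a minimal C++ sketch of this bookkeeping; the names (Frame, BufferPool, PAGE_SIZE) are made up for the example and do not come from any particular system.

    // A minimal sketch of the buffer pool's in-memory bookkeeping.
    #include <cstddef>
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    constexpr std::size_t PAGE_SIZE = 4096;   // fixed-size pages
    using page_id_t  = std::int32_t;
    using frame_id_t = std::int32_t;

    struct Frame {
        char      data[PAGE_SIZE];    // exact copy of the on-disk page
        page_id_t page_id   = -1;     // which page currently occupies this frame
        bool      is_dirty  = false;  // set when the in-memory copy is modified
        int       pin_count = 0;      // number of workers currently using the page
    };

    class BufferPool {
     public:
        explicit BufferPool(std::size_t num_frames) : frames_(num_frames) {}

        // Returns the frame holding page_id if that page is already in memory.
        Frame *Lookup(page_id_t page_id) {
            auto it = page_table_.find(page_id);
            return it == page_table_.end() ? nullptr : &frames_[it->second];
        }

     private:
        std::vector<Frame> frames_;                              // the frame array
        std::unordered_map<page_id_t, frame_id_t> page_table_;   // page id -> frame id
    };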

Locks vs Latches

Locks:

  • Protects the database’s logical contents from other transactions.
  • Held for transaction duration.
  • Need to be able to rollback changes.

Latches (i.e., mutexes):

  • Protects the critical sections of the DBMS’s internal data structures from other threads.
  • Held for operation duration.
  • Do not need to be able to rollback changes.
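
A small sketch of the distinction in code, assuming a page table guarded by a std::mutex used as a latch; the transaction-level locks described above would live in a separate lock manager and are not shown.

    // Sketch: a latch (mutex) guarding one of the DBMS's internal data
    // structures, here a page table. The latch is held only for the duration
    // of a single operation and involves no rollback.
    #include <cstdint>
    #include <mutex>
    #include <unordered_map>

    class PageTable {
     public:
        void Insert(std::int32_t page_id, std::int32_t frame_id) {
            std::lock_guard<std::mutex> guard(latch_);  // held for this call only
            table_[page_id] = frame_id;
        }

        bool Find(std::int32_t page_id, std::int32_t *frame_id) {
            std::lock_guard<std::mutex> guard(latch_);
            auto it = table_.find(page_id);
            if (it == table_.end()) return false;
            *frame_id = it->second;
            return true;
        }

     private:
        std::mutex latch_;  // a latch, not a transaction lock
        std::unordered_map<std::int32_t, std::int32_t> table_;
    };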

SCAN SHARING

Queries can reuse data retrieved from storage or operator computations.

  • This is different from result caching.

Allow multiple queries to attach to a single cursor that scans a table.

  • Queries do not have to be exactly the same.
  • Can also share intermediate results.
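
A single-threaded sketch of the idea with made-up names (SharedScan, Attach, Step); a real implementation would add latching and actual page fetches.

    // Sketch of scan sharing: several queries attach to one cursor that sweeps
    // a table's pages. Each page is fetched once and handed to every attached
    // query; a query that attaches mid-scan wraps around at the end to pick up
    // the pages it missed.
    #include <cstdint>
    #include <functional>
    #include <vector>

    class SharedScan {
     public:
        explicit SharedScan(std::int32_t num_pages) : num_pages_(num_pages) {}

        // A query attaches by registering a callback that consumes page ids.
        // It will be handed exactly num_pages_ pages, starting from wherever
        // the shared cursor happens to be right now.
        void Attach(std::function<void(std::int32_t)> consumer) {
            consumers_.push_back({std::move(consumer), num_pages_});
        }

        // Advance the cursor by one page and deliver that page to every query
        // that still needs it. Returns false once all consumers are finished.
        bool Step() {
            bool any_active = false;
            for (auto &c : consumers_) {
                if (c.remaining > 0) {
                    c.callback(cursor_);   // the page itself is read only once
                    c.remaining--;
                    any_active = true;
                }
            }
            cursor_ = (cursor_ + 1) % num_pages_;
            return any_active;
        }

     private:
        struct Consumer {
            std::function<void(std::int32_t)> callback;
            std::int32_t remaining;
        };
        std::vector<Consumer> consumers_;
        std::int32_t num_pages_;
        std::int32_t cursor_ = 0;
    };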

Buffer Replacement Policies

  • Least-Recently Used (LRU): Maintain a timestamp of when each page was last accessed and evict the page with the oldest timestamp.
  • Clock:
    • Approximation of LRU without needing a separate timestamp per page:
      • Each page has a reference bit.
      • When a page is accessed, set its bit to 1.
    • Organize the pages in a circular buffer with a “clock hand”:
      • Upon sweeping, check if a page’s bit is set to 1.
      • If yes, set it to zero. If no, then evict.
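
A minimal sketch of the sweep, assuming a hypothetical ClockReplacer that keeps one reference bit and a pinned flag per frame.

    // CLOCK replacement: clear set reference bits as the hand sweeps, and
    // evict the first unpinned frame whose bit is already clear.
    #include <cstddef>
    #include <vector>

    class ClockReplacer {
     public:
        explicit ClockReplacer(std::size_t num_frames)
            : ref_bit_(num_frames, false), pinned_(num_frames, false) {}

        void RecordAccess(std::size_t frame_id) { ref_bit_[frame_id] = true; }
        void SetPinned(std::size_t frame_id, bool pinned) { pinned_[frame_id] = pinned; }

        // Returns the frame to evict. Assumes at least one frame is unpinned.
        std::size_t PickVictim() {
            while (true) {
                if (!pinned_[hand_]) {
                    if (ref_bit_[hand_]) {
                        ref_bit_[hand_] = false;           // give a second chance
                    } else {
                        std::size_t victim = hand_;
                        hand_ = (hand_ + 1) % ref_bit_.size();
                        return victim;                     // bit was 0: evict
                    }
                }
                hand_ = (hand_ + 1) % ref_bit_.size();     // move the clock hand
            }
        }

     private:
        std::vector<bool> ref_bit_;   // one reference bit per frame
        std::vector<bool> pinned_;    // pinned frames are never evicted
        std::size_t hand_ = 0;        // the "clock hand"
    };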

Problems

LRU and CLOCK replacement policies are susceptible to sequential flooding.

  • A query performs a sequential scan that reads every page.
  • This pollutes the buffer pool with pages that are read once and then never again.

In this scenario, the most recently used page is actually the page least likely to be needed again.

Better Policies: LRU-K

Track the history of the last K references to each page as timestamps and compute the interval between subsequent accesses.

The DBMS then uses this history to estimate the next time that page is going to be accessed.
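
A sketch of the bookkeeping, under the simplifying assumption that the eviction decision only needs each page’s K-th most recent access time (its “backward K-distance”); the names are illustrative.

    // LRU-K: keep the timestamps of each page's last K references and evict
    // the page whose K-th most recent reference is oldest. Pages with fewer
    // than K references are treated as the best eviction candidates.
    #include <cstddef>
    #include <cstdint>
    #include <deque>
    #include <limits>
    #include <unordered_map>

    class LRUKTracker {
     public:
        explicit LRUKTracker(std::size_t k) : k_(k) {}

        void RecordAccess(std::int32_t page_id, std::uint64_t now) {
            auto &hist = history_[page_id];
            hist.push_back(now);
            if (hist.size() > k_) hist.pop_front();   // keep only the last K
        }

        // Pick the page whose K-th most recent access is furthest in the past.
        std::int32_t PickVictim() const {
            std::int32_t victim = -1;
            std::uint64_t oldest_kth = std::numeric_limits<std::uint64_t>::max();
            for (const auto &[page_id, hist] : history_) {
                // Fewer than K accesses: treat the K-th access as infinitely old.
                std::uint64_t kth = (hist.size() < k_) ? 0 : hist.front();
                if (kth < oldest_kth) {
                    oldest_kth = kth;
                    victim = page_id;
                }
            }
            return victim;   // -1 if no pages are tracked
        }

     private:
        std::size_t k_;
        std::unordered_map<std::int32_t, std::deque<std::uint64_t>> history_;
    };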

Better Policies: Localization

The DBMS chooses which pages to evict on a per txn/query basis. This minimizes the pollution of the buffer pool from each query.

  • Keep track of the pages that a query has accessed.

Example: Postgres maintains a small ring buffer that is private to the query.
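
A sketch of the shape of such a private ring, with made-up names; this is not Postgres’s actual code or API, just the general idea.

    // A small ring of buffer frames owned by one query's scan. The scan cycles
    // through the same few frames, so a large sequential read evicts only its
    // own previously read pages instead of polluting the shared buffer pool.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    class QueryRingBuffer {
     public:
        explicit QueryRingBuffer(std::size_t ring_size) : frames_(ring_size, -1) {}

        // Which slot in the private ring the next page should reuse.
        std::size_t NextSlot() {
            std::size_t slot = next_;
            next_ = (next_ + 1) % frames_.size();
            return slot;
        }

        // Remember which buffer-pool frame currently backs this slot.
        void Assign(std::size_t slot, std::int32_t frame_id) { frames_[slot] = frame_id; }
        std::int32_t FrameAt(std::size_t slot) const { return frames_[slot]; }

     private:
        std::vector<std::int32_t> frames_;   // frame ids reserved for this query
        std::size_t next_ = 0;
    };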

Better Policies: Priority Hints

The DBMS knows the context of each page during query execution.

It can provide hints to the buffer pool on whether a page is important or not.
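
One way such a hint could look, as a purely hypothetical interface; neither PageHint nor FetchPage comes from a real system.

    // Hypothetical hint passed when fetching a page: the executor marks pages
    // it expects to revisit (e.g., index root pages) so the replacement policy
    // prefers to keep them, and marks read-once scan pages as disposable.
    #include <cstdint>

    enum class PageHint {
        kNormal,    // subject to the default replacement policy
        kKeep,      // likely to be reused soon (e.g., index root / inner pages)
        kDiscard,   // read-once page from a large sequential scan
    };

    struct Page;   // opaque page handle (hypothetical)

    // Hypothetical fetch interface that forwards the hint to the buffer pool's
    // replacement policy.
    Page *FetchPage(std::int32_t page_id, PageHint hint);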

DIRTY PAGES

FAST: If a page in the buffer pool is not dirty, then the DBMS can simply “drop” it.

SLOW: If a page is dirty, then the DBMS must write it back to disk to ensure that its changes are persisted.

There is a trade-off between fast evictions versus writing out dirty pages that will not be read again in the future.

BACKGROUND WRITING

The DBMS can periodically walk through the page table and write dirty pages to disk.

When a dirty page is safely written, the DBMS can either evict the page or just unset the dirty flag.

Need to be careful that we don’t write dirty pages to disk before their log records have been written (the write-ahead logging rule).
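
A sketch of one background-writer pass, assuming each frame records the LSN of the last log record that modified its page and that flushed_lsn marks how far the log has already been forced to disk; names are illustrative.

    // Walk the frame metadata, flush dirty pages whose log records are already
    // durable, and clear their dirty flags so later evictions are cheap.
    #include <cstdint>
    #include <vector>

    struct FrameMeta {
        std::int32_t  page_id   = -1;
        bool          is_dirty  = false;
        int           pin_count = 0;
        std::uint64_t page_lsn  = 0;   // LSN of the last log record that touched the page
    };

    void WritePageToDisk(std::int32_t page_id);   // assumed I/O routine (not shown)

    void BackgroundWriterPass(std::vector<FrameMeta> &frames, std::uint64_t flushed_lsn) {
        for (auto &frame : frames) {
            if (!frame.is_dirty || frame.pin_count > 0) continue;
            if (frame.page_lsn > flushed_lsn) continue;   // WAL rule: log first
            WritePageToDisk(frame.page_id);
            frame.is_dirty = false;   // page is now clean and cheap to evict
        }
    }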

CONCLUSION

The DBMS can manage that sweet, sweet memory better than the OS.

Leverage the semantics of the query plan to make better decisions:

  • Evictions
  • Allocations
  • Pre-fetching