Buffer Pools (CMU DataBase Systems 2019 fall)
CommentDATABASE WORKLOADS
On-Line Transaction Processing
On-Line Analytical Processing
Hybrid Transaction + Analytical Processing
Spacial control:
- Where to write pages on disk
- The goal is to keep pages that are used together often as physically close together as possible on disk.
Temporal Control:
- When to read pages to memory, and when to write them to disk.
- The goal is minimize the number of stalls from having to read data from disk.
TODAY’S AGENDA
Buffer Pool Manager
Replacement Policies
Other Memory Pools
Memory region organized as an array of fixed-size pages.
An array entry is called a frame.
when the DBMS requests a page, an exact copy is places into one of these frame.
The page table keeps track of pages that are currently in memory
Also maintains additional meta-data per page:
- Dirty Flag
- Pin/Reference Counter
Locks vs Latches
Locks:
- Protects the database’s logical contents from other transactions.
- Held for transaction duration.
- Need to be able to rollback changes.
Latches: –mutex
- Protects the critical sections of the DBMS’s internal data structure from other threads.
- Held for operation duration.
- Do not need to be able to rollback changes.
SCAN SHARING
Queries can reuse data retrieved from storage or operator computations.
- This is different from result caching.
Allow multiple queries to attach to a single cursor that scans a table.
- Queries do not have to be exactly the same.
- Can also share intermediate results.
Buffer Replacement Policies
- Least-recently used: Maintain a timestamp of when each page was last accessed
- Clock
- Approximation of LRU without needing a separate timestamp per page
- Each Page has a reference bit
- When a page is accessed, set to 1
- Organize the pages in a circular buffer with a “clock hand”:
- Upon sweeping, check if a page’s bit is set to 1.
- If yes, set to zero. If no, then evict.
- Approximation of LRU without needing a separate timestamp per page
Problems
LRU and CLOCK replacement policies are susceptible to sequential flooding.
- A query preforms a sequential scan that reads every page.
- This pollutes the buffer pool with pages that are read once and then never again.
The most recently used page is actually the most unneeded page.
Better Policies: LRU-K
Track the history of last K references to each page as timestamps and compute the interval between subsequent accesses.
The DBMS then uses this history to estimate the next time that page is going to be accessed.
Better Polices: Localization
The DBMS chooses which pages to evict on a per txn/query basis. This minimizes the pollution of the buffer pool from each query.
- Keep track of the pages that a query has accessed.
Example: Postgres maintains a small ring buffer that is private to the query.
Better Policies: Priority Hints
The DBMS knows that the context of each page during query execution.
It can provide hints to the buffer pool on whether a page is important or not.
DIRTY PAGES
Fast: If a page in the buffer pool is not dirty, then the DBMS can simply “drop” it.
SLOW: I f a page is dirty , then the DBMS must write back to disk to ensure that its changes are persisted.
Trade-off between fast evictions versus dirty writing pages that will not be read again in the future.
BACKGROUND WRITING
The DBMS can periodically walk through the page table and write dirty pages to disk.
when a dirty page is safely written, the DBMS can either evict the page or just unset the dirty flag.
Need to be careful that we don’t write dirty pages before their log records have been written.
CONCLUSION
The DBMS can manage that sweet, sweet memory better then the OS.
Leverage the semantics about the query plan to make better decisions:
- Evictions
- Allocations
- Pre-fetching