Tags:conceptdatabasetransactiontransacionmanagement Status:🟩


Transaction Management

Summary

Transaction management ensures that database operations are reliable and consistent, preventing data loss or corruption. It relies on the ACID properties—Atomicity, Consistency, Isolation, and Durability—to guarantee transactions are processed correctly. Techniques like buffer management, write-ahead logging, and transaction recovery help handle system failures. Locking mechanisms, such as two-phase locking, ensure transactions run smoothly without conflicts, maintaining data integrity even in complex scenarios. See transactions in sql for a brief introduction of transactions and implementation in SQL.

Motivation for transaction management

When a system is managing its transactions correct, the end users will never experience any crashes related to data or at least be in doubt whether their data reached the database or not.

ACID

ACID is a set of properties that ensure reliable database transactions.

  • Atomicity: Each transaction runs to completion or has no effect at all.
  • Consistency: After a transaction completes, the integrity constraints are satisfied.
  • Isolation: Transactions executed in a parallel have the same effects as if they were executed sequentially.
  • Durability: The effect of a committed transaction remains in the database even if the system crashed.

Atomicity and Durability issues

Atomicity and durability issues occur when a transaction is only partially completed or its committed changes are lost due to a system failure. Atomicity ensures a transaction is all-or-nothing, while durability guarantees committed changes persist even after a crash. Failures can violate these properties, leading to data inconsistencies.

  • t1: Finished before the checkpoint, so it’s committed and safe. No action needed after the crash.
  • t2: Started before, finished after the checkpoint. Redo needed to reapply its changes.
  • t3: Started before, never finished. Undo required to revert its partial changes.
  • t4: Started after the checkpoint, finished before the crash. Committed, no action needed.
  • t5: Started after, not finished. No action needed, no changes made.

Buffer Management

When a transaction is completed it performes a commit. This is not a change on the disk, but rather a record in the transaction log, marking the transaction as successful. Buffer Management Policies determine when changes are moved from RAM to disk.

  • FORCE / NO FORCE: This affect data pages of committed transactions.
  • STEAL / NO STEAL: This affect data pages of uncommitted transactions.

FORCE / NO STEAL

FORCE: Write changed pages to disk at commit.

  • Once a transaction commits, the changes are immediately written to disk and removed from main memory.
  • Increases response time due to the disk write, but ensures durability.
  • For systems with high request volume, frequent disk writes can slow down performance.

NO STEAL: Allow updated pages to be replaced.

  • Allows replacing data pages that are committed.
  • Does not allow replacing pages for uncommitted transactions.

NO FORCE / STEAL

NO FORCE: Do not write changed pages to disk at commit.

  • Changes may remain in memory after a transaction commits, not immediately written to disk.
  • Can improve performance by reducing disk writes, but may risk data loss in the event of a crash before the pages are written.
  • More efficient for systems with frequent transactions but less durable.

STEAL: Allow uncommitted pages to be replaced.

  • Data pages for uncommitted transactions can be replaced in memory.
  • Increases performance by freeing up memory but risks losing uncommitted changes if a crash occurs.
  • Allows for better memory management, but can lead to undo operations during recovery.

WAL Protocol (Write-Ahead Logging)

Before any changes are written to the disk, we force the corresponding log record to disk. Before a transaction is committed, we force all log records for the transaction to disk. This ensures Atomicity and Durability.

(Before making any changes to the disk, we first save the log record. Before finishing a transaction, we make sure all its log records are saved to the disk.)

Restart Recovery of Transactions

  1. Analyze information about transactions from the last checkpoint.
  2. Redo the changes of committed transactions that did not make it to disk.
  3. Undo the changes of uncommitted transactions that accidently made it to disk.

If a system does not have the log, the recovery will fail. It will need the log and either the Data or Backup.

Transactions Scheduler

Simple Scheduler

A simple scheduler would maintain a queue of transactions and carry the out in order. However then problem is that transactions must wait for each other, even though they might be unrelated to each other (requesting data from different disks).

Interleaving Scheduler

Most DBMSs have schedulers that allow the actions of transactions to interleave. However, even though the operations are interleaved, the final result should be equivalent to if the transactions were executed one after another in a serial order (serial schedule). One way to enforce serializability is through locks.

Locks

A lock is a right to perform operations on a database element. Only one transaction may hold a lock on an element at any time. The locks must be requested by transactions and are granted by the locking scheduler.

There are two types of locks:

  1. Write locks: Only one can be hold at a time.
  2. Read locks: Can be available for multiple transactions at a time.

Two-Phase Locking

Two-phase locking is a method to ensure that transactions are executed in a serializable way.

It has two phases:

  1. Growing phase: The transaction can acquire locks but cannot release any.
    1. When reading a database resource it gets a shared lock (S).
    2. When writing a database resource it gets a exclusive lock (X).
  2. Shrinking phase: The transaction can release locks but cannot acquire any new ones.

This ensures that once a transaction starts releasing locks, it cannot interfere with others, preventing conflicts and ensuring data consistency.

There are different variations of 2PL like Static 2PL, Strict 2PL and Rigorous 2PL.

Strict Two-Phase Locking

T = transaction

  1. Before reading a record/page, T gets a shared lock. Before writing a record/page, T gets a exclusive lock.
  2. A record/page cannot have an X lock at the same time as any other lock.
  3. Release shared locks during shrinking phase and exclusive locks on commit/abort.

Rigorous Two-Phase Locking

T = transaction

  1. Before reading a record/page, T gets a shared lock. Before writing a record/page, T gets a exclusive lock.
  2. A record/page cannot have an exclusive lock at the same time as any other lock.
  3. Release all locks on commit/abort.

Deadlocks

The DBMS sometimes must make a transaction wait for a another transaction to release a lock. However this can lead to a deadlock, where A waits for B and B waits for A, which means they are stuck, since they are waiting for each other. Deadlocks are resolved by aborting a transaction involved in the cycle.

Avoiding deadlocks

When a transaction tries to upgrade a lock, from shared to exclusive, while another is holding a conflicting one, it can lead to a deadlock.

If transactions access resources in a consistent order, deadlocks can be avoided.

The optimizer may not always allow control over the order of access.

Deadlock can occur when one transaction holds a read lock and another transaction attempts to acquire a write lock on the same resource, causing both to wait for each other.

Phantom tuples

Phantom tuples are rows in a database that appear (or disappear) when a query is re-executed, due to changes made by other transactions during the query’s execution.

Phantoms can be avoided by locking a relation before adding data, but this reduces concurrency. Index locking prevents phantom tuples while allowing most other insertions.

Example:

  • Transaction A queries for all employees earning more than $50,000.
  • Transaction B adds a new employee earning $60,000 while Transaction A is still running.
  • When Transaction A runs again, the new employee appears in the results, even though they weren’t there when the query started.

Isolation levels in modern systems