These are notes taken during CMSC 818e: Distributed And Cloud-Based Storage Systems. Course webpage and syllabus here.

Day Twelve

Serializability

Two read/write instruction “conflict” if
- they are by different transactions
- they operate on the same data item
- at least one is a write instruction

why do we care?

if two read/write instructions don’t conflict they can be swapped without any change in the final effect
if they do conflict they an’t be swapped

if S can be transformed into S’ through a series of swaps, S and S’ are called conflict-equivalent
conflict-equivalence guarantees the same final effect

Conflicting write instructions don’t matter because they’ll be overwritten anyway. The final write is the only one that matters.

View-serializability for S’ and S and each datum Q:

Testing for conflict-serializability

given a schedule, determine if it is conflict-serializable (this is slow N factorial)
draw a precedence-graph over the transactions (much more efficient!)
- if there is a cycle in the graph, not conflict serializable
- if there is none, it is conflict-serializable

How to guarantee serializability?

Allowing transactions to run, and then aborting them if the schedules aren’t serializable can be expensive

We can instead use schemes to guarantee that the schedule will be conflict-serializable.

Serializability is good for consistency
What if transactions fail?
in a recoverable schedule, we never commit a transaction that depends on a reads-from if the read-from transaction hasn’t been committed yet
cascading rollbacks
dirty read: reading a value written by a transaction that hasn’t committed yet
cascadeless schedules: transaction that only reads committed values
cascadeless implies no cascading rollbacks (which is good!)

### Concurrency Control

Approach: guarantee conflict-serialziablity by limiting concurrency with locks
Assumptions: ignoring durability (no crashes) though transactions can still abort
Goal: serializability and minimal impact of aborts

Lock-based protocols

transactions must acquire locks
- share locks
- exclusive locks
lock requess are made to the concurrency control manager
- it decides whether to grant a lock request

2-phase Locking Protocol

phase 1: growing phase
- transaction may obtain locks
- but may not release them
phase 2: shrinking phase
- only release locks
what about deadlock??
2PL guarantees conflict-serializability
- lock-point: the time at which a transcation acquired last lock
- if lock-point(T1) < lock-point(T2), there can’t be an edge from T2 to T1 in the precedence graph
rigorous 2PL: release both exclusive and read locks only at the very end
- makes serializablity order == the commit order
- more intuitive behavior for users
strict 2PL lock conversion
- transaction might not be sure what it needs a write lock on in advance
- start with an S lock
- upgrade to an X lock later if needed
- doesn’t change any of the other properties
strict and rigorous 2PL ensure consistency but not deadlocks

Dealing with deadlocks

Locking granularity