These are notes taken during CMSC 818e: Distributed And Cloud-Based Storage Systems. Course webpage and syllabus here.
Day Twelve
Serializability
Conflict Serializability
- Two read/write instruction “conflict” if
- they are by different transactions
- they operate on the same data item
- at least one is a write instruction
why do we care?
- if two read/write instructions don’t conflict they can be swapped without any change in the final effect
- if they do conflict they an’t be swapped
Conflict-equivalent schedules
- if S can be transformed into S’ through a series of swaps, S and S’ are called conflict-equivalent
- conflict-equivalence guarantees the same final effect
Conflicting write instructions don’t matter because they’ll be overwritten anyway. The final write is the only one that matters.
View-serializability for S’ and S and each datum Q:
- if Ti reads initial value of Q in S, must also in S’
- if Ti reads value written from Tj in S, must also in S’
- it Fi performs final write to Q in S, must also in S’
Testing for conflict-serializability
- given a schedule, determine if it is conflict-serializable (this is slow N factorial)
- draw a precedence-graph over the transactions (much more efficient!)
- if there is a cycle in the graph, not conflict serializable
- if there is none, it is conflict-serializable
How to guarantee serializability?
- Allowing transactions to run, and then aborting them if the schedules aren’t serializable can be expensive
We can instead use schemes to guarantee that the schedule will be conflict-serializable.
Recoverability
- Serializability is good for consistency
- What if transactions fail?
- in a recoverable schedule, we never commit a transaction that depends on a reads-from if the read-from transaction hasn’t been committed yet
- cascading rollbacks
- dirty read: reading a value written by a transaction that hasn’t committed yet
- cascadeless schedules: transaction that only reads committed values
- cascadeless implies no cascading rollbacks (which is good!)
### Concurrency Control
- Approach: guarantee conflict-serialziablity by limiting concurrency with locks
- Assumptions: ignoring durability (no crashes) though transactions can still abort
- Goal: serializability and minimal impact of aborts
Lock-based protocols
- transactions must acquire locks
- share locks
- exclusive locks
- lock requess are made to the concurrency control manager
- it decides whether to grant a lock request
2-phase Locking Protocol
- phase 1: growing phase
- transaction may obtain locks
- but may not release them
- phase 2: shrinking phase
- only release locks
- what about deadlock??
- 2PL guarantees conflict-serializability
- lock-point: the time at which a transcation acquired last lock
- if lock-point(T1) < lock-point(T2), there can’t be an edge from T2 to T1 in the precedence graph
- rigorous 2PL: release both exclusive and read locks only at the very end
- makes serializablity order == the commit order
- more intuitive behavior for users
- strict 2PL lock conversion
- transaction might not be sure what it needs a write lock on in advance
- start with an S lock
- upgrade to an X lock later if needed
- doesn’t change any of the other properties
- strict and rigorous 2PL ensure consistency but not deadlocks
Dealing with deadlocks
- deadlock detected, what to do?
- kill one, usually the youngest in the cycle
- how to deal with them pre-emptively?
- wait-die scheme
- wound-wait scheme
- both are biased in favor of the older transactions, since they’re assumed to have gotten more work done
- but, that means the possibility of starvation
- if a young transaction is aborted too many times, it may be given priority in continuing
Locking granularity
- coarse vs fine - tradeoffs!