CMSC 818e Day 12

Reading time ~2 minutes

These are notes taken during CMSC 818e: Distributed And Cloud-Based Storage Systems. Course webpage and syllabus here.

Day Twelve

Serializability

Conflict Serializability

  • Two read/write instruction “conflict” if
    • they are by different transactions
    • they operate on the same data item
    • at least one is a write instruction

why do we care?

  • if two read/write instructions don’t conflict they can be swapped without any change in the final effect
  • if they do conflict they an’t be swapped

Conflict-equivalent schedules

  • if S can be transformed into S’ through a series of swaps, S and S’ are called conflict-equivalent
  • conflict-equivalence guarantees the same final effect

Conflicting write instructions don’t matter because they’ll be overwritten anyway. The final write is the only one that matters.

View-serializability for S’ and S and each datum Q:

  • if Ti reads initial value of Q in S, must also in S’
  • if Ti reads value written from Tj in S, must also in S’
  • it Fi performs final write to Q in S, must also in S’

Testing for conflict-serializability

  • given a schedule, determine if it is conflict-serializable (this is slow N factorial)
  • draw a precedence-graph over the transactions (much more efficient!)
    • if there is a cycle in the graph, not conflict serializable
    • if there is none, it is conflict-serializable

How to guarantee serializability?

  • Allowing transactions to run, and then aborting them if the schedules aren’t serializable can be expensive

We can instead use schemes to guarantee that the schedule will be conflict-serializable.

Recoverability

  • Serializability is good for consistency
  • What if transactions fail?
  • in a recoverable schedule, we never commit a transaction that depends on a reads-from if the read-from transaction hasn’t been committed yet
  • cascading rollbacks
  • dirty read: reading a value written by a transaction that hasn’t committed yet
  • cascadeless schedules: transaction that only reads committed values
  • cascadeless implies no cascading rollbacks (which is good!)

### Concurrency Control

  • Approach: guarantee conflict-serialziablity by limiting concurrency with locks
  • Assumptions: ignoring durability (no crashes) though transactions can still abort
  • Goal: serializability and minimal impact of aborts

Lock-based protocols

  • transactions must acquire locks
    • share locks
    • exclusive locks
  • lock requess are made to the concurrency control manager
    • it decides whether to grant a lock request

2-phase Locking Protocol

  • phase 1: growing phase
    • transaction may obtain locks
    • but may not release them
  • phase 2: shrinking phase
    • only release locks
  • what about deadlock??
  • 2PL guarantees conflict-serializability
    • lock-point: the time at which a transcation acquired last lock
    • if lock-point(T1) < lock-point(T2), there can’t be an edge from T2 to T1 in the precedence graph
  • rigorous 2PL: release both exclusive and read locks only at the very end
    • makes serializablity order == the commit order
    • more intuitive behavior for users
  • strict 2PL lock conversion
    • transaction might not be sure what it needs a write lock on in advance
    • start with an S lock
    • upgrade to an X lock later if needed
    • doesn’t change any of the other properties
  • strict and rigorous 2PL ensure consistency but not deadlocks

Dealing with deadlocks

  • deadlock detected, what to do?
    • kill one, usually the youngest in the cycle
  • how to deal with them pre-emptively?
    • wait-die scheme
    • wound-wait scheme
    • both are biased in favor of the older transactions, since they’re assumed to have gotten more work done
    • but, that means the possibility of starvation
      • if a young transaction is aborted too many times, it may be given priority in continuing

Locking granularity

  • coarse vs fine - tradeoffs!

A Parrot Trainer Eats Crow

In this post, we'll consider how it is that models trained on massive datasets using millions of parameters can be both "low bias" and al...… Continue reading

Embedded Binaries for Go

Published on February 06, 2021