These are notes taken during CMSC 818e: Distributed And Cloud-Based Storage Systems. Course webpage and syllabus here.
Day Eleven
Transaction: a sequence of database actions enclosed within special tags.
Properties:
- Atomicity: Entire transaction or nothing
- Consistency: Transaction, executed completely, takes database from one consistent state to another (defined at the application level - DB can't enforce directly, so programmer needs to guarantee this. DBMS can do a few things, e.g. enforce constraints on the data )
- Isolation: Concurrent transactions *appear* to run in isolation (ok to relax this)
- Durability: Effects of committed transactions are not lost
How does this related to queries that we discussed?
- Queries don’t update data, so Durability and consistency not relevant
- Would want concurrency
    - Consider a query computing balance at the end of the day
- Don’t want to read “in flight” changes
 
- Would want isolation
    - What if somebody makes a transfer while we are computing the balance?
- Typically not guaranteed for such long-running queries (often need to relax to handle long-running queries)
 
- TPC-C (write only) vs TPC-H (data mining)
Assumptions
- the system can crash at any time
- similarly, the power can go out at any point
    - contents of the main memory won’t survive the crash, or power outage
 
- But disks are durable, they might stop but data is not Lost
    - (for now)
 
- disks only guarantee atomic sector writes, nothing more (that’s not enough for DBs)
- transactions are by themselves consistent
Goals
- Guaranteed durability, atomicity
- as much concurrency as possible, while not compromising isolation and or consistency
    - two transactions updating the same account balance… NO
- two transactions updating different accounts… YES
 
Transaction States
- active - initial state, while executing
- partially committed - after final statement
- failed - after discover that can not proceed
- aborted - after rolled back and DB restored
- committed (transactions not always linear)
Concurrency control schemes
- a CC scheme is used to guarantee that concurrency does not lead to problems
- for simplicity, we will ignore durability during this section (since it’s guaranteed by other things)
    - so no crashes, …
- though transactions may still abort
 
When is concurrency ok?
- serial schedules are always ok
- serializability (equivalence to some serial ordering of the transactions, doesn’t specify which) – so how to do this as fast as possible, with as much concurrency as possible
Schedules
- a interleaved execution sequence of transaction instructions
- serial schedule - a schedule in which transaction appear one after the other (no interleaving)
- serial schedules satisfy isolation and consistency
Serializability
- a schedule is serializable if it’s final effect is the same as that of a serial schedule
- serializability -> database remains consistent
- non-serializable schedules are unlike to result in consistent database
- ensure serializability by either limiting concurrency (pessimistic) or by comparing the write sets to see if they overlap to identify the need for corrections after the fact if things go wrong (optimistic)
