These are notes taken during CMSC 818e: Distributed And Cloud-Based Storage Systems. Course webpage and syllabus here.

Day Two

Discussion of Immutability Changes Everything

Pat Helland’s “Immutability Changes Everything”.
fight ambiguity with append-only applications
use content-defined names (e.g. hash of the file)
Amol’s DataHub - GitHub for datasets - how to store them efficiently? Each dataset is treated as immutable; might be new versions but the datasets themselves don’t change.
Deal with massively parallel “big data” with MapReduce; depends on immutable data – doesn’t matter when the actual computation on a subslice happens, because it won’t change.
Append-only computing: logs are the truth (log is immutable); single-master changes are applied sequentially via single-master or consensus.
Immutable is not always immutable:
- optimizing for read access: indexes, de-normalization
- farming out portions of work, with re-try
- tension between fast access (tiny tables) and expense of joins
- normalization is there to eliminate update anomalies.
Immutability enables unambiguous identity (content-defined names)
Immutability enables massive replication/caching/parallelism
Immutability eliminates locking
Immutability enables re-computation
- from immutable data
- of immutable data

Build a file system in Go. Will be completely in-memory. Implemented as in-memory tree specified by root.
gitlab link
Use Fuse - an interface for building user-level file systems.
modify dfs.go per README; don’t have to implement all of the functions at once. Start with ‘hello’ and slowly add in others
bazil can have a file system that doesn’t crash with a very small subset of the methods defined.
use Piazza to communicate with class & Pete