These are notes taken during CMSC 818e: Distributed And Cloud-Based Storage Systems. Course webpage and syllabus here.
Day Two
Discussion of Immutability Changes Everything
- Pat Helland’s “Immutability Changes Everything”.
- fight ambiguity with append-only applications
- use content-defined names (e.g. hash of the file)
- Amol’s DataHub - GitHub for datasets - how to store them efficiently? Each dataset is treated as immutable; might be new versions but the datasets themselves don’t change.
- Deal with massively parallel “big data” with MapReduce; depends on immutable data – doesn’t matter when the actual computation on a subslice happens, because it won’t change.
- Append-only computing: logs are the truth (log is immutable); changes are applied sequentially, ordered by a single master or by consensus.
- Immutable is not always immutable:
- optimizing for read access: indexes, de-normalization
- farming out portions of work, with re-try
- tension between fast access (tiny tables) and expense of joins
- normalization is there to eliminate update anomalies.
- Immutability enables unambiguous identity (content-defined names)
- Immutability enables massive replication/caching/parallelism
- Immutability eliminates locking
- Immutability enables re-computation
- from immutable data
- of immutable data
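
To make the "content-defined names" point concrete, here is a minimal sketch of a content-addressed store. The `Store` type and its `Put`/`Get` methods are hypothetical names invented for illustration, and SHA-256 is an assumed choice of hash; the notes only say "hash of the file". Because the name is derived from the bytes, identical content always gets the same name, and a stored entry never needs to change.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Store is a hypothetical content-addressed blob store: each key is derived
// from the bytes it names, so an entry is never modified after it is written.
type Store struct {
	blobs map[string][]byte
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{blobs: make(map[string][]byte)}
}

// Put saves a private copy of data under its SHA-256 hex digest and returns
// that name. Storing the same content twice is a no-op.
func (s *Store) Put(data []byte) string {
	sum := sha256.Sum256(data)
	name := hex.EncodeToString(sum[:])
	if _, ok := s.blobs[name]; !ok {
		s.blobs[name] = append([]byte(nil), data...)
	}
	return name
}

// Get returns a copy of the blob with the given name, so callers cannot
// mutate the stored bytes.
func (s *Store) Get(name string) ([]byte, bool) {
	b, ok := s.blobs[name]
	if !ok {
		return nil, false
	}
	return append([]byte(nil), b...), true
}

func main() {
	s := NewStore()
	v1 := s.Put([]byte("dataset v1"))
	v2 := s.Put([]byte("dataset v2")) // a new version gets a new name
	dup := s.Put([]byte("dataset v1"))

	fmt.Println("same content, same name:", v1 == dup)
	fmt.Println("new version, new name:", v1 != v2)
	fmt.Println("blobs stored:", len(s.blobs))
}
```

Since a name can only ever refer to one sequence of bytes, any replica or cache holding that name can serve it without locking or coordination, which is the replication and caching benefit listed above.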
Project 1
- Build a file system in Go. Will be completely in-memory. Implemented as in-memory tree specified by root.
- gitlab link
- Use FUSE - an interface for building user-level file systems.
- modify dfs.go per README; don’t have to implement all of the functions at once. Start with ‘hello’ and slowly add in others; with bazil, you can have a file system that doesn’t crash with a very small subset of the methods defined.
- use Piazza to communicate with class & Pete
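
As a rough illustration of "in-memory tree specified by root", the sketch below models only the tree itself, using the standard library. It makes no FUSE calls and is not the layout of the course's dfs.go; the `Node` type and the `NewDir`, `AddDir`, `AddFile`, and `Lookup` helpers are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical in-memory file system node. A directory keeps its
// entries in Children; a regular file keeps its contents in Data.
type Node struct {
	Name     string
	IsDir    bool
	Data     []byte
	Children map[string]*Node
}

// NewDir creates an empty directory node.
func NewDir(name string) *Node {
	return &Node{Name: name, IsDir: true, Children: make(map[string]*Node)}
}

// AddDir creates a subdirectory under n. n must be a directory.
func (n *Node) AddDir(name string) *Node {
	d := NewDir(name)
	n.Children[name] = d
	return d
}

// AddFile creates a regular file under n. n must be a directory.
func (n *Node) AddFile(name string, data []byte) *Node {
	f := &Node{Name: name, Data: data}
	n.Children[name] = f
	return f
}

// Lookup walks a slash-separated path starting at n and reports whether
// every component was found. Empty components are skipped.
func (n *Node) Lookup(path string) (*Node, bool) {
	cur := n
	for _, part := range strings.Split(path, "/") {
		if part == "" {
			continue
		}
		if !cur.IsDir {
			return nil, false // cannot descend into a regular file
		}
		next, ok := cur.Children[part]
		if !ok {
			return nil, false
		}
		cur = next
	}
	return cur, true
}

func main() {
	root := NewDir("")
	root.AddFile("hello", []byte("hello, world\n"))
	docs := root.AddDir("docs")
	docs.AddFile("notes.txt", []byte("day two"))

	if f, ok := root.Lookup("/docs/notes.txt"); ok {
		fmt.Printf("%s: %s\n", f.Name, f.Data)
	}
	_, ok := root.Lookup("/docs/missing")
	fmt.Println("missing found:", ok)
}
```

In the actual project, FUSE operations such as lookup and read would presumably be answered by walking a structure of this kind from the root, but the mapping from FUSE methods onto the tree is defined by the README and the skeleton code, not by this sketch.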