CMSC 818e Day 2

These are notes taken during CMSC 818e: Distributed And Cloud-Based Storage Systems. Course webpage and syllabus here.

Day Two

Discussion of Immutability Changes Everything

  • Pat Helland’s “Immutability Changes Everything”.
  • fight ambiguity with append-only applications
  • use content-defined names (e.g. hash of the file)
  • Amol’s DataHub - GitHub for datasets - how to store them efficiently? Each dataset is treated as immutable; there might be new versions, but an existing version of a dataset never changes.
  • Deal with massively parallel “big data” with MapReduce; depends on immutable data – doesn’t matter when the actual computation on a subslice happens, because it won’t change.
  • Append-only computing: logs are the truth (the log is immutable); changes are applied sequentially, ordered either by a single master or by consensus.
  • Immutable is not always immutable:
    • optimizing for read access: indexes, de-normalization
    • farming out portions of work, with re-try
    • tension between fast access (tiny tables) and expense of joins
    • normalization is there to eliminate update anomalies.
  • Immutability enables unambiguous identity (content-defined names)
  • Immutability enables massive replication/caching/parallelism
  • Immutability eliminates locking
  • Immutability enables re-computation
    • from immutable data
    • of immutable data
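
A minimal sketch of content-defined names, to make the "hash of the file" idea concrete. The Store type and its Put/Get methods are hypothetical names made up for these notes, and SHA-256 is an assumed choice of hash; neither the paper nor the lecture prescribes them. The point is only that identical content always gets the same name, and a stored blob is never modified.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Store is a hypothetical in-memory, content-addressed blob store.
// Blobs are only ever added, never updated in place.
type Store struct {
	blobs map[string][]byte
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{blobs: make(map[string][]byte)}
}

// Put saves data under the hex SHA-256 of its contents and returns that name.
// Storing the same bytes twice is a no-op that yields the same name.
func (s *Store) Put(data []byte) string {
	sum := sha256.Sum256(data)
	name := hex.EncodeToString(sum[:])
	if _, ok := s.blobs[name]; !ok {
		// Copy so later changes to the caller's slice cannot alter the blob.
		s.blobs[name] = append([]byte(nil), data...)
	}
	return name
}

// Get returns a copy of the blob with the given name, if present.
func (s *Store) Get(name string) ([]byte, bool) {
	data, ok := s.blobs[name]
	if !ok {
		return nil, false
	}
	return append([]byte(nil), data...), true
}

func main() {
	s := NewStore()
	v1 := s.Put([]byte("dataset, version 1"))
	v2 := s.Put([]byte("dataset, version 2"))
	again := s.Put([]byte("dataset, version 1"))

	fmt.Println("same content, same name:", v1 == again)
	fmt.Println("new version, new name:", v1 != v2)
}
```

Because a name can only ever refer to one sequence of bytes, any replica or cache holding that name can serve it without locking or coordination, which is the replication and caching benefit listed above.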

Project 1

  • Build a file system in Go. It will be completely in-memory, implemented as an in-memory tree identified by its root.
  • gitlab link
  • Use Fuse - an interface for building user-level file systems.
  • Modify dfs.go per the README; you don’t have to implement all of the functions at once. Start with ‘hello’ and slowly add the others.
  • With bazil, a file system that defines only a very small subset of the methods can still run without crashing.
  • use Piazza to communicate with class & Pete
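
A rough sketch of what "an in-memory tree specified by root" could look like, using only the standard library. Everything here (Node, NewDir, AddFile, AddDir, Lookup) is a hypothetical name invented for these notes; it is not the layout of dfs.go and it does not use the FUSE interface at all. It only shows the underlying data structure that the FUSE methods would eventually read from and write to.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical in-memory file or directory.
type Node struct {
	Name     string
	IsDir    bool
	Data     []byte           // file contents; unused for directories
	Children map[string]*Node // directory entries; nil for files
}

// NewDir creates an empty directory node.
func NewDir(name string) *Node {
	return &Node{Name: name, IsDir: true, Children: make(map[string]*Node)}
}

// AddFile creates a file under directory n and returns it.
func (n *Node) AddFile(name string, data []byte) *Node {
	child := &Node{Name: name, Data: data}
	n.Children[name] = child
	return child
}

// AddDir creates a subdirectory under directory n and returns it.
func (n *Node) AddDir(name string) *Node {
	child := NewDir(name)
	n.Children[name] = child
	return child
}

// Lookup walks a slash-separated path starting at root.
// Empty components are skipped, so "/" resolves to root itself.
func Lookup(root *Node, path string) (*Node, bool) {
	cur := root
	for _, part := range strings.Split(path, "/") {
		if part == "" {
			continue
		}
		if !cur.IsDir {
			return nil, false // cannot descend into a file
		}
		next, ok := cur.Children[part]
		if !ok {
			return nil, false
		}
		cur = next
	}
	return cur, true
}

func main() {
	root := NewDir("")
	root.AddFile("hello", []byte("hello, world\n"))
	docs := root.AddDir("docs")
	docs.AddFile("notes.txt", []byte("day two"))

	if n, ok := Lookup(root, "/docs/notes.txt"); ok {
		fmt.Printf("%s: %d bytes\n", n.Name, len(n.Data))
	}
}
```

Since the whole file system hangs off one root pointer, nothing survives a restart, which matches the "completely in-memory" requirement for this project.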

