These are notes taken during CMSC 818e: Distributed And Cloud-Based Storage Systems. Course webpage and syllabus here.

Day Four

Elephant Paper

differentiating between undo and long-term history
policies:
- keep one (browser cache, core, /tmp)
- keep all
- keep safe (just undo)
- keep landmarks <–
landmarks - people write/edit in bursts; cluster edit times that fall close together, keep the last version in each cluster, and call that a landmark
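A minimal sketch of the clustering idea, assuming a fixed 10-minute gap separates one editing burst from the next. The gap value and the name `pick_landmarks` are made up for illustration; the paper's actual heuristic may differ.

```python
from datetime import datetime, timedelta

def pick_landmarks(timestamps, gap=timedelta(minutes=10)):
    """Group version timestamps into bursts and return the last one of each burst."""
    ordered = sorted(timestamps)
    landmarks = []
    for i, ts in enumerate(ordered):
        is_last = i == len(ordered) - 1
        # A burst ends when the next version is more than `gap` away (assumed threshold).
        if is_last or ordered[i + 1] - ts > gap:
            landmarks.append(ts)
    return landmarks
```

Three saves within five minutes followed by two saves three hours later would yield two landmarks: the last save of each burst.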
versioning for files but not for directories (why was that again?)
should changes propagate all the way to the root?
- make it cheaper by using file chunking/hashing (like LBFS)
- set a “dirty” flag to signal that a node has been modified
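One possible reading of the dirty-flag idea: rather than versioning every ancestor on each write, mark the path to the root and let a later pass version only the flagged nodes. The `Node` class below is hypothetical, not something from the paper.

```python
class Node:
    """Hypothetical file-system tree node with a dirty flag."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.dirty = False

    def mark_dirty(self):
        """Flag this node and its ancestors, stopping at the first already-dirty one."""
        node = self
        while node is not None and not node.dirty:
            node.dirty = True
            node = node.parent
```

Stopping at an already-dirty ancestor keeps repeated writes under the same directory cheap, since everything above that ancestor is already flagged.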

cd foo/@12-nov-1999:11:30
tls    `ls @v`
tgrep

user-level process called when the cleaner comes across a high-temp file
Downsides: less locality in inodes, data blocks; pressure on buffer cache
could use LBFS chunking to lessen the buffer cache pressure
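A toy sketch of content-defined chunking, to show why it helps: cut points depend only on nearby bytes, so versions that share content produce the same chunks and those chunks need to be cached only once. LBFS uses Rabin fingerprints; the rolling sum here is a simpler stand-in, and `window`, `mask`, and both function names are assumptions.

```python
import hashlib

def chunk_boundaries(data, window=16, mask=0x1F):
    """Return cut offsets chosen from content alone (toy rolling sum, not Rabin)."""
    cuts, rolling = [], 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # slide the window forward by one byte
        # Cut whenever the low bits of the window sum match the mask.
        if i + 1 >= window and (rolling & mask) == mask:
            cuts.append(i + 1)
    return cuts

def chunk_hashes(data):
    """Hash each chunk so identical chunks can be stored or cached once."""
    hashes, start = [], 0
    for cut in chunk_boundaries(data) + [len(data)]:
        if cut > start:
            hashes.append(hashlib.sha256(data[start:cut]).hexdigest())
            start = cut
    return hashes
```

Inserting a few bytes at the front of a file shifts every later cut by the same amount, so the chunks after the first realigned boundary hash identically in both versions.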
what about something like video editing?
store diffs between versions in a chain leading to a landmark
applications used to write to filesystems; once the operating system is capable enough, you can take the load off the applications
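A sketch of the diff-chain idea, assuming the landmark is stored in full and neighbouring versions are stored as deltas against it. `make_delta` and `apply_delta` are hypothetical helpers built on the standard library's `difflib`; a real file system would diff blocks rather than lists of lines.

```python
from difflib import SequenceMatcher

def make_delta(base, target):
    """Describe `target` as copies from `base` plus literal additions."""
    delta = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))        # reuse base[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("add", target[j1:j2]))  # store the new lines literally
        # "delete" needs no entry: the deleted range is simply never copied.
    return delta

def apply_delta(base, delta):
    """Rebuild the target version from the base and its delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(base[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out
```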
operational transforms
incomplete description of results (not enough set-up for the graphs), for example:
- is the cleaner running? how often?
- what is the keep-safe window?
challenges with duplicating code from papers (e.g. EPaxos)
Inferno - Bell Labs/Rob Pike
Network Application
snapshot: store a pointer to a previous root, use that to drive policies that reclaim old versions no longer needed
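A rough sketch of the snapshot chain: each new root keeps a pointer to the previous root, and a retention policy reclaims old versions by cutting the chain. `Root`, `snapshot`, and `reclaim` are hypothetical names, and copying a dict stands in for the block sharing a real file system would use.

```python
class Root:
    """Hypothetical file-system root that remembers the previous root."""

    def __init__(self, files, prev=None, timestamp=0):
        self.files = dict(files)  # name -> contents
        self.prev = prev
        self.timestamp = timestamp

def snapshot(root, changes, timestamp):
    """Create a new root on top of `root`, leaving the old one reachable via prev."""
    files = dict(root.files)
    files.update(changes)
    return Root(files, prev=root, timestamp=timestamp)

def reclaim(head, keep_after):
    """Drop snapshots older than `keep_after` by cutting the prev chain."""
    node = head
    while node.prev is not None and node.prev.timestamp >= keep_after:
        node = node.prev
    node.prev = None  # everything older is now unreachable
```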

an attempt to generalize operation shipping
the specific details are in a previous paper on Arnold
eidetic versioning: any past state in the file system or in application memory
stores non-deterministic log for replay
nondeterministic log: system call results (always happen at the same time), external data reads (references to other files in the FS), thread scheduling (sometimes less predictable), unexpected signals (how to recreate?)
Store by value when the programs that produce the data are not in the cloud, or when computation is costlier than communication
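A minimal sketch of record and replay, assuming every nondeterministic input goes through a single `call` hook. `Recorder`, `Replayer`, and `program` are illustrative names; this ignores thread scheduling and signals, which are the harder cases noted above.

```python
import random
import time

class Recorder:
    """Run the real call and log its result."""

    def __init__(self):
        self.log = []

    def call(self, fn, *args):
        result = fn(*args)
        self.log.append(result)
        return result

class Replayer:
    """Return logged results in order instead of running the call."""

    def __init__(self, log):
        self._log = iter(log)

    def call(self, fn, *args):
        return next(self._log)  # fn is ignored: the recorded value is reused

def program(env):
    """Toy program whose output depends on two nondeterministic inputs."""
    a = env.call(random.random)
    b = env.call(time.time)
    return f"{a:.6f}-{b:.0f}"
```

Running `program` once under a `Recorder` and again under a `Replayer` fed the same log reproduces the output exactly, which is what lets a past state be regenerated rather than stored.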
versioning policies: none; on close; on write; eidetic (system call)
Note: cost comparison can be difficult for long-running applications. Greedy policy is good in the short term, but might not be the globally optimal solution. Uses per-application histories to catch long-running apps that might benefit from ops (multiple versions?)
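A sketch of the greedy per-version choice between storing by value and storing by operation. The prices, parameter names, and the function itself are placeholders rather than figures from the paper.

```python
def choose_representation(bytes_changed, replay_cpu_seconds, log_bytes,
                          cost_per_byte=1e-9, cost_per_cpu_second=1e-5):
    """Pick the cheaper way to store one version (greedy, per-version decision)."""
    # By value: ship and store the changed bytes themselves.
    by_value = bytes_changed * cost_per_byte
    # By operation: ship the nondeterministic log and pay to replay the computation.
    by_operation = log_bytes * cost_per_byte + replay_cpu_seconds * cost_per_cpu_second
    return "value" if by_value <= by_operation else "operation"
```

Because the decision looks at one version at a time, a long-running application can look expensive to replay on every individual check even when a single replay would cover many versions; that is the gap the per-application histories are meant to catch.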
Problems: reproducing a past file may need input data from other logs; version vectors show the dependency graph between files; materialization delay is the delay to reproduce the inputs and the file
Costs - 7-8% recording overhead; up to a minute to reconstitute; doesn’t mention the word interactive
SHA-1 deprecation
creating images of the environment - e.g. docker or vagrant