CMSC 818e Day 4

These are notes taken during CMSC 818e: Distributed And Cloud-Based Storage Systems.

Day Four

Elephant Paper

  • differentiates between undo and long-term history
  • policies:
    • keep one (browser cache, core, /tmp)
    • keep all
    • keep safe (just undo)
    • keep landmarks <--
  • landmarks: people write/edit in bursts; cluster the write times together and call the last version in each cluster a landmark
  • versioning for files but not for directories (why was that again?)
  • should changes propagate all the way to the root?
    • make it cheaper by using file chunking/hashing (like LBFS)
    • set flag for “dirty” to signal that a node has been modified
  • examples of version-aware commands:
      cd foo/@12-nov-1999:11:30
      tls    `ls @v`
      tgrep
  • a user-level process is called when the cleaner comes across a high-temp file
  • Downsides: less locality in inodes, data blocks; pressure on buffer cache
  • could use LBFS chunking to lessen the buffer cache pressure
  • what about something like video editing?
  • store diffs between the versions in a chain leading up to a landmark
  • applications used to write to filesystems directly; once operating systems are capable enough, they can take the load off the applications
  • operational transforms
  • incomplete description of results (not enough set-up for the graphs), for example:
    • is the cleaner running? how often?
    • what is the keep-safe window?
  • challenges with reproducing code from papers (e.g. EPaxos)
  • Inferno - Bell Labs/Rob Pike
  • Network Application
  • snapshot: store a pointer to a previous root, use that to drive policies that reclaim old versions no longer needed
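The landmark idea noted above (cluster write times, keep the last version in each cluster) can be sketched in a few lines. This is only an illustration under assumptions: the function name `pick_landmarks` and the fixed 10-minute `gap` threshold are hypothetical, and the sketch is not the heuristic the Elephant paper itself uses.

```python
from datetime import datetime, timedelta


def pick_landmarks(write_times, gap=timedelta(minutes=10)):
    """Group write times into clusters separated by more than `gap`
    and return the last write time of each cluster.

    `gap` is an assumed tuning knob, not a value from the paper.
    """
    landmarks = []
    previous = None
    for t in sorted(write_times):
        # A pause longer than `gap` closes the current cluster, so the
        # last write seen before the pause becomes a landmark.
        if previous is not None and t - previous > gap:
            landmarks.append(previous)
        previous = t
    # The final cluster is closed by the end of the history.
    if previous is not None:
        landmarks.append(previous)
    return landmarks


# Two editing sessions on the same day: one around 11:00, one around 14:00.
writes = [
    datetime(1999, 11, 12, 11, 0),
    datetime(1999, 11, 12, 11, 2),
    datetime(1999, 11, 12, 11, 5),
    datetime(1999, 11, 12, 14, 0),
    datetime(1999, 11, 12, 14, 1),
]
landmarks = pick_landmarks(writes)  # the 11:05 and 14:01 versions
```

Under a keep-landmarks policy, versions that are not landmarks become candidates for the cleaner once they fall outside the undo window.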

Knockoff Paper

  • an attempt to generalize operation shipping
  • the specific details are in a previous paper on Arnold
  • eidetic versioning: can recover any past state in the file system or in application memory
  • stores a log of non-deterministic events for replay
  • non-deterministic log: system call results (always happen at the same time), external data reads (references to other files in the FS), thread scheduling (sometimes less predictable), unexpected signals (how to recreate?)
  • Store by value when the programs needed to reproduce the data are not in the cloud, or when computation is costlier than communication
  • versioning policies: none; on close; on write; eidetic (system call)
  • Note: cost comparison can be difficult for long-running applications. A greedy policy is good in the short term, but might not be the globally optimal solution. Uses per-application histories to catch long-running apps that might benefit from operation shipping (multiple versions?)
  • Problems: reproducing a past file may need input data from other logs; version vectors show the dependency graph between files; materialization delay is the delay to reproduce the inputs and the file
  • Costs: 7-8% recording overhead; up to a minute to re-constitute; doesn’t mention the word interactive
  • SHA-1 deprecation
  • creating images of the environment - e.g. docker or vagrant
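The by-value versus by-operation trade-off in the notes above can be made concrete with a toy greedy comparison. This is a sketch under assumptions, not Knockoff's actual cost model: the `VersionCost` fields, the `choose_representation` function, and the prices used below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VersionCost:
    value_bytes: int       # size of the new file data if stored by value
    log_bytes: int         # size of the non-deterministic log if stored by operation
    replay_seconds: float  # estimated compute time to regenerate the data by replay


def choose_representation(cost, price_per_byte, price_per_cpu_second,
                          replay_available=True):
    """Greedy, per-version choice between "value" and "operation".

    Falls back to "value" when the program cannot be replayed in the
    cloud, and otherwise picks whichever option is cheaper right now.
    """
    if not replay_available:
        return "value"
    by_value = cost.value_bytes * price_per_byte
    by_operation = (cost.log_bytes * price_per_byte
                    + cost.replay_seconds * price_per_cpu_second)
    return "operation" if by_operation < by_value else "value"


# Large output, tiny log, quick replay: shipping the operation wins.
big_output = VersionCost(value_bytes=10_000_000, log_bytes=1_000, replay_seconds=2.0)
# Small output that takes a minute of compute to regenerate: shipping the value wins.
slow_replay = VersionCost(value_bytes=1_000, log_bytes=500, replay_seconds=60.0)

choice_a = choose_representation(big_output, price_per_byte=1e-6, price_per_cpu_second=0.01)
choice_b = choose_representation(slow_replay, price_per_byte=1e-6, price_per_cpu_second=0.01)
```

Because the choice is made one version at a time, it has the short-term bias mentioned in the notes: a long-running application may look expensive to replay at each step even when shipping its operations would be cheaper overall.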
