These are notes taken during CMSC 818e: Distributed And Cloud-Based Storage Systems. Course webpage and syllabus here.
Day Three
Log-Structured File Systems
Why log-structured?
- Fast File System (FFS) writes can require a lot of disk operations
(FFS) Creating a file takes five separate writes (each requires a seek, possibly a sync):
- write file inode (mark it allocated)
- write dir data (the new directory entry)
- write dir inode
- write file data
- write file inode again (now pointing at the data blocks)
But… we have to put these in order! If we don’t, we could end up with an inode that references file data that never made it from the buffer cache to disk. So we have to make sure we write the file data before we write the file inode (and the dir data before we write the dir inode) so we don’t end up with a pointer to garbage! This ordering forces delays between dependent writes, on top of whatever seek scheduling the disk already does (see the LOOK algorithm, a variant of SCAN/elevator scheduling). A toy sketch of this ordering follows.
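A toy model of that ordering (illustrative only; the disk, addresses, and block contents below are made up, not real FFS structures):

```python
# Toy model of FFS-style ordered creation; nothing here is real FFS code.
# Each write() targets a different made-up address (so each implies a seek),
# and each sync() is the barrier that keeps pointers from ever naming garbage.

class ToyDisk:
    def __init__(self):
        self.blocks = {}   # address -> payload
        self.barriers = 0  # each barrier = a seek + wait on a real disk

    def write(self, addr, payload):
        self.blocks[addr] = payload

    def sync(self):
        self.barriers += 1


def ffs_create(disk, file_data):
    disk.write("file_inode", {"allocated": True})          # 1. allocate the inode
    disk.sync()
    disk.write("dir_data", "newfile -> file_inode")        # 2. directory entry
    disk.sync()
    disk.write("dir_inode", {"entries": "+1"})             # 3. directory inode
    disk.sync()
    disk.write("file_data", file_data)                     # 4. file data
    disk.sync()
    disk.write("file_inode", {"allocated": True,
                              "blocks": ["file_data"]})    # 5. inode again, with pointers
    disk.sync()


disk = ToyDisk()
ffs_create(disk, b"hello")
print(disk.barriers, "ordered, seek-separated writes")     # -> 5
```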
The key idea behind a log-structured file system (LFS) is that it lets us do a single sequential (and potentially asynchronous) write instead of a bunch of small writes that require the ordering and seeking described above.
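A minimal sketch of that write path (the structure names and layout are illustrative, not the paper’s actual on-disk format): buffer everything the operation touches, then emit one append at the tail of the log.

```python
# Sketch of the LFS write path; names and layout are illustrative only.
log = bytearray()   # the on-disk log, modeled as one growing byte array


def lfs_create(file_data: bytes, dir_entry: bytes) -> int:
    """Append everything a create touches as a single sequential write."""
    segment = b"".join([
        file_data,                          # file data block(s)
        b"<file inode -> data above>",      # file inode, points back into the segment
        dir_entry,                          # directory data block (new entry)
        b"<dir inode -> dir data above>",   # directory inode
        b"<inode map piece>",               # tells readers where the new inodes now live
    ])
    start = len(log)
    log.extend(segment)                     # one append at the tail: no seeks, no ordering dance
    return start                            # log address of the new segment
```

Because inodes move every time they are rewritten, LFS needs the inode map to find them; that is part of why the large buffer cache (and the slow first read) matters for read performance.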
LFS is good for:
- reads (after the first one, which might be slow) - after the first read the file is all in cache (thanks to large buffer caches!)
- spatial and temporal locality
- write speed
- no seeks
- aggregate writes for inodes and data
- recovery
- log is always consistent (never have orphaned pointers)
- might not be up to date
We don’t really use log-structured file systems today as they were imagined in this paper, though we will see echoes of this in other papers we read in class. See Journaling file system.
Potential problems:
- file fragmentation - the log effectively versions the entire file, so every version lives somewhere on disk until it is reclaimed; that’s why cleaning is needed (see the sketch below)
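A toy cleaner, just to show the mechanism (the liveness check and segment bookkeeping are stand-ins; the real LFS cleaner picks segments with a cost-benefit policy):

```python
# Toy segment cleaner: copy live blocks forward to the log tail, then
# reclaim the old segments. This sketch cleans everything it is given.

def clean(segments, is_live, append_to_log):
    """segments: iterable of lists of blocks
    is_live(block) -> bool          (stand-in liveness check)
    append_to_log(blocks) -> None   (sequential write at the log tail)"""
    freed = []
    for seg in segments:
        live_blocks = [b for b in seg if is_live(b)]
        append_to_log(live_blocks)   # live data moves forward in the log
        freed.append(seg)            # the whole old segment can now be reused
    return freed
```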
Follow-on work by Seltzer
- compared a more modern FFS to LFS. The improved FFS has:
- bigger block size
- clustering (when a directory is created, FFS reserves nearby space for its files and inodes)
- rotational-latency-aware file layout
Results
- LFS is an order of magnitude faster on small-file creates and deletes.
- The systems are comparable on creates of large files (half a megabyte or more).
- The systems are comparable on reads of files smaller than 64KB.
- LFS read performance is superior between 64KB and 4MB, after which FFS is comparable
- LFS write performance is superior for files of 256KB or less
- FFS write performance is superior for files larger than 256KB
Comments
- most interesting is crash recovery (you don’t have to scan the entire disk! see the sketch after this list)
- how to deal with SSDs? The approach is still somewhat useful: with flash, random reads are cheap, but writes are slow and have to be chunked into large blocks, and wear leveling is a concern. Both point toward log-structured writes being a good fit for SSDs.
- you can turn this into an immutable (versioned) file system simply by skipping the cleaner!
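A hedged sketch of why recovery is cheap (the checkpoint and segment-summary structures below are simplified stand-ins): read the most recent checkpoint, then roll forward only through the segments written after it.

```python
# Sketch of LFS-style recovery; the structures are simplified stand-ins.
# Instead of scanning the whole disk, start from the last checkpoint and
# roll forward through the segments written since.

def recover(checkpoints, segments):
    cp = max(checkpoints, key=lambda c: c["time"])      # most recent checkpoint
    inode_map = dict(cp["inode_map"])                   # known-consistent starting point
    for seg in segments:
        if seg["time"] <= cp["time"]:
            continue                                    # already covered by the checkpoint
        for inode_num, addr in seg["summary"]:          # segment summary block
            inode_map[inode_num] = addr                 # roll the map forward
    # Consistent (never points to garbage), though it may miss the very last writes.
    return inode_map
```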
Low-Bandwidth Network File System (LBFS)
Key ideas
- chunking files by content (see the sketch after this list)
- Rabin fingerprints computed over overlapping 48-byte windows
- cryptographic hashes over every window would be too slow
- server chunk database
- approach used first by Spring and Wetherall
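A sketch of content-defined chunking in the LBFS style. The rolling hash below is a Rabin–Karp stand-in, not LBFS’s actual Rabin fingerprint polynomial; the 48-byte window and 13-bit boundary test follow the paper, but the boundary value is arbitrary, and LBFS’s minimum and maximum chunk sizes are omitted for brevity.

```python
import hashlib

WINDOW = 48             # fingerprint window size, as in LBFS
MASK = (1 << 13) - 1    # test the low 13 bits -> expected chunk size ~8 KB
MAGIC = 0x0123          # arbitrary boundary value (any fixed value works)
BASE = 257
MOD = (1 << 61) - 1
POW = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window


def chunks(data: bytes):
    """Yield (sha1_hex, chunk) pairs for content-defined chunks of data."""
    start, h = 0, 0
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * POW) % MOD   # drop the byte leaving the window
        h = (h * BASE + b) % MOD                     # add the byte entering the window
        if i >= WINDOW - 1 and (h & MASK) == MAGIC:  # boundary: low bits match the magic value
            piece = data[start:i + 1]
            yield hashlib.sha1(piece).hexdigest(), piece
            start = i + 1
    if start < len(data):                            # whatever is left is the final chunk
        piece = data[start:]
        yield hashlib.sha1(piece).hexdigest(), piece
```

Because boundaries depend only on local content, inserting bytes into a file changes at most a couple of chunks; client and server exchange the chunk hashes and only ship chunks missing from the server’s chunk database.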
Problems
- what about privacy? Encryption randomizes the bytes, so identical plaintext chunks no longer yield identical ciphertext and chunk-level deduplication breaks.
- last-writer-wins (atomic updates with close-to-open consistency) seems bad, but they were just adopting the standard semantics of the time.