Section notes for CS162 Section #11, April 16, 2002
Barbara Hohlt

- Announcements
- Worked HW 3, Problem 3 - Address Spaces
- Worked HW 4, Problems 1 & 2
- UNIX Filesystem

---------------------------------------------------------------------------
NOTES ON BSD 4.3 FILE SYSTEM
Matt Welsh
---------------------------------------------------------------------------

---------------------------------------------------------------------------
4.3BSD File System
[Leffler, McKusick, Karels, Quarterman 89]

- Classic UNIX filesystem: Understand this and you understand pretty much
  everything you need to know :-)
---------------------------------------------------------------------------

inode contents:
    mode (permission bits, displayed as 10 chars: trwxrwxrwx)
    owners (user and group)
    timestamps (access, creation, modification)
    size of file in bytes
    block count
    reference count (number of 'directory links' to file)
    12 direct block pointers
    single indirect block pointer
    double indirect block pointer
    triple indirect block pointer

Block size is 4096 bytes (can also be 8 KB, 16 KB, etc...)
-> 12*4096 = 48 KB in direct blocks
-> Indirect block stores 1024 32-bit block ptrs, so 4 MB in indirect block
-> Double indirect block stores 1024*1024 block pointers, so 4 GB
-> Triple indirect block stores 1024*1024*1024 block pointers, so 4 TB
   Total size: 4 TB + 4 GB + 4 MB + 48 KB
-> Note that triple-indirect block not really used and file size limited
   by 32-bit size word, so actually 4 GB in practice

Allocate blocks together in a CYLINDER GROUP
-> Cylinder group consists of one or more consecutive cylinders on a disk
-> Each cylinder group stores redundant copy of the SUPERBLOCK (static
   and vital data describing the filesystem), space for inodes, and a
   bitmap describing the available blocks in the cylinder group
-> Static number of inodes per cylinder group: one inode for each 2 KB
   of data space
-> fs attempts to allocate blocks for a given file from the same cylinder
   group, to reduce seeks
-> Bookkeeping information staggered across cylinder group to prevent
   critical information from being on same platter of disk

Problem with large block sizes: INTERNAL FRAGMENTATION
-> Much wasted space if files smaller than entire block, and for files
   which are not an exact number of blocks in size
-> Solution: Divide each block into FRAGMENTS and allow these to be
   allocated/addressed separately
   -> e.g., split 4 KB block into 4 1 KB fragments
-> Files are then 0 or more complete blocks plus at most one "fragmented"
   block -- optimize data structures to maintain fragment information
   only for last block in file
-> Degree of internal fragmentation about the same as fs with 1 KB blocks
-> Reserved space: Reserving 10% of disk space allows free blocks to
   always be found. If we allow the disk to really fill up then the
   kernel spends a lot of time trying to find free blocks. So, tell the
   user that the disk is full when 10% is still free (superuser can
   still use up the remaining 10% for system maintenance).
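The maximum-file-size arithmetic above (12 direct pointers plus single,
double, and triple indirect blocks, with 4096-byte blocks and 32-bit
block pointers) can be checked with a short script:

```python
# Maximum file size reachable through each level of the 4.3BSD inode,
# assuming 4096-byte blocks and 4-byte (32-bit) block pointers.

BLOCK_SIZE = 4096                     # bytes per block
PTRS_PER_BLOCK = BLOCK_SIZE // 4      # 1024 pointers fit in one block

direct = 12 * BLOCK_SIZE                     # 12 direct blocks
single = PTRS_PER_BLOCK * BLOCK_SIZE         # single indirect
double = PTRS_PER_BLOCK**2 * BLOCK_SIZE      # double indirect
triple = PTRS_PER_BLOCK**3 * BLOCK_SIZE      # triple indirect
total = direct + single + double + triple

print(direct, single, double, triple)
# 49152 4194304 4294967296 4398046511104
# i.e. 48 KB, 4 MB, 4 GB, 4 TB -- matching the notes
```

With an 8 KB block size the pointer count per indirect block doubles as
well, so each level grows by a factor of four (per extra indirect level)
times two; the same script works by changing BLOCK_SIZE.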
   Idea is that performance should still be good until we hit the
   reserved-space "wall", rather than performance dropping off terribly
   near the end.

Rotationally-optimal block layout
-> On modern disks with a built-in processor, CPU can instruct disk to
   read a set of consecutive sectors and will get just one interrupt
   when all data has been read
-> On older disks, CPU had to instruct disk to read each sector
   separately, and was interrupted after every sector
   -> If the fs lays blocks out in consecutive sectors on a track, by
      the time the processor can handle the interrupt and generate the
      I/O request for the next sector, it's no longer under the disk head
-> So, 4.3BSD tries to allocate blocks in a rotationally-optimal fashion
   -> Based on measurement (heuristic?) of time required to handle disk
      interrupt, speed of disk rotation, CPU speed, etc.
   -> My guess is that it was a hack...
   -> Allow sectors for consecutive blocks to be non-consecutive
-> Problem: Upgrading CPU, moving fs to another system, etc. throws off
   this calculation
-> Also, sectors closer to the spindle of the disk pass under the disk
   head more slowly than those on the outer edge of the platter
   -> Not sure if or how this mechanism accounts for this

Layout policies
-> Want to allocate inodes for files in a directory on the same cylinder
   group
-> New directories placed in cylinder group with greater-than-average
   number of free inodes, so this policy can succeed
-> Block layout policies: Don't want large files to consume the entire
   cylinder group (other files in same cylinder group then negatively
   impacted)
   -> So, after allocating 1 MB of space, force allocation in a new
      cylinder group
   -> Again, kind of a hack ("heuristic" is the nice word)
-> When allocating a new block, if the current cylinder group is full,
   take quadratic hash of current cylinder group number
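The quadratic-hash fallback for a full cylinder group can be sketched as
follows. This is a minimal illustration of the idea, not the actual
4.3BSD code: the probe sequence (cg + i*i mod ncg) and the helper names
here are assumptions, and the real implementation differs in detail,
though it likewise falls back to an exhaustive scan if the hash fails.

```python
# Sketch: pick a cylinder group with free space, preferring the current
# one, then probing quadratically, then scanning linearly as a last
# resort (quadratic probing mod ncg may not visit every group).

def find_cylinder_group(preferred, ncg, has_space):
    """Return a cylinder group number with free space, or None.

    preferred -- cylinder group we would like to allocate from
    ncg       -- total number of cylinder groups on the disk
    has_space -- predicate: does this cylinder group have a free block?
    """
    if has_space(preferred):
        return preferred
    # Quadratic hash: probe preferred + 1, + 4, + 9, ... (mod ncg)
    for i in range(1, ncg):
        cg = (preferred + i * i) % ncg
        if has_space(cg):
            return cg
    # Exhaustive linear scan as a last resort
    for cg in range(ncg):
        if has_space(cg):
            return cg
    return None

# Example: 5 cylinder groups, groups 3 and 4 are full, preferred is 3.
# Probes 3 (full), 4 (full), then (3 + 4) % 5 = 2, which has space.
free = {0: True, 1: True, 2: True, 3: False, 4: False}
print(find_cylinder_group(3, 5, lambda cg: free[cg]))  # prints 2
```

The point of the quadratic hash is to spread overflow allocations across
distant cylinder groups cheaply, instead of piling them all into the
neighboring group.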