Section notes for CS162 Section #11, April 16, 2002
Barbara Hohlt

- Announcements
- Worked HW 3, Problem 3 - Address Spaces
- Worked HW 4, Problems 1 & 2
- UNIX Filesystem

---------------------------------------------------------------------------
NOTES ON BSD 4.3 FILE SYSTEM
Matt Welsh
---------------------------------------------------------------------------

---------------------------------------------------------------------------
4.3BSD File System
[Leffler, McKusick, Karels, Quarterman 89]

- Classic UNIX filesystem: Understand this and you understand pretty much
  everything you need to know :-)
---------------------------------------------------------------------------

inode contents:
    mode (permission bits, displayed as 10 chars: trwxrwxrwx)
    owners (user and group)
    timestamps (access, creation, modification)
    size of file in bytes
    block count
    reference count (number of 'directory links' to file)
    12 direct block pointers
    single indirect block pointer
    double indirect block pointer
    triple indirect block pointer

Block size is 4096 bytes (can also be 8 KB, 16 KB, etc...)
-> 12*4096 = 48 KB in direct blocks
-> Indirect block stores 1024 32-bit block ptrs, so 4 MB in indirect block
-> Double indirect block stores 1024*1024 block pointers, so 4 GB
-> Triple indirect block stores 1024*1024*1024 block pointers, so 4 TB
   Total size: 4 TB + 4 GB + 4 MB + 48 KB
-> Note that triple-indirect block not really used and file size limited
   by 32-bit size word, so actually 4 GB in practice

Allocate blocks together in a CYLINDER GROUP
-> Cylinder group consists of one or more consecutive cylinders on a disk
-> Each cylinder group stores redundant copy of the SUPERBLOCK (static
   and vital data describing the filesystem), space for inodes, and a
   bitmap describing the available blocks in the cylinder group
-> Static number of inodes per cylinder group: one inode for each 2 KB
   of data space
-> fs attempts to allocate blocks for a given file from the same cylinder
   group, to reduce seeks
-> Bookkeeping information staggered across cylinder group to prevent
   critical information from being on same platter of disk

Problem with large block sizes: INTERNAL FRAGMENTATION
-> Much wasted space if files smaller than entire block, and for files
   which are not an exact number of blocks in size
-> Solution: Divide each block into FRAGMENTS and allow these to be
   allocated/addressed separately
   -> e.g., split 4 KB block into 4 1 KB fragments
-> Files are then 0 or more complete blocks plus at most one "fragmented"
   block -- optimize data structures to maintain fragment information
   only for last block in file
-> Degree of internal fragmentation about the same as fs with 1 KB blocks
-> Reserved space: Reserving 10% of disk space allows free blocks to
   always be found. If we allow the disk to really fill up then the
   kernel spends a lot of time trying to find free blocks. So, tell the
   user that the disk is full when 10% is still free (superuser can
   still use up the remaining 10% for system maintenance).
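The maximum-file-size arithmetic above (12 direct pointers plus single,
double, and triple indirect blocks, with 4096-byte blocks and 32-bit
block pointers) can be checked with a short script:

```python
# Maximum file size reachable through each level of the 4.3BSD inode,
# assuming 4096-byte blocks and 4-byte (32-bit) block pointers.

BLOCK_SIZE = 4096                     # bytes per block
PTRS_PER_BLOCK = BLOCK_SIZE // 4      # 1024 pointers fit in one block

direct = 12 * BLOCK_SIZE                     # 12 direct blocks
single = PTRS_PER_BLOCK * BLOCK_SIZE         # single indirect
double = PTRS_PER_BLOCK**2 * BLOCK_SIZE      # double indirect
triple = PTRS_PER_BLOCK**3 * BLOCK_SIZE      # triple indirect
total = direct + single + double + triple

print(direct, single, double, triple)
# 49152 4194304 4294967296 4398046511104
# i.e. 48 KB, 4 MB, 4 GB, 4 TB -- matching the notes
```

With an 8 KB block size the pointer count per indirect block doubles as
well, so each level grows by a factor of four (per extra indirect level)
times two; the same script works by changing BLOCK_SIZE.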
   Idea is that performance should still be good until we hit the
   reserved-space "wall", rather than performance dropping off terribly
   near the end.

Rotationally-optimal block layout
-> On modern disks with a built-in processor, CPU can instruct disk to
   read a set of consecutive sectors and will get just one interrupt
   when all data has been read
-> On older disks, CPU had to instruct disk to read each sector
   separately, and was interrupted after every sector
   -> If the fs lays blocks out in consecutive sectors on a track, by
      the time the processor can handle the interrupt and generate the
      I/O request for the next sector, it's no longer under the disk head
-> So, 4.3BSD tries to allocate blocks in a rotationally-optimal fashion
   -> Based on measurement (heuristic?) of time required to handle disk
      interrupt, speed of disk rotation, CPU speed, etc.
   -> My guess is that it was a hack...
   -> Allow sectors for consecutive blocks to be non-consecutive
-> Problem: Upgrading CPU, moving fs to another system, etc. throws off
   this calculation
-> Also, sectors closer to the spindle of the disk pass under the disk
   head more slowly than those on the outer edge of the platter
   -> Not sure if or how this mechanism accounts for this

Layout policies
-> Want to allocate inodes for files in a directory on the same cylinder
   group
-> New directories placed in cylinder group with greater-than-average
   number of free inodes, so this policy can succeed
-> Block layout policies: Don't want large files to consume the entire
   cylinder group (other files in same cylinder group then negatively
   impacted)
   -> So, after allocating 1 MB of space, force allocation in a new
      cylinder group
   -> Again, kind of a hack ("heuristic" is the nice word)
-> When allocating a new block, if the current cylinder group is full,
   take quadratic hash of current cylinder group number
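The quadratic-hash fallback for a full cylinder group can be sketched as
follows. This is a minimal illustration of the idea, not the actual
4.3BSD code: the probe sequence (cg + i*i mod ncg) and the helper names
here are assumptions, and the real implementation differs in detail,
though it likewise falls back to an exhaustive scan if the hash fails.

```python
# Sketch: pick a cylinder group with free space, preferring the current
# one, then probing quadratically, then scanning linearly as a last
# resort (quadratic probing mod ncg may not visit every group).

def find_cylinder_group(preferred, ncg, has_space):
    """Return a cylinder group number with free space, or None.

    preferred -- cylinder group we would like to allocate from
    ncg       -- total number of cylinder groups on the disk
    has_space -- predicate: does this cylinder group have a free block?
    """
    if has_space(preferred):
        return preferred
    # Quadratic hash: probe preferred + 1, + 4, + 9, ... (mod ncg)
    for i in range(1, ncg):
        cg = (preferred + i * i) % ncg
        if has_space(cg):
            return cg
    # Exhaustive linear scan as a last resort
    for cg in range(ncg):
        if has_space(cg):
            return cg
    return None

# Example: 5 cylinder groups, groups 3 and 4 are full, preferred is 3.
# Probes 3 (full), 4 (full), then (3 + 4) % 5 = 2, which has space.
free = {0: True, 1: True, 2: True, 3: False, 4: False}
print(find_cylinder_group(3, 5, lambda cg: free[cg]))  # prints 2
```

The point of the quadratic hash is to spread overflow allocations across
distant cylinder groups cheaply, instead of piling them all into the
neighboring group.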