CS161: Notes on file systems

I might not be able to give a lecture on filesystems this Friday (or it might whiz past with my mile-a-minute speech), so here are some detailed lecture notes on file systems to help you review for the final exams.

File systems. You know about file systems because that's one of the most obvious things about your computer. You know that your documents are hidden somewhere on your hard disk. You know that programs are stored in another directory. You know that you can make your own directories to organize your programs and your data.

Why are file systems important? Well, remember what happens during brown-outs or power fluctuations? If the power goes off and then on again in a computer lab without uninterruptible power supplies (UPSes), people will lose all their unsaved work. That's because the kind of memory we use loses its data if it doesn't have power. Hard disks, on the other hand, keep data even when the computer is off - so save often!


Let's take a look at the data associated with files. Different operating systems keep track of different things, but here's a short list.

- Name: The files in a directory should have unique names. This lets

us refer to that file.

- Type of file: What kind of file is it? Is it an image? A text

document? A webpage? This could help the operating system determine the best way to open this file.

- Location: Remember our discussion about sectors, tracks, and

cylinders? Where is the file on the hard disk? In terms of the logical filesystem, in what directory is the file?

- Size: How large is the file? How much data is in it right now?

- Protection: Can other people edit this file? run this file? read

this file? Who can work on this file?

- Time/date: When was it last modified? (Sometimes the operating

system stores more information, like creation time and access time)

- User identification: Who owns this file? (Windows 98 and below

don't care about this.)


- Create, write, read, delete: Makes sense.

- Reposition within a file: When you're reading a book and you want

to flip back to a certain place, you don't have to close the book and then open it again. You can just go to a certain position. Same with files. Some files will let you easily jump to a specified position within the file. On the other hand, some files will only let you go forward and not back.

- Truncate: Get rid of everything after a certain position.

Access methods

- Sequential: You can only access it going forward. You can't go back.

To read the 7th line in the file, you have to read the first 6 lines.

- Direct: You can jump around. This is also known as a random-access


- Indexed: This is like the way a dictionary works. You can jump

around, but some places (like the beginning of entries for each letter) are marked so that you can jump to them easily.


Your hard disk is divided into partitions (PC term) or volumes (Mac term). These are the "drives" you see on Windows. You can split up one hard disk into a C: drive and a D: drive. If you learn about something called RAID, you can combine two hard disks into just one drive. ("Drive" is a confusing term, though, as you can have C:, D:, E:, F:, etc. on just one hard disk.)

Single-level directory

Long, long ago, MSDOS didn't have directories. Seriously. Well, there was one directory, and all of your files had to be under it. You can imagine how this sucked, as all of your program files had to be in the same place as your data. Not only that, you were limited to 8 characters for the filename and 3 characters for the extension.

People used clever names like AAAAAAAA.TXT, AAAAAAAB.TXT and AAAAAAAC.TXT for their files. Of course, after six months, who could remember what the contents of each file were?

The diagram shows a single-level directory. The directory entries are "cat", "bo", "a", "test", "data", "mail", "cont", "hex" and "records". All of these entries point to files.

Two-level directory

Now we got to organize them by user, at least. Still sucked because all of your files were in just one directory, but at least you didn't have to worry about other people's naming schemes.

Tree-structured directory

This is the kind of directory tree you got used to in Microsoft Windows. You can create directories (aka folders) inside directories inside directories inside directories.

Links aren't actually real directories. In Microsoft Windows, they're fake - for example, you can't cd into them from the command line.

Acyclic graph directory

Remember the ln command from Unix? This is where links in Unix come in. (They're _real_ links, not like the fake ones in Microsoft Windows. =) )

ln somefile anotherfile

creates a link: anotherfile will refer to the exact same file that somefile refers to on the hard disk. They point to the same place on the hard disk.

This allows you to have acyclic graph directories, because the same file is referred to in two or more places.

General graph directory

ln returns! This time, we use it to make a symbolic link.

ln somefileordir anotherfileordir -s

Symbolic links can point to directories, so it's perfectly acceptable to make a link that points to one of your parent directories and thus get into some kind of loop.

Basically, a general graph directory is anything that could have these loops.

File protection

You don't want just anyone messing around with your files. Remember Unix file permissions and chmod? This slide talks about some of those permissions, although the access groups the slide uses are different from Unix permissions. Under Unix, it's user, group, others. Other operating systems support access control lists (ACLs) - this means that instead of just giving permission to one group, you can specify exactly who gets to do what to the file.

Allocation methods

Here we start looking at how things are physically stored on your hard disk. You can start up defrag to get an idea of what it looks like.

Contiguous allocation

It's like contiguous allocation for memory. All the space the file needs should be in one continuous block. The nice thing about it is that it's easy to figure out where all the data is - just find the starting position and count off so and so many bytes. If you need to allocate space for a new file, try either first-fit or best-fit. Downside of this is that file sizes are fixed, because once it's been allocated and another file has been allocated next to it, the file can't grow.

Linked allocation

The file is treated as a list of blocks, where a block is a fixed-size contiguous collection of bytes on the hard disk. The blocks don't all have to be together on the hard disk - each block in the file points to the next one. Plus side: files can grow, just link in new blocks. Minus: To read the file, you have to hop around the hard disk, plus all of that pointing around wastes space because each block has to refer to the next one. Solution: use groups of blocks (aka clusters), but these might be bigger than you need - if so, then space is wasted.

File allocation table

Form of linked allocation. The links to the next block are kept in one large table in a certain place in the hard disk.

Indexed allocation

Still uses blocks, but instead of each block pointing to the next one, one block has an index that points to all of the blocks. This block is called the inode (index node) under Unix. If the file is too big for one index block, the index block refers to other index blocks.

Free-space management

How your computer can tell how much space you have free.

- Bit vector/map: keep track of every single block! Requires 1 bit per

block (1 if it's occupied, 0 if it's free).

- Linked list: Like the linked allocation scheme (one block points to

the next one) except this keeps track of the free ones.

- Grouping: The first free block has an index of free blocks (as many

as it can). If there are more free blocks than will fit in the index, it just points to another index block.

- Counting: Keep track of the beginning of each set of free blocks

and how many free blocks there are.

Directory implementation

How directories work. That is - how do directories store info about files within them?

One way is to just keep a list of all the filenames in that directory. If you have a million files (and it's happened!), this can get _really_ slow.

Hashtables are supposed to be faster at lookup, so naturally there's a way to use them too.

Consistency checking

Remember the scandisk that shows up when you improperly shutdown your computer? This is what's happening. Your computer's checking if what it thinks your filesystem should be like is different from what it actually is.