CS161: Notes on file systems

I might not be able to give a lecture on filesystems this Friday (or
it might whiz past with my mile-a-minute speech), so here are some
detailed lecture notes on file systems to help you review for the
final exams.

File systems. You know about file systems because that’s one of the
most obvious things about your computer. You know that your documents
are hidden somewhere on your hard disk. You know that programs are
stored in another directory. You know that you can make your own
directories to organize your programs and your data.

Why are file systems important? Well, remember what happens during
brown-outs or power fluctuations? If the power goes off and then on
again in a computer lab without uninterruptible power supplies
(UPSes), people will lose all their unsaved work. That’s because main
memory (RAM) is volatile: it loses its data if it doesn’t have power.
Hard disks, on the other hand, keep data even when the computer is
off – so save often!

Attributes

Let’s take a look at the data associated with files. Different
operating systems keep track of different things, but here’s a short
list.

– Name: The files in a directory should have unique names. This lets
us refer to that file.

– Type of file: What kind of file is it? Is it an image? A text
document? A webpage? This could help the operating system determine
the best way to open this file.

– Location: Remember our discussion about sectors, tracks, and
cylinders? Where is the file on the hard disk? In terms of the
logical filesystem, in what directory is the file?

– Size: How large is the file? How much data is in it right now?

– Protection: Can other people edit this file? run this file? read
this file? Who can work on this file?

– Time/date: When was it last modified? (Sometimes the operating
system stores more information, like creation time and access time)

– User identification: Who owns this file? (Windows 98 and below
don’t care about this.)
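
If you want to poke at these attributes yourself, here is a minimal Python sketch using the standard os.stat call. The temporary file and its contents are made up for illustration, and what you see for owner and protection depends on your operating system (the owner ID isn’t meaningful on Windows).

```python
import os
import stat
import tempfile
import time

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as f:
    f.write(b"hello, file system\n")
    path = f.name

info = os.stat(path)

attributes = {
    "name": os.path.basename(path),              # name within its directory
    "location": os.path.dirname(path),           # directory in the logical file system
    "size": info.st_size,                        # size in bytes
    "protection": stat.filemode(info.st_mode),   # same layout as `ls -l`
    "modified": time.ctime(info.st_mtime),       # last modification time
    "owner": info.st_uid,                        # numeric user ID
}

for key, value in attributes.items():
    print(f"{key}: {value}")

os.remove(path)
```

Note that the physical location (which blocks on the disk) isn’t shown here; the operating system normally hides that from ordinary programs.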

Operations

– Create, write, read, delete: Makes sense.

– Reposition within a file: When you’re reading a book and you want
to flip back to a certain place, you don’t have to close the book
and then open it again. You can just go to a certain position. Same
with files. Some files will let you easily jump to a specified
position within the file. On the other hand, some files will only
let you go forward and not back.

– Truncate: Get rid of everything after a certain position.
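
Here is how those operations look from a program, as a small Python sketch. The file name notes.txt and its contents are just placeholders for this example.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "notes.txt")

# Create and write.
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "r+b") as f:
    # Reposition: jump straight to byte 4 and read from there.
    f.seek(4)
    middle = f.read(3)

    # Flip back to the start without closing and reopening the file.
    f.seek(0)
    start = f.read(2)

    # Truncate: get rid of everything after position 5.
    f.truncate(5)

# Read what is left.
with open(path, "rb") as f:
    remaining = f.read()

# Delete.
os.remove(path)
os.rmdir(workdir)
deleted = not os.path.exists(path)
```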

Access methods

– Sequential: You can only access it going forward. You can’t go back.
To read the 7th line in the file, you have to read the first 6
lines.

– Direct: You can jump around. This is also known as a random-access
file.

– Indexed: This is like the way a dictionary works. You can jump
around, but some places (like the beginning of entries for each
letter) are marked so that you can jump to them easily.
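
To make the difference concrete, here is a rough Python sketch that reads the 7th line of a file both ways. The in-memory io.BytesIO object stands in for a real file, and the function names are my own, not standard ones.

```python
import io

# A small in-memory "file" with 10 lines.
data = b"".join(b"line %d\n" % n for n in range(1, 11))
f = io.BytesIO(data)

def read_line_sequential(f, n):
    """Sequential access: read lines 1..n-1 just to reach line n."""
    f.seek(0)
    for _ in range(n - 1):
        f.readline()
    return f.readline()

def build_index(f):
    """Indexed access: record the byte offset where each line starts."""
    f.seek(0)
    offsets = []
    while True:
        pos = f.tell()
        if not f.readline():
            break
        offsets.append(pos)
    return offsets

def read_line_indexed(f, index, n):
    """Direct access through the index: jump straight to line n."""
    f.seek(index[n - 1])
    return f.readline()

index = build_index(f)
seventh_sequential = read_line_sequential(f, 7)
seventh_indexed = read_line_indexed(f, index, 7)
```

Both calls return the same line. The indexed version pays once to build the index and can then jump anywhere, which is the dictionary-tab idea from the list above.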

Partitions

Your hard disk is divided into partitions (PC term) or volumes (Mac
term). These are the “drives” you see on Windows. You can split up one
hard disk into a C: drive and a D: drive. If you learn about something
called RAID, you can combine two hard disks into just one drive.
(“Drive” is a confusing term, though, as you can have C:, D:, E:, F:,
etc. on just one hard disk.)

Directories

Single-level directory

Long, long ago, the first versions of MS-DOS (1.x) didn’t have
directories. Seriously. Well, there was one directory, and all of your
files had to be under it. You can imagine how this sucked, as all of
your program files had to be in the same place as your data. Not only
that, you were limited to 8 characters for the filename and 3
characters for the extension.

People used clever names like AAAAAAAA.TXT, AAAAAAAB.TXT and
AAAAAAAC.TXT for their files. Of course, after six months, who could
remember what the contents of each file were?

The diagram shows a single-level directory. The directory entries
are “cat”, “bo”, “a”, “test”, “data”, “mail”, “cont”, “hex” and
“records”. All of these entries point to files.

Two-level directory

Here each user gets their own directory, so now we got to organize
files by user, at least. Still sucked because all of your files were
in just one directory, but at least you didn’t have to worry about
other people’s naming schemes.

Tree-structured directory

This is the kind of directory tree you got used to in Microsoft
Windows. You can create directories (aka folders) inside directories
inside directories inside directories.

Shortcuts to folders aren’t actually real directories. In Microsoft
Windows, they’re fake – for example, you can’t cd into them from the
command line.

Acyclic graph directory

Remember the ln command from Unix? This is where links in Unix come
in. (They’re _real_ links, not like the fake ones in Microsoft
Windows. =) )

ln somefile anotherfile

creates a hard link: anotherfile will refer to the exact same file
that somefile refers to, at the same place on the hard disk.

This allows you to have acyclic graph directories, because the same
file is referred to in two or more places. The graph stays acyclic
because ln normally refuses to make hard links to directories, so you
can’t link a directory back into itself.
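
You can check that both names really are the same file. Here is a small Python sketch that does what the ln command above does, using os.link. It assumes a file system that supports hard links; the file contents are made up.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
somefile = os.path.join(workdir, "somefile")
anotherfile = os.path.join(workdir, "anotherfile")

with open(somefile, "w") as f:
    f.write("original contents")

# Equivalent of: ln somefile anotherfile
os.link(somefile, anotherfile)

# Both names refer to the same inode, and that inode now has two links.
same_inode = os.stat(somefile).st_ino == os.stat(anotherfile).st_ino
link_count = os.stat(somefile).st_nlink

# Writing through one name is visible through the other.
with open(anotherfile, "a") as f:
    f.write(" plus more")
with open(somefile) as f:
    contents = f.read()

# Removing one name leaves the data reachable through the other.
os.remove(somefile)
still_there = os.path.exists(anotherfile)
```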

General graph directory

ln returns! This time, we use it to make a symbolic link.

ln -s somefileordir anotherfileordir

Symbolic links can point to directories, so it’s perfectly acceptable
to make a link that points to one of your parent directories and thus
get into some kind of loop.

Basically, a general graph directory is anything that could have
these loops.
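
Here is a small Python sketch of such a loop, and of one way a program can avoid walking around it forever. The directory names and the walk_without_looping function are made up for this example; remembering visited directories is just one common approach, not the only one.

```python
import os
import tempfile

parent = tempfile.mkdtemp()
child = os.path.join(parent, "child")
os.mkdir(child)

# Make a symbolic link inside child that points back up to parent.
loop = os.path.join(child, "loop")
os.symlink(parent, loop)

# Following the link lands back at parent, so this path goes in circles.
around_twice = os.path.join(parent, "child", "loop", "child", "loop")
is_same_dir = os.path.samefile(around_twice, parent)

def walk_without_looping(top):
    """Visit every directory once, even if symbolic links form a loop."""
    seen = set()
    stack = [top]
    visited = []
    while stack:
        d = stack.pop()
        st = os.stat(d)                 # os.stat follows symbolic links
        key = (st.st_dev, st.st_ino)    # identifies the real directory
        if key in seen:
            continue                    # already been here, so skip it
        seen.add(key)
        visited.append(d)
        for name in os.listdir(d):
            p = os.path.join(d, name)
            if os.path.isdir(p):
                stack.append(p)
    return visited

visited = walk_without_looping(parent)
```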

File protection

You don’t want just anyone messing around with your files. Remember
Unix file permissions and chmod? This slide talks about some of those
permissions, although the access groups the slide uses are different
from Unix permissions. Under Unix, it’s user, group, others. Other
operating systems support access control lists (ACLs) – this means
that instead of just giving permission to one group, you can specify
exactly who gets to do what to the file.
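
As a refresher on the Unix user/group/others scheme, here is a short Python sketch that sets and reads back permissions. It assumes a Unix-like system (the permission bits behave differently on Windows), and the temporary file is only there for the demo.

```python
import os
import stat
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# Equivalent of: chmod 640 <path>
# user: read + write, group: read only, others: nothing
os.chmod(path, 0o640)

mode = os.stat(path).st_mode
as_text = stat.filemode(mode)            # same layout as `ls -l`

user_can_write = bool(mode & stat.S_IWUSR)
group_can_write = bool(mode & stat.S_IWGRP)
others_can_read = bool(mode & stat.S_IROTH)

os.remove(path)
```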

Allocation methods

Here we start looking at how things are physically stored on your
hard disk. You can start up defrag to get an idea of what it looks
like.

Contiguous allocation

It’s like contiguous allocation for memory. All the space the file
needs should be in one continuous block. The nice thing about it is
that it’s easy to figure out where all the data is – just find the
starting position and count off the number of bytes you need. If you
need to allocate space for a new file, try either first-fit or
best-fit. Downside of this is that file sizes are effectively fixed,
because once a file has been allocated and another file has been
allocated next to it, the file can’t grow without being moved.
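
In case first-fit and best-fit are hazy, here is a quick Python sketch. The list of free holes is invented for the example, and each hole is written as (starting block, length in blocks).

```python
def first_fit(holes, size):
    """Return the start of the first hole that is big enough, or None."""
    for start, length in holes:
        if length >= size:
            return start
    return None

def best_fit(holes, size):
    """Return the start of the smallest hole that is still big enough, or None."""
    candidates = [(length, start) for start, length in holes if length >= size]
    if not candidates:
        return None
    return min(candidates)[1]

# Hypothetical free holes on a disk: (starting block, length in blocks).
holes = [(0, 5), (20, 12), (40, 6), (60, 30)]

first = first_fit(holes, 6)      # takes the first hole with at least 6 blocks
best = best_fit(holes, 6)        # takes the tightest hole with at least 6 blocks
too_big = first_fit(holes, 100)  # nothing is large enough
```

First-fit grabs the 12-block hole at block 20 and leaves 6 blocks over; best-fit finds the hole at block 40 that fits exactly.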

Linked allocation

The file is treated as a list of blocks, where a block is a fixed-size
contiguous collection of bytes on the hard disk. The blocks don’t all
have to be together on the hard disk – each block in the file points
to the next one. Plus side: files can grow, just link in new blocks.
Minus: To read the file, you have to hop around the hard disk, plus
all of that pointing around wastes space because each block has to
refer to the next one. Solution: use groups of blocks (aka clusters),
but these might be bigger than you need – if so, then space is wasted.

File allocation table

A form of linked allocation. The links to the next block are kept in
one large table in a certain place on the hard disk, instead of
inside the blocks themselves.
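
Here is a toy Python version of following those links through a table. The block numbers and the END_OF_FILE marker are made up for this sketch; a real FAT uses reserved values and a fixed-size table rather than a dictionary.

```python
END_OF_FILE = -1   # marker chosen for this sketch only

# fat[b] holds the number of the block that comes after block b in its file.
# Invented layout: one file occupying blocks 2 -> 5 -> 3 -> 7.
fat = {2: 5, 5: 3, 3: 7, 7: END_OF_FILE}

def blocks_of_file(fat, first_block):
    """Follow the chain from the first block to the end-of-file marker."""
    chain = []
    block = first_block
    while block != END_OF_FILE:
        chain.append(block)
        block = fat[block]
    return chain

def append_block(fat, first_block, new_block):
    """Grow the file by linking a free block onto the end of the chain."""
    last = blocks_of_file(fat, first_block)[-1]
    fat[last] = new_block
    fat[new_block] = END_OF_FILE

chain_before = blocks_of_file(fat, 2)
append_block(fat, 2, 9)
chain_after = blocks_of_file(fat, 2)
```

Notice that growing the file is just two table updates, but finding the last block means walking the whole chain.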

Indexed allocation

Still uses blocks, but instead of each block pointing to the next one,
one block has an index that points to all of the blocks. Under Unix,
this job is done by the inode (index node), which also stores the
file’s attributes. If the file is too big for one index block, the
index block refers to other index blocks.
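
With an index, finding where a given byte lives is simple arithmetic. Here is a minimal Python sketch for a single index block; the 512-byte block size and the block numbers are assumptions for the example, and it doesn’t handle the case where one index block refers to others.

```python
BLOCK_SIZE = 512   # bytes per block, assumed for this example

# The index block of a hypothetical file: entry i is the disk block that
# holds the file's i-th block of data.
index_block = [19, 4, 33, 8]

def locate(index_block, byte_offset):
    """Translate a byte offset in the file to (disk block, offset in block)."""
    logical_block = byte_offset // BLOCK_SIZE
    if logical_block >= len(index_block):
        raise ValueError("offset is past the end of the file")
    return index_block[logical_block], byte_offset % BLOCK_SIZE

# Byte 1100 is in the file's third block (1100 // 512 == 2), 76 bytes in.
location = locate(index_block, 1100)
```

Compare this with linked allocation, where you would have to follow two links to get to the third block.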

Free-space management

How your computer keeps track of which blocks are free, and so how
much space you have free.

– Bit vector/map: keep track of every single block! Requires 1 bit per
block (1 if it’s occupied, 0 if it’s free).

– Linked list: Like the linked allocation scheme (one block points to
the next one) except this keeps track of the free ones.

– Grouping: The first free block has an index of free blocks (as many
as it can). If there are more free blocks than will fit in the
index, it just points to another index block.

– Counting: Keep track of the beginning of each set of free blocks
and how many free blocks there are.
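
Here is a small Python sketch of two of these schemes, the bit vector and counting, describing the same made-up disk. It uses 1 for occupied and 0 for free; some textbooks flip that, so check which convention your reference uses.

```python
# Bit vector: 1 = occupied, 0 = free. One entry per block.
bitmap = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1]

def free_block_count(bitmap):
    """How much space is free, in blocks."""
    return bitmap.count(0)

def first_free_block(bitmap):
    """Return the number of the first free block, or None if the disk is full."""
    for block, bit in enumerate(bitmap):
        if bit == 0:
            return block
    return None

def counting(bitmap):
    """Counting: describe the free space as (first block, run length) pairs."""
    runs = []
    block = 0
    while block < len(bitmap):
        if bitmap[block] == 0:
            start = block
            while block < len(bitmap) and bitmap[block] == 0:
                block += 1
            runs.append((start, block - start))
        else:
            block += 1
    return runs

free_total = free_block_count(bitmap)
first_free = first_free_block(bitmap)
free_runs = counting(bitmap)
```

The counting form is shorter when free blocks come in long runs, and no better than a plain list when the free space is scattered.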

Directory implementation

How directories work. That is – how do directories store info about
files within them?

One way is to just keep a list of all the filenames in that directory.
If you have a million files (and it’s happened!), this can get
_really_ slow.

Hashtables are supposed to be faster at lookup, so naturally there’s
a way to use them too.
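
A quick Python sketch of the two approaches. The names come from the single-level directory diagram earlier, and the numbers they map to are made up (think of them as where each file’s information is stored).

```python
# Each directory entry maps a filename to a made-up number saying where
# the file's information lives.
entries = [("cat", 11), ("bo", 12), ("a", 13), ("test", 14), ("data", 15)]

def lookup_linear(entries, name):
    """Linear list: compare against every entry until one matches."""
    for entry_name, number in entries:
        if entry_name == name:
            return number
    return None

# Hash table: Python's dict is one, so a lookup goes (on average)
# straight to the right entry instead of scanning the whole list.
table = dict(entries)

def lookup_hashed(table, name):
    return table.get(name)

found_linear = lookup_linear(entries, "test")
found_hashed = lookup_hashed(table, "test")
missing = lookup_hashed(table, "nope")
```

With five files you won’t notice the difference. With a million, the linear version may have to look at every entry before it finds (or fails to find) the name.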

Consistency checking

Remember the ScanDisk that shows up when you shut down your computer
improperly? This is what’s happening. Your computer’s checking if what
it thinks your filesystem should be like is different from what it
actually is.
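
To give a feel for the idea, here is a simplified Python sketch of one kind of check such a tool might do: comparing the free-space bit vector (1 = occupied) against the blocks that files actually claim. The data and the check_consistency function are invented for illustration; this is not how ScanDisk itself is implemented.

```python
def check_consistency(bitmap, files):
    """Compare the free-space bitmap (1 = occupied) with the blocks that
    files claim, and report every disagreement."""
    claimed = {}
    for name, blocks in files.items():
        for block in blocks:
            claimed.setdefault(block, []).append(name)

    problems = []
    for block, bit in enumerate(bitmap):
        owners = claimed.get(block, [])
        if bit == 1 and not owners:
            problems.append((block, "marked occupied but no file uses it"))
        elif bit == 0 and owners:
            problems.append((block, "marked free but in use"))
        if len(owners) > 1:
            problems.append((block, "claimed by more than one file"))
    return problems

# Made-up state after a crash.
bitmap = [1, 1, 0, 1, 1, 0]
files = {"a.txt": [0, 1], "b.txt": [1, 2]}

problems = check_consistency(bitmap, files)
for block, message in problems:
    print(f"block {block}: {message}")
```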
