
CS161: Notes on file systems

I might not be able to give a lecture on filesystems this Friday (or
it might whiz past with my mile-a-minute speech), so here are some
detailed lecture notes on file systems to help you review for the
final exams.

File systems. You know about file systems because that’s one of the
most obvious things about your computer. You know that your documents
are hidden somewhere on your hard disk. You know that programs are
stored in another directory. You know that you can make your own
directories to organize your programs and your data.

Why are file systems important? Well, remember what happens during
brown-outs or power fluctuations? If the power goes off and then on
again in a computer lab without uninterruptible power supplies
(UPSes), people will lose all their unsaved work. That’s because main
memory (RAM) is volatile: it loses its data when it loses power. Hard
disks, on the other hand, keep data even when the computer is off, so
save often!

Attributes

Let’s take a look at the data associated with files. Different
operating systems keep track of different things, but here’s a short
list.

- Name: The files in a directory should have unique names. This lets
  us refer to that file.

- Type of file: What kind of file is it? Is it an image? A text
  document? A webpage? This could help the operating system determine
  the best way to open this file.

- Location: Remember our discussion about sectors, tracks, and
  cylinders? Where is the file on the hard disk? In terms of the
  logical filesystem, in what directory is the file?

- Size: How large is the file? How much data is in it right now?

- Protection: Can other people edit this file? run this file? read
  this file? Who can work on this file?

- Time/date: When was it last modified? (Sometimes the operating
  system stores more information, like creation time and access time.)

- User identification: Who owns this file? (Windows 98 and below
  don’t care about this.)
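
If you want to poke at these attributes on a real file, here is a small sketch using Python's standard os.stat call. The helper name describe and the throwaway file notes.txt are made up for illustration, and the exact attributes you get back vary from one operating system to another.

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Return a dict of common attributes for the file at `path`."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "directory": os.path.dirname(os.path.abspath(path)),
        "size_bytes": info.st_size,
        "protection": stat.filemode(info.st_mode),  # permission string
        "modified": time.ctime(info.st_mtime),
        "owner_uid": info.st_uid,  # numeric user id on Unix-like systems
        "is_directory": stat.S_ISDIR(info.st_mode),
    }


# Demo on a throwaway file (hypothetical name and contents).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "notes.txt")
with open(demo_path, "w") as f:
    f.write("hello")

attrs = describe(demo_path)
for key, value in attrs.items():
    print(f"{key}: {value}")
```

Notice that the name and the directory come from the path rather than from os.stat: on Unix-like systems the name lives in the directory entry, not with the rest of the file's attributes.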

Operations

- Create, write, read, delete: Makes sense.

- Reposition within a file: When you’re reading a book and you want
  to flip back to a certain place, you don’t have to close the book
  and then open it again. You can just go to a certain position. Same
  with files. Some files will let you easily jump to a specified
  position within the file. On the other hand, some files will only
  let you go forward and not back.

- Truncate: Get rid of everything after a certain position.
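
Here is a rough sketch of those operations in Python, using a temporary file with made-up contents. It opens the file in binary mode so that positions are plain byte offsets.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "book.txt")

# Create and write.
with open(path, "wb") as f:
    f.write(b"chapter1 chapter2 chapter3")

with open(path, "rb+") as f:
    # Reposition: jump straight to byte 9 without reopening the file.
    f.seek(9)
    middle = f.read(8)

    # Go back to the start, like flipping back in a book.
    f.seek(0)
    first = f.read(8)

    # Truncate: get rid of everything from byte 17 onward.
    f.truncate(17)

# Read what is left.
with open(path, "rb") as f:
    remaining = f.read()

# Delete.
os.remove(path)
```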

Access methods

- Sequential: You can only access it going forward. You can’t go back.
  To read the 7th line in the file, you have to read the first 6
  lines.

- Direct: You can jump around. This is also known as a random-access
  file.

- Indexed: This is like the way a dictionary works. You can jump
  around, but some places (like the beginning of entries for each
  letter) are marked so that you can jump to them easily.
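
To make the difference concrete, here is a sketch that fetches the 7th line of a small in-memory "file" two ways: by reading past the first 6 lines, and by looking up a line-offset index and jumping there. The function names and the ten-line sample are invented for this example.

```python
import io

# A ten-line sample "file": b"line 1\n" through b"line 10\n".
text = b"".join(b"line %d\n" % n for n in range(1, 11))


def read_line_sequential(stream, wanted):
    """Sequential access: start at the beginning and read past earlier lines."""
    stream.seek(0)
    for _ in range(wanted - 1):
        stream.readline()  # read and discard
    return stream.readline()


def build_index(stream):
    """Record the byte offset where each line starts."""
    stream.seek(0)
    offsets = []
    while True:
        pos = stream.tell()
        if not stream.readline():
            break
        offsets.append(pos)
    return offsets


def read_line_indexed(stream, index, wanted):
    """Indexed access: look up the offset, then jump directly to it."""
    stream.seek(index[wanted - 1])
    return stream.readline()


stream = io.BytesIO(text)
index = build_index(stream)
seventh_sequential = read_line_sequential(stream, 7)
seventh_indexed = read_line_indexed(stream, index, 7)
```

Both calls return the same line. The indexed version still needs direct access underneath (the seek), plus the extra space to store the index.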

Partitions

Your hard disk is divided into partitions (PC term) or volumes (Mac
term). These are the “drives” you see on Windows. You can split up one
hard disk into a C: drive and a D: drive. If you learn about something
called RAID, you can combine two hard disks into just one drive.
(“Drive” is a confusing term, though, as you can have C:, D:, E:, F:,
etc. on just one hard disk.)

Directories

Single-level directory

Long, long ago, MS-DOS didn’t have directories. Seriously. Well, there
was one directory, and all of your files had to be under it. You can
imagine how this sucked, as all of your program files had to be in the
same place as your data. Not only that, you were limited to 8
characters for the filename and 3 characters for the extension.

People used clever names like AAAAAAAA.TXT, AAAAAAAB.TXT and
AAAAAAAC.TXT for their files. Of course, after six months, who could
remember what the contents of each file were?

The diagram shows a single-level directory. The directory entries
are “cat”, “bo”, “a”, “test”, “data”, “mail”, “cont”, “hex” and
“records”. All of these entries point to files.

Two-level directory

Now we got to organize them by user, at least. Still sucked because
all of your files were in just one directory, but at least you didn’t
have to worry about other people’s naming schemes.

Tree-structured directory

This is the kind of directory tree you got used to in Microsoft
Windows. You can create directories (aka folders) inside directories
inside directories inside directories.

Links (shortcuts) in Microsoft Windows aren’t actually real
directories. They’re fake: a shortcut is just an ordinary file, so, for
example, you can’t cd into one from the command line.

Acyclic graph directory

Remember the ln command from Unix? This is where links in Unix come
in. (They’re _real_ links, not like the fake ones in Microsoft
Windows. =) )

ln somefile anotherfile

creates a link: anotherfile will refer to the exact same file that
somefile refers to on the hard disk. They point to the same place on
the hard disk.

This allows you to have acyclic graph directories, because the same
file is referred to in two or more places.
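
If you'd like to convince yourself that both names really are the same file, here is a small sketch using Python's os.link, which does the same thing as the ln command above. It assumes a filesystem that supports hard links, and it runs in a temporary directory.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
somefile = os.path.join(workdir, "somefile")
anotherfile = os.path.join(workdir, "anotherfile")

with open(somefile, "w") as f:
    f.write("original")

os.link(somefile, anotherfile)  # same effect as: ln somefile anotherfile

# Both names lead to the same inode, and the link count goes up to 2.
same_inode = os.stat(somefile).st_ino == os.stat(anotherfile).st_ino
link_count = os.stat(somefile).st_nlink

# Writing through one name is visible through the other.
with open(anotherfile, "a") as f:
    f.write(" + edit")
with open(somefile) as f:
    contents = f.read()

# Removing one name leaves the data reachable through the other.
os.remove(somefile)
still_there = os.path.exists(anotherfile)
```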

General graph directory

ln returns! This time, we use it to make a symbolic link.

ln -s somefileordir anotherfileordir

Symbolic links can point to directories, so it’s perfectly acceptable
to make a link that points to one of your parent directories and thus
get into some kind of loop.

Basically, a general graph directory is anything that could have
these loops.
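
Here is a sketch of such a loop, and of how a program can walk the directories without going around forever. It uses Python's os.symlink in a temporary directory; the directory names and the function walk_without_looping are made up, and it assumes a system where you are allowed to create symbolic links.

```python
import os
import tempfile

root = tempfile.mkdtemp()
child = os.path.join(root, "child")
os.mkdir(child)

# Same effect as: ln -s <root> <root>/child/up
os.symlink(root, os.path.join(child, "up"))

# The loop lets us build longer and longer paths to the same directory.
loopy = os.path.join(root, "child", "up", "child", "up", "child")
loop_resolves = os.path.realpath(loopy) == os.path.realpath(child)


def walk_without_looping(top):
    """Follow links, but visit each real directory only once."""
    seen = set()
    visited = []
    stack = [top]
    while stack:
        current = stack.pop()
        real = os.path.realpath(current)
        if real in seen:
            continue  # reached again by another path: skip it
        seen.add(real)
        visited.append(real)
        for entry in os.scandir(current):
            if entry.is_dir():  # follows symbolic links by default
                stack.append(entry.path)
    return visited


visited = walk_without_looping(root)
```

The trick is remembering where you have already been. Without the seen set, the walk would keep descending through child/up/child/up and so on.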

File protection

You don’t want just anyone messing around with your files. Remember
Unix file permissions and chmod? This slide talks about some of those
permissions, although the access groups the slide uses are different
from Unix permissions. Under Unix, it’s user, group, others. Other
operating systems support access control lists (ACLs) – this means
that instead of just giving permission to one group, you can specify
exactly who gets to do what to the file.
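
As a reminder of how the Unix user/group/others bits look, here is a short sketch using Python's os.chmod and the stat module on a temporary file. It assumes a Unix-like system; the mode 640 is just an example.

```python
import os
import stat
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# Same effect as: chmod 640 <path>
# user: read + write, group: read only, others: nothing
os.chmod(path, 0o640)
mode = os.stat(path).st_mode

as_text = stat.filemode(mode)  # the string ls -l would show
user_can_write = bool(mode & stat.S_IWUSR)
group_can_write = bool(mode & stat.S_IWGRP)
others_can_read = bool(mode & stat.S_IROTH)

os.remove(path)
```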

Allocation methods

Here we start looking at how things are physically stored on your
hard disk. You can start up defrag to get an idea of what it looks
like.

Contiguous allocation

It’s like contiguous allocation for memory. All the space the file
needs should be in one continuous block. The nice thing about it is
that it’s easy to figure out where all the data is – just find the
starting position and count off so and so many bytes. If you need to
allocate space for a new file, try either first-fit or best-fit.
The downside is that file sizes are effectively fixed: once a file has
been allocated and another file has been allocated right after it, the
first file can’t grow in place.
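
Here is a minimal sketch of first-fit and best-fit, assuming free space is tracked as a list of holes, each a (starting block, length) pair. The hole sizes are made-up numbers.

```python
def first_fit(holes, size):
    """Return the index of the first hole that is large enough, or None."""
    for i, (start, length) in enumerate(holes):
        if length >= size:
            return i
    return None


def best_fit(holes, size):
    """Return the index of the smallest hole that is large enough, or None."""
    candidates = [(length, i) for i, (start, length) in enumerate(holes)
                  if length >= size]
    return min(candidates)[1] if candidates else None


def allocate(holes, size, choose):
    """Carve `size` blocks out of a hole; return the starting block or None."""
    i = choose(holes, size)
    if i is None:
        return None
    start, length = holes[i]
    if length == size:
        del holes[i]  # hole used up completely
    else:
        holes[i] = (start + size, length - size)  # shrink the hole
    return start


# Free holes as (starting block, length in blocks).
holes = [(0, 10), (25, 4), (40, 6)]
first_choice = allocate(list(holes), 4, first_fit)
best_choice = allocate(list(holes), 4, best_fit)
too_big = allocate(list(holes), 11, first_fit)
```

For a 4-block file, first-fit takes a piece of the 10-block hole at block 0, while best-fit uses up the 4-block hole at block 25 exactly. The 11-block request fails even though 20 blocks are free in total, because no single hole is big enough.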

Linked allocation

The file is treated as a list of blocks, where a block is a fixed-size
contiguous collection of bytes on the hard disk. The blocks don’t all
have to be together on the hard disk – each block in the file points
to the next one. Plus side: files can grow, just link in new blocks.
Minus: To read the file, you have to hop around the hard disk, plus
all of that pointing around wastes space because each block has to
refer to the next one. Solution: use groups of blocks (aka clusters),
but these might be bigger than you need – if so, then space is wasted.

File allocation table

A form of linked allocation. The links to the next block are kept in
one large table in a certain place on the hard disk, instead of inside
the blocks themselves.
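
A toy version of the idea, as a sketch: the table maps each block number to the next block in the same file, and the directory only remembers each file's first block. The block numbers, file names, and the END_OF_FILE marker are invented here and are not the real FAT on-disk format.

```python
END_OF_FILE = -1  # marker for "no next block" (illustrative value)

# fat[n] holds the number of the block that follows block n.
fat = {
    4: 9,
    9: 2,
    2: END_OF_FILE,
    7: 5,
    5: END_OF_FILE,
}

# The directory only needs each file's first block.
directory = {"report.txt": 4, "todo.txt": 7}


def blocks_of(name):
    """Follow the chain in the table to list a file's blocks in order."""
    chain = []
    block = directory[name]
    while block != END_OF_FILE:
        chain.append(block)
        block = fat[block]
    return chain


def append_block(name, new_block):
    """Grow a file by linking a free block onto the end of its chain."""
    last = blocks_of(name)[-1]
    fat[last] = new_block
    fat[new_block] = END_OF_FILE


report_blocks = blocks_of("report.txt")
append_block("todo.txt", 11)
todo_blocks = blocks_of("todo.txt")
```

Because all the links sit together in one table, following a chain doesn't require reading every data block along the way.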

Indexed allocation

Still uses blocks, but instead of each block pointing to the next one,
one block has an index that points to all of the blocks. This block
is called the inode (index node) under Unix. If the file is too big
for one index block, the index block refers to other index blocks.
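
Here is a sketch of how a position in the file gets translated to a block on disk when the index block refers to other index blocks. The block size, the tiny number of pointers per index block, and all the block numbers are assumptions chosen to keep the example readable.

```python
BLOCK_SIZE = 512        # bytes per block (assumed)
POINTERS_PER_INDEX = 4  # kept tiny so the example is easy to follow

# Index blocks on the "disk", keyed by block number.
index_blocks = {
    100: [101, 102],     # top-level index: points at other index blocks
    101: [7, 3, 12, 8],  # data blocks holding logical blocks 0-3
    102: [20, 21],       # data blocks holding logical blocks 4-5
}


def physical_block(top_index, byte_offset):
    """Translate a byte offset in the file to (data block, offset in block)."""
    logical = byte_offset // BLOCK_SIZE
    which_index = logical // POINTERS_PER_INDEX
    slot = logical % POINTERS_PER_INDEX
    second_level = index_blocks[top_index][which_index]
    return index_blocks[second_level][slot], byte_offset % BLOCK_SIZE


start_of_file = physical_block(100, 0)
somewhere = physical_block(100, 2600)  # byte 2600 is in logical block 5
```

Unlike linked allocation, finding the block for byte 2600 takes a couple of index lookups no matter how far into the file it is. There is no chain to follow.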

Free-space management

How your computer can tell how much space you have free.

- Bit vector/map: keep track of every single block! Requires 1 bit per
  block (1 if it’s occupied, 0 if it’s free).

- Linked list: Like the linked allocation scheme (one block points to
  the next one) except this keeps track of the free ones.

- Grouping: The first free block has an index of free blocks (as many
  as it can). If there are more free blocks than will fit in the
  index, it just points to another index block.

- Counting: Keep track of the beginning of each set of free blocks
  and how many free blocks there are.
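
Two of these schemes are easy to sketch side by side. The example below keeps a bit vector (1 = occupied, 0 = free, as above) and then summarizes the same free space the "counting" way, as (first free block, number of free blocks) pairs. The 12-block disk is made up.

```python
def first_free(bitmap):
    """Bit vector: scan for the first free block (0 = free, 1 = occupied)."""
    for block, bit in enumerate(bitmap):
        if bit == 0:
            return block
    return None


def to_counting(bitmap):
    """Counting: list each run of free blocks as (first block, run length)."""
    runs = []
    start = None
    for block, bit in enumerate(bitmap):
        if bit == 0 and start is None:
            start = block  # a run of free blocks begins here
        elif bit == 1 and start is not None:
            runs.append((start, block - start))
            start = None
    if start is not None:  # the disk ends with a free run
        runs.append((start, len(bitmap) - start))
    return runs


bitmap = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]
free_count = bitmap.count(0)
free_runs = to_counting(bitmap)
next_block = first_free(bitmap)
bitmap[next_block] = 1  # allocate it by flipping its bit
```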

Directory implementation

How directories work. That is – how do directories store info about
files within them?

One way is to just keep a list of all the filenames in that directory.
If you have a million files (and it’s happened!), this can get
_really_ slow.

Hashtables are supposed to be faster at lookup, so naturally there’s
a way to use them too.
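
To get a feel for the difference, here is a sketch that looks up the last of 100,000 made-up entries both ways. A Python dict stands in for the hash table; a real filesystem would use its own on-disk structure.

```python
# Each directory entry maps a filename to where the file's data starts
# (here, a made-up block number).
entries = [(f"file{n}.txt", 1000 + n) for n in range(100_000)]


def lookup_linear(entries, name):
    """List-based directory: compare against entries one by one."""
    comparisons = 0
    for entry_name, block in entries:
        comparisons += 1
        if entry_name == name:
            return block, comparisons
    return None, comparisons


# Hash-table directory: the dict hashes the name to find its slot.
hashed = dict(entries)

block_linear, comparisons = lookup_linear(entries, "file99999.txt")
block_hashed = hashed.get("file99999.txt")
```

Both lookups find the same block, but the list version had to compare against every one of the 100,000 names to get there.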

Consistency checking

Remember the scandisk that shows up when you improperly shut down your
computer? This is what’s happening. Your computer’s checking if what
it thinks your filesystem should be like is different from what it
actually is.
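
One kind of check can be sketched like this: compare the free-space bit vector against the blocks that files actually claim. This is only a toy illustration of the idea, not how scandisk itself works; the function name check and the sample data are invented.

```python
def check(bitmap, files):
    """Compare the free-space bitmap (1 = occupied) with the blocks files use."""
    problems = []
    owners = {}
    for name, blocks in files.items():
        for block in blocks:
            if block in owners:
                problems.append(
                    f"block {block} claimed by {owners[block]} and {name}")
            owners[block] = name
            if bitmap[block] == 0:
                problems.append(f"block {block} used by {name} but marked free")
    for block, bit in enumerate(bitmap):
        if bit == 1 and block not in owners:
            problems.append(
                f"block {block} marked occupied but belongs to no file")
    return problems


bitmap = [1, 1, 0, 1, 1, 0]
files = {"a.txt": [0, 1], "b.txt": [1, 2]}
problems = check(bitmap, files)
for problem in problems:
    print(problem)
```

In this sample, block 1 is claimed by two files, block 2 is in use but marked free, and blocks 3 and 4 are marked occupied even though no file owns them.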

Short URL: http://sachachua.com/blog/p/1730
