OS - 12. File-System Implementation－nix

    這是我自己讀OPERATING SYSTEM CONCEPTS (6th Edition)的筆記，本來讀的進度很緩慢，直到有一天在網路上看到冼鏡光老師分享上課的slide(OS第7版)，唸起OS就愈來有趣啦。
    筆記的章節是依OS第6版整理的，圖片則都是從冼鏡光老師的slide貼上來的，感謝冼老師無私的分享，讓我有機會一窺OS的奧妙。
    (如有著作權相關問題，請惠予通知)

12. File-System Implementation

12.1 File-System Structure

The lowest level, the I/O control, consists of device drivers and interrupt handlers to transfer information between the main memory and the disk system.
A device driver can be thought of as a translator. Its input consists of high-level commands such as “retrieve block 123”. Its input consists of low-level, hardware specific instructions that are used by the hardware controller, which interfaces the I/O device to the rest of the system. The device driver usually writes specific bit patterns to special locations in the I/O controller’s memory to tell the controller on which device location to act and what actions to take.
The logical file system manages metadata information. Metadata includes all of the file-system structure, excluding the actual data (or contents of the files).
The logical file system manages the directory structure to provide the file-organization module with the information such as symbolic file name. It maintains file structure via file control blocks.

12.2 File-System Implementation

12.2.1          Overview
A file system has on-disk and in-memory information.
A disk may contain the following for implementing a file system on it:
A boot control block per volume
A partition control block per volume
A directory structure per file system
A file control block per file
In-memory information include
An in-memory partition table
An in-memory directory structure
The system-wide open-file table
The per-process open-file table
A FCB, file control block, contains the details of a file.
In UNIX, a FCB is called an i-node.

12.2.2          Partitions and Mounting
Raw disk is used where no file system is appropriate.
As part of a successful mount operation, the OS verifies that the device contains a valid file system. It does so by asking the device driver to read the device directory and verifying that the directory has the expected format.
12.2.3          Virtual File Systems
Data structure and procedures are used to isolate the basic system call functionality form the implementation details.
The first layer is the file-system interface, based on the open (), read (), write (), and close () calls, and the file descriptors.
The second layer is called the Virtual File System (VFS) layer. It separates file-system-generic operations from their implementation by defining a clean VFS interface.
The VFS is based on a file-representation structure, called a vnode that contains a numerical designator for a network-wide unique file.
The VFS activates file-system-specific operations to handle local requests according to their file-system types, and even calls the NFS protocol procedures for remote requests.

12.3 Directory Implementation

A file directory is usually implemented as a linked-list or a tree (e.g., B-tree), a hash table with chaining, or the combination of both.
The problem with the linked-list approach is the poor performance in searching for a file. Directory search is a very frequently performed operation.
A directory is simply a file!
A directory entry may be very simple like the one used by MS-DOS. Or it may be quite complex like the UNIX i-node.

12.3.1 Linear List
The simplest method of implementing a directory is to use a linear list of file names with pointers to the data blocks.
A linear list of directory entries requires a linear search to find a particular entry.
12.3.2 Hash Table
The hash table approach speeds up search; however, we must deal with the problem of collisions. Thus, chaining is necessary.
A chained-overflow hash table can be used; each hash entry can be a linked list instead of an individual value.

12.4 Allocation Methods

12.4.1 Contiguous Allocation
With the contiguous allocation method, a user must indicate the file size before creating the file.
Then, the OS searches the disk to find contiguous disk blocks for the file.
The directory entry is easy. It contains the initial disk address of this file and the number of disk blocks.
Therefore, if the initial address is b and the number of blocks is n, the file will occupy blocks b, b+1, b+2… b+n-1.

Contiguous allocation is easy to implement.
Its disadvantages are
It can be considered as a form of dynamic memory allocation, and external fragmentation may occur and compaction may be needed.
It is difficult to estimate the file size. The size of a file may grow at run time and may be larger than the specified number of allocated blocks. In this case, the OS must move the blocks in order to provide mode space. In some systems, this is simply an error.

12.4.2          Linked Allocation
With the linked allocation approach, disk blocks of a file are chained together with a linked-list.
The directory entry of a file contains a pointer to the first block and a pointer to the last block. Append is difficult to do without the last pointer.
To create a file, we create a new directory entry and the pointers are initialized to nil.
When a write occurs, a new disk block is allocated and chained to the end of the list.
Advantages:
File size does not have to be specified.
No external fragmentation.
Disadvantages:
It does sequential access efficiently and is not for direct access.
Each block contains a pointer, wasting space (use cluster instead).
Blocks scatter everywhere and a large number of disk seeks may be necessary.
Reliability: what if a pointer is lost or damaged?
File Allocation Table (FAT): This is a variation of the linked allocation by pulling all pointers into a table.
A section of disk at the beginning of each partition is set aside to contain the table. The table has one entry for each disk block, and is indexed by block number.
The directory entry contains the block number of the first block of the file.
Large no. of disk seeks.
Can do direct access.
FAT needs space.
The left diagram shows file test has its first block at 217, followed by 618, 339 (end of file).

12.4.3          Indexed Allocation
Each file has an index block that is an array of disk block addresses.
The i-th entry in the index block points to the i-th block of the file.
A file’s directory entry contains a pointer to its index. Hence, the index block of an indexed allocation plays the same role as the page table.
Index allocation supports both sequential and direct access without external fragmentation.

The indexed allocation suffers from wasted space. The index block may not be fully used (i.e., internal fragmentation).
The number of entries of an index table determines the size of a file. To overcome this problem, we can
Linked scheme: Have multiple index blocks and chain them into a linked-list.
Multilevel index: Have multiple index blocks, but make them a tree just like the indexed access method.
Combined scheme: A combination of both (UNIX).

12.4.4          Performance

12.5 Free-Space Management

How do we keep track free blocks on a disk?
A free-list (free-space list) is maintained. When a new block is requested, we search this list to find one.

12.5.1          Bit Vector (bit map)
Each block is represented by a bit in a table. Thus, if there are n disk blocks, the table has n bits.
If a block is free, its corresponding bit is 1.
When a block is needed, the table is searched. If a 1 bit is found in position k, block k is free.
If the disk capacity is small, the whole bit vector can be stored in memory. For a large disk, this bit vector will consume too much memory.
We could group a few blocks into a cluster and allocate clusters. This saves space and may cause internal fragmentation.
Another possibility is the use of a summary table.
12.5.2          Linked List
Like the linked allocation method, free blocks can be chained into a linked list.
When a free block is needed, the first in the chain is allocated.
However, this method has the same disadvantages of the linked allocation method.
We can use a FAT for the disk and chain the free block pointers together. Keep in mind that the FAT may be very large and consume space if it is stored in memory.
12.5.3          Grouping (Linked List + Grouping)
The first free block contains the addresses of n other free blocks.
For each group, the first n-1 blocks are actually free and the last (i.e., n-th) block contains the addresses of the next group.
In this way, we can quickly locate free blocks.

12.5.4          Counting (Linked List + Address + Count)
We can make the list short with the following trick:
Blocks are often allocated and freed in groups
We can store the address of the of the first free block and the number of the following n free blocks.

12.6 Efficiency and Performance

12.6.1 Efficiency
UNIX inodes are preallocated on a partition, even an “empty” disk has a percentage of its space lost to inodes.
12.6.2 Performance
Unified virtual memory: use page caching to cache both process pages and file data.
Free-behind removes a page form the buffer as soon as the next page is requested.
With read-ahead, a requested page and several subsequent pages are read and cached.
RAM disk: a section of memory is set aside and treated as a virtual disk, a RAM-disk device driver accepts all the standard disk operations but performs those operations on the memory section, instead of on a disk.
The difference between a RAM disk and a disk cache is that the contents of the RAM disk are totally user controlled, whereas those of the disk cache are under the control of the OS.

12.7 Recovery

12.7.1 Consistency Checking
The consistency checker compares the data in the directory structure with the data blocks on disk, and tries to fix any inconsistencies it finds.
12.7.2 Backup and Restore

12.8 Log-Structured File System

Log-based transaction-oriented file systems (journaling file systems).

12.9 NFS

12.9.1          Overview
12.9.2          The Mount Protocol
The mount protocol establishes the initial logical connection between a server and a client.
12.9.3          The NFS Protocol
A further implication of the stateless-server philosophy and a result of the synchrony of an RPC is that modified data (including indirection and status blocks) must be committed to the server’s disk before results are returned to the client.
A single NFS write procedure call is guaranteed to be atomic, and also is not intermixed with other write calls to the same file.
12.9.4          Path-Name Translation
Path-name translation is done by breaking the path into component names and performing a separate NFS lookup call for every pair of component name and directory vnode. Once a mount point is crossed, every component lookup causes a separate RPC to the server.
12.9.5          Remote Operations

nix

nix 發表在痞客邦留言(1) 人氣()

E-mail轉寄

«	三月 2025					»
日	一	二	三	四	五	六
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

nix

Seize...

OS - 12. File-System Implementation

歷史上的今天

留言列表

月曆

文章分類

近期文章

最新迴響

參觀人氣

文章彙整

我的連結

運動

旅遊

公益環保

國事

電子報

Embedded System

Freeware

Linux

攝影

Pixnet

Wretch

Yahoo

世界真奇妙

益智

新聞交換(RSS)

RSS訂閱

«	三月 2025					»
日	一	二	三	四	五	六
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

«	三月 2025					»
日	一	二	三	四	五	六
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

«	三月 2025					»
日	一	二	三	四	五	六
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31