3. File Processing and Buffer Pools

3.1. Programmer’s View of Files

Logical view of files:
- An a array of bytes.
- A file pointer marks the current position.
Three fundamental operations:
- Read bytes from current position (move file pointer)
- Write bytes to current position (move file pointer)
- Set file pointer to specified byte position.

3.2. Java File Functions

RandomAccessFile(String name, String mode)
close()
read(byte[] b)
write(byte[] b)
seek(long pos)

3.3. Primary vs. Secondary Storage

Primary storage: Main memory (RAM)
Secondary Storage: Peripheral devices
- Disk drives
- Tape drives
- Flash drives

3.4. Comparisons

\[\begin{split}\begin{array}{l|r|r|r|r|r|r|r} \hline \textbf{Medium}& 1996 & 1997 & 2000 & 2004 & 2006 & 2008 & 2011\\ \hline \textbf{RAM}& \$45.00 & 7.00 & 1.500 & 0.3500 & 0.1500 & 0.0339 & 0.0138\\ \textbf{Disk}& 0.25 & 0.10 & 0.010 & 0.0010 & 0.0005 & 0.0001 & 0.0001\\ \textbf{USB drive}& -- & -- & -- & 0.1000 & 0.0900 & 0.0029 & 0.0018\\ \textbf{Floppy}& 0.50 & 0.36 & 0.250 & 0.2500 & -- & -- & --\\ \textbf{Tape}& 0.03 & 0.01 & 0.001 & 0.0003 & -- & -- & --\\ \textbf{Solid State}& -- & -- & -- & -- & -- & -- & 0.0021\\ \hline \end{array}\end{split}\]

(Costs per Megabyte)
RAM is usually volatile.
RAM is about 1/2 million times faster than disk.

3.5. Golden Rule of File Processing

Minimize the number of disk accesses!
1. Arrange information so that you get what you want with few disk accesses.
2. Arrange information to minimize future disk accesses.
An organization for data on disk is often called a file structure.
Disk-based space/time tradeoff: Compress information to save processing time by reducing disk accesses.

3.6. Disk Drives

3.7. Sectors

A sector is the basic unit of I/O.

3.8. Terms

Locality of Reference: When record is read from disk, next request is likely to come from near the same place on the disk.
Cluster: Smallest unit of file allocation, usually several sectors.
Extent: A group of physically contiguous clusters.
Internal fragmentation: Wasted space within sector if record size does not match sector size; wasted space within cluster if file size is not a multiple of cluster size.

3.9. Seek Time

Seek time: Time for I/O head to reach desired track. Largely determined by distance between I/O head and desired track.
Track-to-track time: Minimum time to move from one track to an adjacent track.
Average Access time: Average time to reach a track for random access.

3.10. Other Factors

Rotational Delay or Latency: Time for data to rotate under I/O head.
- One half of a rotation on average.
- At 7200 rpm, this is 8.3/2 = 4.2ms.
Transfer time: Time for data to move under the I/O head.
- At 7200 rpm: Number of sectors read/Number of sectors per track * 8.3ms.

3.11. (Old) Disk Spec Example

16.8 GB disk on 10 platters = 1.68GB/platter
13,085 tracks/platter
256 sectors/track
512 bytes/sector
Track-to-track seek time: 2.2 ms
Average seek time: 9.5ms
4KB clusters, 32 clusters/track.
5400RPM

3.12. Disk Access Cost Example (1)

Read a 1MB file divided into 2048 records of 512 bytes (1 sector) each.
Assume all records are on 8 contiguous tracks.
First track: 9.5 + (11.1)(1.5) = 26.2 ms
Remaining 7 tracks: 2.2 + (11.1)(1.5) = 18.9ms.
Total: 26.2 + 7 * 18.9 = 158.5ms

3.13. Disk Access Cost Example (2)

Read a 1MB file divided into 2048 records of 512 bytes (1 sector) each.
Assume all file clusters are randomly spread across the disk.
256 clusters. Cluster read time is 8/256 of a rotation for about 5.9ms for both latency and read time.
256(9.5 + 5.9) is about 3942ms or nearly 4 sec.

3.14. How Much to Read?

Read time for one track: $9.5 + (11.1)(1.5) = 26.2$ ms
Read time for one sector: $9.5 + 11.1/2 + (1/256)11.1 = 15.1$ ms
Read time for one byte: $9.5 + 11.1/2 = 15.05$ ms
Nearly all disk drives read/write one sector (or more) at every I/O access
Also referred to as a page or block

3.15. Newer Disk Spec Example

Samsung Spinpoint T166
500GB (nominal)
7200 RPM
Track to track: 0.8 ms
Average track access: 8.9 ms
Bytes/sector: 512
6 surfaces/heads

3.16. Buffers

The information in a sector is stored in a buffer or cache.
If the next I/O access is to the same buffer, then no need to go to disk.
Disk drives usually have one or more input buffers and one or more output buffers.

3.17. Buffer Pools

A series of buffers used by an application to cache disk data is called a buffer pool.
Virtual memory uses a buffer pool to imitate greater RAM memory by actually storing information on disk and “swapping” between disk and RAM.

3.18. Buffer Pools

Settings

Saving...

Server Error
Resubmit

3.19. Organizing Buffer Pools

Which buffer should be replaced when new data must be read?
First-in, First-out: Use the first one on the queue.
Least Frequently Used (LFU): Count buffer accesses, reuse the least used.
Least Recently used (LRU): Keep buffers on a linked list. When buffer is accessed, bring it to front. Reuse the one at end.

3.20. LRU

Settings

Saving...

Server Error
Resubmit

3.21. Dirty Bit

Settings

Saving...

Server Error
Resubmit

3.22. Bufferpool ADT: Message Passing

// ADT for buffer pools using the message-passing style
public interface BufferPoolADT {
  // Copy "sz" bytes from "space" to position "pos" in the buffered storage
  public void insert(byte[] space, int sz, int pos);

  // Copy "sz" bytes from position "pos" of the buffered storage to "space"
  public void getbytes(byte[] space, int sz, int pos);
}

3.23. Bufferpool ADT: Buffer Passing

// ADT for buffer pools using the buffer-passing style
public interface BufferPoolADT {
  // Return pointer to the requested block
  public byte[] getblock(int block);

  // Set the dirty bit for the buffer holding "block"
  public void dirtyblock(int block);

  // Tell the size of a buffer
  public int blocksize();
};

3.24. Design Issues

Disadvantage of message passing:
- Messages are copied and passed back and forth.
Disadvantages of buffer passing:
- The user is given access to system memory (the buffer itself)
- The user must explicitly tell the buffer pool when buffer contents have been modified, so that modified data can be rewritten to disk when the buffer is flushed.
- The pointer might become stale when the bufferpool replaces the contents of a buffer.

3.25. Some Goals

Be able to avoid reading data when the block contents will be replaced.
Be able to support multiple users accessing a buffer, and independently releasing a buffer.
Don’t make an active buffer stale.

3.26. Improved Interface

// Improved ADT for buffer pools using the buffer-passing style.
// Most user functionality is in the buffer class, not the buffer pool itself.

// A single buffer in the buffer pool
public interface BufferADT {
  // Read the associated block from disk (if necessary) and return a
  // pointer to the data
  public byte[] readBlock();

  // Return a pointer to the buffer's data array (without reading from disk)
  public byte[] getDataPointer();

  // Flag buffer's contents as having changed, so that flushing the
  // block will write it back to disk
  public void markDirty();

  // Release the block's access to this buffer. Further accesses to
  // this buffer are illegal
  public void releaseBuffer();
}

3. File Processing and Buffer Pools

3.1. Programmer’s View of Files

3.2. Java File Functions

3.3. Primary vs. Secondary Storage

3.4. Comparisons

3.5. Golden Rule of File Processing

3.6. Disk Drives

3.7. Sectors

3.8. Terms

3.9. Seek Time

3.10. Other Factors

3.11. (Old) Disk Spec Example

3.12. Disk Access Cost Example (1)

3.13. Disk Access Cost Example (2)

3.14. How Much to Read?

3.15. Newer Disk Spec Example

3.16. Buffers

3.17. Buffer Pools

3.18. Buffer Pools

3.19. Organizing Buffer Pools

3.20. LRU

3.21. Dirty Bit

3.22. Bufferpool ADT: Message Passing

3.23. Bufferpool ADT: Buffer Passing

3.24. Design Issues

3.25. Some Goals

3.26. Improved Interface

3.27. Improved Interface (2)