| About   «  6.2. Sorting Part 3   ::   Contents   ::   7.2. Project 3  »

7.1. File Processing and Buffer Pools

7.1.1. Programmer’s View of Files

  • Logical view of files:

    • An a array of bytes.

    • A file pointer marks the current position.

  • Three fundamental operations:

    • Read bytes from current position (move file pointer)

    • Write bytes to current position (move file pointer)

    • Set file pointer to specified byte position.

7.1.2. Java File Functions

  • RandomAccessFile(String name, String mode)

  • close()

  • read(byte[] b)

  • write(byte[] b)

  • seek(long pos)

7.1.3. Primary vs. Secondary Storage

  • Primary storage: Main memory (RAM)

  • Secondary Storage: Peripheral devices

    • Disk drives

    • SSD

    • Flash drives

    • (Network)

7.1.4. Comparisons

\[\begin{split}\begin{array}{l|r|r|r|r|r|r|r} \hline \textbf{Medium}& 1996 & 1997 & 2000 & 2004 & 2006 & 2008 & 2011 & 2025\\ \hline \textbf{RAM}& \$45.00 & 7.00 & 1.500 & 0.3500 & 0.1500 & 0.0339 & 0.0138 & 0.015\\ \textbf{HDD}& 0.25 & 0.10 & 0.010 & 0.0010 & 0.0005 & 0.0001 & 0.0001 & 0.000025\\ \textbf{SSD}& -- & -- & -- & -- & -- & -- & 0.0021 & 0.000125\\ \hline \end{array}\end{split}\]
  • (Costs per Megabyte)

  • RAM is usually volatile.

  • RAM is about 1/2 million times faster than disk.

7.1.5. Golden Rule of File Processing

  • Minimize the number of disk accesses!

    1. Arrange information so that you get what you want with few disk accesses.

    2. Arrange information to minimize future disk accesses.

  • An organization for data on disk is often called a file structure.

  • Disk-based space/time tradeoff: Compress information to save processing time by reducing disk accesses.

7.1.6. Disk Drives

Disk drive platters

7.1.7. Sectors

The organization of a disk platter
  • A sector is the basic unit of I/O.

7.1.8. Terms

  • Locality of Reference: When record is read from disk, next request is likely to come from near the same place on the disk.

  • Cluster: Smallest unit of file allocation, usually several sectors.

  • Extent: A group of physically contiguous clusters.

  • Internal fragmentation: Wasted space within sector if record size does not match sector size; wasted space within cluster if file size is not a multiple of cluster size.

7.1.9. Seek Time

  • Seek time: Time for I/O head to reach desired track. Largely determined by distance between I/O head and desired track.

  • Track-to-track time: Minimum time to move from one track to an adjacent track.

  • Average Access time: Average time to reach a track for random access.

7.1.10. Other Factors

  • Rotational Delay or Latency: Time for data to rotate under I/O head.

    • One half of a rotation on average.

    • At 7200 rpm, this is 8.3/2 = 4.2ms.

  • Transfer time: Time for data to move under the I/O head.

    • At 7200 rpm: Number of sectors read/Number of sectors per track * 8.3ms.

7.1.11. (Old) Disk Spec Example

  • 16.8 GB disk on 10 platters = 1.68GB/platter

  • 13,085 tracks/platter

  • 256 sectors/track

  • 512 bytes/sector

  • Track-to-track seek time: 2.2 ms

  • Average seek time: 9.5ms

  • 4KB clusters, 32 clusters/track.

  • 5400RPM

7.1.12. Disk Access Cost Example (1)

  • Read a 1MB file divided into 2048 records of 512 bytes (1 sector) each.

  • Assume all records are on 8 contiguous tracks.

  • First track: 9.5 + (11.1)(1.5) = 26.2 ms

  • Remaining 7 tracks: 2.2 + (11.1)(1.5) = 18.9ms.

  • Total: 26.2 + 7 * 18.9 = 158.5ms

7.1.13. Disk Access Cost Example (2)

  • Read a 1MB file divided into 2048 records of 512 bytes (1 sector) each.

  • Assume all file clusters are randomly spread across the disk.

  • 256 clusters. Cluster read time is 8/256 of a rotation for about 5.9ms for both latency and read time.

  • 256(9.5 + 5.9) is about 3942ms or nearly 4 sec.

7.1.14. How Much to Read?

  • Read time for one track: \(9.5 + (11.1)(1.5) = 26.2\) ms

  • Read time for one sector: \(9.5 + 11.1/2 + (1/256)11.1 = 15.1\) ms

  • Read time for one byte: \(9.5 + 11.1/2 = 15.05\) ms

  • Nearly all disk drives read/write one sector (or more) at every I/O access

  • Also referred to as a page or block

7.1.15. Newer Disk Spec Example

  • Samsung Spinpoint T166

  • 500GB (nominal)

  • 7200 RPM

  • Track to track: 0.8 ms

  • Average track access: 8.9 ms

  • Bytes/sector: 512

  • 6 surfaces/heads

7.1.16. Solid State Drive (SSD)

  • SSD do not require physical movement to read data.

    • Far closer to being “random access” than HDD.

  • Detailed vendor specs are hard to get.

    • Beware standard measures of I/O operations per second. While that might help a data center manager, it tells almost nothing about what a user on a personal computer will experience.

    • Far more important is latency time to get the next piece of information. 10ms for HDD, 0.1ms for SSD might be reasonable estimates.

  • SSD is far better at handling multiple requests in parallel, that hugely improves the relative performance compared to HDD.

    • If interested, read discussions about “queue depth”. For most applications, the individual program is interested in QD1 performance, which is the worst for SSD.

  • SSD still reads in blocks, just like HDD: Both are typically 4K bytes as basic unit of I/O.

  • Buffering still matters!

7.1.17. Buffers

  • The information in a sector is stored in a buffer or cache.

  • If the next I/O access is to the same buffer, then no need to go to disk.

  • Disk drives usually have one or more input buffers and one or more output buffers.

7.1.18. Buffer Pools

  • A series of buffers used by an application to cache disk data is called a buffer pool.

  • Virtual memory uses a buffer pool to imitate greater RAM memory by actually storing information on disk and “swapping” between disk and RAM.

7.1.19. Buffer Pools

7.1.20. Organizing Buffer Pools

  • Which buffer should be replaced when new data must be read?

  • First-in, First-out: Use the first one on the queue.

  • Least Frequently Used (LFU): Count buffer accesses, reuse the least used.

  • Least Recently used (LRU): Keep buffers on a linked list. When buffer is accessed, bring it to front. Reuse the one at end.

7.1.21. LRU

7.1.22. Dirty Bit

7.1.23. Bufferpool ADT: Message Passing

// ADT for buffer pools using the message-passing style
public interface BufferPoolADT {
  // Copy "sz" bytes from "space" to position "pos" in the buffered storage
  public void insert(byte[] space, int sz, int pos);

  // Copy "sz" bytes from position "pos" of the buffered storage to "space"
  public void getbytes(byte[] space, int sz, int pos);
}

7.1.24. Bufferpool ADT: Buffer Passing

// ADT for buffer pools using the buffer-passing style
public interface BufferPoolADT {
  // Return pointer to the requested block
  public byte[] getblock(int block);

  // Set the dirty bit for the buffer holding "block"
  public void dirtyblock(int block);

  // Tell the size of a buffer
  public int blocksize();
};

7.1.25. Design Issues

  • Disadvantage of message passing:

    • Messages are copied and passed back and forth.

  • Disadvantages of buffer passing:

    • The user is given access to system memory (the buffer itself)

    • The user must explicitly tell the buffer pool when buffer contents have been modified, so that modified data can be rewritten to disk when the buffer is flushed.

    • The pointer might become stale when the bufferpool replaces the contents of a buffer.

7.1.26. Some Goals

  • Be able to avoid reading data when the block contents will be replaced.

  • Be able to support multiple users accessing a buffer, and independently releasing a buffer.

  • Don’t make an active buffer stale.

7.1.27. Improved Interface

// Improved ADT for buffer pools using the buffer-passing style.
// Most user functionality is in the buffer class, not the buffer pool itself.

// A single buffer in the buffer pool
public interface BufferADT {
  // Read the associated block from disk (if necessary) and return a
  // pointer to the data
  public byte[] readBlock();

  // Return a pointer to the buffer's data array (without reading from disk)
  public byte[] getDataPointer();

  // Flag buffer's contents as having changed, so that flushing the
  // block will write it back to disk
  public void markDirty();

  // Release the block's access to this buffer. Further accesses to
  // this buffer are illegal
  public void releaseBuffer();
}

7.1.28. Improved Interface (2)

public interface BufferPoolADT {

  // Relate a block to a buffer, returning a pointer to a buffer object
  Buffer acquireBuffer(int block);
}

   «  6.2. Sorting Part 3   ::   Contents   ::   7.2. Project 3  »

Close Window