.. _FileProc: .. raw:: html .. |--| unicode:: U+2013 .. en dash .. |---| unicode:: U+2014 .. em dash, trimming surrounding whitespace :trim: .. This file is part of the OpenDSA eTextbook project. See .. http://algoviz.org/OpenDSA for more details. .. Copyright (c) 2012-2013 by the OpenDSA Project Contributors, and .. distributed under an MIT open source license. .. avmetadata:: :author: Cliff Shaffer ================================ File Processing and Buffer Pools ================================ File Processing and Buffer Pools -------------------------------- Programmer’s View of Files ~~~~~~~~~~~~~~~~~~~~~~~~~~ * Logical view of files: * An a array of bytes. * A file pointer marks the current position. * Three fundamental operations: * Read bytes from current position (move file pointer) * Write bytes to current position (move file pointer) * Set file pointer to specified byte position. Java File Functions ~~~~~~~~~~~~~~~~~~~ ``RandomAccessFile(String name, String mode)`` ``close()`` ``read(byte[] b)`` ``write(byte[] b)`` ``seek(long pos)`` Primary vs. Secondary Storage ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * Primary storage: Main memory (RAM) * Secondary Storage: Peripheral devices * Disk drives * Tape drives * Flash drives Comparisons ~~~~~~~~~~~ .. math:: \begin{array}{l|r|r|r|r|r|r|r} \hline \textbf{Medium}& 1996 & 1997 & 2000 & 2004 & 2006 & 2008 & 2011\\ \hline \textbf{RAM}& \$45.00 & 7.00 & 1.500 & 0.3500 & 0.1500 & 0.0339 & 0.0138\\ \textbf{Disk}& 0.25 & 0.10 & 0.010 & 0.0010 & 0.0005 & 0.0001 & 0.0001\\ \textbf{USB drive}& -- & -- & -- & 0.1000 & 0.0900 & 0.0029 & 0.0018\\ \textbf{Floppy}& 0.50 & 0.36 & 0.250 & 0.2500 & -- & -- & --\\ \textbf{Tape}& 0.03 & 0.01 & 0.001 & 0.0003 & -- & -- & --\\ \textbf{Solid State}& -- & -- & -- & -- & -- & -- & 0.0021\\ \hline \end{array} * (Costs per Megabyte) * RAM is usually volatile. * RAM is about 1/2 million times faster than disk. Golden Rule of File Processing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * Minimize the number of disk accesses! #. Arrange information so that you get what you want with few disk accesses. #. Arrange information to minimize future disk accesses. * An organization for data on disk is often called a file structure. * Disk-based space/time tradeoff: Compress information to save processing time by reducing disk accesses. Disk Drives ~~~~~~~~~~~ .. odsafig:: Images/Plat.png :width: 600 :align: center :capalign: justify :figwidth: 90% :alt: Disk drive platters Sectors ~~~~~~~ .. odsafig:: Images/Disk.png :width: 600 :align: center :capalign: justify :figwidth: 90% :alt: The organization of a disk platter * A sector is the basic unit of I/O. Terms ~~~~~ * **Locality of Reference**: When record is read from disk, next request is likely to come from near the same place on the disk. * **Cluster**: Smallest unit of file allocation, usually several sectors. * **Extent**: A group of physically contiguous clusters. * **Internal fragmentation**: Wasted space within sector if record size does not match sector size; wasted space within cluster if file size is not a multiple of cluster size. Seek Time ~~~~~~~~~ * **Seek time**: Time for I/O head to reach desired track. Largely determined by distance between I/O head and desired track. * **Track-to-track time**: Minimum time to move from one track to an adjacent track. * **Average Access time**: Average time to reach a track for random access. Other Factors ~~~~~~~~~~~~~ * **Rotational Delay** or **Latency**: Time for data to rotate under I/O head. * One half of a rotation on average. * At 7200 rpm, this is 8.3/2 = 4.2ms. * **Transfer time**: Time for data to move under the I/O head. * At 7200 rpm: Number of sectors read/Number of sectors per track * 8.3ms. (Old) Disk Spec Example ~~~~~~~~~~~~~~~~~~~~~~~ * 16.8 GB disk on 10 platters = 1.68GB/platter * 13,085 tracks/platter * 256 sectors/track * 512 bytes/sector * Track-to-track seek time: 2.2 ms * Average seek time: 9.5ms * 4KB clusters, 32 clusters/track. * 5400RPM Disk Access Cost Example (1) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * Read a 1MB file divided into 2048 records of 512 bytes (1 sector) each. * Assume all records are on 8 contiguous tracks. * First track: 9.5 + (11.1)(1.5) = 26.2 ms * Remaining 7 tracks: 2.2 + (11.1)(1.5) = 18.9ms. * Total: 26.2 + 7 * 18.9 = 158.5ms Disk Access Cost Example (2) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * Read a 1MB file divided into 2048 records of 512 bytes (1 sector) each. * Assume all file clusters are randomly spread across the disk. * 256 clusters. Cluster read time is 8/256 of a rotation for about 5.9ms for both latency and read time. * 256(9.5 + 5.9) is about 3942ms or nearly 4 sec. How Much to Read? ~~~~~~~~~~~~~~~~~ * Read time for one track: :math:`9.5 + (11.1)(1.5) = 26.2` ms * Read time for one sector: :math:`9.5 + 11.1/2 + (1/256)11.1 = 15.1` ms * Read time for one byte: :math:`9.5 + 11.1/2 = 15.05` ms * Nearly all disk drives read/write one sector (or more) at every I/O access * Also referred to as a page or block Newer Disk Spec Example ~~~~~~~~~~~~~~~~~~~~~~~ * Samsung Spinpoint T166 * 500GB (nominal) * 7200 RPM * Track to track: 0.8 ms * Average track access: 8.9 ms * Bytes/sector: 512 * 6 surfaces/heads Buffers ~~~~~~~ * The information in a sector is stored in a buffer or cache. * If the next I/O access is to the same buffer, then no need to go to disk. * Disk drives usually have one or more input buffers and one or more output buffers. Buffer Pools ~~~~~~~~~~~~ * A series of buffers used by an application to cache disk data is called a buffer pool. * Virtual memory uses a buffer pool to imitate greater RAM memory by actually storing information on disk and “swapping” between disk and RAM. Buffer Pools ~~~~~~~~~~~~ .. odsalink:: AV/Files/buffpoolCON.css .. inlineav:: buffintroCON ss :long_name: Buffer Pool Introduction Slideshow :points: 0.0 :required: False :threshold: 1.0 :align: center :output: show .. odsascript:: AV/Files/buffintroCON.js Organizing Buffer Pools ~~~~~~~~~~~~~~~~~~~~~~~ * Which buffer should be replaced when new data must be read? * First-in, First-out: Use the first one on the queue. * Least Frequently Used (LFU): Count buffer accesses, reuse the least used. * Least Recently used (LRU): Keep buffers on a linked list. When buffer is accessed, bring it to front. Reuse the one at end. LRU ~~~ .. inlineav:: LRUCON ss :long_name: LRU Replacement Slideshow :points: 0.0 :required: False :threshold: 1.0 :align: center :output: show .. odsascript:: AV/Files/LRUCON.js Dirty Bit ~~~~~~~~~ .. inlineav:: LRUwriteCON ss :long_name: LRU Replacement with write Slideshow :points: 0.0 :required: False :threshold: 1.0 :align: center :output: show .. odsascript:: AV/Files/LRUwriteCON.js Bufferpool ADT: Message Passing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. codeinclude:: BufferPool/BuffMsgADT Bufferpool ADT: Buffer Passing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. codeinclude:: BufferPool/BuffBuffADT Design Issues ~~~~~~~~~~~~~ * Disadvantage of message passing: * Messages are copied and passed back and forth. * Disadvantages of buffer passing: * The user is given access to system memory (the buffer itself) * The user must explicitly tell the buffer pool when buffer contents have been modified, so that modified data can be rewritten to disk when the buffer is flushed. * The pointer might become stale when the bufferpool replaces the contents of a buffer. Some Goals ~~~~~~~~~~ * Be able to avoid reading data when the block contents will be replaced. * Be able to support multiple users accessing a buffer, and independantly releasing a buffer. * Don’t make an active buffer stale. Improved Interface ~~~~~~~~~~~~~~~~~~ .. codeinclude:: BufferPool/BufferADT Improved Interface (2) ~~~~~~~~~~~~~~~~~~~~~~ .. codeinclude:: BufferPool/BufferPoolADT External Sorting ~~~~~~~~~~~~~~~~ * Problem: Sorting data sets too large to fit into main memory. * Assume data are stored on disk drive. * To sort, portions of the data must be brought into main memory, processed, and returned to disk. * An external sort should minimize disk accesses. Model of External Computation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * Secondary memory is divided into equal-sized blocks (512, 1024, etc…) * A basic I/O operation transfers the contents of one disk block to/from main memory. * Under certain circumstances, reading blocks of a file in sequential order is more efficient. (When?) * Primary goal is to minimize I/O operations. * Assume only one disk drive is available. Key Sorting ~~~~~~~~~~~ * Often, records are large, keys are small. * Ex: Payroll entries keyed on ID number * Approach 1: Read in entire records, sort them, then write them out again. * Approach 2: Read only the key values, store with each key the location on disk of its associated record. * After keys are sorted the records can be read and rewritten in sorted order. Simple External Mergesort (1) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * Quicksort requires random access to the entire set of records. * Better: Modified Mergesort algorithm. * Process :math:`n` elements in :math:`\Theta(\log n)` passes. * A group of sorted records is called a run. Simple External Mergesort (2) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #. Split the file into two files. #. Read in a block from each file. #. Take first record from each block, output them in sorted order. #. Take next record from each block, output them to a second file in sorted order. #. Repeat until finished, alternating between output files. Read new input blocks as needed. #. Repeat steps 2-5, except this time input files have runs of two sorted records that are merged together. #. Each pass through the files provides larger runs. Simple External Mergesort (3) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. odsalink:: AV/Files/extsortCON.css .. inlineav:: extMergeSortCON ss :long_name: External Merge Sort Slideshow :points: 0.0 :required: False :threshold: 1.0 :output: show .. odsascript:: AV/Files/extMergeSortCON.js Problems with Simple Mergesort ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * Is each pass through input and output files sequential? * What happens if all work is done on a single disk drive? * How can we reduce the number of Mergesort passes? * In general, external sorting consists of two phases: * Break the files into initial runs * Merge the runs together into a single run. A Better Process ~~~~~~~~~~~~~~~~ .. inlineav:: extMergeSortExampCON ss :long_name: External Merge Sort Example Slideshow :points: 0.0 :required: False :threshold: 1.0 :output: show .. odsascript:: AV/Files/extMergeSortExampCON.js Breaking a File into Runs ~~~~~~~~~~~~~~~~~~~~~~~~~ * General approach: * Read as much of the file into memory as possible. * Perform an in-memory sort. * Output this group of records as a single run. Replacement Selection (1) ~~~~~~~~~~~~~~~~~~~~~~~~~ * Break available memory into an array for the heap, an input buffer, and an output buffer. * Fill the array from disk. * Make a min-heap. * Send the smallest value (root) to the output buffer. Replacement Selection (2) ~~~~~~~~~~~~~~~~~~~~~~~~~ * If the next key in the file is greater than the last value output, then * Replace the root with this key else * Replace the root with the last key in the array Add the next record in the file to a new heap (actually, stick it at the end of the array). .. inlineav:: extSortOverCON dgm :output: show .. odsascript:: AV/Files/extSortOverCON.js RS Example ~~~~~~~~~~ .. inlineav:: extRSCON ss :long_name: External Replacement Selection Slideshow :points: 0.0 :required: False :threshold: 1.0 :output: show .. odsascript:: DataStructures/binaryheap.js .. odsascript:: AV/Files/extRSCON.js Snowplow Analogy (1) ~~~~~~~~~~~~~~~~~~~~ * Imagine a snowplow moving around a circular track on which snow falls at a steady rate. * At any instant, there is a certain amount of snow S on the track. Some falling snow comes in front of the plow, some behind. * During the next revolution of the plow, all of this is removed, plus 1/2 of what falls during that revolution. * Thus, the plow removes 2S amount of snow. Snowplow Analogy (2) ~~~~~~~~~~~~~~~~~~~~ .. inlineav:: extSortSnowCON dgm :output: show .. odsascript:: AV/Files/extSortSnowCON.js Problems with Simple Merge ~~~~~~~~~~~~~~~~~~~~~~~~~~ * Simple mergesort: Place runs into two files. * Merge the first two runs to output file, then next two runs, etc. * Repeat process until only one run remains. * How many passes for r initial runs? * Is there benefit from sequential reading? * Is working memory well used? * Need a way to reduce the number of passes. Multiway Merge (1) ~~~~~~~~~~~~~~~~~~ * With replacement selection, each initial run is several blocks long. * Assume each run is placed in separate file. * Read the first block from each file into memory and perform an r-way merge. * When a buffer becomes empty, read a block from the appropriate run file. * Each record is read only once from disk during the merge process. Multiway Merge (2) ~~~~~~~~~~~~~~~~~~ * In practice, use only one file and seek to appropriate block. .. inlineav:: extMultiMergeCON ss :long_name: Multiway Merge Example Slideshow :points: 0.0 :required: False :threshold: 1.0 :output: show .. odsascript:: AV/Files/extMultiMergeCON.js Limits to Multiway Merge (1) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * Assume working memory is :math:`b` blocks in size. * How many runs can be processed at one time? * The runs are :math:`2b` blocks long (on average). * How big a file can be merged in one pass? Limits to Multiway Merge (2) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * Larger files will need more passes -- but the run size grows quickly! * This approach trades (:math:`\log b`) (possibly) sequential passes for a single or very few random (block) access passes. General Principles ~~~~~~~~~~~~~~~~~~ * A good external sorting algorithm will seek to do the following: * Make the initial runs as long as possible. * At all stages, overlap input, processing and output as much as possible. * Use as much working memory as possible. Applying more memory usually speeds processing. * If possible, use additional disk drives for more overlapping of processing with I/O, and allow for more sequential file processing.