.. _Secondary:
.. raw:: html
.. |--| unicode:: U+2013 .. en dash
.. |---| unicode:: U+2014 .. em dash, trimming surrounding whitespace
:trim:
.. This file is part of the OpenDSA eTextbook project. See
.. http://opendsa.org for more details.
.. Copyright (c) 2012-2020 by the OpenDSA Project Contributors, and
.. distributed under an MIT open source license.
.. avmetadata::
:author: Cliff Shaffer
:topic: File Processing
Primary versus Secondary Storage
================================
Computer storage devices are typically classified into
:term:`primary storage` or :term:`main memory` on the one hand, and
:term:`secondary storage` or :term:`peripheral storage` on the other.
Primary memory usually refers to :term:`Random Access Memory` (RAM),
while secondary storage refers to devices such as
hard disk drives, solid state drives, removable "USB" drives,
CDs, and DVDs.
Primary memory also includes registers, cache, and video memories,
but we will ignore them for this discussion because their existence
does not affect the principal differences between primary and
secondary memory.
Along with a faster CPU, every new model of computer seems to come
with more main memory.
As memory size continues to increase, is it possible that
relatively slow disk storage will be unnecessary?
Probably not, because the desire to store and process larger files
grows at least as fast as main memory size.
Prices for both main memory and peripheral storage devices have
dropped dramatically in recent years, as demonstrated by
Table :num:`#Price`.
However, the cost per unit of disk drive storage is about two
orders of magnitude less than RAM and has been for
many years.
.. _Price:
.. topic:: Table
Price comparison table for some writable electronic data storage
media in common use.
Prices are in US Dollars/MB.
.. math::
\begin{array}{l|r|r|r|r|r|r|r}
\hline
\textbf{Medium}& 1996 & 1997 & 2000 & 2004 & 2006 & 2008 & 2011\\
\hline
\textbf{RAM}& \$45.00 & 7.00 & 1.500 & 0.3500 & 0.1500 & 0.0339 & 0.0138\\
\textbf{Disk}& 0.25 & 0.10 & 0.010 & 0.0010 & 0.0005 & 0.0001 & 0.0001\\
\textbf{USB drive}& -- & -- & -- & 0.1000 & 0.0900 & 0.0029 & 0.0018\\
\textbf{Floppy}& 0.50 & 0.36 & 0.250 & 0.2500 & -- & -- & --\\
\textbf{Tape}& 0.03 & 0.01 & 0.001 & 0.0003 & -- & -- & --\\
\textbf{Solid State}& -- & -- & -- & -- & -- & -- & 0.0021\\
\hline
\end{array}
There is now a wide range of removable media available for
transferring data or storing data offline in relative safety.
These include floppy disks (now largely obsolete), writable CDs and
DVDs, "flash" drives, and magnetic tape.
Optical storage such as CDs and DVDs costs roughly half the price of
hard disk drive space per megabyte, and have become practical for use
as backup storage within the past few years.
Tape used to be much cheaper than other media, and was the preferred
means of backup, but are not so popular now as other media have
decreased in price.
Flash drives cost the most per megabyte, but due to their storage
capacity and flexibility, quickly replaced floppy disks as the
primary storage device for transferring data between computer when
direct network transfer is not available.
Secondary storage devices have
at least two other advantages over RAM memory.
Perhaps most importantly, disk, flash, and optical media are
:term:`persistent`,
meaning that they are not erased from the media when the power is
turned off.
In contrast, RAM used for main memory is usually :term:`volatile` |---|
all information is lost with the power.
A second advantage is that CDs and USB drives
can easily be transferred between computers.
This provides a convenient way to take information from one computer
to another.
In exchange for reduced storage costs, persistence, and
portability, secondary storage devices pay a penalty in terms of
increased access time.
While not all accesses to disk take the same amount of time
(more on this later), the typical time required to access a byte of
storage from a disk drive in 2011 is around 9 ms
(i.e., 9 `thousandths` of a second).
This might not seem slow, but compared to the time required
to access a byte from main memory, this is fantastically slow.
Typical access time from standard personal computer RAM in
2011 is about 5-10 nanoseconds
(i.e., 5-10 `billionths` of a second).
Thus, the time to access a byte of data from a disk drive is about
six orders of magnitude greater than that required to
access a byte from main memory.
While disk drive and RAM access times are both decreasing, they
have done so at roughly the same rate.
The relative speeds have remained the same for over several decades,
in that the difference in access time between RAM and a
disk drive has remained in the range between a factor of 100,000 and
1,000,000.
To gain some intuition for the significance of this speed difference,
consider the time that it might take for you to look up the entry for
disk drives in the index of this book, and then turn to the
appropriate page.
Call this your "primary memory" access time.
If it takes you about 20 seconds to perform this access, then
an access taking 500,000 times longer would require
months.
It is interesting to note that while processing speeds have increased
dramatically, and hardware prices have dropped dramatically, disk
and memory access times have improved by less than an order of magnitude
over the past 15 years.
However, the situation is really much better than that modest speedup
would suggest.
During the same time period, the size of both disk and
main memory has increased by over three orders of magnitude.
Thus, the access times have actually decreased in the face of a
massive increase in the density of these storage devices.
Due to the relatively slow access time for data on disk as compared to
main memory, great care is required to create efficient applications
that process disk-based information.
The million-to-one ratio of disk access time versus main memory access
time makes the following rule of paramount importance when designing
disk-based applications:
**Minimize the number of disk accesses!**
There are generally two approaches to minimizing disk accesses.
The first is to arrange information so that if you do access data from
secondary memory, you will get what you need in as few
accesses as possible, and preferably on the first access.
:term:`File structure` is the term used for a
data structure that organizes data stored in secondary memory.
File structures should be organized so as to minimize the required
number of disk accesses.
The other way to minimize disk accesses is to save information
previously retrieved (or retrieve additional data with each access at
little additional cost) that can be used to
minimize the need for future accesses.
This requires the ability to guess accurately
what information will be needed later and store it in primary memory
now.
This is referred to as :term:`caching`.