7.1. Chapter Introduction: Search¶
Organizing and retrieving information is at the heart of most computer applications, and searching is surely the most frequently performed of all computing tasks. Search can be viewed abstractly as a process to determine if an element with a particular value is a member of a particular set. The more common view of searching is an attempt to find the record within a collection of records that has a particular key value, or those records in a collection whose key values meet some criterion such as falling within a range of values.
We can define searching formally as follows. Suppose that we have a collection L of \(n\) records of the form
where \(I_j\) is information associated with key \(k_j\) from record \(j\) for \(1 \leq j \leq n\). Given a particular key value \(K\), the search problem is to locate a record \((k_j, I_j)\) in L such that \(k_j = K\) (if one exists). Searching is a systematic method for locating the record (or records) with key value \(k_j = K\).
A successful search is one in which a record with key \(k_j = K\) is found. An unsuccessful search is one in which no record with \(k_j = K\) is found (and no such record exists).
An exact-match query is a search for the record whose key value matches a specified key value. A range query is a search for all records whose key value falls within a specified range of key values.
We can categorize search algorithms into three general approaches:
Sequential and list methods.
Direct access by key value (hashing).
Tree indexing methods.
Any of these approaches are potentially suitable for implementing the Dictionary ADT. However, each has different performance characteristics that make it the method of choice in particular circumstances.
The current chapter considers methods for searching data stored in lists. List in this context means any list implementation including a linked list or an array. Most of these methods are appropriate for sequences (i.e., duplicate key values are allowed), although there are special techniques applicable to sets. The techniques from the first three sections of this chapter are most appropriate for searching a collection of records stored in RAM. Chapter introduces hashing, a technique for organizing data in an array such that the location of each record within the array is a function of its key value. Hashing is appropriate when records are stored either in RAM or on disk.
Chapter discusses tree-based methods for organizing information on disk, including a commonly used file structure called the B-tree. Nearly all programs that must organize large collections of records stored on disk use some variant of either hashing or the B-tree. Hashing is practical for only certain access applications (exact-match queries) and is generally appropriate only when duplicate key values are not allowed. B-trees are the method of choice for dynamic disk-based applications anytime hashing is not appropriate.