.. _HashCSimple:
.. raw:: html
.. |--| unicode:: U+2013 .. en dash
.. |---| unicode:: U+2014 .. em dash, trimming surrounding whitespace
:trim:
.. odsalink:: AV/Hashing/linProbeCON.css
.. This file is part of the OpenDSA eTextbook project. See
.. http://algoviz.org/OpenDSA for more details.
.. Copyright (c) 2012-2013 by the OpenDSA Project Contributors, and
.. distributed under an MIT open source license.
.. avmetadata::
:author: Cliff Shaffer
:requires: open hashing
:satisfies: collision resolution
:topic: Hashing
.. index:: ! collision resolution
Collision Resolution
====================
Collision Resolution
--------------------
We now turn to the most commonly used form of hashing:
:term:`closed hashing ` with no bucketing, and a
:term:`collision resolution policy` that can potentially use any slot
in the hash table.
During insertion, the goal of :term:`collision resolution` is to find
a free slot in the hash table when the home position for the record is
already occupied.
We can view any collision resolution method as generating a sequence
of hash table slots that can potentially hold the record.
The first slot in the sequence will be the home position for the key.
If the home position is occupied, then the collision resolution policy
goes to the next slot in the sequence.
If this is occupied as well, then another slot must be found, and
so on.
This sequence of slots is known as the
:term:`probe sequence`, and it is generated by some
:term:`probe function` that we will call **p**.
Insertion works as follows::
// Insert e into hash table HT
void hashInsert(const Key& k, const Elem& e) {
int home; // Home position for e
int pos = home = h(k); // Init probe sequence
for (int i=1; EMPTYKEY != (HT[pos]).key(); i++) {
pos = (home + p(k, i)) % M; // probe
if (k == HT[pos].key()) {
println("Duplicates not allowed");
return;
}
}
HT[pos] = e;
}
Method ``hashInsert`` first checks to see if the home slot for the
key is empty.
If the home slot is occupied, then we use the probe function
:math:`\textbf{p}(k, i)` to locate a free slot in the table.
Function **p** has two parameters, the key :math:`k` and a
count :math:`i` of where in the probe sequence we wish to be.
That is, to get the first position in the probe sequence after the
home slot for key :math:`K`, we call :math:`\textbf{p}(K, 1)`.
For the next slot in the probe sequence, call :math:`\textbf{p}(K, 2)`.
Note that the probe function returns an offset from the original home
position, rather than a slot in the hash table.
Thus, the ``for`` loop in ``hashInsert`` is computing positions
in the table at each iteration by adding the value returned from the
probe function to the home position.
The :math:`i` th call to **p** returns the :math:`i` th offset to be used.
Searching in a hash table follows the same probe sequence that was
followed when inserting records.
In this way, a record not in its home position can be recovered.
An implementation for the search procedure is as
follows.::
// Search for the record with Key K
bool hashSearch(const Key& K, Elem& e) const {
int home; // Home position for K
int pos = home = h(K); // Initial position is the home slot
for (int i = 1;
(K != (HT[pos]).key()) && (EMPTYKEY != (HT[pos]).key());
i++)
pos = (home + p(K, i)) % M; // Next on probe sequence
if (K == (HT[pos]).key()) { // Found it
e = HT[pos];
return true;
}
else return false; // K not in hash table
}
Both the insert and the search routines assume that at least
one slot on the probe sequence of every key will be empty.
Otherwise they will continue in an infinite loop on unsuccessful
searches.
Thus, the hash system should keep a count of the number of records stored,
and refuse to insert into a table that has only one free slot.
The simplest approach to collsion resolution is simply to move down
the table from the home slot until a free slot is found.
This is known as :term:`linear probing`.
The probe function for simple linear probing is
:math:`\textbf{p}(K, i) = i`.
That is, the :math:`i` th offset on the probe sequence is just
:math:`i`,
meaning that the :math:`i` th step is simply to move down :math:`i`
slots in the table.
Once the bottom of the table is reached, the probe sequence
wraps around to the beginning of the table (since the last step is to
mod the result to the table size).
Linear probing has the virtue that all slots in the table will be
candidates for inserting a new record before the probe sequence
returns to the home position.
.. inlineav:: linProbeCON1 ss
:points: 0.0
:required: False
:threshold: 1.0
:id: 151156
:long_name: Linear Probing Slideshow 1
:output: show
Can you see any reason why this might not be the best approach
to collision resolution?
The Problem with Linear Probing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
While linear probing is probably
the first idea that comes to mind when considering collision
resolution policies, it is not the only one possible.
Probe function **p** allows us many options for how to do collision
resolution.
In fact, linear probing is one of the worst collision resolution
methods.
The main problem is illustrated by the next slideshow.
.. inlineav:: linProbeCON2 ss
:points: 0.0
:required: False
:threshold: 1.0
:id: 151157
:long_name: Linear Probing Slideshow 2
:output: show
Again, the ideal behavior for a collision resolution mechanism is that
each empty slot in the table will have equal probability of
receiving the next record inserted (assuming that every slot in the
table has equal probability of being hashed to initially).
This tendency of linear probing to cluster items together is known as
:term:`primary clustering`.
Small clusters tend to merge into big clusters, making the problem
worse.
The objection to primary clustering is that it leads to
long probe sequences.
.. avembed:: Exercises/Hashing/HashLinearPPRO.html ka
:module: HashCSimple
:points: 1.0
:required: True
:threshold: 5.0
:id: 151158
:exer_opts: JXOP-debug=false&JOP-lang=en&JXOP-code=java
:long_name: Linear Probing Proficiency Exercise
.. odsascript:: AV/Hashing/linProbeCON1.js
.. odsascript:: AV/Hashing/linProbeCON2.js