.. _UnsortedSearch: .. raw:: html .. |--| unicode:: U+2013 .. en dash .. |---| unicode:: U+2014 .. em dash, trimming surrounding whitespace :trim: .. This file is part of the OpenDSA eTextbook project. See .. http://algoviz.org/OpenDSA for more details. .. Copyright (c) 2012-2013 by the OpenDSA Project Contributors, and .. distributed under an MIT open source license. .. avmetadata:: :author: Cliff Shaffer :requires: :satisfies: :topic: Search Analyzing Search in Unsorted Arrays =================================== You already know the simplest form of search: the sequential search algorithm. Sequential search on an unsorted list requires :math:`\Theta(n)` time in the worst case. How many comparisons does linear search do on average? A major consideration is whether :math:`K` is in list **L** at all. We can simplify our analysis by ignoring everything about the input except the position of :math:`K` if it is found in **L**. Thus, we have :math:`n+1` distinct possible events: That :math:`K` is in one of positions 0 to :math:`n-1` in **L** (each position having its own probability), or that it is not in :math:`L` at all. We can express the probability that :math:`K` is not in **L** as .. math:: \mathbf{P}(K \notin \mathbf{L}) = 1 - \sum_{i=1}^n \mathbf{P}(K = \mathbf{L}[i]) where :math:`\mathbf{P}(x)` is the probability of event :math:`x`. Let :math:`p_i` be the probability that :math:`K` is in position :math:`i` of **L** (indexed from 0 to :math:`n-1`. For any position :math:`i` in the list, we must look at :math:`i+1` records to reach it. So we say that the cost when :math:`K` is in position :math:`i` is :math:`i+1`. When :math:`K` is not in **L**, sequential search will require :math:`n` comparisons. Let :math:`p_n` be the probability that :math:`K` is not in **L**. Then the average cost :math:`\mathbf{T}(n)` will be .. math:: \mathbf{T}(n) = n p_n + \sum_{i=0}^{n-1} (i+1) p_i. What happens to the equation if we assume all the :math:`p_i` 's are equal (except :math:`p_n`)? .. math:: \mathbf{T}(n) &=& p_n n + \sum_{i=0}^{n-1} (i+1) p\\ &=& p_n n + p\sum_{i=1}^n i\\ &=& p_n n + p\frac{n(n+1)}{2}\\ &=& p_n n + \frac{1 - p_n}{n}\frac{n(n+1)}{2}\\ &=& \frac{n + 1 + p_n(n-1)}{2} Depending on the value of :math:`p_n`, :math:`\frac{n+1}{2} \leq \mathbf{T}(n) \leq n`.