2.1. Algorithm Analysis: Part 2¶
2.1.1. Asymptotic Analysis: Big-oh¶
Definition: For \(\mathbf{T}(n)\) a non-negatively valued function, \(\mathbf{T}(n)\) is in the set \(O(f(n))\) if there exist two positive constants \(c\) and \(n_0\) such that \(T(n) \leq cf(n)\) for all \(n > n_0\).
Use: The algorithm is in \(O(n^2)\) in [best, average, worst] case.
Meaning: For all data sets big enough (i.e., \(n>n_0\)), the algorithm always executes in less than \(cf(n)\) steps in the [best, average, worst] case.
2.1.2. Big-oh Notation (cont)¶
Big-oh notation indicates an upper bound.
Example: If \(\mathbf{T}(n) = 3n^2\) then \(\mathbf{T}(n)\) is in \(O(n^2)\).
Look for the tightest upper bound:
While \(\mathbf{T}(n) = 3n^2\) is in \(O(n^3)\), we prefer \(O(n^2)\).
2.1.3. Big-Oh Examples¶
Example 1: Finding value X in an array (average cost).
Then \(\textbf{T}(n) = c_{s}n/2\).
For all values of \(n > 1, c_{s}n/2 \leq c_{s}n\).
Therefore, the definition is satisfied for \(f(n)=n, n_0 = 1\), and \(c = c_s\). Hence, \(\textbf{T}(n)\) is in \(O(n)\).
2.1.4. Big-Oh Examples (2)¶
Example 2: Suppose \(\textbf{T}(n) = c_{1}n^2 + c_{2}n\), where \(c_1\) and \(c_2\) are positive.
\(c_{1}n^2 + c_{2}n \leq c_{1}n^2 + c_{2}n^2 \leq (c_1 + c_2)n^2\) for all \(n > 1\).
Then \(\textbf{T}(n) \leq cn^2\) whenever \(n > n_0\), for \(c = c_1 + c_2\) and \(n_0 = 1\).
Therefore, \(\textbf{T}(n)\) is in \(O(n^2)\) by definition.
Example 3: \(\textbf{T}(n) = c\). Then \(\textbf{T}(n)\) is in \(O(1)\).
2.1.5. A Common Misunderstanding¶
“The best case for my algorithm is n=1 because that is the fastest.”
WRONG!
Big-oh refers to a growth rate as n grows to \(\infty\)
Best case is defined for the input of size n that is cheapest among all inputs of the same size \(n\).
2.1.6. Big \(\Omega\)¶
Definition: For \(\textbf{T}(n)\) a non-negatively valued function, \(\textbf{T}(n)\) is in the set \(\Omega(g(n))\) if there exist two positive constants \(c\) and \(n_0\) such that \(\textbf{T}(n) \geq cg(n)\) for all \(n > n_0\).
Meaning: For all data sets big enough (i.e., \(n > n_0\)), the algorithm always requires more than \(cg(n)\) steps.
Lower bound.
2.1.7. Big-Omega Example¶
\(\textbf{T}(n) = c_1n^2 + c_2n\).
\(c_1n^2 + c_2n \geq c_1n^2\) for all \(n > 1\).
\(\textbf{T}(n) \geq cn^2\) for \(c = c_1\) and \(n_0 = 1\).
Therefore, \(\textbf{T}(n)\) is in \(\Omega(n^2)\) by the definition.
We want the greatest lower bound.
2.1.8. \(\Theta\) Notation¶
When big-Oh and \(\Omega\) coincide, we indicate this by using \(\Theta\) (big-Theta) notation.
Definition: An algorithm is said to be in \(\Theta(h(n))\) if it is in \(O(h(n))\) and it is in \(\Omega(h(n))\).
2.1.9. A Common Misunderstanding¶
Confusing worst case with upper bound.
Upper bound refers to a growth rate.
Worst case refers to the worst input from among the choices for possible inputs of a given size.
2.1.10. Simplifying Rules¶
If \(f(n)\) is in \(O(g(n))\) and \(g(n)\) is in \(O(h(n))\), then \(f(n)\) is in \(O(h(n))\).
If \(f(n)\) is in \(O(kg(n))\) for some constant \(k > 0\), then \(f(n)\) is in \(O(g(n))\).
If \(f_1(n)\) is in \(O(g_1(n))\) and \(f_2(n)\) is in \(O(g_2(n))\), then \((f_1 + f_2)(n)\) is in \(O(\max(g_1(n), g_2(n)))\).
If \(f_1(n)\) is in \(O(g_1(n))\) and \(f_2(n)\) is in \(O(g_2(n))\), then \(f_1(n)f_2(n)\) is in \(O(g_1(n)g_2(n))\).
2.1.11. Summary (1)¶
If we fix the size of \(n\)
TOH only has one input of any given size \(n\).
Findmax has the same cost (in terms of number of records viewed) for all inputs of size \(n\).
Find a value has different costs for different arrangements of the values in the array (ranging from 1 to n).
2.1.12. Summary (2)¶
2.1.13. Time Complexity Examples (1)¶
Example: a = b;
This assignment takes constant time, so it is \(\Theta(1)\).
Example:
sum = 0;
for (i=1; i<=n; i++) {
sum += n;
}
2.1.14. Time Complexity Examples (2)¶
Example:
sum = 0;
for (j=1; j<=n; j++) { // First for loop
for (i=1; i<=j; i++) { // is a double loop
sum++;
}
}
for (k=0; k<n; k++) { // Second for loop
A[k] = k;
}
2.1.15. Time Complexity Examples (3)¶
Example: Compare these two code fragments:
sum1 = 0;
for (i=1; i<=n; i++) { // First double loop
for (j=1; j<=n; j++) { // do n times
sum1++;
}
}
sum2 = 0;
for (i=1; i<=n; i++) { // Second double loop
for (j=1; j<=i; j++) { // do i times
sum2++;
}
}
2.1.16. Time Complexity Examples (4)¶
Not all double loops are \(\Theta(n^2)\).
sum1 = 0;
for (k=1; k<=n; k*=2) { // Do log n times
for (j=1; j<=n; j++) { // Do n times
sum1++;
}
}
sum2 = 0;
for (k=1; k<=n; k*=2) { // Do log n times
for (j=1; j<=k; j++) { // Do k times
sum2++;
}
}
2.1.17. Binary Search¶
How many elements are examined in worst case?
// Return the position of an element in sorted array A with value K.
// If K is not in A, return A.length.
public static int binarySearch(int[] A, int K) {
int low = 0;
int high = A.length - 1;
while(low <= high) { // Stop when low and high meet
int mid = (low + high) / 2; // Check middle of subarray
if( A[mid] < K) low = mid + 1; // In right half
else if(A[mid] > K) high = mid - 1; // In left half
else return mid; // Found it
}
return A.length; // Search value not in A
}
2.1.18. Other Control Statements¶
while loop: Analyze like a for loop.
if statement: Take greater complexity of then/else clauses.
switch statement: Take complexity of most expensive case.
Subroutine call: Complexity of the subroutine.
2.1.19. Analyzing Problems¶
Upper bound: Upper bound of best known algorithm.
Lower bound: Lower bound for every possible algorithm.
2.1.20. Analyzing Problems: Example¶
May or may not be able to obtain matching upper and lower bounds.
Example of imperfect knowledge: Sorting
Cost of I/O: \(\Omega(n)\).
Bubble or insertion sort: \(O(n^2)\).
A better sort (Quicksort, Mergesort, Heapsort, etc.): \(O(n \log n)\).
We prove later that sorting is in \(\Omega(n \log n)\).
2.1.21. Space/Time Tradeoff Principle¶
One can often reduce time if one is willing to sacrifice space, or vice versa.
Encoding or packing information
Boolean flags: Need less space, but take time to unpack
Table lookup
Factorials: Store a table of values for lookup instead of computing
Disk-based Space/Time Tradeoff Principle: The smaller you make the disk storage requirements, the faster your program will run.
2.1.22. Multiple Parameters¶
Compute the rank ordering for all C pixel values in a picture of P pixels.
for (i=0; i<C; i++) { // Initialize count
count[i] = 0;
}
for (i=0; i<P; i++) { // Look at all of the pixels
count[value(i)]++; // Increment a pixel value count
}
sort(count); // Sort pixel value counts
If we use P as the measure, then time is \((P \log P)\). Unrealistically high.
If we use C as the measure, then time is \((C \log C)\). Unrealistically low.
More accurate is \(\Theta(P + C \log C)\).
2.1.23. Space Complexity¶
Space complexity can also be analyzed with asymptotic complexity analysis.
Time: Algorithm
Space: Data Structure
