.. _Indexing:
.. raw:: html
.. |--| unicode:: U+2013 .. en dash
.. |---| unicode:: U+2014 .. em dash, trimming surrounding whitespace
:trim:
.. odsalink:: AV/Indexing/linearIndexingCON.css
.. odsalink:: AV/Indexing/treeIndexingCON.css
.. odsalink:: AV/Indexing/twoThreeTreeCON.css
.. This file is part of the OpenDSA eTextbook project. See
.. http://algoviz.org/OpenDSA for more details.
.. Copyright (c) 2012-2013 by the OpenDSA Project Contributors, and
.. distributed under an MIT open source license.
.. avmetadata::
:author: Cliff Shaffer
========
Indexing
========
Indexing
--------
Indexing
~~~~~~~~
* Goals:
* Store large files
* Support multiple search keys
* Support efficient insert, delete, and range queries
Files and Indexing
~~~~~~~~~~~~~~~~~~
* **Entry sequenced file**: Order records by time of insertion.
* Search with sequential search
* **Index file**: Organized, stores pointers to actual records.
* Could be organized with a tree or other data structure.
Keys and Indexing
~~~~~~~~~~~~~~~~~
* **Primary Key** : A unique identifier for records. May be
inconvenient for search.
* **Secondary Key**: An alternate search key, often not unique for
each record. Often used for search key.
Linear Indexing (1)
~~~~~~~~~~~~~~~~~~~
* **Linear index**: Index file organized as a simple sequence of
key/record pointer pairs with key values are in sorted order.
* Linear indexing is good for searching variable-length records.
.. inlineav:: varindexCON ss
:long_name: Simple linear index Slideshow
:output: show
Linear Indexing (2)
~~~~~~~~~~~~~~~~~~~
* If the index is too large to fit in main memory, a second-level index
might be used.
.. inlineav:: linindexCON ss
:long_name: Two-level linear index Slideshow
:output: show
:align: justify
Tree Indexing (1)
~~~~~~~~~~~~~~~~~
* Linear index is poor for insertion/deletion.
* Tree index can efficiently support all desired operations:
* Insert/delete
* Multiple search keys (multiple indices)
* Key range search
Tree Indexing (2)
~~~~~~~~~~~~~~~~~
.. inlineav:: pagedBSTCON ss
:long_name: Paged BST Slideshow
:output: show
Tree Indexing (3)
~~~~~~~~~~~~~~~~~
* Difficulties when storing tree index on disk:
* Tree must be balanced.
* Each path from root to leaf should cover few disk pages.
Tree Indexing (4)
~~~~~~~~~~~~~~~~~
.. inlineav:: rebalanceBSTCON ss
:long_name: Paged BST With Disk Accesses Slideshow
:output: show
2-3 Tree
~~~~~~~~
* A 2-3 Tree has the following properties:
#. A node contains one or two keys
#. Every internal node has either two children (if it contains
one key) or three children (if it contains two keys).
#. All leaves are at the same level in the tree, so the tree is
always height balanced.
* The 2-3 Tree has a search tree property analogous to the BST.
2-3 Tree Example
~~~~~~~~~~~~~~~~
* The advantage of the 2-3 Tree over the BST is that it can be
updated at low cost.
.. inlineav:: twoThreedgmCON dgm
:align: center
2-3 Tree Insertion (1)
~~~~~~~~~~~~~~~~~~~~~~
.. inlineav:: simpleInsertCON ss
:long_name: 2-3 Tree Insert Slideshow
:output: show
2-3 Tree Insertion (2)
~~~~~~~~~~~~~~~~~~~~~~
.. inlineav:: promoteCON ss
:long_name: 2-3 Tree Insert Promotion Slideshow
:output: show
2-3 Tree Insertion (3)
~~~~~~~~~~~~~~~~~~~~~~
.. inlineav:: splitCON ss
:long_name: 2-3 Tree Insert Split Slideshow
:output: show
B-Trees (1)
~~~~~~~~~~~
* The B-Tree is an extension of the 2-3 Tree.
* The B-Tree is now the standard file organization for applications
requiring insertion, deletion, and key range searches.
B-Trees (2)
~~~~~~~~~~~
#. B-Trees are always balanced.
#. B-Trees keep similar-valued records together on a disk page,
which takes advantage of locality of reference.
#. B-Trees guarantee that every node in the tree will be full at
least to a certain minimum percentage. This improves space
efficiency while reducing the typical number of disk fetches
necessary during a search or update operation.
.. odsafig:: Images/BTexamp.png
:width: 600
:align: center
:capalign: justify
:figwidth: 90%
:alt: A B-tree of order four
.. slide;: B-Tree Definition
* A B-Tree of order :math:`m` has these properties:
* The root is either a leaf or has two children.
* Each node, except for the root and the leaves, has between
:math:`\lceil m/2 \rceil` and :math:`m` children.
* All leaves are at the same level in the tree, so the tree is
always height balanced.
* A B-Tree node is usually selected to match the size of a disk
block.
* A B-Tree node could have hundreds of children.
B-Tree Search
~~~~~~~~~~~~~
* Generalizes search in a 2-3 Tree.
#. Do binary search on keys in current node. If search key is
found, then return record. If current node is a leaf node
and key is not found, then report an unsuccessful search.
#. Otherwise, follow the proper branch and repeat the process.
B+-Trees
~~~~~~~~
* The most commonly implemented form of the B-Tree is the B+-Tree.
* Internal nodes of the B+-Tree do not store record -- only key
values to guild the search.
* Leaf nodes store records or pointers to records.
* A leaf node may store more or less records than an internal node
stores keys.
B+-Tree Example
~~~~~~~~~~~~~~~
.. odsafig:: Images/BPexamp.png
:width: 800
:align: center
:capalign: justify
:figwidth: 90%
:alt: Example of a :math:`\mathrm{B}^+` tree.
* In this example, an internal node can have 2 to 4 children
* A leaf node can hold 3 to 5 keys
B+-Tree Insertion
~~~~~~~~~~~~~~~~~
.. odsafig:: Images/BPins.png
:width: 600
:align: center
:capalign: justify
:figwidth: 90%
:alt: Examples of :math:`\mathrm{B}^+` tree insertion.
B+-Tree Deletion (1)
~~~~~~~~~~~~~~~~~~~~
.. odsafig:: Images/BPexamp.png
:width: 800
:align: center
:capalign: justify
:figwidth: 90%
:alt: Example of a :math:`\mathrm{B}^+` tree.
* Delete 18
.. odsafig:: Images/BPsimDel.png
:width: 800
:align: center
:capalign: justify
:figwidth: 90%
:alt: Simple deletion from a :math:`\mathrm{B}^+` tree.
B+-Tree Deletion (2)
~~~~~~~~~~~~~~~~~~~~
.. odsafig:: Images/BPexamp.png
:width: 800
:align: center
:capalign: justify
:figwidth: 90%
:alt: Example of a :math:`\mathrm{B}^+` tree.
* Delete 12
.. odsafig:: Images/BPborrow.png
:width: 800
:align: center
:capalign: justify
:figwidth: 90%
:alt: Deletion from a :math:`\mathrm{B}^+` tree via borrowing from
a sibling.
B+-Tree Deletion (3)
~~~~~~~~~~~~~~~~~~~~
.. odsafig:: Images/BPexamp.png
:width: 800
:align: center
:capalign: justify
:figwidth: 90%
:alt: Example of a :math:`\mathrm{B}^+` tree.
* Delete 33
.. odsafig:: Images/BPmerge.png
:width: 800
:align: center
:capalign: justify
:figwidth: 90%
:alt: Deletion from a :math:`\mathrm{B}^+` tree via collapsing siblings
.
~
.
B-Tree Space Analysis (1)
~~~~~~~~~~~~~~~~~~~~~~~~~
* B+-Trees nodes are always at least half full.
* The B*-Tree splits two pages for three, and combines three pages into
two. In this way, nodes are always 2/3 full.
* Asymptotic cost of search, insertion, and deletion of nodes from
B-Trees is :math:`\Theta(log n)`.
* Base of the log is the (average) branching factor of the tree.
B-Tree Space Analysis (2)
~~~~~~~~~~~~~~~~~~~~~~~~~
* Example: Consider a B+-Tree of order 100 with leaf nodes
containing 100 records.
* 1 level B+-tree:
* 2 level B+-tree:
* 3 level B+-tree:
* 4 level B+-tree:
* Ways to reduce the number of disk fetches:
* Keep the upper levels in memory.
* Manage B+-Tree pages with a buffer pool.
B-Trees: The Big Idea
~~~~~~~~~~~~~~~~~~~~~
* B-trees are really good at managing a sorted list
* They break the list into manageable chunks
* The leaves of the B+-tree form the list
* The internal nodes of the B+-tree merely help find the right chunk
.. odsascript:: AV/Indexing/varindexCON.js
.. odsascript:: AV/Indexing/linindexCON.js
.. odsascript:: AV/Indexing/pagedBSTCON.js
.. odsascript:: AV/Indexing/rebalanceBSTCON.js
.. odsascript:: AV/Indexing/twoThreeTreeCON.js
.. odsascript:: AV/Indexing/twoThreedgmCON.js
.. odsascript:: AV/Indexing/simpleInsertCON.js
.. odsascript:: AV/Indexing/promoteCON.js
.. odsascript:: AV/Indexing/splitCON.js