.. _Dictionary: .. raw:: html .. |--| unicode:: U+2013 .. en dash .. |---| unicode:: U+2014 .. em dash, trimming surrounding whitespace :trim: .. This file is part of the OpenDSA eTextbook project. See .. http://algoviz.org/OpenDSA for more details. .. Copyright (c) 2012-2016 by the OpenDSA Project Contributors, and .. distributed under an MIT open source license. .. avmetadata:: :author: Cliff Shaffer :requires: comparison; list ADT :satisfies: dictionary :topic: Design The Dictionary ADT ================== The Dictionary ADT ------------------ The most common objective of computer programs is to store and retrieve data. Much of this book is about efficient ways to organize collections of data records so that they can be stored and retrieved quickly. In this section we describe a simple interface for such a collection, called a :term:`dictionary`. The dictionary ADT provides operations for storing records, finding records, and removing records from the collection. This ADT gives us a standard basis for comparing various data structures. Loosly speaking, we can say that any data structure that supports insert, search, and deletion is a "dictionary". Dictionaries depend on the concepts of a :term:`search key` and :ref:`comparable ` objects. To implement the dictionary's search function, we will require that keys be :term:`totally ordered `. Ordering fields that are naturally multi-dimensional, such as a point in two or three dimensions, present special opportunities if we wish to take advantage of their multidimensional nature. This problem is addressed by :term:`spatial data structures `. Here is code to define a simple abstract dictionary class. .. codeinclude:: Design/Dictionary :tag: DictionaryADT The methods ``insert`` and ``find`` are the heart of the class. Method ``insert`` takes a record and inserts it into the dictionary. Method ``find`` takes a key value and returns some record from the dictionary whose key matches the one provided. If there are multiple records in the dictionary with that key value, there is no requirement as to which one is returned. Method ``clear`` simply re-initializes the dictionary. The ``remove`` method is similar to ``find``, except that it also deletes the record returned from the dictionary. Once again, if there are multiple records in the dictionary that match the desired key, there is no requirement as to which one actually is removed and returned. Method ``size`` returns the number of elements in the dictionary. The remaining Method is ``removeAny``. This is similar to ``remove``, except that it does not take a key value. Instead, it removes an arbitrary record from the dictionary, if one exists. The purpose of this method is to allow a user the ability to iterate over all elements in the dictionary (of course, the dictionary will become empty in the process). Without the ``removeAny`` method, a dictionary user could not get at a record of the dictionary that he didn't already know the key value for. With the ``removeAny`` method, the user can process all records in the dictionary as shown in the following code fragment. .. codeinclude:: Design/DictionaryTest :tag: Dictp4 There are other approaches that might seem more natural for iterating though a dictionary, such as using a "first" and a "next" function. But not all data structures that we want to use to implement a dictionary are able to do "first" efficiently. For example, a hash table implementation cannot efficiently locate the record in the table with the smallest key value. By using ``RemoveAny``, we have a mechanism that provides generic access. Given a database storing records of a particular type, we might want to search for records in multiple ways. For example, we might want to store payroll records in one dictionary that allows us to search by ID, and also store those same records in a second dictionary that allows us to search by name. Here is an implementation for a payroll record. .. codeinclude:: Design/Payroll :tag: Payroll Class ``Payroll`` has multiple fields, each of which might be used as a search key. Simply by varying the type for the key, and using the appropriate field in each record as the key value, we can define a dictionary whose search key is the ID field, another whose search key is the name field, and a third whose search key is the address field. Here is an example where ``Payroll`` objects are stored in two separate dictionaries, one using the ID field as the key and the other using the name field as the key. .. codeinclude:: Design/DictionaryTest :tag: PayrollTest One problem with the example as it is written is that the dictionary relies on the programmer to be reasonable about being consistent with the keys. These dictionaries are intended to have :ref:`homogeneous ` elements. But nothing stops the programmer from inserting an integer key into the names dictionary, or searching with an integer search key. This problem can be handled by using C++ templates or Java generics. The fundamental operation for a dictionary is finding a record that matches a given key. This raises the issue of how to :ref:`extract the key ` from a record. We will usually assume that dictionary implementations store a :term:`key-value pair` so as to be able to extract the key associated with a record for this particular dictionary. The ``insert`` method of the dictionary class supports the key-value pair implementation because it takes two parameters, a record and its associated key for that dictionary. Now that we have defined the dictionary ADT and settled on the design approach of storing key-value pairs for our dictionary entries, we are ready to consider ways to implement it. Two possibilities would be to use an array-based or linked list. Here is an implementation for the dictionary using an (unsorted) array-based list. .. codeinclude:: Design/UALDictionary :tag: UALDictionary Examining class ``UALdict`` (UAL stands for "unsorted array-based list"), we can easily see that ``insert`` is a constant-time operation, because it simply inserts the new record at the end of the list. However, ``find``, and ``remove`` both require :math:`\Theta(n)` time in the average and worst cases, because we need to do a sequential search. Method ``remove`` in particular must touch every record in the list, because once the desired record is found, the remaining records must be shifted down in the list to fill the gap. Method ``removeAny`` removes the last record from the list, so this is a constant-time operation. As an alternative, we could implement the dictionary using a linked list. The implementation would be quite similar to that for ``UALDictionary``, and the cost of the functions should be the same asymptotically. Another alternative would be to implement the dictionary with a sorted list. The advantage of this approach would be that we might be able to speed up the ``find`` operation by using a binary search. To do so, first we must define a variation on the ``List`` ADT to support sorted lists. A sorted list is somewhat different from an unsorted list in that it cannot permit the user to control where elements get inserted. Thus, the ``insert`` method must be quite different in a sorted list than in an unsorted list. Likewise, the user cannot be permitted to append elements onto the list. For these reasons, a sorted list cannot be implemented with straightforward inheritance from the ``List`` ADT. The cost for ``find`` in a sorted list is :math:`\Theta(\log n)` for a list of length :math:`n`. This is a great improvement over the cost of ``find`` in an unsorted list. Unfortunately, the cost of ``insert`` changes from constant time in the unsorted list to :math:`\Theta(n)` time in the sorted list. Whether the sorted list implementation for the dictionary ADT is more or less efficient than the unsorted list implementation depends on the relative number of ``insert`` and ``find`` operations to be performed. If many more ``find`` operations than ``insert`` operations are used, then it might be worth using a sorted list to implement the dictionary. In both cases, ``remove`` requires :math:`\Theta(n)` time in the worst and average cases. Even if we used binary search to cut down on the time to find the record prior to removal, we would still need to shift down the remaining records in the list to fill the gap left by the ``remove`` operation. :ref:`Search trees ` are search structures that can perform all three key operations of insert, search, and delete in :math:`\Theta(\log n)` time. .. avembed:: Exercises/Design/DesignDictionarySumm.html ka :module: Dictionary :long_name: Dictionary Summary Exercise :points: 1.0 :required: True :threshold: 3.0 :exer_opts: JXOP-debug=true&JOP-lang=en&JXOP-code=java_generic