4.1. Project 2¶
4.1.1. Project 2 Description¶
The data stored for this project are DNA sequences:
(potentially long) strings consisting of the characters A, C, G, T.
We want to know whether a search string is in the collection, and we want to be able to find all strings that match a prefix.
We don’t want to pay the price of a string comparison that looks at all the characters of the strings.
Our solution: DNA Trees. (Note: This is totally invented for this project.)
4.1.2. Project 2 Mechanics¶
Completely identical to Project 1
Implementing to an interface.
Reference tests reference the interface methods.
Three milestones.
LLM Survey and Video requirements.
Mutation coverage threshold is 95% (which should be easier than it sounds for this project).
4.1.3. DNA Trees¶
A DNA Tree has branches for each letter. So all sequences that start with A go down the A branch, and so on.
Only leaf nodes store sequences. Never internal nodes.
What if we want to store a sequence that is a prefix of another?
The DNA tree internal nodes store a 5th branch labeled ‘$’.
Note that the $ branch is always a leaf node. (So it is a little bit special in that respect, but you don’t need to treat it differently in your tree implementation.)
Searches can distinguish between a string and a string prefix by putting $ at the end of a string, but not at the end of a string prefix.
4.1.4. PR Quadtrees¶
Some similarity to DNA trees.
4.1.5. DNA Tree Implementation¶
Design requirements:
Must use Composite design pattern
Must use Flyweight design pattern
