In class, we have discussed the tradeoffs between several data structures for storing collections of values. The provided StatsDriver prints statistics on collections storing a dictionary of 117,663 words. As is, it displays how many megabytes are required to store the dictionary in an ArrayList and how many seconds it takes to do a large number of searches on that ArrayList. By commenting out the ArrayList declaration and uncommenting one of the other lines, you can similarly obtain stats on a LinkedList or TreeSet. Execute this program and report the sizes and times for all three data structures. Based on what we know about the data structures, do the relative sizes and times make sense? Explain your answers.
A trie (pronounced like try) is an interesting tree variant that utilizes structure sharing to save space when storing strings. Unlike a binary tree, where each node has at most two children, a node in a standard trie can have up to 26 children, corresponding to the letters of the alphabet. Within each node, a flag can be set to mark whether the path of letters leading to that node corresponds to a stored string. For example, the diagram below shows a trie that stores 6 words: A, ARM, ART, I, IS, and THE. In this diagram, (1) edges are labeled with their corresponding letters, (2) null edges are not shown, and (3) nodes with set flags are marked with an X.
An empty trie consists of a single, unmarked node whose edges are all null. As words are added to a trie, nodes must be created and linked using the appropriate edges corresponding to the letters.
You are to create a Trie
class that implements this data structure.
Your class must provide the following constructor and methods:
Your Trie
implementation should be case-insensitive, so adding "foo" and "Foo" would
result in a single entry. You may assume that the Strings to be stored in a Trie
consist of letters only (no spaces or punctuation marks).
As part of your Trie
class definition, you will need to define an inner class to
represent the nodes in a Trie
(similar to the Node
, DNode
and TreeNode
inner classes we have previously studied). Your TrieNode
class should have two fields, a boolean (for the flag) and an array of 26 TrieNode
references (for the edges).
To test your Trie
class, define a class named TrieDriver
that contains a main method which (1) creates a Trie, (2) reads in and stores all of the words in dictionary.txt
, and (3) repeatedly prompts the user for letter sequences. For each sequence, it should display whether that sequence is a word in the dictionary, and if not whether it is the prefix of a word in the dictionary. The driver should repeatedly read in and classify letter sequences until the user enters a blank line. For example:
Enter a sequence (blank line to end): zyzz zyzz is not a word, but it is a prefix of a word. Enter again: zyzzyva zyzzyva is a word. Enter again: zzzz zzzz is neither a word nor a prefix. Enter again: DONE
The tradeoffs between tries and the other data structures we have studied are significant. You might expect a trie to require significantly more space, since each node essentially stores only one letter and has 26 potential references. However, structure sharing can offset this cost. For example, storing "fool" requires no additional storage if you have already stored "foolish". Searching a trie should be much faster, as search is independent of the number of words stored and only dependent on the length of the sequence being search for. Modify the StatsDriver
class to generate stats on a trie storing the dictionary. Report your stats and discuss whether they meet your expectations when compared with the previous data structures.