CSC 321: Data Structures
Spring 2024

HW5: Lists, Trees & Tries


Memory vs. Speed

In class, we have discussed the tradeoffs between several data structures for storing collections of values. The provided StatsDriver prints statistics on collections storing a dictionary of 117,663 words. As is, it displays how many megabytes are required to store the dictionary in an ArrayList and how many seconds it takes to do a large number of searches on that ArrayList. By commenting out the ArrayList declaration and uncommenting one of the other lines, you can similarly obtain stats on a LinkedList or TreeSet. Execute this program and report the sizes and times for all three data structures. Based on what we know about the data structures, do the relative sizes and times make sense? Explain your answers.

Trie, Trie Again

A trie (pronounced like try) is an interesting tree variant that utilizes structure sharing to save space when storing strings. Unlike a binary tree, where each node has at most two children, a node in a standard trie can have up to 26 children, corresponding to the letters of the alphabet. Within each node, a flag can be set to mark whether the path of letters leading to that node corresponds to a stored string. For example, the diagram below shows a trie that stores 6 words: A, ARM, ART, I, IS, and THE. In this diagram, (1) edges are labeled with their corresponding letters, (2) null edges are not shown, and (3) nodes with set flags are marked with an X.

An empty trie consists of a single, unmarked node whose edges are all null. As words are added to a trie, nodes must be created and linked using the appropriate edges corresponding to the letters. You are to create a Trie class that implements this data structure. Your class must provide the following constructor and methods:

public Trie() // constructs an empty Trie, consisting of a single node public boolean add(String word) // attempts to add word, returns true if successful (not already stored) public boolean contains(String word) // returns true if word is stored in the Trie public boolean containsPrefix(String pre) // returns true if pre is the prefix of a word in the tree public int size() // returns the number of words stored [should be O(1)]

Your Trie implementation should be case-insensitive, so adding "foo" and "Foo" would result in a single entry. You may assume that the Strings to be stored in a Trie consist of letters only (no spaces or punctuation marks). As part of your Trie class definition, you will need to define an inner class to represent the nodes in a Trie (similar to the Node, DNode and TreeNode inner classes we have previously studied). Your TrieNode class should have two fields, a boolean (for the flag) and an array of 26 TrieNode references (for the edges).

To test your Trie class, define a class named TrieDriver that contains a main method which (1) creates a Trie, (2) reads in and stores all of the words in dictionary.txt, and (3) repeatedly prompts the user for letter sequences. For each sequence, it should display whether that sequence is a word in the dictionary, and if not whether it is the prefix of a word in the dictionary. The driver should repeatedly read in and classify letter sequences until the user enters a blank line. For example:

    Enter a sequence (blank line to end): zyzz
      zyzz is not a word, but it is a prefix of a word.
    Enter again: zyzzyva
      zyzzyva is a word.
    Enter again: zzzz
      zzzz is neither a word nor a prefix.
    Enter again:
      DONE

Trie Stats

The tradeoffs between tries and the other data structures we have studied are significant. You might expect a trie to require significantly more space, since each node essentially stores only one letter and has 26 potential references. However, structure sharing can offset this cost. For example, storing "fool" requires no additional storage if you have already stored "foolish". Searching a trie should be much faster, as search is independent of the number of words stored and only dependent on the length of the sequence being search for. Modify the StatsDriver class to generate stats on a trie storing the dictionary. Report your stats and discuss whether they meet your expectations when compared with the previous data structures.