CSC 427: Data Structures and Algorithm Analysis
Fall 2006

HW3: Java Collections


PART 1:

In lectures, we considered several versions of the Dictionary class. The most basic version used an ArrayList, placing new items at the end and performing sequential search as needed. As a result, insertions were O(1) and searches were O(N), where N is the number of words in the list. Time this basic version of the Dictionary class using the DictionaryTimer class from the lectures. Your timings should be similar to the timings shown in the lecture, although they may differ somewhat based on your computer's specs. Show your timings and comment as to whether these timings support our big-Oh analysis.

Modify the Dictionary class so that it uses a TreeSet to store the words. A TreeSet provides both insertion and search in O(log N) time, and has the additional advantage that it stores the words in alphabetical order. Time this modified version using the DictionaryTimer class and compare the timings with those of the basic ArrayList version. Do these timings support our big-Oh analysis? Explain.

Similarly, modify the Dictionary class so that it uses a HashSet to store the words. A HashSet provides both insertion and search in O(1), on average, but does not maintain order. Time this modified version using the DictionaryTimer class and compare the timings with those of the ArrayList and TreeSet versions. Do these timings support our big-Oh analysis? Explain.

Part 2

When performing a search for a title, say a book in the library or a song on the Internet, you often have only part of the title. For example, you might know that the song has "Sugar" in the title, but not know the remaining words. A common technique for facilitating efficient searches is to preprocess the list of titles based on keywords. The title is indexed by each "significant" word, so that it is possible to go to all titles containing "sugar" without searching the entire list. Insignificant words, such as "a", "the" and "of", are generally ignored, and words are assumed to be case-insensitive.

You are to create a class named KeywordIndex that has the following structure:

public class KeywordIndex { /** Constructs an empty keyword index. * @param ignoredWords a collection of words to ignore when indexing */ public KeywordIndex(Collection<String> ignoredWords) { ... } /** Adds a new title to the keyword index * @param title the new title to be added */ public void add(String title) { ... } /** Looks up all of the titles with the specified keyword * @param keyword the word to be searched for * @return the collection of all titles with that keyword */ public Collection<String> lookup(String keyword) { ... } /** Displays each keyword (in alphabetical order) along with the titles * associated with that keyword (also in alphabetical order). */ public void showAll() { ... } }

A main class, IndexTester.java has been provided for testing your class. Note: you could implement your class using only Lists, but the task will be much simpler if you utilize Set and Map.

HINT: When storing the entries for a title, you will need to be able to break the title into individual words. This can be done several ways in Java. One would be to construct a Scanner object associated with that string, and then read individual words from that string just as you read from the console or from a file. A simpler way, however, is to use the split method of the String class. The split method takes a delimiter as it argument, and returns an array of all substrings, separated by that delimiter. The delimiter can be a simple string, e.g., " ", or it can be a regular expression, e.g., "\\s+" which represents any sequence of whitespace characters. Thus, the statement:

String[] words = title.split("\\s+");

would have the affect of splitting title into individual words (as delimited by whitespace).