CSC 427: Data Structures and Algorithm Analysis
Fall 2007

HW3: Sets and Maps


For this assignment, you will design and implement a program (i.e., a collection of interacting Java classes) for indexing the words in a file and allowing the user to search that index. Your program should include a GUI that allows three user-initiated actions.

  1. The user should be able to select a file to be processed. The words in that file should be read in by your program and stored in a Map. Each key in the Map is a word that appears in the file, and the value associated with a key is the Set of line numbers at which that word appears. For example, the word "all" appears on lines 2 and 7 of the file gettysburg.txt. As a result of processing that file, an entry should appear in the Map with key "all" and value {2, 7}.
  2. Once a file has been processed, the user should be able to specify a word, and the program should then display the Set of line numbers at which that word appears. The line numbers should appear in numeric (increasing) order.
  3. In addition, the user should be able to ask to see the entire index, in which case all of the words and associated line numbers should be displayed. The words should appear in lexicographical order.

Your program should be case-insensitive. That is, words should be considered equivalent if they have the same sequence of characters, regardless of capitalization. When reading and processing words from the file, you will need to be careful to remove trailing punctuation. For example, if "equal." appears at the end of a sentence, it should be recognized as the word "equal" and processed accordingly. Similarly, "JOY!!!" should be recognized as "JOY" or "joy".

HINT: When processing the text file, you will need to be able to read from the file one line at a time, and then break the line into individual words. This can be done several ways in Java. The Scanner class has a nextLine that reads an entire line of text. You could then construct another Scanner object associated with that string, and read individual words from that string just as you read from the console or from a file. A simpler way, however, is to use the split method of the String class. The split method takes a delimiter as it argument, and returns an array of all substrings, separated by that delimiter. The delimiter can be a simple string, e.g., " ", or it can be a regular expression, e.g., "\\s+" which represents any sequence of whitespace characters. Thus, the statement:

String[] words = line.split("\\s+");

would have the affect of splitting line into individual words (as delimited by whitespace).

You should strive to follow good design principles in creating your program. In particular, you should utilize cohesive and loosely couple classes, and a GUI front-end with minimal application-specific code inside it. Javadoc comments should be included in all classes.

DESIGN PLAN (20%): DUE 10/10
By this date, you should turn in a detailed plan for your program. In particular, you should provide a drawing or screen shot of the GUI, along with a description of what actions will be associated with GUI elements. You should also identify the classes that you plan to implement, with a listing of the methods and brief descriptions of their behaviors.

WORKING PROGRAM (80%): DUE 10/24
The final program must provide the basic functionality described above. For extra credit, you may implement additional features that improve the usability of the program.