CSC 222: Object-Oriented Programming
Spring 2013

HW 4: Files, Strings & Lists

Every day, literary scholars debate the stylistic choices and writing patterns of famous authors. In fact, it has been shown that certain authors have consistent patterns in the way they choose words and construct sentences, and these patterns can be studied and used to identify the authors of unknown works. For this assignment, you are given a simple class, FileStats.java, that reads words from a text file (whose name is specified by the user) and stores those words in an ArrayList. You will add methods to this class so that it can be used to study the patterns contained in famous literary works.

Note: Each of these new methods should return a default answer of 0 or 0.0 if it is called on an empty list of words. You may add fields or local variables as necessary, but your methods should not need to reopen the file.
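
As a minimal sketch of the empty-list rule (not part of the provided FileStats.java), a guard at the top of a method can return the default before any division happens. The class name EmptyGuardSketch and the method averageLength are hypothetical, and the list is passed in as a parameter only because the fields of FileStats are not shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyGuardSketch {
    // Hypothetical helper: average length of the strings in words.
    // Returns 0.0 for an empty list, following the default-answer rule.
    public static double averageLength(List<String> words) {
        if (words.isEmpty()) {
            return 0.0;   // avoids dividing by zero
        }
        int totalChars = 0;
        for (String word : words) {
            totalChars += word.length();
        }
        // Cast before dividing so this is not integer division.
        return (double) totalChars / words.size();
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<String>();
        System.out.println(averageLength(words));   // empty list: default answer
        words.add("cat");
        words.add("bird");
        words.add("fish");
        System.out.println(averageLength(words));   // 11 characters over 3 words
    }
}
```

Without the cast, 11 / 3 would be evaluated as integer division and the fractional part would be lost.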

  1. Add a method named numWordsOfLength that has one parameter, an integer specifying a word length. The method should calculate and return the number of words in the list that have the specified length. For example, if the method is called with parameter 5, it should calculate and return the number of 5-character words currently stored.

  2. Add a method named averageCharsPerWord that calculates and returns the average number of characters per word. For example, given a tiny file containing one 3-character word and two 4-character words, a call to averageCharsPerWord should return 3.666...

  3. Add a method named averageSyllablesPerWord that calculates and returns the average number of syllables per word. To keep things simple, we will assume that each unbroken sequence of consecutive vowels (including 'y') corresponds to one syllable. For example, "heavy" has two syllables ("ea" and "y") while "Italian" has three syllables ("I", "a" and "ia"). Hint: you might consider defining a private helper method, similar to strip, which takes a word as a parameter and returns the number of syllables in that word.

  4. Add a method named typeTokenRatio that calculates and returns the Type-Token Ratio for the stored words, which is a measure of how repetitive the vocabulary is. The Type-Token Ratio is defined to be the number of different words divided by the total number of words. For example, text with no repeated words will have a Type-Token Ratio of 1.0, while text in which every word appears twice would have a Type-Token Ratio of 0.5.

  5. Add a method named hapaxLegomanaRatio that calculates and returns the Hapax Legomana Ratio for the stored words, which is closely related to the Type-Token Ratio. The Hapax Legomana Ratio is defined to be the number of singleton words divided by the total number of words. A singleton word is a word that appears exactly once in the list. For example, text with no repeated words will have a Hapax Legomana Ratio of 1.0, while text in which every word appears twice would have a Hapax Legomana Ratio of 0.0.
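
The syllable rule and the two ratios defined above can be checked by hand with a small standalone sketch such as the one below. It is only an illustration under stated assumptions, not the required solution: the class WordStatsSketch and its static methods are hypothetical and work on a list passed as a parameter, rather than on the fields of FileStats. It also uses a HashMap to count occurrences, which is one possible technique; nested loops over the ArrayList would work as well. Words are compared exactly as stored, so whether "The" and "the" count as the same word depends on how the words were cleaned when read.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordStatsSketch {
    // Counts unbroken runs of consecutive vowels (including 'y'), ignoring case.
    public static int countSyllables(String word) {
        String vowels = "aeiouy";
        int count = 0;
        boolean inVowelRun = false;
        for (char ch : word.toLowerCase().toCharArray()) {
            boolean isVowel = vowels.indexOf(ch) >= 0;
            if (isVowel && !inVowelRun) {
                count++;   // a new run of vowels starts here
            }
            inVowelRun = isVowel;
        }
        return count;
    }

    // Maps each distinct word to the number of times it occurs in the list.
    public static Map<String, Integer> countOccurrences(List<String> words) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String word : words) {
            Integer old = counts.get(word);
            counts.put(word, old == null ? 1 : old + 1);
        }
        return counts;
    }

    // Number of different words divided by the total number of words.
    public static double typeTokenRatio(List<String> words) {
        if (words.isEmpty()) {
            return 0.0;
        }
        return (double) countOccurrences(words).size() / words.size();
    }

    // Number of words occurring exactly once divided by the total number of words.
    public static double singletonRatio(List<String> words) {
        if (words.isEmpty()) {
            return 0.0;
        }
        int singletons = 0;
        for (int occurrences : countOccurrences(words).values()) {
            if (occurrences == 1) {
                singletons++;
            }
        }
        return (double) singletons / words.size();
    }

    public static void main(String[] args) {
        System.out.println(countSyllables("heavy"));     // runs: "ea", "y"
        System.out.println(countSyllables("Italian"));   // runs: "I", "a", "ia"
        List<String> twice = Arrays.asList("to", "be", "to", "be");
        System.out.println(typeTokenRatio(twice));       // 2 different words out of 4
        System.out.println(singletonRatio(twice));       // no word appears exactly once
    }
}
```

The list in main matches the "every word appears twice" case from the definitions, so the two printed ratios can be compared directly with the values given there.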

When testing your additions, you should use small files for which you can calculate the stats by hand. Once you are confident your code works as desired, you can test it on the following public-domain texts:


Submit via BlueLine2 your modified FileStats.java in a single ZIP file named "HW4_LAST_FIRST", where LAST is your last name and FIRST is your first name. For example, "HW4_Reed_Dave".