CSC 222: Computer Programming II
Spring 2004

HW1: C++ Review


Did Edgar Allan Poe set the mood with long, colorful words? Did Mark Twain find humor in short, common words? And did Shakespeare straddle the fence, mixing all types of words in his plays? If these questions seem a bit fanciful, you may be surprised to learn that characteristics such as word length distribution have been used by scholars to help identify the authors of anonymous works of literature (or works whose authorship was disputed).

For this assignment, you will write a C++ program that counts the number of words of varying lengths in a text. In particular, your program will read in a text file, determine the length of each word (with punctuation removed), count the number of occurrences of words of each length, and write out the count and relative frequency of each word length. In order to perform this analysis, your code will need to do the following:
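As a rough illustration of the per-word step described above (strip punctuation, measure the length, bump the matching counter), here is a minimal sketch. The function names stripPunctuation, recordWord, and tallyWordLengths are made up for this example, and it assumes that "punctuation removed" means dropping every character for which std::ispunct is true, including apostrophes and hyphens inside a word. Your own design may reasonably differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <istream>
#include <string>
#include <vector>

// Returns a copy of word with every punctuation character removed.
// Assumption: interior punctuation (as in "don't") is removed too.
std::string stripPunctuation(const std::string& word) {
    std::string cleaned;
    for (char ch : word) {
        if (!std::ispunct(static_cast<unsigned char>(ch))) {
            cleaned += ch;
        }
    }
    return cleaned;
}

// Tallies one word. counts must have (maxLength + 1) slots:
// counts[i] holds words of length i + 1, and the last slot holds
// every word longer than maxLength.
void recordWord(const std::string& word, std::vector<int>& counts) {
    assert(counts.size() >= 2);
    std::size_t length = stripPunctuation(word).length();
    if (length == 0) {
        return;  // nothing left, e.g. a token such as "--"
    }
    std::size_t slot = std::min(length, counts.size()) - 1;
    counts[slot]++;
}

// Reads whitespace-separated words from in and returns the tallies.
std::vector<int> tallyWordLengths(std::istream& in, int maxLength) {
    assert(maxLength >= 1);
    std::vector<int> counts(maxLength + 1, 0);
    std::string word;
    while (in >> word) {
        recordWord(word, counts);
    }
    return counts;
}
```

Because tallyWordLengths takes a std::istream&, the same function can be fed an open std::ifstream for the real analysis or a std::istringstream when testing on a short phrase.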

Your program should display the count and relative frequency of each word length, as in the sample execution below. Note that the statistics are displayed in three columns, and the values are aligned down the columns. Also note that the percentages are displayed with only one digit to the right of the decimal place.

    Enter the name of the file to be analyzed: poe.txt
    length   count    percent
    ------   -----   ---------
        1:     171   (  7.4 %)
        2:     419   ( 18.0 %)
        3:     529   ( 22.8 %)
        4:     385   ( 16.6 %)
        5:     229   (  9.9 %)
        6:     178   (  7.7 %)
        7:     139   (  6.0 %)
        8:      93   (  4.0 %)
        9:      85   (  3.7 %)
       10:      39   (  1.7 %)
      >10:      57   (  2.5 %)
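Aligned columns and a fixed number of decimal places are usually produced with the stream manipulators in <iomanip>. The sketch below builds one row of such a table. The name formatRow and the particular column widths are arbitrary choices for this example, not required values.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Builds one table row such as "    1:     171   (  7.4 %)".
// label is the text before the colon ("1", "2", ..., ">10").
// Precondition: total > 0; the caller is expected to handle a text
// with no words before asking for percentages.
std::string formatRow(const std::string& label, int count, int total) {
    assert(total > 0);
    double percent = 100.0 * count / total;

    std::ostringstream row;
    row << std::setw(5) << label << ':'   // right-align the label
        << std::setw(8) << count          // right-align the count
        << "   ("
        << std::fixed << std::setprecision(1)  // one digit after the point
        << std::setw(5) << percent
        << " %)";
    return row.str();
}
```

For instance, formatRow("1", 171, 2324) right-aligns the label, the count, and a percentage rounded to one decimal place. Note that std::setw applies only to the next item written, while std::fixed and std::setprecision stay in effect on the stream until changed.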

To be useful in research, your program should be generalizable. In particular, the user should be prompted for the name of the file to be analyzed, and the maximum word length to be counted should be a constant so that it is easily changeable. Your program should also be robust, behaving reasonably when exceptional conditions occur, such as a missing file or a file containing no words.
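The sketch below shows one possible way to handle the two exceptional conditions mentioned above, along with a named constant for the maximum word length. The names MAX_LENGTH, openTextFile, and hasWords are placeholders invented for this example, and printing a message to std::cerr is just one reasonable response.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <numeric>
#include <string>
#include <vector>

// Largest word length that gets its own row in the table; all longer
// words share a single ">MAX_LENGTH" row. Change it here only.
const int MAX_LENGTH = 10;

// Opens the named file for reading. Returns false, after printing a
// message to std::cerr, if the file cannot be opened.
bool openTextFile(const std::string& fileName, std::ifstream& in) {
    in.open(fileName.c_str());
    if (!in) {
        std::cerr << "Error: could not open \"" << fileName << "\"\n";
        return false;
    }
    return true;
}

// Returns true if at least one word was tallied. Otherwise prints a
// message and returns false, so the caller can skip the percentage
// table and avoid dividing by zero.
bool hasWords(const std::vector<int>& counts) {
    // One slot per length 1..MAX_LENGTH, plus one overflow slot.
    assert(counts.size() == static_cast<std::size_t>(MAX_LENGTH) + 1);
    int total = std::accumulate(counts.begin(), counts.end(), 0);
    if (total == 0) {
        std::cerr << "The file contains no words.\n";
        return false;
    }
    return true;
}
```

A main function could call openTextFile right after reading the file name and return early on failure, then call hasWords after tallying and before printing any percentages.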

For testing purposes, you may download the following public-domain texts:

Your grade for this assignment will be partially based on the readability of your code. This program (and all subsequent programs) should have comments at the top giving the file name, your name, date, and a brief description. Functions should be used to encapsulate logical sections of code, and each function should have comments describing its behavior. Function and variable names should be informative, and blank lines and indentation should be used to make code structure clear. Confusing or grossly inefficient/redundant code may result in a penalty.