### CSC 222: Computer Programming II, Spring 2004, HW1: C++ Review

Did Edgar Allan Poe set the mood with long, colorful words? Did Mark Twain find humor in short, common words? And did Shakespeare straddle the fence, mixing all types of words in his plays? If these questions seem a bit fanciful, you may be surprised to learn that characteristics such as word length distribution have been used by scholars to help identify the authors of anonymous works of literature (or works whose authorship is disputed).

For this assignment, you will write a C++ program that counts the number of words of varying lengths in a text. In particular, your program will read in a text file, determine the length of each word (with punctuation removed), count the number of occurrences of words of each length, and write out the count and relative frequency of each word length. In order to perform this analysis, your code will need to do the following:

• Read in a file one word at a time.
• Determine the length of each word, ignoring punctuation.
• Keep a count for each word length from 1 to 10, and an additional combined count for all words of length greater than 10.
• Calculate the frequency of each word length relative to the total number of words.

Your program should display the count and relative frequency of each word length, as in the sample execution below. Note that the statistics are displayed in three columns, and the values are aligned down the columns. Also note that the percentages are displayed with only one digit to the right of the decimal point.

    Enter the name of the file to be analyzed: poe.txt

    length   count     percent
    ------   -----    ---------
        1:     171    (  7.4 %)
        2:     419    ( 18.0 %)
        3:     529    ( 22.8 %)
        4:     385    ( 16.6 %)
        5:     229    (  9.9 %)
        6:     178    (  7.7 %)
        7:     139    (  6.0 %)
        8:      93    (  4.0 %)
        9:      85    (  3.7 %)
       10:      39    (  1.7 %)
      >10:      57    (  2.5 %)
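Aligned columns and one-decimal percentages like those in the sample can be produced with the `<iomanip>` manipulators `std::setw`, `std::fixed`, and `std::setprecision`. The sketch below is illustrative only: `MAX_LENGTH`, `percentOf`, and `printTable` are hypothetical names, the column widths are guesses at the sample's spacing, and the `counts` vector is assumed to store the count for length n at index n, with the combined count for longer words at index `MAX_LENGTH + 1`.

```cpp
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical constant: the longest word length that gets its own row.
const int MAX_LENGTH = 10;

// Returns count as a percentage of total, or 0 when total is 0.
double percentOf(int count, int total) {
    return total == 0 ? 0.0 : 100.0 * count / total;
}

// Writes the three-column table to out. Assumes counts has at least
// MAX_LENGTH + 2 entries, laid out as described in the text above.
void printTable(std::ostream& out, const std::vector<int>& counts) {
    int total = 0;
    for (int len = 1; len <= MAX_LENGTH + 1; ++len) {
        total += counts[len];
    }

    out << "length   count     percent\n"
        << "------   -----    ---------\n";
    out << std::fixed << std::setprecision(1);  // one digit after the point

    for (int len = 1; len <= MAX_LENGTH + 1; ++len) {
        // Build the row label first so it can be right-aligned as a unit.
        std::ostringstream label;
        if (len > MAX_LENGTH) {
            label << ">" << MAX_LENGTH;
        } else {
            label << len;
        }
        label << ":";

        out << std::setw(6) << label.str()
            << std::setw(8) << counts[len]
            << "    (" << std::setw(5) << percentOf(counts[len], total)
            << " %)\n";
    }
}
```

Note that `std::setw` applies only to the next item written, while `std::fixed` and `std::setprecision` stay in effect on the stream until changed.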

To be useful in research, your program should be generalizable. In particular, the user should be prompted for the name of the file to be analyzed, and the maximum word length to be counted should be a constant so that it is easily changeable. Your program should also be robust, behaving reasonably when exceptional conditions occur, such as a missing file or a file containing no words.