### CSC 221: Introduction to Programming, Fall 2013. HW5: Text Files and Readability

For PART 1 of this assignment, you may work with one (new) partner.
PART 2 must be completed independently (with assistance from the instructor as needed).

A variety of measures have been developed to characterize the readability of text. Usually, these measures describe readability in terms of grade level, e.g., "This sentence is at a seventh-grade reading level." For this assignment, you will write Python functions that calculate the readability grade level of text files using two different measures.

• The Flesch-Kincaid Grade Level Formula estimates grade level using the average number of words per sentence and the average number of syllables per word:
```
    F-K grade level = 0.39*avgWordsPerSentence + 11.8*avgSyllablesPerWord - 15.59
```
• The SMOG (Simple Measure of Gobbledygook) Formula estimates grade level using the average number of complex words (i.e., words with three or more syllables) per sentence:
```
    SMOG grade level = 1.043*√(30*avgComplexWordsPerSentence) + 3.1291
```

For example, suppose a text file contained 10 sentences, consisting of 50 words. Those 50 words contained a total of 100 syllables, with 10 of the words having three or more syllables in them. Then,

```
    F-K grade level = 0.39*5 + 11.8*2 - 15.59 = 9.96

    SMOG grade level = 1.043*√(30*1) + 3.1291 = 8.84
```
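The arithmetic in this example can be checked with a short Python sketch. The function names `fkGradeLevel` and `smogGradeLevel` are hypothetical (they are not part of the assignment), and the sketch assumes that the square root in the SMOG formula covers the whole product of 30 and the average number of complex words per sentence, as in the standard SMOG definition.

```python
import math

def fkGradeLevel(syllables, words, sentences):
    """Flesch-Kincaid grade level computed from raw counts."""
    avgWordsPerSentence = words / sentences
    avgSyllablesPerWord = syllables / words
    return 0.39 * avgWordsPerSentence + 11.8 * avgSyllablesPerWord - 15.59

def smogGradeLevel(complexWords, sentences):
    """SMOG grade level; assumes the root covers 30 * (complex words per sentence)."""
    avgComplexWordsPerSentence = complexWords / sentences
    return 1.043 * math.sqrt(30 * avgComplexWordsPerSentence) + 3.1291

# The example above: 10 sentences, 50 words, 100 syllables, 10 complex words.
print(round(fkGradeLevel(100, 50, 10), 2))   # 9.96
print(round(smogGradeLevel(10, 10), 2))      # 8.84
```

Taking raw counts rather than averages as parameters keeps the divisions in one place, which may make it easier to reuse the same functions once the counts come from a file.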

### PART 1: Helper Functions (40%)

Due to the complexity of the English language, identifying the ends of sentences and the number of syllables in a word can be tricky. To make these tasks manageable, we will make the following simplifications:

• We will assume that any word that ends in a period, exclamation point, or question mark (ignoring trailing quotation marks) is the end of a sentence. For example, the following paragraph contains three sentences: What? He told me to "Go away." So, I left as soon as I could.
• In general, we will assume that any sequence of consecutive vowels (including 'y') corresponds to a syllable. Thus, `"heavy"` has two syllables and `"Italian"` has three syllables. However, words whose last letter is an "e" are a special case. If the "e" is preceded by a vowel (e.g., `"tree"`) or the letter "l" (e.g., `"whistle"`), or if the "e" is the only vowel in the word (e.g., `"the"`), then it counts as a syllable. Otherwise, the trailing "e" does not count as a syllable (e.g., `"spite"`).

Define a function named `isEndOfSentence` that has a single word as input. The function should return `True` if the word ends in a period, exclamation point, or question mark (ignoring trailing quotation marks). For example, `isEndOfSentence("What?")` should return `True`, while `isEndOfSentence("So,")` should return `False`. Hint: to ignore trailing quotation marks, use the string `rstrip` method. For example, the following assignment will strip trailing quotation marks off of a `word` and save the resulting string in `stripped`:

    stripped = word.rstrip("\"\'")
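One possible shape for `isEndOfSentence`, built directly on the `rstrip` hint, is sketched below. It is only one way to meet the specification. The use of `str.endswith` with a tuple of endings is a design choice of this sketch, not something the assignment requires.

```python
def isEndOfSentence(word):
    """Return True if word ends a sentence under the simplified rule."""
    # Drop any trailing double or single quotation marks first.
    stripped = word.rstrip("\"'")
    # endswith accepts a tuple and returns False for an empty string.
    return stripped.endswith((".", "!", "?"))

print(isEndOfSentence("What?"))    # True
print(isEndOfSentence("So,"))      # False
print(isEndOfSentence('away."'))   # True
```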

Define a function named `countSyllables` that has a single word as input. The function should return the number of syllables in that word (using the above rules for estimating syllables). For example, `countSyllables("people")` should return `2`, while `countSyllables("mezzanine")` should return `3`.
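A minimal sketch of the syllable rules follows. It counts runs of consecutive vowels and then adjusts for a silent trailing "e". Two choices here are assumptions of the sketch rather than requirements of the assignment: the word is lowercased, and surrounding punctuation is stripped so that a token such as `"spite."` is treated like `"spite"`.

```python
import string

VOWELS = "aeiouy"

def countSyllables(word):
    """Estimate syllables as runs of vowels, adjusting for a silent trailing 'e'."""
    # Assumption: case and surrounding punctuation are not part of the word.
    word = word.lower().strip(string.punctuation)

    count = 0
    prevWasVowel = False
    for ch in word:
        isVowel = ch in VOWELS
        if isVowel and not prevWasVowel:
            count += 1            # a new run of vowels starts here
        prevWasVowel = isVowel

    # A trailing 'e' is silent unless it follows a vowel or an 'l',
    # or it is the only vowel in the word (count == 1 in that case).
    if word.endswith("e") and count > 1:
        if word[-2] not in VOWELS and word[-2] != "l":
            count -= 1
    return count

print(countSyllables("people"))     # 2
print(countSyllables("mezzanine"))  # 3
print(countSyllables("spite"))      # 1
```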

Be sure to test your functions thoroughly before moving on to the next part.

### PART 2: File Processing (60%)

Consider the following function that processes a text file and displays the individual words in that file:

Since the file to be processed may be large, the `processFile` function reads its contents one line at a time, breaking each line into individual words (using the string `split` method). You are to enter this function into your Python module and modify it so that it collects statistics on the words in the file and displays the number of syllables, words, and sentences (using the helper functions written in Part 1). Using those counts, it should also calculate and display the Flesch-Kincaid and SMOG grade levels for the file, rounded to a single decimal place. For example:
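The line-at-a-time reading pattern described above can be sketched as a small generator. This is not the instructor's `processFile` listing. The name `wordsInFile` is hypothetical, and the sketch only illustrates how iterating over a file object combines with `split`.

```python
def wordsInFile(filename):
    """Yield the whitespace-separated words of a text file, one line at a time."""
    with open(filename, "r") as infile:
        for line in infile:             # a file object yields one line per iteration
            for word in line.split():   # split() breaks on any run of whitespace
                yield word
```

With a text file available (say, a hypothetical `sample.txt`), `for word in wordsInFile("sample.txt"): print(word)` would display each word on its own line without loading the whole file into memory.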

    /Users/davereed/Documents/Classes/CSC221/melville.txt
    Number of syllables = 21755
    Number of words = 14343
    Number of sentences = 817
    Flesch-Kincaid grade level = 9.2
    SMOG grade level = 12.0

One special case you will need to watch out for when processing a text file is when "words" contain no syllables. These include numbers (e.g., "2011") and punctuation sequences (e.g., "--"). For this assignment, any "word" that contains no syllables (as determined by your `countSyllables` function) should not contribute to the word count for the file, but may contribute to the sentence count. For example, the sentence `"The year is 2011."` would be considered to have only 3 words in it.
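The zero-syllable rule can be sketched as a counting loop in which the word test and the sentence test are independent of each other. The function name `tallyCounts` and the idea of passing the two helpers in as parameters are assumptions of this sketch. The `demo...` helpers are stand-ins that only know the words of the example sentence, used here so the block runs on its own.

```python
def tallyCounts(lines, countSyllables, isEndOfSentence):
    """Return (syllables, words, complexWords, sentences) for an iterable of lines."""
    syllables = words = complexWords = sentences = 0
    for line in lines:
        for word in line.split():
            n = countSyllables(word)
            if n > 0:                   # tokens with no syllables are not words
                words += 1
                syllables += n
                if n >= 3:              # complex word: three or more syllables
                    complexWords += 1
            if isEndOfSentence(word):   # checked even for zero-syllable tokens
                sentences += 1
    return syllables, words, complexWords, sentences

# Stand-in helpers for this demonstration only (not full implementations).
demoSyllableTable = {"the": 1, "year": 1, "is": 1}

def demoCountSyllables(word):
    return demoSyllableTable.get(word.lower().rstrip("."), 0)

def demoIsEndOfSentence(word):
    return word.rstrip("\"'").endswith((".", "!", "?"))

print(tallyCounts(["The year is 2011."], demoCountSyllables, demoIsEndOfSentence))
# (3, 3, 0, 1)
```

Because `"2011."` has no syllables, it is left out of the word count, yet its trailing period still marks the end of the sentence.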

You should test your code on small files for which you can hand-calculate stats. Once you are confident it works as desired, you can test your code on the following public-domain texts: