For PART 1 of this assignment, you may work with one (new) partner.
PART 2 must be completed independently (with assistance from the instructor as needed).
A variety of measures have been developed to characterize the readability of text. Usually, these measures describe readability in terms of grade level, e.g., "This sentence is at a seventh-grade reading level." For this assignment, you will write Python functions for calculating the readability grade level of text files using two different measures, the Flesch-Kincaid (F-K) grade level and the SMOG grade level:
F-K grade level = 0.39*avgWordsPerSentence + 11.8*avgSyllablesPerWord - 15.59
SMOG grade level = 1.043*√(30*avgComplexWordsPerSentence) + 3.1291
For example, suppose a text file contained 10 sentences with a total of 50 words. Those 50 words contained a total of 100 syllables, and 10 of the words were complex, i.e., had three or more syllables. Then,
F-K grade level = 0.39*5 + 11.8*2 - 15.59 = 9.96
SMOG grade level = 1.043*√(30*1) + 3.1291 = 8.84
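The worked example can be checked with a few lines of Python. This is only a sketch: the function names fk_grade and smog_grade are hypothetical, and the SMOG function assumes the square root covers the whole product 30*avgComplexWordsPerSentence, as in the standard SMOG formula.

```python
import math

def fk_grade(avg_words_per_sentence, avg_syllables_per_word):
    # Flesch-Kincaid grade level from the two averages.
    return 0.39 * avg_words_per_sentence + 11.8 * avg_syllables_per_word - 15.59

def smog_grade(avg_complex_words_per_sentence):
    # Assumption: the radical applies to 30 * avgComplexWordsPerSentence.
    return 1.043 * math.sqrt(30 * avg_complex_words_per_sentence) + 3.1291

# Counts from the example: 10 sentences, 50 words, 100 syllables,
# and 10 words with three or more syllables.
sentences, words, syllables, complex_words = 10, 50, 100, 10

fk = fk_grade(words / sentences, syllables / words)
smog = smog_grade(complex_words / sentences)
print(round(fk, 2), round(smog, 2))  # 9.96 8.84
```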
Due to the complexity of the English language, identifying the ends of sentences and the number of syllables in a word can be tricky. To make these tasks manageable, we will make the following simplifications:
"heavy" has two syllables and "Italian" has three syllables. However, words whose last letter is an "e" are a special case. If the "e" is preceded by a vowel (e.g., "tree") or the letter "l" (e.g., "whistle"), or if the "e" is the only vowel in the word (e.g., "the"), then it counts as a syllable. Otherwise, the trailing "e" does not count as a syllable (e.g., "spite").
Define a function named isEndOfSentence that has a single word as input. The function should return True if the word ends in a period, exclamation point, or question mark (ignoring trailing quotation marks). For example, isEndOfSentence("What?") should return True, while isEndOfSentence("So,") should return False.
Hint: to ignore trailing quotation marks, use the string rstrip method. For example, the following assignment will strip trailing quotation marks off of a word and save the resulting string in stripped:
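A minimal sketch of such an assignment, together with one possible way to build isEndOfSentence on top of it. It assumes that both double and single quotation marks should be stripped; the sample word is made up for illustration.

```python
word = 'done."'

# rstrip removes every trailing character that appears in its argument.
# Assumption: both " and ' count as quotation marks.
stripped = word.rstrip("\"'")
print(stripped)  # done.

def isEndOfSentence(word):
    """Return True if word ends a sentence, ignoring trailing quotes."""
    stripped = word.rstrip("\"'")
    # str.endswith accepts a tuple of alternative suffixes.
    return stripped.endswith((".", "!", "?"))

print(isEndOfSentence("What?"))  # True
print(isEndOfSentence("So,"))    # False
```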
Define a function named countSyllables that has a single word as input. The function should return the number of syllables in that word (using the above rules for estimating syllables). For example, countSyllables("people") should return 2, while countSyllables("mezzanine") should return 3.
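One way the syllable rules might be coded is sketched below. It rests on assumptions that the examples suggest but do not state outright: each run of adjacent vowels counts as one syllable, "y" is treated as a vowel (so that "heavy" yields two), and non-letter characters such as punctuation are ignored.

```python
VOWELS = "aeiouy"  # assumption: "y" counts as a vowel

def countSyllables(word):
    """Estimate the number of syllables in word."""
    # Keep letters only, so trailing punctuation does not hide a final "e".
    letters = [ch for ch in word.lower() if ch.isalpha()]

    # Count runs of adjacent vowels; each run is one syllable.
    count = 0
    prev_vowel = False
    for ch in letters:
        is_vowel = ch in VOWELS
        if is_vowel and not prev_vowel:
            count += 1
        prev_vowel = is_vowel

    # Trailing "e": it stays a syllable if preceded by a vowel or "l",
    # or if it is the only vowel in the word (count == 1).
    if len(letters) >= 2 and letters[-1] == "e":
        before = letters[-2]
        if before not in VOWELS and before != "l" and count > 1:
            count -= 1
    return count

for w in ["heavy", "Italian", "tree", "whistle", "the", "spite"]:
    print(w, countSyllables(w))
```

With these assumptions the loop prints 2, 3, 1, 2, 1, and 1 for the six sample words, which agrees with the examples in the rules.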
Be sure to test your functions thoroughly before moving on to the next part.
Consider the following function that processes a text file and displays the individual words in that file:
Since the file to be processed may be large, the processFile function reads its contents one line at a time, breaking each line into individual words (using the string split method). You are to enter this function into your Python module and modify it so that it collects statistics on the words in the file and displays the number of syllables, words, and sentences (using the helper functions written in Part 1). Using those counts, it should also calculate and display the Flesch-Kincaid and SMOG grade levels for the file, rounded to a single decimal place. For example:
One special case you will need to watch out for when processing a text file is when "words" contain no syllables. These include numbers, e.g., "2011", and punctuation sequences, e.g., "--". For this assignment, any "word" that contains no syllables (as determined by your countSyllables function) should not contribute to the word count for the file, but may contribute to the sentence count. For example, the sentence "The year is 2011." would be considered to have only 3 words in it.
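The counting logic described above could be organized roughly as follows. This is a sketch under stated assumptions, not the required solution: collectStats and gradeLevels are hypothetical helper names, the signature and output format of processFile are guesses, "y" is treated as a vowel, both kinds of quotation marks are stripped, and the SMOG square root is taken over the whole product.

```python
import math

VOWELS = "aeiouy"  # assumption: "y" counts as a vowel

def isEndOfSentence(word):
    return word.rstrip("\"'").endswith((".", "!", "?"))

def countSyllables(word):
    letters = [ch for ch in word.lower() if ch.isalpha()]
    count, prev_vowel = 0, False
    for ch in letters:
        is_vowel = ch in VOWELS
        if is_vowel and not prev_vowel:
            count += 1
        prev_vowel = is_vowel
    if len(letters) >= 2 and letters[-1] == "e":
        if letters[-2] not in VOWELS and letters[-2] != "l" and count > 1:
            count -= 1
    return count

def collectStats(lines):
    """Return (sentences, words, syllables, complexWords) for the lines."""
    sentences = words = syllables = complexWords = 0
    for line in lines:
        for word in line.split():
            # A zero-syllable "word" such as "2011." can still end a sentence.
            if isEndOfSentence(word):
                sentences += 1
            n = countSyllables(word)
            if n == 0:
                continue  # not counted as a word
            words += 1
            syllables += n
            if n >= 3:
                complexWords += 1
    return sentences, words, syllables, complexWords

def gradeLevels(sentences, words, syllables, complexWords):
    """Return (F-K, SMOG), each rounded to one decimal place."""
    if sentences == 0 or words == 0:
        raise ValueError("need at least one sentence and one word")
    fk = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    smog = 1.043 * math.sqrt(30 * complexWords / sentences) + 3.1291
    return round(fk, 1), round(smog, 1)

def processFile(filename):
    # A file object yields one line at a time, so large files are fine.
    with open(filename) as infile:
        sentences, words, syllables, complexWords = collectStats(infile)
    fk, smog = gradeLevels(sentences, words, syllables, complexWords)
    print("syllables:", syllables, "words:", words, "sentences:", sentences)
    print("F-K grade level:", fk, "SMOG grade level:", smog)

# In-memory check using the special case from the text.
print(collectStats(["The year is 2011."]))  # (1, 3, 3, 0)
```

Keeping the counting in a function that accepts any iterable of lines makes it easy to test on short hand-checked strings before running processFile on a real file.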
You should test your code on small files for which you can hand-calculate stats. Once you are confident it works as desired, you can test your code on the following public-domain texts: