CSC 222: Object-Oriented Programming
Fall 2015

HW 4: Files, Strings & Lists

In recent years, researchers have increasingly used data generated by social media to study the behavior and psychological well-being of communities. In particular, a number of research studies have looked at the amount of positivity/negativity in tweets and used the result to predict diverse trends such as community health, political unrest, and economic development.

For this assignment, you will write a Java class that can be used to classify the mood of a message by comparing the number of unique positive and negative words in that message. For example, consider the following tweet:

I hate it when good things happen to bad people.

There are two words in the message that have negative connotations: "hate" and "bad", and one word that has positive connotations: "good". Since the negative words outnumber the positive one, we would say that this message has a negative mood. If a message has the same number of unique positive and negative words, we say that it is neutral.
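The comparison rule above can be sketched as a small helper. This is only an illustration: the class name MoodRule, the method name mood, and the returned labels are hypothetical and are not taken from Classifier.html, which defines the actual interface you must implement.

```java
// Hypothetical helper; names and return values are illustrative assumptions,
// not the interface specified in Classifier.html.
class MoodRule {
    // Compares the counts of unique positive and unique negative words.
    static String mood(int positiveCount, int negativeCount) {
        if (positiveCount > negativeCount) {
            return "positive";
        }
        if (negativeCount > positiveCount) {
            return "negative";
        }
        return "neutral"; // equal counts, including zero of each
    }

    public static void main(String[] args) {
        // The example tweet has 1 positive word ("good") and 2 negative words ("hate", "bad").
        System.out.println(mood(1, 2)); // prints "negative"
    }
}
```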

For our purposes, we will assume case-insensitivity and will ignore word breaks, so positive/negative words will count even if embedded in other words. For example, "Madly", "maddening", and "MADDER" would all count as negative words since they contain the root "mad". Of course, this can lead to incorrect matches, e.g., "made", but even this simplistic approach can be useful for classifying messages. Also note that we count unique positive and negative words, so "It made me mad." would still count as having only one negative word, even though "mad" appears twice (once inside "made").
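One possible way to apply these matching rules is to lowercase the message and test whether each listed word occurs anywhere in it, recording each word at most once. The sketch below assumes the word list is already in memory; the class name UniqueMatchCounter and the method countUniqueMatches are hypothetical and do not come from Classifier.html.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; the names are illustrative assumptions, not part of the assigned interface.
class UniqueMatchCounter {
    // Counts how many distinct words from the list occur anywhere in the message,
    // ignoring case and word boundaries.
    static int countUniqueMatches(String message, List<String> words) {
        String lowerMessage = message.toLowerCase(Locale.ROOT);
        Set<String> matched = new HashSet<>();
        for (String word : words) {
            String lowerWord = word.toLowerCase(Locale.ROOT);
            // Skip empty entries, which would otherwise match every message.
            if (!lowerWord.isEmpty() && lowerMessage.contains(lowerWord)) {
                matched.add(lowerWord); // a set keeps each word only once
            }
        }
        return matched.size();
    }

    public static void main(String[] args) {
        List<String> negatives = Arrays.asList("mad", "hate", "bad");
        // "mad" occurs in both "made" and "mad" but is counted once.
        System.out.println(countUniqueMatches("It made me mad.", negatives)); // prints 1
    }
}
```

Using a set rather than a running counter means a word that appears more than once in the list, or in different capitalizations, is still counted only once.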

You are to implement the Classifier class, whose javadoc description is provided in Classifier.html. To make testing your class easier, files containing positive and negative words (positives.txt and negatives.txt) are provided. In addition, the file tweets.txt contains 498 actual tweets that you can use to test your processFile method.