CSC 221: Introduction to Programming
Fall 2023

HW6: Lists and Structured Data


For this assignment, you will write several functions that access and analyze a file of structured tweet data. You may assume that the file contains one tweet per line, with ten comma-separated values per tweet. The first line of the file contains comma-separated headings that identify each of the ten values. For example,

date(UTC),day(UTC),time(UTC),tweet,mentions,photos,#replies,#retweets,#likes,retweet?
10/30/23,Mon,20:27:08,how about those bluejays?,[],[],0,8,12,FALSE
10/30/23,Mon,20:18:33,I am Mr. Meseeks - look at me!,[buddy],[],0,150,200,FALSE
10/28/23,Sat,18:18:33,check this out,[],[],0,0,0,TRUE
10/27/23,Fri,5:57:29,I'm going to sleep,[],[],0,0,0,TRUE
10/26/23,Thu,13:57:29,out of bed,[],[],0,0,0,TRUE
. . .
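As a rough illustration of the record structure, the sketch below parses two of the sample lines with Python's csv module. The name parse_tweets is hypothetical, and the readCSVlist function from class may behave differently (for example, in how it treats the heading line).

```python
import csv
import io

# Two sample lines from above, preceded by the heading line.
SAMPLE = """date(UTC),day(UTC),time(UTC),tweet,mentions,photos,#replies,#retweets,#likes,retweet?
10/30/23,Mon,20:27:08,how about those bluejays?,[],[],0,8,12,FALSE
10/30/23,Mon,20:18:33,I am Mr. Meseeks - look at me!,[buddy],[],0,150,200,FALSE
"""

def parse_tweets(text):
    """Hypothetical helper: return (headings, records).

    Each record is a list of ten strings, in the same order as the headings.
    """
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:]

headings, records = parse_tweets(SAMPLE)
print(headings[3], "->", records[0][3])   # tweet -> how about those bluejays?
```

Under this layout, index 0 of a record holds the date, index 2 the time, and index 3 the tweet text.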

The file elonmusk.csv contains all of Elon Musk's tweets from 2015-2020 with corresponding metadata.

PART 1: showDate

Define a function named showDate that takes two inputs, a string representing a date and a list of tweet records (e.g., as read in using the readCSVlist function from class). The function should print the text of every tweet that was posted on that date, one tweet per line. After displaying all of the tweets, it should display a blank line followed by a line of the form "There were # tweets on M/D/Y". For example, assuming the tweet data above had been read in and stored under the name sampleTweets, the call showDate("10/30/23", sampleTweets) would display the following:

how about those bluejays?
I am Mr. Meseeks - look at me!

There were 2 tweets on 10/30/23

PART 2: numBetween and showYears

Define a function named numBetween that has three inputs, a start date, an end date and a tweet list. The function should return the number of tweets from the list that occurred between the specified dates (inclusive). For example, given the sampleTweets above, the call numBetween("10/27/23", "10/29/23", sampleTweets) should return 2. The following helper function might prove useful when comparing dates:

def lessThan(date1:str, date2:str) -> bool:
    """Returns True if date1 comes before date2, else False."""
    [month1, day1, year1] = [int(x) for x in date1.split("/")]
    [month2, day2, year2] = [int(x) for x in date2.split("/")]
    return year1 < year2 or \
           (year1 == year2 and month1 < month2) or \
           (year1 == year2 and month1 == month2 and day1 < day2)
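One possible way to use lessThan for an inclusive range check: a date lies between start and end when it is not before start and end is not before it. The sketch below repeats lessThan so that it runs on its own; inRange is a hypothetical name, not something the assignment requires.

```python
def lessThan(date1: str, date2: str) -> bool:
    """Returns True if date1 comes before date2, else False."""
    [month1, day1, year1] = [int(x) for x in date1.split("/")]
    [month2, day2, year2] = [int(x) for x in date2.split("/")]
    return year1 < year2 or \
           (year1 == year2 and month1 < month2) or \
           (year1 == year2 and month1 == month2 and day1 < day2)

def inRange(date: str, start: str, end: str) -> bool:
    """Hypothetical helper: True if start <= date <= end (inclusive)."""
    return not lessThan(date, start) and not lessThan(end, date)

print(inRange("10/27/23", "10/27/23", "10/29/23"))   # True (endpoints count)
print(inRange("10/30/23", "10/27/23", "10/29/23"))   # False
```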

Once you have thoroughly tested your numBetween function, define an additional function named showYears that takes three inputs, a start year, an end year and a tweet list. The years are given as two-digit numbers, matching the date format in the file. The function should display the number of tweets in each year in that range, followed by a separator line and a line showing the total number of tweets. For example, the call showYears(21, 23, sampleTweets) might produce the following:

21 : 273
22 : 120
23 : 301
========
TOTAL : 694
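Since numBetween works on date strings, one way to count a whole year is to turn the two-digit year into a pair of boundary dates. The sketch below shows only that step and the line format; yearBounds is a hypothetical name and the count is a placeholder.

```python
def yearBounds(year: int):
    """Hypothetical helper: first and last date of a two-digit year, as M/D/Y strings."""
    return f"1/1/{year}", f"12/31/{year}"

start, end = yearBounds(23)
print(start, end)            # 1/1/23 12/31/23

count = 301                  # placeholder; in practice this would come from numBetween
print(f"{23} : {count}")     # 23 : 301
```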

PART 3: allHours

In class, we wrote a function named allDays that had a single input, a structured list of tweet data, and displayed tweet counts for each day of the week. You are to write a function named allHours that similarly displays tweet counts for every hour of the day (ranging from 0 to 23). For example:

HOUR   # TWEETS
===============
0      312
1      80
2      12
3      2
4      0
.      .
.      .
.      .
23     420

Note that the output should appear as above, with the data aligned in columns and a heading at the top. Also, recall that times in the tweet file are in Coordinated Universal Time (UTC). You should adjust those times to the Pacific time zone (eight hours earlier) when collecting your statistics.
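The time adjustment can be sketched with modular arithmetic, so that early-morning UTC hours wrap around to the previous evening. This is a minimal sketch that assumes a fixed eight-hour offset, as stated above, and times of the form H:MM:SS; pacificHour is a hypothetical name, and the column widths are only one choice.

```python
def pacificHour(timeUTC: str) -> int:
    """Hypothetical helper: Pacific hour (0-23) for a UTC time such as "5:57:29"."""
    hourUTC = int(timeUTC.split(":")[0])
    return (hourUTC - 8) % 24      # % 24 wraps negative values, e.g. 5 - 8 -> 21

# Tally three of the sample times into a list with one counter per hour.
counts = [0] * 24
for t in ["20:27:08", "20:18:33", "5:57:29"]:
    counts[pacificHour(t)] += 1

# Format specifiers such as :<7 pad each value to a fixed width for alignment.
print(f"{'HOUR':<7}{'# TWEETS'}")
print("=" * 15)
for hour in (12, 21):
    print(f"{hour:<7}{counts[hour]}")
```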

PART 4: mostFrequent

In class, we wrote a function named inTweets that counted how many tweets contained a specific word. A related task would be to determine which word appears most often in all of the tweets. Define a function named mostFrequent that takes a tweet list as input and returns a pair containing the word that appeared most often and its corresponding count. For example, if we assumed that the list firstFive contained the five tweets listed above, then the call mostFrequent(firstFive) should return ['out', 2].

Since the total number of words in the tweets is a very large number, you will need to be efficient in implementing this function. You should create a dictionary that keeps count of the words encountered, similar to the wordCounts example from class. You will need to traverse the tweets, extract each word (removing punctuation), and keep counts in the dictionary. Once the dictionary is complete, you can traverse it and find the word with the largest count.
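The dictionary approach described above can be sketched on a plain list of tweet texts. The names countWords and largest are hypothetical, and the wordCounts example from class may differ. The sketch assumes that "removing punctuation" means stripping punctuation from the start and end of each word, and it keeps capitalization as is.

```python
import string

def countWords(texts):
    """Hypothetical helper: map each word to the number of times it appears."""
    counts = {}
    for text in texts:
        for raw in text.split():
            word = raw.strip(string.punctuation)   # "bluejays?" -> "bluejays"
            if word != "":                         # skip tokens like "-"
                counts[word] = counts.get(word, 0) + 1
    return counts

def largest(counts):
    """Hypothetical helper: [word, count] for the largest count (first one wins ties)."""
    best = ["", 0]
    for word in counts:
        if counts[word] > best[1]:
            best = [word, counts[word]]
    return best

texts = ["how about those bluejays?", "I am Mr. Meseeks - look at me!",
         "check this out", "I'm going to sleep", "out of bed"]
print(largest(countWords(texts)))   # ['out', 2]
```

Each word costs one dictionary lookup, so the work grows in proportion to the total number of words rather than requiring a fresh pass over all the tweets for every word.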



Save your functions in a file named lastnameTweets.py, where lastname is your last name. Your file should have a comment block at the top that includes your name and a brief description of the file, and each function should have a docstring that describes its behavior. Be careful to name the functions exactly as specified and to keep the inputs in the specified order. Feel free to implement additional helper functions as needed.