For this assignment, you will write several functions that access and analyze a file of structured tweet data. You may assume that the file contains one tweet per line, with ten comma-separated values per tweet. The first line of the file contains comma-separated headings that identify each of the ten values. For example,
date(UTC),day(UTC),time(UTC),tweet,mentions,photos,#replies,#retweets,#likes,retweet?
10/30/23,Mon,20:27:08,how about those bluejays?,[],[],0,8,12,FALSE
10/30/23,Mon,20:18:33,I am Mr. Meseeks - look at me!,[buddy],[],0,150,200,FALSE
10/28/23,Sat,18:18:33,check this out,[],[],0,0,0,TRUE
10/27/23,Fri,5:57:29,I'm going to sleep,[],[],0,0,0,TRUE
10/26/23,Thu,13:57:29,out of bed,[],[],0,0,0,TRUE
. . .
The file elonmusk.csv contains all of Elon Musk's tweets from 2015-2020 with corresponding metadata.
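The readCSVlist function from class is not reproduced here, so the sketch below uses a hypothetical stand-in named parseTweets. It assumes that each tweet record is a list of ten strings in column order and that the heading row is dropped; if the version from class behaves differently, follow that one instead.

```python
import csv

# Sample lines copied from the example above; a real run would instead
# pass an open file, e.g. open("elonmusk.csv", newline="").
SAMPLE_LINES = [
    "date(UTC),day(UTC),time(UTC),tweet,mentions,photos,#replies,#retweets,#likes,retweet?",
    "10/30/23,Mon,20:27:08,how about those bluejays?,[],[],0,8,12,FALSE",
    "10/30/23,Mon,20:18:33,I am Mr. Meseeks - look at me!,[buddy],[],0,150,200,FALSE",
    "10/28/23,Sat,18:18:33,check this out,[],[],0,0,0,TRUE",
]

def parseTweets(lines):
    """Hypothetical helper: returns one list of strings per tweet, heading row dropped."""
    rows = list(csv.reader(lines))
    return rows[1:]

sampleTweets = parseTweets(SAMPLE_LINES)
print(len(sampleTweets))     # number of tweet records
print(sampleTweets[0][3])    # text of the first tweet (column 3, assumed)
```

Under this assumed layout, the date is at index 0, the time at index 2, and the tweet text at index 3 of each record.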
showDate
Define a function named showDate that takes two inputs, a string representing a date and a list of tweet records (e.g., as read in using the readCSVlist function from class). The function should print the text of every tweet that was posted on that date, one tweet per line. After displaying all of the tweets, it should display a blank line followed by a line of the form "There were # tweets on M/D/Y". For example, assuming the tweet data above had been read in and stored under the name sampleTweets, the call showDate("10/30/23", sampleTweets) would display the following:
how about those bluejays?
I am Mr. Meseeks - look at me!

There were 2 tweets on 10/30/23
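One possible way to structure this is sketched below. It assumes each record is a list of strings with the date at index 0 and the tweet text at index 3; the constants DATE and TEXT and the helper tweetsOn are hypothetical names introduced only for illustration.

```python
DATE, TEXT = 0, 3   # assumed column positions within a tweet record

sampleTweets = [
    ["10/30/23", "Mon", "20:27:08", "how about those bluejays?", "[]", "[]", "0", "8", "12", "FALSE"],
    ["10/30/23", "Mon", "20:18:33", "I am Mr. Meseeks - look at me!", "[buddy]", "[]", "0", "150", "200", "FALSE"],
    ["10/28/23", "Sat", "18:18:33", "check this out", "[]", "[]", "0", "0", "0", "TRUE"],
]

def tweetsOn(date, tweets):
    """Hypothetical helper: returns the text of every tweet posted on date."""
    return [tweet[TEXT] for tweet in tweets if tweet[DATE] == date]

def showDate(date, tweets):
    """Prints each tweet from date, then a blank line and a summary line."""
    matches = tweetsOn(date, tweets)
    for text in matches:
        print(text)
    print()
    print("There were", len(matches), "tweets on", date)

showDate("10/30/23", sampleTweets)
```

Separating the filtering from the printing makes the matching logic easy to test on its own, but a single loop with a counter would work just as well.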
numBetween and showYears
Define a function named numBetween that has three inputs: a start date, an end date, and a tweet list. The function should return the number of tweets from the list that were posted between the specified dates (inclusive). For example, given the sampleTweets above, the call numBetween("10/27/23", "10/29/23", sampleTweets) should return 2. The following helper function might prove useful when comparing dates:
def lessThan(date1:str, date2:str) -> bool:
    """Returns True if date1 comes before date2, else False."""
    [month1, day1, year1] = [int(x) for x in date1.split("/")]
    [month2, day2, year2] = [int(x) for x in date2.split("/")]
    return year1 < year2 or \
           (year1 == year2 and month1 < month2) or \
           (year1 == year2 and month1 == month2 and day1 < day2)
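Because lessThan is a strict comparison, an inclusive range test can be phrased as "not before the start and not after the end". The sketch below shows one way to do that; it assumes the date is stored at index 0 of each record, and it repeats the helper so that the example runs on its own.

```python
def lessThan(date1: str, date2: str) -> bool:
    """Returns True if date1 comes before date2, else False."""
    [month1, day1, year1] = [int(x) for x in date1.split("/")]
    [month2, day2, year2] = [int(x) for x in date2.split("/")]
    return year1 < year2 or \
           (year1 == year2 and month1 < month2) or \
           (year1 == year2 and month1 == month2 and day1 < day2)

def numBetween(start, end, tweets):
    """Returns the number of tweets posted from start through end (inclusive)."""
    count = 0
    for tweet in tweets:
        date = tweet[0]   # assumed: the date is the first field of a record
        # Inclusive on both ends: date >= start and date <= end.
        if not lessThan(date, start) and not lessThan(end, date):
            count += 1
    return count

sampleTweets = [
    ["10/30/23", "Mon", "20:27:08", "how about those bluejays?", "[]", "[]", "0", "8", "12", "FALSE"],
    ["10/30/23", "Mon", "20:18:33", "I am Mr. Meseeks - look at me!", "[buddy]", "[]", "0", "150", "200", "FALSE"],
    ["10/28/23", "Sat", "18:18:33", "check this out", "[]", "[]", "0", "0", "0", "TRUE"],
    ["10/27/23", "Fri", "5:57:29", "I'm going to sleep", "[]", "[]", "0", "0", "0", "TRUE"],
    ["10/26/23", "Thu", "13:57:29", "out of bed", "[]", "[]", "0", "0", "0", "TRUE"],
]

print(numBetween("10/27/23", "10/29/23", sampleTweets))
```

Comparing the date strings directly with < would not work here, since "10/9/23" sorts after "10/30/23" as text.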
Once you have thoroughly tested your numBetween function, define an additional function named showYears that takes three inputs: a start year, an end year, and a tweet list. It should display the number of tweets in each year in that range, followed by a blank line and a line showing the total number of tweets. For example, the call showYears(21, 23, sampleTweets) might produce the following:
21 : 273
22 : 120
23 : 301
========
TOTAL : 694
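A year's tweets are exactly those between January 1 and December 31 of that year, so showYears can be built from one date-range count per year. The sketch below is only one possible arrangement: to stay self-contained it uses a compact stand-in named countBetween (comparing (year, month, day) tuples) in place of your numBetween, and the helper names dateKey and yearCounts are hypothetical. It mirrors the separator line shown in the sample output.

```python
def dateKey(date):
    """Hypothetical helper: converts "M/D/Y" into a (year, month, day) tuple."""
    month, day, year = [int(x) for x in date.split("/")]
    return (year, month, day)

def countBetween(start, end, tweets):
    """Stand-in for numBetween: counts tweets dated start through end (inclusive)."""
    return sum(1 for tweet in tweets
               if dateKey(start) <= dateKey(tweet[0]) <= dateKey(end))

def yearCounts(startYear, endYear, tweets):
    """Returns a list of (year, count) pairs for each year in the range."""
    pairs = []
    for year in range(startYear, endYear + 1):
        first = "1/1/" + str(year)     # first day of the year
        last = "12/31/" + str(year)    # last day of the year
        pairs.append((year, countBetween(first, last, tweets)))
    return pairs

def showYears(startYear, endYear, tweets):
    """Displays the tweet count for each year, then the total."""
    total = 0
    for year, count in yearCounts(startYear, endYear, tweets):
        print(year, ":", count)
        total += count
    print("========")
    print("TOTAL :", total)

sampleTweets = [
    ["10/30/23", "Mon", "20:27:08", "how about those bluejays?", "[]", "[]", "0", "8", "12", "FALSE"],
    ["10/28/23", "Sat", "18:18:33", "check this out", "[]", "[]", "0", "0", "0", "TRUE"],
    ["10/26/23", "Thu", "13:57:29", "out of bed", "[]", "[]", "0", "0", "0", "TRUE"],
]

showYears(22, 23, sampleTweets)
```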
allHours
In class, we wrote a function named allDays that had a single input, a structured list of tweet data, and displayed tweet counts for each day of the week. You are to write a function named allHours that similarly displays tweet counts for every hour of the day (ranging from 0 to 23). For example:
HOUR   # TWEETS
===============
   0        312
   1         80
   2         12
   3          2
   4          0
   .          .
   .          .
   .          .
  23        420
Note that the output should appear as above, with the data aligned in columns and a heading at the top. Also, recall that times in the tweet file are in Coordinated Universal Time (UTC). You should adjust those times to the Pacific time zone (eight hours earlier) when collecting your statistics.
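Subtracting eight hours can push an early-morning UTC hour below zero, so the adjusted hour needs to wrap around to the previous day. One way to handle this is with the remainder operator, as in the sketch below. It assumes the time is stored at index 2 of each record in H:MM:SS form; the helper name hourCounts and the exact column widths are illustrative choices, not requirements.

```python
def hourCounts(tweets):
    """Hypothetical helper: returns a 24-item list of tweet counts by Pacific hour."""
    counts = [0] * 24
    for tweet in tweets:
        utcHour = int(tweet[2].split(":")[0])   # assumed: time is the third field
        pacificHour = (utcHour - 8) % 24        # e.g., 5 UTC wraps around to 21
        counts[pacificHour] += 1
    return counts

def allHours(tweets):
    """Displays a two-column table of tweet counts for hours 0 through 23."""
    counts = hourCounts(tweets)
    print("HOUR   # TWEETS")
    print("===============")
    for hour in range(24):
        # Right-align both columns so the numbers line up under the heading.
        print(f"{hour:4d}{counts[hour]:11d}")

sampleTweets = [
    ["10/30/23", "Mon", "20:27:08", "how about those bluejays?", "[]", "[]", "0", "8", "12", "FALSE"],
    ["10/30/23", "Mon", "20:18:33", "I am Mr. Meseeks - look at me!", "[buddy]", "[]", "0", "150", "200", "FALSE"],
    ["10/28/23", "Sat", "18:18:33", "check this out", "[]", "[]", "0", "0", "0", "TRUE"],
    ["10/27/23", "Fri", "5:57:29", "I'm going to sleep", "[]", "[]", "0", "0", "0", "TRUE"],
    ["10/26/23", "Thu", "13:57:29", "out of bed", "[]", "[]", "0", "0", "0", "TRUE"],
]

allHours(sampleTweets)
```

In Python, the % operator returns a non-negative result for a positive modulus, which is why (5 - 8) % 24 gives 21 rather than -3.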
mostFrequent
In class, we wrote a function named inTweets that counted how many tweets contained a specific word. A related task would be to determine which word appears most often in all of the tweets. Define a function named mostFrequent that takes a tweet list as input and returns a pair containing the word that appeared most often and its corresponding count. For example, if we assumed that the list firstFive contained the five tweets listed above, then the call mostFrequent(firstFive) should return ['out', 2].
Since the total number of words in the tweets is very large, you will need to be efficient in implementing this function. You should create a dictionary that keeps count of the words encountered, similar to the wordCounts example from class. You will need to traverse the tweets, extract each word (removing punctuation), and keep counts in the dictionary. Once the dictionary is complete, you can traverse it and find the word with the largest count.
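The dictionary approach described above can be sketched as follows. This is only one interpretation of the steps: it assumes the tweet text is at index 3 of each record, strips punctuation from the ends of each word only, converts words to lowercase (an assumption, so that "Out" and "out" count together), and breaks ties in favor of the word encountered first. The wordCounts example from class may differ in these details.

```python
import string

def mostFrequent(tweets):
    """Returns [word, count] for the word appearing most often in the tweets."""
    counts = {}
    for tweet in tweets:
        for raw in tweet[3].split():    # assumed: text is the fourth field
            word = raw.strip(string.punctuation).lower()
            if word != "":              # skip tokens that were only punctuation
                counts[word] = counts.get(word, 0) + 1

    # Traverse the finished dictionary once to find the largest count.
    bestWord, bestCount = "", 0
    for word in counts:
        if counts[word] > bestCount:
            bestWord, bestCount = word, counts[word]
    return [bestWord, bestCount]

firstFive = [
    ["10/30/23", "Mon", "20:27:08", "how about those bluejays?", "[]", "[]", "0", "8", "12", "FALSE"],
    ["10/30/23", "Mon", "20:18:33", "I am Mr. Meseeks - look at me!", "[buddy]", "[]", "0", "150", "200", "FALSE"],
    ["10/28/23", "Sat", "18:18:33", "check this out", "[]", "[]", "0", "0", "0", "TRUE"],
    ["10/27/23", "Fri", "5:57:29", "I'm going to sleep", "[]", "[]", "0", "0", "0", "TRUE"],
    ["10/26/23", "Thu", "13:57:29", "out of bed", "[]", "[]", "0", "0", "0", "TRUE"],
]

print(mostFrequent(firstFive))
```

Each word costs one dictionary lookup and update, so the whole pass is proportional to the total number of words, whereas counting each distinct word with a separate scan of all the tweets would be far slower on the full file.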
Save your functions in a file named lastnameTweets.py, where lastname is your last name. Your file should have a comment block at the top that includes your name and a brief description of the file, and each function should have a docstring that describes its behavior. Be careful to name the functions exactly as specified (including the order of the inputs). Feel free to implement additional helper functions as needed.