CSC 221: HW6

CSC 221: Introduction to Programming
Fall 2023

HW6: Lists and Structured Data

For this assignment, you will write several functions that access and analyze a file of structured tweet data. You may assume that the file contains one tweet per line, with ten comma-separated values per tweet. The first line of the file contains comma-separated headings that identify each of the ten values. For example,

The file elonmusk.csv contains all of Elon Musk's tweets from 2015-2020 with corresponding metadata.
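The readCSVlist function from class is not reproduced here. As a rough stand-in, a file in this format could be loaded with Python's standard csv module. The name readTweets is hypothetical. Using csv.reader instead of splitting each line on commas matters because tweet text can itself contain commas inside quoted fields.

```python
import csv

def readTweets(filename):
    """Return (headings, records) read from a CSV file of tweets.

    The first row supplies the column headings; each remaining
    non-empty row becomes one record, a list of strings.
    """
    with open(filename, newline="", encoding="utf-8") as infile:
        reader = csv.reader(infile)
        headings = next(reader)                     # first line: column names
        records = [row for row in reader if row]    # skip any blank rows
    return headings, records
```

For example, `headings, tweets = readTweets("elonmusk.csv")` would give a list of ten-element records that the functions below can traverse.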

PART 1: `showDate`

Define a function named showDate that takes two inputs: a string representing a date and a list of tweet records (e.g., as read in using the readCSVlist function from class). The function should print the text of every tweet that was posted on that date, one tweet per line. After displaying all of the tweets, it should display a blank line followed by a line of the form "There were # tweets on M/D/Y". For example, assuming the tweet data above had been read in and stored under the name sampleTweets, the call showDate("10/30/23", sampleTweets) would display the following:
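A minimal sketch of showDate follows. It assumes, hypothetically, that the date sits at index 0 of each record and the tweet text at index 2, and that dates in the file are written exactly like the input (e.g., "10/30/23"). The real column positions depend on the file's heading line.

```python
# Hypothetical column positions; check the heading line of the real file.
DATE_COL = 0
TEXT_COL = 2

def showDate(date, tweets):
    """Print the text of every tweet posted on date, then a summary line."""
    count = 0
    for record in tweets:
        if record[DATE_COL] == date:
            print(record[TEXT_COL])
            count += 1
    print()                                    # blank line before the summary
    print("There were", count, "tweets on", date)
```

If the file stores full timestamps rather than bare dates, the comparison would first need to extract the date portion of that field.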

PART 2: `numBetween` and `showYears`

Define a function named numBetween that has three inputs: a start date, an end date, and a tweet list. The function should return the number of tweets from the list that occurred between the specified dates (inclusive). For example, given the sampleTweets above, the call numBetween("10/27/23", "10/29/23", sampleTweets) should return 2. The following helper function might prove useful when comparing dates:
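The helper from the original handout is not reproduced here. The sketch below uses a hypothetical stand-in, dateKey, that turns an "M/D/YY" string into a (year, month, day) tuple. Comparing the raw strings would not work, since "9/1/23" sorts after "10/27/23" character by character, while tuples compare field by field in the order year, month, day. As before, the date column index is an assumption.

```python
DATE_COL = 0  # hypothetical column holding dates such as "10/27/23"

def dateKey(date):
    """Convert "M/D/YY" into a (year, month, day) tuple that orders correctly."""
    month, day, year = date.split("/")
    return (int(year), int(month), int(day))

def numBetween(startDate, endDate, tweets):
    """Return how many tweets fall between startDate and endDate, inclusive."""
    low = dateKey(startDate)
    high = dateKey(endDate)
    count = 0
    for record in tweets:
        if low <= dateKey(record[DATE_COL]) <= high:
            count += 1
    return count
```

This assumes all two-digit years belong to the same century, which holds for data covering a handful of consecutive years.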

Once you have thoroughly tested your numBetween function, define an additional function named showYears that takes three inputs: a start year, an end year, and a tweet list. It should display the number of tweets in each year in that range, followed by a blank line and a line showing the total number of tweets. For example, the call showYears(21, 23, sampleTweets) might produce the following:
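One way to build showYears on top of numBetween is to ask for the count between "1/1/YY" and "12/31/YY" for each year in the range. The sketch repeats compact versions of the hypothetical dateKey and numBetween so it runs on its own; the column index and the exact output layout (year, count, then "Total:") are assumptions, since the sample output is not shown here.

```python
DATE_COL = 0  # hypothetical column holding dates such as "3/4/21"

def dateKey(date):
    month, day, year = date.split("/")
    return (int(year), int(month), int(day))

def numBetween(startDate, endDate, tweets):
    low, high = dateKey(startDate), dateKey(endDate)
    return sum(1 for r in tweets if low <= dateKey(r[DATE_COL]) <= high)

def showYears(startYear, endYear, tweets):
    """Display the tweet count for each year in the range, then the total."""
    total = 0
    for year in range(startYear, endYear + 1):
        # Every date in a year lies between 1/1 and 12/31, inclusive.
        count = numBetween("1/1/" + str(year), "12/31/" + str(year), tweets)
        print(year, count)
        total += count
    print()
    print("Total:", total)
```

Reusing numBetween keeps the code short at the cost of one pass over the tweet list per year, which is inexpensive for a range of only a few years.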

PART 3: `allHours`

In class, we wrote a function named allDays that had a single input, a structured list of tweet data, and displayed tweet counts for each day of the week. You are to write a function named allHours that similarly displays tweet counts for every hour of the day (ranging from 0 to 23). For example:

Note that the output should appear as above, with the data aligned in columns and a heading at the top. Also, recall that times in the tweet file are in Coordinated Universal Time (UTC). You should adjust those times to the Pacific time zone (eight hours earlier) when collecting your statistics.
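A possible sketch of allHours, assuming (hypothetically) that index 1 of each record holds a UTC time such as "14:05:00". Subtracting eight and taking the result modulo 24 wraps early-morning UTC hours back into the previous Pacific day; in Python, (3 - 8) % 24 evaluates to 19. The column widths are a guess at the aligned layout described above.

```python
TIME_COL = 1  # hypothetical column holding "HH:MM:SS" times in UTC

def pacificHour(utcTime):
    """Convert an "HH:MM:SS" UTC time to the hour (0-23) eight hours earlier."""
    hour = int(utcTime.split(":")[0])
    return (hour - 8) % 24      # % wraps negative hours around, e.g. -5 -> 19

def allHours(tweets):
    """Display the number of tweets posted during each Pacific-time hour."""
    counts = [0] * 24           # counts[h] = tweets posted during hour h
    for record in tweets:
        counts[pacificHour(record[TIME_COL])] += 1
    print(f"{'Hour':>4}  {'Tweets':>6}")
    for hour in range(24):
        print(f"{hour:>4}  {counts[hour]:>6}")
```

The fixed eight-hour shift follows the assignment's simplification and ignores daylight saving time. A shift may also move a tweet to the previous calendar day, but that does not affect hourly counts.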

PART 4: `mostFrequent`

In class, we wrote a function named inTweets that counted how many tweets contained a specific word. A related task would be to determine which word appears most often in all of the tweets. Define a function named mostFrequent that takes a tweet list as input and returns a pair (a two-element list) containing the word that appeared most often and its corresponding count. For example, assuming that the list firstFive contained the five tweets listed above, the call mostFrequent(firstFive) should return ['out', 2].

Since the tweets contain a very large number of words in total, you will need to implement this function efficiently. You should create a dictionary that keeps count of the words encountered, similar to the wordCounts example from class. You will need to traverse the tweets, extract each word (removing punctuation), and keep counts in the dictionary. Once the dictionary is complete, you can traverse it and find the word with the largest count.
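The two-pass approach described above might look like the following sketch. It assumes, hypothetically, that the tweet text is at index 2 of each record. Stripping string.punctuation only from the ends of each word keeps contractions such as "don't" intact, and lowercasing (so "Out" and "out" count together) is a design choice the assignment does not specify.

```python
import string

TEXT_COL = 2  # hypothetical column holding the tweet text

def mostFrequent(tweets):
    """Return [word, count] for the most common word across all tweets."""
    counts = {}
    # First pass: tally every cleaned-up word in a dictionary.
    for record in tweets:
        for word in record[TEXT_COL].split():
            word = word.strip(string.punctuation).lower()
            if word != "":
                counts[word] = counts.get(word, 0) + 1
    # Second pass: scan the dictionary for the largest count.
    bestWord, bestCount = None, 0
    for word in counts:
        if counts[word] > bestCount:
            bestWord, bestCount = word, counts[word]
    return [bestWord, bestCount]
```

Each dictionary update takes roughly constant time, so the whole function makes a single pass over the words plus one pass over the distinct words, rather than rescanning every tweet for each candidate word. With strict `>`, ties go to the word encountered first.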

Save your functions in a file named lastnameTweets.py, where lastname is your last name. Your file should have a comment block at the top that includes your name and a brief description of the file, and each function should have a docstring that describes its behavior. Be careful to name the functions exactly as specified (including the order of the inputs). Feel free to implement additional helper functions as needed.
