Spring 2004

HW4: Sorting and Efficiency

For this assignment, you will experiment with two different sorting algorithms, selection sort and merge sort.
To begin, you will need to copy the file Sorts.h, which contains definitions of the
`SelectionSort` and `MergeSort` functions.

- Write a main program named `sortnums.cpp` that can be used to compare the performance of these two algorithms. Your program should prompt the user for the number of items to be sorted, then construct a vector of that many random integers (similar to the examples in class). It should then perform each of the sorts on that vector and time how long each takes. The time for each sort should be displayed in a readable format.

- Perform 3 different executions of your program on each of the following list sizes: 500 items, 1000 items, 2000
items, and 4000 items. Report your timings in a table such as the one below.
                     500 items   1000 items   2000 items   4000 items
    SELECTION SORT:
      1st timing:
      2nd timing:
      3rd timing:

    MERGE SORT:
      1st timing:
      2nd timing:
      3rd timing:

- For each sorting algorithm, how consistent are the timings? That is, when you run the algorithm on three lists
of the same size, are the timings the same? Should they be? Explain your answer, and describe any patterns
concerning the consistency of the timings.

- Recall that selection sort is an O(N^2) algorithm, whereas merge sort is O(N log N). This means that it takes "on the order of" N^2 operations to sort a list of N items using selection sort, whereas it takes "on the order of" N log_2 N operations using merge sort. As such, merge sort tends to be the faster sort. Do your timings support this claim?

- Recall that for an O(N^2) algorithm such as selection sort, the number of operations grows in proportion to the square of the list size. So, if you double the size of the list, the number of operations increases by a factor of 2^2 = 4. In contrast, for an O(N log N) algorithm such as merge sort, doubling the list size increases the number of operations by a factor only slightly larger than 2. Do your timings demonstrate these rates of growth? Explain.

- One attractive property of insertion sort is that it performs more efficiently when the list is already
partially sorted. Augment your program so that it will allow you to test whether
selection sort and/or merge sort share this property. In
particular, your modified program should time each sort on the same random list (as before), then retime them on the
sorted version of the list. Is there a significant improvement when the list is already sorted? Would you expect
there to be any? Explain your answers.

- If you look closely at the implementations of the sorting algorithms, you will note that there are two different
manners in which the algorithms access the vector: comparisons and swaps. For example, selection sort repeatedly
traverses the vector, comparing each data value with the smallest so far, and then swaps the smallest value into the
correct position. Likewise, when merge sort merges two sorted lists, it repeatedly compares values from the lists
and swaps the smallest into a temporary array (and eventually back into the original array). When dealing with
integer values, the costs of comparing and swapping are roughly comparable. If the data to be compared/assigned is
complex, however, the time required to swap values can far exceed the time required for comparisons. For example, if
the vector contained long strings, you would expect the time required for swaps to be more significant.
Write a second main program named `sortwords.cpp` that is similar to `sortnums.cpp` except that it sorts vectors of random strings. After prompting the user for the size of the vector, the program should generate random letter sequences of length 20 (although this length should be easily changeable) and sort them using selection sort and merge sort. Perform repeated executions of your program to complete a table similar to the one in part 2. Which of the algorithms is most affected by increasing the size of the objects being sorted? Does this make sense when you consider the number of comparisons vs. swaps for each algorithm? Justify your answer.