Name: _________________________________________

This assignment will demonstrate the use of computer programs as experimental tools
for solving problems. Utilizing the `SequenceGenerator` class, you
will perform experiments with word distributions and attempt to estimate the
number of words with certain properties.

The `SequenceGenerator` class
contains a method named `randomSequence` that generates a random sequence of letters. The
method takes one parameter, the length of the sequence to be generated, and returns a random
letter sequence of that length.
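The following is a minimal sketch of how a method like `randomSequence` might work. It is not the provided `SequenceGenerator` source. The class name `LetterSequenceSketch`, the use of `java.util.Random`, and the assumption that every letter is equally likely are all illustrative choices. The alphabet is stored as a field, so the same sketch can also mimic the String-parameter constructor used later in the assignment.

```java
import java.util.Random;

/**
 * Hypothetical sketch of a random letter-sequence generator.
 * NOT the provided SequenceGenerator class; shown only to illustrate the idea.
 */
class LetterSequenceSketch {
    private final String alphabet;          // letters that sequences may contain
    private final Random rng = new Random();

    /** Assumed default: all 26 lowercase letters. */
    LetterSequenceSketch() {
        this("abcdefghijklmnopqrstuvwxyz");
    }

    /** Restricts sequences to the given letters (mimics a String-parameter constructor). */
    LetterSequenceSketch(String alphabet) {
        this.alphabet = alphabet;
    }

    /**
     * Builds a sequence by picking each letter independently and uniformly from the alphabet.
     * For seqLength <= 0 the loop body never runs, so this sketch returns an empty string.
     */
    String randomSequence(int seqLength) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < seqLength; i++) {
            sb.append(alphabet.charAt(rng.nextInt(alphabet.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LetterSequenceSketch gen = new LetterSequenceSketch();
        System.out.println(gen.randomSequence(3));  // prints three random lowercase letters
    }
}
```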

EXERCISE 1: Load this class into the BlueJ IDE and create a `SequenceGenerator` object (using the first constructor, the one with no parameter). Generate 10 random sequences of 3 letters and list them below. Were any of the 10 sequences real English words? Would you expect them to be?

EXERCISE 2: Similarly, generate 10 random sequences of 4 letters and list them below. Were any of the 10 sequences real English words? Would you expect them to be? Would you expect a random 4-letter sequence to be more or less likely than a random 3-letter sequence to form a word? Explain.

EXERCISE 3: Generating one sequence at a time can make experimentation with a `SequenceGenerator` tedious. It would be easier to have another method that automates the process of generating and displaying multiple sequences. The following method has two parameters: the number of sequences to generate and their length. Add this method to the source code for the `SequenceGenerator` class, placing it below the existing methods (but above the closing curly brace).

```java
/**
 * Displays a set number of random letter sequences of the specified length
 * @param numSequences the number of sequences to generate & display
 * @param seqLength the number of letters in the random sequences
 */
public void displaySequences(int numSequences, int seqLength) {
    int wordsPerLine = 40 / seqLength;
    int sequencesSoFar = 0;
    while (sequencesSoFar < numSequences) {
        System.out.print(this.randomSequence(seqLength) + " ");
        sequencesSoFar = sequencesSoFar + 1;
        if (sequencesSoFar % wordsPerLine == 0 || sequencesSoFar == numSequences) {
            System.out.println();
        }
    }
}
```

Compile the modified class, create a `SequenceGenerator` object, and call your new method. Verify that this method behaves as described.

EXERCISE 4: What would you expect to happen if you executed the `displaySequences` method and entered 0 or a negative number for the number of sequences to be generated? What about 0 or a negative number for the length of each sequence? Verify your answers.

Using a `SequenceGenerator` object, you can perform some interesting
experiments. In particular, you can use the object as a tool for verifying or
disproving hypotheses about word distributions and for generating further data for
analysis. First, consider the total number of unique 4-letter sequences that can
be generated. Since each of the four positions in a sequence can be any of the
26 letters, there are $26^4 = 456{,}976$ different
sequences. Clearly, not all of these sequences form real words. The question
arises: how many random 4-letter sequences would you expect to have to generate
before you obtain a word?
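The count above is the alphabet size raised to the sequence length. A small Java helper makes the arithmetic easy to check for other lengths or alphabets; the names `SequenceCount` and `countSequences` are hypothetical and not part of the provided class.

```java
/** Hypothetical helper: counts the distinct sequences of a given length over an alphabet. */
class SequenceCount {
    static long countSequences(int alphabetSize, int length) {
        long total = 1;
        for (int i = 0; i < length; i++) {
            total *= alphabetSize;  // each position independently has alphabetSize choices
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countSequences(26, 3));  // 17576
        System.out.println(countSequences(26, 4));  // 456976
    }
}
```

Using `long` keeps the product exact for the sizes in this assignment; much longer sequences would eventually overflow.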

It so happens that there are approximately 1,780 four-letter words in the English
language (according to a popular online dictionary). Thus, if you generate a
random 4-letter sequence, there is a 1,780 in 456,976 chance that it will be a
word. Since 1,780/456,976 is approximately 1/256, you might expect 1 out of every 256
sequences to be a word. Put another way, since 1/256 is approximately 0.39%, you might
expect roughly 4 out of every 1,000 random 4-letter sequences to be words. This
number can be verified experimentally using a `SequenceGenerator` object.
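The probability estimate can be checked the same way. This sketch (with hypothetical helper names) divides the approximate dictionary count of 1,780 by the 456,976 possible sequences and scales the result to 1,000 trials. The exact ratio works out to roughly 1 in 257, which the text rounds to 1/256.

```java
/** Hypothetical helpers for the "how many words should I expect?" arithmetic. */
class ExpectedWords {
    /** Probability that a uniformly random sequence is a word. */
    static double wordProbability(long wordCount, long totalSequences) {
        return (double) wordCount / totalSequences;
    }

    /** Expected number of words among the given number of random sequences. */
    static double expectedWords(long wordCount, long totalSequences, long trials) {
        return trials * wordProbability(wordCount, totalSequences);
    }

    public static void main(String[] args) {
        long total = 26L * 26 * 26 * 26;          // 456,976 four-letter sequences
        double p = wordProbability(1780, total);  // about 0.0039
        System.out.printf("p = %.4f%%, about 1 in %.0f%n", 100 * p, 1 / p);
        System.out.printf("expected words per 1,000: %.2f%n", expectedWords(1780, total, 1000));
    }
}
```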

EXERCISE 5: Use a `SequenceGenerator` object to generate 1,000 random 4-letter sequences and count how many English words you obtain. List that number below.

Hint: Generating and counting 1,000 sequences all at once can be tedious and can lead to counting oversights. Instead, generate the sequences in 10 groups of 100, clearing the output screen after each call. Scanning 100 sequences for words takes only a few seconds.

Is the number you obtained close (relatively speaking) to the expected value of 4? Calculate the relative error using the formula:

$$\text{relative error} = \frac{|\text{experimental result} - \text{expected value}|}{\text{expected value}}$$

For example, if you obtained only 2 words, the relative error would be $|2-4|/4 = 2/4 = 0.5 = 50\%$. Show your data and calculations.
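The relative error formula translates directly into code. The helper below is a hypothetical convenience for checking your hand calculations, not something the assignment requires.

```java
/** Hypothetical helper implementing the relative error formula. */
class RelativeError {
    /** Returns |experimental - expected| / expected as a fraction (0.5 means 50%). */
    static double relativeError(double experimental, double expected) {
        return Math.abs(experimental - expected) / expected;
    }

    public static void main(String[] args) {
        // Worked example from the text: 2 words observed vs. 4 expected.
        System.out.println(relativeError(2, 4));  // 0.5
    }
}
```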

Generate another 1,000 sequences and recalculate the relative error with your updated numbers. For example, if your second set of 1,000 sequences produced 5 words, you would have a total of 7 words with an expected number of 8. This yields a relative error of $|7-8|/8 = 1/8 = 0.125 = 12.5\%$. Show your data and calculations.

If your relative error is still greater than 10%, continue generating sequences in sets of 1,000 until the relative error is less than 10% (or until you have generated a total of 4,000 sequences). Show your data and calculations.
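The stopping rule above (accumulate batches of 1,000 until the error is under 10% or 4,000 sequences have been counted) can be sketched as a loop over your per-batch tallies. The class and method names are hypothetical, and the counts in `main` are the 2 and 5 from the worked example, not real data.

```java
/** Hypothetical helper that tracks the running relative error across batches. */
class CumulativeError {
    /**
     * Walks through per-batch word counts (each batch = 1,000 sequences), printing the
     * running total and relative error. Stops once the error is below 10% or after the
     * fourth batch (4,000 sequences). Returns the number of batches used.
     */
    static int runningError(int[] wordsPerBatch, double expectedPerBatch) {
        int totalWords = 0;
        int batches = 0;
        for (int count : wordsPerBatch) {
            totalWords += count;
            batches++;
            double expected = expectedPerBatch * batches;
            double error = Math.abs(totalWords - expected) / expected;
            System.out.printf("%,d sequences: %d words, expected %.0f, error %.1f%%%n",
                    batches * 1000, totalWords, expected, 100 * error);
            if (error < 0.10 || batches == 4) {
                break;
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        // Counts from the worked example (2, then 5); replace with your own tallies.
        runningError(new int[] {2, 5}, 4.0);
    }
}
```

With those two counts the loop reports errors of 50.0% and 12.5% and returns 2, since neither batch met the 10% threshold.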

Part of the blame for the scarcity of words among randomly generated sequences falls on letters such as 'q' and 'z'. Since these letters are used so infrequently in English, their inclusion in a random sequence of letters makes a real word extremely unlikely. If we exclude letters such as these, however, we can improve the chances of generating words considerably. For example, the 10 letters that appear most frequently in English text are "etaoinshrd". Random sequences of these letters should therefore be more likely to produce words.

EXERCISE 6: Using the alternate constructor (with a String parameter), create a `SequenceGenerator` that is limited to the letters "etaoinshrd". Generate 1,000 random 4-letter sequences of these letters and count how many English words you obtain. List that number below. Then generate another 1,000 random 4-letter sequences and list the number of words.

Are the two counts you obtained relatively close to each other (e.g., within 10% relative error)? If not, generate 2,000 more random sequences and average the counts from all of your trials.
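If you need to average several trials, or want a number for how close two counts are, the arithmetic is short. Measuring closeness against the mean of the two counts is an assumption made here, since the exercise does not say which count serves as the reference. The class and method names are hypothetical.

```java
/** Hypothetical helpers for comparing and averaging trial counts. */
class TrialAverage {
    /** Mean of the word counts from several trials of 1,000 sequences each. */
    static double average(int[] counts) {
        double sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum / counts.length;
    }

    /** |a - b| measured relative to the mean of a and b; returns 0 if both counts are 0. */
    static double relativeDifference(int a, int b) {
        double mean = (a + b) / 2.0;
        if (mean == 0) {
            return 0.0;
        }
        return Math.abs(a - b) / mean;
    }

    public static void main(String[] args) {
        // Pass your own trial counts as command-line arguments (at least two).
        if (args.length < 2) {
            System.out.println("Usage: java TrialAverage <count1> <count2> [more counts...]");
            return;
        }
        int[] counts = new int[args.length];
        for (int i = 0; i < args.length; i++) {
            counts[i] = Integer.parseInt(args[i]);
        }
        System.out.printf("relative difference of first two = %.1f%%%n",
                100 * relativeDifference(counts[0], counts[1]));
        System.out.printf("average of all trials = %.1f%n", average(counts));
    }
}
```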

Using your experimental results from the previous exercise, you should now be able to estimate the number of 4-letter words in the English language that use only the letters in "etaoinshrd". The following general formula applies:

$$\text{number of words} = (\text{number of possible sequences}) \times (\text{chance that a random sequence is a word})$$

The total number of 4-letter sequences that use only 10 letters is $10^4 = 10{,}000$.
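Putting the formula into code makes the structure of the estimate explicit: the number of possible sequences times the observed fraction of sequences that were words. As a sanity check, feeding in the 26-letter figures from earlier (1,780 words among 456,976 sequences) gives back 1,780. The helper name `estimateWords` is hypothetical.

```java
/** Hypothetical helper for estimating how many words exist over a restricted alphabet. */
class WordEstimate {
    /**
     * Estimated number of words = (number of possible sequences)
     *                           * (fraction of random sequences that were words).
     */
    static double estimateWords(int alphabetSize, int length, int wordsFound, int sequencesGenerated) {
        double possibleSequences = Math.pow(alphabetSize, length);
        double chanceIsWord = (double) wordsFound / sequencesGenerated;
        return possibleSequences * chanceIsWord;
    }

    public static void main(String[] args) {
        // Sanity check with the 26-letter figures from earlier in the assignment.
        System.out.printf("%.0f%n", estimateWords(26, 4, 1780, 456976));  // 1780
    }
}
```

For Exercise 7, the inputs would be an alphabet size of 10, a length of 4, your total word count, and the total number of sequences you generated in Exercise 6.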

EXERCISE 7: Using the numbers you obtained in EXERCISE 6, estimate the number of 4-letter words in the English language that use only the letters in "etaoinshrd". Show your work in obtaining your estimate.