CSC 550: Introduction to Artificial Intelligence
Fall 2008

HW4: Machine Learning & Decision Trees

  1. Consider the following classification examples:

    animal    mammal?    can fly?    carnivore?

    On paper, show the steps performed by the ID3 algorithm in building a decision tree that classifies these examples. That is, show the calculations used to determine which properties to split on, in the appropriate order. When you have completed your tree on paper, enter the rules into the appropriate file formats and use the Decision Tree Applet to verify your tree. Recall: you will need to set the splitting function to Gain before running the algorithm.
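
    As a way to check the hand calculations, the Gain computation can be sketched in a few lines of Python. This is a minimal sketch, not the Decision Tree Applet's code: the function names `entropy` and `information_gain` are illustrative, and the `toy` data is hypothetical and unrelated to the table above. It assumes Gain is the entropy of the class labels minus the weighted entropy remaining after the split.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(examples, prop, target):
    """Entropy of the target minus the weighted entropy after splitting on prop."""
    base = entropy([ex[target] for ex in examples])
    remainder = 0.0
    for value in {ex[prop] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[prop] == value]
        remainder += (len(subset) / len(examples)) * entropy(subset)
    return base - remainder


# Hypothetical toy data, NOT the examples from this assignment.
toy = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

# Compute the gain of each candidate property and pick the largest.
gains = {p: information_gain(toy, p, "play") for p in ("outlook", "windy")}
best = max(gains, key=gains.get)
```

    In this toy data, "outlook" separates the classes perfectly and "windy" not at all, so the first split would be on "outlook". ID3 then repeats the same calculation on each resulting subset with the remaining properties.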

  2. The Mushroom data files define a database of 8142 training samples for identifying edible and poisonous mushrooms (taken from the UC Irvine Machine Learning Repository). Using the Decision Tree Applet, construct a decision tree using Gain as the splitting function. Either print a screenshot of the resulting tree, or sketch the tree in sufficient detail to make clear the order of the splits.

    One technique commonly used to avoid overspecialization is to divide the examples into two sets: a training set used to construct the decision tree and a testing set used to subsequently test the constructed tree. The Decision Tree Applet allows you to specify the size of the testing set and will then randomly select the examples. Reconstruct a decision tree using a testing set whose size is 50% of the total examples. Do you obtain the same tree?
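
    The random division into training and testing sets can be sketched as follows. This is an illustration of the general idea only; how the Decision Tree Applet selects its testing set internally is not documented here, and `split_examples`, `test_fraction`, and `seed` are hypothetical names.

```python
import random


def split_examples(examples, test_fraction=0.5, seed=None):
    """Randomly divide examples into (training, testing) lists."""
    rng = random.Random(seed)      # a seed makes the split repeatable
    shuffled = list(examples)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical stand-in data: ten example identifiers.
examples = list(range(10))
train, test = split_examples(examples, test_fraction=0.5, seed=0)
```

    Because the selection is random, a different seed (or no seed) generally yields a different training set, which is one reason the reconstructed tree may differ from the tree built on all of the examples.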

  3. Identify a new data set of your own choosing and use the Decision Tree Applet to extract patterns from the data. The data set must involve real-world data and be sufficiently large (say, at least 25 examples involving at least 5 properties) to allow for identifying non-trivial patterns.