Decision Trees

Decision trees are a type of recursive partitioning algorithm that classifies observations in a dataset by splitting them into sub-groups based on the values of independent variables. Each split is typically dichotomous (for example, "is age above 30?"), dividing a group in two. The algorithm is "recursive" because each sub-group can itself be split into further sub-groups, until the process terminates when a pre-defined stopping criterion is met, such as a maximum depth or a minimum group size.

The decision tree starts with a node called the "root."
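The recursive splitting described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it assumes numeric features, threshold-based binary splits, and Gini impurity as the splitting measure, and the names gini, best_split, build_tree, and predict are hypothetical helpers invented for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) pair that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must actually separate the group
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3, min_samples=2):
    """Recursively partition the data; the outermost call creates the root node."""
    split = None
    if depth < max_depth and len(rows) >= min_samples:
        split = best_split(rows, labels)
    if split is None:
        # Stopping criterion met, or no split helps: predict the majority class
        return {"leaf": True, "prediction": Counter(labels).most_common(1)[0][0]}
    feature, threshold = split
    left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    right = [i for i, row in enumerate(rows) if row[feature] > threshold]
    return {
        "leaf": False,
        "feature": feature,
        "threshold": threshold,
        # Each sub-group is split again by the same procedure (the recursion)
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth, min_samples),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth, min_samples),
    }


def predict(tree, row):
    """Walk from the root down to a leaf and return that leaf's prediction."""
    while not tree["leaf"]:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["prediction"]


# Made-up toy data: two numeric features per observation, two classes
rows = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [6.0, 1.0], [7.0, 3.0], [8.0, 2.0]]
labels = ["a", "a", "a", "b", "b", "b"]
tree = build_tree(rows, labels)
print(tree["feature"], tree["threshold"], predict(tree, [2.5, 6.0]))
```

Here max_depth and min_samples play the role of the pre-defined stopping criteria. Other impurity measures and stopping rules are equally valid choices.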

Strengths of decision trees include...

A prominent weakness of decision trees is their tendency to overfit the data.

 

RESOURCES

- Decision Trees in R


Overfitting

Overfitting is what happens when a model pays too much attention to the nuances of a specific sample of data and thereby becomes less useful when applied to new data. Imagine an alien species landing in rural Madagascar. Based on their interactions with the people there, the aliens construct a "model" for how to interact with humans. Now imagine the aliens attempt to apply that model to citizens of Rome. If they made their model too specific (i.e. overfit it) in Madagascar by, say, assuming that all humans speak Malagasy and like Romazava, the model wouldn't work very well in Rome, or in New York City or rural China, for that matter. The aliens have overfit their model and would have done better to rely on generalities only (e.g. humans don't like to be hit, humans need to eat at semi-regular intervals).
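The same effect can be shown with a small, fully made-up dataset. The sketch below is only an illustration under stated assumptions: the data are invented, the true rule is assumed to be "label 1 when x is above 5.5," and two training labels (at x = 3 and x = 8) are deliberately wrong to stand in for sample-specific quirks. The "memorizer" is a nearest-neighbour lookup standing in for an overfit model, and the "stump" is a single-split tree; fit_memorizer, fit_stump, and accuracy are hypothetical names.

```python
def accuracy(model, data):
    """Fraction of (x, label) pairs that the model predicts correctly."""
    return sum(model(x) == label for x, label in data) / len(data)


def fit_memorizer(train):
    """Overfit model: copy the label of the closest training point, quirks included."""
    def model(x):
        _, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
        return nearest_label
    return model


def fit_stump(train):
    """General model: one threshold, chosen to maximise training accuracy."""
    xs = sorted(x for x, _ in train)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = max(candidates, key=lambda t: accuracy(lambda x: int(x > t), train))
    return lambda x: int(x > best)


# Training sample: labels at x = 3 and x = 8 are noise (they break the true rule)
train = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 0),
         (6, 1), (7, 1), (8, 0), (9, 1), (10, 1)]
# New data that follows the true rule exactly
test = [(1.4, 0), (2.8, 0), (3.3, 0), (4.6, 0), (5.2, 0),
        (5.9, 1), (6.7, 1), (7.8, 1), (8.4, 1), (9.6, 1)]

memorizer = fit_memorizer(train)
stump = fit_stump(train)

print("memorizer train/test:", accuracy(memorizer, train), accuracy(memorizer, test))
print("stump     train/test:", accuracy(stump, train), accuracy(stump, test))
```

On this toy data the memorizer is perfect on the training sample but does worse on the new data, while the simpler stump gives up some training accuracy and generalizes better. That gap between training and new-data performance is the usual symptom of overfitting.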

