In this article, we will explain to you how to make a "Decision Tree". A Decision Tree is a flowchart that facilitates decision-making based on past experiences.

Machine Learning - Decision Tree

In this article, a person will try to decide whether or not to purchase a specific product based on various factors, age, gender and income.

Age	Gender	Income	Purchase
25	Male	35000	No
30	Female	30000	Yes
35	Female	32000	Yes
20	Male	40000	Yes
40	Male	45000	Yes
45	Female	28000	No
55	Male	55000	Yes
50	Female	26000	No
22	Male	42000	Yes
38	Female	58000	Yes

Now, based on this data set, Python can create a decision tree that can be used to decide if any new product will be purchased.

How Does it Work?

First, read the dataset with pandas:

import pandas

df = pandas.read_csv("data.csv")

print(df)

To make a decision tree, all data has to be numerical.

We have to convert the non numerical columns 'Gender' and 'Purchase' into numerical values.

Pandas has a map() method that takes a dictionary with information on how to convert the values.

d = {'Female': 0, 'Male': 1}

Means convert the values 'Female' to 0, 'Male' to 1.

d = {'Female': 0, 'Male': 1}
df['Gender'] = df['Gender'].map(d)
d = {'Yes': 1, 'No': 0}
df['Purchase'] = df['Purchase'].map(d)

print(df)

Then we have to separate the feature columns from the target column.

The feature columns are the columns that we try to predict from, and the target column is the column with the values we try to predict.

X is the feature columns, Y is the target column:

features = ['Age', 'Gender', 'Income']

X = df[features]
Y = df['Purchase']

print(X)
print(Y)

Now we can create the actual decision tree, fit it with our details. Start by importing the modules we need:

Create and display a Decision Tree:

import sys
import matplotlib
matplotlib.use('Agg')
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

def main():
    # Convert data to DataFrame
    df = pd.read_csv("data.csv")

    # Map categorical variables to numerical values
    d = {'Female': 0, 'Male': 1}
    df['Gender'] = df['Gender'].map(d)
    d = {'Yes': 1, 'No': 0}
    df['Purchase'] = df['Purchase'].map(d)

    # Define features and target variable
    features = ['Age', 'Gender', 'Income']
    X = df[features]
    Y = df['Purchase']

    # Train Decision Tree Classifier
    dtree = DecisionTreeClassifier()
    dtree = dtree.fit(X, Y)

    # Plot decision tree
    plt.figure(figsize=(10, 7))
    plot_tree(dtree, feature_names=features, filled=True)
    plt.savefig("decision_tree.png")
    plt.show()

if __name__ == "__main__":
    main()

Predict Values

We can use the Decision Tree to predict new values.

Example: Should I buy product with;

Use predict() method to predict new values:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree

def main():
    # Convert data to DataFrame
    df = pd.read_csv("data.csv")

    # Map categorical variables to numerical values
    d = {'Female': 0, 'Male': 1}
    df['Gender'] = df['Gender'].map(d)
    d = {'Yes': 1, 'No': 0}
    df['Purchase'] = df['Purchase'].map(d)

    # Define features and target variable
    features = ['Age', 'Gender', 'Income']
    X = df[features]
    Y = df['Purchase']

    # Train Decision Tree Classifier
    dtree = DecisionTreeClassifier()
    dtree = dtree.fit(X, Y)

    # Predict
    prediction = dtree.predict([[30, 1, 50000]])
    print("Prediction:", prediction)

    print("[1] means 'Yes'")
    print("[0] means 'No'")

if __name__ == "__main__":
    main()

What would the answer be if the income lower than 20000?

print(dtree.predict([[50, 0, 20000]]))

INFO

Different Results

You will see that the Decision Tree gives you different results if you run it enough times, even if you feed it with the same data.

That is because the Decision Tree does not give us a 100% certain answer. It is based on the probability of an outcome, and the answer will vary.

Github Repo