5. Validate and evaluate results

5.1. Definition

To evaluate the performance of the algorithm, we use several measures:

5.1.1. Accuracy

The accuracy formula is the following:

\[\text{Accuracy} = { \text{Number of labels predicted correctly} \over \text{Total number of predictions} }\]

In more detail, the accuracy can be expressed as:

\[\text{Accuracy} = { \text{True Positives} + \text{True Negatives} \over \text{True Positives} + \text{True Negatives} + \text{False Positives} + \text{False Negatives} }\]

Warning

Accuracy is not a good measure for skewed classes. For example, since the number of POIs is small compared to non-POIs (18 vs 128 in our case), we can obtain a high accuracy by classifying many non-POIs correctly while still not classifying a single POI correctly. We therefore use other measures to account for this.
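
To illustrate this warning, here is a minimal sketch using hypothetical toy labels built from the 18 POI / 128 non-POI split above: a "classifier" that never flags anyone as a POI still reaches a high accuracy.

    # Toy labels only (assumed 18 POIs, 128 non-POIs); not the project data itself.
    import numpy as np
    from sklearn.metrics import accuracy_score

    y_true = np.array([1] * 18 + [0] * 128)   # 1 = POI, 0 = non-POI
    y_pred = np.zeros_like(y_true)            # always predict non-POI

    print(accuracy_score(y_true, y_pred))     # ~0.88, yet no POI is ever identified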

5.1.2. Precision

\[\text{Precision} = { \text{True Positives} \over \text{True Positives} + \text{False Positives} }\]

A good Precision means that the number of False Positives is low compared to the True Positives. It means that we can trust the algorithm when it flags a POI.

5.1.3. Recall

\[\text{Recall} = { \text{True Positives} \over \text{True Positives} + \text{False Negatives} }\]

A good Recall means that the number of False Negatives is low compared to the True Positives. It means that the algorithm misses few actual POIs, so we can trust it when it flags someone as a non-POI.

5.1.4. F1 score

The F1 score is a combined measure of a test's accuracy, defined as the harmonic mean of precision and recall:

\[\text{F1 score} = { 2 \times \text{Precision} \times \text{Recall} \over \text{Precision} + \text{Recall} }\]

A good F1 score means that both false positives and false negatives are low, so I can identify POIs reliably and accurately.
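
Continuing the same hypothetical toy example used for accuracy, the sketch below shows how precision, recall and F1 expose the problem that a high accuracy hides: all three drop to zero for the "always non-POI" classifier.

    # Same toy labels as in the accuracy sketch above.
    import numpy as np
    from sklearn.metrics import precision_score, recall_score, f1_score

    y_true = np.array([1] * 18 + [0] * 128)   # 1 = POI, 0 = non-POI
    y_pred = np.zeros_like(y_true)            # always predict non-POI

    print(precision_score(y_true, y_pred, zero_division=0))  # 0.0 (no POI is ever flagged)
    print(recall_score(y_true, y_pred, zero_division=0))     # 0.0 (every POI is missed)
    print(f1_score(y_true, y_pred, zero_division=0))         # 0.0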

5.2. Test execution

Running the test script generates the following result:

GaussianNB(priors=None, var_smoothing=1e-09)
        Accuracy: 0.85393       Precision: 0.48327      Recall: 0.32500 F1: 0.38864     F2: 0.34778
        Total predictions: 14000        True positives:  650    False positives:  695   False negatives: 1350   True negatives: 11305
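
As a quick sanity check (plain arithmetic on the counts reported above, not part of the project code), the metrics can be recomputed from the confusion counts:

    # Confusion counts taken from the tester output above.
    tp, fp, fn, tn = 650, 695, 1350, 11305

    accuracy  = (tp + tn) / (tp + tn + fp + fn)                # 0.85393
    precision = tp / (tp + fp)                                 # 0.48327
    recall    = tp / (tp + fn)                                 # 0.32500
    f1        = 2 * precision * recall / (precision + recall)  # 0.38864

    print(accuracy, precision, recall, f1)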

5.3. Conclusion

The selected algorithm, based on GaussianNB, generates the following performance measures:

  • accuracy score: 0.86

  • precision score: 0.49

  • recall score: 0.32

  • f1 score: 0.37

Clearly:

  • I have around 51% error on predicted POIs: about half of the individuals flagged as POIs are not actually POIs (1 - precision)

  • I have around 68% error on actual POIs: about two thirds of the real POIs are predicted as non-POIs, i.e. missed (1 - recall)

Given these low measures, I can say that my algorithm is not adequately trained. There are several areas for improvement:

  • Change the features list (replace and/or add features)

  • Run SelectKBest on all available features

  • Use another algorithm such as Decision Tree Classifier, Random Forest Classifier or K Nearest Neighbors (see the sketch below)
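
The sketch below illustrates the last two ideas. It is only a rough outline: it assumes feature and label arrays named features and labels have already been extracted from the project dataset (illustrative names, not the actual project variables), and k=5 is an arbitrary starting value, not a tuned choice.

    # Hedged sketch: `features` and `labels` are assumed to exist already,
    # and the scoring/CV setup is a simple default, not the project's tester protocol.
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.pipeline import Pipeline
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    candidates = {
        "decision_tree": DecisionTreeClassifier(random_state=42),
        "random_forest": RandomForestClassifier(random_state=42),
        "knn": KNeighborsClassifier(),
    }

    for name, clf in candidates.items():
        pipe = Pipeline([
            ("select", SelectKBest(f_classif, k=5)),  # feature selection step
            ("clf", clf),                             # candidate classifier
        ])
        scores = cross_val_score(pipe, features, labels, cv=5, scoring="f1")
        print(name, scores.mean())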