The Random Forest confusion matrix plot currently produced by ML_classification.py displays an additional set of negative ("0") and positive ("1") labels on the predicted axis (see attached image).
base2_model_cofunction.csv_mod.txt_RF_CM.pdf
In the RF_results output file, the Mean Balanced Confusion Matrix also shows extra header columns for a second negative and positive class:
Mean Balanced Confusion Matrix:
Class 0 1 0 1
0.0 4542.62 2905.38
1.0 4119.38 3328.62
The expected output should have only two categories on each axis, giving a total of 4 regions in the graph: True Negatives, False Positives, True Positives, and False Negatives.
A subset of the data can be used to run the pipeline and reproduce the graph format.
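For reference, the expected 2x2 layout can be sketched in a few lines of plain Python. This is only an illustration, not the pipeline's code: the helper names `binary_confusion_matrix` and `format_matrix` and the toy labels are hypothetical, and the sketch assumes rows are true classes and columns are predicted classes.

```python
from collections import Counter


def binary_confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Count (true, predicted) pairs; rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def format_matrix(matrix, labels=(0, 1)):
    """Render the matrix with exactly one header column per class label."""
    header = "Class\t" + "\t".join(str(label) for label in labels)
    rows = [
        f"{label}\t" + "\t".join(str(value) for value in row)
        for label, row in zip(labels, matrix)
    ]
    return "\n".join([header] + rows)


# Hypothetical toy labels, not taken from the subset data
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

cm = binary_confusion_matrix(y_true, y_pred)
# With labels (0, 1): [[TN, FP], [FN, TP]]
(tn, fp), (fn, tp) = cm
print(format_matrix(cm))
```

The header produced here has one column per class ("Class", "0", "1"), which matches the number of values in each row. That is the shape expected for both the results file and the plotted axes.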
subset_matrixdata_CM_GitIssue.csv
The incorrect graph format can be reproduced using the provided subset data and the commands below:
- ML_preprocess.py:
python /ML-Pipeline/ML_preprocess.py -df subset_matrixdata_CM_GitIssue.csv -sep ',' -onehot f
- test_set.py:
python /ML-Pipeline/test_set.py -df subset_matrixdata_CM_GitIssue.csv_mod.txt -sep ',' -type c -p 0.1 -save <test_set_file.txt>
- ML_classification.py:
python /ML-Pipeline/ML_classification.py -df subset_matrixdata_CM_GitIssue.csv_mod.txt -sep ',' -test <test_set_file.txt> -cl_train 1,0 -alg RF -cm t -plots t -n_jobs 8