from codeE.evaluation import ...
Functions to use for evaluation in a crowdsourcing scenario.
The available evaluation metrics/indicators are:
codeE.evaluation.accuracy_model(model, X_data, Z_data)
Quality evaluation of some predictive model over some set X and ground truth Z, based on accuracy:
$\mathrm{Accuracy} = \frac{Tp + Tn}{Tp + Tn + Fp + Fn}$
where Tp is the number of true positives, Tn is the number of true negatives, Fp is the number of false positives and Fn is the number of false negatives.
Parameters
- model: class of model with predictive function 'predict'
The predictive model of the ground truth.
- X_data: array-like of shape (n_samples, ...)
The input patterns. The ellipsis stands for the remaining dimensions, which depend on the data representation (images, text, audio, etc.).
- Z_data: array-like of shape (n_samples,)
The ground truth of the data in class format.
Returns
- acc: float
The accuracy on the set, value between 0 and 1.
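As a rough illustration of the quantity described above, the sketch below computes the fraction of correct predictions with NumPy. It is not the library's implementation: `MajorityModel` and `accuracy_sketch` are hypothetical names introduced here, and the arg-max step for 2-D outputs is an assumption about models that return per-class scores.

```python
import numpy as np

class MajorityModel:
    """Hypothetical stand-in for any model exposing `predict`."""
    def fit(self, X, Z):
        # Remember the most frequent class in the training labels.
        self.majority_ = np.bincount(Z).argmax()
        return self

    def predict(self, X):
        # Predict the majority class for every input pattern.
        return np.full(len(X), self.majority_)

def accuracy_sketch(model, X_data, Z_data):
    """Fraction of samples whose predicted class equals the ground truth."""
    Z_pred = np.asarray(model.predict(X_data))
    if Z_pred.ndim > 1:
        # Assumption: 2-D outputs are per-class scores, so take the arg-max.
        Z_pred = Z_pred.argmax(axis=-1)
    return float(np.mean(Z_pred == np.asarray(Z_data)))

X = np.zeros((5, 2))
Z = np.array([0, 0, 0, 1, 2])
model = MajorityModel().fit(X, Z)
acc = accuracy_sketch(model, X, Z)
print(acc)  # 3 of 5 labels are class 0, so this prints 0.6
```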
import numpy as np
N = 100
K = 8
X = np.random.rand(N,5)
Z = np.random.randint(K, size=(N,))
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X,Z)
from codeE.evaluation import accuracy_model
accuracy_model(model, X, Z)
codeE.evaluation.f1score_model(model, X_data, Z_data, mode='macro')
Quality evaluation of some predictive model over some set X and ground truth Z, based on F1-score:
$F_1 = \frac{2\,Tp}{2\,Tp + Fp + Fn}$
where Tp is the number of true positives, Tn is the number of true negatives, Fp is the number of false positives and Fn is the number of false negatives.
Parameters
- model: class of model with predictive function 'predict'
The predictive model of the ground truth.
- X_data: array-like of shape (n_samples, ...)
The input patterns. The ellipsis stands for the remaining dimensions, which depend on the data representation (images, text, audio, etc.).
- Z_data: array-like of shape (n_samples,)
The ground truth of the data in class format.
- mode: string, {'micro','macro','weighted'}, default='macro'
The averaging applied to the F1 score, following scikit-learn. Further details at https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html
Returns
- f1: float
The f1 score on the set, value between 0 and 1.
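To make the 'macro' averaging concrete, here is a minimal NumPy sketch that computes a per-class F1 from Tp, Fp and Fn and then takes the unweighted mean. The function name `f1_macro_sketch` and the zero-score convention for empty classes are assumptions made for this illustration; the library itself relies on scikit-learn.

```python
import numpy as np

def f1_macro_sketch(z_true, z_pred, n_classes):
    """Unweighted mean of per-class F1 = 2*Tp / (2*Tp + Fp + Fn)."""
    z_true = np.asarray(z_true)
    z_pred = np.asarray(z_pred)
    scores = []
    for k in range(n_classes):
        tp = np.sum((z_pred == k) & (z_true == k))
        fp = np.sum((z_pred == k) & (z_true != k))
        fn = np.sum((z_pred != k) & (z_true == k))
        denom = 2 * tp + fp + fn
        # Assumed convention: a class that never appears scores 0.
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

z_true = [0, 0, 1, 1, 2, 2]
z_pred = [0, 1, 1, 1, 2, 0]
f1 = f1_macro_sketch(z_true, z_pred, n_classes=3)
print(f1)  # mean of the per-class scores 0.5, 0.8 and 2/3
```

With 'micro' the counts would be pooled over classes before the ratio is taken, and with 'weighted' the per-class scores would be weighted by class support.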
import numpy as np
N = 100
K = 8
X = np.random.rand(N,5)
Z = np.random.randint(K, size=(N,))
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X,Z)
from codeE.evaluation import f1score_model
f1score_model(model, X, Z, mode="macro")
codeE.evaluation.D_JS(conf_true, conf_pred, raw=False)
Evaluation of one confusion matrix estimate, based on the normalized Jensen-Shannon divergence between the rows:
$D_{JS}(P, Q) = \frac{1}{K} \sum_{z=1}^{K} JS(p_z \| q_z)$
- with $JS(p \| q)$ the normalized Jensen-Shannon divergence between probabilities p and q, and K the number of classes.
- Jensen-Shannon divergence: $JS(p \| q) = \frac{1}{2} KL(p \| m) + \frac{1}{2} KL(q \| m)$, with $m = (p + q)/2$.
- Kullback-Leibler divergence: $KL(p \| q) = \sum_{j} p_j \log \frac{p_j}{q_j}$.
P corresponds to the real matrix and Q to the estimate, while $p_z$ and $q_z$ are their row probabilities. The rows correspond to the ground truth labels z and the columns to the observed labels y.
Parameters
- conf_true: array-like of shape (n_classes, n_classes)
Real confusion matrix P.
- conf_pred: array-like of shape (n_classes, n_classes)
Estimated confusion matrix Q.
- raw: boolean, default=False
Whether the error is returned per row (True) or globally as a mean over the rows (False).
Returns
- d_js: float
The d_js on the estimation, value between 0 and 1.
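The row-wise computation can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the library's code: the names `kl`, `js_normalized` and `d_js_sketch` are hypothetical, the clipping constant `eps` is chosen here only to avoid log(0), and the normalization divides the natural-log divergence by log(2), which is its upper bound.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats."""
    return float(np.sum(p * np.log(p / q)))

def js_normalized(p, q, eps=1e-7):
    # Clip to avoid log(0); the eps value is an assumption of this sketch.
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    m = 0.5 * (p + q)
    js = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return js / np.log(2)  # natural-log JS is at most log(2)

def d_js_sketch(conf_true, conf_pred, raw=False):
    conf_true = np.asarray(conf_true, dtype=float)
    conf_pred = np.asarray(conf_pred, dtype=float)
    # One divergence per ground-truth row.
    per_row = np.array([js_normalized(p, q)
                        for p, q in zip(conf_true, conf_pred)])
    return per_row if raw else float(per_row.mean())

P = np.eye(3)                     # expert-like matrix
Q = np.full((3, 3), 1.0 / 3.0)    # uniform (random) matrix
same = d_js_sketch(P, P)          # identical matrices give zero error
diff = d_js_sketch(P, Q)
rows = d_js_sketch(P, Q, raw=True)
```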
codeE.evaluation.D_NormF(conf_true, conf_pred)
Evaluation of one confusion matrix estimate, based on the normalized Frobenius norm.
P corresponds to the real matrix and Q to the estimate; the rows correspond to the ground truth labels z and the columns to the observed labels y.
Parameters
- conf_true: array-like of shape (n_classes, n_classes)
Real confusion matrix P.
- conf_pred: array-like of shape (n_classes, n_classes)
Estimated confusion matrix Q.
Returns
- normF: float
The normF on the estimation, value between 0 and 1.
import numpy as np
N = 100 #data
K = 8 #classes
Z = np.random.randint(K, size=(N,))
R = np.random.randint(3, size=(N,K))
from codeE.utils import generate_Global_conf
P = generate_Global_conf(Z, R)
P_hat = P + 1e-7  # P_hat = Q
Evaluation:
from codeE.evaluation import D_KL, D_JS, D_NormF
print("D_KL = ",D_KL(P, P_hat))
print("D_JS = ",D_JS(P, P_hat))
print("D_NormF = ",D_NormF(P, P_hat))
codeE.evaluation.Individual_D(confs_true, confs_pred, D)
Evaluation of a set of confusion matrix estimates, based on D.
B corresponds to the set of the T real confusion matrices.
Parameters
- confs_true: array-like of shape (n_annotators, n_classes, n_classes)
Real set of confusion matrices, for example individual matrices.
- confs_pred: array-like of shape (n_annotators, n_classes, n_classes)
Estimated set of confusion matrices, for example individual matrices.
- D: function, {D_KL, D_JS, D_NormF}
Function to measure the error between two array-like confusion matrices.
Returns
- res: float
The error on the estimation, value between 0 and 1.
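The text does not spell out how D is aggregated over the T annotators, so the sketch below assumes a plain average. The names `individual_d_sketch` and `mean_abs_diff` are hypothetical; `mean_abs_diff` is only a simple stand-in for a real error function such as D_JS or D_NormF.

```python
import numpy as np

def mean_abs_diff(conf_true, conf_pred):
    """Hypothetical stand-in for D (e.g. D_JS or D_NormF)."""
    diff = np.asarray(conf_true, dtype=float) - np.asarray(conf_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

def individual_d_sketch(confs_true, confs_pred, D):
    """Average D over the annotators (an assumed aggregation)."""
    errors = [D(b, b_hat) for b, b_hat in zip(confs_true, confs_pred)]
    return float(np.mean(errors))

T, K = 4, 3
B = np.stack([np.eye(K)] * T)          # T expert-like matrices
B_hat = B.copy()
B_hat[0] = np.full((K, K), 1.0 / K)    # one poorly estimated annotator
err = individual_d_sketch(B, B_hat, D=mean_abs_diff)
```

Passing D as an argument keeps the aggregation independent of the error measure, which matches the signature documented above.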
import numpy as np
N = 100 #data
K = 8 #classes
T = 10 # annotators
Z = np.random.randint(K, size=(N,))
Y = np.random.randint(K, size=(N,T))
import keras  # needed for keras.utils.to_categorical
Y_ohv = keras.utils.to_categorical(Y)
from codeE.utils import generate_Individual_conf
B_ind = generate_Individual_conf(Z, Y_ohv)
B_ind_hat = B_ind + 1e-7
Evaluation:
from codeE.evaluation import Individual_D, D_JS, D_NormF
print("Individual D_JS = ",Individual_D(B_ind, B_ind_hat, D=D_JS))
print("Individual D_NormF = ",Individual_D(B_ind, B_ind_hat, D=D_NormF))
codeE.evaluation.I_sim(conf_ma, D=D_JS)
An indicator of the expertise (ability level) of the behavior: the similarity to an identity matrix I (expert behavior), based on D.
- The argument is some confusion matrix to analyze; the rows correspond to the ground truth labels z and the columns to the observed labels y.
A higher value indicates greater ability.
Parameters
- conf_ma: array-like of shape (n_classes, n_classes)
A confusion matrix of probabilistic behavior.
- D: function, {D_JS, D_NormF}, default=D_JS
Function to measure the error between two array-like confusion matrices.
Returns
- res: float
The indicator of similarity to I, value between 0 and 1.
codeE.evaluation.R_score(conf_ma)
An indicator associated with expert behavior: the average of the probabilities on the diagonal of the confusion matrix.
- The argument is some confusion matrix to analyze; the rows correspond to the ground truth labels z and the columns to the observed labels y.
A higher value indicates greater ability.
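Parameters
- conf_ma: array-like of shape (n_classes, n_classes)
A confusion matrix of probabilistic behavior.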
Returns
- res: float
The indicator of probability expertise, value between 0 and 1.
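The diagonal average described above is short enough to sketch directly. `r_score_sketch` is a hypothetical name used for illustration, not the library function.

```python
import numpy as np

def r_score_sketch(conf_ma):
    """Mean of the diagonal, i.e. the average probability of a correct label."""
    conf_ma = np.asarray(conf_ma, dtype=float)
    return float(np.mean(np.diag(conf_ma)))

expert = np.eye(4)                  # always reports the true label
uniform = np.full((4, 4), 0.25)     # labels uniformly at random
r_expert = r_score_sketch(expert)
r_uniform = r_score_sketch(uniform)
print(r_expert, r_uniform)  # 1.0 0.25
```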
codeE.evaluation.H_conf(conf_ma)
An indicator of the randomness of the behavior: the normalized entropy H averaged over the rows of a confusion matrix.
- The argument is some confusion matrix to analyze; the rows correspond to the ground truth labels z and the columns to the observed labels y.
A higher value indicates a more random behavior.
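Parameters
- conf_ma: array-like of shape (n_classes, n_classes)
A confusion matrix of probabilistic behavior.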
Returns
- res: float
The indicator of entropy, value between 0 and 1.
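A minimal sketch of a row-averaged normalized entropy is given below. It assumes the normalization divides each row's entropy by log(K), the entropy of a uniform row, and it clips probabilities with an `eps` chosen here to avoid log(0); both choices and the name `h_conf_sketch` are assumptions of this illustration.

```python
import numpy as np

def h_conf_sketch(conf_ma, eps=1e-7):
    """Entropy of each row divided by log(K), then averaged over rows."""
    conf_ma = np.clip(np.asarray(conf_ma, dtype=float), eps, 1.0)
    K = conf_ma.shape[1]
    row_entropy = -np.sum(conf_ma * np.log(conf_ma), axis=1) / np.log(K)
    return float(np.mean(row_entropy))

expert = np.eye(4)                  # deterministic rows, near-zero entropy
uniform = np.full((4, 4), 0.25)     # uniform rows, maximal entropy
h_expert = h_conf_sketch(expert)
h_uniform = h_conf_sketch(uniform)
```

An expert-like matrix scores close to 0 and a uniform matrix scores 1, in line with the statement that a higher value indicates more random behavior.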
codeE.evaluation.S_score(conf_ma)
An indicator associated with spammer behavior: generalized log odds, based on normF.
- The argument is some confusion matrix to analyze; the rows correspond to the ground truth labels z and the columns to the observed labels y.
Over a row, positive values indicate more expert behavior and negative values more malicious behavior. A value of 0 corresponds to a random spammer, 1 to an expert and -1 to a malicious spammer.
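Parameters
- conf_ma: array-like of shape (n_classes, n_classes)
A confusion matrix of probabilistic behavior.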
Returns
- res: float
The indicator of spammer score, value between -1 and 1.
import numpy as np
N = 100 #data
K = 8 #classes
Z = np.random.randint(K, size=(N,))
R = np.random.randint(3, size=(N,K))
from codeE.utils import generate_Global_conf
beta_ex = generate_Global_conf(Z, R)
Indicators to analyze the random matrix generated:
from codeE.evaluation import I_sim, R_score, H_conf, S_score
print("Expertise Identity (I_sim) =", I_sim(beta_ex))
print("Expertise Diagonal (R_score) =", R_score(beta_ex))
print("Randomness (H_conf) =", H_conf(beta_ex))
print("Spammer score (S_score) =", S_score(beta_ex))
Create another matrix close to the identity:
beta_ex = np.identity(K) + np.random.normal(0, 1e-2, size=(K,K))
Analyze:
from codeE.evaluation import I_sim, R_score, H_conf, S_score
print("Expertise Identity (I_sim) =", I_sim(beta_ex))
print("Expertise Diagonal (R_score) =", R_score(beta_ex))
print("Randomness (H_conf) =", H_conf(beta_ex))
print("Spammer score (S_score) =", S_score(beta_ex))
codeE.evaluation.S_bias(conf_ma, mode="median")
An indicator associated with the bias of the behavior. It is based on the marginal (a-priori) probability over annotations.
- The argument is some confusion matrix to analyze; the rows correspond to the ground truth labels z and the columns to the observed labels y.
A higher value means more bias b toward some class c.
Parameters
- conf_ma: array-like of shape (n_classes, n_classes)
A confusion matrix of probabilistic behavior.
- mode: {'simple','median','entropy'}, default='median'
How the bias score b is calculated:
Returns