
Evaluation functions

from codeE.evaluation import ...

Functions for evaluation in a crowdsourcing scenario.

The available evaluation metrics/indicators are:


Accuracy by model

codeE.evaluation.accuracy_model(model, X_data, Z_data)

Quality evaluation of a predictive model over a set X with ground truth Z, based on accuracy:

$$\text{Acc} = \frac{T_p + T_n}{T_p + T_n + F_p + F_n}$$

Tp is the number of true positives, Tn the number of true negatives, Fp the number of false positives and Fn the number of false negatives.
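As a minimal sketch of the accuracy formula, Acc = (Tp + Tn) / (Tp + Tn + Fp + Fn), the snippet below counts the four quantities for a binary problem with NumPy. The helpers `binary_counts` and `accuracy_from_counts` are hypothetical and not part of codeE; the sketch only shows that the formula equals the fraction of correctly predicted labels, which is what an accuracy over `model.predict(X_data)` measures.

```python
import numpy as np


def binary_counts(z_true, z_pred):
    """Count Tp, Tn, Fp, Fn for binary labels in {0, 1} (1 = positive class)."""
    z_true = np.asarray(z_true)
    z_pred = np.asarray(z_pred)
    tp = int(np.sum((z_pred == 1) & (z_true == 1)))
    tn = int(np.sum((z_pred == 0) & (z_true == 0)))
    fp = int(np.sum((z_pred == 1) & (z_true == 0)))
    fn = int(np.sum((z_pred == 0) & (z_true == 1)))
    return tp, tn, fp, fn


def accuracy_from_counts(tp, tn, fp, fn):
    """Acc = (Tp + Tn) / (Tp + Tn + Fp + Fn)."""
    return (tp + tn) / (tp + tn + fp + fn)


z_true = np.array([1, 0, 1, 1, 0, 0])
z_pred = np.array([1, 0, 0, 1, 1, 0])  # e.g. the output of model.predict(X)
counts = binary_counts(z_true, z_pred)
acc = accuracy_from_counts(*counts)
print(counts)  # (2, 2, 1, 1)
# Same value as the fraction of correctly predicted labels
print(acc, np.mean(z_true == z_pred))
```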

Parameters

  • model: a trained model object with a predictive method 'predict'
    The predictive model of the ground truth.
  • X_data: array-like of shape (n_samples, ...)
    The input patterns; the ellipsis stands for the remaining dimensions, which depend on the data representation (images, text, audio, etc.).
  • Z_data: array-like of shape (n_samples,)
    The ground truth of the data in class format.

Returns

  • acc: float
    The accuracy on the set, value between 0 and 1.
Examples
import numpy as np
N = 100
K = 8
X = np.random.rand(N,5)
Z = np.random.randint(K, size=(N,))
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X,Z)
from codeE.evaluation import accuracy_model
accuracy_model(model, X, Z)  # accuracy_model takes no 'mode' argument

F1-score by model

codeE.evaluation.f1score_model(model, X_data, Z_data, mode='macro')

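Quality evaluation of a predictive model over a set X with ground truth Z, based on the F1-score:

$$F_1 = \frac{2 \cdot \text{Prec} \cdot \text{Rec}}{\text{Prec} + \text{Rec}}, \qquad \text{Prec} = \frac{T_p}{T_p + F_p}, \qquad \text{Rec} = \frac{T_p}{T_p + F_n}$$

  • With Prec and Rec the precision and recall, respectively.

Tp is the number of true positives, Fp the number of false positives and Fn the number of false negatives (true negatives, Tn, do not enter the F1-score).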
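For K classes the F1-score is computed one-vs-rest for each class and then averaged according to `mode`. The NumPy sketch below illustrates the 'macro' option (unweighted mean over classes) with a hypothetical helper `f1_per_class`, not part of codeE; in this sketch a zero denominator simply yields 0.

```python
import numpy as np


def f1_per_class(z_true, z_pred, n_classes):
    """One-vs-rest F1 per class: 2 * Prec * Rec / (Prec + Rec)."""
    z_true = np.asarray(z_true)
    z_pred = np.asarray(z_pred)
    scores = np.zeros(n_classes)
    for c in range(n_classes):
        tp = np.sum((z_pred == c) & (z_true == c))
        fp = np.sum((z_pred == c) & (z_true != c))
        fn = np.sum((z_pred != c) & (z_true == c))
        prec = tp / (tp + fp) if tp + fp > 0 else 0.0
        rec = tp / (tp + fn) if tp + fn > 0 else 0.0
        scores[c] = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
    return scores


z_true = np.array([0, 0, 1, 1, 2, 2])
z_pred = np.array([0, 1, 1, 1, 2, 0])
per_class = f1_per_class(z_true, z_pred, n_classes=3)
macro_f1 = per_class.mean()  # 'macro': every class has the same weight
print(per_class, macro_f1)
```

With equal class supports, as here, the 'weighted' average coincides with 'macro'.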

Parameters

  • model: a trained model object with a predictive method 'predict'
    The predictive model of the ground truth.
  • X_data: array-like of shape (n_samples, ...)
    The input patterns; the ellipsis stands for the remaining dimensions, which depend on the data representation (images, text, audio, etc.).
  • Z_data: array-like of shape (n_samples,)
    The ground truth of the data in class format.
  • mode: string, {'micro','macro','weighted'}, default='macro'
    The averaging applied to the F1 score, following scikit-learn; see https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html for details.

Returns

  • f1: float
    The f1 score on the set, value between 0 and 1.
Examples
import numpy as np
N = 100
K = 8
X = np.random.rand(N,5)
Z = np.random.randint(K, size=(N,))
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X,Z)
from codeE.evaluation import f1score_model
f1score_model(model, X, Z, mode="macro")

Error on confusion matrix estimation by JS

codeE.evaluation.D_JS(conf_true, conf_pred, raw=False)

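Evaluation of one confusion matrix estimate, based on the normalized Jensen-Shannon divergence between corresponding rows:

$$D_{JS}(P, Q) = \frac{1}{K}\sum_{k=1}^{K} \frac{JS(p_k, q_k)}{\log 2}$$

  • With $JS(p_k, q_k)/\log 2$ the normalized Jensen-Shannon divergence between the row probabilities $p_k$ and $q_k$, and K the number of classes.
  • Jensen-Shannon divergence: $JS(p, q) = \frac{1}{2} KL(p \,\|\, m) + \frac{1}{2} KL(q \,\|\, m)$, with $m = (p + q)/2$.
  • Kullback-Leibler divergence: $KL(p \,\|\, q) = \sum_{j} p_j \log \frac{p_j}{q_j}$.

P corresponds to the real matrix and Q to the estimate, while $p_k$ and $q_k$ are their k-th rows. The rows correspond to the ground truth labels z and the columns to the observed labels y.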
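A minimal NumPy sketch of this measure, assuming it is the row-wise Jensen-Shannon divergence JS(p, q) = ½KL(p‖m) + ½KL(q‖m) with m = (p + q)/2, divided by log 2 so each row lies in [0, 1], and averaged over rows unless `raw=True`. The function `d_js` is hypothetical; codeE's own `D_JS` may differ in details such as how zero probabilities are smoothed.

```python
import numpy as np


def kl(p, q):
    """KL(p || q) = sum_j p_j log(p_j / q_j), with the convention 0 log 0 = 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))


def d_js(P, Q, raw=False):
    """Row-wise JS divergence divided by log 2; mean over rows unless raw=True."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    per_row = np.empty(P.shape[0])
    for k in range(P.shape[0]):
        m = (P[k] + Q[k]) / 2  # mixture of the two row distributions
        js = 0.5 * kl(P[k], m) + 0.5 * kl(Q[k], m)
        per_row[k] = js / np.log(2)  # JS <= log 2, so this lies in [0, 1]
    return per_row if raw else per_row.mean()


P = np.eye(3)
Q = np.roll(np.eye(3), 1, axis=1)  # every row moved to a wrong class
print(d_js(P, P))            # identical matrices: no divergence
print(d_js(P, Q))            # disjoint rows: maximal divergence
print(d_js(P, Q, raw=True))  # one value per row
```

Where a row of P is positive, the mixture m is positive too, so the logarithm is always defined.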

Parameters

  • conf_true: array-like of shape (n_classes, n_classes)
    Real confusion matrix P.
  • conf_pred: array-like of shape (n_classes, n_classes)
    Estimated confusion matrix Q.
  • raw: boolean, default=False
    Whether the error is returned per row (True) or globally, as the mean over the rows (False).

Returns

  • d_js: float (one value per row if raw=True)
    The D_JS of the estimate, with values between 0 and 1.

Error on confusion matrix estimation by NormF

codeE.evaluation.D_NormF(conf_true, conf_pred)

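Evaluation of one confusion matrix estimate, based on the normalized Frobenius norm of the difference between the matrices.

  • With the Frobenius norm $\|P - Q\|_F$ normalized by a factor that depends on K, where K is the number of rows/columns (square matrices), so that the result lies between 0 and 1.
  • Frobenius norm: $\|P - Q\|_F = \sqrt{\sum_{k=1}^{K}\sum_{j=1}^{K} (P_{kj} - Q_{kj})^2}$.

P corresponds to the real matrix and Q to the estimate; the rows correspond to the ground truth labels z and the columns to the observed labels y.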
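A minimal sketch of a normalized Frobenius error. The normalizing factor √(2K) used here is an assumption, not taken from codeE: each row of a difference between two row-stochastic matrices has squared norm at most 2 (two different one-hot rows), so dividing by √(2K) keeps the value in [0, 1]. The function name `d_normf` is hypothetical.

```python
import numpy as np


def d_normf(P, Q):
    """Frobenius norm of P - Q divided by sqrt(2K) (assumed normalization)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    K = P.shape[0]
    # For a 2-D array, np.linalg.norm defaults to the Frobenius norm
    return np.linalg.norm(P - Q) / np.sqrt(2 * K)


P = np.eye(3)
Q = np.roll(np.eye(3), 1, axis=1)  # every row moved to a wrong class
print(d_normf(P, P))  # identical matrices
print(d_normf(P, Q))  # worst case for row-stochastic matrices, about 1
```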

Parameters

  • conf_true: array-like of shape (n_classes, n_classes)
    Real confusion matrix P.
  • conf_pred: array-like of shape (n_classes, n_classes)
    Estimated confusion matrix Q.

Returns

  • normF: float
    The normF on the estimation, value between 0 and 1.

Examples of confusion matrix estimation

import numpy as np
N = 100 #data
K = 8 #classes
Z = np.random.randint(K, size=(N,))
R = np.random.randint(3, size=(N,K))
from codeE.utils import generate_Global_conf
P = generate_Global_conf(Z, R)
P_hat = P + 1e-7 #P_hat=Q

Evaluation:

from codeE.evaluation import D_KL, D_JS, D_NormF
print("D_KL = ",D_KL(P, P_hat))
print("D_JS = ",D_JS(P, P_hat))
print("D_NormF = ",D_NormF(P, P_hat))

Error on set of confusion matrices estimation

codeE.evaluation.Individual_D(confs_true, confs_pred, D)

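Evaluation of a set of confusion matrix estimates, based on D, as the average error over the T matrices:

$$\frac{1}{T}\sum_{t=1}^{T} D(B_t, \hat{B}_t)$$

B corresponds to the set of the T real confusion matrices and $\hat{B}$ to the set of their estimates.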
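A minimal sketch of the aggregation, assuming the per-annotator errors are combined by a plain mean over the T matrices. The helper `individual_d` is hypothetical, and `frob` (an unnormalized Frobenius distance) is only a stand-in for whichever D is passed in.

```python
import numpy as np


def individual_d(confs_true, confs_pred, D):
    """Mean of D(B_t, B_hat_t) over the T annotators (assumed aggregation)."""
    return float(np.mean([D(b, b_hat) for b, b_hat in zip(confs_true, confs_pred)]))


def frob(P, Q):
    # Stand-in divergence for the example: unnormalized Frobenius distance
    return np.linalg.norm(np.asarray(P) - np.asarray(Q))


B = np.stack([np.eye(2), np.eye(2)])             # T = 2 real matrices
B_hat = np.stack([np.eye(2), np.eye(2)[::-1]])   # first exact, second swapped
print(individual_d(B, B_hat, D=frob))            # mean of 0.0 and 2.0
```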

Parameters

  • confs_true: array-like of shape (n_annotators, n_classes, n_classes)
    Real set of confusion matrices, for example the individual matrices of each annotator.
  • confs_pred: array-like of shape (n_annotators, n_classes, n_classes)
    Estimated set of confusion matrices, for example the individual matrices of each annotator.
  • D: function, {D_KL, D_JS, D_NormF}
    Function to measure error between two array-like confusion matrices.

Returns

  • res: float
    The error on the estimation, value between 0 and 1.
Examples of confusion matrix estimation
import numpy as np
N = 100 #data
K = 8 #classes
T = 10 # annotators
Z = np.random.randint(K, size=(N,))
Y = np.random.randint(K, size=(N,T))
import keras
Y_ohv = keras.utils.to_categorical(Y, num_classes=K)  # one-hot annotations, shape (N, T, K)
from codeE.utils import generate_Individual_conf
B_ind = generate_Individual_conf(Z, Y_ohv)
B_ind_hat = B_ind +1e-7

Evaluation

from codeE.evaluation import Individual_D, D_JS, D_NormF
print("Individual D_JS = ",Individual_D(B_ind, B_ind_hat, D=D_JS))
print("Individual D_NormF = ",Individual_D(B_ind, B_ind_hat, D=D_NormF))

Expertise Identity of Confusion Matrix

codeE.evaluation.I_sim(conf_ma, D=D_JS)

An indicator of expertise (ability level) of the behavior: the similarity of a confusion matrix β to the identity matrix I (expert behavior), based on D,

$$I_{sim}(\beta) = 1 - D(I, \beta)$$

  • With β the confusion matrix to analyze; the rows correspond to the ground truth labels z and the columns to the observed labels y.

A higher value indicates greater ability.
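A minimal sketch of this indicator, assuming the similarity is computed as 1 − D(I, β). The divergence used here, `normalized_frobenius` (Frobenius distance divided by √(2K)), is a hypothetical stand-in for D_NormF and its normalization is an assumption; `i_sim` is likewise hypothetical.

```python
import numpy as np


def normalized_frobenius(P, Q):
    # Stand-in for D: Frobenius distance divided by sqrt(2K) (assumed normalization)
    K = P.shape[0]
    return np.linalg.norm(P - Q) / np.sqrt(2 * K)


def i_sim(beta, D=normalized_frobenius):
    """Similarity to the identity (expert) matrix: 1 - D(I, beta)."""
    beta = np.asarray(beta, dtype=float)
    return 1.0 - D(np.eye(beta.shape[0]), beta)


K = 4
expert = np.eye(K)                   # always reports the true class
uniform = np.full((K, K), 1.0 / K)   # answers at random
print(i_sim(expert))   # 1.0
print(i_sim(uniform))  # lower: far from the identity
```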

Parameters

  • conf_ma: array-like of shape (n_classes, n_classes)
    A confusion matrix of probabilistic behavior.
  • D: function, {D_JS, D_NormF}, default=D_JS
    Function to measure error between two array-like confusion matrices.

Returns

  • res: float
    The indicator of similarity to I, value between 0 and 1.

Expertise Diagonal of Confusion Matrix

codeE.evaluation.R_score(conf_ma)

An indicator associated with expert behavior: the average of the probabilities on the diagonal of the confusion matrix β,

$$R(\beta) = \frac{1}{K}\sum_{k=1}^{K} \beta_{kk}$$

  • With β the confusion matrix to analyze and K its number of rows; the rows correspond to the ground truth labels z and the columns to the observed labels y.

A higher value indicates greater ability.
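Since this indicator is the mean of the diagonal, it can be sketched in NumPy as the trace divided by K. The function name `r_score` is hypothetical and the matrix below is illustrative only.

```python
import numpy as np


def r_score(beta):
    """Average of the diagonal probabilities: trace(beta) / K."""
    beta = np.asarray(beta, dtype=float)
    return np.trace(beta) / beta.shape[0]


beta = np.array([[0.9, 0.1, 0.0],
                 [0.2, 0.7, 0.1],
                 [0.0, 0.5, 0.5]])
print(r_score(beta))       # (0.9 + 0.7 + 0.5) / 3
print(r_score(np.eye(3)))  # expert behavior
```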

Parameters

  • conf_ma: array-like of shape (n_classes, n_classes)
    A confusion matrix of probabilistic behavior.

Returns

  • res: float
    The indicator of probability expertise, value between 0 and 1.

Entropy of Confusion Matrix

codeE.evaluation.H_conf(conf_ma)

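An indicator of randomness of behavior: the normalized entropy H averaged over the rows of a confusion matrix β,

$$H_{conf}(\beta) = \frac{1}{K}\sum_{k=1}^{K} \frac{H(\beta_k)}{\log K}$$

  • With $H(\beta_k) = -\sum_{j=1}^{K} \beta_{kj} \log \beta_{kj}$ the entropy of row k, divided by its maximum value $\log K$.
  • With β the confusion matrix to analyze; the rows correspond to the ground truth labels z and the columns to the observed labels y.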

A higher value indicates a more random behavior.
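A minimal NumPy sketch of the averaged normalized entropy, assuming each row entropy is divided by its maximum log K. The function name `h_conf` is hypothetical; zero entries are handled with the usual convention 0 log 0 = 0.

```python
import numpy as np


def h_conf(beta):
    """Mean over rows of the row entropy divided by log K (values in [0, 1])."""
    beta = np.asarray(beta, dtype=float)
    K = beta.shape[1]
    # Replace zeros by 1 inside the log so that 0 * log 0 contributes 0
    safe = np.where(beta > 0, beta, 1.0)
    row_entropy = -np.sum(beta * np.log(safe), axis=1)
    return float(np.mean(row_entropy / np.log(K)))


K = 4
print(h_conf(np.eye(K)))                 # deterministic rows: no randomness
print(h_conf(np.full((K, K), 1.0 / K)))  # uniform rows: maximal randomness
```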

Parameters

  • conf_ma: array-like of shape (n_classes, n_classes)
    A confusion matrix of probabilistic behavior.

Returns

  • res: float
    The indicator of entropy, value between 0 and 1.

Spammer score of Confusion Matrix

codeE.evaluation.S_score(conf_ma)

An indicator associated with spammer behavior: a generalized log odds, based on normF.

  • With β the confusion matrix to analyze; the rows correspond to the ground truth labels z and the columns to the observed labels y.

Over a row, positive values indicate more expert behavior and negative values more malicious behavior. A value of 0 corresponds to random spammer behavior, 1 to expert behavior and -1 to malicious spammer behavior.

Parameters

  • conf_ma: array-like of shape (n_classes, n_classes)
    A confusion matrix of probabilistic behavior.

Returns

  • res: float
    The indicator of spammer score, value between -1 and 1.

Use the indicators to analyze a confusion matrix

import numpy as np
N = 100 #data
K = 8 #classes
Z = np.random.randint(K, size=(N,))
R = np.random.randint(3, size=(N,K))
from codeE.utils import generate_Global_conf
beta_ex = generate_Global_conf(Z, R)

Indicators to analyze the generated random matrix:

from codeE.evaluation import I_sim, R_score, H_conf, S_score
print("Expertise Identity (I_sim) =", I_sim(beta_ex))
print("Expertise Diagonal (R_score) =", R_score(beta_ex))
print("Randomness (H_conf) =", H_conf(beta_ex))
print("Spammer score (S_score) =", S_score(beta_ex))

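Create another matrix close to the identity (each row must remain a probability distribution):

beta_ex = np.abs(np.identity(K) + np.random.normal(0, 1e-2, size=(K,K)))
beta_ex = beta_ex / beta_ex.sum(axis=1, keepdims=True)  # renormalize each row to sum to 1

Analyze: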

from codeE.evaluation import I_sim, R_score, H_conf, S_score
print("Expertise Identity (I_sim) =", I_sim(beta_ex))
print("Expertise Diagonal (R_score) =", R_score(beta_ex))
print("Randomness (H_conf) =", H_conf(beta_ex))
print("Spammer score (S_score) =", S_score(beta_ex))

Bias score of Confusion Matrix

codeE.evaluation.S_bias(conf_ma, mode="median")

An indicator associated with bias in the behavior. It is based on the marginal (a priori) probability over the annotations.

  • With β the confusion matrix to analyze; the rows correspond to the ground truth labels z and the columns to the observed labels y.

A higher value means a stronger bias b toward some class c.
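The exact bias score for each `mode` is not reproduced here. As a hypothetical illustration of the quantity it builds on, the sketch below computes the marginal probability of each annotation, p(y = c) = Σ_k p(z = k) β_kc, assuming a uniform prior over the ground-truth classes, and reports the class with the largest marginal. Both the uniform prior and the argmax choice are assumptions, and `annotation_marginal` is not a codeE function.

```python
import numpy as np


def annotation_marginal(beta, prior=None):
    """p(y = c) = sum_k p(z = k) * beta[k, c]; uniform p(z) by default (assumption)."""
    beta = np.asarray(beta, dtype=float)
    K = beta.shape[0]
    if prior is None:
        prior = np.full(K, 1.0 / K)
    return prior @ beta


beta = np.array([[0.2, 0.8, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.1, 0.6, 0.3]])
marginal = annotation_marginal(beta)
print(marginal)                    # column means of beta
print(int(np.argmax(marginal)))    # class favored by this annotator
```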

Parameters

  • conf_ma: array-like of shape (n_classes, n_classes)
    A confusion matrix of probabilistic behavior.
  • mode: {'simple','median','entropy'}, default='median'
    The way in which the bias score b is calculated:

simple:

median:

entropy:

Returns

  • class_bias: int
    The index of the class c toward which the matrix is biased, a value between 0 and n_classes-1.
  • res: float
    The indicator of bias score b, value between 0 and 1.