rra {iRRA}    R Documentation

Compute the Ratio of Relevant Areas (RRA) of a ROC curve

Description

This is the main function of the iRRA package. It builds the Region of Interest (RoI) from the coordinates of the ROC curve, the number of actual positives (AP) and actual negatives (AN), and the reference values of the performance metrics, and then computes the Ratio of Relevant Areas (RRA). It returns an object of class "rra_result", which is a named list. This object can be printed and plotted. Additionally, two "rra_result" objects can be compared with rra.test, and a list of "rra_result" objects can be averaged with rra.average.

Usage

  rra(roc_x, roc_y, AP, AN,
        precision = FALSE, c_precision = "pop", p_precision = -1,
        recall = FALSE, c_recall = "pop", p_recall = -1,
        fm = FALSE, c_fm = "pop", p_fm = -1,
        npv = FALSE, c_npv = "pop", p_npv= -1,
        specificity = FALSE, c_specificity = "pop", p_specificity = -1,
        fallout = FALSE, c_fallout = "pop", p_fallout = -1,
        nm = FALSE, c_nm = "pop", p_nm = -1,
        j = FALSE, c_j = -1,
        markedness = FALSE, c_markedness = -1,
        phi = FALSE, c_phi = 0.4,
        ncost = FALSE, c_ncost = "uses_mu", lambda = c(-1), mu = 1,
        print = TRUE, plot = TRUE, ...)

Arguments

roc_x

The x values of the ROC curve's points

roc_y

The y values of the ROC curve's points

AP

The number of actual positives. This value represents the number of positive responses (the "1" values) used to build the ROC curve. It must be greater than 0

AN

The number of actual negatives. This value represents the number of negative responses (the "0" values) used to build the ROC curve. It must be greater than or equal to 0

precision

If the user wants to use a precision reference value to build the Region of Interest

c_precision

The reference value. It must be between 0 and 1. It can also indicate that the reference value is selected using a uniform random policy ("uni") method (c_precision = "uni"). If c_precision = "pop", a "uni" method with p(m) = AP/n is used ("Proportion Of Positives" policy), where n = AP + AN

p_precision

The unified probability used by the "uni" method

recall

If the user wants to use a recall reference value to build the Region of Interest

c_recall

The reference value. It must be between 0 and 1. It can also indicate that the reference value is selected using a uniform random policy ("uni") method (c_recall = "uni"). If c_recall = "pop" it means that a "uni" method with p(m)=AP/n is used ("Proportion Of Positives" policy)

p_recall

The unified probability used by the "uni" method

fm

If the user wants to use an F-Measure (FM) reference value to build the Region of Interest

c_fm

The reference value. It must be between 0 and 1 (0 excluded). It can also indicate that the reference value is selected using a uniform random policy ("uni") method (c_fm = "uni"). If c_fm = "pop" it means that a "uni" method with p(m)=AP/n is used ("Proportion Of Positives" policy)

p_fm

The unified probability used by the "uni" method

npv

If the user wants to use a Negative Predictive Value (NPV) reference value to build the Region of Interest

c_npv

The reference value. It must be between 0 and 1. It can also indicate that the reference value is selected using a uniform random policy ("uni") method (c_npv = "uni"). If c_npv = "pop" it means that a "uni" method with p(m)=AP/n is used ("Proportion Of Positives" policy)

p_npv

The unified probability used by the "uni" method

specificity

If the user wants to use a Specificity (or True Negative Rate) reference value to build the Region of Interest

c_specificity

The reference value. It must be between 0 and 1. It can also indicate that the reference value is selected using a uniform random policy ("uni") method (c_specificity = "uni"). If c_specificity = "pop" it means that a "uni" method with p(m)=AP/n is used ("Proportion Of Positives" policy)

p_specificity

The unified probability used by the "uni" method

fallout

If the user wants to use a Fall-out (or False Positive Rate) reference value to build the Region of Interest

c_fallout

The reference value. It must be between 0 and 1. It can also indicate that the reference value is selected using a uniform random policy ("uni") method (c_fallout = "uni"). If c_fallout = "pop" it means that a "uni" method with p(m)=AP/n is used ("Proportion Of Positives" policy)

p_fallout

The unified probability used by the "uni" method

nm

If the user wants to use a Negative-F-Measure (NM) reference value to build the Region of Interest

c_nm

The reference value. It must be between 0 and 1 (0 excluded). It can also indicate that the reference value is selected using a uniform random policy ("uni") method (c_nm = "uni"). If c_nm = "pop" it means that a "uni" method with p(m)=AP/n is used ("Proportion Of Positives" policy)

p_nm

The unified probability used by the "uni" method

j

If the user wants to use a Youden's J reference value to build the Region of Interest

c_j

The reference value. It must be between -1 and 1

markedness

If the user wants to use a Markedness reference value to build the Region of Interest

c_markedness

The reference value. It must be between 0 and 1

phi

If the user wants to use a Matthews Correlation Coefficient (phi) reference value to build the Region of Interest

c_phi

The reference value. It must be between 0 and 1

ncost

If the user wants to use a Normalized Cost (NC) reference value to build the Region of Interest

c_ncost

The reference value. It must be between 0 and the NC value selected with the "pop" method. This value will define the cost reduction index (mu). The default value, "uses_mu", indicates that the user wants to use the mu value directly

lambda

The ratio of the False Negative cost to the sum of the False Negative and False Positive costs, cFN / (cFN + cFP). It must be between 0 and 1. Passing two values means that the user wants to consider a range of lambda values

mu

The cost reduction index. It is used only if c_ncost = "uses_mu"

print

If the user wants to print the result. For more information, check rra.print

plot

If the user wants to plot the ROC curve and the RoI. For more information, check rra.plot

...

Other arguments for rra.print and rra.plot. Check those functions' documentation for more information
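The inputs described above can be prepared in base R before calling rra. The following is a minimal sketch, not part of the package: the labels, scores and costs are hypothetical data, and it assumes that roc_x holds the fall-out (false positive rate) and roc_y the recall (true positive rate), as in a standard ROC curve.

```r
# Hypothetical data: 1 = positive, 0 = negative; higher score = more likely positive
labels <- c(1, 1, 0, 1, 0, 0, 1, 0)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2)

AP <- sum(labels == 1)  # number of actual positives
AN <- sum(labels == 0)  # number of actual negatives

# Sort by decreasing score and accumulate the rates threshold by threshold
ord   <- order(scores, decreasing = TRUE)
roc_y <- c(0, cumsum(labels[ord] == 1) / AP)  # recall (true positive rate)
roc_x <- c(0, cumsum(labels[ord] == 0) / AN)  # fall-out (false positive rate)

# Cost ratio as defined for the lambda argument: cFN / (cFN + cFP)
cFN <- 3  # hypothetical cost of a false negative
cFP <- 2  # hypothetical cost of a false positive
lambda <- cFN / (cFN + cFP)
```

The resulting roc_x, roc_y, AP, AN and lambda can then be passed to rra. Tied scores are not handled in this sketch.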

Details

The Region of Interest (RoI) represents the points in the ROC space that have a better performance value than the reference values.
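As an illustration of this definition, the sketch below marks which points of a ROC curve fall inside a RoI defined by a recall and a fall-out reference value. It uses only base R; the ROC points and the two reference values are hypothetical, and whether the package treats the border itself as inside the RoI is not specified here, so strict inequalities are an assumption.

```r
# Hypothetical ROC points: x is fall-out (FPR), y is recall (TPR)
roc_x <- c(0, 0.1, 0.3, 0.6, 1)
roc_y <- c(0, 0.5, 0.8, 0.9, 1)

# Hypothetical reference values, chosen for illustration only
c_recall  <- 0.6
c_fallout <- 0.4

# A point is "better" when its recall is higher and its fall-out is lower
in_roi <- roc_y > c_recall & roc_x < c_fallout
data.frame(roc_x, roc_y, in_roi)
```

With these values the RoI is the rectangle above y = 0.6 and to the left of x = 0.4 in the ROC space.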

Each performance metric corresponds to a specific border of the RoI. It is possible to use multiple metrics and different methods, but keep in mind that some borders may lie above others throughout the ROC space. In that case, those borders will obscure the others.

Additionally, some special values are not very informative. For instance, a recall reference value of 1 results in a non-existent RoI. Its RRA will be 0 unless the ROC curve is perfect (AUC = 1). On the other hand, if the recall reference value is 0, the RoI coincides with the whole ROC space, so the RRA is equal to the AUC of the ROC curve.
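For reference, the AUC that the RRA reduces to in this limiting case can be computed with the trapezoidal rule. This is a minimal base R sketch under the assumption that the ROC points are sorted by increasing x; trapezoid_auc is a hypothetical helper, not a function of the package, and the package may compute the area differently.

```r
# Trapezoidal area under a ROC curve whose points are sorted by increasing x
trapezoid_auc <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Hypothetical ROC points
roc_x <- c(0, 0.25, 0.5, 1)
roc_y <- c(0, 0.5, 0.75, 1)
auc <- trapezoid_auc(roc_x, roc_y)

# Limiting cases: a perfect curve and the chance diagonal
auc_perfect  <- trapezoid_auc(c(0, 0, 1), c(0, 1, 1))
auc_diagonal <- trapezoid_auc(c(0, 1), c(0, 1))
```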

The default value for phi, 0.4, represents a medium-strong association between a model and actual positiveness.

A "rra_result" object contains the points of the ROC curve, the coordinates of the RoI and of the RoI under the curve, the RRA value, and the list of the performance metrics considered.

Errors

The function will stop if roc_x and roc_y have different lengths or contain values greater than 1 or less than 0. It will also stop if the other parameters have invalid values
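To catch these conditions before calling rra, the checks can be mirrored in base R. The helper below, check_roc_inputs, is hypothetical and only restates the constraints documented on this page; it is not the package's own validation code.

```r
# Hypothetical pre-check that restates the documented constraints
check_roc_inputs <- function(roc_x, roc_y, AP, AN) {
  stopifnot(
    length(roc_x) == length(roc_y),   # same number of x and y values
    all(roc_x >= 0 & roc_x <= 1),     # x values inside the ROC space
    all(roc_y >= 0 & roc_y <= 1),     # y values inside the ROC space
    AP > 0,                           # at least one actual positive
    AN >= 0                           # non-negative number of actual negatives
  )
  invisible(TRUE)
}

# Valid input passes silently
ok <- check_roc_inputs(c(0, 0.5, 1), c(0, 0.8, 1), AP = 10, AN = 20)

# Mismatched lengths are rejected
bad <- tryCatch(
  check_roc_inputs(c(0, 1), c(0, 0.8, 1), AP = 10, AN = 20),
  error = function(e) "rejected"
)
```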

Note

Note that the precision("uni") border will be y = x for every p(m). For this reason, using c_precision = "uni" with any p_precision such that 0 < p_precision < 1 will generate the same border as using c_precision = "pop". This is also true for the NPV border
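The reason can be checked numerically. Assuming that a uniform random policy labels each item positive with probability p independently of its true class, the expected confusion matrix gives a precision of AP/n and an NPV of AN/n for every p. The sketch below uses base R only; the AP and AN values and the helper names are hypothetical.

```r
# Hypothetical population
AP <- 30
AN <- 70
n  <- AP + AN

# Expected precision of a uniform random policy with probability p
uniform_precision <- function(p) {
  TP <- p * AP   # expected true positives
  FP <- p * AN   # expected false positives
  TP / (TP + FP)
}

# Expected NPV of the same policy
uniform_npv <- function(p) {
  TN <- (1 - p) * AN   # expected true negatives
  FN <- (1 - p) * AP   # expected false negatives
  TN / (TN + FN)
}

p_values <- c(0.1, 0.3, 0.5, 0.9)
prec <- sapply(p_values, uniform_precision)  # equals AP / n for every p
npv  <- sapply(p_values, uniform_npv)        # equals AN / n for every p
```

A ROC point (x, y) has precision y * AP / (y * AP + x * AN), which exceeds AP/n exactly when y > x, hence the y = x border.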

See Also

rra.plot, rra.print, rra.test, rra.average

Examples

## Not run: 
# They can be run if one has the ROC curve's coordinates and the AP and AN values.

rra(roc_x, roc_y, AP, AN, recall = TRUE, fallout = TRUE)
# The RoI represents all the points that have a better recall and fall-out value than the
# "pop" values

rra(roc_x, roc_y, AP, AN, ncost = TRUE, lambda = c(0.4,0.6), mu = 0.9, plot = FALSE)
# The RoI represents all the points that have a better NC than the NC("pop")*0.9 value with
# lambda between 0.4 and 0.6. Its borders are two lines. This RoI won't be plotted

rra(roc_x, roc_y, AP, AN)
# This will warn the user that no performance metric has been selected. It will return the
# AUC value.
# > Warning message:
# > In rra(roc$x, roc$y, app, ann) :
# >   No performance metrics have been selected. The AUC value of the ROC curve has been returned

rra(roc_x, roc_y, AP, AN, phi = TRUE, precision = TRUE)
# In this case phi = 0.4 will generate a curve that is always greater than y=x
# (precision("pop") border).
# Precision will not contribute to the generation of the RoI

rra(roc_x, roc_y, AP, AN, recall = TRUE, fallout = TRUE, colUnder = "red")
# A parameter for the rra.plot function is used. The RoI under the ROC curve will
# be red instead of light blue

## End(Not run)

[Package iRRA version 0.1.0 Index]