metrax.RougeN#

class metrax.RougeN(total_precision: Array, total_recall: Array, total_f1: Array, num_examples: Array, order: int)#

Bases: RougeBase

Computes macro-averaged ROUGE-N recall, precision, and F1-score.

This metric first calculates ROUGE-N precision, recall, and F1-score for each individual prediction compared against its single corresponding reference. ROUGE-N scores are based on the number of overlapping n-grams (sequences of n words) between the prediction and the reference text. These per-instance precision, recall, and F1-scores are then averaged across all instances in the dataset/batch.

How ROUGE-N scores are calculated for each individual prediction-reference pair:

\[\text{Precision} = \frac{N_o}{N_p}\]
\[\text{Recall} = \frac{N_o}{N_r}\]
\[\text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\]
where:
  • \(N_o\) is the number of n-grams that overlap between the prediction and the reference.
  • \(N_p\) is the total number of n-grams in the prediction.
  • \(N_r\) is the total number of n-grams in the reference.
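
The per-pair formulas can be illustrated with a minimal pure-Python sketch. It is not the metrax implementation: the helper names `ngram_counts` and `rouge_n_pair` are hypothetical, and the whitespace tokenization, clipped overlap counting, and fallback to 0.0 when a denominator is zero are assumptions made for illustration.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_pair(prediction, reference, n=2):
    """Return (precision, recall, f1) for one prediction-reference pair."""
    # Assumption: tokens are obtained by splitting on whitespace.
    pred = ngram_counts(prediction.split(), n)
    ref = ngram_counts(reference.split(), n)

    # N_o: clipped overlap, so a repeated n-gram is matched at most as many
    # times as it occurs in the other text (an assumption of this sketch).
    n_o = sum((pred & ref).values())
    n_p = sum(pred.values())  # N_p: n-grams in the prediction
    n_r = sum(ref.values())   # N_r: n-grams in the reference

    # Assumption: a score with a zero denominator is reported as 0.0.
    precision = n_o / n_p if n_p else 0.0
    recall = n_o / n_r if n_r else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


precision, recall, f1 = rouge_n_pair(
    "the cat sat on the mat", "the cat lay on the mat", n=2
)
```

In this example each text has five bigrams and three of them are shared, so \(N_o = 3\) and \(N_p = N_r = 5\), which gives a precision and recall of 3/5.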

Final Macro-Averaged Metrics:

\[\text{MacroAvgPrecision} = \frac{\text{total_precision}}{\text{num_examples}}\]
\[\text{MacroAvgRecall} = \frac{\text{total_recall}}{\text{num_examples}}\]
\[\text{MacroAvgF1} = \frac{\text{total_f1}}{\text{num_examples}}\]
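
The macro-averaging step can be sketched in plain Python as follows. `RougeTotals` is a hypothetical stand-in for the accumulated state, not the metrax class: it uses Python floats instead of `jax.Array` fields, returns a dict from `compute()`, and reports 0.0 when no instances have been seen. All three choices are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RougeTotals:
    """Hypothetical accumulator mirroring the four running totals."""

    total_precision: float = 0.0
    total_recall: float = 0.0
    total_f1: float = 0.0
    num_examples: int = 0

    @classmethod
    def from_scores(cls, scores):
        """Sum per-instance (precision, recall, f1) tuples for one batch."""
        return cls(
            total_precision=sum(p for p, _, _ in scores),
            total_recall=sum(r for _, r, _ in scores),
            total_f1=sum(f for _, _, f in scores),
            num_examples=len(scores),
        )

    def merge(self, other):
        """Combine two batches by adding their sums and their counts."""
        return RougeTotals(
            self.total_precision + other.total_precision,
            self.total_recall + other.total_recall,
            self.total_f1 + other.total_f1,
            self.num_examples + other.num_examples,
        )

    def compute(self):
        """Divide each running sum by the number of instances."""
        n = self.num_examples
        if n == 0:  # Assumption: an empty accumulator reports zeros.
            return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
        return {
            "precision": self.total_precision / n,
            "recall": self.total_recall / n,
            "f1": self.total_f1 / n,
        }


batch_a = RougeTotals.from_scores([(1.0, 0.5, 0.75), (0.5, 0.5, 0.5)])
batch_b = RougeTotals.from_scores([(0.0, 0.0, 0.0)])
macro = batch_a.merge(batch_b).compute()
```

Because only sums and a count are stored, merging batches gives the same averages as processing all instances at once. Note that the macro-averaged F1 is the mean of the per-instance F1 scores, not the F1 of the averaged precision and recall.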
order#

The specific ‘N’ in ROUGE-N (e.g., 1 for ROUGE-1, 2 for ROUGE-2).

Type:

int

total_precision#

Accumulated sum of precision scores from each instance.

Type:

jax.Array

total_recall#

Accumulated sum of recall scores from each instance.

Type:

jax.Array

total_f1#

Accumulated sum of F1 scores from each instance.

Type:

jax.Array

num_examples#

The number of instances (prediction-reference pairs) processed.

Type:

jax.Array

__init__(total_precision: Array, total_recall: Array, total_f1: Array, num_examples: Array, order: int) None#

Methods

__init__(total_precision, total_recall, ...)

compute()

Computes macro-averaged recall, precision, and F1-score.

compute_value()

Wraps compute() and returns a values.Value.

empty([order])

Creates an empty Rouge metric.

from_fun(fun)

Calls cls.from_model_output with the return value from fun.

from_model_output(predictions, references, ...)

Computes sums of per-instance ROUGE scores for a batch.

from_output(name)

Calls cls.from_model_output with model output named name.

merge(other)

Merges this Rouge metric with another.

reduce()

Reduces the metric along its first axis by calling _reduce_merge().

replace(**updates)

Returns a new object replacing the specified fields with new values.

Attributes

order: int#
classmethod empty(order: int = 2) RougeN#

Creates an empty Rouge metric. Implemented by subclasses.

replace(**updates)#

Returns a new object replacing the specified fields with new values.