Compute a standardized set of quality metrics for an object returned by
fit_topic_model(). Metrics are returned in a single long-format
data.table so that results from different
engines and different values of \(K\) can be compared directly.
Arguments
- fit
An object of class nlp_topic_fit returned by fit_topic_model().
- training
The document-feature matrix used to train fit. Required for coherence metrics ("coherence_npmi", "coherence_umass") and training likelihood metrics ("train_nll", "train_perplexity"). Accepted classes are dgCMatrix, dfm, and DocumentTermMatrix. Defaults to NULL.
- newdata
A held-out document-feature matrix. Required for predictive metrics ("held_out_nll", "held_out_perplexity"). Accepted classes are the same as training. Defaults to NULL.
- metrics
Character vector of metrics to compute. Defaults to all eight supported metrics (alphabetical). Use the canonical metric names below; the deprecated "perplexity" alias was removed in v0.9.7.
  - "coherence_npmi"
  Normalized Pointwise Mutual Information coherence per topic (Aletras & Stevenson, 2013). Requires training.
  - "coherence_umass"
  UMass coherence per topic (Mimno et al., 2011). Requires training.
  - "diversity"
  Proportion of unique top-N terms across all topics. Engine-agnostic; no extra data required.
  - "exclusivity"
  STM-style per-topic exclusivity: mean share of each top-N term's probability belonging to that topic. Engine-agnostic; no extra data required.
  - "held_out_nll"
  Mean negative log-likelihood per token on newdata. Requires newdata.
  - "held_out_perplexity"
  Held-out perplexity. Equal to exp(held_out_nll). Requires newdata.
  - "train_nll"
  Mean negative log-likelihood per token on training. Requires training.
  - "train_perplexity"
  Training perplexity. Equal to exp(train_nll). Requires training.
- level
Reporting level. One of "aggregate" (default), "topic", or "all". "aggregate" returns only corpus-level rows (level == "aggregate"), "topic" returns only topic-level rows (level == "topic"), and "all" returns both.
- top_n
Integer. Number of top terms per topic used by coherence, diversity, and exclusivity. Defaults to 10L.
- epsilon
Small positive constant for numerical stability in logarithm computations. Defaults to 1e-12.
Value
A data.table with columns:
- metric
Metric name (one of the values in metrics).
- level
"aggregate" for corpus-level scalars, "topic" for topic-level values.
- topic_id
Topic### identifier for "topic" rows; NA for "aggregate" rows. This column is retained for all evaluation outputs so aggregate-only and topic-level results share a stable schema.
- value
Numeric metric value. NA when supported = FALSE.
- supported
TRUE when the metric was computed; FALSE when the required data is missing or the metric is unsupported for the given engine.
Rows are ordered by metric then level then topic_id.
Details
Coherence metrics require training to be the same corpus used to fit
the model. They are computed in-package using sparse co-occurrence statistics
from training, so results are directly comparable across all supported
engines. Terms in fit$vocab that are absent from training contribute
zero to all co-occurrence counts.
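As a rough illustration of document-level co-occurrence statistics, the following base-R sketch computes a mean NPMI over all pairs of a topic's top terms. The function name npmi_sketch, the placement of epsilon inside the logarithms, and the use of dense matrices are assumptions made for this example; the in-package implementation works on sparse matrices and may differ in detail.

```r
# Toy document-term counts (4 documents x 4 terms), same values as in the Examples.
counts <- matrix(
  c(2, 1, 0, 0,
    1, 1, 1, 0,
    0, 1, 2, 1,
    0, 0, 1, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), paste0("term", 1:4))
)

# Hypothetical helper: mean NPMI over all pairs of `top_terms`.
# Assumption: epsilon is added to the joint probability before taking logs.
npmi_sketch <- function(counts, top_terms, epsilon = 1e-12) {
  # Binary document-term incidence for the selected terms
  present <- (counts[, top_terms, drop = FALSE] > 0) * 1
  n_docs <- nrow(present)
  # co_docs[i, j] = number of documents containing both term i and term j
  co_docs <- crossprod(present)
  p_joint <- co_docs / n_docs
  p_single <- diag(co_docs) / n_docs
  pairs <- utils::combn(length(top_terms), 2)
  scores <- apply(pairs, 2, function(ij) {
    p_ij <- p_joint[ij[1], ij[2]] + epsilon
    log(p_ij / (p_single[ij[1]] * p_single[ij[2]])) / -log(p_ij)
  })
  mean(scores)
}

npmi_all_terms <- npmi_sketch(counts, colnames(counts))
```

With all four terms treated as top terms, this sketch gives about -0.1458 on the toy matrix, which agrees with the coherence_npmi value printed in the Examples. That agreement is suggestive only and does not confirm the package internals.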
Diversity is the proportion of unique terms among all available
top-top_n terms across topics:
length(unique_top_terms) / (K * min(top_n, vocabulary_size)). A value of
1 means no term appears in more than one topic's top list; a low value
indicates topics that share high-probability terms.
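The diversity formula can be sketched in a few lines of base R. The matrix phi below is a hypothetical topic-word probability matrix (topics in rows), and diversity_sketch is an illustrative helper, not a function exported by the package.

```r
# Hypothetical topic-word matrix: 2 topics (rows) x 4 terms (columns).
phi <- matrix(
  c(0.40, 0.30, 0.20, 0.10,
    0.10, 0.20, 0.30, 0.40),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Topic001", "Topic002"), paste0("term", 1:4))
)

# Illustrative helper implementing
# length(unique_top_terms) / (K * min(top_n, vocabulary_size)).
diversity_sketch <- function(phi, top_n = 10L) {
  n_keep <- min(top_n, ncol(phi))
  # Top terms per topic, ranked by probability within the topic
  top_terms <- apply(phi, 1, function(p) {
    colnames(phi)[order(p, decreasing = TRUE)[seq_len(n_keep)]]
  })
  length(unique(as.vector(top_terms))) / (nrow(phi) * n_keep)
}

diversity_top10 <- diversity_sketch(phi, top_n = 10L)  # 4 / (2 * 4) = 0.5
diversity_top2  <- diversity_sketch(phi, top_n = 2L)   # 4 / (2 * 2) = 1
```

When top_n exceeds the vocabulary size, every topic lists every term, so diversity is capped at 1 / K. This is why a two-topic model over four terms scores 0.5 at the default top_n.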
Exclusivity (per topic t) is the mean, over the top-top_n terms of
t, of phi[t, w] / sum_j phi[j, w]. High exclusivity means those terms
are concentrated in that topic rather than spread across topics.
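A minimal base-R sketch of the per-topic exclusivity formula follows. As before, phi is a hypothetical topic-word probability matrix and exclusivity_sketch is an illustrative name, not part of the package API.

```r
# Hypothetical topic-word matrix: 2 topics (rows) x 4 terms (columns).
phi <- matrix(
  c(0.40, 0.30, 0.20, 0.10,
    0.10, 0.20, 0.30, 0.40),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Topic001", "Topic002"), paste0("term", 1:4))
)

# Illustrative helper: for each topic t, average phi[t, w] / sum_j phi[j, w]
# over that topic's top terms.
exclusivity_sketch <- function(phi, top_n = 10L) {
  n_keep <- min(top_n, ncol(phi))
  # share[t, w] = fraction of term w's total probability held by topic t
  share <- sweep(phi, 2, colSums(phi), "/")
  vapply(seq_len(nrow(phi)), function(t) {
    top_idx <- order(phi[t, ], decreasing = TRUE)[seq_len(n_keep)]
    mean(share[t, top_idx])
  }, numeric(1))
}

exclusivity_top2 <- exclusivity_sketch(phi, top_n = 2L)  # 0.7 for each topic
```

Here each topic's two strongest terms hold 80% and 60% of their total probability mass, giving a per-topic exclusivity of 0.7.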
Training and held-out likelihood metrics align the supplied corpus to the
fitted vocabulary, obtain document-topic weights, then combine those weights
with fit$tww to reconstruct per-token log-likelihoods. Training metrics use
cached fitted document-topic weights when available; held-out metrics infer
document-topic weights for newdata with the fitted topic-word weights held
fixed. Documents whose terms are all outside the fitted vocabulary are
dropped with a warning (they carry no information under the fitted model).
Tokens outside the fitted vocabulary are excluded from the token count,
matching the convention used by topicmodels::perplexity().
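The reconstruction step can be illustrated with dense matrices. In this sketch theta (documents by topics) and phi (topics by terms) are hypothetical stand-ins for the document-topic and topic-word weights, counts is already aligned to the fitted vocabulary, and adding epsilon inside the logarithm is an assumption about how the stability constant is applied.

```r
# Hypothetical weights: 2 documents, 2 topics, 4 in-vocabulary terms.
phi <- matrix(
  c(0.40, 0.30, 0.20, 0.10,
    0.10, 0.20, 0.30, 0.40),
  nrow = 2, byrow = TRUE
)
theta <- matrix(
  c(0.9, 0.1,
    0.1, 0.9),
  nrow = 2, byrow = TRUE
)
counts <- matrix(
  c(2, 1, 0, 0,
    0, 0, 1, 2),
  nrow = 2, byrow = TRUE
)

# Illustrative helper: mean negative log-likelihood per token.
nll_sketch <- function(counts, theta, phi, epsilon = 1e-12) {
  # token_prob[d, w] = sum_k theta[d, k] * phi[k, w]
  token_prob <- theta %*% phi
  -sum(counts * log(token_prob + epsilon)) / sum(counts)
}

train_nll <- nll_sketch(counts, theta, phi)
train_perplexity <- exp(train_nll)
```

A model that assigns uniform probability over the four terms would have perplexity 4 under this definition, so values below the vocabulary size indicate the weights carry information about the corpus.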
For STM fits with prevalence covariates, held-out likelihood metrics are marked unsupported because new-document covariate handling is not inferred by NLPstudio. Coherence, diversity, exclusivity, and training likelihood metrics remain available when their required inputs exist.
level controls only which rows are returned. Metrics are computed in the
same way regardless of level. Metrics that are naturally corpus-level only
have no topic-level rows and are omitted when level = "topic".
References
Aletras, N., & Stevenson, M. (2013). Evaluating topic coherence using distributional semantics. EACL, 13-22.
Mimno, D., Wallach, H., Talley, E., Leenders, M., & McCallum, A. (2011). Optimizing semantic coherence in topic models. EMNLP, 262-272.
Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S. K., Albertson, B., & Rand, D. G. (2014). Structural topic models for open-ended survey responses. American Journal of Political Science, 58(4), 1064-1082.
Examples
# The toy matrix is symmetric, so Matrix() returns a dsCMatrix;
# coerce it to a general sparse matrix (dgCMatrix).
dtm <- methods::as(
  Matrix::Matrix(
    matrix(c(2, 1, 0, 0, 1, 1, 1, 0, 0, 1, 2, 1, 0, 0, 1, 2),
           nrow = 4, byrow = TRUE),
    sparse = TRUE
  ),
  "generalMatrix"
)
rownames(dtm) <- paste0("doc", 1:4)
colnames(dtm) <- paste0("term", 1:4)
fit <- fit_topic_model(
  dtm, engine = "text2vec", model = "lda", k = 2,
  control = list(fit = list(n_iter = 25, progressbar = FALSE))
)
# Engine-agnostic metrics only (no extra data needed)
evaluate_topic_model(fit, metrics = c("diversity", "exclusivity"))
#> metric level topic_id value supported
#> <char> <char> <char> <num> <lgcl>
#> 1: diversity aggregate <NA> 0.5 TRUE
#> 2: exclusivity aggregate <NA> 0.5 TRUE
# Coherence and training likelihood with training data.
# Use level = "all" to include topic-level rows.
evaluate_topic_model(fit, training = dtm,
                     metrics = c("coherence_npmi", "train_perplexity"),
                     level = "all")
#> metric level topic_id value supported
#> <char> <char> <char> <num> <lgcl>
#> 1: coherence_npmi aggregate <NA> -0.1457735 TRUE
#> 2: coherence_npmi topic Topic001 -0.1457735 TRUE
#> 3: coherence_npmi topic Topic002 -0.1457735 TRUE
#> 4: train_perplexity aggregate <NA> 2.7388768 TRUE
