Evaluate a Fitted Topic Model

Compute a standardized set of quality metrics for an object returned by fit_topic_model(). Metrics are returned in a single long-format data.table so that results from different engines and different values of K can be compared directly.

Usage

evaluate_topic_model(
  fit,
  training = NULL,
  newdata = NULL,
  metrics = c("coherence_npmi", "coherence_umass", "diversity", "exclusivity",
    "held_out_nll", "held_out_perplexity", "train_nll", "train_perplexity"),
  level = c("aggregate", "topic", "all"),
  top_n = 10L,
  epsilon = 1e-12
)

Arguments

fit

An object of class nlp_topic_fit returned by fit_topic_model().

training

The document-feature matrix used to train fit. Required for coherence metrics ("coherence_npmi", "coherence_umass") and training likelihood metrics ("train_nll", "train_perplexity"). Accepted classes are dgCMatrix, dfm, and DocumentTermMatrix. Defaults to NULL.

newdata

A held-out document-feature matrix. Required for predictive metrics ("held_out_nll", "held_out_perplexity"). Accepted classes are the same as training. Defaults to NULL.

metrics

Character vector of metrics to compute. Defaults to all eight supported metrics (alphabetical). Use the canonical metric names below; the deprecated "perplexity" alias was removed in v0.9.7.

"coherence_npmi": Normalized Pointwise Mutual Information coherence per topic (Aletras & Stevenson, 2013). Requires training.
"coherence_umass": UMass coherence per topic (Mimno et al., 2011). Requires training.
"diversity": Proportion of unique top-N terms across all topics. Engine-agnostic; no extra data required.
"exclusivity": STM-style per-topic exclusivity: mean share of each top-N term's probability belonging to that topic. Engine-agnostic; no extra data required.
"held_out_nll": Mean negative log-likelihood per token on newdata. Requires newdata.
"held_out_perplexity": Held-out perplexity. Equal to exp(held_out_nll). Requires newdata.
"train_nll": Mean negative log-likelihood per token on training. Requires training.
"train_perplexity": Training perplexity. Equal to exp(train_nll). Requires training.

level

Reporting level. One of "aggregate" (default), "topic", or "all". "aggregate" returns only corpus-level rows (level == "aggregate"), "topic" returns only topic-level rows (level == "topic"), and "all" returns both.

top_n

Integer. Number of top terms per topic used by coherence, diversity, and exclusivity. Defaults to 10L.

epsilon

Small positive constant for numerical stability in logarithm computations. Defaults to 1e-12.

Value

A data.table with columns:

metric: Metric name (one of the values in metrics).
level: "aggregate" for corpus-level scalars, "topic" for topic-level values.
topic_id: Topic identifier of the form Topic### (for example, Topic001) for "topic" rows; NA for "aggregate" rows. This column is retained for all evaluation outputs so aggregate-only and topic-level results share a stable schema.
value: Numeric metric value. NA when supported = FALSE.
supported: TRUE when the metric was computed; FALSE when the required data is missing or the metric is unsupported for the given engine.

Rows are ordered by metric, then level, then topic_id.
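
The long format described above lends itself to simple filtering. The following base R sketch works on a hand-built data.frame that only mimics the documented columns; the values are made up and the object is not produced by evaluate_topic_model(). The same subsetting should carry over to the returned data.table.

```r
# Hypothetical results mimicking the documented schema (values are made up).
res <- data.frame(
  metric    = c("coherence_npmi", "coherence_npmi", "coherence_npmi",
                "held_out_perplexity"),
  level     = c("aggregate", "topic", "topic", "aggregate"),
  topic_id  = c(NA, "Topic001", "Topic002", NA),
  value     = c(-0.12, -0.10, -0.14, NA),
  supported = c(TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Keep only metrics that were actually computed.
computed <- res[res$supported, ]

# Corpus-level scalars as a named vector, convenient for comparing fits.
agg <- computed[computed$level == "aggregate", ]
aggregate_values <- setNames(agg$value, agg$metric)

# Metrics that could not be computed (value is NA, supported is FALSE).
skipped <- unique(res$metric[!res$supported])
```

Checking supported before using value avoids silently propagating NA into downstream comparisons.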

Details

Coherence metrics require training to be the same corpus used to fit the model. They are computed in-package using sparse co-occurrence statistics from training, so results are directly comparable across all supported engines. Terms in fit$vocab that are absent from training contribute zero to all co-occurrence counts.
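
The co-occurrence statistics described above can be sketched in base R. This is one common document-level NPMI formulation, offered as an assumption about how such a score can be computed rather than as the package's internal code; npmi_coherence() is a hypothetical helper. For the toy matrix used in the Examples, with all four terms as the top list, it gives the same value as the coherence_npmi rows shown there.

```r
# Toy document-term counts (same values as the matrix in the Examples).
dtm <- matrix(c(2, 1, 0, 0,
                1, 1, 1, 0,
                0, 1, 2, 1,
                0, 0, 1, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("doc", 1:4), paste0("term", 1:4)))

# Hypothetical helper: mean NPMI over all pairs of a topic's top terms.
npmi_coherence <- function(dtm, top_terms, epsilon = 1e-12) {
  present <- (dtm[, top_terms, drop = FALSE] > 0) * 1  # document-level presence
  n_docs <- nrow(present)
  co_docs <- crossprod(present)      # [i, j] = number of docs containing both
  p_joint <- co_docs / n_docs
  p_single <- diag(co_docs) / n_docs
  pairs <- utils::combn(length(top_terms), 2)
  scores <- apply(pairs, 2, function(ij) {
    p_ij <- p_joint[ij[1], ij[2]]
    pmi <- log((p_ij + epsilon) / (p_single[ij[1]] * p_single[ij[2]]))
    pmi / -log(p_ij + epsilon)       # normalize PMI into roughly [-1, 1]
  })
  mean(scores)
}

npmi_coherence(dtm, paste0("term", 1:4))
#> [1] -0.1457735
```

Pairs that never co-occur (here term1 and term4) score close to -1 because epsilon keeps the logarithm finite, which is why the choice of epsilon affects the result on small corpora.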

Diversity is the proportion of unique terms among all available top-top_n terms across topics: length(unique_top_terms) / (K * min(top_n, vocabulary_size)). A value of 1 means no term appears in more than one topic's top list; a low value indicates topics that share high-probability terms.
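
The formula above can be checked by hand with a small sketch. The phi matrix and the topic_diversity() helper are hypothetical illustrations and are not part of the package.

```r
# Hypothetical topic-word matrix: K = 2 topics (rows) over 4 terms (columns).
phi <- matrix(c(0.40, 0.30, 0.20, 0.10,
                0.10, 0.20, 0.30, 0.40),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Topic001", "Topic002"), paste0("term", 1:4)))

# Hypothetical helper implementing
# length(unique_top_terms) / (K * min(top_n, vocabulary_size)).
topic_diversity <- function(phi, top_n = 10L) {
  n_top <- min(top_n, ncol(phi))
  top_terms <- lapply(seq_len(nrow(phi)), function(t) {
    names(sort(phi[t, ], decreasing = TRUE))[seq_len(n_top)]
  })
  length(unique(unlist(top_terms))) / (nrow(phi) * n_top)
}

topic_diversity(phi, top_n = 10L)  # 4 unique terms / (2 * 4) = 0.5
topic_diversity(phi, top_n = 2L)   # 4 unique terms / (2 * 2) = 1
```

When top_n is at least the vocabulary size, every topic lists every term, so diversity collapses to 1 / K regardless of the fit.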

Exclusivity (per topic t) is the mean, over the top-top_n terms of t, of phi[t, w] / sum_j phi[j, w]. High exclusivity means those terms are concentrated in that topic rather than spread across topics.
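
As a minimal sketch of the ratio above, assuming a topic-word matrix phi with topics in rows, the per-topic score can be computed as follows. Both phi and topic_exclusivity() are hypothetical and only illustrate the arithmetic.

```r
# Hypothetical topic-word matrix: 2 topics (rows) over 4 terms (columns).
phi <- matrix(c(0.40, 0.30, 0.20, 0.10,
                0.10, 0.20, 0.30, 0.40),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Topic001", "Topic002"), paste0("term", 1:4)))

# Hypothetical helper: mean of phi[t, w] / sum_j phi[j, w] over topic t's top terms.
topic_exclusivity <- function(phi, top_n = 10L) {
  n_top <- min(top_n, ncol(phi))
  share <- sweep(phi, 2, colSums(phi), "/")  # each term's share held by each topic
  scores <- vapply(seq_len(nrow(phi)), function(t) {
    top_idx <- order(phi[t, ], decreasing = TRUE)[seq_len(n_top)]
    mean(share[t, top_idx])
  }, numeric(1))
  setNames(scores, rownames(phi))
}

topic_exclusivity(phi, top_n = 2L)   # both topics: mean(0.8, 0.6) = 0.7
topic_exclusivity(phi, top_n = 10L)  # all terms used: 0.5 for both topics
```

With two topics and a top list that covers the whole vocabulary, the shares of each term sum to 1 across topics, so the mean over topics is always 0.5.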

Training and held-out likelihood metrics align the supplied corpus to the fitted vocabulary, obtain document-topic weights, then combine those weights with fit$tww to reconstruct per-token log-likelihoods. Training metrics use cached fitted document-topic weights when available; held-out metrics infer document-topic weights for newdata with the fitted topic-word weights held fixed. Documents whose terms are all outside the fitted vocabulary are dropped with a warning (they carry no information under the fitted model). Tokens outside the fitted vocabulary are excluded from the token count, matching the convention used by topicmodels::perplexity().
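
The steps above (align, drop empty documents, combine weights, average per token) can be sketched in base R. All objects here are hypothetical: theta stands in for document-topic weights that are assumed to have been fitted or inferred already, and phi stands in for the topic-word weights. The sketch does not reproduce the package's inference step.

```r
# Hypothetical topic-word weights (topics x terms); rows sum to 1.
phi <- matrix(c(0.5, 0.3, 0.2,
                0.1, 0.3, 0.6),
              nrow = 2, byrow = TRUE,
              dimnames = list(NULL, c("a", "b", "c")))

# Counts with one out-of-vocabulary term ("z") and one document ("doc3")
# that contains only out-of-vocabulary tokens.
counts <- matrix(c(3, 1, 0, 2,
                   0, 1, 4, 0,
                   0, 0, 0, 5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:3), c("a", "b", "c", "z")))

# 1. Align to the fitted vocabulary, then drop documents left empty.
aligned <- counts[, colnames(phi), drop = FALSE]
aligned <- aligned[rowSums(aligned) > 0, , drop = FALSE]

# 2. Document-topic weights for the kept documents (assumed already inferred).
theta <- matrix(c(0.9, 0.1,
                  0.2, 0.8),
                nrow = 2, byrow = TRUE)

# 3. Per-token probabilities p(w | d), then mean NLL over in-vocabulary tokens.
epsilon <- 1e-12
token_prob <- theta %*% phi
n_tokens <- sum(aligned)  # out-of-vocabulary tokens are not counted
nll <- -sum(aligned * log(token_prob + epsilon)) / n_tokens
perplexity <- exp(nll)
```

Because out-of-vocabulary tokens are excluded from n_tokens, perplexities are only comparable between fits that share the same vocabulary.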

For STM fits with prevalence covariates, held-out likelihood metrics are marked unsupported because new-document covariate handling is not inferred by NLPstudio. Coherence, diversity, exclusivity, and training likelihood metrics remain available when their required inputs exist.

level controls only which rows are returned; metrics are computed in the same way regardless of its value. Metrics that are defined only at the corpus level, such as "train_perplexity", have no topic-level rows, so they are omitted entirely when level = "topic".

References

Aletras, N., & Stevenson, M. (2013). Evaluating topic coherence using distributional semantics. IWCS, 13-22.

Mimno, D., Wallach, H., Talley, E., Leenders, M., & McCallum, A. (2011). Optimizing semantic coherence in topic models. EMNLP, 262-272.

Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S. K., Albertson, B., & Rand, D. G. (2014). Structural topic models for open-ended survey responses. American Journal of Political Science, 58(4), 1064-1082.

Examples

# The toy matrix is symmetric, so Matrix() returns a dsCMatrix;
# coercing to "generalMatrix" yields a dgCMatrix without a deprecation warning.
dtm <- methods::as(
  Matrix::Matrix(
    matrix(c(2, 1, 0, 0,  1, 1, 1, 0,  0, 1, 2, 1,  0, 0, 1, 2),
           nrow = 4, byrow = TRUE),
    sparse = TRUE
  ),
  "generalMatrix"
)
rownames(dtm) <- paste0("doc", 1:4)
colnames(dtm) <- paste0("term", 1:4)

fit <- fit_topic_model(
  dtm, engine = "text2vec", model = "lda", k = 2,
  control = list(fit = list(n_iter = 25, progressbar = FALSE))
)

# Engine-agnostic metrics only (no extra data needed)
evaluate_topic_model(fit, metrics = c("diversity", "exclusivity"))
#>         metric     level topic_id value supported
#>         <char>    <char>   <char> <num>    <lgcl>
#> 1:   diversity aggregate     <NA>   0.5      TRUE
#> 2: exclusivity aggregate     <NA>   0.5      TRUE

# Coherence and training likelihood with training data.
# Use level = "all" to include topic-level rows.
evaluate_topic_model(fit, training = dtm,
                     metrics = c("coherence_npmi", "train_perplexity"),
                     level = "all")
#>              metric     level topic_id      value supported
#>              <char>    <char>   <char>      <num>    <lgcl>
#> 1:   coherence_npmi aggregate     <NA> -0.1457735      TRUE
#> 2:   coherence_npmi     topic Topic001 -0.1457735      TRUE
#> 3:   coherence_npmi     topic Topic002 -0.1457735      TRUE
#> 4: train_perplexity aggregate     <NA>  2.7388768      TRUE