
Create a faceted ggplot2 bar chart of the highest-probability terms for each topic from the long-format output of get_top_terms(). Each facet corresponds to one topic, and each bar represents an estimated topic–word probability (\(\phi\)).

Usage

plot_top_terms(top_terms, facet_args = list(scales = "free_y"), ...)

Arguments

top_terms

A data.table returned by get_top_terms() with format = "long". Must contain the columns rank, topic, term, and probability.

facet_args

A named list of additional arguments passed to facet_wrap(). Defaults to list(scales = "free_y"), which allows each facet to have its own y-axis scale.

...

Additional arguments passed to geom_col().
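
The two pass-through arguments can be illustrated with plain ggplot2 calls. The sketch below is not the package's source; it only shows one plausible way a named list could be forwarded to facet_wrap() and extra arguments to geom_col(). The data frame and the col_args list are hypothetical stand-ins for get_top_terms() output and for whatever arrives through `...`.

```r
library(ggplot2)

# Hypothetical long-format data with the documented columns.
top_terms <- data.frame(
  rank = rep(1:2, times = 2),
  topic = rep(c("topic1", "topic2"), each = 2),
  term = c("alpha", "beta", "beta", "gamma"),
  probability = c(0.6, 0.4, 0.7, 0.3)
)

# A caller-supplied list. scales = "free_y" is repeated here so that
# adding ncol does not drop the free y-axis.
facet_args <- list(scales = "free_y", ncol = 1)

# Stand-in for arguments that would arrive through `...`.
col_args <- list(fill = "steelblue", width = 0.7)

# Assumed forwarding mechanism: splice each list into its target function.
p <- ggplot(top_terms, aes(x = probability, y = term)) +
  do.call(geom_col, col_args) +
  do.call(facet_wrap, c(list(facets = ~ topic), facet_args))
```

Supplying facet_args replaces the default list as a whole under ordinary R argument semantics, so unless the function merges the two lists internally (which this page does not state), it is safest to include scales = "free_y" again whenever other facet options are set.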

Value

A ggplot object: a faceted horizontal bar chart with one facet per topic. Each bar shows a term's estimated probability within that topic, taken from the topic–word distribution matrix (\(\phi\)).

Details

The function visualizes topic–word probabilities in a tidy, per-topic format. Within each topic, terms are ordered by descending probability; the ordering is applied internally with tidytext::reorder_within() so that each facet sorts its own terms independently. The input is typically prepared with get_top_terms() using format = "long".
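
The per-facet ordering described above can be reproduced with ggplot2 and tidytext alone. The following is a minimal sketch under stated assumptions, not the function's actual implementation: the data frame is hypothetical, and pairing reorder_within() with scale_y_reordered() is the usual tidytext idiom rather than something this page confirms.

```r
library(ggplot2)

# Hypothetical long-format input with the documented columns.
top_terms <- data.frame(
  rank = rep(1:3, times = 2),
  topic = rep(c("topic1", "topic2"), each = 3),
  term = c("alpha", "beta", "gamma", "beta", "delta", "alpha"),
  probability = c(0.40, 0.35, 0.25, 0.50, 0.30, 0.20)
)

# reorder_within() builds one factor level per term/topic pair, ordered by
# probability, so the same term can sit at different positions in each facet.
ordered_term <- tidytext::reorder_within(
  top_terms$term,
  top_terms$probability,
  top_terms$topic
)

p <- ggplot(top_terms, aes(x = probability, y = ordered_term)) +
  geom_col() +
  # Strips the topic suffix that reorder_within() appends to the labels.
  tidytext::scale_y_reordered() +
  facet_wrap(~ topic, scales = "free_y") +
  labs(x = "Probability", y = NULL)
```

A free y-axis scale matters here: with a shared scale, every facet would display the levels of all topics, leaving empty rows for terms that belong to other facets.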

Examples

if (interactive() &&
    requireNamespace("text2vec", quietly = TRUE) &&
    requireNamespace("tidytext", quietly = TRUE)) {
# Fitting uses the text2vec engine; plotting requires the optional tidytext package.
dtm <- methods::as(
  Matrix::Matrix(
    matrix(
      c(1, 0, 1,
        1, 1, 0,
        0, 1, 1,
        1, 1, 1),
      nrow = 4,
      byrow = TRUE
    ),
    sparse = TRUE
  ),
  "dgCMatrix"
)
colnames(dtm) <- paste0("term", 1:3)
rownames(dtm) <- paste0("doc", 1:4)

model <- fit_topic_model(
  dtm,
  engine = "text2vec",
  model = "lda",
  k = 2,
  control = list(fit = list(n_iter = 25, progressbar = FALSE))
)
top_terms <- get_top_terms(model, n = 3, format = "long")

plot_top_terms(top_terms)
}