
Extract the n highest-probability terms from each topic. The function works with every topic-model backend supported by get_tww().

Usage

get_top_terms(x, n = 10, topics = NULL, format = c("long", "wide"))

Arguments

x

A topic-model object accepted by get_tww().

n

Integer. Number of top terms to extract per topic. Defaults to 10.

topics

Optional topic filter, supplied either as numeric indices or as Topic### identifiers (for example, Topic001). If NULL, all topics are included.

format

Output format. Either "long" (default) or "wide".
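
Because topics accepts either numeric indices or Topic### identifiers, it can help to see how the two forms might map onto each other. The following is a minimal sketch only: normalize_topics() is a hypothetical helper, not part of the package, and the zero-padded three-digit pattern is inferred from labels such as Topic001 in the example output.

```r
# Hypothetical helper (not part of the package): map a `topics` filter
# to Topic### identifiers. The three-digit zero padding is an assumption
# based on labels such as "Topic001".
normalize_topics <- function(topics) {
  if (is.null(topics)) {
    return(NULL)  # NULL means "no filter": keep all topics
  }
  if (is.numeric(topics)) {
    return(sprintf("Topic%03d", as.integer(topics)))
  }
  as.character(topics)  # assume identifiers are already in Topic### form
}

ids_from_numeric <- normalize_topics(c(1, 2))
ids_from_labels <- normalize_topics("Topic002")
```

Under this assumption, a numeric filter such as c(1, 2) and a character filter such as c("Topic001", "Topic002") would select the same topics.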

Value

A data.table.

  • "long" format contains one row per (topic, term) pair with columns rank, topic, term, and probability.

  • "wide" format contains one row per rank with two columns per topic: <Topic###>_term and <Topic###>_prob.
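
To make the relationship between the two layouts concrete, the sketch below reshapes a small long table into the wide layout described above, using base R only. The values are copied from the example output on this page. Whether the real wide output carries a rank column, and the order of its columns, are assumptions made for illustration, and the function itself returns a data.table rather than a data.frame.

```r
# Long-format rows, copied from the example output on this page.
long <- data.frame(
  rank = c(1L, 2L, 1L, 2L),
  topic = c("Topic001", "Topic001", "Topic002", "Topic002"),
  term = c("term2", "term1", "term1", "term3"),
  probability = c(0.4285714, 0.2857143, 0.5, 0.5),
  stringsAsFactors = FALSE
)

# One row per rank. Keeping an explicit rank column is an assumption.
wide <- data.frame(rank = sort(unique(long$rank)))

# Add two columns per topic: <Topic###>_term and <Topic###>_prob.
for (tp in unique(long$topic)) {
  rows <- long[long$topic == tp, ]
  idx <- match(wide$rank, rows$rank)  # align this topic's rows to each rank
  wide[[paste0(tp, "_term")]] <- rows$term[idx]
  wide[[paste0(tp, "_prob")]] <- rows$probability[idx]
}

wide
```

The wide layout is convenient for reading topics side by side, while the long layout is usually easier to filter, join, or plot.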

Details

get_top_terms() first extracts standardized TWW via get_tww(), then ranks terms within each topic by descending probability.
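
The ranking step can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the tww table and its probabilities are made up, and ties are broken here by input order, which may differ from what get_top_terms() does.

```r
# Hypothetical topic-term probabilities (illustrative values only).
tww <- data.frame(
  topic = rep(c("Topic001", "Topic002"), each = 3),
  term = rep(c("term1", "term2", "term3"), times = 2),
  probability = c(0.2, 0.5, 0.3, 0.6, 0.1, 0.3),
  stringsAsFactors = FALSE
)

n <- 2

# Sort by topic, then by descending probability within each topic.
ordered <- tww[order(tww$topic, -tww$probability), ]

# Rank terms within each topic (1 = highest probability).
ordered$rank <- as.integer(
  ave(ordered$probability, ordered$topic, FUN = seq_along)
)

# Keep the top n rows per topic, in the documented column order.
top_terms <- ordered[ordered$rank <= n, c("rank", "topic", "term", "probability")]
rownames(top_terms) <- NULL

top_terms
```

With these made-up inputs, each topic keeps its two most probable terms and the lowest-probability term is dropped.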

Examples

dtm <- methods::as(
  Matrix::Matrix(
    matrix(
      c(1, 0, 1,
        1, 1, 0,
        0, 1, 1,
        1, 1, 1),
      nrow = 4,
      byrow = TRUE
    ),
    sparse = TRUE
  ),
  "dgCMatrix"
)
rownames(dtm) <- paste0("doc", 1:4)
colnames(dtm) <- paste0("term", 1:3)

fit <- fit_topic_model(
  dtm,
  engine = "text2vec",
  model = "lda",
  k = 2,
  control = list(fit = list(n_iter = 25, progressbar = FALSE))
)

get_top_terms(fit, n = 2, format = "long")
#>     rank    topic   term probability
#>    <int>   <char> <char>       <num>
#> 1:     1 Topic001  term2   0.4285714
#> 2:     2 Topic001  term1   0.2857143
#> 3:     1 Topic002  term1   0.5000000
#> 4:     2 Topic002  term3   0.5000000