Extract the n highest-probability terms from each topic. The function
works with every topic-model backend supported by get_tww().
Usage
get_top_terms(x, n = 10, topics = NULL, format = c("long", "wide"))
Arguments
- x
A supported topic-model object accepted by get_tww().
- n
Integer. Number of top terms to extract per topic. Defaults to 10.
- topics
Optional topic filter, supplied either as numeric indices or as
Topic### identifiers. If NULL, all topics are included.
- format
Output format. Either "long" (default) or "wide".
Value
A data.table.
- "long" format contains one row per (topic, term) pair with columns rank, topic, term, and probability.
- "wide" format contains one row per rank with two columns per topic: <Topic###>_term and <Topic###>_prob.
Details
get_top_terms() first extracts standardized TWW via get_tww(), then
ranks terms within each topic by descending probability.
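
The ranking step, and the two output layouts, can be illustrated with plain data.table code. This is only a minimal sketch under stated assumptions, not the package's implementation: the tww table below is hypothetical stand-in data (the actual return shape of get_tww() is not shown on this page), and the presence of a rank column and the column order in the wide result are guesses based on the Value description.

```r
library(data.table)

# Hypothetical topic-word weights in long form. In the package these would
# come from get_tww(); its exact return shape is an assumption here.
tww <- data.table(
  topic = rep(c("Topic001", "Topic002"), each = 3),
  term = rep(c("term1", "term2", "term3"), times = 2),
  probability = c(0.2, 0.5, 0.3, 0.6, 0.1, 0.3)
)
n <- 2

# Order by topic, then by descending probability within each topic.
setorder(tww, topic, -probability)

# Number the terms within each topic and keep the first n ("long" layout).
tww[, rank := seq_len(.N), by = topic]
long <- tww[rank <= n, .(rank, topic, term, probability)]

# Reshape to one row per rank ("wide" layout). dcast() names the new columns
# <value>_<topic>, so rename them to <topic>_term and <topic>_prob.
wide <- dcast(long, rank ~ topic, value.var = c("term", "probability"))
old <- setdiff(names(wide), "rank")
new <- sub("^term_(.*)$", "\\1_term", old)
new <- sub("^probability_(.*)$", "\\1_prob", new)
setnames(wide, old, new)

# Put each topic's term and probability columns side by side.
topic_ids <- unique(long$topic)
setcolorder(wide, c("rank", as.vector(rbind(
  paste0(topic_ids, "_term"), paste0(topic_ids, "_prob")
))))

print(long)
print(wide)
```

Ties in probability keep whatever order the sort leaves them in; how get_top_terms() itself breaks ties is not stated here.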
Examples
dtm <- methods::as(
  Matrix::Matrix(
    matrix(
      c(1, 0, 1,
        1, 1, 0,
        0, 1, 1,
        1, 1, 1),
      nrow = 4,
      byrow = TRUE
    ),
    sparse = TRUE
  ),
  "dgCMatrix"
)
rownames(dtm) <- paste0("doc", 1:4)
colnames(dtm) <- paste0("term", 1:3)
fit <- fit_topic_model(
  dtm,
  engine = "text2vec",
  model = "lda",
  k = 2,
  control = list(fit = list(n_iter = 25, progressbar = FALSE))
)
get_top_terms(fit, n = 2, format = "long")
#> rank topic term probability
#> <int> <char> <char> <num>
#> 1: 1 Topic001 term2 0.4285714
#> 2: 2 Topic001 term1 0.2857143
#> 3: 1 Topic002 term1 0.5000000
#> 4: 2 Topic002 term3 0.5000000
