Return STM-native topic labels as a standardized long table. The helper wraps
stm::labelTopics() and, optionally, stm::sageLabels() while keeping
NLPstudio's canonical Topic### identifiers.
Usage
get_stm_topic_labels(
  x,
  n = 7L,
  topics = NULL,
  label_types = c("prob", "frex", "lift", "score"),
  frexweight = 0.5,
  include_sage = FALSE
)
Arguments
- x
An STM nlp_topic_fit returned by fit_topic_model() or a raw stm STM object without content covariates.
- n
Integer. Number of terms per label type. Defaults to 7L.
- topics
Optional topic filter supplied as numeric topic indices or Topic### identifiers.
- label_types
Character vector of STM label families to return. Valid values are "prob", "frex", "lift", and "score".
- frexweight
Numeric value in [0, 1] forwarded to stm::labelTopics() for FREX labels. Defaults to 0.5.
- include_sage
Logical. Should stm::sageLabels() marginal labels also be included? Defaults to FALSE.
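The topics argument accepts either numeric indices or Topic### identifiers. The sketch below shows one way the two forms could correspond, assuming the three-digit zero padding seen in identifiers such as Topic001. The helpers to_topic_id() and to_topic_int() are hypothetical illustrations, not NLPstudio functions, and the package's own conversion may differ.

```r
# Hypothetical helpers (not part of NLPstudio): convert between numeric
# topic indices and Topic### identifiers, assuming three-digit zero padding.
to_topic_id <- function(topics) {
  if (is.numeric(topics)) {
    sprintf("Topic%03d", as.integer(topics))
  } else {
    as.character(topics)
  }
}

to_topic_int <- function(topics) {
  if (is.numeric(topics)) {
    as.integer(topics)
  } else {
    # Strip the "Topic" prefix and parse the remaining digits.
    as.integer(sub("^Topic", "", topics))
  }
}

ids  <- to_topic_id(c(1, 12))                    # "Topic001" "Topic012"
ints <- to_topic_int(c("Topic002", "Topic010"))  # 2 10
```

Under this assumption, topics = 2 and topics = "Topic002" would select the same topic.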
Value
A data.table with columns topic_id,
topic_int, source, label_type, rank, and term.
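Because the result is a long table with one row per term, it often needs collapsing before display. The sketch below uses a small hand-built stand-in table with the documented columns (not real model output) and shows one possible way to build one label string per topic and label type with data.table.

```r
library(data.table)

# Hypothetical stand-in for the table returned by get_stm_topic_labels();
# only the column layout follows the documentation, the rows are made up.
labels <- data.table(
  topic_id   = rep(c("Topic001", "Topic002"), each = 6),
  topic_int  = rep(1:2, each = 6),
  source     = "labelTopics",
  label_type = rep(rep(c("frex", "prob"), each = 3), times = 2),
  rank       = rep(1:3, times = 4),
  term       = c("growth", "profit", "loss",
                 "growth", "profit", "loss",
                 "risk", "loss", "growth",
                 "loss", "risk", "growth")
)

# Sort by rank so that terms are pasted in label order.
setorder(labels, topic_id, label_type, rank)

# Collapse the ranked terms into one string per topic and label type.
collapsed <- labels[, .(label = paste(term, collapse = ", ")),
                    by = .(topic_id, label_type)]

# Reshape to one row per topic and one column per label type.
wide <- dcast(collapsed, topic_id ~ label_type, value.var = "label")
print(wide)
```

The long layout keeps filtering by label_type or rank simple, while the wide form is usually easier to read in reports.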
Details
This function is STM-specific. It is meant to complement the engine-agnostic
get_top_terms() accessor when users want labels based on STM's own
probability, FREX, lift, score, and optional SAGE calculations.
STM content-covariate models are not supported because they imply covariate-specific topic-word distributions, while NLPstudio currently standardizes one TWW matrix per fit.
Examples
dtm <- methods::as(
  Matrix::Matrix(
    matrix(c(2, 1, 0, 0, 1, 2, 0, 0, 0, 0, 2, 1,
             0, 0, 1, 2, 2, 1, 0, 0, 0, 0, 1, 2),
           nrow = 6, byrow = TRUE),
    sparse = TRUE
  ),
  "dgCMatrix"
)
rownames(dtm) <- paste0("doc", 1:6)
colnames(dtm) <- c("growth", "profit", "risk", "loss")
fit <- fit_topic_model(
  dtm,
  engine = "stm",
  model = "stm",
  k = 2,
  control = list(fit = list(seed = 1, max.em.its = 5, verbose = FALSE))
)
#> Warning: K=2 is equivalent to a unidimensional scaling model which you may prefer.
get_stm_topic_labels(fit, n = 3)
#> topic_id topic_int source label_type rank term
#> <char> <int> <char> <char> <int> <char>
#> 1: Topic001 1 labelTopics frex 1 growth
#> 2: Topic001 1 labelTopics frex 2 profit
#> 3: Topic001 1 labelTopics frex 3 loss
#> 4: Topic001 1 labelTopics lift 1 profit
#> 5: Topic001 1 labelTopics lift 2 growth
#> 6: Topic001 1 labelTopics lift 3 loss
#> 7: Topic001 1 labelTopics prob 1 growth
#> 8: Topic001 1 labelTopics prob 2 profit
#> 9: Topic001 1 labelTopics prob 3 loss
#> 10: Topic001 1 labelTopics score 1 profit
#> 11: Topic001 1 labelTopics score 2 growth
#> 12: Topic001 1 labelTopics score 3 risk
#> 13: Topic002 2 labelTopics frex 1 risk
#> 14: Topic002 2 labelTopics frex 2 loss
#> 15: Topic002 2 labelTopics frex 3 growth
#> 16: Topic002 2 labelTopics lift 1 risk
#> 17: Topic002 2 labelTopics lift 2 loss
#> 18: Topic002 2 labelTopics lift 3 growth
#> 19: Topic002 2 labelTopics prob 1 loss
#> 20: Topic002 2 labelTopics prob 2 risk
#> 21: Topic002 2 labelTopics prob 3 growth
#> 22: Topic002 2 labelTopics score 1 risk
#> 23: Topic002 2 labelTopics score 2 loss
#> 24: Topic002 2 labelTopics score 3 profit
#> topic_id topic_int source label_type rank term
#> <char> <int> <char> <char> <int> <char>
