Estimate and tidy STM prevalence effects for fitted topics.
Usage
estimate_stm_topic_effects(
  fit,
  formula = NULL,
  metadata = NULL,
  topics = NULL,
  uncertainty = c("Global", "Local", "None"),
  nsims = 25L,
  conf_level = 0.95
)
Arguments
- fit
An STM nlp_topic_fit returned by fit_topic_model() or a raw stm STM object without content covariates.
- formula
Optional prevalence formula. If NULL, NLPstudio uses the prevalence formula stored in the STM fit. A right-hand-side-only formula such as ~ group is combined with the selected topics. A full formula is forwarded as-is.
- metadata
Optional metadata for stm::estimateEffect(). If omitted, stored fit$docvars are used when available.
- topics
Optional topic filter supplied as numeric topic indices or Topic### identifiers. Ignored when formula is a full formula with its own left-hand side.
- uncertainty
Uncertainty mode forwarded to stm::estimateEffect(). One of "Global", "Local", or "None".
- nsims
Integer number of simulations forwarded to stm::estimateEffect(). Defaults to 25L.
- conf_level
Confidence level used for Wald intervals in the returned table. Defaults to 0.95.
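The interaction between formula and topics can be pictured with a small base-R sketch. It is an illustration only, not NLPstudio's actual implementation: the helper name combine_topics_formula is hypothetical, and it assumes that Topic### identifiers map to integer indices by stripping the Topic prefix, which is consistent with the topic_id and topic_int columns shown in the Examples output.

```r
# Hypothetical helper (not part of NLPstudio): builds a full formula of the
# form `c(<topics>) ~ <covariates>`, the left-hand-side style that
# stm::estimateEffect() accepts.
combine_topics_formula <- function(formula, topics) {
  # A two-sided formula has length 3; assume it is forwarded unchanged
  # and that `topics` is ignored in that case.
  if (length(formula) == 3L) {
    return(formula)
  }
  # Assumed mapping: "Topic001" -> 1L; numeric indices pass through as-is.
  if (is.character(topics)) {
    topics <- as.integer(sub("^Topic", "", topics))
  }
  lhs <- paste0("c(", paste(topics, collapse = ", "), ")")
  rhs <- paste(deparse(formula[[2L]]), collapse = " ")
  stats::as.formula(paste(lhs, "~", rhs))
}

# Right-hand-side-only formula: the selected topics become the left-hand side.
combine_topics_formula(~ group, c("Topic001", "Topic002"))

# Full formula: returned unchanged, so the topics argument has no effect.
combine_topics_formula(1:2 ~ group, topics = 1)
```

Under these assumptions the first call yields a formula equivalent to c(1, 2) ~ group, while the second keeps 1:2 ~ group.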
Value
An object of class
c("nlp_stm_topic_effects", "data.table", "data.frame") with one tidy
coefficient row per topic and model term. The raw estimateEffect object
is attached as attr(result, "estimate_effect").
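As a sanity check on the conf_low and conf_high columns, the sketch below recomputes a Wald interval from a single coefficient row. It assumes the interval is estimate ± z × std_error with z taken from the standard normal distribution. That assumption reproduces the first row of the Examples output on this page, but it has not been confirmed against the package source.

```r
# Values copied from the first row of the Examples output
# (Topic001, intercept term).
estimate   <- 0.53910970
std_error  <- 0.1085404
conf_level <- 0.95

# Two-sided critical value from the standard normal (an assumption, see above).
z <- stats::qnorm(1 - (1 - conf_level) / 2)

wald <- c(
  conf_low  = estimate - z * std_error,
  conf_high = estimate + z * std_error
)
round(wald, 7)
```

The recomputed bounds agree with the printed conf_low and conf_high for that row to the displayed precision. Changing conf_level in the sketch shows how a different confidence level would widen or narrow the interval.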
Details
This helper reports prevalence effects for STM fits. It does not add new STM prediction behavior, and it does not support content-covariate STM models.
Examples
texts <- c(
  doc1 = "profit revenue growth",
  doc2 = "profit margin growth",
  doc3 = "risk litigation loss",
  doc4 = "debt risk loss",
  doc5 = "revenue market profit",
  doc6 = "litigation cost risk"
)
corp <- quanteda::corpus(texts)
quanteda::docvars(corp, "group") <- rep(c("a", "b"), 3)
dfm <- quanteda::dfm(quanteda::tokens(corp))
fit <- fit_topic_model(
  dfm,
  engine = "stm",
  model = "stm",
  k = 2,
  docvars = TRUE,
  control = list(
    fit = list(
      prevalence = ~ group,
      seed = 1,
      max.em.its = 5,
      verbose = FALSE
    )
  )
)
#> Warning: K=2 is equivalent to a unidimensional scaling model which you may prefer.
estimate_stm_topic_effects(fit, nsims = 5)
#>    topic_id topic_int        term    estimate std_error  statistic     p_value
#>      <char>     <int>      <char>       <num>     <num>      <num>       <num>
#> 1: Topic001         1 (Intercept)  0.53910970 0.1085404  4.9669026 0.007668356
#> 2: Topic001         1      groupb -0.04961562 0.1442915 -0.3438568 0.748268905
#> 3: Topic002         2 (Intercept)  0.46119938 0.1068672  4.3156312 0.012490383
#> 4: Topic002         2      groupb  0.04728100 0.1412102  0.3348271 0.754577280
#>      conf_low conf_high uncertainty nsims
#>         <num>     <num>      <char> <int>
#> 1:  0.3263744 0.7518450      Global     5
#> 2: -0.3324218 0.2331906      Global     5
#> 3:  0.2517435 0.6706552      Global     5
#> 4: -0.2294859 0.3240479      Global     5
