
Estimate and tidy STM prevalence effects for fitted topics.

Usage

estimate_stm_topic_effects(
  fit,
  formula = NULL,
  metadata = NULL,
  topics = NULL,
  uncertainty = c("Global", "Local", "None"),
  nsims = 25L,
  conf_level = 0.95
)

Arguments

fit

An STM nlp_topic_fit returned by fit_topic_model(), or a raw stm STM object fitted without content covariates.

formula

Optional prevalence formula. If NULL, NLPstudio uses the prevalence formula stored in the STM fit. A right-hand-side-only formula such as ~ group is combined with the selected topics, which become its left-hand side. A full (two-sided) formula is forwarded to stm::estimateEffect() as-is.
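stm::estimateEffect() expects a two-sided formula whose left-hand side lists topic numbers, for example c(1, 2) ~ group. The sketch below shows one way a right-hand-side-only formula could be expanded under that convention while a two-sided formula passes through untouched. expand_prevalence_formula() is a hypothetical helper written for illustration; it is not part of NLPstudio's exported API, and the package's internal logic may differ.

```r
# Hypothetical helper (not NLPstudio API): turn a one-sided formula such as
# ~ group into the two-sided form stm::estimateEffect() uses, c(1, 2) ~ group.
expand_prevalence_formula <- function(formula, topics) {
  stopifnot(inherits(formula, "formula"), length(topics) >= 1L)
  # A two-sided formula has length 3 (`~`, lhs, rhs): forward it unchanged.
  if (length(formula) == 3L) {
    return(formula)
  }
  rhs <- formula[[2L]]
  # Build the call c(<topic1>, <topic2>, ...) for the left-hand side.
  lhs <- as.call(c(as.name("c"), as.list(as.numeric(topics))))
  out <- eval(call("~", lhs, rhs))
  # Keep the caller's environment so covariate lookup behaves as before.
  environment(out) <- environment(formula)
  out
}

expand_prevalence_formula(~ group, topics = c(1, 2))
```

Keeping the original formula environment matters because model-frame construction resolves variables that are not in the metadata from that environment.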

metadata

Optional metadata for stm::estimateEffect(). If NULL, the document variables stored in fit$docvars are used when available.

topics

Optional topic filter supplied as numeric topic indices or Topic### identifiers. Ignored when formula is a full formula with its own left-hand side.
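A minimal sketch of normalizing the topics filter to integer indices, assuming a Topic### identifier maps to its numeric suffix (Topic002 becomes 2). The behavior for NULL (select all k topics) is an assumption, since the argument description above does not state it. resolve_topic_ids() is hypothetical and only illustrates the idea.

```r
# Hypothetical helper (not NLPstudio API): map numeric indices or "Topic###"
# identifiers onto integer topic indices in 1..k.
resolve_topic_ids <- function(topics, k) {
  # Assumption: NULL means "use every topic in the fit".
  if (is.null(topics)) {
    return(seq_len(k))
  }
  if (is.character(topics)) {
    ok <- grepl("^Topic[0-9]+$", topics)
    if (!all(ok)) {
      stop("Unrecognized topic identifier(s): ",
           paste(topics[!ok], collapse = ", "))
    }
    # Strip the "Topic" prefix; as.integer() drops the leading zeros.
    topics <- sub("^Topic", "", topics)
  }
  topics <- as.integer(topics)
  if (any(is.na(topics) | topics < 1L | topics > k)) {
    stop("Topic indices must lie in 1..", k)
  }
  unique(topics)
}

resolve_topic_ids(c("Topic002", "Topic001"), k = 2)
```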

uncertainty

Uncertainty mode forwarded to stm::estimateEffect(). One of "Global", "Local", or "None".

nsims

Integer number of simulations forwarded to stm::estimateEffect(). Defaults to 25L.
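The vector default for uncertainty appears to follow R's usual match.arg() idiom, in which the first element ("Global") is used when the argument is omitted; that is consistent with the uncertainty column in the example output. The sketch below shows a hypothetical validator for uncertainty and nsims under that assumption. validate_effect_args() is not part of NLPstudio, and the package's actual checks may differ.

```r
# Hypothetical validator (not NLPstudio API) mirroring the documented defaults.
validate_effect_args <- function(uncertainty = c("Global", "Local", "None"),
                                 nsims = 25L) {
  # With the vector default and no user value, match.arg() returns "Global";
  # matching is case-sensitive, so "global" is rejected.
  uncertainty <- match.arg(uncertainty)
  if (!is.numeric(nsims) || length(nsims) != 1L || is.na(nsims) ||
      nsims < 1 || nsims != round(nsims)) {
    stop("`nsims` must be a single positive whole number")
  }
  list(uncertainty = uncertainty, nsims = as.integer(nsims))
}

validate_effect_args(nsims = 5)
```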

conf_level

Confidence level used for Wald intervals in the returned table. Defaults to 0.95.
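The intervals in the example output are consistent with a normal-quantile Wald interval, estimate ± z × std_error with z = qnorm(1 - (1 - conf_level) / 2), which is about 1.96 for conf_level = 0.95. A minimal sketch under that assumption, using a hypothetical wald_interval() helper, reproduces the first row of the example table (estimate 0.53910970, std_error 0.1085404).

```r
# Hypothetical helper: two-sided Wald interval using a standard normal quantile.
wald_interval <- function(estimate, std_error, conf_level = 0.95) {
  stopifnot(conf_level > 0, conf_level < 1)
  z <- stats::qnorm(1 - (1 - conf_level) / 2)
  data.frame(
    conf_low = estimate - z * std_error,
    conf_high = estimate + z * std_error
  )
}

# First row of the example output: Topic001, (Intercept).
wald_interval(0.53910970, 0.1085404)
```

Raising conf_level widens the interval through z alone; the estimate and standard error are unchanged.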

Value

An object of class c("nlp_stm_topic_effects", "data.table", "data.frame") with tidy coefficient rows. The raw estimateEffect object is attached as attr(result, "estimate_effect").
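Because the result inherits from data.frame, ordinary subsetting works on it. The sketch below keeps only covariate rows and flags intervals that exclude zero. To stay self-contained, it rebuilds a plain data.frame from selected columns of the example output on this page in place of a live estimate_stm_topic_effects() call.

```r
# Stand-in for the returned table: columns copied from the example output.
effects <- data.frame(
  topic_id = c("Topic001", "Topic001", "Topic002", "Topic002"),
  term = c("(Intercept)", "groupb", "(Intercept)", "groupb"),
  estimate = c(0.53910970, -0.04961562, 0.46119938, 0.04728100),
  conf_low = c(0.3263744, -0.3324218, 0.2517435, -0.2294859),
  conf_high = c(0.7518450, 0.2331906, 0.6706552, 0.3240479)
)

# Drop intercept rows, then flag covariate effects whose interval excludes 0.
covariate_rows <- effects[effects$term != "(Intercept)", ]
covariate_rows$excludes_zero <-
  covariate_rows$conf_low > 0 | covariate_rows$conf_high < 0
covariate_rows
```

In the example data, neither groupb effect has an interval that excludes zero, so both flags are FALSE.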

Details

This helper summarizes prevalence effects for STM fits by calling stm::estimateEffect() and tidying its coefficients. It does not add new STM prediction behavior, and it does not support STM models fitted with content covariates.

Examples

texts <- c(
  doc1 = "profit revenue growth",
  doc2 = "profit margin growth",
  doc3 = "risk litigation loss",
  doc4 = "debt risk loss",
  doc5 = "revenue market profit",
  doc6 = "litigation cost risk"
)
corp <- quanteda::corpus(texts)
quanteda::docvars(corp, "group") <- rep(c("a", "b"), 3)
dfm <- quanteda::dfm(quanteda::tokens(corp))
fit <- fit_topic_model(
  dfm,
  engine = "stm",
  model = "stm",
  k = 2,
  docvars = TRUE,
  control = list(
    fit = list(
      prevalence = ~ group,
      seed = 1,
      max.em.its = 5,
      verbose = FALSE
    )
  )
)
#> Warning: K=2 is equivalent to a unidimensional scaling model which you may prefer.
estimate_stm_topic_effects(fit, nsims = 5)
#>    topic_id topic_int        term    estimate std_error  statistic     p_value
#>      <char>     <int>      <char>       <num>     <num>      <num>       <num>
#> 1: Topic001         1 (Intercept)  0.53910970 0.1085404  4.9669026 0.007668356
#> 2: Topic001         1      groupb -0.04961562 0.1442915 -0.3438568 0.748268905
#> 3: Topic002         2 (Intercept)  0.46119938 0.1068672  4.3156312 0.012490383
#> 4: Topic002         2      groupb  0.04728100 0.1412102  0.3348271 0.754577280
#>      conf_low conf_high uncertainty nsims
#>         <num>     <num>      <char> <int>
#> 1:  0.3263744 0.7518450      Global     5
#> 2: -0.3324218 0.2331906      Global     5
#> 3:  0.2517435 0.6706552      Global     5
#> 4: -0.2294859 0.3240479      Global     5