Extract an ordered list of raw topicmodels::LDA(method = "VEM") fits and a vocabulary-aligned weighted DFM for OpTop::optimal_topic().

Usage

as_optop_input(x, weighted_dfm)

Arguments

x

An nlp_k_selection object created with return_fits = TRUE, a list of nlp_topic_fit objects, or a list of raw LDA_VEM objects.

weighted_dfm

A weighted quanteda::dfm, usually created with as_optop_weighted_dfm() from the same input that was used to fit the LDA models.

Value

A list of class c("nlp_optop_input", "list") with:

lda_models

Raw LDA_VEM objects ordered by topic count.

weighted_dfm

Weighted DFM aligned to the LDA vocabulary.

k

Integer topic counts in ascending order.

Details

OpTop currently expects LDA_VEM objects from topicmodels. This adapter intentionally rejects Gibbs LDA, CTM, text2vec, seededlda, ETM, and partial fits so that users do not pass objects outside OpTop's current assumptions.
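The kind of check described above can be sketched as follows. This is only an illustration under stated assumptions, not the adapter's actual implementation: the helper names is_lda_vem() and order_by_k() are hypothetical, and the sketch relies on topicmodels fits being S4 objects of class LDA_VEM that store the topic count in the k slot.

```r
# Hypothetical helpers for illustration; not part of NLPstudio or OpTop.

# TRUE only for objects that inherit from the S4 class "LDA_VEM".
is_lda_vem <- function(fit) {
  methods::is(fit, "LDA_VEM")
}

# Reject anything that is not an LDA_VEM fit, then sort by topic count.
# Assumes each fit exposes its topic count in the `k` slot.
order_by_k <- function(fits) {
  ok <- vapply(fits, is_lda_vem, logical(1))
  if (!all(ok)) {
    stop("All fits must be raw LDA_VEM objects from topicmodels.")
  }
  k <- vapply(fits, function(fit) as.integer(fit@k), integer(1))
  fits[order(k)]
}
```

Objects from other engines or estimation methods fail the class test, so the sketch stops before any of them can reach OpTop.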

NLPstudio neither imports nor calls OpTop, so OpTop must be installed separately. After preparing the input, call OpTop::optimal_topic(lda_models = input$lda_models, weighted_dfm = input$weighted_dfm, ...) yourself.
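Because OpTop is optional, the call can be guarded with requireNamespace(). The wrapper below is a minimal sketch: run_optop_if_available() is a hypothetical name, not a function in NLPstudio or OpTop, and input is assumed to be the list returned by as_optop_input().

```r
# Hypothetical wrapper for illustration; not part of NLPstudio or OpTop.
# `input` is assumed to be the result of as_optop_input().
run_optop_if_available <- function(input, ...) {
  if (!requireNamespace("OpTop", quietly = TRUE)) {
    message("OpTop is not installed; skipping optimal_topic().")
    return(invisible(NULL))
  }
  OpTop::optimal_topic(
    lda_models = input$lda_models,
    weighted_dfm = input$weighted_dfm,
    ...
  )
}
```

Additional arguments are passed through ... unchanged, so any tuning options stay under OpTop's control.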

Examples

dtm <- methods::as(
  Matrix::Matrix(
    matrix(c(2, 1, 0, 0,  1, 1, 1, 0,  0, 1, 2, 1,
             0, 0, 1, 2,  1, 0, 1, 1,  1, 2, 0, 1),
           nrow = 6, byrow = TRUE),
    sparse = TRUE
  ),
  "dgCMatrix"
)
rownames(dtm) <- paste0("doc", 1:6)
colnames(dtm) <- paste0("term", 1:4)
dfmat <- quanteda::as.dfm(dtm)

selection <- select_k_topics(
  dfmat,
  engine = "topicmodels",
  model = "lda",
  method = "VEM",
  k_grid = 2:3,
  metrics = c("diversity", "exclusivity"),
  holdout = 0,
  return_fits = TRUE,
  control = list(
    fit = list(
      seed = 1,
      em = list(iter.max = 5),
      var = list(iter.max = 5)
    )
  )
)

optop_input <- as_optop_input(selection, as_optop_weighted_dfm(dfmat))
# Not run: requires the OpTop package to be installed.
# OpTop::optimal_topic(lda_models = optop_input$lda_models, weighted_dfm = optop_input$weighted_dfm)