Prepare NLPstudio Topic Models for OpTop — as_optop_input

Extract an ordered grid of topic models and a vocabulary-aligned weighted DFM for OpTop::optop_select() (called optimal_topic() before OpTop 0.19; the old name survives there as a delegating alias).

Usage

as_optop_input(x, weighted_dfm)

Arguments

x: An nlp_k_selection object created with return_fits = TRUE, a list of nlp_topic_fit objects, or a list of raw backend fits supported by OpTop (topicmodels::LDA()/CTM() objects or seededlda textmodel_*() objects). Classes and fitting methods may be mixed within a grid as long as every model was fitted on the same corpus and vocabulary.
weighted_dfm: A weighted quanteda::dfm, usually created with as_optop_weighted_dfm() from the same fitting input used for the models.

Value

A list of class c("nlp_optop_input", "list") with:

topic_models: Topic models ordered by topic count, ready for the topic_models argument of OpTop::optop_select().
weighted_dfm: Weighted DFM aligned to the model vocabulary.
k: Integer topic counts in ascending order.
lda_models: Deprecated alias for topic_models, kept for scripts written against the pre-1.2.0 bridge (when OpTop's argument was still called lda_models).
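
As a rough picture of what "aligned to the model vocabulary" means for weighted_dfm, the base-R sketch below reorders the columns of a toy weight matrix to match a model's term order. The helper align_to_vocab() and the toy weights are hypothetical illustrations, not NLPstudio or OpTop code, and the real adapter may validate more strictly (for example, it might reject extra terms rather than drop them).

```r
# Toy weighted document-term matrix whose columns are in a different order
# than the model vocabulary. The weights are arbitrary illustration values.
weights <- matrix(
  c(0.5, 0.0, 1.2,
    0.0, 0.7, 0.3),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("term3", "term1", "term2"))
)
model_vocab <- c("term1", "term2", "term3")

# Hypothetical helper (an assumption for this sketch): fail when the matrix
# lacks a model term, otherwise return the columns in vocabulary order.
# Columns that are not in `vocab` are silently dropped here.
align_to_vocab <- function(x, vocab) {
  missing_terms <- setdiff(vocab, colnames(x))
  if (length(missing_terms) > 0L) {
    stop("Terms missing from the weighted matrix: ",
         paste(missing_terms, collapse = ", "))
  }
  x[, vocab, drop = FALSE]
}

aligned <- align_to_vocab(weights, model_vocab)
colnames(aligned)
```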

Details

OpTop consumes a model grid only through each model's fitted word probabilities. Since OpTop 0.19 the grid may therefore hold nlp_topic_fit objects directly, whatever their engine, alongside raw topicmodels fits (VEM or Gibbs LDA, CTM) and seededlda models. nlp_topic_fit grids are passed through as-is; this adapter handles the bookkeeping around them: pulling stored fits out of a select_k_topics() result, ordering the grid by K, rejecting duplicate topic counts and vocabulary mismatches, and validating and aligning the weighted DFM.
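
The grid bookkeeping described above (order by K, reject duplicate topic counts, reject vocabulary mismatches) can be sketched in base R. This is a minimal illustration under stated assumptions: the models are stand-in lists with hypothetical k and vocab fields, and order_grid_by_k() is a made-up helper, not the adapter's actual implementation.

```r
# Hypothetical stand-ins for fitted models: each records its topic count and
# the vocabulary it was fitted on. Real fits expose these differently.
grid <- list(
  list(k = 4L, vocab = c("term1", "term2", "term3")),
  list(k = 2L, vocab = c("term1", "term2", "term3")),
  list(k = 3L, vocab = c("term1", "term2", "term3"))
)

# Hypothetical helper (an assumption for this sketch, not package code).
order_grid_by_k <- function(grid) {
  k <- vapply(grid, function(m) as.integer(m$k), integer(1))

  # A grid must contain each topic count at most once.
  if (anyDuplicated(k) > 0L) {
    stop("Duplicate topic counts in grid: ",
         paste(unique(k[duplicated(k)]), collapse = ", "))
  }

  # Every model must share the vocabulary (same terms, same order) of the first.
  reference <- grid[[1]]$vocab
  same_vocab <- vapply(grid, function(m) identical(m$vocab, reference), logical(1))
  if (!all(same_vocab)) {
    stop("All models in the grid must be fitted on the same vocabulary.")
  }

  # Return the models and their topic counts in ascending order of K.
  ord <- order(k)
  list(topic_models = grid[ord], k = k[ord])
}

ordered <- order_grid_by_k(grid)
ordered$k
```

Failing early on duplicates and mismatches, rather than silently dropping models, keeps the K values reported alongside the grid in one-to-one correspondence with the models passed on.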

NLPstudio does not import or call OpTop. After preparing the input, and provided OpTop is installed, call OpTop::optop_select(topic_models = input$topic_models, weighted_dfm = input$weighted_dfm, ...) yourself, then fold the result back into the selection report with summarize_k_selection().

Examples

# Toy corpus: a 6-document x 4-term sparse count matrix.
dtm <- methods::as(
  Matrix::Matrix(
    matrix(c(2, 1, 0, 0,  1, 1, 1, 0,  0, 1, 2, 1,
             0, 0, 1, 2,  1, 0, 1, 1,  1, 2, 0, 1),
           nrow = 6, byrow = TRUE),
    sparse = TRUE
  ),
  "dgCMatrix"
)
rownames(dtm) <- paste0("doc", 1:6)
colnames(dtm) <- paste0("term", 1:4)
dfmat <- quanteda::as.dfm(dtm)

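# Fit a small K grid and keep the fitted models (return_fits = TRUE) so the
# adapter can pull them out of the selection result.
selection <- select_k_topics(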
  dfmat,
  engine = "topicmodels",
  model = "lda",
  method = "VEM",
  k_grid = 2:3,
  metrics = c("diversity", "exclusivity"),
  holdout = 0,
  return_fits = TRUE,
  control = list(fit = list(seed = 1, em = list(iter.max = 5), var = list(iter.max = 5)))
)

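# Bundle the stored fits with a weighted DFM built from the same fitting input.
optop_input <- as_optop_input(selection, as_optop_weighted_dfm(dfmat))

# Not run here: NLPstudio does not import OpTop, so the call is left commented.
# OpTop::optop_select(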
#   topic_models = optop_input$topic_models,
#   weighted_dfm = optop_input$weighted_dfm
# )