Convert Existing Topic-Model Objects to nlp_topic_fit

as_nlp_topic_fit() converts supported fitted topic-model objects into the current nlp_topic_fit class used by fit_topic_model(). It can adopt raw fits from supported backends and saved outputs from the removed warp_lda() wrapper without refitting models.

Usage

as_nlp_topic_fit(x, ...)

# S3 method for class 'nlp_topic_fit'
as_nlp_topic_fit(x, ...)

# S3 method for class 'list'
as_nlp_topic_fit(
  x,
  k = NULL,
  doc_ids = NULL,
  vocab = NULL,
  docvars = NULL,
  doc_data = NULL,
  control = list(),
  warn_partial = TRUE,
  return_dtw = TRUE,
  return_tww = TRUE,
  ...
)

# S3 method for class 'TopicModel'
as_nlp_topic_fit(
  x,
  docvars = NULL,
  doc_data = NULL,
  return_dtw = TRUE,
  return_tww = TRUE,
  ...
)

# S3 method for class 'LDA_Gibbs'
as_nlp_topic_fit(x, docvars = NULL, doc_data = NULL, ...)

# S3 method for class 'LDA_VEM'
as_nlp_topic_fit(x, docvars = NULL, doc_data = NULL, ...)

# S3 method for class 'CTM_VEM'
as_nlp_topic_fit(x, docvars = NULL, doc_data = NULL, ...)

# S3 method for class 'textmodel'
as_nlp_topic_fit(
  x,
  model = NULL,
  docvars = NULL,
  doc_data = NULL,
  return_dtw = TRUE,
  return_tww = TRUE,
  keep_backend_data = FALSE,
  ...
)

# S3 method for class 'textmodel_lda'
as_nlp_topic_fit(x, model = NULL, docvars = NULL, doc_data = NULL, ...)

# S3 method for class 'WarpLDA'
as_nlp_topic_fit(
  x,
  theta = NULL,
  doc_ids = NULL,
  vocab = NULL,
  docvars = NULL,
  doc_data = NULL,
  control = list(),
  warn_partial = TRUE,
  return_dtw = TRUE,
  return_tww = TRUE,
  ...
)

# S3 method for class 'STM'
as_nlp_topic_fit(
  x,
  doc_ids = NULL,
  docvars = NULL,
  doc_data = NULL,
  return_dtw = TRUE,
  return_tww = TRUE,
  ...
)

# Default S3 method
as_nlp_topic_fit(x, ...)

Arguments

x: Object to convert.
...: Additional arguments forwarded to methods.
k: Optional topic count. Usually inferred from theta, phi, or the stored backend object.
doc_ids: Optional document IDs used when legacy theta does not already contain document identifiers.
vocab: Optional vocabulary used when legacy phi does not already contain term names.
docvars: Optional document metadata to store on the converted object.
doc_data: Optional document metadata or text sidecar to store on the converted object.
control: Optional backend controls to store as migration metadata. Use control$model$doc_topic_prior and control$model$topic_word_prior when the old model used non-default WarpLDA priors.
warn_partial: Logical. Warn when theta or phi cannot be recovered. Defaults to TRUE.
return_dtw, return_tww: Logical. Should the standardized document-topic weights (DTW) and topic-word weights (TWW) be materialized and cached on the converted object? Both default to TRUE. Set return_tww = FALSE to avoid densifying very large topic-word matrices (for example, n-gram models) at conversion time; get_tww() and get_dtw() then reconstruct the matrices on demand from the retained model object. These mirror the same arguments on fit_topic_model().
model: Optional model family override for raw seededlda objects. Use "seqlda" for sequential LDA fits, which are not reliably distinguishable from ordinary seededlda LDA after fitting.
keep_backend_data: Logical. seededlda textmodel_*() objects retain the entire input dfm in $data. By default (FALSE), the copy stored inside the converted fit replaces that dfm with a zero-count dfm of identical shape, dimnames, and docvars; the caller's original object is not modified. Set to TRUE to retain the counts. This mirrors the same argument on fit_topic_model().
theta: Optional document-topic matrix for raw text2vec WarpLDA objects. Raw WarpLDA objects do not retain the return value of fit_transform(), so pass that matrix here when available.
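
The return_tww argument can be illustrated with a small topicmodels fit. This is a minimal sketch rather than a reference workflow. It assumes that the package exporting as_nlp_topic_fit() and get_tww() is installed under the name NLPstudio and that topicmodels is available; the whole example is skipped otherwise.

```r
# Assumption: as_nlp_topic_fit() and get_tww() are exported by a package
# installed as "NLPstudio". Adjust the namespace if yours differs.
have_pkgs <- all(vapply(
  c("topicmodels", "NLPstudio"),
  requireNamespace, logical(1), quietly = TRUE
))

if (have_pkgs) {
  # Small document-term matrix that ships with topicmodels.
  data("AssociatedPress", package = "topicmodels")

  # Raw backend fit (VEM is the default method of topicmodels::LDA()).
  raw <- topicmodels::LDA(AssociatedPress[1:20, ], k = 2,
                          control = list(seed = 1))

  # Convert without caching the topic-word weights on the object.
  fit <- NLPstudio::as_nlp_topic_fit(raw, return_tww = FALSE)

  # The weights are then reconstructed on demand from the retained model.
  tww <- NLPstudio::get_tww(fit)
}
```

Skipping the cached TWW matrix mainly pays off for large vocabularies; for a toy model like this one the default return_tww = TRUE is just as reasonable.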

Value

An object of class c("nlp_topic_fit", "list").

Details

Supported input families are:

topicmodels S4 fits from topicmodels::LDA() and topicmodels::CTM() (LDA_Gibbs, LDA_VEM, and CTM_VEM);
seededlda textmodel fits from textmodel_lda() and textmodel_seededlda();
raw text2vec WarpLDA/LDA R6 objects, optionally paired with the theta matrix returned by fit_transform();
raw stm STM objects without content covariates;
saved list outputs from the removed NLPstudio warp_lda() wrapper.
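
As an illustration of the seededlda family, the sketch below converts a sequential LDA fit and passes model = "seqlda" explicitly. It assumes that quanteda and seededlda are installed and that as_nlp_topic_fit() is exported by a package installed as NLPstudio; the corpus, topic count, and iteration count are arbitrary choices for demonstration.

```r
# Assumption: as_nlp_topic_fit() is exported by a package installed as
# "NLPstudio". The example is skipped if any package is missing.
have_pkgs <- all(vapply(
  c("quanteda", "seededlda", "NLPstudio"),
  requireNamespace, logical(1), quietly = TRUE
))

if (have_pkgs) {
  # Build a document-feature matrix from a corpus bundled with quanteda.
  toks <- quanteda::tokens(quanteda::data_corpus_inaugural,
                           remove_punct = TRUE)
  x <- quanteda::dfm(toks)

  # Raw sequential LDA fit; few iterations to keep the sketch fast.
  raw <- seededlda::textmodel_seqlda(x, k = 3, max_iter = 100)

  # State the model family explicitly, because a fitted seqlda object is
  # not reliably distinguishable from ordinary seededlda LDA.
  # keep_backend_data is left at its default (FALSE), so the stored copy
  # of the input dfm is replaced by a zero-count dfm.
  fit <- NLPstudio::as_nlp_topic_fit(raw, model = "seqlda")
}
```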

The conversion is non-refitting. It standardizes cached DTW/TWW matrices, topic IDs, document IDs, vocabulary, and metadata where those components are already present on the input object. Raw text2vec objects do not retain document-topic weights internally, so pass theta when downstream DTW access is needed.
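
The point about raw text2vec objects can be sketched as follows: the matrix returned by fit_transform() is kept and handed to the converter through theta. The example assumes that text2vec is installed and that as_nlp_topic_fit() is exported by a package installed as NLPstudio; the preprocessing choices and model settings are illustrative only.

```r
# Assumption: as_nlp_topic_fit() is exported by a package installed as
# "NLPstudio". The example is skipped if any package is missing.
have_pkgs <- all(vapply(
  c("text2vec", "NLPstudio"),
  requireNamespace, logical(1), quietly = TRUE
))

if (have_pkgs) {
  data("movie_review", package = "text2vec")
  docs <- movie_review[1:200, ]

  # Tokenize and build a document-term matrix.
  it <- text2vec::itoken(docs$review,
                         preprocessor = tolower,
                         tokenizer = text2vec::word_tokenizer,
                         ids = docs$id,
                         progressbar = FALSE)
  vocab <- text2vec::prune_vocabulary(text2vec::create_vocabulary(it),
                                      term_count_min = 5)
  dtm <- text2vec::create_dtm(it, text2vec::vocab_vectorizer(vocab))

  # Fit WarpLDA. The document-topic matrix is only available as the
  # return value of fit_transform(), so keep it.
  lda <- text2vec::LDA$new(n_topics = 5)
  theta <- lda$fit_transform(dtm, n_iter = 50, progressbar = FALSE)

  # Pass theta alongside the raw R6 object so DTW access works downstream.
  fit <- NLPstudio::as_nlp_topic_fit(lda, theta = theta)
}
```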

Raw stm content-covariate models are not converted because they imply covariate-specific topic-word distributions, while NLPstudio currently standardizes one TWW matrix per fit.
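
A caller may want to check for a content covariate before attempting a conversion. The helper below is hypothetical and not part of NLPstudio or stm. It relies on the assumption that a fitted STM object stores one log topic-word matrix per content-covariate level in x$beta$logbeta, so more than one matrix signals a content-covariate model. The mock lists only imitate that layout; they are not real stm fits.

```r
# Hypothetical helper (not part of any package). Assumes the stm layout
# where x$beta$logbeta is a list with one matrix per content level.
has_content_covariate <- function(x) {
  length(x$beta$logbeta) > 1L
}

# Mock objects that imitate the assumed structure (2 topics, 3 terms).
mock_plain <- list(beta = list(logbeta = list(matrix(0, 2, 3))))
mock_content <- list(beta = list(logbeta = list(matrix(0, 2, 3),
                                                matrix(0, 2, 3))))

has_content_covariate(mock_plain)    # FALSE
has_content_covariate(mock_content)  # TRUE
```

With a real fit, a check like this could guard the call to as_nlp_topic_fit() and let the caller handle content-covariate models separately.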

Examples

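if (FALSE) { # interactive()
  # Load a saved list output from the removed warp_lda() wrapper.
  old <- readRDS("legacy-warp-lda-output.rds")

  # Convert without refitting, then inspect the top terms.
  fit <- as_nlp_topic_fit(old)
  get_top_terms(fit)
}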