Extract DTW (document-topic weights) from a supported topic-model object and return a standardized data.table.
Usage
get_dtw(
  x,
  doc_data = NULL,
  docvars = FALSE,
  include_text = FALSE,
  doc_id_col = "doc_id",
  text_col = "text"
)
Arguments
- x
A supported topic-model object. This includes nlp_topic_fit, raw topicmodels fits, raw seededlda fits, and already standardized DTW tables.
- doc_data
Optional document-data override. When supplied, this is used instead of any doc_data stored in x. Accepted inputs are a corpus, data.frame, or data.table keyed by doc_id.
- docvars
Should stored or pre-existing document metadata be joined onto the returned DTW table? Defaults to FALSE.
- include_text
Should a text column be attached when a text-bearing doc_data source is available? Defaults to FALSE. When TRUE but no text-bearing doc_data is available, the function emits a warning.
- doc_id_col
Document-ID column name when doc_data is a data.frame or data.table. Defaults to "doc_id".
- text_col
Text column name when doc_data is a data.frame or data.table. Defaults to "text".
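To make the roles of doc_id_col and text_col concrete, the sketch below shows one way a data.frame with custom column names could be normalized into a doc_id-keyed table and joined onto document-topic weights. It uses only data.table; the column names id and body, the sample values, and the join strategy are assumptions for illustration and do not describe the package's internal code.

```r
library(data.table)

# Hypothetical document data with non-default column names.
meta <- data.frame(
  id   = c("doc2", "doc1"),
  body = c("second text", "first text"),
  year = c(2021L, 2020L)
)
doc_id_col <- "id"
text_col   <- "body"

# Rename the user-specified columns to the standard names and key by doc_id.
dd <- as.data.table(meta)
setnames(dd, old = c(doc_id_col, text_col), new = c("doc_id", "text"))
setkey(dd, doc_id)

# Hypothetical document-topic weights to attach the metadata to.
dtw <- data.table(
  doc_id   = c("doc1", "doc2"),
  Topic001 = c(0.7, 0.2),
  Topic002 = c(0.3, 0.8)
)

# Keep every row of dtw (in its order) and look up matching metadata.
joined <- dd[dtw, on = "doc_id"]
```

With the default column names ("doc_id" and "text"), the renaming step would be a no-op.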
Value
A data.table with:
- doc_id
- topic columns named Topic001, Topic002, ...
- topic_max_id
- topic_max_int
- topic_max_value
- stored docvars when docvars = TRUE
- metadata columns from doc_data when available
- text when include_text = TRUE and text is available
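The three topic_max_* columns summarize the dominant topic per document. A minimal sketch of how they could be derived from the topic columns is shown below, using the weights from the Examples output. The tie-breaking rule (first topic wins) is an assumption chosen to match that output; the package's actual implementation may differ.

```r
library(data.table)

# Document-topic weights copied from the Examples output.
dtw <- data.table(
  doc_id   = paste0("doc", 1:4),
  Topic001 = c(1, 0.5, 0.5, 2 / 3),
  Topic002 = c(0, 0.5, 0.5, 1 / 3)
)

# Topic columns follow the TopicNNN naming pattern.
topic_cols <- grep("^Topic[0-9]+$", names(dtw), value = TRUE)
w <- as.matrix(dtw[, topic_cols, with = FALSE])

# Column index of the largest weight in each row; ties go to the first topic
# (an assumption, consistent with doc2 and doc3 in the Examples).
idx <- max.col(w, ties.method = "first")

dtw[, topic_max_id    := topic_cols[idx]]
dtw[, topic_max_int   := idx]
dtw[, topic_max_value := w[cbind(seq_len(nrow(w)), idx)]]
```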
Columns are ordered as doc_id, document metadata, DTW output columns, and
finally text when text is requested and available.
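The documented column order can be reproduced on any table with data.table::setcolorder. The sketch below assumes a hypothetical table with one metadata column (year) and shows the reordering only; it is not taken from the package source.

```r
library(data.table)

# Hypothetical table with columns in arbitrary order.
out <- data.table(
  text            = c("alpha beta", "gamma delta"),
  Topic001        = c(0.7, 0.2),
  Topic002        = c(0.3, 0.8),
  year            = c(2020L, 2021L),
  doc_id          = c("doc1", "doc2"),
  topic_max_id    = c("Topic001", "Topic002"),
  topic_max_int   = c(1L, 2L),
  topic_max_value = c(0.7, 0.8)
)

# Split the columns into the groups named in the documentation.
topic_cols <- grep("^Topic[0-9]+$", names(out), value = TRUE)
dtw_cols   <- c(topic_cols, "topic_max_id", "topic_max_int", "topic_max_value")
meta_cols  <- setdiff(names(out), c("doc_id", dtw_cols, "text"))

# doc_id, then metadata, then DTW output columns, then text if present.
setcolorder(out, c("doc_id", meta_cols, dtw_cols, intersect("text", names(out))))
names(out)
```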
For already standardized DTW-table inputs, non-topic metadata columns are
treated as pre-existing document metadata and retained only when
docvars = TRUE.
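The rule for already standardized inputs can be illustrated with a small stand-in function. keep_docvars below is a hypothetical helper, not part of the documented API; it assumes topic columns are recognized by the TopicNNN pattern and treats every other non-DTW column as pre-existing metadata.

```r
library(data.table)

# Hypothetical helper: drop non-topic metadata unless docvars = TRUE.
keep_docvars <- function(dtw, docvars = FALSE) {
  dtw <- as.data.table(dtw)
  topic_cols <- grep("^Topic[0-9]+$", names(dtw), value = TRUE)
  max_cols <- intersect(
    c("topic_max_id", "topic_max_int", "topic_max_value"),
    names(dtw)
  )
  if (docvars) {
    return(dtw)
  }
  dtw[, c("doc_id", topic_cols, max_cols), with = FALSE]
}

# A standardized DTW table that already carries a metadata column (year).
std <- data.table(
  doc_id   = c("doc1", "doc2"),
  year     = c(2020L, 2021L),
  Topic001 = c(0.7, 0.2),
  Topic002 = c(0.3, 0.8)
)

cols_default <- names(keep_docvars(std))
cols_docvars <- names(keep_docvars(std, docvars = TRUE))
```

Here cols_default contains only doc_id and the topic columns, while cols_docvars also retains year.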
Examples
# Build a small 4 x 3 sparse document-term matrix.
dtm <- methods::as(
  Matrix::Matrix(
    matrix(
      c(1, 0, 1,
        1, 1, 0,
        0, 1, 1,
        1, 1, 1),
      nrow = 4,
      byrow = TRUE
    ),
    sparse = TRUE
  ),
  "dgCMatrix"
)
rownames(dtm) <- paste0("doc", 1:4)
colnames(dtm) <- paste0("term", 1:3)

# Fit a two-topic LDA model with the text2vec engine.
fit <- fit_topic_model(
  dtm,
  engine = "text2vec",
  model = "lda",
  k = 2,
  control = list(fit = list(n_iter = 25, progressbar = FALSE))
)
get_dtw(fit)
#>    doc_id  Topic001  Topic002 topic_max_id topic_max_int topic_max_value
#>    <char>     <num>     <num>       <char>         <int>           <num>
#> 1:   doc1 1.0000000 0.0000000     Topic001             1       1.0000000
#> 2:   doc2 0.5000000 0.5000000     Topic001             1       0.5000000
#> 3:   doc3 0.5000000 0.5000000     Topic001             1       0.5000000
#> 4:   doc4 0.6666667 0.3333333     Topic001             1       0.6666667
get_dtw(fit, docvars = TRUE)
#>    doc_id  Topic001  Topic002 topic_max_id topic_max_int topic_max_value
#>    <char>     <num>     <num>       <char>         <int>           <num>
#> 1:   doc1 1.0000000 0.0000000     Topic001             1       1.0000000
#> 2:   doc2 0.5000000 0.5000000     Topic001             1       0.5000000
#> 3:   doc3 0.5000000 0.5000000     Topic001             1       0.5000000
#> 4:   doc4 0.6666667 0.3333333     Topic001             1       0.6666667
