Build a compact one-row-per-topic interpretation table from an
nlp_topic_fit. The table combines top terms, prevalence, available
evaluation metrics, and representative documents.
Usage
summarize_topics(
  fit,
  training = NULL,
  doc_data = NULL,
  top_n = 10L,
  representative_n = 3L,
  include_text = FALSE,
  docvars = FALSE,
  doc_id_col = "doc_id",
  text_col = "text"
)
Arguments
- fit
  An nlp_topic_fit object returned by fit_topic_model().
- training
  Optional training document-feature matrix. When supplied, coherence metrics are included.
- doc_data
  Optional document metadata or text source forwarded to get_dtw() and get_representative_candidates().
- top_n
  Integer. Number of top terms per topic. Defaults to 10L.
- representative_n
  Integer. Number of representative documents to retain per topic. Defaults to 3L.
- include_text
  Logical. Should representative text be included when available? Defaults to FALSE.
- docvars
  Logical. Should stored document variables be available for representative selection output? Defaults to FALSE.
- doc_id_col
  Document-ID column name when doc_data is tabular. Defaults to "doc_id".
- text_col
  Text column name when doc_data is tabular. Defaults to "text".
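As a rough illustration of the tabular doc_data case described above, the sketch below builds a small data frame that uses the default column names "doc_id" and "text". The document texts are made up for illustration, and the assumption that IDs should match the row names of the training matrix is not confirmed by this page.

```r
# Hypothetical document table; the texts are placeholders, not real data.
docs <- data.frame(
  doc_id = paste0("doc", 1:6),          # matches the default doc_id_col
  text = c(
    "first example document", "second example document",
    "third example document", "fourth example document",
    "fifth example document", "sixth example document"
  ),                                     # matches the default text_col
  stringsAsFactors = FALSE
)

# With a fitted model `fit` in scope, the table could then be passed as:
# summarize_topics(fit, doc_data = docs, include_text = TRUE)
```

If the columns are named differently, doc_id_col and text_col would presumably be set to the actual names.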
Value
A data.table with one row per topic.
Examples
# Toy document-feature matrix: 6 documents x 4 terms, stored as sparse dgCMatrix
dtm <- methods::as(
  Matrix::Matrix(
    matrix(c(2, 1, 0, 0, 1, 1, 1, 0, 0, 1, 2, 1,
             0, 0, 1, 2, 1, 0, 1, 1, 1, 2, 0, 1),
           nrow = 6, byrow = TRUE),
    sparse = TRUE
  ),
  "dgCMatrix"
)
rownames(dtm) <- paste0("doc", 1:6)
colnames(dtm) <- paste0("term", 1:4)

# Fit a two-topic LDA model with a fixed seed for reproducibility
fit <- fit_topic_model(
  dtm,
  engine = "topicmodels",
  model = "lda",
  k = 2,
  method = "Gibbs",
  control = list(fit = list(seed = 1, iter = 50, burnin = 0, thin = 1))
)

# Supplying `training` adds the coherence columns to the summary
summarize_topics(fit, training = dtm, top_n = 3)
#> topic_id topic_int top_terms top_term_probabilities
#> <char> <int> <char> <char>
#> 1: Topic001 1 term1, term4, term2 0.490385, 0.490385, 0.00961538
#> 2: Topic002 2 term2, term3, term1 0.490385, 0.490385, 0.00961538
#> prevalence coherence_npmi coherence_umass diversity exclusivity
#> <num> <num> <num> <num> <num>
#> 1: 0.5000582 -0.1179313 -0.5579921 0.6666667 0.6602564
#> 2: 0.4999418 -0.1179313 -0.5579921 0.6666667 0.6602564
#> representative_doc_ids representative_documents
#> <char> <list>
#> 1: doc1, doc4, doc5 <data.table[3x1]>
#> 2: doc3, doc2 <data.table[2x1]>
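
The summary packs several values into single cells: top_terms is a comma-separated string and representative_documents is a list column of nested tables. The sketch below shows one way such a table might be reshaped into long form with data.table. It works on a hand-built mock that copies values from the output above rather than on a real fit, and the column name doc_id inside the nested tables is an assumption.

```r
library(data.table)

# Mock of the printed summary (values copied from the example output).
# The one-column `doc_id` layout of the nested tables is assumed.
topics <- data.table(
  topic_id = c("Topic001", "Topic002"),
  top_terms = c("term1, term4, term2", "term2, term3, term1"),
  prevalence = c(0.5000582, 0.4999418),
  representative_documents = list(
    data.table(doc_id = c("doc1", "doc4", "doc5")),
    data.table(doc_id = c("doc3", "doc2"))
  )
)

# One row per (topic, term): split the comma-separated string.
terms_long <- topics[
  , .(term = unlist(strsplit(top_terms, ", ", fixed = TRUE))),
  by = topic_id
]

# One row per (topic, representative document): stack the nested tables,
# using the topic IDs as the identifier column.
docs_long <- rbindlist(
  setNames(topics$representative_documents, topics$topic_id),
  idcol = "topic_id"
)
```

The long tables are easier to join against document metadata or to plot than the compact one-row-per-topic form.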
