Fast Calculation of Readability Measures — calculate_readability

Computes readability measures with a single call to quanteda.textstats::textstat_readability(), whose internal tokenization runs in quanteda's multithreaded C++ core. On top of that core, this wrapper validates its input, returns a data.table keyed by doc_id in input order, and scopes quanteda's internal multithreading to the call via the threads argument.

Usage

calculate_readability(
  x,
  threads = NULL,
  ncores = NULL,
  nchunks = NULL,
  socket = NULL,
  ...
)

Arguments

x: A quanteda corpus or a character vector containing the documents to process.
threads: Integer or NULL. Number of threads that quanteda's internal (TBB) pool may use for this call. The previous setting is restored on exit. NULL (default) leaves quanteda::quanteda_options("threads") untouched. Because quanteda itself defaults to all available cores, the default is already parallel. See the Threading section.
ncores, nchunks, socket: Deprecated since NLPstudio 1.2.0. The chunked process-level (PSOCK/FORK) backend has been removed because it duplicated quanteda's internal multithreading while adding cluster startup, serialization, and peak-memory overhead. Supplying any of these arguments raises a warning of class NLPstudio_deprecated. nchunks and socket are otherwise ignored; ncores is mapped to threads when threads is not given. These arguments will be removed in a future release.
...: Additional arguments passed to quanteda.textstats::textstat_readability(), such as measure.
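
The scope-and-restore behaviour described for threads can be pictured with a small sketch. It is an illustration under assumptions, not the package's actual implementation: with_quanteda_threads() is a hypothetical helper, and it relies only on quanteda::quanteda_options() to read and set the thread count.

```r
# Hypothetical helper (not part of NLPstudio): evaluate `code` with a
# temporary quanteda thread count and restore the previous value on exit.
with_quanteda_threads <- function(threads, code) {
  if (!is.null(threads)) {
    old <- quanteda::quanteda_options("threads")
    # Restore the caller's setting even if `code` throws an error.
    on.exit(quanteda::quanteda_options(threads = old), add = TRUE)
    quanteda::quanteda_options(threads = as.integer(threads))
  }
  # `code` is a lazy promise, so it is evaluated only here,
  # after the temporary setting is in place.
  force(code)
}

before <- quanteda::quanteda_options("threads")
inside <- with_quanteda_threads(1L, quanteda::quanteda_options("threads"))
after  <- quanteda::quanteda_options("threads")
```

With threads = NULL the helper changes nothing and simply evaluates the code, which mirrors the documented default of leaving quanteda's global option untouched.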

Value

A data.table with one row per document, in input order. The doc_id column identifies each document, and there is one additional column for each readability measure requested via measure.

Details

The chunked process-level backend was removed in NLPstudio 1.2.0; see the Threading section of tokenize_corpus() and vignette("performance-and-threading", package = "NLPstudio").
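
Because the deprecated arguments signal a classed warning, legacy calls can be detected or silenced by class rather than by matching the message text. The sketch below uses base R only. resolve_threads() is a hypothetical stand-in written for this illustration; it imitates the documented behaviour (classed warning, ncores mapped to threads when threads is not given) and is not NLPstudio's own code.

```r
# Hypothetical stand-in for the deprecation shim; NOT NLPstudio's code.
resolve_threads <- function(threads = NULL, ncores = NULL,
                            nchunks = NULL, socket = NULL) {
  supplied <- c(
    ncores  = !is.null(ncores),
    nchunks = !is.null(nchunks),
    socket  = !is.null(socket)
  )
  if (any(supplied)) {
    # Signal a warning that carries the custom class.
    warning(warningCondition(
      paste0("Deprecated arguments supplied: ",
             paste(names(supplied)[supplied], collapse = ", ")),
      class = "NLPstudio_deprecated"
    ))
    # ncores is mapped to threads only when threads was not given.
    if (is.null(threads) && !is.null(ncores)) threads <- as.integer(ncores)
  }
  threads
}

# React to the class: tryCatch() aborts the call and returns the handler value.
caught <- tryCatch(
  resolve_threads(ncores = 4),
  NLPstudio_deprecated = function(w) class(w)
)

# Muffle only this class: the call completes and returns its value.
mapped <- withCallingHandlers(
  resolve_threads(ncores = 4),
  NLPstudio_deprecated = function(w) invokeRestart("muffleWarning")
)
```

The same withCallingHandlers() pattern could wrap a real calculate_readability() call in legacy scripts until the deprecated arguments are dropped from them.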

Author

Francesco Grossetti francesco.grossetti@unibocconi.it

Examples

texts <- c(
  doc1 = "This is a short and very simple document.",
  doc2 = "This second document contains slightly longer sentences for illustration."
)

calculate_readability(
  texts,
  measure = c("Flesch", "FOG")
)
#> 
#> ── Calculating readability ──
#> 
#> ℹ quanteda.textstats::textstat_readability() has been called with the following parameters
#> ℹ measure = Flesch, FOG
#> ✔ Done
#>    doc_id Flesch      FOG
#>    <char>  <num>    <num>
#> 1:   doc1 71.815  8.20000
#> 2:   doc2  9.700 16.93333