Compute readability measures in parallel using backends from the
parallel package. By default, computation is parallelized with
PSOCK clusters (parallel::clusterApplyLB()) for cross-platform
stability. Optionally, FORK-based parallelism
(parallel::mclapply()) may be requested on Linux/macOS.
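The two backends described above could be dispatched roughly as in the sketch below. It is a minimal illustration, not the package's internals: `run_chunks()` is a hypothetical helper, and it assumes the documents have already been split into a list of chunks. It uses only exported functions from the parallel package: `makeCluster()`, `clusterApplyLB()`, `stopCluster()` and `mclapply()`.

```r
library(parallel)

# Hypothetical helper (not part of this package): apply `fun` to each
# element of `chunks` using either a PSOCK cluster or forked processes.
run_chunks <- function(chunks, fun, ncores = 2L, socket = c("PSOCK", "FORK")) {
  socket <- match.arg(socket)
  if (socket == "FORK") {
    # Forking relies on the Unix fork() system call, so refuse it on Windows.
    if (.Platform$OS.type == "windows") {
      stop('socket = "FORK" is not supported on Windows; use "PSOCK".')
    }
    return(mclapply(chunks, fun, mc.cores = ncores))
  }
  # PSOCK: start fresh R worker processes that communicate over sockets.
  cl <- makeCluster(ncores, type = "PSOCK")
  on.exit(stopCluster(cl), add = TRUE)
  # clusterApplyLB() sends the next chunk to whichever worker is idle and
  # returns the results in the same order as `chunks`.
  clusterApplyLB(cl, chunks, fun)
}

# Toy usage: sum two chunks of integers on two PSOCK workers.
unlist(run_chunks(list(1:3, 4:6), sum, ncores = 2L, socket = "PSOCK"))
# c(6, 15)
```

One practical difference between the backends: PSOCK workers start as empty R sessions, so any packages a worker needs must be loaded on it (for example with `clusterEvalQ()`). FORK workers inherit the parent session's state, which makes them cheaper to start but limits them to Unix-alikes.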
Usage
calculate_readability(
  x,
  ncores = 1,
  nchunks = ncores,
  socket = c("PSOCK", "FORK"),
  ...
)
Arguments
- x
A quanteda corpus or a character vector containing the documents to process.
- ncores
Integer. Number of CPU cores to use for parallel processing. Defaults to 1 (sequential).
- nchunks
Integer. Number of chunks to split the corpus into. Defaults to
ncores. Setting nchunks > ncores can improve load balancing when documents vary in size. See Details.
- socket
Character. Parallel backend to use. One of
"PSOCK" (default, recommended) or "FORK". On Windows, "FORK" is not supported and will trigger an error.
- ...
Additional arguments passed to quanteda.textstats::textstat_readability(), such as measure.
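The sketch below suggests why nchunks > ncores can help. This page does not state the exact splitting rule calculate_readability() uses. The sketch assumes contiguous, roughly equal-sized chunks like those produced by `parallel::splitIndices()`. The document sizes are made-up illustrative numbers.

```r
library(parallel)

# Hypothetical workload: six documents whose sizes (in words) differ a lot.
doc_words <- c(5000, 4000, 100, 120, 90, 110)

# nchunks = ncores = 2: two contiguous blocks, one per worker.
coarse <- splitIndices(length(doc_words), 2)
# nchunks = 6 > ncores: six small units that a load-balancing scheduler
# such as clusterApplyLB() can hand out as workers become free.
fine <- splitIndices(length(doc_words), 6)

# Words of text contained in each chunk under the two splits.
coarse_work <- sapply(coarse, function(i) sum(doc_words[i]))  # c(9100, 320)
fine_work <- sapply(fine, function(i) sum(doc_words[i]))      # c(5000, 4000, 100, 120, 90, 110)
```

With two chunks, one worker receives 9100 words and the other only 320, so the second worker would sit idle most of the time. With six chunks and load balancing, the two long documents would likely go to different workers, and the short ones would fill in behind them. Assuming cost scales with word count, that gives roughly 5000 versus 4320 words per worker.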
Value
A data.table with a doc_id column identifying each document, plus one
column for each readability measure requested via measure.
Author
Francesco Grossetti francesco.grossetti@unibocconi.it
Examples
texts <- c(
  doc1 = "This is a short and very simple document.",
  doc2 = "This second document contains slightly longer sentences for illustration."
)
calculate_readability(
  texts,
  measure = c("Flesch", "FOG")
)
#>
#> ── Calculating readability ──
#>
#> ℹ quanteda.textstats::textstat_readability() has been called with the following parameters
#> ℹ measure = Flesch, FOG
#> ℹ Computing readability sequentially
#> ✔ Done
#> doc_id Flesch FOG
#> <char> <num> <num>
#> 1: doc1 71.815 8.20000
#> 2: doc2 9.700 16.93333
