
Compute readability measures, optionally in parallel, using backends from the parallel package. With the default ncores = 1, computation runs sequentially. When ncores > 1, work is distributed over a PSOCK cluster (parallel::clusterApplyLB()) by default, for cross-platform stability. FORK-based parallelism (parallel::mclapply()) can be requested instead on Linux and macOS.

Usage

calculate_readability(
  x,
  ncores = 1,
  nchunks = ncores,
  socket = c("PSOCK", "FORK"),
  ...
)

Arguments

x

A quanteda corpus or a character vector containing the documents to process.

ncores

Integer. Number of CPU cores to use for parallel processing. Defaults to 1 (sequential).

nchunks

Integer. Number of chunks to split the corpus into. Defaults to ncores. Setting nchunks > ncores can improve load balancing when documents vary in size. See Details.

socket

Character. Parallel backend to use. One of "PSOCK" (default, recommended) or "FORK". On Windows, "FORK" is not supported and will trigger an error.

...

Additional arguments passed to quanteda.textstats::textstat_readability(), such as measure.
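
The interaction between ncores and nchunks can be illustrated with the parallel package alone. The following is a minimal sketch, not the package's actual implementation: count_words() is a hypothetical stand-in for the per-chunk readability call, and the contiguous chunking scheme is an assumption. It shows why nchunks > ncores helps: with load-balanced dispatch, a worker that finishes a small chunk picks up the next one instead of idling.

```r
library(parallel)

# Hypothetical stand-in for the per-chunk work. The documented function
# would call quanteda.textstats::textstat_readability() here instead.
count_words <- function(chunk) {
  data.frame(
    doc_id  = names(chunk),
    n_words = unname(lengths(strsplit(chunk, "\\s+"))),
    row.names = NULL
  )
}

texts <- c(
  doc1 = "This is a short and very simple document.",
  doc2 = "This second document contains slightly longer sentences for illustration.",
  doc3 = "Short.",
  doc4 = "A fourth document, included only to give the scheduler more work."
)

ncores  <- 2
nchunks <- 4  # more chunks than cores, so free workers can take leftovers

# Assign documents to chunks in contiguous blocks (an assumed scheme).
chunk_id <- cut(seq_along(texts), breaks = nchunks, labels = FALSE)
chunks   <- split(texts, chunk_id)

# PSOCK workers are separate R processes, so this runs on every platform.
cl <- makeCluster(ncores, type = "PSOCK")
res <- tryCatch(
  clusterApplyLB(cl, chunks, count_words),  # load-balanced dispatch
  finally = stopCluster(cl)                 # always release the workers
)

# Results come back in input order; bind them into one table.
result <- do.call(rbind, unname(res))
```

Very large values of nchunks add communication overhead for each chunk, so a small multiple of ncores is usually a reasonable starting point.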

Value

A data.table with one column per readability measure requested via measure, plus a doc_id column identifying each document.

Author

Francesco Grossetti <francesco.grossetti@unibocconi.it>

Examples

texts <- c(
  doc1 = "This is a short and very simple document.",
  doc2 = "This second document contains slightly longer sentences for illustration."
)

calculate_readability(
  texts,
  measure = c("Flesch", "FOG")
)
#> 
#> ── Calculating readability ──
#> 
#>  quanteda.textstats::textstat_readability() has been called with the following parameters
#>  measure = Flesch, FOG
#>  Computing readability sequentially
#>  Done
#>    doc_id Flesch      FOG
#>    <char>  <num>    <num>
#> 1:   doc1 71.815  8.20000
#> 2:   doc2  9.700 16.93333
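
A parallel call needs a backend that is valid on the current platform, because "FORK" triggers an error on Windows. The sketch below shows one way to choose the arguments defensively. pick_backend() is a hypothetical helper, not part of the package, and the final call is guarded so that the snippet also runs when the package is not attached.

```r
texts <- c(
  doc1 = "This is a short and very simple document.",
  doc2 = "This second document contains slightly longer sentences for illustration."
)

# Hypothetical helper: request FORK only where the OS supports it.
pick_backend <- function(prefer_fork = FALSE) {
  if (prefer_fork && .Platform$OS.type == "unix") "FORK" else "PSOCK"
}

backend <- pick_backend(prefer_fork = TRUE)

# Leave one core free for the rest of the system.
# parallel::detectCores() can return NA, so fall back to a single core.
available <- parallel::detectCores()
ncores <- if (is.na(available)) 1L else max(1L, available - 1L)

# Only call the function if the package providing it is attached.
if (exists("calculate_readability")) {
  calculate_readability(
    texts,
    ncores  = ncores,
    nchunks = 2 * ncores,  # assumed multiple; tune for your corpus
    socket  = backend,
    measure = c("Flesch", "FOG")
  )
}
```

For a corpus this small, the cost of starting workers is likely to exceed any gain, so the sequential default remains the sensible choice. Parallel runs pay off on larger collections.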