Remove subjects whose transitions to different states occur at exactly the same time in an augmented dataset produced by augment().

Usage

polish(
  data,
  data_key,
  pattern,
  time = NULL,
  check_NA = FALSE,
  copy = FALSE,
  verbosity = getOption("msmtools.verbosity", "quiet")
)

Arguments

data

A data.table or data.frame object in longitudinal format where each row represents an observation with known start and end times. If data is a data.frame, polish() internally casts it to a data.table.

data_key

A keying variable used to identify subjects and define a key for data (see data.table::setkey()).

pattern

Either an integer, a factor, or a character variable with 2 or 3 unique values that gives each subject's terminal outcome schema. When 2 values are detected, they must be in the format: 0 = "alive", 1 = "dead". When 3 values are detected, they must be: 0 = "alive", 1 = "dead during a transition", 2 = "dead after a transition has ended" (see Details).

time

The time variable used to identify duplicate transition times. If omitted or set to NULL, polish() uses "augmented_int" when it is available, then "augmented_num". If neither column exists, time must be supplied explicitly.

check_NA

If TRUE, data_key, pattern, and time are checked for missing values. If any missing values are found, the function stops with an error. Default is FALSE.

copy

If FALSE (default), polish() keeps the historical memory-efficient behavior and may modify caller-owned data by reference. If TRUE, data is copied before any data.table operation so the input object remains unchanged.

verbosity

Controls informational output. Use "quiet" to suppress status messages, "summary" for high-level phase messages and timing, and "progress" for phase messages plus progress bars in long status-building loops. The default is getOption("msmtools.verbosity", "quiet").
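The defaulting rule for time and the check_NA guard described above can be sketched in base R. Both helpers, resolve_time() and stop_on_na(), are hypothetical names introduced here for illustration and are not part of msmtools; the real checks inside polish() may differ in detail and in their error messages.

```r
# Hypothetical helper (not part of msmtools): choose the time column name.
resolve_time <- function(data, time = NULL) {
  # An explicitly supplied column always wins.
  if (!is.null(time)) return(time)
  # Otherwise prefer "augmented_int", then "augmented_num".
  defaults <- c("augmented_int", "augmented_num")
  found <- defaults[defaults %in% names(data)]
  if (length(found) == 0L) {
    stop("neither 'augmented_int' nor 'augmented_num' found; supply `time`")
  }
  found[1L]
}

# Hypothetical helper (not part of msmtools): stop on missing values.
stop_on_na <- function(data, cols) {
  has_na <- vapply(cols, function(col) anyNA(data[[col]]), logical(1))
  if (any(has_na)) {
    stop("missing values in: ", paste(cols[has_na], collapse = ", "))
  }
  invisible(TRUE)
}

# Toy data with only "augmented_num" available and one missing pattern value.
toy <- data.frame(
  subj          = c(1, 1, 2),
  label_3       = c(0L, 0L, NA),
  augmented_num = c(1.5, 2, 3)
)

time_col <- resolve_time(toy)  # falls back to "augmented_num"

# Capture the error message instead of stopping the script.
na_error <- tryCatch(
  stop_on_na(toy, c("subj", "label_3", time_col)),
  error = function(e) conditionMessage(e)
)
```

In this sketch an explicit time always takes precedence, which matches the documented behavior that the defaults are only consulted when time is omitted or NULL.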

Value

A data.table with the same columns as the input data. Subjects with transitions to different states at the same time are removed in full (every row sharing the same data_key value); rows from unaffected subjects are kept as-is. When no such duplicated transitions are found, the input data is returned unchanged.

Details

The function searches for cases where two subsequent events for the same subject land on different states but occur at the same time. When this happens, the whole subject, as identified by data_key, is removed from the data. The function reports how many subjects were removed.
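The rule above can be illustrated on a toy table. This is a minimal sketch in data.table, not the code polish() actually runs; the column names subj, augmented_int, and status, and the temporary clash column, are assumptions made for illustration.

```r
library(data.table)

# Toy augmented-style data; column names are illustrative assumptions.
toy <- data.table(
  subj          = c(1, 1, 1, 2, 2, 2),
  augmented_int = c(10L, 15L, 15L, 3L, 7L, 9L),
  status        = c("IN", "OUT", "DEAD", "IN", "OUT", "IN")
)

# Sort by subject and time so consecutive rows are subsequent events.
setkey(toy, subj, augmented_int)

# Flag a row when it has the same time as the previous row of the same
# subject but a different state. The first row of each subject gets NA.
toy[, clash := augmented_int == shift(augmented_int) &
                 status != shift(status), by = subj]

# Subjects with at least one flagged row are dropped in full.
bad_subjects <- toy[clash == TRUE, unique(subj)]
cleaned <- toy[!subj %in% bad_subjects][, clash := NULL]

# Number of subjects removed, analogous to what polish() reports.
n_removed <- length(bad_subjects)
```

Here subject 1 moves to "OUT" and "DEAD" at the same time (15), so all three of its rows are dropped, while subject 2 is kept untouched.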

By default, polish() follows data.table by-reference semantics to avoid unnecessary copies of large augmented datasets. This means the input may have its key changed while subjects with duplicate transition times are identified. Set copy = TRUE when the original input object must remain unchanged.
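The by-reference behavior described above is a general property of data.table. The sketch below uses a hypothetical helper, key_in_place(), as a stand-in for any function that calls setkey() on its input; it is not polish() itself, and the column names are made up for the example.

```r
library(data.table)

# Hypothetical stand-in for a function that keys its input by reference.
key_in_place <- function(x) {
  setkey(x, subj, time_int)
  invisible(x)
}

# Without a copy, the caller's object is keyed and reordered.
dt <- data.table(subj = c(2, 1, 2), time_int = c(5L, 1L, 3L))
key_in_place(dt)
key(dt)   # now c("subj", "time_int")

# With an explicit copy, the caller's object is left alone.
dt2 <- data.table(subj = c(2, 1, 2), time_int = c(5L, 1L, 3L))
key_in_place(copy(dt2))
key(dt2)  # still NULL
```

Passing copy(dt2) plays the role that copy = TRUE is documented to play in polish(): the work happens on a duplicate, at the cost of the extra memory.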

The function always returns a data.table. Use as.data.frame() on the result if a plain data.frame is needed by downstream code.
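As a small illustration of the conversion mentioned above, the sketch below uses a hand-built data.table as a stand-in for a polish() result; the columns are invented for the example.

```r
library(data.table)

# Stand-in for a result returned by polish(); columns are illustrative.
res <- data.table(subj = c(1, 2), status = c("IN", "OUT"))

# as.data.frame() returns a plain data.frame and leaves `res` untouched.
res_df <- as.data.frame(res)

class(res)     # "data.table" "data.frame"
class(res_df)  # "data.frame"
```

as.data.frame() makes a copy. When memory matters, data.table::setDF() converts the object in place instead.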

Author

Francesco Grossetti <francesco.grossetti@unibocconi.it>

Examples


# loading the package and the example data
library(msmtools)
data(hosp)

# augmenting longitudinal data
hosp_aug <- augment(data = hosp, data_key = subj, n_events = adm_number,
                    pattern = label_3, t_start = dateIN, t_end = dateOUT,
                    t_cens = dateCENS)
#> Warning: no t_death has been passed. Assuming that dateCENS contains both censoring and death times

# removing subjects with transitions to different states at the same time
hosp_aug_clean <- polish(data = hosp_aug, data_key = subj, pattern = label_3)