Remove subjects with transitions to different states occurring at the same
exact time in an augmented dataset produced by augment().
Usage
polish(
  data,
  data_key,
  pattern,
  time = NULL,
  check_NA = FALSE,
  copy = FALSE,
  verbosity = getOption("msmtools.verbosity", "quiet")
)
Arguments
- data
A data.table or data.frame object in longitudinal format where each row represents an observation with known start and end times. If data is a data.frame, polish() internally casts it to a data.table.
- data_key
A keying variable used to identify subjects and define a key for data (see data.table::setkey()).
- pattern
Either an integer, a factor, or a character variable with 2 or 3 unique values that gives each subject's terminal outcome schema. When 2 values are detected, they must be in the format: 0 = "alive", 1 = "dead". When 3 values are detected, they must be: 0 = "alive", 1 = "dead during a transition", 2 = "dead after a transition has ended" (see Details).
- time
The time variable used to identify duplicate transition times. If omitted or set to NULL, polish() uses "augmented_int" when it is available, then "augmented_num". If neither column exists, time must be supplied explicitly.
- check_NA
If TRUE, data_key, pattern, and time are checked for missing values. If any missing values are found, the function stops with an error. Default is FALSE.
- copy
If FALSE (default), polish() keeps the historical memory-efficient behavior and may modify caller-owned data by reference. If TRUE, data is copied before any data.table operation so the input object remains unchanged.
- verbosity
Controls informational output. Use "quiet" to suppress status messages, "summary" for high-level phase messages and timing, and "progress" for phase messages plus progress bars in long status-building loops. The default is getOption("msmtools.verbosity", "quiet").
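The fallback order described for time can be pictured with a small base R helper. This is only a sketch of the documented behaviour, not the package's internal code: resolve_time_col() is a hypothetical name, and passing the column as a character string is an assumption made for illustration.

```r
# Hypothetical helper (not part of msmtools): mimics the documented
# fallback order for the `time` argument.
resolve_time_col <- function(data, time = NULL) {
  # An explicitly supplied column always wins.
  if (!is.null(time)) {
    return(time)
  }
  # Otherwise prefer "augmented_int", then "augmented_num".
  candidates <- c("augmented_int", "augmented_num")
  found <- candidates[candidates %in% names(data)]
  if (length(found) == 0L) {
    stop("neither 'augmented_int' nor 'augmented_num' found; supply `time`")
  }
  found[1L]
}

both     <- data.frame(augmented_int = 1:2, augmented_num = c(1.5, 2.5))
num_only <- data.frame(augmented_num = c(1.5, 2.5))

resolve_time_col(both)                          # picks the integer column
resolve_time_col(num_only)                      # falls back to the numeric column
resolve_time_col(both, time = "augmented_num")  # explicit choice is respected
```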
Value
A data.table with the same columns as the input data. Subjects with
transitions to different states recorded at the same time are removed in
full (every row sharing the same data_key); rows from unaffected subjects
are kept as-is. When no duplicated transitions are found, the input data is
returned unchanged.
Details
The function searches for cases where two subsequent events for the
same subject land on different states but occur at the same time. When this
happens, the whole subject, as identified by data_key, is removed from the
data. The function reports how many subjects were removed.
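The rule above can be illustrated on a toy table. The following is a minimal sketch using data.table, not the actual implementation of polish(): the column names (subj, augmented_num, status) and the helper column clash are assumptions chosen for the example.

```r
library(data.table)

# Toy augmented data; column names are illustrative assumptions.
toy <- data.table(
  subj          = c(1, 1, 1, 2, 2, 2),
  augmented_num = c(10, 12, 12, 5, 7, 9),
  status        = c("IN", "OUT", "DEAD", "IN", "OUT", "IN")
)

# Sort by subject and time so consecutive rows are consecutive events.
setkey(toy, subj, augmented_num)

# Flag a row when it shares its time with the previous row of the same
# subject but lands on a different state.
toy[, clash := augmented_num == shift(augmented_num) &
                 status != shift(status),
    by = subj]

# Drop every row of any subject with at least one flagged event.
bad_subjects <- toy[clash == TRUE, unique(subj)]
clean <- toy[!subj %in% bad_subjects][, clash := NULL]

length(bad_subjects)  # number of subjects removed
```

Here subject 1 has two events at time 12 with different states, so all three of its rows are dropped, while subject 2 is kept untouched.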
By default, polish() follows data.table by-reference semantics to avoid
unnecessary copies of large augmented datasets. This means the input may have
its key changed while duplicate subjects are identified. Set copy = TRUE
when the original input object must remain unchanged.
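The by-reference behaviour is a general data.table mechanism and can be seen without polish() at all. The sketch below assumes the key is set with setkey(), as the data_key description suggests; the table and its column names are made up for illustration.

```r
library(data.table)

dt <- data.table(subj = c(2, 1, 1), augmented_num = c(5, 12, 10))

# Keying an explicit copy leaves `dt` unkeyed and in its original row order.
safe <- copy(dt)
setkey(safe, subj, augmented_num)
keyed_after_copy <- haskey(dt)
first_after_copy <- dt$subj[1]

# Keying `dt` directly sorts and keys it in place, with no assignment needed.
setkey(dt, subj, augmented_num)
keyed_in_place <- haskey(dt)
first_in_place <- dt$subj[1]
```

Passing copy = TRUE to polish() is described as doing the equivalent of the copy() step on the caller's behalf, at the cost of duplicating the data in memory.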
The function always returns a data.table. Use as.data.frame() on the
result if a plain data.frame is needed by downstream code.
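As a short illustration of that conversion, the sketch below uses a stand-in data.table in place of a real polish() result; the object names are hypothetical.

```r
library(data.table)

# Stand-in for a result returned by polish().
res <- data.table(subj = 1:2, status = c("IN", "OUT"))

# as.data.frame() returns a plain data.frame and leaves `res` as it was.
res_df <- as.data.frame(res)
class(res_df)
```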
Author
Francesco Grossetti <francesco.grossetti@unibocconi.it>.
Examples
# loading data
data(hosp)
# augmenting longitudinal data
hosp_aug = augment(data = hosp, data_key = subj, n_events = adm_number,
                   pattern = label_3, t_start = dateIN, t_end = dateOUT,
                   t_cens = dateCENS)
#> Warning: no t_death has been passed. Assuming that dateCENS contains both censoring and death times
# cleaning targeted duplicate transitions
hosp_aug_clean = polish(data = hosp_aug, data_key = subj, pattern = label_3)