Reshape standard longitudinal data into augmented transition data suitable for multi-state models fitted with msm.
Arguments
- data
A data.table or data.frame object in longitudinal format where each row represents an observation with known start and end times. If data is a data.frame, augment() internally casts it to a data.table.
- data_key
A keying variable used to identify subjects and define a key for data (see data.table::setkey()).
- n_events
An integer variable indicating the progressive (monotonic) event number for each subject. augment() checks whether n_events is monotonically increasing within each data_key and stops if the check fails (see Details). If missing, augment() creates a variable named "n_events".
- pattern
Either an integer, a factor, or a character variable with 2 or 3 unique values that gives each subject's terminal outcome schema. When 2 values are detected, they must be in the format: 0 = "alive", 1 = "dead". When 3 values are detected, they must be: 0 = "alive", 1 = "dead during a transition", 2 = "dead after a transition has ended" (see Details).
- state
A character vector of exactly three unique, non-missing, non-empty labels used as the generated transition-state vocabulary. Defaults to c("IN", "OUT", "DEAD") (see Details).
- t_start
The starting time of an observation. It can be passed as date, integer, or numeric format.
- t_end
The ending time of an observation. It can be passed as date, integer, or numeric format.
- t_cens
The censoring time of the study. This is the date until each ID is observed, if still active in the cohort.
- t_death
The exact death time of a subject ID. If t_death is missing, t_cens is assumed to contain both censoring and death times and a warning is raised.
- t_augmented
A variable indicating the name of the new time variable in the augmented format. If t_augmented is missing, the default name "augmented" is used and the new variable is added to data. When t_start is a date or difftime, augment() also creates an integer or numeric companion variable. The suffix "_int" or "_num" is added to t_augmented accordingly. This is needed because msm does not handle date or difftime variables directly. Both variables are positioned before t_start.
- more_status
A variable that marks further transitions beyond the default ones given by state. more_status can be a factor or character (see Details). If NULL (default), augment() ignores it.
- check_NA
If TRUE, data_key, n_events, pattern, t_start, and t_end are checked for missing values. If any missing values are found, the function stops with an error. Default is FALSE because augment() is not intended for general consistency checks and the scan can add memory overhead on very large datasets. more_status is always checked for missing values when supplied.
- copy
If FALSE (default), augment() keeps the historical memory-efficient behavior and may modify caller-owned data by reference. If TRUE, data is copied before any data.table operation so the input object remains unchanged.
- verbosity
Controls informational output. Use "quiet" to suppress status messages, "summary" for high-level phase messages and timing, and "progress" for phase messages plus progress bars in long status-building loops. The default is getOption("msmtools.verbosity", "quiet").
Value
An augmented dataset of class data.table. Each row represents a
specific transition for a given subject. augment() computes the following
key variables:
- augmented: The transition time variable. If t_augmented is missing, augment() creates augmented by default. The variable is built from t_start and t_end and inherits their class. If t_start is a date, augment() also creates an integer variable named augmented_int. If t_start is a difftime, it creates a numeric variable named augmented_num.
- status: A status flag that contains the states as specified in state. augment() automatically checks whether argument pattern has 2 or 3 unique values and computes the correct structure of a given subject as reported in the vignette. The variable is cast as character.
- status_num: The corresponding integer version of status.
- n_status: A mix of status and n_events cast as character. This is useful when modelling process progression.
If more_status is passed, augment() computes additional variables.
They mirror the meaning of status, status_num, and n_status but they
account for the more complex structure defined. They are: status_exp,
status_exp_num, and n_status_exp.
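The integer and numeric companion variables can be pictured with a short base R sketch. Whether augment() uses exactly these conversions internally is an assumption; the vectors t_start and stay below are hypothetical example data.

```r
# Hypothetical start dates for two observations.
t_start <- as.Date(c("2010-01-01", "2010-01-15"))

# Dates are stored as days since 1970-01-01, so an integer version is cheap.
t_start_int <- as.integer(t_start)
t_start_int    # 14610 14624

# A difftime can be reduced to a plain numeric in a similar way.
stay <- as.difftime(c(3, 14), units = "days")
stay_num <- as.numeric(stay, units = "days")
stay_num       # 3 14
```

Plain integer or numeric times of this kind are what can be handed to msm, which is why the companion variable exists.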
Details
augment() requires a monotonic event sequence within each subject.
The data are ordered with data.table::setkey() using data_key as the
primary key and t_start as the secondary key. The function then checks the
monotonicity of n_events; if the check fails, it stops and reports the
subjects that violate the condition. If n_events is missing, augment()
first computes a progression number named n_events and then runs the same
check.
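The ordering and check described above can be pictured with a small base R sketch. This is not the code augment() runs internally (which relies on data.table). The data frame visits and the helper find_non_monotonic() are hypothetical names introduced for illustration, and "monotonic" is assumed here to mean strictly increasing.

```r
# Toy longitudinal data: one row per observation (hypothetical example data).
visits <- data.frame(
  subj       = c(1, 1, 1, 2, 2, 2),
  adm_number = c(1, 2, 3, 1, 3, 2),
  t_start    = c(10, 20, 30, 5, 15, 25)
)

# Order by subject, then by start time, mirroring the key described above.
visits <- visits[order(visits$subj, visits$t_start), ]

# Hypothetical helper: return the subjects whose event number is not
# strictly increasing once their rows are ordered by start time.
find_non_monotonic <- function(id, n_events) {
  ok <- tapply(n_events, id, function(x) all(diff(x) > 0))
  names(ok)[!ok]
}

bad_subjects <- find_non_monotonic(visits$subj, visits$adm_number)
bad_subjects   # "2"
```

Subject 2 is reported because its event numbers run 1, 3, 2 in start-time order.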
Argument pattern describes the terminal outcome schema and must follow the
expected ordering. With two statuses, values must correspond to
0 = "alive" and 1 = "dead". With three statuses, integer values must
correspond to 0 = "alive", 1 = "dead inside a transition", and
2 = "dead outside a transition". Character and factor values must follow
the same order. For example, 0 cannot be used to indicate death.
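One way to avoid ambiguity about the ordering is to build pattern explicitly on the 0/1/2 scale. The sketch below uses base R only; the outcome labels and the variable names outcome, pattern_fct, and pattern_int are hypothetical.

```r
# Hypothetical terminal outcomes, one per subject.
outcome <- c("alive", "dead outside", "dead inside", "alive")

# Put the levels in the required order so that the underlying integer
# codes follow alive < dead inside a transition < dead outside a transition.
pattern_fct <- factor(
  outcome,
  levels = c("alive", "dead inside", "dead outside")
)

# Integer version on the 0/1/2 scale described above.
pattern_int <- as.integer(pattern_fct) - 1L
pattern_int   # 0 2 1 0
```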
Argument state describes the generated transition-state vocabulary. Its
order also matters. The first element is the state at t_start (for example,
"IN"), the second element is the state at t_end (for example, "OUT"),
and the third element is the absorbing state (for example, "DEAD"). A
two-value pattern still requires three state labels because augment()
infers whether death maps to the absorbing state inside or outside the
transition window.
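The constraints on state can be expressed as a small validator. The function check_state() below is a hypothetical helper written for illustration, not part of msmtools, and the role names it attaches are made up for readability.

```r
# Hypothetical validator mirroring the documented constraints on `state`.
check_state <- function(state) {
  stopifnot(
    is.character(state),
    length(state) == 3L,
    !anyNA(state),
    all(nzchar(state)),
    !anyDuplicated(state)
  )
  # Name the positions by the role each label plays.
  stats::setNames(state, c("at_t_start", "at_t_end", "absorbing"))
}

roles <- check_state(c("IN", "OUT", "DEAD"))
roles[["absorbing"]]   # "DEAD"
```

A vector with a repeated label, such as c("IN", "IN", "DEAD"), fails the check with an error.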
more_status lets augment() represent transitions beyond the defaults in
state. Standard observations that add no extra information should use
"df" for "default" (see Examples, or run ?hosp and inspect rehab_it).
More complex transitions should use concise, self-explanatory labels.
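As an illustration, a more_status variable can be derived from an existing flag by filling uninformative rows with "df". The data frame obs and its columns rehab and more are hypothetical; only the "df" convention comes from the text above.

```r
# Hypothetical observations with an optional rehabilitation flag.
obs <- data.frame(
  subj  = c(1, 1, 2),
  rehab = c(NA, "rehab", NA)
)

# Use "df" wherever the observation carries no extra information, so the
# resulting column has no missing values.
obs$more <- ifelse(is.na(obs$rehab), "df", obs$rehab)
obs$more   # "df" "rehab" "df"
```

Filling the gaps this way also matters because more_status is always checked for missing values when supplied.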
By default, augment() follows data.table by-reference semantics to avoid
unnecessary copies of large longitudinal datasets. This means the input may
have its key changed, and n_events may be added when the argument is
omitted. Set copy = TRUE when the original input object must remain
unchanged.
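The difference between the two modes can be seen with data.table alone. The sketch below illustrates general data.table by-reference behavior on a hypothetical toy table dt; it does not call augment() and is not a description of its internals.

```r
library(data.table)

# Hypothetical toy table, deliberately unsorted.
dt <- data.table(subj = c(2, 1, 1), t_start = c(5, 20, 10))

# Work on an explicit copy: the original stays unkeyed and unsorted.
dt_copy <- copy(dt)
setkey(dt_copy, subj, t_start)
key(dt)        # NULL
key(dt_copy)   # "subj" "t_start"

# Keying the original modifies it in place: rows are physically reordered.
setkey(dt, subj, t_start)
dt$t_start     # 10 20 5
```

The copy costs memory proportional to the size of the data, which is the trade-off behind the copy = FALSE default.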
The function always returns a data.table. Use as.data.frame() on the
result if a plain data.frame is needed by downstream code.
References
Grossetti, F., Ieva, F., and Paganoni, A.M. (2018). A multi-state approach to patients affected by chronic heart failure. Health Care Management Science, 21, 281-291. doi:10.1007/s10729-017-9400-z.
Jackson, C.H. (2011). Multi-State Models for Panel Data: The msm Package for R. Journal of Statistical Software, 38(8), 1-29. https://www.jstatsoft.org/v38/i08/.
Dowle, M., Srinivasan, A., Short, T., and Lianoglou, S., with contributions from Saporta, R. and Antonyan, E. (2016). data.table: Extension of data.frame. R package version 1.9.6. https://github.com/Rdatatable/data.table/wiki
Author
Francesco Grossetti francesco.grossetti@unibocconi.it.
Examples
# loading data
data(hosp)
# augmenting hosp
hosp_augmented = augment(data = hosp, data_key = subj, n_events = adm_number,
                         pattern = label_3, t_start = dateIN, t_end = dateOUT,
                         t_cens = dateCENS)
#> Warning: no t_death has been passed. Assuming that dateCENS contains both censoring and death times
# augmenting hosp by passing more information regarding transitions
# with argument more_status
hosp_augmented_more = augment(data = hosp, data_key = subj, n_events = adm_number,
                              pattern = label_3, t_start = dateIN, t_end = dateOUT,
                              t_cens = dateCENS, more_status = rehab_it)
#> Warning: no t_death has been passed. Assuming that dateCENS contains both censoring and death times
# requesting summary-level status messages and timing
hosp_augmented = augment(data = hosp, data_key = subj, n_events = adm_number,
                         pattern = label_3, t_start = dateIN, t_end = dateOUT,
                         t_cens = dateCENS, verbosity = "summary")
#> ── setting everything up ───────────────────────────────────────────────────────
#> Warning: no t_death has been passed. Assuming that dateCENS contains both censoring and death times
#> ℹ checking monotonicity of adm_number
#> ✔ adm_number is monotonic
#> ℹ checking label_3 and defining patterns
#> ✔ detected 3 values in label_3
#> ℹ augmenting data
#> ✔ data have been augmented
#> ℹ defining dimensions
#> ✔ dimensions computed
#> ℹ adding status flag
#> ✔ status flag has been added successfully
#> ℹ adding numeric status flag
#> ✔ numeric status has been added successfully
#> ℹ adding sequential status flag
#> ✔ sequential status flag has been added successfully
#> ℹ adding variable augmented as new time variable
#> ✔ variables augmented and augmented_int successfully added and repositioned
#> ── augment() took: 0.0449999999999999 sec. ─────────────────────────────────────