DAPA01: Creating an ADPPK dataset with ADaM.jl (without covariates)
This tutorial presents a template for preparing an ADPPK dataset with ADaM.jl. Because no other source datasets are available for deriving covariates, the resulting ADPPK dataset primarily contains concentration, dose, and time information. Customize the template to the specific study and data at hand.
1 Load Packages
In this section we load every Julia package used in the tutorial. DataFramesMeta drives the data wrangling, Dates handles the date/time arithmetic, ReadStatTables and PharmaDatasets pull in the source SDTM domains, and ADaM provides the helpers (make_dtm, set_exclusion, join_columns, round_columns) that power each transformation step below.
using CSV
using DataFramesMeta
using Dates
using ReadStatTables
using StatsBase
using PharmaDatasets
using ADaM
2 Read Data
The datasets are read from the SDTM/DAPA01 datasets folder of the PharmaDatasets.jl package.
pc = @chain dataset("SDTM/DAPA01/pc") convert_to_missing(["", nothing])
ex = @chain dataset("SDTM/DAPA01/ex") convert_to_missing(["", nothing])
first(pc, 5)
| Row | STUDYID | DOMAIN | USUBJID | PCSEQ | PCTESTCD | PCTEST | PCORRES | PCORRESU | PCSTRESC | PCSTRESN | PCSTRESU | PCSPEC | PCLLOQ | VISIT | VISITNUM | PCDTC | PCDY | PCTPT | PCTPTNUM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | String7 | String3 | String15 | Float64 | String7 | String15 | String15 | String7 | String15 | Float64 | String7 | String7 | String15 | String15 | Float64 | String31 | Float64 | String31 | Float64 |
| 1 | DAPA01 | PC | DAPA01-001 | 1.0 | DAPA | Dapagliflozin | 157.021 | ng/mL | 157.021 | 157.021 | ng/mL | plasma | 0.1 ng/mL | Period 1 Day 1 | 2.0 | 2022-06-10 09:30:00 | 1.0 | 0-HR POSTDOSE | 0.0 |
| 2 | DAPA01 | PC | DAPA01-001 | 2.0 | DAPA | Dapagliflozin | 141.892 | ng/mL | 141.892 | 141.892 | ng/mL | plasma | 0.1 ng/mL | Period 1 Day 1 | 2.0 | 2022-06-10 09:33:00 | 1.0 | 0.05-HR POSTDOSE | 0.05 |
| 3 | DAPA01 | PC | DAPA01-001 | 3.0 | DAPA | Dapagliflozin | 116.228 | ng/mL | 116.228 | 116.228 | ng/mL | plasma | 0.1 ng/mL | Period 1 Day 1 | 2.0 | 2022-06-10 09:51:00 | 1.0 | 0.35-HR POSTDOSE | 0.35 |
| 4 | DAPA01 | PC | DAPA01-001 | 4.0 | DAPA | Dapagliflozin | 109.353 | ng/mL | 109.353 | 109.353 | ng/mL | plasma | 0.1 ng/mL | Period 1 Day 1 | 2.0 | 2022-06-10 10:00:00 | 1.0 | 0.5-HR POSTDOSE | 0.5 |
| 5 | DAPA01 | PC | DAPA01-001 | 5.0 | DAPA | Dapagliflozin | 66.4814 | ng/mL | 66.4814 | 66.4814 | ng/mL | plasma | 0.1 ng/mL | Period 1 Day 1 | 2.0 | 2022-06-10 10:15:00 | 1.0 | 0.75-HR POSTDOSE | 0.75 |
2.1 VISITDY Lookup
The pc dataset does not contain the VISITDY column, which is needed to derive nominal times, but this column is available in the ex dataset. Therefore, the ex dataset is used to create a lookup table with VISITNUM - VISITDY mapping, which can then be merged with the pc dataset to derive nominal variables.
visitdy_lookup = @chain ex begin
    @select :VISITNUM :VISITDY
    unique
end
first(visitdy_lookup, 5)
| Row | VISITNUM | VISITDY |
|---|---|---|
| | Float64 | Float64 |
| 1 | 2.0 | 1.0 |
| 2 | 3.0 | 8.0 |
| 3 | 4.0 | 15.0 |
| 4 | 5.0 | 22.0 |
3 PC Data Preparation
In this section of the tutorial we shape the PC (Pharmacokinetics) domain into an analysis-ready form. We derive the analysis date/time variables (ADTM, ADT, ATM), join the VISITDY lookup to compute the nominal time variables (NFRLT, AVISIT, AVISITN), set the event identifier EVID = 0 for observations, and convert PCLLOQ from its unit-suffixed string form into a numeric lower limit of quantification.
pc_prep = @chain pc begin
    @rtransform :PCDTC = replace(:PCDTC, " " => "T")
    make_dtm(:PCDTC, prefix = "A")
    make_dtm_to_dt(:ADTM, prefix = "A")
    make_dtm_to_tm(:ADTM, prefix = "A")
    leftjoin(visitdy_lookup, on = :VISITNUM)
    @rtransform @astable begin
        :EVID = 0
        :DRUG = :PCTEST
        :NFRLT = 24 * (:VISITDY - 1) + :PCTPTNUM
        :AVISITN = :VISITNUM
        :AVISIT = "Visit " * string(:AVISITN)
        :PCLLOQ = parse(Float64, replace(:PCLLOQ, " ng/mL" => ""))
    end
    @orderby :USUBJID :DRUG :ADTM :EVID
end
first(pc_prep, 5)
1. @rtransform @astable begin lets you create several columns inside a single block. @astable makes a column derived earlier in the block (e.g. AVISITN) available when computing a later column (e.g. AVISIT).
2. PCLLOQ arrives as a String with the unit suffix (e.g. "5 ng/mL"). Pumas expects numeric values for the lower limit of quantification, so the unit is stripped with replace and the remaining text is parsed to Float64.
| Row | STUDYID | DOMAIN | USUBJID | PCSEQ | PCTESTCD | PCTEST | PCORRES | PCORRESU | PCSTRESC | PCSTRESN | PCSTRESU | PCSPEC | PCLLOQ | VISIT | VISITNUM | PCDTC | PCDY | PCTPT | PCTPTNUM | ADTM | ADT | ATM | VISITDY | EVID | DRUG | NFRLT | AVISITN | AVISIT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | String7 | String3 | String15 | Float64 | String7 | String15 | String15 | String7 | String15 | Float64 | String7 | String7 | Float64 | String15 | Float64 | String | Float64 | String31 | Float64 | DateTime | Date | Time | Float64? | Int64 | String15 | Float64 | Float64 | String |
| 1 | DAPA01 | PC | DAPA01-001 | 1.0 | DAPA | Dapagliflozin | 157.021 | ng/mL | 157.021 | 157.021 | ng/mL | plasma | 0.1 | Period 1 Day 1 | 2.0 | 2022-06-10T09:30:00 | 1.0 | 0-HR POSTDOSE | 0.0 | 2022-06-10T09:30:00 | 2022-06-10 | 09:30:00 | 1.0 | 0 | Dapagliflozin | 0.0 | 2.0 | Visit 2.0 |
| 2 | DAPA01 | PC | DAPA01-001 | 2.0 | DAPA | Dapagliflozin | 141.892 | ng/mL | 141.892 | 141.892 | ng/mL | plasma | 0.1 | Period 1 Day 1 | 2.0 | 2022-06-10T09:33:00 | 1.0 | 0.05-HR POSTDOSE | 0.05 | 2022-06-10T09:33:00 | 2022-06-10 | 09:33:00 | 1.0 | 0 | Dapagliflozin | 0.05 | 2.0 | Visit 2.0 |
| 3 | DAPA01 | PC | DAPA01-001 | 3.0 | DAPA | Dapagliflozin | 116.228 | ng/mL | 116.228 | 116.228 | ng/mL | plasma | 0.1 | Period 1 Day 1 | 2.0 | 2022-06-10T09:51:00 | 1.0 | 0.35-HR POSTDOSE | 0.35 | 2022-06-10T09:51:00 | 2022-06-10 | 09:51:00 | 1.0 | 0 | Dapagliflozin | 0.35 | 2.0 | Visit 2.0 |
| 4 | DAPA01 | PC | DAPA01-001 | 4.0 | DAPA | Dapagliflozin | 109.353 | ng/mL | 109.353 | 109.353 | ng/mL | plasma | 0.1 | Period 1 Day 1 | 2.0 | 2022-06-10T10:00:00 | 1.0 | 0.5-HR POSTDOSE | 0.5 | 2022-06-10T10:00:00 | 2022-06-10 | 10:00:00 | 1.0 | 0 | Dapagliflozin | 0.5 | 2.0 | Visit 2.0 |
| 5 | DAPA01 | PC | DAPA01-001 | 5.0 | DAPA | Dapagliflozin | 66.4814 | ng/mL | 66.4814 | 66.4814 | ng/mL | plasma | 0.1 | Period 1 Day 1 | 2.0 | 2022-06-10T10:15:00 | 1.0 | 0.75-HR POSTDOSE | 0.75 | 2022-06-10T10:15:00 | 2022-06-10 | 10:15:00 | 1.0 | 0 | Dapagliflozin | 0.75 | 2.0 | Visit 2.0 |
make_dtm creates the DateTime column ADTM (Analysis DateTime) from a String column (here PCDTC). The prefix A is specified so that the resulting column is named ADTM.
make_dtm_to_dt creates the Date column ADT (Analysis Date) from the DateTime column ADTM, named according to the prefix A.
make_dtm_to_tm creates the Time column ATM (Analysis Time) from the DateTime column ADTM, named according to the prefix A.
After the date variables, the nominal time variables are derived: NFRLT (Nominal Relative Time from First Dose, computed in hours as 24 * (VISITDY - 1) + PCTPTNUM), AVISIT (Analysis Visit), and AVISITN (Analysis Visit Number).
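The date and nominal-time derivations above can be illustrated with the Dates standard library alone. This is a minimal sketch, not the ADaM.jl implementation: it uses plain DateTime, Date, and Time constructors in place of make_dtm, make_dtm_to_dt, and make_dtm_to_tm, and the helper name nfrlt is hypothetical.

```julia
using Dates

# Raw SDTM-style timestamp with a space separator (value taken from the PC preview)
raw = "2022-06-10 09:30:00"

# Swap the space for "T" so the string is in ISO 8601 form, then parse it
iso = replace(raw, " " => "T")
adtm = DateTime(iso)   # analysis datetime
adt = Date(adtm)       # analysis date
atm = Time(adtm)       # analysis time

# Nominal hours since first dose: 24 h per elapsed study day plus the nominal time point
# (nfrlt is a hypothetical helper mirroring the NFRLT expression in pc_prep)
nfrlt(visitdy, pctptnum) = 24 * (visitdy - 1) + pctptnum

day1 = nfrlt(1.0, 0.5)   # Day 1, 0.5 h post-dose
day8 = nfrlt(8.0, 0.5)   # Day 8, 0.5 h post-dose
```

With these inputs, a sample taken 0.5 h after the Day 8 dose sits 168.5 nominal hours after the first dose, which is why VISITDY has to be joined into PC before NFRLT can be computed.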
4 EX Data Preparation
Here we walk through the parallel preparation of the EX (Exposure) domain. We derive ASTDTM, AENDTM, and ADTM from the dosing start time, generate the corresponding Date and Time columns, set EVID = 1 for dosing records, and use groupby + transform to carry the first dose DateTime (FANLDTM) and the first administered dose (EXDOSE_first) across each subject’s records.
ex_prep = @chain ex begin
    @rtransform :EXSTDTC = replace(:EXSTDTC, " " => "T")
    make_dtm(:EXSTDTC, prefix = "AST")
    @rtransform begin
        :AENDTM = :ASTDTM
        :ADTM = :ASTDTM
    end
    make_dtm_to_dt(:ASTDTM, prefix = "AST") # Output : ASTDT
    make_dtm_to_dt(:AENDTM, prefix = "AEN") # Output : AENDT
    make_dtm_to_dt(:ADTM, prefix = "A") # Output : ADT
    make_dtm_to_tm(:ADTM, prefix = "A") # Output : ATM
    @rtransform @astable begin
        :NFRLT = 24 * (:VISITDY - 1)
        :AVISITN = :VISITNUM
        :AVISIT = "Visit " * string(Int(:AVISITN))
        :EVID = 1
        :DRUG = :EXTRT
    end
    transform(
        groupby(_, [:USUBJID, :DRUG]),
        :EXDOSE => (x -> minimum(skipmissing(x))) => :EXDOSE_first,
        :ADTM => (x -> minimum(skipmissing(x))) => :FANLDTM,
    )
    @orderby :USUBJID :DRUG :ADTM :EVID
end
first(ex_prep, 5)
1. minimum(skipmissing(x)) is used to find the minimum ADTM across the USUBJID, DRUG grouping, skipping the missing values (like na.rm = TRUE in R). The resultant column FANLDTM has the minimum dose DateTime filled across the group.
| Row | STUDYID | DOMAIN | USUBJID | EXSEQ | EXTRT | EXDOSE | EXDOSU | EXDOSFRM | EXDOSFRQ | EXROUTE | VISITNUM | VISIT | VISITDY | EXSTDTC | EXSTDY | ASTDTM | AENDTM | ADTM | ASTDT | AENDT | ADT | ATM | NFRLT | AVISITN | AVISIT | EVID | DRUG | EXDOSE_first | FANLDTM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | String7 | String3 | String15 | Float64 | String15 | Float64 | String3 | String15 | String7 | String15 | Float64 | String15 | Float64 | String | Float64 | DateTime | DateTime | DateTime | Date | Date | Date | Time | Float64 | Float64 | String | Int64 | String15 | Float64 | DateTime |
| 1 | DAPA01 | EX | DAPA01-001 | 1.0 | Dapagliflozin | 5.0 | mg | Injection | ONCE | Intravenous | 2.0 | Period 1 Day 1 | 1.0 | 2022-06-10T09:30:00 | 1.0 | 2022-06-10T09:30:00 | 2022-06-10T09:30:00 | 2022-06-10T09:30:00 | 2022-06-10 | 2022-06-10 | 2022-06-10 | 09:30:00 | 0.0 | 2.0 | Visit 2 | 1 | Dapagliflozin | 5.0 | 2022-06-10T09:30:00 |
| 2 | DAPA01 | EX | DAPA01-001 | 2.0 | Dapagliflozin | 5.0 | mg | Capsule | ONCE | Oral | 3.0 | Period 2 Day 1 | 8.0 | 2022-06-17T09:30:00 | 8.0 | 2022-06-17T09:30:00 | 2022-06-17T09:30:00 | 2022-06-17T09:30:00 | 2022-06-17 | 2022-06-17 | 2022-06-17 | 09:30:00 | 168.0 | 3.0 | Visit 3 | 1 | Dapagliflozin | 5.0 | 2022-06-10T09:30:00 |
| 3 | DAPA01 | EX | DAPA01-001 | 3.0 | Dapagliflozin | 10.0 | mg | Capsule | ONCE | Oral | 4.0 | Period 3 Day 1 | 15.0 | 2022-06-25T09:30:00 | 15.0 | 2022-06-25T09:30:00 | 2022-06-25T09:30:00 | 2022-06-25T09:30:00 | 2022-06-25 | 2022-06-25 | 2022-06-25 | 09:30:00 | 336.0 | 4.0 | Visit 4 | 1 | Dapagliflozin | 5.0 | 2022-06-10T09:30:00 |
| 4 | DAPA01 | EX | DAPA01-001 | 4.0 | Dapagliflozin | 25.0 | mg | Capsule | ONCE | Oral | 5.0 | Period 4 Day 1 | 22.0 | 2022-07-02T11:30:00 | 22.0 | 2022-07-02T11:30:00 | 2022-07-02T11:30:00 | 2022-07-02T11:30:00 | 2022-07-02 | 2022-07-02 | 2022-07-02 | 11:30:00 | 504.0 | 5.0 | Visit 5 | 1 | Dapagliflozin | 5.0 | 2022-06-10T09:30:00 |
| 5 | DAPA01 | EX | DAPA01-002 | 1.0 | Dapagliflozin | 5.0 | mg | Injection | ONCE | Intravenous | 2.0 | Period 1 Day 1 | 1.0 | 2022-03-12T09:08:00 | 1.0 | 2022-03-12T09:08:00 | 2022-03-12T09:08:00 | 2022-03-12T09:08:00 | 2022-03-12 | 2022-03-12 | 2022-03-12 | 09:08:00 | 0.0 | 2.0 | Visit 2 | 1 | Dapagliflozin | 5.0 | 2022-03-12T09:08:00 |
First, the space in each DateTime string is replaced with T so that the string can be converted to a DateTime. We then derive ASTDTM with make_dtm, copy it into AENDTM and ADTM, and derive the date variables ASTDT, AENDT, and ADT with make_dtm_to_dt.
Second, we derive the nominal time variables NFRLT, AVISITN, and AVISIT.
Lastly, we derive FANLDTM (First Analyte Dose DateTime) and EXDOSE_first (first dose administered). Note that EXDOSE_first is computed as the minimum EXDOSE within each USUBJID and DRUG group, which matches the first administered dose for the subjects shown above.
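The group-wise minimum behind FANLDTM can be sketched without DataFrames. The example below is an illustration only: the records and the helper first_dose_times are hypothetical, and a Dict keyed by subject stands in for groupby plus transform.

```julia
using Dates

# Hypothetical dosing records: (subject, dose datetime, dose amount); one time is missing
doses = [
    ("S1", DateTime(2022, 6, 17, 9, 30), 5.0),
    ("S1", DateTime(2022, 6, 10, 9, 30), 5.0),
    ("S2", missing, 10.0),
    ("S2", DateTime(2022, 3, 12, 9, 8), 5.0),
]

# Collect the dose times per subject, then take the minimum while skipping missing values
function first_dose_times(records)
    by_subject = Dict{String,Vector{Union{Missing,DateTime}}}()
    for (id, dtm, _) in records
        push!(get!(by_subject, id, Union{Missing,DateTime}[]), dtm)
    end
    return Dict(id => minimum(skipmissing(times)) for (id, times) in by_subject)
end

fanldtm = first_dose_times(doses)
```

Without skipmissing, the minimum for S2 would itself be missing, so the first dose DateTime could not be filled across that subject's records.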
5 EX Expansion
Dose expansion is the step where interval-style dosing records (e.g. “BID for 14 days”) are expanded into one row per administered dose. For DAPA01 no expansion is needed — the regimen is ONCE daily and each administration is already a separate EX record — so ex_prep is used directly downstream.
If your study records dosing as intervals (start and end DateTime with a frequency), this is where you would expand the EX table into individual dose rows before combining it with PC.
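As a rough idea of what such an expansion could look like, the sketch below turns one interval-style record into individual dose rows using only the Dates standard library. The IntervalDose type, its field names, and expand_doses are hypothetical and are not part of ADaM.jl or the DAPA01 data.

```julia
using Dates

# Hypothetical interval dosing record: first dose time, last dose time, interval, amount
struct IntervalDose
    start::DateTime
    stop::DateTime
    every::Hour
    amount::Float64
end

# Expand an interval record into one (time, amount) pair per administered dose
expand_doses(d::IntervalDose) = [(t, d.amount) for t in d.start:d.every:d.stop]

# Twice daily (every 12 h) over two days gives four individual doses
bid = IntervalDose(DateTime(2022, 6, 10, 8), DateTime(2022, 6, 11, 20), Hour(12), 5.0)
rows = expand_doses(bid)
```

Each expanded pair would then become its own EX row, so that the later previous-dose lookups see every administration rather than just the start of the interval.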
6 Combine PC and EX
In this section of the tutorial we bring pc_prep and ex_prep together into a single combined dataset. We first stack the two domains with vcat, apply exclusion flags for problematic subjects, then perform a self-join against the dosing records to derive reference variables (ADTM_prev, EXDOSE_prev, NFRLT_prev) that downstream relative-time calculations rely on.
6.1 Flag the combined dataset
We begin by row-binding pc_prep and ex_prep, then use set_exclusion to mark subjects that should not enter the analysis — those with missing concentrations, no dosing records, or no concentration records. Each call adds (or extends) the EXCLF (Exclusion Flag) and EXCLFCOM (Exclusion Flag Comment) columns, so excluded subjects remain in the dataset but are traceable.
adppk_flag = @chain pc_prep begin
    vcat(ex_prep, cols = :union)
    # Exclusion 1: Subjects with missing conc. data
    set_exclusion(
        "Subjects with missing conc.",
        excl_func = group -> all(ismissing, group.PCSTRESN),
        group = [:USUBJID, :DRUG],
    )
    # Exclusion 2: Subjects with no dosing data
    set_exclusion(
        "Subjects with no dose records",
        excl_func = group -> all((==)(0), group.EVID),
        group = [:USUBJID, :DRUG],
    )
    # Exclusion 3: Subjects with no conc. data
    set_exclusion(
        "Subjects with no conc. records",
        excl_func = group -> all((==)(1), group.EVID),
        group = [:USUBJID, :DRUG],
    )
    @orderby :USUBJID :DRUG :ADTM :EVID
end;
| Row | USUBJID | DRUG | ADTM | EVID | EXCLF | EXCLFCOM |
|---|---|---|---|---|---|---|
| | String15 | String15 | DateTime | Int64 | Int64 | Missing |
| 1 | DAPA01-001 | Dapagliflozin | 2022-06-10T09:30:00 | 0 | 0 | missing |
| 2 | DAPA01-001 | Dapagliflozin | 2022-06-10T09:30:00 | 1 | 0 | missing |
| 3 | DAPA01-001 | Dapagliflozin | 2022-06-10T09:33:00 | 0 | 0 | missing |
| 4 | DAPA01-001 | Dapagliflozin | 2022-06-10T09:51:00 | 0 | 0 | missing |
| 5 | DAPA01-001 | Dapagliflozin | 2022-06-10T10:00:00 | 0 | 0 | missing |
set_exclusion flags rows by group, adding EXCLF (Exclusion Flag) and EXCLFCOM (Exclusion Flag Comment). It takes a reason string, an excl_func = group -> condition predicate over a grouped sub-DataFrame, and a group key (e.g. [:USUBJID, :DRUG]).
The predicates above use point-free shorthand — all(ismissing, group.PCSTRESN) instead of all(x -> ismissing(x), group.PCSTRESN), and (==)(0) instead of x -> x == 0.
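The behaviour of these predicates is easy to check on plain vectors. In the sketch below the vectors are hypothetical stand-ins for group.PCSTRESN and group.EVID; only Base functions are used.

```julia
# Hypothetical per-group columns, standing in for group.PCSTRESN and group.EVID
conc_all_missing = [missing, missing]
conc_partial = [1.2, missing]
evid_obs_only = [0, 0, 0]
evid_mixed = [1, 0, 0]

# all(f, xs) applies f to every element; ismissing is already a one-argument function
flag_missing_conc = all(ismissing, conc_all_missing)   # every concentration is missing
keep_partial_conc = all(ismissing, conc_partial)       # one concentration is present

# (==)(0) builds the one-argument function x -> x == 0
is_observation = (==)(0)
flag_no_dose = all(is_observation, evid_obs_only)      # only observation rows in the group
keep_mixed = all(is_observation, evid_mixed)           # the group contains a dosing row
```

A predicate that returns true marks the whole group for exclusion, so only the first and third cases above would be flagged.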
6.2 Derive Reference data
Next, we look up reference values from the dosing records for every row of the combined dataset. Using join_columns against ex_prep, we pull in the most recent prior dose DateTime (ADTM_prev), the most recent prior dose amount (EXDOSE_prev), and the most recent prior nominal time (NFRLT_prev). These “previous-dose” columns become the references used in the next section to compute relative time variables.
adppk_nom_prev = @chain adppk_flag begin
    join_columns(
        ex_prep,
        on = [:USUBJID, :DRUG],
        order = [:ADTM],
        keep = [:ADTM => :ADTM_prev, :EXDOSE => :EXDOSE_prev],
        filter_join = (t, r) -> t.ADTM > r.ADTM,
        mode = "last",
    )
    join_columns(
        ex_prep,
        on = [:USUBJID, :DRUG],
        order = [:NFRLT],
        keep = [:NFRLT => :NFRLT_prev],
        filter_join = (t, r) -> t.NFRLT > r.NFRLT,
        mode = "last",
    )
    @orderby :USUBJID :DRUG :ADTM :EVID
end;
1. join_columns looks up the most recent prior dosing record (from ex_prep) for each row, using filter_join to keep only references strictly earlier than the current row, and mode = "last" to pick the latest qualifying match.
| Row | USUBJID | EVID | ADTM | ADTM_prev | NFRLT | NFRLT_prev |
|---|---|---|---|---|---|---|
| | String15 | Int64 | DateTime | DateTime? | Float64 | Float64? |
| 1 | DAPA01-001 | 0 | 2022-06-10T09:30:00 | missing | 0.0 | missing |
| 2 | DAPA01-001 | 1 | 2022-06-10T09:30:00 | missing | 0.0 | missing |
| 3 | DAPA01-001 | 0 | 2022-06-10T09:33:00 | 2022-06-10T09:30:00 | 0.05 | 0.0 |
| 4 | DAPA01-001 | 0 | 2022-06-10T09:51:00 | 2022-06-10T09:30:00 | 0.35 | 0.0 |
| 5 | DAPA01-001 | 0 | 2022-06-10T10:00:00 | 2022-06-10T09:30:00 | 0.5 | 0.0 |
Deriving reference variables is an important step toward deriving the required time and analysis variables.
join_columns performs a self-join against the dosing dataset (ex_prep) based on the USUBJID and DRUG grouping to derive the prev variables. For each row, the filter_join condition t.<col> > r.<col> restricts matches to dose records strictly earlier than the current row, and mode = "last" selects the most recent qualifying dose, giving the previous dose's ADTM, EXDOSE, and NFRLT. Rows with no strictly earlier dose, such as the first dose itself and a sample drawn at the same time, receive missing.
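The "latest strictly earlier dose" rule can be sketched for a single subject and drug in plain Julia. This is not how join_columns is implemented; previous_dose is a hypothetical helper that assumes the dose times are already sorted in ascending order.

```julia
using Dates

# Hypothetical dose times for one subject and drug, sorted ascending
dose_times = [DateTime(2022, 6, 10, 9, 30), DateTime(2022, 6, 17, 9, 30)]

# Latest dose strictly earlier than t, or missing when no earlier dose exists
function previous_dose(t::DateTime, doses::Vector{DateTime})
    idx = findlast(d -> d < t, doses)
    return idx === nothing ? missing : doses[idx]
end

at_first_dose = previous_dose(DateTime(2022, 6, 10, 9, 30), dose_times)   # no earlier dose
after_first = previous_dose(DateTime(2022, 6, 10, 9, 33), dose_times)
after_second = previous_dose(DateTime(2022, 6, 17, 10, 0), dose_times)
```

The strict comparison explains the missing ADTM_prev values on the first two rows of the preview: nothing was dosed before 09:30 on the first dosing day.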
7 Derive Relative Time Variables
In this section of the tutorial we use the reference columns derived above to compute the relative time variables that any population PK analysis depends on. We fill FANLDTM and the per-subject minimum NFRLT across each USUBJID + DRUG group, then derive AFRLT (Actual Time Relative to First Dose), APRLT (Actual Time Relative to Previous Dose), and NPRLT (Nominal Time Relative to Previous Dose), handling missing values with @passmissing and clamping any negative pre-dose times to zero.
adppk_aprlt = @chain adppk_nom_prev begin
    # Fill the missing values with minimum value based on groupby
    transform(
        groupby(_, [:USUBJID, :DRUG]),
        :FANLDTM => (x -> minimum(skipmissing(x))) => :FANLDTM,
        :NFRLT => (x -> minimum(skipmissing(x))) => :min_NFRLT,
        :EXDOSE_first => (x -> minimum(skipmissing(x))) => :EXDOSE_first,
    )
    @rtransform @passmissing begin
        :AFRLT = (:ADTM - DateTime(:FANLDTM)) / Hour(1)
        :APRLT = (:ADTM - DateTime(:ADTM_prev)) / Hour(1)
    end
    @rtransform :NPRLT =
        (:EVID == 1) ? 0 :
        ismissing(:NFRLT_prev) ? (:NFRLT - :min_NFRLT) : (:NFRLT - :NFRLT_prev)
    # Handling negative times
    @rtransform @passmissing :AFRLT = :AFRLT < 0 ? 0 : :AFRLT
    @rtransform @passmissing :APRLT = :APRLT < 0 ? 0 : :APRLT
    @orderby :USUBJID :DRUG :ADTM :EVID
end;
| Row | USUBJID | ADTM | EVID | FANLDTM | AFRLT | ADTM_prev | APRLT | NFRLT_prev | NPRLT |
|---|---|---|---|---|---|---|---|---|---|
| | String15 | DateTime | Int64 | DateTime | Float64 | DateTime? | Float64? | Float64? | Real |
| 1 | DAPA01-001 | 2022-06-10T09:30:00 | 0 | 2022-06-10T09:30:00 | 0.0 | missing | missing | missing | 0.0 |
| 2 | DAPA01-001 | 2022-06-10T09:30:00 | 1 | 2022-06-10T09:30:00 | 0.0 | missing | missing | missing | 0 |
| 3 | DAPA01-001 | 2022-06-10T09:33:00 | 0 | 2022-06-10T09:30:00 | 0.05 | 2022-06-10T09:30:00 | 0.05 | 0.0 | 0.05 |
| 4 | DAPA01-001 | 2022-06-10T09:51:00 | 0 | 2022-06-10T09:30:00 | 0.35 | 2022-06-10T09:30:00 | 0.35 | 0.0 | 0.35 |
| 5 | DAPA01-001 | 2022-06-10T10:00:00 | 0 | 2022-06-10T09:30:00 | 0.5 | 2022-06-10T09:30:00 | 0.5 | 0.0 | 0.5 |
First, FANLDTM is filled across each group of the combined dataset with its minimum value, and the group minimum of NFRLT is stored in a new column, min_NFRLT. The relative time variables are then derived: missing inputs propagate as missing, and negative (pre-dose) values of AFRLT and APRLT are set to zero.
| Relative Time Variables | Column Label | Reference Data |
|---|---|---|
| AFRLT | Actual Time Relative to First Dose | FANLDTM |
| APRLT | Actual Time Relative to Previous Dose | ADTM_prev |
| NPRLT | Nominal Time Relative to Previous Dose | NFRLT_prev |
The @passmissing macro makes an expression return missing whenever one of its inputs is missing, rather than throwing an error, when creating a new column or modifying an existing one.
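The arithmetic behind AFRLT and APRLT can be worked through by hand. The sketch below imitates the missing pass-through and the clamping of negative times with explicit checks; hours_between and clamp_predose are hypothetical helpers, not DataFramesMeta or ADaM.jl functions.

```julia
using Dates

# Elapsed hours between two datetimes; missing if either input is missing
hours_between(a, b) =
    (ismissing(a) || ismissing(b)) ? missing : Dates.value(a - b) / 3_600_000

# Clamp negative (pre-dose) times to zero while letting missing pass through
clamp_predose(x) = ismissing(x) ? missing : max(x, 0.0)

fanldtm = DateTime(2022, 6, 10, 9, 30)    # first dose
adtm = DateTime(2022, 6, 10, 10, 0)       # sample 30 minutes after the first dose
predose = DateTime(2022, 6, 10, 9, 15)    # sample 15 minutes before the first dose

afrlt = clamp_predose(hours_between(adtm, fanldtm))
afrlt_pre = clamp_predose(hours_between(predose, fanldtm))
aprlt = clamp_predose(hours_between(adtm, missing))   # no previous dose on record
```

Subtracting two DateTime values yields milliseconds, hence the division by 3,600,000 to express the result in hours.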
8 Derive Analysis Variables
Here we walk through the derivation of the core analysis and flag variables that the modelling step will consume. We compute the actual dose carried on each record (DOSEA), the analysis lower limit of quantification (ALLOQ), the compartment and amount columns (CMT, AMT), the below-quantification flags (BLQFL, BLQFN), the dependent variable and its log transform (DV, DVL), the missing-dependent-variable flag (MDV), and the analysis unit (AVALU).
adppk_aval = @chain adppk_aprlt begin
    @orderby :USUBJID :ADTM
    @rtransform @astable begin
        # Derive Actual Dose
        :DOSEA =
            (:EVID == 1) ? :EXDOSE : ismissing(:EXDOSE_prev) ? :EXDOSE_first : :EXDOSE_prev
        # Analysis Lower Limit of Quantification
        :ALLOQ = :PCLLOQ
        # Compartment and Amount
        :CMT = (:EVID == 1) ? 1 : 2
        :AMT = (:EVID == 1) ? :EXDOSE : missing
    end
    # Below Lower Limit of Quantification Flag
    @rtransform @passmissing :BLQFL = (:PCSTRESN <= :ALLOQ) ? "Y" : "N"
    @rtransform @passmissing :BLQFN = (:PCSTRESN <= :ALLOQ) ? 1 : 0
    # Dependent Variable
    @rtransform :DV = (:EVID == 1) ? missing : :PCSTRESN
    # Log Transformed Dependent Variable
    @rtransform @passmissing :DVL = (:DV > 0) ? log(:DV) : missing
    # Missing Dependent Variable Result
    @rtransform :MDV = (:EVID == 1) ? 1 : ismissing(:DV) ? 1 : 0
    # Analysis Variable Unit
    @rtransform :AVALU = (:EVID == 1) ? :EXDOSU : :PCSTRESU
    @orderby :USUBJID :DRUG :ADTM :EVID
end;
| Row | USUBJID | ADTM | EVID | DOSEA | ALLOQ | CMT | AMT | BLQFL | BLQFN | DV | DVL | MDV | AVALU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | String15 | DateTime | Int64 | Float64 | Float64? | Int64 | Float64? | String? | Int64? | Float64? | Float64? | Int64 | InlineSt… |
| 1 | DAPA01-001 | 2022-06-10T09:30:00 | 0 | 5.0 | 0.1 | 2 | missing | N | 0 | 157.021 | 5.05638 | 0 | ng/mL |
| 2 | DAPA01-001 | 2022-06-10T09:30:00 | 1 | 5.0 | missing | 1 | 5.0 | missing | missing | missing | missing | 1 | mg |
| 3 | DAPA01-001 | 2022-06-10T09:33:00 | 0 | 5.0 | 0.1 | 2 | missing | N | 0 | 141.892 | 4.95507 | 0 | ng/mL |
| 4 | DAPA01-001 | 2022-06-10T09:51:00 | 0 | 5.0 | 0.1 | 2 | missing | N | 0 | 116.228 | 4.75555 | 0 | ng/mL |
| 5 | DAPA01-001 | 2022-06-10T10:00:00 | 0 | 5.0 | 0.1 | 2 | missing | N | 0 | 109.353 | 4.69458 | 0 | ng/mL |
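The key analysis and flag variables are derived in this step.
The compartment CMT depends on the specimen in which concentrations are measured and on the route of administration, both of which vary from study to study, so its derivation must be adapted accordingly. Here dosing records are assigned CMT = 1 and observations CMT = 2.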
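The per-record rules for DV, DVL, MDV, and BLQFN can be condensed into one small function. This is a sketch that mirrors the expressions in the block above on scalar inputs; analysis_vars is a hypothetical helper, and explicit ismissing checks take the place of @passmissing.

```julia
# Derive DV, DVL, MDV and BLQFN for one record.
# evid: 1 for a dose row, 0 for an observation; conc and lloq may be missing.
function analysis_vars(evid, conc, lloq)
    dv = evid == 1 ? missing : conc
    dvl = (ismissing(dv) || dv <= 0) ? missing : log(dv)
    mdv = (evid == 1 || ismissing(dv)) ? 1 : 0
    blqfn = (ismissing(conc) || ismissing(lloq)) ? missing : (conc <= lloq ? 1 : 0)
    return (DV = dv, DVL = dvl, MDV = mdv, BLQFN = blqfn)
end

obs = analysis_vars(0, 157.021, 0.1)        # quantifiable observation
blq = analysis_vars(0, 0.05, 0.1)           # observation at or below the LLOQ
dose = analysis_vars(1, missing, missing)   # dosing record
```

The dosing record ends up with a missing DV and MDV = 1, which matches the second row of the preview above.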
9 ADPPK Dataset
In the final assembly step of the tutorial we produce the analysis-ready ADPPK dataset. We add the record and analysis sequence columns (RECSEQ, ASEQ), fill the constant dose covariates within each subject-drug group, rename columns to match the ADPPK standard, select and order the columns in a regulatory-style layout, and round the Float64 columns to three decimal places using round_columns.
adppk = @chain adppk_aval begin
    @orderby :USUBJID :DRUG :ADTM :EVID
    transform(eachindex => :RECSEQ)
    transform(groupby(_, [:USUBJID, :DRUG]), eachindex => :ASEQ)
    transform(
        groupby(_, [:USUBJID, :DRUG]),
        :EXROUTE => (x -> first(skipmissing(x))) => :EXROUTE,
        :EXDOSFRM => (x -> first(skipmissing(x))) => :EXDOSFRM,
        :EXDOSFRQ => (x -> first(skipmissing(x))) => :EXDOSFRQ,
    )
    # Rename
    rename(:DRUG => :PROJID, :EXDOSFRQ => :DOSEFRQ, :EXROUTE => :ROUTE, :EXDOSFRM => :FORM)
    # Select Columns
    select(
        # exclusion flags
        :EXCLF,
        :EXCLFCOM,
        # subject level
        :STUDYID,
        :USUBJID,
        :ASEQ,
        :PROJID,
        # dose details
        :DOSEFRQ,
        :ROUTE,
        :FORM,
        :DOSEA,
        :AMT,
        :CMT,
        :EVID,
        # time details
        :AVISITN,
        :AFRLT,
        :APRLT,
        :NFRLT,
        :NPRLT,
        :ADTM,
        :ATM,
        :FANLDTM,
        # conc details
        :DV,
        :DVL,
        :MDV,
        :ALLOQ,
        :BLQFL,
        :BLQFN,
    )
    round_columns(3)
end
first(adppk, 5)
1. Sequence columns are derived (for excluded rows, the values are missing).
2. Constant dose covariates are filled across the dataset based on grouping.
3. Several columns are renamed according to ADPPK standards.
4. The required columns are selected and positioned in a specific order.
5. Float64 columns are rounded across the DataFrame using the round_columns function. The number of digits to retain after the decimal point is specified (3 in this case).
| Row | EXCLF | EXCLFCOM | STUDYID | USUBJID | ASEQ | PROJID | DOSEFRQ | ROUTE | FORM | DOSEA | AMT | CMT | EVID | AVISITN | AFRLT | APRLT | NFRLT | NPRLT | ADTM | ATM | FANLDTM | DV | DVL | MDV | ALLOQ | BLQFL | BLQFN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Float64 | Missing | String7 | String15 | Float64 | String15 | String7 | String15 | String15 | Float64 | Float64? | Float64 | Float64 | Float64 | Float64 | Float64? | Float64 | Float64 | DateTime | Time | DateTime | Float64? | Float64? | Float64 | Float64? | String? | Float64? |
| 1 | 0.0 | missing | DAPA01 | DAPA01-001 | 1.0 | Dapagliflozin | ONCE | Intravenous | Injection | 5.0 | missing | 2.0 | 0.0 | 2.0 | 0.0 | missing | 0.0 | 0.0 | 2022-06-10T09:30:00 | 09:30:00 | 2022-06-10T09:30:00 | 157.021 | 5.056 | 0.0 | 0.1 | N | 0.0 |
| 2 | 0.0 | missing | DAPA01 | DAPA01-001 | 2.0 | Dapagliflozin | ONCE | Intravenous | Injection | 5.0 | 5.0 | 1.0 | 1.0 | 2.0 | 0.0 | missing | 0.0 | 0.0 | 2022-06-10T09:30:00 | 09:30:00 | 2022-06-10T09:30:00 | missing | missing | 1.0 | missing | missing | missing |
| 3 | 0.0 | missing | DAPA01 | DAPA01-001 | 3.0 | Dapagliflozin | ONCE | Intravenous | Injection | 5.0 | missing | 2.0 | 0.0 | 2.0 | 0.05 | 0.05 | 0.05 | 0.05 | 2022-06-10T09:33:00 | 09:33:00 | 2022-06-10T09:30:00 | 141.892 | 4.955 | 0.0 | 0.1 | N | 0.0 |
| 4 | 0.0 | missing | DAPA01 | DAPA01-001 | 4.0 | Dapagliflozin | ONCE | Intravenous | Injection | 5.0 | missing | 2.0 | 0.0 | 2.0 | 0.35 | 0.35 | 0.35 | 0.35 | 2022-06-10T09:51:00 | 09:51:00 | 2022-06-10T09:30:00 | 116.228 | 4.756 | 0.0 | 0.1 | N | 0.0 |
| 5 | 0.0 | missing | DAPA01 | DAPA01-001 | 5.0 | Dapagliflozin | ONCE | Intravenous | Injection | 5.0 | missing | 2.0 | 0.0 | 2.0 | 0.5 | 0.5 | 0.5 | 0.5 | 2022-06-10T10:00:00 | 10:00:00 | 2022-06-10T09:30:00 | 109.353 | 4.695 | 0.0 | 0.1 | N | 0.0 |
10 Conclusion
This tutorial walked through a minimal end-to-end ADPPK build with ADaM.jl:
- Reading PC and EX from the DAPA01 study and pulling VISITDY from EX into PC via a lookup table.
- Deriving analysis date/time variables (ADTM, ADT, ATM) and nominal time variables (NFRLT, AVISIT, AVISITN).
- Combining PC and EX, then flagging records with set_exclusion for missing concentrations, missing dosing, and missing samples.
- Looking up previous-dose references with join_columns to derive AFRLT, APRLT, and NPRLT.
- Deriving analysis variables (DOSEA, CMT, AMT, DV, DVL, MDV, BLQFL, BLQFN) and assembling the final ADPPK dataset with sequence columns and a regulatory-style column order.
Because this study has no additional source datasets, no baseline covariates were added. In a typical analysis you would extend this template by merging DM, VS, LB, and other SDTM domains to derive demographic and time-varying covariates, then re-running the downstream exclusion and analysis-variable steps.