Capstone Exercise: Applying Lessons in a Mini-Project
Notes and instructions
On GitHub Desktop, make sure you have:
- Cloned the R-cafe repository
- Created your own branch on the repository
- Checked out your own branch
You are expected to save your answer under R-cafe/capstone_exercises/capstone.R
It is expected that the exercise answers will vary, and any result that fulfills the objectives will be acceptable.
Samples are provided in the `samples` folder to illustrate what might be included in the results; you may add other elements or make changes based on your experience and needs.
A. Data and Files
This project uses real clinical trial data from OUCRU. The data includes:
- Main Dataset:
  - File: `2-10-2020-_03TS_V1_Data.xls`
  - Contains most information. Variable explanations are in the statistical plan file (`03TS analysis plan V1.5 November 2020 Accept changes.docx`).
- Treatment Allocation:
  - File: `03TS_Randlist.xlsx`
  - Includes treatment arm allocations and a `dictionary` sheet explaining the four arms.
- Violations, Exclusions, Withdrawals:
  - File: `Protocol violations, exclusions, withdrawals.xlsx`
  - Any IDs listed in the files will be excluded accordingly.
- Adverse Events:
  - File: `AE.SAE DATA SHEET.xls`
  - Includes adverse events information.
- Statistical Plan:
  - File: `03TS analysis plan V1.5 November 2020 Accept changes.docx`
  - Provides guidance on specific variables. Read this for more details to fulfill your objectives.
B. Objectives
I. Data Preparation
Prepare data for the following populations: IT_ITT, IM_ITT, IT_PP, IM_PP, IM_ALL.
Use `03TS_Randlist.xlsx` for allocation of the first 272 participants. Use `Protocol violations, exclusions, withdrawals.xlsx` for filtering data per population.
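The filtering step might be sketched as follows, using small made-up data frames in place of the real files. The column names (`usubjid`, `arm`, `population`), the arm labels, and the layout of the exclusion list are all assumptions for illustration; check the actual sheets and the analysis plan for the real structure.

```r
# Toy stand-ins for the real files; all column names and values are hypothetical.
main <- data.frame(usubjid = sprintf("03TS-%03d", 1:6),
                   age     = c(34, 51, 47, 62, 29, 55))
randlist <- data.frame(usubjid = sprintf("03TS-%03d", 1:6),
                       arm     = c("A", "B", "C", "D", "A", "B"))
# Assumed layout: one row per participant and population they are removed from.
exclusions <- data.frame(usubjid    = c("03TS-002", "03TS-005", "03TS-005"),
                         population = c("IT_PP", "IT_ITT", "IT_PP"))

# Attach the treatment arm to the main data.
dat <- merge(main, randlist, by = "usubjid", all.x = TRUE)

# Return the analysis set for one population by dropping the listed IDs.
make_population <- function(data, exclusions, pop) {
  drop_ids <- exclusions$usubjid[exclusions$population == pop]
  data[!data$usubjid %in% drop_ids, , drop = FALSE]
}

it_itt <- make_population(dat, exclusions, "IT_ITT")
it_pp  <- make_population(dat, exclusions, "IT_PP")
nrow(it_itt)
nrow(it_pp)
```

Keeping the filter in one small function means the same logic can be reused for every population, with only the population label changing.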
II. Table Creation
1. Baseline Characteristics:
Summarize as follows:
- Numeric Data: Report median, 1st/3rd quartiles, lowest, and highest values.
- Categorical Data: Report count (n) and percentage (%).
- No formal statistical comparisons between study arms.
Characteristics to summarize:
- Patient Details:
- Sex, age, BMI (BMI = ADM.WEIGHT / (ADM.HEIGHT)^2).
- Past Medical History:
- Variables like ADM.HYPERTENSION, ADM.MYOCARDIALINFART, ADM.SEVERELIVER, etc.
- Patient History:
- Duration of illness, incubation period, respiratory rate, platelet count, etc.
- Specific Severity Scores:
- Tetanus Severity, SOFA, APACHE II (calculated from variables).
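As a rough illustration of the summaries described above, the sketch below computes BMI and then summarizes one numeric and one categorical variable per arm in base R. `ADM.WEIGHT` and `ADM.HEIGHT` come from the exercise text; the `arm` and `sex` columns, the values, and the assumption that height is recorded in metres are made up for the example.

```r
# Toy data; only ADM.WEIGHT and ADM.HEIGHT are names taken from the exercise.
baseline <- data.frame(
  arm        = c("A", "A", "A", "B", "B", "B"),
  sex        = c("F", "M", "M", "F", "F", "M"),
  ADM.WEIGHT = c(50, 62, 70, 48, 55, 80),
  ADM.HEIGHT = c(1.55, 1.68, 1.72, 1.50, 1.60, 1.75)  # assumed to be in metres
)
baseline$bmi <- baseline$ADM.WEIGHT / baseline$ADM.HEIGHT^2

# Numeric summary: median, 1st/3rd quartiles, lowest and highest values.
summarise_numeric <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  c(median = q[2], q1 = q[1], q3 = q[3],
    min = min(x, na.rm = TRUE), max = max(x, na.rm = TRUE))
}

# Categorical summary: count (n) and percentage (%) of each level.
summarise_categorical <- function(x) {
  n <- table(x)
  data.frame(level = names(n),
             n     = as.vector(n),
             pct   = round(100 * as.vector(n) / sum(n), 1))
}

# One column per arm for BMI, one small table per arm for sex.
bmi_by_arm <- sapply(split(baseline$bmi, baseline$arm), summarise_numeric)
sex_by_arm <- lapply(split(baseline$sex, baseline$arm), summarise_categorical)
bmi_by_arm
sex_by_arm
```

Packages that build formatted summary tables can replace these helpers; the point here is only to show which statistics go into each cell.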
2. Adverse Events (AE):
Summarize proportions of individuals with AEs, categorized by grade (I-IV) and relation to treatment.
- Exclude specific events (e.g., “Nasogastric tube”, “Tracheostomy”).
- Use chi-square or Fisher’s exact test for comparisons.
- Severe adverse events are summarized separately for each population.
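One possible shape for the adverse event comparison is sketched below with invented data. The two excluded event names are taken from the exercise; every other event name, the column names, and the two-arm layout are assumptions. The choice between tests uses a common rule of thumb (Fisher's exact test when any expected count is below 5), which is not something the exercise prescribes.

```r
# Toy data: 12 participants in two hypothetical arms and a list of recorded events.
participants <- data.frame(usubjid = 1:12,
                           arm     = rep(c("A", "B"), each = 6))
events <- data.frame(
  usubjid = c(1, 2, 2, 3, 4, 7, 8, 9),
  event   = c("Pneumonia", "Tracheostomy", "Urinary tract infection",
              "Nasogastric tube", "Pneumonia", "Tracheostomy",
              "Pneumonia", "Nasogastric tube")
)

# Drop events that should not count as adverse events.
excluded <- c("Nasogastric tube", "Tracheostomy")
kept <- events[!events$event %in% excluded, ]

# Flag participants with at least one remaining event.
participants$any_ae <- participants$usubjid %in% kept$usubjid

tab  <- table(arm = participants$arm, any_ae = participants$any_ae)
prop <- prop.table(tab, margin = 1)  # row proportions: share with an AE per arm

# Expected counts under independence guide the choice of test.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
test <- if (any(expected < 5)) fisher.test(tab) else chisq.test(tab)
test$p.value
```

The same table-then-test pattern can be repeated per grade or per relation to treatment by building the flag from the corresponding subset of events.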
III. Plots
1. Create plots for:
- Pipecuronium:
- Total dose during hospital stay (ventilated patients).
- Duration of use (ventilated patients).
- Diazepam:
- Total dose during hospital stay.
- Midazolam:
- Total dose during hospital stay.
- Benzodiazepines:
- Total dose as diazepam equivalent.
- Total duration of use.
- ggplot2 is a powerful package for creating plots, offering many functions that support various features. For example, it can create violin plots (`geom_violin`), boxplots (`geom_boxplot`), histograms (`geom_histogram` with the `binwidth` argument to adjust the bins), and scatter plots (`geom_point`), and use `geom_segment` to display the mean and confidence intervals.
- gridExtra is a package used to arrange multiple plots in a single figure. Try using `grid.arrange`, or you may opt for another package that serves the same function.
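To show how these pieces can fit together, here is a minimal sketch on simulated doses. The data frame, the `arm` and `dose` columns, and the normal-approximation 95% confidence interval are assumptions chosen for the example, not the method required by the analysis plan.

```r
library(ggplot2)
library(gridExtra)

# Simulated total doses for two hypothetical arms.
set.seed(1)
drug <- data.frame(
  arm  = rep(c("A", "B"), each = 30),
  dose = c(rnorm(30, mean = 100, sd = 20), rnorm(30, mean = 120, sd = 25))
)

# Mean and normal-approximation 95% confidence interval per arm.
ci <- do.call(rbind, lapply(split(drug, drug$arm), function(d) {
  se <- sd(d$dose) / sqrt(nrow(d))
  data.frame(arm   = d$arm[1],
             mean  = mean(d$dose),
             lower = mean(d$dose) - 1.96 * se,
             upper = mean(d$dose) + 1.96 * se)
}))

# Violin plot with the interval drawn as a segment and the mean as a point.
p_violin <- ggplot(drug, aes(x = arm, y = dose)) +
  geom_violin() +
  geom_segment(data = ci, aes(x = arm, xend = arm, y = lower, yend = upper)) +
  geom_point(data = ci, aes(x = arm, y = mean)) +
  labs(x = "Treatment arm", y = "Total dose")

# Histogram per arm; binwidth controls the width of each bin.
p_hist <- ggplot(drug, aes(x = dose)) +
  geom_histogram(binwidth = 10) +
  facet_wrap(~ arm) +
  labs(x = "Total dose", y = "Patients")

# Place both plots side by side in one figure.
grid.arrange(p_violin, p_hist, ncol = 2)
```

Computing the interval in a separate data frame keeps the plotting code short and makes it easy to swap in a different interval method later.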
2. Create a line plot:
Plot the daily maximum temperature for each patient during the first 7 days after admission. Use one colour for each treatment arm, and overlay the mean temperature for each day and treatment arm as a bold line. No sample is provided for this exercise.
- Compute Mean: Use `dplyr::group_by()` and `summarize()` to calculate the mean temperature for each day and treatment arm.
- Plot Individual Lines: Use `geom_line()` with `aes(group = usubjid)` for individual patient data.
- Add Bold Line for Mean: Use a second `geom_line()` with the computed mean data and adjust the `linewidth` argument (`size` in ggplot2 versions before 3.4.0) for a bold line.
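The three hints above can be sketched as follows on simulated data. Only `usubjid` comes from the hints; the `arm`, `day`, and `max_temp` column names and all values are assumptions, and the real data will first need reshaping into this one-row-per-patient-per-day layout. The `linewidth` argument assumes ggplot2 3.4.0 or later.

```r
library(dplyr)
library(ggplot2)

# Simulated long-format data: one row per patient per day (8 patients, 7 days).
set.seed(42)
temps <- expand.grid(usubjid = 1:8, day = 1:7) %>%
  mutate(arm      = ifelse(usubjid <= 4, "A", "B"),
         max_temp = 37.5 + rnorm(n(), sd = 0.5))

# Mean of the daily maximum temperature per day and treatment arm.
mean_temps <- temps %>%
  group_by(arm, day) %>%
  summarize(mean_temp = mean(max_temp), .groups = "drop")

p <- ggplot(temps, aes(x = day, y = max_temp, colour = arm)) +
  # Thin, semi-transparent line for each patient.
  geom_line(aes(group = usubjid), alpha = 0.4) +
  # Thick line for the per-arm daily mean.
  geom_line(data = mean_temps, aes(y = mean_temp, group = arm), linewidth = 1.5) +
  scale_x_continuous(breaks = 1:7) +
  labs(x = "Day after admission", y = "Daily maximum temperature",
       colour = "Treatment arm")
p
```

Mapping `colour = arm` once in `ggplot()` lets both line layers share the same colour per arm, so the bold mean line matches its patients' lines.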