Day 1 - Take home exercise

Dataset

Data covid_cases.rds under R-cafe/day1/data/covid_cases.rds is used for this exercise.

Data dictionary

Column	Description
`date`	date of the case report
211 columns with the format `cases_{country_code}`	number of reported cases of each country and archipelago

Country codes of interest (i.e. Codes for countries needed for the exercise)

Code	Country name
`chn`	China
`usa`	United States of America
`vnm`	Vietnam
`sgp`	Singapore

Preparation

Before working on this exercise, you should have completed the following checklist

Checklist

On Github Desktop

Cloned R-cafe repository
Created your own branch on the repository
Checkout your own branch

On RStudio

Make an R project from cloned R-cafe directory
Have renv setup

Answer script

You are expected to save your answer under R-cafe/day1/takehome.R

Exercises

Task 1: Data import

Read data from R-cafe/day1/data/covid_cases.rds

Task 2: Simple computations using dataset

Compute the following

The earliest and latest date of data report in the dataset and assign them to 2 variables first_report_date, last_report_date
Create new column case_global for covid_cases, which represents the total cases across every country per report day.
Create new column percent_chn in covid_cases, which represents the percentage of global cases that China’s cases account for per report day.

Tip: compute sum of each row

Use function rowSums() to compute sum of case columns row wise

Example

example_table <- data.frame(
  a = c(1:3), b = c(4:6)
)
# show example_table
example_table

# see output of rowSums
rowSums(example_table)

[1] 5 7 9

Tip: create new column/ update column values

To create/update a column, use the syntax data[["colname"]] <- values_for_col where length of values_for_col must be equal to number of rows

Example

example_table <- data.frame(
  a = c(1:3), b = c(4:6)
)

# create new column new_col
example_table[["new_col"]] <- c(7,9,11)
example_table

Task 3: Create a function

Complete the following:

Create a function compute_percent that
- takes the dataset and the country code (i.e. the 3 characters after cases_ in column name) as input.
- return the percentage of the global cases that the given country’s cases account for per report day.
Use that function to create 3 new columns for covid_cases: percent_vnm, percent_usa, percent_sgp
Print the final covid_cases selecting only the following columns date, percent_chn, percent_vnm, percent_usa, percent_sgp

Tip: joining strings

Use function paste() to join multiple strings together

Example

# joining 2 strings
paste("cases_", "chn", sep = "")

[1] "cases_chn"

# joining 3 strings
paste("hello", "world,", "welcome to R-cafe", sep = " ")

[1] "hello world, welcome to R-cafe"

Task 3.5 (Optional) Create a function from given code

Create a function that returns the plot for number of cases reported for a country over time, where users can

Select the case column being plotted
Change the country name displayed on the plot’s title
Change the color of the bar chart and line chart
Set the min and max date (x-axis limit) for the plot

Given that the code to plot case reports for China is as followed

# Uncomment and run the following if you have not installed tidyverse or ggplot2
# install.packages("tidyverse")
library(ggplot2)
plot_col <- "cases_chn"
country <- "China"

ggplot() +
  geom_col( # layer for bar chart
    aes( # define columns for x, y axis
      x = covid_cases$date, # this is equivalent to covid_cases[["date"]]
      y = covid_cases[[plot_col]] 
      ), 
    fill = "cornflowerblue" # choose color for bar chart
  ) +
  geom_line( # layer for line chart
    aes( # define columns for x, y axis
      x = covid_cases$date, 
      y = covid_cases[[plot_col]] 
      ), 
    color = "red" # choose color for line chart
  ) +
  labs(
    y = "Cases",
    x = "Date",
    title = paste0("Reported Covid cases for ", country) # define title for the plot
  )

Task 4: Generate data summary

Complete the following:

Use the function skim() from package skimr to generate summary for the following countries: China, Vietnam, USA, Singapore
Describe notable observations from each country from the result of skim()

Task 5: Peer review

Complete the following:

Push your solution to Github with the commit message "day 1 take home solution your_name"
Create an issue commenting on at least 1 other participant’s solution