This lesson starts with an example showing how to use R code to draw a box plot and calculate the five number summary, followed by an activity in which students produce their own.
To understand how to visualise the shape of data using box plots
To produce box plots and the related five number summary
To interpret box plots
tidyverse
astsa
box plot, minimum, maximum, lower quartile, median, upper quartile, five number summary, inter-quartile range, range
Knit your R Markdown document frequently. Knit for the final time when you have completed the lesson. This final output file will be the one your teacher marks.
Write your code in the code chunks.
You can run your R code chunk in the Markdown file by clicking on the little arrow on the right of the chunk.
A box plot is a way to show the shape or distribution of a set of data. It shows five useful features of the data, known as the five number summary.
Minimum - the smallest value
Maximum - the largest value
Median - the middle or 50% value
Lower quartile - the value half way between the minimum and the median when the data are in order, or 25% value
Upper quartile - the value half way between the median and the maximum when the data are in order, or 75% value
The difference between the upper and lower quartiles is known as the inter-quartile range (IQR).
Zaynab keeps a record of her journey times to school each morning. The times are to the nearest minute:
29,21,16,25,21,19,18,30,21,21,12,26,19,21,20,19,30,29,16,21,18,18,27,18,20
Let us draw a box plot and obtain the five number summary and the inter-quartile range (IQR). We will do this in three steps.
The R code in chunk1 tells R to store all the times in a variable called y and then to use these times to draw a box plot of the data.
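The chunk itself is not reproduced in this text, so here is a minimal sketch of what chunk1 might look like. It assumes base R's `c()` to store the times and `boxplot()` to draw them. The axis label and `horizontal = TRUE` are assumptions, chosen to match the style used later in the lesson.

```r
# store Zaynab's journey times (in minutes) in a variable called y
y <- c(29, 21, 16, 25, 21, 19, 18, 30, 21, 21, 12, 26, 19,
       21, 20, 19, 30, 29, 16, 21, 18, 18, 27, 18, 20)

# draw a box plot of the times
# (the label and horizontal layout are illustrative choices)
boxplot(y, xlab = "Journey time (minutes)", horizontal = TRUE)
```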
To help us read values from the box plot it is useful to have gridlines on the plot.
Here is a simple trick to get gridlines on the plot.
The code in chunk2 tells R to use the numbers it stored in variable y to draw a box plot and then to add grid lines.
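As a sketch of how chunk2 could work, the code below follows the same pattern the lesson uses later for the Maths and English marks: draw the box plot, call `grid()`, then draw the box plot again with `add = TRUE`. The second `boxplot()` call is presumably there so the box sits on top of the grid lines rather than underneath them. The axis label is an assumption.

```r
# the journey times from the example
y <- c(29, 21, 16, 25, 21, 19, 18, 30, 21, 21, 12, 26, 19,
       21, 20, 19, 30, 29, 16, 21, 18, 18, 27, 18, 20)

# draw the box plot so that R sets up the axes
boxplot(y, xlab = "Journey time (minutes)", horizontal = TRUE)

# add grid lines to the plot
grid()

# redraw the box plot over the grid lines
boxplot(y, horizontal = TRUE, add = TRUE)
```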
Click on the little arrow on the right of the code chunk. The box plot will be drawn below the chunk.
We can also get R to calculate the five number summary for us.
# we can calculate the five number summary like this
summary(y) # five number summary but also provides the mean
Min. 1st Qu. Median Mean 3rd Qu. Max.
12.0 18.0 21.0 21.4 25.0 30.0
IQR(y) # inter-quartile range
[1] 7
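To see where the IQR comes from, the following sketch works it out by hand as the upper quartile minus the lower quartile, using `quantile()` with its default settings. The variable name `q` is just an illustrative choice.

```r
# the journey times from the example
y <- c(29, 21, 16, 25, 21, 19, 18, 30, 21, 21, 12, 26, 19,
       21, 20, 19, 30, 29, 16, 21, 18, 18, 27, 18, 20)

# lower and upper quartiles (the 25% and 75% values)
q <- quantile(y, c(0.25, 0.75))
q

# the IQR is the upper quartile minus the lower quartile
unname(q[2] - q[1])

# the built-in function gives the same answer
IQR(y)
```

The quartiles agree with the `summary(y)` output above (18 and 25), and their difference is 7.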
Produce a box plot, the five number summary and the IQR of the following data.
Aisha records the length of time spent by each member of her class on their last English homework assignment.
The times are recorded to the nearest minute.
85,124,55,140,120,61,95,105,118,180,55,78,130,112,70,126,60,90,115,60,142,100,105,65,100,75
Use the R code in chunk1 to help you draw the box plot. Remember to replace the example data with the new dataset.
Now use the code in chunk2 to add grid lines to your box plot.
Now calculate the five number summary using the code in chunk3.
Below are the percentage exam marks out of 100 for Maths and English.
Maths (%): 98,79,51,54,62,61,56,87,70,60,93,51,52,54,68
English (%): 37,50,58,45,93,47,47,45,38,61,65,46,97,99,54
Produce two box plots side by side.
maths <- c(98,79,51,54,62,61,56,87,70,60,93,51,52,54,68)
english <- c(37,50,58,45,93,47,47,45,38,61,65,46,97,99,54)
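# draw the two box plots side by side
boxplot(maths, english, xlab = "Marks (%)", names = c("Maths", "English"), horizontal = TRUE)
# add grid lines, then redraw the box plots on top of them
grid()
boxplot(maths, english, xlab = "Marks (%)", names = c("Maths", "English"), horizontal = TRUE, add = TRUE)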
summary(maths)
Min. 1st Qu. Median Mean 3rd Qu. Max.
51.0 54.0 61.0 66.4 74.5 98.0
summary(english)
Min. 1st Qu. Median Mean 3rd Qu. Max.
37.0 45.5 50.0 58.8 63.0 99.0
IQR(maths)
[1] 20.5
IQR(english)
[1] 17.5
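The range, listed in the keywords, can be found in a similar way. This is a small sketch using base R's `range()`, which returns the minimum and maximum, and `diff()`, which subtracts one from the other.

```r
maths <- c(98, 79, 51, 54, 62, 61, 56, 87, 70, 60, 93, 51, 52, 54, 68)
english <- c(37, 50, 58, 45, 93, 47, 47, 45, 38, 61, 65, 46, 97, 99, 54)

# smallest and largest mark in each subject
range(maths)
range(english)

# the range as a single number: maximum minus minimum
diff(range(maths))
diff(range(english))
```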
Write down the five number summaries below.
What are the inter-quartile ranges?
Below are the heights of boys and girls of a similar age.
Boys (cm): 181,157,159,179,186,159,178,162,137,184,140,173,176
Girls (cm): 172,151,176,159,139,179,178,162,134,166,164,172,170
Produce two box plots side by side with a grid and answer the following questions.
Remember to replace the example data with the new information (boys, girls, heights, data).
Questions
Are the statements below true or false? Explain why.
The girls are taller on average
Half the girls are over 165 cm tall
The girls show less spread in height
The boys show less spread in height
The shortest person is a girl
The tallest person is a boy
Half the boys are over 172 cm tall
Half the girls are under 165 cm tall
KNIT YOUR DOCUMENT for the final time. This will be the version your teacher will mark.
License and Citation: You can use, modify, and adapt any of the lessons, but please include the following attribution: RGirls Community. (2022, April 10). RGirls Lessons. Zenodo. https://doi.org/10.5281/zenodo.6436861