The Coin Experiment: An Exercise for Learning Experimental Design

by Joe Kunkel, Biology Department, UMass Amherst, 01003-5810

As interpreted by Joe Kunkel from a course in Project-Based Instruction [UMass EDUC 795A] given by Kathy Davis, Fall 2004.

Introduction
An experiment was designed to see how many drops of water will fit on a coin without spilling over. This gives the student an opportunity to learn experimental design and data analysis.
Materials and Methods
Coins (quarters, nickels, dimes and pennies) were placed on dry paper toweling. A disposable plastic dropper with a slender tip was used to apply two kinds of solution: water without detergent, and water with detergent at a concentration that varied between research groups. The data were saved in a table, one record per observation, with the number of drops in the initial field of each record, followed by fields for coin-type, heads-or-tails side, detergent-or-not and research-group (See DataSet). Each group contributed data on each coin type, heads and tails, using both the no-detergent solution and one detergent solution. Data analysis was done using the built-in analytic functions of R, a free software package and computational environment suitable for simple or complex calculations on most popular computer platforms and operating systems (e.g. PCs and Macs; Windows, Unix and Linux).

Download DataSet.
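
To make the record layout concrete, here is a minimal sketch of how a table of this shape reads into R. The column names (drops, coin, HT, detergent, group) are taken from the R formulas used later in this document; the level codes 'h' and 'No' are assumptions, and the three records are made-up placeholders, not measurements from the DataSet.

```r
# Three made-up records in the layout described above (NOT real measurements).
csv_text <- "drops,coin,HT,detergent,group
40,Dim,h,No,A
22,Dim,t,Yes,A
85,Qua,h,No,C"

# stringsAsFactors = TRUE turns every text column into a factor,
# whatever the default of the installed R version happens to be.
demo <- read.csv(text = csv_text, stringsAsFactors = TRUE)

str(demo)                  # one integer column and four factor columns
sapply(demo, is.factor)    # FALSE for drops, TRUE for the other fields
```

With the real file, the same call would take the file name in place of the 'text' argument.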

Analysis of the data

The only quantitative data collected in this approach to the experiment were the numbers of drops of fluid that fit on the surface of each coin face. All other data are defined as factors: coin type, coin face, detergent, group. We can use the library functions of the R computational environment to analyze the data. Start R on your computer and change the working directory to the one that holds the data set, a text file named 'CoinData.csv'. Then cut and paste the following text into the R console:

# boxed set 1 of R commands:
# Text preceded by a pound sign is a comment and is not processed by R
dat <- read.csv("CoinData.csv", stringsAsFactors = TRUE)  # Read the data; text columns become factors.
saved <- par(mfrow = c(1, 3))      # Set up to plot three plots per row in the graphics window.
attach(dat)                        # Make the columns of the data frame callable by their names.
boxplot(drops ~ detergent)         # Do a box-and-whisker plot of detergent effects on drops.
boxplot(drops ~ coin)              # Do a box-and-whisker plot of coin effects on drops.
boxplot(drops ~ coin + detergent)  # Do a box-and-whisker plot of detergent and coin effects on drops.
par(saved)                         # Reset the plotting environment.

Boxed set 1 reads the data file and plots it in three different ways. We can make the plots more readable by resizing the graph window to span the screen width, and by re-phrasing the plot commands to label the plots and make the labels more prominent (Fig 1).

# boxed set 2 of R commands:
boxplot(drops ~ coin + detergent, main="Effect of coin-type and detergent on drops held",
         cex.main = 2)   # Notice that commands can be completed on subsequent lines

http://www.bio.umass.edu/biology/kunkel/pub/pics/coindeterg.png
Fig 1. Boxplot of the effect of coin-type and detergent on drops-per-coin. The plot illustrates setting the title text with the 'main' parameter
and doubling its default font size with the 'cex.main' parameter.
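
The axis labels can be set in the same call. The following is a minimal sketch, not the commands used to draw Fig 1: it uses randomly generated placeholder counts instead of CoinData.csv, passes the data frame through the 'data' argument rather than attach(), and the label texts and the level name 'No' are only assumptions.

```r
# Placeholder data in the same layout: random counts, NOT the class measurements.
set.seed(1)
demo <- expand.grid(coin = c("Dim", "Nic", "Pen", "Qua"),
                    detergent = c("No", "Yes"),
                    replicate = 1:6)
demo$drops <- rpois(nrow(demo), lambda = 50)

# 'xlab' and 'ylab' set the axis labels; 'cex.lab' scales their font size.
bp <- boxplot(drops ~ coin + detergent, data = demo,
              main = "Effect of coin-type and detergent on drops held",
              xlab = "coin-type . detergent", ylab = "drops held",
              cex.main = 2, cex.lab = 1.5)

bp$names   # one box per coin-by-detergent combination
```

Using 'data = demo' keeps the column names out of the global search path, so a second data set with the same column names cannot be picked up by mistake.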

While graphical output is intuitive and may make one's point visually, scientists rely on statistical tests to test their hypotheses. We can do this in the R computing environment. We will ask several statistical questions using library functions in R, with the same equation-like formula grammar that we used with boxplot(). Now, however, we use the lm() function, which fits a linear model.

# boxed set 3 of R commands:
lm(formula = drops ~ coin + detergent)
#Output:
#Coefficients:
# (Intercept)       coin Nic       coin Pen     coin Qua   detergent  Yes
#    47.771          30.750         7.083        42.083       -23.542
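
One way to read these coefficients, assuming R's default treatment contrasts: the levels that do not appear in the coefficient list (presumably the dime and the no-detergent solution) form the baseline, the intercept is the fitted value for that baseline, and every other coefficient is an additive shift from it. The sketch below only re-uses the numbers printed above; the names 'Dim', 'coin_effect', 'no_detergent' and 'with_detergent' are introduced here for illustration.

```r
# Coefficients copied from the lm() output above.
b <- c(intercept = 47.771, Nic = 30.750, Pen = 7.083, Qua = 42.083,
       detergentYes = -23.542)

# Assumption: the baseline is a dime without detergent, so its shift is zero.
coin_effect    <- c(Dim = 0, b[c("Nic", "Pen", "Qua")])
no_detergent   <- b[["intercept"]] + coin_effect
with_detergent <- no_detergent + b[["detergentYes"]]

# Fitted number of drops for each coin type under each solution.
round(rbind(no_detergent, with_detergent), 3)
```

Under this reading the model fits about 89.9 drops for a quarter without detergent and about 24.2 drops for a dime with detergent. Because the model is additive, detergent lowers the fitted value by the same 23.5 drops for every coin type.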

Further, the output of the lm() function can be used by the anova() function to produce an Analysis of Variance (ANOVA) Table (boxed set 4).

# boxed set 4 of R commands:
anova(lm(drops ~ coin + detergent))

# Output:
# Analysis of Variance Table

# Response: drops
#           Df  Sum Sq Mean Sq F value    Pr(>F)
# coin       3 14040.9  4680.3  12.594 4.829e-06 ***
# detergent  1  6650.5  6650.5  17.896 0.0001199 ***
# Residuals 43 15979.6   371.6
# ---
# Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

This ANOVA table presents the F-tests, which show a very highly significant (***) 'coin-type' effect on drops held by a coin (F = 12.594, P = 4.8e-06). In addition, there is a very highly significant 'detergent' effect on drops held by a coin (F = 17.896, P = 0.00012).
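
The columns of the ANOVA table are related by simple arithmetic, which the following sketch retraces from the printed sums of squares and degrees of freedom: each mean square is a sum of squares divided by its degrees of freedom, each F value is a mean square divided by the residual mean square, and Pr(>F) is the upper tail of the F distribution. The variable names are chosen here for illustration.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss    <- c(coin = 14040.9, detergent = 6650.5, residuals = 15979.6)
dfree <- c(coin = 3,       detergent = 1,      residuals = 43)

ms      <- ss / dfree                                   # Mean Sq column
f_value <- ms[c("coin", "detergent")] / ms[["residuals"]]
p_value <- pf(f_value, dfree[c("coin", "detergent")], dfree[["residuals"]],
              lower.tail = FALSE)                       # Pr(>F) column

round(f_value, 3)
signif(p_value, 4)
```

Small differences from the printed table in the last digit are to be expected, because the inputs here are already rounded.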

Design Issues
The reason that this analysis went so smoothly was partly due to design. Only the data sets of research groups A, C and D were included, because they were complete data sets with no missing values: every coin-type was tested on both heads and tails, both with detergent and without. Including groups with incomplete data is possible, but not with the simple commands used here. Studies with missing data are more difficult to analyze, and their conclusions must be taken with reservations in proportion to the degree of missing data. Part of the approach to creating the data set involved replacing data cells that had more than one observation with the median of that cell. Perhaps a more honest approach to that data selection would have been to choose one of the observations in each such cell by a random process.
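
The two ways of reducing a replicated cell to one value, and a quick check for missing cells, can be sketched as follows. This is not the procedure actually used to prepare CoinData.csv; the records are made-up placeholders, the level codes 'h' and 'No' are assumptions, and the helper 'pick_one' is hypothetical.

```r
# Made-up raw records (NOT class data): the dime-heads cell was measured twice.
raw <- data.frame(
  drops     = c(30, 34, 52, 61),
  coin      = c("Dim", "Dim", "Dim", "Nic"),
  HT        = c("h", "h", "t", "h"),
  detergent = c("No", "No", "No", "No"),
  group     = c("A", "A", "A", "A")
)

# Replace replicated cells by their median: one row per design cell.
by_median <- aggregate(drops ~ coin + HT + detergent + group,
                       data = raw, FUN = median)

# Alternative: keep one observation per cell, chosen at random.
pick_one <- function(x) x[sample.int(length(x), 1)]
set.seed(42)   # only so that the sketch is repeatable
by_random <- aggregate(drops ~ coin + HT + detergent + group,
                       data = raw, FUN = pick_one)

# In a complete design every cell holds exactly one row; a zero flags a gap.
table(by_median$coin, by_median$HT)
```

In this toy example the table shows a zero for nickel-tails, which is the kind of gap that kept the incomplete research group out of the analysis above.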
Other hypotheses
Other factors in the data set might also be tested. The difference between research-groups might depend upon the exact techniques and methods they used in loading drops onto the coins. The difference between heads and tails might depend on individual coin architecture. Both of these factors can be tested using the current data set. The leaders of the Coin Experiment Project revealed at the end of the wet-lab work that groups A-D had detergent concentrations in the ratio 1:2:4:8, which means that the three groups included here (A, C, D) had a 1:4:8 ratio of detergent. This design element was therefore confounded with the research-group effect, though partially controlled by the fact that each group used a detergent-free treatment to compare with its detergent treatment. The data fields of this data set could be supplemented with data we know about these coins, such as their diameters and metal types, each of which might be an additional factor that could be tested for significance. The R calculation environment is capable of dealing with each of these analytical questions as long as the designed data collection spans that question without confounding it with some other factor. The user of R will gain more confidence with each additional data set analyzed; however, it should be realized that entire college-level courses are given in experimental design.
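
As a minimal sketch of supplementing the data with known coin properties, the block below attaches the standard US coin diameters to a toy data frame. The diameters are the published specifications; the coin effects are copied from the lm() output earlier in this document (taking the dime as baseline), and the 'diameter' and 'area_mm2' names are introduced here for illustration only.

```r
# Standard diameters of US coins in millimetres.
diameter_mm <- c(Dim = 17.91, Pen = 19.05, Nic = 21.21, Qua = 24.26)

# Coin effects copied from the lm() output (assumed baseline: dime = 0).
coin_effect <- c(Dim = 0, Pen = 7.083, Nic = 30.750, Qua = 42.083)

# Face area is one plausible predictor of how many drops a coin holds.
area_mm2 <- pi * (diameter_mm / 2)^2
round(rbind(diameter_mm, area_mm2, coin_effect), 2)

# Attach the covariate to a data frame by looking it up from the coin code.
demo <- data.frame(coin = c("Qua", "Dim", "Pen"))
demo$diameter <- unname(diameter_mm[trimws(as.character(demo$coin))])
demo
```

The coin effects happen to fall in the same order as the diameters. Note that each coin type has exactly one diameter, so in this data set diameter can stand in for coin-type in a model but cannot be tested alongside it; the two are completely confounded, as are diameter and metal type.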

Enjoy and spread the word of R!


JoeKunk

Addendum

Analysis of all the factors included in the data set can be done, but discussing it in full would perhaps be laborious. Here are the results without comment.

lm(formula = drops ~ coin + detergent + group + HT)

#Output:
#Call:
#lm(formula = drops ~ coin + detergent + group + HT)

#Coefficients:
#  (Intercept)      coin Nic    coin Pen    coin Qua    detergent Yes   group C   group D      HT t
#    43.896          30.750      7.083       42.083        -23.542       1.437     2.250       5.292

anova(lm(formula = drops ~ coin + detergent + group + HT))
#Output
#Analysis of Variance Table

#Response: drops
#          Df  Sum Sq Mean Sq F value    Pr(>F)
#coin       3 14040.9  4680.3 11.9992  9.66e-06 ***
#detergent  1  6650.5  6650.5 17.0504 0.0001798 ***
#group      2    41.5    20.8  0.0533 0.9482084
#HT         1   336.0   336.0  0.8615 0.3588917
#Residuals 40 15602.0   390.0
#---
#Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
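
For readers who want a single test of whether 'group' and 'HT' together improve on the two-factor model, the residual lines of the two ANOVA tables are enough for an extra-sum-of-squares F-test. The sketch below uses only those printed numbers; it is an illustration of the arithmetic, not an analysis reported by the author, and the variable names are chosen here.

```r
# Residual sums of squares and degrees of freedom copied from the two tables.
rss_small <- 15979.6; df_small <- 43   # drops ~ coin + detergent
rss_full  <- 15602.0; df_full  <- 40   # drops ~ coin + detergent + group + HT

# Extra-sum-of-squares F-test for the two added factors taken together.
df_extra <- df_small - df_full
f_extra  <- ((rss_small - rss_full) / df_extra) / (rss_full / df_full)
p_extra  <- pf(f_extra, df_extra, df_full, lower.tail = FALSE)

c(F = f_extra, p = p_extra)
```

With the data loaded, passing the two fitted lm() objects to anova() together should give the same comparison directly.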