12 June 2012

Data Cleaning

Clean your data
Screening process
• Detect errors
    ~ Missing data
    ~ Outliers
• Make sure the data meet the assumptions of the analysis
    ~ Normality  

Two Types of Screening
1. Preliminary data screening
    ~ Screen one variable at a time on the entire data set before any analysis
    ~ Today’s focus
2. In conjunction with statistical analysis
    ~ Dependent on analysis being performed  
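
As a rough illustration of preliminary screening, the sketch below loops over a data set one variable at a time and reports missing values and possible outliers. The variable names and numbers are made up, missing values are assumed to be coded as None or NaN, and the 1.5 × IQR rule is only one common convention for flagging outliers.

```python
import math
import statistics

def is_missing(v):
    """Assumption: missing values are coded as None or NaN."""
    return v is None or (isinstance(v, float) and math.isnan(v))

def screen(dataset):
    """Return a per-variable report of missing counts and IQR-flagged outliers."""
    report = {}
    for name, values in dataset.items():
        present = [v for v in values if not is_missing(v)]
        # Quartiles of the observed values; the fence is 1.5 * IQR beyond them.
        q1, _, q3 = statistics.quantiles(present, n=4)
        fence = 1.5 * (q3 - q1)
        report[name] = {
            "n_missing": len(values) - len(present),
            "outliers": [v for v in present if v < q1 - fence or v > q3 + fence],
        }
    return report

# Hypothetical data set: each key is one variable, screened on its own.
data = {
    "reaction_time": [4.1, 3.9, None, 4.3, 4.0, 4.2, 19.5, 3.8, float("nan"), 4.1],
    "accuracy": [0.91, 0.88, 0.93, 0.90, None, 0.89, 0.92, 0.87, 0.90, 0.91],
}
report = screen(data)
for name, result in report.items():
    print(name, result)
```

Flagged values are candidates to inspect, not automatic deletions; whether to remove them is a separate decision.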

Steps
1. Check for missing data
2. Check for normality
3. Remove outliers
4. Check for normality again
5. Transform data  
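
The five steps above can be sketched for a single variable as follows. This is a minimal example under stated assumptions: the data are invented, skewness stands in for a fuller normality check (such as a histogram or a formal test), outliers are defined by the 1.5 × IQR rule, and the transformation is a natural log, which only works for positive values.

```python
import math
import statistics

def skewness(xs):
    """Population skewness; values near 0 suggest a roughly symmetric distribution."""
    mean, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return sum(((x - mean) / sd) ** 3 for x in xs) / len(xs)

def drop_outliers(xs):
    """Keep values inside the 1.5 * IQR fences (one convention among several)."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    fence = 1.5 * (q3 - q1)
    return [x for x in xs if q1 - fence <= x <= q3 + fence]

# Hypothetical dependent variable with one missing value and one extreme score.
raw = [0.31, 0.35, 0.38, 0.40, None, 0.42, 0.45, 0.47,
       0.52, 0.58, 0.66, 0.79, 0.95, 5.20]

# 1. Check for missing data.
observed = [x for x in raw if x is not None]
n_missing = len(raw) - len(observed)

# 2. Check for normality (here: skewness only).
skew_before = skewness(observed)

# 3. Remove outliers.
cleaned = drop_outliers(observed)

# 4. Check for normality again.
skew_after_outliers = skewness(cleaned)

# 5. Transform data (log transform, assuming all values are positive).
transformed = [math.log(x) for x in cleaned]
skew_after_transform = skewness(transformed)

print(f"missing: {n_missing}")
print(f"skewness before: {skew_before:.2f}")
print(f"skewness after outlier removal: {skew_after_outliers:.2f}")
print(f"skewness after log transform: {skew_after_transform:.2f}")
```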

Keep in mind:
• Repeat these steps for each dependent variable before analyzing the data
• Keep transformations consistent across all dependent variables
• Although transformed data may look better behaved, results on the transformed scale can be difficult to interpret
• Run your analysis both with and without the transformation, and compare the results
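
One way to compare results with and without a transformation is to run the same test on both versions of the data. The sketch below uses Welch's t statistic on two made-up groups, once on the raw scores and once on their logs; the group names, the data, and the choice of test are all assumptions for illustration.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va / len(a) + vb / len(b))

# Hypothetical right-skewed scores for two groups.
control = [0.31, 0.35, 0.40, 0.45, 0.52, 0.66, 0.95]
treated = [0.42, 0.50, 0.58, 0.71, 0.88, 1.20, 1.90]

# Same analysis, run on the raw data and on the log-transformed data.
t_raw = welch_t(treated, control)
t_log = welch_t([math.log(x) for x in treated], [math.log(x) for x in control])

# A quick consistency check: do both versions point in the same direction?
same_direction = (t_raw > 0) == (t_log > 0)

print(f"raw: t = {t_raw:.2f}")
print(f"log-transformed: t = {t_log:.2f}")
print(f"same direction: {same_direction}")
```

If the two versions lead to the same conclusion, reporting the untransformed analysis is usually easier to interpret; if they disagree, the difference itself is worth investigating.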

Great resource: Mickey, R. M., Dunn, O. J., and Clark, V. A. (2004). Applied Statistics: Analysis of Variance and Regression, 3rd Edition. John Wiley & Sons, Inc. Chapter 1: Data Screening.