In this task, you will check for outliers and their potential impact using the following steps:
Before you analyze your data, examine the distribution and normality of the data, and identify outlying values.
proc univariate data=Phthalate normal plot;
   var URXMHP;
   id seqn;
run;
Example: Plot the phthalate subsample weight (WTSPH6YR) against the values of urinary mono-(2-ethyl)-hexyl phthalate to identify any outliers.
/********************************************************************************
 * Use the PROC GPLOT procedure to plot urinary mono-(2-ethyl)-hexyl phthalate
 * (URXMHP) by the corresponding weights for each observation in the dataset.
 * Symbol and height are option statements used to format the output of the plot.
 ********************************************************************************/
symbol1 value=dot height=.2;

proc gplot data=Phthalate;
   plot WTSPH6YR*URXMHP / frame;
run;
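In this step you will remove the suspected outliers and compare the weighted means and standard errors of the dataset with and without them.
For this example, assume that four observations may be outliers.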
/********************************************************************************
 * Use the IF, THEN, and DELETE statements to remove the identified outliers.
 * Use the PROC MEANS procedure to produce means and standard errors for the
 * dataset with and without outlier values.
 ********************************************************************************/
data Exclu4SPs;
   set Phthalate;
   if seqn in (3140, 11249, 14737, 24817) then delete;
run;

proc means data=Phthalate mean stderr maxdec=1;
   title 'Without exclusion';
   var URXMHP;
   class RIAGENDR;
   weight WTSPH6YR;
run;

proc means data=Exclu4SPs mean stderr maxdec=1;
   title 'After removing 4 outlier values';
   var URXMHP;
   class RIAGENDR;
   weight WTSPH6YR;
run;