How to Identify and Describe the Impact of Influential Outliers
Before you analyze your data, examine it for outlying values, because a few extreme observations can strongly influence your estimates.
Delete Observations with Implausible Values
In the PAQMSTR dataset, we created a variable, TOTMINW, that combines minutes of household/yard, transportation, and leisure-time activity to describe total minutes of moderate-to-vigorous physical activity per week. Because a week contains only 10,080 minutes (7 days × 24 hours × 60 minutes), any value of TOTMINW above 10,080 is implausible. First list the study participants with such values, then clean the data by deleting those observations.
Sample Code
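 /* List participants whose total weekly activity exceeds 10,080 minutes */
 proc freq data=paq;
    tables TOTMINW*SEQN / list;
    /* A second WHERE statement would replace the first, so combine the conditions */
    where WTINT4CD > 0 and RIDAGEYR >= 16 and TOTMINW > 10080;
 run;

 /* Delete observations with implausible values of TOTMINW */
 data paq;
    set paq;
    if TOTMINW > 10080 then delete;
 run;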
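The deletion rule can also be illustrated outside SAS. The following is a minimal Python sketch, not part of the original SAS workflow: it assumes each record is a dictionary whose keys mirror the SAS variable names (SEQN, TOTMINW), and the sample records are hypothetical. As in SAS, a missing value is not treated as exceeding the limit, so it is kept.

```python
# Illustrative sketch only: mirrors the SAS step "if TOTMINW > 10080 then delete".
# Records are hypothetical dictionaries keyed by the SAS variable names.

MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes in a week


def split_implausible(records, var="TOTMINW", limit=MINUTES_PER_WEEK):
    """Return (kept, dropped); dropped records have a value above the limit."""
    kept, dropped = [], []
    for rec in records:
        value = rec.get(var)
        # SAS treats a missing value as smaller than any number, so a missing
        # TOTMINW never satisfies "TOTMINW > 10080" and the record is kept.
        if value is not None and value > limit:
            dropped.append(rec)
        else:
            kept.append(rec)
    return kept, dropped


if __name__ == "__main__":
    sample = [
        {"SEQN": 1, "TOTMINW": 300},
        {"SEQN": 2, "TOTMINW": 12000},
        {"SEQN": 3, "TOTMINW": None},
        {"SEQN": 4, "TOTMINW": 10080},
    ]
    kept, dropped = split_implausible(sample)
    print([r["SEQN"] for r in dropped])  # [2]
```

A value of exactly 10,080 is kept because the rule uses a strict "greater than" comparison, matching the SAS code.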
Check for Outliers among Plausible Data by Running a Univariate Analysis
Use PROC UNIVARIATE to obtain the default descriptive statistics, such as the mean, minimum and maximum values, standard deviation, and skewness. Use the VAR statement to identify the variable of interest (TOTMINW), and use the ID statement to list the sequence numbers (SEQN) associated with the extreme values in the output. The NORMAL option requests tests for normality, and the PLOT option requests a stem-and-leaf plot, a box plot, and a normal probability plot.
Sample Code
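 /* Descriptive statistics, normality tests, and plots for TOTMINW */
 proc univariate data=paq normal plot;
    var TOTMINW;
    where WTINT4CD > 0 and RIDAGEYR >= 16;
    /* Label the extreme observations with their sequence numbers */
    id SEQN;
    title 'Distribution of TOTMINW among study participants aged 16 and older';
 run;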
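To show what the key PROC UNIVARIATE statistics measure, here is a small Python sketch, offered as an illustration rather than a replacement for the SAS procedure. It assumes records are dictionaries keyed by the SAS variable names and applies the same WHERE conditions. The standard deviation uses the n − 1 divisor and the skewness uses the adjusted sample formula n/((n − 1)(n − 2)) Σ((x − mean)/s)³, which we believe corresponds to the SAS defaults; the helper names are hypothetical.

```python
import math
import statistics


def _val(rec, key):
    # Treat missing values as smaller than any number, as SAS does in WHERE.
    value = rec.get(key)
    return -math.inf if value is None else value


def describe(records, var="TOTMINW", id_var="SEQN", n_extreme=5):
    """Summary statistics plus the IDs of the lowest and highest values."""
    subset = [r for r in records
              if _val(r, "WTINT4CD") > 0 and _val(r, "RIDAGEYR") >= 16
              and r.get(var) is not None]
    values = [r[var] for r in subset]
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values) if n > 1 else None  # n - 1 divisor
    skewness = None
    if n > 2 and sd:
        # Adjusted sample skewness
        skewness = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    ordered = sorted(subset, key=lambda r: r[var])
    return {
        "n": n,
        "mean": mean,
        "min": min(values),
        "max": max(values),
        "sd": sd,
        "skewness": skewness,
        # Comparable to the ID column in the Extreme Observations table
        "lowest_ids": [r[id_var] for r in ordered[:n_extreme]],
        "highest_ids": [r[id_var] for r in reversed(ordered[-n_extreme:])],
    }
```

Records with a zero sample weight or an age under 16 are excluded before any statistic is computed, mirroring the WHERE statement in the SAS code. The highest IDs are listed largest value first.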
Plot Sample Weight against the Distribution of the Variable
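Use PROC GPLOT to plot total minutes of moderate-to-vigorous activity per week (TOTMINW) against the corresponding sample weight (WTINT4CD) for each observation in the dataset. Draw a reference line at 7,560 minutes per week, the maximum reasonable volume of weekly activity: assuming that study participants sleep for at least 6 hours each night, they can be active for at most 18 hours per day (18 hours × 60 minutes × 7 days = 7,560 minutes). Observations that fall beyond this line and also carry large sample weights are the ones most likely to influence weighted estimates.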
Sample Code
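 /* Draw each observation as a small square */
 symbol1 value=square height=.5;
 /* Sample weight (vertical axis) by TOTMINW (horizontal axis), with a reference line at 7,560 minutes */
 proc gplot data=paq;
    plot WTINT4CD*TOTMINW / href=7560 frame;
 run;
 quit;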
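The plot shows which observations lie beyond the reference line; a quick way to describe their impact is to compare a weighted mean with and without them. The Python sketch below is an illustration under stated assumptions, not part of the original SAS analysis: records are dictionaries keyed by the SAS variable names, the 6-hour sleep minimum comes from the text above, and the helper names and sample values are hypothetical.

```python
HOURS_AWAKE_PER_DAY = 24 - 6  # assumes at least 6 hours of sleep per night
MAX_PLAUSIBLE_MINW = HOURS_AWAKE_PER_DAY * 60 * 7  # 7,560 minutes per week


def weighted_mean(records, var="TOTMINW", weight="WTINT4CD"):
    total = sum(r[weight] for r in records)
    return sum(r[weight] * r[var] for r in records) / total


def impact_of_flagged(records, var="TOTMINW", weight="WTINT4CD",
                      limit=MAX_PLAUSIBLE_MINW):
    """Flag records above the limit and compare weighted means."""
    eligible = [r for r in records
                if r.get(var) is not None and (r.get(weight) or 0) > 0]
    flagged = [r for r in eligible if r[var] > limit]
    remaining = [r for r in eligible if r[var] <= limit]
    total_weight = sum(r[weight] for r in eligible)
    return {
        "flagged_ids": [r["SEQN"] for r in flagged],
        # Share of the total sample weight carried by the flagged records
        "weight_share": sum(r[weight] for r in flagged) / total_weight,
        "mean_all": weighted_mean(eligible, var, weight),
        "mean_without_flagged": weighted_mean(remaining, var, weight),
    }


if __name__ == "__main__":
    sample = [
        {"SEQN": 1, "WTINT4CD": 1000.0, "TOTMINW": 600},
        {"SEQN": 2, "WTINT4CD": 3000.0, "TOTMINW": 8000},
        {"SEQN": 3, "WTINT4CD": 1000.0, "TOTMINW": 7560},
    ]
    print(impact_of_flagged(sample))
```

In this hypothetical sample, one record above 7,560 minutes carries 60% of the total weight and raises the weighted mean from 4,080 to 6,432 minutes per week, which is the kind of influence the weight-by-activity plot is meant to reveal.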