Prof. Vinod, Stat I, Descriptive Statistics

Original Unclassified Data
  34  98  57  32  65  86
xbar or mean= 62
----------------------------------------------
 i    Xi-xbar    (Xi-xbar)^2
----------------------------------------------
 1      -28          784
 2       36         1296
 3       -5           25
 4      -30          900
 5        3            9
 6       24          576
----------------------------------------------
Check sum of deviations from the mean=0
  0
Sum of squared deviations= 3590
Denominator for sample variance= n-1 = 5
Unclassified sample variance~std dev
  718  26.7955
Sum of absolute deviations= 126
Mean absolute deviation=mad= 21
Coefficient of Variation=100*(std dev)/xbar
  43.2186

Sorted data
  32  34  57  65  86  98
The Range= Xmax-Xmin
  66
xbar or mean= 62
Median= 61

Now Skewness from Unclassified mean and median
Since mean>median, underlying frequency distribution is skewed to the right!

Now compute 10% trimmed mean
k~intk~frack~n'
  0.6  0  0.6  4.8
y[nlo]~(1-frack)
  32  0.4
y[nlo+1:nup]'~((1-frack)*y[nlo])~(1-frack)*y[nup+1]
  34  57  65  86  12.8  39.2
10% trimmed mean= 61.25

Now 13% trimmed mean
k~intk~frack~n'
  0.78  0  0.78  4.44
y[nlo]~(1-frack)
  32  0.22
y[nlo+1:nup]'~((1-frack)*y[nlo])~(1-frack)*y[nup+1]
  34  57  65  86  7.04  21.56
13% trimmed mean= 60.9459

Now report the percentiles for a Notched Box Plot:
  5%=     32
  10%=    32
  Q1=25%= 34
  Mi=50%= 61
  Q3=75%= 86
  90%=    98
  95%=    98
IQR= Interquartile range= 52
Outlier detection Limits (Q1-1.5*IQR)~(Q3+1.5*IQR)
  -44  164
No outlier on the left side is present
No outlier on the right side is present

Number of classes in which to classify the data is given to be: 2
Width of ultimate class intervals should be at least 33
Chosen Lower limit of 1st class interval and width = 30  35
First class interval's midpoint= M1= 47.5
Lower limits: 30  65
Total frequency=Summation of fj= 6
-------------------------------------------------------------------
 j    Low    Up     Mj     fj    Mj*fj
-------------------------------------------------------------------
 1    30     65    47.5    3     142.5
 2    65    100    82.5    3     247.5
-------------------------------------------------------------------
Total frequency=Summation of fj column = 6
Sum of last column and total freq.
  390  6
Mean from classified data= 65
Compare above to the earlier computed mean from unclassified data= 62
Max of frequencies= 3
Mode for classified data is the midpoint of the class interval
containing the largest frequency.
47.5 or 82.5 is suggested; actually there is NO MODE, since both classes have the same frequency.

Now begin computation of the variance for classified data.
First we need to extend the above table with 3 more columns:
-----------------------------------------------------
 j    (Mj-xbar)    (Mj-xbar)^2    fj*(Mj-xbar)^2
-----------------------------------------------------
 1      -17.5        306.25          918.75
 2       17.5        306.25          918.75
-----------------------------------------------------
Sums      0          612.5          1837.5
-----------------------------------------------------
Denominator for sample variance= n-1 = 5
Sample variance for classified data= 367.5
Classified data std dev= 19.1703
For comparison, the unclassified data std dev. above was= 26.7955

Find the classified data median graphically: plot the Less-Than-Ogive.
Also plot the More-Than-Ogive and find the point of intersection to get the Median.
A dummy class is one without any frequency, designed to plot the freq. polygon.
Need one dummy class at the beginning and another at the end of the classes.

dlibrary -d;
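
The figures in this log can be reproduced with a short script. What follows is only a sketch in Python, not the program that produced the output. The function names (sample_stats, trimmed_mean, grouped_stats) are hypothetical. Two conventions are assumptions inferred from the printed numbers: the trimmed mean gives weight (1-frack) to one partially kept observation at each end and divides by n' = n-2k, and class intervals are closed on the left, so that 65 is counted in the second class.

```python
import math

data = [34, 98, 57, 32, 65, 86]

def sample_stats(x):
    """Mean, sample variance (n-1), std dev, mean absolute deviation, CV."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)
    sd = math.sqrt(var)
    mad = sum(abs(v - mean) for v in x) / n
    cv = 100 * sd / mean
    return mean, var, sd, mad, cv

def trimmed_mean(x, alpha):
    """Trimmed mean with fractional trimming (assumed convention).

    k = alpha*n observations are trimmed from each end. The integer part
    intk is dropped outright; the next observation at each end is kept
    with weight (1 - frack). The divisor is n' = n - 2k.
    """
    y = sorted(x)
    n = len(y)
    k = alpha * n
    intk = int(k)
    frack = k - intk
    lo = intk              # 0-based index of y[nlo] in the log
    hi = n - intk - 1      # 0-based index of y[nup+1] in the log
    inner = sum(y[lo + 1:hi])  # fully weighted middle observations
    total = inner + (1 - frack) * (y[lo] + y[hi])
    return total / (n - 2 * k)

def grouped_stats(x, lower, width, n_classes):
    """Midpoints, frequencies, mean, variance, std dev from classified data."""
    mids = [lower + width * (j + 0.5) for j in range(n_classes)]
    freqs = [0] * n_classes
    for v in x:
        j = int((v - lower) // width)  # assumed left-closed classes [low, up)
        freqs[j] += 1
    n = sum(freqs)
    mean = sum(m * f for m, f in zip(mids, freqs)) / n
    var = sum(f * (m - mean) ** 2 for m, f in zip(mids, freqs)) / (n - 1)
    return mids, freqs, mean, var, math.sqrt(var)

if __name__ == "__main__":
    print(sample_stats(data))
    print(trimmed_mean(data, 0.10), trimmed_mean(data, 0.13))
    print(grouped_stats(data, lower=30, width=35, n_classes=2))
```

With lower limit 30, width 35 and 2 classes, grouped_stats should give back the midpoints 47.5 and 82.5 and the frequencies 3 and 3 shown in the table. The classified mean (65) and std dev (19.1703) differ from the unclassified values (62 and 26.7955) because every observation is replaced by its class midpoint.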