Author: H. D. Vinod,
Dates: Noted in the software itself
All the code on this page is provided gratis, without any guarantees or warranties.
Part A has R code, Part B has GAUSS code, and Part C has some math typing tricks for MS Word.
Proprietary modifications of this code are not permitted.
Please make appropriate attribution if you use the code in a research project.
PART A: R code
#it is a good idea to clean out old objects from R memory and record the date
#__________________________ Cut here ____________________________
objects()
# these objects are already in memory
rm(list=ls())
#this cleans them
ls()
#this lists what is left
options(prompt="R>")
#this changes the prompt
print(paste("Following executed on", date()))
#__________________________ Cut here ____________________________
# I have written the following function to get outliers automatically
# First copy and paste all lines of the following "function" in R
get.outliers = function(x) { #this left curly brace begins function
#function to compute the number of outliers automatically
#author H. D. Vinod, Fordham University,
#revised
# input: a column vector of values,
# output: various quantities used in outlier detection
# such as interquartile range, limits and
# xnew= revised vector after outliers are deleted
xnew=x #initialize the xnew found after removal of outliers
su=summary(x)
if (ncol(as.matrix(x))>1) {print("Error: input to get.outliers function has 2 or more columns")
return(0)}
iqr=su[5]-su[2] #inter quartile range
dn=su[2]-1.5*iqr #dn denotes lower limit for outlier detection
up=su[5]+1.5*iqr #up denotes upper limit for outlier detection
LO=x[x<dn] #vector of values below the lower limit
nLO=length(LO)
UP=x[x>up] #vector of values above the upper limit
nUP=length(UP)
print(c(" Q1-1.5*(inter quartile range)=",
as.vector(dn),"number of outliers below it are=",as.vector(nLO)),quote=FALSE)
or=1:length(x) #sequence numbers of the observations
if (nLO>0){
print(c("Actual values below the lower limit are:", LO),quote=FALSE)
print(c("sequence number(s) of outlier(s) for possible deletion are:",
or[x<dn]),quote=FALSE)
} #this right curly brace ends the if statement
print(c(" Q3+1.5*(inter quartile range)=",
as.vector(up)," number of outliers above it are=",as.vector(nUP)),quote=FALSE)
if (nUP>0){
print(c("Actual values above the upper limit are:", UP),quote=FALSE)
print(c("sequence number(s) of outlier(s) for possible deletion are:",
or[x>up]),quote=FALSE)
} #this right curly brace ends the if statement above
#remove outliers on either side; done outside the if statements so that
#outliers below the lower limit are removed even when none lie above
if (nLO+nUP>0) xnew=x[-c(or[x<dn],or[x>up])] #the minus means remove those observations
#now outputs from the function are ready for extraction
# with the use of the dollar symbol and are listed as follows
list(below=LO,nLO=nLO,above=UP,nUP=nUP,low.lim=dn,up.lim=up,
xnew=xnew)} #this right curly brace ends the function formally
#TEST Example: x=c(1,-4,3,4,5,55)
#xx=get.outliers(x)
#xx$xnew extracts xnew=revised x without outliers
#xx$be extracts actual values below the lower outlier limit and so on
# the "$b" is an abbreviation for "$below"
# b alone works since nothing else in the "list" has b at the start
# = = = = = = function ends here = = = = = =
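The abbreviation trick described in the comments above relies on R's partial matching of list names with the dollar symbol. A minimal sketch follows, using a toy list that only borrows the element names returned by get.outliers; the values are made up for illustration. It also shows what seems worth knowing before relying on abbreviations: an ambiguous prefix returns NULL rather than an error.

```r
# Toy list with the same element names that get.outliers returns
# (values are made up for illustration only)
res = list(below = -4, nLO = 1, above = 55, nUP = 1,
           low.lim = -3.375, up.lim = 9.625, xnew = c(1, 3, 4, 5))

res$b      # unique prefix, so this is the same as res$below
res$xn     # unique prefix, so this is the same as res$xnew
res$n      # ambiguous prefix (nLO or nUP), so R returns NULL

# Exact lookup avoids surprises from abbreviations
res[["below", exact = TRUE]]
```

Spelling out the full name is the safer habit in scripts, since adding a new element to the list later can silently make an abbreviation ambiguous.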
# WARNING on xnew for regression! It will not work!
# If you are removing outliers in a regression be sure to remove
# the complete matched set of observations for all variables.
# e.g., if fifth observation is outlier in y but not in x or z and
# if lm(y~x+z) is used, remove fifth observation from x, y and z
# This will have to be done manually rather than by using xnew above
# xnew works only if the model has only one variable
#now assuming x, y and z are already in memory, type
xx=get.outliers(x)
xx=get.outliers(y)
xx=get.outliers(z)
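The manual step described in the warning above (removing the complete matched set of observations) might be sketched as follows. This is only an illustration under stated assumptions: the helper is.outlier is hypothetical and not part of the original code, it recomputes the same Q1-1.5*IQR and Q3+1.5*IQR limits with quantile() so the block runs on its own, and the data are toy values made up for the example.

```r
# Hypothetical helper (not part of the original code): TRUE where a value
# lies below Q1-1.5*IQR or above Q3+1.5*IQR, the limits used by get.outliers
is.outlier = function(v) {
  q = as.vector(quantile(v, c(0.25, 0.75)))
  iqr = q[2] - q[1]
  (v < q[1] - 1.5 * iqr) | (v > q[2] + 1.5 * iqr)
}

# Toy data, made up for illustration only
y = c(2.1, 3.9, 6.2, 8.1, 90, 11.8)  # fifth observation is an outlier in y
x = c(1, 2, 3, 4, 5, 6)
z = c(3, 1, 4, 1, 5, 9)

# Flag an observation if ANY of the three variables is an outlier there
bad = is.outlier(x) | is.outlier(y) | is.outlier(z)

# Drop the same rows from all variables so that they stay matched
dat = data.frame(y = y, x = x, z = z)[!bad, ]
fit = lm(y ~ x + z, data = dat)
```

Keeping the variables together in one data frame and subsetting its rows is one way to make sure x, y and z never get out of step.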
#__________________________ Cut here ____________________________
summary2=function(x) {
#object is to also provide greater digits in mean and sd and info about length
xx=as.matrix(x)
#print("Means")
#print(apply(xx,2,mean, na.rm=T))
print("standard deviations")
print(apply(xx,2,sd, na.rm=T))
print("Lengths")
print(apply(xx,2,length))
sumx=summary(x)
return(sumx)
}
#summary2(x)
#_____________________Cut here ____________________________
get.skewkurt = function(x)
{
#object compute third and fourth powers of deviations from mean
#INPUT x =data
# OUTPUT
# sum3= sum of cubes of deviations from the mean
# sum4= sum of fourth powers of deviations from the mean
# devfromm=vector of deviations from the mean
#new variance is (1+a)^2 times var(x)
#new range is (1+a) times old range max(x)-min(x)
xb=mean(x)
devfromm=x-xb #deviations from the mean, computed for all observations at once
sum3=sum(devfromm^3)
sum4=sum(devfromm^4)
list(sum3=sum3, sum4=sum4, devfromm=devfromm)
}
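The function above returns only the sums of third and fourth powers. One way to turn such sums into skewness and kurtosis coefficients is sketched below. The helper name skew.kurt is hypothetical, and dividing every moment by n is an assumed convention chosen here for simplicity; other texts divide by n-1 or apply small-sample corrections.

```r
# Hypothetical helper (name chosen here, not part of the original code)
skew.kurt = function(x) {
  n = length(x)
  dev = x - mean(x)         # deviations from the mean (devfromm above)
  m2 = sum(dev^2) / n       # second central moment, divisor n (assumed)
  m3 = sum(dev^3) / n       # sum3 / n
  m4 = sum(dev^4) / n       # sum4 / n
  list(skew = m3 / m2^1.5,  # skewness coefficient
       kurt = m4 / m2^2)    # kurtosis coefficient (3 for a normal density)
}

r = skew.kurt(c(-2, -1, 0, 1, 2))  # symmetric toy data
r$skew  # zero, since the data are symmetric about the mean
r$kurt  # 1.7, i.e. (34/5)/(10/5)^2
```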
#_____________________Cut here ____________________________
sort.matrix =function(x,j)
{
#sort matrix x by column j
# and carry along the remaining columns
#author H. D. Vinod, June 14, 2006.
y=0 #returned as is when x has no dimensions (is not a matrix)
dd=dim(x)
if (is.numeric(dd[1])){
#print("Error in sort.matrix function")
oo=order(x[,j]) #row order that sorts column j
fn=function (x,oo) {y=x[oo]; return(y)}
y=apply(x,2,fn,oo=oo) #apply the same row order to every column
}
return(y) }
#example
#x=round(matrix(rnorm(12),4,3),2)
#sort.matrix(x,2)
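The same sorting can also be sketched with direct row indexing, which some readers may find easier to follow. This is an alternative illustration, not the author's function, and the small matrix is made up for the example.

```r
# Toy matrix, made up for illustration; matrix() fills column by column,
# so column 1 is 3,1,2 and column 2 is 30,10,20
x = matrix(c(3, 1, 2,
             30, 10, 20), nrow = 3, ncol = 2)

# order() gives the row positions that sort column 1;
# indexing the rows with it carries the other columns along
sorted = x[order(x[, 1]), , drop = FALSE]
sorted
```

The drop = FALSE argument keeps the result a matrix even when x has a single row.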
#_____________________Cut here ____________________________
cen.moments = function(x)
#object compute 4 sample central moments and cumulants
{
n=length(x)
m=mean(x)
m2=sum((x-m)^2)/(n-1) #WARNING dividing by n-1 not n here
m3=sum((x-m)^3)/n
m4=sum((x-m)^4)/n
k1=m
k2=m2
k3=m3
k4=m4-3*(m2^2)
list(m2=m2, m3=m3, m4=m4, k1=k1, k2=k2, k3=k3, k4=k4)
}
#_____________________Cut here ____________________________
PART B: GAUSS code
1) Code for testing the numerical accuracy of any software, written in GAUSS.
http://www.american.edu/academic.depts/cas/econ/gavussres/utilitys/utilitys.htm
This link has useful GAUSS procedures for computing accurate mean and variance.
2) The following simple proc helps in reshaping the data without giving the number of rows.
@ The following test program should be run to understand what it does.
Note that since 8 is not divisible by 3, it ignores the last two
data points if you want to reshape into 3 columns.
Of course, reshape is typically used for getting large data from
ascii files, not for data typed in the way it is shown below.
@
new;
x={1, 2, 3, 4, 5, 6, 7, 8};
y=reshape2(x,2);
y;
y=reshape2(x,3);
y;
proc (1)=reshape2(x,ncol);
@ Author: H. D. Vinod,
proc returns the reshaped matrix with correct number of rows
@
local n,n1;
clear n1;
n=rows(x);
"number of rows before reshaping= " n;
n1=floor(n/ncol);
"floor of (number of rows before reshaping)/(no of columns)= " n1;
retp(reshape(x,n1,ncol));
endp;
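For readers who work in R rather than GAUSS, the behavior described above (fill row by row and ignore the leftover data points) might be sketched as follows. The function name reshape2.r is hypothetical and chosen here for illustration; it is not a translation supplied by the author.

```r
# Hypothetical R counterpart of the GAUSS proc reshape2 (name chosen here)
reshape2.r = function(x, ncol) {
  n1 = floor(length(x) / ncol)   # number of complete rows
  # keep only the first n1*ncol values and fill the matrix row by row
  matrix(x[seq_len(n1 * ncol)], nrow = n1, ncol = ncol, byrow = TRUE)
}

x = 1:8
reshape2.r(x, 2)  # 4 rows and 2 columns, all 8 values used
reshape2.r(x, 3)  # 2 rows and 3 columns, the values 7 and 8 are ignored
```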
PART C: Some Great Tricks for Math Typing in MS Word
If you want to type mathematical symbols in MS Word without much
difficulty, use AutoCorrect in the Tools menu.
The following file has many preset ideas. For example, \a gives alpha,
/app= gives approximately equal, and there are numerous other useful symbols.
The attached file called normal.dot can be downloaded. Use it to
replace your normal.dot file, typically in the location
C:\Program Files\Microsoft Office\Templates
or
C:\Documents and Settings\user\Application Data\Microsoft\Office\Recent
Microsoft keeps changing this, but you can find it!
Be careful though. Keep a backup copy before replacing.
It may not work for your configuration. It has worked for many of my
graduate students.