Author: H. D. Vinod, Fordham University, New York

Dates: Noted in the software itself

 

All the code on this page is provided gratis, without any guarantees or warranties.

Part A has R code, Part B has GAUSS code, and Part C has some math typing tricks for MS Word.

 

Proprietary modifications of this code are not permitted.

Please make appropriate attribution if you use the code in a research project.

 

PART A:  R code

#it is a good idea to clean out old objects from R memory and record the date

 

#__________________________ Cut here ____________________________

 

objects() # these objects are already in memory

rm(list=ls()) #this cleans them

ls() #this lists what is left

options(prompt="R>") #this changes the prompt

print(paste("Following executed on", date()))

 

 

 

#__________________________ Cut here ____________________________

 

# I have written the following function to get outliers automatically

#First copy and paste all lines of the following "function" in R

get.outliers = function(x) {  #this left curly brace begins function

#function to compute the number of outliers automatically

#author H. D. Vinod, Fordham University, New York, 24 March, 2006

#revised April 16, 2006

# input a column vector of values,

# output:  various quantities used in outlier detection

#         such as interquartile range, limits and

#         xnew= revised vector after outliers are deleted

xnew=x  #initialize the xnew found after removal of outliers

su=summary(x)

if (ncol(as.matrix(x))>1) {print("Error: input to get.outliers function has 2 or more columns")

return(0)}

iqr=su[5]-su[2]#inter quartile range

dn=su[2]-1.5*iqr  #dn denotes lower limit for outlier detection

up=su[5]+1.5*iqr

LO=x[x<dn]#vector of values below the lower limit

nLO=length(LO)

UP=x[x>up]

nUP=length(UP)

print(c(" Q1-1.5*(inter quartile range)=",

as.vector(dn),"number of outliers below it are=",as.vector(nLO)),quote=F)

or=1:length(x)

if (nLO>0){

print(c("Actual values below the lower limit are:", LO),quote=F)

print(c("sequence number(s) of outlier(s) for possible deletion are:", or[x<dn]),quote=F)

}  #this right curly brace ends the if statement

 

print(c(" Q3+1.5*(inter quartile range)=",

as.vector(up)," number of outliers above it are=",as.vector(nUP)),quote=F)

 

if (nUP>0){

print(c("Actual values above the upper limit are:", UP),quote=F)

print(c("sequence number(s) of outlier(s) for possible deletion are:", or[x>up]),quote=F)

} #this right curly brace ends the if statement above

#remove outliers on either side, even when they occur on one side only

xnew=x[!(x<dn | x>up)]

#now outputs from the function are ready for extraction

# with the use of the dollar symbol and are listed as follows

list(below=LO,nLO=nLO,above=UP,nUP=nUP,low.lim=dn,up.lim=up, xnew=xnew)} #this right curly brace ends the function formally

#TEST Example x=c(1,-4,3,4,5,55)

#xx=get.outliers(x)

#xx$xnew extracts xnew=revised x without outliers

#xx$b extracts actual values below the lower outlier limit and so on

#  the "$b" is an abbreviation for "$below"

# b alone works since nothing else in the "list" has b at the start

# = = = = = = function ends here = = = = = =
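
The limits used by get.outliers can be checked by hand on the test vector above. The sketch below recomputes them with quantile() rather than summary(), an assumption made so that the block runs on its own; in current R versions both should give the same quartiles. The names q, lower, upper and keep are illustrative only.

```r
# Worked check of the 1.5*IQR rule on the test vector
x <- c(1, -4, 3, 4, 5, 55)
q <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 = 1.5, Q3 = 4.75
iqr <- q[2] - q[1]                              # 3.25
lower <- q[1] - 1.5 * iqr                       # -3.375
upper <- q[2] + 1.5 * iqr                       # 9.625
keep <- x >= lower & x <= upper                 # FALSE for -4 and 55
xnew <- x[keep]
print(xnew)                                     # 1 3 4 5
```

Here -4 falls below the lower limit and 55 above the upper limit, so both are dropped.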

 

#  WARNING on xnew for regression!  It will not work!

# If you are removing outliers in a regression be sure to remove

# the complete matched set of observations for all variables. 

# e.g., if the fifth observation is an outlier in y but not in x or z and

# if lm(y~x+z) is used, remove the fifth observation from x, y and z

# This will have to be done manually rather than by using xnew above

# xnew works only if the model has only one variable

 

#now assuming x, y and z are already in memory, type

xx=get.outliers(x)

xx=get.outliers(y)

xx=get.outliers(z)
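
The regression warning above can be sketched as follows: flag outliers in each variable separately, then drop the whole matched row wherever any variable is flagged. This is only one possible way to do the manual step, not part of the original code. The data values and the names is.out, bad and dat are hypothetical, and quantile() is used in place of get.outliers so that the block runs on its own.

```r
# Hypothetical data: the fifth observation is an outlier in y only
y <- c(2.1, 2.9, 4.2, 5.1, 60, 7.0, 8.1, 8.8)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
z <- c(3, 1, 4, 1, 5, 9, 2, 6)

# TRUE where a value lies below Q1 - 1.5*IQR or above Q3 + 1.5*IQR
is.out <- function(v) {
  q <- quantile(v, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  v < q[1] - 1.5 * iqr | v > q[2] + 1.5 * iqr
}

# Flag a row if ANY of the three variables is an outlier there
bad <- is.out(y) | is.out(x) | is.out(z)
dat <- data.frame(y = y, x = x, z = z)[!bad, ]  # matched rows only
fit <- lm(y ~ x + z, data = dat)
print(which(bad))   # sequence numbers of the removed observations
print(nrow(dat))    # observations left for the regression
```

Keeping the variables in one data frame makes it harder to delete an observation from y while leaving it in x or z.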

 

#__________________________ Cut here ____________________________

 

summary2=function(x)

#objective is to also provide more digits in mean and sd and info about length

{xx=as.matrix(x)

#print("Means")

#print(apply(xx,2,mean, na.rm=T))

 

print("standard deviations")

print(apply(xx,2,sd, na.rm=T))

 

print("Lengths")

print(apply(xx,2,length))

 

sumx=summary(x)

return(sumx)

}

#summary2(x)
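
A minimal sketch of the same idea on made-up data: column means and standard deviations printed with more digits, plus counts. Note that length() counts missing values too, so a separate non-missing count is shown. The matrix and the names means, sds, n.all and n.ok are hypothetical.

```r
# Hypothetical two-column data with one missing value
xx <- cbind(a = c(1.123456789, 2, 3, NA), b = c(10, 20, 30, 40))

means <- apply(xx, 2, mean, na.rm = TRUE)
sds   <- apply(xx, 2, sd, na.rm = TRUE)
n.all <- apply(xx, 2, length)                      # counts NA as well
n.ok  <- apply(xx, 2, function(v) sum(!is.na(v)))  # non-missing only

print(means, digits = 10)
print(sds, digits = 10)
print(rbind(n.all, n.ok))
```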

 

#_____________________Cut here ____________________________

get.skewkurt = function(x)

{

#objective: compute sums of third and fourth powers of deviations from the mean

#INPUT  x =data

# OUTPUT 

#       sum3= sum of cubes of deviations from the mean

#       sum4= sum of fourth powers of deviations from the mean

#       devfromm=vector of deviations from the mean

#new variance is (1+a)^2 times var(x)

#new range is (1+a) times old range  max(x)-min(x)

xb=mean(x)

n=length(x)

devfromm=rep(1,n)

sum3=0

sum4=0

i=1

while (i<=n) {

devfromm[i]=x[i]-xb

sum3=sum3+(x[i]-xb)^3

sum4=sum4+(x[i]-xb)^4

i=i+1  }

list(sum3=sum3, sum4=sum4, devfromm=devfromm)

}
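
The sums returned by get.skewkurt can be turned into skewness and kurtosis coefficients. The sketch below is vectorized and assumes one common convention, in which every sum is divided by n; other conventions exist. The names dev, sum2, skew and kurt are illustrative only.

```r
x <- c(1, 2, 3, 4, 5)
n <- length(x)
dev <- x - mean(x)          # same quantity as devfromm
sum2 <- sum(dev^2)
sum3 <- sum(dev^3)          # same quantity as sum3 from the loop
sum4 <- sum(dev^4)          # same quantity as sum4 from the loop

# Moment coefficients, dividing every sum by n (one common convention)
skew <- (sum3 / n) / (sum2 / n)^1.5
kurt <- (sum4 / n) / (sum2 / n)^2
print(c(skewness = skew, kurtosis = kurt))
```

For this symmetric vector the skewness is 0 and the kurtosis is 6.8/4 = 1.7.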

 

#_____________________Cut here ____________________________

sort.matrix =function(x,j)

{

#sort matrix x by column j

# and carry along the remaining columns

#author H. D. Vinod, June 14, 2006.

y=0

dd=dim(x)

if (is.numeric(dd[1])){

#print("Error in sort.matrix function")

oo=order(x[,j])

fn=function (x,oo) {y=x[oo]; return(y)}

y=apply(x,2,fn,oo=order(x[,j]))

}

return(y)  }

#example

#x=round(matrix(rnorm(12),4,3),2)

#sort.matrix(x,2)
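
For comparison, the same reordering can be written with row indexing alone. This is a sketch of an alternative, not the code used above; the small matrix is made up so that the result is easy to check by eye.

```r
x <- matrix(c(3, 1, 2,
              30, 10, 20), nrow = 3, ncol = 2)  # columns: (3,1,2) and (30,10,20)
j <- 1
oo <- order(x[, j])              # row order that sorts column j
y <- x[oo, , drop = FALSE]       # reorder whole rows, keep matrix shape
print(y)
```

The rows come back as (1,10), (2,20), (3,30), so column 2 is carried along with column 1.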

 

#_____________________Cut here ____________________________

cen.moments = function(x)

#objective: compute sample central moments and cumulants up to order 4

{ n=length(x)

m=mean(x)

m2=sum((x-m)^2)/(n-1)   #WARNING dividing by n-1 not n here

m3=sum((x-m)^3)/n

m4=sum((x-m)^4)/n

k1=m

k2=m2

k3=m3

k4=m4-3*(m2^2)

list(m2=m2, m3=m3, m4=m4, k1=k1, k2=k2, k3=k3, k4=k4)

}
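
The WARNING about the divisor matters for k4, because m2 enters it squared. A small sketch with made-up data shows the size of the difference; the names m2.n1, m2.n, k4.n1 and k4.n are illustrative only.

```r
x <- c(1, 2, 3, 4, 5)
n <- length(x)
d <- x - mean(x)
m2.n1 <- sum(d^2) / (n - 1)    # 2.5, same as var(x)
m2.n  <- sum(d^2) / n          # 2
m4    <- sum(d^4) / n          # 6.8
k4.n1 <- m4 - 3 * m2.n1^2      # uses the n-1 divisor, as in cen.moments
k4.n  <- m4 - 3 * m2.n^2       # uses the n divisor throughout
print(c(k4.n1 = k4.n1, k4.n = k4.n))
```

Here k4 is -11.95 with the n-1 divisor and -5.2 with the n divisor, so the choice is not negligible in small samples.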

 

#_____________________Cut here ____________________________

 

 

 

PART B:  GAUSS code

 

1) Code for testing the numerical accuracy of any software, written in GAUSS.

 

http://www.american.edu/academic.depts/cas/econ/gavussres/utilitys/utilitys.htm

 

This link has useful GAUSS procedures for computing an accurate mean and variance.

 

2) The following simple proc reshapes the data without requiring the number of rows.

@The following test program should be run to understand what it does.

Note that since 8 is not divisible by 3, it ignores the last two

data points if you want to reshape into 3 columns.

Of course, reshape is typically used for getting large data from

ASCII files, not for data typed in the way it is shown below.

@

new;

x={1, 2, 3, 4, 5, 6, 7, 8};

y=reshape2(x,2);

y;

y=reshape2(x,3);

y;

 

proc (1)=reshape2(x,ncol);

@Author: H. D. Vinod, May 2, 1983.

proc returns the reshaped matrix with correct number of rows

@

local n,n1;

clear n1;

n=rows(x);"number of rows before reshaping= " n;

n1=floor(n/ncol);

" (number of rows before reshaping)/(no of columns) " n1;

retp(reshape(x,n1,ncol));

endp;
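
For readers who work in R rather than GAUSS, the same idea can be sketched as below. The function name reshape2.r is hypothetical and is not part of the original code. It fills the matrix row by row, as GAUSS reshape does, and drops any leftover data points.

```r
# Hypothetical R analog of the GAUSS proc; reshape2.r is an illustrative name
reshape2.r <- function(x, ncol) {
  n1 <- floor(length(x) / ncol)          # number of complete rows
  matrix(x[seq_len(n1 * ncol)], nrow = n1, ncol = ncol, byrow = TRUE)
}
x <- 1:8
print(reshape2.r(x, 2))   # 4 rows, 2 columns
print(reshape2.r(x, 3))   # 2 rows, 3 columns; 7 and 8 are dropped
```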

 

 

PART C:

Some Great Tricks for Math Typing in MS Word

If you want to type mathematical functions in MS Word without much

difficulty, use AutoCorrect in the Tools menu.

The following file has many preset entries.  For example, \a gives alpha,

/app= gives the approximately-equal sign, and there are numerous other useful symbols.

The attached file, called normal.dot, can be downloaded. Use it to

replace your normal.dot file, typically found in the location

C:\Program Files\Microsoft Office\Templates

or

C:\Documents and Settings\user\Application Data\Microsoft\Office\Recent

 

Microsoft keeps changing this, but you can find it!

Be careful though. Keep a backup copy of your existing normal.dot before replacing it.

It may not work for your configuration.  It has worked for many of my

graduate students.