Monday, June 1, 2015

The Sample Mode


Part of Mike's Big Data, Data Mining, and Analytics Tutorial  

The sample mode is the statistic that identifies the value occurring most frequently in a sample. It is the natural measure of center for nominal data, and it can also be used on data at higher levels of measurement (ordinal and continuous).
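Nominal categories have no natural order, so a mean or median is not defined for them, but counting how often each category occurs still works. A minimal sketch with made-up eye-color data (the vector `eyes` and its values are hypothetical):

```r
# Hypothetical nominal data: eye colors recorded for 10 people
eyes <- c("brown", "blue", "brown", "green", "brown",
          "blue", "brown", "hazel", "blue", "brown")

# Count each category, then keep the category (or categories) with the largest count
eye_freq <- table(eyes)
eye_mode <- names(eye_freq)[which(eye_freq == max(eye_freq))]
eye_mode   # "brown" (5 of the 10 observations)
```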
Consider the following 20 random integer values between 1 and 3:
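#Round 20 continuous uniform draws on [1, 3] to the nearest integer
#(1 and 3 each occur with probability 0.25 and 2 with probability 0.5, so the integers are not uniform)
x<-round(runif(20,1,3))
#sort and display the values
x<-sort(x)
x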
##  [1] 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3
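Rounding `runif(20, 1, 3)` sends draws in [1, 1.5) to 1, [1.5, 2.5) to 2 and [2.5, 3] to 3, so 2 is about twice as likely as either endpoint, which fits the heavy count of 2s above. A quick simulation can confirm this, and `sample()` with replacement gives each integer an equal chance. This is a sketch only; the seed is arbitrary and chosen just for reproducibility.

```r
set.seed(1)  # arbitrary seed so the check is reproducible

# Share of 2s among many rounded uniform draws on [1, 3]; expected to be close to 0.5
draws <- round(runif(1e5, 1, 3))
share_of_2 <- mean(draws == 2)
share_of_2

# Equal-probability integers 1, 2, 3: sample() with replacement, then sort
y <- sort(sample(1:3, 20, replace = TRUE))
y
```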
These values can be summarized in a frequency table, where the frequency of a value is the number (count) of times it appears in the sample:
x_freq<-table(x)
x_freq
## x
##  1  2  3 
##  4 13  3
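The counts in `table()` are simply the number of matches for each distinct value. The sketch below hard-codes data matching the printed sample (`x_shown` is a hypothetical name) and checks the table against a by-hand count and against `tabulate()`, which counts positive integers from 1 up to the maximum.

```r
# Data matching the sample shown above: four 1s, thirteen 2s, three 3s
x_shown <- rep(1:3, times = c(4, 13, 3))

# Frequency table, as in the text
table(x_shown)

# The same counts by hand: how many elements equal each distinct value
counts_by_hand <- sapply(sort(unique(x_shown)), function(v) sum(x_shown == v))
counts_by_hand   # 4 13 3

# tabulate() counts the positive integers 1, 2, ..., max(x_shown)
tabulate(x_shown)   # 4 13 3
```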
The mode can be determined by finding which value has the highest frequency in the table:
names(x_freq)[which(x_freq == max(x_freq))]
## [1] "2"
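The same expression can be wrapped in a small reusable function. Because it keeps every value tied for the highest count, it also reports all modes of a bimodal or multimodal sample. A sketch; the helper name `sample_mode` is hypothetical, and it returns the values as character strings (the names of the table), so apply `as.numeric()` if numbers are needed.

```r
# Hypothetical helper: return every value tied for the highest frequency
sample_mode <- function(v) {
  freq <- table(v)
  names(freq)[which(freq == max(freq))]
}

# Data matching the sample shown above
x_shown <- rep(1:3, times = c(4, 13, 3))
sample_mode(x_shown)            # "2"

# With a tie, both values are returned (the data are bimodal)
sample_mode(c(1, 1, 2, 2, 3))   # "1" "2"
```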
Assessed graphically with a bar chart of the frequencies, the mode is the value with the tallest bar.
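A bar chart of a frequency table can be drawn with base R's `barplot()`. This sketch hard-codes counts matching the sample above (`freq_shown` is a hypothetical name) and colors the modal bar differently; the colors and labels are arbitrary choices.

```r
# Frequency table matching the sample shown above
freq_shown <- table(rep(1:3, times = c(4, 13, 3)))

# Highlight the bar(s) with the highest count
bar_cols <- ifelse(as.vector(freq_shown) == max(freq_shown), "steelblue", "grey80")

barplot(freq_shown, col = bar_cols,
        xlab = "Value", ylab = "Frequency",
        main = "Frequency of each value")
```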

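R does not have a built-in function for the statistical mode (its `mode()` function returns an object's storage mode, such as "numeric", instead); however, the mode is easy to find by combining `table()` and `which()`: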
names(x_freq)[which(x_freq == max(x_freq))]
## [1] "2"
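A tempting shortcut is `which.max()`, but it returns only the first maximum, so a tie is silently reduced to one value. Comparing against `max()` with `which()`, as above, keeps every mode. A small sketch with a made-up tied sample (`tied_freq` is a hypothetical name):

```r
# Made-up sample in which 1 and 2 are tied for the highest count
tied_freq <- table(c(1, 1, 2, 2, 3))

# which.max() reports only the first maximum
first_only <- names(which.max(tied_freq))
first_only   # "1"

# which() with max() keeps every value tied for the highest count
all_modes <- names(tied_freq)[which(tied_freq == max(tied_freq))]
all_modes    # "1" "2"
```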
