Mean/Average Values in R
Part of Mike's Big Data, Data Mining, and Analytics Tutorial
The mean or average of a set of data values is defined as the sum of the values divided by the count of values.
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
There are a few guidelines for using the mean:
- The mean is a measure of center for data that is measured on a continuous (interval or ratio) scale (Review data classification here)
- The mean is not appropriate for ordinal or nominal scale data. Averaging the numeric codes of such data leads to meaningless statements of the form:
  - “The average gender in the world is somewhere between male and female (1.2)”
  - “The average satisfaction was 2.3”
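The point about nominal data can be seen in a short sketch. The `gender_codes` vector and its 1/2 coding are made up for illustration and are not part of the tutorial's data:

```r
# Hypothetical survey codes: 1 = male, 2 = female (illustrative data only)
gender_codes <- c(1, 1, 1, 1, 2)

# R will happily average the numeric codes, but the result describes no one
mean(gender_codes)

# Storing the variable as a factor makes its nominal nature explicit
gender <- factor(gender_codes, levels = c(1, 2), labels = c("male", "female"))

# Counts and proportions are the meaningful summaries for nominal data
table(gender)
prop.table(table(gender))

# mean() does not average a factor: it returns NA (with a warning)
suppressWarnings(mean(gender))
```

Keeping categorical variables as factors rather than bare numbers is one way to avoid computing such an average by accident.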
# Get 10 random integer values uniformly distributed between 1 and 20
x <- sample(1:20, 10, replace = TRUE)
# Sort and display the values
x <- sort(x)
x
## [1] 1 3 3 10 11 14 15 16 19 19
These values can be summarized as frequencies of individual values (frequency referring to the number [count] of times each individual value appears in the set):
table(x)
## x
##  1  3 10 11 14 15 16 19
##  1  2  1  1  1  1  1  2
This table of values can be visualized in a histogram (a bar chart that shows the relative frequency of each value or a summarization within ranges of values [called bins]). In the chart below, the red line is drawn at the mean of the values:
The chart below shows the same information, but using R’s default binning/summarization algorithm:
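A minimal sketch of how two such charts could be drawn, assuming base R graphics (`hist` and `abline`). The break points, titles, and line width are illustrative choices, not necessarily the ones used for the original charts:

```r
# Values taken from the sorted output shown earlier
x <- c(1, 3, 3, 10, 11, 14, 15, 16, 19, 19)

# One bin per integer value: breaks at 0.5, 1.5, ..., 19.5
hist(x, breaks = seq(0.5, 19.5, by = 1), main = "One bin per value", xlab = "x")
abline(v = mean(x), col = "red", lwd = 2)  # red vertical line at the mean

# Same data with R's default binning (breaks = "Sturges")
hist(x, main = "Default bins", xlab = "x")
abline(v = mean(x), col = "red", lwd = 2)
```

Placing the breaks at half-integers keeps each integer value in its own bar, so the first chart mirrors the frequency table directly.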
The mean of this set is:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
\[ \bar{x} = \frac{1 + 3 + 3 + 10 + 11 + 14 + 15 + 16 + 19 + 19}{10} \]
\[ \bar{x} = \frac{111}{10} \]
\[ \bar{x} = 11.1 \]
In R, it is easy to find the average of a set of numbers using the built-in mean function:
mean(x)
## [1] 11.1
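The same result can also be recovered from the frequency table alone, by weighting each distinct value by its count. This is a small sketch; the helper names `freq`, `values`, and `counts` are introduced here for illustration:

```r
# Values taken from the sorted output shown earlier
x <- c(1, 3, 3, 10, 11, 14, 15, 16, 19, 19)

freq <- table(x)                   # counts per distinct value
values <- as.numeric(names(freq))  # the distinct values themselves
counts <- as.vector(freq)          # how often each one occurs

# Sum of value * count, divided by the total count
sum(values * counts) / sum(counts)

# weighted.mean() from the stats package expresses the same idea
weighted.mean(values, counts)
```

Both expressions should agree with `mean(x)`, since repeating a value `k` times in the sum is the same as multiplying it by `k`.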
It is also possible to write (mostly) equivalent, but less efficient functions that compute the mean/average in R:
average <- function(x) {
  sum(x) / length(x)
}
average(x)
## [1] 11.1
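One place where a hand-written version and the built-in `mean` can differ is in the handling of missing values. The sketch below adds an `na.rm` switch modeled on the built-in argument; the function name `average_na` and the vector `y` are hypothetical and used only for this illustration:

```r
# Hand-written mean with an na.rm switch, modeled on mean()'s argument
average_na <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]  # drop missing values before summing
  }
  sum(x) / length(x)
}

y <- c(1, NA, 3)  # illustrative data with one missing value

mean(y)                      # NA: missing values propagate by default
mean(y, na.rm = TRUE)        # 2
average_na(y)                # NA, same as mean(y)
average_na(y, na.rm = TRUE)  # 2
```

Without the `na.rm` branch, the one-line `sum(x)/length(x)` version has no way to skip missing values, which is one reason to prefer the built-in function.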
Or, even worse performance-wise, but demonstrating the mechanics of the for loop:
average <- function(x) {
  sum_x <- 0
  count_x <- 0
  # seq_along(x) is safe for empty input, unlike 1:length(x)
  for (i in seq_along(x)) {
    sum_x <- sum_x + x[i]
    count_x <- count_x + 1
  }
  sum_x / count_x
}
average(x)
## [1] 11.1
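The performance remarks can be checked rather than taken on faith. The following is a rough sketch using `system.time`; the vector size and the name `average_loop` are arbitrary choices for this illustration, and timings vary by machine and R version, so no output is shown:

```r
# A larger vector makes timing differences visible (size is arbitrary)
big <- runif(1e6)

# Loop-based mean, written out for comparison
average_loop <- function(x) {
  sum_x <- 0
  count_x <- 0
  for (value in x) {
    sum_x <- sum_x + value
    count_x <- count_x + 1
  }
  sum_x / count_x
}

system.time(mean(big))               # built-in
system.time(sum(big) / length(big))  # vectorized one-liner
system.time(average_loop(big))       # explicit loop

# The built-in and the loop agree up to floating-point rounding
all.equal(mean(big), average_loop(big))
```

The explicit loop is typically the slowest of the three by a wide margin. The gap between `mean` and the one-liner is usually small and may go either way, since `mean` makes an extra pass over the data to reduce rounding error.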
In most cases there is no good reason to write your own function to calculate the mean, but you may occasionally have a special reason for doing so.