Visualizing Clustered Data in R

Peter

5 years ago

Step 1: Load the vegan package

library(vegan)

Step 2: Create example data with four clusters

set.seed(531) # set seed for reproducibility

df = data.frame(
  X1=rnorm(50, mean=0, sd=1), Y1=rnorm(50, mean=2, sd=1), 
  X2=rnorm(50, mean=3, sd=1), Y2=rnorm(50, mean=0, sd=1),
  X3=rnorm(50, mean=6, sd=1), Y3=rnorm(50, mean=3, sd=1),
  X4=rnorm(50, mean=8, sd=2), Y4=rnorm(50, mean=-1, sd=1))

# reorder data for ordihull function
data = cbind(
  c(df$X1, df$X2, df$X3, df$X4), 
  c(df$Y1, df$Y2, df$Y3, df$Y4))

# create data label variable for ordihull function
grouping = cbind(rep(1,50),rep(2,50),rep(3,50),rep(4,50))

Step 3: Initial plot of the data clusters (color-coded)

plot(df$X1, df$Y1, pch=20, col="cornflowerblue", xlim=c(-4,12), ylim=c(-4,5), xlab="", ylab="", xaxt='n', yaxt='n')
points(df$X2, df$Y2, pch=20, col="darkseagreen")
points(df$X3, df$Y3, pch=20, col="darkgoldenrod")
points(df$X4, df$Y4, pch=20, col="deeppink")

Plot of the data. Color-coding helps identify the four data clusters.

Step 4: Use ordihull function to make data clusters stand out

ordihull(data, groups=grouping, label=TRUE)

Using the ordihull function makes the data clusters even more apparent.

There are additional ordi plot functions that can achieve similar results, depending on your visualization needs and tastes. Here is an example with the ordispider function:

ordispider(data, groups=grouping, label=TRUE)

The ordispider function results in a different look, but still makes the data clusters easily identifiable.

It’s worth giving the ordi functions in the vegan package a thorough look if you’re performing unsupervised learning and need to visualize data clusters. The help manual for the vegan package can be found here.

Full code

library(vegan) # load package

set.seed(531) # set seed for reproducibility

df = data.frame(
  X1=rnorm(50, mean=0, sd=1), Y1=rnorm(50, mean=2, sd=1), 
  X2=rnorm(50, mean=3, sd=1), Y2=rnorm(50, mean=0, sd=1),
  X3=rnorm(50, mean=6, sd=1), Y3=rnorm(50, mean=3, sd=1),
  X4=rnorm(50, mean=8, sd=2), Y4=rnorm(50, mean=-1, sd=1))

# reorder data for ordihull function
data = cbind(
  c(df$X1, df$X2, df$X3, df$X4), 
  c(df$Y1, df$Y2, df$Y3, df$Y4))

# create data label variable for ordihull function
grouping = cbind(rep(1,50),rep(2,50),rep(3,50),rep(4,50))

plot(df$X1, df$Y1, pch=20, col="cornflowerblue", xlim=c(-4,12), ylim=c(-4,5), xlab="", ylab="", xaxt='n', yaxt='n')
points(df$X2, df$Y2, pch=20, col="darkseagreen")
points(df$X3, df$Y3, pch=20, col="darkgoldenrod")
points(df$X4, df$Y4, pch=20, col="deeppink")

ordihull(data, groups=grouping, label=TRUE)

Did the ordihull function help you visualize your data clusters? What other function and packages do you use? Let me know in the comments below!

Check out more at ProjectsByPeter.com/Projects

Step 1: Load the vegan package

Step 2: Create example data with four clusters

Step 3: Initial plot of the data clusters (color-coded)

Step 4: Use ordihull function to make data clusters stand out

Full code

Share this: