# Visualizing Clustered Data in R

If you are performing unsupervised learning for exploratory data analysis, you’ll probably need to visualize the subgroups – or clusters – within the data. Color-coding the clusters is common, but here I’ll show you how to use the vegan package in R to make those clusters really stand out.

Full code is at the end of the post in case you want to skip the step-by-step tutorial.

#### Step 1: Load the vegan package

``library(vegan)``

#### Step 2: Create example data with four clusters

``````set.seed(531) # set seed for reproducibility

df = data.frame(
X1=rnorm(50, mean=0, sd=1), Y1=rnorm(50, mean=2, sd=1),
X2=rnorm(50, mean=3, sd=1), Y2=rnorm(50, mean=0, sd=1),
X3=rnorm(50, mean=6, sd=1), Y3=rnorm(50, mean=3, sd=1),
X4=rnorm(50, mean=8, sd=2), Y4=rnorm(50, mean=-1, sd=1))

# reorder data for ordihull function
data = cbind(
c(df\$X1, df\$X2, df\$X3, df\$X4),
c(df\$Y1, df\$Y2, df\$Y3, df\$Y4))

# create data label variable for ordihull function
grouping = cbind(rep(1,50),rep(2,50),rep(3,50),rep(4,50))``````

#### Step 3: Initial plot of the data clusters (color-coded)

``````plot(df\$X1, df\$Y1, pch=20, col="cornflowerblue", xlim=c(-4,12), ylim=c(-4,5), xlab="", ylab="", xaxt='n', yaxt='n')
points(df\$X2, df\$Y2, pch=20, col="darkseagreen")
points(df\$X3, df\$Y3, pch=20, col="darkgoldenrod")
points(df\$X4, df\$Y4, pch=20, col="deeppink")``````

#### Step 4: Use ordihull function to make data clusters stand out

``ordihull(data, groups=grouping, label=TRUE)``

There are additional ordi plot functions that can achieve similar results, depending on your visualization needs and tastes. Here is an example with the ordispider function:

``ordispider(data, groups=grouping, label=TRUE)`` The ordispider function results in a different look, but still makes the data clusters easily identifiable.

It’s worth giving the ordi functions in the vegan package a thorough look if you’re performing unsupervised learning and need to visualize data clusters. The help manual for the vegan package can be found here.

#### Full code

``````library(vegan) # load package

set.seed(531) # set seed for reproducibility

df = data.frame(
X1=rnorm(50, mean=0, sd=1), Y1=rnorm(50, mean=2, sd=1),
X2=rnorm(50, mean=3, sd=1), Y2=rnorm(50, mean=0, sd=1),
X3=rnorm(50, mean=6, sd=1), Y3=rnorm(50, mean=3, sd=1),
X4=rnorm(50, mean=8, sd=2), Y4=rnorm(50, mean=-1, sd=1))

# reorder data for ordihull function
data = cbind(
c(df\$X1, df\$X2, df\$X3, df\$X4),
c(df\$Y1, df\$Y2, df\$Y3, df\$Y4))

# create data label variable for ordihull function
grouping = cbind(rep(1,50),rep(2,50),rep(3,50),rep(4,50))

plot(df\$X1, df\$Y1, pch=20, col="cornflowerblue", xlim=c(-4,12), ylim=c(-4,5), xlab="", ylab="", xaxt='n', yaxt='n')
points(df\$X2, df\$Y2, pch=20, col="darkseagreen")
points(df\$X3, df\$Y3, pch=20, col="darkgoldenrod")
points(df\$X4, df\$Y4, pch=20, col="deeppink")

ordihull(data, groups=grouping, label=TRUE)``````

Did the ordihull function help you visualize your data clusters? What other function and packages do you use? Let me know in the comments below!

Check out more at ProjectsByPeter.com/Projects