Plotting in R is easy, and there are countless types of graphs you can make in base R, lattice, and other packages such as ggplot2. One of the most commonly used plots is the simple histogram. A histogram shows how many of the values in a column fall into each range of values (each bin). It is a great way to take a first look at a single column of data. The values must be numeric to use the histogram plot.

Let's start with a simple plot of some data:

```
numbers<-c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,6,6,6,6,6,6)
hist(numbers)
```

Well, that is a little funky. The histogram is technically correct, but R chose the breaks itself, and its choice put the integers 1 and 2 together in the same bin. The way to fix this is to pass the `breaks` argument to `hist()`:

`hist(numbers,breaks=c(0,1,2,3,4,5,6))`
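To see what the two calls above actually do, it can help to compute the histogram without drawing it. This is a minimal sketch that uses `plot = FALSE` and inspects the `breaks` and `counts` components of the returned object. The variable names `default_h` and `explicit_h` are my own.

```r
numbers <- c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,6,6,6,6,6,6)

# Compute the histograms without plotting them
default_h  <- hist(numbers, plot = FALSE)
explicit_h <- hist(numbers, breaks = 0:6, plot = FALSE)

# Breakpoints and counts R chose on its own
print(default_h$breaks)
print(default_h$counts)

# Counts with one bin per integer
print(explicit_h$counts)
```

With the default breaks, the first bin should hold three values (the single 1 and both 2s), because the lowest bin includes both of its endpoints. With the explicit breaks, each integer gets its own bin.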

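The drawback of explicit breaks is that you need to know the range of your data in advance. A more hands-off method is to ask for a lot of bins (here I use the length of the `numbers` vector). Note that R treats a single number as a suggestion and rounds to "pretty" breakpoints, so the actual number of bins may differ.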

`hist(numbers,breaks=length(numbers))`
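If you know the values are integers, another option is to derive the breaks from the data itself so that each integer sits in the middle of its own bin. This is a sketch under that assumption. The name `int_breaks` is my own, and the half-unit offsets are a common convention rather than anything `hist()` requires.

```r
numbers <- c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,6,6,6,6,6,6)

# One bin per integer: edges at 0.5, 1.5, ..., 6.5
int_breaks <- seq(min(numbers) - 0.5, max(numbers) + 0.5, by = 1)

# Check the counts without drawing, then draw the plot
h <- hist(numbers, breaks = int_breaks, plot = FALSE)
print(h$counts)
hist(numbers, breaks = int_breaks)
```

This avoids hard-coding the range, but it only makes sense for integer-valued data.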

Typically the histogram is used to check visually whether your data matches a certain distribution, such as the normal, t, Poisson, or gamma. To do that, we overlay the actual curve of the distribution on the histogram. In the call to `hist()` we need to add the argument `freq=FALSE`. This converts the bar heights from raw counts to densities, scaled so that the total area of the bars is 1, which puts them on the same scale as a density curve. Let's see what this looks like with a sample of 10,000 random normal values:

```
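# Draw 10,000 values from a standard normal distribution
dist <- rnorm(10000)
# freq = FALSE plots densities instead of counts
hist(dist, freq = FALSE)
# Overlay the standard normal density on the existing plot
curve(dnorm, add = TRUE)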
```
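As a quick check on what `freq=FALSE` plots, the sketch below sums the areas of the bars (height times width) and should come out to 1. It reads the `density` and `breaks` components of the object that `hist()` returns. The names `samples` and `total_area` are my own, and the seed is only there to make the run repeatable.

```r
set.seed(1)
samples <- rnorm(10000)

# Compute the histogram without drawing it
h <- hist(samples, plot = FALSE)

# Area of each bar = density (height) * bin width
total_area <- sum(h$density * diff(h$breaks))
print(total_area)
```

Because the bar areas sum to 1, the bars and the curve drawn by `curve(dnorm, add=TRUE)` share the same vertical scale.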

That is really all there is to it functionally. Everything else just dresses up the plot with colors, labels, and so on.

```
hist(dist,breaks=100,freq=FALSE,col="blue",border="dark blue",main="Histogram",xlab="randomly generated normal data")
curve(dnorm,add=TRUE,col="yellow",lwd=4)
```
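The overlays above work because `dnorm` defaults to mean 0 and standard deviation 1, which matches the defaults of `rnorm(10000)`. For data on another scale, one possible approach is to estimate the mean and standard deviation from the sample and pass them to `dnorm` inside `curve()`. The `heights` data here is simulated and purely hypothetical.

```r
set.seed(42)

# Hypothetical data: 1,000 values with mean 170 and standard deviation 10
heights <- rnorm(1000, mean = 170, sd = 10)

# Estimate the parameters from the sample
m <- mean(heights)
s <- sd(heights)

# Density-scaled histogram with the fitted normal curve on top
hist(heights, breaks = 30, freq = FALSE, main = "Histogram", xlab = "simulated heights")
curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 2)
```

Inside `curve()`, the expression must be written in terms of `x`, which `curve()` evaluates across the range of the existing plot.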

With these pieces you should be able to make histograms in R.
