## 2D DENSITY PLOT

A 2d density plot is useful for studying the relationship between two numeric variables when you have a huge number of points. To avoid the overlapping seen in a scatterplot, it divides the plot area into a multitude of small fragments and represents the number of points in each fragment. There are several types of 2d density plots, and each has its own ggplot2 function. This post describes all of them.

For a 2d histogram, the plot area is divided into a multitude of squares. It is a 2d version of the classic histogram. The function offers a `bins` argument that controls the number of bins you want to display.

The function for the hexbin chart provides the `bins` argument as well, to control the number of divisions per axis.

Just as you can plot a density chart instead of a histogram, it is possible to compute a 2d density and represent it. ggplot2 offers several possibilities: you can show the contour of the distribution, or the area, or use the raster function.

Whether you use a 2d histogram, a hexbin chart or a 2d distribution, you can and should customize the colour of your chart. You can see other methods in the ggplot2 section of the gallery.

This document is a work by Yan Holtz. Any feedback is highly encouraged.
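The binning idea behind the 2d histogram can be sketched outside ggplot2. The following is a minimal illustration in Python with NumPy, not the ggplot2 function the post describes; the synthetic data, the seed, and the choice of 20 bins per axis are assumptions made only for this example.

```python
import numpy as np

# Synthetic, correlated data (an assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(scale=0.5, size=10_000)

# Divide the plot area into a grid of rectangular cells and count the
# points that fall in each cell. `bins` plays the same role as the
# `bins` argument mentioned in the text.
bins = 20
counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)

print(counts.shape)       # one count per cell of the grid
print(int(counts.sum()))  # every point lands in exactly one cell
```

Increasing `bins` gives a finer grid with fewer points per cell; decreasing it gives a coarser, smoother picture.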

## Definition

This chapter of the tutorial gives a brief introduction to some of the tools in seaborn for examining univariate and bivariate distributions. You may also want to look at the categorical plots chapter for examples of functions that make it easy to compare the distribution of a variable across levels of other variables.

The most convenient way to take a quick look at a univariate distribution in seaborn is the `distplot` function. By default, this draws a histogram and fits a kernel density estimate (KDE).

Histograms are likely familiar, and a `hist` function already exists in matplotlib. A histogram represents the distribution of data by forming bins along the range of the data and then drawing bars to show the number of observations that fall in each bin. A rug plot, which draws a small vertical tick at each observation, can be made with the `rugplot` function, and it is also available in `distplot`. When drawing histograms, the main choice you have is the number of bins to use and where to place them.

The kernel density estimate may be less familiar, but it can be a useful tool for plotting the shape of a distribution. Like the histogram, the KDE plot encodes the density of observations on one axis with height along the other axis.

Drawing a KDE is more computationally involved than drawing a histogram. Each observation is first replaced with a normal (Gaussian) curve centered at that value. Next, these curves are summed to compute the value of the density at each point in the support grid. The resulting curve is then normalized so that the area under it is equal to 1. The `kdeplot` function in seaborn produces the same curve. This function is used by `distplot`, but it provides a more direct interface with easier access to other options when you just want the density estimate.

The bandwidth (`bw`) parameter of the KDE controls how tightly the estimation is fit to the data, much like the bin size in a histogram. It corresponds to the width of the Gaussian kernels described above. The default behavior tries to guess a good value using a common reference rule, but it may be helpful to try larger or smaller values. The nature of the Gaussian KDE process means that the estimation extends past the largest and smallest values in the dataset.

You can also use `distplot` to fit a parametric distribution to a dataset and visually evaluate how closely it corresponds to the observed data.

It can also be useful to visualize a bivariate distribution of two variables. The easiest way to do this in seaborn is to use the `jointplot` function, which creates a multi-panel figure that shows both the bivariate (or joint) relationship between two variables and the univariate (or marginal) distribution of each on separate axes.

The most familiar way to visualize a bivariate distribution is a scatterplot, where each observation is shown with a point at the x and y values. This is analogous to a rug plot in two dimensions. You can draw a scatterplot with `scatterplot`, and it is also the default kind of plot shown by the `jointplot` function.

The bivariate analogue of a histogram is the hexbin plot, which shows the counts of observations that fall within hexagonal bins. This plot works best with relatively large datasets, and it looks best with a white background.

It is also possible to use the kernel density estimation procedure described above to visualize a bivariate distribution. In seaborn, this kind of plot is shown with a contour plot and is available as a style in `jointplot`.

You can also draw a two-dimensional kernel density plot with the `kdeplot` function. This allows you to draw this kind of plot onto a specific (and possibly already existing) matplotlib axes, whereas the `jointplot` function manages its own figure. If you wish to show the bivariate density more continuously, you can simply increase the number of contour levels.
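The univariate KDE construction described in this section (one Gaussian curve per observation, summed on a support grid, then normalized to unit area) can be sketched by hand with NumPy. This is an illustrative sketch, not seaborn's implementation: the helper name `gaussian_kde_1d`, the synthetic data, the grid size, and the normal reference rule used for the bandwidth are all assumptions for the example.

```python
import numpy as np

def gaussian_kde_1d(data, support, bandwidth):
    """Hypothetical helper: hand-rolled Gaussian KDE on a fixed grid."""
    data = np.asarray(data, dtype=float)
    support = np.asarray(support, dtype=float)
    # One Gaussian curve per observation, evaluated on the support grid.
    # Resulting shape: (n_observations, n_grid_points).
    z = (support[None, :] - data[:, None]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    # Sum the curves to get the unnormalized density at each grid point.
    density = kernels.sum(axis=0)
    # Normalize so the area under the curve (trapezoid rule) equals 1.
    area = np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(support))
    return density / area

rng = np.random.default_rng(0)
data = rng.normal(size=30)  # assumed sample data

# Assumed bandwidth: a normal reference rule, used here as a stand-in
# for whatever default rule the plotting library applies.
bandwidth = 1.06 * data.std(ddof=1) * len(data) ** (-1 / 5)

# The grid extends past the extremes because the Gaussian kernels do.
support = np.linspace(data.min() - 4 * bandwidth,
                      data.max() + 4 * bandwidth, 200)
density = gaussian_kde_1d(data, support, bandwidth)
```

Passing a larger or smaller `bandwidth` widens or narrows each kernel, which smooths or sharpens the resulting curve in the same way a change of bin size affects a histogram.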

## Simple example of 2D density plots in python

How to visualize joint distributions. By Madalina Ciortan, published in Towards Data Science.

This post shows how to visualize joint distributions with 2D density plots.

For fitting the Gaussian kernel, we specify a meshgrid whose points are interpolated along each axis. The matplotlib object doing all the magic is a `QuadContourSet` (named `cset` in the code). We can programmatically access the contour lines by iterating through its `allsegs` attribute. The calculated labels are accessible from `labelTexts`.

We can also plot the density as a surface.

Another way to present the same information is by using 2D histograms.

The entire code is available on GitHub.
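The workflow described above can be sketched as follows. This is a hedged reconstruction rather than the article's own code: the synthetic data, the seed, the grid of 100 points per axis, and the use of `scipy.stats.gaussian_kde` are assumptions. Only the name `cset` and the attributes `allsegs` and `labelTexts` come from the text.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Synthetic joint distribution (assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.6 * x + rng.normal(scale=0.8, size=500)

# Meshgrid covering the data range; 100 points per axis is an assumed value.
n_grid = 100j
xx, yy = np.mgrid[x.min():x.max():n_grid, y.min():y.max():n_grid]

# Fit a Gaussian kernel to the data and evaluate it on the grid.
positions = np.vstack([xx.ravel(), yy.ravel()])
kernel = gaussian_kde(np.vstack([x, y]))
density = kernel(positions).reshape(xx.shape)

# Filled contours for the density, plus labelled contour lines.
fig, ax = plt.subplots()
ax.contourf(xx, yy, density, cmap="Blues")
cset = ax.contour(xx, yy, density, colors="k")
labels = ax.clabel(cset, inline=True, fontsize=8)  # also stored in cset.labelTexts

# Contour lines are reachable per level through cset.allsegs.
for level, segments in zip(cset.levels, cset.allsegs):
    print(f"level {level:.3f}: {len(segments)} line(s)")

# The same information presented as a 2D histogram.
fig2, ax2 = plt.subplots()
h, x_edges, y_edges, image = ax2.hist2d(x, y, bins=30)
```

Each entry of `cset.allsegs` is a list of coordinate arrays, one per connected contour line at that level, so the loop can be extended to export or post-process individual lines.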

## Different types of 2d density chart

The 2d histogram is the two-dimensional version of the classic histogram. The plot area is split into a multitude of small squares, and the number of points in each square is represented by its color. You can customize the color and the bin size of your 2d histogram.

The hexbin chart is very similar to the 2d histogram above, but the plot area is split into a multitude of hexagons instead of squares. You can customize the color and the bin size of your hexbin chart.

Just as it is possible to plot a density chart instead of a histogram to represent a distribution, it is possible to make a 2d density plot. Several variations are available using ggplot2.

The examples cover the most basic chart with default parameters, color and bin size for the 2d histogram and for the hexbin chart, a contour plot, the raster geom, a hexbin chart built with the hexbin package and colored with RColorBrewer, and a scatterplot added on top of the ggplot2 2d density chart.
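As a rough illustration of a hexbin chart with a custom color map and bin size, here is a minimal sketch in Python with matplotlib. It is not the ggplot2 or hexbin-package code the gallery refers to; the synthetic data, the `gridsize` of 25, and the `viridis` color map are assumptions chosen for the example.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic, correlated data (an assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=5_000)
y = 0.5 * x + rng.normal(scale=0.5, size=5_000)

fig, ax = plt.subplots()
# gridsize sets the number of hexagons along the x axis (the bin size);
# cmap sets the color scale used to encode the count per hexagon.
hb = ax.hexbin(x, y, gridsize=25, cmap="viridis")
fig.colorbar(hb, ax=ax, label="points per hexagon")

# The per-hexagon counts are available from the returned collection.
counts = hb.get_array()
print(counts.max())
```

A smaller `gridsize` produces larger hexagons and a smoother picture, while a larger value reveals finer structure at the cost of noisier counts.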