Wednesday, July 29, 2015

R & Python Module 1b

So in Module 1a we added color to the last plot, and there is definitely more information visible, but our brains are scrambled just trying to unjumble that whole mess.  I think it is time to unleash the power that the ggplot2 package has to offer for graphics.  It is a little more cumbersome and involved, but the more you use it the easier it becomes.  The new and improved ggplot() command comes with the ggplot2 package, and since we already loaded it to access the diamonds dataset, we don’t need to call library() again to use it.  Let me forewarn you: the ggplot() command is rather elaborate and may feel difficult to learn for some people (I was one of them).  So don’t feel bad if you get a lot of errors.  Persevere and research what the errors mean online because (and I hope this is a little consolation) you are not alone in your frustration!  People discuss and troubleshoot these exact same errors on websites like Stack Overflow.  Learn to use these websites to your advantage and you will become more self-sufficient.  Alright, let’s dive in and get messy with some new code!
ggplot()+geom_point(data = diamonds, aes(x = carat, y = price, color = clarity))
 
 
 
 
This scatterplot is definitely an upgrade from the basic plot() command, but it can still use some improvement.  First, notice that we no longer have to use the $ symbol to tell the computer where to look in the diamonds dataset, thanks to the bit where we said data = diamonds.  Also, when we added color to aes(), the aesthetics part of the code, ggplot2 automatically generated a legend describing what each color represents.  The basic plot() command does not do this, leaving us in the dark about the meaning of each color.  Moreover, the alpha argument, which I add in the next plot, gives the data points a level of transparency so that we can see through them to the ones behind.  Next, I want to draw your attention to the plus sign, +, in the code.  A ggplot2 graphic is built up from layers.  These plots can be basic like the one we just created, or they can have way more depth if we add more layers.  All we have to do is add (with the plus sign, +) extra layers to print on top of the previous layer(s).  You may have noticed that I neglected to add a title or anything else to the graph, and that’s because titles and axis labels are extra layers that we may or may not elect to add.  Let’s generate a better graph by adding more layers.
# alpha goes outside aes() so it is a fixed transparency, not a mapped variable
ggplot()+geom_point(data=diamonds, aes(x=carat, y=price, color=clarity), alpha=0.3)+
  ggtitle("Carat Vs Price")+xlab("Carat Size of Diamond")+ylab("Price of Diamond")
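
One detail worth a quick sketch is where alpha goes.  As I understand it, anything inside aes() is treated as a mapping from data to an aesthetic (so a constant like 0.3 gets its own scale and legend entry), while an argument given to geom_point() outside aes() is a fixed setting applied to every point.  The object names p_mapped and p_set are just mine for illustration.

```r
library(ggplot2)

# alpha inside aes(): treated as a mapped variable (adds an "alpha" legend)
p_mapped <- ggplot() +
  geom_point(data = diamonds,
             aes(x = carat, y = price, color = clarity, alpha = 0.3))

# alpha outside aes(): a fixed 30% opacity for every point, no extra legend
p_set <- ggplot() +
  geom_point(data = diamonds,
             aes(x = carat, y = price, color = clarity),
             alpha = 0.3)

# Inspect how each layer stored alpha
names(p_mapped$layers[[1]]$mapping)   # the mapped aesthetics, alpha among them
p_set$layers[[1]]$aes_params$alpha    # the fixed value
```

Printing p_mapped and p_set side by side should make the difference in the legend easy to spot.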
 
Wow!  That’s really neat!  I used the same code as before, but this time I added a few extra layers to give the graph a title and axis labels.  The layers are fairly self-explanatory: ggtitle() adds a title to the scatterplot; xlab() and ylab() add labels to the respective axes.  The only problem is that I cannot read the title or labels very well.  They are disproportionately small compared to the body of the graph.  I feel like I need a magnifying glass to read them!  All we have to do is add another layer and play around with the sizes until we are satisfied with what we see.  This is the art part of being a data scientist, and it may take some getting used to because creativity and art have been divorced from math and science since we were in grade school.  After playing around with the code for a while, I felt happy with the aesthetics.
ggplot()+geom_point(data=diamonds, aes(x=carat, y=price, color=clarity), alpha=0.3)+
  ggtitle("Carat Vs Price")+xlab("Carat Size of Diamond")+ylab("Price of Diamond")+
  theme(plot.title = element_text(size=30), 
        axis.title.x = element_text(size=22),
        axis.title.y = element_text(size=22),
        legend.text = element_text(size=15))
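
Since every step so far repeats the previous code, one option (a sketch, not the only way) is to store the plot in a variable and keep adding to it with +.  The names base_plot and big_text are hypothetical ones I made up for this example.

```r
library(ggplot2)

# The points and labels, saved once
base_plot <- ggplot() +
  geom_point(data = diamonds,
             aes(x = carat, y = price, color = clarity), alpha = 0.3) +
  ggtitle("Carat Vs Price") +
  xlab("Carat Size of Diamond") +
  ylab("Price of Diamond")

# The text sizes, saved as a reusable theme() piece
big_text <- theme(plot.title   = element_text(size = 30),
                  axis.title.x = element_text(size = 22),
                  axis.title.y = element_text(size = 22),
                  legend.text  = element_text(size = 15))

# Adding the pieces together draws the plot with the larger text
base_plot + big_text
```

The nice part of this approach is that any later tweak only needs one more + on top of base_plot instead of retyping the whole command.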
 
Ahhhhh, good!  Now I can put off going to visit the eye doctor for another year or two!  There are a lot of ways to tweak a graph using the ggplot2 package.  For instance, I could have used bold or italics on any of the titles or labels, or changed their color or position.  All of this can be done within the theme() layer.  Now keep in mind, we will barely scratch the surface of everything that can be done with the ggplot2 package.  For more info on all the different graph-type commands as well as the different layers of ggplot(), I find the following website to be very helpful: http://docs.ggplot2.org/current/ .  Next, let’s do something about that color.  It is kind of hard to tell with so many different colors from the clarity attribute, but if you have a keen eye for detail you may have noticed linear striations of color among the points.  This is a bit of the story that wants to be told, but it is not readily visible to the naked eye when plotted in this manner.  So let’s add another layer to really make it pop.
ggplot()+geom_point(data=diamonds, aes(x=carat, y=price, color=clarity), alpha=0.3)+
  ggtitle("Carat Vs Price")+ xlab("Carat Size of Diamond") + ylab("Price of Diamond")+
  theme(plot.title = element_text(size=30),
        axis.title.x = element_text(size=22),
        axis.title.y = element_text(size=22),
        legend.text = element_text(size=15))+
  scale_colour_brewer(name="Clarity", type="seq", palette="Blues")
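
Going back to theme() for a moment: the bold, italic, color, and position tweaks I mentioned can be sketched as below.  The particular choices (bold centered title, italic axis labels, dark blue, legend at the bottom) are arbitrary ones of mine for illustration.

```r
library(ggplot2)

p <- ggplot() +
  geom_point(data = diamonds,
             aes(x = carat, y = price, color = clarity), alpha = 0.3) +
  ggtitle("Carat Vs Price") +
  theme(plot.title = element_text(size = 30, face = "bold",      # bold title
                                  colour = "darkblue",           # colored title
                                  hjust = 0.5),                  # centered title
        axis.title.x = element_text(size = 22, face = "italic"), # italic labels
        axis.title.y = element_text(size = 22, face = "italic"),
        legend.position = "bottom")   # move the legend under the plot
p
```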
 
At this point, I must emphasize a word of caution: changing the colors in ggplot2 may not be a wise thing to do for beginners in R!  The creator of ggplot2, Hadley Wickham, drew on research about color perception when designing the package’s defaults, so the default colors are balanced and generally appealing to the eye (although the default hues are not guaranteed to be colorblind-safe).  However, any reader who feels like a moderate to advanced R user may search the previous link for the keyword “brewer”, which will turn up several helpful pages.  With that warning aside, notice that in the scale_colour_brewer() layer I chose the type to be “seq”, which is short for sequential, together with the “Blues” palette.  This gives us a light-to-dark sequence of a single color.  I just happened to choose blue because it looked the most like an actual diamond, which is yet another example of data science having an artistic component.  This graph looks really great!  The labels are easy to read and the colors are in monochromatic sequential order, making it easier to see the different layers, but there is one more problem to fix: the white points are difficult to see on top of the grey background.  Well, that’s an easy fix!  Let’s add one last piece and everything will look better than great.
ggplot()+geom_point(data=diamonds, aes(x=carat, y=price, color=clarity), alpha=0.3)+
  ggtitle("Carat Vs Price")+ xlab("Carat Size of Diamond") + ylab("Price of Diamond")+
  theme(plot.title = element_text(size=30),
        axis.title.x = element_text(size=22),
        axis.title.y = element_text(size=22),
        legend.text = element_text(size=15),
        panel.background = element_rect(fill="black"))+
  scale_colour_brewer(name = "Clarity", type="seq", palette="Blues")
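
To see why the lightest points were so hard to spot on the default grey, it can help to list the actual colors a sequential palette hands out.  This sketch assumes the scales package (it is installed alongside ggplot2) and asks for one color per level of clarity.

```r
library(ggplot2)

# One color is needed for each level of the clarity factor
n_levels <- nlevels(diamonds$clarity)

# Hex codes of the "Blues" palette, ordered from lightest to darkest
blues <- scales::brewer_pal(type = "seq", palette = "Blues")(n_levels)
print(blues)

# Draw the swatches; the lightest one is nearly white, so it fades into grey
scales::show_col(blues)
```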
 

ggplot()+geom_point(data=diamonds, aes(x=carat, y=price, color=clarity), alpha=0.3)+
  ggtitle("Carat Vs Price")+ xlab("Carat Size of Diamond") + ylab("Price of Diamond")+
  theme(plot.title = element_text(size=30),
        axis.title.x = element_text(size=22),
        axis.title.y = element_text(size=22),
        legend.text = element_text(size=15),
        panel.background = element_rect(fill="black"))+
  scale_colour_brewer(name = "Clarity", type="div", palette="Blues")
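
The second command differs from the first only in type="div".  As far as I can tell from the ggplot2 documentation, type only matters when palette is given as a number; with a named palette like "Blues" the name wins, so both commands should draw the same colors.  A quick check, again assuming the scales package:

```r
# Same named palette, different type arguments
seq_blues <- scales::brewer_pal(type = "seq", palette = "Blues")(8)
div_blues <- scales::brewer_pal(type = "div", palette = "Blues")(8)

# Compare the two sets of hex codes
identical(seq_blues, div_blues)

# With a number instead of a name, type does pick the palette family
seq_first <- scales::brewer_pal(type = "seq", palette = 1)(8)  # first sequential palette
div_first <- scales::brewer_pal(type = "div", palette = 1)(8)  # first diverging palette
identical(seq_first, div_first)
```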


HOLY COW!!!! Look at that!  That’s one really sweet-looking graph!  And we technically didn’t even add an extra layer.  We merely added another argument inside the theme() layer.  The black background contrasts with the white points and lets us see that extra striation without straining our eyes.  So that’s basically it.  Most people would probably be satisfied with leaving it in the rainbow colors, but I feel that the sequential color palette makes the striations more visible.




