In Module 1a we added color to the last plot. More information is definitely visible now, but our brains are scrambled just trying to untangle the whole mess. I think it is time to unleash the power that the ggplot2 package has to offer for graphics. It is a little more cumbersome and involved, but the more you use it the easier it becomes. The new and improved ggplot() command comes with the ggplot2 package, and since we already loaded it to access the diamonds dataset, we don’t need to use the library() command again to invoke it.
Let me forewarn you: the ggplot() command is rather elaborate and may feel difficult to learn for some people (I was one of them). So don’t feel bad if you get a lot of errors. Persevere and research what the errors mean online because, and I hope this is some consolation, you are not alone in your frustration! People discuss and troubleshoot these exact same errors on websites like Stack Overflow. Learn to use these websites to your advantage and you will become more self-sufficient.
Alright, let’s dive in and get messy with some new code!
ggplot()+geom_point(data = diamonds, aes(x = carat, y = price, color = clarity))
This scatterplot is definitely an upgrade from the basic plot() command, but it could still use some improvement. Notice, however, that we no longer have to use the $ symbol to tell the computer where to look in the diamonds dataset, thanks to the bit where we said data = diamonds. Also, when we added color to aes(), the aesthetics part of the code, ggplot2 automatically generated a legend describing what each color represents. The basic plot() command does not do this, leaving us in the dark about what each color means. Moreover, the alpha argument, which appears in the code further down, adds a level of transparency to the data points so that we can see through them to the ones behind.
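One detail about alpha is worth a quick sketch: where you put it changes what it means. The snippet below is only an illustration; the object names p_mapped and p_set are my own and not part of the module. Inside aes(), ggplot2 treats 0.3 as if it were data, so it adds an "alpha" entry to the legend and lets the alpha scale pick the opacity (which may not end up being exactly 0.3). Outside aes(), it is a fixed setting applied to every point.

```r
library(ggplot2)

# Mapped: alpha sits inside aes(), so ggplot2 treats 0.3 as data
# and adds an "alpha" entry to the legend.
p_mapped <- ggplot() +
  geom_point(data = diamonds,
             aes(x = carat, y = price, color = clarity, alpha = 0.3))

# Set: alpha sits outside aes(), so every point gets the same
# fixed transparency and only the clarity legend is drawn.
p_set <- ggplot() +
  geom_point(data = diamonds,
             aes(x = carat, y = price, color = clarity),
             alpha = 0.3)
```

Typing p_mapped or p_set at the console draws the corresponding graph, so you can compare the two legends side by side.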
Next, I want to draw your attention to the plus sign, +, in the code. A ggplot() graph is built up in layers. These plots can be basic like the one we just created, or they can have far more depth if we add more layers. All we have to do is add (with the plus sign, +) extra layers to print on top of the previous layer(s). You may have noticed that I neglected to add a title or anything to the graph. That’s because titles and axis labels are extra layers that we may or may not elect to add.
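To make the layering idea concrete, here is a small sketch that builds a graph one piece at a time and stores it in a variable. The names p, n_before and n_after are mine, chosen just for this illustration. One caveat: in ggplot2’s own terminology only geoms and stats are counted as layers, while titles, labels and themes are other components that happen to be added with the same + operator.

```r
library(ggplot2)

# Start with an empty canvas that knows the data and the x/y mapping.
p <- ggplot(data = diamonds, aes(x = carat, y = price))
n_before <- length(p$layers)   # no geoms yet

# Add a layer of points on top of the canvas.
p <- p + geom_point(aes(color = clarity))
n_after <- length(p$layers)    # one geom layer now

# Titles and axis labels are added with the same + operator.
p <- p + ggtitle("Carat Vs Price") + xlab("Carat Size of Diamond")
```

Because each step returns a new plot object, you can type p after any step to see how the graph looks so far.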
Let’s generate a better graph by adding more layers:
# alpha is a fixed setting (not data), so it goes outside aes()
ggplot() + geom_point(data = diamonds, aes(x = carat, y = price, color = clarity), alpha = 0.3) +
  ggtitle("Carat Vs Price") + xlab("Carat Size of Diamond") + ylab("Price of Diamond")
Wow! That’s really neat! I used the same code as before and simply added a few extra layers, which gave the graph a title and axis labels. The layers are fairly self-explanatory: ggtitle() adds a title to the scatterplot; xlab() and ylab() add labels to the respective axes.
The only problem is that I cannot read the title or labels very well. They are disproportionately small compared to the body of the graph. I feel like I need a magnifying glass to read them! All we have to do is add another layer and play around with the sizes until we are satisfied with what we see. This is the art part of being a data scientist, and it may take some getting used to because creativity and art have been divorced from math and science since we were in grade school. After playing around with the code for a while, I felt happy with the aesthetics.
# alpha is a fixed setting (not data), so it goes outside aes()
ggplot() + geom_point(data = diamonds, aes(x = carat, y = price, color = clarity), alpha = 0.3) +
  ggtitle("Carat Vs Price") + xlab("Carat Size of Diamond") + ylab("Price of Diamond") +
  theme(plot.title = element_text(size = 30),
        axis.title.x = element_text(size = 22),
        axis.title.y = element_text(size = 22),
        legend.text = element_text(size = 15))
Ahhhhh, good! Now I can put off going to visit the eye doctor for another year or two! There are a lot of ways to tweak a graph using the ggplot2 package. For instance, I could have used bold or italics on any of the titles or labels, or changed their color or position. All of this can be done within the theme() layer.
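As a rough sketch of those tweaks, the snippet below makes the title bold, dark blue and centered, puts the axis titles in italics, and moves the legend under the graph. The particular choices (and the variable name tweaks) are just my picks for illustration, not settings used elsewhere in this module.

```r
library(ggplot2)

# A bundle of theme() settings; every value here is an arbitrary example.
tweaks <- theme(
  plot.title      = element_text(face = "bold", colour = "darkblue", hjust = 0.5),
  axis.title      = element_text(face = "italic"),
  legend.position = "bottom"
)

p <- ggplot() +
  geom_point(data = diamonds, aes(x = carat, y = price, color = clarity)) +
  ggtitle("Carat Vs Price") +
  tweaks
```

Here face controls bold or italic text, colour sets the text color, hjust = 0.5 centers the title, and legend.position moves the legend.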
Keep in mind that we will barely scratch the surface of everything that can be done with the ggplot2 package. For more info on all the different graph-type commands as well as the different layers of ggplot(), I find the following website very helpful:
http://docs.ggplot2.org/current/
Next, let’s do something about that color. It is kind of hard to tell with so many different colors from the clarity attribute, but if you have a keen eye for detail you may have noticed linear striations of color among the points. This is part of the story the data wants to tell, but it is not readily visible to the naked eye when plotted this way. So let’s add another layer to really make it pop.
# alpha is a fixed setting (not data), so it goes outside aes()
ggplot() + geom_point(data = diamonds, aes(x = carat, y = price, color = clarity), alpha = 0.3) +
  ggtitle("Carat Vs Price") + xlab("Carat Size of Diamond") + ylab("Price of Diamond") +
  theme(plot.title = element_text(size = 30),
        axis.title.x = element_text(size = 22),
        axis.title.y = element_text(size = 22),
        legend.text = element_text(size = 15)) +
  scale_colour_brewer(name = "Clarity", type = "seq", palette = "Blues")
At this point, a word of caution: changing the colors in ggplot2 may not be a wise thing to do for beginners in R! The creator of ggplot2, Hadley Wickham, built a lot of thinking about how people perceive color into this package, so its default colors are generally a safe and pleasing choice. However, readers who consider themselves moderate to advanced R users can search the previous link for the keyword “brewer” to find several helpful pages.
With that warning aside, notice that in the scale_colour_brewer() layer I chose the type to be “seq”, which is short for sequential. This tells the computer to choose a monochromatic sequence of whichever color we pick. Here I happened to choose blue because it looked the most like an actual diamond, which is yet another example of data science having an artistic component.
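If you are curious which shades the “Blues” palette actually contains, the sketch below asks the RColorBrewer package for them directly. I am assuming RColorBrewer is installed (it normally arrives alongside ggplot2), and the names n_levels and blues are my own. As far as I can tell from the ggplot2 documentation, when the palette is given by name, as with "Blues", the name alone selects the palette and the type argument only matters when the palette is given as a number, so I left type out here.

```r
library(ggplot2)

# clarity has one shade per level, so count the levels first.
n_levels <- nlevels(diamonds$clarity)

# Ask RColorBrewer for that many shades of the "Blues" palette,
# returned as hex color codes from lightest to darkest.
blues <- RColorBrewer::brewer.pal(n_levels, "Blues")
print(blues)

# The same palette applied to the scatterplot.
p <- ggplot() +
  geom_point(data = diamonds, aes(x = carat, y = price, color = clarity)) +
  scale_colour_brewer(name = "Clarity", palette = "Blues")
```

The lightest shades in that list are close to white, which is worth remembering when you look at the points against the default background.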
This graph looks really great! The labels are easy to read and the colors are in monochromatic sequential order, making it easier to see the different bands. But there is one more problem to fix: the white points are difficult to see on top of the grey background. Well, that’s an easy fix! Let’s add one last layer and everything will look better than great.
# alpha is a fixed setting (not data), so it goes outside aes()
ggplot() + geom_point(data = diamonds, aes(x = carat, y = price, color = clarity), alpha = 0.3) +
  ggtitle("Carat Vs Price") + xlab("Carat Size of Diamond") + ylab("Price of Diamond") +
  theme(plot.title = element_text(size = 30),
        axis.title.x = element_text(size = 22),
        axis.title.y = element_text(size = 22),
        legend.text = element_text(size = 15),
        panel.background = element_rect(fill = "black")) +
  scale_colour_brewer(name = "Clarity", type = "seq", palette = "Blues")
HOLY COW!!!! Look at that! That’s one really sweet-looking graph! And we technically didn’t even add an extra layer. We merely added another argument inside the theme() layer. The black simply contrasts with the white points and lets us see that extra striation without straining our eyes. So that’s basically it. Most people would probably be satisfied with leaving it in the rainbow colors, but I feel that the sequential color palette makes the pattern more visible.
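Since the only change was one more argument inside theme(), here is an optional sketch of a way to avoid retyping that whole theme() block every time: store it in a variable once and add it to any graph with +. The name big_dark is something I made up for this example.

```r
library(ggplot2)

# Store the text sizes and the black panel once...
big_dark <- theme(
  plot.title       = element_text(size = 30),
  axis.title.x     = element_text(size = 22),
  axis.title.y     = element_text(size = 22),
  legend.text      = element_text(size = 15),
  panel.background = element_rect(fill = "black")
)

# ...then reuse them on any graph by adding the variable.
p <- ggplot() +
  geom_point(data = diamonds,
             aes(x = carat, y = price, color = clarity), alpha = 0.3) +
  ggtitle("Carat Vs Price") +
  scale_colour_brewer(name = "Clarity", palette = "Blues") +
  big_dark
```

If you later decide the title should be a different size, you only have to change it in one place.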