Purpose

The goal of this project is to visualize high-dimensional data from the GISS Surface Temperature Analysis (GISTEMP), available at https://data.giss.nasa.gov/gistemp/.

Background

Streamgraphs are a convenient way to visualize high-dimensional data. A streamgraph is a type of stacked area chart that shows how the values of several categories change over time. The data set comprises many variables, but we will focus on the monthly values (Jan through Dec) for each year.

We treat the month as the grouping variable and give each month its own color so the data are easier to read. The dependent (‘y’) variable is temperature, and the independent (‘x’) variable is the year.

## import data
GISTEMPData1 <- read.csv("ExcelFormattedGISTEMPDataCSV.csv")
GISTEMPData2 <- read.csv("ExcelFormattedGISTEMPData2CSV.csv")

## merge both data files on the shared Year column
GIS_dat <- merge(GISTEMPData1, GISTEMPData2, by = "Year")

## observe the first 6 rows of the data
head(GIS_dat)
##   Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec J.D D.N  DJF MAM JJA SON
## 1 1880 -29 -19 -17 -27 -13 -28 -22  -6 -16 -15 -18 -20 -19 *** **** -19 -19 -16
## 2 1881  -8 -13   2  -2  -3 -27  -5  -1  -8 -18 -25 -14 -10 -11  -13  -1 -11 -17
## 3 1882  10  10   2 -19 -17 -24  -9   5   0 -21 -20 -24  -9  -8    2 -11  -9 -14
## 4 1883 -32 -41 -17 -23 -24 -11  -7 -12 -18 -11 -19 -17 -19 -20  -32 -22 -10 -16
## 5 1884 -17 -11 -33 -35 -31 -37 -33 -25 -22 -22 -30 -28 -27 -26  -15 -33 -32 -25
## 6 1885 -64 -29 -23 -44 -41 -50 -28 -27 -19 -19 -22  -5 -31 -33  -41 -36 -35 -20
##   Glob NHem SHem X24N.90N X24S.24N X90S.24S X64N.90N X44N.64N X24N.44N EQU.24N
## 1  -19  -33   -5      -38      -16       -5      -89      -54      -22     -26
## 2  -10  -18   -2      -27       -2       -5      -54      -40      -14      -5
## 3   -9  -17   -1      -21      -10        4     -125      -20       -3     -12
## 4  -19  -30   -8      -34      -22       -2      -28      -57      -20     -25
## 5  -27  -42  -12      -56      -17      -11     -127      -58      -41     -21
## 6  -31  -41  -21      -61      -17      -20     -119      -70      -43     -11
##   X24S.EQU X44S.24S X64S.44S X90S.64S
## 1       -5       -2       -8       39
## 2        2       -6       -3       37
## 3       -8        3        8       42
## 4      -19       -1        0       37
## 5      -14      -15       -5       40
## 6      -23      -27       -7       38
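The output above shows `***` and `****` where a value is missing (the D.N and DJF entries for 1880). A column that contains such placeholders is read as text rather than as numbers. The following is a minimal sketch, assuming the placeholders are exactly these two strings, that passes them to `read.csv` through `na.strings`. The inline CSV is a two-row excerpt built from the output above for illustration only; it is not one of the project files.

```r
# Two rows taken from the head() output, trimmed to a few columns.
csv_text <- "Year,Jan,Feb,D.N,DJF
1880,-29,-19,***,****
1881,-8,-13,-11,-13"

# Default read: the placeholder strings make D.N and DJF non-numeric.
raw <- read.csv(text = csv_text)

# Treat the placeholders as missing values so the columns stay numeric.
clean <- read.csv(text = csv_text, na.strings = c("***", "****"))

str(clean)
```

The same `na.strings` argument could be added to the two `read.csv` calls above, in which case the placeholders would appear as `NA` in the printed data.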

Result

# Load libraries
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.1
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(streamgraph)
library(tidyr)
library(ggthemes)

# Keep Year and the twelve monthly columns (columns 1 to 13),
# then convert from wide to long format before plotting
GIS_YM <- GIS_dat[, 1:13] %>%
  gather(key = month, value = temp, Jan:Dec, factor_key = TRUE)
## Warning: attributes are not identical across measure variables;
## they will be dropped
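To make the wide-to-long step concrete, here is a small sketch of the same reshaping in base R, using `stack()` instead of `gather()`. It uses three years and three months copied from the `head()` output; the names `wide`, `month_cols`, and `long` are chosen for illustration. The warning above is typically raised when the gathered columns are factors with differing levels, which may be a side effect of the `***` placeholders in the raw files, though that is an assumption about this data set.

```r
# Three years and three months taken from the head() output.
wide <- data.frame(
  Year = c(1880, 1881, 1882),
  Jan  = c(-29, -8, 10),
  Feb  = c(-19, -13, 10),
  Mar  = c(-17, 2, 2)
)
month_cols <- c("Jan", "Feb", "Mar")

# stack() returns the values plus a factor naming the source column.
stacked <- stack(wide[month_cols])

# One row per Year/month pair; month levels are fixed in calendar order.
long <- data.frame(
  Year  = rep(wide$Year, times = length(month_cols)),
  month = factor(as.character(stacked$ind), levels = month_cols),
  temp  = stacked$values
)
long
```

Each year now appears once per month, which is the layout `streamgraph()` expects for its `key`, `value`, and `date` arguments.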
# Plot one stream per month, with the year on the x-axis
GIS_YM %>%
  streamgraph(key = "month", value = "temp", date = "Year", offset = "zero", interpolate = "step") %>%
  sg_fill_tableau("greenorange12") %>%
  sg_axis_x(15, "year", "%Y") %>%
  sg_legend(show = TRUE) %>%
  sg_title("Cumulative Frequency of Surface Temperature 1880-2015")
Cumulative Frequency of Surface Temperature 1880-2015

Conclusions

Over time, the cumulative surface temperature has increased, but it leveled off around 1935-1980, which could be due to a factor not considered here. Further analysis, such as time series analysis, should be done to better characterize these changes over time.
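As one possible first step toward such an analysis, the sketch below smooths the annual values with a centered moving average. It assumes the J.D column holds the January to December annual mean and uses only the six years shown in the `head()` output; the 3-year window and the names `annual` and `smooth3` are illustrative choices, not part of the original analysis.

```r
# Annual values (J.D column) for 1880-1885, copied from the head() output.
annual <- data.frame(
  Year = 1880:1885,
  J.D  = c(-19, -10, -9, -19, -27, -31)
)

# Centered 3-year moving average. stats::filter is called with its namespace
# because dplyr::filter masks it once the tidyverse is attached.
annual$smooth3 <- as.numeric(
  stats::filter(annual$J.D, rep(1 / 3, 3), sides = 2)
)
annual
```

The first and last years have no complete window and are returned as `NA`. On the full record, a longer window would be a more natural choice for examining the apparent plateau.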