Monday 16 April 2018

SABAP2 The Movie maps in R

SABAP2 'The Movie' maps in R

Professor Les Underhill's vision for SABAP2 would be that it would continue to collect data in perpetuity in order to examine patterns in space and time: SABAP2 The Movie.

Now, with 10 years of data and counting, we can create these movies. There are many stories that wait to be told.

The existing SABAP2 map is static: it is the result of compounded lists across the 10 year history. This hides any dynamic patterns. The first and most obvious is seasonality, especially for migratory species. First, we will look at the monthly occurrence of a migratory species, building on what we did here:
http://bluehillescape.blogspot.co.za/2018/04/animated-sabap2-reporting-rate-timelines.html

# These are the packages required to run these scripts
library(ggplot2); library(dplyr); library(gganimate); library(animation) 
# gganimate is still only available via GitHub at this stage: https://github.com/dgrtwo/gganimate

# You will need to download and install the free image processing software ImageMagick: 
# https://www.imagemagick.org/script/download.php

# Tell R where ImageMagick lives:
magickPath = shortPathName("C:\\Program Files\\ImageMagick-7.0.7-Q16\\magick.exe") 
ani.options(convert=magickPath)

# Download your data here by replacing the 493 here with the species code of your choice (or pasting the link)
swallow =
  read.csv("http://sabap2.adu.org.za/inc/species_data_download.php?spp=493&section=6", stringsAsFactors = F)

# restrict the data to southern Africa (the region with the best coverage)
swallow = filter(swallow, lat<(-15))

# add reporting rate
 swallow$reporting_rate = swallow$cards_with_spp/swallow$cards

The new code starts here. First, we use the raw data to create a background map of where we have bird lists (coverage) using the atlas data. We then create a new dataframe of average reporting rate by month. Finally, we plot the black coverage map and overlay reporting rate (on a gradient from black with nothing, to red i.e. high reporting rate). Pentad size here is proportionally larger than in reality, which helps fill in atlasing holes. This is a movie after all, not an analysis.

#first, background pentad map
pentad_map_data = swallow%>%select(pentad, lat, lng)%>%distinct()

# now dataframe for mapping
annual_pattern = swallow%>%group_by(mnth, lng, lat)%>%summarise(rr=mean(reporting_rate))%>filter(rr>0)

swallow_space_time =
  (
    ggplot(pentad_map_data, aes(lng, lat))+
      geom_point(shape=15)+ylim(min(swallow$lat), -15)+xlim(min(swallow$lng), 37)+
      theme_bw(base_size = 15)+coord_equal()+xlab("Longitude")+ylab("Latitude")+

      geom_point(data=annual_pattern, aes(lng,lat, colour=rr, frame=mnth),shape=15)+
      scale_colour_gradient(low="black", high="#de2d26", guide="none")+
      ggtitle("Spatial distribution of Barn Swallow for month ")
  )

 suppressMessages(gganimate(swallow_space_time, interval=1))





A static map is also not appropriate if a species is nomadic: range will change from year to year depending on rainfall and resources. This is especially the case for several species of the arid zone. One of the most common and familiar is the Lark-like Bunting Emberiza impetuani. Here we examine year by year reporting rate for this species, revealing a remarkeably dynamic range (as you will see when we compare this to an endemic bird species below). SABAP2 data reveals an extraordinary irruption of Lark-like Bunting in 2013 that went largely unmentioned in South Africa's ornithological literature.

# get the data:
llbunting =
  read.csv("http://sabap2.adu.org.za/inc/species_data_download.php?spp=871&section=6", stringsAsFactors = F)

llbunting = filter(llbunting, !yrr%in%c(2007, 2018))
llbunting$reporting_rate = llbunting$cards_with_spp/llbunting$cards
annual_pattern = llbunting%>%group_by(yrr, lng, lat)%>%
summarise(rr=mean(reporting_rate))%>%filter(rr>0)

# now a spatial map

llbunting_space_time =
(
    ggplot(pentad_map_data, aes(lng, lat))+
    geom_point(shape=15)+ylim(min(llbunting$lat), -15)+xlim(min(llbunting$lng), 37)+
    theme_bw(base_size = 15)+coord_equal()+xlab("Longitude")+ylab("Latitude")+

    geom_point(data=annual_pattern, aes(lng,lat, colour=rr, frame=yrr),shape=15)+
    scale_colour_gradient(low="black", high="#de2d26", guide="none")+
    ggtitle("Spatial distribution of Lark-like bunting for year ")
  )

suppressMessages(gganimate(llbunting_space_time, interval=2))



By comparison, the 'movie' for Cape Sugarbird is far less interesting, with the exception of perhaps of the odd individual wondering out of range.

# get the data:
csb =
  read.csv("http://sabap2.adu.org.za/inc/species_data_download.php?spp=749&section=6", stringsAsFactors = F)

csb = filter(csb, !yrr%in%c(2007, 2018))
csb$reporting_rate = csb$cards_with_spp/csb$cards
annual_pattern = csb%>%group_by(yrr, lng, lat)%>%
summarise(rr=mean(reporting_rate))%>%filter(rr>0)
# now an animated spatial map csb_space_time = ( ggplot(pentad_map_data, aes(lng, lat))+ geom_point(shape=15)+ylim(min(csb$lat), -15)+xlim(min(csb$lng), 37)+ theme_bw(base_size = 15)+coord_equal()+xlab("Longitude")+ylab("Latitude")+ geom_point(data=annual_pattern, aes(lng,lat, colour=rr, frame=yrr),shape=15)+ scale_colour_gradient(low="black", high="#de2d26", guide="none")+
ggtitle("Spatial distribution of Cape Sugarbird for year ")
  )

suppressMessages(gganimate(csb_space_time, interval=2))



And finally, we watch the spread of atlasing coverage over the last decade, coupled with the presence of Pied Crow. This chart is a cumulative one: each year laid over the previous.

pied_crow =   read.csv("http://sabap2.adu.org.za/inc/species_data_download.php?spp=522&section=6", stringsAsFactors = F)

pied_crow$rr = with(pied_crow, (cards_with_spp/cards))

annual_pattern = pied_crow%>%group_by(yrr, lng, lat)%>%summarise(rr=mean(rr))


pied_crow_space_time = 
  (
    ggplot()+
      ylim(min(pied_crow$lat), -15)+xlim(min(pied_crow$lng), 37)+
      geom_point(data=annual_pattern, aes(lng,lat, colour=rr, frame=yrr, cumulative=T),shape=15)+

      theme_bw(base_size = 15)+coord_equal()+xlab("Longitude")+ylab("Latitude")+

      scale_colour_gradient(low="black", high="#de2d26")+
ggtitle("Coverage and Pied Crow CUMULATIVE year ")
  )

gganimate(pied_crow_space_time, interval = 1)


Illustrating inter-annual trends of SABAP2 reporting rate in R

Illustrating inter-annual trends of SABAP2 reporting rate in R

In my last post we learnt how to animate annual trends in monthly reporting rate. But was there any change in overall reporting rate across years?

http://bluehillescape.blogspot.co.za/2018/04/animated-sabap2-reporting-rate-timelines.html

So how is reporting rate changing between years for Barn Swallow Hirundo rustica. We can't really tell from the species summary page:

http://sabap2.adu.org.za/species_info.php?spp=493#menu_left

For this exercise you will need the data at the link marked “Pentad level summary (monthly)” under the Data downloads options.

For the animated chart you will need to download and install the free image processing software ImageMagick: https://www.imagemagick.org/script/download.php

# These are the packages required to run these scripts
library(ggplot2); library(dplyr); library(gganimate); library(animation) 

# Tell R where ImageMagick lives:
magickPath <- ani.options="" code="" convert="magickPath)" files="" magemagick-7.0.7-q16="" magick.exe="" rogram="" shortpathname="">

Here we prepare the data (for explanation of these steps see previous post). I've included this so the code can run as a stand-along script.

# Download your data here by replacing the 493 here with the species code of your choice (or pasting the link)
swallow <- africa="" best="" coverage="" data="" filter="" group_by="" http:="" inc="" lat="" occurrence="" of="" pentad="" range="" read.csv="" region="" restrict="" sabap2.adu.org.za="" southern="" species="" species_data_download.php="" spp="493&section=6" stringsasfactors="F)" swallow="" the="" to="" with="">%
  summarise(ok=sum(cards_with_spp))

occurrence <- add="" amp="" cards="" cards_with_spp="" code="" filter="" occurrence="" ok="" rate="" reporting="" reporting_rate="" swallow="">

The new code starts here. First, it is a good idea to remove data from 2007 (and maybe even 2008). 2007 is the year SABAP2 got going: submission rates and spatial coverage were low. Also remove the current year.

# exclude unwanted years:
swallow <- 2018="" a="" and="" by="" c="" error="" filter="" group_by="" here="" in="" include="" measure="" now:="" of="" rate="" reporting="" standard="" summarise="" summary="" swallow="" year="" yrr="">%
summarise(mean=mean(reporting_rate), sd=sd(reporting_rate), n=sum(cards_with_spp), se=sd/sqrt(n))

chart_title = c("SABAP2 reporting rate trend for \n Barn Swallow")

p <- 2018="" aes="" base_size="14)+" chart="" code="" coord_cartesian="" cumulative="T))+geom_bar(stat=" ear="" eporting="" frame="yrr," geom_errorbar="" ggplot="" identity="" is="" labs="" mean="" p="" proportion="" rate="" static="" summary="" the="" theme_bw="" this="" title="chart_title)+" xlab="" xlim="c(2007," ylab="" ymax="mean+se))" ymin="mean-se," yrr="">



# Here I add a regression line across the interannual mean reporting rate values
q = p+
geom_smooth(data=summary, aes(yrr, mean),   method="lm", color="red")

q



# The animated version  
gganimate(q, interval = .5)

# Save the above as gif: suppressMessages(gganimate(q, interval=6, "SWALLOW_barchart_time_series.gif"))


# A simple linear regression suggests no significant change. 
# Obviously real life may be much more complicated than this suggests.   

summary(lm(mean~yrr, data=summary))
## 
## Call:
## lm(formula = mean ~ yrr, data = summary)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.042814 -0.015509  0.005625  0.024023  0.025913 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  8.854237   6.314515   1.402    0.198
## yrr         -0.004226   0.003138  -1.347    0.215
## 
## Residual standard error: 0.0285 on 8 degrees of freedom
## Multiple R-squared:  0.1848, Adjusted R-squared:  0.08293 
## F-statistic: 1.814 on 1 and 8 DF,  p-value: 0.215

##  Try a more statistically robust approach to interannual change in reporting rate
## using a clt / bootstrapping approach with random samples

#initialise the target dataframe:

df = data.frame(year=2007, rr=NA)

# Run a loop that selects each year, removes sites with low sampling effort, 
# and then subsamples a random set of the available data 1000 times

for(i in 2008:2017){
  temp = filter(swallow, yrr==i&cards>2)%>%select(reporting_rate)
  holder = rep(NA, 1000)

    for(b in 1:1000){
    temp2 = sample(temp$reporting_rate, 100, replace = F)
    holder[b] = mean(temp2)
  }

  holder2 = data.frame(year=i, rr=holder)
  df = rbind(df, holder2)
}

df = filter(df, year!=2007)

# plot it:

ggplot(df, aes(year, rr))+
  geom_smooth()+
  geom_smooth(method="lm", colour="red")+
  coord_cartesian(ylim=c(0,0.50))


Friday 13 April 2018

Animated SABAP2 reporting rate timelines

Making animated monthly reporting rate charts from SABAP2 public data in R

Currently you can get a lot of information from the SABAP2 website. For example, go here to find information on Barn Swallow Hirundo rustica.

http://sabap2.adu.org.za/species_info.php?spp=493#menu_left

The aim of this post is to animate the current static monthly reporting rate chart to the right of the map. As there are now many years of data, it is hard to understand what is going on in the static chart now.

For this exercise you will need the data at the link marked “Pentad level summary (monthly)” under the Data downloads options. Note this file is fairly large (8 Mb), so download can take a minute or two, depending on your bandwidth.

You will need to download and install the free image processing software ImageMagick: https://www.imagemagick.org/script/download.php

In addition gganimate is currently only available on github


# These are the packages required to run these scripts
library(ggplot2); library(dplyr); library(gganimate); library(animation) 

# For me (but maybe not for you), I need to tell R where ImageMagick lives:
magickPath <- shortPathName("C:\\Program Files\\ImageMagick-7.0.7-Q16\\magick.exe") 
#update your path as necessary
ani.options(convert=magickPath)
# Download your data here by replacing the 493 here with the species code of your choice (or pasting the link)
swallow <- read.csv("http://sabap2.adu.org.za/inc/species_data_download.php?spp=493&section=6", stringsAsFactors = F)

Have you got the data?

Here is a quick chart to make sure you have the data, a histogram of the number of full protocol cards submitted by year


hist(swallow$yrr)


Since SABAP2 now extends into Africa on a fairly ad-hoc basis, you may wish to restrict your analysis to the southern African region by running swallow <- filter(swallow, lat<(-15)).
Most of the atlased area (currently in your data frame) may well be outside the range of the species, especially if you are dealing with a range-restricted or endemic species. If we use all data, then we will have very low reporting rates. This is likely not what you want: you want to know reporting rate from within a species range. So this step restricts the dataset to the set of pentads where the species has ever been recorded.

occurrence <- group_by(swallow, pentad) %>%
  summarise(ok=sum(cards_with_spp))

# now get rid of pentads with ok ==0 i.e non occurrence. 
occurrence <- filter(occurrence, ok!=0) 
Unlike some of the data sets available for download, this one does not come with reporting rate. So we need to calculate that. It will be a proportion (between 0 and 1).

 swallow$reporting_rate <- swallow$cards_with_spp/swallow$cards
 hist(swallow$reporting_rate) 
#lots of 0 and 1: artefacts of coverage where pentads surveyed only once can only be 0 or 1


This is something like what the chart on the SABAP2 website currently looks like:

ggplot(data = group_by(swallow, yrr,mnth)%>%
    filter(pentad%in%occurrence$pentad)%>%
    summarise(mean_reporting_rate=mean(reporting_rate))
  , aes(mnth, mean_reporting_rate,colour=factor(yrr)))+geom_line()



So lets animate that.

# we need a time series value that combines year and month

swallow$yrr_mnth <- swallow$yrr+round(swallow$mnth/100, 2)

# Now create the chart
p <- ggplot(data = group_by(swallow, yrr, mnth, yrr_mnth)%>%
    filter(pentad%in%occurrence$pentad  & yrr%in%c(2009, 2012, 2015, 2017))%>% # displaying only a few years for clarity
    summarise(mean_reporting_rate=mean(reporting_rate)),
  aes(mnth, mean_reporting_rate,colour=factor(yrr), frame=yrr_mnth,cumulative=T, size=yrr/2009))+
  scale_size(guide='none')+
  geom_path(alpha=0.5)+ggtitle("Monthly reporting rates for Barn Swallow for ") +
  xlab("Month")+
  theme_bw(base_size = 14)

# Display the animated plot. Interval controls the speed. 
suppressMessages(gganimate(p, interval=0.5))
If you want to save the chart as a gif file run this. Various other formats exist: check the animation package documentation.

suppressMessages(gganimate(p, interval=0.5, "barn_swallow_time_series.gif"))
Related Posts Plugin for WordPress, Blogger...