Stories from Blue Hill Escape: July 2014

Friday 25 July 2014

Helping Fynbos

Last month we had our first intern from Living Lands: Ted, from Holland. Living Lands is a collaborative partnership based in the eastern sections of the Baviaanskloof, working with the community and really interested in ecosystem services. Ted was looking for a restoration-type project for a third-year university assignment. Perfect – we had just the job.

In 2009 the first black wattle clearing operations were undertaken around the Hartbeesrivier community, where we live. The wattle infesting the streams of Blue Hill was cleared in 2011. Follow-up operations have been ongoing ever since to varying degrees. However, certain areas are looking pretty bad in terms of erosion. So we wanted to know how the vegetation was doing, in terms of biomass and the types of vegetation (functional groups).

We really wanted to know whether or not clearing the alien vegetation had made the erosion worse. So this was Ted's task – systematically take samples in cleared, uncleared and natural vegetation. After a month of bashing through spiny cliffortia, phragmites, mud puddles and crumbling canyons, we were able to run some models.

Turns out that even now, several years later, and despite the natural vegetation having burnt two years ago, the cleared areas were lower both in biomass and in natural functional groups. And while erosion was really complicated in terms of contributing factors, areas with high livestock use were clearly worse.

So – what do we do about it? Well, it was clear the land needed some help with the regeneration process. Also – after the big fire of 2012 proteas were resprouting everywhere, including in the roads and tracks – not ideal. So we've started digging up those plants doomed to be squashed by 4x4 tyres and replanting them in the areas clearly struggling to recover. Win Win Win.

Well, we hope - it will be some time before we know if the transplanted plants have taken. So this is a documentation of the pilot project.

Road Warrior - Wendy Foden digs up a young Protea neriifolia in the track. Skeletal remains of the parent plants can be seen behind her.

Beata (Starbucks Supervisor and volunteer) guards the rescued plants

An example of the areas cleared of wattle with replanted proteas here and there. Note lack of vegetation recovery around the stumps.

Alexa-Storm, helping the Earth recover with love and care

A pretty shot of the winter moon setting over Hartbeesrivier

Protea neriifolia - maybe the plants we've planted will look like this one day

Making it through my MOOC: the Data Science specialization through Johns Hopkins University

Although I completed my PhD in 2010 and have been collecting data through my postdoctoral research project on the birds of the Fynbos, it's been over 20 years since I did stats 101 at university. I've muddled through, partly by focusing on a few statistical techniques, but it was brought home to me that I needed to do something about my patchy statistical knowledge when I had a paper rejected from the world's lowest ranked ornithological journal, partly because of a basic statistical error.

Up until this year I didn't even know what a MOOC was – it stands for Massive Open Online Course; basically, learn anything online, and for free. Some of the big platforms for hosting these include Coursera, edX and Udacity, and you can read a comprehensive review of these big 3 here:

http://www.skilledup.com/blog/the-best-mooc-provider-a-review-of-coursera-udacity-and-edx/

So how did it all start for me? My university sent out a postgrad-development-program email including links to some Coursera classes, where I spotted the Foundations in Statistical Inference course offered via Duke University. So in February I enrolled in my first MOOC. The course consisted of weekly lectures that could be downloaded or viewed online as videos, as well as course notes, a link to a free basic stats book, weekly practical tutorials using the R programming language, weekly quizzes, a mid-term exam, a course stats project, and a final exam.

The course was put together by Mine Çetinkaya-Rundel; the material was clear, lucid and the best of all the online courses I was to subsequently take. The course does what it says on the box – and leads you by the hand from understanding means and standard deviations all the way to an introduction to multiple regression. Since there are thousands of people signed up to any MOOC, support is provided not through interactions with the teacher but through the online discussion forums, with students helping each other. Yip – cheating is practically legitimized (ok, answers to quiz questions are not posted, but steps on how to get them often are; ok – not cheating, but serious collaboration).

In February I started my first Johns Hopkins Data Science specialization courses: the foundation courses being the Data Scientist's Toolbox, and R Programming. You won't get anywhere on this course without embracing the R language (and if you have to analyse data at any level, you probably should do this anyway). The Data Scientist's Toolbox is a good overview of the rest of the course, and includes a good motivational first video. It's a very easy 'tick' in the series of 9 courses. I believe it is presented by Jeff Leek, who has a clear lecture delivery style, is well prepared, and imparts a lot of information very quickly. For the later classes I had to pause videos frequently in order to back up over key concepts, but that is the joy of being able to do these things at your own pace (it is all doable if you are disciplined or motivated).

I've been getting by for the last decade with SPSS for my statistics. However, this is expensive licensed software and it was clear from conversations with clever colleagues that there were multiple benefits in learning R programming - not the least being that it is free. However, learning R is like learning a real language, and it takes time to get to grips with the syntax and idiosyncrasies. Six months into all of this, I still struggle with some aspects of its use for things I could do very quickly and easily with Microsoft Excel. With Excel at least what you do is presented straight away in front of you in terms of data manipulation and reshaping, while in R it's all hidden away in data.frames, and little mistakes can severely f***up results. However, my mind has also been blown open by the possibility of all the things that can be done, from charting, to exploring and acquiring data, to running extremely complicated data analysis models.

Enter R Programming – presented by Roger Peng. He is great, and he is the only one of the three lecturers to video himself as part of his lecture presentation, which kind of makes it all a little more personal – which is actually really important considering the whole series could be presented by some arcane Artificial Intelligence robot. The videos are a little less polished, and I often found myself distracted trying to read the book titles on his bookshelf, or amusing myself by following the movement of coffee cups and personal items between video lectures.

Then I did something a little crazy during April/May once the Duke course was finished. I took four of the courses simultaneously – Getting and Cleaning Data (by Jeff Leek, excellent); Exploratory Data Analysis (Roger Peng, good – a bit unclear at the end); Reproducible Research (Roger Peng, fundamental lessons here – very important); and Statistical Inference. Basically, the MOOCs became a full-time occupation because each one takes about 8 hours a week, and some of the projects can take days if you get stuck, especially if you are learning R along the way. And don't kid yourself – you really need AT LEAST the recommended hours to get through each course proficiently.

Ok – now – back to that last course, Statistical Inference. Having just done the Duke MOOC I was pretty sure the Johns Hopkins version wouldn't be an issue – it was only a typical Data Science 4-week course. However, it is without a doubt the worst of the series and about the most terrible lecture style I have ever encountered in my life. Feedback on the discussion boards was scathing, and included an attempt to start a petition to refund those Coursera students who had paid the fee for Signature Track – i.e. those that wanted official recognition for their course participation. My course score for the Duke MOOC was 85% - and as I'd been on holiday for 2 weeks of it I had missed a quiz and the project proposal submission deadlines, which all counted for points. But despite completing everything for the Johns Hopkins regression course I scored only 72% - in other words my basic understanding of statistical inference at the end of the second course was actually worse!!!! By comparison, I scored 100% in Reproducible Research and Getting and Cleaning Data. The Statistical Inference course notes were also a disaster – I can only hope for those taking newer versions of the course that things have improved. The presenter - Professor Brian Caffo - may be some latter-day genius in his field, but that does not translate to good teaching style by any means. I also had to suffer through Regression Models, where I am sure it was only some prior knowledge of these subjects that got me through.

At the moment I am in the last weeks of the Practical Machine Learning module, which has been a real eye-opener, and TG it's Jeff Leek. I have one more course to go – Developing Data Products, and then apart from a Capstone project for those doing the paid version of the course, I'll have nailed it. So far – it's been worth it, mostly because I am far more confident in using R – which like any language, only gets better the more you use it. And unlike stats 101 twenty years ago, all paper and equations, I can honestly say that stats is fun now. I never thought it possible that I could say that – but really, the way one can quickly visualize complicated data sets, explore data and interpret data – it's almost like telling (or writing) a story – only with numbers and charts on a laptop. And the utility of it all – well, the sky is the limit (literally; get good with these skills and you could work for NASA).

So – Thanks to Coursera and Johns Hopkins University – this education revolution will change the world. Get on board before national governments start to see free and fair education as a threat to national job security and start to regulate who can participate. That, or global demand brings down the servers – in fact I wrote this entire post while waiting for the Coursera website to come back online from a temporary timeout.

In the words of Rob Schneider (Adam Sandler's sidekick) - “You can do it!”

Thursday 10 July 2014

Citizen Science in South Africa: iSpot and ADU's Virtual Museums

Over the last couple of years there has been a recognition that the general public can play a very important role in science, and wildlife monitoring in particular. Anyone from the librarian's daughter to the postman can now also be a 'citizen scientist'. In South Africa, probably the most rigorous project in terms of raw data collection is the South(ern) African Bird Atlas Project (SABAP2), where birdwatchers upload lists of birds to a central database. Due to the local focus on going 'wide and deep', as well as encouraging repeat surveys, this is an outstanding database. In some initial analysis I did on bird distribution in the fynbos, it proved way better than the global eBird project, and even BirdLife International range maps. There is no ornithologist working on South Africa's birds who does not refer to this major database. Join the atlas efforts at http://sabap2.adu.org.za/

Citizen science projects cover a range of activities, from really specialized skill sets, like bird-ringing, to submitting photographs to online archives. The age of digital photography has been around for a while, and now almost everyone has a camera – ranging from built-in cameras on mobile phones to fancy DSLR cameras with massive lenses. Recording nature has never been easier. However, there is also now competition among citizen science programs to recruit people willing to record their observations. There are two major photo archive platforms in South Africa: iSpot and the Animal Demography Unit's Virtual Museum.

iSpot was launched in South Africa in 2011 and has an online community that boasts many expert members. It has grown very rapidly through the institutional support of SANBI and a vast amount of time dedicated to the task by Dr Tony Rebelo. Tony's focus was initially to use the tool for documenting the plants of southern Africa and he has succeeded remarkably well – aiming to have 95% of South Africa's plant life documented by 2015. He describes iSpot first and foremost as a learning tool (i.e. you can upload photos and let others identify them). However, you can also contact them to obtain spatial and other information.

iSpot was developed through the Open University and they have brought incredible developmental power to bear to create a slick interactive tool – iSpot allows multiple commenting streams, which creates conversation, and users are a tight-knit community. The South African iSpot community can be found at www.ispot.org.za (don't get confused with the UK site). Through a single portal it is easy to upload photos to a range of groups (Amphibians and Reptiles, Birds, Fungi and Lichens, Fish, Invertebrates, Mammals, Plants, Other Organisms). Members collect points through interactions (agreeing with IDs). Apart from plants, birds and insects feature prominently in group interactions (see http://www.ispot.org.za/Stats%20update#comment-126872)

The University of Cape Town's Animal Demography Unit (ADU) Virtual Museum has been around for a few years more, and the ADU created their virtual museums from scratch. Despite constant financial constraints, the team led by Prof Les Underhill has done a remarkable job. Registered users number only a quarter of those of iSpot. A key difference is that identification is confirmed by an expert – as opposed to iSpot where the users agree or disagree on an identification. I prefer this to getting involved in dialogue, but each to their own. I also prefer the mapping feature of the ADU's VM. There are several Virtual Museums (but only one data upload interface at http://vmus.adu.org.za/ – you then choose which museum your photo belongs in). Apart from trees, they don't do plants, but have more focus on the animal kingdom – including weavers (PHOWN) and starfish (EchinoMAP). Their flagship group is the MammalMap (mammalmap.adu.org.za), but their LepiMap (Butterflies and Moths) formed a major contribution to the recent “Conservation Assessment of Butterflies of South Africa, Lesotho and Swaziland”, part of their proven track record of doing something with the data submitted. http://lepimap.adu.org.za/

So which platform to use for documenting your wildlife in the most helpful manner? While both platforms would beg user loyalty, a simple answer is: Plants on iSpot and Animals on ADU Virtual Museums. In fact, iSpot has been courteous enough to link to the MammalMap and SABAP2 under their survey pages – so there is a tacit recognition of the broad domain of each of these.

Participating in citizen science programs is a really useful and rewarding exercise. It's a great way to do something useful with your photos collecting real or digital dust, and for recording your legacy – the information exists for as long as we can produce electricity to run servers, and in any scientific publications that result. By uploading photos with dates and locations you are contributing to a database that allows one to see where and when animals were documented at various locations, a valuable conservation and management tool – but its value will only be realised through sufficient participation – so register for both now!