Monday, 6 April 2015

What makes a good blog post? An analytical approach applied to the Blue Hill Escape blog

When I first started blogging, I remember googling the question: what makes a good blog post? Back then I wanted to set up a blog that was part diary, part marketing tool for the guest house. Of course it has evolved along the way and I've posted on a variety of topics with nature and fynbos as the central themes. However, it's all just been winging it, and I've never really known what makes a good blog post except from a personal perspective. Now, though, I have data to look at.

So, I've been blogging ad hoc for four years now, accumulating around 200 posts along the way. Finally, using my blog as the data, I can try to answer the following questions:

  1. Do longer posts (those with more words) get more hits?
  2. Do posts with more pictures get more hits?
  3. Do posts written on a particular day get more hits (from my Facebook feed I know we're more social on the weekend, for instance)?
  4. Is hit rate a function of time: i.e. are older posts getting more hits just because they have been around longer, or am I getting more hits now because I have more followers?
  5. Finally, and perhaps of greatest interest: does the sentiment of a post have an influence on hit rate? By sentiment I mean: do posts that are positive or negative in their overall tone get more or fewer hits? This is an important question to a conservation biologist, where the fear is that the bad news that we are continually surrounded by may be putting people off. Sentiment is a difficult thing to measure, and I use the sentiment analysis tool Semantria to find out.
The short answer: ALL of these are important, but some were important in ways I did not expect.

  1. Do longer posts get more hits?

Generally, longer posts do get more hits, but this is almost certainly because they are more searchable over time: more words does not equal a good post, but it is good for long-term exposure. That also answers question 4: hits build up with time rather than with a growing following, so clearly I have not been making inroads into gaining more readers. But I don't blog regularly enough and I don't advertise, so I can't complain.

  2. Do posts with more pictures get more hits?

Definitely. The more photos the better. A picture paints a thousand words, blah blah. Since I hardly ever uploaded more than 20 photos, I cannot say whether too many pictures is a bad thing, but I'd guess that 10 or so pics is a good rule of thumb.

  3. Do posts written on a particular day get more hits?

The results here surprised me. For me, Wednesday is a good post day (mid-week hump?), Thursday is very bad, and surprisingly, so is Saturday. Maybe too much competition from other digital media on a Saturday? Overall, my Saturday posts have been shorter, so perhaps that is a confounder in these measures, given the influence of time on hit rate; Friday, after all, is a good day.
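For anyone wanting to try the same day-of-week comparison, here is a minimal sketch in base R. The dates and view counts are made up purely for illustration (they are not the blog data), and the column names `date`, `day_num` and `mean_views` are hypothetical; `fday` is borrowed from the model formula further down on the assumption that it holds the posting day as a factor.

```r
# Hypothetical post dates and view counts, made up for illustration only
posts <- data.frame(
  date  = as.Date(c("2015-04-06", "2015-04-08", "2015-04-09",
                    "2015-04-11", "2015-04-15")),
  views = c(120, 300, 40, 55, 260)
)

# ISO weekday number: 1 = Monday ... 7 = Sunday (does not depend on locale)
posts$day_num <- as.integer(format(posts$date, "%u"))

# Turn the number into an ordered set of day labels
day_labels <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
posts$fday <- factor(day_labels[posts$day_num], levels = day_labels)

# Mean views per posting day; days with no posts come back as NA
mean_views <- tapply(posts$views, posts$fday, mean)
print(mean_views)
```

A raw comparison like this ignores post length and post age, which is why the day of the week goes into the models alongside the other variables rather than being judged on its own.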

  5. Does the tone of an article influence hit-rate?

Anyone who works in the conservation field knows that there are many depressing stories around: climate change, species in danger of extinction, pollution, over-population, etc. We also know that going on about these things doesn't exactly make one the life of the party. So I try not to focus on the negative when I write.

Quantifying tone is pretty difficult to do objectively. To do this I used a cool analytical tool developed by the company Semantria. You can try it out: they have a live demo on their website where you can post an article, and it analyses words, phrases, names and themes to come out with an overall score.

So while I was encouraged to see that on balance my writing is neutral to positive overall, the trend is towards negative articles having a higher hit-rate. Overall, though, the Semantria score was a poor predictor of hit-rate.

In summary, there are many factors to take into account when writing up a good blog post, and of course here I have only looked at trends from my blog – factors could be very different for other blogs in other situations!

The technical bits (of interest to data analysts only):

I used the R package rvest to scrape and summarise my posts. I then used the MuMIn package and the dredge function to choose the best model from these two starting models:

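# Poisson GLM on the raw view counts
model <- glm(views ~ charactercount + semantria_score + photos + blogageDays + day, data = blogdata, family = poisson)
# Linear model on the log-transformed view counts
lmmodel <- lm(log(views) ~ charactercount + semantria_score + photos + blogageDays + fday, data = blogdata)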

I ran both because the data followed a Poisson distribution, but the log-transformed data were Gaussian and I find linear models easier to interpret.
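To show why a linear model on log(views) is easy to read, here is a small sketch on simulated data (the numbers are invented for illustration and are not from the blog). In such a model a coefficient b means that one extra unit of the predictor multiplies typical views by exp(b). The names `fit`, `b` and `pct_per_photo` are hypothetical.

```r
set.seed(42)
n <- 200

# Simulated predictor: number of photos per post
photos <- rpois(n, 8)

# Simulated views: each extra photo multiplies views by exp(0.05), roughly +5%
views <- exp(3 + 0.05 * photos + rnorm(n, sd = 0.3))

# Linear model on the log scale, same shape as lm(log(views) ~ ...)
fit <- lm(log(views) ~ photos)

# Back-transform the slope into a percentage change per extra photo
b <- unname(coef(fit)["photos"])
pct_per_photo <- (exp(b) - 1) * 100
cat(sprintf("Each extra photo changes typical views by about %.1f%%\n",
            pct_per_photo))
```

The estimated slope should land close to the 0.05 used to simulate the data, and the back-transformed figure is the kind of statement ("about x% more views per photo") that makes the log-scale model convenient to report.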

The best Poisson model retained all variables, while the best model for the log-transformed data dropped the Semantria score. Code and data available on request.
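For readers who have not used dredge before, the following is a rough sketch of the model selection step on simulated data, not the real blog data or the exact script used here. It assumes the MuMIn package is installed. Two things to note: dredge needs the starting model to be fitted with na.action = na.fail, and by default it ranks candidate models by AICc (whether the default ranking was used for the results above is an assumption).

```r
library(MuMIn)

set.seed(1)
n <- 200
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Simulated stand-in for blogdata; all values are made up for illustration
blogdata <- data.frame(
  charactercount  = round(runif(n, 500, 8000)),
  semantria_score = rnorm(n, mean = 0.1, sd = 0.3),
  photos          = rpois(n, 8),
  blogageDays     = round(runif(n, 1, 1460)),
  day             = factor(sample(days, n, replace = TRUE), levels = days)
)
blogdata$views <- rpois(n, exp(3 + 1e-4 * blogdata$charactercount +
                                 0.05 * blogdata$photos +
                                 5e-4 * blogdata$blogageDays))

# Starting (global) model; na.fail is required by dredge
model <- glm(views ~ charactercount + semantria_score + photos +
               blogageDays + day,
             data = blogdata, family = poisson, na.action = na.fail)

# Fit every combination of the five predictors and rank them
candidates <- dredge(model)

# Pull out the top-ranked model and show which terms it kept
best <- get.models(candidates, subset = 1)[[1]]
print(formula(best))
```

With five predictors dredge fits 32 candidate models, including the intercept-only one, so this approach is only practical for a modest number of variables.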


  1. I don't worry about which day. Because of the 24/7 around the world my posts tend to get read over about 3 days. We are coming up to Google's deadline for slow-loading (especially on mobile) blogs so I'd be wary of too many photos, but 10 is my choice.

    1. Hi, and thanks for the note on the slow-load heads up! Yeah, day of the week is not really important.

