Machine Learning for Scientists

I recently taught a 1-day machine learning workshop for scientists for the good folks at SciNetHPC. There was enough interest (nearly forty people signed up for a day-long session near the end of term) that we had to book a large-ish classroom.

There’s a lot of interest in the topic — which might even be surprising, given that a lot of the material is either familiar or pretty easy to digest for those who spend a lot of their time doing scientific data analysis. But for those coming to it for the first time and on their own, the difference in terminology (“features”? “shrinkage”? Wait, you just mean variables and regularization?) and the huge number of different methods available can be pretty baffling.

And I think it helps to have someone with a science background explain how differently machine learning approaches modelling compared to the sciences (especially the natural sciences), and why that is. Having that connection means you can translate, so that the very real expertise and experience scientists already have becomes an asset rather than a barrier. (“Bias-variance tradeoff? You mean you’re willing to introduce error just to get the error bars down a bit – centred on the wrong answer? What kind of monster are you, and what dangerous nonsense is this machine learning stuff?”)
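That tradeoff is easy to demonstrate numerically. Here's a minimal sketch (not from the workshop materials – the setup, penalty strength, and simulation sizes are my own illustrative choices) comparing ordinary least squares to a ridge (shrinkage) estimator over many simulated datasets: ridge is biased toward zero, but its variance drops enough that its total error can still be lower.

```python
import numpy as np

# Hypothetical illustration of the bias-variance tradeoff.
# We repeatedly simulate small noisy datasets from a known linear model,
# fit OLS and ridge to each, and decompose the estimation error.
rng = np.random.default_rng(0)
n, p, sigma, lam = 20, 5, 2.0, 10.0   # illustrative choices, not tuned
w_true = np.ones(p)

ols_est, ridge_est = [], []
for _ in range(2000):
    X = rng.normal(size=(n, p))
    y = X @ w_true + sigma * rng.normal(size=n)
    # OLS: unbiased, but high variance on small samples
    ols_est.append(np.linalg.lstsq(X, y, rcond=None)[0])
    # Ridge: shrinks coefficients toward zero (biased, lower variance)
    ridge_est.append(np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y))

def bias2_var(estimates):
    """Decompose mean squared estimation error into bias^2 and variance."""
    est = np.array(estimates)
    bias2 = np.sum((est.mean(axis=0) - w_true) ** 2)
    var = est.var(axis=0).sum()
    return bias2, var

for name, est in [("OLS", ols_est), ("ridge", ridge_est)]:
    b2, v = bias2_var(est)
    print(f"{name:5s}: bias^2={b2:.3f}  variance={v:.3f}  total={b2 + v:.3f}")
```

Run it and you'll see ridge pick up a nonzero bias while cutting the variance by enough to win on total error – exactly the trade a scientist reflexively distrusts until it's spelled out this way.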

This was my first time teaching this material, and while there are some things I’d like to improve (especially doing more on PCA and clustering, although I don’t know what I’d take out of a 1-day class to make room), I think that it went fairly well. The presentation can be seen online, and everything’s available on github.

Incidentally, this was my first time using Slidify for a presentation, and I really enjoyed it – this may be the first markdown/html5 setup that finally gets me willingly moving away from Keynote for this sort of material. Obviously, Slidify integrates much more closely with R than with python, particularly for graphics; but still, it was a pleasure to use.