I recently came across a wonderful suite of materials for introducing statistical learning:
- Hastie et al.’s free textbook (link to the PDF can be found on this page).
- The accompanying lecture videos – 15 hrs in total – freely available through YouTube (outline of, and links to, the videos here).
- Additional slides provided by professor Al Sharif (here), including PDF documents of R scripts and explanations for a wide range of topics covered in the book.
To give folks a feel for the content, it addresses many of the techniques presented at the University of Queensland’s graduate-level Machine Learning course. It also addresses many of the techniques I used, along with colleagues, at Shell to help optimise their massive coal-seam gas business in Brisbane, Australia.
In the aftermath of the global financial crisis (GFC), some experts proclaimed that there was a need to ‘make banking boring again.’ Their argument, essentially, was that money and drugs and Ferraris were the crux of the motivation behind the risk-taking that led to the GFC.
Relatedly, I tend to believe that solving a lot of boring engineering issues actually generates a lot of value. But they aren’t sexy. So they often get ignored by relative newcomers. Eventually those newcomers will get bitten by the risks associated with skipping the unsexy bits. Then they, too, will write relatively obscure blog posts about them.
Until then, I’d like to highlight one key to setting data analytics projects up for success: a clear definition of the problem to be solved.
Note that it is ‘problem definition’ (not ‘tool definition’). That means the problem definition does not dictate whether machine learning will or will not be used. It only requires that the problem be stated clearly enough for engineers to determine effective options for solving it.