Principled AI with Probabilistic Machine Learning
(Note: Crossposted with the Haystax Blog.)
At Haystax Technology, we are proponents and early adopters of principled approaches to machine learning (ML) and artificial intelligence (AI) for cybersecurity.
We use the term 'principled AI' to describe what we call our model-based approach, which is built on coherent mathematical principles. These principles help us keep our AI transparent, explainable and interpretable. Most importantly, they enable our systems to quantify uncertainty, unlike the black-box approach of deep neural networks. Our users and followers often hear us evangelize this principled approach through publications and conferences, boot camps and local meetups.
Last month, I gave a presentation titled "Introduction to Probabilistic Machine Learning using PyMC3" at two local meetup groups (Bayesian Data Science D.C. and Data Science & Cybersecurity) in McLean, Virginia:
Guests just devoured delicious kabobs and spirits and now our @Emaasit is showing us how to build probabilistic models in computer code using PyMC3 (https://t.co/kWxaUL0ZGM) a Python PP language. Join future Data Science Meetups: https://t.co/vvHaXwT6ms pic.twitter.com/9tDTKW95sp
— Haystax Technology (@HaystaxTech) April 26, 2018
The following is a summary of the concepts we discussed during the meetup.
General Overview
Many data-driven solutions in cybersecurity make heavy use of machine learning to detect and predict cyber crimes. This may include monitoring streams of network data and flagging unusual events that deviate from the norm, for example, an employee downloading large volumes of intellectual property (IP) on a weekend. Immediately, we face our first challenge: we are dealing with quantities (unusual volume and unusual period) whose values are uncertain. To be more concrete, we start off very uncertain whether this download event is unusually large, and then become more and more certain as we uncover more clues, such as the period of the week, the employee's performance reviews, whether they visited WikiLeaks, etc.
In fact, the need to deal with uncertainty arises throughout our increasingly data-driven world. Whether it is Uber's autonomous vehicles predicting the behavior of pedestrians on roadways or Amazon's logistics apparatus optimizing its supply chain, all these applications have to handle and manipulate uncertainty. Consequently, we need a principled framework for quantifying uncertainty, one that allows us to build applications and solutions that can represent and process uncertain values. Fortunately, there is a simple framework for manipulating uncertain quantities, which uses probability to quantify the degree of uncertainty. To quote Prof. Zoubin Ghahramani, Uber's Chief Scientist and Professor of AI at the University of Cambridge:
"Just as Calculus is the fundamental mathematical principle for calculating rates of change, Probability is the fundamental mathematical principle for quantifying uncertainty."
This has resulted in a principled approach to machine learning based on probability theory, called Probabilistic Machine Learning (PML). It is an exciting area of research that is currently receiving a lot of attention in many conferences (NIPS, UAI, AISTATS), journals (JMLR, IEEE PAMI), open-source software tools (TensorFlow Probability, Pyro) and practical applications at notable companies such as Uber AI, Facebook AI Research, Google AI and Microsoft Research.
Probabilistic Machine Learning
In general, Probabilistic Machine Learning can be defined as an interdisciplinary field focusing on both the mathematical foundations and practical applications of systems that learn models from data. It brings together ideas from Statistics, Computer Science, Engineering and Cognitive Science, as illustrated in the Figure below.
Image Credit: http://mlg.eng.cam.ac.uk/zoubin/
In this framework, a model is defined as a description of data one could observe from a system. In other words, a model is a set of assumptions that describe the process by which the observed data was generated. This model can be expressed graphically in the form of a Probabilistic Graphical Model (PGM), as illustrated in the Figure below.
In the Figure above, the circular nodes represent random variables for the uncertain quantities (e.g. unusual volume or unusual period) and the square nodes represent the uncertainty over the corresponding quantities (e.g. probability of unusual volume). The downward arrow shows the direction of the process that generated the data. The upward arrow shows the direction of inference, that is, given observed data we can learn the parameters of the probability distributions that generated the observed data. As we observe more and more data, our uncertainty over the random variables (e.g. unusual volume) decreases. This is the modern view of machine learning according to Prof. Chris Bishop of Microsoft Research.
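The claim that uncertainty shrinks as more data is observed can be illustrated with a minimal sketch, assuming a Beta prior over the probability of an unusual download and a series of 0/1 observations. The function names, the uniform prior and the data are all hypothetical and chosen for illustration; they do not come from the meetup materials.

```python
import math


def beta_posterior(alpha, beta, observations):
    """Update a Beta(alpha, beta) prior with 0/1 observations (conjugate update)."""
    ones = sum(observations)
    zeros = len(observations) - ones
    return alpha + ones, beta + zeros


def beta_mean_std(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1))
    return mean, math.sqrt(var)


# Hypothetical data: 1 = download flagged as unusually large, 0 = not flagged.
events = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0] * 10  # 100 made-up observations

prior = (1.0, 1.0)  # assumed uniform prior over the unknown probability
summaries = []
for n in (0, 10, 100):
    a, b = beta_posterior(*prior, events[:n])
    mean, std = beta_mean_std(a, b)
    summaries.append((n, mean, std))
    print(f"n={n:3d}  posterior mean={mean:.3f}  posterior std={std:.3f}")
```

The posterior standard deviation is the quantity to watch here: it gets smaller each time more observations are included, which is the sense in which uncertainty over the random variable decreases.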
Learning follows from two simple rules of probability, namely:
 The sum rule: $p(\mathbf{\theta}) = \sum_{y} p(\mathbf{\theta}, y)$
 The product rule: $p(\mathbf{\theta}, y) = p(\mathbf{\theta}) p(y \mid \mathbf{\theta})$
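Both rules can be checked numerically on a small discrete joint distribution. The sketch below is illustrative only: the two variables (a hypothesis that is either "benign" or "malicious", and an observation that is either a "small" or a "large" download) and all the probabilities are made-up assumptions.

```python
# Hypothetical joint distribution p(theta, y); the four entries sum to one.
joint = {
    ("benign", "small"): 0.72,
    ("benign", "large"): 0.18,
    ("malicious", "small"): 0.02,
    ("malicious", "large"): 0.08,
}
thetas = ("benign", "malicious")
ys = ("small", "large")

# Sum rule: p(theta) = sum over y of p(theta, y)
p_theta = {t: sum(joint[(t, y)] for y in ys) for t in thetas}

# Product rule rearranged: p(y | theta) = p(theta, y) / p(theta)
p_y_given_theta = {
    (y, t): joint[(t, y)] / p_theta[t] for t in thetas for y in ys
}

# Product rule check: p(theta) * p(y | theta) recovers every joint entry.
for t in thetas:
    for y in ys:
        rebuilt = p_theta[t] * p_y_given_theta[(y, t)]
        assert abs(rebuilt - joint[(t, y)]) < 1e-12

print("p(theta):", p_theta)
print("p(y | theta):", p_y_given_theta)
```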
These two rules can be combined into Bayes' theorem, which tells us what new information we have gained about our original hypothesis (or parameters) given the observed data:
$p(\mathbf{\theta}\mid \textbf{y}) = \frac{p(\textbf{y} \mid \mathbf{\theta}) \, p(\mathbf{\theta})}{p(\textbf{y})}$,
where:
$\begin{aligned} p(\mathbf{\theta}\mid \textbf{y}) &= \text{the posterior distribution of the hypothesis (or parameters), given the observed data} \\ p(\textbf{y} \mid \mathbf{\theta}) &= \text{the data likelihood, given the hypothesis (or parameters)} \\ p(\mathbf{\theta}) &= \text{the prior over all possible hypotheses (or parameters)} \\ p(\textbf{y}) &= \text{the marginal likelihood of the data (a normalizing constant)} \end{aligned}$
The probabilistic approach to machine learning is often preferable to deep learning in applications that require transparency and oversight. Although deep learning has produced amazing performance on many benchmark tasks in specific applications such as computer vision and conversational AI (e.g., the recent Google Duplex), it has several limitations in broader, more general use cases such as cybersecurity and banking. Deep learning systems are generally:
 very data hungry (i.e. often require millions of examples for training)
 very compute-intensive to train and deploy (i.e. require cloud GPU & TPU resources)
 poor at representing uncertainty
 easily fooled by adversarial examples
 finicky to optimize: choice of architecture, learning procedure, etc., require expert knowledge and experimentation
 uninterpretable black boxes, lacking in transparency, difficult to trust
In contrast, PML systems are transparent and explainable, and do not require large amounts of data or computing power. It is now easier than ever to get started building PML systems, thanks to the plethora of open-source software tools called Probabilistic Programming Languages. These include Google's TensorFlow Probability, Uber's Pyro, Microsoft's Infer.NET, PyMC3, Stan, and many others.
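To give a rough sense of the kind of work these tools automate, the sketch below carries out Bayesian inference by hand using a grid approximation and only the Python standard library. It is not how PyMC3 or any of the tools listed above is implemented, and it does not use their APIs. The Poisson model for weekly counts, the exponential prior, the grid and the data are all assumptions made purely for illustration.

```python
import math


def poisson_log_pmf(k, rate):
    """Log of the Poisson probability of observing count k given a rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


# Hypothetical data: number of large downloads per week for one employee.
weekly_counts = [2, 3, 1, 4, 2, 3]

# Grid of candidate rates (the unknown parameter): 0.05, 0.10, ..., 10.0
grid = [0.05 * i for i in range(1, 201)]

# Assumed prior: exponential with mean 2, evaluated on the grid (unnormalized).
log_prior = [-rate / 2.0 for rate in grid]

# Log-likelihood of all observations for each candidate rate.
log_lik = [sum(poisson_log_pmf(k, rate) for k in weekly_counts) for rate in grid]

# Bayes' theorem on the grid: the posterior is proportional to
# likelihood * prior, then normalized so the grid weights sum to one.
log_post = [lp + ll for lp, ll in zip(log_prior, log_lik)]
max_log = max(log_post)  # subtract the maximum for numerical stability
weights = [math.exp(v - max_log) for v in log_post]
total = sum(weights)
posterior = [w / total for w in weights]

post_mean = sum(r * p for r, p in zip(grid, posterior))
post_std = math.sqrt(
    sum((r - post_mean) ** 2 * p for r, p in zip(grid, posterior))
)
print(f"posterior mean rate = {post_mean:.2f}, posterior std = {post_std:.2f}")
```

A grid like this only scales to one or two parameters. Probabilistic programming languages let the modeler state the prior and likelihood and leave the inference step to general-purpose algorithms.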
These are a few of the topics that we discussed during this meetup. Materials from the meetup including slides and source code are provided below.
Daniel Emaasit is a Data Scientist at Haystax Technology. For a more detailed treatment of this subject, please see Daniel's blog.
Source code
For interested readers, two options are provided below to access the source code used for the demo:

The entire project (code, notebooks, data and results) can be found here on GitHub.

Click the Binder icon below to open the notebooks in a web browser and explore the entire project without downloading and installing any software.
References

Ghahramani, Z. (2015). Probabilistic machine learning and artificial intelligence. Nature, 521(7553), 452.

Bishop, C. M. (2013). Model-based machine learning. Phil. Trans. R. Soc. A, 371(1984), 20120222.

Murphy, K. P. (2012). Machine learning: a probabilistic perspective. MIT Press.

Barber, D. (2012). Bayesian reasoning and machine learning. Cambridge University Press.

Salvatier, J., Wiecki, T. V., & Fonnesbeck, C. (2016). Probabilistic programming in Python using PyMC3. PeerJ Computer Science, 2, e55.