Tuesday, 8 August 2017

How to test Machine Learning Systems?

I was working on an analytics project in healthcare some time back and heard two terms used in data analysis: descriptive analysis and predictive analysis. I asked one of my directors, and he explained them in simple words that helped me understand the basics.

Descriptive analysis is a summary of a given data set, and predictive analysis is predicting future events with the help of the current data set. I was thrilled and wanted to work on predictive analysis.
Today, we implement predictive analysis using many techniques, such as data mining, statistics, and machine learning. So this post takes us through what machine learning is and, more importantly, how to test machine learning systems.
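To make the difference between the two concrete, here is a minimal Python sketch. The admissions numbers are made up for illustration, and the straight-line fit is just one simple way to predict; it is an assumption of this example, not the method used on the project.

```python
from statistics import mean, median

# Hypothetical monthly patient admissions (made-up numbers for illustration).
admissions = [120, 132, 128, 141, 150, 158]

# Descriptive analysis: summarize the data we already have.
summary = {
    "mean": mean(admissions),
    "median": median(admissions),
    "min": min(admissions),
    "max": max(admissions),
}


def fit_line(ys):
    """Least-squares fit of y = slope * x + intercept for x = 0, 1, 2, ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


# Predictive analysis: use the current data to estimate a future value.
slope, intercept = fit_line(admissions)
next_month = slope * len(admissions) + intercept

print(summary)
print(f"Estimated admissions next month: {next_month:.1f}")
```

The summary only describes what has already happened, while the fitted line is used to estimate a month we have not seen yet.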

Let’s see how we do testing in general. In functional testing and black box testing, we first understand the requirements. Then we create high-level test scenarios, both positive and negative. We write test scripts with test steps to perform some actions. We create test data to support execution of the scripted test steps. Very simple. So here we know what the expected result is. Hence our test steps include validation steps to check whether the expected result is achieved. Can we test something when we don’t have a set of expected results? With this approach, the answer is NO.
What changes with machine learning systems is that we give the system lessons, and it performs actions based on those lessons. The machine learns the lessons and acts on what it learned from previous runs, so each time we execute it, different results can be expected. Now we need to test: is it performing as expected? Is it learning from its previous run?
Here, we need to understand the algorithm and the mathematical formulas the system uses to generate results. The testing will be based on the algorithm and not on straightforward expected results. Each time, we may get different results or even incorrect results.
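One way to picture this is a test that checks a measurable property of the results, such as accuracy staying above a threshold, instead of comparing against one fixed expected value. The sketch below is only an illustration: the nearest-centroid "model", the synthetic data, and the `MIN_ACCURACY` figure are all assumptions made up for this example.

```python
import random


def make_data(rng, n_per_class=200):
    """Synthetic 1-D readings: class 0 centred near 0.0, class 1 near 3.0."""
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(3.0, 1.0), 1) for _ in range(n_per_class)]
    rng.shuffle(data)
    return data


def train(data):
    """'Learn' one centroid per class (a nearest-centroid classifier)."""
    centroids = {}
    for label in (0, 1):
        values = [x for x, y in data if y == label]
        centroids[label] = sum(values) / len(values)
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def accuracy(centroids, data):
    hits = sum(predict(centroids, x) == y for x, y in data)
    return hits / len(data)


MIN_ACCURACY = 0.85  # assumed acceptance threshold, not a real requirement


def run_once(seed):
    """Train on one random sample and score on a separate, unseen sample."""
    rng = random.Random(seed)
    model = train(make_data(rng))
    return accuracy(model, make_data(rng))


# Each run sees different data, so the exact score varies from run to run.
scores = [run_once(seed) for seed in range(5)]
print(scores)

# The check is on behaviour within a tolerance, not on an exact value.
assert all(score >= MIN_ACCURACY for score in scores)
```

The individual scores differ between runs, so an exact-match assertion would be fragile; the threshold check is what stays stable.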
Following are a few pointers testers need to remember:
1. Understand the architecture and the algorithm, including the mathematical coefficients it uses, to find out how the system works. The expectation is that, after testing, you can provide inputs to modify the algorithm if required.
2. Align with the business on the acceptance criteria. The results will include some failures, and the business should agree on what number of failures is acceptable.
3. Always create new test data and don’t reuse the same test data. If test data is repeated, the execution will not produce representative results. If possible, automate test data generation to keep a good amount of test data ready.
4. Do not use the raw results as they are in any communication; provide an in-depth analysis of the results. No one expects all-pass results from the communication; they expect more analysis of the behaviour of the system.
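The last three pointers can be sketched together in a few lines of Python. Everything here is hypothetical: `classify` is a stand-in for the system under test, `MAX_FAILURE_RATE` is an assumed figure that would really come from the business, and the generated readings are synthetic.

```python
import random
import time
from collections import Counter

MAX_FAILURE_RATE = 0.15  # assumed figure agreed with the business


def classify(reading):
    """Hypothetical stand-in for the system under test."""
    return 1 if reading > 1.5 else 0


def generate_cases(seed, n=1000):
    """Generate fresh labelled test data from a seed so a run can be replayed."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        label = rng.randint(0, 1)
        cases.append((rng.gauss(3.0 * label, 1.0), label))
    return cases


def evaluate(seed):
    """Run the system on fresh data and break the outcome down by type."""
    outcomes = Counter()
    for reading, expected in generate_cases(seed):
        actual = classify(reading)
        if actual == expected:
            outcomes["correct"] += 1
        elif actual == 1:
            outcomes["false_positive"] += 1
        else:
            outcomes["false_negative"] += 1
    total = sum(outcomes.values())
    failure_rate = 1 - outcomes["correct"] / total
    return {
        "seed": seed,  # recorded so the same data can be regenerated later
        "total": total,
        "counts": dict(outcomes),
        "failure_rate": failure_rate,
        "within_threshold": failure_rate <= MAX_FAILURE_RATE,
    }


seed = int(time.time())  # a new seed, and so new test data, on every run
report = evaluate(seed)
print(report)
```

The report carries the breakdown of false positives and false negatives along with the seed, so the communication can discuss how the system behaves rather than just a pass or fail count.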

In upcoming posts, we will explore more of the open source machine learning tools. Stay tuned. Happy Machine Learning!!