I was working on an analytics project in healthcare some time back
and heard two terms used in data analysis: descriptive analysis
and predictive analysis. I asked one of my directors, and he explained them in
simple words, which helped me understand the basics.
Descriptive analysis is a summary of a given data set, and predictive
analysis is predicting future events with the help of the current data set. I was
thrilled and wanted to work on predictive analysis.
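To make the difference concrete, here is a minimal sketch in Python. The
admission numbers are made up for illustration, and the helper names
(describe, fit_line, predict) are my own. The straight-line fit is only one
simple way to predict; real projects use richer models.

```python
from statistics import mean, median

# Hypothetical monthly patient admissions (made-up numbers for illustration).
months = [1, 2, 3, 4, 5, 6]
admissions = [100, 110, 120, 130, 140, 150]


def describe(values):
    """Descriptive analysis: summarize the data we already have."""
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def fit_line(xs, ys):
    """Least-squares straight line: y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return slope, y_bar - slope * x_bar


def predict(xs, ys, new_x):
    """Predictive analysis: extend the fitted trend to a point not seen yet."""
    slope, intercept = fit_line(xs, ys)
    return slope * new_x + intercept


summary = describe(admissions)             # what happened so far
forecast = predict(months, admissions, 7)  # estimate for month 7
print(summary)
print(round(forecast, 1))
```

The summary only looks back at the six months we have, while the forecast
uses the same six months to estimate a month that has not happened yet.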
Today, we implement predictive analysis using techniques such as
data mining, statistics and machine learning. So this post takes us through
what machine learning is and, more importantly, how to test machine learning
systems.
Let's see how we do testing in general. In functional
testing and black box testing, we first understand the requirements. Then we
create high-level test scenarios, both positive and negative. We write
test scripts with test steps to perform some actions. We create test data to
support execution of the scripted test steps. Very simple. Here we know what the
expected result is, so our test steps include validation steps to check
whether the expected result is achieved. Can we test something where we don't
have a set of expected results? The answer is NO.
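As a small illustration of this style of testing, here is a sketch with one
positive and one negative scenario. The calculate_bmi function is a
hypothetical system under test that I picked for the example; the point is
only that every check has a known expected result.

```python
def calculate_bmi(weight_kg, height_m):
    """Hypothetical function under test: same input always gives same output."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return round(weight_kg / height_m ** 2, 1)


def test_positive_scenario():
    # Test data: 70 kg, 1.75 m. Expected result known in advance: 22.9.
    assert calculate_bmi(70, 1.75) == 22.9


def test_negative_scenario():
    # Invalid test data: a height of zero must be rejected.
    try:
        calculate_bmi(70, 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for zero height")


test_positive_scenario()
test_negative_scenario()
print("all checks passed")
```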
What changes in machine learning systems is that we provide
lessons (training data) to the system, and it performs actions based on those
lessons. So the machine learns the lessons and acts based on what it learned
from previous runs. Each time we execute, different results can be expected. Now
we need to test: is it performing as expected? Is it learning from its previous
runs?
Here, we need to understand the algorithm and the mathematical formula
used by the system to generate results. The testing will be based on the
algorithm and not on straightforward expected results. Each time we may get
different results, or even incorrect results.
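One way to test without an exact expected result is to check a measurable
property, such as accuracy on unseen data, against a threshold. The sketch
below is only an illustration: the tiny nearest-centroid "learner", the
synthetic data and the 0.90 threshold are all assumptions of mine, not a
real system.

```python
import random

ACCEPTANCE_THRESHOLD = 0.90  # assumed value, chosen only for this sketch


def make_data(rng, n_per_class):
    """Synthetic 1-D data: class 0 centred at 0.0, class 1 centred at 4.0."""
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(4.0, 1.0), 1) for _ in range(n_per_class)]
    rng.shuffle(data)
    return data


def train(data):
    """The 'lesson': learn one centroid (average value) per class."""
    centroids = {}
    for label in (0, 1):
        values = [x for x, y in data if y == label]
        centroids[label] = sum(values) / len(values)
    return centroids


def predict(centroids, x):
    """Pick the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def accuracy(centroids, data):
    hits = sum(1 for x, y in data if predict(centroids, x) == y)
    return hits / len(data)


def run_once(seed):
    """Train on one sample and score on a separate, unseen sample."""
    rng = random.Random(seed)
    model = train(make_data(rng, 100))
    return accuracy(model, make_data(rng, 100))


# Each run sees different data, so the scores vary from run to run.
scores = [run_once(seed) for seed in range(5)]
for seed, score in enumerate(scores):
    status = "ok" if score >= ACCEPTANCE_THRESHOLD else "below threshold"
    print(f"run {seed}: accuracy={score:.3f} ({status})")
```

Notice that no single prediction is asserted. Individual answers can be
wrong and the scores can differ between runs; the check is whether the
overall behaviour stays within the limit we agreed on.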
Following are a few pointers testers need to remember:
1. Understand the architecture and the algorithm,
including its mathematical coefficients, to find out how the system works. The
expectation is that testers provide inputs after testing so the algorithm can
be modified if required.
2. Align with the business on the acceptance
criteria. The results will include some failures, and the business should
agree on how many failures are acceptable.
3. Always create new test data and don't reuse
the same test data. If test data is repeated, the execution will not produce
realistic results. If possible, automate test data generation to keep a good
amount of test data ready.
4. Do not use the results as they are in any communication;
provide in-depth analysis of the results. No one expects all-pass results
from the communication, but rather more analysis of the behaviour of the system.
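Pointers 2 to 4 can be sketched together as follows. Everything here is
hypothetical: the 10% failure limit stands in for whatever the business
agrees to, model_under_test is a deliberately imperfect stand-in for the
real system, and the patient fields are made up.

```python
import random
from collections import Counter

MAX_FAILURE_RATE = 0.10  # hypothetical limit agreed with the business


def generate_cases(seed, n=200):
    """Pointer 3: automated, fresh test data for every run.

    The seed is kept so that a particular run can be replayed later.
    """
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        systolic = rng.randint(90, 200)
        cases.append({
            "age": rng.randint(18, 90),
            "systolic": systolic,
            "expected_high_risk": systolic >= 140,  # made-up labelling rule
        })
    return cases


def model_under_test(case):
    """Stand-in for the real system; deliberately wrong near the boundary."""
    return case["systolic"] >= 145


def evaluate(cases):
    """Pointers 2 and 4: compare with the agreed limit and analyse failures."""
    failures = [c for c in cases
                if model_under_test(c) != c["expected_high_risk"]]
    rate = len(failures) / len(cases)
    bands = Counter("under 60" if c["age"] < 60 else "60 and over"
                    for c in failures)
    readings = [c["systolic"] for c in failures]
    return {
        "total": len(cases),
        "failures": len(failures),
        "failure_rate": rate,
        "accepted": rate <= MAX_FAILURE_RATE,
        "failures_by_age_band": dict(bands),
        "failing_systolic_range": (min(readings), max(readings))
                                  if readings else None,
    }


report = evaluate(generate_cases(seed=1))
for key, value in report.items():
    print(f"{key}: {value}")
```

The report does more than say pass or fail. It shows where the failures sit
(here, in a narrow band of readings), which is the kind of analysis worth
sharing with the team.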
In upcoming posts, we will explore more open source machine learning
tools. Stay tuned. Happy Machine Learning!!