Dr Arindam Banerjee is a Professor of Marketing, IIM Ahmedabad and Prof. Tanushri Banerjee is an Associate Professor, Information Systems, School of Management, PDPU, Gandhinagar. SAGE is the proud publisher of their book “Business Analytics”. SAGE has a fast-growing list of high-quality textbooks in Business & Management.
Apart from other prominent stakeholders, the onus has also fallen on data scientists to provide some answer to this overarching question. In many parts of the world, machine learning experts have been building data-driven models to predict the spread and ultimate end of the pathogen (see the attempts made by SUTD and IHME). Past data on the growth of the infection in various countries has been used to build mathematical and artificial neural network (ANN) models that provide insights into how the pathogen may multiply further, slow down and eventually die out.
Surprisingly, the model-predicted “end dates” for the viral infection in many countries have been revised repeatedly to later in the year. This certainly leaves many of us wondering about the value of such predictions if they change so often. If there is one example where reliable prediction of an event (or of its end) is critical, the current epidemic would be at the top of the list.
While our intention is not to put down well-intentioned and relevant attempts to chart the progression of this new virus, it is worth pondering why these elegant data models failed. The problem lies not with the science involved but with the data to which the models are applied. This is perhaps best summarized by a quote from the famous business tycoon Ross Perot: “Market data-driven decision making is like driving a car looking at the rearview mirror”.
How then can one build some sense of the future when historical data is not as useful, as our current #Covid-19 challenge indicates? The right approach lies in sensibly including in our analysis the contextual parameters that drive virus growth. Relevant “on the ground” parameters are important considerations: the degree to which lockdown measures are being complied with; physical constraints on social distancing, such as the density of population in various habitations; sanitary conditions of housing localities; reliance on public amenities such as toilets rather than private facilities; and other elements that significantly influence infection transmission rates. Not all of these dimensions may have remained as they were in the past, and hence it is hard to say that past data is a true reflection of what lies ahead.
A robust approach to building a confident assessment of the future would be to compare the output of the data-driven model with qualitative input from experts on the ground, and to make the necessary adjustments to the final forecast.
The advent of sophisticated techniques such as data science methodology helps in mining data of multiple formats and across numerous sources very efficiently. Such methodology should act as a good additional input (insight) to traditional triangulation approaches to forecasting rather than be regarded as a substitute for the robust age-old practice. In this regard, George Box’s quote, “All models are wrong, but some are useful”, couldn’t be more appropriate. However, the usefulness of models, data-driven or otherwise, could be further enhanced if their outputs were juxtaposed against each other to see whether they all imply the same about the future.
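As a rough illustration of the triangulation idea, the sketch below places several forecasts of the same quantity side by side and checks whether they broadly agree. Everything in it is an assumption made for illustration: the function name `triangulate`, the 14-day tolerance, the source labels and the forecast values are hypothetical, not figures from any actual model or expert.

```python
from statistics import median


def triangulate(forecasts, tolerance_days=14):
    """Compare forecasts of 'days until the outbreak ends' from several sources.

    forecasts: dict mapping a source name to its predicted number of days.
    tolerance_days: hypothetical threshold below which the sources are
    treated as broadly agreeing.
    """
    values = list(forecasts.values())
    spread = max(values) - min(values)
    return {
        "spread_days": spread,
        "sources_agree": spread <= tolerance_days,
        "central_estimate": median(values),
    }


# Hypothetical numbers, for illustration only.
forecasts = {
    "curve_fit_model": 60,
    "ann_model": 75,
    "field_expert": 120,
}

result = triangulate(forecasts)
print(result)
```

A wide spread, as in these made-up numbers, would be a signal to revisit the assumptions behind each source before settling on a final forecast, rather than to trust any single output.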
SUTD: Singapore University of Technology and Design