The “Fallible” Data Science Discipline in Times of the Novel Pandemic

Dr Arindam Banerjee is a Professor of Marketing at IIM Ahmedabad, and Prof. Tanushri Banerjee is an Associate Professor of Information Systems at the School of Management, PDPU, Gandhinagar. SAGE is the proud publisher of their book “Business Analytics”. SAGE has a fast-growing list of high-quality textbooks in Business & Management.

The world, in the grip of the #Covid-19 pathogen, is desperate to find a solution to the rapidly growing problem of large-scale infection. We have been under lockdown for the past two months to stave off the onslaught of the virus. Quarantine, social distancing, wearing masks and cancelling events with public gatherings have all been followed to a large extent in an attempt to outwit the virus. The pressing question on everybody’s mind is: when will the epidemic end, with or without the discovery of an antidote?

Apart from other prominent stakeholders, the onus has also fallen on data scientists to provide an answer to this overarching question. In many parts of the world, machine learning experts have been building data-driven models to predict the proliferation and eventual end of the pathogen (see, for example, the attempts made by SUTD[1] and IHME[2]). Past data on the growth of the infection in various countries has been used to build mathematical and artificial neural network (ANN) models that provide insights into how the pathogen may multiply further, slow down and eventually die out.
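For readers unfamiliar with how past data feeds such models, here is a minimal sketch of the simplest possible version: estimating a daily exponential growth rate from cumulative case counts by ordinary least squares on their logarithms, and converting it into a doubling time. This is a deliberately simple stand-in, not the SUTD or IHME method, and the case series is synthetic, invented purely for illustration.

```python
import math


def fit_exponential_growth(cumulative_cases):
    """Fit log(cases) = a + r * t by ordinary least squares.

    Returns (r, doubling_time_in_days). Assumes at least two points and
    that every count is positive.
    """
    n = len(cumulative_cases)
    ts = list(range(n))
    ys = [math.log(c) for c in cumulative_cases]
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    r = sxy / sxx  # estimated daily exponential growth rate
    doubling_time = math.log(2) / r if r > 0 else math.inf
    return r, doubling_time


# Hypothetical synthetic series: 20% daily growth for 10 days, then 5%.
cases = [100 * math.exp(0.20 * t) for t in range(10)]
cases += [cases[-1] * math.exp(0.05 * (t + 1)) for t in range(10)]

early_r, early_dt = fit_exponential_growth(cases[:10])
late_r, late_dt = fit_exponential_growth(cases[10:])
print(f"early window: r={early_r:.3f}/day, doubling time {early_dt:.1f} days")
print(f"late window:  r={late_r:.3f}/day, doubling time {late_dt:.1f} days")
```

Because the synthetic series slows down halfway through, the early window gives a doubling time of about 3.5 days and the later window about 13.9 days. A projection built only on the early window would badly overstate later growth, which is one mechanical reason why data-driven forecasts get revised as new data arrives.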

Surprisingly, the model-predicted “end dates” for the viral infection in many countries have been repeatedly revised to later in the year. This leaves many of us wondering about the value of such predictions if they change so often. If there is any example where reliable prediction of an event (or of its end) is critical, the current epidemic would stand at the top of the list.

While our intention is not to put down these well-intentioned and relevant attempts to unearth the progression of this new virus, it is worth pondering why these elegant data models failed. The problem lies not with the science involved but with the data used to apply the models. This is perhaps best summarized by a quote from the famous business tycoon Ross Perot: “Market data-driven decision making is like driving a car looking at the rearview mirror”.

How then can one build some sense of the future when historical data is not as useful, as our current #Covid-19 challenge indicates? The right approach lies in sensibly including in our analysis the contextual parameters that drive virus growth. Relevant “on the ground” parameters are important considerations: the degree to which lockdown measures are being complied with; physical constraints on maintaining social distancing, such as the population density of various habitations; sanitary conditions in housing localities; reliance on shared public amenities like toilets rather than private facilities; and other elements that significantly influence infection transmission rates. Some of these dimensions may not have remained as they were in the past, so it is hard to tell whether past data is a true reflection of what lies ahead.
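To make the point concrete, here is a minimal sketch, assuming a textbook discrete-time SIR (susceptible-infected-recovered) model, not claimed to be the SUTD or IHME method, in which one contextual parameter, a hypothetical lockdown-compliance fraction, scales the transmission rate. The population, contact rate of 0.3, recovery rate of 0.1 and end-of-epidemic threshold are illustrative values chosen for the example, not Covid-19 estimates.

```python
def projected_end_day(compliance, population=1_000_000, initial_infected=100,
                      base_contact_rate=0.3, recovery_rate=0.1,
                      threshold=1.0, max_days=5000):
    """Discrete-time SIR sketch with a hypothetical compliance parameter.

    compliance is the assumed fraction of transmission-relevant contacts
    avoided (0 = no change, 1 = perfect isolation). Returns the first day on
    which active infections fall below `threshold`, or None if that does not
    happen within `max_days`.
    """
    if not 0.0 <= compliance <= 1.0:
        raise ValueError("compliance must lie in [0, 1]")
    # Hypothetical link: avoided contacts scale transmission down linearly.
    beta = base_contact_rate * (1.0 - compliance)
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    for day in range(1, max_days + 1):
        new_infections = beta * susceptible * infected / population
        new_recoveries = recovery_rate * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        if infected < threshold:
            return day
    return None  # outbreak still active after max_days


# Same illustrative model, two assumptions about compliance going forward.
for compliance in (0.8, 0.4):
    day = projected_end_day(compliance)
    print(f"compliance {compliance:.0%}: active infections fall below 1 on day {day}")
```

With 80% compliance the effective reproduction number is below 1 and infections decline from the start; at 40% the outbreak grows before it burns out, and the projected end date moves substantially later. A model calibrated only on data from the high-compliance period has no way of anticipating that shift.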

A robust approach to building a confident assessment of the future would be to compare the output of the data-driven model with qualitative input from experts on the ground and to make the necessary adjustments to the final forecast.
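One way to operationalize such an adjustment is sketched below, assuming the simplest possible combination rule: a weighted average of the model’s forecast and the expert’s estimate, plus a flag raised when the two disagree by more than a chosen tolerance. The function name, the weight and the tolerance are hypothetical; in practice they would be judgment calls made by the forecasting team.

```python
from dataclasses import dataclass


@dataclass
class AdjustedForecast:
    value: float
    diverges: bool  # True when model and expert disagree beyond tolerance


def adjust_forecast(model_estimate, expert_estimate, expert_weight=0.5,
                    tolerance=0.25):
    """Blend a model forecast with an on-the-ground expert estimate.

    The result is a weighted average; the divergence flag is raised when the
    expert's estimate differs from the model's by more than `tolerance`,
    measured relative to the model estimate.
    """
    if not 0.0 <= expert_weight <= 1.0:
        raise ValueError("expert_weight must lie in [0, 1]")
    if model_estimate == 0:
        raise ValueError("model_estimate must be non-zero")
    blended = (1 - expert_weight) * model_estimate + expert_weight * expert_estimate
    relative_gap = abs(expert_estimate - model_estimate) / abs(model_estimate)
    return AdjustedForecast(value=blended, diverges=relative_gap > tolerance)


# Hypothetical inputs: days until the epidemic ends, per model and per expert.
print(adjust_forecast(model_estimate=60, expert_estimate=90))
# AdjustedForecast(value=75.0, diverges=True)
```

A raised `diverges` flag is the cue to investigate why the two views differ before trusting the blended number.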

Sophisticated data science techniques help mine data in multiple formats and across numerous sources very efficiently. Such methodology should serve as a valuable additional input (insight) to traditional triangulation approaches to forecasting rather than be regarded as a substitute for this robust, age-old practice. In this regard, George Box’s quote, “All models are wrong, but some are useful”, couldn’t be more appropriate. However, the usefulness of models, data-driven or otherwise, could be further enhanced if their outputs were juxtaposed against one another to see whether they all imply the same about the future.
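Juxtaposing outputs can be as simple as the agreement check sketched below: collect each source’s projected end date, then look at the median and the spread between the earliest and latest projections. The source names, the numbers and the 14-day tolerance are hypothetical, chosen only to illustrate the idea.

```python
import statistics


def compare_forecasts(forecasts, max_spread_days=14):
    """Juxtapose end-date forecasts (days from today) from several sources.

    `forecasts` maps a source name to its projected number of days until the
    epidemic ends. Returns the median, the spread (latest minus earliest),
    whether the spread is within the tolerance, and which sources sit at
    the two extremes.
    """
    if not forecasts:
        raise ValueError("need at least one forecast")
    values = list(forecasts.values())
    spread = max(values) - min(values)
    return {
        "median": statistics.median(values),
        "spread": spread,
        "consistent": spread <= max_spread_days,
        "earliest": min(forecasts, key=forecasts.get),
        "latest": max(forecasts, key=forecasts.get),
    }


# Hypothetical projections for illustration only.
forecasts = {"growth-curve model": 45, "SIR-type model": 70, "field experts": 90}
print(compare_forecasts(forecasts))
```

Here the 45-day spread marks the sources as inconsistent, which is exactly the signal that the forecast needs further scrutiny rather than a single confident date.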

[1] Singapore University of Technology and Design

[2] Institute for Health Metrics and Evaluation

Read the SAGE textbook Business Analytics: Text and Cases by Dr Arindam Banerjee and Prof. Tanushri Banerjee
