Arun Puri is currently working as Lead, Web Systems at Quantum Inventions Pte Ltd. He has 15 years of experience working in start-ups or start-up type organisations. He has experience on both sides of the aisle with respect to data: consuming data to create applications and mashups during his earlier ventures, and managing large amounts of location-based data in his current position. He maintains a keen interest in driving real innovation in the world via technology.
Lin: Several scientific disciplines, such as epidemiology and demographic studies, depend on data and analytical work for evidence-based findings. What are the different ways to use our daily ‘data’ to improve quality of life and reduce morbidity?
Arun: The straightforward approach is to pool data, for example from the Android system, and collate it so that researchers can analyse it and recognise patterns. However, analysts have to be cautious of the unintended biases that may exist in the data. An example is using sensors to collect road data. A mobile app has been developed in Boston to detect and report bumps and potholes along the roadway. Besides detecting and measuring car motions, the app’s smart algorithm is also capable of determining the likelihood of potholes in the road. In the case of Bump App, the prerequisite is that the individual has the money to purchase a smartphone or an iPad. In other words, the data and findings will be skewed towards individuals living on a good income, and individuals in the lower income strata may not be captured in the analysis.
Lin: What about a similar GPS data app for infectious diseases such as MERS, Ebola and seasonal respiratory diseases? Could an app be made available to record early symptoms, contact rates, and the individuals who came within range of an infected individual?
Arun: It can be done, but most people will have concerns over privacy and data protection. There may be a lack of empathy, and most may find it difficult to understand why their data has to be collected. Though the intentions of the study may be purely diagnostic, it is difficult to gain approval from the app users. The most common ways to track contact rates and movements are through travel tickets, word of mouth, and memory recall. But note that these approaches are used after the outbreak of an infectious disease; hence, continuous data collection may not be a practical approach for communicable diseases.
Lin: How would existing Big Data help to forecast future events or to obtain better risk estimates?
Arun: In general, the larger the dataset, the better the study. A large pool of data allows us to perform extensive correlation studies and discern patterns easily, which in turn allows us to make predictions from the collected data. If the amount of data available is large enough, algorithms can be written to test various hypotheses. These algorithms can then be used both to analyse the data and to predict outcomes. The comparison between the predictions and the actual results can then be used to further refine the algorithm. Again, one has to be mindful of using robust statistical methods when performing the analyses.
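[Editor's illustration] The predict, compare, and refine loop Arun describes can be sketched in a few lines. This is only a minimal sketch under stated assumptions, not a method discussed in the interview: the data points are made up, the model is a one-parameter line y = w * x, and the helper names `refine` and `mean_abs_error` are hypothetical.

```python
# Hypothetical illustration: fit y ≈ w * x by repeatedly comparing
# predictions with actual values and adjusting w to shrink the error.

def refine(xs, ys, w=0.0, rate=0.01, rounds=200):
    """Return a refined weight after `rounds` predict/compare/adjust steps."""
    n = len(xs)
    for _ in range(rounds):
        # Gradient of the mean squared error between prediction (w * x) and actual y.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Move w against the gradient so the next predictions are closer.
        w -= rate * grad
    return w

def mean_abs_error(xs, ys, w):
    """Average absolute gap between predicted and actual values."""
    return sum(abs(w * x - y) for x, y in zip(xs, ys)) / len(xs)

# Made-up observations that roughly follow y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

w_initial = 0.0
w_refined = refine(xs, ys, w_initial)

print("error before refinement:", mean_abs_error(xs, ys, w_initial))
print("error after refinement: ", mean_abs_error(xs, ys, w_refined))
```

With more observations the same loop has more evidence to correct against, which is one way to read the point about dataset size; it does not remove the need for robust statistical checks on the result.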
Lin: What are the determinants contributing to the value of a dataset?
Arun: I would say research funding takes precedence over everything else. Next in line would be performing risk analysis. There is always some risk analysis prior to approving or rejecting each grant proposal. As a rule of thumb, a grant proposal is accepted only when it presents possibilities for further research work. Occasionally, a collected dataset may yield no fruitful outcome or meaningful analysis for the intended study, but may serve as the data source for another study.