Time vampires holding data analysts back
The surge in the data analytics world is now common knowledge. With an estimated CAGR of 11.7%, the Big Data and Analytics market is projected to surpass the $200 billion mark by 2020. Be it the thousands of transactions generated every second in banking and finance, or the inventory and production management of manufacturing houses, every enterprise and business generates tons of data and needs big data analytics today.
Enterprises want to be ahead of their customers' needs. They want to be the welcome party whenever their customers arrive. Prediction, insight, and intelligent inference are what data analysts constantly seek to derive. All the data dumped into storage is pretty much inaccessible and unusable if it is not discovered, processed, and utilized in the right fashion.
That is easier said than done. On the surface, it may appear that with automated processing, all you need is the right rules engine, and the problem will be solved by passing data through it. Unfortunately, the problems data analysts face today are much deeper and broader than this. The sheer volume of data is one issue, but let's talk about the other, more pressing problems data analysts face.
Accuracy, Disparity, and Duplication of data
When data is collected from multiple sources, disparity is expected, but today much of the data is also unstructured. It cannot be organized directly into tuples, tables, or columns, simply because no such mapping exists.
In addition, the generated data is not necessarily accurate – mechanisms are required to ensure that the data received is correct.
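As a minimal sketch of what such a mechanism could look like (the field names, rules, and sample records below are purely illustrative assumptions, not drawn from any specific pipeline), incoming records can be screened against basic sanity checks before they enter the pipeline:

```python
# Minimal accuracy-check sketch: reject records that fail basic sanity rules.
# The "amount" and "timestamp" fields and the rules are illustrative assumptions.
from datetime import datetime

def is_valid(record: dict) -> bool:
    """Return True only if the record passes basic accuracy checks."""
    try:
        amount = float(record["amount"])
        # Treat non-positive transaction amounts as bad data.
        if amount <= 0:
            return False
        # Timestamps must parse and must not lie in the future.
        ts = datetime.fromisoformat(record["timestamp"])
        return ts <= datetime.now()
    except (KeyError, ValueError, TypeError):
        return False

records = [
    {"amount": "120.50", "timestamp": "2019-03-01T10:15:00"},
    {"amount": "-5", "timestamp": "2019-03-01T10:16:00"},    # negative amount
    {"amount": "oops", "timestamp": "2019-03-01T10:17:00"},  # unparseable
]
clean = [r for r in records if is_valid(r)]
```

Real pipelines typically push such checks to the ingestion layer so that bad records are quarantined rather than silently dropped.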
Along with these, data duplication also has to be addressed in the design of data models, especially when dealing with multiple input streams, to avoid wasted resources and inaccurate results derived from the data.
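One common way to handle this, sketched below under the assumption that each record carries a stable identifier (the `txn_id` field and the sample streams are hypothetical), is to keep only the first record seen per key when merging streams:

```python
# Deduplication sketch for records arriving from multiple input streams.
# The "txn_id" key and the stream contents are illustrative assumptions.
def deduplicate(streams):
    """Merge several record streams, keeping the first record per txn_id."""
    seen = set()
    merged = []
    for stream in streams:
        for record in stream:
            key = record["txn_id"]
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged

stream_a = [{"txn_id": 1, "amount": 10}, {"txn_id": 2, "amount": 20}]
stream_b = [{"txn_id": 2, "amount": 20}, {"txn_id": 3, "amount": 30}]
merged = deduplicate([stream_a, stream_b])  # three unique records survive
```

When no natural identifier exists, a hash of the record's normalized content can serve as the key instead.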
Immediacy, Communication and ROI justification
Data flowing into warehouses and data lakes is continually growing, and it must be processed in near real time. Without immediate action, the insight derived quickly becomes obsolete, severely impacting the business.
Analysts today are expected to focus on the technicalities of processing data rather than just the beautification of results. That said, highly consumable graphs and visuals are what help business users estimate the potential impact of their decisions. Analysts have the tough job of balancing robust analysis against providing consumable insights.
To add to this, proving the ROI of an analytics exercise is extremely difficult, because it may not produce immediate results. Many such exercises may even be marked as failures simply because they delivered unexpected results. Justifying such costly exercises is not easy, and it may deter analysts from trying out new and risky options or POCs.
Security, Talent gaps and Collaboration
With multiple analysts and multiple tools handling data, there is bound to be a security risk. Sensitive data also poses a huge legal risk from a privacy standpoint. Analysts must constantly ensure that they do not make the organization vulnerable to such problems.
Not all analysts have the same level of visibility into data, owing to seniority, departments, access controls and so on, and they may end up working in silos. This segregation of datasets and views makes it difficult to consolidate insights, and it requires collaboration mechanisms that are both secure and easy to use.
Emotional issues and Faulty hypotheses
Data analysts interface with multiple stakeholders, from IT teams who provide datasets to business heads who expect insights. Every stakeholder has their own take on the expected results, and usually a fair idea of what they want the data to say. This can pressure the analyst into telling them what they want to hear. Such emotional attachment and gut feeling on the part of stakeholders can be detrimental to the accuracy of results.
Knowing the expectations of the business, analysts frame plausible hypotheses, which untainted data may eventually disprove. But in a bid to salvage all the effort that went into a hypothesis, there is a tendency to stretch the data until the hypothesis appears proven. This is a classic problem of attachment and sunk costs, and it is extremely difficult to overcome.
While this article attempts to highlight the most prominent issues data analysts face today, it is by no means a complete list; it is an ongoing catalogue that will grow with more problem areas in due time. That said, a new and more efficient way of doing analytics is on the rise. New-age products solve many of the above issues, and they promise answers to the problems data analysts face today.
Stay tuned for more updates.