Big Data Preparation: A Seemingly Insurmountable Problem
Ask any analyst about the issues they face and you will invariably hear the same answer: creating charts is only part of the challenge; connecting to and using the right data is the bigger issue. Analysts almost never encounter a situation in which a single source of data can provide all the answers. They need to access data from a variety of sources, from internal data warehouses and CRM systems to cloud platforms and social media. But this data, residing in multiple sources, comes in disparate formats that are usually incompatible with one another. It has to be combined in the right format to establish the appropriate context and extract the right insights.
Getting access to these sources through IT is a long process, and analysts frequently have to wait in line for attention. Requesting access to external sources such as social media only complicates matters further, because IT then has to run a whole new integration cycle for those sources.
The challenge does not stop at gaining access to all the desired sources: analysts end up spending 90 percent of their time ensuring that this data is relevant, clean, and in the right format to create visual dashboards. That work includes joining datasets, enriching the data, pivoting it, and similar activities that make the data ready for analysis. Before analysts can combine data, they must also determine which of the available datasets are pertinent to the current project, i.e., data discovery, and build the right dataset to answer a specific business question. Because data today is far bigger and more intricate than ever before, preparing it for analysis is extremely time-consuming and technically complicated. As a result, analysts can spend only 10 percent of their time on actual analysis: truly understanding the data, determining which algorithms to apply, and digging up insights.
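To make these preparation steps concrete, here is a minimal sketch in plain Python of two of the activities mentioned above, joining and pivoting. The dataset names, fields, and values are invented purely for illustration and do not come from any particular tool:

```python
from collections import defaultdict

# Invented sample extracts: a CRM table and a transactions table
crm = [
    {"customer_id": 1, "region": "East"},
    {"customer_id": 2, "region": "West"},
]
transactions = [
    {"customer_id": 1, "quarter": "Q1", "revenue": 100.0},
    {"customer_id": 1, "quarter": "Q2", "revenue": 150.0},
    {"customer_id": 2, "quarter": "Q1", "revenue": 200.0},
]

# Join: enrich each transaction with the customer's region from the CRM extract
region_by_customer = {row["customer_id"]: row["region"] for row in crm}
enriched = [
    {**txn, "region": region_by_customer[txn["customer_id"]]}
    for txn in transactions
]

# Pivot: total revenue per (region, quarter) cell
pivot = defaultdict(float)
for row in enriched:
    pivot[(row["region"], row["quarter"])] += row["revenue"]

print(dict(pivot))
# → {('East', 'Q1'): 100.0, ('East', 'Q2'): 150.0, ('West', 'Q1'): 200.0}
```

Even this toy version hints at why the work is slow at scale: real extracts have mismatched keys, missing values, and incompatible schemas that must be reconciled before a join like this is even possible.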
While traditional data integration tools on the market offer features that carry out these activities, they are primarily meant for IT and are not easy to use. Additionally, most of them are not built to handle large volumes of raw data stored across multiple silos, which means that despite investment in these tools, the path from data to insights remains painfully slow.
A new league of self-service data preparation tools that claim to reduce IT dependency has emerged, but these tools typically offer pure-play data integration, profiling, and quality features, and depend on external tools for other analytics activities such as visualization. Self-service tools also raise bigger questions around data security and governance. Whether a company wants to prevent security breaches or simply confirm the authenticity of its data, self-service tools need to deliver governance to ensure successful projects. Tracking data lineage and maintaining records of data handling are elementary requirements, yet few tools offer these features.
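As a rough illustration of what minimal lineage tracking can look like, each transformation step can append an auditable record of its inputs, operation, and output. The record format and step names below are assumptions for the sketch, not the schema of any particular product:

```python
import datetime

lineage = []  # ordered audit trail of data-handling steps

def record_step(operation, inputs, output):
    """Append one lineage entry; UTC timestamps make the trail auditable."""
    lineage.append({
        "operation": operation,
        "inputs": inputs,
        "output": output,
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })

# Hypothetical preparation pipeline recording its own provenance
record_step("join", ["crm_extract", "transactions"], "enriched_transactions")
record_step("pivot", ["enriched_transactions"], "revenue_by_region_quarter")

# An auditor can later replay how a dataset was produced
for step in lineage:
    print(step["operation"], step["inputs"], "->", step["output"])
```

The point of the sketch is that lineage is cheap to capture at transformation time and nearly impossible to reconstruct afterwards, which is why it belongs in the tool rather than in analysts' memories.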
Data analysts today need a complete solution that offers ease of use and independence across data access, preparation, and visualization while adhering to the data governance standards established by IT. Such a tool can reverse the balance of time investment so that analysts spend 80 percent of their time analyzing data and only 20 percent preparing it.