TATA Steel creates a single source of truth for its data using data lake technology

TATA Steel, Europe's second-largest steel producer, has collaborated with Xomnia to create a data lake. The data lake, created in the Databricks environment, serves as the single source of truth (SSOT) for TATA Steel, where all the manufacturing data collected in its factories is stored consistently and securely.

With the data lake, data collected at any factory in Europe can be accessed and analyzed immediately. This allows the steel manufacturer to conduct data analyses at scale and in a more timely manner. Data can also be shared more securely, since a data governance team oversees who has access to which data, and any update made to any of the databases is visible to everyone.


TATA Steel owns multiple factories across Europe. Its advanced analytics teams in the Netherlands work with data to improve the manufacturing processes in the factories, detect anomalies, develop new products, and conduct BI/reporting. They use data that originates in aggregated and controlled business warehouses, measurements, public data, and financial/ERP applications.

This data, however, used to be scattered across the locations where it was collected. This posed a challenge to TATA Steel’s advanced analytics team, which didn’t have a consistent way to request and access the data. In addition, because there was no single point where all manufacturing data across the company could be accessed, different locations and teams made their own copies of the data they needed. As a result, different versions of the same database were being used and updated across the company.

To overcome this challenge, the steel producer reached out to Xomnia to assist its data teams in building a centralized data lake, where all the manufacturing data is collected and stored in the cloud. In addition, Xomnia’s consultants assisted in the deployment of tools to improve the scalability of the data flow.


Using the Databricks lakehouse architecture, we created a data lake and connected it to different databases. Data is ingested into the data lake in batches, and in the future it will also be ingested as streaming data. An intermediary data team known as the Data Governance Team (DGT) decides who can access which data, and makes sure that the data meets high quality standards and is consistently organized.
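To make the idea of batch ingestion concrete, here is a minimal sketch in Python. It is not the pipeline built at TATA Steel and does not use any Databricks API: an in-memory SQLite database stands in for one source database, a plain Python list stands in for a table in the data lake, and the table name, column names, and sample rows are all hypothetical.

```python
import sqlite3
from typing import Iterator, List, Tuple

Row = Tuple[int, str, float]


def read_in_batches(conn: sqlite3.Connection, query: str,
                    batch_size: int) -> Iterator[List[Row]]:
    """Yield the query result in chunks of at most batch_size rows."""
    cursor = conn.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch


def ingest(conn: sqlite3.Connection, lake_table: List[Row],
           batch_size: int = 2) -> int:
    """Append all source rows to the lake table; return the batch count."""
    query = "SELECT id, sensor, value FROM measurements ORDER BY id"
    batches = 0
    for batch in read_in_batches(conn, query, batch_size):
        lake_table.extend(batch)  # one write to the lake per batch
        batches += 1
    return batches


# Hypothetical source database with a handful of sensor measurements.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE measurements (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)"
)
source.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [
        (1, "furnace_temp", 1510.5),
        (2, "furnace_temp", 1512.0),
        (3, "roll_speed", 4.2),
        (4, "roll_speed", 4.3),
        (5, "coil_weight", 21.7),
    ],
)

lake_table: List[Row] = []  # stand-in for a table in the data lake
batches_written = ingest(source, lake_table, batch_size=2)
print(f"Ingested {len(lake_table)} rows in {batches_written} batches")
```

Reading in fixed-size chunks keeps memory use bounded regardless of how large the source table is, which is the main reason to ingest in batches rather than loading a whole database at once.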

To ensure that the data is secure, data within the data lake is divided into different restriction levels, depending on how sensitive it is. We created different ‘buckets’ where different data is stored. The DGT connects groups of employees to these buckets, and the members of a group can access the data available in its bucket. Those who want access to a bucket that they are not connected to have to request it from the owner. If the owner approves, the data team gives the users permission to access the bucket.
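The bucket model described above can be sketched in a few lines of Python. This is only an illustration of the access rules, not the governance tooling used at TATA Steel: the class names, method names, bucket name, group name, and user names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Bucket:
    name: str
    owner: str
    groups: Set[str] = field(default_factory=set)       # groups connected by the DGT
    extra_users: Set[str] = field(default_factory=set)  # users granted after approval


class Governance:
    """Toy model of bucket access: group membership plus owner-approved requests."""

    def __init__(self) -> None:
        self.buckets: Dict[str, Bucket] = {}
        self.memberships: Dict[str, Set[str]] = {}  # user -> groups
        self.pending: Set[Tuple[str, str]] = set()  # (user, bucket name)

    def add_bucket(self, name: str, owner: str) -> None:
        self.buckets[name] = Bucket(name=name, owner=owner)

    def add_member(self, user: str, group: str) -> None:
        self.memberships.setdefault(user, set()).add(group)

    def connect_group(self, group: str, bucket_name: str) -> None:
        self.buckets[bucket_name].groups.add(group)

    def can_access(self, user: str, bucket_name: str) -> bool:
        bucket = self.buckets[bucket_name]
        in_connected_group = bool(self.memberships.get(user, set()) & bucket.groups)
        return in_connected_group or user in bucket.extra_users

    def request_access(self, user: str, bucket_name: str) -> None:
        self.pending.add((user, bucket_name))

    def approve(self, approver: str, user: str, bucket_name: str) -> bool:
        """Grant access only if the request exists and the approver owns the bucket."""
        bucket = self.buckets[bucket_name]
        if (user, bucket_name) not in self.pending or approver != bucket.owner:
            return False
        self.pending.discard((user, bucket_name))
        bucket.extra_users.add(user)
        return True


gov = Governance()
gov.add_bucket("furnace_measurements", owner="alice")
gov.add_member("bob", "process_analytics")
gov.connect_group("process_analytics", "furnace_measurements")

bob_before = gov.can_access("bob", "furnace_measurements")      # member of a connected group
carol_before = gov.can_access("carol", "furnace_measurements")  # not connected

gov.request_access("carol", "furnace_measurements")
rejected = gov.approve("bob", "carol", "furnace_measurements")    # bob is not the owner
approved = gov.approve("alice", "carol", "furnace_measurements")  # the owner approves
carol_after = gov.can_access("carol", "furnace_measurements")
```

In this sketch, access through a group and access through an approved request are tracked separately, so removing a group from a bucket does not silently revoke individually approved users.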

In addition, Databricks can also be used to build logic, code, and models to analyze the data.


The data lake is already in use, and the data team at TATA Steel oversees its operation and maintenance. Using an internally developed data governance app, the advanced analytics team can request that a piece of data be ingested into the data lake. Each request has to go through a clearance process.

With the data lake, TATA Steel has a single source of truth in which its data is uniform across the company. The data is more easily accessed through the Databricks environment, so business analytics teams can immediately start using it to conduct analyses, build models, and write code. On top of this, the data lake makes it possible to monitor who uses which data and for what purpose, which helps prevent potential data leaks.