ETL and ELT are necessary in data science because information sources—whether they use a structured SQL database or an unstructured NoSQL database—will rarely use the same or compatible formats.
Therefore, you have to clean, enrich, and transform your data sources before integrating them into an analyzable whole. That way, your business intelligence platform (like Looker, Chartio, Tableau, or QuickSight) can understand the data to derive insights from it.
Regardless of whether it's ETL or ELT, the data transformation/integration process involves the following three steps:
Extract: Extraction refers to pulling the source data from the original database or data source. With ETL, the data goes into a temporary staging area. With ELT, it goes directly into a data lake storage system.
Transform: Transformation refers to the process of changing the structure of the information so that it integrates with the target data system and the rest of the data in that system.
Load: Loading refers to the process of depositing the information into a data storage system.
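The three steps above can be sketched as a tiny pipeline. Everything in this example is illustrative: the function names, the in-memory "source" records, and the list standing in for the target data store are assumptions, not part of any specific tool.

```python
# Minimal ETL sketch: extract raw records, transform them to the target
# schema, then load them into a target store (a plain list here).

def extract():
    # Pull raw records from the "source" — an in-memory stand-in
    # for a database or API.
    return [
        {"name": "  Alice ", "amount": "10.50"},
        {"name": "Bob", "amount": "3"},
    ]

def transform(rows):
    # Restructure each record so it matches the target schema:
    # trim whitespace and convert amount strings to floats.
    return [
        {"name": r["name"].strip(), "amount": float(r["amount"])}
        for r in rows
    ]

def load(rows, target):
    # Deposit the transformed records into the target storage system.
    target.extend(rows)

warehouse = []  # stand-in for the target data store
load(transform(extract()), warehouse)
print(warehouse)
```

In an ELT pipeline the same steps would run in a different order: `load` would deposit the raw extracted records first, and `transform` would then run inside the target system itself.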
To make this happen with as little hand-coding as possible, this hands-on lab training covers Pentaho tools for ETL, along with NiFi and StreamSets, for use-case implementation.
ETL and ELT Fundamental Concepts
When and Why to Use
Architecture Overview
Hands-on Lab with the Tools