The skills of data scientists and data engineers may overlap to some extent, but the roles are quite distinct. Data scientists focus on effectively analysing data to identify patterns, derive insights and define potentially useful data products, while data engineers actually build those products and the underlying data foundation required for those products.
Data scientists work in a lab environment, where they have the flexibility to explore data however they wish. Data engineers, on the other hand, work in highly structured and governed environments with defined tasks and objectives.
Data scientists need to be creative. They need to have a deep understanding of the business. They need to continuously explore and experiment. They need to build business cases and effectively communicate data insights and the business value of their findings. They spend a lot of time mixing new data acquired through new channels with existing curated data within the organization to build new insights and business cases. They typically build algorithms to validate those business cases, at least provisionally, on test datasets.
Data engineers are technical experts in specific tasks and need to adhere to architectural standards, timelines and implementation methodologies. Data engineers work in teams to build strong curated data foundations and manage the associated processes to ensure data quality, integrity and governance. In large organizations, this can incur huge expenses due to the complexities involved in managing legacy data, but without a strong data foundation, the work of data scientists may not yield useful results.
Data scientists are typically skilled in open-source programming languages for data analysis, such as R and Python, while data engineers are skilled in the enterprise tools, packages, platforms and products defined by the chief enterprise architects within the organization. Data scientists typically report to the chief data officer, while data engineers report to the chief technology officer.