Machine Learning for Data Engineers: The Patterns I Actually Used Across 7 Projects
Data engineers are not supposed to be machine learning engineers. But at some point every serious data engineering pipeline ends with a question the data alone cannot answer, and you end up building a model.
Over the past six months I've shipped seven ML-driven projects: price prediction on used Japanese cars, health outcome modelling across 53 African countries, 109 time-series forecasts for 15 African development indicators, financial news sentiment analysis, semantic job search with vector embeddings, inflation forecasting for the East African Community, and crop yield projections for East Africa. None of them were data science projects in the traditional sense. They were data engineering projects where the final step was a model instead of a dashboard.
This article is about what the ML stack actually looks like when a data engineer builds it, what each tool is genuinely good for, and the specific gotchas I hit in production that the documentation does not warn you about.
The Core Stack
