Jeremiah Wangaruro

jeremiah.ruro@gmail.com | linkedin.com/in/jeremiahwangaruro | jeremiahruro.com | github.com/jeremiah-wa | Melbourne, VIC
Professional Summary
Data Engineer with 5+ years building production data platforms, ML pipelines, and Python tooling across AWS, GCP, and Azure. Background spans the full ML lifecycle: data extraction (Lambda, Airbyte CDC), model training and deployment (SageMaker, Scikit-Learn, TensorFlow, CodePipeline), and production orchestration (Airflow, dbt). Built internal CLI frameworks, federated Data Mesh governance, and ABAC access control systems.
Core Competencies
Data Engineering | ML Pipelines & MLOps | Apache Airflow (MWAA) | dbt Core | Airbyte CDC | AWS SageMaker | Data Mesh Architecture | Python (Typer · Pydantic · FastAPI) | SQL (Redshift · BigQuery · T-SQL) | AWS / GCP / Azure | Docker | Great Expectations | Power BI & DAX | ABAC / IAM Policies
Work Experience
aPriori Technologies Nov 2024 – Mar 2026
Data Engineer • Belfast, Northern Ireland
  • Source AI: Contributed to the architecture of Source AI, an ML-powered negotiation assistant that uses LLMs to automate sourcing workflows. Replaced Lambda-based ingestion jobs with Airbyte CDC, reducing ingestion latency and removing bottlenecks in the data feeding ML workloads.
  • GCP → AWS Migration: Migrated the data platform from GCP to AWS, contributing to technology selection and implementation while maintaining feature parity and optimising for cost and performance. Designed and implemented tag-based ABAC IAM policies to enforce least-privilege access. Built custom dbt macros for JSON flattening and Redshift SUPER column handling.
  • Data Mesh Platform: Architected a federated Data Mesh platform enabling domain-driven data ownership with automated metadata management and consistent governance. Created data-platform-cli, a Typer-based Python CLI standardising Data Product scaffolding, deployment, and integration with dbt and Airflow. Integrated Great Expectations into dbt pipelines for continuous data validation and contract testing.
Stack: Python · SQL · dbt · Airflow (MWAA) · AWS (Redshift, S3, Lambda, ECS) · GCP (BigQuery, Cloud Composer) · Airbyte · Great Expectations · Docker
Civica Feb 2022 – Mar 2024
Data Scientist • Belfast, Northern Ireland
  • Tourism Ireland: Extracted and transformed customer engagement data from Dynamics 365 via Azure Synapse Analytics. Designed interactive Power BI dashboards and strategic BI reports that informed €2M+ marketing budget allocation.
  • NI Appeals Service: Refactored legacy SSAS tabular models with data partitioning and query optimisation, reducing report load times. Developed advanced DAX measures and Power BI reports enabling self-service analytics for case management. Managed SSAS and SSRS infrastructure including automated refresh schedules.
Stack: Azure Synapse Analytics · Power BI · T-SQL · DAX · SSAS · SSRS · Power Query · SQL Server · Dynamics 365
Sentireal Nov 2020 – Feb 2022
Data Scientist • Belfast, Northern Ireland
  • InterTradeIreland Co-Innovate: Designed a serverless data extraction pipeline using AWS Lambda for cost-efficient API integration. Conducted EDA and feature engineering in SageMaker notebooks. Developed and deployed an end-to-end ML pipeline with SageMaker and CodePipeline for automated model training, evaluation, and deployment. Trained predictive models using Scikit-Learn and TensorFlow to assess VR training effectiveness.
Stack: Python · Scikit-Learn · TensorFlow · Docker · AWS (Lambda, SageMaker, Cognito, API Gateway, CodePipeline, CodeDeploy)
Publications & Projects
Recommender Systems in Virtual Learning Environments (Publication)
Investigated the application of conventional recommendation models in virtual learning environments. Published in AI Journal.
aijourn.com/recommender-systems-in-virtual-learning-environments
Multi-criteria Recommender Systems
Comprehensive comparative study investigating the effectiveness of multi-criteria ratings on recommendation quality.
Python · Scikit-Learn · Pandas · SciPy · SQL
Education
BSc Computer Science with Data Science  ·  University College Dublin 2016 – 2020
Second Class Honours, Grade 1 (2:1)  ·  GPA: 3.43 / 4.2  ·  Relevant coursework: Data Science in Python, Programming for Big Data, Probability & Statistical Analysis, Machine Learning, Introduction to Artificial Intelligence
Technical Skills
ML & Model Deployment: AWS SageMaker (notebooks, training, endpoint deployment), Scikit-Learn, TensorFlow, predictive modelling, feature engineering, ML pipeline CI/CD (CodePipeline/CodeDeploy)
Languages & Frameworks: Python (Typer, Pydantic, FastAPI), SQL (PostgreSQL, Redshift, BigQuery, T-SQL), JavaScript/TypeScript, dbt Core, DAX, Power Query M, Bash
Orchestration & Pipelines: Apache Airflow (MWAA, Cloud Composer), Airbyte CDC, AWS CodePipeline, DAG-based workflow design, retry logic, idempotent execution patterns
Cloud & Infrastructure: AWS (S3, Redshift, Lambda, ECS, MWAA, SageMaker, Cognito, API Gateway, CodePipeline), GCP (BigQuery, Cloud Composer, Cloud Build), Azure (Synapse, Data Factory, SSAS, SSRS), Docker, Git, Jenkins
Data & Governance: Data Mesh architecture, medallion architecture, Great Expectations, dbt, data contracts, ABAC IAM policies, federated governance, metadata management, data modelling, data integration patterns (CDC, ETL/ELT)
BI & Analytics: Power BI Desktop/Service, DAX, SSAS, SSRS, data visualisation, self-service reporting