open to work

Gabriel Henrique

~/.gabriel

$ cat about.txt

Data Engineer building the infrastructure that transforms raw data into revenue.

Architect of CRM pipelines supporting $100M+ in monthly revenue and Data Owner of the Data Catalog (3,000+ users/month).

$ echo $STACK

→ Python · SQL · Airflow · Databricks · Azure · Spark

$ echo $LOCATION

São Paulo, SP · Brazil

$ _

15+ Projects delivered ◆ 500K+ Lines of code ◆ 10TB+ Data processed ◆ 98% Pipeline uptime ◆ 3+ Years of experience

Ownership & Pipelines

I don't just build pipelines: I own data products end-to-end, prioritizing reliability, performance, and data quality so that the data has a real impact on business decisions.

Spark · Databricks · Azure · Airflow

Automation & Impact

I designed and delivered automation solutions and web crawlers that unlock real business value, not just technical wins, aligning technology with commercial strategy.

Python · Selenium · Scrapy · FastAPI

Performance & SQL

I tune legacy queries and ELT/ETL pipelines hands-on, cutting execution times by more than 80% for workloads that serve 3,000+ users.

SQL · Python · SAS · PostgreSQL

Python 95%

SQL (Spark SQL, Data Warehousing, PostgreSQL) 93%

Azure Databricks 90%

Apache Spark 88%

Azure (Cloud) 88%

Apache Airflow 87%

SAS 85%

FastAPI / Flask / Node.js 82%

Web Scraping (Selenium, Scrapy, BS4) 85%

Git & GitHub 88%

📂 db_gabriel

▸ gabriel

name TEXT age INT city TEXT state TEXT email TEXT github TEXT available BOOL

▸ family

relation TEXT support TEXT knows_it TEXT description TEXT

▸ hobbies

name TEXT category TEXT weekly_freq INT dedication_level TEXT mental_note TEXT

▸ education

institution TEXT course TEXT type TEXT start_year INT end_year INT status TEXT

▸ certifications

title TEXT issuer TEXT year INT impact TEXT

▸ skills

name TEXT level INT category TEXT years_exp INT

▸ soft_skills

skill TEXT context TEXT

▸ tools

name TEXT category TEXT level TEXT daily_use BOOL

▸ experience

role TEXT company TEXT start_year INT end_year TEXT description TEXT

▸ achievements

milestone TEXT year TEXT business_impact TEXT

gabriel_db=#

-- 👹 Welcome to my personal database!

-- Try running a query. Examples:

-- SELECT * FROM gabriel;

-- SELECT name, level FROM skills WHERE level > 85;

-- SELECT * FROM education ORDER BY start_year DESC;

-- SELECT role, company FROM experience;

-- SHOW TABLES;
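The playground's `skills` table and the second example query can be reproduced with Python's built-in `sqlite3` module. This is a sketch under assumptions: it uses only the column list shown for `skills`, fills `name` and `level` with the percentages from the Stack list above (the playground's real contents may differ), leaves `category` and `years_exp` empty, and adds an `ORDER BY` that the original example does not have, so the output order is deterministic.

```python
import sqlite3

# Assumption: rows mirror the Stack percentages on this page; the real
# playground data may differ. category and years_exp are left NULL.
SKILLS = [
    ("Python", 95),
    ("SQL (Spark SQL, Data Warehousing, PostgreSQL)", 93),
    ("Azure Databricks", 90),
    ("Apache Spark", 88),
    ("Azure (Cloud)", 88),
    ("Apache Airflow", 87),
    ("SAS", 85),
    ("FastAPI / Flask / Node.js", 82),
    ("Web Scraping (Selenium, Scrapy, BS4)", 85),
    ("Git & GitHub", 88),
]

conn = sqlite3.connect(":memory:")
# Column list taken from the "skills" table shown in the schema browser.
conn.execute(
    "CREATE TABLE skills (name TEXT, level INT, category TEXT, years_exp INT)"
)
conn.executemany("INSERT INTO skills (name, level) VALUES (?, ?)", SKILLS)

# The playground's example filter, plus an ORDER BY for a stable result.
rows = conn.execute(
    "SELECT name, level FROM skills WHERE level > 85 ORDER BY level DESC, name"
).fetchall()
conn.close()

for name, level in rows:
    print(f"{level:>3}  {name}")
```

Because the filter is a strict `>`, the two skills rated exactly 85% (SAS and Web Scraping) are excluded; `>=` would include them.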


CVM-210 Data Pipeline

End-to-end analytics pipeline for CVM 210 investment funds: serverless ingestion via AWS Lambda, storage in an S3 data lake, and distributed processing in Databricks using the Medallion Architecture (Bronze→Silver→Gold).

🎯 Highlight: Automated Medallion Architecture (AWS + Databricks) processing restricted CVM 210 data with Delta Lake (ACID).

AWS · Databricks · PySpark · S3

GitHub ↗
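The Bronze→Silver→Gold flow described above can be illustrated with a small, in-memory Python sketch. It is not the project's code: in the described pipeline the layers are Delta Lake tables processed with PySpark in Databricks, whereas here each layer is a plain list or dict, and the semicolon-separated input and the field names `fund_id`, `ref_date` and `net_worth` are hypothetical, not the actual CVM 210 schema. What it shows is the contract between layers: Bronze keeps raw rows untouched, Silver parses, validates and de-duplicates, and Gold aggregates for consumption.

```python
from collections import defaultdict
from datetime import date

# Hypothetical record layout "fund_id;ref_date;net_worth" (not the CVM 210 schema).


def to_bronze(raw_lines: list[str]) -> list[dict]:
    """Bronze: land raw rows as-is, only tagging each with its source line."""
    return [{"raw": line, "line_no": i} for i, line in enumerate(raw_lines)]


def to_silver(bronze: list[dict]) -> list[dict]:
    """Silver: parse, type-cast, drop malformed rows and de-duplicate."""
    seen = set()
    silver = []
    for row in bronze:
        parts = row["raw"].split(";")
        if len(parts) != 3:
            continue  # malformed row: wrong number of fields
        fund_id, ref_date, net_worth = (p.strip() for p in parts)
        try:
            record = {
                "fund_id": fund_id,
                "ref_date": date.fromisoformat(ref_date),
                "net_worth": float(net_worth),
            }
        except ValueError:
            continue  # unparseable date or number
        key = (record["fund_id"], record["ref_date"])
        if key in seen:
            continue  # keep only the first row per (fund, date)
        seen.add(key)
        silver.append(record)
    return silver


def to_gold(silver: list[dict]) -> dict[str, float]:
    """Gold: business-level aggregate, here the average net worth per fund."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for rec in silver:
        totals[rec["fund_id"]] += rec["net_worth"]
        counts[rec["fund_id"]] += 1
    return {fund: totals[fund] / counts[fund] for fund in totals}


raw = [
    "F1;2024-01-02;100.0",
    "F1;2024-01-03;110.0",
    "F1;2024-01-03;110.0",  # duplicate, dropped in Silver
    "F2;2024-01-02;50.0",
    "F2;not-a-date;60.0",   # bad date, dropped in Silver
    "broken line",          # wrong field count, dropped in Silver
]
gold = to_gold(to_silver(to_bronze(raw)))
print(gold)  # {'F1': 105.0, 'F2': 50.0}
```

Keeping Bronze untouched means Silver and Gold can always be rebuilt from the raw data if a cleaning rule changes.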

PNAD COVID Data Engineering

Comprehensive AWS data lake with Medallion Architecture. Processed 1.1M records via Glue (PySpark), ran analytical queries in Athena, and built Power BI dashboards on the impact of COVID-19 in Brazil.

🎯 Highlight: Processed 340+ MB of raw data (1.1M+ records) with AWS Glue (PySpark) and Athena.

AWS Glue · PySpark · Athena · Power BI

GitHub ↗

Obesity Prediction

Predictive ML model for preventive health: it classifies obesity risk from behavioral patterns (diet, physical activity, transportation) rather than traditional anthropometric metrics.

🎯 Metric: End-to-end pipeline with 87% accuracy (Random Forest) and an interactive web interface built with Streamlit.

Python · Scikit-learn · Pandas · ML

GitHub ↗

Ibovespa Forecasting System

Machine Learning system to predict the daily direction of the Ibovespa. Complete pipeline with advanced feature engineering, automated hyperparameter optimization and rigorous time-series holdout validation.

🎯 Metric: 75.76% accuracy and automated optimization via Optuna with temporal holdout validation.

Python · ML · Feature Eng. · Forecasting

GitHub ↗
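The temporal holdout validation mentioned above can be sketched in plain Python. This is a hypothetical illustration, not the project's code: the prices are toy values, the helper names are made up, and a majority-class baseline stands in for the actual model (the Optuna tuning step is not shown). What it shows is the key rule: the split is chronological, with the most recent days held out, so no future information leaks into training.

```python
def direction_labels(closes: list[float]) -> list[int]:
    """1 if the next close is higher than today's close, else 0.

    The last day has no next close, so it gets no label.
    """
    return [1 if closes[i + 1] > closes[i] else 0 for i in range(len(closes) - 1)]


def temporal_holdout(rows: list, test_size: int) -> tuple[list, list]:
    """Chronological split: oldest rows for training, newest rows for testing.

    Nothing is shuffled, so the training set never contains future days.
    """
    if not 0 < test_size < len(rows):
        raise ValueError("test_size must be between 1 and len(rows) - 1")
    cut = len(rows) - test_size
    return rows[:cut], rows[cut:]


def majority_baseline(train_labels: list[int]) -> int:
    """Naive baseline: always predict the class most frequent in training."""
    return 1 if sum(train_labels) * 2 >= len(train_labels) else 0


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


closes = [100, 101, 99, 102, 103, 101, 104, 105, 103, 106, 107]  # toy prices
labels = direction_labels(closes)            # [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
train, test = temporal_holdout(labels, test_size=3)
guess = majority_baseline(train)             # 1: "up" dominates the train set
print(accuracy(test, [guess] * len(test)))   # 0.6666666666666666
```

A naive baseline like this is a useful reference point when reading a directional accuracy figure such as 75.76%, since it shows how much of a score class imbalance alone can produce.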

Loan Default Prediction

Predictive model (Random Forest) for loan default propensity, with an interactive analytical dashboard in Dash/Plotly covering risk by state/region, key default metrics, and credit portfolio analysis.

🎯 Metric: 72% accuracy and an interactive analytical dashboard for corporate decision-making.

Python · Random Forest · Dash · Plotly

GitHub ↗

Azure Fundamentals (AZ-900)

Microsoft

2026


Aspire Leaders Program

Aspire Institute (Harvard Business School)

2025

Airflow 3 DAG Authoring

Astronomer

2025

Airflow 3 Fundamentals

Astronomer

2025

Scrum Foundation

Certiprof

2023

Python — Nano Course (80h)

FIAP

2022

2023 — present

Data Governance / Data Engineer

Bradesco

I architect and manage the core data pipelines of a CRM that supports $100M+ in monthly revenue, and I act as Data Owner of the internal Data Catalog (3,000+ users/month). I optimized SQL/ELT processes, cutting execution times by more than 80%, and built automations with high business impact.

2024 — 2026

Data Engineer (Consulting)

Confidential (NDA)

Independently developed ETL pipelines processing millions of records per day. Built pipelines and automations with Python, dbt, and DuckDB, and created frameworks to accelerate development, also coding in Node.js, Java, and Scala.

2022 — 2023

Data Engineer / Software Engineer

Keyrus · Internship

Developed ETL/ELT pipelines and features for a corporate data catalog. Implemented a business glossary and MySQL integrations, and refactored legacy SQL, improving performance by more than 50%.

2026 — present

MBA People and Technology Management

FIA Business School

Technical leadership and corporate vision, focused on connecting data engineering to business goals and managing the intersection of teams, technology, and financial results.

2025 — 2026

Graduate Studies in Data Analytics

FIAP

Advanced analytics and modeling: the link that ensures the technical data infrastructure supports precise, value-driven decisions.

2022 — 2024

Systems Analysis & Development

FATEC-SP

The cornerstone of my vision as a Data Engineer. Rigorous foundations in system architecture, software engineering, and modeling to build robust and scalable infrastructures.

Or find me here:

GitHub ↗ LinkedIn ↗ Dev.to ↗ Medium ↗ X (Twitter) ↗


Let's work together.
