🎯
Focusing

Organizations

@shakacode

MEDHAT-ALHADDAD/README.md

Hey, I’m Medhat 👋

Data Engineer • Big Data • Curious Builder

📍 Riyadh | 🌍 Explorer at heart
🧠 Turning data into something useful
🎮 Strategy games enthusiast


About Me

I’m a Data Engineer with a software engineering past.
I enjoy working where systems, data, and logic meet.

These days I spend most of my time:

  • moving data reliably (and fixing it when it breaks),
  • tuning Spark jobs that refuse to behave,
  • designing data models that actually make sense,
  • keeping big data platforms secure, fast, and boring (boring is good).

I like understanding things deeply — not just using them.


What I Play With

(tools come after thinking)

🧠 Core Data Engineering

  • 🐍 Python
  • 🧮 SQL
  • Apache Spark / PySpark
  • 🛫 Apache Airflow
  • 🐘 Apache Hive
  • 🚀 Apache Impala
  • 📬 Apache Kafka
  • 🌊 Apache Flink
  • 🔁 Batch & Streaming Pipelines

📊 Data Modeling & Analytics

  • 🗂 Dimensional Modeling (Star / Snowflake)
  • 🥇 Medallion Architecture (Bronze / Silver / Gold)
  • 🔄 ETL / ELT Design Patterns
  • 📈 Analytics-Ready Datasets
  • Data Quality & Validation

🏗️ Data Platforms & Cloud

  • 🧱 Cloudera CDP (HDFS, YARN, Hive, Spark)
  • 🧠 Databricks
  • ☁️ Google Cloud Platform (BigQuery)
  • ☁️ AWS
  • 🧩 Informatica (DEI / EDC)
  • 🔍 Denodo

⚙️ Platform Ops, Security & Reality

  • 🐧 Linux (RHEL / CentOS)
  • 📦 Docker
  • 🔧 CI/CD Pipelines
  • 🛡 Apache Ranger
  • 🧭 Apache Atlas
  • 🔐 Kerberos Authentication
  • 🔒 TLS / SSL
  • 🔥 Spark & SQL Performance Tuning
  • 🧯 Production Incident Handling

🚀 Actively Exploring / Next

  • 🧊 Lakehouse Formats (Iceberg / Delta / Hudi)
  • 🔎 Data Observability
  • 🧱 Infrastructure as Code (Terraform)
  • ☸️ Kubernetes for Data Platforms
  • 🧠 Distributed Systems Internals

⭐ Pinned Projects (Things I Actually Built)

A small selection of projects I enjoyed working on — some are analytical, some technical, all taught me something.


📚 Data Projects Catalogue

My personal data playground & archive.

A curated collection of my work in:

  • Exploratory Data Analysis
  • SQL case studies
  • Business analytics
  • Machine learning experiments

🔗 Explore here → 👉 https://github.com/MEDHAT-ALHADDAD/Data-Projects-Catalogue

This is where most of my data experiments live.


🔥 Real-Time Social Media Sentiment Pipeline

Streaming-first sentiment analysis pipeline (Arabic / English).

  • Bronze → Silver → Gold architecture
  • Batch + real-time processing
  • Feature extraction for training & inference
  • Production-inspired ML data platform design
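To make the Bronze → Silver → Gold idea concrete, here is a minimal pure-Python sketch. It is not the project's actual code: the field names (`id`, `lang`, `text`, `score`), the sample rows, and the helper names `to_silver` and `to_gold` are all assumptions for illustration, and the `score` column stands in for whatever a real sentiment model would produce.

```python
from collections import defaultdict

# Bronze: raw events as they arrive (hypothetical rows and field names).
bronze = [
    {"id": "1", "lang": "EN", "text": "  Great service!  ", "score": "0.9"},
    {"id": "2", "lang": "ar", "text": "خدمة سيئة", "score": "-0.7"},
    {"id": "2", "lang": "ar", "text": "خدمة سيئة", "score": "-0.7"},  # duplicate id
    {"id": "3", "lang": "en", "text": "", "score": "0.1"},            # empty text
]


def to_silver(rows):
    """Silver: trim text, normalize language codes, cast scores,
    and drop empty or duplicate records."""
    seen, cleaned = set(), []
    for row in rows:
        text = row["text"].strip()
        if not text or row["id"] in seen:
            continue
        seen.add(row["id"])
        cleaned.append({
            "id": row["id"],
            "lang": row["lang"].lower(),
            "text": text,
            "score": float(row["score"]),
        })
    return cleaned


def to_gold(rows):
    """Gold: an analytics-ready aggregate, here the mean score per language."""
    scores = defaultdict(list)
    for row in rows:
        scores[row["lang"]].append(row["score"])
    return {lang: sum(values) / len(values) for lang, values in scores.items()}


silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

In a real deployment each layer would likely be a table written by Spark or a streaming job rather than an in-memory list; the point of the sketch is only the separation of raw, cleaned, and aggregated data.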

📊 Global Terrorism — EDA & Dashboard

Turning complex global data into readable insights.

  • Real-world dataset
  • Strong storytelling focus
  • Insight-driven visualizations

🔗 https://github.com/MEDHAT-ALHADDAD/Global-Terrorism---Exploratory-Data-Analysis-and-Dashboarding


🍕 Pizza Runner — SQL Case Study

A fun but serious SQL project.

  • Business-driven questions
  • KPI-oriented thinking
  • Clean analytical SQL

🔗 https://github.com/MEDHAT-ALHADDAD/Pizza_Runner


🛍️ Super Store Retail — EDA

Sales, profit, and performance analysis.

  • Practical business insights
  • Dashboard-ready outputs
  • Clear analytical reasoning

🔗 https://github.com/MEDHAT-ALHADDAD/Super-Store-Retail-Exploratory-Data-Analysis-and-Dashboarding


More projects live in the catalogue — this is just the highlight reel.


🧯 Data Incidents I Survived

(the real résumé)

Every data engineer has scars. These are some of mine.

  • 🔥 Warehouse deleted in production
    Recovered by prioritizing dependencies, restoring from DR, reprocessing Spark jobs, and backfilling critical tables — reports delivered the same day.

  • 🐌 Queries that never returned
    Tracked down Hive small-files issues, fixed compaction & storage layout, and restored query reliability.

  • Pipelines finishing at 4 PM (not acceptable)
    Optimized Airflow concurrency, Spark/YARN resources, and moved to event-driven DAGs → pipelines completed by 6 AM.

  • 🌪️ Full production ownership during team absence
    Ran Airflow, Spark, and Informatica pipelines solo for weeks — zero downtime, multiple incidents resolved.

  • 🔁 Replication stuck at 70% forever
    Diagnosed platform issues, tuned jobs, and stabilized cross-cluster replication to 100%.
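As an illustration of the small-files incident above, the planning step of a compaction can be sketched in plain Python. This is a simplified assumption of how files might be grouped, not the fix that was actually applied: the function name `plan_compaction`, the 128 MB target, and the sample file sizes are all hypothetical, and the actual rewrite of each batch would be left to the engine (Hive or Spark).

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedily group files, in the given order, into batches whose
    combined size stays at or below target_mb where possible.
    A single file larger than target_mb gets a batch of its own."""
    batches, current, current_size = [], [], 0
    for name, size in file_sizes_mb:
        # Close the current batch if adding this file would overshoot the target.
        if current and current_size + size > target_mb:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical partition contents: (file name, size in MB).
files = [("part-0", 40), ("part-1", 50), ("part-2", 60), ("part-3", 10), ("part-4", 130)]
plan = plan_compaction(files)
print(plan)
```

Each inner list is one batch of input files that would be rewritten as a single larger file, which reduces the number of files a query has to open.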


A Bit of My Story

  • Started as a full-stack & mobile developer
  • Fell in love with data & analysis
  • Ended up in big data platforms & banking systems
  • Slowly moving toward data architecture & system design

I still enjoy clean code, good abstractions, and well-designed systems — just at data scale now.


When I’m Not Working With Data

  • 🎮 Strategy games (Age of Empires, grand strategy, anything with thinking)
  • 📚 Learning how distributed systems really work
  • ✍️ Organizing knowledge (Notion, notes, diagrams)
  • ☕ Over-engineering simple things for fun

Find Me Around the Internet

     



This profile is a snapshot of how I think, build, break, and fix systems.

Pinned

  1. Data-Projects-Catalogue (Public)
     A Full Catalogue Of All My Projects

  2. Sentment_analysis_protoype (Public)
     Python

  3. Global-Terrorism---Exploratory-Data-Analysis-and-Dashboarding (Public)
     Jupyter Notebook

  4. Pizza_Runner (Public)
     Case Study SQL Reporting
     Jupyter Notebook

  5. Super-Store-Retail-Exploratory-Data-Analysis-and-Dashboarding (Public)

  6. Predicting-Credit-Card-Approvals (Public)
     An automatic credit card approval predictor using machine learning classifiers
     Jupyter Notebook