Follow my progress on Twitter.

Introduction

In this hobby project I build an end-to-end machine learning pipeline. The objective is to predict real estate prices in Belgium based on available data from ImmoWeb.

For each new listing give a label by comparing the predicted price to the actual price:

  • undervalued: 10% lower
  • correctly valued: within 10%
  • overvalued: 10% higher

Major components

  • web scraping engine
  • data pipeline
  • data lake and warehouse
  • machine learning training and prediction pipeline

Goals

By building this project I hope to:

  • become proficient at GCP and practice for the Professional Data Engineer Certification
  • build a framework for future SaaS-products
  • learn how to document and share my work

Progress thread

Week 1

3 July 2021

This week was mostly about setting up the various tools to keep track of the project:

  • Created Twitter Twitter
  • Outsourced web scraping engine through Fiverr
  • Created GitHub repo
  • Created Confluence page to document the project
  • Created a Jira project to keep track of work through tickets