Occupancy
Imperial College London
Data
2026
01/
Overview
For this project, I refactored the data ingestion layer of a real-time room occupancy tracking service, decoupling it from a single data source and making it pluggable without changing any downstream behaviour. The system processes raw Wi-Fi device location data every five minutes — smoothing positions, assigning devices to rooms, and aggregating counts into a live occupancy table. I replaced a hardcoded Splunk dependency with a source-agnostic interface, introducing the Unified Data Platform (UDP) as the primary ingestion path while preserving Splunk as a deprecated fallback. Source selection is now controlled by a single environment variable, making the system easier to test, deploy, and maintain across environments.

02/
Process
The existing pipeline assumed Splunk as its only data source — the query logic was embedded directly inside the main data API class, with no separation between fetching data and processing it. To make ingestion pluggable, I isolated the Splunk logic into its own class without changing its behaviour, then built a new ingestion class for UDP that reads from a SQL table populated by the upstream UDP pipeline. Both classes implement the same interface, returning a pandas DataFrame with identical columns, so the refinement pipeline requires no changes regardless of which source is active.
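The separation of fetching from processing can be sketched as an abstract base class with one method that every source must implement. This is a minimal sketch under stated assumptions: the class names, the `fetch(start, end)` signature, the table name `udp_wifi_locations`, and the column list are all hypothetical, since the real ones are not given here. The demo uses an in-memory SQLite database in place of MS SQL Server so it runs standalone. The `?` placeholders follow the qmark style used by both `sqlite3` and `pyodbc`.

```python
import sqlite3
from abc import ABC, abstractmethod

import pandas as pd

# Hypothetical column contract shared by every source.
EXPECTED_COLUMNS = ["device_id", "seen_at", "x", "y", "floor"]


class DataSource(ABC):
    """Common interface: every source returns a DataFrame with EXPECTED_COLUMNS."""

    @abstractmethod
    def fetch(self, start: str, end: str) -> pd.DataFrame:
        """Return raw device locations with start <= seen_at < end."""


class UdpDataSource(DataSource):
    """Reads the SQL table populated by the upstream UDP pipeline (names hypothetical)."""

    def __init__(self, conn, table: str = "udp_wifi_locations"):
        self.conn = conn
        self.table = table  # trusted configuration value, never user input

    def fetch(self, start: str, end: str) -> pd.DataFrame:
        query = (
            f"SELECT device_id, seen_at, x, y, floor FROM {self.table} "
            "WHERE seen_at >= ? AND seen_at < ?"
        )
        df = pd.read_sql_query(query, self.conn, params=(start, end))
        # Enforce the shared column order so downstream code always sees the same shape.
        return df[EXPECTED_COLUMNS]


# Usage with a throwaway SQLite table standing in for the UDP-sourced Wi-Fi table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE udp_wifi_locations "
    "(device_id TEXT, seen_at TEXT, x REAL, y REAL, floor INTEGER)"
)
conn.executemany(
    "INSERT INTO udp_wifi_locations VALUES (?, ?, ?, ?, ?)",
    [
        ("aa:01", "2026-01-01T09:00:00", 1.0, 2.0, 1),
        ("aa:02", "2026-01-01T09:04:00", 3.5, 4.0, 2),
        ("aa:03", "2026-01-01T09:06:00", 5.0, 1.0, 1),
    ],
)
source = UdpDataSource(conn)
window = source.fetch("2026-01-01T09:00:00", "2026-01-01T09:05:00")
print(window)
```

In this sketch the legacy Splunk class would subclass `DataSource` in the same way, so the refinement pipeline only ever depends on `fetch()` and the column contract.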
I then introduced a source factory that reads a `DATA_SOURCE_TYPE` environment variable at startup and returns the appropriate ingestion class. As a result, switching production from Splunk to UDP required only a configuration update, with no code changes. I updated the environment configuration files across dev, pre-prod, and prod, and extended the local database schema to include the UDP-sourced Wi-Fi table for development and testing.
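A factory of this kind might look like the sketch below. It assumes the variable holds `udp` or `splunk` (case-insensitive), that UDP is the default when the variable is unset, and that selecting Splunk should raise a `DeprecationWarning`; all three are assumptions, and the two classes are hypothetical stubs standing in for the real ingestion classes so the example runs on its own.

```python
import os
import warnings
from typing import Callable, Dict, Mapping, Optional


class UdpDataSource:
    """Stub standing in for the UDP ingestion class (hypothetical)."""


class SplunkDataSource:
    """Stub standing in for the legacy Splunk ingestion class (hypothetical)."""


# Maps DATA_SOURCE_TYPE values to constructors; the accepted values are assumptions.
_SOURCES: Dict[str, Callable[[], object]] = {
    "udp": UdpDataSource,
    "splunk": SplunkDataSource,
}


def create_data_source(env: Optional[Mapping[str, str]] = None) -> object:
    """Build the ingestion object selected by DATA_SOURCE_TYPE (defaults to UDP)."""
    env = os.environ if env is None else env
    choice = env.get("DATA_SOURCE_TYPE", "udp").strip().lower()
    if choice not in _SOURCES:
        # Fail fast on an unknown value instead of silently picking a source.
        raise ValueError(
            f"Unsupported DATA_SOURCE_TYPE {choice!r}; expected one of {sorted(_SOURCES)}"
        )
    if choice == "splunk":
        # Splunk is kept only as a rollback path, so flag every use of it.
        warnings.warn(
            "Splunk ingestion is deprecated; prefer DATA_SOURCE_TYPE=udp",
            DeprecationWarning,
            stacklevel=2,
        )
    return _SOURCES[choice]()


# Usage: only the variable's value differs between dev, pre-prod, and prod.
print(type(create_data_source({})).__name__)  # UdpDataSource
print(type(create_data_source({"DATA_SOURCE_TYPE": "Splunk"})).__name__)  # SplunkDataSource
```

Accepting an optional mapping instead of always reading `os.environ` keeps the factory easy to unit-test without mutating the process environment.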
Throughout the refactor I kept the downstream refinement and occupancy aggregation pipeline entirely untouched, validating that device tracking, position smoothing, room assignment, and occupancy counts all behaved identically under the new source. I also added defensive parsing to the UDP ingestion class — malformed records are logged and skipped rather than crashing a run — and documented the full architecture and deployment process for the team.
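One way to check that the downstream pipeline behaves identically under both sources is a parity test: push the same time window from each source through the unchanged processing step and compare the outputs with `pandas.testing.assert_frame_equal`. In this sketch, `assign_and_count` is a hypothetical toy stand-in for the real refinement and aggregation pipeline (it assigns a room by a threshold on `x` and counts distinct devices per room), and the small hand-built frames stand in for real Splunk and UDP extracts.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def assign_and_count(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in pipeline: toy room assignment, then distinct devices per room."""
    with_room = raw.assign(room=np.where(raw["x"] < 10.0, "room_a", "room_b"))
    return (
        with_room.groupby("room")["device_id"]
        .nunique()
        .rename("occupancy")
        .reset_index()
    )


def assert_sources_match(splunk_df: pd.DataFrame, udp_df: pd.DataFrame) -> None:
    """Raise AssertionError if the two sources lead to different occupancy outputs."""
    # Sources may return rows in a different order, so compare processed outputs,
    # which groupby returns sorted by room, rather than the raw frames.
    assert_frame_equal(assign_and_count(splunk_df), assign_and_count(udp_df))


splunk_rows = pd.DataFrame(
    {"device_id": ["a", "b", "c", "a"], "x": [2.0, 4.0, 12.0, 3.0]}
)
udp_rows = splunk_rows.sample(frac=1, random_state=0)  # same records, shuffled order
assert_sources_match(splunk_rows, udp_rows)
print(assign_and_count(udp_rows))
```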
Technologies used:
MS SQL Server — for the UDP-sourced Wi-Fi table, refined data, and occupancy outputs.
pandas — for the shared DataFrame interface across ingestion sources.
Unified Data Platform — new primary data source, replacing Splunk.
Git — version control and branch-based review workflow.
Confluence — architecture documentation and handover notes.

03/
Key Features
Decoupled data ingestion from the processing pipeline, making the source swappable without downstream changes.
Built a new UDP ingestion path that reads from a SQL table and returns the same data shape as the legacy Splunk source.
Introduced a source factory controlled by a single environment variable — no code changes needed to switch sources.
Preserved full Splunk functionality as a deprecated fallback for safe rollback.
Added defensive parsing to UDP ingestion, logging and skipping malformed records rather than failing a run.
Extended local database schema and updated environment configs across dev, pre-prod, and prod.
Documented the full architecture, deployment checklist, and rollback steps for the team.
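The defensive parsing in UDP ingestion can be sketched as per-record validation: each raw record is coerced on its own, and any record that fails is logged and skipped so a single bad row cannot abort the five-minute run. The field names, the validation rules, the logger name, and the `parse_record`/`parse_records` helpers are all hypothetical; the actual rules are not described here.

```python
import logging
from datetime import datetime

import pandas as pd

logger = logging.getLogger("udp_ingestion")  # hypothetical logger name

COLUMNS = ["device_id", "seen_at", "x", "y", "floor"]  # hypothetical column contract


def parse_record(record: dict) -> dict:
    """Coerce one raw record; raises KeyError, TypeError or ValueError if malformed."""
    device_id = str(record["device_id"]).strip()
    if not device_id:
        raise ValueError("empty device_id")
    return {
        "device_id": device_id,
        "seen_at": datetime.fromisoformat(record["seen_at"]),
        "x": float(record["x"]),
        "y": float(record["y"]),
        "floor": int(record["floor"]),
    }


def parse_records(records) -> pd.DataFrame:
    """Parse every record, logging and skipping malformed ones instead of failing."""
    rows, skipped = [], 0
    for i, record in enumerate(records):
        try:
            rows.append(parse_record(record))
        except (KeyError, TypeError, ValueError) as exc:
            skipped += 1
            logger.warning("Skipping malformed UDP record %d: %r", i, exc)
    if skipped:
        logger.info("Skipped %d of %d records", skipped, skipped + len(rows))
    # Passing columns keeps the expected shape even when every record was skipped.
    return pd.DataFrame(rows, columns=COLUMNS)


logging.basicConfig(level=logging.INFO)
raw = [
    {"device_id": "aa:01", "seen_at": "2026-01-01T09:00:00", "x": "1.5", "y": "2", "floor": "1"},
    {"device_id": "aa:02", "seen_at": "not-a-time", "x": 1, "y": 2, "floor": 1},  # bad timestamp
    {"device_id": "aa:03", "x": 1, "y": 2, "floor": 1},  # missing seen_at
]
df = parse_records(raw)
print(df)
```

Catching only the specific exceptions that coercion can raise, rather than a bare `except`, keeps genuine bugs such as a broken database connection visible instead of silently skipping them.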

