Translating Research in Imperial College (TRIC)

Imperial College London

Data

2025

01/

Overview

For this project, I migrated legacy encryption logic from undocumented C# DLLs to a Python function app, reverse-engineering the existing files to extract and reuse their encryption/decryption classes. I standardised historical staff survey datasets, resolving inconsistent schemas and missing data across departments by cross-referencing old databases and context clues. I then built SQL tables, automated the ingestion pipeline and integrated an opt-out mechanism to keep future research phases GDPR-compliant. The pipeline ensures data integrity, security and repeatability, so researchers can rely on consistent, anonymised datasets for longitudinal studies.

Stack:

  • Python

  • MS SQL Server

  • DotPeek

  • Excel/CSV

  • Git

  • Confluence

02/

Process

The TRIC project involved processing sensitive staff data that could not include personally identifiable information. The legacy system consisted of executables, DLLs and configuration files with little documentation, so I reverse-engineered the C# DLLs using DotPeek to extract the classes and methods needed for encryption and decryption. I then implemented a Python function app that referenced these classes, making it straightforward to call the DLLs and integrate the logic into a modern pipeline. I wrote and tested a decryption script to confirm that encrypted values matched the historical data when decrypted with the original key, which remains securely stored for future research phases.

In parallel, I worked with stakeholders to source schemas and data, ensuring consistent column headings and filling in missing information from multiple departments. This involved examining old MS SQL Server tables, reconciling inconsistencies and standardising datasets using context clues and previous records. Once the CSVs were correct, I created SQL tables and wrote Python ingestion scripts to load the data, verifying that the tables and their contents were accurate. I also added a GDPR-compliant opt-out parameter to exclude participants who chose not to take part in future research phases.
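A simplified sketch of this step is shown below, using Python's standard csv and sqlite3 modules. Here sqlite3 stands in for MS SQL Server, and the column headings, table name and opt-out values are illustrative, not the project's real schema.

```python
import csv
import io
import sqlite3

# Map inconsistent legacy headings onto one standard schema
# (example headings only).
HEADER_MAP = {
    "Dept": "department",
    "Department Name": "department",
    "StaffID": "staff_id",
    "Staff Id": "staff_id",
    "OptOut": "opt_out",
}

def standardise_rows(csv_text: str):
    """Yield rows with standardised column names, skipping opt-outs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        clean = {HEADER_MAP.get(k.strip(), k.strip().lower()): v
                 for k, v in row.items()}
        # GDPR opt-out: exclude the participant from future phases.
        if clean.get("opt_out", "").strip().lower() in {"1", "yes", "true"}:
            continue
        yield clean

def ingest(csv_text: str, conn: sqlite3.Connection) -> int:
    """Load standardised rows into the database; return rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staff_survey "
        "(staff_id TEXT, department TEXT)"
    )
    rows = [(r["staff_id"], r["department"])
            for r in standardise_rows(csv_text)]
    conn.executemany("INSERT INTO staff_survey VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)
```

Filtering opt-outs before insertion, rather than after, means excluded participants never reach the research tables at all.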

The final pipeline integrated the encryption/decryption functions, ensured data consistency and maintained secure version control using Git with an updated .gitignore to keep sensitive files out of the repository. I wrote unit tests throughout to validate each component, including the encryption/decryption functions and the data ingestion scripts, ensuring correctness and reliability. Throughout the project, I collaborated closely with stakeholders, held pair-programming sessions and documented all processes in Confluence.

Technologies used:

  • Python - for the function app, decryption scripts and ingestion pipeline.

  • MS SQL Server - for creating and validating database tables and queries.

  • DotPeek - to reverse-engineer C# DLLs and extract reusable logic.

  • Excel / CSV - for data cleaning, reconciliation and testing.

  • Git & .gitignore - for version control and secure handling of sensitive data.

  • Confluence - for documentation and stakeholder communication.

03/

Key features

  • Reverse-engineered legacy C# DLLs and migrated encryption/decryption logic to Python.

  • Validated historical data against the original encryption key to maintain consistency.

  • Reconciled and standardised datasets from multiple departments.

  • Created SQL tables and a Python ingestion pipeline with GDPR-compliant opt-out functionality.

  • Collaborated with stakeholders through demos and regular feedback sessions.

  • Documented processes and upskilled the team for future maintenance.

  • Ensured secure version control practices and reproducible data processing.