Entity Resolution & Data Quality Dashboard

A production-grade pipeline that combines deterministic matching rules with probabilistic record linkage (Splink, trained by Expectation-Maximisation) to find records that refer to the same entity across separate datasets.

How It Works

1. Upload

Upload Dataset A and Dataset B and get an instant quality grade for each.

2. Configure

Set Splink blocking rules and select the ML classifiers to train.

3. Analyse

Train the models and generate candidate matches across both datasets.

4. Results

Export high-confidence clusters and review the evaluation metrics.
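The Configure and Results steps rest on two ideas: blocking (only comparing record pairs that share some exact key, instead of every A x B combination) and clustering (grouping records linked by high-scoring pairs). The plain-Python sketch below illustrates both under stated assumptions. It is not Splink's API; Splink runs these steps as SQL inside DuckDB. The function names, record layout, and the hand-written match scores are hypothetical stand-ins for model output.

```python
from collections import defaultdict


def block_pairs(records_a, records_b, blocking_keys):
    """Candidate (id_a, id_b) pairs sharing an exact, non-empty value on ANY blocking key."""
    pairs = set()
    for key in blocking_keys:
        # Index dataset B by this key so we never build the full A x B cross join.
        index = defaultdict(list)
        for rec in records_b:
            if rec.get(key):
                index[rec[key]].append(rec["id"])
        for rec in records_a:
            if rec.get(key):
                for id_b in index.get(rec[key], []):
                    pairs.add((rec["id"], id_b))
    return pairs


def cluster_matches(scored_pairs, threshold):
    """Connected components over pairs whose match score is >= threshold (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for (id_a, id_b), score in scored_pairs.items():
        if score >= threshold:
            # Tag nodes with their source dataset so IDs from A and B cannot collide.
            root_a, root_b = find(("A", id_a)), find(("B", id_b))
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(list)
    for node in list(parent):
        clusters[find(node)].append(node)
    return sorted(sorted(members) for members in clusters.values())


dataset_a = [
    {"id": 1, "postcode": "AB1", "surname": "smith"},
    {"id": 2, "postcode": "CD2", "surname": "jones"},
]
dataset_b = [
    {"id": 10, "postcode": "AB1", "surname": "smyth"},
    {"id": 11, "postcode": "XY9", "surname": "jones"},
    {"id": 12, "postcode": "AB1", "surname": "brown"},
]
pairs = block_pairs(dataset_a, dataset_b, ["postcode", "surname"])
# pairs == {(1, 10), (1, 12), (2, 11)}

# Hypothetical match scores standing in for the trained model's output.
scores = {(1, 10): 0.97, (1, 12): 0.10, (2, 11): 0.91}
clusters = cluster_matches(scores, threshold=0.9)
# clusters == [[("A", 1), ("B", 10)], [("A", 2), ("B", 11)]]
```

Blocking on either postcode or surname keeps 3 of the 6 possible pairs; a looser rule set trades more comparisons for fewer missed matches.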

Probabilistic Matching

Automated Expectation-Maximisation training with Splink and DuckDB links records even when they differ by typos or inconsistent formatting.
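Splink is built on the Fellegi-Sunter model: each compared field has an m probability (it agrees, given a true match) and a u probability (it agrees, given a non-match), and EM estimates these without labelled data. The sketch below is a simplified, binary-agreement version of that idea in plain Python, not Splink's implementation; the function names, starting values, and the synthetic agreement vectors are assumptions for illustration.

```python
import math


def _likelihood(gamma, probs):
    """P(gamma | class), assuming fields are conditionally independent."""
    return math.prod(p if g else 1 - p for g, p in zip(gamma, probs))


def match_probability(gamma, lam, m, u):
    """Posterior P(match | gamma) under the Fellegi-Sunter model."""
    num = lam * _likelihood(gamma, m)
    return num / (num + (1 - lam) * _likelihood(gamma, u))


def fit_em(gammas, n_iter=50, lam=0.1, m_init=0.9, u_init=0.1):
    """Estimate the match prior lam and per-field m/u probabilities by EM."""
    k = len(gammas[0])
    m, u = [m_init] * k, [u_init] * k
    for _ in range(n_iter):
        # E-step: probability that each candidate pair is a true match.
        w = [match_probability(g, lam, m, u) for g in gammas]
        total_match = sum(w)
        total_non = len(gammas) - total_match
        # M-step: re-estimate the prior and the per-field agreement rates.
        lam = total_match / len(gammas)
        m = [sum(wi * g[j] for wi, g in zip(w, gammas)) / total_match for j in range(k)]
        u = [sum((1 - wi) * g[j] for wi, g in zip(w, gammas)) / total_non for j in range(k)]
    return lam, m, u


# Synthetic agreement vectors over three fields (1 = field agrees, 0 = differs).
gammas = (
    [(1, 1, 1)] * 17 + [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
    + [(0, 0, 0)] * 62 + [(1, 0, 0)] * 6 + [(0, 1, 0)] * 6 + [(0, 0, 1)] * 6
)
lam, m, u = fit_em(gammas)
p_all_agree = match_probability((1, 1, 1), lam, m, u)
p_none_agree = match_probability((0, 0, 0), lam, m, u)
```

A typo in one field only switches one agreement bit, so a pair can still score highly on the strength of the others. Splink extends this with multi-level comparisons (e.g. fuzzy string similarity bands) rather than a single agree/disagree bit.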

ML Classification

Complements probabilistic linkage with Scikit-Learn Random Forest classifiers and computes the F1-optimal decision threshold on the fly.
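Choosing an F1-optimal threshold amounts to sweeping candidate cut-offs over the classifier's scores and keeping the one with the highest F1 = 2TP / (2TP + FP + FN). The sketch below assumes the scores are match probabilities (for example from a Scikit-Learn classifier's `predict_proba(X)[:, 1]`) and that labelled pairs are available; the function name and sample data are hypothetical, and the dashboard's own search may differ.

```python
def best_f1_threshold(scores, labels):
    """Return (threshold, f1) maximising F1 when predicting a match for score >= threshold."""
    best_t, best_f1 = None, -1.0
    total_pos = sum(labels)
    # Every distinct score is a candidate threshold.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = total_pos - tp
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical classifier scores for seven labelled pairs (1 = true match).
scores = [0.95, 0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 0]
threshold, f1 = best_f1_threshold(scores, labels)
# threshold == 0.6, f1 == 6/7
```

This brute-force sweep is O(n^2); sorting the scores once and accumulating TP/FP counts brings it to O(n log n) for large candidate sets.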

Data Quality Scoring

Profiles each dataset, including the density of null values per column, so that quality problems are caught before matching instead of propagating into the results (garbage in, garbage out).
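Null density is the share of missing values in each column. The sketch below computes it and maps the average to a letter grade; treating blank strings as missing, the grade cut-offs, and the function names are all assumptions, since the actual grading scheme is not described here.

```python
def null_density(rows, columns):
    """Fraction of missing values (absent key, None, or blank string) per column."""
    n = len(rows)
    result = {}
    for col in columns:
        missing = sum(
            1 for r in rows if r.get(col) is None or str(r.get(col)).strip() == ""
        )
        result[col] = missing / n if n else 0.0
    return result


def quality_grade(densities, cutoffs=((0.05, "A"), (0.15, "B"), (0.30, "C"))):
    """Map the mean null density to a letter grade (hypothetical cut-offs)."""
    mean = sum(densities.values()) / len(densities)
    for limit, grade in cutoffs:
        if mean <= limit:
            return grade
    return "D"


rows = [
    {"name": "Ann", "email": "ann@example.com", "postcode": "AB1"},
    {"name": "Bob", "email": None, "postcode": "CD2"},
    {"name": "Cy", "email": "", "postcode": "EF3"},
    {"name": "Di", "email": "di@example.com"},  # postcode absent entirely
]
densities = null_density(rows, ["name", "email", "postcode"])
grade = quality_grade(densities)
# densities == {"name": 0.0, "email": 0.5, "postcode": 0.25}; grade == "C"
```

Per-column densities matter more than the average for linkage: a half-empty email column weakens any comparison that relies on it, even when the overall grade looks acceptable.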

Interactive Visualisations

ROC and Precision-Recall curves, swept across classification thresholds and rendered natively with React-Plotly.
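Each point on these curves comes from one threshold: predict a match for every pair scoring at or above it, then record (false-positive rate, true-positive rate) for ROC and (recall, precision) for Precision-Recall. The plain-Python sketch below computes those points and a trapezoidal ROC AUC; the function names and sample data are hypothetical, and in practice `sklearn.metrics.roc_curve` and `precision_recall_curve` produce equivalent arrays for a plotting library to draw.

```python
def roc_pr_points(scores, labels):
    """Sweep thresholds from high to low; return ROC (fpr, tpr) and PR (recall, precision) points."""
    pos = sum(labels)
    neg = len(labels) - pos  # assumes at least one positive and one negative label
    roc, pr = [(0.0, 0.0)], []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        roc.append((fp / neg, tp / pos))
        pr.append((tp / pos, tp / (tp + fp)))  # tp + fp >= 1 since t is an observed score
    return roc, pr


def trapezoid_auc(points):
    """Area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores for four labelled pairs (1 = true match).
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
roc, pr = roc_pr_points(scores, labels)
auc = trapezoid_auc(roc)
# roc == [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]; auc == 0.75
```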

Built with: Next.js 14, FastAPI, Tailwind v4, Scikit-Learn, Splink (DuckDB), Plotly