Entity Resolution & Data Quality Dashboard
A production-grade pipeline that combines deterministic matching rules with probabilistic record linkage, trained by Expectation-Maximisation in Splink, to uncover hidden links between records across disparate datasets.
How It Works
1. Upload
Ingest Datasets A and B and get an instant quality grade for each.
2. Configure
Set Splink blocking rules and select ML classifiers.
3. Analyse
Train the models and generate candidate matches across both datasets.
4. Results
Export high-confidence match clusters and review the evaluation metrics.
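The blocking rules chosen in the Configure step decide which record pairs get compared at all, which keeps the comparison space far smaller than the full cross product. The sketch below is a plain-Python illustration of a single equality blocking rule. It is not Splink's implementation, and `candidate_pairs` and the sample records are hypothetical.

```python
from collections import defaultdict


def candidate_pairs(records_a, records_b, key):
    """Return (a_id, b_id) pairs whose `key` values are equal.

    Hypothetical helper illustrating an equality blocking rule. Records with a
    missing key are never paired, as with a SQL equality join on NULL.
    """
    # Index dataset B by blocking key so each A record only meets its block.
    index_b = defaultdict(list)
    for rec in records_b:
        value = rec.get(key)
        if value is not None:
            index_b[value].append(rec["id"])

    pairs = []
    for rec in records_a:
        value = rec.get(key)
        if value is None:
            continue
        for b_id in index_b.get(value, []):
            pairs.append((rec["id"], b_id))
    return pairs


# Hypothetical sample data.
A = [{"id": "a1", "surname": "smith"}, {"id": "a2", "surname": "jones"},
     {"id": "a3", "surname": None}]
B = [{"id": "b1", "surname": "smith"}, {"id": "b2", "surname": "smyth"},
     {"id": "b3", "surname": "jones"}, {"id": "b4", "surname": "smith"}]

pairs = candidate_pairs(A, B, "surname")
print(len(pairs), "candidate pairs instead of", len(A) * len(B))
```

The trade-off is visible in the data: "smyth" is never compared with "smith", so a strict blocking rule can hide true matches. This is why linkage setups often combine several blocking rules rather than relying on one.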
Probabilistic Matching
Splink, running on DuckDB, fits a probabilistic matching model by Expectation-Maximisation, so records that differ through typos or other typographical variation can still be linked automatically.
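Splink is built on the Fellegi-Sunter model. The minimal NumPy sketch below shows the core EM idea under simplifying assumptions: each field gives a binary agree/disagree outcome, fields are conditionally independent given match status, and every parameter is learned by EM. It is a generic illustration, not Splink's code or API. `fellegi_sunter_em` and the simulated data are hypothetical.

```python
import numpy as np


def fellegi_sunter_em(gamma, n_iter=100, lam=0.1, m0=0.9, u0=0.1, eps=1e-6):
    """Fit a two-class Fellegi-Sunter model to binary agreement vectors by EM.

    gamma: (n_pairs, n_fields) array, 1 where a field agrees and 0 where it does not.
    Returns (lam, m, u, w): prior match probability, per-field m = P(agree | match),
    per-field u = P(agree | non-match), and each pair's posterior match probability.
    """
    gamma = np.asarray(gamma, dtype=float)
    n_fields = gamma.shape[1]
    m = np.full(n_fields, m0)
    u = np.full(n_fields, u0)
    for _ in range(n_iter):
        # E-step: likelihood of each agreement pattern under each class,
        # assuming fields are conditionally independent given the class.
        like_m = np.prod(np.where(gamma == 1, m, 1 - m), axis=1)
        like_u = np.prod(np.where(gamma == 1, u, 1 - u), axis=1)
        w = lam * like_m / (lam * like_m + (1 - lam) * like_u)
        # M-step: re-estimate parameters from the soft match assignments.
        lam = w.mean()
        m = np.clip((w @ gamma) / w.sum(), eps, 1 - eps)
        u = np.clip(((1 - w) @ gamma) / (1 - w).sum(), eps, 1 - eps)
    return lam, m, u, w


# Simulated comparison vectors: 200 true matches and 1800 non-matches over 3 fields.
rng = np.random.default_rng(0)
true_m = np.array([0.95, 0.90, 0.85])
true_u = np.array([0.05, 0.10, 0.20])
is_match = np.r_[np.ones(200, dtype=bool), np.zeros(1800, dtype=bool)]
probs = np.where(is_match[:, None], true_m, true_u)
gamma = (rng.random(probs.shape) < probs).astype(float)

lam, m, u, w = fellegi_sunter_em(gamma)
# Fellegi-Sunter match weights: log2 Bayes factors for agreement and disagreement.
agree_weight = np.log2(m / u)
disagree_weight = np.log2((1 - m) / (1 - u))
print("lambda:", round(float(lam), 3))
print("agree weights:", agree_weight.round(2))
print("disagree weights:", disagree_weight.round(2))
```

Pairs whose summed weights are high get a high posterior `w`. A typo that breaks agreement on one field lowers the score without ruling the pair out.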
ML Classification
Augments standard record linkage with scikit-learn Random Forest classifiers and calculates the F1-maximising decision threshold live.
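One way to compute an F1-optimal threshold is to score held-out pairs with the classifier and scan the precision-recall curve. The sketch below uses real scikit-learn APIs (`RandomForestClassifier`, `precision_recall_curve`), but `make_classification` only stands in for per-pair comparison features. `f1_optimal_threshold` and the dataset shape are hypothetical, not taken from the dashboard.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_recall_curve
from sklearn.model_selection import train_test_split


def f1_optimal_threshold(y_true, scores):
    """Return (threshold, f1) maximising F1 when predicting score >= threshold."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # precision/recall have one more entry than thresholds; drop the final
    # (precision=1, recall=0) point, which has no associated threshold.
    p, r = precision[:-1], recall[:-1]
    denom = p + r
    f1 = np.divide(2 * p * r, denom, out=np.zeros_like(denom), where=denom > 0)
    best = int(np.argmax(f1))
    return float(thresholds[best]), float(f1[best])


# Synthetic, imbalanced stand-in for labelled candidate pairs (about 10% matches).
X, y = make_classification(n_samples=2000, n_features=6, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
scores = clf.predict_proba(X_val)[:, 1]  # probability of the "match" class

threshold, best_f1 = f1_optimal_threshold(y_val, scores)
y_pred = (scores >= threshold).astype(int)
print(f"threshold={threshold:.3f}  F1={best_f1:.3f}  check={f1_score(y_val, y_pred):.3f}")
```

Choosing the threshold on a validation split rather than the training data avoids an optimistic F1. With heavily imbalanced match labels, the F1-optimal threshold is often well away from the default 0.5.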
Data Quality Scoring
Holistic dataset profiling, including the null density of every column, surfaces data-quality problems before matching and guards against garbage in, garbage out.
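A minimal sketch of null-density profiling and grading follows. What counts as null (None, empty or whitespace-only strings) and the A to D grade bands are assumptions for illustration, since the dashboard's actual grading scheme is not described here. `null_density`, `quality_grade` and `GRADE_BANDS` are hypothetical names.

```python
def is_null(value):
    """Assumption: None and empty or whitespace-only strings count as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def null_density(records, columns):
    """Fraction of records in which each column is missing."""
    n = len(records)
    return {col: sum(is_null(r.get(col)) for r in records) / n for col in columns}


# Hypothetical grade bands on the mean null density across columns.
GRADE_BANDS = [(0.05, "A"), (0.15, "B"), (0.30, "C")]


def quality_grade(densities, bands=GRADE_BANDS):
    """Return (grade, overall_null_density) using the first band that fits."""
    overall = sum(densities.values()) / len(densities)
    for limit, grade in bands:
        if overall <= limit:
            return grade, overall
    return "D", overall


records = [
    {"name": "Ann Lee", "dob": "1990-01-02", "postcode": "AB1 2CD"},
    {"name": "Bo Chan", "dob": None, "postcode": ""},
    {"name": "Cy Diaz", "dob": "1985-07-30", "postcode": "EF3 4GH"},
    {"name": "", "dob": "1979-11-11", "postcode": None},
]
densities = null_density(records, ["name", "dob", "postcode"])
grade, overall = quality_grade(densities)
print(densities, grade, round(overall, 3))
```

Per-column densities matter as much as the overall grade. A mostly empty `postcode` column, for example, makes it a poor choice of blocking key even if the dataset grades acceptably overall.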
Interactive Visualisations
Interactive ROC and precision-recall curves, rendered natively with React-Plotly, show model performance across the full range of decision thresholds.
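The curves come from sweeping a decision threshold over the classifier scores and recording one point per distinct score. The sketch below computes ROC and precision-recall points in plain Python and packages them as Plotly scatter-trace dictionaries. Dictionaries of this shape could be serialised to JSON and passed in the `data` prop of react-plotly.js's `Plot` component. `sweep_thresholds`, `trapezoid_auc` and `to_plotly_trace` are hypothetical helpers, and the quadratic loop favours clarity over speed.

```python
import json


def sweep_thresholds(y_true, scores):
    """Return ROC points (fpr, tpr) and PR points (recall, precision).

    One point per distinct score, predicting positive when score >= threshold.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    roc = [(0.0, 0.0)]
    pr = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        roc.append((fp / neg, tp / pos))
        pr.append((tp / pos, tp / (tp + fp)))  # tp + fp >= 1 at every threshold
    return roc, pr


def trapezoid_auc(points):
    """Area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def to_plotly_trace(points, name):
    """Package points as a Plotly scatter trace dictionary."""
    xs, ys = zip(*points)
    return {"type": "scatter", "mode": "lines", "name": name,
            "x": list(xs), "y": list(ys)}


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
roc, pr = sweep_thresholds(y_true, scores)
auc = trapezoid_auc(roc)
payload = json.dumps([to_plotly_trace(roc, "ROC"), to_plotly_trace(pr, "Precision-Recall")])
print(f"ROC AUC = {auc:.4f}")
```

For these six scores, 8 of the 9 positive/negative pairs are ranked correctly, so the ROC AUC equals 8/9.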