Data Cleaning Demo
Interactive demonstration of data cleaning utilities for handling noisy, erratic, or unreliable scientific data. Generate various types of problematic data and experiment with different filtering strategies.
Data Generation
Cleaning Options
Single Series Cleaning
Quality Report: 25 removed, oscillation detected (80 → 55 points) ⚠️ Instability at x=55 (score: 0.72)
import { clean_series } from '$lib/plot'
import type { DataSeries, CleaningConfig } from '$lib/plot'

const series: DataSeries = {
  x: [0, 1, 2, 3, 4, ...],
  y: [10.0, 10.5, 11.0, 11.5, 12.0, ...],
}

const config: CleaningConfig = {
  invalid_values: 'remove',
  oscillation_threshold: 2.5,
  window_size: 5,
  truncation_mode: 'hard_cut',
  in_place: false,
}

const { series: cleaned, quality } = clean_series(series, config)
// Result: 55 points (25 removed)
// quality.invalid_values_found = 0
// quality.oscillation_detected = true

Multi-Series Cleaning (Correlated Data)
For correlated measurements (e.g., temperature and pressure from the same sensor at each timestep), if one reading is invalid, the comparison at that point is meaningless. Here, synchronized filtering removes the entire row when any series has a bad value.
Result: 50 → 46 timesteps (Temp: 2 glitches, Pressure: 2 glitches)
Raw Data (NaN positions marked)
Cleaned (series aligned)
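The row-wise filtering described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the helper name filter_rows_synchronized and the sample readings are assumptions made up for this example, and "bad value" is taken to mean NaN or ±Infinity.

```typescript
// Hypothetical helper, not part of '$lib/plot': drop every index at which
// any of the given series holds a non-finite value (NaN or ±Infinity).
function filter_rows_synchronized(columns: number[][]): number[][] {
  if (columns.length === 0) return []
  const length = columns[0].length
  if (columns.some((col) => col.length !== length)) {
    throw new Error('All series must have the same length')
  }
  // Collect the indices where every series has a finite reading
  const keep: number[] = []
  for (let idx = 0; idx < length; idx++) {
    if (columns.every((col) => Number.isFinite(col[idx]))) keep.push(idx)
  }
  // Rebuild each series from the shared list of surviving indices
  return columns.map((col) => keep.map((idx) => col[idx]))
}

// Made-up readings: one glitch in each series, at different timesteps
const temperature = [20.1, NaN, 20.4, 20.6, 20.9]
const pressure = [101.2, 101.3, 101.1, Infinity, 101.4]

const [clean_temp, clean_pressure] = filter_rows_synchronized([temperature, pressure])
console.log(clean_temp) // [20.1, 20.4, 20.9]
console.log(clean_pressure) // [101.2, 101.1, 101.4]
```

Because both outputs are rebuilt from the same index list, the series cannot drift out of step, whichever one contained the glitch.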
Trajectory Alignment
This example uses a spiral trajectory with NaN values at t=15, 35, and 42. When any coordinate (x or y) is NaN, that entire point is removed from both arrays, keeping the trajectory synchronized.
Result: 50 → 47 points (3 invalid values removed from all coordinates)
Raw Data (NaN positions marked)
Cleaned (NaN points removed)
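The 50 → 47 count can be reproduced with a small stand-alone sketch. Everything here is assumed for illustration: the Trajectory shape, the helper names make_spiral and drop_nan_points, the spiral parameters, and in particular which coordinate is NaN at each of the three timesteps, since the demo does not say.

```typescript
// Hypothetical shape for a 2D trajectory; not taken from '$lib/plot'.
interface Trajectory {
  x: number[]
  y: number[]
}

// Build an Archimedean spiral with n_points samples (illustrative parameters).
function make_spiral(n_points: number): Trajectory {
  const x: number[] = []
  const y: number[] = []
  for (let t = 0; t < n_points; t++) {
    const angle = 0.3 * t
    const radius = 0.1 * t
    x.push(radius * Math.cos(angle))
    y.push(radius * Math.sin(angle))
  }
  return { x, y }
}

// Keep only the points where neither coordinate is NaN.
function drop_nan_points(traj: Trajectory): Trajectory {
  if (traj.x.length !== traj.y.length) {
    throw new Error('x and y must have the same length')
  }
  const x: number[] = []
  const y: number[] = []
  for (let idx = 0; idx < traj.x.length; idx++) {
    if (Number.isNaN(traj.x[idx]) || Number.isNaN(traj.y[idx])) continue
    x.push(traj.x[idx])
    y.push(traj.y[idx])
  }
  return { x, y }
}

const spiral = make_spiral(50)
// Assumed placement of the bad values: x only, y only, then both
spiral.x[15] = NaN
spiral.y[35] = NaN
spiral.x[42] = NaN
spiral.y[42] = NaN

const aligned = drop_nan_points(spiral)
console.log(`${spiral.x.length} → ${aligned.x.length} points`) // 50 → 47 points
```

A point is dropped once regardless of whether one or both of its coordinates are NaN, which is why four NaN entries at three timesteps remove three points.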
How It Works
- Invalid Values: Remove, interpolate, or keep NaN/Infinity
- Oscillation Detection: Finds unstable regions via derivative analysis
- Bounds: Clamp, filter, or nullify out-of-range values
- Smoothing: Moving average or Savitzky-Golay filtering
- Alignment: Multi-series and 3D data stay synchronized
- Quality Reports: Track points removed, violations, and issues found
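As a rough illustration of the oscillation detection item above, the sketch below flags the point where first differences become much larger than usual and then truncates the series there. It is one possible reading of "derivative analysis", not the library's algorithm: the function names, the median-based baseline, and the interpretation of window_size and threshold are all assumptions, and the test series is synthetic.

```typescript
// Median of a non-empty numeric array.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

// Hypothetical detector: slide a window over the absolute first differences
// and report the first point whose step is too large, once the window mean
// exceeds `threshold` times the median step. Returns -1 if nothing is flagged.
function find_instability_onset(y: number[], window_size: number, threshold: number): number {
  const abs_diff: number[] = []
  for (let idx = 1; idx < y.length; idx++) abs_diff.push(Math.abs(y[idx] - y[idx - 1]))
  if (abs_diff.length < window_size) return -1
  const baseline = median(abs_diff)
  if (baseline === 0) return -1
  const limit = threshold * baseline
  for (let start = 0; start + window_size <= abs_diff.length; start++) {
    const window = abs_diff.slice(start, start + window_size)
    const mean = window.reduce((sum, val) => sum + val, 0) / window_size
    if (mean > limit) {
      // A mean above the limit implies at least one step above the limit;
      // the point after the first such step is treated as the onset.
      const offset = window.findIndex((val) => val > limit)
      return start + offset + 1
    }
  }
  return -1
}

// Synthetic series: a steady ramp that starts to oscillate at index 55
const noisy: number[] = []
for (let idx = 0; idx < 80; idx++) {
  const wobble = idx >= 55 ? (idx % 2 === 0 ? 5 : -5) : 0
  noisy.push(10 + 0.5 * idx + wobble)
}

const onset = find_instability_onset(noisy, 5, 2.5)
// A hard cut keeps everything before the onset and discards the rest
const truncated = onset === -1 ? noisy : noisy.slice(0, onset)
console.log(`onset at index ${onset}: ${noisy.length} → ${truncated.length} points`)
```

Using the median step as the baseline keeps the unstable tail from inflating its own reference level, as long as less than half of the series is affected.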