Frequency Filter Explorer

See how frequency masking removes noise while preserving signal

Live readouts: Noise Reduced, Signal Preserved, Components Removed, Retain Fraction (ρ) = 0.75

📋 Gradient Examples

🎛️ Filter Parameters

Retain Fraction (ρ): 0.75

Energy Kept: 75%

Energy Removed: 25%
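
The Energy Kept and Energy Removed readouts suggest that ρ may be applied to cumulative spectral energy rather than to the number of components. The page does not confirm this, so the following is only one possible reading, sketched in NumPy: keep the smallest set of highest-energy components whose energy adds up to at least ρ of the total. The name `cumulative_energy_mask` and the random test gradient are illustrative assumptions.

```python
import numpy as np

def cumulative_energy_mask(spectrum, rho):
    """Boolean mask keeping the fewest top-energy components that hold >= rho of total energy.

    Assumption: rho is read as a share of total energy, not of component count.
    """
    energy = np.abs(spectrum) ** 2
    order = np.argsort(energy, axis=None)[::-1]      # flat indices, highest energy first
    cumulative = np.cumsum(energy.ravel()[order])
    # First position where the running total reaches rho * total energy.
    n_keep = int(np.searchsorted(cumulative, rho * cumulative[-1])) + 1
    n_keep = min(n_keep, energy.size)
    mask = np.zeros(energy.size, dtype=bool)
    mask[order[:n_keep]] = True
    return mask.reshape(energy.shape)

rng = np.random.default_rng(0)
spectrum = np.fft.fft2(rng.standard_normal((16, 16)))  # hypothetical gradient
mask = cumulative_energy_mask(spectrum, rho=0.75)

energy = np.abs(spectrum) ** 2
energy_kept = energy[mask].sum() / energy.sum()
print(f"Energy kept: {energy_kept:.1%}, components kept: {mask.sum()} of {mask.size}")
```

Under this reading the number of components kept is usually well below ρ of the total, because a few strong components carry most of the energy.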

💡 How It Works

1. Transform to Frequency Domain

Apply a 2D FFT to convert the gradient matrix into frequency components. Each component represents one spatial frequency pattern in the gradient.
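
A minimal NumPy sketch of this step, assuming the gradient is a plain 2D array. The helper name `to_frequency_domain` and the 8×8 cosine test gradient are illustrative, not taken from Falcon. `np.fft.fftshift` moves the zero-frequency (DC) term to the center, which matches how spectrum plots are usually drawn.

```python
import numpy as np

def to_frequency_domain(grad):
    """Return the 2D spectrum of a gradient matrix with the DC term centered."""
    return np.fft.fftshift(np.fft.fft2(grad))

# Hypothetical gradient: one cosine cycle down the rows, constant across columns.
rows, cols = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
grad = np.cos(2 * np.pi * rows / 8)

spectrum = to_frequency_domain(grad)
magnitude = np.abs(spectrum)

# A single cosine shows up as two symmetric peaks around the center (index 4).
peaks = np.argwhere(magnitude > 1e-6)
print(peaks)
```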

2. Energy-Based Ranking

Sort all frequency components by their energy (magnitude²). High-frequency noise typically contributes little energy per component, so it falls near the bottom of the ranking.
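
The ranking can be sketched with `np.argsort` on the flattened energy array. The function name `rank_by_energy` and the random test gradient are assumptions made for illustration.

```python
import numpy as np

def rank_by_energy(spectrum):
    """Return per-component energy and flat indices sorted from highest to lowest energy."""
    energy = np.abs(spectrum) ** 2
    order = np.argsort(energy, axis=None)[::-1]
    return energy, order

rng = np.random.default_rng(0)
grad = rng.standard_normal((16, 16))          # hypothetical gradient
spectrum = np.fft.fft2(grad)

energy, order = rank_by_energy(spectrum)
sorted_energy = energy.ravel()[order]

# Parseval check: with NumPy's unnormalized FFT, spectral energy is N times spatial energy.
print(np.isclose(energy.sum() / grad.size, np.sum(grad ** 2)))
```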

3. Adaptive Masking

Keep the top fraction ρ of components by energy (ρ = 0.75 keeps the top 75%) and zero out the rest. ρ decreases during training (0.95 → 0.50) for progressive denoising.
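
A sketch of the masking step, assuming ρ is a fraction of the component count and that the count is rounded up. Both choices, and the name `energy_mask`, are assumptions; the page does not specify rounding or tie handling.

```python
import numpy as np

def energy_mask(spectrum, rho):
    """Boolean mask that is True for the top `rho` fraction of components by energy."""
    energy = np.abs(spectrum) ** 2
    n_keep = int(np.ceil(rho * energy.size))       # assumed rounding rule
    order = np.argsort(energy, axis=None)[::-1]    # highest energy first
    mask = np.zeros(energy.size, dtype=bool)
    mask[order[:n_keep]] = True
    return mask.reshape(energy.shape)

rng = np.random.default_rng(0)
spectrum = np.fft.fft2(rng.standard_normal((8, 8)))  # hypothetical gradient

mask = energy_mask(spectrum, rho=0.75)
filtered = np.where(mask, spectrum, 0)               # zero out the rejected components
print(f"kept {mask.sum()} of {mask.size} components")
```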

4. Inverse Transform

Apply the inverse FFT to return the cleaned gradient to the spatial domain. Result: noise removed, essential structure preserved.
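
A sketch of the inverse step with `np.fft.ifft2`. Taking the real part is an assumption made here: an energy-based mask can keep one component of a conjugate pair and drop the other, which leaves a small imaginary residue in the output. The helper name `to_spatial_domain` and the threshold-style mask are illustrative.

```python
import numpy as np

def to_spatial_domain(filtered_spectrum):
    """Inverse 2D FFT; the real part is taken as the cleaned gradient."""
    return np.fft.ifft2(filtered_spectrum).real

rng = np.random.default_rng(0)
grad = rng.standard_normal((8, 8))            # hypothetical gradient
spectrum = np.fft.fft2(grad)

# With nothing masked, the round trip reproduces the input.
round_trip = to_spatial_domain(spectrum)

# Zero roughly the lowest-energy quarter of components, then invert.
energy = np.abs(spectrum) ** 2
threshold = np.sort(energy, axis=None)[energy.size // 4]
cleaned = to_spatial_domain(np.where(energy >= threshold, spectrum, 0))

print(np.allclose(round_trip, grad), cleaned.shape)
```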

🎯 Real-World Impact

Training Stability

Reduces gradient variance by up to 40%, leading to smoother convergence.

Generalization

Prevents overfitting to high-frequency patterns in training data.

Computational Cost

FFT adds ~26% overhead but improves sample efficiency.

📥 Input: Original Gradient

Gradient corrupted with high-frequency noise

Notice: the noisy gradient contains high values (noise spikes)

📤 Output: Cleaned Gradient

After frequency filtering at ρ = 0.75

Notice: Noise spikes removed, smooth structure preserved

🔬 Frequency Domain Analysis

Full Magnitude Spectrum

Center (DC and nearby components) = low-frequency structure

Brightness = energy magnitude

After Energy-Based Masking

High-frequency components (edges/noise) removed

🧪 Try It Yourself

Experiment 1: Compare Clean vs Noisy

1. Select "Clean neural network gradient"
2. Note the smooth spectrum (energy concentrated in the center)
3. Switch to "Gradient corrupted with noise"
4. See scattered high-frequency components appear
5. Adjust the ρ slider and watch the noise components get filtered out
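
The same comparison can be approximated offline. The sketch below measures how much spectral energy sits near the center of the shifted spectrum for a clean and a noisy gradient. The synthetic gradients, the radius of 4 bins, and the name `low_frequency_share` are assumptions chosen for illustration; they are not the examples used by the explorer.

```python
import numpy as np

def low_frequency_share(grad, radius=4):
    """Fraction of spectral energy within `radius` bins of the centered DC term."""
    energy = np.abs(np.fft.fftshift(np.fft.fft2(grad))) ** 2
    h, w = energy.shape
    yy, xx = np.ogrid[:h, :w]
    near_center = np.hypot(yy - h // 2, xx - w // 2) <= radius
    return energy[near_center].sum() / energy.sum()

rng = np.random.default_rng(0)
r, c = np.meshgrid(np.arange(32), np.arange(32), indexing="ij")
clean = np.sin(2 * np.pi * r / 32) + np.cos(2 * np.pi * c / 16)   # smooth, low-frequency
noisy = clean + 0.5 * rng.standard_normal(clean.shape)            # added white noise

clean_share = low_frequency_share(clean)
noisy_share = low_frequency_share(noisy)
print(f"clean: {clean_share:.1%} near center, noisy: {noisy_share:.1%} near center")
```

The noisy gradient should show a lower share near the center, because white noise spreads its energy across all frequencies.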

Experiment 2: Optimal ρ Value

1. Select the noisy gradient example
2. Set ρ = 0.95 (keep almost everything)
3. Gradually decrease ρ to 0.50
4. Watch noise reduction increase while the signal stays intact
5. Notice: setting ρ too low (<0.50) removes useful structure
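
A rough offline version of this sweep, under stated assumptions: ρ is a fraction of the component count, "noise reduced" is the relative drop in mean squared error against a known clean signal, and "signal preserved" is the share of the clean signal's spectral energy that falls in kept components. These metric definitions and the name `filter_with_mask` are illustrative and may differ from what the explorer computes.

```python
import numpy as np

def filter_with_mask(grad, rho):
    """Keep the top `rho` fraction of components by energy; return cleaned gradient and mask."""
    spectrum = np.fft.fft2(grad)
    energy = np.abs(spectrum) ** 2
    n_keep = int(np.ceil(rho * energy.size))
    mask = np.zeros(energy.size, dtype=bool)
    mask[np.argsort(energy, axis=None)[::-1][:n_keep]] = True
    mask = mask.reshape(energy.shape)
    return np.fft.ifft2(np.where(mask, spectrum, 0)).real, mask

rng = np.random.default_rng(0)
r, c = np.meshgrid(np.arange(32), np.arange(32), indexing="ij")
signal = np.sin(2 * np.pi * r / 32) + np.cos(2 * np.pi * c / 16)  # hypothetical clean gradient
noisy = signal + 0.5 * rng.standard_normal(signal.shape)
signal_energy = np.abs(np.fft.fft2(signal)) ** 2
error_before = np.mean((noisy - signal) ** 2)

results = []
for rho in (0.95, 0.85, 0.75, 0.65, 0.50):
    cleaned, mask = filter_with_mask(noisy, rho)
    noise_reduced = 1 - np.mean((cleaned - signal) ** 2) / error_before
    signal_preserved = signal_energy[mask].sum() / signal_energy.sum()
    results.append((rho, noise_reduced, signal_preserved))
    print(f"rho={rho:.2f}  noise reduced={noise_reduced:.1%}  signal preserved={signal_preserved:.1%}")
```

The toy signal here occupies only four frequency components, so it survives every ρ in the sweep. A gradient with richer structure would start losing signal sooner.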

Key Insight

Falcon adapts ρ during training: it starts at 0.95 (preserve nearly everything early) and ends at 0.50 (aggressive denoising late). This progressive filtering balances exploration (early) with exploitation (late).
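
The page gives only the endpoints of the schedule (0.95 and 0.50), not its shape. The sketch below assumes a linear decay over training steps. The function name `rho_schedule` and its parameters are hypothetical.

```python
def rho_schedule(step, total_steps, rho_start=0.95, rho_end=0.50):
    """Linearly interpolate rho from rho_start at step 0 to rho_end at the last step.

    Assumption: linear decay; the actual schedule shape is not stated in the source.
    """
    progress = step / max(total_steps - 1, 1)
    progress = min(max(progress, 0.0), 1.0)     # clamp to [0, 1]
    return rho_start + (rho_end - rho_start) * progress

# Example: sample the schedule at a few points of a 100-step run.
for step in (0, 25, 50, 75, 99):
    print(step, round(rho_schedule(step, total_steps=100), 3))
```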

"In the frequency domain, we see the skeleton of information—
preserving structure while discarding noise, a sculptor's touch."