
FALCON Experimental Results

Frequency-Adaptive Learning with Conserved Orthogonality and Noise Filtering

Research by Noel Thomas, MBZUAI • 2025

Best Accuracy: 90.49% (Muon optimizer)
Falcon Accuracy: 90.33% (competitive performance)
Falcon Epoch Time: 6.7 s (+40% vs AdamW)
Falcon Convergence: 1.56 min to 85% accuracy

Key Research Findings

Competitive Accuracy

Falcon achieves 90.33% top-1 accuracy on CIFAR-10 with VGG11, comparable to AdamW (90.28%) and Muon (90.49%).

Computational Trade-off

Falcon is about 40% slower per epoch than AdamW (6.7 s vs 4.8 s) because of its FFT operations and rank-1 approximations.
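The page does not say which matrices Falcon approximates at rank 1, or how. To show where that kind of cost comes from, here is a minimal sketch of a generic rank-1 approximation by power iteration, assuming NumPy. The function name `rank1_approx` and the iteration count are hypothetical and are not taken from the Falcon code.

```python
import numpy as np


def rank1_approx(m: np.ndarray, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: approximate m by sigma * u v^T using power iteration.

    Each iteration costs two matrix-vector products, so the total cost grows
    with iters * rows * cols. This is one reason rank-1 steps add overhead.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(m.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        u = m @ v                 # push v through m to get the left direction
        u /= np.linalg.norm(u)
        v = m.T @ u               # push u back to refine the right direction
        v /= np.linalg.norm(v)
    sigma = u @ m @ v             # estimate of the top singular value
    return sigma * np.outer(u, v)
```

For an exactly rank-1 matrix such as `np.outer(a, b)`, the sketch recovers the matrix itself. For a general matrix, it converges to the leading singular component at a rate set by the gap between the top two singular values.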


Data Efficiency

Frequency filtering shows its strongest benefits when training on the full dataset, suggesting that Falcon is best suited to large-scale training.

Experimental Results

Full training on 100% of the CIFAR-10 training data until convergence.

Optimizer | Accuracy | Best Epoch | Total Time | Epoch Time | Throughput   | Time to 85%
AdamW     | 90.28%   | 228        | 5.00 min   | 4.8 s      | 10,382 img/s | 1.27 min
Muon      | 90.49%   | 220        | 5.37 min   | 5.3 s      | 9,418 img/s  | 1.18 min
FALCON v5 | 90.33%   | 236        | 6.99 min   | 6.7 s      | 7,486 img/s  | 1.56 min
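The table can be checked for internal consistency. Throughput multiplied by epoch time should roughly equal the number of images processed per epoch. CIFAR-10's training split has 50,000 images. The sketch below assumes that the throughput column counts training images only. The values are copied from the table, and the helper name is hypothetical.

```python
# Epoch time (s) and throughput (img/s) copied from the results table.
results = {
    "AdamW": {"epoch_s": 4.8, "img_per_s": 10_382},
    "Muon": {"epoch_s": 5.3, "img_per_s": 9_418},
    "FALCON v5": {"epoch_s": 6.7, "img_per_s": 7_486},
}

CIFAR10_TRAIN_IMAGES = 50_000


def images_per_epoch(row: dict) -> float:
    """Implied images per epoch = throughput (img/s) x epoch time (s)."""
    return row["img_per_s"] * row["epoch_s"]


for name, row in results.items():
    implied = images_per_epoch(row)
    rel_err = abs(implied - CIFAR10_TRAIN_IMAGES) / CIFAR10_TRAIN_IMAGES
    print(f"{name:10s} {implied:9.0f} images/epoch ({rel_err:.1%} off 50,000)")
```

For all three optimizers, the implied count comes within 1% of 50,000. This is consistent with one pass over the training set per epoch.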

Real Image Training Comparison

See how each optimizer learns to classify actual CIFAR-10 images across training epochs

Training Progress Comparison


Model Learning Progress

Sample: Airplane at epoch 40 (no filtering, original image)
True label: Airplane
The image panel shows actual FALCON frequency filtering at different training stages.

AdamW: ✓ correct at epoch 40. Prediction: airplane, confidence 92.0%, loss 0.18.

Muon: ✓ correct at epoch 40. Prediction: airplane, confidence 94.0%, loss 0.14.

Falcon: ✓ correct at epoch 40. Prediction: airplane, confidence 95.0%, loss 0.12.

🔬 Analysis

Convergence Speed

On this example, Falcon reaches a confident correct prediction fastest: 84% confidence by epoch 10, compared with 78% for AdamW and 81% for Muon.

Final Performance

All three optimizers eventually predict the correct class. Falcon ends with the highest confidence (95%) and the lowest loss (0.12).

Frequency Filtering Impact

On this example, Falcon's frequency-domain noise reduction goes with smoother training curves and more confident predictions, which may be especially beneficial for complex visual patterns.

Core Optimization Results

Top-1 Accuracy vs Training Time


Comparison of convergence speed across optimizers

Time to 85% Accuracy


How quickly each optimizer reaches key accuracy milestones
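A milestone such as "time to 85%" is usually read off the logged accuracy curve as the first evaluation point at or above the target. Below is a minimal sketch. It assumes that accuracy is logged at each evaluation together with the cumulative wall-clock time. The function name is hypothetical.

```python
from typing import Optional, Sequence


def time_to_accuracy(
    times_min: Sequence[float], accs: Sequence[float], target: float = 0.85
) -> Optional[float]:
    """Return the first logged time (minutes) at which accuracy >= target, else None."""
    for t, acc in zip(times_min, accs):
        if acc >= target:
            return t
    return None


# Hypothetical curve: accuracy crosses 85% at the third evaluation.
print(time_to_accuracy([0.5, 1.0, 1.5], [0.70, 0.84, 0.86]))
```

Because the metric only checks at evaluation points, its resolution depends on how often accuracy is measured.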

Data Efficiency Comparison


Performance with reduced training data (10%, 20%, 100%)
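The page does not say how the 10% and 20% subsets were drawn. A common choice is a class-balanced (stratified) subsample, so every CIFAR-10 class keeps the same share. The sketch below assumes that approach, uses NumPy, and has a hypothetical function name.

```python
import numpy as np


def stratified_subset(labels: np.ndarray, fraction: float, seed: int = 0) -> np.ndarray:
    """Return sorted indices that keep `fraction` of every class, sampled without replacement."""
    rng = np.random.default_rng(seed)
    picked = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)            # all examples of class c
        n = max(1, int(round(fraction * idx.size)))  # per-class quota
        picked.append(rng.choice(idx, size=n, replace=False))
    return np.sort(np.concatenate(picked))


# CIFAR-10-like labels: 10 classes x 5,000 training images each.
labels = np.repeat(np.arange(10), 5000)
subset_10 = stratified_subset(labels, 0.10)  # 500 images per class
```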

Fixed Time Budget (10 min)


Best accuracy achievable within a fixed time constraint
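A fixed-budget comparison keeps only the evaluations logged within the time limit and reports the best accuracy among them. Below is a minimal sketch, assuming the same (time, accuracy) log format as above. The function name and the 10-minute default, which mirrors the panel title, are illustrative.

```python
from typing import Optional, Sequence


def best_within_budget(
    times_min: Sequence[float], accs: Sequence[float], budget_min: float = 10.0
) -> Optional[float]:
    """Best accuracy among evaluations logged at or before budget_min, else None."""
    best = None
    for t, acc in zip(times_min, accs):
        if t <= budget_min and (best is None or acc > best):
            best = acc
    return best
```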

Frequency Filtering & Adaptive Schedules

Frequency Filtering Demonstration


How FFT-based filtering removes high-frequency noise

Frequency Domain Masks


Energy-based masking in the frequency domain
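The FFT filtering and energy-based masking can be illustrated with a minimal NumPy sketch. It rests on several assumptions not confirmed by the page: the filter acts on a 2-D gradient (or image) array, and the mask keeps the highest-energy frequency bins up to a "retain fraction" of all bins, zeroing the rest as noise. The function names are hypothetical.

```python
import numpy as np


def energy_mask(spectrum: np.ndarray, retain_fraction: float) -> np.ndarray:
    """Boolean mask keeping the ceil(retain_fraction * N) highest-energy bins (ties may keep more)."""
    energy = np.abs(spectrum) ** 2
    flat = energy.ravel()
    k = max(1, int(np.ceil(retain_fraction * flat.size)))
    threshold = np.partition(flat, -k)[-k]  # k-th largest energy
    return energy >= threshold


def fft_filter(grad: np.ndarray, retain_fraction: float) -> np.ndarray:
    """Zero the low-energy spectral bins of a real 2-D array and transform back."""
    spec = np.fft.rfft2(grad)
    mask = energy_mask(spec, retain_fraction)
    return np.fft.irfft2(spec * mask, s=grad.shape)


# Smooth low-frequency pattern plus white noise.
rng = np.random.default_rng(0)
x = np.arange(32)
clean = np.outer(np.cos(2 * np.pi * x / 32), np.cos(2 * np.pi * x / 32))
noisy = clean + 0.1 * rng.standard_normal((32, 32))
filtered = fft_filter(noisy, retain_fraction=0.05)
```

The clean pattern concentrates its energy in a few bins, while white noise spreads evenly across the spectrum. Keeping only the top bins therefore removes most of the noise. The same function applies unchanged to a grayscale image array.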

Real Image Filtering Example


Visual demonstration of frequency filtering on actual images

Progressive Filtering Schedule


How retain fraction decreases over training epochs
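The page shows the retain fraction shrinking as training proceeds but does not give the curve. The minimal sketch below assumes a cosine decay. The start and end values (0.95 and 0.50) and the function name are hypothetical placeholders, not the settings used in these experiments.

```python
import math


def retain_fraction(epoch: int, total_epochs: int, start: float = 0.95, end: float = 0.50) -> float:
    """Cosine decay of the retain fraction from `start` (first epoch) to `end` (last epoch)."""
    progress = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * progress))
```

Cosine decay keeps most of the spectrum early in training and tightens the filter gradually. A linear or step schedule would follow the same interface.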

Adaptive Scheduling Dynamics


Evolution of retain fraction and interleaving period
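The page does not define the "interleaving period". One plausible reading is that the period sets how often the expensive FFT filtering step runs, with ordinary steps in between. The sketch below is built entirely on that interpretation. It lets the period grow linearly from 1 to 8 over training; the function names and endpoints are hypothetical.

```python
def interleave_period(epoch: int, total_epochs: int, start: int = 1, end: int = 8) -> int:
    """Hypothetical schedule: the filtering period grows linearly from `start` to `end`."""
    progress = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return max(1, round(start + (end - start) * progress))


def use_filter(step: int, period: int) -> bool:
    """Run the filtered update on every `period`-th step, a plain update otherwise."""
    return step % period == 0
```

A longer period later in training would reduce the per-epoch FFT cost once the retain fraction has settled.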

Architecture & Performance Analysis

Architecture Comparison


Optimizer performance across different network architectures

Computational Cost Breakdown


Time analysis showing Falcon's 40% overhead relative to AdamW
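The 40% figure can be recomputed from the results table. The short calculation below uses only numbers copied from the table. It also shows that about 40% more time per epoch corresponds to roughly 28% lower throughput, since the two are reciprocals.

```python
# Falcon vs AdamW, values copied from the results table.
adamw = {"epoch_s": 4.8, "total_min": 5.00, "img_per_s": 10_382}
falcon = {"epoch_s": 6.7, "total_min": 6.99, "img_per_s": 7_486}


def overhead(new: float, base: float) -> float:
    """Relative extra cost of `new` over `base` (0.40 means 40% more)."""
    return new / base - 1.0


epoch_overhead = overhead(falcon["epoch_s"], adamw["epoch_s"])      # about 0.396
total_overhead = overhead(falcon["total_min"], adamw["total_min"])  # about 0.398
throughput_drop = 1.0 - falcon["img_per_s"] / adamw["img_per_s"]    # about 0.279
```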

Mask Sharing Strategy


How frequency masks are shared across parameter groups
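The page does not describe how parameter groups are formed or how a shared mask is computed. The minimal sketch below assumes one possible strategy: gradients with the same shape form a group, and the group shares a single energy-based mask built from its summed spectral energy. This saves one top-k selection per parameter. All names are hypothetical.

```python
from collections import defaultdict

import numpy as np


def shared_masks(grads: dict, retain_fraction: float) -> dict:
    """Hypothetical: one frequency mask per gradient shape, shared by every parameter of that shape."""
    groups = defaultdict(list)
    for name, g in grads.items():
        groups[g.shape].append(name)

    masks = {}
    for names in groups.values():
        # Sum spectral energy across the group so the mask reflects all members.
        energy = sum(np.abs(np.fft.rfft2(grads[n])) ** 2 for n in names)
        flat = energy.ravel()
        k = max(1, int(np.ceil(retain_fraction * flat.size)))
        mask = energy >= np.partition(flat, -k)[-k]
        for n in names:
            masks[n] = mask  # the same array object is shared within the group
    return masks


grads = {"a": np.ones((4, 4)), "b": np.zeros((4, 4)), "c": np.ones((8, 3))}
masks = shared_masks(grads, retain_fraction=0.25)
```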

EMA Averaging Effects


Impact of exponential moving average on convergence
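The page does not say whether Falcon's EMA averages weights, gradients, or frequency-energy statistics. Whatever it averages, the standard update is avg_t = β · avg_(t-1) + (1 - β) · x_t. The sketch below shows this update with the optional Adam-style bias correction avg_t / (1 - β^t). The value β = 0.9 is a placeholder.

```python
import math
from typing import Iterable, List


def ema_series(values: Iterable[float], beta: float = 0.9, bias_correct: bool = True) -> List[float]:
    """Exponential moving average of a sequence, optionally bias-corrected for the zero start."""
    avg = 0.0
    out = []
    for t, v in enumerate(values, start=1):
        avg = beta * avg + (1.0 - beta) * v
        out.append(avg / (1.0 - beta ** t) if bias_correct else avg)
    return out
```

With bias correction, a constant input is reproduced from the first step onward. For a rapidly alternating (noisy) input, the average settles near zero with small oscillations. This smoothing is what helps convergence.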

Robustness to Noise


Performance under different noise conditions

Full Research Repository

Access the complete codebase, datasets, papers, and reproducible experiments on GitHub.