FALCON Experimental Results
Frequency-Adaptive Learning with Conserved Orthogonality and Noise Filtering
Research by Noel Thomas, MBZUAI • 2025
Key Research Findings
Competitive Accuracy
Falcon reaches 90.33% accuracy on CIFAR-10 with VGG11, within 0.2 percentage points of AdamW (90.28%) and Muon (90.49%)
Computational Trade-off
About 40% more time per epoch than AdamW (6.7s vs 4.8s), due to FFT operations and rank-1 approximations
Data Efficiency
Frequency filtering shows its strongest benefits with the full training set, suggesting the method is better suited to large-scale training than to low-data regimes
Experimental Results
Complete training on 100% of CIFAR-10 data until convergence
| Optimizer | Accuracy | Best Epoch | Total Time | Epoch Time | Throughput | Time to 85% |
|---|---|---|---|---|---|---|
| AdamW | 90.28% | 228 | 5.00 min | 4.8s | 10,382 img/s | 1.27 min |
| Muon | 90.49% | 220 | 5.37 min | 5.3s | 9,418 img/s | 1.18 min |
| FALCON v5 | 90.33% | 236 | 6.99 min | 6.7s | 7,486 img/s | 1.56 min |
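The overhead figure quoted for Falcon can be checked directly against the table. The short sketch below recomputes each optimizer's cost relative to AdamW from the reported epoch time, total time, and throughput; the dictionary layout and the helper name `relative_to_baseline` are illustrative choices, not part of the project's code.

```python
# Values copied from the results table above.
results = {
    "AdamW": {"epoch_s": 4.8, "total_min": 5.00, "img_per_s": 10382},
    "Muon": {"epoch_s": 5.3, "total_min": 5.37, "img_per_s": 9418},
    "FALCON v5": {"epoch_s": 6.7, "total_min": 6.99, "img_per_s": 7486},
}


def relative_to_baseline(results, baseline="AdamW"):
    """Express each optimizer's cost as a percentage relative to the baseline."""
    base = results[baseline]
    summary = {}
    for name, row in results.items():
        summary[name] = {
            # Extra wall-clock time per epoch, in percent.
            "epoch_overhead_pct": 100.0 * (row["epoch_s"] / base["epoch_s"] - 1.0),
            # Extra total training time, in percent.
            "total_overhead_pct": 100.0 * (row["total_min"] / base["total_min"] - 1.0),
            # Reduction in images processed per second, in percent.
            "throughput_drop_pct": 100.0 * (1.0 - row["img_per_s"] / base["img_per_s"]),
        }
    return summary


summary = relative_to_baseline(results)
for name, stats in summary.items():
    print(name, {key: round(value, 1) for key, value in stats.items()})
```

Note that "40% slower" refers to time: Falcon's per-epoch and total times are both roughly 40% higher than AdamW's, while its throughput is lower by a smaller percentage, because the two ratios are reciprocals of each other.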
Real Image Training Comparison
See how each optimizer learns to classify actual CIFAR-10 images across training epochs
Training Progress Comparison
Model Learning Progress

AdamW: ✓ CORRECT
Muon: ✓ CORRECT
Falcon: ✓ CORRECT
Analysis
Convergence Speed
Falcon achieves correct predictions fastest, reaching 84% confidence by epoch 10 compared to AdamW's 78% and Muon's 81%.
Final Performance
All optimizers eventually converge to correct predictions. Falcon maintains the highest confidence (95%) and the lowest loss (0.12).
Frequency Filtering Impact
Falcon's noise reduction leads to smoother training curves and more confident predictions, which is especially beneficial for complex visual patterns.
Core Optimization Results

Top-1 Accuracy vs Training Time
Comparison of convergence speed across optimizers

Time to 85% Accuracy
How quickly each optimizer reaches key accuracy milestones

Data Efficiency Comparison
Performance with reduced training data (10%, 20%, 100%)

Fixed Time Budget (10 min)
Best accuracy achievable within a fixed time constraint
Frequency Filtering & Adaptive Schedules

Frequency Filtering Demonstration
How FFT-based filtering removes high-frequency noise
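As a rough illustration of the general idea, the sketch below applies a plain low-pass filter to a noisy 1D signal with NumPy's FFT. It is not Falcon's actual filter: the function name `lowpass_filter`, the toy sine signal, and the retain fraction of 0.1 are all assumptions made for the example.

```python
import numpy as np


def lowpass_filter(signal, retain_fraction):
    """Keep the lowest `retain_fraction` of frequency bins and zero the rest."""
    spectrum = np.fft.rfft(signal)
    # Always keep at least the DC component.
    n_keep = max(1, int(round(retain_fraction * spectrum.size)))
    spectrum[n_keep:] = 0.0
    return np.fft.irfft(spectrum, n=signal.size)


rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 256, endpoint=False)
clean = np.sin(2 * np.pi * 3 * t)                   # low-frequency "signal"
noisy = clean + 0.3 * rng.standard_normal(t.size)   # broadband noise on top
filtered = lowpass_filter(noisy, retain_fraction=0.1)

err_noisy = np.mean((noisy - clean) ** 2)
err_filtered = np.mean((filtered - clean) ** 2)
print(f"MSE before filtering: {err_noisy:.4f}, after: {err_filtered:.4f}")
```

Because the sine wave lives in a single low-frequency bin while the noise is spread over all bins, discarding the high-frequency bins removes most of the noise energy and leaves the signal largely intact.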

Frequency Domain Masks
Energy-based masking in the frequency domain
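One plausible reading of energy-based masking is to keep the strongest frequency bins until a target share of the total spectral energy is retained, and zero everything else. The sketch below does this for a 2D array standing in for a gradient. The names `energy_mask` and `filter_gradient`, the 0.95 default, and the bin selection rule are assumptions for illustration and may differ from the masks Falcon actually builds.

```python
import numpy as np


def energy_mask(grad, retain_energy):
    """Return a boolean mask over the 2D spectrum of `grad`, plus the spectrum.

    The mask keeps the highest-energy bins whose cumulative energy
    reaches `retain_energy` (a fraction in (0, 1]) of the total.
    """
    spectrum = np.fft.fft2(grad)
    energy = np.abs(spectrum) ** 2
    order = np.argsort(energy, axis=None)[::-1]      # bins by descending energy
    cumulative = np.cumsum(energy.ravel()[order])
    # Smallest number of bins whose energy reaches the requested share.
    n_keep = int(np.searchsorted(cumulative, retain_energy * cumulative[-1])) + 1
    n_keep = min(n_keep, order.size)
    mask = np.zeros(energy.size, dtype=bool)
    mask[order[:n_keep]] = True
    return mask.reshape(energy.shape), spectrum


def filter_gradient(grad, retain_energy=0.95):
    """Zero the masked-out bins and transform back to the spatial domain."""
    mask, spectrum = energy_mask(grad, retain_energy)
    # Taking the real part discards any small imaginary residue that appears
    # when the cut-off splits a conjugate-symmetric pair of bins.
    return np.fft.ifft2(spectrum * mask).real


rng = np.random.default_rng(0)
grad = rng.standard_normal((8, 8))
mask, spectrum = energy_mask(grad, retain_energy=0.9)
energy = np.abs(spectrum) ** 2
kept_share = energy[mask].sum() / energy.sum()
filtered = filter_gradient(grad, retain_energy=0.9)
print(f"bins kept: {mask.sum()} of {mask.size}, energy kept: {kept_share:.3f}")
```

Selecting bins by energy rather than by position means the mask adapts to where the gradient's spectral content actually sits, instead of assuming that low frequencies are always the informative ones.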

Real Image Filtering Example
Visual demonstration of frequency filtering on actual images

Progressive Filtering Schedule
How retain fraction decreases over training epochs
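A schedule of this kind can be sketched as a simple decay from a high retain fraction early in training to a lower one at the end. The cosine shape and the endpoints 0.95 and 0.50 below are hypothetical placeholders; the source does not state the values or the curve Falcon uses.

```python
import math


def retain_fraction(epoch, total_epochs, start=0.95, end=0.50):
    """Cosine decay of the retained fraction from `start` to `end`.

    `start` and `end` are illustrative defaults, not values from the source.
    """
    if total_epochs <= 1:
        return end
    # Training progress clamped to [0, 1].
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * progress))


schedule = [retain_fraction(epoch, total_epochs=10) for epoch in range(10)]
print([round(value, 3) for value in schedule])
```

Starting with a high retain fraction filters little while the gradients still carry coarse, useful structure, and the filtering becomes more aggressive as the fraction shrinks.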

Adaptive Scheduling Dynamics
Evolution of retain fraction and interleaving period
Architecture & Performance Analysis

Architecture Comparison
Optimizer performance across different network architectures

Computational Cost Breakdown
Time analysis showing 40% overhead for Falcon

Mask Sharing Strategy
How frequency masks are shared across parameter groups

EMA Averaging Effects
Impact of exponential moving average on convergence
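For readers unfamiliar with the technique, an exponential moving average keeps a slowly updated copy of the parameters that smooths out step-to-step noise. The sketch below shows the standard update rule on a single noisy scalar; the function name `ema_update`, the decay of 0.99, and the toy noise model are assumptions for the example, not Falcon's settings.

```python
import random
import statistics


def ema_update(ema_params, params, decay=0.999):
    """Standard EMA rule: ema <- decay * ema + (1 - decay) * current."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


random.seed(0)
target = 1.0            # value the noisy parameter fluctuates around
ema = [0.0]
raw_tail, ema_tail = [], []
for step in range(2000):
    noisy = [target + random.gauss(0.0, 0.1)]   # simulated noisy iterate
    ema = ema_update(ema, noisy, decay=0.99)
    if step >= 1000:                            # skip the EMA warm-up phase
        raw_tail.append(noisy[0])
        ema_tail.append(ema[0])

raw_var = statistics.pvariance(raw_tail)
ema_var = statistics.pvariance(ema_tail)
print(f"variance of raw iterates: {raw_var:.5f}, of EMA: {ema_var:.5f}")
```

A decay closer to 1 averages over a longer window, which gives a smoother estimate at the cost of reacting more slowly to genuine changes in the parameters.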

Robustness to Noise
Performance under different noise conditions
Full Research Repository
Access the complete codebase, datasets, papers, and reproducible experiments on GitHub.