💡 Ideas for Improving Our Score

Current best: 0.925 public LB. Here's where we think the biggest gains are.

🔴 High Impact

Better use of `train_audio/` (iNat + XC recordings)

We currently only use `train_soundscapes/` for training. The `train_audio/` folder has thousands of individual species recordings that could be used for:

- Pre-training or fine-tuning the ProtoSSM on species-level data
- Building better per-class prototypes
- Data augmentation (mix species recordings into synthetic soundscapes)
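
As a rough illustration of the augmentation idea, here is a minimal sketch of overlaying one species recording onto a soundscape at a chosen signal-to-noise ratio. The function name `mix_at_snr` and its parameters are hypothetical (nothing like it is assumed to exist in the repo), and both arrays are assumed to be mono and already resampled to the same rate.

```python
import numpy as np


def mix_at_snr(background, call, snr_db, offset):
    """Overlay `call` onto `background` starting at `offset` samples.

    Hypothetical helper: both inputs are 1-D float arrays at the same
    sample rate. The call is scaled so that its power relative to the
    background it overlaps equals `snr_db` decibels.
    """
    out = background.astype(np.float64).copy()
    end = min(offset + len(call), len(out))
    segment = call[: max(end - offset, 0)].astype(np.float64)
    if len(segment) == 0:
        return out  # call starts past the end of the background

    # Mean power of the overlapped background and of the call itself.
    bg_power = np.mean(out[offset:end] ** 2) + 1e-12
    call_power = np.mean(segment ** 2) + 1e-12

    # Choose gain so 10*log10(gain^2 * call_power / bg_power) == snr_db.
    gain = np.sqrt(bg_power / call_power * 10.0 ** (snr_db / 10.0))
    out[offset:end] += gain * segment
    return out
```

The label for a synthetic clip would then be the union of the soundscape's labels and the species that were mixed in. Whether such clips actually help is something OOF evaluation would have to confirm.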

Smarter ensemble / stacking

The current blend is a simple linear weight between ProtoSSM and MLP probes. Ideas:

- Train a lightweight meta-learner (e.g. LightGBM) on OOF predictions
- Per-class ensemble weights instead of one global weight
- Stack predictions from multiple model variants
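
The per-class weight idea could be prototyped with a small grid search over OOF predictions. The sketch below is an assumption-laden illustration: `binary_auc`, `per_class_blend_weights`, the array layout (rows = windows, columns = classes), and the `fallback` weight are all hypothetical and not taken from the repo.

```python
import numpy as np


def binary_auc(y_true, scores):
    """ROC AUC via pairwise comparison (ties count half); NaN if one class is empty."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def per_class_blend_weights(y_true, p_a, p_b, grid=None, fallback=0.5):
    """For each class, pick w in `grid` maximizing AUC of w*p_a + (1-w)*p_b.

    Classes whose AUC is undefined (no positives or no negatives in OOF)
    keep the `fallback` weight, e.g. the current global weight.
    """
    if grid is None:
        grid = np.linspace(0.0, 1.0, 11)
    n_classes = y_true.shape[1]
    weights = np.full(n_classes, fallback, dtype=np.float64)
    for c in range(n_classes):
        best = -np.inf
        for w in grid:
            blend = w * p_a[:, c] + (1.0 - w) * p_b[:, c]
            auc = binary_auc(y_true[:, c], blend)
            if not np.isnan(auc) and auc > best:
                best, weights[c] = auc, w
    return weights
```

Weights chosen and scored on the same OOF set will look optimistic, so it is probably safer to pick them inside a nested split or shrink them toward the global weight.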

More robust cross-validation

With only 59 labeled soundscape files, validation is noisy. A single split can mislead. Ideas:

- Stratified GroupKFold ensuring each fold has all sites
- Repeated K-fold (3x5-fold) and average
- Use neuropt with 3-fold CV instead of single split (already supported; just run `scripts/optimize.py`)
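
One way to combine the first two ideas is to split at the file level (so all windows of a file stay together) and deal each site's files round-robin across folds, repeating with different seeds. The sketch below uses only the standard library; `site_balanced_folds` and the `file_sites` mapping (file name to site ID) are hypothetical names, and scikit-learn's `StratifiedGroupKFold` would be an alternative.

```python
import random
from collections import defaultdict


def site_balanced_folds(file_sites, n_splits=5, n_repeats=3, seed=0):
    """Yield (repeat, fold, train_files, val_files) tuples.

    `file_sites` maps file name -> site ID. Any site with at least
    `n_splits` files appears in every validation fold.
    """
    by_site = defaultdict(list)
    for name, site in sorted(file_sites.items()):
        by_site[site].append(name)

    for repeat in range(n_repeats):
        rng = random.Random(seed + repeat)
        fold_of = {}
        start = 0
        for site in sorted(by_site):
            files = by_site[site][:]
            rng.shuffle(files)
            # Continue dealing where the previous site stopped so that
            # overall fold sizes differ by at most one file.
            for i, name in enumerate(files):
                fold_of[name] = (start + i) % n_splits
            start += len(files)
        for fold in range(n_splits):
            val = sorted(f for f, k in fold_of.items() if k == fold)
            train = sorted(f for f, k in fold_of.items() if k != fold)
            yield repeat, fold, train, val
```

Averaging the metric over all `n_repeats * n_splits` validation folds should give a less noisy estimate than any single split, at the cost of proportionally more training runs.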

🟡 Medium Impact

Temporal modeling improvements

The ProtoSSM processes 12 windows sequentially. Ideas:

- Longer context (overlap windows, use 2.5s stride instead of 5s)
- Hierarchical model: per-window features → file-level aggregation
- Attention over the full file rather than just local SSM context
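
To make the stride idea concrete: assuming 60 s files (inferred from 12 windows of 5 s, not stated explicitly), a 2.5 s stride yields 23 overlapping windows, whose predictions then need to be mapped back onto the original 12-window grid. The helpers `window_starts` and `overlap_average` below are hypothetical, and overlap-weighted averaging is just one plausible aggregation choice.

```python
def window_starts(duration_s, window_s=5.0, stride_s=5.0):
    """Start times (seconds) of all full windows that fit in the file."""
    n = int((duration_s - window_s) // stride_s) + 1
    return [i * stride_s for i in range(max(n, 0))]


def overlap_average(starts, values, window_s=5.0, grid_stride_s=5.0,
                    duration_s=60.0):
    """Map per-window values back onto the non-overlapping grid.

    Each grid window receives the average of all input windows that
    overlap it, weighted by the overlap duration in seconds.
    """
    out = []
    for g in window_starts(duration_s, window_s, grid_stride_s):
        num = den = 0.0
        for s, v in zip(starts, values):
            # Length of the intersection of [s, s+w) and [g, g+w).
            overlap = max(0.0, min(s, g) + window_s - max(s, g))
            num += overlap * v
            den += overlap
        out.append(num / den)
    return out


dense = window_starts(60.0, window_s=5.0, stride_s=2.5)   # 23 windows
coarse = window_starts(60.0, window_s=5.0, stride_s=5.0)  # 12 windows
```

Halving the stride roughly doubles the number of Perch forward passes, so this trades directly against the wall-time budget.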

Better calibration

Current per-taxon temperature scaling is coarse (Aves vs texture). Ideas:

- Per-species temperature learned on OOF
- Platt scaling on OOF predictions
- Isotonic regression per class
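
A per-species temperature could be fit by a simple grid search that minimizes binary cross-entropy on OOF logits. This is a sketch under assumptions: `fit_temperatures`, the grid range, and the `min_pos` cutoff are hypothetical, and it assumes OOF outputs are available as logits with shape (rows, classes).

```python
import numpy as np


def bce_from_logits(y, z):
    """Mean binary cross-entropy, computed stably as log(1 + e^z) - y*z."""
    return float(np.mean(np.logaddexp(0.0, z) - y * z))


def fit_temperatures(logits, y_true, grid=None, min_pos=5):
    """Pick, per class, the temperature T minimizing BCE of logits / T.

    Classes with fewer than `min_pos` positives keep T = 1, since a fit
    on so few examples would be unreliable.
    """
    if grid is None:
        # Log-spaced candidates between 0.25 and 4.
        grid = np.exp(np.linspace(np.log(0.25), np.log(4.0), 33))
    temps = np.ones(logits.shape[1])
    for c in range(logits.shape[1]):
        y = y_true[:, c]
        if y.sum() < min_pos:
            continue
        losses = [bce_from_logits(y, logits[:, c] / t) for t in grid]
        temps[c] = grid[int(np.argmin(losses))]
    return temps
```

Note that dividing one class's logits by a positive constant does not change that class's own ranking, so any AUC gain would have to come through downstream steps such as blending or post-processing that combine scores across models or classes.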

Pseudo-labeling

Use confident predictions on unlabeled soundscapes to expand training data:

1. Run the current model on all 10,658 `train_soundscapes/`
2. Threshold high-confidence predictions as pseudo-labels
3. Retrain with expanded labeled set
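
Step 2 above might look roughly like the following. The thresholds (0.9 and 0.1) and the function name `select_pseudo_labels` are illustrative assumptions, not tuned values; `probs` is assumed to be a (windows, classes) array of predicted probabilities.

```python
import numpy as np


def select_pseudo_labels(probs, pos_thr=0.9, neg_thr=0.1):
    """Turn predicted probabilities into pseudo-labels plus a confidence mask.

    Returns:
      labels: 1.0 where prob >= pos_thr, else 0.0
      mask:   True where the model is confident either way; entries in
              between could be excluded from the loss during retraining
      keep:   rows with at least one confident positive
    """
    confident_pos = probs >= pos_thr
    confident_neg = probs <= neg_thr
    labels = confident_pos.astype(np.float32)
    mask = confident_pos | confident_neg
    keep = confident_pos.any(axis=1)
    return labels, mask, keep
```

Pseudo-labels inherit the current model's mistakes, so it would be worth checking that OOF AUC on the 59 truly labeled files does not degrade after retraining.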

🟢 Quick Wins

Tune post-processing on OOF

The fusion parameters (lambda_event, smooth_texture, etc.) were tuned once and frozen. Re-tuning on current OOF might help. Run neuropt with the post-processing params in the search space.

Try different Perch features

- Use the full Perch logits (not just mapped species) as additional features
- Concatenate multiple Perch output layers if available
- PCA dim search (currently fixed at 64)
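
Before running a full PCA dim search, a quick look at cumulative explained variance can narrow the candidate dimensions. The sketch below is a hypothetical helper (`explained_variance_by_dim` is not a repo function) and assumes the Perch features are available as a (samples, features) array.

```python
import numpy as np


def explained_variance_by_dim(features, dims=(16, 32, 64, 128, 256)):
    """Fraction of total variance kept by the top-d principal components."""
    centered = features - features.mean(axis=0)
    # Singular values of the centered data; their squares are
    # proportional to the variance along each principal component.
    s = np.linalg.svd(centered, compute_uv=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    return {d: float(cum[min(d, len(cum)) - 1]) for d in dims}
```

Explained variance is only a proxy; the final choice should still be made on OOF AUC, since low-variance directions can carry discriminative signal.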

Wall-time optimization

Faster inference = more room for model complexity. Profile with `scripts/profile_time.py` and identify bottlenecks. Ideas:

- Increase batch_files for Perch inference (test memory limits)
- Reduce MLP probe count (skip classes with very few positives)
- Cache more aggressively

🛠️ How to Start

1. Pick an idea, create `configs/experiments/my_idea.yaml`
2. Train: `uv run python scripts/train.py --config configs/experiments/my_idea.yaml --data-dir data/competition`
3. Evaluate: `uv run python scripts/evaluate.py --config configs/experiments/my_idea.yaml --data-dir data/competition`
4. If OOF AUC improves, export and submit: `uv run python scripts/export_notebook.py --config configs/experiments/my_idea.yaml`