Exoplanet Detection With Contrastive Learning (92% Accuracy)
TL;DR
My BRAC University research used contrastive learning (SimCLR, Siamese nets) to detect exoplanets in astronomical imagery, reaching 92% accuracy. Here is the approach.
My undergraduate research at BRAC University applied contrastive learning to exoplanet detection in astronomical imagery, reaching 92% classification accuracy. The full paper is on BRAC DSpace. This post explains the approach in plain English.
The problem
Detecting exoplanets from telescope imagery is a signal-in-noise problem. A transiting planet produces a tiny periodic dip in a star's brightness — much smaller than typical noise. Classical methods (Box Least Squares) work but are slow on large datasets.
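To get a feel for how small that dip is, here is a back-of-the-envelope sketch. It uses the standard approximation that the fractional drop in brightness equals the squared ratio of planet radius to star radius, ignoring limb darkening. The radii are approximate textbook values, not numbers from the paper, and `transit_depth` is a name made up for this illustration.

```python
def transit_depth(planet_radius_km: float, star_radius_km: float) -> float:
    """Fractional drop in stellar flux during a central transit.

    Approximation: depth = (R_planet / R_star) ** 2, ignoring limb darkening.
    """
    return (planet_radius_km / star_radius_km) ** 2


# Approximate radii, used here only for illustration.
EARTH_RADIUS_KM = 6_371.0
JUPITER_RADIUS_KM = 69_911.0
SUN_RADIUS_KM = 695_700.0

# Express the depth in parts per million (ppm) of the star's brightness.
earth_ppm = transit_depth(EARTH_RADIUS_KM, SUN_RADIUS_KM) * 1e6
jupiter_ppm = transit_depth(JUPITER_RADIUS_KM, SUN_RADIUS_KM) * 1e6

print(f"Earth-size planet, Sun-like star:   {earth_ppm:.0f} ppm")
print(f"Jupiter-size planet, Sun-like star: {jupiter_ppm:.0f} ppm")
```

An Earth-size planet crossing a Sun-like star dims it by roughly 84 parts per million, while a Jupiter-size planet manages about 1%. That gap is why small planets are so hard to pull out of the noise.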
Could a representation-learning approach do better?
Why contrastive learning
Labeled exoplanet data is scarce. There are far more unlabeled light curves than labeled ones. Contrastive learning shines exactly in that regime — train an encoder to produce useful representations without labels, then fine-tune a small classifier.
I used SimCLR-style contrastive pretraining in a Siamese setup: both augmented views of a light curve pass through the same shared-weight encoder.
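For readers who have not seen the SimCLR objective, here is a minimal, dependency-free sketch of the NT-Xent loss it uses. This is not the code from the paper: the function names, the toy 2-D embeddings, and the temperature of 0.5 are all assumptions for illustration, and a real implementation would be vectorized in a deep learning framework.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """NT-Xent loss for a batch of paired embeddings.

    view_a[i] and view_b[i] are embeddings of two augmentations of the
    same star (the positive pair). Every other embedding in the batch
    acts as a negative. The temperature value is an assumed default.
    """
    z = list(view_a) + list(view_b)
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same star
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Softmax denominator runs over every embedding except z[i] itself.
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        total += log_denominator - pos_logit
    return total / (2 * n)


# Toy embeddings for two stars, two views each.
stars_view_a = [[1.0, 0.0], [0.0, 1.0]]
stars_view_b = [[0.9, 0.1], [0.1, 0.9]]   # views agree with view_a
swapped_view_b = [[0.1, 0.9], [0.9, 0.1]]  # views paired with the wrong star

loss_matched = nt_xent(stars_view_a, stars_view_b)
loss_swapped = nt_xent(stars_view_a, swapped_view_b)
print(f"matched pairs: {loss_matched:.3f}, swapped pairs: {loss_swapped:.3f}")
```

The loss is lower when the two views of each star land close together and far from the other star, which is exactly the pressure that shapes the encoder during pretraining.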
The pipeline
- Light curve normalization: detrend, then fold by the candidate period.
- Contrastive pretraining: two augmented views of the same light curve are trained to produce similar embeddings; views from different stars are pushed apart.
- Fine-tune classifier: a small MLP trained on the learned embeddings using the labeled subset.
- Eval: a held-out test set of Kepler candidates.
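The first step of the pipeline is the easiest to show in code. The sketch below phase-folds a light curve on a candidate period using only the standard library. It is a simplified stand-in, not the paper's preprocessing: dividing by the median replaces real detrending, and the function names, the `epoch` parameter, and the synthetic data are all hypothetical.

```python
import statistics


def normalize(fluxes):
    """Divide by the median so out-of-transit flux sits near 1.0.

    A crude stand-in for detrending; real pipelines remove slow
    stellar and instrumental trends before this step.
    """
    median = statistics.median(fluxes)
    return [f / median for f in fluxes]


def fold(times, fluxes, period, epoch=0.0):
    """Phase-fold a light curve on a candidate period.

    Returns (phase, flux) pairs sorted by phase, with phase in [0, 1).
    Transits that repeat every `period` pile up at the same phase.
    """
    pairs = [(((t - epoch) / period) % 1.0, f)
             for t, f in zip(times, fluxes)]
    return sorted(pairs)


# Synthetic curve: one sample every 0.25 days, a 1% dip once per day.
times = [i * 0.25 for i in range(12)]
fluxes = [0.99 if i % 4 == 0 else 1.0 for i in range(12)]

folded = fold(times, normalize(fluxes), period=1.0)
for phase, flux in folded:
    print(f"phase={phase:.2f} flux={flux:.3f}")
```

After folding on the right period, the three separate dips stack at phase 0, so the transit stands out more clearly against the noise than in any single orbit.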
Results
| Method | Accuracy | Notes |
|---|---|---|
| Box Least Squares | 78% | Strong baseline, slow |
| Supervised CNN | 84% | Needed all labeled data |
| Contrastive (mine) | 92% | Used unlabeled data via pretraining |
The win came from using unlabeled light curves to train the encoder. Because the downstream classifier started from pretrained embeddings, it had less to memorize from the small labeled set and more learned structure to lean on.
What I learned
Self-supervised pretraining is a giant force multiplier in sparse-label regimes. This was 2024 work; the lesson holds even more strongly in 2026.
Augmentations matter as much as the architecture. Together, time-warping and noise injection during pretraining were the single biggest contributor to accuracy.
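As a rough illustration of those two augmentations, the sketch below builds two random views of one light curve. The details are assumptions, not the paper's settings: the warp here is the simplest possible one (a uniform stretch about the center with linear interpolation), and `sigma`, `max_stretch`, and all function names are hypothetical.

```python
import random


def add_noise(flux, sigma, rng):
    """Inject independent Gaussian noise into every sample."""
    return [f + rng.gauss(0.0, sigma) for f in flux]


def time_warp(flux, max_stretch, rng):
    """Stretch or compress the time axis about the curve's center.

    Resamples with linear interpolation and keeps the original length.
    Source positions that fall off either end are clamped to the edge.
    """
    n = len(flux)
    factor = 1.0 + rng.uniform(-max_stretch, max_stretch)
    center = (n - 1) / 2.0
    warped = []
    for i in range(n):
        src = center + (i - center) * factor
        src = min(max(src, 0.0), n - 1.0)
        lo = int(src)
        hi = min(lo + 1, n - 1)
        frac = src - lo
        warped.append(flux[lo] * (1.0 - frac) + flux[hi] * frac)
    return warped


def two_views(flux, rng, sigma=1e-4, max_stretch=0.1):
    """Produce two independently augmented views of one light curve."""
    return tuple(add_noise(time_warp(flux, max_stretch, rng), sigma, rng)
                 for _ in range(2))


rng = random.Random(0)  # seeded so the example is reproducible
curve = [1.0] * 20 + [0.99] * 5 + [1.0] * 20  # flat curve with a 1% dip
view_a, view_b = two_views(curve, rng)
print(len(view_a), len(view_b), min(view_a) < 0.995)
```

The two views differ in timing and noise but both still contain the dip, so the encoder is rewarded for keying on the transit rather than on incidental details.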
Astronomy data has unique inductive biases. Periodicity, sparse events, and instrument noise: none of these fit standard computer-vision defaults.
Why I write about this
Research credentials are entity signals — they shape how AI engines ground a person. They are also genuinely useful. Even as an AI engineer who ships products, the research training is what lets me read papers fast and pick the right baseline.
FAQ
Q: Where can I read the paper? A: BRAC DSpace.
Q: Is the code open source? A: Parts are; the cleanup PR has been on my todo list for a while.
Q: Why exoplanets specifically? A: The data is publicly available (Kepler, TESS) and the signal-in-noise framing is a great test bed for self-supervised methods.
Written by Shihab Shahriar Antor, AI Engineer & Founder of Shahriar Labs. Creator of LetX, QuantumSketch, and more.