Exoplanet Detection With Contrastive Learning (92%)

My undergraduate research at BRAC University applied contrastive learning to exoplanet detection in astronomical imagery, reaching 92% classification accuracy. The full paper is on BRAC DSpace. This post explains the approach in plain English.

The problem

Detecting exoplanets from telescope imagery is a signal-in-noise problem. A transiting planet produces a tiny periodic dip in a star's brightness, often comparable to or smaller than the noise in a single measurement. Classical methods such as Box Least Squares (BLS) work, but they search a grid of trial periods and become slow on large datasets.
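To put a number on "tiny": a common first-order approximation is that the fractional dip equals the squared ratio of planet radius to star radius. The sketch below uses standard reference radii, not values from the paper, and ignores limb darkening and grazing transits.

```python
# Approximate transit depth: fraction of starlight blocked = (R_planet / R_star)^2.
# Radii are standard reference values in kilometres, not numbers from the paper.
R_SUN_KM = 695_700.0
R_EARTH_KM = 6_371.0
R_JUPITER_KM = 69_911.0


def transit_depth(planet_radius_km: float, star_radius_km: float) -> float:
    """Fractional drop in brightness while the planet crosses the stellar disk."""
    return (planet_radius_km / star_radius_km) ** 2


earth_depth = transit_depth(R_EARTH_KM, R_SUN_KM)
jupiter_depth = transit_depth(R_JUPITER_KM, R_SUN_KM)

print(f"Earth-size planet, Sun-like star:   {earth_depth * 1e6:.0f} ppm")
print(f"Jupiter-size planet, Sun-like star: {jupiter_depth * 100:.2f} %")
```

An Earth-size planet around a Sun-like star dims it by less than a hundredth of a percent, which is why a single transit is so easy to lose in the noise.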

Could a representation-learning approach do better?

Why contrastive learning

Labeled exoplanet data is scarce. There are far more unlabeled light curves than labeled ones. Contrastive learning shines exactly in that regime — train an encoder to produce useful representations without labels, then fine-tune a small classifier.

I used SimCLR-style contrastive pretraining, with a Siamese (shared-weight) network as the encoder.
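For readers who have not met SimCLR, its objective is the NT-Xent loss: each embedding should be closer to its augmented partner than to every other embedding in the batch. Below is a minimal sketch in plain Python for readability. The function names, the temperature of 0.5, and the toy 2-D embeddings are illustrative assumptions, not the paper's code, and a real training loop would use a tensor library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """NT-Xent loss; view_a[i] and view_b[i] embed two augmentations of star i."""
    n = len(view_a)
    z = view_a + view_b  # 2n embeddings; the positive of index i is (i + n) % 2n
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)
        # Similarities to every other embedding (self excluded) form the denominator.
        logits = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        total += math.log(sum(math.exp(x) for x in logits)) - pos_logit
    return total / (2 * n)


# Toy check: matching views should give a lower loss than mismatched ones.
stars = [[1.0, 0.0], [0.0, 1.0]]
aligned = nt_xent(stars, [[0.9, 0.1], [0.1, 0.9]])
shuffled = nt_xent(stars, [[0.1, 0.9], [0.9, 0.1]])
print(f"aligned={aligned:.3f}  shuffled={shuffled:.3f}")
```

Minimizing this loss pulls the two views of each star together and pushes views of different stars apart, which is the behaviour the pretraining step relies on.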

The pipeline

Light curve normalization: detrend each light curve, then fold it on the candidate period.
Contrastive pretraining: two augmented views of the same light curve are trained to produce similar embeddings; views from different stars are trained to differ.
Fine-tune classifier: a small MLP trained on the learned embeddings using the labeled subset.
Evaluation: a held-out test set of Kepler candidates.
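The normalization step is the easiest one to show concretely. Here is a rough sketch, assuming a running-median detrend and equal-width phase bins. The window size, bin count, and synthetic light curve are made-up illustration values, and the paper's preprocessing may differ.

```python
import math
import random
from statistics import median


def detrend(flux, window=51):
    """Divide by a running median so slow stellar or instrument trends flatten to ~1."""
    half = window // 2
    out = []
    for i in range(len(flux)):
        lo, hi = max(0, i - half), min(len(flux), i + half + 1)
        out.append(flux[i] / median(flux[lo:hi]))
    return out


def fold_and_bin(time, flux, period, n_bins=20):
    """Fold on a candidate period and average the flux in equal-width phase bins."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for t, f in zip(time, flux):
        phase = (t % period) / period  # position within the orbit, in [0, 1)
        b = min(int(phase * n_bins), n_bins - 1)
        sums[b] += f
        counts[b] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Synthetic example (made-up numbers): 60 days at a 0.02-day cadence, a 2-day
# period, a 1% transit in phase [0.50, 0.55), a slow trend, and Gaussian noise.
rng = random.Random(0)
period = 2.0
time = [i * 0.02 for i in range(3000)]
flux = []
for t in time:
    trend = 1.0 + 0.05 * math.sin(2 * math.pi * t / 30.0)
    in_transit = 0.50 <= (t % period) / period < 0.55
    flux.append(trend * (0.99 if in_transit else 1.0) + rng.gauss(0.0, 0.002))

binned = fold_and_bin(time, detrend(flux), period)
deepest = binned.index(min(binned))
print(f"deepest bin: {deepest}, mean flux there: {binned[deepest]:.4f}")
```

The detrend window has to be wider than the transit itself, otherwise the running median follows the dip and erases it. Folding then stacks every transit into the same phase bins, so the dip stands out even when individual transits are noisy.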

Results

Method	Accuracy	Notes
Box Least Squares	78%	Strong baseline, slow
Supervised CNN	84%	Needed all labeled data
Contrastive (mine)	92%	Used unlabeled data via pretraining

The win came from using unlabeled light curves to train the encoder. Because the encoder already captured useful structure, the small downstream classifier had little left to learn from the labeled subset and less room to memorize it.

What I learned

Self-supervised pretraining is a giant force multiplier in sparse-label regimes. This was 2024 work; the lesson applies even more strongly in 2026.

Augmentations matter as much as the architecture. Time-warping and noise injection during pretraining were, together, the biggest contributor to accuracy.
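As a rough illustration of those two augmentations, here is a sketch in plain Python. It assumes the simplest possible time warp (a single random uniform stretch with linear interpolation) and Gaussian noise; the function names and parameter values are hypothetical, and the warping used in the paper may be more elaborate.

```python
import random


def add_noise(flux, sigma=0.001, rng=random):
    """Noise injection: add independent Gaussian noise to every sample."""
    return [f + rng.gauss(0.0, sigma) for f in flux]


def time_warp(flux, max_stretch=0.1, rng=random):
    """Stretch or squeeze the time axis by a random factor, keeping the length."""
    n = len(flux)
    stretch = 1.0 + rng.uniform(-max_stretch, max_stretch)
    out = []
    for i in range(n):
        src = min(i * stretch, n - 1.0)  # clamp to the last valid index
        lo = int(src)
        hi = min(lo + 1, n - 1)
        frac = src - lo
        out.append((1.0 - frac) * flux[lo] + frac * flux[hi])  # linear interpolation
    return out


def make_views(flux, seed=None):
    """Return two independently augmented views of one light curve (a positive pair)."""
    rng = random.Random(seed)
    first = add_noise(time_warp(flux, rng=rng), rng=rng)
    second = add_noise(time_warp(flux, rng=rng), rng=rng)
    return first, second


# Toy folded light curve: flat at 1.0 with a 1% dip in the middle.
curve = [0.99 if 22 <= i < 28 else 1.0 for i in range(50)]
view_a, view_b = make_views(curve, seed=0)
print(len(view_a), len(view_b), view_a != view_b)
```

Both views still contain the same transit, just shifted, stretched, and perturbed, so the encoder is pushed to ignore those nuisances and keep the dip.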

Astronomy data needs its own inductive biases. Periodicity, sparse events, and instrument noise do not fit standard computer-vision defaults.

Why I write about this

Research credentials are entity signals — they shape how AI engines ground a person. They are also genuinely useful. Even as an AI engineer who ships products, the research training is what lets me read papers fast and pick the right baseline.

FAQ

Q: Where can I read the paper? A: BRAC DSpace.

Q: Is the code open source? A: Parts are; the cleanup PR has been on my todo list for a while.

Q: Why exoplanets specifically? A: The data is publicly available (Kepler, TESS) and the signal-in-noise framing is a great test bed for self-supervised methods.

Written by Shihab Shahriar Antor, AI Engineer & Founder.