Astrostatistics News
Issue 7, October 2025
Issue Editors: Jessi Cisewski-Kehe, David W. Hogg, Vinay L. Kashyap, Aneta Siemiginowska
Astrostatistics News (AN) is a newsletter designed to inform, promote, cultivate, and inspire the astrostatistics community.
By Aarya Patil (Max Planck Institute for Astronomy)
Astronomy is entering a new era in which data are generated at an unprecedented pace, often exceeding petabytes per day. This deluge of information has led to close collaboration between astronomers and statisticians, driving the development of innovative, data-driven techniques. As a result, astrostatistics has maintained a strong presence at the Joint Statistical Meetings (JSM), a tradition carried forward at the 2025 JSM in Nashville, Tennessee.
The meeting featured a range of talks for the Astrostatistics Interest Group (AIG) Student Paper Competition. Finalists and winners, listed at astrostat.org/competition/winners.html, covered topics ranging from extragalactic phenomena and galaxies to stellar science and improved treatment of astronomical uncertainties. A dedicated session on "Advances in Time-Series Analysis for Astronomy’s Big Data Era" highlighted innovative methods for tackling astronomical time series data. Space missions like Kepler, TESS, and the upcoming PLATO provide long, evenly sampled time series that help us study dynamic events in the Universe. However, the vast scale and complexity of these data pose challenges for traditional methods. In addition, the ground-based Vera C. Rubin Observatory has begun a decade-long sky survey, generating irregular, sparse time series across six bandpasses. Combining these data into a unified statistical model will be a major challenge, which was discussed thoroughly in the JSM session.
Throughout the week, the AIG Community Table in the Expo Hall encouraged informal conversations between astrostatisticians and the broader statistics community. The AIG also hosted contributed poster presentations, followed by its annual business meeting and lunch, where members discussed the group’s future plans. With its combination of networking opportunities, social events, and cutting-edge sessions, JSM continues to be an ideal venue for astronomers to explore new methodologies, present their research, and engage with the wider statistical community.
By Joseph Salzer (University of Wisconsin-Madison)
Workshop website: https://www.iastro.pt/research/conferences/eprv6/
Attending the Sixth Workshop on Extremely Precise Radial Velocities (EPRV 6) in Porto, Portugal was both exciting and a little daunting. As a statistics PhD student working with astronomical data, I was presenting my work to an audience of seasoned astronomers. Fortunately, the community proved to be both welcoming and rigorously data-driven. The SOC, led by Susana Barros and Nuno Santos, organized this meeting to improve the radial velocity (RV) method for exoplanet detection and characterization. Invited speakers included Nobel laureate Michel Mayor, who, along with Didier Queloz, was awarded half of the 2019 Nobel Prize in Physics for the discovery of the first exoplanet orbiting a Sun-like star (51 Pegasi b, see https://www.nobelprize.org/prizes/physics/2019/summary/). The workshop covered topics in post-processing techniques and stellar activity: the intricate processes inside and on the surface of stars that can distort RV measurements. Splinter sessions highlighted Gaussian process regression, deep neural networks, and “Sun‑as‑a‑Star” datasets, which serve as vital benchmarks for testing statistical methods aimed at mitigating stellar activity. Session chairs organized spirited discussions on best practices for comparing RV models in exoplanet characterization. A particular highlight was the continued development of RV data challenges that let the community benchmark their models through the Extreme Stellar Signals Project (ESSP), an EPRV data challenge led by Lily Zhao (Sagan Fellow, University of Chicago).
From a statistical perspective, some of the most difficult challenges discussed included disentangling genuine planetary signals from stellar noise, handling time-correlated measurement errors, and assessing model comparison criteria when the true generating process is unknown. Several talks showed that methods such as Gaussian processes and neural networks, while powerful, can sometimes absorb planetary signals or “overfit” to stellar variability. To detect “Earth-like” planets orbiting “Sun-like” stars, the EPRV community is aiming to achieve a precision of ~10 cm/s. Current state-of-the-art methods have achieved ~40 cm/s precision on Sun‑as‑a‑Star datasets (e.g., [1], [2], [3]).
I left inspired by the community’s collaborative spirit and optimistic about the future of statistical modeling applied to rich spectral datasets. It was a privilege to attend EPRV 6 and contribute, in some small way, to the search for new planets.
[1] Ford, Eric B., et al. "Earths within Reach: Evaluation of Strategies for Mitigating Solar Variability using 3.5 years of NEID Sun-as-a-Star Observations." arXiv preprint arXiv:2408.13318 (2024).
Link: https://arxiv.org/abs/2408.13318
[2] Salzer, Joseph, et al. The Astronomical Journal 170, 179 (2025).
Link: https://iopscience.iop.org/article/10.3847/1538-3881/adf29d
[3] EPRV6 presentation by Sara Tavella on "Reaching 40 cm/s RV precision on HARPS-N solar data with a PCA correction at the spectral level."
Astrostatistics innovations of the present are highlighted in this section.
Statistical inference on the angular power spectrum is notoriously challenging, given that the likelihood function does not have a closed-form expression and, in general, cannot be considered approximately Gaussian. In these companion papers, we demonstrate that it is possible to test parametric models for the angular power spectrum in an entirely distribution-free manner. In particular, not only does the limiting null distribution of the test statistics used not depend on the model, but, to derive such a distribution, one does not need to specify the likelihood of the angular power spectrum or its estimators. From a technical standpoint, this is achieved by means of the so-called Khmaladze-2 transform, which enables mapping the projections of the errors arising from parameter estimation into the same “standard projection”. Algeri et al. 2025 outlines such an inferential approach in the general setting. Zhang et al. 2025+ extends the procedure to tackle the unique challenges arising in the study of the stochastic gravitational-wave background and applies it to data from the third observing run (O3) of Advanced LIGO and Advanced Virgo.
Companion papers:
Algeri S., Zhang X., Floden E., Zhao H.†, Jones G., Mandic V., and Miller J. A Distribution-Free Approach to Testing Models for Angular Power Spectra. Physical Review D letters (Forthcoming, 2025) arXiv:2504.16079
Zhang X.†, Floden E.†, Zhao H.†, Algeri S., Jones G., Mandic V., and Miller J.†. On Validating Angular Power Spectral Models for the Stochastic Gravitational-Wave Background Without Distributional Assumptions. (Under review, 2025+)
Python tutorial available at https://github.com/xiangyu2022/DisfreeTestAPS
R tutorial available at https://github.com/small-epsilon/GOF-Testing-for-Angular-Power-Spectrum-Models-in-R
The American Statistical Association’s Astrostatistics Interest Group runs an annual competition to identify innovative student-led papers. The finalists present their work in a special session at the Joint Statistical Meetings. Summaries of their papers, written by the finalists, are provided below.
Quantifying the Clustering Probability in Noisy Nonhomogeneous Spatial Data to Identify New Repeating Fast Radio Burst Sources from CHIME/FRB
Annals of Applied Statistics, arXiv:2410.12146 (2024)
Amanda Cook, University of Toronto
WINNER, AIG Student Paper Award competition
This paper introduces a new statistical framework for analyzing nonhomogeneous Poisson processes (NHPPs) observed with noise, with particular focus on second-order characteristics that capture clustering beyond what is expected under a purely random model. The approach uses a hierarchical Bayesian model to estimate the hyperparameters governing a physically motivated nonhomogeneous intensity function, even in the presence of substantial observational uncertainty.
Applied to data from the Canadian Hydrogen Intensity Mapping Experiment’s FRB project (CHIME/FRB), this method allows for assessment of whether observed clusters of fast radio bursts (FRBs) are consistent with physically independent sources or indicative of genuine repeaters. This work updates "the probability of chance coincidence" by building it up from first principles and embedding it in a coherent probabilistic framework that supports uncertainty quantification. The resulting quantity is the probability that k bursts detected within a given radius are truly independent. When applied to the published CHIME/FRB sample, the new method improves candidate repeater significance in 86% of cases, with a median improvement factor of approximately 3000 over existing metrics.
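For intuition about the "probability of chance coincidence", the following is a minimal sketch of the textbook homogeneous-Poisson version of the calculation. It is not the hierarchical NHPP model of the paper. It assumes unrelated sources are scattered with a constant surface density, and the function names, the density, and the radius are hypothetical choices made for illustration.

```python
import math


def expected_count(density_per_deg2, radius_deg):
    """Expected number of unrelated sources in a small circle (flat-sky approximation)."""
    return density_per_deg2 * math.pi * radius_deg ** 2


def poisson_tail(k, mu, max_terms=500):
    """P(N >= k) for N ~ Poisson(mu), summed directly over the upper tail.

    Intended for the rare-coincidence regime (mu of order 1 or smaller),
    where summing the tail avoids the cancellation in 1 - CDF.
    """
    if k <= 0:
        return 1.0
    if mu <= 0:
        return 0.0
    # P(N = k), evaluated in log space to keep k! from overflowing.
    term = math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
    total = 0.0
    for i in range(k, k + max_terms):
        total += term
        term *= mu / (i + 1)  # P(N = i + 1) from P(N = i)
        if term < 1e-17 * total:
            break
    return min(total, 1.0)


# Hypothetical numbers: 0.5 unrelated sources per square degree, 0.2 degree radius.
mu = expected_count(0.5, 0.2)
p_chance = poisson_tail(2, mu)  # chance of at least two unrelated sources in the circle
```

A small value of p_chance suggests that a cluster is unlikely to arise from unrelated sources under this simplified model. A constant density ignores both the nonhomogeneous intensity and the localization noise that the paper is designed to handle.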
A data-driven approach to stellar flare detection
Astrophysical Journal 979, 141 (2025)
J. Arturo Esquivel F., University of Toronto
Finalist, AIG Student Paper Award competition
We present a hidden Markov model (HMM) for discovering stellar flares in light-curve data of stars. HMMs provide a framework to model time series data that are nonstationary; they allow for systems to be in different states at different times and consider the probabilities that describe the switching dynamics between states. In the context of the discovery of stellar flares, we exploit the HMM framework by allowing the light curve of a star to be in one of three states at any given time step: quiet, firing, or decaying. This three-state HMM formulation is designed to enable straightforward identification of stellar flares, their duration, and associated uncertainty. This is crucial for estimating the flare's energy, and is useful for studies of stellar flare energy distributions. We combine our HMM with a celerite model that accounts for quasiperiodic stellar oscillations. Through an injection recovery experiment, we demonstrate and evaluate the ability of our method to detect and characterize flares in stellar time series. We also show that the proposed HMM flags fainter and lower energy flares more easily than traditional sigma-clipping methods. Lastly, we visually demonstrate that simultaneously conducting detrending and flare detection can mitigate biased estimations arising in multistage modeling approaches. Thus, this method paves a new way to calculate stellar flare energy.
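To make the three-state idea concrete, here is a minimal sketch of Viterbi decoding for an HMM with quiet, firing, and decaying states. It is a simplified stand-in, not the model of the paper: it assumes an already detrended light curve, uses Gaussian emissions on the change in flux between consecutive time steps, and omits the celerite component entirely. All parameter values and function names are hypothetical and were chosen by hand.

```python
import math

STATES = ("quiet", "firing", "decaying")
NEG_INF = float("-inf")

# Hypothetical parameters, chosen by hand for illustration only.
INIT = [0.98, 0.01, 0.01]
TRANS = [
    [0.95, 0.05, 0.00],  # quiet    -> quiet, firing
    [0.00, 0.50, 0.50],  # firing   -> firing, decaying
    [0.30, 0.10, 0.60],  # decaying -> quiet, firing, decaying
]
# Gaussian (mean, sd) of the flux change between consecutive steps in each state.
EMISSION = [(0.0, 1.0), (8.0, 2.0), (-3.0, 2.0)]


def safe_log(p):
    """Logarithm that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def log_normal_pdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def viterbi(obs):
    """Most probable state sequence for a non-empty list of observations."""
    n = len(STATES)
    log_trans = [[safe_log(p) for p in row] for row in TRANS]
    # score[s] is the best log-probability of any path ending in state s.
    score = [safe_log(INIT[s]) + log_normal_pdf(obs[0], *EMISSION[s]) for s in range(n)]
    back = []
    for x in obs[1:]:
        pointers, new_score = [], []
        for s in range(n):
            prev = max(range(n), key=lambda p: score[p] + log_trans[p][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s] + log_normal_pdf(x, *EMISSION[s]))
        back.append(pointers)
        score = new_score
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


def decode_flares(flux):
    """Label each step between consecutive flux values (needs at least two values)."""
    diffs = [b - a for a, b in zip(flux, flux[1:])]
    return viterbi(diffs)
```

Calling decode_flares on a detrended flux list returns one label per time step after the first. Runs of "firing" followed by "decaying" mark candidate flares, and their length gives a rough duration. Unlike the approach described above, this sketch performs no joint detrending and gives no uncertainty on the decoded states.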
Prediction Intervals for Astronomy Data with Covariate Error
Monthly Notices of the Royal Astronomical Society 539, 1372 (2025)
Naomi Singer, North Carolina State University
Finalist, AIG Student Paper Award competition
Accurate characterization of exoplanets often relies on measurements, such as mass and radius, that contain non-negligible errors. Ignoring these errors can bias model training and prediction, especially for classification or habitability assessment. While recent work has addressed uncertainty-aware model estimation, prediction methods remain limited and often require unrealistic assumptions, such as known error distributions. We address this gap by extending the conformal prediction framework to accommodate measurement error models, which describe how errors enter observations. This extension retains conformal prediction’s finite-sample coverage guarantees under minimal assumptions and applies to a broad class of error distributions without requiring their explicit specification. We demonstrate the method’s validity through simulations and illustrate its use in constructing prediction intervals for unobserved exoplanet masses based on established broken power-law mass–radius relationships.
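As background for the conformal prediction framework that this work extends, here is a minimal sketch of standard split conformal prediction with absolute-residual scores. It does not include the measurement-error extension described above, and it assumes the covariates are observed without error. The function names and the predictor passed in are hypothetical.

```python
import math


def conformal_quantile(abs_residuals, alpha):
    """Return the ceil((n + 1)(1 - alpha))-th smallest calibration residual."""
    n = len(abs_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this coverage level: the interval is unbounded.
        return math.inf
    return sorted(abs_residuals)[rank - 1]


def split_conformal_interval(predict, calibration, x_new, alpha=0.1):
    """Prediction interval for x_new with nominal coverage 1 - alpha.

    predict     : an already fitted point predictor, x -> y_hat
    calibration : (x, y) pairs that were NOT used to fit `predict`
    """
    residuals = [abs(y - predict(x)) for x, y in calibration]
    q = conformal_quantile(residuals, alpha)
    center = predict(x_new)
    return center - q, center + q
```

The finite-sample correction (n + 1) in the rank is what gives the marginal coverage guarantee when the calibration and test points are exchangeable. In a mass-radius setting, predict could be a fitted broken power law evaluated on log radius, with the interval formed on log mass.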
Discovery of Two Ultra-Diffuse Galaxies with Unusually Bright Globular Cluster Luminosity Functions via a Mark-Dependently Thinned Point Process (MATHPOP)
Astrophysical Journal 984, 147 (2025)
Dayi Li, University of Toronto
Finalist, AIG Student Paper Award competition
Inferring the number of globular clusters (GCs) in faint galaxies, such as ultra-diffuse galaxies (UDGs), is crucial for understanding galaxy formation, but standard methods are hampered by significant uncertainties. These include challenges in photometric measurements, determining GC membership, and rigid assumptions about the GC luminosity function (GCLF). We introduce the MArk-dependently THinned POint Process (MATHPOP), a novel statistical framework to robustly estimate GC populations. MATHPOP is a point process model that jointly infers the spatial distribution and magnitude properties of GCs. A key innovation is the hierarchical Bayesian structure of the model framework which allows us to account for various sources of uncertainties with a minimal set of assumptions. In return, we are able to infer both the total GC count and the GCLF parameters with rigorous uncertainty quantification. We applied MATHPOP to 40 low-surface-brightness galaxies in the Perseus cluster using Hubble Space Telescope data. Our analysis revealed a significant discovery: two galaxies host GC systems with anomalously bright GCLF turnover points, a finding supported by strong statistical evidence. This work provides a powerful new tool for astrostatistics and identifies unusual galaxies that challenge our understanding of star and galaxy formation.
ChronoFlow: A Data-Driven Model for Gyrochronology
Astrophysical Journal 986, 59 (2025)
Phil Van-Lane, University of Toronto
Finalist, AIG Student Paper Award competition
Historically, it has been challenging to obtain well-constrained age estimates for low-mass stars on the main sequence, i.e., those fusing hydrogen in their cores. One technique that has shown promise for such stars is gyrochronology, which estimates ages from stellar rotation rates; however, the observed dispersion in rotation rates for similar coeval stars has been difficult to characterize with analytical models. To model this complexity, we have developed ChronoFlow: a flexible data-driven model which accurately captures observed rotational dispersion using a Conditional Normalizing Flow. Importantly, ChronoFlow accounts for non-uniform cluster membership probabilities in the calibration data. We apply ChronoFlow in a Bayesian inference framework to estimate individual stellar ages and population (open cluster) ages, recovering cluster ages with a statistical uncertainty of 0.06 dex and individual stellar ages with a statistical uncertainty of 0.7 dex. Furthermore, we conducted robust systematic tests to analyze the impact of extinction models, cluster membership, and calibration ages. In addition to age estimation, ChronoFlow can be used to inform and evaluate physical stellar spin-down models. ChronoFlow is publicly available at https://github.com/philvanlane/chronoflow
Astrostatistics Events
A list of events will be maintained at our website, astrostatisticsnews.com/upcoming-events.
sys2025: Systematic and Measurement Errors across the Sciences - Astrostatistics and Data Science
November 14-17, 2025
Gulf State Park, AL
Details: https://sites.google.com/uah.edu/sys2025/home?authuser=0
sys2025 is the second in a series of astrostatistics and data science workshops in the Southeastern part of the U.S. After the successful iid2022 workshop on count data in the Fall of 2022 at the Lake Guntersville State Park, sys2025 focuses on the topic of systematic errors. Systematic errors are everywhere across the sciences, yet they are often ill-defined or misunderstood, especially from a statistical point of view. This workshop aims to bring together data science practitioners (from astronomy/physical sciences and other fields, such as biostatistics, econometrics and more) with statisticians and machine-learning experts. The workshop is structured as a series of introductory review lectures from established professionals and shorter presentations by participants of all levels of experience, with emphasis on the participation of students and early-career researchers, especially from traditionally underrepresented groups. The National Science Foundation EPSCoR program is expected to support the participation of students and early-career participants (see Registration for details).
The workshop is intended as an in-person meeting, with time for social interactions and networking. For those unable to travel, there is a remote participation option.
sys2025 will also offer an opportunity to publish peer-reviewed papers on the topic of systematic errors as part of a Research Topic for the journal Frontiers in Astronomy and Space Science/Astrostatistics. See the Program/Proceedings tab for more information.
STAMPS Seminar Series
STAtistical Methods for the Physical Sciences Research Center (STAMPS@CMU)
Monthly, virtual
Details: https://www.cmu.edu/dietrich/statistics-datascience/stamps/index.html
Overview: Talks are open to everyone who registers on the web site:
https://www.cmu.edu/dietrich/statistics-datascience/stamps/events/webinars/index.html
IAU - IAA Astrostatistics and Astroinformatics Seminars
Monthly, virtual
Details: https://sites.google.com/view/iau-iaaseminar-new/home
Schedule: https://sites.google.com/view/iau-iaaseminar-new/schedule?authuser=0
Overview: This international online seminar is an initiative of the International Astrostatistics Association and the IAU Astroinformatics and Astrostatistics Commission. It focuses on statistical analysis and data mining of astronomical data. The seminar is run on Zoom monthly on second Tuesdays alternating between Europe-America and Australasia-Europe time zone instances. The standard seminar times are 8:00 UTC and 16:00 UTC. Please check the exact time and time differences with your timezone.
If you have ideas for AN content, please send a message to astrostatisticsnews@gmail.com. We may include your idea in a future issue if we think it is a good fit.
Ideas may include relevant astrostatistics papers/data/code, visualizations, upcoming events, job postings, format or commentary suggestions, etc.
See astrostatisticsnews.com for more information such as past issues, lists of astrostatistics references and societies.
Subscribe to Astrostatistics News
To subscribe to Astrostatistics News, go to https://groups.google.com/g/astrostatistics-news and select the “Join group” button. You will need to be logged into your Google account to join the group.
Please forward this information to anyone who may be interested!