Astrostatistics News
Issue 6, April 2025
Issue Editors: Jessi Cisewski-Kehe, David W. Hogg, Vinay L. Kashyap, Aneta Siemiginowska
Astrostatistics News (AN) is a newsletter designed to inform, promote, cultivate, and inspire the astrostatistics community.
If you have ideas for AN content, please send a message to astrostatisticsnews@gmail.com. If you have written or come across astrostatistics-related articles, data, code, meetings, jobs, etc. that you find interesting or useful, others in our community may as well. Please let us know!
By Vinay Kashyap (Center for Astrophysics | Harvard & Smithsonian)
Paper: Tak, Hyungsuk, Yang Chen, Vinay L. Kashyap, Kaisey S. Mandel, Xiao-Li Meng, Aneta Siemiginowska, and David A. van Dyk. "Six Maxims of Statistical Acumen for Astronomical Data Analysis." The Astrophysical Journal Supplement Series 275, no. 2 (2024): 30.
[DOI, arXiv:2408.16179]
Way back in Jan 2019 we were asked to come up with a critique of statistical analyses in astro papers. But there are infinitely many ways of being wrong, and very few ways of doing right, so we ended up not doing that. The project morphed into something else entirely, and has ended up in this...uh, pamphlet? manifesto? a sort-of-review, sort-of-tutorial, sort-of-practical-guide, not-a-paper on Astrostatistical Things For Astronomers To Be Aware Of™ when they are handling complex data, which is what most of it is nowadays. It is organized around the same conceit as George Box's famous aphorism (“all models are wrong, but some are useful”). The idea is to point to important concepts and shine a light on potential pitfalls using (mostly) existing astronomical literature.
"All data have stories, but some are mistold" : Data in nice tables and catalogs can be alluring. But be aware of the sampling process used to select or generate them, the selection effects that affect them (Eddington and Malmquist are just the tip of the iceberg), preprocessing and filtering, and both the necessity of calibration and the uncertainties in calibration.
"All assumptions are meant to be helpful, but some can be harmful" : Direct quote – "standard statistical models do not account for unusual features of astronomical data or models." Check the residuals, check sensitivity to starting values, check for multimodality, deploy domain knowledge to see whether the results make sense, and understand the asymptotics, both of the regularity conditions of the statistical tests and of the data sizes.
"All prior distributions are informative, even those that are uniform" : One of my pet peeves is the conflation of a “flat” prior with an uninformative prior. No! Flat is a choice! Even the scale -- linear/log/inverse -- is a choice! You don't get to claim some exalted objectivity by ignoring known information and misspecifying the prior. Always check the sensitivity of the adopted prior on the results. If the posterior has too much mass near the bounds, something is going wrong.
"All models can be given interpretations, but some are more compelling" : This is almost a sequel to the original Box aphorism. Sure, there are useful models which have parameters with specific meaning. But just because it fits the data does not mean it captures the essential physics. Avoid overinterpreting the model.
"All statistical tests have thresholds, but some are mis-set" : You can't have a "test" without a threshold, and you have to pay real attention to how it is set. Astronomers have generally avoided the p-hacking scandal of medical and social sciences by setting more stringent thresholds than the usual p<0.05, but that only takes you so far, especially when multiple tests are done. Bonferroni corrections are too conservative, so instead it is better to control for False Discovery Rate (FDR).
"All model checks consider variations of the data, but some variants are more relevant than others" : The idea here is to not ignore what you already know. E.g., it is common for goodness of fit checks to be computed bin-wise, and do not take into account the total counts observed. What if you did? Well, it makes a big difference if you do – error bars will be smaller, tests will have more power!
Astrostatistics innovations of the present are highlighted in this section.
By Naomi Giertych and Jonathan P. Williams (North Carolina State University)
Paper: Giertych, Naomi, Ahmed Shaban, Pragya Haravu, and Jonathan P. Williams. "A statistical primer on classical period-finding techniques in astronomy." Reports on Progress in Physics 87, no. 7 (2024): 078401. [DOI]
The historical lack of cross-disciplinary communication between astronomers and statisticians led to the development, in the astronomy literature, of a variety of statistical methods for detecting a periodic signal in unevenly spaced time series with heteroskedastic noise. Astronomers have found general utility in their approaches, but conflicting accounts of the statistical properties of their classical methods, such as the phase-dispersion minimization (PDM) statistic, were published in seminal articles in the 1970s and 1980s. The PDM statistic alone has been used in approximately 1,500 astronomical articles published over four decades (Feigelson et al., 2021). The article “A statistical primer on classical period-finding techniques in astronomy” serves to offer a perspective from the statistical community, including mathematical and empirical descriptions and comparisons of the null distributions of classical period-finding statistics, their extreme-value distributions, and commentary on multiple-testing issues.
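For readers who have not met the PDM statistic, here is a minimal sketch of its simplest form, assuming equal-width, non-overlapping phase bins: the data are folded at a trial period, and the pooled within-bin variance is divided by the overall variance, so that a value well below 1 suggests a coherent periodic signal. The simulated light curve, the bin count, and the function name `pdm_theta` are illustrative assumptions, not taken from the primer, and the sketch ignores heteroskedastic errors and the null-distribution questions the article addresses.

```python
import math
import random


def pdm_theta(times, values, period, n_bins=10):
    """Pooled within-phase-bin variance divided by the total sample variance."""
    n = len(values)
    mean = sum(values) / n
    total_var = sum((v - mean) ** 2 for v in values) / (n - 1)

    # Fold the times at the trial period and sort the values into phase bins.
    bins = [[] for _ in range(n_bins)]
    for t, v in zip(times, values):
        phase = (t / period) % 1.0
        bins[min(int(phase * n_bins), n_bins - 1)].append(v)

    # Pool the within-bin sums of squares; bins with < 2 points are skipped.
    pooled_ss, dof = 0.0, 0
    for b in bins:
        if len(b) < 2:
            continue
        b_mean = sum(b) / len(b)
        pooled_ss += sum((v - b_mean) ** 2 for v in b)
        dof += len(b) - 1
    return (pooled_ss / dof) / total_var


# Simulated, unevenly sampled sinusoid (assumed period 2.5, noise sigma 0.1).
rng = random.Random(42)
true_period = 2.5
times = sorted(rng.uniform(0.0, 100.0) for _ in range(200))
values = [math.sin(2 * math.pi * t / true_period) + rng.gauss(0.0, 0.1)
          for t in times]

# Scan a grid of trial periods from 1.00 to 5.00 and keep the minimum.
trial_periods = [1.0 + 0.01 * k for k in range(401)]
thetas = [pdm_theta(times, values, p) for p in trial_periods]
best_period = trial_periods[thetas.index(min(thetas))]
print(f"best trial period: {best_period:.2f}")
```

Because the minimum is taken over hundreds of trial periods, its significance cannot be read off a single-trial null distribution, which is where the extreme-value and multiple-testing discussion in the primer comes in.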
By Soham Ghosh (University of Wisconsin-Madison), Uttaran Chatterjee (Purdue University), Jyotishka Datta (Virginia Tech)
Paper: Ghosh, Soham, Uttaran Chatterjee, and Jyotishka Datta. "The Curious Problem of the Normal Inverse Mean." arXiv preprint arXiv:2410.20641 (2024). [arXiv:2410.20641]
In astronomical observations, accurately estimating distances from stellar parallaxes is challenging because of inherent measurement errors and the non-linear inverse relationship between parallax and distance. Our work leverages robust Bayesian inference to address these challenges, systematically investigating a broad class of heavy-tailed priors to achieve reduced bias and variance in distance estimates, particularly when fractional parallax errors are substantial. To rigorously quantify tail behavior, we employ the concept of credence, a measure of tail-thickness of probability distributions, which allows us to clearly identify priors that are better equipped to handle large errors. Motivated by these insights, we introduce the Product Half-Cauchy prior, a novel and slightly nonstandard heavy-tailed prior constructed as the product of two Half-Cauchy random variables, leading to a density with polynomial decay modulated by a logarithmic factor. This distinctive tail behavior provides superior robustness compared to commonly used heavy-tailed priors, effectively mitigating bias and variance inflation even in scenarios with large fractional parallax errors. Through theoretical analysis, we highlight the “curse of a single observation,” showing that the posterior is primarily driven by the likelihood, yet demonstrate that our proposed class of priors significantly delays the associated explosion in posterior risk, thus enhancing robustness. Finally, applying these methods to real stellar parallax measurements from the Gaia Data Release 1 (GDR1), we illustrate substantial practical improvements over traditional inverse-parallax estimates, offering astronomers a more reliable and robust statistical toolkit for stellar distance estimation. Further technical details and extensive results can be found in our full manuscript available at https://arxiv.org/abs/2410.20641.
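To see why the naive inverse-parallax estimate is problematic, a small simulation helps. The sketch below is not the method of the paper: it simply assumes a star at 1 kpc (true parallax 1 mas), Gaussian measurement noise with a fractional parallax error of 0.5, and inverts each noisy parallax. All of these numbers are illustrative assumptions.

```python
import random

rng = random.Random(1)

true_distance_kpc = 1.0
true_parallax_mas = 1.0 / true_distance_kpc  # parallax [mas] = 1 / distance [kpc]
frac_error = 0.5                             # assumed sigma / true parallax
sigma_mas = frac_error * true_parallax_mas

# Simulate repeated noisy measurements of the same star.
observed = [rng.gauss(true_parallax_mas, sigma_mas) for _ in range(20000)]

# Negative parallaxes have no sensible naive distance at all.
n_negative = sum(p < 0 for p in observed)

# Naive estimate: invert every measurement (skip an exact zero, if any).
naive = sorted(1.0 / p for p in observed if p != 0.0)


def quantile(sorted_values, q):
    """Empirical quantile by index into an already sorted list."""
    return sorted_values[int(q * (len(sorted_values) - 1))]


q16, q50, q84 = (quantile(naive, q) for q in (0.16, 0.50, 0.84))
print(f"negative parallaxes: {n_negative} of {len(observed)}")
print(f"naive distance quantiles (16%, 50%, 84%): {q16:.2f}, {q50:.2f}, {q84:.2f} kpc")
```

Even though the parallax errors are symmetric, the inverted values are strongly skewed toward large distances, and a few percent of the measurements yield negative "distances", which is the kind of behavior that motivates putting a prior on distance instead.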
By Yang Chen (Department of Statistics, University of Michigan, Ann Arbor)
Paper: Chen, Yang, Ward Manchester, Meng Jin, and Alexei Pevtsov. "Solar Imaging Data Analytics: A Selective Overview of Challenges and Opportunities." Statistics and Data Science in Imaging 1, no. 1 (2024): 2391688. [DOI]
We provide an introduction to solar imaging data, which is becoming increasingly accessible to researchers due to the advancement of data collection and storage capabilities. Our focus is on the challenges and opportunities presented by data-driven approaches in understanding solar eruptions, such as solar flare events, which are rare and typically sudden events that happen at a high intensity. We describe prediction problems related to solar phenomena that can benefit from statistical methods adapted to the vast volume of heterogeneous and multimodal solar imaging data. We also describe the available data products and software packages so interested researchers can directly download and process solar imaging data based on their needs. Moreover, we summarize the state-of-the-art forecasting models for solar eruptions and their limitations. Finally, we point out several promising research directions from statistical modeling and computational perspectives.
A large number of astronomical datasets are public, and a mere tempting click away. Easy as they may be to get at, it is also easy for the unwary to trip over details. This section highlights some sources of data which are statistician-friendly. Typically, significant efforts have been made to document their contents.
Maintained by the Centre de Données astronomiques de Strasbourg (CDS), VizieR (Ochsenbein et al. 2000) is a massive library of astronomical catalogs pulled in from the literature. There are currently more than 25,000 catalogs available, all in a machine-readable format, and which can be downloaded or queried in a number of ways. It is the other side of the coin to the SIMBAD database, which stores the measurements and bibliographic information for individual astronomical objects.
Link: https://vizier.cds.unistra.fr/index.gml
There is a fleet of Sun-gazing telescopes, on Earth and in orbit, monitoring the solar photosphere, chromosphere, corona, magnetosphere, and the surrounding interplanetary space nearly continuously at high cadence over wavelengths ranging from radio to X-ray. The overview by Chen et al. (2024) above lists several sites from which to get the data; see their Tables 1 and 2. Some aggregators, like the one at JSOC (http://jsoc.stanford.edu/), are more user-friendly than others: they allow on-the-fly filtering by time and passband, apply spatial registration to correct for solar rotation, and include up-to-date calibration.
General definitions of astronomy or statistical terms are included in this section.
The International Astronomical Union has a glossary of common astronomy terms (see https://astro4edu.org/resources/glossary/search/). Here we plan to build up a similar dictionary, focusing on both statistics and astronomy jargon. A list of defined terms is maintained at our website, https://www.astrostatisticsnews.com/dictionary.
If you have comments, questions, concerns, edits, or terms you would like included, please let us know at astrostatisticsnews@gmail.com.
A stationary process is a random process whose statistical properties (e.g., expected value, variance) do not change with time. In the strict sense, this means that the joint distribution of the process at any set of times is unchanged when all of those times are shifted by the same amount. This is in contradistinction to non-stationary processes like Brownian motion, where the mean distance from the starting point grows without bound. However, Brownian motion does have stationary increments, which means that the distribution of the difference between the values of the stochastic process at two time points (e.g., at t1 and t2) depends only on the time difference (e.g., t2 - t1), and not on the starting time (e.g., the distribution of the difference between t1 and t2 is the same as that between t1+h and t2+h).
–VLK/JCK
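The distinction in the entry above can be checked numerically. The sketch below is an illustration under simple assumptions: i.i.d. Gaussian noise stands in for a stationary process, and its cumulative sum (a random walk) stands in for Brownian motion in discrete time. It compares variances across many simulated paths at an early and a late time, and then the variances of two increments of equal length taken at different starting times. The path counts and time indices are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)
n_paths, n_steps = 2000, 400

# For each simulated path, keep the noise value and the random-walk position
# at a handful of time indices (counted in steps from the start).
snap_times = (100, 150, 300, 350, 400)
noise = {t: [] for t in snap_times}
walk = {t: [] for t in snap_times}

for _ in range(n_paths):
    position = 0.0
    for step in range(1, n_steps + 1):
        eps = rng.gauss(0.0, 1.0)  # i.i.d. noise: a stationary process
        position += eps            # cumulative sum: discrete Brownian analogue
        if step in noise:
            noise[step].append(eps)
            walk[step].append(position)

var = statistics.variance

# Stationary process: the variance is the same early and late (ratio near 1).
noise_ratio = var(noise[400]) / var(noise[100])

# Random walk: the variance grows linearly with time (ratio near 400/100 = 4).
walk_ratio = var(walk[400]) / var(walk[100])

# Stationary increments: increments over 50 steps have the same variance
# whether they start at step 100 or at step 300 (ratio near 1).
early_inc = [b - a for a, b in zip(walk[100], walk[150])]
late_inc = [b - a for a, b in zip(walk[300], walk[350])]
increment_ratio = var(late_inc) / var(early_inc)

print(f"noise variance ratio (t=400 / t=100):     {noise_ratio:.2f}")
print(f"walk variance ratio (t=400 / t=100):      {walk_ratio:.2f}")
print(f"increment variance ratio (late / early):  {increment_ratio:.2f}")
```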
Job Opportunities in Astrostatistics
Director of the Institute of Computational and Data Sciences (ICDS) at Pennsylvania State University
As one of Penn State’s interdisciplinary research institutes housed within the Office of the Senior Vice President for Research (OSVPR), ICDS supports a broad swath of the research portfolio, from exploring the origins of far-away galaxies to materials for next-generation energy production, and from ways to overcome societal discord to environmental challenges. The Director will build on a legacy of visionary excellence and guide Penn State’s future investments in computational and data sciences research, AI, quantum computing, and other disciplines to reach the next level.
Deadlines: Review of applications begins April 7, 2025
A list of job opportunities is maintained at our website, astrostatisticsnews.com/job-opportunities.
Astrostatistics Events
A list of events is maintained at our website, astrostatisticsnews.com/upcoming-events.
Virtual Summer School in Statistics for Astronomers
June 2-6, 2025, Online
Details: https://sites.psu.edu/astrostatistics/su25/
Deadlines: Registration deadline is May 9, 2025
Overview: Penn State's Center for Astrostatistics is pleased to offer its 20th annual Summer School in Statistics for Astronomers. Taught by experienced faculty in statistics and astrostatistics, it provides a foundation in statistical inference, methods, and software within the context of problems arising in astronomical research. Topics include principles of probability and inference, regression and model selection, bootstrap resampling, supervised and unsupervised learning, Bayesian data analysis, Markov chain Monte Carlo (MCMC), nested sampling, time series analysis, spatial statistics, deep learning neural networks, Gaussian process regression, and random forests. Extensive training in the public domain R statistical software environment is provided. Typical attendees are graduate students and young researchers, but others from undergraduates to senior researchers are welcome.
Stimulated by the enthusiastic world-wide participation in our Summer Schools conducted online during the Covid-19 pandemic, we will provide the 2025 school in an enhanced online format. The lectures will be pre-recorded and can be viewed by participants at any hour of the day. They will be supplemented by synchronous Zoom events and Slack channels where participants can communicate with instructors and teaching assistants. Participants will also learn the R statistical software language through applications to astronomical problems via recorded tutorials and by independent work using Jupyter notebooks. Teaching assistants will be available to assist with R across a wide range of time zones. Asynchronous Slack channels will also be available to discuss the lectures, consult with astrostatisticians on individual participants’ research, and informally interact with other participants. Altogether, participants should expect to spend several hours per day, in their own time zones, working on the Summer School during the June 2-6 week.
Joint Statistical Meeting 2025 Astrostatistics Sessions
August 2-7, 2025
Nashville, TN
Details: https://ww2.amstat.org/meetings/jsm/2025/
Astrostatistics sessions sponsored by the American Statistical Association’s Astrostatistics Interest Group at the Joint Statistical Meeting, JSM2025:
Sunday, August 3, 2-3:50pm
Astrostatistics Interest Group: Student Paper Award
Chair: David Stenning (Simon Fraser University)
Session organizer: Aarya Patil (Max Planck Institute for Astronomy)
Speakers and Titles of the Nominations for the Award:
J. Arturo Esquivel F. (University of Toronto): A data-driven approach to stellar flare detection
Phil Van-Lane (University of Toronto): ChronoFlow: A Data-Driven Model for Gyrochronology
Dayi Li (University of Toronto): Discovery of Two Ultra-Diffuse Galaxies with Unusually Bright Globular Cluster Luminosity Functions via a Mark-Dependently Thinned Point Process (MATHPOP)
Naomi Giertych (North Carolina State University): Prediction Intervals for Astronomy Data with Covariate Error
Amanda Cook (University of Toronto): Quantifying the Clustering Probability in Noisy Nonhomogeneous Spatial Data to identify New Repeating Fast Radio Burst Sources from CHIME/FRB.
Sunday, August 3 at 4pm-5:30pm
Advances in Time-Series Analysis for Astronomy's Big Data Era
Session organizer: Aarya Patil (Max Planck Institute for Astronomy).
Discussant: Chad Schafer (Carnegie Mellon University)
Speakers and Titles
Malgorzata Sobolewska (CfA): Detecting periodic signatures in red-noise dominated lightcurves of accreting black holes.
David Corliss (Grafham Analytics): Longitudinal Analysis of Sudden Behavioral Changes in Red Supergiants Betelgeuse and RW Cephei
Victor Verma (University of Michigan): On the optimal prediction of extreme events in heavy-tailed time-series with applications to solar flare forecasting
Vinay Kashyap (CfA): Solar and Stellar Flares: dealing with cyclic, stochastic, and cascading events
Wednesday, August 6 at 10:30am - 12:20pm
Contributed Poster Presentations: Astrostatistics Interest Group
Chair: Shirin Golchi (McGill University)
Speakers and Titles
Xiaoli Li (University of Chicago): Boosting C-statistics in Astronomy: Higher-order Asymptotics for Improved Goodness-of-fit Testing.
Massimiliano Bonamente (University of Alabama in Huntsville): Goodness of fit Statistics for the Regression of Integer-count Data with Systematic Errors.
Kevin Jin (University of Michigan): Leveraging Generative Models for Forecasting Solar Flares.
Joseph Salzer (University of Wisconsin): Searching for Exoplanets in Stellar Spectra: Embedding Techniques for Local Feature Shape Analysis.
sys2025: Systematic and Measurement Errors across the Sciences - Astrostatistics and Data Science
November 14-17, 2025
Gulf State Park, AL
Details: https://sites.google.com/uah.edu/sys2025/home?authuser=0
sys2025 is the second in a series of astrostatistics and data science workshops in the Southeastern part of the U.S. After the successful iid2022 workshop on count data in the Fall of 2022 at the Lake Guntersville State Park, sys2025 focuses on the topic of systematic errors. Systematic errors are everywhere across the sciences, yet they are often ill-defined or misunderstood, especially from a statistical point of view. This workshop aims to bring together data science practitioners (from astronomy/physical sciences and other fields, such as biostatistics, econometrics and more) with statisticians and machine-learning experts. The workshop is structured as a series of introductory review lectures from established professionals, and shorter presentations by participants of all levels of experience, with emphasis on the participation of students and early-career researchers, especially from traditionally underrepresented groups. The National Science Foundation EPSCoR program is expected to support the participation of students and early-career participants (see Registration for details).
The workshop is intended as an in-person meeting, with time for social interactions and networking. For those unable to travel, there is a remote participation option.
sys2025 will also offer an opportunity to publish peer-reviewed papers on the topic of systematic errors as part of a Research Topic for the journal Frontiers in Astronomy and Space Science/Astrostatistics. See the Program/Proceedings tab for more information.
STAMPS Seminar Series
The STAtistical Methods for the Physical Sciences Research Center (STAMPS@CMU; https://www.cmu.edu/dietrich/statistics-datascience/stamps/index.html) launched the seminar series on September 20, 2024.
Talks are open to everyone who registers on the web site:
https://www.cmu.edu/dietrich/statistics-datascience/stamps/events/webinars/index.html
IAU - IAA Astrostatistics and Astroinformatics Seminars
Monthly, Online
Details: https://sites.google.com/view/iau-iaaseminar-new/home
Schedule: https://sites.google.com/view/iau-iaaseminar-new/schedule?authuser=0
This international online seminar is an initiative of the International Astrostatistics Association and the IAU Astroinformatics and Astrostatistics Commission. It focuses on statistical analysis and data mining of astronomical data. The seminar runs on Zoom on the second Tuesday of each month, alternating between Europe-America and Australasia-Europe time zone instances. The standard seminar times are 8:00 UTC and 16:00 UTC. Please check the exact time and the time difference with your time zone.
If you have ideas for AN content, please send a message to astrostatisticsnews@gmail.com. We may include your idea in a future issue if we think it is a good fit.
Ideas may include relevant astrostatistics papers/data/code, visualizations, upcoming events, job postings, format or commentary suggestions, etc.
See astrostatisticsnews.com for more information such as past issues, lists of astrostatistics references and societies.
Subscribe to Astrostatistics News
To subscribe to Astrostatistics News, go to https://groups.google.com/g/astrostatistics-news and select the “Join group” button. You will need to be logged into your Google account to join the group.
Please forward this information to anyone who may be interested!