Data Challenges
To download and work with the datasets, see the Getting Started page. See the Submissions page for submission deadlines and details on what and how to submit.
Simulations
The substructure realizations, comprising subhalos and line-of-sight halos, are generated by pyHalo. Because these halos add mass to the mass model, pyHalo also includes a negative convergence sheet to compensate.
We assume the survey strategy of the Medium Tier of the High Latitude Wide Area Survey (HLWAS) described in the Roman Observations Time Allocation Committee's Final Report (released in April 2025). The Medium Tier will cover 2415 square degrees in the F106, F129, and F158 filters with 2 passes, 3 dither positions, and 107-second exposures, for an effective exposure time of 2 × 3 × 107 = 642 seconds (in practice, the value is lower because of chip gaps and varies with the tiling strategy). For the romanisim simulations (see details in the rung descriptions below), we use MA table number 17 (601-second exposure time).
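As a quick check of the exposure-time arithmetic quoted above, the short sketch below multiplies out the Medium Tier numbers and compares the result with the 601-second MA table 17 exposure. The variable names are illustrative only and are not part of mejiro or romanisim.

```python
# Medium Tier numbers as quoted in the text above.
passes = 2
dither_positions = 3
exposure_s = 107

# The quoted 642 s is the product of these three numbers.
nominal_total_s = passes * dither_positions * exposure_s

# Exposure time of MA table number 17, used for the romanisim simulations.
ma_table_17_s = 601

difference_s = nominal_total_s - ma_table_17_s
ratio = round(ma_table_17_s / nominal_total_s, 3)

print(nominal_total_s)  # 642
print(difference_s)     # 41
print(ratio)            # 0.936
```

The romanisim exposure is therefore about 6% shorter than the nominal 642-second stack.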
For additional details about the software pipeline used to produce these data, see Wedig et al. 2025 and the mejiro documentation.
Note
We only simulate static, galaxy-galaxy strong gravitational lenses. In other words, we do not simulate quasars, group-scale systems, or cluster-scale systems.
Note
The signal-to-noise ratio values provided in the datasets are approximate for all datasets using romanisim. For computational reasons, they are calculated with galsim using similar parameters and with a smooth mass model. The percent-level differences between the exposures generated by these packages will give rise to slightly different SNRs.
Rung 0 (Tutorial)
The goal of this rung is to serve as a tutorial for participants and to let us, the organizers, make sure that the data challenge runs smoothly. You will familiarize yourself with the mechanics of the challenge (e.g., where and how to submit, how to obtain and work with the datasets, and where to find relevant documentation).
In this rung, you will train a regression model to determine the Einstein radius of lenses.
Real galaxies from the HST COSMOS survey (F814W) are used as source galaxies. See the SLSim documentation for details.
Note
Versions 1.x use GalSim to add noise, while Versions 2.x use romanisim. Version 2.0 does not use CRDS.
Downloads
- Latest rung 0 dataset and example notebook on Zenodo
- Latest unlabeled rung 0 dataset and example notebook on Zenodo
Configuration File
The full mejiro configuration YAML file used to generate the training dataset is included below:
data_dir: &data_dir /nfsdata1/bwedig/mejiro # where output data should be written
pipeline_label: roman_data_challenge_rung_0
psf_cache_dir: cached_psfs
dev: False
nice: 19
show_progress_bar: True
suppress_warnings: True
logging_level: WARNING
limit: null
seed: &seed 42
cores:
script_00: 32
script_01: 32
script_01a: 32
script_01b: 32
script_03: 32
script_04: 32
script_05: 32
script_05_romanisim: 12
script_snr: 1
jaxtronomy:
use_jax: False
jax_platform: cpu # cpu or gpu
instrument: roman
survey:
runs: 756 # 90
num_galaxy_tables: 756 # number of independent galaxy tables for _01a; higher = more intrinsic diversity
speed_factor: 1 # set >1 for faster but less complete population draws in _01b
area: 0.5
skypy_config: lsst-like_triple_SF
write_to_csv: False
total_population: False
use_real_sources: True
catalog_source_kwargs:
catalog_path: /data/bwedig/COSMOS
catalog_type: COSMOS
sersic_fallback: True
max_scale: 1
use_slhammocks_pipeline: True
slhammocks_pipeline_kwargs:
skypy_config: slhammocks
slhammocks_config: null
loghm_min: 11
loghm_max: 15
detectors: &detectors [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] #
bands: [F062, F087, F106, F129, F158, F184, F213, F146] # SkyPy will calculate magnitudes for each of these bands; they'll be stored in the physical_params attribute of the strong_lens
deflector_cut_band: F129
deflector_cut_band_max: 27
deflector_z_min: 0.01
deflector_z_max: 5.0
source_cut_band: F129
source_cut_band_max: 29
source_z_min: 0.01
source_z_max: 8.0
min_image_separation: 0.3 # arcseconds
max_image_separation: 10.0 # arcseconds
mag_arc_limit_band: F129
mag_arc_limit: 27
magnification: 3
subhalos:
fraction: 1. # fraction of systems to add substructure to
pyhalo_model: CDM
realization_kwargs:
log_mlow: 6
log_mhigh: 12
LOS_normalization: 1.
concentration_model_subhalos: LUDLOW2016
concentration_model_fieldhalos: LUDLOW2016
shmf_log_slope: -1.9
r_tidal: 0.5 # see Section 3.1 of Gilman et al. 2020 https://ui.adsabs.harvard.edu/abs/2020MNRAS.491.6077G/abstract
sigma_sub: 0.055 # see Section 6.3 of Gilman et al. 2020 https://ui.adsabs.harvard.edu/abs/2020MNRAS.491.6077G/abstract
synthetic_image:
bands: [ F106, F129, F158 ]
fov_arcsec: 8.03
supersampling_compute_mode: adaptive
supersampling_factor: &supersampling_factor 5
pieces: True
exposure:
ma_table_number: 17
date: 2027-04-15T00:00:00
coordinates:
ra: 150.
dec: 2.
imaging:
exposure_time: &exposure_time 642
engine: galsim
engine_params:
rng_seed: *seed
min_zodi_factor: 1.4
sky_background: True
detector_effects: True
poisson_noise: True
reciprocity_failure: True
dark_noise: True
nonlinearity: True
ipc: True
read_noise: True
snr:
snr_band: F129
snr_exposure_time: *exposure_time
snr_fov_arcsec: 8.03
snr_supersampling_compute_mode: adaptive
snr_supersampling_factor: &snr_supersampling_factor 1
snr_threshold: 20
snr_per_pixel_threshold: 1
psf:
bands: [ F106, F129, F158 ]
oversamples: [*snr_supersampling_factor, *supersampling_factor]
num_pixes: [73] # the first element of this list is used for the pipeline
detectors: *detectors
divide_up_detector: 4 # this sets the detector positions, e.g., 5 means 25 positions on each detector
dataset:
version: 2.0 # used in the filename of the .h5 file
labeled: True
include_psfs: False
include_synthetic_images: False
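The comment on divide_up_detector in the configuration above states that a value of N gives N² positions on each detector. The sketch below illustrates that counting for the configured value of 4 across the 18 listed detectors. The function name, the 4088-pixel detector size, and the choice of cell centers are assumptions made for illustration; mejiro may place the positions differently.

```python
def grid_centers(divide_up_detector, detector_size_px=4088):
    """Return (x, y) centers of an n x n grid of equal cells on one detector.

    Hypothetical helper: detector_size_px and the use of cell centers
    are assumptions, not taken from the mejiro source.
    """
    cell = detector_size_px / divide_up_detector
    centers_1d = [(i + 0.5) * cell for i in range(divide_up_detector)]
    return [(x, y) for y in centers_1d for x in centers_1d]


positions = grid_centers(4)      # divide_up_detector: 4
num_detectors = 18               # detectors 1 through 18 in the config

print(len(positions))                  # 16
print(len(positions) * num_detectors)  # 288
print(positions[0])                    # (511.0, 511.0)
```

Under these assumptions, the configuration corresponds to 16 positions per detector and 288 detector positions in total for each band.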
Rung 1
The goal of this rung is to distinguish between mass distributions with cold dark matter (CDM) subhalos and mass distributions with no subhalos. In this rung, you will train a binary classifier to determine whether subhalos are present.
Real galaxies from the JWST COSMOS-Web survey are used as source galaxies. NIRCam F115W, F150W, F277W, and F444W images are mapped to Roman filters to create source galaxies that are not monochromatic. See the SLSim documentation for details.
All versions use romanisim to add noise.
Downloads
Configuration File
The full mejiro configuration YAML file used to generate the training dataset is included below:
data_dir: &data_dir /nfsdata1/bwedig/mejiro # where output data should be written
pipeline_label: roman_data_challenge_rung_1
psf_cache_dir: cached_psfs
dev: False
nice: 19
show_progress_bar: True
suppress_warnings: True
logging_level: WARNING
limit: null
seed: &seed 42
cores:
script_00: 60
script_01: 60
script_01a: 60
script_01b: 48
script_03: 60
script_04: 60
script_05: 60
script_05_romanisim: 12
script_snr: 24
jaxtronomy:
use_jax: False
jax_platform: cpu # cpu or gpu
instrument: roman
survey:
runs: 7002 # 7056
num_galaxy_tables: 100 # number of independent galaxy tables for _01a; higher = more intrinsic diversity
speed_factor: 1 # set >1 for faster but less complete population draws in _01b
area: 0.5
skypy_config: lsst-like_triple_SF
write_to_csv: False
total_population: False
use_real_sources: True
catalog_source_kwargs:
catalog_path: /nfsdata1/bwedig/slsim_source_catalogs/COSMOSWeb_galaxy_catalog
catalog_type: COSMOS_WEB
sersic_fallback: True
max_scale: 1
# Override which catalog cutout backs each band in source_images.
# {destination_band: source_band}; both must be keys in source_images
# (i.e. listed in survey.bands). Self-mappings are no-ops; omit a band
# to leave its default in place.
# SLSim COSMOS-Web wavelength-ordering puts the pure JWST cutouts at:
# F062=F115W, F106=F150W, F158=F277W, F213=F444W
remap_bands:
F106: F062
F129: F106
F158: F158
use_slhammocks_pipeline: True
slhammocks_pipeline_kwargs:
skypy_config: slhammocks
slhammocks_config: null
loghm_min: 11
loghm_max: 15
detectors: &detectors [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] #
bands: [F062, F087, F106, F129, F158, F184, F213, F146] # SkyPy will calculate magnitudes for each of these bands; they'll be stored in the physical_params attribute of the strong_lens
deflector_cut_band: F129
deflector_cut_band_max: 27
deflector_z_min: 0.01
deflector_z_max: 5.0
source_cut_band: F129
source_cut_band_max: 29
source_z_min: 0.01
source_z_max: 8.0
min_image_separation: 0.3 # arcseconds
max_image_separation: 10.0 # arcseconds
mag_arc_limit_band: F129
mag_arc_limit: 27
magnification: 3
subhalos:
split:
model_1: CDM
model_2: WDM
fraction: 0.5 # fraction of systems to add substructure to
pyhalo_model: CDM
realization_kwargs:
log_mlow: 6
log_mhigh: 12
LOS_normalization: 1.
concentration_model_subhalos: LUDLOW2016
concentration_model_fieldhalos: LUDLOW2016
shmf_log_slope: -1.9
r_tidal: 0.5 # see Section 3.1 of Gilman et al. 2020 https://ui.adsabs.harvard.edu/abs/2020MNRAS.491.6077G/abstract
sigma_sub: 0.055 # see Section 6.3 of Gilman et al. 2020 https://ui.adsabs.harvard.edu/abs/2020MNRAS.491.6077G/abstract
synthetic_image:
bands: [ F106, F129, F158 ]
fov_arcsec: 8.03
supersampling_compute_mode: adaptive
supersampling_factor: &supersampling_factor 5
pieces: False
exposure:
ma_table_number: 17
date: 2027-04-15T00:00:00
coordinates:
ra: 150.
dec: 2.
imaging:
exposure_time: &exposure_time 601
engine: galsim
engine_params:
rng_seed: *seed
min_zodi_factor: 1.4
sky_background: True
detector_effects: True
poisson_noise: True
reciprocity_failure: True
dark_noise: True
nonlinearity: True
ipc: True
read_noise: True
snr:
snr_band: F129
snr_exposure_time: *exposure_time
snr_fov_arcsec: 8.03
snr_supersampling_compute_mode: adaptive
snr_supersampling_factor: &snr_supersampling_factor 1
snr_threshold: 20
snr_per_pixel_threshold: 1
psf:
bands: [ F106, F129, F158 ]
oversamples: [*snr_supersampling_factor, *supersampling_factor]
num_pixes: [73] # the first element of this list is used for the pipeline
detectors: *detectors
divide_up_detector: 4 # this sets the detector positions, e.g., 5 means 25 positions on each detector
dataset:
version: 1.0 # used in the filename of the .h5 file
labeled: True
include_psfs: False
include_synthetic_images: False
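The remap_bands comments in the configuration above describe a {destination_band: source_band} mapping in which both bands must already be keys of source_images and self-mappings are no-ops. The sketch below shows one way such a remap could behave. The function apply_remap and the string stand-ins for image cutouts are hypothetical, and reading every source from the pre-remap dictionary is an assumption about the intended semantics rather than a description of the SLSim or mejiro implementation.

```python
def apply_remap(source_images, remap_bands):
    """Return a copy of source_images with each destination band
    backed by the cutout of its source band (hypothetical helper)."""
    for dest, src in remap_bands.items():
        # Both bands must already be keys in source_images.
        if dest not in source_images or src not in source_images:
            raise KeyError(f"{dest!r} and {src!r} must both be in source_images")
    remapped = dict(source_images)
    for dest, src in remap_bands.items():
        # Assumption: read from the original dict so that chained entries
        # (F106 <- F062, F129 <- F106) use the pre-remap cutouts.
        remapped[dest] = source_images[src]
    return remapped


# Strings stand in for cutouts; the pure JWST cutouts follow the config comment.
source_images = {
    "F062": "F115W cutout",
    "F106": "F150W cutout",
    "F129": "default F129 cutout",
    "F158": "F277W cutout",
    "F213": "F444W cutout",
}
remap_bands = {"F106": "F062", "F129": "F106", "F158": "F158"}

remapped = apply_remap(source_images, remap_bands)
print(remapped["F106"])  # F115W cutout
print(remapped["F129"])  # F150W cutout
print(remapped["F158"])  # F277W cutout
```

With this reading, the three simulated bands F106, F129, and F158 are backed by the F115W, F150W, and F277W cutouts respectively, and bands omitted from remap_bands keep their defaults.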
Rung 2
In this rung, you will train a binary classification model to distinguish between cold dark matter (CDM) and warm dark matter (WDM). This will quantify a pipeline's ability to detect the collective perturbative effect of a population of low-mass subhalos just below Roman's single-subhalo detection threshold.