Data Challenges

To download and work with the datasets, see the Getting Started page. See the Submissions page for submission deadlines and details on what and how to submit.

Simulations

The substructure realizations, comprising subhalos and line-of-sight halos, are generated by pyHalo. Because these halos add mass to the mass model, pyHalo also includes a negative convergence sheet to compensate.
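As a schematic illustration of the negative convergence sheet, and not a statement of pyHalo's exact implementation, the total convergence in a single lens plane can be written as below. The symbols are assumptions introduced here: \kappa_{\rm macro} is the smooth mass model, \kappa_i is the convergence of the i-th halo, and \bar{\kappa}_{\rm halo} is a uniform sheet chosen to roughly cancel the mean convergence that the halos add.

```latex
\kappa_{\rm tot}(\boldsymbol{\theta})
  = \kappa_{\rm macro}(\boldsymbol{\theta})
  + \sum_i \kappa_i(\boldsymbol{\theta})
  - \bar{\kappa}_{\rm halo},
\qquad
\bar{\kappa}_{\rm halo} \approx \Big\langle \sum_i \kappa_i(\boldsymbol{\theta}) \Big\rangle
```

With line-of-sight halos there are multiple lens planes, and a sheet of this kind would apply per plane; consult the pyHalo documentation for the actual treatment.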

We assume the survey strategy of the Medium Tier of the High Latitude Wide Area Survey (HLWAS) described in the Roman Observations Time Allocation Committee's Final Report (released in April 2025). The Medium Tier will cover 2415 square degrees in the F106, F129, and F158 filters with 2 passes, 3 dither positions, and 107-second exposures, for an effective exposure time of 642 seconds (in practice, the exposure time is lower because of chip gaps and varies with the tiling strategy). For the romanisim simulations (see details in the rung descriptions below), we use MA table number 17 (601-second exposure time).
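The effective exposure time quoted above is the product of passes, dither positions, and single-exposure time. A minimal sketch of that arithmetic follows; the function and variable names are illustrative and do not come from mejiro or romanisim.

```python
def effective_exposure_time(passes: int, dithers: int, exposure_s: float) -> float:
    """Total exposure time (seconds), ignoring chip gaps and tiling effects."""
    return passes * dithers * exposure_s


# Medium Tier values quoted in the text
medium_tier_s = effective_exposure_time(passes=2, dithers=3, exposure_s=107)

# Exposure time of MA table 17, as quoted in the text
ma_table_17_s = 601

# Fraction of the idealized Medium Tier depth covered by the MA table 17 exposure
depth_fraction = ma_table_17_s / medium_tier_s

print(f"Medium Tier effective exposure: {medium_tier_s} s")
print(f"MA table 17 / Medium Tier: {depth_fraction:.3f}")
```

The two exposure times differ by a few percent, which is worth keeping in mind when comparing GalSim-based and romanisim-based datasets.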

For additional details about the software pipeline used to produce these data, see Wedig et al. 2025 and the mejiro documentation.

Note

We only simulate static, galaxy-galaxy strong gravitational lenses. In other words, we do not simulate lensed quasars or group-scale or cluster-scale systems.

Note

The signal-to-noise ratio (SNR) values provided in the datasets are approximate for all datasets using romanisim. For computational reasons, they are calculated with GalSim using similar parameters and with a smooth mass model. The percent-level differences between the exposures generated by romanisim and GalSim will give rise to slightly different SNRs.

Rung 0 (Tutorial)

This rung serves as a tutorial for participants and lets us, the organizers, make sure that the data challenge runs smoothly. You will familiarize yourself with the mechanics of the challenge (e.g., where and how to submit, how to obtain and work with datasets, and where to find relevant documentation).

In this rung, you will train a regression model to determine the Einstein radius of lenses.

Real galaxies from the HST COSMOS survey (F814W) are used as source galaxies. See the SLSim documentation for details.

Note

Versions 1.x use GalSim to add noise, while Versions 2.x use romanisim. Version 2.0 does not use CRDS.

Downloads

Configuration File

The full mejiro configuration YAML file used to generate the training dataset is included below:

data_dir: &data_dir /nfsdata1/bwedig/mejiro  # where output data should be written
pipeline_label: roman_data_challenge_rung_0
psf_cache_dir: cached_psfs
dev: False
nice: 19
show_progress_bar: True
suppress_warnings: True
logging_level: WARNING
limit: null
seed: &seed 42
cores:
  script_00: 32
  script_01: 32
  script_01a: 32
  script_01b: 32
  script_03: 32
  script_04: 32
  script_05: 32
  script_05_romanisim: 12
  script_snr: 1
jaxtronomy: 
  use_jax: False
  jax_platform: cpu  # cpu or gpu
instrument: roman
survey:
  runs: 756 # 90
  num_galaxy_tables: 756  # number of independent galaxy tables for _01a; higher = more intrinsic diversity
  speed_factor: 1  # set >1 for faster but less complete population draws in _01b
  area: 0.5
  skypy_config: lsst-like_triple_SF
  write_to_csv: False
  total_population: False
  use_real_sources: True
  catalog_source_kwargs:
    catalog_path: /data/bwedig/COSMOS
    catalog_type: COSMOS
    sersic_fallback: True
    max_scale: 1
  use_slhammocks_pipeline: True
  slhammocks_pipeline_kwargs:
    skypy_config: slhammocks
    slhammocks_config: null
    loghm_min: 11
    loghm_max: 15
  detectors: &detectors [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]  # 
  bands: [F062, F087, F106, F129, F158, F184, F213, F146]  # SkyPy will calculate magnitudes for each of these bands; they'll be stored in the physical_params attribute of the strong_lens
  deflector_cut_band: F129
  deflector_cut_band_max: 27
  deflector_z_min: 0.01
  deflector_z_max: 5.0
  source_cut_band: F129
  source_cut_band_max: 29
  source_z_min: 0.01
  source_z_max: 8.0
  min_image_separation: 0.3  # arcseconds
  max_image_separation: 10.0  # arcseconds
  mag_arc_limit_band: F129
  mag_arc_limit: 27
  magnification: 3
subhalos:
  fraction: 1.  # fraction of systems to add substructure to
  pyhalo_model: CDM
  realization_kwargs:
    log_mlow: 6
    log_mhigh: 12
    LOS_normalization: 1.
    concentration_model_subhalos: LUDLOW2016
    concentration_model_fieldhalos: LUDLOW2016
    shmf_log_slope: -1.9
    r_tidal: 0.5  # see Section 3.1 of Gilman et al. 2020 https://ui.adsabs.harvard.edu/abs/2020MNRAS.491.6077G/abstract
    sigma_sub: 0.055  # see Section 6.3 of Gilman et al. 2020 https://ui.adsabs.harvard.edu/abs/2020MNRAS.491.6077G/abstract
synthetic_image:
  bands: [ F106, F129, F158 ]
  fov_arcsec: 8.03
  supersampling_compute_mode: adaptive
  supersampling_factor: &supersampling_factor 5
  pieces: True
exposure:
  ma_table_number: 17
  date: 2027-04-15T00:00:00
  coordinates:
    ra: 150.
    dec: 2.
imaging:
  exposure_time: &exposure_time 642
  engine: galsim
  engine_params:
    rng_seed: *seed
    min_zodi_factor: 1.4
    sky_background: True
    detector_effects: True
    poisson_noise: True
    reciprocity_failure: True
    dark_noise: True
    nonlinearity: True
    ipc: True
    read_noise: True
snr:
  snr_band: F129
  snr_exposure_time: *exposure_time
  snr_fov_arcsec: 8.03
  snr_supersampling_compute_mode: adaptive
  snr_supersampling_factor: &snr_supersampling_factor 1
  snr_threshold: 20
  snr_per_pixel_threshold: 1
psf:
  bands: [ F106, F129, F158 ]
  oversamples: [*snr_supersampling_factor, *supersampling_factor]
  num_pixes: [73]  # the first element of this list is used for the pipeline
  detectors: *detectors
  divide_up_detector: 4  # this sets the detector positions, e.g., 5 means 25 positions on each detector
dataset:
  version: 2.0  # used in the filename of the .h5 file
  labeled: True
  include_psfs: False
  include_synthetic_images: False
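
Two of the settings above are easier to read with the arithmetic written out: synthetic_image.fov_arcsec together with supersampling_factor, and psf.divide_up_detector. The sketch below is illustrative only. It assumes a native pixel scale of 0.11 arcsec/pixel and 4088 active pixels per detector side, and it places PSF positions at the centers of equal grid cells; mejiro may choose positions differently. All function names are hypothetical.

```python
NATIVE_PIXEL_SCALE = 0.11  # arcsec/pixel (assumption, not read from the config)
DETECTOR_SIZE_PX = 4088    # active pixels per detector side (assumption)


def image_size_px(fov_arcsec: float, supersampling_factor: int = 1) -> int:
    """Side length of a square cutout in (supersampled) pixels."""
    native = round(fov_arcsec / NATIVE_PIXEL_SCALE)
    return native * supersampling_factor


def detector_grid(divide_up_detector: int, size_px: int = DETECTOR_SIZE_PX):
    """Centers of an n x n grid of equal cells covering one detector.

    Hypothetical placement: the config comment only states that n gives
    n**2 positions per detector.
    """
    n = divide_up_detector
    cell = size_px / n
    return [((col + 0.5) * cell, (row + 0.5) * cell)
            for row in range(n) for col in range(n)]


native_px = image_size_px(8.03)                                 # fov_arcsec: 8.03
supersampled_px = image_size_px(8.03, supersampling_factor=5)   # supersampling_factor: 5
positions = detector_grid(4)                                    # divide_up_detector: 4
positions_per_band = 18 * len(positions)                        # 18 detectors

print(native_px, supersampled_px, len(positions), positions_per_band)
```

Under these assumptions, divide_up_detector: 4 gives 16 positions per detector, consistent with the comment in the config that 5 would give 25.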

Rung 1

The goal of this rung is to distinguish mass distributions that contain cold dark matter (CDM) subhalos from those that contain no subhalos. In this rung, you will train a binary classifier to determine whether subhalos are present.

Real galaxies from the JWST COSMOS-Web survey are used as source galaxies. NIRCam F115W, F150W, F277W, and F444W images are mapped to Roman filters to create source galaxies that are not monochromatic. See the SLSim documentation for details.

All versions use romanisim to add noise.

Downloads

Configuration File

The full mejiro configuration YAML file used to generate the training dataset is included below:

data_dir: &data_dir /nfsdata1/bwedig/mejiro  # where output data should be written
pipeline_label: roman_data_challenge_rung_1
psf_cache_dir: cached_psfs
dev: False
nice: 19
show_progress_bar: True
suppress_warnings: True
logging_level: WARNING
limit: null
seed: &seed 42
cores:
  script_00: 60
  script_01: 60
  script_01a: 60
  script_01b: 48
  script_03: 60
  script_04: 60
  script_05: 60
  script_05_romanisim: 12
  script_snr: 24
jaxtronomy: 
  use_jax: False
  jax_platform: cpu  # cpu or gpu
instrument: roman
survey:
  runs: 7002  # 7056
  num_galaxy_tables: 100  # number of independent galaxy tables for _01a; higher = more intrinsic diversity
  speed_factor: 1  # set >1 for faster but less complete population draws in _01b
  area: 0.5
  skypy_config: lsst-like_triple_SF
  write_to_csv: False
  total_population: False
  use_real_sources: True
  catalog_source_kwargs:
    catalog_path: /nfsdata1/bwedig/slsim_source_catalogs/COSMOSWeb_galaxy_catalog
    catalog_type: COSMOS_WEB
    sersic_fallback: True
    max_scale: 1
  # Override which catalog cutout backs each band in source_images.
  # {destination_band: source_band}; both must be keys in source_images
  # (i.e. listed in survey.bands). Self-mappings are no-ops; omit a band
  # to leave its default in place.
  # SLSim COSMOS-Web wavelength-ordering puts the pure JWST cutouts at:
  #   F062=F115W, F106=F150W, F158=F277W, F213=F444W
  remap_bands:
    F106: F062
    F129: F106
    F158: F158
  use_slhammocks_pipeline: True
  slhammocks_pipeline_kwargs:
    skypy_config: slhammocks
    slhammocks_config: null
    loghm_min: 11
    loghm_max: 15
  detectors: &detectors [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]  # 
  bands: [F062, F087, F106, F129, F158, F184, F213, F146]  # SkyPy will calculate magnitudes for each of these bands; they'll be stored in the physical_params attribute of the strong_lens
  deflector_cut_band: F129
  deflector_cut_band_max: 27
  deflector_z_min: 0.01
  deflector_z_max: 5.0
  source_cut_band: F129
  source_cut_band_max: 29
  source_z_min: 0.01
  source_z_max: 8.0
  min_image_separation: 0.3  # arcseconds
  max_image_separation: 10.0  # arcseconds
  mag_arc_limit_band: F129
  mag_arc_limit: 27
  magnification: 3
subhalos:
  split:
    model_1: CDM
    model_2: WDM
  fraction: 0.5  # fraction of systems to add substructure to
  pyhalo_model: CDM
  realization_kwargs:
    log_mlow: 6
    log_mhigh: 12
    LOS_normalization: 1.
    concentration_model_subhalos: LUDLOW2016
    concentration_model_fieldhalos: LUDLOW2016
    shmf_log_slope: -1.9
    r_tidal: 0.5  # see Section 3.1 of Gilman et al. 2020 https://ui.adsabs.harvard.edu/abs/2020MNRAS.491.6077G/abstract
    sigma_sub: 0.055  # see Section 6.3 of Gilman et al. 2020 https://ui.adsabs.harvard.edu/abs/2020MNRAS.491.6077G/abstract
synthetic_image:
  bands: [ F106, F129, F158 ]
  fov_arcsec: 8.03
  supersampling_compute_mode: adaptive
  supersampling_factor: &supersampling_factor 5
  pieces: False
exposure:
  ma_table_number: 17
  date: 2027-04-15T00:00:00
  coordinates:
    ra: 150.
    dec: 2.
imaging:
  exposure_time: &exposure_time 601
  engine: galsim
  engine_params:
    rng_seed: *seed
    min_zodi_factor: 1.4
    sky_background: True
    detector_effects: True
    poisson_noise: True
    reciprocity_failure: True
    dark_noise: True
    nonlinearity: True
    ipc: True
    read_noise: True
snr:
  snr_band: F129
  snr_exposure_time: *exposure_time
  snr_fov_arcsec: 8.03
  snr_supersampling_compute_mode: adaptive
  snr_supersampling_factor: &snr_supersampling_factor 1
  snr_threshold: 20
  snr_per_pixel_threshold: 1
psf:
  bands: [ F106, F129, F158 ]
  oversamples: [*snr_supersampling_factor, *supersampling_factor]
  num_pixes: [73]  # the first element of this list is used for the pipeline
  detectors: *detectors
  divide_up_detector: 4  # this sets the detector positions, e.g., 5 means 25 positions on each detector
dataset:
  version: 1.0  # used in the filename of the .h5 file
  labeled: True
  include_psfs: False
  include_synthetic_images: False
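
The remap_bands block in the configuration above maps each destination band to the band whose catalog cutout should back it. A minimal sketch of those semantics follows, assuming that lookups read from the default assignments rather than from entries that have already been remapped. The function name and the placeholder strings for bands without a pure JWST cutout are hypothetical and are not SLSim or mejiro API.

```python
def remap_source_images(source_images: dict, remap_bands: dict) -> dict:
    """Return a copy of source_images with destination bands re-pointed.

    remap_bands is {destination_band: source_band}; both must already be
    keys of source_images. Self-mappings leave the entry unchanged.
    """
    unknown = (set(remap_bands) | set(remap_bands.values())) - set(source_images)
    if unknown:
        raise KeyError(f"bands not in source_images: {sorted(unknown)}")
    remapped = dict(source_images)
    for destination, source in remap_bands.items():
        # Read from the original mapping so the order of entries does not matter.
        remapped[destination] = source_images[source]
    return remapped


# Defaults from the config comment; other bands get placeholder labels.
defaults = {
    "F062": "F115W", "F106": "F150W", "F158": "F277W", "F213": "F444W",
    "F087": "default(F087)", "F129": "default(F129)",
    "F184": "default(F184)", "F146": "default(F146)",
}
remap = {"F106": "F062", "F129": "F106", "F158": "F158"}

result = remap_source_images(defaults, remap)
for band in ("F106", "F129", "F158"):
    print(band, "<-", result[band])
```

Under this reading, the three simulated bands F106, F129, and F158 are backed by the F115W, F150W, and F277W cutouts, respectively.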

Rung 2

In this rung, you will train a binary classification model to distinguish between cold dark matter (CDM) and warm dark matter (WDM). This will quantify a pipeline's ability to detect the collective perturbative effect of a population of low-mass subhalos just below Roman's single-subhalo detection threshold.