RomanSpectralSubset

class astrocut.RomanSpectralSubset(spectral_files: str | Path | S3Path | List[str | Path | S3Path], source_ids: str | int | List[str | int], wl_range: tuple | list = None, lite: bool | None = True, max_workers: int | None = None, verbose: bool = False)

Bases: ASDFSpectralSubset

Class for creating subsets from Roman spectral data. Inherits from ASDFSpectralSubset and implements the same interface, but is designed for Roman data.

Parameters:
spectral_files : str, Path, S3Path, or list

Path(s) to the input spectral files. Can be a single file or a list of files.

source_ids : str, int, or list

Source ID(s) to cut out. Can be a single ID or a list of IDs.

wl_range : tuple or list, optional

Wavelength range to cut out, specified as (min_wavelength, max_wavelength). If None, the full wavelength range will be used.
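The effect of wl_range can be pictured with a small sketch in plain Python. It is illustrative only: the helper name cut_wavelength_range is hypothetical and not part of astrocut, and treating both endpoints as inclusive is an assumption that this page does not confirm.

```python
from typing import List, Optional, Sequence, Tuple


def cut_wavelength_range(
    wavelengths: Sequence[float],
    flux: Sequence[float],
    wl_range: Optional[Tuple[float, float]] = None,
) -> Tuple[List[float], List[float]]:
    """Keep only samples whose wavelength lies inside wl_range.

    Hypothetical helper, not part of astrocut. Endpoints are treated as
    inclusive here, which is an assumption.
    """
    if wl_range is None:
        # No range given: keep the full wavelength coverage.
        return list(wavelengths), list(flux)

    wl_min, wl_max = wl_range
    kept = [(w, f) for w, f in zip(wavelengths, flux) if wl_min <= w <= wl_max]
    return [w for w, _ in kept], [f for _, f in kept]


# Made-up sample values, chosen only to show the selection.
wl, fx = cut_wavelength_range(
    [1.0, 1.2, 1.4, 1.6], [10.0, 11.0, 12.0, 13.0], wl_range=(1.1, 1.5)
)
```

With the made-up values above, only the two samples between 1.1 and 1.5 survive the cut.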

lite : bool, optional

If True, only a subset of the data and metadata will be included in the subsets to reduce memory usage. Default is True.

max_workers : int or None, optional

Maximum number of worker processes to use when generating subsets in parallel. Default is None.

If None, the number of workers will be set based on the number of CPUs and input files. If an integer is provided, the number of workers used will be the minimum of that value and the number of input files.

It is recommended to use parallel processing when generating subsets from multiple large input files. For a single input file, or for multiple small input files, multiprocessing may not provide a significant speedup and may even slow down execution due to the overhead of parallelization.
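The worker-count rule described above can be sketched as follows. The function name resolve_worker_count is hypothetical. The integer branch follows the documented rule (the smaller of the requested value and the number of input files); the None branch assumes that "based on the number of CPUs and input files" means the CPU count capped by the file count, which this page does not spell out.

```python
import os
from typing import Optional


def resolve_worker_count(max_workers: Optional[int], n_files: int) -> int:
    """Hypothetical helper mirroring the max_workers rule described above."""
    if max_workers is None:
        # Assumption: use the CPU count, capped by the number of input files.
        cpu_count = os.cpu_count() or 1
        return max(1, min(cpu_count, n_files))
    # Documented rule: the smaller of the requested value and the file count.
    # The lower bound of 1 is an extra guard added for this sketch.
    return max(1, min(max_workers, n_files))


# Requesting 8 workers for 3 files uses only 3 of them.
workers = resolve_worker_count(8, 3)
```

Under this rule, asking for more workers than there are input files has no effect, which matches the note that parallelism pays off mainly with multiple large files.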

verbose : bool, optional

If True, log messages will be printed during subset generation. Default is False.

Methods Summary

get_asdf_subsets(*[, group_by, ...])

Get ASDF subset objects for the specified source IDs and input files, grouped by source and file, by file, or combined.

get_source_file_keys(*[, spectral_files, ...])

Get valid string keys for source/file subset selection.

subset()

Generate the spectral subset(s) from the input ASDF files based on the specified source IDs and wavelength range.

write_as_asdf([output_dir, group_by, ...])

Write the ASDF subset(s) to files in the specified output directory, grouped by source and file, by file, or combined.

Methods Documentation

get_asdf_subsets(*, group_by: Literal['source_file', 'file', 'combined'] = 'combined', spectral_files: str | Path | S3Path | List[str | Path | S3Path] = None, source_ids: str | int | List[str | int] = None) → dict

Get ASDF subset objects for the specified source IDs and input files, grouped by source and file, by file, or combined.

Parameters:
group_by : {‘source_file’, ‘file’, ‘combined’}, optional

Determines how the subsets are grouped in the output ASDF objects. Default is ‘combined’.

- ‘source_file’: Separate ASDF object for each source ID and input file combination.
- ‘file’: One ASDF object per input file, containing all specified source IDs from that file.
- ‘combined’: A single ASDF object containing all specified source IDs from all input files.

spectral_files : str or Path or S3Path or list, optional

Specific spectral files to include in the output. If None, all input spectral files will be included. Can be a single file or a list of files.

source_ids : str or int or list, optional

Specific source IDs to include in the output. If None, all source IDs from the subset results will be included. Can be a single ID or a list of IDs.

Returns:
dict or asdf.AsdfFile

Depending on the value of group_by, this method returns either a dictionary of ASDF subset objects keyed by source ID and input file combination (‘source_file’), a dictionary of ASDF subset objects keyed by input file (‘file’), or a single ASDF subset object containing all subsets (‘combined’).
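The three group_by modes differ only in the shape of what comes back. The sketch below mimics those shapes with plain dictionaries and lists; it does not call astrocut. The function group_results, the file names, the source IDs, and the placeholder strings are all hypothetical, and a list stands in for what would be an ASDF object.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical subset results keyed by (file, source_id). The string values
# stand in for per-source spectral data.
results: Dict[Tuple[str, int], str] = {
    ("a.asdf", 101): "spec-a-101",
    ("a.asdf", 102): "spec-a-102",
    ("b.asdf", 101): "spec-b-101",
}


def group_results(results, group_by="combined"):
    """Mimic the output shapes of the three group_by modes (illustration only)."""
    if group_by == "source_file":
        # One container per (file, source_id) combination.
        return {key: [value] for key, value in results.items()}
    if group_by == "file":
        # One container per input file, holding every source from that file.
        by_file = defaultdict(list)
        for (file_name, _source_id), value in results.items():
            by_file[file_name].append(value)
        return dict(by_file)
    if group_by == "combined":
        # A single container holding every source from every file.
        return list(results.values())
    raise ValueError(f"Unknown group_by value: {group_by!r}")


per_file = group_results(results, "file")
```

Note that ‘source_file’ and ‘file’ yield mappings, while ‘combined’ yields a single object, so calling code usually needs to branch on the mode it requested.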

get_source_file_keys(*, spectral_files: str | Path | S3Path | List[str | Path | S3Path] = None, source_ids: str | int | List[str | int] = None) → Dict[str, Tuple[str, str]]

Get valid string keys for source/file subset selection.

Parameters:
spectral_files : str or Path or S3Path or list, optional

Specific spectral files to include. If None, all available files are included.

source_ids : str or int or list, optional

Specific source IDs to include. If None, all available source IDs are included.

Returns:
dict

Mapping from a user-friendly key string to (file, source_id) tuples.
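A mapping of this kind might look like the sketch below. The actual key format is not described on this page, so the "stem:source_id" format used here is invented purely for illustration, as are the function name build_source_file_keys and the file paths.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Tuple, Union


def build_source_file_keys(
    files: Iterable[str],
    source_ids: Iterable[Union[str, int]],
) -> Dict[str, Tuple[str, str]]:
    """Build a key -> (file, source_id) mapping (hypothetical key format)."""
    ids = list(source_ids)
    keys: Dict[str, Tuple[str, str]] = {}
    for file_name in files:
        # Invented format: file stem and source ID joined by a colon.
        stem = PurePosixPath(file_name).stem
        for source_id in ids:
            keys[f"{stem}:{source_id}"] = (file_name, str(source_id))
    return keys


keys = build_source_file_keys(["data/a.asdf", "data/b.asdf"], [101, "102"])
file_name, source_id = keys["a:101"]
```

The point of such a mapping is that a short, readable key can be resolved back to the exact file and source ID pair it refers to.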

subset()

Generate the spectral subset(s) from the input ASDF files based on the specified source IDs and wavelength range.

Raises:
InvalidQueryError

If no subsets were created, which may indicate that the source IDs are not present in the spectral files or that the wavelength range does not overlap with the data.

write_as_asdf(output_dir: str | Path = '.', *, group_by: Literal['source_file', 'file', 'combined'] = 'combined', spectral_files: List[str | Path] = None, source_ids: str | int | List[str | int] = None, max_workers: int | None = None, validate_output: bool = True) → List[str]

Write the ASDF subset(s) to files in the specified output directory, grouped by source and file, by file, or combined.

Parameters:
output_dir : str or Path, optional

Output directory for subset files. Default is the current directory.

group_by : {‘source_file’, ‘file’, ‘combined’}, optional

Determines how the subsets are grouped in the output ASDF files. Default is ‘combined’.

- ‘source_file’: Separate ASDF file for each source ID and input file combination.
- ‘file’: One ASDF file per input file, containing all specified source IDs from that file.
- ‘combined’: A single ASDF file containing all specified source IDs from all input files.

spectral_files : str or Path or S3Path or list, optional

Specific spectral files to include in the output. If None, all input spectral files will be included. Can be a single file or a list of files.

source_ids : str or int or list, optional

Specific source IDs to include in the output. If None, all source IDs from the subset results will be included. Can be a single ID or a list of IDs.

max_workers : int or None, optional

Maximum number of worker processes to use when writing files in parallel. Default is None. If None, the number of workers will be set based on the number of write jobs and CPU count. It is recommended to use parallel processing when writing large batches of files (>5000).

validate_output : bool, optional

If True, run ASDF schema validation during each file write. If False, validate only a single output file during its write, then skip schema validation for all remaining writes. Default is True. Consider setting to False for improved performance when writing large batches of files that are expected to be valid.
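The validate_output behavior amounts to deciding, per write, whether schema validation runs. A minimal sketch of that decision is shown below. The function validation_flags is hypothetical, and the choice of the first file as the single validated one is an assumption; the description above only says that one file is validated.

```python
from typing import List


def validation_flags(n_files: int, validate_output: bool = True) -> List[bool]:
    """Return, for each write, whether schema validation would run.

    Hypothetical helper for illustration, not part of astrocut.
    """
    if validate_output:
        # Every file is validated as it is written.
        return [True] * n_files
    # Assumption: the single validated file is the first one written.
    return [index == 0 for index in range(n_files)]


flags = validation_flags(3, validate_output=False)
```

With validate_output=False, the cost of validation is paid once instead of once per file, which is where the performance gain for large batches comes from.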

Returns:
List[str]

List of file paths to the written ASDF subset files.