RomanSpectralSubset
- class astrocut.RomanSpectralSubset(spectral_files: str | Path | S3Path | List[str | Path | S3Path], source_ids: str | int | List[str | int], wl_range: tuple | list = None, lite: bool | None = True, max_workers: int | None = None, verbose: bool = False)
Bases: ASDFSpectralSubset
Class for creating subsets from Roman spectral data. Inherits from ASDFSpectralSubset and implements the same interface, but is designed for Roman data.
- Parameters:
- spectral_filesstr, Path, S3Path, or list
Path(s) to the input spectral files. Can be a single file or a list of files.
- source_idsstr, int, or list
Source ID(s) to cut out. Can be a single ID or a list of IDs.
- wl_rangetuple or list, optional
Wavelength range to cut out, specified as (min_wavelength, max_wavelength). If None, the full wavelength range will be used.
- litebool, optional
If True, only a subset of the data and metadata will be included in the subsets to reduce memory usage. Default is True.
- max_workersint or None, optional
Maximum number of worker processes to use when generating subsets in parallel. Default is None.
If None, the number of workers will be set based on the number of CPUs and input files. If an integer is provided, the number of workers used will be the minimum of that value and the number of input files.
It is recommended to use parallel processing when generating subsets from multiple large input files. For a single input file, or for multiple small input files, multiprocessing may not provide a significant speedup and may even slow down execution due to the overhead of parallelization.
- verbosebool, optional
If True, log messages will be printed during subset generation. Default is False.
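The max_workers rule described above can be sketched in plain Python. This is a minimal illustration, not astrocut's implementation: `resolve_max_workers` is a hypothetical helper, and treating the None case as the minimum of the CPU count and the number of input files is an assumption (the text only says the count is "based on" both).

```python
import os


def resolve_max_workers(max_workers, n_files, cpu_count=None):
    """Hypothetical helper mirroring the documented max_workers rule."""
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    if cpu_count is None:
        # os.cpu_count() may return None on some platforms.
        cpu_count = os.cpu_count() or 1
    if max_workers is None:
        # Assumption: "based on the number of CPUs and input files"
        # is read here as the smaller of the two.
        return min(cpu_count, n_files)
    if max_workers < 1:
        raise ValueError("max_workers must be a positive integer")
    # Documented: the minimum of the requested value and the number of files.
    return min(max_workers, n_files)


print(resolve_max_workers(8, 3))                  # capped by the file count
print(resolve_max_workers(None, 4, cpu_count=2))  # capped by the CPU count
```

Under this reading, requesting more workers than there are input files never starts idle processes.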
Methods Summary
get_asdf_subsets(*[, group_by, ...]) Get ASDF subset objects for specified source IDs and input files, grouped by source and file, file, or combined.
get_source_file_keys(*[, spectral_files, ...]) Get valid string keys for source/file subset selection.
subset() Generate the spectral subset(s) from the input ASDF files based on the specified source IDs and wavelength range.
write_as_asdf([output_dir, group_by, ...]) Write the ASDF subset(s) to files in the specified output directory, grouped by source and file, file, or combined.
Methods Documentation
- get_asdf_subsets(*, group_by: Literal['source_file', 'file', 'combined'] = 'combined', spectral_files: str | Path | S3Path | List[str | Path | S3Path] = None, source_ids: str | int | List[str | int] = None) dict
Get ASDF subset objects for specified source IDs and input files, grouped by source and file, file, or combined.
- Parameters:
- group_by{‘source_file’, ‘file’, ‘combined’}, optional
Determines how the subsets are grouped in the output ASDF objects. Default is ‘combined’.
  - ‘source_file’: Separate ASDF object for each source ID and input file combination.
  - ‘file’: One ASDF object per input file, containing all specified source IDs from that file.
  - ‘combined’: A single ASDF object containing all specified source IDs from all input files.
- spectral_filesstr or Path or S3Path or list, optional
Specific spectral files to include in the output. If None, all input spectral files will be included. Can be a single file or a list of files.
- source_idsstr or int or list, optional
Specific source IDs to include in the output. If None, all source IDs from the subset results will be included. Can be a single ID or a list of IDs.
- Returns:
- dict or asdf.AsdfFile
Depending on the value of group_by, this method returns either a dictionary of ASDF subset objects keyed by source ID and input file combination (‘source_file’), a dictionary of ASDF subset objects keyed by input file (‘file’), or a single ASDF subset object containing all subsets (‘combined’).
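The three grouping modes can be illustrated with a small stand-alone sketch. It does not use astrocut: `group_subsets` is a hypothetical function, plain lists stand in for ASDF objects, and tuples are used as dictionary keys only for illustration (the real method keys its dictionaries by strings, which get_source_file_keys can list).

```python
from collections import defaultdict


def group_subsets(pairs, group_by="combined"):
    """Group (file, source_id) pairs the way group_by is described.

    Hypothetical illustration: lists stand in for ASDF subset objects.
    """
    pairs = list(pairs)
    if group_by == "source_file":
        # One entry per source ID and input file combination.
        return {(f, s): [s] for f, s in pairs}
    if group_by == "file":
        # One entry per input file, holding every source ID from that file.
        grouped = defaultdict(list)
        for f, s in pairs:
            grouped[f].append(s)
        return dict(grouped)
    if group_by == "combined":
        # A single object holding everything.
        return pairs
    raise ValueError(f"unknown group_by: {group_by!r}")


pairs = [("a.asdf", 1), ("a.asdf", 2), ("b.asdf", 1)]
print(group_subsets(pairs, "source_file"))
print(group_subsets(pairs, "file"))
print(group_subsets(pairs, "combined"))
```

Note that only ‘combined’ yields a single object rather than a dictionary, which matches the "dict or asdf.AsdfFile" return description.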
- get_source_file_keys(*, spectral_files: str | Path | S3Path | List[str | Path | S3Path] = None, source_ids: str | int | List[str | int] = None) Dict[str, Tuple[str, str]]
Get valid string keys for source/file subset selection.
- Parameters:
- spectral_filesstr or Path or S3Path or list, optional
Specific spectral files to include. If None, all available files are included.
- source_idsstr or int or list, optional
Specific source IDs to include. If None, all available source IDs are included.
- Returns:
- dict
Mapping from a user-friendly key string to (file, source_id) tuples.
- subset()
Generate the spectral subset(s) from the input ASDF files based on the specified source IDs and wavelength range.
- Raises:
- InvalidQueryError
If no subsets were created, which may indicate that the source IDs are not present in the spectral files or that the wavelength range does not overlap with the data.
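The wavelength-range failure mode can be sketched as follows. This is an illustration under stated assumptions, not astrocut's code: `select_wavelengths` and `NoSubsetError` are hypothetical names (the latter stands in for InvalidQueryError), and treating both ends of wl_range as inclusive is an assumption that this page does not confirm.

```python
class NoSubsetError(Exception):
    """Hypothetical stand-in for astrocut's InvalidQueryError."""


def select_wavelengths(wavelengths, wl_range=None):
    """Keep the wavelengths inside wl_range, or all of them if it is None."""
    if wl_range is None:
        # Documented: None means the full wavelength range is used.
        return list(wavelengths)
    wl_min, wl_max = wl_range
    # Assumption: both endpoints are treated as inclusive.
    selected = [w for w in wavelengths if wl_min <= w <= wl_max]
    if not selected:
        # Mirrors the documented error when the range misses the data.
        raise NoSubsetError("wavelength range does not overlap with the data")
    return selected


print(select_wavelengths([1.0, 1.5, 2.0], (1.2, 2.0)))
```

A range that lies entirely outside the data raises instead of returning an empty result, which is the behavior the Raises entry describes.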
- write_as_asdf(output_dir: str | Path = '.', *, group_by: Literal['source_file', 'file', 'combined'] = 'combined', spectral_files: List[str | Path] = None, source_ids: str | int | List[str | int] = None, max_workers: int | None = None, validate_output: bool = True) List[str]
Write the ASDF subset(s) to files in the specified output directory, grouped by source and file, file, or combined.
- Parameters:
- output_dirstr or Path, optional
Output directory for subset files. Default is the current directory.
- group_by{‘source_file’, ‘file’, ‘combined’}, optional
Determines how the subsets are grouped in the output ASDF files. Default is ‘combined’.
  - ‘source_file’: Separate ASDF file for each source ID and input file combination.
  - ‘file’: One ASDF file per input file, containing all specified source IDs from that file.
  - ‘combined’: A single ASDF file containing all specified source IDs from all input files.
- spectral_filesstr or Path or S3Path or list, optional
Specific spectral files to include in the output. If None, all input spectral files will be included. Can be a single file or a list of files.
- source_idsstr or int or list, optional
Specific source IDs to include in the output. If None, all source IDs from the subset results will be included. Can be a single ID or a list of IDs.
- max_workersint or None, optional
Maximum number of worker processes to use when writing files in parallel. Default is None. If None, the number of workers will be set based on the number of write jobs and CPU count. It is recommended to use parallel processing when writing large batches of files (>5000).
- validate_outputbool, optional
If True, run ASDF schema validation during each file write. If False, validate only a single output file during its write, then skip schema validation for all remaining writes. Default is True. Consider setting to False for improved performance when writing large batches of files that are expected to be valid.
- Returns:
- List[str]
List of file paths to the written ASDF subset files.
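The effect of validate_output on a batch of writes can be sketched as a simple plan of which writes run schema validation. `validation_plan` is a hypothetical helper, and choosing the first write as the single validated one is an assumption: the description only says that one output file is validated when validate_output is False.

```python
def validation_plan(n_files, validate_output=True):
    """Return one flag per write: True if that write runs schema validation."""
    if n_files < 0:
        raise ValueError("n_files must be non-negative")
    if validate_output:
        # Documented: every write is validated.
        return [True] * n_files
    # Assumption: the first write is the one that gets validated.
    return [i == 0 for i in range(n_files)]


print(validation_plan(3))
print(validation_plan(3, validate_output=False))
```

With validate_output=False the validation cost stays constant as the batch grows, which is why the option is suggested for large batches of files that are expected to be valid.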