Reading/Writing a 🌈¶
This page describes how to read and/or write Rainbow
objects, using a variety of format definitions that have been included with the main chromatic
package.
from chromatic import read_rainbow, version
version()
'0.4.14'
Quickstart¶
To get started reading files, if you have a file that you think contains flux as a function of wavelength and time ("time-series spectra" or "multiwavelength light curves" or some such), try just using the default read_rainbow
function. It will try to guess the file format from the file name.
rainbow = read_rainbow(
"example-datasets/stsci/jw02734002001_04101_00001-seg00*_nis_x1dints.fits"
)
🌈🤖 This file contains data for 3 spectroscopic orders. Because no `order=` keyword was supplied, we're defaulting to first order. You can hide this warning by expliciting stating which order you want to load. For this file, the options include [1 2 3]. 🌈🤖 No `int_times` extension was found in the first file jw02734002001_04101_00001-seg001_nis_x1dints.fits 🌈🤖 Times were set by linearly interpolating between the exposure start and end points, as estimated from the 'SCI' extension using the 'TDB-BEG', 'TDB-END', and 'EFFINTTM' keywords. Times may be off by a few seconds and possibly up to the duration of one integration (= 76.916s).
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) Cell In[3], line 1 ----> 1 rainbow = read_rainbow( 2 "example-datasets/stsci/jw02734002001_04101_00001-seg00*_nis_x1dints.fits" 3 ) File ~/Dropbox/zach/code/chromatic/chromatic/rainbows/__init__.py:29, in read_rainbow(filepath, **kw) 8 def read_rainbow(filepath, **kw): 9 """ 10 A friendly wrapper to load time-series spectra and/or 11 multiwavelength light curves into a `chromatic` Rainbow (...) 27 The loaded data! 28 """ ---> 29 r = Rainbow(filepath, **kw) 30 if "model" in r.fluxlike: 31 return RainbowWithModel(**r._get_core_dictionaries()) File ~/Dropbox/zach/code/chromatic/chromatic/rainbows/rainbow.py:227, in Rainbow.__init__(self, filepath, format, wavelength, time, flux, uncertainty, wavelike, timelike, fluxlike, metadata, name, **kw) 225 # then try to initialize from a file 226 elif isinstance(filepath, (str, list, Column)): --> 227 self._initialize_from_file(filepath=filepath, format=format, **kw) 229 # finally, tidy up by guessing the scales 230 self._guess_wscale() File ~/Dropbox/zach/code/chromatic/chromatic/rainbows/rainbow.py:463, in Rainbow._initialize_from_file(self, filepath, format, **kw) 461 # pick the appropriate reader 462 reader = guess_reader(filepath=filepath, format=format) --> 463 reader(self, filepath, **kw) 465 # validate that something reasonable got populated 466 self._validate_core_dictionaries() File ~/Dropbox/zach/code/chromatic/chromatic/rainbows/readers/x1dints.py:366, in from_x1dints(rainbow, filepath, order, **kw) 362 current_integration = hdu[e].header["int_num"] 364 # in case of missing segments, convert integration to index 365 current_time_index = np.nonzero( --> 366 timelike["integration_number"] == current_integration 367 )[0][0] 368 # print(current_integration, current_time_index) 369 370 # loop through all the columns in the data extension 371 for column in hdu[e].columns: 372 373 # get a lower case name 
for the unit KeyError: 'integration_number'
Then, to save a file, try just using the .save()
method. Again, it will try to guess the file format from the filename.
rainbow.save("example-datasets/chromatic/ero-transit-wasp-96b.rainbow.npy")
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[4], line 1 ----> 1 rainbow.save("example-datasets/chromatic/ero-transit-wasp-96b.rainbow.npy") NameError: name 'rainbow' is not defined
The sections below provide more details on some of the available file formats for reading and writing files, but the basic process is what you've already seen: use read_rainbow()
and .save()
to load and save spectroscopic light curve data with a variety of formats!
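To make the idea of guessing a format from a filename concrete, here is a minimal sketch that uses only the standard library. The patterns are the filename conventions mentioned on this page; the `FORMAT_PATTERNS` table and the `guess_format` function are hypothetical names for illustration and are not chromatic's internal implementation.

```python
from fnmatch import fnmatchcase

# Hypothetical lookup table, ordered from most to least specific.
# The patterns are filename conventions described on this page; the table
# itself is an illustration, not chromatic's own format lookup.
FORMAT_PATTERNS = [
    ("*.rainbow.npy", "rainbow_npy"),
    ("*.rainbow.fits", "rainbow_FITS"),
    ("*x1dints.fits", "x1dints"),
    ("S3*SpecData.h5", "eureka_s3"),
    ("S4*LCData.h5", "eureka_s4"),
    ("*S5_*_Table_Save_*.txt", "eureka_s5"),
    ("*.txt", "text"),
    ("*.csv", "text"),
]


def guess_format(filepath):
    """Return the first format whose pattern matches the file name, or None."""
    filename = filepath.split("/")[-1]
    for pattern, format_name in FORMAT_PATTERNS:
        if fnmatchcase(filename, pattern):
            return format_name
    return None


print(guess_format("example-datasets/chromatic/test.rainbow.npy"))
print(guess_format("example-datasets/eureka/S3_example_SpecData.h5"))
print(guess_format("mystery.dat"))
```

Because the generic *.txt pattern would also match more specific text-based formats, the specific patterns are checked first. When no pattern matches, or the match is wrong, passing format= explicitly is the way around the guess.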
Reading Files¶
chromatic
can load data from a variety of different file formats. Whether these are time-series spectra or binned spectroscopic light curves, there's a good chance that the read_rainbow function can load them into a 🌈. By writing custom readers for different data formats, we hope to make it easier to use chromatic to compare the results of different analyses.
Download Example Inputs: If you want to test out any of these readers, you'll need data files in each format to test on. You can download some example datasets from this link. Simply extract that .zip
file into the directory from which you'll be running this notebook. Another source of files you might want to try reading would be the simulated data generated for the ers-transit Spring 2022 Data Challenge.
chromatic rainbow files (*.rainbow.npy)¶
The chromatic
toolkit saves files in its own default format, which can then be shared and loaded back in. These files directly encode the core dictionaries in binary files, so they load and save quickly. They have the extension .rainbow.npy
and can be written from any Rainbow
object.
r = read_rainbow("example-datasets/chromatic/test.rainbow.npy")
The Rainbow
reader will try to guess the format of the file from the filepath. If that doesn't work, you can pass the keyword format='rainbow_npy' to require the from_rainbow_npy reader needed for these files.
chromatic rainbow FITS files (*.rainbow.fits)¶
Because you might want to share a Rainbow
object with someone not using Python, we define a FITS-based file format. The Flexible Image Transport System (FITS) is common in astronomy, so there's a good chance someone will be able to load this file into whatever coding language they're using. These files have the extension .rainbow.fits and load slightly more slowly than .rainbow.npy files; they can be written from any Rainbow object.
r = read_rainbow("example-datasets/chromatic/test.rainbow.fits")
The Rainbow
reader will try to guess the format of the file from the filepath. If that doesn't work, you can pass the keyword format='rainbow_FITS' to require the from_rainbow_FITS reader needed for these files.
generic text files (*.txt, *.csv)¶
Text files are slower to read or write, but everyone can make them. This reader will try to load one giant text file in which light curves for all wavelengths are stacked on top of each other or spectra for all times are stacked on top of each other. The text file should at least have columns that look like:
wavelength for wavelength in microns
time for time in days (preferably BJD$_{\rm TDB}$)
flux for flux in any units
uncertainty for flux uncertainties in the same units as flux
Additional columns will also be read, and they will be stored in the .fluxlike
core dictionary.
r = read_rainbow("example-datasets/chromatic/test.rainbow.txt")
If the file-format guess fails, you can feed in the keyword format='text'
to tell the reader to expect one of these files.
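As a rough illustration of the stacked layout described above, the sketch below parses a tiny table in which the light curves for two wavelengths are stacked on top of each other, then regroups the flux into a (wavelength, time) grid. The numbers are made up, and the comma delimiter is an assumption of this sketch rather than a statement about what the chromatic text reader requires.

```python
import csv
import io

# A tiny made-up table: 2 wavelengths x 3 times, stacked by wavelength.
# The comma delimiter is an assumption for this sketch.
stacked = """wavelength,time,flux,uncertainty
1.0,0.00,1.000,0.01
1.0,0.01,0.990,0.01
1.0,0.02,1.001,0.01
2.0,0.00,1.002,0.02
2.0,0.01,0.985,0.02
2.0,0.02,0.999,0.02
"""

rows = list(csv.DictReader(io.StringIO(stacked)))

# The unique wavelengths and times define the two axes of the grid.
wavelengths = sorted({float(r["wavelength"]) for r in rows})
times = sorted({float(r["time"]) for r in rows})

# Fill a nested list indexed as flux[wavelength_index][time_index].
flux = [[None] * len(times) for _ in wavelengths]
for r in rows:
    i = wavelengths.index(float(r["wavelength"]))
    j = times.index(float(r["time"]))
    flux[i][j] = float(r["flux"])

print(len(flux), "wavelengths x", len(flux[0]), "times")
```

The same regrouping works if the rows are instead stacked by time, because each row carries its own wavelength and time.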
STScI jwst pipeline outputs (x1dints.fits)¶
The jwst
pipeline developed at the Space Telescope Science Institute produces extracted 1D stellar spectra for time-series observations with the James Webb Space Telescope. Details about the pipeline itself are available here.
These files typically end with the _x1dints.fits
suffix. Each file contains a number of individual "integrations" (= time points). Because the datasets can get large, sometimes a particular observation might be split into multiple segments, each with its own file. As such, the reader for these files is designed to handle either a single file or a path with a *
in it that points to a group of files from an observation that's been split into segments.
r = read_rainbow("example-datasets/stsci/*_x1dints.fits")
🌈🤖 This file contains data for 3 spectroscopic orders. Because no `order=` keyword was supplied, we're defaulting to first order. You can hide this warning by expliciting stating which order you want to load. For this file, the options include [1 2 3]. 🌈🤖 No `int_times` extension was found in the first file jw02734002001_04101_00001-seg001_nis_x1dints.fits 🌈🤖 Times were set by linearly interpolating between the exposure start and end points, as estimated from the 'SCI' extension using the 'TDB-BEG', 'TDB-END', and 'EFFINTTM' keywords. Times may be off by a few seconds and possibly up to the duration of one integration (= 76.916s).
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) Cell In[8], line 1 ----> 1 r = read_rainbow("example-datasets/stsci/*_x1dints.fits") File ~/Dropbox/zach/code/chromatic/chromatic/rainbows/__init__.py:29, in read_rainbow(filepath, **kw) 8 def read_rainbow(filepath, **kw): 9 """ 10 A friendly wrapper to load time-series spectra and/or 11 multiwavelength light curves into a `chromatic` Rainbow (...) 27 The loaded data! 28 """ ---> 29 r = Rainbow(filepath, **kw) 30 if "model" in r.fluxlike: 31 return RainbowWithModel(**r._get_core_dictionaries()) File ~/Dropbox/zach/code/chromatic/chromatic/rainbows/rainbow.py:227, in Rainbow.__init__(self, filepath, format, wavelength, time, flux, uncertainty, wavelike, timelike, fluxlike, metadata, name, **kw) 225 # then try to initialize from a file 226 elif isinstance(filepath, (str, list, Column)): --> 227 self._initialize_from_file(filepath=filepath, format=format, **kw) 229 # finally, tidy up by guessing the scales 230 self._guess_wscale() File ~/Dropbox/zach/code/chromatic/chromatic/rainbows/rainbow.py:463, in Rainbow._initialize_from_file(self, filepath, format, **kw) 461 # pick the appropriate reader 462 reader = guess_reader(filepath=filepath, format=format) --> 463 reader(self, filepath, **kw) 465 # validate that something reasonable got populated 466 self._validate_core_dictionaries() File ~/Dropbox/zach/code/chromatic/chromatic/rainbows/readers/x1dints.py:366, in from_x1dints(rainbow, filepath, order, **kw) 362 current_integration = hdu[e].header["int_num"] 364 # in case of missing segments, convert integration to index 365 current_time_index = np.nonzero( --> 366 timelike["integration_number"] == current_integration 367 )[0][0] 368 # print(current_integration, current_time_index) 369 370 # loop through all the columns in the data extension 371 for column in hdu[e].columns: 372 373 # get a lower case name for the unit KeyError: 'integration_number'
If the file-format guess fails, you can feed in the keyword format='x1dints'
to tell the reader to expect one of these files. This reader was rewritten on 13 July 2022 to read in the JWST/ERO x1dints
datasets. It might not work on earlier simulated x1dints
files like those in the simulated datasets available here; for those, try using the format='x1dints_kludge'
keyword.
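For observations split into segments, a path containing * stands for a group of files. The sketch below shows one way such a path could be expanded into an ordered list of segment files with the standard glob module. The `expand_segments` helper and the demo filenames are hypothetical, the files are empty placeholders rather than real FITS files, and this is not necessarily how the x1dints reader does it internally.

```python
import glob
import os
import tempfile


def expand_segments(path_pattern):
    """Expand a path containing * into a sorted list of matching files."""
    return sorted(glob.glob(path_pattern))


with tempfile.TemporaryDirectory() as directory:
    # Create empty placeholder files, deliberately out of order.
    for segment in (3, 1, 2):
        name = f"demo-seg{segment:03d}_x1dints.fits"
        open(os.path.join(directory, name), "w").close()

    files = expand_segments(os.path.join(directory, "demo-seg*_x1dints.fits"))
    segment_names = [os.path.basename(f) for f in files]

print(segment_names)
```

Sorting matters because glob makes no promise about order; with zero-padded segment numbers, alphabetical order is also segment order.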
eureka pipeline outputs (S[3|4|5].[h5|txt])¶
The Eureka! pipeline is one of many community tools being designed to extract spectra from JWST data. The current outputs have filenames that look like S3*SpecData.h5
for Stage 3 (extracted spectra), S4*LCData.h5
for Stage 4 (raw binned light curves), and a group of files *S5_*_Table_Save_*.txt
for Stage 5 (fitted binned light curves) for all channels. Any of these three stages can be read with chromatic.
s3 = read_rainbow("example-datasets/eureka/S3_example_SpecData.h5")
🌈🤖 You are trying import `astraeus`, which is needed for reading and writing `Eureka!` pipeline files. We're having trouble importing it, so you could please confirm that it is installed into you current environment by running the following command? pip install git+https://github.com/kevin218/Astraeus.git Once you've installed it, please restart whatever code you were trying to run, and hopefully it will work! Thanks for your patience! 🌈🤖 Something doesn't line up! The flux array has a shape of (). The wavelength array has 0 wavelengths. The time array has 0 times. 🌈🤖 Watch out! The 'ok' array has a shape of (0, 0), which doesn't match the flux array's shape of (). 🌈🤖 Wavelength, time, and flux arrays don't match; the `._sort()` step is being skipped.
s4 = read_rainbow("example-datasets/eureka/S4_example_LCData.h5")
🌈🤖 You are trying import `astraeus`, which is needed for reading and writing `Eureka!` pipeline files. We're having trouble importing it, so you could please confirm that it is installed into you current environment by running the following command? pip install git+https://github.com/kevin218/Astraeus.git Once you've installed it, please restart whatever code you were trying to run, and hopefully it will work! Thanks for your patience! 🌈🤖 Something doesn't line up! The flux array has a shape of (). The wavelength array has 0 wavelengths. The time array has 0 times. 🌈🤖 Watch out! The 'ok' array has a shape of (0, 0), which doesn't match the flux array's shape of (). 🌈🤖 Wavelength, time, and flux arrays don't match; the `._sort()` step is being skipped.
s5 = read_rainbow("example-datasets/eureka/S5*Table_Save_*.txt")
If the file-format guess fails, you can feed in the keywords format='eureka_s3'
, format='eureka_s4'
, or format='eureka_s5'
to tell the reader what file(s) to expect. (Older versions of Eureka! used text files for earlier stages, with filenames like S3_*_Table_Save.txt; that format will continue to work with format='eureka_txt'.)
xarray-based ERS format (*.xc)¶
Natasha Batalha, Lili Alderson, Munazza Alam, and Hannah Wakeford put together some specifications for a standard format for publishing datasets. The details may still change a little bit (as of 13 July 2022), but chromatic
can currently read a version of their stellar-spec, raw-light-curves, and fitted-light-curves formats.
spectra = read_rainbow("example-datasets/xarray/stellar-spec.xc")
raw_lcs = read_rainbow("example-datasets/xarray/raw-light-curves.xc")
fitted_lcs = read_rainbow("example-datasets/xarray/fitted-light-curves.xc")
Writing Files¶
chromatic
can write out files in a variety of different file formats. Paired with the available readers, this makes it possible to convert one file format to another, simply by reading one file in and saving it out as another. To demonstrate the writers, let's create a simple simulated dataset.
from chromatic import SimulatedRainbow
simulated = SimulatedRainbow().inject_transit().inject_systematics().inject_noise()
chromatic rainbow files (*.rainbow.npy)¶
The default file format for saving files encodes the core dictionaries in binary files, using the extension .rainbow.npy. This is a file that can be read directly back into chromatic. (Indeed, the command below created the file that we read above.)
simulated.save("example-datasets/chromatic/test.rainbow.npy")
chromatic rainbow FITS files (*.rainbow.fits)¶
If you want to share your Rainbow object with someone who might not be using Python, consider sharing a .rainbow.fits
file. This is a normal FITS file that many astronomers will have a way of reading. The primary extension has no data but a header that might contain some metadata. The three other extensions fluxlike, wavelike, and timelike contain quantities that have shapes of (nwave, ntime), (nwave), and (ntime), respectively.
simulated.save("example-datasets/chromatic/test.rainbow.fits")
WARNING: VerifyWarning: Keyword name 'injected_transit_method' is greater than 8 characters or contains characters not allowed by the FITS standard; a HIERARCH card will be created. [astropy.io.fits.card] WARNING: VerifyWarning: Keyword name 'injected_transit_parameters' is greater than 8 characters or contains characters not allowed by the FITS standard; a HIERARCH card will be created. [astropy.io.fits.card] 🌈🤖 metadata item 'injected_transit_parameters' cannot be saved to FITS header WARNING: VerifyWarning: Keyword name 'systematics_components' is greater than 8 characters or contains characters not allowed by the FITS standard; a HIERARCH card will be created. [astropy.io.fits.card] 🌈🤖 metadata item 'systematics_components' cannot be saved to FITS header WARNING: VerifyWarning: Keyword name 'systematics_equation' is greater than 8 characters or contains characters not allowed by the FITS standard; a HIERARCH card will be created. [astropy.io.fits.card] 🌈🤖 metadata item 'systematics_equation' cannot be saved to FITS header WARNING: VerifyWarning: Keyword name 'signal_to_noise' is greater than 8 characters or contains characters not allowed by the FITS standard; a HIERARCH card will be created. [astropy.io.fits.card]
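The shape convention for the fluxlike, wavelike, and timelike extensions can be checked with a few lines of plain Python. The sketch below uses nested lists as stand-ins for arrays; the `shape` and `shapes_are_consistent` helpers are hypothetical names for illustration, and the values are made up.

```python
def shape(array):
    """Return the shape of a rectangular nested list as a tuple."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0] if array else None
    return tuple(dims)


def shapes_are_consistent(wavelike, timelike, fluxlike):
    """Check the (nwave,), (ntime,), and (nwave, ntime) shape convention."""
    nwave = len(wavelike["wavelength"])
    ntime = len(timelike["time"])
    return (
        all(shape(v) == (nwave,) for v in wavelike.values())
        and all(shape(v) == (ntime,) for v in timelike.values())
        and all(shape(v) == (nwave, ntime) for v in fluxlike.values())
    )


# Made-up stand-ins for the three groups of quantities.
nwave, ntime = 3, 4
wavelike = {"wavelength": [1.0, 1.5, 2.0]}
timelike = {"time": [0.0, 0.1, 0.2, 0.3]}
fluxlike = {
    "flux": [[1.0] * ntime for _ in range(nwave)],
    "uncertainty": [[0.01] * ntime for _ in range(nwave)],
}

print(shape(fluxlike["flux"]), shapes_are_consistent(wavelike, timelike, fluxlike))
```

Someone reading a .rainbow.fits file in another language could apply the same check after loading the three extensions.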
xarray-based ERS format (*.xc)¶
chromatic
can write out to the standard xarray
-based format described above. These writers will generally raise warnings if important metadata is missing.
spectra = simulated.save("example-datasets/xarray/stellar-spec.xc")
🌈🤖 The required metadata keyword `author` was not found. Before saving, please set it with `rainbow.author = ?` 🌈🤖 The required metadata keyword `contact` was not found. Before saving, please set it with `rainbow.contact = ?` 🌈🤖 The required metadata keyword `code` was not found. Before saving, please set it with `rainbow.code = ?`
raw_lcs = simulated.save("example-datasets/xarray/raw-light-curves.xc")
🌈🤖 The required metadata keyword `data_origin` was not found. Before saving, please set it with `rainbow.data_origin = ?`
fitted_lcs = simulated.save("example-datasets/xarray/fitted-light-curves.xc")
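Since the writers warn about missing metadata, it can help to check for it before saving. The sketch below is a small pre-flight check on a plain dictionary. The keyword list contains only the keywords named in the warnings on this page, so the full set required by the format may be longer, and `missing_keywords` is a hypothetical helper, not part of chromatic.

```python
# Keywords taken from the warnings shown on this page; the full list
# required by the format may be longer.
REQUIRED_KEYWORDS = ["author", "contact", "code", "data_origin"]


def missing_keywords(metadata, required=REQUIRED_KEYWORDS):
    """Return the required keywords that are absent or empty in a metadata dict."""
    return [key for key in required if not metadata.get(key)]


# A made-up metadata dictionary with two of the keywords filled in.
metadata = {"author": "A. Person", "code": "chromatic"}
missing = missing_keywords(metadata)
print(missing)
```

As the warnings suggest, each missing keyword can then be set on the Rainbow itself (for example `rainbow.author = ...`) before calling .save().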
generic text files (*.txt, *.csv)¶
Text files provide a more generally readable file format, even though they may be slower to read or write. This writer will create one giant text file that stacks the light curves for all wavelengths on top of each other (if the group_by='wavelength'
keyword is set) or the spectra for all times on top of each other (if the group_by='time'
keyword is set). The resulting text file should at least have columns that look like:
wavelength for wavelength in microns
time for time in days (preferably BJD$_{\rm TDB}$)
flux for flux in any units
uncertainty for flux uncertainties in the same units as flux
simulated.save("example-datasets/chromatic/test.rainbow.txt")
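To illustrate the difference between the two stacking orders, the sketch below lists the (wavelength, time) pairs in the order their rows would appear for each choice of group_by. The `stack_rows` function is a hypothetical stand-in written for this page, not the chromatic text writer, and the grid values are made up.

```python
from itertools import product

wavelengths = [1.0, 2.0]       # microns (made-up values)
times = [0.00, 0.01, 0.02]     # days (made-up values)


def stack_rows(wavelengths, times, group_by):
    """Return (wavelength, time) pairs in the order the rows are stacked."""
    if group_by == "wavelength":
        # One complete light curve per wavelength, one after another.
        return [(w, t) for w, t in product(wavelengths, times)]
    if group_by == "time":
        # One complete spectrum per time, one after another.
        return [(w, t) for t, w in product(times, wavelengths)]
    raise ValueError("group_by must be 'wavelength' or 'time'")


by_wavelength = stack_rows(wavelengths, times, group_by="wavelength")
by_time = stack_rows(wavelengths, times, group_by="time")

print(by_wavelength[:3])
print(by_time[:2])
```

Both orderings contain exactly the same rows, so either can be regrouped into the same grid when read back in.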
Other File Formats¶
Naturally, you might want to use readers or writers other than those already listed here, to interpret outputs from other analyses or to produce the inputs needed for various light curve analyses. We've already added a number of custom readers and writers. Here are the currently available file formats:
from chromatic import available_readers, available_writers
list(available_readers)
['from_x1dints', 'from_x1dints_kludge', 'from_eureka_S3_txt', 'from_eureka_SpecData', 'from_eureka_S3', 'from_eureka_LCData', 'from_eureka_S4', 'from_eureka_channels', 'from_eureka_S5', 'from_rainbow_npy', 'from_rainbow_FITS', 'from_text', 'from_xarray_stellar_spectra', 'from_xarray_raw_light_curves', 'from_xarray_fitted_light_curves', 'from_nres', 'from_atoca', 'from_espinoza', 'from_dossantos', 'from_feinstein_numpy', 'from_feinstein_h5', 'from_schlawin', 'from_coulombe', 'from_kirk_fitted_light_curves', 'from_kirk_stellar_spectra', 'from_radica', 'from_aylin']
list(available_writers)
['to_rainbow_npy', 'to_rainbow_FITS', 'to_xarray_stellar_spectra', 'to_xarray_raw_light_curves', 'to_xarray_fitted_light_curves', 'to_text']
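One practical question the two lists answer is which formats can be both written and read back. The sketch below copies the names printed above (for the version shown on this page) into plain lists, so it runs without chromatic installed, and intersects them after removing the from_ and to_ prefixes. The `strip_prefix` helper is a hypothetical name used only here.

```python
# Names copied from the output above; they may differ in other versions.
available_readers = [
    "from_x1dints", "from_x1dints_kludge", "from_eureka_S3_txt",
    "from_eureka_SpecData", "from_eureka_S3", "from_eureka_LCData",
    "from_eureka_S4", "from_eureka_channels", "from_eureka_S5",
    "from_rainbow_npy", "from_rainbow_FITS", "from_text",
    "from_xarray_stellar_spectra", "from_xarray_raw_light_curves",
    "from_xarray_fitted_light_curves", "from_nres", "from_atoca",
    "from_espinoza", "from_dossantos", "from_feinstein_numpy",
    "from_feinstein_h5", "from_schlawin", "from_coulombe",
    "from_kirk_fitted_light_curves", "from_kirk_stellar_spectra",
    "from_radica", "from_aylin",
]
available_writers = [
    "to_rainbow_npy", "to_rainbow_FITS", "to_xarray_stellar_spectra",
    "to_xarray_raw_light_curves", "to_xarray_fitted_light_curves", "to_text",
]


def strip_prefix(name, prefix):
    """Remove a leading prefix from a name if it is present."""
    return name[len(prefix):] if name.startswith(prefix) else name


readable = {strip_prefix(name, "from_") for name in available_readers}
writable = {strip_prefix(name, "to_") for name in available_writers}

# Formats with both a reader and a writer can be saved and loaded again.
round_trip = sorted(readable & writable)
print(round_trip)
```

For the lists shown here, every writer has a matching reader, so any file saved by chromatic can also be loaded by it.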
If you would like a reader and/or writer for a format that doesn't exist above, please either submit an Issue to discuss its creation or see Designing New 🌈 Features to learn how to add it yourself. We've invested effort in trying to make it as easy as possible to develop readers/writers for new formats, so please don't hesitate to ask for something to be added. It's really not that hard to add something new.