Reading/Writing a 🌈

This page describes how to read and/or write Rainbow objects, using a variety of format definitions that have been included with the main chromatic package.

In [1]:

from chromatic import read_rainbow, version

In [2]:

version()

Out[2]:

'0.5.0'

Quickstart

To get started reading files, if you have a file that you think contains flux as a function of wavelength and time ("time-series spectra" or "multiwavelength light curves" or some such), try just using the default read_rainbow function. It will try to guess the file format from the file name.

In [3]:

rainbow = read_rainbow(
    "example-datasets/stsci/jw02734002001_04101_00001-seg00*_nis_x1dints.fits"
)

🌈🤖 This file contains data for 2 spectroscopic orders. Because no
`order=` keyword was supplied, we're defaulting to first order. You can
hide this warning by expliciting stating which order you want to load.
For this file, the options include [1 2].

🌈🤖 The 2048 input wavelengths were not monotonically increasing.
<🌈(2048w, 280t)> has been sorted from lowest to highest wavelength.
If you want to recover the original wavelength order, the original
wavelength indices are available in `rainbow.original_wave_index`.

Then, to save a file, try just using the .save() method. Again, it will try to guess the file format from the filename.

In [4]:

rainbow.save("example-datasets/chromatic/ero-transit-wasp-96b.rainbow.npy")

The sections below provide more details on some of the available file formats for reading and writing files, but the basic process is what you've already seen: use read_rainbow() and .save() to load and save spectroscopic light curve data with a variety of formats!
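
As a rough illustration of what guessing a format from a filename can look like, here is a minimal sketch using shell-style patterns. The pattern table and the `guess_format` helper are hypothetical and are not chromatic's actual lookup; only the format labels are taken from this page.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern table for illustration only; chromatic's real
# lookup may use different patterns and a different order.
FORMAT_PATTERNS = [
    ("*.rainbow.npy", "rainbow_npy"),
    ("*.rainbow.fits", "rainbow_FITS"),
    ("*x1dints.fits", "x1dints"),
    ("*S3*SpecData.h5", "eureka_s3"),
    ("*S4*LCData.h5", "eureka_s4"),
    ("*S5*Table_Save_*.txt", "eureka_s5"),
    ("*.txt", "text"),
    ("*.csv", "text"),
]


def guess_format(filepath):
    """Return the first format label whose pattern matches, else None."""
    for pattern, label in FORMAT_PATTERNS:
        if fnmatchcase(filepath, pattern):
            return label
    return None


print(guess_format("example-datasets/chromatic/test.rainbow.npy"))
print(guess_format("mystery.dat"))
```

Specific patterns are listed before generic ones, so a Stage 5 Eureka! table is not mistaken for a generic text file. When no pattern matches, passing `format=` explicitly is the fallback described in the sections below.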

Reading Files

chromatic can load data from a variety of file formats. Whether these are time-series spectra or binned spectroscopic light curves, there's a good chance that the read_rainbow function can load them into a 🌈. By writing custom readers for different data formats, we hope to make it easier to use chromatic to compare the results of different analyses.

Download Example Inputs: If you want to test out any of these readers, you'll need data files in each format to test on. You can download some example datasets from this link. Simply extract that .zip file into the directory from which you'll be running this notebook. Another source of files you might want to try reading would be the simulated data generated for the ers-transit Spring 2022 Data Challenge.

`chromatic` rainbow files (`*.rainbow.npy`)

The chromatic toolkit saves files in its own default format, which can then be shared and loaded back in. These files directly encode the core dictionaries in binary files, so they load and save quickly. They have the extension .rainbow.npy and can be written from any Rainbow object.

In [5]:

r = read_rainbow("example-datasets/chromatic/test.rainbow.npy")

The Rainbow reader will try to guess the format of the file from the filepath. If the guess fails, you can pass the keyword format='rainbow_npy' to force the use of the from_rainbow_npy reader that these files need.

`chromatic` rainbow FITS files (`*.rainbow.fits`)

Because you might want to share a Rainbow object with someone not using Python, we define a FITS-based file format. The Flexible Image Transport System is common in astronomy, so there's a good chance someone will be able to load this file into whatever coding language they're using. These files have the extension .rainbow.fits, and they will load a tiny bit more slowly than .rainbow.npy files; they can be written from any Rainbow object.

In [6]:

r = read_rainbow("example-datasets/chromatic/test.rainbow.fits")

The Rainbow reader will try to guess the format of the file from the filepath. If the guess fails, you can pass the keyword format='rainbow_FITS' to force the use of the from_rainbow_FITS reader that these files need.

generic text files (`.txt`, `.csv`)

Text files are slower to read or write, but everyone can make them. This reader will try to load one giant text file in which light curves for all wavelengths are stacked on top of each other or spectra for all times are stacked on top of each other. The text file should at least have columns that look like:

wavelength for wavelength in microns
time for time in days (preferably BJD$_{\rm TDB}$)
flux for flux in any units
uncertainty for flux uncertainties in the same units as flux

Additional columns will also be read, and they will be stored in the .fluxlike core dictionary.
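
To make the expected layout concrete, here is a minimal sketch that builds a tiny table with the four columns named above, stacking light curves by wavelength. The values are made up, and the comma delimiter and single header row are assumptions for illustration; the delimiter your file uses may differ.

```python
import csv
import io

# Tiny made-up dataset: 2 wavelengths x 3 times (values are illustrative).
wavelengths = [1.0, 2.0]    # microns
times = [0.00, 0.01, 0.02]  # days

rows = []
for w in wavelengths:  # one light curve per wavelength, stacked in sequence
    for t in times:
        rows.append({"wavelength": w, "time": t, "flux": 1.0, "uncertainty": 0.001})

# Write to an in-memory buffer; for a real file, use
# open("my-data.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["wavelength", "time", "flux", "uncertainty"]
)
writer.writeheader()
writer.writerows(rows)

text = buffer.getvalue()
header = text.splitlines()[0]
print(text)
```

Each row is one (wavelength, time) pair, so a dataset with N wavelengths and M times produces N x M data rows plus the header.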

In [7]:

r = read_rainbow("example-datasets/chromatic/test.rainbow.txt")

If the file-format guess fails, you can feed in the keyword format='text' to tell the reader to expect one of these files.

STScI `jwst` pipeline outputs (`x1dints.fits`)

The jwst pipeline developed at the Space Telescope Science Institute produces extracted 1D stellar spectra for time-series observations with the James Webb Space Telescope. Details about the pipeline itself are available here.

These files typically end with the _x1dints.fits suffix. Each file contains a number of individual "integrations" (= time points). Because the datasets can get large, sometimes a particular observation might be split into multiple segments, each with its own file. As such, the reader for these files is designed to handle either a single file or a path with a * in it that points to a group of files from an observation that's been split into segments.
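
To see what a path containing a * expands to, here is a small sketch using Python's standard glob module on placeholder files. The segment filenames are hypothetical, and whether chromatic orders the matched segments this way internally is an assumption.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    # Create empty placeholder files with hypothetical segment names.
    for segment in (2, 1, 3):
        name = f"example-seg{segment:03d}_nis_x1dints.fits"
        open(os.path.join(directory, name), "w").close()

    pattern = os.path.join(directory, "*_x1dints.fits")
    # glob returns matches in arbitrary order, so sort them
    # to keep the segments in sequence.
    segment_files = sorted(glob.glob(pattern))
    segment_names = [os.path.basename(f) for f in segment_files]

print(segment_names)
```

Running a check like this on your own directory is a quick way to confirm that a wildcard picks up exactly the segments of one observation and nothing else.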

In [8]:

r = read_rainbow("example-datasets/stsci/*_x1dints.fits")

🌈🤖 This file contains data for 2 spectroscopic orders. Because no
`order=` keyword was supplied, we're defaulting to first order. You can
hide this warning by expliciting stating which order you want to load.
For this file, the options include [1 2].

🌈🤖 The 2048 input wavelengths were not monotonically increasing.
<🌈(2048w, 280t)> has been sorted from lowest to highest wavelength.
If you want to recover the original wavelength order, the original
wavelength indices are available in `rainbow.original_wave_index`.

If the file-format guess fails, you can feed in the keyword format='x1dints' to tell the reader to expect one of these files. This reader was rewritten on 13 July 2022 to read in the JWST/ERO x1dints datasets. It might not work on earlier simulated x1dints files like those in the simulated datasets available here; for those, try using the format='x1dints_kludge' keyword.
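
The wavelength-sorting message printed above mentions that the original order can be recovered from stored indices. The following plain-Python sketch shows the general idea with made-up numbers; the exact contents of `rainbow.original_wave_index` are an assumption here, not confirmed behavior.

```python
# Made-up wavelengths (microns) that are not monotonically increasing.
wavelength = [2.8, 2.1, 1.4, 0.9]
flux = [10.0, 20.0, 30.0, 40.0]

# Indices that would sort the wavelengths from lowest to highest.
order = sorted(range(len(wavelength)), key=lambda i: wavelength[i])

sorted_wavelength = [wavelength[i] for i in order]
sorted_flux = [flux[i] for i in order]

# Keeping `order` plays the role described for `original_wave_index`:
# element k records where sorted element k sat in the input.
original_wave_index = order

# Undo the sort by putting each sorted element back in its original slot.
recovered = [None] * len(wavelength)
for k, i in enumerate(original_wave_index):
    recovered[i] = sorted_wavelength[k]

print(sorted_wavelength, recovered)
```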

`eureka` pipeline outputs (`S[3|4|5].[h5|txt]`)

The Eureka! pipeline is one of many community tools being designed to extract spectra from JWST data. The current outputs have filenames that look like S3*SpecData.h5 for Stage 3 (extracted spectra), S4*LCData.h5 for Stage 4 (raw binned light curves), and a group of files *S5_*_Table_Save_*.txt for Stage 5 (fitted binned light curves) for all channels. Any of these three stages can be read with chromatic.

In [9]:

s3 = read_rainbow("example-datasets/eureka/S3_example_SpecData.h5")

🌈🤖 Times are being estimated from the 'BJD_TDB'
keyword but being interpreted as "modified"
BJD_TDB ("modified" = BJD_TDB - 2400000.5).
This accounts for an earlier version of Eureka;
in the latest version times should likely be
listed as 'BMJD_TDB' to be more honest about
the fact that they're modified.

If we're interpreting times wrongly, please
raise an issue on the chromatic github!

In [10]:

s4 = read_rainbow("example-datasets/eureka/S4_example_LCData.h5")

🌈🤖 Times are being estimated from the 'BJD_TDB'
keyword but being interpreted as "modified"
BJD_TDB ("modified" = BJD_TDB - 2400000.5).
This accounts for an earlier version of Eureka;
in the latest version times should likely be
listed as 'BMJD_TDB' to be more honest about
the fact that they're modified.

If we're interpreting times wrongly, please
raise an issue on the chromatic github!

In [11]:

s5 = read_rainbow("example-datasets/eureka/S5*Table_Save_*.txt")

If the file-format guess fails, you can feed in the keywords format='eureka_s3', format='eureka_s4', or format='eureka_s5' to tell the reader what file(s) to expect. (Older versions of Eureka! used text files for earlier stages, with filenames like S3_*_Table_Save.txt; that format will continue to work with format='eureka_txt'.)
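
The warnings printed when loading the Stage 3 and Stage 4 files quote the relation "modified" = BJD_TDB - 2400000.5. A minimal sketch of that conversion follows; the `looks_modified` check and its threshold are a hypothetical heuristic for spotting which convention a time uses, not something chromatic is known to do.

```python
MJD_OFFSET = 2400000.5  # days; the offset quoted in the warning above


def bjd_to_bmjd(bjd):
    """Convert a full BJD_TDB value to 'modified' BJD_TDB."""
    return bjd - MJD_OFFSET


def bmjd_to_bjd(bmjd):
    """Convert a 'modified' BJD_TDB value back to full BJD_TDB."""
    return bmjd + MJD_OFFSET


def looks_modified(time, threshold=2000000.0):
    """Hypothetical heuristic: full Julian Dates for modern
    observations are about 2.4 million, modified ones far smaller."""
    return time < threshold


print(bjd_to_bmjd(2459700.5))
print(looks_modified(59700.0), looks_modified(2459700.5))
```

If times loaded from a Eureka! file look offset by roughly 2.4 million days, a mismatch between these two conventions is a likely cause.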

`xarray`-based ERS format (`*.xc`)

Natasha Batalha, Lili Alderson, Munazza Alam, and Hannah Wakeford put together some specifications for a standard format for publishing datasets. The details may still change a little bit (as of 13 July 2022), but chromatic can currently read a version of their stellar-spec, raw-light-curves, and fitted-light-curves formats.

In [12]:

spectra = read_rainbow("example-datasets/xarray/stellar-spec.xc")

In [13]:

raw_lcs = read_rainbow("example-datasets/xarray/raw-light-curves.xc")

In [14]:

fitted_lcs = read_rainbow("example-datasets/xarray/fitted-light-curves.xc")

Writing Files

chromatic can write out files in a variety of file formats. Paired with the available readers, this makes it possible to convert one file format to another simply by reading a file in and saving it out in a different format. To demonstrate the writers, let's create a simple simulated dataset.

In [15]:

from chromatic import SimulatedRainbow

In [16]:

simulated = SimulatedRainbow().inject_transit().inject_systematics().inject_noise()

`chromatic` rainbow files (`*.rainbow.npy`)

The default file format for saving files encodes the core dictionaries in binary files, using the extension .rainbow.npy. This is a file that can be read directly back into chromatic. (Indeed, the commands below created the file that we read above.)

In [17]:

simulated.save("example-datasets/chromatic/test.rainbow.npy")

`chromatic` rainbow FITS files (`*.rainbow.fits`)

If you want to share your Rainbow object with someone who might not be using Python, consider sharing a .rainbow.fits file. This is a normal FITS file that many astronomers will have a way of reading. The primary extension has no data but a header that might contain some metadata. The three other extensions fluxlike, wavelike, and timelike contain quantities that have shapes of (nwave, ntime), (nwave), (ntime), respectively.
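
The shape convention for the three extensions can be sketched with plain nested lists. The dictionary keys used here ("wavelength", "time", "flux") and the `shapes_are_consistent` helper are assumptions for illustration, not a description of the file's exact contents.

```python
nwave, ntime = 3, 4

# Made-up quantities following the (nwave), (ntime), (nwave, ntime) shapes.
wavelike = {"wavelength": [1.0, 1.5, 2.0]}                   # shape (nwave)
timelike = {"time": [0.0, 0.1, 0.2, 0.3]}                    # shape (ntime)
fluxlike = {"flux": [[1.0] * ntime for _ in range(nwave)]}   # shape (nwave, ntime)


def shapes_are_consistent(wavelike, timelike, fluxlike):
    """Check that every fluxlike array has one row per wavelength
    and one column per time."""
    n_w = len(wavelike["wavelength"])
    n_t = len(timelike["time"])
    for array in fluxlike.values():
        if len(array) != n_w or any(len(row) != n_t for row in array):
            return False
    return True


print(shapes_are_consistent(wavelike, timelike, fluxlike))
```

Someone reading the FITS file outside Python can apply the same check: the first axis of each fluxlike quantity should match the wavelike length, and the second should match the timelike length.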

In [18]:

simulated.save("example-datasets/chromatic/test.rainbow.fits")

WARNING: VerifyWarning: Keyword name 'injected_transit_method' is greater than 8 characters or contains characters not allowed by the FITS standard; a HIERARCH card will be created. [astropy.io.fits.card]
WARNING: VerifyWarning: Keyword name 'injected_transit_parameters' is greater than 8 characters or contains characters not allowed by the FITS standard; a HIERARCH card will be created. [astropy.io.fits.card]
🌈🤖 metadata item 'injected_transit_parameters' cannot be saved to FITS header

WARNING: VerifyWarning: Keyword name 'systematics_components' is greater than 8 characters or contains characters not allowed by the FITS standard; a HIERARCH card will be created. [astropy.io.fits.card]
🌈🤖 metadata item 'systematics_components' cannot be saved to FITS header

WARNING: VerifyWarning: Keyword name 'systematics_equation' is greater than 8 characters or contains characters not allowed by the FITS standard; a HIERARCH card will be created. [astropy.io.fits.card]
🌈🤖 metadata item 'systematics_equation' cannot be saved to FITS header

WARNING: VerifyWarning: Keyword name 'signal_to_noise' is greater than 8 characters or contains characters not allowed by the FITS standard; a HIERARCH card will be created. [astropy.io.fits.card]

`xarray`-based ERS format (`*.xc`)

chromatic can write out to the standard xarray-based format described above. These writers will generally raise warnings if important metadata is missing.

In [19]:

spectra = simulated.save("example-datasets/xarray/stellar-spec.xc")

🌈🤖 The required metadata keyword `author` was not found.
Before saving, please set it with `rainbow.author = ?`

🌈🤖 The required metadata keyword `contact` was not found.
Before saving, please set it with `rainbow.contact = ?`

🌈🤖 The required metadata keyword `code` was not found.
Before saving, please set it with `rainbow.code = ?`

In [20]:

raw_lcs = simulated.save("example-datasets/xarray/raw-light-curves.xc")

🌈🤖 The required metadata keyword `data_origin` was not found.
Before saving, please set it with `rainbow.data_origin = ?`

In [21]:

fitted_lcs = simulated.save("example-datasets/xarray/fitted-light-curves.xc")

generic text files (`.txt`, `.csv`)

Text files provide a more generally readable file format, even though they may be slower to read or write. This writer will create one giant text file that stacks the light curves for all wavelengths on top of each other (if the group_by='wavelength' keyword is set) or the spectra for all times on top of each other (if the group_by='time' keyword is set). The resulting text file should at least have columns that look like:

wavelength for wavelength in microns
time for time in days (preferably BJD$_{\rm TDB}$)
flux for flux in any units
uncertainty for flux uncertainties in the same units as flux
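
The difference between the two group_by options comes down to row order. This sketch, using made-up values and a hypothetical `stacked_rows` helper, lists the (wavelength, time) pairs in the order each grouping implies; it illustrates the idea rather than the writer's actual code.

```python
from itertools import product

wavelengths = [1.0, 2.0]  # microns (made-up)
times = [0.0, 0.1, 0.2]   # days (made-up)


def stacked_rows(group_by):
    """Return (wavelength, time) pairs in the row order each grouping implies."""
    if group_by == "wavelength":
        # One full light curve per wavelength, stacked one after another.
        return [(w, t) for w, t in product(wavelengths, times)]
    if group_by == "time":
        # One full spectrum per time, stacked one after another.
        return [(w, t) for t, w in product(times, wavelengths)]
    raise ValueError("group_by must be 'wavelength' or 'time'")


print(stacked_rows("wavelength"))
print(stacked_rows("time"))
```

Both orderings contain the same rows, so the choice only affects how convenient the file is to scan by eye or to slice with line-based tools.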

In [22]:

simulated.save("example-datasets/chromatic/test.rainbow.txt")

Other File Formats

Naturally, you might want to use readers or writers other than those already listed here, to interpret outputs from other analyses or to produce the inputs needed for various light curve analyses. We've already added a number of custom readers and writers. Here are the currently available file formats:

In [23]:

from chromatic import available_readers, available_writers

In [24]:

list(available_readers)

Out[24]:

['from_x1dints',
 'from_x1dints_kludge',
 'from_eureka_S3_txt',
 'from_eureka_SpecData',
 'from_eureka_S3',
 'from_eureka_LCData',
 'from_eureka_S4',
 'from_eureka_channels',
 'from_eureka_S5',
 'from_rainbow_npy',
 'from_rainbow_FITS',
 'from_text',
 'from_xarray_stellar_spectra',
 'from_xarray_raw_light_curves',
 'from_xarray_fitted_light_curves',
 'from_nres',
 'from_atoca',
 'from_espinoza',
 'from_dossantos',
 'from_feinstein_numpy',
 'from_feinstein_h5',
 'from_schlawin',
 'from_coulombe',
 'from_kirk_fitted_light_curves',
 'from_kirk_stellar_spectra',
 'from_radica',
 'from_aylin',
 'from_carter_and_may']

In [25]:

list(available_writers)

Out[25]:

['to_rainbow_npy',
 'to_rainbow_FITS',
 'to_xarray_stellar_spectra',
 'to_xarray_raw_light_curves',
 'to_xarray_fitted_light_curves',
 'to_text']

Adding a Custom Reader

You might want to create a new reader or writer, to allow chromatic to interact with your own datasets or tools. To facilitate this, templates are available with human-friendly instructions for how to add a new reader or writer.

If you want to try to incorporate a new format, modify the templates for a reader or writer to create your own from_abcdefgh or to_abcdefgh functions. These functions can be passed directly to the format= keyword, as in read_rainbow(filepath, format=from_abcdefgh) or rainbow.save(filepath, format=to_abcdefgh).
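
As a rough idea of what a custom reader might look like, here is a sketch that parses a small CSV file and fills the three core dictionaries. The `(rainbow, filepath)` signature, the dictionary keys, and the file layout are all assumptions for illustration, so check the actual templates before relying on any of them. A `SimpleNamespace` stands in for a Rainbow so the sketch runs without chromatic installed.

```python
import csv
import os
import tempfile
from types import SimpleNamespace


def from_abcdefgh(rainbow, filepath):
    """Hypothetical reader: fill a Rainbow-like object from a CSV file
    with wavelength, time, flux, and uncertainty columns."""
    with open(filepath, newline="") as f:
        rows = [{k: float(v) for k, v in row.items()} for row in csv.DictReader(f)]

    # Unique, sorted axes and lookup tables from value to array index.
    wavelength = sorted({row["wavelength"] for row in rows})
    time = sorted({row["time"] for row in rows})
    w_index = {w: i for i, w in enumerate(wavelength)}
    t_index = {t: j for j, t in enumerate(time)}

    # 2D grids with shape (nwave, ntime), filled with NaN until populated.
    flux = [[float("nan")] * len(time) for _ in wavelength]
    uncertainty = [[float("nan")] * len(time) for _ in wavelength]
    for row in rows:
        i, j = w_index[row["wavelength"]], t_index[row["time"]]
        flux[i][j] = row["flux"]
        uncertainty[i][j] = row["uncertainty"]

    rainbow.wavelike["wavelength"] = wavelength
    rainbow.timelike["time"] = time
    rainbow.fluxlike["flux"] = flux
    rainbow.fluxlike["uncertainty"] = uncertainty


# Stand-in for a Rainbow object (assumption: it exposes three dictionaries).
fake_rainbow = SimpleNamespace(wavelike={}, timelike={}, fluxlike={})

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "example.abcdefgh.csv")
    with open(path, "w", newline="") as f:
        f.write("wavelength,time,flux,uncertainty\n")
        f.write("1.0,0.0,1.00,0.01\n1.0,0.1,0.99,0.01\n")
        f.write("2.0,0.0,1.01,0.02\n2.0,0.1,0.98,0.02\n")
    from_abcdefgh(fake_rainbow, path)

print(fake_rainbow.fluxlike["flux"])
```

A real reader would likely store arrays rather than nested lists, possibly with physical units attached to wavelength and time; the templates are the place to confirm what a Rainbow expects.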

If you would like help implementing a new reader/writer and/or incorporating your format as a default for chromatic, please consider submitting an Issue!
