104.1. Get started with the Butler¶
Data Release: Data Preview 1
Container Size: large
LSST Science Pipelines version: v29.1.1
Last verified to run: 2025-06-21
Repository: github.com/lsst/tutorial-notebooks
Learning objective: How to use the Butler to access image data.
LSST data products: deep_coadd, visit_image
Packages: lsst.daf.butler
Credit: Originally developed by the Rubin Community Science team. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.
Get Support: Everyone is encouraged to ask questions or raise issues in the Support Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.
1. Introduction¶
The Butler is the LSST Science Pipelines interface for managing, reading, and writing datasets.
As the interface between the pipelines and the data, it is often referred to as "middleware".
Why use the Butler?
The Butler enables users to both search for and retrieve data based on its properties (sky location, band, date) without having to understand how the data files are formatted or where they're stored.
It was designed and built to interface seamlessly with the LSST Science Pipelines.
When to use the Butler.
It is recommended to use the Butler when accessing image data in the Notebook aspect.
It is necessary to use the Butler when running LSST Science Pipelines tasks on images (e.g., re-running source detection and measurement, creating custom coadds).
For most other use cases, the Table Access Protocol (TAP) and Simple Image Access (SIA) services are recommended for catalog and image data access.
This tutorial demonstrates only how to create an instance of the Butler and how to use it, in a simple and basic way, to search for and retrieve an image.
Related tutorials: The 100-level tutorials in this Butler series demonstrate advanced Butler use, such as exploring the Butler collections, dimensions, and schema, and making advanced queries for image and catalog data products.
The 100-level tutorials on image display have more on using Firefly. The 200-level tutorials on coadded and visit images have more on these data products (e.g., the pixel data and metadata).
1.1. Import packages¶
Import the Butler and Timespan classes from the lsst.daf.butler package, and the display module from the lsst.afw package (for image display).
Also import the Time class from the astropy.time module.
from lsst.daf.butler import Butler, Timespan
import lsst.afw.display as afwDisplay
from astropy.time import Time
1.2. Define parameters and functions¶
Set afwDisplay to use Firefly, and define afw_display to show images in frame 1.
afwDisplay.setDefaultBackend("firefly")
afw_display = afwDisplay.Display(frame=1)
2. Create an instance of the Butler¶
Use the Butler class (imported in Section 1.1) to instantiate a butler that is configured to access the Data Preview 1 (DP1) data release:
- repository: dp1
- collection: LSSTComCam/DP1
It is not necessary to understand how repositories and collections work, or where or how the DP1 data is stored - the Butler takes care of that for the user.
Instantiate a Butler for DP1 and assert that it exists.
butler = Butler("dp1", collections="LSSTComCam/DP1")
assert butler is not None
3. Query and retrieve images¶
The most common Butler image queries constrain the band (filter), sky location (coordinate), and/or time of observation.
To find and retrieve an image using the Butler:
- Choose the type of image (e.g., visit image, deep coadd image).
- Define query constraints (e.g., coordinate, band, time of observation).
- Query the Butler for data that meets the constraints.
- Use the Butler to retrieve an image.
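The query and retrieval steps listed above can be sketched as a single function. This is a minimal sketch, not part of the tutorial: the helper name first_matching_image is hypothetical, and it only assumes that the butler object provides the query_datasets and get methods used in the sections below. Depending on the Butler version and settings, an empty query may raise an exception instead of returning an empty list; the explicit check here covers only the empty-list case.

```python
from typing import Any, Optional


def first_matching_image(butler: Any, dataset_type: str, where: str,
                         bind: Optional[dict] = None) -> Any:
    """Query for datasets of one type and retrieve the first match.

    Hypothetical helper: `butler` is any object that provides
    query_datasets() and get() as they are used in this tutorial.
    """
    # Query the Butler for dataset references that meet the constraints.
    refs = butler.query_datasets(dataset_type, where=where, bind=bind)
    if not refs:
        # Nothing matched, so there is no image to retrieve.
        raise LookupError(f"No {dataset_type} datasets match: {where}")
    # Retrieve the image for the first returned dataset reference.
    return butler.get(refs[0])
```

With the butler created in Section 2, a call such as first_matching_image(butler, "deep_coadd", query) would combine the query and retrieval cells of Section 3.1.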
3.1. Deep coadd images¶
Dataset type name: deep_coadd.
Define an RA, Dec, and band (filter). These coordinates are near the center of the Extended Chandra Deep Field South (ECDFS).
ra = 53.076
dec = -28.110
band = 'r'
Define a query string using the coordinates and band as search constraints.
Note: The next tutorial in this series demonstrates how to determine that band.name and patch.region can be used in a Butler query. A patch is defined in Section 4, below.
query = f"band.name = '{band}' AND patch.region OVERLAPS POINT({ra}, {dec})"
print(query)
band.name = 'r' AND patch.region OVERLAPS POINT(53.076, -28.11)
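When the same kind of query is needed for several bands or coordinates, the string construction can be wrapped in a small function. This is only a sketch: point_query is a hypothetical helper name, and the region argument defaults to the patch.region field used above.

```python
def point_query(band: str, ra: float, dec: float,
                region: str = "patch.region") -> str:
    """Build a query string that constrains the band and a sky position.

    Hypothetical helper: it reproduces the f-string used in this section,
    with the region field left configurable.
    """
    return f"band.name = '{band}' AND {region} OVERLAPS POINT({ra}, {dec})"


# Same constraints as the query defined in this section.
print(point_query("r", 53.076, -28.110))
```

Python drops the trailing zero when formatting the declination, so the printed string matches the output shown above.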
Use the butler.query_datasets method to search the Butler for deep_coadd images that match the constraints of the query defined above.
Store the returned dataset references in dataset_refs.
dataset_refs = butler.query_datasets("deep_coadd", where=query)
Use the butler.get method to retrieve the image associated with the first of the returned dataset references.
deep_coadd = butler.get(dataset_refs[0])
Display the retrieved image in Firefly.
afw_display.mtv(deep_coadd)
afw_display.setMaskTransparency(100)
Clean up.
del query, dataset_refs, deep_coadd
3.2. Visit images¶
Dataset type name: visit_image.
Use the same RA, Dec, and band as in Section 3.1.
Define a time span that covers one night early in the series of observations with ComCam.
Note: Use International Atomic Time (Temps Atomique International; TAI) to define astropy times, because the Butler uses (and expects) TAI times. Times can be defined as calendar dates or as MJD, as in time1 and time2 below.
time1 = Time("2024-11-09T00:00:00.0", format="isot", scale="tai")
time2 = Time(60624.0, format="mjd", scale="tai")
timespan = Timespan(time1, time2)
del time1, time2
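To see how the two time formats relate, the conversion between MJD and calendar dates can be sketched with the Python standard library alone. This is an illustration rather than a replacement for astropy: it assumes only the standard definition of MJD (day 0 is 1858-11-17T00:00:00), it ignores time scales entirely (so it does not account for the offset between TAI and UTC), and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# MJD 0 corresponds to 1858-11-17T00:00:00 (no time scale handling here).
MJD_EPOCH = datetime(1858, 11, 17)


def mjd_to_isot(mjd: float) -> str:
    """Convert an MJD value to an ISO calendar string."""
    return (MJD_EPOCH + timedelta(days=mjd)).isoformat()


def isot_to_mjd(isot: str) -> float:
    """Convert an ISO calendar string to an MJD value."""
    return (datetime.fromisoformat(isot) - MJD_EPOCH) / timedelta(days=1)


print(mjd_to_isot(60624.0))
print(isot_to_mjd("2024-11-09T00:00:00"))
```

Under these assumptions, MJD 60624.0 corresponds to 2024-11-10T00:00:00, so the time span defined above covers one full day starting at 2024-11-09T00:00:00.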
Instead of defining a full query statement as in Section 3.1, use the bind functionality to define the parameters in a dictionary, then pass them to the Butler.
Use colons (:) in the query statement to mark the names that should be replaced with their values from the bind dictionary.
dataset_refs = butler.query_datasets(
    "visit_image",
    where="band.name = :band AND "
          "visit.timespan OVERLAPS :timespan AND "
          "visit_detector_region.region OVERLAPS POINT(:ra, :dec)",
    bind={"band": band, "timespan": timespan, "ra": ra, "dec": dec},
    order_by=["visit.timespan.begin"],
)
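Because every colon-prefixed name in the query statement needs a matching key in the bind dictionary, a quick consistency check before querying can catch typos. The following is a simplistic sketch: unbound_placeholders is a hypothetical helper, not part of the Butler, and its regular expression would also match a colon followed by a letter inside a quoted string literal.

```python
import re


def unbound_placeholders(where: str, bind: dict) -> list:
    """Return the placeholder names in `where` that `bind` does not define.

    Hypothetical helper: a placeholder is taken to be a colon followed by
    a Python-style identifier, as in the query statement above.
    """
    names = set(re.findall(r":([A-Za-z_]\w*)", where))
    return sorted(names - set(bind))


where = ("band.name = :band AND "
         "visit.timespan OVERLAPS :timespan AND "
         "visit_detector_region.region OVERLAPS POINT(:ra, :dec)")

# The timespan key is deliberately left out to show what is reported.
print(unbound_placeholders(where, {"band": "r", "ra": 53.076, "dec": -28.110}))
```

An empty list means that every placeholder in the statement has a value to be substituted.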
Use the butler.get method to retrieve the visit_image for the first of the returned dataset references.
visit_image = butler.get(dataset_refs[0])
Display the retrieved image in Firefly.
afw_display.mtv(visit_image)
afw_display.setMaskTransparency(100)
Clean up.
del ra, dec, band, timespan, dataset_refs, visit_image
4. Retrieve images by dataId¶
The dataId is a dictionary whose components uniquely identify a data product of a given dataset type.
If the components of a dataId are already known, data products can be retrieved from the Butler without a query.
4.1. Deep coadd images¶
The deep_coadd images are uniquely identified by their band (filter), tract, patch, and the skymap.
The necessary dataId components for a deep_coadd are:
- band: An LSST filter (u, g, r, i, z, or y).
- tract: A subsection of the all-sky tessellation of the LSST sky map (2.8 sq. deg).
- patch: A subsection of a tract, about the size of an LSSTCam detector (0.028 sq. deg).
- skymap: The all-sky tessellation of LSST tracts and patches.
Patches and tracts overlap at their edges. In Section 3.1, two deep coadd images are returned as containing the search coordinates because those coordinates are near the edge of two adjacent patches.
Re-execute the query from Section 3.1, and print the dataIds for the two dataset references.
ra = 53.076
dec = -28.110
band = "r"
dataset_refs = butler.query_datasets(
    "deep_coadd",
    where="band.name = :band AND "
          "patch.region OVERLAPS POINT(:ra, :dec)",
    bind={"band": band, "ra": ra, "dec": dec},
    order_by=["tract", "patch", "band"],
)
for ref in dataset_refs:
print(ref.dataId)
{band: 'r', skymap: 'lsst_cells_v1', tract: 5063, patch: 14}
{band: 'r', skymap: 'lsst_cells_v1', tract: 5063, patch: 15}
Retrieve an image by passing the components of the dataId as keyword arguments.
deep_coadd = butler.get("deep_coadd", band="r", tract=5063, patch=14,
                        skymap="lsst_cells_v1")
Option: define the dataId as a dictionary named dataId, and pass it to butler.get.
dataId = {"band": "r", "skymap": "lsst_cells_v1", "tract": 5063, "patch": 14}
deep_coadd = butler.get("deep_coadd", dataId=dataId)
del dataId
Option to display the image, same as in Section 3.1.
# afw_display.mtv(deep_coadd)
# afw_display.setMaskTransparency(100)
del deep_coadd
4.2. Visit images¶
A visit_image is uniquely identified by the visit identifier (visit) and detector number.
The band does not need to be specified for the dataId of a visit_image, because the band is a property of the visit.
In other words, a visit (also referred to as an observation or an exposure) uses one filter.
Retrieve the visit_image for visit identifier 2024110800254, detector 5.
This is one of the dataset references returned by the query in Section 3.2.
visit_image = butler.get("visit_image", visit=2024110800254, detector=5)
Option: instead define the dataId for the visit and detector as a dictionary, and pass it to butler.get().
# dataId = {"visit": 2024110800254, "detector": 5}
# visit_image = butler.get("visit_image", dataId=dataId)
Option to display the visit_image.
# afw_display.mtv(visit_image)
# afw_display.setMaskTransparency(100)
Print the filter.
visit_image.filter
FilterLabel(band="r", physical="r_03")
Print the date.
visit_image.visitInfo.date
DateTime("2024-11-09T06:21:27.168494104", TAI)
del visit_image
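The dataId components described in Sections 4.1 and 4.2 can be checked before calling butler.get, which gives a clearer message than a failed retrieval. This is a sketch under stated assumptions: the REQUIRED_KEYS table only restates the components listed in this tutorial for these two dataset types, and missing_data_id_keys is a hypothetical helper, not a Butler function.

```python
# Components named in this tutorial for each dataset type (an assumption
# of this sketch, not something read from the Butler itself).
REQUIRED_KEYS = {
    "deep_coadd": {"band", "skymap", "tract", "patch"},
    "visit_image": {"visit", "detector"},
}


def missing_data_id_keys(dataset_type: str, data_id: dict) -> list:
    """Return the required dataId components that `data_id` lacks."""
    return sorted(REQUIRED_KEYS[dataset_type] - set(data_id))


coadd_id = {"band": "r", "skymap": "lsst_cells_v1", "tract": 5063, "patch": 14}
visit_id = {"visit": 2024110800254}

print(missing_data_id_keys("deep_coadd", coadd_id))
print(missing_data_id_keys("visit_image", visit_id))
```

A complete dictionary can also be unpacked into the keyword-argument form shown in Section 4.1, for example butler.get("deep_coadd", **coadd_id).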
4.3. Retrieving metadata only¶
Image metadata can be retrieved without the pixel data.
For example, retrieve only the filter metadata.
visit_image_filter = butler.get("visit_image.filter", visit=2024110800254, detector=5)
visit_image_filter
FilterLabel(band="r", physical="r_03")
del visit_image_filter
5. Exercises for the learner¶
Use a Butler query to find r-band visit images that overlap RA, Dec = 106.3, -10.4 (the Seagull nebula) and were obtained on UTC date Dec 3, 2024.
There should be six such images.
Retrieve and display the visit_image for visit number 2024120200214 in the Firefly window.