104.1. Get started with the Butler¶

For the Rubin Science Platform at data.lsst.cloud.
Data Release: Data Preview 1
Container Size: large
LSST Science Pipelines version: v29.1.1
Last verified to run: 2025-06-21
Repository: github.com/lsst/tutorial-notebooks

Learning objective: How to use the Butler to access image data.

LSST data products: deep_coadd, visit_image

Packages: lsst.daf.butler

Credit: Originally developed by the Rubin Community Science team. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.

Get Support: Everyone is encouraged to ask questions or raise issues in the Support Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.

1. Introduction¶

The Butler is the LSST Science Pipelines interface for managing, reading, and writing datasets.

As the interface between the pipelines and the data, it is often referred to as "middleware".

Why use the Butler?

The Butler enables users to both search for and retrieve data based on its properties (sky location, band, date) without having to understand how the data files are formatted or where they're stored.

It was designed and built to interface seamlessly with the LSST Science Pipelines.

When to use the Butler.

It is recommended to use the Butler when accessing image data in the Notebook aspect.

It is necessary to use the Butler when running LSST Science Pipelines tasks on images (e.g., re-running source detection and measurement, creating custom coadds).

For most other use cases, the Table Access Protocol (TAP) and Simple Image Access (SIA) services are recommended for catalog and image data access.

This tutorial demonstrates only how to create an instance of the Butler and how to use it, in a simple and basic way, to search for and retrieve an image.

Related tutorials: The 100-level tutorials in this Butler series demonstrate advanced Butler use, such as exploring the Butler collections, dimensions, and schema, and making advanced queries for image and catalog data products.

The 100-level tutorials on image display have more on using Firefly. The 200-level tutorials on coadded and visit images have more on these data products (e.g., the pixel data and metadata).

1.1. Import packages¶

Import the Butler and Timespan classes from the lsst.daf.butler package, and the display module from the lsst.afw package (for image display). Also import the Time class from the astropy.time module.

In [1]:

from lsst.daf.butler import Butler, Timespan
import lsst.afw.display as afwDisplay
from astropy.time import Time

1.2. Define parameters and functions¶

Set afwDisplay to use Firefly, and define afw_display to show images in frame 1.

In [2]:

afwDisplay.setDefaultBackend("firefly")
afw_display = afwDisplay.Display(frame=1)

2. Create an instance of the Butler¶

Use the Butler class (imported in Section 1.1) to instantiate a butler that is configured to access the Data Preview 1 (DP1) data release:

repository: dp1
collection: LSSTComCam/DP1

It is not necessary to understand how repositories and collections work, or where or how the DP1 data is stored - the Butler takes care of that for the user.

Instantiate a Butler for DP1 and assert that it exists.

In [3]:

butler = Butler("dp1", collections="LSSTComCam/DP1")
assert butler is not None

3. Query and retrieve images¶

The most common Butler image queries constrain the band (filter), sky location (coordinate), and/or time of observation.

To find and retrieve an image using the Butler:

Choose the type of image (e.g., visit image, deep coadd image).
Define query constraints (e.g., coordinate, band, time of observation).
Query the Butler for data that meets the constraints.
Use the Butler to retrieve an image.
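
The four steps above can be sketched as a single helper. This is a minimal sketch and not part of the tutorial notebook: the function name find_first_image is hypothetical, and it relies only on the two Butler methods used in this tutorial, query_datasets and get. A small stand-in class takes the place of a real Butler so that the block runs anywhere. On the Rubin Science Platform, pass the butler created in Section 2 instead.

```python
def find_first_image(butler, dataset_type, where, bind=None):
    """Query for datasets of one type and retrieve the first match.

    Hypothetical helper: relies only on butler.query_datasets and
    butler.get, the two methods used in this tutorial.
    """
    # Step 3: query the Butler for datasets that meet the constraints.
    dataset_refs = butler.query_datasets(dataset_type, where=where, bind=bind)
    # Guard against an empty result before indexing.
    if not dataset_refs:
        return None
    # Step 4: retrieve the image for the first dataset reference.
    return butler.get(dataset_refs[0])


class StandInButler:
    """Stand-in for a real Butler so the sketch runs anywhere (assumption)."""

    def query_datasets(self, dataset_type, where="", bind=None):
        # Pretend that two datasets match the constraints.
        return [f"{dataset_type}-ref-0", f"{dataset_type}-ref-1"]

    def get(self, dataset_ref):
        return f"image for {dataset_ref}"


# Steps 1 and 2: choose the image type and define the constraints.
image = find_first_image(
    StandInButler(),
    "deep_coadd",
    where="band.name = :band AND patch.region OVERLAPS POINT(:ra, :dec)",
    bind={"band": "r", "ra": 53.076, "dec": -28.110},
)
print(image)
```

Keeping the query and the retrieval in one function makes it easy to swap the dataset type or the constraints without repeating the two calls.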

3.1. Deep coadd images¶

Dataset type name: deep_coadd.

Define an RA, Dec, and band (filter). These coordinates are near the center of the Extended Chandra Deep Field South (ECDFS).

In [4]:

ra = 53.076
dec = -28.110
band = 'r'

Define a query string using the coordinates and band as search constraints.

Note: How to figure out that band.name and patch.region can be used in a Butler query is demonstrated in the next tutorial in this series. A patch is defined in Section 4, below.

In [5]:

query = f"band.name = '{band}' AND patch.region OVERLAPS POINT({ra}, {dec})"
print(query)

band.name = 'r' AND patch.region OVERLAPS POINT(53.076, -28.11)
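
If the same kind of query is needed for several bands or coordinates, the f-string can be wrapped in a small function. This is a minimal sketch; the function name coadd_query is hypothetical and not part of the Butler.

```python
def coadd_query(ra, dec, band):
    """Build the deep_coadd query string used above (hypothetical helper)."""
    return f"band.name = '{band}' AND patch.region OVERLAPS POINT({ra}, {dec})"


# Same coordinates as above, in two different bands.
query_r = coadd_query(53.076, -28.110, "r")
query_g = coadd_query(53.076, -28.110, "g")
print(query_r)
print(query_g)
```

Python formats the float -28.110 as -28.11, which is why the printed query drops the trailing zero. The value passed to the Butler is unchanged.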

Use the butler.query_datasets method to search the Butler for deep_coadd images that match the search constraints of the defined query.

Store the returned dataset references in dataset_refs.

In [6]:

dataset_refs = butler.query_datasets("deep_coadd", where=query)

Use the butler.get method to retrieve the image associated with the first of the returned datasets.

In [7]:

deep_coadd = butler.get(dataset_refs[0])

Display the retrieved image in Firefly.

In [8]:

afw_display.mtv(deep_coadd)
afw_display.setMaskTransparency(100)

Clean up.

In [9]:

del query, dataset_refs, deep_coadd

3.2. Visit images¶

Dataset type name: visit_image.

Use the same RA, Dec, and band as in Section 3.1.

Define a time span that is one night, early on in the series of observations with ComCam.

Note: use International Atomic Time (Temps Atomique International; TAI) to define astropy times, as the Butler uses (and expects) TAI times. Times can be defined as calendar dates or MJD, as in time1 and time2 below.

In [10]:

time1 = Time("2024-11-09T00:00:00.0", format="isot", scale="tai")
time2 = Time(60624.0, format="mjd", scale="tai")
timespan = Timespan(time1, time2)
del time1, time2
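
To check that the calendar date and the MJD above bracket exactly one day, the conversion can be done with the Python standard library alone. This is a minimal sketch: the function name isot_to_mjd is hypothetical, and it assumes both values are on the same time scale (TAI here), so plain calendar arithmetic is enough and no leap-second handling is needed.

```python
from datetime import datetime

# MJD 0 corresponds to 1858-11-17T00:00:00.
MJD_EPOCH = datetime(1858, 11, 17)


def isot_to_mjd(isot):
    """Convert an ISO date-time string to MJD on the same time scale.

    Hypothetical helper; microseconds are ignored for simplicity.
    """
    delta = datetime.fromisoformat(isot) - MJD_EPOCH
    return delta.days + delta.seconds / 86400.0


mjd_start = isot_to_mjd("2024-11-09T00:00:00")
mjd_end = 60624.0
print(mjd_start, mjd_end - mjd_start)
```

The start time corresponds to MJD 60623.0, so the time span covers one full day.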

Instead of defining a full query statement as in Section 3.1, use the bind functionality to define the parameters in a dictionary, then pass them to the Butler. Use colons (:) in the query statement to indicate which words should be replaced with their value from the bind dictionary.

In [11]:

dataset_refs = butler.query_datasets("visit_image",
                                     where="band.name = :band AND \
                                     visit.timespan OVERLAPS :timespan AND \
                                     visit_detector_region.region OVERLAPS POINT(:ra, :dec)",
                                     bind={"band": band, "timespan": timespan,
                                           "ra": ra, "dec": dec},
                                     order_by=["visit.timespan.begin"])
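
A common mistake with bind parameters is a placeholder in the query statement that has no matching key in the dictionary. The sketch below checks for that before a query is sent. It is an illustration only: the function name missing_bind_keys is hypothetical, and the regular expression is a simple approximation, not the parser that the Butler itself uses.

```python
import re


def missing_bind_keys(where, bind):
    """Return the placeholder names in `where` that have no key in `bind`.

    Hypothetical helper. A placeholder is taken to be a colon followed by
    an identifier, where the colon is not preceded by a word character.
    """
    names = set(re.findall(r"(?<![\w:]):([A-Za-z_]\w*)", where))
    return sorted(names - set(bind))


# Adjacent string literals are joined by Python, which avoids embedding
# the indentation of continuation lines in the query string.
where = ("band.name = :band AND "
         "visit.timespan OVERLAPS :timespan AND "
         "visit_detector_region.region OVERLAPS POINT(:ra, :dec)")

# None stands in for the Timespan object, which is not needed for the check.
bind = {"band": "r", "timespan": None, "ra": 53.076, "dec": -28.110}

print(missing_bind_keys(where, bind))
print(missing_bind_keys(where, {"band": "r"}))
```

The first call reports no missing keys. The second reports dec, ra, and timespan.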

Use the butler.get method to retrieve the visit_image for the first of the returned dataset references.

In [12]:

visit_image = butler.get(dataset_refs[0])

Display the retrieved image in Firefly.

In [13]:

afw_display.mtv(visit_image)
afw_display.setMaskTransparency(100)

Clean up.

In [14]:

del ra, dec, band, timespan, dataset_refs, visit_image

4. Retrieve images by dataId¶

The dataId is a dictionary, the components of which uniquely identify a data product of a given dataset type.

If the components of a dataId are already known, data products can be retrieved from the Butler without a query.

4.1. Deep coadd images¶

The deep_coadd images are uniquely identified by their band (filter), tract, patch, and the skymap.

The necessary dataId components for a deep_coadd are:

band: An LSST filter (u, g, r, i, z, or y).
tract: A subsection of the all-sky tessellation of the LSST sky map (2.8 sq. deg).
patch: A subsection of a tract, about the size of an LSSTCam detector (0.028 sq. deg).
skymap: The all-sky tessellation of LSST tracts and patches.

Patches and tracts overlap at their edges. In Section 3.1, two deep coadd images are returned as containing the search coordinates because those coordinates are near the edge of two adjacent patches.

Re-execute the query from Section 3.1, and print the dataIds for the two dataset references.

In [15]:

ra = 53.076
dec = -28.110
band = "r"
dataset_refs = butler.query_datasets("deep_coadd",
                                     where="band.name = :band AND \
                                     patch.region OVERLAPS POINT(:ra, :dec)",
                                     bind={"band": band, "ra": ra, "dec": dec},
                                     order_by=["tract", "patch", "band"])
for ref in dataset_refs:
    print(ref.dataId)

{band: 'r', skymap: 'lsst_cells_v1', tract: 5063, patch: 14}
{band: 'r', skymap: 'lsst_cells_v1', tract: 5063, patch: 15}

Retrieve an image by passing the components of the dataId as keyword arguments.

In [16]:

deep_coadd = butler.get("deep_coadd", band="r", tract=5063, patch=14,
                        skymap="lsst_cells_v1")

Option: define the dataId as a dictionary named dataId and pass it to butler.get.

In [17]:

dataId = {"band": "r", "skymap": "lsst_cells_v1", "tract": 5063, "patch": 14}
deep_coadd = butler.get("deep_coadd", dataId=dataId)
del dataId
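
Because a dataId is an ordinary dictionary, the dataIds for several related images can be built from a shared base. This is a minimal sketch that uses the two patches printed by the query above. The variable names base and data_ids are illustrative, and the butler.get call is left as a comment so that the block runs without a Butler.

```python
# Components shared by both deep_coadd images returned by the query.
base = {"band": "r", "skymap": "lsst_cells_v1", "tract": 5063}

# One dataId per patch; {**base, ...} copies base and adds the patch.
data_ids = [{**base, "patch": patch} for patch in (14, 15)]

for data_id in data_ids:
    print(data_id)
    # With a real Butler:
    # deep_coadd = butler.get("deep_coadd", dataId=data_id)
```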

Option to display the image, same as in Section 3.1.

In [18]:

# afw_display.mtv(deep_coadd)
# afw_display.setMaskTransparency(100)

In [19]:

del deep_coadd

4.2. Visit images¶

A visit_image is uniquely identified by the visit identifier (visit) and detector number.

The band does not need to be specified for the dataId of a visit_image, because the band is a property of the visit. In other words, a visit - also referred to as an observation or an exposure - uses one filter.

Retrieve the visit_image for visit identifier 2024110800254, detector 5. This is one of the dataset references returned by the query in Section 3.2.

In [20]:

visit_image = butler.get("visit_image", visit=2024110800254, detector=5)

Option to instead define the dataId for the visit and detector, and pass the dataId to butler.get().

In [21]:

# dataId = {"visit": 2024110800254, "detector": 5}
# visit_image = butler.get("visit_image", dataId=dataId)

Option to display the visit_image.

In [22]:

# afw_display.mtv(visit_image)
# afw_display.setMaskTransparency(100)

Print the filter.

In [23]:

visit_image.filter

Out[23]:

FilterLabel(band="r", physical="r_03")

Print the date.

In [24]:

visit_image.visitInfo.date

Out[24]:

DateTime("2024-11-09T06:21:27.168494104", TAI)

In [25]:

del visit_image
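
The visit identifier used above appears to encode the observing night. The sketch below rests on two assumptions that this tutorial does not state: that a visit identifier is an 8-digit observing day followed by a 5-digit sequence number, and that the observing day is the calendar date 12 hours behind UTC. The 37-second offset between TAI and UTC is ignored because it does not change the date here.

```python
from datetime import datetime, timedelta

visit = 2024110800254

# Assumption: visit = observing day * 100000 + sequence number.
day_obs, seq_num = divmod(visit, 100_000)
print(day_obs, seq_num)

# Date printed for this visit_image above (TAI, treated as UTC here).
obs_time = datetime.fromisoformat("2024-11-09T06:21:27")

# Assumption: the observing day is the date 12 hours behind UTC.
night = (obs_time - timedelta(hours=12)).strftime("%Y%m%d")
print(night)
```

Under these assumptions, an image taken in the early hours of 2024-11-09 belongs to observing day 20241108, which matches the leading digits of the visit identifier.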

4.3. Retrieving metadata only¶

Image metadata can be retrieved without the pixel data.

For example, retrieve only the filter metadata.

In [26]:

visit_image_filter = butler.get("visit_image.filter", visit=2024110800254, detector=5)

In [27]:

visit_image_filter

Out[27]:

FilterLabel(band="r", physical="r_03")

In [28]:

del visit_image_filter

5. Exercises for the learner¶

Use a Butler query to find r-band visit images that overlap RA, Dec = 106.3, -10.4 (the Seagull nebula) and were obtained on UTC date Dec 3, 2024. There should be six such images. Retrieve and display the visit_image for visit number 2024120200214 in the Firefly window.

In [ ]: