104.3. Image queries with the Butler¶
Data Release: Data Preview 1
Container Size: large
LSST Science Pipelines version: r29.1.1
Last verified to run: 2025-06-21
Repository: github.com/lsst/tutorial-notebooks
Learning objective: How to query and retrieve image data with the Butler.
LSST data products: visit_image, deep_coadd
Packages: lsst.daf.butler
Credit: Originally developed by the Rubin Community Science team. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.
Get Support: Everyone is encouraged to ask questions or raise issues in the Support Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.
1. Introduction¶
The Butler is LSST Science Pipelines middleware for managing, reading, and writing datasets.
As the interface between the pipelines and the data, it is often referred to as "middleware".
Butler-related documentation:
- pipelines middleware Frequently Asked Questions
- Butler python module documentation
- Butler query expressions and operators
This tutorial demonstrates the individual components of Butler queries and how to compose expressions using the allowed operators.
Related tutorials: The earlier 100-level Butler tutorials in this series show how to retrieve data with the Butler and how to explore and discover the dataset types and their properties.
1.1. Import packages¶
Import the Butler and Timespan classes from the lsst.daf.butler module, and the display module from the lsst.afw package (for image display).
Import the lsst.sphgeom and lsst.geom packages to define sky regions and points.
Also import the Time class from the astropy.time module.
from lsst.daf.butler import Butler, Timespan
import lsst.afw.display as afwDisplay
import lsst.sphgeom as sphgeom
import lsst.geom as geom
from astropy.time import Time
1.2. Define parameters¶
Create an instance of the Butler with the repository and collection for DP1, and assert that it exists.
butler = Butler("dp1", collections="LSSTComCam/DP1")
assert butler is not None
Set afwDisplay to use Firefly.
afwDisplay.setDefaultBackend("firefly")
Define inputs to use in all query demonstrations (these are arbitrary, just for the purposes of this tutorial).
Coordinates: Use coordinates RA, Dec = $53.076, -28.110$ deg, which are near the center of the Extended Chandra Deep Field South (ECDFS).
ra = 53.076
dec = -28.110
Region: Define a circle with a radius of 2 deg, centered on the coordinates.
region = sphgeom.Region.from_ivoa_pos("CIRCLE 53.076 -28.110 2.0")
Tract and patch: Define the coordinates as a point, and use the skymap to get the tract and patch that cover that point.
point = geom.SpherePoint(ra*geom.degrees, dec*geom.degrees)
skymap = butler.get("skyMap", skymap="lsst_cells_v1")
tract = skymap.findTract(point).tract_id
patch = skymap.findTract(point).findPatch(point).getSequentialIndex()
print(tract, patch)
5063 15
Filter: Use the $r$-band.
band = 'r'
Time and timespan: Always use International Atomic Time (Temps Atomique International; TAI) times with the Butler.
Define a time that is the MJD at the midpoint of the visit image with visitId = 2024110800263.
Define a timespan that covers one night (MJD 60623.0 to 60624.0), early on in the series of observations with ComCam.
mjd = 60623.27103473955
time = Time(mjd, format="mjd", scale="tai")
time1 = Time(60623.0, format="mjd", scale="tai")
time2 = Time(60624.0, format="mjd", scale="tai")
timespan = Timespan(time1, time2)
timespan
del time1, time2
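As a sanity check on the MJD values above, it can help to see which calendar dates they correspond to. The following is a minimal sketch using only the Python standard library; the helper name mjd_to_datetime is hypothetical, and it does no time-scale conversion (a TAI MJD simply yields a TAI calendar time).

```python
from datetime import datetime, timedelta

# MJD 0.0 corresponds to 1858-11-17 00:00.
MJD_EPOCH = datetime(1858, 11, 17)


def mjd_to_datetime(mjd):
    """Convert a Modified Julian Date to a naive datetime.

    Hypothetical helper for illustration. No time-scale handling is done:
    the result is in the same scale as the input MJD.
    """
    return MJD_EPOCH + timedelta(days=mjd)


# MJD values copied from the cell above.
window_start = mjd_to_datetime(60623.0)
window_end = mjd_to_datetime(60624.0)
midpoint = mjd_to_datetime(60623.27103473955)

print(window_start.isoformat(), "to", window_end.isoformat())
print("visit midpoint:", midpoint.isoformat(timespec="seconds"))
```

In the notebook itself, the astropy Time objects already defined can provide the same information; this sketch only avoids the dependency.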
Visit list: Use a set of visit identifiers that are the first 10 $r$-band visits on MJD 60623.
visit_set = (2024110800246, 2024110800247, 2024110800250,
2024110800251, 2024110800254, 2024110800255,
2024110800258, 2024110800259, 2024110800262,
2024110800263)
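The visit identifiers in this list all start with the same eight digits, which look like an observing date followed by a five-digit sequence number. The sketch below splits an identifier on that assumption; the pattern is inferred from the identifiers used in this tutorial, not from a documented guarantee, and split_visit_id is a hypothetical helper.

```python
def split_visit_id(visit_id):
    """Split a visit identifier into (day_obs, sequence number).

    Assumes the last five digits are a per-night sequence number and the
    leading digits are the observing day as YYYYMMDD. This is an inferred
    pattern, used here for illustration only.
    """
    day_obs, seq_num = divmod(visit_id, 100_000)
    return day_obs, seq_num


# Identifiers copied from the cell above.
visit_set = (2024110800246, 2024110800247, 2024110800250,
             2024110800251, 2024110800254, 2024110800255,
             2024110800258, 2024110800259, 2024110800262,
             2024110800263)

for visit_id in visit_set:
    print(visit_id, split_visit_id(visit_id))
```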
2. Query formation¶
In this tutorial, every call to the Butler's query_datasets method passes at least a dataset_type (e.g., an image type such as visit_image or deep_coadd) and a where statement (a string expression resembling an SQL WHERE clause).
The where statement can use bind parameters.
The order_by and limit parameters are optional.
butler.query_datasets(<dataset_type>,
where=<query>,
bind=<bind_dictionary>,
order_by=<dimension_list>,
limit=<integer>)
The query_datasets method returns the dataset references for all data that meet the query constraints.
A dataset reference can then be passed to the butler.get method to retrieve the data itself.
Note that there is also a find_dataset method; however, it is not for querying the Butler.
Rather, it can be used to retrieve the dataset reference when a dataId is already known (documentation for find_dataset).
2.1. Dataset type¶
Dataset types for DP1 processed images:
deep_coadd
visit_image
difference_image
template_coadd
Print the information for each dataset type.
image_types = ['deep_coadd', 'visit_image',
'difference_image', 'template_coadd']
for itype in image_types:
print('')
print(itype)
print(butler.get_dataset_type(itype))
print('required: ', butler.get_dataset_type(itype).dimensions.required)

deep_coadd
DatasetType('deep_coadd', {band, skymap, tract, patch}, ExposureF)
required: {band, skymap, tract, patch}

visit_image
DatasetType('visit_image', {band, instrument, day_obs, detector, physical_filter, visit}, ExposureF)
required: {instrument, detector, visit}

difference_image
DatasetType('difference_image', {band, instrument, day_obs, detector, physical_filter, visit}, ExposureF)
required: {instrument, detector, visit}

template_coadd
DatasetType('template_coadd', {band, skymap, tract, patch}, ExposureF)
required: {band, skymap, tract, patch}
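The output above lists both the full set of dimensions and the required subset for each dataset type. The sketch below uses plain Python sets, with names copied from that output, to show which dimensions are left over once the required ones are removed; these are the ones that follow from the required keys (for example, the band of a given visit). The dictionaries are illustrative stand-ins, not Butler objects.

```python
# Dimension names copied from the printed output above.
all_dims = {
    "deep_coadd": {"band", "skymap", "tract", "patch"},
    "visit_image": {"band", "instrument", "day_obs", "detector",
                    "physical_filter", "visit"},
}
required_dims = {
    "deep_coadd": {"band", "skymap", "tract", "patch"},
    "visit_image": {"instrument", "detector", "visit"},
}

# Dimensions that are present but not required to identify a dataset.
not_required = {name: all_dims[name] - required_dims[name]
                for name in all_dims}

for name, dims in not_required.items():
    print(name, "not required:", sorted(dims))
```

This is why a visit_image can be identified by instrument, detector, and visit alone, while a deep_coadd needs all four of its dimensions.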
2.2. Where statement¶
The where statement can be used to place constraints on the dimensions and fields of a given dataset type.
For a deep_coadd, the dimensions are sky location (coordinate, or patch and tract) and filter.
For an individual visit_image, the dimensions include the filter and time of the observation.
Uncomment the following cell to print the dimensions and schema for the visit_image dataset type.
# dataset_type = butler.get_dataset_type('visit_image')
# for dimension in dataset_type.dimensions.data_coordinate_keys:
# print('dimension = ', dimension)
# print(butler.dimensions[dimension].schema)
# print(' ')
Execute a query for data with dataset type visit_image that were obtained in the $r$-band and overlap the defined coordinates.
query = f"band.name = '{band}' AND \
visit_detector_region.region OVERLAPS POINT({ra}, {dec})"
print(query)
dataset_refs = butler.query_datasets("visit_image",
where=query)
print(len(dataset_refs))
del query, dataset_refs
band.name = 'r' AND visit_detector_region.region OVERLAPS POINT(53.076, -28.11)
214
2.3. Bind parameters¶
Recreate the same query as above, but in the query statement use bind parameters (placeholders) and define the dictionary bind_params to hold the values.
query = "band.name = :band AND " \
"visit_detector_region.region OVERLAPS POINT(:ra, :dec)"
print(query)
bind_params = {"band": band, "ra": ra, "dec": dec}
print(bind_params)
band.name = :band AND visit_detector_region.region OVERLAPS POINT(:ra, :dec)
{'band': 'r', 'ra': 53.076, 'dec': -28.11}
dataset_refs = butler.query_datasets("visit_image",
where=query,
bind=bind_params)
print(len(dataset_refs))
del dataset_refs
214
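When a query statement and its bind dictionary are built in separate places, a placeholder can easily end up without a value. The following is a minimal sketch of a pre-flight check that needs no Butler; both helper names are hypothetical, and the placeholder detection is deliberately simplistic.

```python
import re


def point_overlap_query(band, ra, dec):
    """Return the (where, bind) pair used in this section."""
    where = ("band.name = :band AND "
             "visit_detector_region.region OVERLAPS POINT(:ra, :dec)")
    bind = {"band": band, "ra": ra, "dec": dec}
    return where, bind


def unbound_placeholders(where, bind):
    """Return placeholder names in `where` that have no entry in `bind`.

    Simplistic: every ':name' token is treated as a placeholder, so a
    colon followed by a letter inside a quoted literal would confuse it.
    """
    names = set(re.findall(r":([A-Za-z_]\w*)", where))
    return sorted(names - set(bind))


where, bind = point_overlap_query("r", 53.076, -28.110)
print(where)
print(bind)
print("unbound:", unbound_placeholders(where, bind))

# With a Butler instance available, the pair would be passed as:
# dataset_refs = butler.query_datasets("visit_image", where=where, bind=bind)
```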
2.4. Order by¶
The order_by parameter accepts a list of strings specifying the sort keys.
Results can be sorted by dimension name (e.g., band, visit, detector) or by dimension name and schema field (e.g., band.name, visit.day_obs, visit.timespan.begin).
dataset_refs = butler.query_datasets("visit_image",
where=query,
bind=bind_params,
order_by=["visit.timespan.begin"])
print(dataset_refs[0])
del dataset_refs
visit_image@{instrument: 'LSSTComCam', detector: 0, visit: 2024110800246, band: 'r', day_obs: 20241108, physical_filter: 'r_03'} [sc=ExposureF] (run=LSSTComCam/runs/DRP/DP1/DM-51335 id=973bf9a0-cb8c-44e4-9cce-6a68a1a483a2)
Use a - in front of the string to sort in reverse (descending) order.
dataset_refs = butler.query_datasets("visit_image",
where=query,
bind=bind_params,
order_by=["-visit.timespan.begin"])
print(dataset_refs[0])
del dataset_refs
visit_image@{instrument: 'LSSTComCam', detector: 4, visit: 2024121000426, band: 'r', day_obs: 20241210, physical_filter: 'r_03'} [sc=ExposureF] (run=LSSTComCam/runs/DRP/DP1/DM-51335 id=af26489b-0afc-4801-ba93-63a333a5e29f)
2.5. Limit¶
Use the limit parameter to limit the number of results.
This is recommended when testing queries.
dataset_refs = butler.query_datasets("visit_image",
where=query,
bind=bind_params,
order_by=["visit.timespan.begin"],
limit=10)
print(len(dataset_refs))
del dataset_refs
10
Pass a negative limit value to have the query issue a warning if more results were available than were returned.
dataset_refs = butler.query_datasets("visit_image",
where=query,
bind=bind_params,
order_by=["visit.timespan.begin"],
limit=-10)
print(len(dataset_refs))
del dataset_refs
lsst.daf.butler._butler WARNING: More datasets are available than the requested limit of 10.
10
del query, bind_params
3. Spatial queries for images¶
3.1. Overlaps point¶
Execute a query for data with dataset type visit_image that overlap the defined coordinates.
Unlike in Section 2.2, do not include a constraint on the filter (band).
For this example, embed the coordinates in the query string instead of using bind parameters.
query = f"visit_detector_region.region OVERLAPS POINT({ra}, {dec})"
dataset_refs = butler.query_datasets("visit_image",
where=query)
print(len(dataset_refs))
del query, dataset_refs
779
Execute a query for data with dataset type deep_coadd that overlap the defined coordinates.
Twelve results are returned because two of the skymap patches overlap these coordinates, and there are six LSST filters.
query = f"patch.region OVERLAPS POINT({ra}, {dec})"
dataset_refs = butler.query_datasets("deep_coadd",
where=query)
print(len(dataset_refs))
del query, dataset_refs
12
3.2. Overlaps region¶
Execute a query for data with dataset type visit_image that overlap the defined region.
Since the region is larger, significantly more visit images are returned.
For this example, use bind parameters for the query constraints.
query = "visit_detector_region.region OVERLAPS :region"
bind_params = {'region': region}
dataset_refs = butler.query_datasets("visit_image",
where=query,
bind=bind_params)
print(len(dataset_refs))
del query, bind_params, dataset_refs
7684
print(region)
Circle([0.5298928578176015, 0.7051356719745795, -0.47116583422703023], 0.03490658503988659)
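The printed Circle holds a unit vector for the center and an opening angle in radians. As a check, the sketch below reproduces those numbers from the RA, Dec, and radius defined in Section 1.2 using the standard spherical-to-Cartesian conversion; radec_to_unit_vector is a hypothetical helper, not part of lsst.sphgeom.

```python
import math


def radec_to_unit_vector(ra_deg, dec_deg):
    """Convert RA, Dec in degrees to a Cartesian unit vector (x, y, z)."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


# Center and radius from Section 1.2.
center = radec_to_unit_vector(53.076, -28.110)
opening_angle = math.radians(2.0)

print(center)
print(opening_angle)
```

The three components and the angle should agree with the values printed for the region to within floating-point precision.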
Execute a query for data with dataset type deep_coadd that overlap the defined region.
Since the region is larger, significantly more deep coadd image patches are returned.
query = "patch.region OVERLAPS :region"
bind_params = {'region': region}
dataset_refs = butler.query_datasets("deep_coadd",
where=query,
bind=bind_params)
print(len(dataset_refs))
del query, bind_params, dataset_refs
485
4. Temporal queries for images¶
4.1. Overlaps timepoint¶
Execute a query to return the dataset references for the nine visit_image datasets (corresponding to the nine LSSTComCam detectors) from the visit that was being executed at the defined time.
query = "visit.timespan OVERLAPS :time"
bind_params = {'time': time}
dataset_refs = butler.query_datasets("visit_image",
where=query,
bind=bind_params)
print(len(dataset_refs))
9
Since every visit with LSSTComCam results in one visit_image for each of the 9 detectors, the above query returns 9 dataset references.
Print the dataId for each of the 9 dataset references.
for ref in dataset_refs:
print(ref.dataId)
{instrument: 'LSSTComCam', detector: 0, visit: 2024110800263, band: 'r', day_obs: 20241108, physical_filter: 'r_03'}
{instrument: 'LSSTComCam', detector: 1, visit: 2024110800263, band: 'r', day_obs: 20241108, physical_filter: 'r_03'}
{instrument: 'LSSTComCam', detector: 2, visit: 2024110800263, band: 'r', day_obs: 20241108, physical_filter: 'r_03'}
{instrument: 'LSSTComCam', detector: 3, visit: 2024110800263, band: 'r', day_obs: 20241108, physical_filter: 'r_03'}
{instrument: 'LSSTComCam', detector: 4, visit: 2024110800263, band: 'r', day_obs: 20241108, physical_filter: 'r_03'}
{instrument: 'LSSTComCam', detector: 5, visit: 2024110800263, band: 'r', day_obs: 20241108, physical_filter: 'r_03'}
{instrument: 'LSSTComCam', detector: 6, visit: 2024110800263, band: 'r', day_obs: 20241108, physical_filter: 'r_03'}
{instrument: 'LSSTComCam', detector: 7, visit: 2024110800263, band: 'r', day_obs: 20241108, physical_filter: 'r_03'}
{instrument: 'LSSTComCam', detector: 8, visit: 2024110800263, band: 'r', day_obs: 20241108, physical_filter: 'r_03'}
Store the visit identifier number in my_visit_id to use in Section 5.1.
ref0 = dataset_refs[0]
my_visit_id = ref0.dataId.get('visit')
print(my_visit_id)
2024110800263
del query, bind_params, dataset_refs, ref0
4.2. Overlaps timespan¶
Execute a query for all visits that have a visit.timespan (the time between the start and the end of the exposure) that overlaps with the timespan defined in Section 1.2.
Order the results by the start time of the visit.timespan.
query = "visit.timespan OVERLAPS :timespan"
bind_params = {'timespan': timespan}
dataset_refs = butler.query_datasets("visit_image",
where=query,
bind=bind_params,
order_by=["visit.timespan.begin"])
print(len(dataset_refs))
del query, bind_params, dataset_refs
621
query = f"visit.id = {my_visit_id} AND detector IN (3, 4, 5)"
print(query)
dataset_refs = butler.query_datasets("visit_image",
where=query)
print(len(dataset_refs))
visit.id = 2024110800263 AND detector IN (3, 4, 5)
3
Print the dataId for the returned dataset references.
for ref in dataset_refs:
print(ref.dataId)
{instrument: 'LSSTComCam', detector: 3, visit: 2024110800263, band: 'r', day_obs: 20241108, physical_filter: 'r_03'}
{instrument: 'LSSTComCam', detector: 4, visit: 2024110800263, band: 'r', day_obs: 20241108, physical_filter: 'r_03'}
{instrument: 'LSSTComCam', detector: 5, visit: 2024110800263, band: 'r', day_obs: 20241108, physical_filter: 'r_03'}
del query, dataset_refs
5.2. Visits in a list of identifiers¶
Recall that visit_set was defined in Section 1.2.
print(visit_set)
(2024110800246, 2024110800247, 2024110800250, 2024110800251, 2024110800254, 2024110800255, 2024110800258, 2024110800259, 2024110800262, 2024110800263)
Convert this to a string, so it can be embedded in a query statement.
string_visit_set = "(" + ", ".join(str(value) for value in visit_set) + ")"
print(string_visit_set)
(2024110800246, 2024110800247, 2024110800250, 2024110800251, 2024110800254, 2024110800255, 2024110800258, 2024110800259, 2024110800262, 2024110800263)
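Because the identifiers are embedded directly in the query text, it can be worth confirming that every value really is an integer before building the string. The sketch below is one way to do that; format_in_clause is a hypothetical helper, not a Butler function.

```python
import operator


def format_in_clause(field, values):
    """Build a '<field> IN (v1, v2, ...)' expression from integer values.

    Raises TypeError if any value is not an integer, and ValueError if
    no values are given.
    """
    # operator.index accepts integer types only; strings and floats fail.
    ints = [operator.index(value) for value in values]
    if not ints:
        raise ValueError("at least one value is required for an IN clause")
    return f"{field} IN ({', '.join(str(value) for value in ints)})"


# A short example using the first three identifiers from visit_set.
example_query = format_in_clause(
    "visit.id", (2024110800246, 2024110800247, 2024110800250))
print(example_query)
```

The resulting string has the same form as the query built by hand in this section and could be passed as the where argument in the same way.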
Find all visit_image datasets that are associated with these 10 visits.
query = f"visit.id IN {string_visit_set}"
print(query)
dataset_refs = butler.query_datasets("visit_image",
where=query)
print(len(dataset_refs))
visit.id IN (2024110800246, 2024110800247, 2024110800250, 2024110800251, 2024110800254, 2024110800255, 2024110800258, 2024110800259, 2024110800262, 2024110800263)
90
Since each visit with LSSTComCam has 9 detectors, there are 9 visit_image datasets per visit, and thus 90 results.
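The expected count of 9 per visit can be checked by tallying results by visit. The sketch below shows the tallying pattern on stand-in dictionaries, since it is written to run without a Butler; the data_ids list is a hypothetical substitute for the dataId of each returned reference. With real results, the visit value would presumably come from each reference in the same way as ref0.dataId.get('visit') in Section 4.1.

```python
from collections import Counter

visit_set = (2024110800246, 2024110800247, 2024110800250,
             2024110800251, 2024110800254, 2024110800255,
             2024110800258, 2024110800259, 2024110800262,
             2024110800263)

# Hypothetical stand-ins for the dataIds of the query results:
# one entry per (visit, detector) pair, detectors numbered 0 to 8.
data_ids = [{"visit": visit, "detector": detector}
            for visit in visit_set
            for detector in range(9)]

# Tally the number of entries per visit.
per_visit = Counter(data_id["visit"] for data_id in data_ids)

# Any visit without exactly 9 entries would be listed here.
incomplete = {visit: n for visit, n in per_visit.items() if n != 9}

print(len(data_ids), "entries across", len(per_visit), "visits")
print("visits without 9 entries:", incomplete)
```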
6. Retrieve data from the Butler¶
As a reminder, the resulting dataset reference from query_datasets can be used to retrieve data from the Butler.
For the visit identified in Section 4.1 (my_visit_id), query for the visit_image for detector 4.
query = f"visit.id = {my_visit_id} AND detector = 4"
dataset_refs = butler.query_datasets("visit_image",
where=query)
assert len(dataset_refs) == 1
Retrieve the visit_image.
visit_image = butler.get(dataset_refs[0])
Display the image in Firefly.
Define afw_display to show images in frame 1 (the Firefly window will open in a new tab).
afw_display = afwDisplay.Display(frame=1)
Display the image and turn the mask off.
afw_display.mtv(visit_image)
afw_display.setMaskTransparency(100)
del query, dataset_refs, my_visit_id, visit_image