104.5. Image subsets with the Butler#

104_5_Image_subsets_with_the_butler

104.5. Image subsets with the Butler¶

For the Rubin Science Platform at data.lsst.cloud.
Data Release: Data Preview 1
Container Size: large
LSST Science Pipelines version: r29.2.0
Last verified to run: 2026-02-24
Repository: github.com/lsst/tutorial-notebooks

Learning objective: How to make image subsets with the Butler.

LSST data products: deep_coadd

Packages: lsst.daf.butler, lsst.geom, lsst.afw

Credit: Originally developed by the Rubin Community Science team. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.

Get Support: Everyone is encouraged to ask questions or raise issues in the Support Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.

1. Introduction¶

This notebook demonstrates how to make image subsets (a smaller piece of a parent LSST image) using the Butler. This method for generating small image subsets is the right tool when small images are desired from a single visit image or deep coadd patch, or, when the analysis workflow is already using the Butler for image analysis. Instead, the Rubin cutout service is a better tool when wanting to generate image cutouts at a range of coordinates or observations times. The Rubin cutout service is demonstrated in notebooks in the DP1 103 notebook series.

Butler-related documentation:

This tutorial demonstrates how to generate image subsets using the Butler using a variety of methods.

Related tutorials: The earlier 100-level Butler tutorials in this series show how to retrieve data with the Butler and how to explore and discover the dataset types and their properties. The 103 series demonstrate the Rubin cutout service to produce image cutouts.

1.1. Import packages¶

Import the butler module from the lsst.daf package, and the display and image modules from the lsst.afw package (for image display). Import the lsst.geom and lsst.sphgeom packages to enable spatial and subset functionality.

In [1]:

import matplotlib.pyplot as plt
import math

from lsst.daf.butler import Butler

import lsst.sphgeom as sphgeom
import lsst.geom as geom
from lsst.geom import Box2I, Point2I, Extent2I, Point2D

from lsst.afw.image import ExposureF, LOCAL, PARENT
import lsst.afw.display as afw_display

from astropy.visualization import ZScaleInterval

1.2. Define parameters¶

Create an instance of the Butler with the repository and collection for DP1, and assert that it exists.

In [2]:

butler = Butler("dp1", collections="LSSTComCam/DP1")
assert butler is not None

Set afwDisplay to use matplotlib.

In [3]:

afw_display.setDefaultBackend('matplotlib')

Coordinates: Use coordinates RA, Dec = $53.076, -28.110$ deg, which are near the center of the Extended Chandra Deep Field South (ECDFS).

Region: Define a circle with a radius of 0.1 deg, centered on the coordinates.

In [4]:

ra = 53.318792
dec = -27.844306

radius = 0.1
region = sphgeom.Region.from_ivoa_pos(f"CIRCLE {ra} {dec} {radius}")

2. The Bounding Box¶

A Bounding Box is useful for defining boundaries of images (either of the parent tract, the deep_coadd itself, or to make a subset from an image). This section will retrieve an image from the Butler and demonstrate how to define a bounding box (bbox) for a subset that is centered on pixel coordinate x,y and is 1000 pixels wide and 1000 pixels high.

2.1. Identify deep_coadds¶

First, query the Butler for r-band images that overlap the sky region defined above.

In [5]:

dataset_refs = butler.query_datasets("deep_coadd",
                                     where="band.name='r' AND\
                                     patch.region OVERLAPS POINT(ra, dec)",
                                     bind={"ra": ra, "dec": dec},
                                     with_dimension_records=True,
                                     order_by=["patch.tract"])

Get the dataId for the first returned dataset reference.

In [6]:

coadd = butler.get(dataset_refs[0])
dataId = dataset_refs[0].dataId
print(dataId)

{band: 'r', skymap: 'lsst_cells_v1', tract: 5063, patch: 33}

2.2. Define image boundaries¶

To retrieve the bounding box that defines the full deep_coadd, use the getBBox method. By default, getBBox returns pixel coordinates in the parent coordinate system of the entire image tract.

In [7]:

bbox = coadd.getBBox()
print(f"X-pixel range: [{bbox.beginX}, {bbox.endX}]")
print(f"Y-pixel range: [{bbox.beginY}, {bbox.endY}]")

X-pixel range: [8800, 12200]
Y-pixel range: [8800, 12200]

The above is equivalent to specifying bbox = coadd.getBBox(PARENT), where PARENT is an instance of the ImageOrigin class (an LSST pipelines class of the lsst.afw package). Alternatively, one can specify LOCAL, a different instance of the ImageOrigin class. By changing the default from PARENT to LOCAL when defining the bounding box, the pixels are specified with respect to the images own coordinate system rather than the entire image tract to which it belongs.

In [8]:

bbox_local = coadd.getBBox(LOCAL)
print(f"X-pixel range: [{bbox_local.beginX}, {bbox_local.endX}]")
print(f"Y-pixel range: [{bbox_local.beginY}, {bbox_local.endY}]")

X-pixel range: [0, 3400]
Y-pixel range: [0, 3400]

Values of interest to defining the subset, including the origin and dimensions of the bounding box can be accessed through the getDimensions, getMin and getMax methods.

In [9]:

print("The pixel dimensions of the coadd is ", bbox.getDimensions())
print("The minimum pixel coordinates of the coadd are ", bbox.getMin())
print("The maximum pixel coordinates of the coadd are ", bbox.getMax())

The pixel dimensions of the coadd is  (3400, 3400)
The minimum pixel coordinates of the coadd are  (8800, 8800)
The maximum pixel coordinates of the coadd are  (12199, 12199)

2.3. Use Box2I to define subset¶

The Box2I (2D Integer Bounding Box) object can be used to define a bounding box for the subset. First, use the bounding box previously defined in the parent coordinate system (this can also be done by specifying PARENT). For these examples, use definitions of the bounding box based on pixel coordinates. Sky coordinates are demonstrated below in section 2.2.3.

2.3.1. Use edge corners to define box¶

Now, define a bounding box called subset_box using the Box2I function. First, use the values returned by the methods beginX, beginY, and getDimensions to define the bounding box in pixel coordinates for an image subset with edge size 1000 pixels, before calling Box2I in the second cell.

Note that Box2I takes as its first input the lower left corner coordinates of the box, not the center coordinates.

In [10]:

subset_xctr = bbox.beginX + bbox.getDimensions()[0]/2
subset_yctr = bbox.beginY + bbox.getDimensions()[1]/2

subset_pixsize = 1000 / 2

subset_xmin = subset_xctr - subset_pixsize
subset_ymin = subset_yctr - subset_pixsize
subset_xmax = subset_xctr + subset_pixsize
subset_ymax = subset_yctr + subset_pixsize

In [11]:

subset_box = Box2I(minimum=Point2I(x=subset_xmin, y=subset_ymin),
                   maximum=Point2I(x=subset_xmax, y=subset_ymax))

2.3.2. Use Point2I to define box¶

A more compact alternative statement to create the same bounding box is using Point2I to define the starting x and y coordinates, and Extent2I to define the width and height. Note that 2I refers to integer pixels.

In [12]:

w = subset_pixsize
h = subset_pixsize
subset_box = Box2I(Point2I(subset_xmin, subset_ymin), Extent2I(w, h))

2.3.3. Use sky coordinates¶

The bounding box can also be defined starting with sky coordinates (converted to pixels) in the following way. Retrieving the world coordinate system (WCS) of the deep_coadd enables converting an ra, dec and spatial extent in (60 arcseconds) to pixel coordinates to define the bounding box.

In [13]:

point = geom.SpherePoint(ra, dec, geom.degrees)

wcs = butler.get('deep_coadd.wcs', dataId=dataId)

xy = Point2I(wcs.skyToPixel(point))

pixel_scale_arcsec = wcs.getPixelScale().asArcseconds()
subset_side_arcsec = 60.0
subset_side_pixels = int(subset_side_arcsec / pixel_scale_arcsec)
half_size = subset_side_pixels // 2

subset_box_sky = Box2I(xy - Extent2I(half_size, half_size),
                       Extent2I(subset_side_pixels, subset_side_pixels))

3. Retrieve subset from the butler¶

Several methods exist for extracting subsets from the butler. A few are demonstrated below, including passing the bounding box directly to the butler (Section 3.1), retrieving the full deep_coadd and using the subset method (Section 3.2) and passing the bounding box to the full deep_coadd via ExposureF (Section 3.3).

3.1. Pass bounding box to the Butler¶

The bounding box can be passed directly to the butler to retrieve only a subset of the image using the keyword parameters. First, define the parameters dictionary to contain the bounding box in a dictionary key called subset_box_sky.

In [14]:

parameters = {'bbox': subset_box_sky}

Next, pass the bounding box to the butler in the parameters dictionary directly. This saves memory and time by only loading the requested pixels. Some properties of the subset can be accessed with methods such as .shape, and image extensions using .image or .variance

In [15]:

subset_image = butler.get('deep_coadd', parameters=parameters, dataId=dataId)

image_array = subset_image.getImage().getArray()
print(f"Successfully created subset of shape: {image_array.shape}")

Successfully created subset of shape: (300, 300)

Plot the image extension of the subset.

In [16]:

fig, ax = plt.subplots()
display = afw_display.Display(frame=fig)
display.scale('asinh', 'zscale')
display.mtv(subset_image.image)
plt.show()

No description has been provided for this image

Figure 1: A subset of a sky region in the ECDFS returned after passing a bounding box to the Butler

Some attributes of the subset can be displayed with the getXY0 and getDimensions methods. The first returns the difference in x and y values between the global image pixel origin and subset pixel origin, and the second returns the subset dimensions in pixels.

In [17]:

print(f"Subset pixel origin relative to the coadd pixel origin (XY0): {subset_image.getXY0()}")
print(f"Subset dimensions in pixels: {subset_image.getDimensions()}")

Subset pixel origin relative to the coadd pixel origin (XY0): (11169, 9020)
Subset dimensions in pixels: (300, 300)

3.2. Subsets after retrieving coadd¶

It is also possible to make subsets after retrieving the full deep_coadd from the butler. This may be preferable if many coadds are needed from the same patch and tract image. There are three ways to use subset_box to retrieve the desired subset.

First, retrieve the full deep_coadd of interest.

In [18]:

full_exposure = butler.get('deep_coadd', dataId=dataId)

3.2.1. Bounding box indexing¶

Extract the region of interest using direct bounding box indexing on the full deep_coadd exposure.

In [19]:

subset_exposure = full_exposure[subset_box_sky]

fig, ax = plt.subplots()
display = afw_display.Display(frame=fig)
display.scale('asinh', 'zscale')
display.mtv(subset_exposure.image)
plt.show()

Figure 2: A subset of the same sky region in Figure 1 but generated using the bounding box indexing method

3.2.2. Subset method¶

Extract the same region of interest but now using the subset method. The use case: When the user already has the full image from the butler in memory, and wants to make one or more subsets from it.

An optional keyword origin will specify the the coordinate system of the input bounding box (options are PARENT, the default. Alternatively, if the input bounding box is defined in LOCAL coordinates, the keyword should be reset.) Since subset_box was defined based on the PARENT coordinate system, set origin=PARENT.

Note that subset is not creating a "deep copy" of the image. Any operations that are performed on the pixel array generated by the subset method, subset_subset, will also happen to full_exposure.

In [20]:

subset_subset = full_exposure.subset(subset_box_sky, origin=PARENT)

fig, ax = plt.subplots()
display = afw_display.Display(frame=fig)
display.scale('asinh', 'zscale')
display.mtv(subset_subset.image)
plt.show()

Figure 3: A subset of the same sky region in Figure 1 and 2 but generated using the subset method

3.2.3. Pass bounding box to ExposureF¶

Another method to generate the subset is by explicit constructor instantiation, which means passing the full_exposure image and bounding box directly into the ExposureF image class.

The deep keyword controls the type of image subset that is returned, similar to the python deep_copy function. If setting deep=False, the subset points to the same block of memory as the parent full_exposure. No new pixel data is duplicated into memory. While this is efficient in that it is quick and uses less memory, because the subset and the parent share the same memory, modifying one modifies the other. Any change to the pixel values in the subset will apply also to the full_exposure stored in memory.

Setting deep=True instead is like deep_copy in that it allocates new memory and physically duplicates the subset into a new ExposureF object. This allows the user to apply mathematical operations on the subset without altering the the full_exposure image also stored in memory. However, it takes longer to execute and uses more memory (though for small or few subsets this is negligible).

Like the subset method, origin is also a possible keyword here for specifying the coordinate system of the input bounding box.

In [21]:

subset = ExposureF(full_exposure, subset_box_sky, deep=False, origin=PARENT)

fig, ax = plt.subplots()
display = afw_display.Display(frame=fig)
display.scale('asinh', 'zscale')
display.mtv(subset.image)
plt.show()

Figure 4: A subset of the same sky region in Figures 1-3 but generated by passing the bounding box to ExposureF

4. Retrieve multiple subsets from the butler¶

This section demonstrates an efficient way to retrieve multiple subsets from the butler. Instead of first retrieving the deep_coadd image into memory (which is many gigabytes is size), an alternative is to use the butler.getDeferred() method. This identifies the images on the remote server and returns a lightweight "pointer" or "handle" (specifically, a DeferredDatasetHandle) instead of the actual data. getDeferred is fast and uses nearly no memory, while queuing up a bunch of subsets to retrieve at a later time all at once without transferring the deep_coadd(s) themselves. The call handle.get(parameters=params) moves the subsets to the local disk. Passing the bbox parameter enables the butler to transfer only the pixels inside the bounding box.

Since the galaxies may not be centered in the middle of pixels, use the Point2D to convert the ra and dec to floating point pixel coordinates, rather than rounded integer types returned by Point2I. The example below will use the Box2I.makeCenteredBox method, which will take as input the subset center and subset edge size, rather than the bottom left coordinate of the subset. This simplifies the bounding box definition for this example.

First, define a list of targets in a dictionary containing bright (V band magnitude < 22) variability selected galaxies hosting active galactic nuclei (AGN) in the ECDFS field from Boutsia et al. 2009.

In [22]:

targets = [{'id': '3', 'ra': 53.335875, 'dec': -27.819472},
           {'id': '8', 'ra': 53.194833, 'dec': -28.146306},
           {'id': '9', 'ra': 53.411917, 'dec': -27.671222},
           {'id': '11', 'ra': 53.318792, 'dec': -27.844306},
           {'id': '12', 'ra': 53.191458, 'dec': -27.962583},
           {'id': '14', 'ra': 53.133333, 'dec': -28.052750},
           {'id': '15', 'ra': 52.977708, 'dec': -28.176583},
           {'id': '18', 'ra': 53.380708, 'dec': -27.942833},
           {'id': '20', 'ra': 53.049333, 'dec': -28.153056},
           {'id': '22', 'ra': 53.370542, 'dec': -27.944750},
           {'id': '24', 'ra': 53.290458, 'dec': -27.937222},
           {'id': '25', 'ra': 52.819542, 'dec': -27.724889},
           {'id': '26', 'ra': 53.144042, 'dec': -28.053889},
           {'id': '28', 'ra': 53.337875, 'dec': -27.653278},
           {'id': '29', 'ra': 53.155375, 'dec': -28.146389},
           {'id': '30', 'ra': 53.302625, 'dec': -27.931000},
           {'id': '31', 'ra': 52.797417, 'dec': -27.692194},
           {'id': '32', 'ra': 53.344958, 'dec': -27.923278},
           {'id': '33', 'ra': 53.224583, 'dec': -27.898361},
           {'id': '34', 'ra': 53.359333, 'dec': -27.974917},
           {'id': '35', 'ra': 53.084583, 'dec': -28.037444},
           {'id': '36', 'ra': 53.132417, 'dec': -28.119556},
           {'id': '37', 'ra': 53.371750, 'dec': -27.990750},
           {'id': '39', 'ra': 53.184083, 'dec': -28.174583},
           {'id': '41', 'ra': 52.992208, 'dec': -28.044861},
           {'id': '43', 'ra': 53.333375, 'dec': -27.986778},
           {'id': '44', 'ra': 52.812667, 'dec': -27.921833}]

First set up some useful parameters for to generate nice looking and organized plots. Also define the skymap, which will be used to convert ra/dec coordinates to pixels based on the image WCS.

In [23]:

zscale = ZScaleInterval()
subset_size = 50
ncols = 4
nrows = math.ceil(len(targets) / ncols)

skymap = butler.get('skyMap')

The cell below loops through the targets to define the deferred handles. For each target, it first identifies the patch and tract containing the galaxy (since the galaxies are not necessarily all in the same patch and tract) and then constructs a 50x50 pixel bounding box centered at the target. As the loop iterates to the next galaxy, the previous subset object is overwritten so that the memory footprint never exceeds the size of a single subset.

In [24]:

fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(16, 4 * nrows))
axes = axes.flatten()

deferred_handles = []
for target in targets:

    point = geom.SpherePoint(target['ra'], target['dec'], geom.degrees)

    tract_info = skymap.findTract(point)
    patch_info = tract_info.findPatch(point)

    dataId = {
        'tract': tract_info.getId(),
        'patch': patch_info.getSequentialIndex(),
        'band': 'z'
    }

    handle = butler.getDeferred('deep_coadd', dataId=dataId)
    deferred_handles.append((target, handle))

for idx, (target, handle) in enumerate(deferred_handles):

    radec = geom.SpherePoint(target['ra'], target['dec'], geom.degrees)
    xy = Point2D(wcs.skyToPixel(radec))

    selection_bbox = Box2I.makeCenteredBox(xy, Extent2I(subset_size, subset_size))
    params = {'bbox': selection_bbox}

    subset = handle.get(parameters=params)

    print(f"Processed ID {target['id']}")

    img_array = subset.image.array
    vmin, vmax = zscale.get_limits(img_array)
    ax = axes[idx]

    ax.imshow(img_array, origin='lower')
    ax.set_title(f"{target['id']}")

for i in range(len(targets), len(axes)):
    fig.delaxes(axes[i])

plt.tight_layout()
plt.show()

Processed ID 3
Processed ID 8
Processed ID 9
Processed ID 11
Processed ID 12
Processed ID 14
Processed ID 15
Processed ID 18
Processed ID 20
Processed ID 22
Processed ID 24
Processed ID 25
Processed ID 26
Processed ID 28
Processed ID 29
Processed ID 30
Processed ID 31
Processed ID 32
Processed ID 33
Processed ID 34
Processed ID 35
Processed ID 36
Processed ID 37
Processed ID 39
Processed ID 41
Processed ID 43
Processed ID 44

Figure 5: A grid of subsets for 27 AGNs in the ECDFS field generated by the Butler using deferred handles.