306.3. Cross match queries¶

306_3_Cross_match_queries

306.3. Cross-match by coordinate¶

logo_for_header 5.59.56 PM.png

For the Rubin Science Platform at data.lsst.cloud.
Data Release: Data Preview 1
Container Size: large
LSST Science Pipelines version: Release r29.1.1
Last verified to run: 2025-07-16
Repository: github.com/lsst/tutorial-notebooks

Learning objective: An efficient way to execute queries to check if lightcurves exist for a set of coordinates.

LSST data products: DiaObject, ForcedSourceOnDiaObject

Packages: lsst.rsp,lsst.afw

Credit: Originally developed by the Rubin Community Science team. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.

Get Support: Everyone is encouraged to ask questions or raise issues in the Support Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.

1. Introduction¶

This tutorial demonstrates how to make efficient queries for the common question of "does LSST have light curves at these sky coordinates?"

Avoid: querying the forced photometry tables by coordinate.

It is recommended that the fluxes in the forced photometry table, ForcedSourceOnDiaObject, be used to generate light curves. However, queries by coordinate on the ForcedSourceOnDiaObject table are very inefficient and should be avoided. This is because the ForcedSourceOnDiaObject table contains forced flux measurements made on every difference and direct image, at the location of every DiaObject. It is much more efficient to do the spatial cross-match with DiaObject once and then retrieve forced fluxes for successful matches based on their IDs, rather than spatially cross-matching to the tens or hundreds (and in the future, thousands) of visits. The same advice applies to the ForcedSource and Object tables.

Avoid: putting asynchronous queries into for loops.

Although it is possible loop over a set of RA and Dec, and make a spatial cross-match query for each coordinate, this should be avoided. In the future users will be rate-limited on TAP queries and unable to have many queries executing simultaneously.

Correct: upload a table of coordinates and cross-match using a TAP table join.

This notebook demonstrates how to upload a table of coordinates and use the TAP service to cross-match to the DiaObject table and then retrieve light curves for successful cross-matches.

Related tutorials: The 200-level tutorials on the DiaObject and ForcedSourceOnDiaObject catalogs. The 100-level tutorials on catalog access with TAP and ADQL query statements.

1.1. Import packages¶

Import modules from matplotlib, numpy, and astropy.

From the lsst package, import the module get_tap_service for accessing the Table Access Protocol (TAP) service.

In [1]:

import matplotlib.pyplot as plt
import numpy as np
from astropy.table import Table
from lsst.rsp import get_tap_service
from lsst.utils.plotting import (get_multiband_plot_colors,
                                 get_multiband_plot_symbols)

1.2. Define parameters and functions¶

Create an instance of the TAP service, and assert that it exists.

In [2]:

service = get_tap_service("tap")
assert service is not None

Define filter names, plot markers, linestyles, and colors for plotting

In [3]:

filter_names = ['u', 'g', 'r', 'i', 'z', 'y']
filter_colors = get_multiband_plot_colors()
filter_symbols = get_multiband_plot_symbols()

2. Load a coordinate list¶

This pre-made demo list includes the DiaObjects identified in another tutorial in this series as being potential supernovae, and stars which may or may not be variable, all in the Extended Chandra Deep Field South (ECDFS).

The central coordinates of ECDFS are RA, Dec = 53.13, -28.10 degrees.

In [4]:

ra_cen = 53.13
dec_cen = -28.10

Read in the targets as an Astropy Table.

In [5]:

path = '/rubin/cst_repos/tutorial-notebooks-data/data/'
user_table = Table.read(path + 'dp1_306_3_targets.dat', format='ascii.csv')
assert len(user_table) == 32

Option to view the table.

In [6]:

# user_table

Visualize the target coordinates.

Targets have been "named" with integers that represent their origin:

100s: known DiaObjects
200s: known Objects (random stars)
300s: random coordinates

In [7]:

fig = plt.figure(figsize=(4, 4))
plt.plot(ra_cen, dec_cen, '*', ms=10, mew=0, color='black', label='ECDFS center')
tx1 = np.where(user_table['name'] < 200)[0]
tx2 = np.where((user_table['name'] >= 200) & (user_table['name'] < 300))[0]
tx3 = np.where(user_table['name'] >= 300)[0]
plt.plot(user_table['ra'][tx1], user_table['dec'][tx1], 'o', ms=5, mew=0, alpha=0.7,
         color='grey', label='DiaObject')
plt.plot(user_table['ra'][tx2], user_table['dec'][tx2], 's', ms=5, mew=0, alpha=0.7,
         color='dodgerblue', label='Object')
plt.plot(user_table['ra'][tx3], user_table['dec'][tx3], '^', ms=6, mew=0, alpha=0.7,
         color='darkorange', label='random')
plt.legend(loc='best', handletextpad=0)
plt.gca().invert_xaxis()
plt.xlabel('RA')
plt.ylabel('Dec')
plt.show()

No description has been provided for this image

Figure 1: The sky locations of the coordinates for this demonstration.

3. Query in two steps¶

3.1. Cross-match to diaObject¶

Upload the table to the TAP service and join it with the DiaObject table.

Use a spatial cross-match radius of 0.5 arcsec (0.00014 deg), and impose the constraint that only matched DiaObjects that were detected more than 10 times are returned (nDiaSources > 10).

Also return the DISTANCE(), the radial distance in degrees, between the target and the DiaObject. In this scenario, all matches to the DiaObject table are exact matches to known DiaObjects, so all distances will be 0.0, but in other scenarios it would be informative.

In [8]:

query = """
        SELECT diao.diaObjectId, diao.ra, diao.dec, diao.nDiaSources,
        ut.name AS ut_name, ut.ra AS ut_ra, ut.dec AS ut_dec,
        DISTANCE(POINT('ICRS', diao.ra, diao.dec), POINT('ICRS', ut.ra, ut.dec)) AS distance
        FROM dp1.DiaObject AS diao
        JOIN TAP_UPLOAD.ut AS ut
        WHERE CONTAINS(POINT('ICRS', diao.ra, diao.dec),
        CIRCLE('ICRS', ut.ra, ut.dec, 0.00014))=1
        AND diao.nDiaSources > 10
        """

In [9]:

job = service.submit_job(query, uploads={"ut": user_table})
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
    job.raise_if_error()

Job phase is COMPLETED

In [10]:

assert job.phase == 'COMPLETED'
dia_objects = job.fetch_result().to_table()

Show the table of the 13 matched DiaObjects.

In [11]:

assert len(dia_objects) == 13
print(len(dia_objects))
dia_objects

Out[11]:

Table length=13

diaObjectId	ra	dec	nDiaSources	ut_name	ut_ra	ut_dec	distance
	deg	deg
int64	float64	float64	int64	int64	float64	float64	float64
611255278800732178	52.69695529429179	-27.988292690356108	78	108	52.69695529429179	-27.988292690356108	0.0
609788942606139423	53.057543091666545	-28.470666917705937	30	107	53.057543091666545	-28.470666917705937	0.0
609789423642476567	53.59070920538382	-28.337607282878942	38	109	53.59070920538382	-28.337607282878942	0.0
609788805167185927	53.38674314387356	-28.372341661331774	167	104	53.38674314387356	-28.372341661331774	0.0
611256447031836758	53.11255514455679	-27.68482288916528	183	103	53.11255514455679	-27.68482288916528	0.0
611255759837069429	53.13130494487106	-27.743080378923775	21	110	53.13130494487106	-27.743080378923775	0.0
611255759837069401	53.124767650110215	-27.739814591611168	253	102	53.124767650110215	-27.739814591611168	0.0
611255003922825298	53.290560409754	-27.937289443173917	52	113	53.290560409754	-27.937289443173917	0.0
611255003922825270	53.30261321636857	-27.93106467624108	37	112	53.30261321636857	-27.93106467624108	0.0
611255141361779352	53.04016935490686	-27.993923912026723	35	101	53.04016935490686	-27.993923912026723	0.0
611255141361779062	53.007957021858076	-27.902674310446752	36	106	53.007957021858076	-27.902674310446752	0.0
611255072642302059	53.2226933114704	-27.889567374680585	37	105	53.2226933114704	-27.889567374680585	0.0
611255072642301958	53.097942739683255	-27.986920270662733	109	111	53.097942739683255	-27.986920270662733	0.0

The table above shows that only known DiaObjects (with a ut_name starting with the number 1) were successfully cross matched, as expected.

Confirm that no targets were matched with multiple DiaObjects.

In [12]:

values, counts = np.unique(dia_objects['ut_name'], return_counts=True)
tx = np.where(counts > 1)[0]
print('Number of multiply-matched targets: ', len(tx))

Number of multiply-matched targets:  0

3.2. Retrieve forced photometry¶

Retrieve forced photometry by diaObjectId using a WHERE ... IN () statement.

In [13]:

dia_object_id_list = "(" + ", ".join(str(value) for value in dia_objects['diaObjectId']) + ")"
print(dia_object_id_list)

(611255278800732178, 609788942606139423, 609789423642476567, 609788805167185927, 611256447031836758, 611255759837069429, 611255759837069401, 611255003922825298, 611255003922825270, 611255141361779352, 611255141361779062, 611255072642302059, 611255072642301958)

In [14]:

query = """SELECT fsodo.coord_ra, fsodo.coord_dec,
        fsodo.diaObjectId, fsodo.visit, fsodo.band,
        fsodo.psfDiffFlux, fsodo.psfDiffFluxErr,
        vis.expMidptMJD
        FROM dp1.ForcedSourceOnDiaObject as fsodo
        JOIN dp1.Visit as vis ON vis.visit = fsodo.visit
        WHERE diaObjectId IN {}
        ORDER BY diaObjectId ASC
        """.format(dia_object_id_list)

In [15]:

job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
    job.raise_if_error()

Job phase is COMPLETED

In [16]:

assert job.phase == 'COMPLETED'
fsodo_table = job.fetch_result().to_table()

In [17]:

assert len(fsodo_table) == 7034
print(len(fsodo_table))

Option to display the table of forced photometry.

In [ ]:

# fsodo_table

See Section 5 for a demonstration of how to plot light curves using the retrieved forced photometry.

In [18]:

del dia_objects, fsodo_table

4. Query in one step (table join)¶

Combine the two steps of Section 3 into a single query with table joins.

This query does the spatial cross-match to the targets on the DiaObject table, and joins with the ForcedSourceOnDiaObject and Visit table to retrieve light curves for successful matches.

In [19]:

query = """
        SELECT diao.diaObjectId, diao.ra, diao.dec, diao.nDiaSources,
        ut.name AS ut_name, ut.ra AS ut_ra, ut.dec AS ut_dec,
        fsodo.diaObjectId, fsodo.visit, fsodo.band,
        fsodo.psfDiffFlux, fsodo.psfDiffFluxErr,
        v.expMidptMJD
        FROM dp1.DiaObject AS diao
        JOIN TAP_UPLOAD.ut AS ut
        JOIN dp1.ForcedSourceOnDiaObject AS fsodo ON fsodo.diaObjectId = diao.diaObjectId
        JOIN dp1.Visit AS v ON v.visit = fsodo.visit
        WHERE CONTAINS(POINT('ICRS', diao.ra, diao.dec),
        CIRCLE('ICRS', ut.ra, ut.dec, 0.00014))=1
        AND diao.nDiaSources > 10
        """
print(query)

        SELECT diao.diaObjectId, diao.ra, diao.dec, diao.nDiaSources,
        ut.name AS ut_name, ut.ra AS ut_ra, ut.dec AS ut_dec,
        fsodo.diaObjectId, fsodo.visit, fsodo.band,
        fsodo.psfDiffFlux, fsodo.psfDiffFluxErr,
        v.expMidptMJD
        FROM dp1.DiaObject AS diao
        JOIN TAP_UPLOAD.ut AS ut
        JOIN dp1.ForcedSourceOnDiaObject AS fsodo ON fsodo.diaObjectId = diao.diaObjectId
        JOIN dp1.Visit AS v ON v.visit = fsodo.visit
        WHERE CONTAINS(POINT('ICRS', diao.ra, diao.dec),
        CIRCLE('ICRS', ut.ra, ut.dec, 0.00014))=1
        AND diao.nDiaSources > 10

In [20]:

job = service.submit_job(query, uploads={"ut": user_table})
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
    job.raise_if_error()

Job phase is COMPLETED

In [21]:

assert job.phase == 'COMPLETED'
fsodo_table_2 = job.fetch_result().to_table()

In [22]:

assert len(fsodo_table_2) == 7034
print(len(fsodo_table_2))

The fsodo_table_2 table contains the same data as the fsodo_table from Section 3.2.

5. Plot light curves¶

How to plot light curves is also demonstrated in other tutorials.

First get the identifiers of the unique DiaObjects for which forced photometry was returned.

In [23]:

unique_dia_obj_id = np.unique(fsodo_table_2['diaObjectId'])
assert len(unique_dia_obj_id) == 13
print(len(unique_dia_obj_id))

In [25]:

fig, ax = plt.subplots(3, 2, figsize=(8, 6), sharey=False, sharex=False)
x = 0
for i in range(3):
    for j in range(2):
        for f, filt in enumerate(filter_names):
            dx = np.where(fsodo_table_2['diaObjectId'] == unique_dia_obj_id[x])
            lc = fsodo_table_2[dx]
            fx = np.where(lc['band'] == filt)[0]
            if (i == 0) & (j == 0):
                ax[i, j].plot(lc['expMidptMJD'][fx], lc['psfDiffFlux'][fx],
                              filter_symbols[filt], ms=5, mew=0, alpha=0.5,
                              color=filter_colors[filt], label=filt)
            else:
                ax[i, j].plot(lc['expMidptMJD'][fx], lc['psfDiffFlux'][fx],
                              filter_symbols[filt], ms=5, mew=0, alpha=0.5,
                              color=filter_colors[filt])
            del dx, fx
        ax[i, j].set_title(unique_dia_obj_id[x])
        del lc
        if (i == 0) & (j == 0):
            ax[i, j].legend(loc='best', ncol=3, handletextpad=0)
        if i == 2:
            ax[i, j].xaxis.set_label_text('MJD (days)')
        if j == 0:
            ax[i, j].yaxis.set_label_text('Diff Flux (nJy)')
        x += 1
plt.tight_layout()
plt.show()

Figure 2: The first six light curves.

In [26]:

del fsodo_table_2

In [ ]: