102.3. Table joins with TAP¶
102.3. Table joins with TAP¶
Data Release: Data Preview 1
Container Size: large
LSST Science Pipelines version: r29.1.1
Last verified to run: 2025-06-25
Repository: github.com/lsst/tutorial-notebooks
Learning objective: How to join tables in TAP queries on catalogs.
LSST data products: ForcedSourceOnDiaObject
, Visit
, DiaSource
, SSSource
tables
Packages: lsst.rsp.get_tap_service
Credit: Originally developed by the Rubin Community Science team. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.
Get Support: Everyone is encouraged to ask questions or raise issues in the Support Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.
1. Introduction¶
TAP provides standardized access to catalog data for discovery, search, and retrieval. Full documentation for TAP is provided by the International Virtual Observatory Alliance (IVOA).
ADQL (Astronomy Data Query Language) is similar to SQL (Structured Query Langage). The documentation for ADQL includes more information about syntax and keywords. Not all ADQL functionality is supported yet in the preview-era RSP.
1.1. Table joins¶
Tables can be joined when they have an index or identifier column in common.
There are four main types of table joins in ADQL/SQL:
- inner: will return rows that appear in both tables and meet constraints (intersection set)
- outer: will return rows from either table that meet constraints (union set)
- left: will return rows from the first table that meet constraints, plus matching rows from the second table
- right: will return rows from the second table that meet constraints, plus matching rows from the first table
When JOIN
is used in an ADQL query, it is an inner join by default.
Generic example:
To include a table join in an ADQL query requires the specification of which column to be "joined on".
For example, the query below retrieves all rows from "table1" that are within 0.002 degrees of coordinates ra_cen, dec_cen, but only if there is a row in "table2" which has the same value of id
(i.e., an inner join, which is default).
Columns mag
and band
from "table2" are also returned for these matching rows.
SELECT t1.id, t1.ra, t1.dec, t2.mag, t2.band
FROM catalog.table1 AS t1
JOIN catalog.table2 AS t2 ON t1.id = t2.id
WHERE CONTAINS(POINT('IRCS', t1.ra, t1.dec), CIRCLE('ICRS', ra_cen, dec_cen, 0.002)) = 1
Related tutorials: The other 100-level tutorials in this series demonstrate the TAP basics. The 200-level tutorials for the individual catalogs list which tables are joinable with them.
1.2. Import packages¶
Import general python packages numpy
, matplotlib
,
the RSP TAP service, and LSST utility functions for plotting.
import numpy as np
import matplotlib.pyplot as plt
from lsst.rsp import get_tap_service
from lsst.utils.plotting import (get_multiband_plot_colors,
get_multiband_plot_symbols)
1.3. Define parameters¶
Instantiate the TAP service.
service = get_tap_service("tap")
assert service is not None
Define filter names, plot markers, and colors for plotting.
filter_names = ['u', 'g', 'r', 'i', 'z', 'y']
filter_colors = get_multiband_plot_colors()
filter_symbols = get_multiband_plot_symbols()
2. Use case: light curves¶
Light curves are an object's brightness over time.
The tables of forced photometry measurements in all individual visit_images
and difference_images
, which should be used for LSST light curves, are:
ForcedSource
: at the locations of allObjects
ForcedSourceOnDiaObject
: at the locations of allDiaObjects
They both need to be joined with the Visit
table on the shared column of the visit identifier (visit
or visitId
) to obtain the time (MJD) of the observation, and plot the light curve.
2.1. Example: join Visit with ForcedSourceOnDiaObject¶
For a known variable star that was detected with difference image analysis (DIA) and has a diaObjectId
of 614435753623027782,
join the ForcedSourceOnDiaObject
table with the Visit
table.
Retrieve the band
(filter) and psfFlux
(direct-image forced PSF photometry) for every visit from the ForcedSourceOnDiaObject
.
Join to the Visit
table on the visit identifier to retrieve each visit's Modified Julian Date, so that flux vs. time can be plotted (i.e., the light curve).
query = """SELECT fsodo.band, fsodo.psfFlux, vis.expMidptMJD
FROM dp1.ForcedSourceOnDiaObject as fsodo
JOIN dp1.Visit as vis
ON vis.visit = fsodo.visit
WHERE fsodo.diaObjectId = 614435753623027782"""
print(query)
SELECT fsodo.band, fsodo.psfFlux, vis.expMidptMJD FROM dp1.ForcedSourceOnDiaObject as fsodo JOIN dp1.Visit as vis ON vis.visit = fsodo.visit WHERE fsodo.diaObjectId = 614435753623027782
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
job.raise_if_error()
assert job.phase == 'COMPLETED'
results = job.fetch_result().to_table()
print(len(results))
Job phase is COMPLETED 225
results.colnames
['band', 'psfFlux', 'expMidptMJD']
Plot the light curve.
fig = plt.figure(figsize=(4, 3))
for filt in filter_names:
fx = np.where(results['band'] == filt)[0]
plt.plot(results['expMidptMJD'][fx],
results['psfFlux'][fx], filter_symbols[filt],
color=filter_colors[filt], label=filt)
plt.xlabel('MJD')
plt.ylabel('PSF direct image flux (nJy)')
plt.legend(ncols=2)
plt.title('DiaObject 614435753623027782')
plt.show()
Figure 1: A light curve created via table join.
3. Use case: moving objects¶
All of the Solar System catalogs are joinable to each other on identifiers diaSourceId
(individual detections in the difference images), ssObjectId
(associations of detections into single moving objects).
3.1. Example: join SSSource and DiaSource¶
Use the ssObjectId
of a known moving object, 21163620217073748, and retrieve the heliocentric X and Y coordinates at the times of the individual observations from the SSSource
table.
Join to the DiaSource
table to retrieve the MJD times of the individual observations. Sort the results by MJD.
query = """SELECT sss.heliocentricX, sss.heliocentricY,
dias.midpointMjdTai
FROM dp1.SSSource as sss
JOIN dp1.DiaSource as dias
ON dias.diaSourceId = sss.diaSourceId
WHERE sss.ssObjectId = 21163620217073748
ORDER BY dias.midpointMjdTai ASC"""
print(query)
SELECT sss.heliocentricX, sss.heliocentricY, dias.midpointMjdTai FROM dp1.SSSource as sss JOIN dp1.DiaSource as dias ON dias.diaSourceId = sss.diaSourceId WHERE sss.ssObjectId = 21163620217073748 ORDER BY dias.midpointMjdTai ASC
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
job.raise_if_error()
assert job.phase == 'COMPLETED'
results = job.fetch_result().to_table()
print(len(results))
Job phase is COMPLETED 133
# results
Plot the heliocentric X and Y coordinates for the individual observations, and color the points by days since first detection to visualize the orbit.
day1 = np.floor(results['midpointMjdTai'][0])
fig = plt.figure(figsize=(4, 3))
plt.scatter(results['heliocentricX'],
results['heliocentricY'],
c=results['midpointMjdTai']-day1,
cmap='viridis', alpha=0.5, edgecolors='none')
plt.xlabel('Heliocentric X')
plt.ylabel('Heliocentric Y')
plt.colorbar(label='MJD - '+str(day1))
plt.show()
Figure 2: A moving object's trajectory created via table join.