201.2. Source table¶
Data Release: Data Preview 1
Container Size: large
LSST Science Pipelines version: r29.1.1
Last verified to run: 2025-06-21
Repository: github.com/lsst/tutorial-notebooks
Learning objective: To understand the contents of the Source table and how to access it.
LSST data products: Source
Packages: lsst.rsp, lsst.daf.butler
Credit: Originally developed by the Rubin Community Science team. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.
Get Support: Everyone is encouraged to ask questions or raise issues in the Support Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.
1. Introduction¶
The Source table contains measurements for sources detected in individual visit_images with signal-to-noise ratio $\geq5$.
- TAP table name: dp1.Source
- Butler table name: source
- columns: 156
- rows: 45,565,632
Related tutorials: The TAP and Butler data access services are demonstrated in the 100-level "How to" tutorials. There is a 200-level tutorial on visit_images.
1.1. Import packages¶
Import the standard Python packages re, numpy, matplotlib, and astropy.
From the lsst package, import modules for the TAP service, the Butler, and plotting.
import re
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from astropy.time import Time
from lsst.rsp import get_tap_service
from lsst.daf.butler import Butler, Timespan
from lsst.utils.plotting import (get_multiband_plot_colors,
get_multiband_plot_symbols)
1.2. Define parameters and functions¶
Create an instance of the TAP service, and assert that it exists.
service = get_tap_service("tap")
assert service is not None
Create an instance of the Rubin data Butler, and assert that it exists.
butler = Butler('dp1', collections="LSSTComCam/DP1")
assert butler is not None
Define the colors and symbols to represent the LSST filters in plots.
filter_names = ['u', 'g', 'r', 'i', 'z', 'y']
filter_colors = get_multiband_plot_colors()
filter_symbols = get_multiband_plot_symbols()
2. Schema (columns)¶
To browse the table schema visit the Rubin schema browser, or use the TAP service via the Portal Aspect or as demonstrated in Section 2.1.
2.1. Retrieve table schema¶
To retrieve the table schema, define a query for the schema columns of the Source table and run the query job.
query = "SELECT column_name, datatype, description, unit " \
"FROM tap_schema.columns " \
"WHERE table_name = 'dp1.Source'"
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
    job.raise_if_error()
Job phase is COMPLETED
Retrieve the query results and display them as an astropy table with the to_table() method.
assert job.phase == 'COMPLETED'
results = job.fetch_result().to_table()
results
column_name | datatype | description | unit |
---|---|---|---|
str64 | str64 | str512 | str64 |
ap03Flux | double | Flux within 3.0-pixel aperture | nJy |
ap03Flux_flag | boolean | General Failure Flag | |
ap03FluxErr | double | Flux uncertainty within 3.0-pixel aperture | nJy |
ap06Flux | double | Flux within 6.0-pixel aperture | nJy |
ap06Flux_flag | boolean | General Failure Flag | |
ap06FluxErr | double | Flux uncertainty within 6.0-pixel aperture | nJy |
ap09Flux | double | Flux within 9.0-pixel aperture | nJy |
ap09Flux_flag | boolean | General Failure Flag | |
ap09FluxErr | double | Flux uncertainty within 9.0-pixel aperture | nJy |
ap12Flux | double | Flux within 12.0-pixel aperture | nJy |
ap12Flux_flag | boolean | General Failure Flag | |
ap12FluxErr | double | Flux uncertainty within 12.0-pixel aperture | nJy |
ap17Flux | double | Flux within 17.0-pixel aperture | nJy |
ap17Flux_flag | boolean | General Failure Flag | |
ap17FluxErr | double | Flux uncertainty within 17.0-pixel aperture | nJy |
ap25Flux | double | Flux within 25.0-pixel aperture | nJy |
ap25Flux_flag | boolean | General Failure Flag | |
ap25FluxErr | double | Flux uncertainty within 25.0-pixel aperture | nJy |
ap35Flux | double | Flux within 35.0-pixel aperture | nJy |
ap35Flux_flag | boolean | General Failure Flag | |
ap35FluxErr | double | Flux uncertainty within 35.0-pixel aperture | nJy |
ap50Flux | double | Flux within 50.0-pixel aperture | nJy |
... | ... | ... | ... |
psfFlux_flag | boolean | Failure to derive linear least-squares fit of psf model forced on the calexp | |
psfFlux_flag_apCorr | boolean | Set if unable to aperture correct base_PsfFlux | |
psfFlux_flag_edge | boolean | Object was too close to the edge of the image to use the full PSF model | |
psfFlux_flag_noGoodPixels | boolean | Not enough non-rejected pixels in data to attempt the fit | |
psfFluxErr | double | Uncertainty on the flux derived from linear least-squares fit of psf model forced on the calexp | nJy |
ra | double | Position in right ascension. | deg |
ra_dec_Cov | float | Covariance between right ascension and declination. | deg**2 |
raErr | float | Error in right ascension. | deg |
sizeExtendedness | double | Moments-based measure of a source to be a galaxy. | |
sizeExtendedness_flag | boolean | Set to 1 for any fatal failure. | |
sky | double | Background in annulus around source | nJy |
sky_source | boolean | Sky objects. | |
skyErr | double | Background in annulus around source | nJy |
sourceId | long | Unique id. Unique Source ID. Primary Key. | |
variance_flag | boolean | Set for any fatal failure | |
variance_flag_emptyFootprint | boolean | Set to True when the footprint has no usable pixels | |
variance_value | double | Variance at object position | |
visit | long | Id of the visit where this source was measured. | |
x | double | Centroid from Sdss Centroid algorithm | pixel |
xErr | float | 1-sigma uncertainty on x position | pixel |
y | double | Centroid from Sdss Centroid algorithm | pixel |
yErr | float | 1-sigma uncertainty on y position | pixel |
The table displayed above has been truncated.
Option to print every column name as a list.
# for col in results['column_name']:
# print(col)
Option to use the regular expressions package re to search for column names that contain the string temp.
# temp = 'Err'
# temp = 'Flux'
# temp = 'psf'
# for col in results['column_name']:
# if re.search(temp, col):
# print(col)
Delete the job, but not the results.
del query
job.delete()
2.2.2. Visit and detector¶
The visit is a long integer that uniquely identifies each visit of the survey (a single observation / image).
The detector is an integer between 0 and 8 indicating which of the 9 LSSTComCam detectors the source was detected on.
visit, detector
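For illustration, the sketch below shows how these two columns might appear in an ADQL constraint. It is not executed here; the visit number is the one that appears in the Butler dataId in Section 3.4.1, and the detector value is arbitrary.
# Illustrative only: constrain a Source query to a single visit and detector.
# The visit id is taken from Section 3.4.1; detector 4 is an arbitrary example.
example_where = "WHERE visit = 2024110800245 AND detector = 4 "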
2.2.4. Photometry¶
Most fluxes ($f$) are in nanoJanskys and can be converted to AB magnitudes ($m$) with:
$m = -2.5\log_{10}(f) + 31.4$
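As a minimal sketch of this conversion (the helper function njy_to_abmag is illustrative, not part of the LSST Science Pipelines), fluxes and their uncertainties can be converted with numpy; the uncertainty uses standard first-order propagation, $\sigma_m = (2.5/\ln 10)\,\sigma_f/f$.
import numpy as np

def njy_to_abmag(flux_njy, flux_err_njy=None):
    """Convert a flux in nanoJanskys to an AB magnitude: m = -2.5 log10(f) + 31.4."""
    mag = -2.5 * np.log10(flux_njy) + 31.4
    if flux_err_njy is None:
        return mag
    # First-order error propagation: sigma_m = (2.5 / ln 10) * sigma_f / f.
    mag_err = (2.5 / np.log(10)) * (flux_err_njy / flux_njy)
    return mag, mag_err

# Example: 360 nJy is roughly 25th magnitude.
print(njy_to_abmag(360.0))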
PSF fluxes
A forced fit of the Point Spread Function (PSF) at the object's coordinates in each image.
PSF fluxes are best to use for point-like sources (e.g., stars).
psfFlux, psfFluxErr
Other measurements
gaussianFlux: Gaussian flux
ap*: aperture fluxes (radii in pixels)
2.2.5. Shapes¶
HSM moments
Hirata-Seljak-Mandelbaum (HSM) moments (Hirata & Seljak 2003, Mandelbaum et al. 2005):
ixx, iyy, and ixy
Extendedness (star/galaxy separation)
If the source is point-like or "not extended", extendedness = 0 (False). Otherwise, extendedness = 1 (True).
extendedness
2.2.6. Flags¶
Pixel flags
A variety of flags indicating whether saturated pixels, or pixels affected by cosmic rays, contributed to the source's measurements.
pixelFlags_*
Measurement flags
The flux and shape measurements mentioned above have associated flag columns suffixed with _flag.
Blendedness
A measure of how much the flux is affected by neighbors: $1 - \frac{f_{\rm child}}{f_{\rm parent}}$. This uses the absolute value of the instrumental flux to try to obtain a de-noised value. See Section 4.9.11 of Bosch et al. 2018, PASJ, 70.
blendedness_abs
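If the flag columns are included in a query's SELECT list, flagged rows and sky objects can be removed after retrieval. A minimal sketch, assuming an astropy table named results that contains the boolean psfFlux_flag and sky_source columns:
# Sketch: keep rows with a good PSF flux measurement that are not sky objects.
# Assumes 'results' includes the boolean columns psfFlux_flag and sky_source.
good = results[(~results['psfFlux_flag']) & (~results['sky_source'])]
print(len(good))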
2.3. Descriptions and units¶
For a subset of the key columns, show the table of their descriptions and units.
col_list = set(['sourceId', 'visit', 'ra', 'dec', 'psfFlux', 'psfFluxErr', 'extendedness'])
tx = [i for i, item in enumerate(results['column_name']) if item in col_list]
results[tx]
column_name | datatype | description | unit |
---|---|---|---|
str64 | str64 | str512 | str64 |
dec | double | Position in declination. | deg |
extendedness | double | Set to 1 for extended sources, 0 for point sources. | |
psfFlux | double | Flux derived from linear least-squares fit of psf model forced on the calexp | nJy |
psfFluxErr | double | Uncertainty on the flux derived from linear least-squares fit of psf model forced on the calexp | nJy |
ra | double | Position in right ascension. | deg |
sourceId | long | Unique id. Unique Source ID. Primary Key. | |
visit | long | Id of the visit where this source was measured. | |
Clean up.
del col_list, tx, results
3. Data access¶
The Source table is available via the TAP service and the Butler.
Recommended access method: TAP.
3.1. Advisory: avoid full-table queries¶
Avoid full-table queries; always include a constraint on the coordinates or visit.
The Source table is a large, inclusive union of all the measurements made in all the visit images of all detected sources.
The DP1 data release Source table is relatively small; however, skipping spatial constraints is not a good habit to form, because future data release Source tables will contain trillions of rows.
3.2. Advisory: sources are not light curves¶
The Source table contains measurements for sources detected in the individual visit images.
This is not the recommended table to use for variable and transient object light curves.
It is recommended to use the Forced Source and Difference Image Analysis (DIA) tables for light curves.
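As a minimal sketch of where to look next, the tap_schema query pattern from Section 2.1 can be reused to inspect a forced-photometry table. The table name dp1.ForcedSource is an assumption here; confirm the exact table and column names in the Rubin schema browser.
# Sketch: reuse the Section 2.1 pattern to explore a light-curve table's schema.
# The table name 'dp1.ForcedSource' is an assumption; verify it before relying on it.
query = "SELECT column_name, datatype, description, unit " \
        "FROM tap_schema.columns " \
        "WHERE table_name = 'dp1.ForcedSource'"
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
if job.phase == 'ERROR':
    job.raise_if_error()
fs_columns = job.fetch_result().to_table()
print(len(fs_columns))
job.delete()
del query, fs_columns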
3.3. TAP (Table Access Protocol)¶
The Source table is stored in Qserv and accessible via the TAP service using ADQL queries.
Include spatial constraints:
Qserv stores catalog data sharded by coordinate (RA, Dec), so ADQL queries that include coordinate constraints do not require a whole-catalog search and are typically faster (and can be much faster) than queries that only constrain other columns.
Use either an ADQL cone or polygon search for faster queries (do not use WHERE ... BETWEEN statements to set boundaries on RA and Dec).
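For illustration, the two recommended spatial constraints can be written as WHERE clauses like the sketch below. The circle matches the search used in Section 3.3.1; the polygon vertices are arbitrary illustrative values.
# Cone search: sources within 0.00056 deg (2 arcsec) of a position.
cone_where = "WHERE CONTAINS(POINT('ICRS', ra, dec), " \
             "CIRCLE('ICRS', 53.128137, -28.103526, 0.00056)) = 1 "

# Polygon search: sources within a quadrilateral region (illustrative vertices).
poly_where = "WHERE CONTAINS(POINT('ICRS', ra, dec), " \
             "POLYGON('ICRS', 53.12, -28.11, 53.14, -28.11, " \
             "53.14, -28.09, 53.12, -28.09)) = 1 "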
3.3.1. Demo query¶
Define a query to return the seven "key columns" from Section 2.3.
Impose spatial constraints: search within a 2 arcsecond radius of a known object near the center of the Extended Chandra Deep Field South (ECDFS) field, RA, Dec = $53.128137, -28.103526$.
query = "SELECT sourceId, visit, ra, dec, psfFlux, psfFluxErr, extendedness " \
"FROM dp1.Source " \
"WHERE CONTAINS(POINT('ICRS', ra, dec), " \
"CIRCLE('ICRS', 53.128137, -28.103526, 0.00056)) = 1 " \
"ORDER BY sourceId ASC "
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
    job.raise_if_error()
Job phase is COMPLETED
Fetch the results as an astropy table.
assert job.phase == 'COMPLETED'
results = job.fetch_result().to_table()
print(len(results))
726
Option to display the table.
# results
Check if any visits had more than one source detected in the search radius.
values, counts = np.unique(results['visit'], return_counts=True)
tx = np.where(counts > 1)[0]
print('visits with >1 source detected in radius: ', len(tx))
for x in tx:
    print('visit: ', values[x], ' number of sources', counts[x])
del values, counts, tx
visits with >1 source detected in radius:  1
visit:  2024120100201  number of sources 2
As an example, plot the RA and Dec offsets of every source from the search coordinates, in arcseconds.
cos_dec = np.cos(np.deg2rad(28.103526))
results['ra_offset_arcsec'] = (results['ra'] - 53.128137) * 3600.0 * cos_dec
results['dec_offset_arcsec'] = (results['dec'] + 28.103526) * 3600.0
fig, ax = plt.subplots(figsize=(4, 4))
circle = patches.Circle((0, 0), radius=0.00056*3600.0,
facecolor='None', edgecolor='blue')
ax.add_patch(circle)
ax.plot(results['ra_offset_arcsec'],
results['dec_offset_arcsec'],
'o', ms=2, mew=0, alpha=0.3, color='black')
ax.set_aspect(1)
plt.xlabel('RA offset [arcsec]')
plt.ylabel('Dec offset [arcsec]')
plt.tight_layout()
plt.show()
Figure 1: The offset in RA and Dec, in arcseconds, of every source from the search coordinates. A 2" radius circle marks the search region.
Clean up.
job.delete()
del query, results
3.3.2. Joinable tables¶
The Source table can be joined to the Visit table on the column containing the unique visit identifier, visit.
The Visit table contains information about the observation, such as the MJD and band.
To the query used in Section 3.3.1, add a table join to the Visit table and retrieve the columns expMidptMJD and band.
query = "SELECT s.sourceId, s.visit, s.ra, s.dec, v.expMidptMJD, v.band " \
"FROM dp1.Source AS s " \
"JOIN dp1.Visit AS v ON s.visit = v.visit " \
"WHERE CONTAINS(POINT('ICRS', s.ra, s.dec), " \
"CIRCLE('ICRS', 53.128137, -28.103526, 0.00056)) = 1 " \
"AND v.expMidptMJD > 60623.0 AND v.expMidptMJD < 60624.0 " \
"ORDER BY s.sourceId ASC "
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
    job.raise_if_error()
Job phase is COMPLETED
Fetch the results as an astropy table.
assert job.phase == 'COMPLETED'
results = job.fetch_result().to_table()
print(len(results))
62
Similar to Figure 1, plot the RA and Dec offsets of every source from the search coordinates, in arcseconds.
cos_dec = np.cos(np.deg2rad(28.103526))
results['ra_offset_arcsec'] = (results['ra'] - 53.128137) * 3600.0 * cos_dec
results['dec_offset_arcsec'] = (results['dec'] + 28.103526) * 3600.0
fig, ax = plt.subplots(figsize=(4, 4))
for filt in filter_names:
    fx = np.where(results['band'] == filt)[0]
    if len(fx) > 0:
        ax.plot(results['ra_offset_arcsec'][fx],
                results['dec_offset_arcsec'][fx],
                filter_symbols[filt], ms=5, mew=0, alpha=0.7,
                color=filter_colors[filt], label=filt)
ax.set_aspect(1)
plt.xlabel('RA offset [arcsec]')
plt.ylabel('Dec offset [arcsec]')
plt.legend(loc='upper left')
plt.tight_layout()
plt.show()
Figure 2: The offset in RA and Dec, in arcseconds, of every source from the search coordinates, with colors and symbols to represent the three filters (gri) obtained on the night of MJD = 60623.
Clean up.
job.delete()
del query, results
3.4. Butler¶
TAP is the recommended way to access the source table, but the Butler is a convenient way to retrieve all the sources in a given visit.
Show that the dimensions for the source table are the filter, instrument, observation date, and visit, and that only instrument and visit are required.
butler.get_dataset_type('source')
DatasetType('source', {band, instrument, day_obs, physical_filter, visit}, ArrowAstropy)
butler.get_dataset_type('source').dimensions.required
{instrument, visit}
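Since instrument and visit are the only required dimensions, a single visit's source table can also be retrieved with an explicit dataId instead of a dataset reference. A minimal sketch (the visit number is the one shown in Section 3.4.1, and the column subset is illustrative):
# Sketch: fetch one visit's source table directly by dataId.
src = butler.get("source",
                 instrument="LSSTComCam",
                 visit=2024110800245,
                 parameters={"columns": ["sourceId", "ra", "dec", "psfFlux"]})
print(len(src))
del src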
3.4.1. Demo query¶
Include spatial constraints:
The Butler source table contents are stored and retrieved by individual visit.
Retrieve all dataset_refs for source tables for visits that overlap the coordinates near the center of the ECDFS field, and which were obtained on MJD = 60623.
ra = 53.13
dec = -28.10
time1 = Time(60623.0, format="mjd", scale="tai")
time2 = Time(60624.0, format="mjd", scale="tai")
timespan = Timespan(time1, time2)
refs = butler.query_datasets("source",
where="visit.timespan OVERLAPS :timespan AND \
visit_detector_region.region OVERLAPS POINT(:ra, :dec)",
bind={"timespan": timespan,
"ra": ra, "dec": dec})
print(len(refs))
62
When using the Butler, source table data is returned by visit (by dataset reference).
Print the dataId of the first element of refs.
print(refs[0].dataId)
{instrument: 'LSSTComCam', visit: 2024110800245, band: 'i', day_obs: 20241108, physical_filter: 'i_06'}
Define the columns to retrieve.
col_list = ['sourceId', 'ra', 'dec', 'psfFlux', 'psfFluxErr']
Get the data from the Butler.
results = butler.get(refs[0],
parameters={'columns': col_list})
print(len(results))
29534
Option to display the results.
# results
Constrain the results to sources with magnitudes between 20 and 25 mag.
tx = np.where((results['psfFlux'] > 360.0)
& (results['psfFlux'] <= 36000.0))[0]
print(len(tx))
28081
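These flux limits follow from the AB magnitude relation in Section 2.2.4; as a quick check, the bounds evaluate to roughly 25th and 20th magnitude.
# Quick check: m = -2.5 log10(f [nJy]) + 31.4 at the two flux limits.
for f in (360.0, 36000.0):
    print(f, 'nJy corresponds to about', round(-2.5 * np.log10(f) + 31.4, 2), 'mag')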
Plot the sky coordinates and PSF fluxes for the subset of sources.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(7, 3))
ax1.plot(results['ra'][tx], results['dec'][tx],
'o', ms=2, mew=0, alpha=0.1, color='grey')
ax1.set_xlabel('Right Ascension')
ax1.set_ylabel('Declination')
ax2.plot(results['psfFlux'][tx], results['psfFluxErr'][tx],
'o', ms=2, mew=0, alpha=0.1, color='grey')
ax2.set_xlabel('PSF Flux')
ax2.set_ylabel('Error')
plt.tight_layout()
plt.show()
Figure 3: At left, the RA vs. Dec of retrieved sources as grey points. At right, the PSF flux vs. its uncertainty.
del ra, dec, time1, time2, timespan
del refs, col_list, results, tx