201.2. Source table¶
Data Release: Data Preview 1
Container Size: large
LSST Science Pipelines version: r29.1.1
Last verified to run: 2025-06-21
Repository: github.com/lsst/tutorial-notebooks
Learning objective: To understand the contents of the Source table and how to access it.
LSST data products: Source
Packages: lsst.rsp, lsst.daf.butler
Credit: Originally developed by the Rubin Community Science team. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.
Get Support: Everyone is encouraged to ask questions or raise issues in the Support Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.
1. Introduction¶
The Source table contains measurements for sources detected in individual visit_images with signal-to-noise ratio $\geq5$.
- TAP table name: dp1.Source
- Butler table name: source
- columns: 156
- rows: 45,565,632
Related tutorials: The TAP and Butler data access services are demonstrated in the 100-level "How to" tutorials. There is a 200-level tutorial on visit_images.
1.1. Import packages¶
Import the standard Python packages re, numpy, matplotlib, and astropy.
From the lsst package, import modules for the TAP service, the Butler, and plotting.
import re
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from astropy.time import Time
from lsst.rsp import get_tap_service
from lsst.daf.butler import Butler, Timespan
from lsst.utils.plotting import (get_multiband_plot_colors,
get_multiband_plot_symbols)
1.2. Define parameters and functions¶
Create an instance of the TAP service, and assert that it exists.
service = get_tap_service("tap")
assert service is not None
Create an instance of the Rubin data Butler, and assert that it exists.
butler = Butler('dp1', collections="LSSTComCam/DP1")
assert butler is not None
Define the colors and symbols to represent the LSST filters in plots.
filter_names = ['u', 'g', 'r', 'i', 'z', 'y']
filter_colors = get_multiband_plot_colors()
filter_symbols = get_multiband_plot_symbols()
2. Schema (columns)¶
To browse the table schema visit the Rubin schema browser, or use the TAP service via the Portal Aspect or as demonstrated in Section 2.1.
2.1. Retrieve table schema¶
To retrieve the table schema, define a query for the schema columns of the Source table and run the query job.
query = "SELECT column_name, datatype, description, unit " \
"FROM tap_schema.columns " \
"WHERE table_name = 'dp1.Source'"
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
    job.raise_if_error()
Job phase is COMPLETED
Retrieve the query results and display them as an astropy table with the to_table() method.
assert job.phase == 'COMPLETED'
results = job.fetch_result().to_table()
results
column_name | datatype | description | unit |
---|---|---|---|
str64 | str64 | str512 | str64 |
ap03Flux | double | Flux within 3.0-pixel aperture | nJy |
ap03Flux_flag | boolean | General Failure Flag | |
ap03FluxErr | double | Flux uncertainty within 3.0-pixel aperture | nJy |
ap06Flux | double | Flux within 6.0-pixel aperture | nJy |
ap06Flux_flag | boolean | General Failure Flag | |
ap06FluxErr | double | Flux uncertainty within 6.0-pixel aperture | nJy |
ap09Flux | double | Flux within 9.0-pixel aperture | nJy |
ap09Flux_flag | boolean | General Failure Flag | |
ap09FluxErr | double | Flux uncertainty within 9.0-pixel aperture | nJy |
ap12Flux | double | Flux within 12.0-pixel aperture | nJy |
ap12Flux_flag | boolean | General Failure Flag | |
ap12FluxErr | double | Flux uncertainty within 12.0-pixel aperture | nJy |
ap17Flux | double | Flux within 17.0-pixel aperture | nJy |
ap17Flux_flag | boolean | General Failure Flag | |
ap17FluxErr | double | Flux uncertainty within 17.0-pixel aperture | nJy |
ap25Flux | double | Flux within 25.0-pixel aperture | nJy |
ap25Flux_flag | boolean | General Failure Flag | |
ap25FluxErr | double | Flux uncertainty within 25.0-pixel aperture | nJy |
ap35Flux | double | Flux within 35.0-pixel aperture | nJy |
ap35Flux_flag | boolean | General Failure Flag | |
ap35FluxErr | double | Flux uncertainty within 35.0-pixel aperture | nJy |
ap50Flux | double | Flux within 50.0-pixel aperture | nJy |
... | ... | ... | ... |
psfFlux_flag | boolean | Failure to derive linear least-squares fit of psf model forced on the calexp | |
psfFlux_flag_apCorr | boolean | Set if unable to aperture correct base_PsfFlux | |
psfFlux_flag_edge | boolean | Object was too close to the edge of the image to use the full PSF model | |
psfFlux_flag_noGoodPixels | boolean | Not enough non-rejected pixels in data to attempt the fit | |
psfFluxErr | double | Uncertainty on the flux derived from linear least-squares fit of psf model forced on the calexp | nJy |
ra | double | Position in right ascension. | deg |
ra_dec_Cov | float | Covariance between right ascension and declination. | deg**2 |
raErr | float | Error in right ascension. | deg |
sizeExtendedness | double | Moments-based measure of a source to be a galaxy. | |
sizeExtendedness_flag | boolean | Set to 1 for any fatal failure. | |
sky | double | Background in annulus around source | nJy |
sky_source | boolean | Sky objects. | |
skyErr | double | Background in annulus around source | nJy |
sourceId | long | Unique id. Unique Source ID. Primary Key. | |
variance_flag | boolean | Set for any fatal failure | |
variance_flag_emptyFootprint | boolean | Set to True when the footprint has no usable pixels | |
variance_value | double | Variance at object position | |
visit | long | Id of the visit where this source was measured. | |
x | double | Centroid from Sdss Centroid algorithm | pixel |
xErr | float | 1-sigma uncertainty on x position | pixel |
y | double | Centroid from Sdss Centroid algorithm | pixel |
yErr | float | 1-sigma uncertainty on y position | pixel |
The table displayed above has been truncated.
Option to print every column name as a list.
# for col in results['column_name']:
# print(col)
Option to use the regular expressions package re to search for column names that contain the string temp.
# temp = 'Err'
# temp = 'Flux'
# temp = 'psf'
# for col in results['column_name']:
# if re.search(temp, col):
# print(col)
Delete the job, but not the results.
del query
job.delete()
2.2.2. Visit and detector¶
The visit is a long integer that uniquely identifies each visit of the survey (a single observation / image).
The detector is an integer between 0 and 8 indicating which of the 9 LSSTComCam detectors the source was detected on.
visit, detector
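For illustration, the sketch below shows how these two columns might appear in an ADQL constraint. It is not executed here; the visit number is the one that appears in the Butler dataId in Section 3.4.1, and the detector value is arbitrary.
# Illustrative only: constrain a Source query to a single visit and detector.
# The visit id is taken from Section 3.4.1; detector 4 is an arbitrary example.
example_where = "WHERE visit = 2024110800245 AND detector = 4 "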
2.2.4. Photometry¶
Most fluxes ($f$) are in nanoJanskys and can be converted to AB magnitudes ($m$) with:
$m = -2.5\log_{10}(f) + 31.4$
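As a minimal sketch of this conversion (the helper function njy_to_abmag is illustrative, not part of the LSST Science Pipelines), fluxes and their uncertainties can be converted with numpy; the uncertainty uses standard first-order propagation, $\sigma_m = (2.5/\ln 10)\,\sigma_f/f$.
import numpy as np

def njy_to_abmag(flux_njy, flux_err_njy=None):
    """Convert a flux in nanoJanskys to an AB magnitude: m = -2.5 log10(f) + 31.4."""
    mag = -2.5 * np.log10(flux_njy) + 31.4
    if flux_err_njy is None:
        return mag
    # First-order error propagation: sigma_m = (2.5 / ln 10) * sigma_f / f.
    mag_err = (2.5 / np.log(10)) * (flux_err_njy / flux_njy)
    return mag, mag_err

# Example: 360 nJy is roughly 25th magnitude.
print(njy_to_abmag(360.0))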
PSF fluxes
A forced fit of the Point Spread Function (PSF) at the object's coordinates in each image.
PSF fluxes are best to use for point-like sources (e.g., stars).
psfFlux, psfFluxErr
Other measurements
gaussianFlux: Gaussian flux
ap*: aperture fluxes (radii in pixels)
2.2.5. Shapes¶
HSM moments
Hirata-Seljak-Mandelbaum (HSM) moments (Hirata & Seljak 2003, Mandelbaum et al. 2005):
ixx, iyy, and ixy
Extendedness (star/galaxy separation)
If the source is point-like or "not extended", extendedness = 0 (False). Otherwise, extendedness = 1 (True).
extendedness
2.2.6. Flags¶
Pixel flags
A variety of flags indicating whether saturated pixels, or pixels affected by cosmic rays, contributed to the source's measurements.
pixelFlags_*
Measurement flags
The flux and shape measurements mentioned above have associated flag columns suffixed with _flag.
Blendedness
A measure of how much the flux is affected by neighbors: $1 - \frac{f_{\rm child}}{f_{\rm parent}}$. This uses the absolute value of the instrumental flux to try to obtain a de-noised value. See Section 4.9.11 of Bosch et al. 2018, PASJ, 70.
blendedness_abs
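If the flag columns are included in a query's SELECT list, flagged rows and sky objects can be removed after retrieval. A minimal sketch, assuming an astropy table named results that contains the boolean psfFlux_flag and sky_source columns:
# Sketch: keep rows with a good PSF flux measurement that are not sky objects.
# Assumes 'results' includes the boolean columns psfFlux_flag and sky_source.
good = results[(~results['psfFlux_flag']) & (~results['sky_source'])]
print(len(good))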
2.3. Descriptions and units¶
For a subset of the key columns, show the table of their descriptions and units.
col_list = set(['sourceId', 'visit', 'ra', 'dec', 'psfFlux', 'psfFluxErr', 'extendedness'])
tx = [i for i, item in enumerate(results['column_name']) if item in col_list]
results[tx]
column_name | datatype | description | unit |
---|---|---|---|
str64 | str64 | str512 | str64 |
dec | double | Position in declination. | deg |
extendedness | double | Set to 1 for extended sources, 0 for point sources. | |
psfFlux | double | Flux derived from linear least-squares fit of psf model forced on the calexp | nJy |
psfFluxErr | double | Uncertainty on the flux derived from linear least-squares fit of psf model forced on the calexp | nJy |
ra | double | Position in right ascension. | deg |
sourceId | long | Unique id. Unique Source ID. Primary Key. | |
visit | long | Id of the visit where this source was measured. | |
Clean up.
del col_list, tx, results
3. Data access¶
The Source table is available via the TAP service and the Butler.
Recommended access method: TAP.
3.1. Advisory: avoid full-table queries¶
Avoid full-table queries; always include a constraint on the coordinates or visit.
The Source table is a large, inclusive union of all the measurements made in all the visit images of all detected sources.
The DP1 data release Source table is relatively small; however, skipping spatial constraints is not a good habit to form, because future data release Source tables will contain trillions of rows.
3.2. Advisory: sources are not light curves¶
The Source table contains measurements for sources detected in the individual visit images.
This is not the recommended table to use for variable and transient object light curves.
It is recommended to use the Forced Source and Difference Image Analysis (DIA) tables for light curves.
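As a minimal sketch of where to look next, the tap_schema query pattern from Section 2.1 can be reused to inspect a forced-photometry table. The table name dp1.ForcedSource is an assumption here; confirm the exact table and column names in the Rubin schema browser.
# Sketch: reuse the Section 2.1 pattern to explore a light-curve table's schema.
# The table name 'dp1.ForcedSource' is an assumption; verify it before relying on it.
query = "SELECT column_name, datatype, description, unit " \
        "FROM tap_schema.columns " \
        "WHERE table_name = 'dp1.ForcedSource'"
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
if job.phase == 'ERROR':
    job.raise_if_error()
fs_columns = job.fetch_result().to_table()
print(len(fs_columns))
job.delete()
del query, fs_columns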
3.3. TAP (Table Access Protocol)¶
The Source table is stored in Qserv and accessible via the TAP service using ADQL queries.
Include spatial constraints:
Qserv stores catalog data sharded by coordinate (RA, Dec), so ADQL queries that include coordinate constraints do not require a whole-catalog search and are typically faster (and can be much faster) than queries that only constrain other columns.
Use either an ADQL cone or polygon search for faster queries (do not use WHERE ... BETWEEN statements to set boundaries on RA and Dec).
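For illustration, the two recommended spatial constraints can be written as WHERE clauses like the sketch below. The circle matches the search used in Section 3.3.1; the polygon vertices are arbitrary illustrative values.
# Cone search: sources within 0.00056 deg (2 arcsec) of a position.
cone_where = "WHERE CONTAINS(POINT('ICRS', ra, dec), " \
             "CIRCLE('ICRS', 53.128137, -28.103526, 0.00056)) = 1 "

# Polygon search: sources within a quadrilateral region (illustrative vertices).
poly_where = "WHERE CONTAINS(POINT('ICRS', ra, dec), " \
             "POLYGON('ICRS', 53.12, -28.11, 53.14, -28.11, " \
             "53.14, -28.09, 53.12, -28.09)) = 1 "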
3.3.1. Demo query¶
Define a query to return the seven "key columns" from Section 2.3.
Impose spatial constraints: search within a 2 arcsecond radius of a known object near the center of the Extended Chandra Deep Field South (ECDFS) field, RA, Dec = $53.128137, -28.103526$.
query = "SELECT sourceId, visit, ra, dec, psfFlux, psfFluxErr, extendedness " \
"FROM dp1.Source " \
"WHERE CONTAINS(POINT('ICRS', ra, dec), " \
"CIRCLE('ICRS', 53.128137, -28.103526, 0.00056)) = 1 " \
"ORDER BY sourceId ASC "
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
    job.raise_if_error()
Job phase is COMPLETED
Fetch the results as an astropy table.
assert job.phase == 'COMPLETED'
results = job.fetch_result().to_table()
print(len(results))
726
Option to display the table.
# results
Check if any visits had more than one source detected in the search radius.
values, counts = np.unique(results['visit'], return_counts=True)
tx = np.where(counts > 1)[0]
print('visits with >1 source detected in radius: ', len(tx))
for x in tx:
    print('visit: ', values[x], ' number of sources', counts[x])
del values, counts, tx
visits with >1 source detected in radius:  1
visit:  2024120100201  number of sources 2
As an example, plot the RA and Dec offsets of every source from the search coordinates, in arcseconds.
cos_dec = np.cos(np.deg2rad(28.103526))
results['ra_offset_arcsec'] = (results['ra'] - 53.128137) * 3600.0 * cos_dec
results['dec_offset_arcsec'] = (results['dec'] + 28.103526) * 3600.0
fig, ax = plt.subplots(figsize=(4, 4))
circle = patches.Circle((0, 0), radius=0.00056*3600.0,
facecolor='None', edgecolor='blue')
ax.add_patch(circle)
ax.plot(results['ra_offset_arcsec'],
results['dec_offset_arcsec'],
'o', ms=2, mew=0, alpha=0.3, color='black')
ax.set_aspect(1)
plt.xlabel('RA offset [arcsec]')
plt.ylabel('Dec offset [arcsec]')
plt.tight_layout()
plt.show()
Figure 1: The offset in RA and Dec, in arcseconds, of every source from the search coordinates. A 2" radius circle marks the search region.
Clean up.
job.delete()
del query, results
3.3.2. Joinable tables¶
The Source table can be joined to the Visit table on the column containing the unique visit identifier, visit.
The Visit table contains information about the observation, such as the MJD and band.
To the query used in Section 3.3.1, add a table join to the Visit table and retrieve the columns expMidptMJD and band.
query = "SELECT s.sourceId, s.visit, s.ra, s.dec, v.expMidptMJD, v.band " \
"FROM dp1.Source AS s " \
"JOIN dp1.Visit AS v ON s.visit = v.visit " \
"WHERE CONTAINS(POINT('ICRS', s.ra, s.dec), " \
"CIRCLE('ICRS', 53.128137, -28.103526, 0.00056)) = 1 " \
"AND v.expMidptMJD > 60623.0 AND v.expMidptMJD < 60624.0 " \
"ORDER BY s.sourceId ASC "
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
    job.raise_if_error()
Job phase is COMPLETED
Fetch the results as an astropy table.
assert job.phase == 'COMPLETED'
results = job.fetch_result().to_table()
print(len(results))
62
Similar to Figure 1, plot the RA and Dec offsets of every source from the search coordinates, in arcseconds.
cos_dec = np.cos(np.deg2rad(28.103526))
results['ra_offset_arcsec'] = (results['ra'] - 53.128137) * 3600.0 * cos_dec
results['dec_offset_arcsec'] = (results['dec'] + 28.103526) * 3600.0
fig, ax = plt.subplots(figsize=(4, 4))
for filt in filter_names:
    fx = np.where(results['band'] == filt)[0]
    if len(fx) > 0:
        ax.plot(results['ra_offset_arcsec'][fx],
                results['dec_offset_arcsec'][fx],
                filter_symbols[filt], ms=5, mew=0, alpha=0.7,
                color=filter_colors[filt], label=filt)
ax.set_aspect(1)
plt.xlabel('RA offset [arcsec]')
plt.ylabel('Dec offset [arcsec]')
plt.legend(loc='upper left')
plt.tight_layout()
plt.show()
Figure 2: The offset in RA and Dec, in arcseconds, of every source from the search coordinates, with colors and symbols to represent the three filters (gri) obtained on the night of MJD = 60623.
Clean up.
job.delete()
del query, results
3.4. Butler¶
TAP is the recommended way to access the source table, but the Butler is a convenient way to retrieve all the sources in a given visit.
Show that the dimensions for the source table are the filter, instrument, observation date, and visit, and that only instrument and visit are required.
butler.get_dataset_type('source')
DatasetType('source', {band, instrument, day_obs, physical_filter, visit}, ArrowAstropy)
butler.get_dataset_type('source').dimensions.required
{instrument, visit}
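Since instrument and visit are the only required dimensions, a single visit's source table can also be retrieved with an explicit dataId instead of a dataset reference. A minimal sketch (the visit number is the one shown in Section 3.4.1, and the column subset is illustrative):
# Sketch: fetch one visit's source table directly by dataId.
src = butler.get("source",
                 instrument="LSSTComCam",
                 visit=2024110800245,
                 parameters={"columns": ["sourceId", "ra", "dec", "psfFlux"]})
print(len(src))
del src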
3.4.1. Demo query¶
Include spatial constraints:
The Butler source table contents are stored and retrieved by individual visit.
Retrieve all dataset_refs for source tables for visits that overlap the coordinates near the center of the ECDFS field, and which were obtained on MJD = 60623.
ra = 53.13
dec = -28.10
time1 = Time(60623.0, format="mjd", scale="tai")
time2 = Time(60624.0, format="mjd", scale="tai")
timespan = Timespan(time1, time2)
refs = butler.query_datasets("source",
where="visit.timespan OVERLAPS :timespan AND \
visit_detector_region.region OVERLAPS POINT(:ra, :dec)",
bind={"timespan": timespan,
"ra": ra, "dec": dec})
print(len(refs))
62
When using the Butler, source table data is returned by visit (by dataset reference).
Print the dataId of the first element of refs.
print(refs[0].dataId)
{instrument: 'LSSTComCam', visit: 2024110800245, band: 'i', day_obs: 20241108, physical_filter: 'i_06'}
Define the columns to retrieve.
col_list = ['sourceId', 'ra', 'dec', 'psfFlux', 'psfFluxErr']
Get the data from the Butler.
results = butler.get(refs[0],
parameters={'columns': col_list})
print(len(results))
29534
Option to display the results.
# results
Constrain the results to sources with magnitudes between 20 and 25 mag.
tx = np.where((results['psfFlux'] > 360.0)
& (results['psfFlux'] <= 36000.0))[0]
print(len(tx))
28081
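These flux limits follow from the AB magnitude relation in Section 2.2.4; as a quick check, the bounds evaluate to roughly 25th and 20th magnitude.
# Quick check: m = -2.5 log10(f [nJy]) + 31.4 at the two flux limits.
for f in (360.0, 36000.0):
    print(f, 'nJy corresponds to about', round(-2.5 * np.log10(f) + 31.4, 2), 'mag')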
Plot the sky coordinates and PSF fluxes for the subset of sources.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(7, 3))
ax1.plot(results['ra'][tx], results['dec'][tx],
'o', ms=2, mew=0, alpha=0.1, color='grey')
ax1.set_xlabel('Right Ascension')
ax1.set_ylabel('Declination')
ax2.plot(results['psfFlux'][tx], results['psfFluxErr'][tx],
'o', ms=2, mew=0, alpha=0.1, color='grey')
ax2.set_xlabel('PSF Flux')
ax2.set_ylabel('Error')
plt.tight_layout()
plt.show()
Figure 3: At left, the RA vs. Dec of retrieved sources as grey points. At right, the PSF flux vs. its uncertainty.
del ra, dec, time1, time2, timespan
del refs, col_list, results, tx