308.3. Interactive catalog visualization#


308.3. Solar System interactive catalog visualization¶

For the Rubin Science Platform at data.lsst.cloud.
Data Release: dp1.lsst.io
Container Size: large
LSST Science Pipelines version: r29.2.0
Last verified to run: 2026-02-13
Repository: github.com/lsst/tutorial-notebooks
DOI: 10.11578/rubin/dc.20250909.20

Learning objective: Create interactive visualizations of the Data Preview 1 (DP1) Solar System MPCORB catalog with three open-source Python libraries.

LSST data products: MPCORB, SSObject

Packages: lsst.rsp

Credit: Originally developed by the Rubin Community Science team. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.

Get Support: Everyone is encouraged to ask questions or raise issues in the Support Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.

1. Introduction¶

This notebook examines interactive data visualization for the millions of Solar System objects included in the DP1 MPCORB catalog. The DP1 MPCORB catalog includes a snapshot of all known Solar System objects in the Minor Planet Center (MPC) database as of 24 March 2025, as well as the known object associations and discoveries submitted by Rubin for DP1. The DP1 release includes 430 unique Solar System objects detected by Rubin, 93 of which are discoveries. Because the goal is to demonstrate interactive visualization of millions of data points, this notebook uses the DP1 MPCORB catalog.

The Rubin Science Platform was designed to enable scientific analysis of the LSST data sets, which will be unprecedentedly large and complex. The software and techniques that are best suited for visualizing large data sets might be new to many astronomers. This notebook introduces learners with some knowledge of Python to three open-source Python libraries that enable powerful interactive visualization of catalogs.

HoloViews: Produce high-quality interactive visualizations easily by annotating plots and images rather than using direct calls to a plotting library.
Bokeh: A powerful data visualization library that provides interactive tools including brushing and linking between multiple plots.
Datashader: Accurately render very large datasets quickly and flexibly.

These packages are part of the HoloViz ecosystem of tools intended for visualization in a web browser, and they can be used to create sophisticated dashboard-like interactive displays and widgets. The goal of this tutorial is to provide an introduction and starting point from which to create more advanced, custom interactive visualizations of the DP1 Solar System catalogs. HoloViz has an active community where you can ask questions and discuss visualizations with users around the world.

Notice: If the notebook or any interactive features seem to stall, first try going a few cells back and rerunning them in order (the order in which cells are run is important for this notebook's functionality). If that does not work, try restarting the kernel. If issues persist, try logging out and restarting the Notebook aspect using a "large" instance of the JupyterLab environment.

Warning: It is not recommended to "Restart Kernel and Run All Cells" in this notebook, or to execute multiple cells very quickly. Some of the examples require interaction (e.g., for the user to select points on a graph) in order to run correctly, and going too fast can cause some plots to not display properly.

Related tutorials: The 100-level tutorials demonstrate how to use the TAP service. The 200-level tutorials introduce the types of catalog data.

1.1. Import packages¶

Import numpy, a fundamental package for scientific computing with arrays in Python (numpy.org), pandas, a package for working with tabular data, and matplotlib, a comprehensive library for data visualization (matplotlib.org; matplotlib gallery).

From the lsst package, import the function for accessing the Table Access Protocol (TAP) service.

Import bokeh, holoviews, and datashader, packages for data visualization, and their various functions.

In [1]:

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from lsst.rsp import get_tap_service

from math import pi

import bokeh
from bokeh.io import output_notebook, show, output_file, reset_output
from bokeh.models import ColumnDataSource, Range1d, HoverTool, CDSView
from bokeh.plotting import figure, gridplot
from bokeh.transform import factor_cmap, cumsum
from bokeh.palettes import Colorblind

import holoviews as hv
from holoviews import streams
from holoviews.operation.datashader import datashade, dynspread
from holoviews.plotting.util import process_cmap

import datashader as dsh

Show which versions of Bokeh, HoloViews, and Datashader we are working with. This is important when referring to online documentation as APIs can change between versions.

In [2]:

print("Bokeh version: " + bokeh.__version__)
print("HoloViews version: " + hv.__version__)
print("Datashader version: " + dsh.__version__)

Bokeh version: 3.8.2
HoloViews version: 1.22.1
Datashader version: 0.18.2

1.2. Define parameters and functions¶

Create an instance of the TAP service, and assert that it exists.

In [3]:

service = get_tap_service("tap")
assert service is not None

Define parameters to use colorblind-friendly colors with matplotlib.

In [4]:

plt.style.use('seaborn-v0_8-colorblind')
prop_cycle = plt.rcParams['axes.prop_cycle']
colors = prop_cycle.by_key()['color']

Update the maximum number of display rows for Pandas tables.

In [5]:

pd.set_option('display.max_rows', 5)

Notice: The ordering of the next two cells is important for ensuring the plots in this notebook render properly.

Set the display of the output Bokeh plots to be inline, in the notebook.

In [6]:

output_notebook()

Loading BokehJS ...

Set the HoloViews plotting library to be bokeh. The HoloViews and Bokeh icons are displayed when the library is loaded successfully.

In [7]:

hv.extension('bokeh')

ⓘ

Notice: Sometimes the bokeh.io.show function can be finicky when output modes are switched (e.g., from inline to an HTML file and back again).

To avert a "Models must be owned by only a single document" error (see, e.g., https://github.com/bokeh/bokeh/issues/8579), define the following two functions and use them in Section 4.

In [8]:

def show_bokeh_inline(p):
    try:
        reset_output()
        output_notebook()
        show(p)
    except Exception:
        output_notebook()
        show(p)

In [9]:

def show_bokeh_to_file(p, outputfile):
    try:
        reset_output()
        output_file(outputfile)
        show(p)
    except Exception:
        output_file(outputfile)
        show(p)

Define a function to convert a given perihelion distance ($q$) and eccentricity ($e$) to an orbital semimajor axis ($a$). Their relationship is defined by $q = a * (1 - e)$.

In [10]:

def calc_semimajor_axis(q, e):
    """
    Given a perihelion distance and orbital eccentricity,
    calculate the semi-major axis of the orbit.

    Parameters
    ----------
    q: ndarray
        Distance at perihelion, in au.
    e: ndarray
        Orbital eccentricity.

    Returns
    -------
    a: ndarray
        Semi-major axis of the orbit, in au.
        q = a(1-e), so a = q/(1-e)
    """

    return q / (1.0 - e)

Define a function to convert a given perihelion distance ($q$) and eccentricity ($e$) to an aphelion distance ($Q$). Their relationship is defined by $Q = q * (1 + e) / (1 - e)$.

In [11]:

def calc_aphelion(q, e):
    """
    Given a perihelion distance and orbital eccentricity,
    calculate the aphelion distance of the orbit.

    Parameters
    ----------
    q: ndarray
        Distance at perihelion, in au.
    e: ndarray
        Orbital eccentricity.

    Returns
    -------
    Q: ndarray
        Distance at aphelion, in au.
        Q = q*(1+e)/(1-e)
    """

    return q * (1.0 + e) / (1.0 - e)
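
As a quick check of the two relations above, the sketch below evaluates them for a single made-up orbit ($q = 1.5$ au, $e = 0.25$; illustrative values, not catalog data) and confirms that perihelion and aphelion are symmetric about the semimajor axis, $q + Q = 2a$. The helper names are hypothetical stand-ins for the notebook functions so that the block runs on its own.

```python
import math


def semimajor_axis(q, e):
    """Semimajor axis a = q / (1 - e), in the units of q."""
    return q / (1.0 - e)


def aphelion(q, e):
    """Aphelion distance Q = q * (1 + e) / (1 - e), in the units of q."""
    return q * (1.0 + e) / (1.0 - e)


# Illustrative orbit (assumed values, not taken from the catalog).
q_example, e_example = 1.5, 0.25

a_example = semimajor_axis(q_example, e_example)
Q_example = aphelion(q_example, e_example)

print(f"a = {a_example:.3f} au, Q = {Q_example:.3f} au")

# Consistency checks: Q = a(1 + e) and q + Q = 2a.
assert math.isclose(Q_example, a_example * (1.0 + e_example))
assert math.isclose(q_example + Q_example, 2.0 * a_example)
```

Both expressions divide by $(1 - e)$, so they are only meaningful for bound orbits with $e < 1$.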

Define the orbital parameter boundaries for querying the DP1 MPCORB catalog for all main-belt asteroids (MBAs).

In [12]:

a_mba_min = 1.8
a_mba_max = 3.7
q_mba_min = 1.3
e_mba_max = 1.0

2. Extract main-belt asteroids from `MPCORB` catalog data¶

Query the DP1 MPCORB table for the orbital parameters of all objects that satisfy the boundaries defined above, to obtain a sample of main-belt asteroids with semimajor axis 1.8 < $a$ < 3.7 au, eccentricity $e$ < 1.0, and perihelion distance $q$ > 1.3 au.

In [13]:

query = """
SELECT
    mpc.ssObjectId, mpc.mpcDesignation, mpc.epoch,
    mpc.q, mpc.e, mpc.incl, mpc.node, mpc.peri, mpc.mpcH
FROM
    dp1.MPCORB as mpc
WHERE mpc.q/(1.0-mpc.e) > {}
        AND mpc.q/(1.0-mpc.e) < {}
        AND mpc.e < {}
        AND mpc.q > {}
        ORDER by mpc.ssObjectId
""".format(a_mba_min, a_mba_max, e_mba_max, q_mba_min)

In [14]:

job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
    job.raise_if_error()

Job phase is COMPLETED
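
The selection expressed in the query's WHERE clause can be reproduced locally as a boolean mask, which is a handy way to sanity-check the cuts. The sketch below applies the same four conditions to a handful of made-up ($q$, $e$) pairs with numpy; the array values are illustrative assumptions, not catalog rows.

```python
import numpy as np

# Same limits as the query parameters in this notebook.
a_min, a_max = 1.8, 3.7   # semimajor axis limits, au
q_min = 1.3               # minimum perihelion distance, au
e_max = 1.0               # maximum eccentricity (bound orbits)

# Made-up (q, e) pairs for illustration only.
q = np.array([1.70, 2.20, 1.10, 3.00, 1.35])
e = np.array([0.10, 0.15, 0.50, 0.30, 0.05])

# The semimajor axis is derived from q and e, as in the WHERE clause.
a = q / (1.0 - e)
is_mba = (a > a_min) & (a < a_max) & (e < e_max) & (q > q_min)

print(np.round(a, 3))
print(is_mba)
```

Only the first two toy objects pass: the third has $q$ below 1.3 au, the fourth has $a$ above 3.7 au, and the fifth has $a$ below 1.8 au.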

Fetch the job results and assign to an astropy result table. There are 1,361,726 Solar System objects in DP1 within the main asteroid belt orbital parameters defined above.

In [15]:

assert job.phase == 'COMPLETED'
result = job.fetch_result()
print(len(result))

Convert the result astropy table to a pandas dataframe.

In [16]:

result_df = pd.DataFrame(result)

Calculate the semimajor axis $a$ for all objects in the result_df dataframe and add as a new column.

In [17]:

semi_axis = calc_semimajor_axis(result_df['q'], result_df['e'])
result_df['a'] = semi_axis

Calculate the aphelion distance $Q$ for all objects in result_df dataframe and add as a new column.

In [18]:

aphelion = calc_aphelion(result_df['q'], result_df['e'])
result_df['Q'] = aphelion

Define the conditions for assigning MBAs to their dynamical classifications (Note: the conditions used here are simplified to only include limits in semimajor axis and not eccentricity or inclination for the purposes of this tutorial).

In [19]:

MBApop_conditions = [
    (result_df['a'] >= 1.8) & (result_df['a'] < 2.0),
    (result_df['a'] >= 2.0) & (result_df['a'] < 2.5),
    (result_df['a'] >= 2.5) & (result_df['a'] < 2.82),
    (result_df['a'] >= 2.82) & (result_df['a'] < 3.25),
    (result_df['a'] >= 3.25) & (result_df['a'] <= 3.7)
]

Define the names of the MBA dynamical classes as they correspond to the MBApop_conditions above.

In [20]:

MBApop_types = ['Hungaria', 'Inner Belt',
                'Middle Belt', 'Outer Belt',
                'Cybele']

Determine the MBA population for each object in result_df and add as a new column.

In [21]:

result_df['MBApop'] = np.select(
    MBApop_conditions, MBApop_types,
    default="Unknown")
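
To see how `np.select` maps conditions to labels, the sketch below applies the same semimajor-axis limits to six made-up values. Each element receives the label of the first condition it satisfies, and elements that satisfy none receive the default. The input values are assumptions for illustration.

```python
import numpy as np

a = np.array([1.9, 2.3, 2.7, 3.0, 3.5, 4.2])  # made-up semimajor axes, au

conditions = [
    (a >= 1.8) & (a < 2.0),
    (a >= 2.0) & (a < 2.5),
    (a >= 2.5) & (a < 2.82),
    (a >= 2.82) & (a < 3.25),
    (a >= 3.25) & (a <= 3.7),
]
labels = ['Hungaria', 'Inner Belt', 'Middle Belt', 'Outer Belt', 'Cybele']

# The last value (4.2 au) matches no condition and gets the default.
populations = np.select(conditions, labels, default='Unknown')
print(populations)
```

Because the query already restricts $a$ to the range 1.8 to 3.7 au, the "Unknown" label is not expected to appear in the catalog sample.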

Option to print the result.

In [22]:

# result_df

In [23]:

del semi_axis, aphelion, MBApop_conditions, MBApop_types

Display counts of objects in each MBA population.

In [24]:

result_df['MBApop'].value_counts()

Out[24]:

MBApop
Middle Belt    496867
Outer Belt     447515
Inner Belt     373309
Hungaria        35492
Cybele           8543
Name: count, dtype: int64

To visualize the fraction of MBAs in each population, create a pie chart.

In [25]:

pop_counts = result_df['MBApop'].value_counts().to_dict()
data = pd.Series(pop_counts).reset_index(name='value').rename(
    columns={'index': 'subpop'})
data['angle'] = data['value']/data['value'].sum() * 2*pi
data['color'] = Colorblind[len(pop_counts)]

SSpop = figure(height=350, title="Populations",
               toolbar_location=None, tools="hover",
               tooltips="@subpop: @value", x_range=(-0.5, 1.0))

SSpop.wedge(x=0, y=1, radius=0.4,
            start_angle=cumsum('angle', include_zero=True),
            end_angle=cumsum('angle'),
            line_color="white", fill_color='color',
            legend_field='subpop', source=data)

SSpop.axis.axis_label = None
SSpop.axis.visible = False
SSpop.grid.grid_line_color = None

show_bokeh_inline(SSpop)

Loading BokehJS ...

Figure 1: Pie chart showing the fraction of objects in each MBA population.
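
The wedge geometry behind the pie chart is simple arithmetic: each population's angle is its share of the total times $2\pi$, and a running sum of those angles gives the start and end of each wedge. The sketch below reproduces that calculation in plain Python using the counts displayed above; it is an illustration of the arithmetic, not of Bokeh's internal implementation.

```python
from itertools import accumulate
from math import isclose, pi

# Population counts as displayed earlier in this notebook.
counts = {'Middle Belt': 496867, 'Outer Belt': 447515,
          'Inner Belt': 373309, 'Hungaria': 35492, 'Cybele': 8543}

total = sum(counts.values())
angles = [n / total * 2 * pi for n in counts.values()]

# A running sum gives each wedge's end angle; the start angle is
# the previous wedge's end angle (zero for the first wedge).
end_angles = list(accumulate(angles))
start_angles = [0.0] + end_angles[:-1]

for name, start, end in zip(counts, start_angles, end_angles):
    print(f"{name:12s} {start:6.3f} -> {end:6.3f} rad")

# The final wedge closes the circle.
assert isclose(end_angles[-1], 2 * pi)
```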

3. HoloViews¶

HoloViews supports easy analysis and visualization by annotating data rather than utilizing direct calls to plotting packages. This tutorial uses Bokeh as the plotting library backend for HoloViews. HoloViews supports several plotting libraries, and exercise 1 in Section 5 is to explore using HoloViews with other plotting packages.

Create a random subsample of roughly 20,000 MBAs from result_df to use to demonstrate some basic HoloViews functionality. Print the length of this subset and confirm that it contains roughly 20K objects.

In [26]:

frac = 0.015
data20k_mbas = result_df.sample(frac=frac, axis='index')
print(len(data20k_mbas))
assert len(data20k_mbas) == round(frac * len(result_df))
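
The subsample above is drawn at random, so it differs each time the cell is run. The sketch below uses a small stand-in dataframe to show that `sample(frac=...)` returns the rounded fraction of rows, and that passing `random_state` makes the draw repeatable. The `random_state` argument and the toy dataframe are additions for this illustration; the notebook itself does not fix a seed.

```python
import pandas as pd

toy_df = pd.DataFrame({'value': range(2000)})  # stand-in for result_df
frac = 0.015

# random_state is an addition for this sketch; it makes the draw repeatable.
subset_1 = toy_df.sample(frac=frac, random_state=42)
subset_2 = toy_df.sample(frac=frac, random_state=42)

print(len(subset_1))
print(subset_1.index.equals(subset_2.index))
```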

3.1. Single plots¶

The basic core primitives of HoloViews are Elements (hv.Element). Elements are simple wrappers for data which provide a semantically meaningful visual representation. An Element may be a set of Points, an Image, a Curve, a Histogram, etc. See the HoloViews Reference Gallery for all the various types of Elements that can be created with HoloViews.

The example in this section uses the HoloViews Scatter Element to quickly visualize the catalog data as a scatter plot.

Instead of subsetting a dataset to choose which columns to plot, HoloViews allows the user to specify the dimensionality directly. kdims are the key dimensions or the independent variable(s) and vdims are the value dimensions or the dependent variable(s). The dimensions have to be specified as strings as below, but they are in fact rich objects. Dimension objects support a long descriptive label, which complements the short programmer-friendly name.

HoloViews maintains a strict separation between content and presentation. This separation is achieved by maintaining sets of keyword values as options that specify how Elements are to appear.

This example plots the semimajor axes and eccentricities with chosen x-axis limits, fontscale, plot height and width, and removes the toolbar.

Make a simple scatter plot of the roughly 20,000 MBAs in the subsample using the Scatter Element.

In [27]:

aeplot = hv.Scatter(data20k_mbas, kdims=['a'],
                    vdims=['e']).options(xlim=(0., 6.), toolbar=None,
                                         fontscale=1.2, height=350, width=350)

In [28]:

aeplot

Out[28]:

Figure 2: A non-interactive plot of semimajor axis $a$ vs. eccentricity $e$ appears as a blue circle composed of individual, but mostly blended, blue dots.

The data20k_mbas set contains several columns. If no columns are specified explicitly, the first 2 columns are taken for x and y respectively by the Scatter Element.

Now bin the data in $H$ magnitude using the robust Freedman-Diaconis estimator, plot the resulting distribution using the HoloViews Histogram Element, and add some basic plot options. Read more about customizing plots via options. Note that options can be shortened to opts.

In [29]:

# np.histogram returns the counts first, then the bin edges.
count, H_bin = np.histogram(data20k_mbas['mpcH'], bins='fd')
H_distribution = hv.Histogram((H_bin, count)).opts(
    xlim=(10., 24.),
    title="H Magnitude Distribution", color='darkmagenta',
    xlabel='H mag', fontscale=1.2,
    height=400, width=400)

In [30]:

H_distribution

Out[30]:

Figure 3: A histogram (bar chart in purple) of the number of objects in a given $H$ magnitude bin in the subset of twenty thousand objects. This plot is interactive and displays a tool bar at right for the user to, e.g., zoom in on the plot.
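
The Freedman-Diaconis rule sets the bin width from the interquartile range (IQR) and the sample size $n$ as $2\,\mathrm{IQR}\,n^{-1/3}$. The sketch below computes that width by hand for synthetic data and compares the implied number of bins with what `np.histogram(..., bins='fd')` produces; the two are expected to agree. The normally distributed magnitudes are an assumption for illustration, not catalog values.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the H magnitudes (not catalog values).
h_mag = rng.normal(loc=17.0, scale=1.5, size=5000)

# np.histogram returns the counts first, then the bin edges.
counts, edges = np.histogram(h_mag, bins='fd')

# Freedman-Diaconis bin width: 2 * IQR * n**(-1/3).
q75, q25 = np.percentile(h_mag, [75, 25])
fd_width = 2.0 * (q75 - q25) * h_mag.size ** (-1.0 / 3.0)
expected_bins = int(np.ceil((h_mag.max() - h_mag.min()) / fd_width))

print(len(counts), expected_bins)
```

Note that there is always one more bin edge than there are bins, which is why the counts and edges arrays differ in length by one.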

In [31]:

del H_bin, count

3.2. Layouts of unlinked plots¶

Create a layout of several plots. A Layout is a type of Container that can contain any HoloViews object. Other types of Containers include Overlay, GridSpace, DynamicMap, etc. See the HoloViews Reference Gallery for the full list of Containers that can be created with HoloViews. See Building Composite Objects for the full details about the ways Containers can be composed.

Slice the data and set some more options, and then construct a layout using the + operator.

In [32]:

aeplot = hv.Scatter(data20k_mbas, kdims=['a'],
                    vdims=['e']).opts(
    title="Semimajor Axis vs Eccentricity",
    toolbar='above', tools=['hover'],
    height=350, width=350, alpha=0.2,
    size=2)

OrbHPlots = aeplot + H_distribution.options(height=350, width=350)

In [33]:

OrbHPlots

Out[33]:

Figure 4: Two side-by-side plots, Fig 2 from above on the left with the hover tool, and Fig 3 from above on the right, with the interactive toolbar at upper right.

Note that the two plots above are not linked; they are two independent plots laid out next to each other.

Zoom in on the Semimajor Axis vs Eccentricity plot and notice that the data are not rebinned in the H Magnitude Distribution plot. Linking plots is demonstrated below.

The tools, however, do apply to both plots. Try modifying both plots and then use the "reset" tool (the circular arrow symbol). Notice that both plots are reset to their original states.

In [34]:

del aeplot, H_distribution, OrbHPlots

3.3. Layouts of linked plots¶

Set up some default plot options to avoid duplicating long lists for every new plot.

Different plotting packages typically provide different customization capabilities. Below, one set of options is defined for a Bokeh backend, and one for a matplotlib backend.

Set Bokeh customizations as a Python dictionary.

In [35]:

plot_style_bkh = dict(alpha=0.5, color='darkmagenta',
                      marker='triangle', size=3,
                      xticks=5, yticks=5,
                      height=400, width=400,
                      toolbar='above')

Set matplotlib customizations.

In [36]:

plot_style_mpl = dict(alpha=0.2, color='c', marker='s',
                      fig_size=200, s=2,
                      fontsize=14, xticks=8, yticks=8)

Choose to use the Bokeh plot style.

In [37]:

plot_style = plot_style_bkh

Below, create a semimajor axis versus eccentricity plot of the Solar System objects in the dataset, and also display the distribution of samples along both value dimensions using the hist() method of the Scatter Element.

Set the axes as rich objects.

In [38]:

semi = hv.Dimension('a', label='a', range=(1.7, 3.8))
ecc = hv.Dimension('e', label='e', range=(0.0, 0.65))

Create the scatter plot.

In [39]:

a_e = hv.Scatter(result_df, kdims=semi,
                 vdims=ecc).opts(**plot_style)

Use the hist method to show the distribution of samples along both value dimensions.

In [40]:

a_e = a_e.hist(dimension=[semi, ecc],
               num_bins=10, adjoin=True)

In [41]:

a_e

Out[41]:

Figure 5: The semimajor axis versus eccentricity (purple triangles), with histograms (blue bar charts) of the distributions above and to the right. The interactive tool bar is at upper right.

In [42]:

del semi, ecc, a_e

4. Bokeh¶

A very useful feature of Bokeh is the ability to add connected interactivity between plots that show different attributes of the same data. This is called linking.

With linked plots it is possible to carry out data brushing, whereby data can be selected and manipulated synchronously across multiple linked plots.

For example, if an orbital element plot is linked with a colour-magnitude diagram of the same dataset, it becomes possible to interactively explore the relationship between the positions of objects in each plot.

This section uses the Bokeh plotting library to demonstrate how to set up brushing and linking between two panels showing different representations of the same dataset. A selection applied to either panel will highlight the selected points in the other panel.

This section is based on Bokeh linked brushing.

4.1. Data preparation¶

Getting the data preparation phase right is key to creating powerful visualizations. Bokeh works with a ColumnDataSource (CDS). A CDS is essentially a collection of sequences of data that have their own unique column name.

The CDS is the core of Bokeh plots. Bokeh automatically creates a CDS from data passed as Python lists or numpy arrays. A CDS is useful because it allows data to be shared between multiple plots and renderers, enabling brushing and linking.

Below, a CDS is created from the data returned by the query above and passed directly to Bokeh.

Create a CDS for the plots to share. The columns x0 and y0 are used for the left plot, and x1 and y1 for the right plot.

In [43]:

col_data = dict(x0=data20k_mbas['a'],
                y0=data20k_mbas['e'],
                x1=data20k_mbas['a'],
                y1=data20k_mbas['incl'],
                a=data20k_mbas['a'],
                ecc=data20k_mbas['e'],
                incl=data20k_mbas['incl'],
                Hmag=data20k_mbas['mpcH'],
                MBApop=data20k_mbas['MBApop'])
source_mbas = ColumnDataSource(data=col_data)

Additional data can be added to the CDS after creation.

In [44]:

source_mbas.data['mpcDesignation'] = data20k_mbas['mpcDesignation']
source_mbas.data['q'] = data20k_mbas['q']

The pointsize used for plotting can be defined based on a given parameter. Define the conditions for assigning a point size for each Solar System object to be plotted based on its $H$ magnitude.

In [45]:

pointconditions_from_h = [
    (data20k_mbas['mpcH'] <= 5.),
    (data20k_mbas['mpcH'] <= 10.) & (data20k_mbas['mpcH'] > 5.),
    (data20k_mbas['mpcH'] <= 15.) & (data20k_mbas['mpcH'] > 10.),
    (data20k_mbas['mpcH'] <= 20.) & (data20k_mbas['mpcH'] > 15.),
    (data20k_mbas['mpcH'] <= 25.) & (data20k_mbas['mpcH'] > 20.),
    (data20k_mbas['mpcH'] > 25.)
]

Define the point sizes for the given $H$ magnitude ranges in pointconditions_from_h above.

In [46]:

pointsize_from_h = [12, 10, 8, 6, 4, 2]

Determine the point size bin for each object and add it to the source.

In [47]:

pointsize_bin = np.select(pointconditions_from_h, pointsize_from_h)
source_mbas.data['pointsize_bin'] = pointsize_bin
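
An equivalent way to express the same binning, shown here only as an alternative sketch, is `np.digitize` with the bin edges listed once. The $H$ magnitudes below are made up, and the variable names are hypothetical.

```python
import numpy as np

# Made-up H magnitudes for illustration only.
h_mag = np.array([3.0, 5.0, 7.5, 10.0, 12.0, 18.0, 22.0, 25.0, 30.0])

bin_edges = np.array([5.0, 10.0, 15.0, 20.0, 25.0])
sizes = np.array([12, 10, 8, 6, 4, 2])

# right=True makes each bin closed on its upper edge, matching the
# "<=" comparisons: index 0 for H <= 5, ..., index 5 for H > 25.
size_index = np.digitize(h_mag, bin_edges, right=True)
point_size = sizes[size_index]
print(point_size)
```

Brighter objects (smaller $H$) receive larger points, as in the condition list above.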

Create a "points" view.

In [48]:

points = CDSView()

In [49]:

del col_data, pointconditions_from_h, pointsize_from_h, pointsize_bin

4.2. Linked plots with data brushing¶

Use Bokeh to plot semimajor axis vs eccentricity and semimajor axis vs inclination, and then link them.

Create a custom hover tool for each panel.

In [50]:

hover_left = HoverTool(tooltips=[("mpcDesignation", "@mpcDesignation"),
                                 ("MBApop", "@MBApop"),
                                 ("(a,e,inc,H)", "(@a, @ecc, @incl, @Hmag)")
                                 ])
hover_right = HoverTool(tooltips=[("mpcDesignation", "@mpcDesignation"),
                                  ("MBApop", "@MBApop"),
                                  ("(a,e,inc,H)", "(@a, @ecc, @incl, @Hmag)")
                                  ])
tools = "box_zoom,box_select,lasso_select,reset,help"
tools_left = [hover_left, tools]
tools_right = [hover_right, tools]

Create a new two-panel plot and add a renderer. Use the "points" view defined above.

Create the left-side plot.

In [51]:

left = figure(tools=tools_left, width=400, height=400,
              title='Semimajor Axis vs Eccentricity')
left.scatter('x0', 'y0', hover_color='firebrick', selection_fill_color='black',
             size='pointsize_bin', alpha=0.8, source=source_mbas, view=points)

left.x_range = Range1d(1.7, 3.8)
left.y_range = Range1d(0., 0.64)
left.xaxis.axis_label = 'Semimajor Axis (au)'
left.yaxis.axis_label = 'Eccentricity'

Create the right-side plot.

In [52]:

right = figure(tools=tools_right, width=400, height=400,
               title='Semimajor Axis vs Inclination')
right.scatter('x1', 'y1', hover_color='firebrick', selection_fill_color='black',
              size='pointsize_bin', alpha=0.7, source=source_mbas, view=points)

right.x_range = Range1d(1.7, 3.8)
right.y_range = Range1d(0., 50.)
right.xaxis.axis_label = 'Semimajor Axis (au)'
right.yaxis.axis_label = 'Inclination (deg)'

Display the grid of plots. This can take a moment to render.

In [53]:

p = gridplot([[left, right]])
show_bokeh_inline(p)

Loading BokehJS ...

Figure 6: At left, Solar System object semimajor axis vs eccentricity, and at right, Solar System object semimajor axis vs inclination. All points are blue circles sized according to their $H$ magnitudes and the interactive toolbar is at upper right.

Use the hover tool to see information about individual data points (e.g., the "mpcDesignation"). This information should appear automatically when the mouse is hovered over the data points. Notice the data points highlighted in red on one panel with the hover tool are also highlighted on the other panel.

Next, click on the selection box icon or the selection lasso icon found in the upper right corner of the figure. Use the selection box and selection lasso to make various selections in either panel by clicking and dragging on either panel. The selected data points will be displayed in the other panel.

4.3. Output to interactive HTML file¶

Output this interactive plot to an interactive HTML file.

Define the output file name to store it in the home directory (~/).

In [54]:

output_dir = os.path.expanduser('~')
output_filename = 'DP1_advanced_plots_plot1.html'
outputfile = os.path.join(output_dir, output_filename)

print('The full pathname of the interactive HTML file will be '+outputfile)

The full pathname of the interactive HTML file will be /home/sarahgreenstreet/DP1_advanced_plots_plot1.html

Use the bokeh.io.show method, embedded in the show_bokeh_to_file function defined in Section 1.2 above, to output the interactive HTML file.

In [55]:

show_bokeh_to_file(p, outputfile)

To view the interactive HTML file navigate to it via the directory listing in the left-hand side panel of this Jupyter notebook graphical interface, and click on its name. This will open another tab containing the HTML file.

Since the file is relatively small (about 3.6 MB) it should load quickly (within a few seconds). Once loading is complete, click on the "Trust HTML" button at the top-left of the tab's window. Then, a near-duplicate of the two linked plots above should be displayed.

It is also possible to download the HTML file and interact with it in a browser as a local file.

In [56]:

del p

4.4. Linked streams¶

To do subsequent calculations with the set of selected points, it is possible to use HoloViews linked streams for custom interactivity. The following visualization is a modification of this example.

As for the example above, the plots generated below use the selection box and selection lasso to choose data points on the left panel, and then the selected points appear in the right panel.

Notice that as the selection is changed in the left panel, the mean x- and y-values for selected data points are shown in the title of the right panel.

This section is based on HoloViews Selection1D points.

Declare some points, and define a selection for the stream.

In [57]:

points = hv.Points((data20k_mbas['a'],
                    data20k_mbas['e'])).options(
    tools=['box_select', 'lasso_select'])

selection = streams.Selection1D(source=points)

Define a function that uses the selection indices to slice points and compute statistics. This function is defined here (and not in Section 1.2) because it is only used in the subsequent cell (i.e., it is not a globally defined and used function).

In [58]:

def selected_info(index) -> hv.Points:
    selected = points.iloc[index]
    if index:
        label = 'Mean a, e: %.3f, %.3f' % tuple(
            selected.array().mean(axis=0))
    else:
        label = 'No selection'
    return selected.relabel(label).options(color='red')
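
The statistic computed by the callback is a column-wise mean over the selected rows. The sketch below isolates that step with a plain numpy array so it can be tried without any plotting; the four ($a$, $e$) pairs and the function name are hypothetical.

```python
import numpy as np

# Made-up (a, e) pairs standing in for the plotted points.
points_array = np.array([[2.2, 0.10],
                         [2.6, 0.20],
                         [3.1, 0.05],
                         [2.4, 0.30]])


def describe_selection(index):
    """Return a title string for the rows listed in `index`."""
    if not index:
        # An empty selection has no mean, so report that instead.
        return 'No selection'
    mean_a, mean_e = points_array[index].mean(axis=0)
    return 'Mean a, e: %.3f, %.3f' % (mean_a, mean_e)


print(describe_selection([]))
print(describe_selection([0, 3]))
```

Checking for an empty index before taking the mean avoids computing a mean over zero rows.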

Combine "points" and DynamicMap. Notice the syntax used here, how the "+" sign makes side-by-side panels.

In [59]:

points.opts(xlabel='a', ylabel='e') + hv.DynamicMap(
    selected_info, streams=[selection]).opts(xlabel='a',
                                             ylabel='e')

Out[59]:

Figure 7: At left, the Solar System object semimajor axis vs eccentricity. The plot at right has the same axes, but will be empty until the action below is taken to select points in the left plot. The interactive toolbar is at upper right.

Use the lasso- or box-select to select a region in the left-hand plot.

Print the number of objects selected.

In [60]:

print(len(selection.index))

In [61]:

del points

5. Datashader¶

The interactive features of Bokeh work well with datasets up to a few tens of thousands of data points. To efficiently explore larger datasets it is recommended to use another visualization model that offers better scalability: Datashader.

The examples below will show how, when zooming in on the datashaded two-dimensional histograms, the bin sizes are dynamically adjusted to show finer or coarser granularity in the distribution. This allows for the interactive exploration of large datasets without having to manually adjust the bin sizes while panning and zooming.

Zoom in all the way to see individual points (i.e., bins contain either zero or one count). Zoom in far enough and the individual points are represented by extremely small pixels in datashader that are difficult to see. A solution is to wrap the datashade output in dynspread, which preserves a finite size for the plotted points.

5.1. Plotting thousands of data points¶

Plot semimajor axis versus eccentricity using Bokeh with a customized hover tool.

In [62]:

plot_options = {'height': 400, 'width': 800,
                'tools': ['pan', 'box_zoom', 'box_select',
                          'wheel_zoom', 'reset', 'help']}

p = figure(title="Semimajor Axis vs Eccentricity",
           x_axis_label="Semimajor Axis (au)", y_axis_label="Eccentricity",
           x_range=(1.7, 3.8), y_range=(0., 0.64),
           **plot_options)
p.scatter(x='a', y='ecc', source=source_mbas,
          size='pointsize_bin', alpha=0.2,
          hover_color='firebrick',
          legend_field="MBApop",
          color=factor_cmap('MBApop', 'Category10_5',
                            ['Hungaria', 'Inner Belt', 'Middle Belt',
                             'Outer Belt', 'Cybele']))

hover = HoverTool(tooltips=[("mpcDesignation", "@mpcDesignation"),
                            ("Class", "@MBApop"),
                            ("(a, ecc, inc, H)", "(@a, @ecc, @incl, @Hmag)")])
p.add_tools(hover)

In [63]:

show_bokeh_inline(p)

Loading BokehJS ...

Figure 8: The semimajor axis vs. eccentricity for the Solar System objects, with Hungarias in blue, Inner Belt objects in orange, Middle Belt objects in green, Outer Belt objects in red, and Cybeles in purple. The interactive tool bar is at right.

In [64]:

del p

The plot above suffers from overplotting (confusion), even though the dataset only contains ~20K points. A classic strategy to visualize the dense areas is to specify the transparency (alpha) of the glyphs; in the plot above, alpha=0.2 was used. This has helped, but washes out the detail in the sparser regions.

HoloViews + Datashader allow millions to billions of points to be plotted, and this produces much more informative plots. Datashader rasterizes or aggregates datasets into regular grids that can then be further analysed or viewed as plots or images.
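
The idea of aggregating points onto a regular grid can be illustrated without Datashader. The sketch below uses `np.histogram2d` as a simple stand-in: one million synthetic points are reduced to an 800 x 300 grid of counts, so the amount of data to display depends on the grid size rather than on the number of points. This is not how Datashader is implemented; the data and grid dimensions are assumptions chosen to resemble the plots in this section.

```python
import numpy as np

rng = np.random.default_rng(1)
n_points = 1_000_000
# Synthetic (a, e) values; the distributions are arbitrary.
a = rng.uniform(1.8, 3.7, n_points)
e = rng.uniform(0.0, 0.6, n_points)

# Aggregate onto an 800 x 300 pixel grid, one count per pixel bin.
width_px, height_px = 800, 300
grid, a_edges, e_edges = np.histogram2d(
    a, e, bins=(width_px, height_px),
    range=((1.7, 3.8), (0.0, 0.64)))

print(grid.shape, int(grid.sum()))
```

A colormap applied to the counts in `grid` then conveys the local density, which is what the shades of color in a datashaded plot represent.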

Create a HoloViews object "points" to hold and plot data.

In [65]:

points = hv.Points((source_mbas.to_df()['a'], source_mbas.to_df()['ecc']))

Create the linked streams instance.

In [66]:

boundsxy = (0, 0, 0, 0)
box = streams.BoundsXY(source=points, bounds=boundsxy)
bounds = hv.DynamicMap(lambda bounds: hv.Bounds(bounds), streams=[box])

Apply the datashader.

In [67]:

p = dynspread(datashade(points, cmap="Viridis"))
p = p.opts(width=800, height=300, padding=0.05, show_grid=True,
           xlim=(1.7, 3.8), ylim=(0., 0.64),
           xlabel="Semimajor Axis (au)",
           ylabel="Eccentricity",
           tools=['box_select'])

Render the datashaded plot.

In [68]:

p * bounds

Out[68]:

Figure 9: The MBA semimajor axis vs. eccentricity as in Fig 8, but displayed as a 2-dimensional density map (with a purple-green-yellow colormap) in regions of high density, and as individual points in regions of low density. Interactive toolbar at right.

This datashaded plot of the same semimajor axis vs eccentricity diagram as above does not require any magic-number parameters such as size and alpha and automatically ensures that there is no saturation or overplotting.

Above, select the wheel zoom and adjust the image to interact with the plot. Note how the shades of color of the data points change according to the local density.
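One reason the shading adapts to local density is that the per-pixel counts are normalized before colors are assigned. The sketch below assumes a histogram-equalized normalization is in effect for the plot above and uses a simplified rank-based version of it; the function `rank_equalize` is hypothetical and is not Datashader's actual algorithm. It contrasts linear scaling with rank-based scaling for a set of counts spanning several orders of magnitude.

```python
import numpy as np


def rank_equalize(counts):
    """Map nonzero counts to (0, 1] by rank among the distinct nonzero values.

    A simplified, illustrative stand-in for histogram equalization.
    """
    counts = np.asarray(counts, dtype=float)
    scaled = np.zeros_like(counts)
    nonzero = counts > 0
    values = np.unique(counts[nonzero])  # sorted distinct nonzero counts
    ranks = np.searchsorted(values, counts[nonzero]) + 1
    scaled[nonzero] = ranks / values.size
    return scaled


# Synthetic per-pixel counts; empty pixels stay at 0.
counts = np.array([[0, 1, 2], [5, 50, 5000]])
linear = counts / counts.max()
equalized = rank_equalize(counts)

print("linear:   ", np.round(linear, 4).tolist())
print("equalized:", np.round(equalized, 4).tolist())
```

With linear scaling, the pixel holding 5000 points dominates and the low-count pixels are almost indistinguishable from empty ones. With rank-based scaling, the distinct count levels are spread evenly over the color range, so structure stays visible in both sparse and dense regions.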

In [69]:

del points, p

5.2. Plotting millions of data points¶

The dataset of ~20K points used above is actually too small to demonstrate the power of datashader.

Below, visualize the full >1 million main-belt object dataset returned by the query.

Create a Points Element for the data.

In [70]:

points = hv.Points((result_df['a'], result_df['e']))

Create the linked streams instance.

In [71]:

boundsxy = (0, 0, 0, 0)
box = streams.BoundsXY(source=points, bounds=boundsxy)
bounds = hv.DynamicMap(lambda bounds: hv.Bounds(bounds), streams=[box])

Apply the datashader.

In [72]:

p = dynspread(datashade(points, cmap="Viridis"))
p = p.opts(width=800, height=300, padding=0.05, show_grid=True,
           xlim=(1.75, 3.8), ylim=(0., 0.64),
           xlabel="Semimajor Axis (au)",
           ylabel="Eccentricity",
           tools=['box_select'])

Render the datashaded plot.

In [73]:

p * bounds

Out[73]:

Figure 10: Similar to Fig 9, but for all 1,361,726 MBAs in the MPCORB catalog.

5.3. Adding a callback function¶

Add callback functionality to the semimajor axis vs eccentricity diagram above to retrieve the indices of selected points.

Above, use the box selection tool to select data.

STOP - Select some data points from the plot above using the box select tool before proceeding.

Print the number of objects selected.

In [74]:

selection = (points.data.x > box.bounds[0]) \
    & (points.data.y > box.bounds[1]) \
    & (points.data.x < box.bounds[2]) \
    & (points.data.y < box.bounds[3])
print('The selection box contains %i data points' % (np.sum(selection)))

The selection box contains 372248 data points
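The cell above counts the selected points using a boolean mask. If the integer indices of the selected points are needed, the mask can be converted with `np.flatnonzero`. The following is a minimal sketch with synthetic coordinates and a hypothetical selection box; the bounds are ordered (x0, y0, x1, y1), the same order used to index `box.bounds` above.

```python
import numpy as np

# Synthetic (x, y) points; stand-ins for semimajor axis and eccentricity.
x = np.array([2.1, 2.5, 2.9, 3.2, 2.6])
y = np.array([0.05, 0.15, 0.30, 0.10, 0.20])

# Hypothetical selection box in the order (x0, y0, x1, y1).
x0, y0, x1, y1 = 2.4, 0.10, 3.0, 0.25

# Boolean mask of points strictly inside the box.
inside = (x > x0) & (y > y0) & (x < x1) & (y < y1)

# Integer positions of the selected points.
selected_indices = np.flatnonzero(inside)
print("selected indices:", selected_indices.tolist())
print("number selected:", int(inside.sum()))
```

The index array can then be used to look up other columns for the same rows, for example with `DataFrame.iloc`.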

In [75]:

del points, p, selection

5.4. Interactive selection¶

Below, create two side-by-side plots.

The left-hand plot will show the datashaded orbital distribution, and the right-hand plot will be a linked and brushed plot showing the inclination distribution for objects selected in the left-hand plot. It will be possible to use the box selection in the orbital distribution plot to change which data are included in the histogram.

First, create a holoviews dataset instance, and label some of the columns.

In [76]:

kdims = [('a', 'semimajor axis (au)'), ('e', 'eccentricity')]
vdims = [('incl', 'inclination (deg)')]
ds = hv.Dataset(result_df, kdims, vdims)

In [77]:

points = hv.Points(ds)
boundsxy = (np.min(ds.data['a']), np.min(ds.data['e']),
            np.max(ds.data['a']), np.max(ds.data['e']))
box = streams.BoundsXY(source=points, bounds=boundsxy)
box_plot = hv.DynamicMap(lambda bounds: hv.Bounds(bounds), streams=[box])

Create custom callback functionality to update the linked histogram.

These functions are defined here (and not in Section 1.2) because they are only used in the following cell.

In [78]:

def log_inf(x) -> float:
    # Natural log of a bin count; empty bins (count 0) map to 0 instead of -inf.
    return np.log(x) if x > 0 else 0


def update_histogram(bounds=boundsxy) -> hv.Histogram:
    # Select objects inside the box (a_min, e_min, a_max, e_max).
    selection = (ds.data['a'] > bounds[0]) & \
                (ds.data['e'] > bounds[1]) & \
                (ds.data['a'] < bounds[2]) & \
                (ds.data['e'] < bounds[3])
    selected_incl = ds.data.loc[selection, 'incl']
    frequencies, edges = np.histogram(selected_incl)
    hist = hv.Histogram(
        (list(map(log_inf, frequencies)), edges)).opts(
        xlabel='inclination (deg)')
    return hist
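As an aside, the element-by-element `log_inf` helper can also be written as a single vectorized NumPy operation. The sketch below is an alternative under the same convention (empty bins map to 0); the function name `log_counts` and the synthetic inclinations are illustrative and not part of the tutorial.

```python
import numpy as np


def log_counts(frequencies):
    """Natural log of each bin count, with empty bins mapped to 0."""
    frequencies = np.asarray(frequencies, dtype=float)
    out = np.zeros_like(frequencies)
    # Only evaluate the log where the count is positive; other entries keep 0.
    np.log(frequencies, out=out, where=frequencies > 0)
    return out


# Synthetic inclinations (deg) standing in for a box selection.
rng = np.random.default_rng(seed=0)
incl = rng.uniform(0.0, 30.0, size=1000)
frequencies, edges = np.histogram(incl)
log_freq = log_counts(frequencies)
print(np.round(log_freq, 2).tolist())
```

Note that with this convention a bin containing exactly one object also has a height of 0, since log(1) = 0, so it cannot be distinguished from an empty bin in the plot.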

In [79]:

dmap = hv.DynamicMap(
    update_histogram, streams=[box]).options(height=400, width=400)
datashade(points, cmap=process_cmap("Viridis", provider="bokeh")) * \
    box_plot.options(height=400, width=400, tools=['box_select']) + \
    dmap

Out[79]:

Figure 11: At left, semimajor axis vs eccentricity as a 2-dimensional density map with a purple-green-yellow colormap. At right, a histogram (blue bar chart) of the natural logarithm of the number of selected objects in bins of inclination.

Try changing the box selection across the inner, middle, and outer main belt, and watch as the histogram is recomputed and displayed.

In [80]:

del dmap

6. Exercises for the learner¶

HoloViews supports several plotting backends, including Bokeh, matplotlib, and plotly. As an exercise, switch the HoloViews backend from Bokeh to matplotlib at the beginning of the notebook with hv.extension('matplotlib'). Notice the holoviews + matplotlib icons displayed when the library is loaded successfully. Recreate a few plots and compare the outputs, then try again with another backend. Don't forget to set the backend back to Bokeh, which is used for this tutorial. Note that some warnings might be raised.

Try making the above interactive plots with a different small body population, such as the transneptunian objects (TNOs) or near-Earth objects (NEOs).
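As a starting point for this exercise, a population can be isolated from orbital elements with boolean masks. The sketch below is illustrative only: it assumes the common NEO criterion of perihelion distance q = a(1 - e) < 1.3 au and takes a semimajor axis beyond Neptune's (about 30.07 au) as a simple TNO criterion. The function `classify` and both thresholds are assumptions, not definitions used by the tutorial, and the inputs are synthetic values rather than MPCORB rows.

```python
import numpy as np

NEO_MAX_PERIHELION_AU = 1.3  # assumed NEO criterion: q < 1.3 au
NEPTUNE_A_AU = 30.07         # assumed TNO criterion: a beyond Neptune's orbit


def classify(a, e):
    """Label objects as 'NEO', 'TNO', or 'other' from a (au) and e."""
    a = np.asarray(a, dtype=float)
    e = np.asarray(e, dtype=float)
    q = a * (1.0 - e)  # perihelion distance (au)
    labels = np.full(a.shape, "other", dtype=object)
    labels[q < NEO_MAX_PERIHELION_AU] = "NEO"
    # Applied last, so a > 30.07 au takes precedence in this simple sketch.
    labels[a > NEPTUNE_A_AU] = "TNO"
    return labels


# Three synthetic objects: NEO-like, main-belt-like, and TNO-like.
labels = classify([1.5, 2.7, 43.0], [0.3, 0.1, 0.05])
print(labels.tolist())
```

The same masks could be applied to the `a` and `e` columns of a DataFrame to build the input for the plots above. Objects on unbound or highly eccentric orbits would need more careful handling than this sketch provides.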

In [ ]:
