servicemon

servicemon is a command line tool for measuring the timing of Virtual Observatory (VO) queries. The features are also available via the Servicemon API.

Code and issue tracker are on GitHub.

Installation

The latest version of servicemon requires Python 3.8 or higher and can be installed with pip. If an environment already has an older version of servicemon installed, add --upgrade to make sure the latest version is installed.

Be sure to use the --pre option since the latest version is a pre-release.

$ pip install --pre servicemon

Overview

The basic intent of servicemon is to execute multiple queries on one or more VO services and record the timing of each query. The specific query parameters (such as ra, dec and radius) are varied with each query. Those query parameters can be listed in an input file, or randomly generated according to run-time arguments.

The following commands are available for performing timed queries and generating random input parameters:

sm_query

Executes and times queries over a specified list of template services whose parameters are filled by a specified list of input parameters.

sm_run_all

Runs multiple sm_query commands in parallel based on the Service Files found in a specified directory structure.

sm_replay

Replays the queries in a given csv result file from sm_query, recording the timings into a new result file.

sm_conegen

Creates an input parameter file containing the specified number of the random cone search parameters ra, dec and radius.

The services to query are specified in Service Files, typically formatted as a Python list of dictionaries that specify a template query to be filled with the input parameters along with the query endpoint and other metadata.

The input parameters are specified as a list of dictionaries, each of which has a value for ra, dec and radius. (A future generalization will support parameter names other than these cone search parameters.)

Basic Examples

Timing a Simple Cone Search (SCS) service

In this example, we will execute and time 3 different queries to a Simple Cone Search (SCS) service hosted at the Cool Archive.

1. Create a cone search parameter input file for 3 random cones with radii from 0.1 to 0.2 degrees. (If desired, this step can be skipped by supplying the --num_cones, --min_radius and --max_radius arguments to the sm_query command in step 3. In that case, the random cones will be generated internally and not stored in a separate file.)

$ sm_conegen three_cones.py --num_cones 3 --min_radius 0.1 --max_radius 0.2
$ cat three_cones.py

[
    {'dec': 27.663409836268887, 'ra': 101.18375952324169, 'radius': 0.11092016620136524},
    {'dec': 4.840997553935431, 'ra': 358.97280995896705, 'radius': 0.19181724441608894},
    {'dec': 3.284106996695529, 'ra': 312.34607454539434, 'radius': 0.13515755153293374}
]
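
Because the generated file is a plain Python literal, it can be read back with the standard library alone. A minimal sketch (load_cones is a hypothetical helper, not part of servicemon, whose own loader may differ):

```python
import ast

def load_cones(path):
    # Parse the Python-literal list of cone dictionaries written by sm_conegen.
    with open(path) as f:
        return ast.literal_eval(f.read())
```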

2. Create a service file that defines the Simple Cone Search service to query. For SCS services, use service_type of cone. This causes the ra, dec and radius values from the input file to be appended to the access_url at query time according to the SCS standard.

cool_archive_cone_service.py
[
  {'base_name': 'cool_archive_cone',
   'service_type': 'cone',
   'access_url': 'http://vo.cool_archive_cone.org/vo/conesearch',
   }
]
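
For the cone service_type, the query amounts to appending the SCS parameters to the access_url. A sketch of that construction (scs_url is a hypothetical helper; servicemon's actual URL building may differ in detail):

```python
from urllib.parse import urlencode

def scs_url(access_url, cone):
    # Append RA, DEC and SR to the access URL per the SCS standard,
    # respecting any query string already present.
    sep = '&' if '?' in access_url else '?'
    params = urlencode({'RA': cone['ra'], 'DEC': cone['dec'], 'SR': cone['radius']})
    return f'{access_url}{sep}{params}'
```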
3. Run sm_query. By default, all output will appear in the results directory (see The Output).

$ sm_query cool_archive_cone_service.py --cone_file three_cones.py

Timing a Table Access Protocol (TAP) service

This example queries a Table Access Protocol (TAP) service hosted by the Great Archive three times, using the same cones defined in the previous example.

1. Create a service file that describes the TAP service to be queried. Note that the service_type is tap. The adql value is a template containing three {}s into which the input ra, dec and radius values will be substituted.

great_archive_tap_service.py
[
  {'base_name': 'great_archive_tap',
  'service_type': 'tap',
  'access_url': 'http://services.great_archive.net/tap_service',
  'adql':'''
  SELECT *
  FROM ivoa.obscore
  WHERE
    1=CONTAINS(POINT('ICRS', s_ra, s_dec),
              CIRCLE('ICRS', {}, {}, {} ))
  '''
  }
]
2. Run sm_query. By default, all output will appear in the results directory (see The Output).

$ sm_query great_archive_tap_service.py --cone_file three_cones.py

Query multiple services in parallel

It can be efficient to query multiple service providers in parallel; however, we may not want to execute multiple parallel queries against the same service provider. sm_run_all provides an automated way to handle that situation.

For each subdirectory of the specified input directory, sm_run_all invokes sm_query for each service definition file found in that subdirectory, one file at a time. The subdirectories themselves (each perhaps representing a single service provider) are handled in parallel.
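
That scheme can be sketched as follows. This is an illustration only, not servicemon's actual implementation; run_subdir and run_all are hypothetical names:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import subprocess

def run_subdir(subdir, cone_file, result_dir):
    # Serial loop: only one sm_query at a time per service provider.
    for service_file in sorted(subdir.glob('*.py')):
        subprocess.run(['sm_query', str(service_file),
                        '--cone_file', cone_file,
                        '--result_dir', result_dir])

def run_all(input_dir, cone_file, result_dir):
    subdirs = [d for d in Path(input_dir).iterdir() if d.is_dir()]
    # Parallel across providers: each subdirectory gets its own worker.
    with ThreadPoolExecutor() as pool:
        for d in subdirs:
            pool.submit(run_subdir, d, cone_file, result_dir)
```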

This example uses sm_run_all to query the two services above in parallel.

1. Using the files from the previous examples, create an input directory structure to give to sm_run_all.

$ mkdir -p input/cool_archive input/great_archive   # Create a subdirectory for each archive
$ mv cool_archive_cone_service.py input/cool_archive/
$ mv great_archive_tap_service.py input/great_archive/
$ mv three_cones.py input
$ ls -RF input

cool_archive/ great_archive/  three_cones.py

input/cool_archive:
cool_archive_cone_service.py

input/great_archive:
great_archive_tap_service.py

2. For all the cones in input/three_cones.py, run all the services in input/cool_archive in parallel with those in input/great_archive. In addition to specifying the input directory created above, sm_run_all requires that the result directory is explicitly specified.

$ sm_run_all input --cone_file input/three_cones.py --result_dir results

Command Options

sm_query

Documentation for this command:

usage: sm_query [-h] [-r result_dir] [-l plugin_dir_or_file] [-w writer] [-s]
                [-t {async,sync}] [-u USER_AGENT] [-n] [-v]
                [--num_cones num_cones | --cone_file cone_file]
                [--min_radius MIN_RADIUS] [--max_radius MAX_RADIUS]
                [--start_index start_index] [--cone_limit cone_limit]
                services

Measure query performance.

positional arguments:
  services              File containing list of services to query

optional arguments:
  -h, --help            show this help message and exit
  -r result_dir, --result_dir result_dir
                        The directory in which to put query result files.
                        Unless --save_results is specified, each query result
                        file will be deleted after statistics are gathered for
                        the query.
  -l plugin_dir_or_file, --load_plugins plugin_dir_or_file
                        Directory or file from which to load user plug-ins. If
                        not specified, and there is a "plugins" subdirectory,
                        plugin files will be loaded from there.
  -w writer, --writer writer
                        Name and kwargs of a writer plug-in to use. Format is
                        writer_name[:arg1=val1[,arg2=val2...]] May appear
                        multiple times to specify multiple writers. May
                        contain Python datetime format elements which will be
                        substituted with appropriate elements of the current
                        time (e.g., results-'%m-%d-%H:%M:%S'.py)
  -s, --save_results    Save the query result data files. Without this
                        argument, the query result file will be deleted after
                        metadata is gathered for the query.
  -t {async,sync}, --tap_mode {async,sync}
                        How to run TAP queries (default=async)
  -u USER_AGENT, --user_agent USER_AGENT
                        Override the User-Agent used for queries
                        (default=None)
  -n, --norun           Display summary of command arguments without
                        performing any actions
  -v, --verbose         Print additional information to stderr
  --num_cones num_cones
                        Number of cones to generate
  --cone_file cone_file
                        Path of the file containing the individual query
                        inputs.

  --min_radius MIN_RADIUS
                        Minimum radius (deg). Default=0
  --max_radius MAX_RADIUS
                        Maximum radius (deg). Default=0.25

  --start_index start_index
                        Start with this cone in cone file Default=0
  --cone_limit cone_limit
                        Maximum number of cones to query Default=100000000

sm_run_all

usage: sm_run_all input_dir
                  [-h]
                  [-r result_dir]
                  [-o script_output_dir]
                  [-l plugin_dir_or_file]
                  [-w writer] [-s]
                  [-t {async,sync}] [-n] [-v]
                  [-u USER_AGENT]
                  [--num_cones num_cones | --cone_file cone_file]
                  [--min_radius MIN_RADIUS] [--max_radius MAX_RADIUS]
                  [--start_index start_index] [--cone_limit cone_limit]


Measure performance on all the specified services in the input_dir directory tree.

positional arguments:
  input_dir             Directory tree containing the input specifications.

optional arguments:
  -h, --help            show this help message and exit
  -r result_dir, --result_dir result_dir
                        The directory in which to put query result files.
                        Unless --save_results is specified, each query result file
                        will be deleted after statistics are gathered for the query.
  -o script_output_dir, --script_output_dir script_output_dir
                        The directory in which script stdout and stderr files are written.
                        If not specified, stdout and stderr will not be redirected.
  -l plugin_dir_or_file, --load_plugins plugin_dir_or_file
                        Directory or file from which to load user plug-ins.
  -w writer, --writer writer
                        Name and kwargs of a writer plug-in to use. Format is
                        writer_name[:arg1=val1[,arg2=val2...]] May appear
                        multiple times to specify multiple writers. May
                        contain Python datetime format elements which will be
                        substituted with appropriate elements of the current
                        time (e.g., results-'%m-%d-%H:%M:%S'.py)
  -s, --save_results    Save the query result data files. Without this
                        argument, the query result file will be deleted after
                        metadata is gathered for the query.
  -t {async,sync}, --tap_mode {async,sync}
                        How to run TAP queries (default=async)
  -u USER_AGENT, --user_agent USER_AGENT
                        Override the User-Agent used for queries
                        (default=None)
  -n, --norun           Display summary of command arguments without
                        performing any actions
  -v, --verbose         Print additional information to stdout
  --num_cones num_cones
                        Number of cones to generate
  --cone_file cone_file
                        Path of the file containing the individual query
                        inputs.

  --min_radius MIN_RADIUS
                        Minimum radius (deg). Default=0
  --max_radius MAX_RADIUS
                        Maximum radius (deg). Default=0.25

  --start_index start_index
                        Start with this cone in cone file Default=0
  --cone_limit cone_limit
                        Maximum number of cones to query Default=100000000

sm_replay

usage: sm_replay [-h] [-r result_dir] [-l plugin_dir_or_file] [-w writer] [-s]
                 [-t {sync,async}] [-u USER_AGENT] [-n] [-v]
                 [--start_index start_index] [--cone_limit cone_limit]
                 file_to_replay

Measure query replay performance.

positional arguments:
  file_to_replay        File containing the results of a previous set of query
                        timings.

optional arguments:
  -h, --help            show this help message and exit
  -r result_dir, --result_dir result_dir
                        The directory in which to put query result files.
  -l plugin_dir_or_file, --load_plugins plugin_dir_or_file
                        Directory or file from which to load user plug-ins. If
                        not specified, and there is a "plugins" subdirectory,
                        plugin files will be loaded from there.
  -w writer, --writer writer
                        Name and kwargs of a writer plug-in to use. Format is
                        writer_name[:arg1=val1[,arg2=val2...]] May appear
                        multiple times to specify multiple writers. May
                        contain Python datetime format elements which will be
                        substituted with appropriate elements of the current
                        time (e.g., results-'%m-%d-%H:%M:%S'.py)
  -s, --save_results    Save the query result data files. Without this
                        argument, the query result file will be deleted after
                        metadata is gathered for the query.
  -t {sync,async}, --tap_mode {sync,async}
                        How to run TAP queries (default=async)
  -u USER_AGENT, --user_agent USER_AGENT
                        Override the User-Agent used for queries
                        (default=None)
  -n, --norun           Display summary of command arguments without
                        performing any actions
  -v, --verbose         Print additional information to stderr
  --start_index start_index
                        Start with this cone in cone file Default=0
  --cone_limit cone_limit
                        Maximum number of cones to query Default=100000000

sm_conegen

usage: sm_conegen [-h] --num_cones num_cones [--min_radius min_radius]
                  [--max_radius max_radius]
                  outfile

Generate random cones.

positional arguments:
  outfile               Name of the output file to contain the cones.

optional arguments:
  -h, --help            show this help message and exit
  --num_cones num_cones
                        Number of cones to generate
  --min_radius min_radius
                        Minimum radius (deg). Default=0
  --max_radius max_radius
                        Maximum radius (deg). Default=0.25
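
Conceptually, sm_conegen draws positions uniformly on the sphere and radii uniformly between the given limits. A sketch of that idea (random_cones is a hypothetical function; the actual sm_conegen implementation may differ, e.g., in its distribution details):

```python
import math
import random

def random_cones(num_cones, min_radius, max_radius, seed=None):
    # Generate cone dictionaries with positions uniform on the sphere.
    rng = random.Random(seed)
    cones = []
    for _ in range(num_cones):
        cones.append({
            'ra': rng.uniform(0.0, 360.0),
            # arcsin of a uniform deviate gives declinations uniform on the sphere
            'dec': math.degrees(math.asin(rng.uniform(-1.0, 1.0))),
            'radius': rng.uniform(min_radius, max_radius),
        })
    return cones
```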

Service Files

A service file contains a Python list of dictionaries. Each dictionary defines a service endpoint, and must contain the keys defined below. All services are assumed to return results as VOTables.

  • base_name - The name of the service, used in constructing the unique ids for each result row as well as the file names for the VOTable result files stored in the results subdirectory.

  • service_type - One of cone, xcone or tap

    • cone The query will be constructed as a VO standard Simple Cone Search, with the RA, DEC and SR parameters set automatically for each cone.

    [
      {'base_name': 'cool_archive_cone',
       'service_type': 'cone',
       'access_url': 'http://vo.cool_archive_cone.org/vo/conesearch',
       }
    ]
    
    • xcone A non-standard cone search. The access_url is assumed to contain three {}s (open/close braces). The RA, Dec and Radius for each cone will be substituted for those 3 braces in order.

    For example, service_type ‘xcone’ can be used for a Simple Image Access (SIA) service:
    [
        {'base_name': 'SampleSIAService',
        'service_type': 'xcone',
        'adql': '',
        'access_url': 'https://sia.some_archive.org/'
        'mean.votable?flatten_response=false&sort_by=distance'
        '&ra={}&dec={}&radius={}'
        }
    ]
    
    • tap A Table Access Protocol (TAP) service. The adql value is a template for the TAP query to be performed.

    Sample TAP service.
    [
      {'base_name': 'great_archive_tap',
      'service_type': 'tap',
      'access_url': 'http://services.great_archive.net/tap_service',
      'adql':'''
      SELECT *
      FROM ivoa.obscore
      WHERE
        1=CONTAINS(POINT('ICRS', s_ra, s_dec),
                  CIRCLE('ICRS', {}, {}, {} ))
      '''
      }
    ]
    
  • access_url - The access URL for the service.

  • adql - For the tap service_type, this is the ADQL query. For other types, this key must exist, but the value will be ignored. The ADQL query is assumed to contain three {}s (open/close braces). The ra, dec and radius for each cone will be substituted for those 3 braces in order.
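
The substitution itself is ordinary positional string formatting; assuming Python's str.format is a fair model of the behavior described above:

```python
# The three {}s are filled positionally: ra, dec, radius, in that order.
adql_template = '''
SELECT *
FROM ivoa.obscore
WHERE
  1=CONTAINS(POINT('ICRS', s_ra, s_dec),
            CIRCLE('ICRS', {}, {}, {} ))
'''

cone = {'ra': 101.18, 'dec': 27.66, 'radius': 0.11}
query = adql_template.format(cone['ra'], cone['dec'], cone['radius'])
```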

Multiple services are allowed, but it is recommended that all service_type values are the same.
[
  {'base_name': 'one_service',
   'service_type': 'cone',
   'access_url': 'http://vo.cool_archive_cone.org/vo/conesearch',
   },
  {'base_name': 'another_service',
   'service_type': 'cone',
   'access_url': 'http://www.another_archive.org/big_catalog/conesearch',
   }
]

The Output

sm_query and sm_run_all write multiple output files, all to the result directory specified with the --result_dir command argument. sm_run_all requires that the result directory be explicitly specified, while sm_query uses a default of results.

CSV files

By default, the output statistics for the queries are written to CSV files, one CSV file per service file. The CSV file names are <service_file_base_name>_<date_string>.csv.

These files are written by the default output writer plugin, csv_writer. See Plugins for information on how to specify alternative or additional writers, or how to customize the CSV file name.

VOTable subdirectories and files

If --save_results is specified on the command line, the VOTables returned from each query will be stored in subdirectories of the result directory. Those subdirectories are named for the base_name specified in the query’s service file. The names of the VOTables are built from attributes of the service and the input: <base_name>_<service_type>_<ra>_<dec>_<radius>.xml
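
A sketch of that naming convention (votable_filename is a hypothetical helper illustrating the pattern; the exact formatting of the numeric values in servicemon may differ):

```python
def votable_filename(service, cone):
    # Build <base_name>_<service_type>_<ra>_<dec>_<radius>.xml from the
    # service definition and the input cone.
    return (f"{service['base_name']}_{service['service_type']}_"
            f"{cone['ra']}_{cone['dec']}_{cone['radius']}.xml")
```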

Even when --save_results is not specified, the VOTables are written to those files temporarily; empty VOTable subdirectories are an artifact of that process.

Log files from sm_run_all

For each subdirectory handled, sm_run_all creates a file that logs the commands run in that directory. The files are called <input_subdir_name>-<date_string>_comlog.txt.

In addition, for each service queried (i.e., each sm_query run), a file is created to collect any stdout or stderr generated by sm_query. Those files are named <base_name>_<service_type>-<date_string>_runlog.xml.

Plugins

Using a plugin

A plugin mechanism is provided to allow customization of the output from sm_query (and sm_run_all). By default, a plugin called csv_writer writes the query statistics to the CSV files described above.

Alternative or additional plugins can be loaded via the --writer argument. To use a new plugin called my_writer:

$ sm_query cool_archive_cone_service.py --cone_file three_cones.py \
  --writer my_writer

When multiple writers are specified, each writer will be invoked for each result, and they will be executed in the order they were specified on the command line. To use a new plugin called my_writer along with the builtin csv_writer:

$ sm_query cool_archive_cone_service.py --cone_file three_cones.py \
  --writer my_writer \
  --writer csv_writer

Writing a plugin

A plugin is a Python class in a file that gets loaded at run time. The class must be a subclass of AbstractResultWriter and must implement the abstract methods defined there.

abstract AbstractResultWriter.begin(args, **kwargs)

    args : argparse.Namespace
        the result of an argparse.ArgumentParser’s parse_args().

    kwargs : dict
        keyword args from the plug-in specification.

abstract AbstractResultWriter.one_result(stats)

    stats : obj
        an object with the following methods:

        columns() -> list of str
            list of output column names

        row_values() -> dict
            dict of output values, one key per column name

abstract AbstractResultWriter.end()

Loading a plugin at run time

To load your plugin at run time, the plugin should be in a .py file. That file should either be placed in the plugins subdirectory of your working directory, or its location should be specified on the command line (for sm_query and sm_replay) with the --load_plugins argument. The value for --load_plugins can be either a directory (from which all .py files will be loaded) or an individual .py file.

Example: csv_writer

The builtin plugin csv_writer is shown below. Note that its begin method accepts the keyword argument outfile which can be used to override the default output file name. To specify outfile on the command line, include it with the --writer value:

$ sm_query cool_archive_cone_service.py --cone_file three_cones.py \
  --writer csv_writer:outfile=my_override_filename.csv
import csv
import os
import sys
from pathlib import Path
from datetime import datetime

from servicemon.plugin_support import AbstractResultWriter


class CsvResultWriter(AbstractResultWriter, plugin_name='csv_writer',
                      description='Writes results to a csv file.'):

    def begin(self, args, outfile=None):
        self._first_stat = True
        self._outfile_path = self._compute_outfile_path(args, outfile=outfile)

        # Create the output dir if needed.
        if self._outfile_path is not None:
            os.makedirs(self._outfile_path.parent, exist_ok=True)

    def end(self):
        pass

    def one_result(self, stats):

        if self._outfile_path is not None:
            with open(self._outfile_path, 'a+') as file:
                self._output_stats_row_to_file(stats, file)
        else:
            self._output_stats_row_to_file(stats, sys.stdout)

    def _output_stats_row_to_file(self, stats, file):
        writer = csv.DictWriter(file, dialect='excel',
                                fieldnames=stats.columns())
        if self._first_stat:
            self._first_stat = False
            writer.writeheader()
        row_values = stats.row_values()
        writer.writerow(row_values)

    def _compute_outfile_path(self, args, outfile=None):
        """
        If outfile is not None, use it.  Otherwise compute the output file
        name from name of the services file supplied in the args provided.
        """
        result_path = None
        if outfile is not None:
            # Leaving result_path None will cause output to go to stdout.
            if outfile != 'stdout':
                result_path = Path(outfile)
        else:
            now = datetime.now()
            dtstr = now.strftime('%Y-%m-%d %H:%M:%S.%f')

            base_services_name = Path(args.services).stem
            result_name = f'{base_services_name}_{dtstr}.csv'

            result_path = Path(args.result_dir) / Path(result_name)

        return result_path

Developer documentation

The Servicemon Analysis API is for use in finding and analyzing data already collected.