servicemon¶
servicemon is a command line tool for measuring the timing of Virtual Observatory (VO) queries. The features are also available via the Servicemon API.
Code and issue tracker are on GitHub.
Installation¶
The latest version of servicemon requires Python 3.8 or higher and can be pip installed. If an environment already has an older version of servicemon installed, add --upgrade to make sure the latest version is installed. Be sure to use the --pre option, since the latest version is a pre-release.
$ pip install --pre servicemon
Overview¶
The basic intent of servicemon is to execute multiple queries on one or more VO services and record the timing of each query. The specific query parameters (such as ra, dec and radius) are varied with each query. Those query parameters can be listed in an input file, or randomly generated according to run-time arguments.
The following commands are available for performing timed queries and generating random input parameters:
- sm_query
Executes and times queries over a specified list of template services whose parameters are filled by a specified list of input parameters.
- sm_run_all
Runs multiple sm_query commands in parallel based on the Service Files found in a specified directory structure.
- sm_replay
Replays the queries in a given csv result file from sm_query, recording the timings into a new result file.
- sm_conegen
Creates an input parameter file containing the specified number of random cone search parameters ra, dec and radius.
The services to query are specified in Service Files, typically formatted as a Python list of dictionaries that specify a template query to be filled with the input parameters along with the query endpoint and other metadata.
The input parameters are specified as a list of dictionaries, each of which has a value for ra, dec and radius. (A future generalization will support parameter names other than these cone search parameters.)
Basic Examples¶
Timing a Simple Cone Search (SCS) service¶
In this example, we will execute and time 3 different queries to a Simple Cone Search (SCS) service hosted at the Cool Archive.
1. Create a cone search parameter input file for 3 random cones with radii from 0.1 to 0.2 degrees.
(If desired, this step can be skipped by supplying the --num_cones, --min_radius and --max_radius arguments to the sm_query command in step 3. In that case, the random cones will be generated internally and not stored in a separate file.)
$ sm_conegen three_cones.py --num_cones 3 --min_radius 0.1 --max_radius 0.2
$ cat three_cones.py
[
{'dec': 27.663409836268887, 'ra': 101.18375952324169, 'radius': 0.11092016620136524},
{'dec': 4.840997553935431, 'ra': 358.97280995896705, 'radius': 0.19181724441608894},
{'dec': 3.284106996695529, 'ra': 312.34607454539434, 'radius': 0.13515755153293374}
]
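Because the generated cone file is a plain Python list literal, it can be loaded and sanity-checked with the standard library alone. The helper below is a hypothetical sketch, not part of servicemon:

```python
import ast

def load_cones(path):
    """Safely parse a cone parameter file (a Python list literal)."""
    with open(path) as f:
        cones = ast.literal_eval(f.read())
    # Each entry is expected to carry the three cone search parameters.
    for cone in cones:
        missing = {'ra', 'dec', 'radius'} - cone.keys()
        if missing:
            raise ValueError(f'cone is missing {missing}')
    return cones
```

ast.literal_eval accepts only Python literals, so a cone file cannot execute arbitrary code when loaded this way.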
2. Create a service file that defines the Simple Cone Search service to query. For SCS services, use a service_type of cone. This causes the ra, dec and radius values from the input file to be appended to the access_url at query time according to the SCS standard.
[
{'base_name': 'cool_archive_cone',
'service_type': 'cone',
'access_url': 'http://vo.cool_archive_cone.org/vo/conesearch',
}
]
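At query time, a cone service URL is formed by appending the standard SCS parameters RA, DEC and SR. A rough sketch of that construction (illustrative only; the exact encoding is handled inside servicemon):

```python
from urllib.parse import urlencode

def build_scs_url(access_url, cone):
    """Append RA/DEC/SR query parameters per the Simple Cone Search standard."""
    params = urlencode({'RA': cone['ra'], 'DEC': cone['dec'], 'SR': cone['radius']})
    # Append with '&' if the access_url already carries query parameters.
    sep = '&' if '?' in access_url else '?'
    return f'{access_url}{sep}{params}'

build_scs_url('http://vo.cool_archive_cone.org/vo/conesearch',
              {'ra': 101.18, 'dec': 27.66, 'radius': 0.11})
# → 'http://vo.cool_archive_cone.org/vo/conesearch?RA=101.18&DEC=27.66&SR=0.11'
```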
3. Run sm_query. By default, all output will appear in the results directory.
$ sm_query cool_archive_cone_service.py --cone_file three_cones.py
Timing a Table Access Protocol (TAP) service¶
This example queries a Table Access Protocol (TAP) service hosted by the Great Archive 3 times with the same cones defined in the previous example.
1. Create a service file that describes the TAP service to be queried. Note that the service_type is tap. The adql value is a template containing 3 {}s into which the input ra, dec and radius values will be substituted.
[
{'base_name': 'great_archive_tap',
'service_type': 'tap',
'access_url': 'http://services.great_archive.net/tap_service',
'adql':'''
SELECT *
FROM ivoa.obscore
WHERE
1=CONTAINS(POINT('ICRS', s_ra, s_dec),
CIRCLE('ICRS', {}, {}, {} ))
'''
}
]
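The three {} placeholders are filled the way Python's str.format fills positional braces: ra first, then dec, then radius. A minimal illustration (not servicemon's actual substitution code):

```python
adql_template = """
SELECT *
FROM ivoa.obscore
WHERE 1=CONTAINS(POINT('ICRS', s_ra, s_dec),
                 CIRCLE('ICRS', {}, {}, {}))
"""

cone = {'ra': 101.18, 'dec': 27.66, 'radius': 0.11}
# Substitute the cone parameters into the template in order.
query = adql_template.format(cone['ra'], cone['dec'], cone['radius'])
assert "CIRCLE('ICRS', 101.18, 27.66, 0.11)" in query
```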
2. Run sm_query. By default, all output will appear in the results directory.
$ sm_query great_archive_tap_service.py --cone_file three_cones.py
Query multiple services in parallel¶
It can be efficient to query multiple service providers in parallel; however, we may not want to execute multiple parallel queries on the same service provider. sm_run_all provides an automated way to handle that situation.
For each subdirectory of the specified input directory, sm_run_all invokes sm_query one at a time for each service definition file found in that subdirectory. The subdirectories themselves (each perhaps representing a single service provider) are handled in parallel.
This example uses sm_run_all to query the two services above in parallel.
1. Using the files from the previous examples, create an input directory structure to give to sm_run_all.
$ mkdir -p input/cool_archive input/great_archive # Create a subdirectory for each archive
$ mv cool_archive_cone_service.py input/cool_archive/
$ mv great_archive_tap_service.py input/great_archive/
$ mv three_cones.py input
$ ls -RF input
cool_archive/ great_archive/ three_cones.py
input/cool_archive:
cool_archive_cone_service.py
input/great_archive:
great_archive_tap_service.py
2. For all the cones in input/three_cones.py, run all the services in input/cool_archive in parallel with those in input/great_archive.
In addition to specifying the input directory created above, sm_run_all requires that the result directory be explicitly specified.
$ sm_run_all input --cone_file input/three_cones.py --result_dir results
Command Options¶
sm_query¶
Documentation for this command:
usage: sm_query [-h] [-r result_dir] [-l plugin_dir_or_file] [-w writer] [-s]
[-t {async,sync}] [-u USER_AGENT] [-n] [-v]
[--num_cones num_cones | --cone_file cone_file]
[--min_radius MIN_RADIUS] [--max_radius MAX_RADIUS]
[--start_index start_index] [--cone_limit cone_limit]
services
Measure query performance.
positional arguments:
services File containing list of services to query
optional arguments:
-h, --help show this help message and exit
-r result_dir, --result_dir result_dir
The directory in which to put query result files.
Unless --save_results is specified, each query result
file will be deleted after statistics are gathered for
the query.
-l plugin_dir_or_file, --load_plugins plugin_dir_or_file
Directory or file from which to load user plug-ins. If
not specified, and there is a "plugins" subdirectory,
plugin files will be loaded from there.
-w writer, --writer writer
Name and kwargs of a writer plug-in to use. Format is
writer_name[:arg1=val1[,arg2=val2...]]. May appear
multiple times to specify multiple writers. May
contain Python datetime format elements which will be
substituted with appropriate elements of the current
time (e.g., results-'%m-%d-%H:%M:%S'.py)
-s, --save_results Save the query result data files. Without this
argument, the query result file will be deleted after
metadata is gathered for the query.
-t {async,sync}, --tap_mode {async,sync}
How to run TAP queries (default=async)
-u USER_AGENT, --user_agent USER_AGENT
Override the User-Agent used for queries
(default=None)
-n, --norun Display summary of command arguments without
performing any actions
-v, --verbose Print additional information to stderr
--num_cones num_cones
Number of cones to generate
--cone_file cone_file
Path of the file containing the individual query
inputs.
--min_radius MIN_RADIUS
Minimum radius (deg). Default=0
--max_radius MAX_RADIUS
Maximum radius (deg). Default=0.25
--start_index start_index
Start with this cone in the cone file. Default=0
--cone_limit cone_limit
Maximum number of cones to query. Default=100000000
sm_run_all¶
usage: sm_run_all input_dir
[-h]
[-r result_dir]
[-o script_output_dir]
[-l plugin_dir_or_file]
[-w writer] [-s]
[-t {async,sync}] [-n] [-v]
[-u USER_AGENT]
[--num_cones num_cones | --cone_file cone_file]
[--min_radius MIN_RADIUS] [--max_radius MAX_RADIUS]
[--start_index start_index] [--cone_limit cone_limit]
Measure performance on all the specified services in the input_dir directory tree.
positional arguments:
input_dir Directory tree containing the input specifications.
optional arguments:
-h, --help show this help message and exit
-r result_dir, --result_dir result_dir
The directory in which to put query result files.
Unless --save_results is specified, each query result file
will be deleted after statistics are gathered for the query.
-o script_output_dir, --script_output_dir script_output_dir
The directory in which script stdout and stderr files are written.
If not specified, stdout and stderr will not be redirected.
-l plugin_dir_or_file, --load_plugins plugin_dir_or_file
Directory or file from which to load user plug-ins.
-w writer, --writer writer
Name and kwargs of a writer plug-in to use. Format is
writer_name[:arg1=val1[,arg2=val2...]]. May appear
multiple times to specify multiple writers. May
contain Python datetime format elements which will be
substituted with appropriate elements of the current
time (e.g., results-'%m-%d-%H:%M:%S'.py)
-s, --save_results Save the query result data files. Without this
argument, the query result file will be deleted after
metadata is gathered for the query.
-t {async,sync}, --tap_mode {async,sync}
How to run TAP queries (default=async)
-u USER_AGENT, --user_agent USER_AGENT
Override the User-Agent used for queries
(default=None)
-n, --norun Display summary of command arguments without
performing any actions
-v, --verbose Print additional information to stdout
--num_cones num_cones
Number of cones to generate
--cone_file cone_file
Path of the file containing the individual query
inputs.
--min_radius MIN_RADIUS
Minimum radius (deg). Default=0
--max_radius MAX_RADIUS
Maximum radius (deg). Default=0.25
--start_index start_index
Start with this cone in the cone file. Default=0
--cone_limit cone_limit
Maximum number of cones to query. Default=100000000
sm_replay¶
usage: sm_replay [-h] [-r result_dir] [-l plugin_dir_or_file] [-w writer] [-s]
[-t {sync,async}] [-u USER_AGENT] [-n] [-v]
[--start_index start_index] [--cone_limit cone_limit]
file_to_replay
Measure query replay performance.
positional arguments:
file_to_replay File containing the results of a previous set of query
timings.
optional arguments:
-h, --help show this help message and exit
-r result_dir, --result_dir result_dir
The directory in which to put query result files.
-l plugin_dir_or_file, --load_plugins plugin_dir_or_file
Directory or file from which to load user plug-ins. If
not specified, and there is a "plugins" subdirectory,
plugin files will be loaded from there.
-w writer, --writer writer
Name and kwargs of a writer plug-in to use. Format is
writer_name[:arg1=val1[,arg2=val2...]]. May appear
multiple times to specify multiple writers. May
contain Python datetime format elements which will be
substituted with appropriate elements of the current
time (e.g., results-'%m-%d-%H:%M:%S'.py)
-s, --save_results Save the query result data files. Without this
argument, the query result file will be deleted after
metadata is gathered for the query.
-t {sync,async}, --tap_mode {sync,async}
How to run TAP queries (default=async)
-u USER_AGENT, --user_agent USER_AGENT
Override the User-Agent used for queries
(default=None)
-n, --norun Display summary of command arguments without
performing any actions
-v, --verbose Print additional information to stderr
--start_index start_index
Start with this cone in the cone file. Default=0
--cone_limit cone_limit
Maximum number of cones to query. Default=100000000
sm_conegen¶
usage: sm_conegen [-h] --num_cones num_cones [--min_radius min_radius]
[--max_radius max_radius]
outfile
Generate random cones.
positional arguments:
outfile Name of the output file to contain the cones.
optional arguments:
-h, --help show this help message and exit
--num_cones num_cones
Number of cones to generate
--min_radius min_radius
Minimum radius (deg). Default=0
--max_radius max_radius
Maximum radius (deg). Default=0.25
Service Files¶
A service file contains a Python list of dictionaries. Each dictionary defines a service endpoint, and must contain the keys defined below. All services are assumed to return results as VOTables.
base_name - The name of the service, used in constructing the unique ids for each result row as well as the file names for the VOTable result files stored in the results subdirectory.
service_type - One of cone, xcone or tap.
cone
The query will be constructed as a VO standard Simple Cone Search, with the RA, DEC and SR parameters automatically set per cone.
[
 {'base_name': 'cool_archive_cone',
  'service_type': 'cone',
  'access_url': 'http://vo.cool_archive_cone.org/vo/conesearch',
 }
]
xcone
A non-standard cone search. The access_url is assumed to contain three {}s (open/close braces). The RA, Dec and Radius for each cone will be substituted for those 3 braces in order.
[
 {'base_name': 'SampleSIAService',
  'service_type': 'xcone',
  'adql': '',
  'access_url': 'https://sia.some_archive.org/'
                'mean.votable?flatten_response=false&sort_by=distance'
                '&ra={}&dec={}&radius={}'
 }
]
tap
A Table Access Protocol (TAP) service. The adql value is a template for the TAP query to be performed.
[
 {'base_name': 'great_archive_tap',
  'service_type': 'tap',
  'access_url': 'http://services.great_archive.net/tap_service',
  'adql': '''
     SELECT *
     FROM ivoa.obscore
     WHERE 1=CONTAINS(POINT('ICRS', s_ra, s_dec),
                      CIRCLE('ICRS', {}, {}, {} ))
  '''
 }
]
access_url - The access URL for the service.
adql - For the tap service_type, this is the ADQL query. For other types, this key must exist, but the value will be ignored. The ADQL query is assumed to contain three {}s (open/close braces). The ra, dec and radius for each cone will be substituted for those 3 braces in order.
[
{'base_name': 'one_service',
'service_type': 'cone',
'access_url': 'http://vo.cool_archive_cone.org/vo/conesearch',
},
{'base_name': 'another_service',
'service_type': 'cone',
'access_url': 'http://www.another_archive.org/big_catalog/conesearch',
}
]
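Since a service file is also a Python list literal, a quick sanity check for the required keys can be sketched as below. This is a hypothetical helper, not a servicemon API; note that the description above also expects an adql key to exist for non-tap types (with its value ignored), which this sketch does not enforce:

```python
import ast

REQUIRED_KEYS = {'base_name', 'service_type', 'access_url'}
VALID_TYPES = {'cone', 'xcone', 'tap'}

def check_service_file(path):
    """Parse a service file and verify each entry has the required keys."""
    with open(path) as f:
        services = ast.literal_eval(f.read())
    for svc in services:
        missing = REQUIRED_KEYS - svc.keys()
        if missing:
            raise ValueError(f"{svc.get('base_name', '?')}: missing {missing}")
        if svc['service_type'] not in VALID_TYPES:
            raise ValueError(f"unknown service_type {svc['service_type']!r}")
    return services
```

Running such a check before a long timing run catches misspelled keys early instead of partway through the queries.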
The Output¶
sm_query and sm_run_all write multiple output files, all to the result directory specified using the --result_dir command argument. sm_run_all requires that the result directory be explicitly specified, while sm_query will use a default of results.
CSV files
By default, the output statistics for the queries are written to CSV files, one CSV file per service file. The CSV file names are <service_file_base_name>_<date_string>.csv. These files are written by the default output writer plugin, csv_writer. See Plugins for information on how to specify alternative or additional writers, or how to customize the CSV file name.
VOTable subdirectories and files
If --save_results is specified on the command line, the VOTables returned from each query will be stored in subdirectories of the result directory. Those subdirectories are named for the base_name specified in the query’s service file. The names of the VOTables are built from attributes of the service and the input: <base_name>_<service_type>_<ra>_<dec>_<radius>.xml
Even when --save_results is not specified, the VOTables are written to those files temporarily. Empty VOTable subdirectories are an artifact of that process when --save_results is not specified.
Log files from sm_run_all
For each subdirectory handled, sm_run_all creates a file that logs the commands run in that directory. The files are called <input_subdir_name>-<date_string>_comlog.txt. In addition, for each service queried (i.e., each sm_query run), a file is created to collect any stdout or stderr generated by sm_query. Those files are named <base_name>_<service_type>-<date_string>_runlog.xml.
Plugins¶
Using a plugin¶
A plugin mechanism is provided to allow customization of the output from sm_query (and sm_run_all).
By default, a plugin called csv_writer writes the query statistics to the CSV files described above. Alternative or additional plugins can be loaded via the --writer argument.
To use a new plugin called my_writer:
$ sm_query cool_archive_cone_service.py --cone_file three_cones.py \
--writer my_writer
When multiple writers are specified, each writer will be invoked for each result, and they will be executed
in the order they were specified on the command line.
To use a new plugin called my_writer along with the builtin csv_writer:
$ sm_query cool_archive_cone_service.py --cone_file three_cones.py \
--writer my_writer \
--writer csv_writer
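The writer_name[:arg1=val1[,arg2=val2...]] format used by --writer can be parsed along these lines (a sketch of the spec format, not servicemon's own parser; values are left as strings):

```python
def parse_writer_spec(spec):
    """Split 'name:key=val,key2=val2' into a plugin name and a kwargs dict."""
    name, _, argstr = spec.partition(':')
    # Keyword arguments are optional; values stay as strings.
    kwargs = dict(kv.split('=', 1) for kv in argstr.split(',')) if argstr else {}
    return name, kwargs

parse_writer_spec('csv_writer:outfile=my_override_filename.csv')
# → ('csv_writer', {'outfile': 'my_override_filename.csv'})
```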
Writing a plugin¶
A plugin is a Python class in a file that gets loaded at run time. The class must be a subclass of AbstractResultWriter and must implement the abstract methods defined there.
- abstract AbstractResultWriter.begin(args, **kwargs)[source]¶
Parameters:
- args : argparse.Namespace
The result of an argparse.ArgumentParser’s parse_args().
- kwargs : dict
Keyword args from the plug-in specification.
Loading a plugin at run time¶
To load your plugin at run time, the plugin should be in a .py file. It should then either be placed in the plugins subdirectory of your working directory, or its location should be specified on the command line (for sm_query or sm_replay) with the --load_plugins argument. The value for --load_plugins can be either a directory (from which all .py files will be loaded) or an individual .py file.
Example: csv_writer¶
The builtin plugin csv_writer is shown below. Note that its begin method accepts the keyword argument outfile, which can be used to override the default output file name. To specify outfile on the command line, include it with the --writer value:
$ sm_query cool_archive_cone_service.py --cone_file three_cones.py \
--writer csv_writer:outfile=my_override_filename.csv
import csv
import os
import sys
from pathlib import Path
from datetime import datetime

from servicemon.plugin_support import AbstractResultWriter


class CsvResultWriter(AbstractResultWriter, plugin_name='csv_writer',
                      description='Writes results to a csv file.'):

    def begin(self, args, outfile=None):
        self._first_stat = True
        self._outfile_path = self._compute_outfile_path(args, outfile=outfile)

        # Create the output dir if needed.
        if self._outfile_path is not None:
            os.makedirs(self._outfile_path.parent, exist_ok=True)

    def end(self):
        pass

    def one_result(self, stats):
        if self._outfile_path is not None:
            with open(self._outfile_path, 'a+') as file:
                self._output_stats_row_to_file(stats, file)
        else:
            self._output_stats_row_to_file(stats, sys.stdout)

    def _output_stats_row_to_file(self, stats, file):
        writer = csv.DictWriter(file, dialect='excel',
                                fieldnames=stats.columns())
        if self._first_stat:
            self._first_stat = False
            writer.writeheader()
        row_values = stats.row_values()
        writer.writerow(row_values)

    def _compute_outfile_path(self, args, outfile=None):
        """
        If outfile is not None, use it. Otherwise compute the output file
        name from the name of the services file supplied in the args provided.
        """
        result_path = None
        if outfile is not None:
            # Leaving result_path None will cause output to go to stdout.
            if outfile != 'stdout':
                result_path = Path(outfile)
        else:
            now = datetime.now()
            dtstr = now.strftime('%Y-%m-%d %H:%M:%S.%f')
            base_services_name = Path(args.services).stem
            result_name = f'{base_services_name}_{dtstr}.csv'
            result_path = Path(args.result_dir) / Path(result_name)

        return result_path
Developer documentation¶
The Servicemon Analysis API is for use in finding and analyzing data already collected.