Documentation

Ingester API and Example Usage

ingester.py - Top-level functions for adding data to the science archive.

Data is added to the science archive using the archive API and an S3 client. The steps necessary to add data to the science archive are as follows:

  1. Check that the file does not yet exist in the science archive

  2. Validate and build a cleaned dictionary of the headers of the FITS file

  3. Upload the file to S3

  4. Combine the results from steps 2 and 3 into a record to be added to the science archive database

Examples

Ingest a file one step at a time:

>>> from ocs_ingester import ingester
>>> with open('tst1mXXX-ab12-20191013-0001-e00.fits.fz', 'rb') as fileobj:
...     if not ingester.frame_exists(fileobj):
...         record = ingester.validate_fits_and_create_archive_record(fileobj)
...         s3_version = ingester.upload_file_to_s3(fileobj)
...         ingested_record = ingester.ingest_archive_record(s3_version, record)

Ingest a file in one step:

>>> from ocs_ingester import ingester
>>> with open('tst1mXXX-ab12-20191013-0001-e00.fits.fz', 'rb') as fileobj:
...     ingested_record = ingester.upload_file_and_ingest_to_archive(fileobj)

ingester.frame_exists(fileobj, api_root='http://localhost:8000/', auth_token='')

Checks if the file exists in the science archive.

Computes the md5 of the given file and checks whether a file with that md5 already exists in the science archive.

Parameters
  • fileobj (file-like object) – File-like object

  • api_root (str) – Science archive API root url

  • auth_token (str) – Science archive API authentication token

Returns

Boolean indicating whether the file exists in the science archive

Return type

bool

Raises

ocs_ingester.exceptions.BackoffRetryError – If there was a problem getting a response from the science archive API
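The md5 computation described above can be sketched with the standard library. This is a minimal illustration, not the library's own implementation: the helper name `compute_md5` and the chunk size are assumptions made for the example.

```python
import hashlib
import io


def compute_md5(fileobj, chunk_size=8192):
    """Return the hex md5 of a file-like object, restoring its read position.

    Hypothetical helper for illustration; not part of ocs_ingester.
    """
    start = fileobj.tell()
    fileobj.seek(0)
    digest = hashlib.md5()
    # Read in chunks so that large FITS files are not loaded into memory at once.
    for chunk in iter(lambda: fileobj.read(chunk_size), b''):
        digest.update(chunk)
    fileobj.seek(start)
    return digest.hexdigest()


# An in-memory stand-in for an opened FITS file.
example = io.BytesIO(b'example FITS bytes')
checksum = compute_md5(example)
```

Restoring the read position matters when the same file object is later passed on to the validation and upload steps.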

ingester.validate_fits_and_create_archive_record(fileobj, path=None, required_headers=('PROPID', 'DATE-OBS', 'INSTRUME', 'SITEID', 'TELID', 'OBSTYPE', 'BLKUID'), blacklist_headers=('HISTORY', 'COMMENT'))

Validates the FITS file and creates a science archive record from it.

Checks that required headers are present, removes blacklisted headers, and cleans other headers such that they are valid for ingestion into the science archive.

Parameters
  • fileobj (file-like object) – File-like object

  • path (str) – File path/name for this object. This option may be used to override the filename associated with the fileobj. It must be used if the fileobj does not have a filename.

  • required_headers (tuple) – FITS headers that must be present

  • blacklist_headers (tuple) – FITS headers that should not be ingested

Returns

Constructed science archive record. For example:

{
    'basename': 'tst1mXXX-ab12-20191013-0001-e00',
    'FILTER': 'rp',
    'DATE-OBS': '2019-10-13T10:13:00',
    ...
}

Return type

dict

Raises

ocs_ingester.exceptions.DoNotRetryError – If required headers could not be found
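The required-header check and blacklist filtering can be sketched on a plain dictionary. This is only an illustration under stated assumptions: `clean_headers` and `MissingHeaderError` are hypothetical names, the real function reads headers from a FITS file and raises `ocs_ingester.exceptions.DoNotRetryError`, and any further cleaning it performs is not shown.

```python
REQUIRED_HEADERS = ('PROPID', 'DATE-OBS', 'INSTRUME', 'SITEID', 'TELID', 'OBSTYPE', 'BLKUID')
BLACKLIST_HEADERS = ('HISTORY', 'COMMENT')


class MissingHeaderError(Exception):
    """Hypothetical stand-in for ocs_ingester.exceptions.DoNotRetryError."""


def clean_headers(headers, required=REQUIRED_HEADERS, blacklist=BLACKLIST_HEADERS):
    """Check required headers and drop blacklisted ones (illustrative only)."""
    missing = [key for key in required if key not in headers]
    if missing:
        raise MissingHeaderError('Missing required headers: ' + ', '.join(missing))
    # Keep every header that is not blacklisted.
    return {key: value for key, value in headers.items() if key not in blacklist}


# Made-up header values for the example.
raw_headers = {
    'PROPID': 'TEST2019', 'DATE-OBS': '2019-10-13T10:13:00', 'INSTRUME': 'ab12',
    'SITEID': 'tst', 'TELID': '1m0a', 'OBSTYPE': 'EXPOSE', 'BLKUID': 1234,
    'FILTER': 'rp', 'HISTORY': 'processing notes',
}
cleaned = clean_headers(raw_headers)
```

A missing required header is treated as non-retryable because retrying the same file would fail in the same way.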

ingester.upload_file_to_s3(fileobj, path=None, bucket='ingestertest')

Uploads a file to the S3 bucket.

Parameters
  • fileobj (file-like object) – File-like object

  • path (str) – File path/name for this object. This option may be used to override the filename associated with the fileobj. It must be used if the fileobj does not have a filename.

  • bucket (str) – S3 bucket name

Returns

Version information for the file that was uploaded. For example:

{
    'key': '792FE6EFFE6FAD7E',
    'md5': 'ECD9B357D67117BE8BF38D6F4B4A6',
    'extension': '.fits.fz'
}

Return type

dict

Hint

The response contains an “md5” field, which is the md5 computed by S3. It is a good idea to check that this md5 is the same as the locally computed md5 of the file to make sure that the entire file was successfully uploaded.

Raises

ocs_ingester.exceptions.BackoffRetryError – If there is a problem connecting to S3
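The md5 comparison suggested in the hint above could look roughly like this. The helper `upload_matches_local` is hypothetical, and the version dictionary is built by hand here to mimic the shape of the documented return value instead of performing a real upload. The comparison ignores case, on the assumption that the returned md5 may be upper case as in the example above.

```python
import hashlib
import io


def upload_matches_local(fileobj, version):
    """Compare the md5 reported after upload with a locally computed md5.

    Hypothetical helper for illustration; not part of ocs_ingester.
    """
    fileobj.seek(0)
    local_md5 = hashlib.md5(fileobj.read()).hexdigest()
    fileobj.seek(0)
    # Compare case-insensitively, since hex digests may differ only in case.
    return local_md5.lower() == version['md5'].lower()


data = io.BytesIO(b'example FITS bytes')
# Hand-built stand-in for the dict returned by upload_file_to_s3.
fake_version = {
    'key': 'example-key',
    'md5': hashlib.md5(b'example FITS bytes').hexdigest().upper(),
    'extension': '.fits.fz',
}
matches = upload_matches_local(data, fake_version)
```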

ingester.ingest_archive_record(version, record, api_root='http://localhost:8000/', auth_token='')

Adds a record to the science archive database.

Parameters
  • version (dict) – Version information returned from the upload to S3

  • record (dict) – Science archive record to ingest

  • api_root (str) – Science archive API root url

  • auth_token (str) – Science archive API authentication token

Returns

The science archive record that was ingested. For example:

{
    'basename': 'tst1mXXX-ab12-20191013-0001-e00',
    'version_set': [
        {
            'key': '792FE6EFFE6FAD7E',
            'md5': 'ECD9B357D67117BE8BF38D6F4B4A6',
            'extension': '.fits.fz'
        }
    ],
    'frameid': 12345,
    ...
}

Return type

dict

Raises
  • ocs_ingester.exceptions.BackoffRetryError – If there was a problem connecting to the science archive

  • ocs_ingester.exceptions.DoNotRetryError – If there was a problem with the record that must be fixed before attempting to ingest it again

Exceptions

Exceptions raised by the ingester library.

exception BackoffRetryError

Exception that is raised when an error happens that can be retried with an exponential backoff. For example, a request that fails because of network latency may succeed at a later time.

exception DoNotRetryError

Exception that is raised when an error happens that will undoubtedly repeat if called again. The task should not be retried.

exception NonFatalDoNotRetryError

Exception that is raised when an error happens that should not be retried and is also not a fatal condition.

exception RetryError

Exception that is raised when an error happens that can be retried.
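One way a caller might act on these exceptions is to retry only on BackoffRetryError, doubling the wait after each failure. The sketch below is an assumption about usage, not behavior provided by the library: `call_with_backoff` is a hypothetical helper, and the exception class is a local stand-in so the example runs without the package installed (in real code it would be imported from `ocs_ingester.exceptions`).

```python
import time


class BackoffRetryError(Exception):
    """Local stand-in for ocs_ingester.exceptions.BackoffRetryError."""


def call_with_backoff(task, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call task(), retrying on BackoffRetryError with exponential backoff.

    Hypothetical helper; any other exception propagates immediately.
    """
    for attempt in range(max_attempts):
        try:
            return task()
        except BackoffRetryError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the caller see the error.
            sleep(base_delay * 2 ** attempt)  # Waits 1x, 2x, 4x, ... base_delay


# A made-up task that fails twice before succeeding.
attempts = []
delays = []


def flaky_task():
    attempts.append(1)
    if len(attempts) < 3:
        raise BackoffRetryError('temporary network problem')
    return 'ingested'


# Record the delays instead of sleeping so the example runs instantly.
result = call_with_backoff(flaky_task, sleep=delays.append)
```

Because only BackoffRetryError is caught, an error such as DoNotRetryError would propagate on the first attempt, which matches the intent that such tasks should not be retried.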

Using the command line entry point

Command-line entry point to the ingester library.

Examples

See available options:

(venv) ocs_ingest_frame --help