Documentation

Ingester API and Example Usage

ingester.py - Top-level functions for adding data to the science archive.

Data is added to the science archive using the archive API and an S3 client. The steps necessary to add data to the science archive are as follows:

1. Check that the file does not yet exist in the science archive.
2. Validate the FITS file and build a cleaned dictionary of its headers.
3. Upload the file to S3.
4. Combine the results from steps 2 and 3 into a record to be added to the science archive database.

Examples

Ingest a file one step at a time:

>>> from ocs_ingester import ingester
>>> with open('tst1mXXX-ab12-20191013-0001-e00.fits.fz', 'rb') as fileobj:
...     if not ingester.frame_exists(fileobj):
...         record = ingester.validate_fits_and_create_archive_record(fileobj)
...         s3_version = ingester.upload_file_to_s3(fileobj)
...         ingested_record = ingester.ingest_archive_record(s3_version, record)

Ingest a file in one step:

>>> from ocs_ingester import ingester
>>> with open('tst1mXXX-ab12-20191013-0001-e00.fits.fz', 'rb') as fileobj:
...     ingested_record = ingester.upload_file_and_ingest_to_archive(fileobj)

ingester.frame_exists(fileobj, api_root='http://localhost:8000/', auth_token='')

Checks if the file exists in the science archive.

Computes the md5 of the given file and checks whether a file with that md5 already exists in the science archive.

Parameters

fileobj (file-like object) – File-like object
api_root (str) – Science archive API root url
auth_token (str) – Science archive API authentication token

Returns

Boolean indicating whether the file exists in the science archive

Return type

bool

Raises

ocs_ingester.exceptions.BackoffRetryError – If there was a problem getting a response from the science archive API
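The hashing code inside frame_exists is not shown on this page. The following is a minimal sketch of computing the md5 of a file-like object, assuming the digest is taken over the raw bytes of the whole file. md5_of_fileobj is a hypothetical helper, not part of ocs_ingester. Restoring the stream position afterwards is a choice made by this sketch so that the same fileobj can be reused in later steps; it is not a documented guarantee of the library.

```python
import hashlib
import io


def md5_of_fileobj(fileobj, chunk_size=65536):
    """Return the hex md5 digest of a binary file-like object.

    Reads the stream in chunks so that large FITS files do not need to
    fit in memory, then seeks back to the original position.
    """
    start = fileobj.tell()
    fileobj.seek(0)
    digest = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b''
    for chunk in iter(lambda: fileobj.read(chunk_size), b''):
        digest.update(chunk)
    fileobj.seek(start)
    return digest.hexdigest()


# In-memory stand-in for an opened FITS file
fake_file = io.BytesIO(b'abc')
print(md5_of_fileobj(fake_file))  # 900150983cd24fb0d6963f7d28e17f72
```

The archive lookup itself (querying the API with this digest) is left out, since its endpoint and query format are not documented here.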

ingester.validate_fits_and_create_archive_record(fileobj, path=None, required_headers=('PROPID', 'DATE-OBS', 'INSTRUME', 'SITEID', 'TELID', 'OBSTYPE', 'BLKUID'), blacklist_headers=('HISTORY', 'COMMENT'))

Validates the FITS file and creates a science archive record from it.

Checks that required headers are present, removes blacklisted headers, and cleans other headers such that they are valid for ingestion into the science archive.

Parameters

fileobj (file-like object) – File-like object
path (str) – File path/name for this object. This option may be used to override the filename associated with the fileobj. It must be used if the fileobj does not have a filename.
required_headers (tuple) – FITS headers that must be present
blacklist_headers (tuple) – FITS headers that should not be ingested

Returns

Constructed science archive record. For example:

{
    'basename': 'tst1mXXX-ab12-20191013-0001-e00',
    'FILTER': 'rp',
    'DATE-OBS': '2019-10-13T10:13:00',
    ...
}

Return type

dict

Raises

ocs_ingester.exceptions.DoNotRetryError – If required headers could not be found
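The required-header check and blacklist removal described above can be sketched as follows. This is an illustration only, assuming the FITS header has already been read into a plain dict. build_record is a hypothetical helper, DoNotRetryError is a local stand-in for ocs_ingester.exceptions.DoNotRetryError, the sample header values are made up, and the additional value cleaning performed by the real function is not reproduced.

```python
class DoNotRetryError(Exception):
    """Local stand-in for ocs_ingester.exceptions.DoNotRetryError."""


# Defaults taken from the documented signature
REQUIRED_HEADERS = ('PROPID', 'DATE-OBS', 'INSTRUME', 'SITEID',
                    'TELID', 'OBSTYPE', 'BLKUID')
BLACKLIST_HEADERS = ('HISTORY', 'COMMENT')


def build_record(headers, basename,
                 required_headers=REQUIRED_HEADERS,
                 blacklist_headers=BLACKLIST_HEADERS):
    """Check required headers, drop blacklisted ones and attach the basename."""
    missing = [key for key in required_headers if key not in headers]
    if missing:
        # A missing header will not appear on retry, so do not retry
        raise DoNotRetryError('Missing required headers: ' + ', '.join(missing))
    record = {key: value for key, value in headers.items()
              if key not in blacklist_headers}
    record['basename'] = basename
    return record


# Hypothetical header values, for illustration only
headers = {
    'PROPID': 'TEST2019', 'DATE-OBS': '2019-10-13T10:13:00', 'INSTRUME': 'ab12',
    'SITEID': 'tst', 'TELID': '1m0a', 'OBSTYPE': 'EXPOSE', 'BLKUID': 12345,
    'FILTER': 'rp', 'HISTORY': 'processed', 'COMMENT': 'test frame',
}
record = build_record(headers, 'tst1mXXX-ab12-20191013-0001-e00')
print(sorted(record))
```

Raising DoNotRetryError for missing headers mirrors the documented behaviour: the file itself must be fixed before ingesting it again.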

ingester.upload_file_to_s3(fileobj, path=None, bucket='ingestertest')

Uploads a file to the S3 bucket.

Parameters

fileobj (file-like object) – File-like object
path (str) – File path/name for this object. This option may be used to override the filename associated with the fileobj. It must be used if the fileobj does not have a filename.
bucket (str) – S3 bucket name

Returns

Version information for the file that was uploaded. For example:

{
    'key': '792FE6EFFE6FAD7E',
    'md5': 'ECD9B357D67117BE8BF38D6F4B4A6',
    'extension': '.fits.fz'
}

Return type

dict

Hint

The response contains an “md5” field, which is the md5 computed by S3. It is a good idea to check that this md5 is the same as the locally computed md5 of the file to make sure that the entire file was successfully uploaded.

Raises

ocs_ingester.exceptions.BackoffRetryError – If there is a problem connecting to S3
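The md5 check suggested in the hint can be sketched as a small comparison, assuming the returned md5 is a plain hex md5 of the full file contents. upload_is_intact is a hypothetical helper, and the version dict in the usage is simulated rather than a real S3 response. The comparison ignores case because hashlib produces lowercase hex, while the example response shows uppercase.

```python
import hashlib


def upload_is_intact(local_bytes, version):
    """Return True if the md5 reported for the upload matches the local md5."""
    local_md5 = hashlib.md5(local_bytes).hexdigest()  # lowercase hex
    return local_md5 == version['md5'].lower()


# Simulated version info; a real dict would come from upload_file_to_s3
data = b'example file contents'
version = {
    'key': 'hypothetical-version-key',
    'md5': hashlib.md5(data).hexdigest().upper(),  # mimic an uppercase digest
    'extension': '.fits.fz',
}
print(upload_is_intact(data, version))  # True
```

A mismatch indicates that the stored object differs from the local file, so the upload should be repeated before the record is ingested.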

ingester.ingest_archive_record(version, record, api_root='http://localhost:8000/', auth_token='')

Adds a record to the science archive database.

Parameters

version (dict) – Version information returned from the upload to S3
record (dict) – Science archive record to ingest
api_root (str) – Science archive API root url
auth_token (str) – Science archive API authentication token

Returns

The science archive record that was ingested. For example:

{
    'basename': 'tst1mXXX-ab12-20191013-0001-e00',
    'version_set': [
        {
            'key': '792FE6EFFE6FAD7E',
            'md5': 'ECD9B357D67117BE8BF38D6F4B4A6',
            'extension': '.fits.fz'
        }
    ],
    'frameid': 12345,
    ...
}

Return type

dict

Raises

ocs_ingester.exceptions.BackoffRetryError – If there was a problem connecting to the science archive
ocs_ingester.exceptions.DoNotRetryError – If there was a problem with the record that must be fixed before attempting to ingest it again
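The example response suggests that an ingested record consists of the cleaned header record with the S3 version information nested under version_set, while frameid is assigned by the archive. A minimal sketch of the final workflow step, combining the results of steps 2 and 3, assuming this shape. combine_version_and_record is hypothetical, and the payload that ingest_archive_record actually sends to the API may differ.

```python
def combine_version_and_record(version, record):
    """Return a new record with the upload's version info attached.

    Copies both inputs so that the caller's dicts are not mutated.
    """
    combined = dict(record)
    combined['version_set'] = [dict(version)]
    return combined


record = {'basename': 'tst1mXXX-ab12-20191013-0001-e00', 'FILTER': 'rp'}
version = {'key': '792FE6EFFE6FAD7E', 'md5': 'ECD9B357D67117BE8BF38D6F4B4A6',
           'extension': '.fits.fz'}
payload = combine_version_and_record(version, record)
print(payload['version_set'][0]['extension'])  # .fits.fz
```

No frameid is set here, because the example response implies the archive assigns it on ingestion.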

Exceptions

Exceptions raised by the ingester library.

exception BackoffRetryError: Raised when an error occurs that can be retried with exponential backoff, for example a network latency error that may succeed at a later time.

exception DoNotRetryError: Raised when an error occurs that will certainly recur if the call is repeated. The task should not be retried.

exception NonFatalDoNotRetryError: Raised when an error occurs that should not be retried but is not a fatal condition.

exception RetryError: Raised when an error occurs that can be retried.
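The split between BackoffRetryError and DoNotRetryError is meant to drive a caller's retry policy. A minimal sketch of such a policy, assuming the caller retries only BackoffRetryError and doubles the delay after each failure. The exception classes are local stand-ins so the snippet runs without ocs_ingester (real code would import them from ocs_ingester.exceptions), and call_with_backoff is a hypothetical helper, not part of the library.

```python
import time


class BackoffRetryError(Exception):
    """Local stand-in for ocs_ingester.exceptions.BackoffRetryError."""


class DoNotRetryError(Exception):
    """Local stand-in for ocs_ingester.exceptions.DoNotRetryError."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func, retrying BackoffRetryError with exponentially growing delays.

    Any other exception, including DoNotRetryError, propagates immediately.
    After max_attempts failures the last BackoffRetryError is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except BackoffRetryError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... by default


# Usage: a call that fails twice before succeeding
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise BackoffRetryError('archive temporarily unavailable')
    return 'ok'


delays = []
print(call_with_backoff(flaky, sleep=delays.append))  # ok
print(delays)  # [1.0, 2.0]
```

Passing sleep as a parameter keeps the policy testable without real waiting. A production caller might also cap the delay or add random jitter.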

Using the command-line entry point

Command-line entry point to the ingester library.

Examples

See available options:

(venv) ocs_ingest_frame --help
