Documentation
Ingester API and Example Usage
ingester.py
- Top level functions for adding data to the science archive.
Data is added to the science archive using the archive API and an S3 client. The steps necessary to add data to the science archive are as follows:
1. Check that the file does not yet exist in the science archive
2. Validate and build a cleaned dictionary of the headers of the FITS file
3. Upload the file to S3
4. Combine the results from steps 2 and 3 into a record to be added to the science archive database
Examples
Ingest a file one step at a time:
>>> from ocs_ingester import ingester
>>> with open('tst1mXXX-ab12-20191013-0001-e00.fits.fz', 'rb') as fileobj:
...     if not ingester.frame_exists(fileobj):
...         record = ingester.validate_fits_and_create_archive_record(fileobj)
...         s3_version = ingester.upload_file_to_s3(fileobj)
...         ingested_record = ingester.ingest_archive_record(s3_version, record)
Ingest a file in one step:
>>> from ocs_ingester import ingester
>>> with open('tst1mXXX-ab12-20191013-0001-e00.fits.fz', 'rb') as fileobj:
...     ingested_record = ingester.upload_file_and_ingest_to_archive(fileobj)
ingester.frame_exists(fileobj, api_root='http://localhost:8000/', auth_token='')
Checks if the file exists in the science archive.
Computes the md5 of the given file and checks whether a file with that md5 already exists in the science archive.
- Parameters
fileobj (file-like object) – File-like object
api_root (str) – Science archive API root url
auth_token (str) – Science archive API authentication token
- Returns
Boolean indicating whether the file exists in the science archive
- Return type
bool
- Raises
ocs_ingester.exceptions.BackoffRetryError – If there was a problem getting a response from the science archive API
ingester.validate_fits_and_create_archive_record(fileobj, path=None, required_headers=('PROPID', 'DATE-OBS', 'INSTRUME', 'SITEID', 'TELID', 'OBSTYPE', 'BLKUID'), blacklist_headers=('HISTORY', 'COMMENT'))
Validates the FITS file and creates a science archive record from it.
Checks that required headers are present, removes blacklisted headers, and cleans other headers such that they are valid for ingestion into the science archive.
- Parameters
fileobj (file-like object) – File-like object
path (str) – File path/name for this object. This option may be used to override the filename associated with the fileobj. It must be used if the fileobj does not have a filename.
required_headers (tuple) – FITS headers that must be present
blacklist_headers (tuple) – FITS headers that should not be ingested
- Returns
Constructed science archive record. For example:
{ 'basename': 'tst1mXXX-ab12-20191013-0001-e00', 'FILTER': 'rp', 'DATE-OBS': '2019-10-13T10:13:00', ... }
- Return type
dict
- Raises
ocs_ingester.exceptions.DoNotRetryError – If required headers could not be found
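The required-header check and blacklist removal described above can be pictured with a plain dictionary. This is a minimal sketch rather than the library's implementation: `clean_headers` is a hypothetical helper, the built-in `ValueError` stands in for `ocs_ingester.exceptions.DoNotRetryError`, and the additional cleaning of header values performed by the real function is not shown.

```python
# Defaults copied from the documented signature of
# validate_fits_and_create_archive_record.
REQUIRED_HEADERS = ('PROPID', 'DATE-OBS', 'INSTRUME', 'SITEID',
                    'TELID', 'OBSTYPE', 'BLKUID')
BLACKLIST_HEADERS = ('HISTORY', 'COMMENT')


def clean_headers(headers,
                  required_headers=REQUIRED_HEADERS,
                  blacklist_headers=BLACKLIST_HEADERS):
    """Hypothetical helper: check required keys, then drop blacklisted ones."""
    missing = [name for name in required_headers if name not in headers]
    if missing:
        # The real library raises DoNotRetryError here; ValueError is a stand-in.
        raise ValueError('Missing required headers: ' + ', '.join(missing))
    # Keep every header that is not blacklisted.
    return {key: value for key, value in headers.items()
            if key not in blacklist_headers}
```

Failing early on missing headers matches the documented behaviour: a file without them cannot succeed on a later attempt, so the error is not worth retrying.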
ingester.upload_file_to_s3(fileobj, path=None, bucket='ingestertest')
Uploads a file to the S3 bucket.
- Parameters
fileobj (file-like object) – File-like object
path (str) – File path/name for this object. This option may be used to override the filename associated with the fileobj. It must be used if the fileobj does not have a filename.
bucket (str) – S3 bucket name
- Returns
Version information for the file that was uploaded. For example:
{ 'key': '792FE6EFFE6FAD7E', 'md5': 'ECD9B357D67117BE8BF38D6F4B4A6', 'extension': '.fits.fz' }
- Return type
dict
Hint
The response contains an “md5” field, which is the md5 computed by S3. It is a good idea to check that this md5 is the same as the locally computed md5 of the file to make sure that the entire file was successfully uploaded.
- Raises
ocs_ingester.exceptions.BackoffRetryError – If there is a problem connecting to S3
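The md5 comparison suggested in the hint could look like the following. It is a sketch under stated assumptions: `local_md5` and `upload_matches` are hypothetical helpers that are not part of `ocs_ingester`, and the `'md5'` value is assumed to be a hexadecimal string whose letter case may differ from what `hashlib` produces, so the comparison ignores case.

```python
import hashlib


def local_md5(fileobj, chunk_size=1024 * 1024):
    """Return the hex md5 of a binary file-like object.

    The file is read in chunks from the start, and the original read
    position is restored afterwards.
    """
    position = fileobj.tell()
    fileobj.seek(0)
    digest = hashlib.md5()
    for chunk in iter(lambda: fileobj.read(chunk_size), b''):
        digest.update(chunk)
    fileobj.seek(position)
    return digest.hexdigest()


def upload_matches(fileobj, version):
    """Compare the local md5 with the 'md5' field of the version dict."""
    return local_md5(fileobj).lower() == version['md5'].lower()
```

With these helpers, a caller could run `upload_matches(fileobj, s3_version)` on the dictionary returned by `upload_file_to_s3` and treat a `False` result as an incomplete upload.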
ingester.ingest_archive_record(version, record, api_root='http://localhost:8000/', auth_token='')
Adds a record to the science archive database.
- Parameters
version (dict) – Version information returned from the upload to S3
record (dict) – Science archive record to ingest
api_root (str) – Science archive API root url
auth_token (str) – Science archive API authentication token
- Returns
The science archive record that was ingested. For example:
{ 'basename': 'tst1mXXX-ab12-20191013-0001-e00', 'version_set': [ { 'key': '792FE6EFFE6FAD7E', 'md5': 'ECD9B357D67117BE8BF38D6F4B4A6', 'extension': '.fits.fz' } ], 'frameid': 12345, ... }
- Return type
dict
- Raises
ocs_ingester.exceptions.BackoffRetryError – If there was a problem connecting to the science archive
ocs_ingester.exceptions.DoNotRetryError – If there was a problem with the record that must be fixed before attempting to ingest it again
Exceptions
Exceptions raised by the ingester library.
exception BackoffRetryError
Exception that is raised when an error happens that can be retried with an exponential backoff, for example a network latency error where a later attempt may succeed.
exception DoNotRetryError
Exception that is raised when an error happens that will undoubtedly repeat if called again. The task should not be retried.
exception NonFatalDoNotRetryError
Exception that is raised when an error happens that should not be retried and is also not a fatal condition.
exception RetryError
Exception that is raised when an error happens that can be retried.
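One way a caller might act on these exceptions is to retry on `BackoffRetryError` with exponentially growing delays and let `DoNotRetryError` propagate immediately. The sketch below is illustrative only: `call_with_backoff` is a hypothetical helper, and the two exception classes are local stand-ins defined so that the example is self-contained. Real code would import them from `ocs_ingester.exceptions` instead.

```python
import time


class BackoffRetryError(Exception):
    """Local stand-in for ocs_ingester.exceptions.BackoffRetryError."""


class DoNotRetryError(Exception):
    """Local stand-in for ocs_ingester.exceptions.DoNotRetryError."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on BackoffRetryError with doubling delays.

    Hypothetical helper: waits base_delay, 2 * base_delay, 4 * base_delay, ...
    between attempts. Any other exception, including DoNotRetryError,
    is not caught and propagates on the first failure.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except BackoffRetryError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: give up and surface the error.
            sleep(base_delay * 2 ** attempt)
```

A call such as `call_with_backoff(lambda: ingester.upload_file_to_s3(fileobj))` would then retry transient S3 connection problems while still failing fast on errors that cannot be fixed by waiting.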
Using the command line entry point
Command-line entry point to the ingester library.
Examples
See available options:
(venv) ocs_ingest_frame --help