SciXTEMPLATE.TEMPLATE package

Subpackages

Submodules

SciXTEMPLATE.TEMPLATE.db module

SciXTEMPLATE.TEMPLATE.db.get_TEMPLATE_record(session, record_id)

Return the record whose UUID is record_id.

SciXTEMPLATE.TEMPLATE.db.get_job_status_by_job_hash(cls, job_hashes, only_status=None)

Return all status updates for the given job hashes.

SciXTEMPLATE.TEMPLATE.db.update_job_status(cls, job_hash, status=None)

Update the status of a job previously written to the db.

SciXTEMPLATE.TEMPLATE.db.write_TEMPLATE_record(cls, record_id, date, s3_key, checksum, source)

Write harvested record to db.

SciXTEMPLATE.TEMPLATE.db.write_job_status(cls, job_request, only_status=None)

Write a new status for a job to the db.

SciXTEMPLATE.TEMPLATE.db.write_status_redis(redis_instance, status)

SciXTEMPLATE.TEMPLATE.models module

class SciXTEMPLATE.TEMPLATE.models.Source(value)

Bases: Enum

An enumeration.

SYMBOL1 = 1
SYMBOL2 = 2
SYMBOL3 = 3
SYMBOL4 = 4

class SciXTEMPLATE.TEMPLATE.models.Status(value)

Bases: Enum

An enumeration.

Error = 3
Pending = 1
Processing = 2
Success = 4
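As a minimal sketch of how the Status members listed above behave, the following re-declares the enumeration locally with the documented values; it does not import the real class from SciXTEMPLATE.TEMPLATE.models. The helper is_finished is hypothetical and only illustrates one way calling code might group the states.

```python
from enum import Enum


class Status(Enum):
    # Local stand-in mirroring the documented members of models.Status.
    Pending = 1
    Processing = 2
    Error = 3
    Success = 4


def is_finished(status):
    # Hypothetical helper: treats Error and Success as terminal states.
    return status in (Status.Error, Status.Success)


by_value = Status(2)          # look up a member by its integer value
by_name = Status["Success"]   # look up a member by its name
```

Lookup by value is convenient when a status comes back from storage as an integer, while lookup by name suits statuses passed around as strings.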
class SciXTEMPLATE.TEMPLATE.models.TEMPLATE_record(**kwargs)

Bases: Base

arXiv records table containing the relevant information for harvested arXiv records.

checksum
date
id
s3_key
source

class SciXTEMPLATE.TEMPLATE.models.gRPC_status(**kwargs)

Bases: Base

gRPC table containing the status of every job passed through the gRPC API.

id
job_hash
job_request
status
timestamp

SciXTEMPLATE.TEMPLATE.s3_methods module

class SciXTEMPLATE.TEMPLATE.s3_methods.load_s3(config)

Bases: object

A wrapper class to load multiple S3 providers

load_s3_providers(config)

Loops over all providers specified in config and returns them as a dict

input:

config: The imported Pipeline configuration

return:

provider_dict: a dictionary with entries of the form “PROVIDER_NAME”: class s3_provider
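The provider loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: the config key S3_PROVIDERS, the provider names, and the FakeProvider stand-in (used in place of s3_provider so that no S3 client is needed) are all hypothetical.

```python
class FakeProvider:
    # Stand-in for s3_provider(provider, config); performs no network access.
    def __init__(self, provider, config):
        self.provider = provider
        self.config = config


def load_providers(config, provider_cls=FakeProvider):
    # Build {"PROVIDER_NAME": provider instance} for every configured provider.
    # "S3_PROVIDERS" is an assumed config key, not taken from the package.
    names = config.get("S3_PROVIDERS", [])
    return {name: provider_cls(name, config) for name in names}


# Hypothetical configuration with two provider names.
config = {"S3_PROVIDERS": ["AWS", "MINIO"]}
providers = load_providers(config)
```

Keying the dictionary by provider name lets calling code pick a provider with a plain lookup such as providers["AWS"].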

class SciXTEMPLATE.TEMPLATE.s3_methods.s3_methods

Bases: object

Base class for interacting with S3 providers

write_object_s3(file_bytes, object_name)

class SciXTEMPLATE.TEMPLATE.s3_methods.s3_provider(provider, config)

Bases: s3_methods

Class for interacting with a particular S3 provider

SciXTEMPLATE.TEMPLATE.template module

class SciXTEMPLATE.TEMPLATE.template.TEMPLATE_APP(proj_home)

Bases: object

session_scope()

Provide a transactional scope for postgres.
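A transactional scope is commonly written as a context manager that commits on success, rolls back on error, and always closes the session. The sketch below shows that general pattern only; it is not taken from the package. FakeSession is a hypothetical stand-in for a real database session, and the session_factory argument is an assumption (the documented method takes no arguments).

```python
from contextlib import contextmanager


class FakeSession:
    # Stand-in for a database session; records which calls were made.
    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")

    def close(self):
        self.calls.append("close")


@contextmanager
def session_scope(session_factory):
    # Commit on success, roll back on error, always close.
    session = session_factory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


# Successful block: the work is committed, then the session is closed.
ok = FakeSession()
with session_scope(lambda: ok):
    pass

# Failing block: the work is rolled back, then the session is closed.
failed = FakeSession()
try:
    with session_scope(lambda: failed):
        raise ValueError("simulated failure")
except ValueError:
    pass
```

Re-raising after the rollback keeps the original error visible to the caller instead of silently swallowing it.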

template_consumer(consumer, producer)

Ingests a message from the Pipeline input topic and passes it to the consumer task

template_task(msg, producer)

input:

msg: The consumed msg from the Pipeline input topic
producer: The relevant Pipeline output producer

The main consumer task for the Pipeline. This task takes any consumed messages and passes them to the relevant subprocesses, and also updates postgres and redis.

SciXTEMPLATE.TEMPLATE.template.init_pipeline(proj_home)

input:

proj_home: The home directory for the Pipeline

Initializes the relevant python methods:

app: The main application class
schema_client: The Kafka Schema Registry
schema: The input schema the pipeline uses
consumer: The kafka consumer for the pipeline
producer: The kafka producer for the pipeline

SciXTEMPLATE.TEMPLATE.utils module

SciXTEMPLATE.TEMPLATE.utils.conf_update_from_env(app_name, conf)

SciXTEMPLATE.TEMPLATE.utils.from_object(from_obj, to_obj)

Updates the values from the given object. The object can be given either as an import name or as an object reference. Objects are usually either modules or classes. Only the uppercase variables in that object are stored in the config.

input:

obj: an import name or object
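The uppercase-only rule can be illustrated with a short sketch. It assumes the target is a plain dict and the source is a class; copy_uppercase and DefaultSettings are hypothetical names used for illustration, and the import-name case is not covered.

```python
class DefaultSettings:
    # Hypothetical settings object; only uppercase names count as config.
    KAFKA_BROKER = "localhost:9092"
    LOG_LEVEL = "INFO"
    helper_value = "ignored"


def copy_uppercase(from_obj, to_obj):
    # Copy every uppercase attribute of from_obj into the dict to_obj.
    for key in dir(from_obj):
        if key.isupper():
            to_obj[key] = getattr(from_obj, key)
    return to_obj


conf = copy_uppercase(DefaultSettings, {})
```

Restricting the copy to uppercase names keeps helper attributes and dunder members such as __module__ out of the resulting config.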

SciXTEMPLATE.TEMPLATE.utils.get_schema(app, schema_client, schema_name)

input:

app: The relevant calling application (can be any class with a logger attribute)
schema_client: Kafka SchemaRegistryClient
schema_name: The name of the AVRO schema

return:

AVRO schema (str)

SciXTEMPLATE.TEMPLATE.utils.load_config(proj_home=None, extra_frames=0, app_name=None)

Loads configuration from config.py and also from local_config.py

Parameters:
  • proj_home: str, location of the home; we’ll always try to load config files from there. If the location is empty, we’ll inspect the caller and derive the location of its parent folder.

  • extra_frames: int, number of frames to look back; default is 2, which is good when load_config() is called directly, but when it is called from inside classes, extra frames need to be added.

Returns:

dictionary
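The layering of config.py and local_config.py can be sketched as below. This is a simplified illustration, not the package's code: it assumes proj_home is passed explicitly (no caller inspection), that only uppercase names are kept, and that values in local_config.py override those in config.py. The function name load_layered_config and the setting names are hypothetical.

```python
import os
import runpy
import tempfile


def load_layered_config(proj_home):
    # Read config.py first, then let local_config.py override its values.
    conf = {}
    for filename in ("config.py", "local_config.py"):
        path = os.path.join(proj_home, filename)
        if os.path.exists(path):
            module_globals = runpy.run_path(path)
            # Keep only uppercase names (an assumption of this sketch).
            conf.update({k: v for k, v in module_globals.items() if k.isupper()})
    return conf


# Demonstration with two throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as proj_home:
    with open(os.path.join(proj_home, "config.py"), "w") as f:
        f.write("LOG_LEVEL = 'INFO'\nTOPIC = 'input'\n")
    with open(os.path.join(proj_home, "local_config.py"), "w") as f:
        f.write("LOG_LEVEL = 'DEBUG'\n")
    conf = load_layered_config(proj_home)
```

In this sketch the local file wins for LOG_LEVEL, while TOPIC falls through from config.py unchanged.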

SciXTEMPLATE.TEMPLATE.utils.load_module(filename)

Loads module, first from config.py then from local_config.py.

Returns:

dictionary

Module contents