Documentation of exe_kg_lib.classes.tasks package
Overview
This package contains classes that correspond to entities of type owl:Class in the KG that are either rdfs:subClassOf ds:Task or rdfs:subClassOf ds:AtomicTask. In either case, these entities are at the top level of the Task hierarchy for each of the three KG schemata: ML, Statistics and Visualization.
This package's classes implement the abstract run_method() to perform the following steps:
- The input data are taken:
  - Either from the outputs of previous Tasks of the ExeKG (parameter: other_task_output_dict)
  - Or from a given dataframe that holds the input data for the ExeKG (parameter: input_data)
- An algorithm is executed. The algorithm can be related to ML, Statistics or Visualization, depending on the Python file's prefix (i.e. ml, statistic, visual). The algorithm can:
  - Either be implemented as part of this library
  - Or belong to an external module. In this case, the module is determined using classes.tasks.task.Task.resolve_module() based on the Task's method_module_chain. See the Naming conventions section for more information on method_module_chain.
- The output of the algorithm is returned as a dictionary with pairs of output name and value.
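The three steps above can be sketched as follows. This is a minimal illustration, not the library's code: the class DoubleValues, the output name DataOutDoubled, and the rule that outputs of previous Tasks take precedence over input_data are all assumptions made for the example, and a plain dict of columns stands in for the dataframe.

```python
from typing import Any, Dict, List, Optional


class DoubleValues:
    """Hypothetical Task-like class; it is NOT part of exe_kg_lib."""

    def __init__(self, input_name: str) -> None:
        # Name of the single input this toy task consumes.
        self.input_name = input_name

    def run_method(
        self,
        other_task_output_dict: Dict[str, Any],
        input_data: Optional[Dict[str, List[float]]] = None,
    ) -> Dict[str, Any]:
        # Step 1: take the input either from a previous Task's output
        # or from the given input data (precedence is an assumption).
        if self.input_name in other_task_output_dict:
            values = other_task_output_dict[self.input_name]
        elif input_data is not None and self.input_name in input_data:
            values = input_data[self.input_name]
        else:
            raise KeyError(f"No input found for {self.input_name!r}")

        # Step 2: execute the algorithm (here implemented inline).
        doubled = [2 * value for value in values]

        # Step 3: return the output as {output name: value}.
        return {"DataOutDoubled": doubled}


task = DoubleValues("feature_1")
from_input_data = task.run_method({}, {"feature_1": [1, 2]})
from_previous_task = task.run_method({"feature_1": [5]}, {"feature_1": [1, 2]})
```

Returning a dictionary keyed by output name lets a later Task look up the value it needs in other_task_output_dict.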
Naming conventions
The naming conventions below are necessary for automatically mapping the KG's tasks (with their methods and properties) to Python objects while parsing the ExeKG.
- Each class name in this package is the name of an owl:Class that is either rdfs:subClassOf ds:Task or rdfs:subClassOf ds:AtomicTask.
- The method_params_dict and method_inherited_params_dict fields inherited from classes.tasks.task.Task contain parameters for the algorithm to be executed.
  - Their keys are produced by applying utils.string_utils.property_iri_to_field_name() to the datatype property names of the Task's linked ds:AtomicMethod instance in the ExeKG. E.g. a key named split_ratio corresponds to the hasParamSplitRatio property in the KG.
  - Their values are produced by applying classes.exe_kg_mixins.exe_kg_execution_mixin.ExeKGExecutionMixin._literal_to_field_value() to the literal values of the datatype properties in the ExeKG. E.g. a value of 0.2 corresponds to the "0.2"^^xsd:float literal value in the KG.
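The key and value conventions for the parameter dictionaries can be illustrated with the sketch below. Both functions are hypothetical stand-ins written for this example; they do not reproduce the library's property_iri_to_field_name() or _literal_to_field_value(). The "hasParam" prefix handling, the example IRI and the small datatype table are assumptions.

```python
import re
from typing import Any, Callable, Dict


def iri_to_field_name_sketch(property_iri: str) -> str:
    """Hypothetical stand-in: property IRI -> snake_case field name."""
    # Keep only the local name after the last '#' or '/'.
    local_name = property_iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]
    # Drop the assumed "hasParam" prefix.
    if local_name.startswith("hasParam"):
        local_name = local_name[len("hasParam"):]
    # Insert '_' before every capital letter except the first, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", local_name).lower()


# Assumed mapping from XSD datatypes to Python converters.
CONVERTERS: Dict[str, Callable[[str], Any]] = {
    "xsd:float": float,
    "xsd:int": int,
    "xsd:string": str,
    "xsd:boolean": lambda text: text == "true",
}


def literal_to_value_sketch(lexical_form: str, datatype: str) -> Any:
    """Hypothetical stand-in: typed literal -> Python value."""
    return CONVERTERS[datatype](lexical_form)


# The IRI below is made up for illustration.
key = iri_to_field_name_sketch("https://example.org/ml#hasParamSplitRatio")
value = literal_to_value_sketch("0.2", "xsd:float")
method_params_dict = {key: value}
```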
- The method_module_chain field inherited from classes.tasks.task.Task contains a hierarchical list of Python module names, ordered from top to bottom.
  - The module hierarchy is determined by utils.query_utils.get_module_hierarchy_chain() starting from the Task's linked ds:AtomicMethod instance in the ExeKG, and proceeding via the rdfs:subClassOf+ ds:Module property path.
  - Each item in the hierarchy list (except for the last one) comes from the name of an owl:Class that is a rdfs:subClassOf+ ds:Module, after conversion by utils.string_utils.class_name_to_module_name().
  - The last item of the list comes from the type of the Task's linked ds:AtomicMethod instance, after conversion by utils.string_utils.class_name_to_method_name().

  The example below shows the module chain SVCMethod -> SvmModule -> SklearnModule, which leads to method_module_chain = ["sklearn", "svm", "SVC"].

      #############################
      ### START: ExeKG fragment ###
      #############################
      ml:BinaryClassification1 a ml:BinaryClassification ;
          ds:hasNextTask ml:Test1 ;
          ml:hasBinaryClassificationMethod ml:SVCMethod1 ;
          ml:hasTrainInput ml:DataInTrainX_BinaryClassification1_1, ml:DataInTrainY_BinaryClassification1_1 ;
          ml:hasTrainOutput ml:DataOutTrainModelSVCMethod .
      ###########################
      ### END: ExeKG fragment ###
      ###########################

      #################################
      ### START: KG schema fragment ###
      #################################
      ml:SVCMethod a owl:Class ;
          rdfs:subClassOf ds:AtomicMethod, ml:SvmModule, ml:TrainMethod .
      ml:SvmModule a owl:Class ;
          rdfs:subClassOf ml:SklearnModule .
      ml:SklearnModule a owl:Class ;
          rdfs:subClassOf ds:Module .
      ###############################
      ### END: KG schema fragment ###
      ###############################
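A rough sketch of how such a chain could be built from class names and then turned into a Python object is given below. The helpers chain_from_class_names() and resolve_from_chain() are hypothetical and are not the library's class_name_to_module_name(), class_name_to_method_name() or Task.resolve_module(). The suffix-stripping rules are assumptions that merely reproduce the example above. So that the block runs without scikit-learn, the resolution step uses a standard-library chain instead.

```python
import importlib
from typing import Any, List


def strip_suffix(name: str, suffix: str) -> str:
    # Remove a trailing suffix if present (works on Python < 3.9 too).
    return name[: -len(suffix)] if name.endswith(suffix) else name


def chain_from_class_names(method_class: str, module_classes_bottom_up: List[str]) -> List[str]:
    """Hypothetical: build a top-to-bottom chain from KG class names."""
    # Assumed rule: "SklearnModule" -> "sklearn", "SvmModule" -> "svm".
    modules = [
        strip_suffix(name, "Module").lower()
        for name in reversed(module_classes_bottom_up)
    ]
    # Assumed rule: "SVCMethod" -> "SVC".
    return modules + [strip_suffix(method_class, "Method")]


def resolve_from_chain(method_module_chain: List[str]) -> Any:
    """Hypothetical: import the module path and fetch the last item from it."""
    *module_parts, attribute_name = method_module_chain
    module = importlib.import_module(".".join(module_parts))
    return getattr(module, attribute_name)


svc_chain = chain_from_class_names("SVCMethod", ["SvmModule", "SklearnModule"])

# Standard-library chain used only to demonstrate the resolution step.
decoder_cls = resolve_from_chain(["json", "decoder", "JSONDecoder"])
decoded = decoder_cls().decode("[1, 2]")
```

With scikit-learn installed, calling resolve_from_chain() on svc_chain would return the sklearn.svm.SVC class in the same way.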
- The inputs and outputs fields inherited from classes.tasks.task.Task contain a list of classes.data_entity.DataEntity objects. In the case of inputs, the objects can also be of type classes.method.Method. The objects are generated by invoking the _property_value_to_field_value() method of the ExeKGExecutionMixin class. This method is applied to instances that are linked to the Task in the ExeKG through a subclass of either ds:hasInput or ds:hasOutput.
  - The field names of DataEntity and Method objects are filled by applying utils.string_utils.property_iri_to_field_name() to the properties of the Task's linked ds:DataEntity or ds:Method instances.
  - In the case of input DataEntities, the object's fields are filled using the properties of the ds:DataEntity instances that are referenced by the Task's linked ds:DataEntity instances.

  The example below shows a LinePlotting1 task instance that has DataInToPlot_LinePlotting1_1 as input. DataInToPlot_LinePlotting1_1 references ds:feature_1. So, in this case, the inputs field of the corresponding Task Python object will contain a DataEntity object with fields: source = "feature_1", reference = IRI(ds:feature_1). The fields data_semantics and data_structure are mainly used during pipeline construction.

      ######################
      ### ExeKG fragment ###
      ######################
      visu:LinePlotting1 a visu:LinePlotting ;
          ds:hasNextTask visu:LinePlotting2 ;
          visu:hasLinePlottingMethod visu:PlotMethod1 ;
          visu:hasPlottingInput visu:DataInToPlot_LinePlotting1_1 .
      visu:DataInToPlot_LinePlotting1_1 a visu:DataInToPlot ;
          ds:hasReference ds:feature_1 .
      ds:feature_1 a ds:DataEntity, ds:Numerical, ds:Vector ;
          ds:hasSource "feature_1"^^xsd:string .
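The LinePlotting1 example can be mirrored in Python as a rough sketch. The DataEntitySketch dataclass, the GRAPH dictionary and build_input_entity() are hypothetical and only imitate the described behavior; the real DataEntity class and the mixin's parsing logic may differ. Prefixed names are kept as plain strings instead of IRI objects, and data_semantics and data_structure are left unset because the text does not say how they are derived.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DataEntitySketch:
    """Hypothetical stand-in for classes.data_entity.DataEntity."""

    iri: str
    source: Optional[str] = None
    reference: Optional[str] = None
    data_semantics: Optional[str] = None  # not filled in this sketch
    data_structure: Optional[str] = None  # not filled in this sketch


# Toy stand-in for the ExeKG fragment: subject -> {property: value}.
GRAPH: Dict[str, Dict[str, str]] = {
    "visu:DataInToPlot_LinePlotting1_1": {"ds:hasReference": "ds:feature_1"},
    "ds:feature_1": {"ds:hasSource": "feature_1"},
}


def build_input_entity(graph: Dict[str, Dict[str, str]], input_iri: str) -> DataEntitySketch:
    # Follow ds:hasReference from the Task's input to the referenced entity.
    reference = graph[input_iri].get("ds:hasReference")
    # Fill the fields from the referenced entity's properties.
    referenced_properties = graph.get(reference, {}) if reference else {}
    return DataEntitySketch(
        iri=input_iri,
        source=referenced_properties.get("ds:hasSource"),
        reference=reference,
    )


entity = build_input_entity(GRAPH, "visu:DataInToPlot_LinePlotting1_1")
inputs = [entity]
```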