Merged Data¶
merged_data ¶
MergedData Entity
Represents the result of merging the dataset with external databases.
Classes¶
MergedData dataclass ¶
MergedData(original_dataset: Dataset, biorempp_data: Optional[Dict[str, Any]] = None, kegg_data: Optional[Dict[str, Any]] = None, hadeg_data: Optional[Dict[str, Any]] = None, toxcsm_data: Optional[Dict[str, Any]] = None)
Entity that represents the result of the merge with databases.
This entity is immutable after creation, which keeps the processed data consistent. It holds the original dataset and the result of merging it with each of the four system databases.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| original_dataset | Dataset | Original dataset before the merges | required |
| biorempp_data | Optional[Dict[str, Any]] | Data resulting from the merge with the BioRemPP database | None |
| kegg_data | Optional[Dict[str, Any]] | Data resulting from the merge with the KEGG database | None |
| hadeg_data | Optional[Dict[str, Any]] | Data resulting from the merge with the HADEG database | None |
| toxcsm_data | Optional[Dict[str, Any]] | Data resulting from the merge with the ToxCSM database | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If validation runs without the BioRemPP merge, which is mandatory |
Notes
BioRemPP, KEGG, and HADEG are considered mandatory merges for is_fully_merged, although validate only enforces BioRemPP. ToxCSM is optional because it depends on the presence of compounds in the data.
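The constructor and the immutability guarantee described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `MergedDataSketch` and the one-field `Dataset` stand-in are hypothetical, the payload is made up, and using `frozen=True` is an assumption about how immutability is achieved.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for the real Dataset entity."""
    name: str


@dataclass(frozen=True)  # assumption: immutability comes from a frozen dataclass
class MergedDataSketch:
    """Illustrative mirror of the documented constructor, not the real class."""
    original_dataset: Dataset
    biorempp_data: Optional[Dict[str, Any]] = None
    kegg_data: Optional[Dict[str, Any]] = None
    hadeg_data: Optional[Dict[str, Any]] = None
    toxcsm_data: Optional[Dict[str, Any]] = None


merged = MergedDataSketch(
    original_dataset=Dataset(name="sample"),
    biorempp_data={"rows": 3},  # made-up payload for illustration
)

try:
    merged.kegg_data = {"rows": 1}  # field reassignment is rejected when frozen
    reassigned = True
except FrozenInstanceError:
    reassigned = False
```

Note that a frozen dataclass only blocks reassigning fields; the dictionaries it holds remain mutable unless they are copied or wrapped separately.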
Attributes¶
is_biorempp_merged property ¶
Checks if the merge with BioRemPP was executed.
Returns:
| Type | Description |
|---|---|
| bool | True if BioRemPP data is present and not empty. |
is_kegg_merged property ¶
Checks if the merge with KEGG was executed.
Returns:
| Type | Description |
|---|---|
| bool | True if KEGG data is present and not empty. |
is_hadeg_merged property ¶
Checks if the merge with HADEG was executed.
Returns:
| Type | Description |
|---|---|
| bool | True if HADEG data is present and not empty. |
is_toxcsm_merged property ¶
Checks if the merge with ToxCSM was executed.
Returns:
| Type | Description |
|---|---|
| bool | True if ToxCSM data is present and not empty. |
is_fully_merged property ¶
Checks if all mandatory merges were executed.
Returns:
| Type | Description |
|---|---|
| bool | True if BioRemPP, KEGG, and HADEG were merged. |
Notes
ToxCSM is not considered mandatory as it depends on the presence of compounds in the data.
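The "present and not empty" rule behind the per-database properties, and the way is_fully_merged combines them, might look like the sketch below. The function names `is_merged` and `is_fully_merged` as free functions are hypothetical; the real entity exposes these checks as properties.

```python
from typing import Any, Dict, Optional


def is_merged(data: Optional[Dict[str, Any]]) -> bool:
    """True only when the merge result is present and not empty."""
    return data is not None and len(data) > 0


def is_fully_merged(
    biorempp: Optional[Dict[str, Any]],
    kegg: Optional[Dict[str, Any]],
    hadeg: Optional[Dict[str, Any]],
    toxcsm: Optional[Dict[str, Any]] = None,
) -> bool:
    """ToxCSM is accepted but ignored, since it is not a mandatory merge."""
    return is_merged(biorempp) and is_merged(kegg) and is_merged(hadeg)


# All three mandatory merges present, ToxCSM absent.
full = is_fully_merged({"a": 1}, {"b": 2}, {"c": 3})

# An empty KEGG result counts as "not merged", even with ToxCSM present.
partial = is_fully_merged({"a": 1}, {}, {"c": 3}, {"d": 4})
```

Under this rule an empty dictionary and `None` are treated the same way, so a merge that ran but matched nothing is reported as not merged.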
Functions¶
get_merge_status ¶
Returns the status of all merges.
Returns:
| Type | Description |
|---|---|
| Dict[str, bool] | Dictionary with the status of each database. |
Examples:
>>> status = merged.get_merge_status()
>>> status
{'biorempp': True, 'kegg': True, 'hadeg': True, 'toxcsm': False}
Source code in src/domain/entities/merged_data.py
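One way a caller could consume the returned dictionary is to list the databases that have not been merged yet. The helper `missing_merges` is hypothetical, and the `status` value simply reuses the shape shown in the example above.

```python
from typing import Dict, List


def missing_merges(status: Dict[str, bool]) -> List[str]:
    """Return the database names whose merge flag is False, in dict order."""
    return [name for name, done in status.items() if not done]


# Same shape as the dictionary returned by get_merge_status().
status = {"biorempp": True, "kegg": True, "hadeg": True, "toxcsm": False}
missing = missing_merges(status)
```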
validate ¶
Validates the merge state.
Raises:
| Type | Description |
|---|---|
| ValueError | If the BioRemPP merge was not executed (mandatory). |
Notes
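Only the BioRemPP merge is validated, because it is the fundamental database. KEGG and HADEG are not checked here, since they may be optional depending on the context of use; is_fully_merged reports whether all three were merged.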
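The validation rule can be sketched as a standalone function. The name `validate_merge` and the error message are assumptions made for illustration; the real method takes no arguments and inspects the entity's own fields.

```python
from typing import Any, Dict, Optional


def validate_merge(biorempp_data: Optional[Dict[str, Any]]) -> None:
    """Raise ValueError when the BioRemPP result is missing or empty."""
    if not biorempp_data:
        # Hypothetical message; the real wording is not documented here.
        raise ValueError("BioRemPP merge is mandatory but was not executed")


# Callers can turn the exception into a message for logs or a UI.
try:
    validate_merge(None)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

KEGG and HADEG are deliberately absent from the check, matching the note that only BioRemPP is enforced.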
__str__ ¶
Returns the string representation of the merged data.
Returns:
| Type | Description |
|---|---|
| str | Descriptive string of the merge status. |
__repr__ ¶
Returns the debug representation of the merged data.
Returns:
| Type | Description |
|---|---|
| str | Detailed representation. |