Merged Data

merged_data

MergedData Entity

Represents the result of merging the dataset with external databases.

Classes

MergedData dataclass

MergedData(original_dataset: Dataset, biorempp_data: Optional[Dict[str, Any]] = None, kegg_data: Optional[Dict[str, Any]] = None, hadeg_data: Optional[Dict[str, Any]] = None, toxcsm_data: Optional[Dict[str, Any]] = None)

Entity that represents the result of the merge with databases.

This entity is immutable after creation to ensure consistency of the processed data. It contains the original dataset and the results of the merges with each of the four system databases (BioRemPP, KEGG, HADEG, and ToxCSM).

Parameters:

Name | Type | Description | Default
original_dataset | Dataset | Original dataset before the merges | required
biorempp_data | Optional[Dict[str, Any]] | Data resulting from the merge with the BioRemPP database | None
kegg_data | Optional[Dict[str, Any]] | Data resulting from the merge with the KEGG database | None
hadeg_data | Optional[Dict[str, Any]] | Data resulting from the merge with the HADEG database | None
toxcsm_data | Optional[Dict[str, Any]] | Data resulting from the merge with the ToxCSM database | None

Raises:

Type Description
ValueError

If validate() is called and the mandatory BioRemPP merge has not been executed

Notes

BioRemPP, KEGG, and HADEG are considered mandatory merges for is_fully_merged, but validate() enforces only BioRemPP. ToxCSM is optional because it depends on the presence of compounds in the data.
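
The immutability described above could be achieved with a frozen dataclass. The sketch below is a stand-in, not the actual class: the name MergedDataSketch is hypothetical, `frozen=True` is an assumption (the source states the entity is immutable but does not show the decorator), and Dataset is replaced by Any to keep the example self-contained.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)  # assumption: immutability via a frozen dataclass
class MergedDataSketch:
    """Hypothetical stand-in mirroring the documented constructor signature."""

    original_dataset: Any  # the real class expects a Dataset here
    biorempp_data: Optional[Dict[str, Any]] = None
    kegg_data: Optional[Dict[str, Any]] = None
    hadeg_data: Optional[Dict[str, Any]] = None
    toxcsm_data: Optional[Dict[str, Any]] = None


# Only original_dataset is required; every merge result defaults to None.
merged = MergedDataSketch(original_dataset=["K00001"], biorempp_data={"rows": 1})

try:
    merged.kegg_data = {"rows": 2}  # assignment on a frozen instance
    mutated = True
except FrozenInstanceError:
    mutated = False

print(mutated)  # False
```

With a frozen dataclass, a new merge result would be attached by building a new instance (for example with dataclasses.replace) rather than by assigning to an attribute.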

Attributes
is_biorempp_merged property
is_biorempp_merged: bool

Checks if the merge with BioRemPP was executed.

Returns:

Type Description
bool

True if BioRemPP data is present and not empty.

is_kegg_merged property
is_kegg_merged: bool

Checks if the merge with KEGG was executed.

Returns:

Type Description
bool

True if KEGG data is present and not empty.

is_hadeg_merged property
is_hadeg_merged: bool

Checks if the merge with HADEG was executed.

Returns:

Type Description
bool

True if HADEG data is present and not empty.

is_toxcsm_merged property
is_toxcsm_merged: bool

Checks if the merge with ToxCSM was executed.

Returns:

Type Description
bool

True if ToxCSM data is present and not empty.

is_fully_merged property
is_fully_merged: bool

Checks if all mandatory merges were executed.

Returns:

Type Description
bool

True if BioRemPP, KEGG, and HADEG were merged.

Notes

ToxCSM is not considered mandatory as it depends on the presence of compounds in the data.
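
The property bodies are not shown on this page, so the following is only a sketch of how "present and not empty" and the is_fully_merged rule might be expressed. The helper names is_merged and is_fully_merged are hypothetical free functions, and the use of bool() as the emptiness check is an assumption.

```python
from typing import Any, Dict, Optional


def is_merged(data: Optional[Dict[str, Any]]) -> bool:
    """Assumed rule: a merge counts only if its data is neither None nor empty."""
    return bool(data)


def is_fully_merged(
    biorempp: Optional[Dict[str, Any]],
    kegg: Optional[Dict[str, Any]],
    hadeg: Optional[Dict[str, Any]],
    toxcsm: Optional[Dict[str, Any]] = None,
) -> bool:
    """ToxCSM is accepted but ignored, since it is not a mandatory merge."""
    return is_merged(biorempp) and is_merged(kegg) and is_merged(hadeg)


print(is_merged(None))  # False
print(is_merged({}))  # False
print(is_fully_merged({"a": 1}, {"b": 2}, {"c": 3}))  # True
print(is_fully_merged({"a": 1}, {}, {"c": 3}))  # False
```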

Functions
get_merge_status
get_merge_status() -> Dict[str, bool]

Returns the status of all merges.

Returns:

Type Description
Dict[str, bool]

Dictionary with the status of each database.

Examples:

>>> status = merged.get_merge_status()
>>> status
{'biorempp': True, 'kegg': True, 'hadeg': True, 'toxcsm': False}
Source code in src/domain/entities/merged_data.py
def get_merge_status(self) -> Dict[str, bool]:
    """
    Returns the status of all merges.

    Returns
    -------
    Dict[str, bool]
        Dictionary with the status of each database.

    Examples
    --------
    >>> status = merged.get_merge_status()
    >>> status
    {'biorempp': True, 'kegg': True, 'hadeg': True, 'toxcsm': False}
    """
    return {
        "biorempp": self.is_biorempp_merged,
        "kegg": self.is_kegg_merged,
        "hadeg": self.is_hadeg_merged,
        "toxcsm": self.is_toxcsm_merged,
    }
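
One possible use of the returned dictionary is to report which databases are still missing. The snippet below works on a plain dict shaped like the documented example output, so it does not depend on the real class. The variable names are illustrative.

```python
# A status dict shaped like the documented return value of get_merge_status().
status = {"biorempp": True, "kegg": True, "hadeg": True, "toxcsm": False}

# Collect the databases whose merge has not been executed.
missing = [name for name, done in status.items() if not done]

print(missing)  # ['toxcsm']
```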
validate
validate() -> None

Validates the merge state.

Raises:

Type Description
ValueError

If the BioRemPP merge was not executed (mandatory).

Notes

Only BioRemPP is validated as it is the fundamental database. KEGG and HADEG may be optional depending on the context of use.

Source code in src/domain/entities/merged_data.py
def validate(self) -> None:
    """
    Validates the merge state.

    Raises
    ------
    ValueError
        If the BioRemPP merge was not executed (mandatory).

    Notes
    -----
    Only BioRemPP is validated as it is the fundamental database.
    KEGG and HADEG may be optional depending on the context of use.
    """
    if not self.is_biorempp_merged:
        raise ValueError("BioRemPP merge is required")
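
Because validate() checks only BioRemPP, a caller that also needs KEGG and HADEG could combine it with is_fully_merged. The sketch below uses a hypothetical StubMerged class in place of MergedData, and require_full_merge is an illustrative helper that is not part of the documented API.

```python
class StubMerged:
    """Hypothetical stand-in exposing only the members used below."""

    def __init__(self, biorempp: bool, kegg: bool, hadeg: bool) -> None:
        self.is_biorempp_merged = biorempp
        self.is_fully_merged = biorempp and kegg and hadeg

    def validate(self) -> None:
        # Same check as the documented validate(): only BioRemPP is enforced.
        if not self.is_biorempp_merged:
            raise ValueError("BioRemPP merge is required")


def require_full_merge(merged) -> None:
    """Illustrative stricter check layered on top of validate()."""
    merged.validate()
    if not merged.is_fully_merged:
        raise ValueError("KEGG and HADEG merges are also required")


partial = StubMerged(biorempp=True, kegg=False, hadeg=True)
partial.validate()  # passes: BioRemPP is present

try:
    require_full_merge(partial)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)  # KEGG and HADEG merges are also required
```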
__str__
__str__() -> str

Returns the string representation of the merged data.

Returns:

Type Description
str

Descriptive string of the merge status.

Source code in src/domain/entities/merged_data.py
def __str__(self) -> str:
    """
    Returns the string representation of the merged data.

    Returns
    -------
    str
        Descriptive string of the merge status.
    """
    status = self.get_merge_status()
    merged_count = sum(status.values())
    return f"MergedData ({merged_count}/4 databases merged)"
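
The count in __str__ relies on bool being a subclass of int in Python, so summing the status values counts the True entries. A standalone illustration with a hand-written status dict:

```python
# Status dict shaped like the documented get_merge_status() example.
status = {"biorempp": True, "kegg": True, "hadeg": True, "toxcsm": False}

# True counts as 1 and False as 0, so sum() yields the number of merged databases.
merged_count = sum(status.values())
summary = f"MergedData ({merged_count}/4 databases merged)"

print(summary)  # MergedData (3/4 databases merged)
```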
__repr__
__repr__() -> str

Returns the debug representation of the merged data.

Returns:

Type Description
str

Detailed representation.

Source code in src/domain/entities/merged_data.py
def __repr__(self) -> str:
    """
    Returns the debug representation of the merged data.

    Returns
    -------
    str
        Detailed representation.
    """
    return (
        f"MergedData("
        f"biorempp={self.is_biorempp_merged}, "
        f"kegg={self.is_kegg_merged}, "
        f"hadeg={self.is_hadeg_merged}, "
        f"toxcsm={self.is_toxcsm_merged})"
    )