Domain Entities

Domain entities are business objects with a unique identity. They are mutable, and their equality is determined by ID rather than by attribute values.


Sample Entity

Sample dataclass

Sample(id: SampleId, ko_list: List[KO] = list(), created_at: datetime = datetime.now(), metadata: Dict[str, Any] = dict())

Aggregate Root - Represents a biological sample.

Encapsulates business rules related to samples and their associated KOs. A sample is uniquely identified by its SampleId and contains a list of KOs (KEGG Orthology) that were detected in it.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
id | SampleId | Unique sample identifier | required
ko_list | List[KO] | List of KOs associated with the sample | []
created_at | datetime | Sample creation timestamp | datetime.now()
metadata | Dict[str, Any] | Additional sample metadata | {}

Raises:

Type Description
ValueError

If sample is validated without at least one KO

Notes

This is an Aggregate Root entity in DDD context, responsible for maintaining consistency of its invariants (e.g., every valid sample must have at least one KO).
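
The rendered signature lists defaults such as `list()`, `dict()`, and `datetime.now()`. In a dataclass, per-instance defaults like these are normally declared with `field(default_factory=...)`, so that each instance gets its own list and its own timestamp. The sketch below assumes that is how `Sample` is declared; the class name `SampleSketch` is hypothetical, and plain strings stand in for `SampleId` and `KO`, whose definitions are not shown here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class SampleSketch:
    """Hypothetical stand-in for Sample; strings replace SampleId and KO."""

    id: str
    # default_factory is called once per instance, so no list is shared.
    ko_list: List[str] = field(default_factory=list)
    # Passing the function itself defers the timestamp to construction time.
    created_at: datetime = field(default_factory=datetime.now)
    metadata: Dict[str, Any] = field(default_factory=dict)


a = SampleSketch("S1")
b = SampleSketch("S2")
a.ko_list.append("K00001")

print(a.ko_list)  # ['K00001']
print(b.ko_list)  # []
```

Writing `created_at: datetime = datetime.now()` directly would evaluate the timestamp once, when the class is defined, and a bare `list()` default is rejected by the `dataclass` decorator with a `ValueError`.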

Attributes

ko_count property
ko_count: int

Returns quantity of KOs associated with the sample.

Returns:

Type Description
int

Number of KOs in the list.

Functions

add_ko
add_ko(ko: KO) -> None

Adds a KO to the sample with duplicate validation.

Parameters:

Name Type Description Default
ko KO

KO to be added.

required
Notes

Duplicate KOs are automatically ignored.

Source code in src/domain/entities/sample.py
def add_ko(self, ko: KO) -> None:
    """
    Adds a KO to the sample with duplicate validation.

    Parameters
    ----------
    ko : KO
        KO to be added.

    Notes
    -----
    Duplicate KOs are automatically ignored.
    """
    if ko not in self.ko_list:
        self.ko_list.append(ko)
        logger.debug(
            "KO added to sample",
            extra={
                "sample_id": str(self.id),
                "ko": str(ko),
                "total_kos": len(self.ko_list),
            },
        )
    else:
        logger.debug(
            "Duplicate KO ignored", extra={"sample_id": str(self.id), "ko": str(ko)}
        )
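
A short usage sketch of the duplicate handling described above. The `KO` and `Sample` classes here are trimmed-down, hypothetical stand-ins (the real `KO` value object is not shown on this page), and logging is omitted. The point is that duplicate detection relies on `KO` equality, so two separately constructed KOs with the same value count as the same KO.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class KO:
    """Hypothetical value object: equal when the values are equal."""

    value: str


@dataclass
class Sample:
    """Stand-in with only the fields this example needs."""

    id: str
    ko_list: List[KO] = field(default_factory=list)

    def add_ko(self, ko: KO) -> None:
        # Same membership check as the documented method, minus logging.
        if ko not in self.ko_list:
            self.ko_list.append(ko)


sample = Sample("S1")
sample.add_ko(KO("K00001"))
sample.add_ko(KO("K00001"))  # equal by value, so it is ignored
sample.add_ko(KO("K00002"))

print(len(sample.ko_list))  # 2
```
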
remove_ko
remove_ko(ko: KO) -> None

Removes a KO from the sample if it exists.

Parameters:

Name Type Description Default
ko KO

KO to be removed.

required
Notes

If KO does not exist in the list, operation is silently ignored.

Source code in src/domain/entities/sample.py
def remove_ko(self, ko: KO) -> None:
    """
    Removes a KO from the sample if it exists.

    Parameters
    ----------
    ko : KO
        KO to be removed.

    Notes
    -----
    If KO does not exist in the list, operation is silently ignored.
    """
    if ko in self.ko_list:
        self.ko_list.remove(ko)
has_ko
has_ko(ko: KO) -> bool

Checks if sample has a specific KO.

Parameters:

Name Type Description Default
ko KO

KO to be checked.

required

Returns:

Type Description
bool

True if KO is present in the sample.

Source code in src/domain/entities/sample.py
def has_ko(self, ko: KO) -> bool:
    """
    Checks if sample has a specific KO.

    Parameters
    ----------
    ko : KO
        KO to be checked.

    Returns
    -------
    bool
        True if KO is present in the sample.
    """
    return ko in self.ko_list
get_unique_kos
get_unique_kos() -> List[KO]

Returns list of unique KOs (without duplicates).

Returns:

Type Description
List[KO]

List of unique KOs.

Notes

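In practice, ko_list should not contain duplicates because add_ko() skips them, although a list passed directly to the constructor may. This method guarantees uniqueness either way. Because it goes through a set, the order of the returned list is not guaranteed.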

Source code in src/domain/entities/sample.py
def get_unique_kos(self) -> List[KO]:
    """
    Returns list of unique KOs (without duplicates).

    Returns
    -------
    List[KO]
        List of unique KOs.

    Notes
    -----
    In practice, ko_list should not contain duplicates due to
    add_ko() method, but this method ensures uniqueness.
    """
    return list(set(self.ko_list))
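
Converting through `set` removes duplicates but does not promise any particular order in the result. If callers ever depend on insertion order, one possible alternative (not what the source above does) is `dict.fromkeys`, which keeps the first occurrence of each item in order. Plain strings stand in for `KO` objects in this sketch.

```python
kos = ["K00003", "K00001", "K00003", "K00002", "K00001"]

# set() drops duplicates; the resulting order is arbitrary.
as_set = list(set(kos))

# dict.fromkeys() drops duplicates and keeps first-seen order
# (dicts preserve insertion order since Python 3.7).
ordered = list(dict.fromkeys(kos))

print(ordered)  # ['K00003', 'K00001', 'K00002']
```

Both approaches require the items to be hashable.
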
validate
validate() -> None

Validates entity business rules.

Raises:

Type Description
ValueError

If sample does not have at least one KO.

Notes

This validation ensures the business invariant: every processed sample must contain at least one valid KO.

Source code in src/domain/entities/sample.py
def validate(self) -> None:
    """
    Validates entity business rules.

    Raises
    ------
    ValueError
        If sample does not have at least one KO.

    Notes
    -----
    This validation ensures the business invariant: every processed
    sample must contain at least one valid KO.
    """
    if self.ko_count == 0:
        logger.error(
            "Sample validation failed: No KOs", extra={"sample_id": str(self.id)}
        )
        raise ValueError(f"Sample {self.id} must have at least one KO")

    logger.debug(
        "Sample validation successful",
        extra={"sample_id": str(self.id), "ko_count": self.ko_count},
    )
__str__
__str__() -> str

Returns string representation of sample.

Returns:

Type Description
str

String in format "Sample(id) with X KOs".

Source code in src/domain/entities/sample.py
def __str__(self) -> str:
    """
    Returns string representation of sample.

    Returns
    -------
    str
        String in format "Sample(id) with X KOs".
    """
    return f"Sample({self.id}) with {self.ko_count} KOs"
__repr__
__repr__() -> str

Returns debug representation of sample.

Returns:

Type Description
str

Detailed representation.

Source code in src/domain/entities/sample.py
def __repr__(self) -> str:
    """
    Returns debug representation of sample.

    Returns
    -------
    str
        Detailed representation.
    """
    return (
        f"Sample(id={self.id}, "
        f"ko_count={self.ko_count}, "
        f"created_at={self.created_at})"
    )
__eq__
__eq__(other: object) -> bool

Compares samples by identity (SampleId).

Parameters:

Name Type Description Default
other object

Object to be compared.

required

Returns:

Type Description
bool

True if both samples have the same ID.

Source code in src/domain/entities/sample.py
def __eq__(self, other: object) -> bool:
    """
    Compares samples by identity (SampleId).

    Parameters
    ----------
    other : object
        Object to be compared.

    Returns
    -------
    bool
        True if both samples have the same ID.
    """
    if not isinstance(other, Sample):
        return False
    return self.id == other.id
__hash__
__hash__() -> int

Hash based on sample identifier.

Returns:

Type Description
int

Hash of SampleId.

Source code in src/domain/entities/sample.py
def __hash__(self) -> int:
    """
    Hash based on sample identifier.

    Returns
    -------
    int
        Hash of SampleId.
    """
    return hash(self.id)
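
The effect of pairing `__eq__` and `__hash__` on the ID can be seen in a small sketch. The class below is a hypothetical reduction of `Sample` with string IDs and string KOs. It assumes the dataclass is declared so that the hand-written methods are used; `eq=False` makes that explicit here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # do not generate a field-by-field __eq__
class Sample:
    id: str
    ko_list: List[str] = field(default_factory=list)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Sample):
            return False
        return self.id == other.id

    def __hash__(self) -> int:
        return hash(self.id)


first = Sample("S1", ["K00001"])
second = Sample("S1", ["K00002", "K00003"])

print(first == second)       # True: same ID, different KOs
print(len({first, second}))  # 1: a set treats them as one sample
```

Since the hash depends only on `id`, mutating `ko_list` never changes where a sample sits in a set or dict, which is what makes a mutable entity safe to hash.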

Dataset Entity

Dataset dataclass

Dataset(samples: List[Sample] = list())

Aggregate - Collection of samples with high-level operations.

Manages a set of biological samples and provides operations for queries and aggregate analysis on these samples.

Parameters:

Name Type Description Default
samples List[Sample]

List of samples in dataset

[]
Notes

Dataset is an Aggregate in the DDD sense: it manages the collection of samples and keeps it consistent by validating each sample before it is added.

Attributes

total_samples property
total_samples: int

Returns total samples in dataset.

Returns:

Type Description
int

Number of samples.

total_kos property
total_kos: int

Returns total unique KOs in dataset.

Returns:

Type Description
int

Number of unique KOs considering all samples.

Notes

This property iterates through all samples and collects unique KOs using a set to eliminate duplicates.

Functions

add_sample
add_sample(sample: Sample) -> None

Adds validated sample to dataset.

Parameters:

Name Type Description Default
sample Sample

Sample to be added.

required

Raises:

Type Description
ValueError

If sample does not pass validation.

Notes

Sample is validated before being added ensuring only valid samples enter the dataset.

Source code in src/domain/entities/dataset.py
def add_sample(self, sample: Sample) -> None:
    """
    Adds validated sample to dataset.

    Parameters
    ----------
    sample : Sample
        Sample to be added.

    Raises
    ------
    ValueError
        If sample does not pass validation.

    Notes
    -----
    Sample is validated before being added ensuring
    only valid samples enter the dataset.
    """
    try:
        sample.validate()
        self.samples.append(sample)
        logger.debug(
            "Sample added to dataset",
            extra={
                "sample_id": str(sample.id),
                "ko_count": sample.ko_count,
                "total_samples": len(self.samples),
            },
        )
    except ValueError as e:
        logger.error(
            "Failed to add sample to dataset",
            extra={"sample_id": str(sample.id), "error": str(e)},
        )
        raise
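
A usage sketch of the validate-then-append order, using hypothetical minimal versions of `Sample` and `Dataset` (string IDs, string KOs, no logging). Because `validate()` raises before the `append`, a rejected sample leaves the dataset unchanged.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    id: str
    ko_list: List[str] = field(default_factory=list)

    def validate(self) -> None:
        if not self.ko_list:
            raise ValueError(f"Sample {self.id} must have at least one KO")


@dataclass
class Dataset:
    samples: List[Sample] = field(default_factory=list)

    def add_sample(self, sample: Sample) -> None:
        sample.validate()  # raises before the append below
        self.samples.append(sample)


dataset = Dataset()
dataset.add_sample(Sample("S1", ["K00001"]))

rejected = ""
try:
    dataset.add_sample(Sample("S2"))  # no KOs, so validation fails
except ValueError as exc:
    rejected = str(exc)

print(len(dataset.samples))  # 1
print(rejected)              # Sample S2 must have at least one KO
```
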
remove_sample
remove_sample(sample_id: SampleId) -> bool

Removes sample from dataset by ID.

Parameters:

Name Type Description Default
sample_id SampleId

ID of sample to be removed.

required

Returns:

Type Description
bool

True if sample was removed, False if not found.

Source code in src/domain/entities/dataset.py
def remove_sample(self, sample_id: SampleId) -> bool:
    """
    Removes sample from dataset by ID.

    Parameters
    ----------
    sample_id : SampleId
        ID of sample to be removed.

    Returns
    -------
    bool
        True if sample was removed, False if not found.
    """
    sample = self.get_sample_by_id(sample_id)
    if sample:
        self.samples.remove(sample)
        return True
    return False
get_sample_by_id
get_sample_by_id(sample_id: SampleId) -> Optional[Sample]

Searches for sample by ID.

Parameters:

Name Type Description Default
sample_id SampleId

ID of sample to be searched.

required

Returns:

Type Description
Optional[Sample]

Found sample or None.

Source code in src/domain/entities/dataset.py
def get_sample_by_id(self, sample_id: SampleId) -> Optional[Sample]:
    """
    Searches for sample by ID.

    Parameters
    ----------
    sample_id : SampleId
        ID of sample to be searched.

    Returns
    -------
    Optional[Sample]
        Found sample or None.
    """
    for sample in self.samples:
        if sample.id == sample_id:
            return sample
    return None
get_all_kos
get_all_kos() -> List[KO]

Returns list of all unique KOs from dataset.

Returns:

Type Description
List[KO]

List of unique KOs.

Source code in src/domain/entities/dataset.py
def get_all_kos(self) -> List[KO]:
    """
    Returns list of all unique KOs from dataset.

    Returns
    -------
    List[KO]
        List of unique KOs.
    """
    unique_kos = set()
    for sample in self.samples:
        unique_kos.update(sample.ko_list)
    return list(unique_kos)
get_ko_distribution
get_ko_distribution() -> Dict[KO, int]

Returns KO distribution across samples.

Returns:

Type Description
Dict[KO, int]

Dictionary mapping each KO to the number of samples in which it appears.

Examples:

>>> distribution = dataset.get_ko_distribution()
>>> distribution[KO('K00001')]
5  # KO appears in 5 samples
Source code in src/domain/entities/dataset.py
def get_ko_distribution(self) -> Dict[KO, int]:
    """
    Returns KO distribution across samples.

    Returns
    -------
    Dict[KO, int]
        Dictionary mapping each KO to the number of samples
        in which it appears.

    Examples
    --------
    >>> distribution = dataset.get_ko_distribution()
    >>> distribution[KO('K00001')]
    5  # KO appears in 5 samples
    """
    distribution: Dict[KO, int] = {}
    for sample in self.samples:
        for ko in sample.ko_list:
            distribution[ko] = distribution.get(ko, 0) + 1
    return distribution
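
The same tally can be sketched with `collections.Counter`. The data below is invented for illustration, with strings in place of `KO` objects. The loop in the source counts every entry in `ko_list`, so it equals the number of samples only when no sample lists a KO twice; wrapping each list in `set()` makes the per-sample meaning explicit.

```python
from collections import Counter

# Hypothetical data: S1 lists K00001 twice on purpose.
samples = {
    "S1": ["K00001", "K00002", "K00001"],
    "S2": ["K00001"],
    "S3": ["K00001", "K00003"],
}

# Counts every list entry, like the loop in the source.
occurrences = Counter(ko for kos in samples.values() for ko in kos)

# Counts each KO at most once per sample.
distribution = Counter(ko for kos in samples.values() for ko in set(kos))

print(occurrences["K00001"])        # 4
print(distribution["K00001"])       # 3
print(distribution.most_common(1))  # [('K00001', 3)]
```
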
get_samples_with_ko
get_samples_with_ko(ko: KO) -> List[Sample]

Returns samples containing a specific KO.

Parameters:

Name Type Description Default
ko KO

KO to be searched.

required

Returns:

Type Description
List[Sample]

List of samples containing the KO.

Source code in src/domain/entities/dataset.py
def get_samples_with_ko(self, ko: KO) -> List[Sample]:
    """
    Returns samples containing a specific KO.

    Parameters
    ----------
    ko : KO
        KO to be searched.

    Returns
    -------
    List[Sample]
        List of samples containing the KO.
    """
    return [sample for sample in self.samples if sample.has_ko(ko)]
to_dict
to_dict() -> Dict[str, List[str]]

Converts dataset to dictionary format.

Returns:

Type Description
Dict[str, List[str]]

Dictionary with format {'sample': [...], 'ko': [...]}.

Notes

This format is useful for later conversion to DataFrame.

Source code in src/domain/entities/dataset.py
def to_dict(self) -> Dict[str, List[str]]:
    """
    Converts dataset to dictionary format.

    Returns
    -------
    Dict[str, List[str]]
        Dictionary with format {'sample': [...], 'ko': [...]}.

    Notes
    -----
    This format is useful for later conversion to DataFrame.
    """
    records = {"sample": [], "ko": []}

    for sample in self.samples:
        for ko in sample.ko_list:
            records["sample"].append(str(sample.id))
            records["ko"].append(str(ko))

    return records
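
The returned dictionary is in "long" format: two parallel lists with one entry per (sample, KO) pair. The sketch below uses invented data to show how the columns line up and how the pairs can be grouped back by sample with the standard library alone. A dictionary of equal-length lists like this can also be passed directly to `pandas.DataFrame(records)`.

```python
from collections import defaultdict

# Shape of the value returned by to_dict(), with made-up content.
records = {
    "sample": ["S1", "S1", "S2"],
    "ko": ["K00001", "K00002", "K00001"],
}

# Each position in the two lists describes one (sample, KO) row.
rows = list(zip(records["sample"], records["ko"]))

# Group the rows back into one KO list per sample.
by_sample = defaultdict(list)
for sample_id, ko in rows:
    by_sample[sample_id].append(ko)

print(rows)
print(dict(by_sample))
```
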
is_empty
is_empty() -> bool

Checks if dataset is empty.

Returns:

Type Description
bool

True if there are no samples.

Source code in src/domain/entities/dataset.py
def is_empty(self) -> bool:
    """
    Checks if dataset is empty.

    Returns
    -------
    bool
        True if there are no samples.
    """
    return self.total_samples == 0
validate
validate() -> None

Validates entire dataset.

Raises:

Type Description
ValueError

If dataset is empty or if any sample is invalid.

Source code in src/domain/entities/dataset.py
def validate(self) -> None:
    """
    Validates entire dataset.

    Raises
    ------
    ValueError
        If dataset is empty or if any sample is invalid.
    """
    if self.is_empty():
        raise ValueError("Dataset cannot be empty")

    for sample in self.samples:
        sample.validate()
__str__
__str__() -> str

Returns string representation of dataset.

Returns:

Type Description
str

Descriptive string.

Source code in src/domain/entities/dataset.py
def __str__(self) -> str:
    """
    Returns string representation of dataset.

    Returns
    -------
    str
        Descriptive string.
    """
    return (
        f"Dataset with {self.total_samples} samples and {self.total_kos} unique KOs"
    )
__repr__
__repr__() -> str

Returns debug representation of dataset.

Returns:

Type Description
str

Detailed representation.

Source code in src/domain/entities/dataset.py
def __repr__(self) -> str:
    """
    Returns debug representation of dataset.

    Returns
    -------
    str
        Detailed representation.
    """
    return f"Dataset(samples={self.total_samples}, unique_kos={self.total_kos})"

MergedData Entity

MergedData dataclass

MergedData(original_dataset: Dataset, biorempp_data: Optional[Dict[str, Any]] = None, kegg_data: Optional[Dict[str, Any]] = None, hadeg_data: Optional[Dict[str, Any]] = None, toxcsm_data: Optional[Dict[str, Any]] = None)

Entity that represents the result of the merge with databases.

This entity is immutable after creation to ensure consistency of the processed data. It contains the original dataset and the results of the merges with each of the 4 system databases.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
original_dataset | Dataset | Original dataset before the merges | required
biorempp_data | Optional[Dict[str, Any]] | Data resulting from the merge with the BioRemPP database | None
kegg_data | Optional[Dict[str, Any]] | Data resulting from the merge with the KEGG database | None
hadeg_data | Optional[Dict[str, Any]] | Data resulting from the merge with the HADEG database | None
toxcsm_data | Optional[Dict[str, Any]] | Data resulting from the merge with the ToxCSM database | None

Raises:

Type Description
ValueError

If validated without a mandatory merge (BioRemPP)

Notes

BioRemPP, KEGG, and HADEG are the merges that is_fully_merged requires, although validate() enforces only BioRemPP. ToxCSM is optional because it depends on the presence of compounds in the data.

Attributes

is_biorempp_merged property
is_biorempp_merged: bool

Checks if the merge with BioRemPP was executed.

Returns:

Type Description
bool

True if BioRemPP data is present and not empty.

is_kegg_merged property
is_kegg_merged: bool

Checks if the merge with KEGG was executed.

Returns:

Type Description
bool

True if KEGG data is present and not empty.

is_hadeg_merged property
is_hadeg_merged: bool

Checks if the merge with HADEG was executed.

Returns:

Type Description
bool

True if HADEG data is present and not empty.

is_toxcsm_merged property
is_toxcsm_merged: bool

Checks if the merge with ToxCSM was executed.

Returns:

Type Description
bool

True if ToxCSM data is present and not empty.

is_fully_merged property
is_fully_merged: bool

Checks if all mandatory merges were executed.

Returns:

Type Description
bool

True if BioRemPP, KEGG, and HADEG were merged.

Notes

ToxCSM is not considered mandatory as it depends on the presence of compounds in the data.
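
The source of these properties is not shown on this page. The sketch below is one way they could be written under two assumptions: "present and not empty" is tested with `bool(...)`, and immutability is provided by `frozen=True`. The class name `MergedDataSketch` is hypothetical and `original_dataset` is left out for brevity.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)  # assumption: immutability via a frozen dataclass
class MergedDataSketch:
    biorempp_data: Optional[Dict[str, Any]] = None
    kegg_data: Optional[Dict[str, Any]] = None
    hadeg_data: Optional[Dict[str, Any]] = None
    toxcsm_data: Optional[Dict[str, Any]] = None

    @property
    def is_biorempp_merged(self) -> bool:
        # bool() is False for both None and an empty dict.
        return bool(self.biorempp_data)

    @property
    def is_kegg_merged(self) -> bool:
        return bool(self.kegg_data)

    @property
    def is_hadeg_merged(self) -> bool:
        return bool(self.hadeg_data)

    @property
    def is_toxcsm_merged(self) -> bool:
        return bool(self.toxcsm_data)

    @property
    def is_fully_merged(self) -> bool:
        # ToxCSM is deliberately not part of this check.
        return (
            self.is_biorempp_merged
            and self.is_kegg_merged
            and self.is_hadeg_merged
        )


merged = MergedDataSketch(
    biorempp_data={"rows": 10}, kegg_data={}, hadeg_data={"rows": 3}
)
print(merged.is_kegg_merged)   # False: an empty dict is not a merge
print(merged.is_fully_merged)  # False

try:
    merged.kegg_data = {"rows": 1}  # type: ignore[misc]
    reassigned = True
except FrozenInstanceError:
    reassigned = False
print(reassigned)  # False
```

Note that `frozen=True` only blocks attribute reassignment; the dictionaries themselves can still be mutated in place.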

Functions

get_merge_status
get_merge_status() -> Dict[str, bool]

Returns the status of all merges.

Returns:

Type Description
Dict[str, bool]

Dictionary with the status of each database.

Examples:

>>> status = merged.get_merge_status()
>>> status
{
    'biorempp': True,
    'kegg': True,
    'hadeg': True,
    'toxcsm': False
}
Source code in src/domain/entities/merged_data.py
def get_merge_status(self) -> Dict[str, bool]:
    """
    Returns the status of all merges.

    Returns
    -------
    Dict[str, bool]
        Dictionary with the status of each database.

    Examples
    --------
    >>> status = merged.get_merge_status()
    >>> status
    {
        'biorempp': True,
        'kegg': True,
        'hadeg': True,
        'toxcsm': False
    }
    """
    return {
        "biorempp": self.is_biorempp_merged,
        "kegg": self.is_kegg_merged,
        "hadeg": self.is_hadeg_merged,
        "toxcsm": self.is_toxcsm_merged,
    }
validate
validate() -> None

Validates the merge state.

Raises:

Type Description
ValueError

If the BioRemPP merge was not executed (mandatory).

Notes

Only BioRemPP is validated as it is the fundamental database. KEGG and HADEG may be optional depending on the context of use.

Source code in src/domain/entities/merged_data.py
def validate(self) -> None:
    """
    Validates the merge state.

    Raises
    ------
    ValueError
        If the BioRemPP merge was not executed (mandatory).

    Notes
    -----
    Only BioRemPP is validated as it is the fundamental database.
    KEGG and HADEG may be optional depending on the context of use.
    """
    if not self.is_biorempp_merged:
        raise ValueError("BioRemPP merge is required")
__str__
__str__() -> str

Returns the string representation of the merged data.

Returns:

Type Description
str

Descriptive string of the merge status.

Source code in src/domain/entities/merged_data.py
def __str__(self) -> str:
    """
    Returns the string representation of the merged data.

    Returns
    -------
    str
        Descriptive string of the merge status.
    """
    status = self.get_merge_status()
    merged_count = sum(status.values())
    return f"MergedData ({merged_count}/4 databases merged)"
__repr__
__repr__() -> str

Returns the debug representation of the merged data.

Returns:

Type Description
str

Detailed representation.

Source code in src/domain/entities/merged_data.py
def __repr__(self) -> str:
    """
    Returns the debug representation of the merged data.

    Returns
    -------
    str
        Detailed representation.
    """
    return (
        f"MergedData("
        f"biorempp={self.is_biorempp_merged}, "
        f"kegg={self.is_kegg_merged}, "
        f"hadeg={self.is_hadeg_merged}, "
        f"toxcsm={self.is_toxcsm_merged})"
    )

Analysis Entity

Analysis dataclass

Analysis(id: str, name: str, category: str, status: AnalysisStatus = AnalysisStatus.PENDING, created_at: datetime = datetime.now(), started_at: Optional[datetime] = None, completed_at: Optional[datetime] = None, config: Dict[str, Any] = dict(), result_metadata: Dict[str, Any] = dict(), error_message: Optional[str] = None)

Entity for analysis metadata.

Stores information about an analysis that has been or will be executed, including identification, status, timestamps, and configurations.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
id | str | Unique identifier for the analysis (e.g., 'UC1_1', 'UC2_3') | required
name | str | Descriptive name of the analysis | required
category | str | Category of the analysis (e.g., 'heatmaps', 'rankings') | required
status | AnalysisStatus | Current status of the analysis | PENDING
created_at | datetime | Creation timestamp | datetime.now()
started_at | Optional[datetime] | Execution start timestamp | None
completed_at | Optional[datetime] | Completion timestamp | None
config | Dict[str, Any] | Specific configurations for the analysis | {}
result_metadata | Dict[str, Any] | Result metadata | {}
error_message | Optional[str] | Error message if it failed | None

Notes

This entity is used for tracking and auditing analyses.

Attributes

duration_seconds property
duration_seconds: Optional[float]

Calculates the analysis duration in seconds.

Returns:

Type Description
Optional[float]

Duration in seconds or None if not finished.
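
The property's source is not shown here. A minimal sketch of the calculation, assuming it is the difference between `completed_at` and `started_at` and returns None when either is missing, could look like this. It is written as a free function for brevity.

```python
from datetime import datetime, timedelta
from typing import Optional


def duration_seconds(
    started_at: Optional[datetime], completed_at: Optional[datetime]
) -> Optional[float]:
    """Elapsed seconds, or None if either timestamp is missing."""
    if started_at is None or completed_at is None:
        return None
    return (completed_at - started_at).total_seconds()


start = datetime(2024, 1, 1, 12, 0, 0)
print(duration_seconds(start, start + timedelta(seconds=2.5)))  # 2.5
print(duration_seconds(start, None))                            # None
```

Under this assumption, an analysis marked through mark_from_cache() without a prior start() would have no duration, since only `completed_at` is set in that path.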

is_completed property
is_completed: bool

Checks if the analysis is completed (success or failure).

Returns:

Type Description
bool

True if status is COMPLETED, FAILED, or CACHED.

is_successful property
is_successful: bool

Checks if the analysis was successful.

Returns:

Type Description
bool

True if status is COMPLETED or CACHED.
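
The two status checks can be summarized as set membership. In the sketch below the member names come from this page, but the string values of `AnalysisStatus` are assumptions, and the helper functions are hypothetical stand-ins for the properties.

```python
from enum import Enum


class AnalysisStatus(Enum):
    # Member names are from the documentation; the values are assumed.
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CACHED = "cached"


# Terminal states: the analysis will not run any further.
FINISHED = {AnalysisStatus.COMPLETED, AnalysisStatus.FAILED, AnalysisStatus.CACHED}
# Terminal states that produced a usable result.
SUCCESSFUL = {AnalysisStatus.COMPLETED, AnalysisStatus.CACHED}


def is_completed(status: AnalysisStatus) -> bool:
    return status in FINISHED


def is_successful(status: AnalysisStatus) -> bool:
    return status in SUCCESSFUL


for status in AnalysisStatus:
    print(status.name, is_completed(status), is_successful(status))
```

A FAILED analysis is therefore "completed" but not "successful", while PENDING and RUNNING are neither.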

Functions

start
start() -> None

Marks the analysis as started.

Notes

Updates status to RUNNING and records the start timestamp.

Source code in src/domain/entities/analysis.py
def start(self) -> None:
    """
    Marks the analysis as started.

    Notes
    -----
    Updates status to RUNNING and records the start timestamp.
    """
    self.status = AnalysisStatus.RUNNING
    self.started_at = datetime.now()
complete
complete(metadata: Optional[Dict[str, Any]] = None) -> None

Marks the analysis as successfully completed.

Parameters:

Name Type Description Default
metadata Optional[Dict[str, Any]]

Result metadata, by default None.

None
Notes

Updates status to COMPLETED and records the completion timestamp. Any metadata passed in is merged into result_metadata.

Source code in src/domain/entities/analysis.py
def complete(self, metadata: Optional[Dict[str, Any]] = None) -> None:
    """
    Marks the analysis as successfully completed.

    Parameters
    ----------
    metadata : Optional[Dict[str, Any]], optional
        Result metadata, by default None.

    Notes
    -----
    Updates status to COMPLETED and records the completion timestamp.
    """
    self.status = AnalysisStatus.COMPLETED
    self.completed_at = datetime.now()
    if metadata:
        self.result_metadata.update(metadata)
fail
fail(error_message: str) -> None

Marks the analysis as failed.

Parameters:

Name Type Description Default
error_message str

Error message describing the failure.

required
Notes

Updates status to FAILED, records the completion timestamp, and stores the error message.

Source code in src/domain/entities/analysis.py
def fail(self, error_message: str) -> None:
    """
    Marks the analysis as failed.

    Parameters
    ----------
    error_message : str
        Error message describing the failure.

    Notes
    -----
    Updates status to FAILED, records the completion timestamp, and stores
    the error message.
    """
    self.status = AnalysisStatus.FAILED
    self.completed_at = datetime.now()
    self.error_message = error_message
mark_from_cache
mark_from_cache() -> None

Marks the analysis as loaded from cache.

Notes

The CACHED status indicates that the result was retrieved from the cache without the need for reprocessing. The completion timestamp is recorded as well.

Source code in src/domain/entities/analysis.py
def mark_from_cache(self) -> None:
    """
    Marks the analysis as loaded from cache.

    Notes
    -----
    The CACHED status indicates that the result was retrieved from
    the cache without the need for reprocessing.
    """
    self.status = AnalysisStatus.CACHED
    self.completed_at = datetime.now()
validate
validate() -> None

Validates the analysis metadata.

Raises:

Type Description
ValueError

If ID, name, or category is empty or contains only whitespace.

Source code in src/domain/entities/analysis.py
def validate(self) -> None:
    """
    Validates the analysis metadata.

    Raises
    ------
    ValueError
        If ID, name, or category is empty or contains only whitespace.
    """
    if not self.id or not self.id.strip():
        raise ValueError("Analysis ID cannot be empty")

    if not self.name or not self.name.strip():
        raise ValueError("Analysis name cannot be empty")

    if not self.category or not self.category.strip():
        raise ValueError("Analysis category cannot be empty")
__str__
__str__() -> str

Returns the string representation of the analysis.

Returns:

Type Description
str

Descriptive string.

Source code in src/domain/entities/analysis.py
def __str__(self) -> str:
    """
    Returns the string representation of the analysis.

    Returns
    -------
    str
        Descriptive string.
    """
    return f"Analysis({self.id}: {self.name}) - {self.status.value}"
__repr__
__repr__() -> str

Returns the debug representation of the analysis.

Returns:

Type Description
str

Detailed representation.

Source code in src/domain/entities/analysis.py
def __repr__(self) -> str:
    """
    Returns the debug representation of the analysis.

    Returns
    -------
    str
        Detailed representation.
    """
    return (
        f"Analysis(id='{self.id}', "
        f"name='{self.name}', "
        f"status={self.status})"
    )