Domain Entities¶
Domain entities are business objects with a unique identity. Most are mutable (MergedData is the exception: it is immutable after creation), and equality is determined by ID rather than by attributes.
Sample Entity¶
Sample dataclass ¶
Sample(id: SampleId, ko_list: List[KO] = list(), created_at: datetime = datetime.now(), metadata: Dict[str, Any] = dict())
Aggregate Root that represents a biological sample.
Encapsulates the business rules for samples and their associated KOs. A sample is uniquely identified by its SampleId and contains the list of KOs (KEGG Orthology identifiers) detected in it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| id | SampleId | Unique sample identifier | required |
| ko_list | List[KO] | List of KOs associated with the sample | [] |
| created_at | datetime | Sample creation timestamp | datetime.now() |
| metadata | Dict[str, Any] | Additional sample metadata | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If the sample is validated without at least one KO |
Notes
In DDD terms, Sample is an Aggregate Root: it is responsible for maintaining its own invariants (e.g., every valid sample must have at least one KO).
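The defaults in the signature above (`list()`, `datetime.now()`, `dict()`) read most naturally as per-instance factories. The sketch below shows that pattern with `dataclasses.field(default_factory=...)`; whether the real class is written this way is an assumption, and `SampleDefaultsSketch` is a hypothetical stand-in that uses plain strings instead of SampleId and KO.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class SampleDefaultsSketch:
    """Hypothetical stand-in for Sample; str replaces SampleId and KO."""

    id: str
    # default_factory runs on every instantiation, so each sample gets
    # its own list, its own dict, and its own creation timestamp.
    ko_list: List[str] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)
    metadata: Dict[str, Any] = field(default_factory=dict)


first = SampleDefaultsSketch("S1")
second = SampleDefaultsSketch("S2")
first.ko_list.append("K00001")

print(first.ko_list)   # ['K00001']
print(second.ko_list)  # []
```

With factories, mutating one sample's list or metadata cannot leak into another sample.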
Attributes¶
ko_count property ¶
Returns the number of KOs associated with the sample.
Returns:
| Type | Description |
|---|---|
| int | Number of KOs in the list. |
Functions¶
add_ko ¶
Adds a KO to the sample unless it is already present.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ko | KO | KO to be added. | required |
Notes
Duplicate KOs are silently ignored.
Source code in src/domain/entities/sample.py
remove_ko ¶
Removes a KO from the sample if it exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ko | KO | KO to be removed. | required |
Notes
If the KO is not in the list, the operation is silently ignored.
Source code in src/domain/entities/sample.py
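A minimal sketch of the add/remove behaviour described above, assuming a list-backed implementation. `KO` and `SampleSketch` here are simplified, hypothetical stand-ins, not the classes in src/domain/entities/sample.py.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class KO:
    """Stand-in value object; the real KO class may validate its format."""

    value: str


@dataclass
class SampleSketch:
    """Simplified stand-in for Sample, limited to KO management."""

    id: str
    ko_list: List[KO] = field(default_factory=list)

    @property
    def ko_count(self) -> int:
        return len(self.ko_list)

    def add_ko(self, ko: KO) -> None:
        # Duplicates are ignored rather than raising an error.
        if ko not in self.ko_list:
            self.ko_list.append(ko)

    def remove_ko(self, ko: KO) -> None:
        # Removing an absent KO is a silent no-op.
        if ko in self.ko_list:
            self.ko_list.remove(ko)


sample = SampleSketch(id="S1")
sample.add_ko(KO("K00001"))
sample.add_ko(KO("K00001"))     # ignored: already present
sample.remove_ko(KO("K99999"))  # ignored: not present
print(sample.ko_count)  # 1
```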
has_ko ¶
Checks whether the sample contains a specific KO.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ko | KO | KO to be checked. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the KO is present in the sample. |
Source code in src/domain/entities/sample.py
get_unique_kos ¶
Returns the list of unique KOs.
Returns:
| Type | Description |
|---|---|
| List[KO] | List of unique KOs. |
Notes
Because add_ko() ignores duplicates, ko_list should not normally contain any; this method guarantees uniqueness regardless.
Source code in src/domain/entities/sample.py
validate ¶
Validates the entity's business rules.
Raises:
| Type | Description |
|---|---|
| ValueError | If the sample does not have at least one KO. |
Notes
This validation enforces the business invariant that every processed sample contains at least one valid KO.
Source code in src/domain/entities/sample.py
__str__ ¶
Returns the string representation of the sample.
Returns:
| Type | Description |
|---|---|
| str | String in the format "Sample(id) with X KOs". |
__repr__ ¶
Returns the debug representation of the sample.
Returns:
| Type | Description |
|---|---|
| str | Detailed representation. |
Source code in src/domain/entities/sample.py
__eq__ ¶
Compares samples by identity (SampleId).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| other | object | Object to be compared. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if both samples have the same ID. |
Source code in src/domain/entities/sample.py
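Identity-based equality can be sketched as follows. This is an illustration under assumptions, not the project's code: `SampleId` and `SampleSketch` are hypothetical stand-ins, and pairing `__eq__` with an ID-based `__hash__` is the usual Python convention rather than something this page confirms.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SampleId:
    """Stand-in value object wrapping the raw identifier."""

    value: str


@dataclass(eq=False)  # equality is hand-written below, not field-by-field
class SampleSketch:
    id: SampleId
    ko_list: List[str] = field(default_factory=list)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SampleSketch):
            return NotImplemented
        return self.id == other.id

    def __hash__(self) -> int:
        # Hash on the same key as __eq__ so equal samples collide in sets.
        return hash(self.id)


a = SampleSketch(SampleId("S1"), ["K00001"])
b = SampleSketch(SampleId("S1"), ["K00002", "K00003"])
c = SampleSketch(SampleId("S2"), ["K00001"])

print(a == b)          # True: same ID, different KOs
print(a == c)          # False: same KOs, different ID
print(len({a, b, c}))  # 2
```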
__hash__ ¶
Dataset Entity¶
Dataset dataclass ¶
Aggregate that holds a collection of samples and exposes high-level operations on it.
Manages a set of biological samples and provides query and aggregate-analysis operations over them.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| samples | List[Sample] | List of samples in the dataset | [] |
Notes
In DDD terms, Dataset is an Aggregate: it manages the collection of samples and keeps it consistent through validation.
Attributes¶
total_samples property ¶
Returns the total number of samples in the dataset.
Returns:
| Type | Description |
|---|---|
| int | Number of samples. |
total_kos property ¶
Returns the total number of unique KOs in the dataset.
Returns:
| Type | Description |
|---|---|
| int | Number of unique KOs across all samples. |
Notes
This property iterates through all samples and collects their KOs into a set to eliminate duplicates.
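The set-based counting described in the note can be sketched like this. The data and the `count_unique_kos` helper are hypothetical; plain strings stand in for KO objects.

```python
from typing import Dict, List, Set

# Hypothetical data: sample IDs mapped to their KO identifiers.
samples: Dict[str, List[str]] = {
    "S1": ["K00001", "K00002"],
    "S2": ["K00001"],
    "S3": ["K00001", "K00003"],
}


def count_unique_kos(samples: Dict[str, List[str]]) -> int:
    """Count distinct KOs across all samples."""
    unique: Set[str] = set()
    for ko_list in samples.values():
        unique.update(ko_list)  # the set drops repeats across samples
    return len(unique)


total = count_unique_kos(samples)
print(total)  # 3
```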
Functions¶
add_sample ¶
Validates a sample and adds it to the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sample | Sample | Sample to be added. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If the sample does not pass validation. |
Notes
The sample is validated before it is added, so only valid samples enter the dataset.
Source code in src/domain/entities/dataset.py
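A minimal sketch of the validate-then-add rule, assuming the dataset simply delegates to the sample's own validate(). `SampleSketch` and `DatasetSketch` are hypothetical stand-ins with string IDs and string KOs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SampleSketch:
    id: str
    ko_list: List[str] = field(default_factory=list)

    def validate(self) -> None:
        if not self.ko_list:
            raise ValueError(f"Sample {self.id!r} has no KOs")


@dataclass
class DatasetSketch:
    samples: List[SampleSketch] = field(default_factory=list)

    def add_sample(self, sample: SampleSketch) -> None:
        sample.validate()  # raises before the list is touched
        self.samples.append(sample)

    def get_sample_by_id(self, sample_id: str) -> Optional[SampleSketch]:
        return next((s for s in self.samples if s.id == sample_id), None)


dataset = DatasetSketch()
dataset.add_sample(SampleSketch("S1", ["K00001"]))
try:
    dataset.add_sample(SampleSketch("S2"))  # no KOs: rejected
except ValueError as exc:
    rejected = str(exc)

print(len(dataset.samples))  # 1
```

Because validation runs first, a rejected sample leaves the dataset unchanged.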
remove_sample ¶
Removes a sample from the dataset by ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sample_id | SampleId | ID of the sample to be removed. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the sample was removed, False if it was not found. |
Source code in src/domain/entities/dataset.py
get_sample_by_id ¶
Looks up a sample by ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sample_id | SampleId | ID of the sample to look up. | required |
Returns:
| Type | Description |
|---|---|
| Optional[Sample] | The matching sample, or None if not found. |
Source code in src/domain/entities/dataset.py
get_all_kos ¶
Returns the list of all unique KOs in the dataset.
Returns:
| Type | Description |
|---|---|
| List[KO] | List of unique KOs. |
Source code in src/domain/entities/dataset.py
get_ko_distribution ¶
Returns the KO distribution across samples.
Returns:
| Type | Description |
|---|---|
| Dict[KO, int] | Dictionary mapping each KO to the number of samples in which it appears. |
Examples:
>>> distribution = dataset.get_ko_distribution()
>>> distribution[KO('K00001')]  # number of samples that contain K00001
5
Source code in src/domain/entities/dataset.py
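One way to compute such a per-sample distribution is with `collections.Counter`, as sketched below. The `ko_distribution` helper and its input are hypothetical, and strings stand in for KO objects; the real method may be implemented differently.

```python
from collections import Counter
from typing import Dict, List


def ko_distribution(samples: Dict[str, List[str]]) -> Dict[str, int]:
    """Map each KO to the number of samples that contain it."""
    counts: Counter = Counter()
    for ko_list in samples.values():
        # set() ensures a KO is counted once per sample, even if repeated.
        counts.update(set(ko_list))
    return dict(counts)


distribution = ko_distribution(
    {
        "S1": ["K00001", "K00001", "K00002"],
        "S2": ["K00001"],
        "S3": ["K00001", "K00003"],
    }
)
print(distribution["K00001"])  # 3
```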
get_samples_with_ko ¶
to_dict ¶
Converts the dataset to a dictionary.
Returns:
| Type | Description |
|---|---|
| Dict[str, List[str]] | Dictionary in the format {'sample': [...], 'ko': [...]}. |
Notes
This format is convenient for later conversion to a DataFrame.
Source code in src/domain/entities/dataset.py
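The sketch below shows one plausible reading of the {'sample': [...], 'ko': [...]} format: two parallel lists with one entry per sample/KO pair. That row layout is an assumption, as is the `to_long_format` helper; the real to_dict() may order or group entries differently.

```python
from typing import Dict, List


def to_long_format(samples: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Flatten sample -> KOs into two parallel, equal-length lists."""
    result: Dict[str, List[str]] = {"sample": [], "ko": []}
    for sample_id, ko_list in samples.items():
        for ko in ko_list:
            result["sample"].append(sample_id)
            result["ko"].append(ko)
    return result


table = to_long_format({"S1": ["K00001", "K00002"], "S2": ["K00001"]})
print(table["sample"])  # ['S1', 'S1', 'S2']
print(table["ko"])      # ['K00001', 'K00002', 'K00001']
```

A dictionary of equal-length lists like this can be passed directly to the `pandas.DataFrame` constructor, yielding one column per key.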
is_empty ¶
Checks whether the dataset is empty.
Returns:
| Type | Description |
|---|---|
| bool | True if there are no samples. |
validate ¶
Validates the entire dataset.
Raises:
| Type | Description |
|---|---|
| ValueError | If the dataset is empty or if any sample is invalid. |
Source code in src/domain/entities/dataset.py
__str__ ¶
Returns the string representation of the dataset.
Returns:
| Type | Description |
|---|---|
| str | Descriptive string. |
__repr__ ¶
Returns the debug representation of the dataset.
Returns:
| Type | Description |
|---|---|
| str | Detailed representation. |
MergedData Entity¶
MergedData dataclass ¶
MergedData(original_dataset: Dataset, biorempp_data: Optional[Dict[str, Any]] = None, kegg_data: Optional[Dict[str, Any]] = None, hadeg_data: Optional[Dict[str, Any]] = None, toxcsm_data: Optional[Dict[str, Any]] = None)
Entity that represents the result of merging a dataset with the system databases.
This entity is immutable after creation to ensure consistency of the processed data. It contains the original dataset and the results of the merges with each of the four system databases.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| original_dataset | Dataset | Original dataset before the merges | required |
| biorempp_data | Optional[Dict[str, Any]] | Data resulting from the merge with the BioRemPP database | None |
| kegg_data | Optional[Dict[str, Any]] | Data resulting from the merge with the KEGG database | None |
| hadeg_data | Optional[Dict[str, Any]] | Data resulting from the merge with the HADEG database | None |
| toxcsm_data | Optional[Dict[str, Any]] | Data resulting from the merge with the ToxCSM database | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If validated without the mandatory BioRemPP merge |
Notes
is_fully_merged treats BioRemPP, KEGG, and HADEG as mandatory merges, whereas validate() enforces only BioRemPP. ToxCSM is optional because it depends on the presence of compounds in the data.
Attributes¶
is_biorempp_merged property ¶
Checks if the merge with BioRemPP was executed.
Returns:
| Type | Description |
|---|---|
| bool | True if BioRemPP data is present and not empty. |
is_kegg_merged property ¶
Checks if the merge with KEGG was executed.
Returns:
| Type | Description |
|---|---|
| bool | True if KEGG data is present and not empty. |
is_hadeg_merged property ¶
Checks if the merge with HADEG was executed.
Returns:
| Type | Description |
|---|---|
| bool | True if HADEG data is present and not empty. |
is_toxcsm_merged property ¶
Checks if the merge with ToxCSM was executed.
Returns:
| Type | Description |
|---|---|
| bool | True if ToxCSM data is present and not empty. |
is_fully_merged property ¶
Checks if all mandatory merges were executed.
Returns:
| Type | Description |
|---|---|
| bool | True if BioRemPP, KEGG, and HADEG were merged. |
Notes
ToxCSM is not considered mandatory as it depends on the presence of compounds in the data.
Functions¶
get_merge_status ¶
Returns the status of all merges.
Returns:
| Type | Description |
|---|---|
| Dict[str, bool] | Dictionary with the status of each database. |
Examples:
>>> status = merged.get_merge_status()
>>> status
{
'biorempp': True,
'kegg': True,
'hadeg': True,
'toxcsm': False
}
Source code in src/domain/entities/merged_data.py
validate ¶
Validates the merge state.
Raises:
| Type | Description |
|---|---|
| ValueError | If the BioRemPP merge was not executed (mandatory). |
Notes
Only the BioRemPP merge is enforced here because it is the fundamental database. KEGG and HADEG count toward is_fully_merged but may be optional depending on the context of use.
Source code in src/domain/entities/merged_data.py
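The difference between validate() and is_fully_merged can be illustrated with the sketch below. `MergedDataSketch` is a hypothetical stand-in that omits original_dataset; using `frozen=True` to model immutability and the error message text are both assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)  # assumption: immutability modelled with frozen=True
class MergedDataSketch:
    biorempp_data: Optional[Dict[str, Any]] = None
    kegg_data: Optional[Dict[str, Any]] = None
    hadeg_data: Optional[Dict[str, Any]] = None
    toxcsm_data: Optional[Dict[str, Any]] = None

    def get_merge_status(self) -> Dict[str, bool]:
        # bool() is False for both None and an empty dict.
        return {
            "biorempp": bool(self.biorempp_data),
            "kegg": bool(self.kegg_data),
            "hadeg": bool(self.hadeg_data),
            "toxcsm": bool(self.toxcsm_data),
        }

    @property
    def is_fully_merged(self) -> bool:
        status = self.get_merge_status()
        return status["biorempp"] and status["kegg"] and status["hadeg"]

    def validate(self) -> None:
        # Only the BioRemPP merge is enforced.
        if not self.biorempp_data:
            raise ValueError("BioRemPP merge is mandatory")


partial = MergedDataSketch(biorempp_data={"rows": 10})
partial.validate()  # passes: BioRemPP data is present
status = partial.get_merge_status()
print(partial.is_fully_merged)  # False: KEGG and HADEG are missing
```

In this sketch an instance can pass validation without being fully merged, which matches the two notes above.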
__str__ ¶
Returns the string representation of the merged data.
Returns:
| Type | Description |
|---|---|
| str | Descriptive string of the merge status. |
Source code in src/domain/entities/merged_data.py
__repr__ ¶
Returns the debug representation of the merged data.
Returns:
| Type | Description |
|---|---|
| str | Detailed representation. |
Source code in src/domain/entities/merged_data.py
Analysis Entity¶
Analysis dataclass ¶
Analysis(id: str, name: str, category: str, status: AnalysisStatus = AnalysisStatus.PENDING, created_at: datetime = datetime.now(), started_at: Optional[datetime] = None, completed_at: Optional[datetime] = None, config: Dict[str, Any] = dict(), result_metadata: Dict[str, Any] = dict(), error_message: Optional[str] = None)
Entity for analysis metadata.
Stores information about an analysis that has been or will be executed, including identification, status, timestamps, and configurations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| id | str | Unique identifier for the analysis (e.g., 'UC1_1', 'UC2_3') | required |
| name | str | Descriptive name of the analysis | required |
| category | str | Category of the analysis (e.g., 'heatmaps', 'rankings') | required |
| status | AnalysisStatus | Current status of the analysis | PENDING |
| created_at | datetime | Creation timestamp | datetime.now() |
| started_at | Optional[datetime] | Execution start timestamp | None |
| completed_at | Optional[datetime] | Completion timestamp | None |
| config | Dict[str, Any] | Specific configurations for the analysis | {} |
| result_metadata | Dict[str, Any] | Result metadata | {} |
| error_message | Optional[str] | Error message if it failed | None |
Notes
This entity is used for tracking and auditing analyses.
Attributes¶
duration_seconds property ¶
Calculates the analysis duration in seconds.
Returns:
| Type | Description |
|---|---|
| Optional[float] | Duration in seconds or None if not finished. |
is_completed property ¶
Checks whether the analysis has reached a terminal state (success, failure, or cache hit).
Returns:
| Type | Description |
|---|---|
| bool | True if status is COMPLETED, FAILED, or CACHED. |
is_successful property ¶
Checks if the analysis was successful.
Returns:
| Type | Description |
|---|---|
| bool | True if status is COMPLETED or CACHED. |
Functions¶
start ¶
Marks the analysis as started.
Notes
Updates status to RUNNING and records the start timestamp.
complete ¶
Marks the analysis as successfully completed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metadata | Optional[Dict[str, Any]] | Result metadata. | None |
Notes
Updates status to COMPLETED and records the completion timestamp.
Source code in src/domain/entities/analysis.py
fail ¶
Marks the analysis as failed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| error_message | str | Error message describing the failure. | required |
Notes
Updates status to FAILED and records the error message.
Source code in src/domain/entities/analysis.py
mark_from_cache ¶
Marks the analysis as loaded from cache.
Notes
The CACHED status indicates that the result was retrieved from the cache without the need for reprocessing.
Source code in src/domain/entities/analysis.py
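The status transitions described in this section can be sketched as a small state holder. `AnalysisSketch` is a hypothetical, trimmed-down stand-in; the enum member names come from this page, but their string values are invented, and whether fail() also records a completion timestamp is not stated here, so the sketch leaves it unset.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class AnalysisStatus(Enum):
    PENDING = "pending"      # string values are illustrative only
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CACHED = "cached"


@dataclass
class AnalysisSketch:
    id: str
    status: AnalysisStatus = AnalysisStatus.PENDING
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    error_message: Optional[str] = None

    @property
    def duration_seconds(self) -> Optional[float]:
        if self.started_at is None or self.completed_at is None:
            return None
        return (self.completed_at - self.started_at).total_seconds()

    @property
    def is_completed(self) -> bool:
        terminal = (AnalysisStatus.COMPLETED, AnalysisStatus.FAILED, AnalysisStatus.CACHED)
        return self.status in terminal

    @property
    def is_successful(self) -> bool:
        return self.status in (AnalysisStatus.COMPLETED, AnalysisStatus.CACHED)

    def start(self) -> None:
        self.status = AnalysisStatus.RUNNING
        self.started_at = datetime.now()

    def complete(self) -> None:
        self.status = AnalysisStatus.COMPLETED
        self.completed_at = datetime.now()

    def fail(self, error_message: str) -> None:
        self.status = AnalysisStatus.FAILED
        self.error_message = error_message

    def mark_from_cache(self) -> None:
        self.status = AnalysisStatus.CACHED


ok = AnalysisSketch("UC1_1")
ok.start()
ok.complete()

bad = AnalysisSketch("UC2_3")
bad.start()
bad.fail("missing input")

cached = AnalysisSketch("UC1_2")
cached.mark_from_cache()

print(ok.is_successful, bad.is_successful, cached.is_successful)  # True False True
```

Note that a failed analysis counts as completed but not as successful, while a cached one counts as both.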
validate ¶
Validates the analysis metadata.
Raises:
| Type | Description |
|---|---|
| ValueError | If the ID, name, or category is empty. |
Source code in src/domain/entities/analysis.py
__str__ ¶
Returns the string representation of the analysis.
Returns:
| Type | Description |
|---|---|
| str | Descriptive string. |
__repr__ ¶
Returns the debug representation of the analysis.
Returns:
| Type | Description |
|---|---|
| str | Detailed representation. |
Source code in src/domain/entities/analysis.py
Related Documentation¶
- Value Objects - Immutable domain concepts
- Domain Services - Business logic orchestration