Dataset
dataset
Dataset Entity
Represents a collection of biological samples.
Classes
Dataset dataclass
Aggregate: a collection of samples with high-level operations.
Manages a set of biological samples and provides query and aggregate-analysis operations over them.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| samples | List[Sample] | List of samples in the dataset. | [] |
Notes
Dataset is an Aggregate in the Domain-Driven Design (DDD) sense: it manages the collection of samples and keeps it consistent through validation.
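The parameter table lists `[]` as the default for `samples`. A dataclass cannot take a literal mutable default, so one plausible way to get that behaviour is `field(default_factory=list)`. The sketch below assumes this; the `Sample` class is a hypothetical stand-in, not the real entity, and the code is not taken from `dataset.py`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    """Hypothetical stand-in for the real Sample entity."""
    id: str


@dataclass
class Dataset:
    # Assumption: the empty-list default is created per instance, because
    # dataclasses reject a bare mutable default such as [].
    samples: List[Sample] = field(default_factory=list)

    @property
    def total_samples(self) -> int:
        return len(self.samples)


first = Dataset()
second = Dataset()
first.samples.append(Sample("S1"))

# Each dataset owns its own list, so only the first one changed.
print(first.total_samples, second.total_samples)  # 1 0
```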
Attributes
total_samples property
Returns the total number of samples in the dataset.
Returns:
| Type | Description |
|---|---|
| int | Number of samples. |
total_kos property
Returns the total number of unique KOs in the dataset.
Returns:
| Type | Description |
|---|---|
| int | Number of unique KOs across all samples. |
Notes
This property iterates through all samples and collects unique KOs using a set to eliminate duplicates.
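The set-based counting described in the note can be sketched as follows. `KO`, `Sample`, and `count_unique_kos` are hypothetical stand-ins written for this illustration; the only assumptions are that a KO is hashable and that each sample exposes its KOs as a list.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class KO:
    """Hypothetical stand-in; frozen so that instances are hashable."""
    id: str


@dataclass
class Sample:
    """Hypothetical stand-in for the real Sample entity."""
    id: str
    kos: List[KO] = field(default_factory=list)


def count_unique_kos(samples: List[Sample]) -> int:
    unique: Set[KO] = set()
    for sample in samples:
        # A set silently drops KOs that were already seen in another sample.
        unique.update(sample.kos)
    return len(unique)


samples = [
    Sample("S1", [KO("K00001"), KO("K00002")]),
    Sample("S2", [KO("K00002"), KO("K00003")]),
]
total = count_unique_kos(samples)
print(total)  # 3
```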
Functions
add_sample
Adds a validated sample to the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sample | Sample | Sample to be added. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If the sample does not pass validation. |
Notes
The sample is validated before it is added, so only valid samples enter the dataset.
Source code in src/domain/entities/dataset.py
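The validate-then-append behaviour can be illustrated with a minimal sketch. This is not the code from `dataset.py`: the `Sample.validate()` method and its non-empty-ID rule are assumptions made only for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    """Hypothetical stand-in; the validation rule is invented for this sketch."""
    id: str

    def validate(self) -> None:
        if not self.id:
            raise ValueError("sample id must not be empty")


@dataclass
class Dataset:
    samples: List[Sample] = field(default_factory=list)

    def add_sample(self, sample: Sample) -> None:
        # Validation runs first, so a failing sample never reaches the list.
        sample.validate()
        self.samples.append(sample)


dataset = Dataset()
dataset.add_sample(Sample("S1"))

try:
    dataset.add_sample(Sample(""))
    rejected = False
except ValueError:
    rejected = True

print(rejected, len(dataset.samples))  # True 1
```

Because the exception is raised before `append`, a rejected sample leaves the dataset unchanged.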
remove_sample
Removes a sample from the dataset by ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sample_id | SampleId | ID of the sample to remove. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the sample was removed, False if it was not found. |
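A caller can use the boolean result to tell a successful removal from a missing ID. The sketch below is one way such a method might look; `Sample` is a hypothetical stand-in and a plain `str` replaces the real `SampleId` type.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    """Hypothetical stand-in; a plain str replaces the real SampleId."""
    id: str


@dataclass
class Dataset:
    samples: List[Sample] = field(default_factory=list)

    def remove_sample(self, sample_id: str) -> bool:
        for index, sample in enumerate(self.samples):
            if sample.id == sample_id:
                # Deleting during iteration is safe here because we return at once.
                del self.samples[index]
                return True
        return False


dataset = Dataset([Sample("S1"), Sample("S2")])
removed = dataset.remove_sample("S1")
missing = dataset.remove_sample("S9")
print(removed, missing, len(dataset.samples))  # True False 1
```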
get_sample_by_id
Looks up a sample by ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sample_id | SampleId | ID of the sample to look up. | required |
Returns:
| Type | Description |
|---|---|
| Optional[Sample] | The matching sample, or None if no sample has that ID. |
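Since the return type is `Optional[Sample]`, callers need to handle `None` explicitly. A minimal sketch, with a hypothetical `Sample` stand-in and `str` in place of `SampleId`:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sample:
    """Hypothetical stand-in; a plain str replaces the real SampleId."""
    id: str


@dataclass
class Dataset:
    samples: List[Sample] = field(default_factory=list)

    def get_sample_by_id(self, sample_id: str) -> Optional[Sample]:
        # next() with a default returns None instead of raising StopIteration.
        return next((s for s in self.samples if s.id == sample_id), None)


dataset = Dataset([Sample("S1"), Sample("S2")])
found = dataset.get_sample_by_id("S2")
absent = dataset.get_sample_by_id("S9")

if absent is None:
    print("S9 is not in the dataset")
```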
get_all_kos
Returns a list of all unique KOs in the dataset.
Returns:
| Type | Description |
|---|---|
| List[KO] | List of unique KOs. |
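The reference does not say in which order the unique KOs are returned. If a stable order matters, one option is to deduplicate with `dict.fromkeys`, which keeps first-seen order. The sketch below assumes that approach; `KO`, `Sample`, and `unique_kos` are hypothetical names, and the real method may order its result differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class KO:
    """Hypothetical stand-in; frozen so that instances are hashable."""
    id: str


@dataclass
class Sample:
    """Hypothetical stand-in for the real Sample entity."""
    id: str
    kos: List[KO] = field(default_factory=list)


def unique_kos(samples: List[Sample]) -> List[KO]:
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(ko for sample in samples for ko in sample.kos))


samples = [
    Sample("S1", [KO("K00002"), KO("K00001")]),
    Sample("S2", [KO("K00001"), KO("K00003")]),
]
ids = [ko.id for ko in unique_kos(samples)]
print(ids)  # ['K00002', 'K00001', 'K00003']
```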
get_ko_distribution
Returns the distribution of KOs across samples.
Returns:
| Type | Description |
|---|---|
| Dict[KO, int] | Dictionary mapping each KO to the number of samples in which it appears. |
Examples:
>>> distribution = dataset.get_ko_distribution()
>>> distribution[KO('K00001')]  # number of samples containing this KO
5
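Counting "the number of samples in which it appears" means a KO should count once per sample, even if a sample lists it more than once. A sketch of that counting with `collections.Counter` follows; `KO`, `Sample`, and `ko_distribution` are hypothetical stand-ins, and the per-sample deduplication is an assumption about the intended semantics.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class KO:
    """Hypothetical stand-in; frozen so that instances are hashable."""
    id: str


@dataclass
class Sample:
    """Hypothetical stand-in for the real Sample entity."""
    id: str
    kos: List[KO] = field(default_factory=list)


def ko_distribution(samples: List[Sample]) -> Dict[KO, int]:
    counts: Counter = Counter()
    for sample in samples:
        # set() makes a KO listed twice in one sample count only once.
        counts.update(set(sample.kos))
    return dict(counts)


samples = [
    Sample("S1", [KO("K00001"), KO("K00001"), KO("K00002")]),
    Sample("S2", [KO("K00001")]),
]
distribution = ko_distribution(samples)
print(distribution[KO("K00001")], distribution[KO("K00002")])  # 2 1
```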
get_samples_with_ko
to_dict
Converts the dataset to a dictionary.
Returns:
| Type | Description |
|---|---|
| Dict[str, List[str]] | Dictionary with format {'sample': [...], 'ko': [...]}. |
Notes
This format is convenient for later conversion to a DataFrame.
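The reference gives only the keys of the returned dictionary. One reading that fits DataFrame conversion is a "long" layout with two parallel lists, one entry per (sample, KO) pair. The sketch below assumes that layout; `Sample` and `to_long_format` are hypothetical, KOs are plain strings here, and the real `to_dict` may arrange its lists differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    """Hypothetical stand-in; KOs are plain strings in this sketch."""
    id: str
    kos: List[str] = field(default_factory=list)


def to_long_format(samples: List[Sample]) -> Dict[str, List[str]]:
    # Assumption: one row per (sample, KO) pair, stored as two parallel lists.
    result: Dict[str, List[str]] = {"sample": [], "ko": []}
    for sample in samples:
        for ko in sample.kos:
            result["sample"].append(sample.id)
            result["ko"].append(ko)
    return result


samples = [Sample("S1", ["K00001", "K00002"]), Sample("S2", ["K00001"])]
table = to_long_format(samples)
print(table["sample"])  # ['S1', 'S1', 'S2']
print(table["ko"])      # ['K00001', 'K00002', 'K00001']
```

Because both lists have the same length, a dictionary of this shape can be passed directly to `pandas.DataFrame(...)` to obtain a two-column table.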
is_empty
Checks whether the dataset is empty.
Returns:
| Type | Description |
|---|---|
| bool | True if there are no samples. |
validate
Validates the entire dataset.
Raises:
| Type | Description |
|---|---|
| ValueError | If the dataset is empty or if any sample is invalid. |
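The two failure conditions in the table (an empty dataset, or any invalid sample) can be sketched as follows. The `Sample.validate()` method, its non-empty-ID rule, and the error messages are assumptions for this example only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    """Hypothetical stand-in; the validation rule is invented for this sketch."""
    id: str

    def validate(self) -> None:
        if not self.id:
            raise ValueError("sample id must not be empty")


@dataclass
class Dataset:
    samples: List[Sample] = field(default_factory=list)

    def validate(self) -> None:
        if not self.samples:
            raise ValueError("dataset is empty")
        for sample in self.samples:
            # Any invalid sample raises ValueError and aborts the check.
            sample.validate()


def first_error(dataset: Dataset) -> str:
    """Return the validation message, or 'ok' when the dataset is valid."""
    try:
        dataset.validate()
    except ValueError as exc:
        return str(exc)
    return "ok"


empty_result = first_error(Dataset())
invalid_result = first_error(Dataset([Sample("S1"), Sample("")]))
valid_result = first_error(Dataset([Sample("S1")]))
print(empty_result)  # dataset is empty
```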
__str__
Returns a string representation of the dataset.
Returns:
| Type | Description |
|---|---|
| str | Descriptive string. |
__repr__
Returns a debug representation of the dataset.
Returns:
| Type | Description |
|---|---|
| str | Detailed representation. |