Merge Service
merge_service
Merge Service.
Domain service to orchestrate merges with databases.
Classes:
| Name | Description |
|---|---|
| DatabaseRepository | Protocol defining repository interface |
| MergeService | Service coordinating merges with 4 databases |
Classes
DatabaseRepository
Bases: Protocol
Protocol (interface) for database repositories.
Defines the contract that all database repositories must implement.
Notes
This is a Python Protocol (PEP 544) that allows duck typing while maintaining type safety. Concrete implementations will be in the Infrastructure layer.
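The structural typing described in the note above can be illustrated with a minimal sketch. The method name `find_by_ko` and the `InMemoryRepository` class are hypothetical and chosen only for illustration; the actual methods required by `DatabaseRepository` are not listed on this page.

```python
from typing import Any, Dict, List, Protocol


class DatabaseRepository(Protocol):
    """Structural interface; `find_by_ko` is a hypothetical example method."""

    def find_by_ko(self, ko_id: str) -> List[Dict[str, Any]]:
        """Return every database row annotated with the given KO identifier."""
        ...


class InMemoryRepository:
    """Toy implementation. It does not inherit from the protocol."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows

    def find_by_ko(self, ko_id: str) -> List[Dict[str, Any]]:
        return [row for row in self._rows if row["ko"] == ko_id]


def count_matches(repo: DatabaseRepository, ko_id: str) -> int:
    # A static type checker accepts any object whose methods match the protocol.
    return len(repo.find_by_ko(ko_id))


repo = InMemoryRepository([{"ko": "K00001", "compound": "ethanol"}])
matches = count_matches(repo, "K00001")
```

Because the check is structural, an Infrastructure-layer class only needs matching method signatures; it never has to import or subclass the protocol.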
MergeService
MergeService(biorempp_repo: DatabaseRepository, kegg_repo: DatabaseRepository, hadeg_repo: DatabaseRepository, toxcsm_repo: DatabaseRepository)
Domain service to orchestrate merges with databases.
Coordinates the process of merging the input dataset with the 4 system databases: BioRemPP, KEGG, HADEG, and ToxCSM.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| biorempp_repo | DatabaseRepository | Repository for the BioRemPP database | required |
| kegg_repo | DatabaseRepository | Repository for the KEGG database | required |
| hadeg_repo | DatabaseRepository | Repository for the HADEG database | required |
| toxcsm_repo | DatabaseRepository | Repository for the ToxCSM database | required |
Notes
This service depends on repositories that will be injected, following the Dependency Inversion Principle (SOLID).
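A minimal sketch of this injection pattern follows, assuming a hypothetical `fetch_all` repository method and a stand-in class `MergeServiceSketch`. Only the four constructor parameter names are taken from the signature documented above; everything else is illustrative.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Protocol


class DatabaseRepository(Protocol):
    def fetch_all(self) -> List[Dict[str, Any]]:  # hypothetical method
        ...


@dataclass
class StubRepository:
    """In-memory stand-in, e.g. for unit tests (hypothetical)."""

    name: str

    def fetch_all(self) -> List[Dict[str, Any]]:
        return [{"source": self.name}]


class MergeServiceSketch:
    """Stand-in for MergeService; it only stores the injected repositories."""

    def __init__(
        self,
        biorempp_repo: DatabaseRepository,
        kegg_repo: DatabaseRepository,
        hadeg_repo: DatabaseRepository,
        toxcsm_repo: DatabaseRepository,
    ) -> None:
        self._repos = [biorempp_repo, kegg_repo, hadeg_repo, toxcsm_repo]

    def sources(self) -> List[str]:
        # The service sees only the protocol, never a concrete database class.
        return [repo.fetch_all()[0]["source"] for repo in self._repos]


service = MergeServiceSketch(
    biorempp_repo=StubRepository("BioRemPP"),
    kegg_repo=StubRepository("KEGG"),
    hadeg_repo=StubRepository("HADEG"),
    toxcsm_repo=StubRepository("ToxCSM"),
)
```

Since the domain layer depends only on the abstraction, stubs like these can replace the real repositories in tests without touching the service.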
Initialize the service with the necessary repositories.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| biorempp_repo | DatabaseRepository | BioRemPP repository | required |
| kegg_repo | DatabaseRepository | KEGG repository | required |
| hadeg_repo | DatabaseRepository | HADEG repository | required |
| toxcsm_repo | DatabaseRepository | ToxCSM repository | required |
Source code in src/domain/services/merge_service.py
Functions
merge_all
Execute all merges sequentially.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | Input dataset with samples and KOs | required |
Returns:
| Type | Description |
|---|---|
| MergedData | Entity with all merge results |
Raises:
| Type | Description |
|---|---|
| ValueError | If any mandatory merge fails |
Notes
The process follows this order:
1. Merge with BioRemPP (mandatory)
2. Merge with KEGG (mandatory)
3. Merge with HADEG (mandatory)
4. Merge with ToxCSM (optional, depends on compounds)
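The ordering and the mandatory/optional distinction can be sketched as below. This is not the source of `merge_all`: the helper `run_merges`, the step callables, and the rule that an empty result counts as a failure are all assumptions made for illustration, as is passing the same input rows to every step.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Rows = List[Dict[str, Any]]
MergeStep = Callable[[Rows], Rows]


def run_merges(
    rows: Rows, steps: List[Tuple[str, MergeStep, bool]]
) -> Dict[str, Optional[Rows]]:
    """Run each merge in order; raise if a mandatory one yields nothing."""
    results: Dict[str, Optional[Rows]] = {}
    for name, step, mandatory in steps:
        merged = step(rows)
        if not merged:
            if mandatory:
                # Assumption: an empty result is treated as a failed merge.
                raise ValueError(f"Mandatory merge with {name} failed")
            results[name] = None  # optional merge skipped
            continue
        results[name] = merged
    return results


def keep_all(rows: Rows) -> Rows:
    return list(rows)  # toy merge that matches every row


def no_match(rows: Rows) -> Rows:
    return []  # toy merge that finds no compounds


steps = [
    ("BioRemPP", keep_all, True),
    ("KEGG", keep_all, True),
    ("HADEG", keep_all, True),
    ("ToxCSM", no_match, False),
]
results = run_merges([{"sample": "S1", "ko": "K00001"}], steps)
```

In this sketch the optional ToxCSM step records `None` instead of raising, while any empty mandatory step aborts the whole run with `ValueError`.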
merge_biorempp
Execute only the merge with BioRemPP.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | Input dataset | required |
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Data merged with BioRemPP |
Notes
Useful for partial or incremental processing.
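As a rough illustration of a single-database merge, the sketch below joins input rows to database rows on a shared KO identifier. The function `merge_on_ko`, the row layout, and the keys of the returned dictionary are hypothetical; the page does not document how `merge_biorempp` structures its result.

```python
from typing import Any, Dict, List


def merge_on_ko(
    samples: List[Dict[str, Any]], database: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Join sample rows to database rows that share the same 'ko' value."""
    # Index the database once so each sample lookup is a dictionary access.
    index: Dict[str, List[Dict[str, Any]]] = {}
    for row in database:
        index.setdefault(row["ko"], []).append(row)

    merged: List[Dict[str, Any]] = []
    for sample in samples:
        for match in index.get(sample["ko"], []):
            merged.append({**sample, **match})

    return {
        "rows": merged,
        "matched_kos": sorted({row["ko"] for row in merged}),
    }


samples = [
    {"sample": "S1", "ko": "K00001"},
    {"sample": "S2", "ko": "K00002"},
]
database = [{"ko": "K00001", "compound": "benzene"}]
partial = merge_on_ko(samples, database)
```

Running one merge in isolation like this lets a caller inspect intermediate results before committing to the full pipeline.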
get_merge_statistics
Calculate statistics about the merges performed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| merged_data | MergedData | Merged data | required |
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Merge statistics |
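The keys of the returned dictionary are not documented here. One plausible shape, shown purely as an assumption, counts merged rows per database; the function `summarize` and every key name below are hypothetical, and the input is a plain dictionary rather than the real `MergedData` entity.

```python
from typing import Any, Dict, List, Optional

Rows = List[Dict[str, Any]]


def summarize(merged: Dict[str, Optional[Rows]]) -> Dict[str, Any]:
    """Count merged rows per database; None marks a skipped optional merge."""
    rows_per_database = {
        name: len(rows) if rows is not None else 0
        for name, rows in merged.items()
    }
    return {
        "rows_per_database": rows_per_database,
        "total_rows": sum(rows_per_database.values()),
        "databases_merged": [name for name, rows in merged.items() if rows],
    }


stats = summarize(
    {
        "BioRemPP": [{"ko": "K00001"}, {"ko": "K00002"}],
        "KEGG": [{"ko": "K00001"}],
        "HADEG": [{"ko": "K00001"}],
        "ToxCSM": None,
    }
)
```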