PCA Strategy
pca_strategy
PCA Strategy - Principal Component Analysis Visualization.
This module implements the PCAStrategy for creating PCA scatter plots to visualize sample relationships and clustering patterns based on feature profiles.
Classes:
| Name | Description |
|---|---|
PCAStrategy | Strategy for PCA scatter plot generation. |
Notes
- Uses scikit-learn for PCA computation
- Creates 2D scatter plots (PC1 vs PC2)
- Displays explained variance on axes
- Interactive hover information with Plotly
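For reference, the textbook definitions behind "explained variance" are given below. They describe standard PCA and are not taken from the module's source. Each feature column j is z-scored, and component k's share of the variance is its eigenvalue of the covariance matrix of the standardized data, divided by the sum of all p eigenvalues. Here x_{ij} is the presence (1) or absence (0) of feature j in sample i.

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j},
\qquad
\mathrm{EVR}_k = \frac{\lambda_k}{\sum_{m=1}^{p} \lambda_m},
\qquad k = 1, 2
```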
For supported use cases, refer to the official documentation.
Classes
PCAStrategy
Bases: BasePlotStrategy
Strategy for creating PCA scatter plots.
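This strategy projects high-dimensional feature profiles onto the first two principal components. These are the orthogonal directions that capture the largest share of the variance, which lets samples be compared in a 2D plot.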
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config | Dict[str, Any] | Complete configuration from YAML file. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
data_config | Dict[str, Any] | Data processing configuration. |
plotly_config | Dict[str, Any] | Plotly-specific configuration. |
sample_column | str | Column name for samples. |
feature_column | str | Column name for features (KO or Compound). |
n_components | int | Number of principal components (default: 2). |
Methods:
| Name | Description |
|---|---|
validate_data | Validate input data for PCA requirements |
process_data | Process data into presence/absence matrix and apply PCA |
create_figure | Create PCA scatter plot from processed data |
Notes
- Requires at least 2 samples and 2 features
- Creates binary presence/absence matrix
- Standardizes data before PCA
- Visualizes PC1 vs PC2 with explained variance
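The presence/absence step can be sketched with `pandas.crosstab`, which turns long-format (sample, feature) rows into a sample-by-feature matrix. This is a minimal sketch, not the class's actual code. The function name `to_presence_absence` and the default column names `"Sample"` and `"KO"` are hypothetical; the real names come from `sample_column` and `feature_column`.

```python
import pandas as pd


def to_presence_absence(df: pd.DataFrame, sample_col: str = "Sample",
                        feature_col: str = "KO") -> pd.DataFrame:
    """Hypothetical helper: pivot long-format rows into a 0/1 sample x feature matrix."""
    # crosstab counts how often each (sample, feature) pair occurs
    counts = pd.crosstab(df[sample_col], df[feature_col])
    # Any occurrence means "present"; repeated rows collapse to 1
    return (counts > 0).astype(int)


# Usage with toy data: S3 lists K00001 twice, but it still becomes a single 1
long_df = pd.DataFrame({
    "Sample": ["S1", "S1", "S2", "S3", "S3"],
    "KO": ["K00001", "K00002", "K00002", "K00001", "K00001"],
})
matrix = to_presence_absence(long_df)
print(matrix)
```

Collapsing counts to 0/1 means that how many times a feature is annotated in a sample does not affect the PCA. Only whether it is present matters.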
Initialize strategy with configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config | Dict[str, Any] | Complete configuration from YAML file. | required |
Source code in src/domain/plot_strategies/charts/pca_strategy.py
Functions
validate_data
Validate input data for PCA requirements.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
df | DataFrame | Input data to validate. | required |
Raises:
| Type | Description |
|---|---|
ValueError | If DataFrame is empty, required columns missing, or fewer than 2 samples/features found. |
Source code in src/domain/plot_strategies/charts/pca_strategy.py
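The checks listed under Raises can be sketched as a standalone function. This is a sketch under stated assumptions: `validate_pca_input` and the default column names are hypothetical, and here samples and features are counted as distinct values in the input rows.

```python
import pandas as pd


def validate_pca_input(df: pd.DataFrame, sample_col: str = "Sample",
                       feature_col: str = "KO") -> None:
    """Hypothetical sketch of the documented validation rules."""
    if df.empty:
        raise ValueError("Input DataFrame is empty")
    missing = [c for c in (sample_col, feature_col) if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # PC1 vs PC2 needs at least 2 distinct samples and 2 distinct features
    n_samples = df[sample_col].nunique()
    n_features = df[feature_col].nunique()
    if n_samples < 2 or n_features < 2:
        raise ValueError(
            f"PCA needs at least 2 samples and 2 features, "
            f"got {n_samples} samples and {n_features} features"
        )


# Usage: the first call passes silently, the second raises ValueError
ok = pd.DataFrame({"Sample": ["S1", "S2"], "KO": ["K1", "K2"]})
validate_pca_input(ok)
try:
    validate_pca_input(pd.DataFrame({"Sample": ["S1"], "KO": ["K1"]}))
except ValueError as exc:
    print(exc)
```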
process_data
Process data into a binary presence/absence matrix and apply PCA.
Builds a binary sample-by-feature matrix, standardizes each feature column, and applies the PCA transformation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
df | DataFrame | Input data with sample and feature columns. | required |
Returns:
| Type | Description |
|---|---|
DataFrame | DataFrame with columns: ['Sample', 'PC1', 'PC2'] containing principal component scores. |
Source code in src/domain/plot_strategies/charts/pca_strategy.py
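The standardize-then-project step can be sketched with scikit-learn's `StandardScaler` and `PCA`. This is a minimal sketch, not the module's code. `pca_scores` is a hypothetical name. Returning `explained_variance_ratio_` alongside the scores is also an assumption: it is one way to make the variance figures available to `create_figure`, which the documented return value does not include.

```python
from typing import Tuple

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_scores(matrix: pd.DataFrame, n_components: int = 2) -> Tuple[pd.DataFrame, np.ndarray]:
    """Hypothetical sketch: z-score a 0/1 sample x feature matrix and project it onto its PCs."""
    # z-score each feature column; scikit-learn leaves zero-variance columns as all zeros
    scaled = StandardScaler().fit_transform(matrix.to_numpy(dtype=float))
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(scaled)  # shape: (n_samples, n_components)
    result = pd.DataFrame(scores, columns=[f"PC{i + 1}" for i in range(n_components)])
    # Keep sample labels next to their scores, matching the documented columns
    result.insert(0, "Sample", matrix.index.to_numpy())
    return result, pca.explained_variance_ratio_


# Usage with a toy presence/absence matrix
matrix = pd.DataFrame(
    [[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]],
    index=pd.Index(["S1", "S2", "S3", "S4"], name="Sample"),
    columns=["K1", "K2", "K3"],
)
scores, evr = pca_scores(matrix)
print(scores.columns.tolist())  # ['Sample', 'PC1', 'PC2']
```

Standardizing first gives every feature unit variance. Without it, common and rare features would otherwise contribute unequally to the components.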
create_figure
Create a PCA scatter plot from processed data.
Creates an interactive scatter plot of PC1 vs PC2, with points colored by sample and the explained variance shown in the axis labels.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
processed_df | DataFrame | Processed data with PC1, PC2, and Sample columns. | required |
Returns:
| Type | Description |
|---|---|
Figure | Configured Plotly figure ready for display. |
Source code in src/domain/plot_strategies/charts/pca_strategy.py
apply_filters
Apply filters to data.
This default implementation, inherited from BasePlotStrategy, can be overridden by subclasses if needed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
df | DataFrame | Data to filter. | required |
filters | Optional[Dict[str, Any]] | Filter specifications. | None |
Returns:
| Type | Description |
|---|---|
DataFrame | Filtered data. |
Source code in src/domain/plot_strategies/base/base_plot_strategy.py
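The filter specification format is not documented here. The sketch below assumes a hypothetical spec that maps a column name to one allowed value or a list of allowed values. Both `filter_rows` and that spec are illustrations, not the base class's actual contract.

```python
from typing import Any, Dict, Optional

import pandas as pd


def filter_rows(df: pd.DataFrame, filters: Optional[Dict[str, Any]] = None) -> pd.DataFrame:
    """Hypothetical sketch: keep rows whose columns match the allowed values."""
    if not filters:
        return df  # nothing to filter: return the input unchanged
    mask = pd.Series(True, index=df.index)
    for column, allowed in filters.items():
        if column not in df.columns:
            continue  # assumption: unknown columns are ignored rather than rejected
        values = allowed if isinstance(allowed, (list, tuple, set)) else [allowed]
        mask &= df[column].isin(values)
    return df[mask]


# Usage: keep only rows annotated with K1
df = pd.DataFrame({"Sample": ["S1", "S2", "S3"], "KO": ["K1", "K2", "K1"]})
filtered = filter_rows(df, {"KO": "K1"})
print(filtered["Sample"].tolist())  # ['S1', 'S3']
```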
apply_customizations
Apply custom styling to figure.
This is a hook for future customization features (FLEXIVEL and FLEXIVEL2).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
fig | Figure | Base figure. | required |
customizations | Optional[Any] | Customization specifications. | None |
Returns:
| Type | Description |
|---|---|
Figure | Customized figure. |
Source code in src/domain/plot_strategies/base/base_plot_strategy.py
generate_plot
generate_plot(data: DataFrame, filters: Optional[Dict[str, Any]] = None, customizations: Optional[Any] = None) -> go.Figure
Generate complete plot (Template Method).
This method orchestrates the entire plot-generation process:
1. Validate input data
2. Process data
3. Apply filters
4. Create figure
5. Apply customizations
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data | DataFrame | Input data. | required |
filters | Optional[Dict[str, Any]] | Filters to apply. | None |
customizations | Optional[Any] | Customizations to apply. | None |
Returns:
| Type | Description |
|---|---|
Figure | Complete Plotly figure. |
Raises:
| Type | Description |
|---|---|
ValueError | If validation fails. |
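The Template Method ordering described for `generate_plot` can be sketched with an abstract base class. The skeleton method fixes the step order, subclasses supply the abstract steps, and the filter and customization hooks default to pass-throughs. `TemplatePlot` and `RecordingStrategy` are hypothetical names, and the "figure" is a plain dict so the sketch runs without Plotly.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

import pandas as pd


class TemplatePlot(ABC):
    """Hypothetical skeleton mirroring the documented step order."""

    def generate_plot(self, data: pd.DataFrame, filters: Optional[Dict[str, Any]] = None,
                      customizations: Optional[Any] = None) -> Any:
        self.validate_data(data)                    # 1. may raise ValueError
        processed = self.process_data(data)         # 2.
        filtered = self.apply_filters(processed, filters)  # 3.
        fig = self.create_figure(filtered)          # 4.
        return self.apply_customizations(fig, customizations)  # 5.

    @abstractmethod
    def validate_data(self, df: pd.DataFrame) -> None: ...

    @abstractmethod
    def process_data(self, df: pd.DataFrame) -> pd.DataFrame: ...

    @abstractmethod
    def create_figure(self, processed_df: pd.DataFrame) -> Any: ...

    def apply_filters(self, df: pd.DataFrame, filters: Optional[Dict[str, Any]] = None) -> pd.DataFrame:
        return df  # default hook: no filtering

    def apply_customizations(self, fig: Any, customizations: Optional[Any] = None) -> Any:
        return fig  # default hook: no customization


class RecordingStrategy(TemplatePlot):
    """Toy subclass that records which step ran, in order."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def validate_data(self, df: pd.DataFrame) -> None:
        self.calls.append("validate")
        if df.empty:
            raise ValueError("empty input")

    def process_data(self, df: pd.DataFrame) -> pd.DataFrame:
        self.calls.append("process")
        return df

    def apply_filters(self, df, filters=None):
        self.calls.append("filter")
        return super().apply_filters(df, filters)

    def create_figure(self, processed_df: pd.DataFrame) -> Dict[str, int]:
        self.calls.append("create")
        return {"rows": len(processed_df)}

    def apply_customizations(self, fig, customizations=None):
        self.calls.append("customize")
        return super().apply_customizations(fig, customizations)


# Usage
strategy = RecordingStrategy()
result = strategy.generate_plot(pd.DataFrame({"x": [1, 2]}))
print(strategy.calls)  # ['validate', 'process', 'filter', 'create', 'customize']
```

Keeping the order in one base-class method means that a concrete strategy such as PCAStrategy only fills in its own steps and cannot skip validation.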