
Box Scatter Strategy

box_scatter_strategy

Box Scatter Strategy.

Strategy for creating Plotly visualizations that combine a box plot with a jittered scatter overlay. It is well suited to showing a statistical distribution while keeping every individual data point visible.

Classes:

BoxScatterStrategy: Strategy combining a box plot with a jittered scatter overlay.

Notes

This strategy is particularly useful for:
- Comparing distributions across categories
- Identifying outliers while seeing all data points
- Visualizing statistical summaries with a raw-data overlay

For supported use cases, refer to the official documentation.

Classes

BoxScatterStrategy

BoxScatterStrategy(config: Dict[str, Any])

Bases: BasePlotStrategy

Strategy for creating box plot with jittered scatter overlay.

This strategy combines two visualization types:
1. Box plot: shows the statistical distribution (median, IQR, outliers).
2. Scatter plot: shows the individual data points with horizontal jitter.

The combination provides both a statistical summary and point-level detail, which makes it well suited to distribution analysis with moderate sample sizes. With very many points, the scatter overlay becomes cluttered.
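The "statistical summary" half can be made concrete with a stdlib-only sketch of the numbers a box plot draws. The sketch uses Python's `statistics.quantiles` with `method="inclusive"` and Tukey's 1.5 × IQR rule for whiskers and outliers. Plotly computes its own quartiles (see its `quartilemethod` option), so for small samples its values may differ slightly. `box_summary` is a hypothetical helper and is not part of this module.

```python
import statistics
from typing import Dict, List


def box_summary(values: List[float]) -> Dict[str, object]:
    """Hypothetical helper: the statistics a box plot encodes (Tukey 1.5 * IQR fences)."""
    data = sorted(values)
    # Quartiles by linear interpolation between order statistics
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme points still inside the fences
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whiskers": (min(inside), max(inside)),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

In this data, the value 100 lies above the upper fence, so the box plot would draw it as an outlier. The jittered scatter overlay would still show it as one point among all the others.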

Parameters:

config (Dict[str, Any], required): Complete configuration dictionary from YAML; it must contain a 'visualization' section with the plot parameters.

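Attributes:

config (Dict[str, Any]): Stored configuration dictionary.
data_config (Dict[str, Any]): The configuration's 'data' section (empty dict if absent).
plotly_config (Dict[str, Any]): The 'plotly' section of the visualization configuration (empty dict if absent).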

Notes

Configuration structure (YAML):

    visualization:
      strategy: "BoxScatterStrategy"
      plotly:
        box:
          y: "unique_ko_count"
          marker:
            color: "#198754"
        scatter:
          y: "unique_ko_count"
          mode: "markers"
          marker:
            color: "rgba(0,0,0,0.5)"
            size: 8
          jitter: 0.01
          hovertemplate: "%{customdata[0]}<br>..."
          customdata_columns: ["sample", "rank"]
        layout:
          yaxis:
            title: "Unique KO Count"
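The section lookups performed by `__init__` and `generate` can be traced against this structure. The sketch below shows the YAML as the Python dict it would load into, with the hovertemplate omitted, and follows the same `.get(..., {})` chain. It assumes that `BasePlotStrategy` exposes `config["visualization"]` as `viz_config`. That class is not shown on this page, so treat the assumption as unverified.

```python
from typing import Any, Dict

# Python equivalent of the YAML above (hovertemplate omitted for brevity)
config: Dict[str, Any] = {
    "visualization": {
        "strategy": "BoxScatterStrategy",
        "plotly": {
            "box": {"y": "unique_ko_count", "marker": {"color": "#198754"}},
            "scatter": {
                "y": "unique_ko_count",
                "mode": "markers",
                "marker": {"color": "rgba(0,0,0,0.5)", "size": 8},
                "jitter": 0.01,
                "customdata_columns": ["sample", "rank"],
            },
            "layout": {"yaxis": {"title": "Unique KO Count"}},
        },
    }
}

# Assumption: the base class sets viz_config = config["visualization"]
viz_config = config.get("visualization", {})
plotly_config = viz_config.get("plotly", {})  # as in __init__

# As in generate(): every section falls back to an empty dict
box_config = plotly_config.get("box", {})
scatter_config = plotly_config.get("scatter", {})
layout_config = plotly_config.get("layout", {})

# generate() reads the y column from the scatter section, not the box section
y_col = scatter_config.get("y")
print(y_col, layout_config["yaxis"]["title"])
```

Because `generate` reads only `scatter.y`, omitting that key raises the "Configuration missing 'visualization.plotly.scatter.y'" error even when `box.y` is set.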

Refer to the official documentation for supported use cases and detailed configuration examples.

Initialize BoxScatterStrategy with configuration.

Parameters:

config (Dict[str, Any], required): Complete configuration from the YAML file.
Source code in src/domain/plot_strategies/charts/box_scatter_strategy.py
def __init__(self, config: Dict[str, Any]):
    """
    Initialize BoxScatterStrategy with configuration.

    Parameters
    ----------
    config : Dict[str, Any]
        Complete configuration from YAML file
    """
    super().__init__(config)
    self.data_config = config.get("data", {})
    self.plotly_config = self.viz_config.get("plotly", {})
    logger.info("BoxScatterStrategy initialized")
Functions
validate_data
validate_data(df: DataFrame) -> None

Validate input data for box scatter plot requirements.

Parameters:

df (DataFrame, required): Input data to validate (already aggregated by the callback).

Raises:

ValueError: If the DataFrame is empty or a required column is missing.

Notes

Expected columns in the aggregated data (the defaults, used when the config's 'data.required_columns' list is empty or absent):
- 'sample': sample identifier
- 'unique_ko_count': aggregated count of unique KOs
- 'rank': ranking within the database

Source code in src/domain/plot_strategies/charts/box_scatter_strategy.py
def validate_data(self, df: pd.DataFrame) -> None:
    """
    Validate input data for box scatter plot requirements.

    Parameters
    ----------
    df : pd.DataFrame
        Input data to validate (already aggregated by callback)

    Raises
    ------
    ValueError
        If validation fails

    Notes
    -----
    Expected columns in aggregated data:
    - 'sample': Sample identifier
    - 'unique_ko_count': Aggregated count of unique KOs
    - 'rank': Ranking within database
    """
    logger.debug("Starting data validation for BoxScatterStrategy")

    # Check if DataFrame is empty
    if df.empty:
        raise ValueError("DataFrame is empty")

    # Get required columns from config
    required_cols = self.data_config.get("required_columns", [])

    # If no required columns specified, use default for aggregated data
    if not required_cols:
        required_cols = ["sample", "unique_ko_count", "rank"]
        logger.debug(
            "No required columns in config, using defaults: " f"{required_cols}"
        )

    # Validate required columns exist
    missing_cols = set(required_cols) - set(df.columns)
    if missing_cols:
        raise ValueError(
            f"Missing required columns: {missing_cols}. "
            f"Available: {df.columns.tolist()}"
        )

    logger.info(f"Data validation passed: {len(df)} rows")
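The column check above reduces to a set difference. Its left side comes either from `data.required_columns` or from the built-in defaults. A stdlib-only restatement makes the fallback rule easy to see: a configured list replaces the defaults rather than extending them. `missing_required_columns` is a hypothetical name used only for this illustration.

```python
from typing import Any, Dict, Iterable, Set

DEFAULT_REQUIRED = ["sample", "unique_ko_count", "rank"]


def missing_required_columns(columns: Iterable[str], data_config: Dict[str, Any]) -> Set[str]:
    # Hypothetical helper mirroring validate_data: an empty or absent
    # "required_columns" list falls back to the defaults
    required = data_config.get("required_columns", []) or DEFAULT_REQUIRED
    return set(required) - set(columns)


# Defaults apply when the "data" section lists nothing
print(missing_required_columns(["sample", "rank"], {}))
# A configured list replaces the defaults entirely
print(missing_required_columns(["sample", "rank"], {"required_columns": ["sample"]}))
```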
process_data
process_data(df: DataFrame) -> pd.DataFrame

Process data for box scatter visualization.

For BoxScatterStrategy, the callback is expected to have already aggregated the data (grouped by sample, with unique KO counts and ranks). This method therefore does minimal processing: it only returns a copy of the input.

Parameters:

df (DataFrame, required): Input data, already aggregated by the callback. Expected columns: ['sample', 'unique_ko_count', 'rank'].

Returns:

DataFrame: Processed data ready for visualization (an unchanged copy of the input).

Notes

The aggregation pipeline is handled in the callback:
1. Extract raw data from the store (Sample and KO columns).
2. Clean the data (remove empty values and duplicates).
3. Aggregate: groupby('sample')['ko'].nunique().
4. Calculate the rank.
5. Sort by unique_ko_count, descending.

The strategy therefore receives the final aggregated data.
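The callback itself is not shown on this page. The sketch below is one plausible pandas version of the five steps. It assumes that the raw columns are named `sample` and `ko` (matching the `groupby` expression in step 3) and that "rank" means 1 for the sample with the most unique KOs, with tied samples sharing the lowest rank. `aggregate_ko_counts` is a hypothetical name.

```python
import pandas as pd


def aggregate_ko_counts(raw: pd.DataFrame) -> pd.DataFrame:
    # Steps 1-2 (assumed column names): keep the two columns, drop missing
    # or blank KOs, then drop duplicate (sample, ko) rows
    clean = raw[["sample", "ko"]].dropna()
    clean = clean[clean["ko"].astype(str).str.strip() != ""].drop_duplicates()

    # Step 3: number of distinct KOs per sample
    counts = clean.groupby("sample")["ko"].nunique().reset_index(name="unique_ko_count")

    # Step 4 (assumed ranking rule): 1 = most unique KOs, ties share the lowest rank
    counts["rank"] = (
        counts["unique_ko_count"].rank(method="min", ascending=False).astype(int)
    )

    # Step 5: sort by count, descending; a stable sort keeps ties in groupby order
    return counts.sort_values(
        "unique_ko_count", ascending=False, kind="stable"
    ).reset_index(drop=True)


raw = pd.DataFrame(
    {
        "sample": ["A", "A", "A", "B", "B", "C", "C"],
        "ko": ["K1", "K2", "K2", "K1", None, "K3", ""],
    }
)
print(aggregate_ko_counts(raw))
```

The output has exactly the `['sample', 'unique_ko_count', 'rank']` shape that `process_data` expects, so `process_data` only needs to copy it.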

Source code in src/domain/plot_strategies/charts/box_scatter_strategy.py
def process_data(self, df: pd.DataFrame) -> pd.DataFrame:
    """
    Process data for box scatter visualization.

    For BoxScatterStrategy, data is expected to already be aggregated
    by the callback (grouped by sample with unique KO counts and ranks).
    This method performs minimal processing - just ensures clean copy.

    Parameters
    ----------
    df : pd.DataFrame
        Input data (already aggregated by callback)
        Expected columns: ['sample', 'unique_ko_count', 'rank']

    Returns
    -------
    pd.DataFrame
        Processed data ready for visualization (unchanged copy)

    Notes
    -----
    The aggregation pipeline is handled in the callback:
    1. Extract raw data from store (Sample, KO columns)
    2. Clean data (remove empty, duplicates)
    3. Aggregate: groupby('sample')['ko'].nunique()
    4. Calculate rank
    5. Sort by unique_ko_count descending

    Strategy receives final aggregated data.
    """
    logger.debug(f"Processing data: {len(df)} rows")

    # Data is already processed by callback
    # Just ensure it's a clean copy
    processed = df.copy()

    logger.info(
        f"Data processing completed: {len(processed)} rows "
        f"(pre-aggregated by callback)"
    )
    return processed
create_figure
create_figure(df: DataFrame) -> go.Figure

Create box scatter figure from processed data.

Parameters:

df (DataFrame, required): Processed data ready for visualization.

Returns:

Figure: Plotly figure with the box plot and scatter overlay.

Source code in src/domain/plot_strategies/charts/box_scatter_strategy.py
def create_figure(self, df: pd.DataFrame) -> go.Figure:
    """
    Create box scatter figure from processed data.

    Parameters
    ----------
    df : pd.DataFrame
        Processed data ready for visualization

    Returns
    -------
    go.Figure
        Plotly figure with box plot and scatter overlay
    """
    return self.generate(df)
generate
generate(data: DataFrame) -> go.Figure

Generate box plot with jittered scatter overlay.

Parameters:

data (DataFrame, required): Processed data ready for visualization; it must contain the column named in plotly.scatter.y.

Returns:

Figure: Plotly figure with the box plot and scatter overlay.

Raises:

ValueError: If the data is empty, 'visualization.plotly.scatter.y' is not configured, or that column is missing from the data.

Source code in src/domain/plot_strategies/charts/box_scatter_strategy.py
def generate(self, data: pd.DataFrame) -> go.Figure:
    """
    Generate box plot with jittered scatter overlay.

    Parameters
    ----------
    data : pd.DataFrame
        Processed data ready for visualization (must contain column
        specified in plotly.scatter.y)

    Returns
    -------
    go.Figure
        Plotly figure with box plot and scatter overlay

    Raises
    ------
    ValueError
        If data is empty or required columns missing
    """
    logger.info("Generating box scatter plot", extra={"rows": len(data)})

    # Validate data
    if data.empty:
        raise ValueError("Cannot create plot: DataFrame is empty")

    # Extract configuration
    box_config = self.plotly_config.get("box", {})
    scatter_config = self.plotly_config.get("scatter", {})
    layout_config = self.plotly_config.get("layout", {})

    # Get y-column from scatter config
    y_col = scatter_config.get("y")
    if not y_col:
        raise ValueError("Configuration missing 'visualization.plotly.scatter.y'")

    if y_col not in data.columns:
        raise ValueError(f"Column '{y_col}' not found in data")

    # Create figure
    fig = go.Figure()

    # Add box plot trace
    fig = self._add_box_trace(fig, data, y_col, box_config)

    # Add scatter trace with jitter
    fig = self._add_scatter_trace(fig, data, y_col, scatter_config)

    # Apply layout
    fig = self._apply_layout(fig, layout_config)

    logger.info(
        "Box scatter plot generated successfully", extra={"traces": len(fig.data)}
    )

    return fig
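`_add_scatter_trace` is not documented on this page, so exactly how it consumes the `jitter` setting is not shown. A common approach, given here only as a hypothetical sketch, places every point at the box's x position plus a uniform random offset in `[-jitter, +jitter]`. The offsets are seeded so the layout is reproducible between redraws. Plotly's `go.Scatter` has no `jitter` property of its own (that attribute belongs to `go.Box`), so a strategy that jitters a separate scatter trace has to compute the offsets itself.

```python
import random
from typing import List, Optional


def jittered_x(
    n_points: int, center: float = 0.0, jitter: float = 0.01, seed: Optional[int] = 42
) -> List[float]:
    # Hypothetical helper: one x value per point, spread uniformly around `center`
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    return [center + rng.uniform(-jitter, jitter) for _ in range(n_points)]


xs = jittered_x(5, center=0.0, jitter=0.01)
print(xs)
```

The y values stay untouched, so the jitter changes only where points sit horizontally, never the quantity being plotted.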
apply_filters
apply_filters(df: DataFrame, filters: Optional[Dict[str, Any]] = None) -> pd.DataFrame

Apply filters to data.

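This is a shared default implementation that subclasses can override. It supports only 'range' filters: each entry in the configuration's top-level 'filters' list whose filter_id appears in filters is applied as an inclusive [min, max] bound on its data_binding.column. Entries whose column is missing from the data are skipped with a warning.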

Parameters:

df (DataFrame, required): Data to filter.
filters (Optional[Dict[str, Any]], default None): Filter specifications, keyed by filter_id.

Returns:

DataFrame: Filtered data (the original object, unchanged, when no filters are given).

Source code in src/domain/plot_strategies/base/base_plot_strategy.py
def apply_filters(
    self, df: pd.DataFrame, filters: Optional[Dict[str, Any]] = None
) -> pd.DataFrame:
    """
    Apply filters to data.

    This is a common implementation that can be overridden
    by subclasses if needed.

    Parameters
    ----------
    df : pd.DataFrame
        Data to filter.
    filters : Optional[Dict[str, Any]], default=None
        Filter specifications.

    Returns
    -------
    pd.DataFrame
        Filtered data.
    """
    import logging

    logger = logging.getLogger(__name__)

    if not filters:
        logger.debug("No filters provided, returning original data")
        return df

    logger.info(
        f"Applying filters - Input shape: {df.shape}, "
        f"Columns: {df.columns.tolist()}"
    )
    logger.info(f"Filters to apply: {filters}")

    filtered_df = df.copy()

    # Get filter configurations
    filter_configs = self.config.get("filters", [])

    for filter_config in filter_configs:
        filter_id = filter_config.get("filter_id")
        filter_type = filter_config.get("type")

        if filter_id not in filters:
            continue

        filter_value = filters[filter_id]
        data_binding = filter_config.get("data_binding", {})
        column = data_binding.get("column")

        if not column or column not in filtered_df.columns:
            logger.warning(
                f"Filter '{filter_id}': Column '{column}' not found. "
                f"Available: {filtered_df.columns.tolist()}"
            )
            continue

        # Apply range filter
        if filter_type == "range" and isinstance(filter_value, list):
            min_val, max_val = filter_value
            logger.info(
                f"Applying range filter on '{column}': " f"[{min_val}, {max_val}]"
            )
            filtered_df = filtered_df[
                (filtered_df[column] >= min_val) & (filtered_df[column] <= max_val)
            ]
            logger.info(f"After filter: {len(filtered_df)} rows remaining")

    logger.info(f"Final filtered shape: {filtered_df.shape}")
    return filtered_df
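`apply_filters` reads filter definitions from the top-level `filters` list of the configuration and reads the active values from its `filters` argument, matching the two by `filter_id`. The sketch below restates the range branch as a standalone pandas function so that the expected shapes are visible. The filter id `ko_range` and the helper name `apply_range_filters` are hypothetical. As in the method above, both bounds are inclusive.

```python
from typing import Any, Dict, List, Optional

import pandas as pd


def apply_range_filters(
    df: pd.DataFrame,
    filter_configs: List[Dict[str, Any]],
    filters: Optional[Dict[str, Any]],
) -> pd.DataFrame:
    if not filters:
        return df  # nothing selected: the original frame, not a copy
    out = df.copy()
    for cfg in filter_configs:
        fid = cfg.get("filter_id")
        if fid not in filters:
            continue  # defined in the config but not active
        column = cfg.get("data_binding", {}).get("column")
        if not column or column not in out.columns:
            continue  # the real method logs a warning here
        value = filters[fid]
        if cfg.get("type") == "range" and isinstance(value, list):
            low, high = value
            out = out[(out[column] >= low) & (out[column] <= high)]
    return out


df = pd.DataFrame({"sample": ["A", "B", "C"], "unique_ko_count": [120, 45, 300]})
# Hypothetical filter definition, shaped like an entry of config["filters"]
filter_configs = [
    {"filter_id": "ko_range", "type": "range", "data_binding": {"column": "unique_ko_count"}}
]
print(apply_range_filters(df, filter_configs, {"ko_range": [45, 150]}))
```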
apply_customizations
apply_customizations(fig: Figure, customizations: Optional[Any] = None) -> go.Figure

Apply custom styling to figure.

This is a hook for future customization features (FLEXIVEL and FLEXIVEL2); the current implementation returns the figure unchanged.

Parameters:

fig (Figure, required): Base figure.
customizations (Optional[Any], default None): Customization specifications.

Returns:

Figure: Customized figure.

Source code in src/domain/plot_strategies/base/base_plot_strategy.py
def apply_customizations(
    self, fig: go.Figure, customizations: Optional[Any] = None
) -> go.Figure:
    """
    Apply custom styling to figure.

    This is a hook for future customization features
    (FLEXIVEL and FLEXIVEL2).

    Parameters
    ----------
    fig : go.Figure
        Base figure.
    customizations : Optional[Any], default=None
        Customization specifications.

    Returns
    -------
    go.Figure
        Customized figure.
    """
    # Hook for future implementation
    return fig
generate_plot
generate_plot(data: DataFrame, filters: Optional[Dict[str, Any]] = None, customizations: Optional[Any] = None) -> go.Figure

Generate complete plot (Template Method).

This method orchestrates the entire plot-generation process:
1. Validate input data
2. Process data
3. Apply filters
4. Create figure
5. Apply customizations

Parameters:

data (DataFrame, required): Input data.
filters (Optional[Dict[str, Any]], default None): Filters to apply.
customizations (Optional[Any], default None): Customizations to apply.

Returns:

Figure: Complete Plotly figure.

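Raises:

ValueError: If validation fails. For BoxScatterStrategy, generate (called through create_figure) also raises it when filtering leaves no rows or the configured y column is unavailable.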

Source code in src/domain/plot_strategies/base/base_plot_strategy.py
def generate_plot(
    self,
    data: pd.DataFrame,
    filters: Optional[Dict[str, Any]] = None,
    customizations: Optional[Any] = None,
) -> go.Figure:
    """
    Generate complete plot (Template Method).

    This method orchestrates the entire plot generation process:
    1. Validate input data
    2. Process data
    3. Apply filters
    4. Create figure
    5. Apply customizations

    Parameters
    ----------
    data : pd.DataFrame
        Input data.
    filters : Optional[Dict[str, Any]], default=None
        Filters to apply.
    customizations : Optional[Any], default=None
        Customizations to apply.

    Returns
    -------
    go.Figure
        Complete Plotly figure.

    Raises
    ------
    ValueError
        If validation fails.
    """
    # 1. Validate
    self.validate_data(data)

    # 2. Process
    processed_df = self.process_data(data)

    # 3. Filter
    filtered_df = self.apply_filters(processed_df, filters)

    # 4. Create figure
    figure = self.create_figure(filtered_df)

    # 5. Apply customizations (hook for future)
    figure = self.apply_customizations(figure, customizations)

    return figure
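`generate_plot` is a Template Method: the base class fixes the order of the five steps, and subclasses such as BoxScatterStrategy override individual steps. The stdlib-only sketch below reproduces that skeleton with a toy subclass that records the call order. `TemplateBase`, `ToyStrategy` and the `{"min": ...}` filter format are illustrative inventions, and lists stand in for DataFrames and figures. The sketch also shows one consequence of the ordering: validation runs before filtering, so a filter that removes every row surfaces later, as the "Cannot create plot" error from the figure step.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class TemplateBase(ABC):
    def generate_plot(
        self,
        data: List[int],
        filters: Optional[Dict[str, Any]] = None,
        customizations: Optional[Any] = None,
    ) -> Dict[str, Any]:
        # Fixed sequence; subclasses change individual steps, not the order
        self.validate_data(data)
        processed = self.process_data(data)
        filtered = self.apply_filters(processed, filters)
        figure = self.create_figure(filtered)
        return self.apply_customizations(figure, customizations)

    @abstractmethod
    def validate_data(self, data: List[int]) -> None: ...

    @abstractmethod
    def process_data(self, data: List[int]) -> List[int]: ...

    @abstractmethod
    def create_figure(self, data: List[int]) -> Dict[str, Any]: ...

    # Default hooks that subclasses may override
    def apply_filters(self, data: List[int], filters: Optional[Dict[str, Any]]) -> List[int]:
        # Toy filter (hypothetical format): keep values >= filters["min"]
        if not filters:
            return data
        return [v for v in data if v >= filters["min"]]

    def apply_customizations(self, figure: Dict[str, Any], customizations: Optional[Any]) -> Dict[str, Any]:
        return figure  # no-op hook, like the base implementation above


class ToyStrategy(TemplateBase):
    def __init__(self) -> None:
        self.calls: List[str] = []

    def validate_data(self, data: List[int]) -> None:
        self.calls.append("validate")
        if not data:
            raise ValueError("DataFrame is empty")

    def process_data(self, data: List[int]) -> List[int]:
        self.calls.append("process")
        return list(data)  # defensive copy, like process_data above

    def create_figure(self, data: List[int]) -> Dict[str, Any]:
        self.calls.append("create")
        if not data:
            raise ValueError("Cannot create plot: DataFrame is empty")
        return {"y": data}


strategy = ToyStrategy()
print(strategy.generate_plot([3, 8, 5], filters={"min": 5}))
print(strategy.calls)
```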

Functions