Frozenset Strategy

frozenset_strategy

Frozenset Strategy - Minimal Sample-Group Visualization.

This module implements the FrozensetStrategy for creating grouped scatter plots that visualize samples organized by their compound profiles (frozenset) with set cover minimization to reduce redundancy.

Classes:

Name Description
FrozensetStrategy

Strategy for frozenset-based sample grouping visualization.

Notes
  • Groups samples by compound profile (frozenset)
  • Applies greedy set cover algorithm for minimization
  • Color-codes markers by unique KO count per compound
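The grouping step can be pictured with a minimal, pure-Python sketch. It assumes that samples with exactly the same set of compounds fall into one group; the sample rows and the "Group N" label format are made up for illustration and are not taken from the module.

```python
from collections import defaultdict

# Hypothetical (sample, compound) rows for a single compound class.
rows = [
    ("S1", "benzene"), ("S1", "toluene"),
    ("S2", "toluene"), ("S2", "benzene"),
    ("S3", "phenol"),
]

# 1. Collect the set of compounds observed in each sample.
profile_by_sample = defaultdict(set)
for sample, compound in rows:
    profile_by_sample[sample].add(compound)

# 2. A frozenset is hashable, so it can serve as a dict key:
#    samples with equal profiles end up under the same key.
samples_by_profile = defaultdict(list)
for sample, compounds in profile_by_sample.items():
    samples_by_profile[frozenset(compounds)].append(sample)

# 3. Give each distinct profile a readable label (label format is an assumption).
groups = {
    f"Group {i}": (sorted(profile), sorted(samples))
    for i, (profile, samples) in enumerate(samples_by_profile.items(), 1)
}
```

Here S1 and S2 share the profile {benzene, toluene} and land in the same group, while S3 forms its own.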

For supported use cases, refer to the official documentation.

Classes

FrozensetStrategy

FrozensetStrategy(config: Dict[str, Any])

Bases: BasePlotStrategy

Strategy for minimal sample-group visualization using frozensets.

This strategy creates grouped scatter plots showing samples organized by their compound profiles (frozenset), with set cover minimization to reduce redundancy and maximize compound coverage.

Parameters:

Name Type Description Default
config Dict[str, Any]

Complete configuration from YAML file.

required

Attributes:

Name Type Description
data_config Dict[str, Any]

Data processing configuration.

plotly_config Dict[str, Any]

Plotly-specific configuration.

sample_column str

Column name for sample identifiers.

compound_column str

Column name for compound identifiers.

compoundclass_column str

Column name for compound class filtering.

ko_column str

Column name for KO identifiers (for color scaling).

color_scale str

Plotly color scale for markers.

marker_size int

Size of scatter markers.

Methods:

Name Description
validate_data

Validate input data for frozenset visualization requirements

process_data

Process data with filtering, grouping, and set cover minimization

create_figure

Create frozenset visualization figure from processed data

apply_filters

Apply filters including compound class selection

get_available_compound_classes

Get list of available compound classes

get_group_statistics

Calculate statistics for visualization

Notes
  • Applies greedy set cover algorithm for group minimization
  • Color-codes by unique KO count per compound
  • Supports compound class filtering
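The private `_minimize_groups` helper is not listed on this page, so the following is only a sketch of the textbook greedy set cover heuristic under assumed inputs: a mapping from group label to the compounds that group contains. The function name `greedy_set_cover` and the tie-breaking rule are illustrative choices, not the module's.

```python
from typing import Dict, FrozenSet, List


def greedy_set_cover(groups: Dict[str, FrozenSet[str]]) -> List[str]:
    """Pick group labels until every compound is covered (greedy heuristic)."""
    # Everything that needs covering: the union of all compound profiles.
    uncovered = set().union(*groups.values())
    chosen: List[str] = []
    while uncovered:
        # Take the group that covers the most still-uncovered compounds.
        # Iterating over sorted labels makes ties resolve to the smallest label.
        best = max(sorted(groups), key=lambda g: len(groups[g] & uncovered))
        chosen.append(best)
        uncovered -= groups[best]
    return chosen


example = {
    "G1": frozenset({"a", "b", "c"}),
    "G2": frozenset({"a", "b"}),
    "G3": frozenset({"c", "d"}),
    "G4": frozenset({"d"}),
}
selected = greedy_set_cover(example)  # ["G1", "G3"]
```

The greedy choice does not guarantee the smallest possible cover, but it is simple and usually removes groups whose compounds are already represented elsewhere, such as G2 and G4 here.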

Initialize strategy with configuration.

Source code in src/domain/plot_strategies/charts/frozenset_strategy.py
def __init__(self, config: Dict[str, Any]):
    """
    Initialize strategy with configuration.

    Parameters
    ----------
    config : Dict[str, Any]
        Complete configuration from YAML file.
    """
    super().__init__(config)
    self.data_config = config.get("data", {})
    self.plotly_config = self.viz_config.get("plotly", {})

    # Column configuration
    self.sample_column: str = self.plotly_config.get("sample_column", "sample")
    self.compound_column: str = self.plotly_config.get(
        "compound_column", "compoundname"
    )
    self.compoundclass_column: str = self.plotly_config.get(
        "compoundclass_column", "compoundclass"
    )
    self.ko_column: str = self.plotly_config.get("ko_column", "ko")

    # Visual configuration
    self.color_scale: str = self.plotly_config.get(
        "color_scale", DEFAULT_COLOR_SCALE
    )
    self.marker_size: int = self.plotly_config.get(
        "marker_size", DEFAULT_MARKER_SIZE
    )

    # Selected compound class (set via filters)
    self._selected_compoundclass: Optional[str] = None

    # Internal state for processed data
    self._grouped_df: Optional[pd.DataFrame] = None
    self._minimized_groups: List[str] = []
    self._ko_counts: Optional[pd.Series] = None

    logger.info(
        f"FrozensetStrategy initialized for "
        f"{self.metadata.get('use_case_id', 'unknown')}: "
        f"sample='{self.sample_column}', "
        f"compound='{self.compound_column}', "
        f"compoundclass='{self.compoundclass_column}'"
    )
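As a rough illustration of how the constructor resolves its settings, the sketch below builds a hypothetical `plotly` section as a plain dict and applies the same `.get(key, default)` lookups. The key names and string defaults are the ones in the listing above; the example values are invented, and how `viz_config` is derived from the full YAML configuration is handled by `BasePlotStrategy`, which is not shown here.

```python
# Hypothetical "plotly" section. Only the key names come from the constructor;
# every value below is made up for illustration.
plotly_config = {
    "sample_column": "sample_id",
    "marker_size": 12,
    "chart": {"title": {"show": True, "text": "My title", "font_size": 18}},
    "layout": {"height": 600, "autosize": True},
}

# Keys that are present override the defaults ...
sample_column = plotly_config.get("sample_column", "sample")

# ... and missing keys fall back to the defaults used in __init__.
compound_column = plotly_config.get("compound_column", "compoundname")
compoundclass_column = plotly_config.get("compoundclass_column", "compoundclass")
ko_column = plotly_config.get("ko_column", "ko")
```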
Functions
validate_data
validate_data(df: DataFrame) -> None

Validate input data for frozenset visualization requirements.

Parameters:

Name Type Description Default
df DataFrame

Input data to validate.

required

Raises:

Type Description
ValueError

If the DataFrame is empty or required columns are missing.

Source code in src/domain/plot_strategies/charts/frozenset_strategy.py
def validate_data(self, df: pd.DataFrame) -> None:
    """
    Validate input data for frozenset visualization requirements.

    Parameters
    ----------
    df : pd.DataFrame
        Input data to validate.

    Raises
    ------
    ValueError
        If the DataFrame is empty or required columns are missing.
    """
    logger.debug(
        f"Validating data - Shape: {df.shape}, Columns: {df.columns.tolist()}"
    )

    if df.empty:
        raise ValueError("Input DataFrame is empty")

    # Validate required columns exist
    required_cols = [
        self.sample_column,
        self.compound_column,
        self.compoundclass_column,
    ]

    missing_cols = [c for c in required_cols if c not in df.columns]
    if missing_cols:
        raise ValueError(
            f"Missing required columns: {missing_cols}. "
            f"Available: {df.columns.tolist()}"
        )

    # KO column is optional but recommended for color scaling
    if self.ko_column and self.ko_column not in df.columns:
        logger.warning(
            f"KO column '{self.ko_column}' not found. "
            f"Color scaling will use default values."
        )

    logger.info(f"Data validation passed - {len(df)} records")
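The required-column check can be tried in isolation with a small sketch that works on plain lists of column names. The helper name `missing_columns` is hypothetical; the comparison and the error message mirror the listing above.

```python
from typing import Iterable, List


def missing_columns(available: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required column names that are absent, in their original order."""
    available_set = set(available)
    return [col for col in required if col not in available_set]


# Default column names from the constructor.
required = ["sample", "compoundname", "compoundclass"]
columns = ["sample", "compoundname", "ko"]

missing = missing_columns(columns, required)
if missing:
    # Same message shape as validate_data; a caller would normally raise here.
    message = f"Missing required columns: {missing}. Available: {columns}"
```

Note that a missing KO column is treated differently in `validate_data`: it only logs a warning, because the column affects marker colors rather than grouping.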
process_data
process_data(df: DataFrame) -> pd.DataFrame

Process data for frozenset visualization.

Applies filtering by compound class, groups samples by compound profile (frozenset), applies set cover minimization, and calculates KO counts.

Parameters:

Name Type Description Default
df DataFrame

Input data with required columns.

required

Returns:

Type Description
DataFrame

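Processed data with group labels and KO counts.

Raises:

Type Description
ValueError

If no rows remain after dropping nulls or filtering by compound class, or if grouping or minimization yields no groups.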

Source code in src/domain/plot_strategies/charts/frozenset_strategy.py
def process_data(self, df: pd.DataFrame) -> pd.DataFrame:
    """
    Process data for frozenset visualization.

    Applies filtering by compound class, groups samples by compound profile
    (frozenset), applies set cover minimization, and calculates KO counts.

    Parameters
    ----------
    df : pd.DataFrame
        Input data with required columns.

    Returns
    -------
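    pd.DataFrame
        Processed data with group labels and KO counts.

    Raises
    ------
    ValueError
        If no rows remain after dropping nulls or filtering by compound
        class, or if grouping or minimization yields no groups.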
    """
    logger.info("Processing frozenset data...")

    # Clean data
    df_clean = df.dropna(
        subset=[self.sample_column, self.compound_column, self.compoundclass_column]
    ).copy()

    if df_clean.empty:
        raise ValueError("No valid data after removing nulls")

    # Get compound classes available
    compound_classes = df_clean[self.compoundclass_column].unique().tolist()
    logger.info(f"Available compound classes: {len(compound_classes)}")

    # If no compound class selected, use first one
    if not self._selected_compoundclass:
        self._selected_compoundclass = compound_classes[0]
        logger.info(
            f"No compound class selected, using: "
            f"'{self._selected_compoundclass}'"
        )

    # Filter by selected compound class
    df_filtered = df_clean[
        df_clean[self.compoundclass_column] == self._selected_compoundclass
    ].copy()

    if df_filtered.empty:
        raise ValueError(
            f"No data for compound class: {self._selected_compoundclass}"
        )

    logger.info(
        f"Filtered to {len(df_filtered)} rows for class "
        f"'{self._selected_compoundclass}'"
    )

    # Step 1: Group samples by compound profile (frozenset)
    self._grouped_df = self._group_by_compound_profile(df_filtered)

    if self._grouped_df.empty:
        raise ValueError("No groups found after grouping by compound profile")

    # Step 2: Minimize groups using set cover
    self._minimized_groups = self._minimize_groups(self._grouped_df)

    if not self._minimized_groups:
        raise ValueError("No minimized groups found")

    # Filter to minimized groups only
    final_df = self._grouped_df[
        self._grouped_df["_group"].isin(self._minimized_groups)
    ].copy()

    # Step 3: Calculate KO counts per compound (for color)
    self._ko_counts = self._calculate_ko_counts(df_clean, df_filtered)

    # Merge KO counts into final dataframe
    if self._ko_counts is not None:
        final_df = final_df.merge(
            self._ko_counts,
            left_on=self.compound_column,
            right_index=True,
            how="left",
        )
        final_df["_unique_ko_count"] = (
            final_df["_unique_ko_count"].fillna(0).astype(int)
        )
    else:
        final_df["_unique_ko_count"] = 1

    logger.info(
        f"Processing complete: {len(final_df)} rows, "
        f"{len(self._minimized_groups)} minimized groups"
    )

    return final_df
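The `_calculate_ko_counts` helper is not listed on this page. One plausible reading of "unique KO count per compound" is the number of distinct KO identifiers associated with each compound, with compounds that have no KO falling back to 0, as the `fillna(0)` in the merge step suggests. A pure-Python sketch of that assumption, with invented rows:

```python
from collections import defaultdict

# Hypothetical (compound, ko) rows; None stands for a missing KO value.
rows = [
    ("benzene", "K00001"),
    ("benzene", "K00002"),
    ("benzene", "K00001"),  # repeated KO, counted once
    ("toluene", "K00003"),
    ("phenol", None),
]

# Collect the distinct KOs seen for each compound, skipping missing values.
kos_by_compound = defaultdict(set)
for compound, ko in rows:
    if ko is not None:
        kos_by_compound[compound].add(ko)

unique_ko_count = {c: len(kos) for c, kos in kos_by_compound.items()}

# Compounds without any KO get 0, analogous to fillna(0) after the left merge.
plotted_compounds = ["benzene", "toluene", "phenol"]
marker_colors = [unique_ko_count.get(c, 0) for c in plotted_compounds]
```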
create_figure
create_figure(processed_df: DataFrame) -> go.Figure

Create frozenset visualization figure from processed data.

Creates subplots with one scatter plot per minimized group, with markers color-coded by unique KO count.

Parameters:

Name Type Description Default
processed_df DataFrame

Processed data with group labels and KO counts.

required

Returns:

Type Description
Figure

Configured Plotly figure with subplots.

Source code in src/domain/plot_strategies/charts/frozenset_strategy.py
def create_figure(self, processed_df: pd.DataFrame) -> go.Figure:
    """
    Create frozenset visualization figure from processed data.

    Creates subplots with one scatter plot per minimized group, with
    markers color-coded by unique KO count.

    Parameters
    ----------
    processed_df : pd.DataFrame
        Processed data with group labels and KO counts.

    Returns
    -------
    go.Figure
        Configured Plotly figure with subplots.
    """
    logger.debug("Creating frozenset visualization figure...")

    chart_config = self.plotly_config.get("chart", {})
    layout_config = self.plotly_config.get("layout", {})

    # Get unique groups (sorted)
    unique_groups = sorted(processed_df["_group"].unique())
    n_groups = len(unique_groups)

    if n_groups == 0:
        raise ValueError("No groups to visualize")

    # Get horizontal spacing
    horizontal_spacing = chart_config.get("horizontal_spacing", 0.03)

    # Create subplots
    fig = make_subplots(
        rows=1,
        cols=n_groups,
        shared_yaxes=True,
        subplot_titles=unique_groups,
        horizontal_spacing=horizontal_spacing,
    )

    # Get color range
    cmin = int(processed_df["_unique_ko_count"].min())
    cmax = int(processed_df["_unique_ko_count"].max())
    if cmin == cmax and cmax == 0:
        cmax = 1  # Avoid colorbar range [0, 0]

    # Get colorbar configuration
    colorbar_title = chart_config.get("colorbar_title", "Unique<br>KO Count")

    # Add traces for each group
    for i, group in enumerate(unique_groups, 1):
        group_df = processed_df[processed_df["_group"] == group]

        fig.add_trace(
            go.Scatter(
                x=group_df[self.sample_column],
                y=group_df[self.compound_column],
                mode="markers",
                name=str(group),
                showlegend=False,
                marker=dict(
                    size=self.marker_size,
                    color=group_df["_unique_ko_count"],
                    colorscale=self.color_scale,
                    cmin=cmin,
                    cmax=cmax,
                    showscale=(i == 1),  # Show colorbar only on first
                    colorbar=dict(title=colorbar_title, thickness=15, len=0.8),
                ),
                customdata=group_df[["_unique_ko_count"]].values,
                hovertemplate=(
                    "Sample: %{x}<br>"
                    "Compound: %{y}<br>"
                    "Unique KOs: %{customdata[0]}<extra></extra>"
                ),
            ),
            row=1,
            col=i,
        )

    # Handle title configuration (support both string and dict)
    title_config = chart_config.get("title", {})
    if isinstance(title_config, str):
        # Backward compatibility: string title
        show_title = True
        title_text = title_config
        title_font_size = 16
    else:
        # New format: dict with show, text, font_size
        show_title = title_config.get("show", True)
        title_text = (
            title_config.get(
                "text",
                f"Minimal Sample-Group Visualization: {self._selected_compoundclass}",
            )
            if show_title
            else ""
        )
        title_font_size = title_config.get("font_size", 16)

    # Get layout options
    height = layout_config.get("height", DEFAULT_CHART_HEIGHT)
    use_autosize = layout_config.get("autosize", False)
    template = layout_config.get("template", DEFAULT_TEMPLATE)

    # Get margin configuration
    margin_config = layout_config.get("margin", {})
    margin = dict(
        l=margin_config.get("l", 150),
        r=margin_config.get("r", 50),
        t=margin_config.get("t", 80),
        b=margin_config.get("b", 50),
    )

    # Build layout update dict
    layout_update = {
        "title": dict(
            text=title_text,
            x=0.5,
            xanchor="center",
            font=dict(size=title_font_size),
        ),
        "template": template,
        "height": height,
        "margin": margin,
        "paper_bgcolor": "white",
    }

    # Add autosize or width
    if use_autosize:
        layout_update["autosize"] = True
    elif layout_config.get("width"):
        # Set an explicit width only when one is configured
        layout_update["width"] = layout_config["width"]

    fig.update_layout(**layout_update)

    # Update X-axes: rotation and title
    xaxis_tickangle = chart_config.get("xaxis_tickangle", -45)
    xaxis_title = chart_config.get("xaxis_title", "")

    for i in range(1, n_groups + 1):
        fig.update_xaxes(
            tickangle=xaxis_tickangle,
            title_text=xaxis_title if i == 1 else "",
            row=1,
            col=i,
        )

    # Update Y-axis: rotation and title on first column
    yaxis_tickangle = chart_config.get("yaxis_tickangle", 0)
    yaxis_title = chart_config.get("yaxis_title", "Compound")

    fig.update_yaxes(
        title_text=yaxis_title, tickangle=yaxis_tickangle, row=1, col=1
    )

    # Log statistics
    n_compounds = processed_df[self.compound_column].nunique()
    n_samples = processed_df[self.sample_column].nunique()
    logger.info(
        f"Frozenset figure created - {n_groups} groups, "
        f"{n_samples} samples, {n_compounds} compounds"
    )

    return fig
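The title handling in `create_figure` accepts either a plain string or a dict. The sketch below isolates that branch in a standalone function so the fallbacks are easier to see; the function name `resolve_title` is hypothetical, while the keys (`show`, `text`, `font_size`) and the default font size of 16 come from the listing above.

```python
from typing import Any, Dict, Tuple, Union


def resolve_title(
    title_config: Union[str, Dict[str, Any]], default_text: str
) -> Tuple[str, int]:
    """Return (title_text, font_size) following the string-or-dict convention."""
    if isinstance(title_config, str):
        # Legacy format: the string is the title, shown at the default size.
        return title_config, 16
    show = title_config.get("show", True)
    # A hidden title becomes an empty string; the font size is still read.
    text = title_config.get("text", default_text) if show else ""
    return text, title_config.get("font_size", 16)


default = "Minimal Sample-Group Visualization: aromatic"
legacy = resolve_title("Custom title", default)
implicit = resolve_title({}, default)
hidden = resolve_title({"show": False, "text": "ignored", "font_size": 20}, default)
```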
apply_filters
apply_filters(df: DataFrame, filters: Optional[Dict[str, Any]] = None) -> pd.DataFrame

Apply filters including compound class selection.

Parameters:

Name Type Description Default
df DataFrame

Data to filter.

required
filters Optional[Dict[str, Any]]

Filter specifications including 'compoundclass'.

None

Returns:

Type Description
DataFrame

Filtered data.

Source code in src/domain/plot_strategies/charts/frozenset_strategy.py
def apply_filters(
    self, df: pd.DataFrame, filters: Optional[Dict[str, Any]] = None
) -> pd.DataFrame:
    """
    Apply filters including compound class selection.

    Parameters
    ----------
    df : pd.DataFrame
        Data to filter.
    filters : Optional[Dict[str, Any]], default=None
        Filter specifications including 'compoundclass'.

    Returns
    -------
    pd.DataFrame
        Filtered data.
    """
    if filters and "compoundclass" in filters:
        self._selected_compoundclass = filters["compoundclass"]
        logger.info(f"Set compound class filter: '{self._selected_compoundclass}'")

    return super().apply_filters(df, filters)
get_available_compound_classes
get_available_compound_classes(df: DataFrame) -> List[str]

Get list of available compound classes from data.

Parameters:

Name Type Description Default
df DataFrame

Input data.

required

Returns:

Type Description
List[str]

Sorted list of unique compound classes.

Source code in src/domain/plot_strategies/charts/frozenset_strategy.py
def get_available_compound_classes(self, df: pd.DataFrame) -> List[str]:
    """
    Get list of available compound classes from data.

    Parameters
    ----------
    df : pd.DataFrame
        Input data.

    Returns
    -------
    List[str]
        Sorted list of unique compound classes.
    """
    if self.compoundclass_column not in df.columns:
        return []

    return sorted(df[self.compoundclass_column].dropna().unique().tolist())
get_group_statistics
get_group_statistics(processed_df: DataFrame) -> Dict[str, Any]

Calculate statistics for visualization.

Parameters:

Name Type Description Default
processed_df DataFrame

Processed data.

required

Returns:

Type Description
Dict[str, Any]

Statistics including group counts, samples, compounds.

Source code in src/domain/plot_strategies/charts/frozenset_strategy.py
def get_group_statistics(self, processed_df: pd.DataFrame) -> Dict[str, Any]:
    """
    Calculate statistics for visualization.

    Parameters
    ----------
    processed_df : pd.DataFrame
        Processed data.

    Returns
    -------
    Dict[str, Any]
        Statistics including group counts, samples, compounds.
    """
    return {
        "compound_class": self._selected_compoundclass,
        "total_groups": len(self._minimized_groups),
        "groups": self._minimized_groups,
        "total_samples": processed_df[self.sample_column].nunique(),
        "total_compounds": processed_df[self.compound_column].nunique(),
        "ko_range": {
            "min": int(processed_df["_unique_ko_count"].min()),
            "max": int(processed_df["_unique_ko_count"].max()),
        },
    }
apply_customizations
apply_customizations(fig: Figure, customizations: Optional[Any] = None) -> go.Figure

Apply custom styling to figure.

This is a hook for future customization features (FLEXIVEL and FLEXIVEL2).

Parameters:

Name Type Description Default
fig Figure

Base figure.

required
customizations Optional[Any]

Customization specifications.

None

Returns:

Type Description
Figure

Customized figure.

Source code in src/domain/plot_strategies/base/base_plot_strategy.py
def apply_customizations(
    self, fig: go.Figure, customizations: Optional[Any] = None
) -> go.Figure:
    """
    Apply custom styling to figure.

    This is a hook for future customization features
    (FLEXIVEL and FLEXIVEL2).

    Parameters
    ----------
    fig : go.Figure
        Base figure.
    customizations : Optional[Any], default=None
        Customization specifications.

    Returns
    -------
    go.Figure
        Customized figure.
    """
    # Hook for future implementation
    return fig
generate_plot
generate_plot(data: DataFrame, filters: Optional[Dict[str, Any]] = None, customizations: Optional[Any] = None) -> go.Figure

Generate complete plot (Template Method).

This method orchestrates the entire plot generation process:

  1. Validate input data
  2. Process data
  3. Apply filters
  4. Create figure
  5. Apply customizations

Parameters:

Name Type Description Default
data DataFrame

Input data.

required
filters Optional[Dict[str, Any]]

Filters to apply.

None
customizations Optional[Any]

Customizations to apply.

None

Returns:

Type Description
Figure

Complete Plotly figure.

Raises:

Type Description
ValueError

If validation fails.

Source code in src/domain/plot_strategies/base/base_plot_strategy.py
def generate_plot(
    self,
    data: pd.DataFrame,
    filters: Optional[Dict[str, Any]] = None,
    customizations: Optional[Any] = None,
) -> go.Figure:
    """
    Generate complete plot (Template Method).

    This method orchestrates the entire plot generation process:
    1. Validate input data
    2. Process data
    3. Apply filters
    4. Create figure
    5. Apply customizations

    Parameters
    ----------
    data : pd.DataFrame
        Input data.
    filters : Optional[Dict[str, Any]], default=None
        Filters to apply.
    customizations : Optional[Any], default=None
        Customizations to apply.

    Returns
    -------
    go.Figure
        Complete Plotly figure.

    Raises
    ------
    ValueError
        If validation fails.
    """
    # 1. Validate
    self.validate_data(data)

    # 2. Process
    processed_df = self.process_data(data)

    # 3. Filter
    filtered_df = self.apply_filters(processed_df, filters)

    # 4. Create figure
    figure = self.create_figure(filtered_df)

    # 5. Apply customizations (hook for future)
    figure = self.apply_customizations(figure, customizations)

    return figure
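One consequence of this ordering is worth a small experiment. In the listings above, `process_data` runs before `apply_filters`, and `FrozensetStrategy.apply_filters` only records the selected compound class. The toy class below is a hypothetical stand-in that mirrors just that call order (no pandas or Plotly involved); it suggests that a `compoundclass` filter passed to a fresh instance would first influence the following call.

```python
from typing import Dict, List, Optional


class ToyStrategy:
    """Hypothetical stand-in that mirrors only the call order of generate_plot."""

    def __init__(self) -> None:
        self.selected: Optional[str] = None

    def process_data(self, classes: List[str]) -> str:
        # Like process_data: fall back to the first class if none is selected.
        if not self.selected:
            self.selected = classes[0]
        return self.selected

    def apply_filters(self, filters: Optional[Dict[str, str]]) -> None:
        # Like apply_filters: only record the requested class.
        if filters and "compoundclass" in filters:
            self.selected = filters["compoundclass"]

    def generate_plot(
        self, classes: List[str], filters: Optional[Dict[str, str]] = None
    ) -> str:
        used = self.process_data(classes)  # step 2 reads the selection
        self.apply_filters(filters)        # step 3 updates it afterwards
        return used


toy = ToyStrategy()
classes = ["alkane", "aromatic"]
first = toy.generate_plot(classes, {"compoundclass": "aromatic"})
second = toy.generate_plot(classes, {"compoundclass": "aromatic"})
```

In this toy model the first call still uses "alkane" and only the second uses "aromatic". If the real classes behave the same way, calling `apply_filters` with the desired class before `generate_plot` may be a way to select it up front.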