UpSet Plot Strategy

upset_strategy

UpSet Strategy - Set Intersection Visualizations.

This module implements the UpSetStrategy for creating UpSet plots to visualize set intersections and unique contributions across multiple categorical sources.

Classes:

Name           Description
UpSetStrategy  Strategy for creating UpSet plots showing set intersections.

Notes
  • Compares overlap between databases (e.g., BioRemPP, HADEG, KEGG)
  • Analyzes distribution across regulatory agencies
  • Identifies consensus evidence vs. source-specific coverage
  • Uses upsetplot library for publication-quality visualizations

For supported use cases, refer to the official documentation.

Classes

UpSetStrategy

UpSetStrategy(config: Dict[str, Any])

Bases: BasePlotStrategy

Strategy for creating UpSet plots showing set intersections.

UpSet plots visualize overlap and uniqueness of elements across multiple categories through three components: set size bars, intersection matrix, and intersection size bars.
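
To make the three components concrete, the following sketch computes the numbers behind them with plain Python sets. The database names come from the notes above; the entity identifiers are invented for illustration, and the helper `exclusive_intersections` is hypothetical and not part of this module.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set


def exclusive_intersections(
    category_sets: Dict[str, Set[str]],
) -> Dict[FrozenSet[str], int]:
    """Count entities by the exact combination of categories that contain them."""
    membership: Dict[str, Set[str]] = {}
    for category, entities in category_sets.items():
        for entity in entities:
            membership.setdefault(entity, set()).add(category)
    return dict(Counter(frozenset(cats) for cats in membership.values()))


# Illustrative data only: the identifiers are made up.
category_sets = {
    "BioRemPP": {"K001", "K002", "K003"},
    "HADEG": {"K002", "K003", "K004"},
    "KEGG": {"K003", "K005"},
}

# Set size bars: one value per category.
set_sizes = {name: len(members) for name, members in category_sets.items()}

# Intersection matrix and intersection size bars: one entry per combination.
intersections = exclusive_intersections(category_sets)
```

Each key of `intersections` corresponds to one column of the intersection matrix, and its value is the height of the matching intersection size bar.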

Parameters:

Name    Type            Description                             Default
config  Dict[str, Any]  Complete configuration from YAML file.  required

Attributes:

Name              Type           Description
entity_column     str            Column name for entities to compare.
category_column   str            Column name for categories/sources.
sort_by           str            Sorting method ('cardinality' or 'degree').
show_counts       bool           Whether to display counts on bars.
show_percentages  bool           Whether to display percentages.
min_subset_size   int            Minimum subset size to display.
max_subset_rank   Optional[int]  Maximum subset rank limit.
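
The sorting and size options can be illustrated with a small sketch. It assumes the semantics used by the upsetplot library, where 'cardinality' lists intersections from largest to smallest and 'degree' lists them by the number of categories involved. The function `order_subsets` and the sample counts are hypothetical and only mirror those options.

```python
from typing import Dict, FrozenSet, List, Tuple

Subset = Tuple[FrozenSet[str], int]


def order_subsets(
    counts: Dict[FrozenSet[str], int],
    sort_by: str = "cardinality",
    min_subset_size: int = 0,
) -> List[Subset]:
    """Drop intersections below the size threshold, then order the rest."""
    kept = [(cats, size) for cats, size in counts.items() if size >= min_subset_size]
    if sort_by == "cardinality":
        # Largest intersections first.
        return sorted(kept, key=lambda item: item[1], reverse=True)
    if sort_by == "degree":
        # Fewest participating categories first.
        return sorted(kept, key=lambda item: len(item[0]))
    raise ValueError(f"Unsupported sort_by: {sort_by!r}")


# Made-up intersection sizes for three categories A, B and C.
counts = {
    frozenset({"A"}): 5,
    frozenset({"B"}): 1,
    frozenset({"A", "B"}): 8,
    frozenset({"A", "B", "C"}): 3,
}

by_size = order_subsets(counts, sort_by="cardinality")
by_degree = order_subsets(counts, sort_by="degree", min_subset_size=2)
```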

Methods:

Name           Description
generate       Generate UpSet plot from data
validate_data  Validate input data
process_data   Process and transform data for visualization
create_figure  Create Plotly figure from processed data

Notes
  • Renders with the upsetplot library (matplotlib), then converts the result to Plotly format
  • Maintains consistency with application's visualization framework

Initialize strategy with configuration.

Parameters:

Name    Type            Description                               Default
config  Dict[str, Any]  Full configuration dictionary from YAML.  required
Source code in src/domain/plot_strategies/charts/upset_strategy.py
def __init__(self, config: Dict[str, Any]):
    """
    Initialize strategy with configuration.

    Parameters
    ----------
    config : Dict[str, Any]
        Full configuration dictionary from YAML.
    """
    super().__init__(config)

    # Extract plotly configuration from viz_config
    self.plotly_config = self.viz_config.get("plotly", {})

    # Extract UpSet-specific configuration
    plotly_config = self.plotly_config or {}

    # Required columns
    self.entity_column = plotly_config.get("entity_column")
    self.category_column = plotly_config.get("category_column")

    if not self.entity_column:
        raise ValueError("UpSetStrategy requires 'entity_column' in config")
    if not self.category_column:
        raise ValueError("UpSetStrategy requires 'category_column' in config")

    # Optional UpSet parameters
    self.sort_by = plotly_config.get("sort_by", "cardinality")
    self.show_counts = plotly_config.get("show_counts", True)
    self.show_percentages = plotly_config.get("show_percentages", False)
    self.min_subset_size = plotly_config.get("min_subset_size", 0)
    self.max_subset_rank = plotly_config.get("max_subset_rank", None)

    # Figure dimensions
    self.fig_width = plotly_config.get("fig_width", 14)
    self.fig_height = plotly_config.get("fig_height", 8)

    # Color scheme
    self.bar_color = plotly_config.get("bar_color", "#0d6efd")

    # Layout configuration
    self.layout_config = plotly_config.get("layout", {})

    logger.debug(
        f"UpSetStrategy initialized: "
        f"entity='{self.entity_column}', "
        f"category='{self.category_column}', "
        f"sort_by='{self.sort_by}'"
    )
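
The option handling in the constructor can be summarized in a standalone sketch. It reproduces only the required-key checks and the defaults visible in the listing above; the function name `resolve_upset_options` and the column names `ko` and `database` are hypothetical, and how `viz_config` is derived from the full YAML configuration is not shown here.

```python
from typing import Any, Dict


def resolve_upset_options(plotly_config: Dict[str, Any]) -> Dict[str, Any]:
    """Check the required keys and fill in the defaults from the listing above."""
    for key in ("entity_column", "category_column"):
        if not plotly_config.get(key):
            raise ValueError(f"UpSetStrategy requires '{key}' in config")

    defaults: Dict[str, Any] = {
        "sort_by": "cardinality",
        "show_counts": True,
        "show_percentages": False,
        "min_subset_size": 0,
        "max_subset_rank": None,
        "fig_width": 14,
        "fig_height": 8,
        "bar_color": "#0d6efd",
        "layout": {},
    }
    # Values supplied by the caller take precedence over the defaults.
    return {**defaults, **plotly_config}


# Hypothetical column names, for illustration only.
options = resolve_upset_options(
    {"entity_column": "ko", "category_column": "database", "sort_by": "degree"}
)
```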
Functions
generate
generate(data: DataFrame) -> go.Figure

Generate UpSet plot from data.

Validates data, cleans it, builds category sets, creates UpSet plot, and converts to Plotly format.

Parameters:

Name  Type       Description                                        Default
data  DataFrame  DataFrame containing entity and category columns.  required

Returns:

Type    Description
Figure  Plotly figure object containing the UpSet visualization.

Raises:

Type        Description
ValueError  If data is empty or required columns are missing.

Source code in src/domain/plot_strategies/charts/upset_strategy.py
def generate(self, data: pd.DataFrame) -> go.Figure:
    """
    Generate UpSet plot from data.

    Validates data, cleans it, builds category sets, creates UpSet plot,
    and converts to Plotly format.

    Parameters
    ----------
    data : pd.DataFrame
        DataFrame containing entity and category columns.

    Returns
    -------
    go.Figure
        Plotly figure object containing the UpSet visualization.

    Raises
    ------
    ValueError
        If data is empty or required columns are missing.
    """
    logger.info(
        "Generating UpSet plot",
        extra={
            "entity_col": self.entity_column,
            "category_col": self.category_column,
            "rows": len(data),
        },
    )

    # Validate data
    self._validate_data(data)

    # Clean and prepare data
    df_clean = self._clean_data(data)

    # Build sets for each category
    category_sets = self._build_category_sets(df_clean)

    # Generate UpSet data structure
    upset_data = from_contents(category_sets)

    # Create UpSet plot (matplotlib)
    upset_plot = self._create_upset_plot(upset_data)

    # Convert to Plotly
    fig = self._convert_to_plotly(upset_plot, category_sets)

    # Apply layout
    self._apply_layout(fig)

    logger.info(
        "UpSet plot generated successfully",
        extra={
            "categories": len(category_sets),
            "total_intersections": len(upset_data),
        },
    )

    return fig
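
The cleaning and set-building steps are delegated to private helpers that are not listed on this page. As a rough sketch of what such a step might look like, the function below groups entities by category and skips incomplete rows. It works on plain tuples rather than a DataFrame to stay dependency-free; the name `build_category_sets` and the sample rows are assumptions, not the module's actual implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Row = Tuple[Optional[str], Optional[str]]  # (entity, category)


def build_category_sets(rows: Iterable[Row]) -> Dict[str, Set[str]]:
    """Group entities into one set per category, ignoring incomplete rows."""
    sets: Dict[str, Set[str]] = defaultdict(set)
    for entity, category in rows:
        if not entity or not category:
            # Rows with a missing entity or category cannot be assigned to a set.
            continue
        sets[category].add(entity)
    return dict(sets)


# Made-up rows; duplicates collapse because each category is a set.
rows = [
    ("K001", "BioRemPP"),
    ("K002", "BioRemPP"),
    ("K002", "HADEG"),
    ("K002", "HADEG"),
    (None, "KEGG"),
    ("K003", None),
]
category_sets = build_category_sets(rows)
```

A mapping of this shape, from category name to a collection of members, is the input that upsetplot's `from_contents` accepts.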
validate_data
validate_data(df: DataFrame) -> None

Validate input data (required by BasePlotStrategy).

This method wraps the internal _validate_data method to comply with the abstract base class interface.

Parameters:

Name  Type       Description              Default
df    DataFrame  Input data to validate.  required

Raises:

Type        Description
ValueError  If validation fails.

Source code in src/domain/plot_strategies/charts/upset_strategy.py
def validate_data(self, df: pd.DataFrame) -> None:
    """
    Validate input data (required by BasePlotStrategy).

    This method wraps the internal _validate_data method to comply
    with the abstract base class interface.

    Parameters
    ----------
    df : pd.DataFrame
        Input data to validate.

    Raises
    ------
    ValueError
        If validation fails.
    """
    self._validate_data(df)
process_data
process_data(df: DataFrame) -> pd.DataFrame

Process and transform data for visualization.

This method cleans the data and builds category sets, then returns the cleaned DataFrame ready for visualization.

Parameters:

Name  Type       Description  Default
df    DataFrame  Input data.  required

Returns:

Type       Description
DataFrame  Processed data ready for visualization.

Source code in src/domain/plot_strategies/charts/upset_strategy.py
def process_data(self, df: pd.DataFrame) -> pd.DataFrame:
    """
    Process and transform data for visualization.

    This method cleans the data and builds category sets, then
    returns the cleaned DataFrame ready for visualization.

    Parameters
    ----------
    df : pd.DataFrame
        Input data.

    Returns
    -------
    pd.DataFrame
        Processed data ready for visualization.
    """
    # Clean data
    df_clean = self._clean_data(df)

    # Store category sets for use in create_figure
    self._category_sets = self._build_category_sets(df_clean)

    return df_clean
create_figure
create_figure(processed_df: DataFrame) -> go.Figure

Create Plotly figure from processed data.

This method generates the UpSet plot from the category sets that process_data stores on the instance, so process_data must be called first.

Parameters:

Name          Type       Description                                                                 Default
processed_df  DataFrame  Processed data. Not used directly; the stored category sets are used instead.  required

Returns:

Type    Description
Figure  Configured Plotly figure with UpSet visualization.

Source code in src/domain/plot_strategies/charts/upset_strategy.py
def create_figure(self, processed_df: pd.DataFrame) -> go.Figure:
    """
    Create Plotly figure from processed data.

    This method generates the UpSet plot using the previously
    built category sets.

    Parameters
    ----------
    processed_df : pd.DataFrame
        Processed data (not directly used, category sets are used).

    Returns
    -------
    go.Figure
        Configured Plotly figure with UpSet visualization.
    """
    # Generate UpSet data structure
    upset_data = from_contents(self._category_sets)

    # Create UpSet plot (matplotlib)
    upset_plot = self._create_upset_plot(upset_data)

    # Convert to Plotly
    fig = self._convert_to_plotly(upset_plot, self._category_sets)

    # Apply layout
    self._apply_layout(fig)

    return fig
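
The dependency between process_data and create_figure can be shown with a toy class. This is a sketch of the pattern only: `TwoStepSketch` is hypothetical, works on plain tuples, and returns a dictionary of set sizes in place of a Plotly figure. As in the listings above, the second step reads the stored sets and ignores its argument, so rows removed between the two calls do not change the result unless the first step runs again.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[str, str]  # (entity, category)


class TwoStepSketch:
    """Toy stand-in that mirrors the process_data / create_figure split."""

    def process_data(self, rows: List[Row]) -> List[Row]:
        # Store the category sets on the instance for the next step.
        self._category_sets: Dict[str, Set[str]] = {}
        for entity, category in rows:
            self._category_sets.setdefault(category, set()).add(entity)
        return rows

    def create_figure(self, processed_rows: List[Row]) -> Dict[str, int]:
        # processed_rows is ignored; only the stored sets are read.
        return {name: len(members) for name, members in self._category_sets.items()}


rows = [("K001", "BioRemPP"), ("K002", "BioRemPP"), ("K002", "HADEG")]
sketch = TwoStepSketch()
processed = sketch.process_data(rows)

summary_all = sketch.create_figure(processed)
# Passing fewer rows makes no difference, because the stored sets are unchanged.
summary_subset = sketch.create_figure(processed[:1])
```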
apply_filters
apply_filters(df: DataFrame, filters: Optional[Dict[str, Any]] = None) -> pd.DataFrame

Apply filters to data.

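This is a common implementation that subclasses can override if needed. As implemented, only filters of type "range" whose value is a [min, max] list are applied, and a filter whose bound column is missing from the data is skipped with a warning.

Parameters:

Name     Type                      Description             Default
df       DataFrame                 Data to filter.         required
filters  Optional[Dict[str, Any]]  Filter specifications.  None

Returns:

Type       Description
DataFrame  Filtered data.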

Source code in src/domain/plot_strategies/base/base_plot_strategy.py
def apply_filters(
    self, df: pd.DataFrame, filters: Optional[Dict[str, Any]] = None
) -> pd.DataFrame:
    """
    Apply filters to data.

    This is a common implementation that can be overridden
    by subclasses if needed.

    Parameters
    ----------
    df : pd.DataFrame
        Data to filter.
    filters : Optional[Dict[str, Any]], default=None
        Filter specifications.

    Returns
    -------
    pd.DataFrame
        Filtered data.
    """
    import logging

    logger = logging.getLogger(__name__)

    if not filters:
        logger.debug("No filters provided, returning original data")
        return df

    logger.info(
        f"Applying filters - Input shape: {df.shape}, "
        f"Columns: {df.columns.tolist()}"
    )
    logger.info(f"Filters to apply: {filters}")

    filtered_df = df.copy()

    # Get filter configurations
    filter_configs = self.config.get("filters", [])

    for filter_config in filter_configs:
        filter_id = filter_config.get("filter_id")
        filter_type = filter_config.get("type")

        if filter_id not in filters:
            continue

        filter_value = filters[filter_id]
        data_binding = filter_config.get("data_binding", {})
        column = data_binding.get("column")

        if not column or column not in filtered_df.columns:
            logger.warning(
                f"Filter '{filter_id}': Column '{column}' not found. "
                f"Available: {filtered_df.columns.tolist()}"
            )
            continue

        # Apply range filter
        if filter_type == "range" and isinstance(filter_value, list):
            min_val, max_val = filter_value
            logger.info(
                f"Applying range filter on '{column}': [{min_val}, {max_val}]"
            )
            filtered_df = filtered_df[
                (filtered_df[column] >= min_val) & (filtered_df[column] <= max_val)
            ]
            logger.info(f"After filter: {len(filtered_df)} rows remaining")

    logger.info(f"Final filtered shape: {filtered_df.shape}")
    return filtered_df
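
The relationship between the `filters` entries in the configuration and the `filters` argument may be easier to see in a reduced sketch. It follows the keys used in the listing above (`filter_id`, `type`, `data_binding`, `column`) but operates on a list of dictionaries instead of a DataFrame. The function `apply_range_filters`, the filter id `count_range`, and the column `sample_count` are hypothetical.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def apply_range_filters(
    records: List[Record],
    filter_configs: List[Dict[str, Any]],
    filters: Optional[Dict[str, Any]] = None,
) -> List[Record]:
    """Keep records whose bound column lies within each active [min, max] range."""
    if not filters:
        return records

    kept = list(records)
    for cfg in filter_configs:
        filter_id = cfg.get("filter_id")
        if filter_id not in filters:
            continue  # this configured filter was not requested
        column = cfg.get("data_binding", {}).get("column")
        if not column or any(column not in record for record in kept):
            continue  # bound column is missing, so skip the filter
        value = filters[filter_id]
        if cfg.get("type") == "range" and isinstance(value, list):
            low, high = value
            # Both bounds are inclusive, as in the listing above.
            kept = [record for record in kept if low <= record[column] <= high]
    return kept


filter_configs = [
    {
        "filter_id": "count_range",
        "type": "range",
        "data_binding": {"column": "sample_count"},
    }
]
records = [
    {"ko": "K001", "sample_count": 2},
    {"ko": "K002", "sample_count": 7},
    {"ko": "K003", "sample_count": 12},
]
filtered = apply_range_filters(records, filter_configs, {"count_range": [5, 10]})
```

Note that the range must be passed as a list: in the listing above, a tuple would fail the `isinstance(filter_value, list)` check and the filter would be silently ignored.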
apply_customizations
apply_customizations(fig: Figure, customizations: Optional[Any] = None) -> go.Figure

Apply custom styling to figure.

This is a hook for future customization features (FLEXIVEL and FLEXIVEL2).

Parameters:

Name            Type           Description                    Default
fig             Figure         Base figure.                   required
customizations  Optional[Any]  Customization specifications.  None

Returns:

Type    Description
Figure  Customized figure.

Source code in src/domain/plot_strategies/base/base_plot_strategy.py
def apply_customizations(
    self, fig: go.Figure, customizations: Optional[Any] = None
) -> go.Figure:
    """
    Apply custom styling to figure.

    This is a hook for future customization features
    (FLEXIVEL and FLEXIVEL2).

    Parameters
    ----------
    fig : go.Figure
        Base figure.
    customizations : Optional[Any], default=None
        Customization specifications.

    Returns
    -------
    go.Figure
        Customized figure.
    """
    # Hook for future implementation
    return fig
generate_plot
generate_plot(data: DataFrame, filters: Optional[Dict[str, Any]] = None, customizations: Optional[Any] = None) -> go.Figure

Generate complete plot (Template Method).

This method orchestrates the entire plot generation process:
  1. Validate input data
  2. Process data
  3. Apply filters
  4. Create figure
  5. Apply customizations

Parameters:

Name            Type                      Description               Default
data            DataFrame                 Input data.               required
filters         Optional[Dict[str, Any]]  Filters to apply.         None
customizations  Optional[Any]             Customizations to apply.  None

Returns:

Type    Description
Figure  Complete Plotly figure.

Raises:

Type        Description
ValueError  If validation fails.

Source code in src/domain/plot_strategies/base/base_plot_strategy.py
def generate_plot(
    self,
    data: pd.DataFrame,
    filters: Optional[Dict[str, Any]] = None,
    customizations: Optional[Any] = None,
) -> go.Figure:
    """
    Generate complete plot (Template Method).

    This method orchestrates the entire plot generation process:
    1. Validate input data
    2. Process data
    3. Apply filters
    4. Create figure
    5. Apply customizations

    Parameters
    ----------
    data : pd.DataFrame
        Input data.
    filters : Optional[Dict[str, Any]], default=None
        Filters to apply.
    customizations : Optional[Any], default=None
        Customizations to apply.

    Returns
    -------
    go.Figure
        Complete Plotly figure.

    Raises
    ------
    ValueError
        If validation fails.
    """
    # 1. Validate
    self.validate_data(data)

    # 2. Process
    processed_df = self.process_data(data)

    # 3. Filter
    filtered_df = self.apply_filters(processed_df, filters)

    # 4. Create figure
    figure = self.create_figure(filtered_df)

    # 5. Apply customizations (hook for future)
    figure = self.apply_customizations(figure, customizations)

    return figure
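
The Template Method structure can be sketched independently of pandas and Plotly. In the example below, `PlotTemplate` and `RecordingStrategy` are hypothetical stand-ins: the base class fixes the order of the five steps, and the subclass supplies the steps that vary. Which methods are abstract in the real BasePlotStrategy is an assumption here, apart from validate_data, which the page describes as required by the base class.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class PlotTemplate(ABC):
    """Fixes the order of the steps; subclasses fill in the details."""

    def generate_plot(
        self,
        data: List[int],
        filters: Optional[Dict[str, Any]] = None,
        customizations: Optional[Any] = None,
    ) -> Dict[str, Any]:
        self.validate_data(data)
        processed = self.process_data(data)
        filtered = self.apply_filters(processed, filters)
        figure = self.create_figure(filtered)
        return self.apply_customizations(figure, customizations)

    @abstractmethod
    def validate_data(self, data: List[int]) -> None: ...

    @abstractmethod
    def process_data(self, data: List[int]) -> List[int]: ...

    @abstractmethod
    def create_figure(self, data: List[int]) -> Dict[str, Any]: ...

    # Default hooks that subclasses may override.
    def apply_filters(self, data: List[int], filters=None) -> List[int]:
        return data

    def apply_customizations(self, figure: Dict[str, Any], customizations=None):
        return figure


class RecordingStrategy(PlotTemplate):
    """Records which steps ran, to make the call order visible."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def validate_data(self, data: List[int]) -> None:
        self.calls.append("validate_data")
        if not data:
            raise ValueError("data is empty")

    def process_data(self, data: List[int]) -> List[int]:
        self.calls.append("process_data")
        return sorted(data)

    def create_figure(self, data: List[int]) -> Dict[str, Any]:
        self.calls.append("create_figure")
        return {"points": data}


strategy = RecordingStrategy()
figure = strategy.generate_plot([3, 1, 2])
```

Because validation runs first, a `ValueError` raised there stops the pipeline before any processing or figure creation takes place.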
