Dot Plot Strategy

dot_plot_strategy

Dot Plot Strategy - Scatter and Bubble Chart Visualizations.

This module implements the DotPlotStrategy for creating scatter plots and bubble charts using Plotly. It supports both simple scatter plots with uniform markers and bubble charts that encode quantitative variables through marker size and color.

Classes:

| Name | Description |
| --- | --- |
| DotPlotStrategy | Strategy for creating scatter and bubble chart visualizations. |

Notes
  • Supports simple scatter plots and bubble charts
  • Flexible axis mappings (categorical or continuous)
  • Data aggregation and filtering capabilities

For supported use cases, refer to the official documentation.

Classes

DotPlotStrategy

DotPlotStrategy(config: Dict[str, Any])

Bases: BasePlotStrategy

Strategy for creating scatter and bubble chart visualizations.

This strategy creates scatter-based visualizations with flexible configuration for both simple scatter plots and bubble charts with size and color encoding.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | Dict[str, Any] | Complete configuration dictionary from YAML. Must contain a 'visualization' section with plot parameters. | required |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| config | Dict[str, Any] | Stored configuration dictionary. |
| data_config | Dict[str, Any] | Data processing configuration. |
| plotly_config | Dict[str, Any] | Plotly-specific visualization config. |

Methods:

| Name | Description |
| --- | --- |
| validate_data | Validate input data for dot plot requirements |
| process_data | Process data with filtering, grouping, and sorting |
| create_figure | Create dot plot figure from processed data |
| generate | Generate complete dot plot visualization |

Notes
  • Supports simple scatter and bubble chart modes
  • Flexible axis mappings (categorical or continuous)
  • Data aggregation and filtering capabilities

Initialize strategy with configuration.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | Dict[str, Any] | Complete configuration from YAML file. | required |

Source code in src/domain/plot_strategies/charts/dot_plot_strategy.py
def __init__(self, config: Dict[str, Any]):
    """
    Initialize strategy with configuration.

    Parameters
    ----------
    config : Dict[str, Any]
        Complete configuration from YAML file.
    """
    super().__init__(config)
    self.data_config = config.get("data", {})
    self.plotly_config = self.viz_config.get("plotly", {})
    logger.info("DotPlotStrategy initialized")
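
As a rough illustration of the configuration shape this constructor reads, the sketch below builds a plain dictionary with the keys referenced in the source listings on this page (`data.required_columns`, `data.processing.steps`, `visualization.plotly.scatter`, `visualization.plotly.layout`). It assumes that `viz_config` is `config["visualization"]`, which is set by `BasePlotStrategy` and not shown here. The column names and the layout `title` key are hypothetical, and real `group_and_count` steps may need more parameters than `result_column`.

```python
from typing import Any, Dict

# Hypothetical configuration; column names and the layout title are placeholders.
config: Dict[str, Any] = {
    "data": {
        "required_columns": ["sample", "pathway"],
        "processing": {
            "steps": [
                {
                    "name": "group_and_count",
                    "enabled": True,
                    # Only 'result_column' is visible in the source shown here.
                    "params": {"result_column": "unique_ko_count"},
                },
            ]
        },
    },
    "visualization": {
        "plotly": {
            "scatter": {
                "x": "sample",
                "y": "pathway",
                "mode": "bubble",
                "size": "unique_ko_count",
                "color": "unique_ko_count",
            },
            "layout": {"title": "Example dot plot"},
        }
    },
}

# Mirror the lookups in __init__ (assumes viz_config == config["visualization"]).
data_config = config.get("data", {})
viz_config = config.get("visualization", {})
plotly_config = viz_config.get("plotly", {})
scatter_config = plotly_config.get("scatter", {})

print(scatter_config["mode"])
```

Because every lookup uses `.get(..., {})`, a missing section yields an empty dictionary rather than a `KeyError`; missing `x`/`y` keys are then reported later by `validate_data`.
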
Functions
validate_data
validate_data(df: DataFrame) -> None

Validate input data for dot plot requirements.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame | Input data to validate. | required |

Raises:

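| Type | Description |
| --- | --- |
| ValueError | If the DataFrame is empty, required columns are missing, 'x' or 'y' is not configured or not found in the data, or (in bubble mode) a size/color column is neither present in the data nor created by a processing step. |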

Source code in src/domain/plot_strategies/charts/dot_plot_strategy.py
def validate_data(self, df: pd.DataFrame) -> None:
    """
    Validate input data for dot plot requirements.

    Parameters
    ----------
    df : pd.DataFrame
        Input data to validate.

    Raises
    ------
    ValueError
        If DataFrame is empty, required columns missing, or x/y columns
        not found in data.
    """
    logger.debug("Starting data validation for DotPlotStrategy")

    # Check if DataFrame is empty
    if df.empty:
        raise ValueError("DataFrame is empty")

    # Get required columns from config
    required_cols = self.data_config.get("required_columns", [])

    if required_cols:
        missing_cols = set(required_cols) - set(df.columns)
        if missing_cols:
            raise ValueError(
                f"Missing required columns: {missing_cols}. "
                f"Available: {df.columns.tolist()}"
            )

    # Validate x and y columns from scatter config
    scatter_config = self.plotly_config.get("scatter", {})
    x_col = scatter_config.get("x")
    y_col = scatter_config.get("y")

    if not x_col or not y_col:
        raise ValueError(
            "Configuration must specify 'x' and 'y' in "
            "'visualization.plotly.scatter'"
        )

    if x_col not in df.columns:
        raise ValueError(f"X column '{x_col}' not found in data")

    if y_col not in df.columns:
        raise ValueError(f"Y column '{y_col}' not found in data")

    # Validate size/color columns if bubble mode
    # NOTE: Skip validation for columns created during processing
    # (e.g., 'unique_ko_count' created by group_and_count step)
    mode = scatter_config.get("mode", "simple")
    if mode == "bubble":
        size_col = scatter_config.get("size")
        color_col = scatter_config.get("color")

        # Check if column will be created during processing
        processing_steps = self.data_config.get("processing", {}).get("steps", [])
        result_columns = []
        for step in processing_steps:
            if step.get("name") == "group_and_count":
                result_col = step.get("params", {}).get("result_column")
                if result_col:
                    result_columns.append(result_col)

        # Only validate if column is not created by processing
        if (
            size_col
            and size_col not in df.columns
            and size_col not in result_columns
        ):
            raise ValueError(
                f"Size column '{size_col}' not found in data and "
                f"not created by processing steps"
            )

        if (
            color_col
            and color_col not in df.columns
            and color_col not in result_columns
        ):
            raise ValueError(
                f"Color column '{color_col}' not found in data and "
                f"not created by processing steps"
            )

    logger.info(f"Data validation passed: {len(df)} rows")
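
The bubble-mode check above can be read in isolation as follows. This is a minimal sketch that uses plain lists of column names instead of a DataFrame; the helper names `result_columns_from_steps` and `check_bubble_column` are hypothetical and do not exist in the module.

```python
from typing import Any, Dict, List, Optional


def result_columns_from_steps(steps: List[Dict[str, Any]]) -> List[str]:
    """Collect columns that 'group_and_count' steps will create."""
    created = []
    for step in steps:
        if step.get("name") == "group_and_count":
            result_col = step.get("params", {}).get("result_column")
            if result_col:
                created.append(result_col)
    return created


def check_bubble_column(
    role: str, column: Optional[str], available: List[str], created: List[str]
) -> None:
    """Raise ValueError if the column neither exists nor will be created."""
    if column and column not in available and column not in created:
        raise ValueError(
            f"{role} column '{column}' not found in data and "
            f"not created by processing steps"
        )


steps = [{"name": "group_and_count", "params": {"result_column": "unique_ko_count"}}]
available = ["sample", "pathway"]
created = result_columns_from_steps(steps)

# Passes: the column is absent from the input but produced by a step.
check_bubble_column("Size", "unique_ko_count", available, created)

try:
    check_bubble_column("Color", "abundance", available, created)
except ValueError as exc:
    print(exc)
```

Note that, as in the listing above, a step's `enabled` flag is not consulted here, so a column declared by a disabled `group_and_count` step would still pass this check.
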
process_data
process_data(df: DataFrame) -> pd.DataFrame

Process data for dot plot visualization.

Applies processing steps defined in configuration including filtering, grouping, aggregation, and sorting.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame | Input data. | required |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | Processed data ready for visualization. |

Source code in src/domain/plot_strategies/charts/dot_plot_strategy.py
def process_data(self, df: pd.DataFrame) -> pd.DataFrame:
    """
    Process data for dot plot visualization.

    Applies processing steps defined in configuration including filtering,
    grouping, aggregation, and sorting.

    Parameters
    ----------
    df : pd.DataFrame
        Input data.

    Returns
    -------
    pd.DataFrame
        Processed data ready for visualization.
    """
    logger.debug(f"Processing data: {len(df)} rows")

    processed_df = df.copy()

    # Get processing steps from config
    processing_steps = self.data_config.get("processing", {}).get("steps", [])

    for step in processing_steps:
        step_name = step.get("name")
        enabled = step.get("enabled", True)

        if not enabled:
            logger.debug(f"Skipping disabled step: {step_name}")
            continue

        params = step.get("params", {})

        if step_name == "filter":
            processed_df = self._apply_filter_step(processed_df, params)
        elif step_name == "group_and_count":
            processed_df = self._apply_grouping_step(processed_df, params)
        elif step_name == "sort":
            processed_df = self._apply_sort_step(processed_df, params)
        else:
            logger.warning(f"Unknown processing step: {step_name}")

    logger.info(
        f"Data processing completed: {len(processed_df)} rows "
        f"(from {len(df)} original rows)"
    )

    return processed_df
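
The step loop above can be sketched without pandas to show how `enabled` flags and unknown step names are handled. This is an illustration only: it swaps the `if`/`elif` chain for a dictionary dispatch, works on lists of dictionaries, and the handler parameters (`column`, `value`, `by`, `descending`) are hypothetical, since the private `_apply_*_step` helpers are not shown on this page.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)

Row = Dict[str, Any]
Handler = Callable[[List[Row], Dict[str, Any]], List[Row]]


def filter_rows(rows: List[Row], params: Dict[str, Any]) -> List[Row]:
    # Hypothetical params: keep rows where params["column"] equals params["value"].
    return [r for r in rows if r.get(params["column"]) == params["value"]]


def sort_rows(rows: List[Row], params: Dict[str, Any]) -> List[Row]:
    # Hypothetical params: sort by params["by"], optionally descending.
    return sorted(
        rows, key=lambda r: r[params["by"]], reverse=params.get("descending", False)
    )


HANDLERS: Dict[str, Handler] = {"filter": filter_rows, "sort": sort_rows}


def run_steps(rows: List[Row], steps: List[Dict[str, Any]]) -> List[Row]:
    """Apply enabled steps in order; warn on and skip unknown step names."""
    result = list(rows)
    for step in steps:
        name = step.get("name")
        if not step.get("enabled", True):
            logger.debug("Skipping disabled step: %s", name)
            continue
        handler = HANDLERS.get(name)
        if handler is None:
            logger.warning("Unknown processing step: %s", name)
            continue
        result = handler(result, step.get("params", {}))
    return result


rows = [
    {"sample": "A", "count": 3},
    {"sample": "B", "count": 7},
    {"sample": "A", "count": 5},
]
steps = [
    {"name": "filter", "params": {"column": "sample", "value": "A"}},
    {"name": "sort", "params": {"by": "count", "descending": True}},
    {"name": "sort", "enabled": False, "params": {"by": "count"}},  # skipped
    {"name": "normalize"},  # unknown: logged and ignored
]
print(run_steps(rows, steps))
```

As in the original loop, steps run strictly in the order they are listed, so a sort placed before a grouping step may be undone by it.
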
create_figure
create_figure(processed_df: DataFrame) -> go.Figure

Create dot plot figure from processed data.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| processed_df | DataFrame | Processed data ready for visualization. | required |

Returns:

| Type | Description |
| --- | --- |
| Figure | Plotly figure (scatter or bubble chart). |

Source code in src/domain/plot_strategies/charts/dot_plot_strategy.py
def create_figure(self, processed_df: pd.DataFrame) -> go.Figure:
    """
    Create dot plot figure from processed data.

    Parameters
    ----------
    processed_df : pd.DataFrame
        Processed data ready for visualization.

    Returns
    -------
    go.Figure
        Plotly figure (scatter or bubble chart).
    """
    return self.generate(processed_df)
generate
generate(data: DataFrame) -> go.Figure

Generate dot plot (scatter or bubble chart).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | DataFrame | Processed data ready for visualization. | required |

Returns:

| Type | Description |
| --- | --- |
| Figure | Plotly figure with scatter plot or bubble chart. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If the data is empty or required configuration is missing. |

Source code in src/domain/plot_strategies/charts/dot_plot_strategy.py
def generate(self, data: pd.DataFrame) -> go.Figure:
    """
    Generate dot plot (scatter or bubble chart).

    Parameters
    ----------
    data : pd.DataFrame
        Processed data ready for visualization.

    Returns
    -------
    go.Figure
        Plotly figure with scatter plot or bubble chart.

    Raises
    ------
    ValueError
        If data is empty or required configuration missing.
    """
    logger.info("Generating dot plot", extra={"rows": len(data)})

    # Validate data
    if data.empty:
        raise ValueError("Cannot create plot: DataFrame is empty")

    # Extract configuration
    scatter_config = self.plotly_config.get("scatter", {})
    layout_config = self.plotly_config.get("layout", {})

    # Get plot mode
    mode = scatter_config.get("mode", "simple")

    # Create figure based on mode
    if mode == "bubble":
        fig = self._create_bubble_chart(data, scatter_config)
    else:
        fig = self._create_simple_scatter(data, scatter_config)

    # Apply layout
    fig = self._apply_layout(fig, layout_config, scatter_config)

    logger.info(
        "Dot plot generated successfully", extra={"mode": mode, "points": len(data)}
    )

    return fig
apply_filters
apply_filters(df: DataFrame, filters: Optional[Dict[str, Any]] = None) -> pd.DataFrame

Apply filters to data.

This is a common implementation that can be overridden by subclasses if needed.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame | Data to filter. | required |
| filters | Optional[Dict[str, Any]] | Filter specifications. | None |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | Filtered data. |

Source code in src/domain/plot_strategies/base/base_plot_strategy.py
def apply_filters(
    self, df: pd.DataFrame, filters: Optional[Dict[str, Any]] = None
) -> pd.DataFrame:
    """
    Apply filters to data.

    This is a common implementation that can be overridden
    by subclasses if needed.

    Parameters
    ----------
    df : pd.DataFrame
        Data to filter.
    filters : Optional[Dict[str, Any]], default=None
        Filter specifications.

    Returns
    -------
    pd.DataFrame
        Filtered data.
    """
    import logging

    logger = logging.getLogger(__name__)

    if not filters:
        logger.debug("No filters provided, returning original data")
        return df

    logger.info(
        f"Applying filters - Input shape: {df.shape}, "
        f"Columns: {df.columns.tolist()}"
    )
    logger.info(f"Filters to apply: {filters}")

    filtered_df = df.copy()

    # Get filter configurations
    filter_configs = self.config.get("filters", [])

    for filter_config in filter_configs:
        filter_id = filter_config.get("filter_id")
        filter_type = filter_config.get("type")

        if filter_id not in filters:
            continue

        filter_value = filters[filter_id]
        data_binding = filter_config.get("data_binding", {})
        column = data_binding.get("column")

        if not column or column not in filtered_df.columns:
            logger.warning(
                f"Filter '{filter_id}': Column '{column}' not found. "
                f"Available: {filtered_df.columns.tolist()}"
            )
            continue

        # Apply range filter
        if filter_type == "range" and isinstance(filter_value, list):
            min_val, max_val = filter_value
            logger.info(
                f"Applying range filter on '{column}': [{min_val}, {max_val}]"
            )
            filtered_df = filtered_df[
                (filtered_df[column] >= min_val) & (filtered_df[column] <= max_val)
            ]
            logger.info(f"After filter: {len(filtered_df)} rows remaining")

    logger.info(f"Final filtered shape: {filtered_df.shape}")
    return filtered_df
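
To make the expected inputs concrete: in the listing above, each entry under the configuration's `filters` key carries a `filter_id`, a `type`, and a `data_binding.column`, while the `filters` argument maps a `filter_id` to its value. Only `type == "range"` with a list value is acted on; other filter types are skipped silently. The following sketch reproduces that behavior on lists of dictionaries. The function name `apply_range_filters`, the filter id, and the column names are hypothetical.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def apply_range_filters(
    rows: List[Row],
    filter_configs: List[Dict[str, Any]],
    filters: Optional[Dict[str, Any]] = None,
) -> List[Row]:
    """Apply inclusive [min, max] range filters declared in filter_configs."""
    if not filters:
        return rows
    filtered = list(rows)
    for cfg in filter_configs:
        filter_id = cfg.get("filter_id")
        if filter_id not in filters:
            continue
        column = cfg.get("data_binding", {}).get("column")
        # Skip filters bound to a column that is missing from the data.
        if not column or any(column not in row for row in filtered):
            continue
        value = filters[filter_id]
        if cfg.get("type") == "range" and isinstance(value, list):
            min_val, max_val = value
            filtered = [row for row in filtered if min_val <= row[column] <= max_val]
    return filtered


rows = [
    {"sample": "A", "count": 3},
    {"sample": "B", "count": 7},
    {"sample": "A", "count": 5},
]
filter_configs = [
    {"filter_id": "count_range", "type": "range", "data_binding": {"column": "count"}}
]
result = apply_range_filters(rows, filter_configs, {"count_range": [4, 8]})
print(result)
```

Both bounds are inclusive, matching the `>=` and `<=` comparisons in the original. A range value that is not a two-element list would fail at the unpacking step, in the sketch and in the listing alike.
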
apply_customizations
apply_customizations(fig: Figure, customizations: Optional[Any] = None) -> go.Figure

Apply custom styling to figure.

This is a hook for future customization features (FLEXIVEL and FLEXIVEL2).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| fig | Figure | Base figure. | required |
| customizations | Optional[Any] | Customization specifications. | None |

Returns:

| Type | Description |
| --- | --- |
| Figure | Customized figure. |

Source code in src/domain/plot_strategies/base/base_plot_strategy.py
def apply_customizations(
    self, fig: go.Figure, customizations: Optional[Any] = None
) -> go.Figure:
    """
    Apply custom styling to figure.

    This is a hook for future customization features
    (FLEXIVEL and FLEXIVEL2).

    Parameters
    ----------
    fig : go.Figure
        Base figure.
    customizations : Optional[Any], default=None
        Customization specifications.

    Returns
    -------
    go.Figure
        Customized figure.
    """
    # Hook for future implementation
    return fig
generate_plot
generate_plot(data: DataFrame, filters: Optional[Dict[str, Any]] = None, customizations: Optional[Any] = None) -> go.Figure

Generate complete plot (Template Method).

This method orchestrates the entire plot generation process:

  1. Validate input data
  2. Process data
  3. Apply filters
  4. Create figure
  5. Apply customizations

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | DataFrame | Input data. | required |
| filters | Optional[Dict[str, Any]] | Filters to apply. | None |
| customizations | Optional[Any] | Customizations to apply. | None |

Returns:

| Type | Description |
| --- | --- |
| Figure | Complete Plotly figure. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If validation fails. |

Source code in src/domain/plot_strategies/base/base_plot_strategy.py
def generate_plot(
    self,
    data: pd.DataFrame,
    filters: Optional[Dict[str, Any]] = None,
    customizations: Optional[Any] = None,
) -> go.Figure:
    """
    Generate complete plot (Template Method).

    This method orchestrates the entire plot generation process:
    1. Validate input data
    2. Process data
    3. Apply filters
    4. Create figure
    5. Apply customizations

    Parameters
    ----------
    data : pd.DataFrame
        Input data.
    filters : Optional[Dict[str, Any]], default=None
        Filters to apply.
    customizations : Optional[Any], default=None
        Customizations to apply.

    Returns
    -------
    go.Figure
        Complete Plotly figure.

    Raises
    ------
    ValueError
        If validation fails.
    """
    # 1. Validate
    self.validate_data(data)

    # 2. Process
    processed_df = self.process_data(data)

    # 3. Filter
    filtered_df = self.apply_filters(processed_df, filters)

    # 4. Create figure
    figure = self.create_figure(filtered_df)

    # 5. Apply customizations (hook for future)
    figure = self.apply_customizations(figure, customizations)

    return figure
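
The Template Method structure of `generate_plot` can be illustrated with a small, self-contained sketch. The classes below are hypothetical stand-ins (they are not `BasePlotStrategy` or `DotPlotStrategy`) and operate on plain lists so the call order is easy to observe.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class PipelineBase(ABC):
    """Hypothetical stand-in for a base strategy with a fixed step order."""

    def run(
        self,
        data: List[int],
        filters: Optional[Dict[str, Any]] = None,
        customizations: Optional[Any] = None,
    ) -> Dict[str, Any]:
        # The template method: subclasses fill in steps, not the order.
        self.validate(data)
        processed = self.process(data)
        filtered = self.apply_filters(processed, filters)
        figure = self.create(filtered)
        return self.apply_customizations(figure, customizations)

    @abstractmethod
    def validate(self, data: List[int]) -> None: ...

    @abstractmethod
    def process(self, data: List[int]) -> List[int]: ...

    @abstractmethod
    def create(self, data: List[int]) -> Dict[str, Any]: ...

    def apply_filters(self, data, filters=None):
        # Default hook: no filtering.
        return data

    def apply_customizations(self, figure, customizations=None):
        # Default hook: return the figure unchanged.
        return figure


class RecordingPipeline(PipelineBase):
    """Concrete pipeline that records which steps ran, in order."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def validate(self, data):
        self.calls.append("validate")
        if not data:
            raise ValueError("data is empty")

    def process(self, data):
        self.calls.append("process")
        return sorted(data)

    def create(self, data):
        self.calls.append("create")
        return {"points": data}


pipeline = RecordingPipeline()
figure = pipeline.run([3, 1, 2])
print(pipeline.calls)  # ['validate', 'process', 'create']
print(figure)          # {'points': [1, 2, 3]}
```

One consequence of the ordering in `generate_plot` is that filters run on the processed data, so a filter's bound column must still exist after steps such as grouping have been applied.
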
