
UC-4.9 — Profiling of Sample Enzymatic Activity

Module: 4 – Functional and Genetic Profiling
Visualization type: Interactive bar chart (enzyme activity vs. gene diversity)
Primary inputs: BioRemPP_Results.xlsx or BioRemPP_Results.csv (sample–enzyme–gene associations)
Primary outputs: Ranked enzymatic activity profile per sample


Scientific Question and Rationale

Question: For any given sample, which enzymatic functions are most broadly represented, as measured by the diversity of unique gene annotations associated with them?

The same high-level enzymatic activity (e.g., oxidoreductase, hydrolase) can be annotated across many distinct genes. Quantifying how many unique gene annotations are associated with each enzymatic function can reveal:

  • which activities have the broadest gene annotation coverage for that sample, and
  • which annotated functions are most represented in the dataset for that sample.

Data and Inputs

  • Primary data source: BioRemPP_Results.xlsx or BioRemPP_Results.csv (semicolon-delimited)

  • Key columns:

    • sample – identifier for each biological sample
    • enzyme_activity – functional label for enzymatic activity (e.g., oxidoreductase, transferase)
    • genesymbol – gene symbols mapped to that enzymatic activity in a given sample

  • User control:

    • Dropdown – Sample: all unique sample identifiers available in the dataset.

  • Output structure:

    • X-axis: enzymatic activities (enzyme_activity)
    • Y-axis: number of distinct genesymbol values per activity for the selected sample
    • Bars: one bar per enzymatic activity, ranked by gene diversity
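As a minimal sketch of how such a table might be loaded, assuming pandas is available and the CSV variant is used: the inline rows below are invented for illustration (the sample and gene names are hypothetical), and the variable names are not taken from BioRemPP itself.

```python
import io

import pandas as pd

# Hypothetical stand-in for BioRemPP_Results.csv (semicolon-delimited).
# These rows are invented purely for illustration.
RAW = """sample;enzyme_activity;genesymbol
S1;oxidoreductase;geneA
S1;oxidoreductase;geneB
S1;hydrolase;geneC
S2;transferase;geneD
"""

# sep=";" matches the semicolon-delimited format described above.
df = pd.read_csv(io.StringIO(RAW), sep=";")

# Check that the three key columns are present before going further.
required = {"sample", "enzyme_activity", "genesymbol"}
missing = required - set(df.columns)
if missing:
    raise ValueError(f"missing columns: {sorted(missing)}")

# Unique sample identifiers, e.g. to populate the Sample dropdown.
sample_options = sorted(df["sample"].dropna().unique())
print(sample_options)
```

For a file on disk, the same call would presumably take the path instead of the in-memory buffer, e.g. `pd.read_csv("BioRemPP_Results.csv", sep=";")`.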

Analytical Workflow

  1. User Selection

    • The user selects a sample from the interactive dropdown menu.
    • This choice defines the focal organism or consortium for profiling.

  2. Dynamic Filtering

    • The BioRemPP_Results.xlsx or BioRemPP_Results.csv table is filtered to retain only rows matching the selected sample.
    • Rows with missing enzyme_activity or genesymbol are discarded to ensure valid associations.

  3. Aggregation

    • The filtered data is grouped by enzyme_activity.
    • For each enzymatic activity, the number of distinct gene symbols is computed (e.g., using nunique() on genesymbol).
    • The result is a summary table:
      • one row per enzyme_activity,
      • one value: unique_gene_count.

  4. Sorting and Rendering

    • Enzymatic activities are sorted in descending order of unique_gene_count.
    • A bar chart is rendered:
      • X-axis: enzyme_activity,
      • Y-axis: unique_gene_count,
      • bars labelled with their exact counts for clarity.
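The filtering, aggregation, and sorting steps above can be sketched with pandas as follows. This is an illustrative sketch rather than the BioRemPP implementation: the function name `enzyme_activity_profile` and the example rows are hypothetical, and the rendering step is left out because the source does not name the charting library.

```python
import pandas as pd


def enzyme_activity_profile(df: pd.DataFrame, sample: str) -> pd.DataFrame:
    """Return unique gene counts per enzyme_activity for one sample."""
    # Dynamic filtering: keep only rows for the selected sample.
    subset = df[df["sample"] == sample]
    # Discard rows lacking an activity label or a gene symbol.
    subset = subset.dropna(subset=["enzyme_activity", "genesymbol"])
    # Aggregation: count distinct gene symbols per activity.
    profile = (
        subset.groupby("enzyme_activity")["genesymbol"]
        .nunique()
        .reset_index(name="unique_gene_count")
    )
    # Sorting: rank activities by gene diversity, highest first.
    return profile.sort_values(
        "unique_gene_count", ascending=False, kind="stable"
    ).reset_index(drop=True)


# Invented example rows; names are hypothetical.
df = pd.DataFrame(
    {
        "sample": ["S1", "S1", "S1", "S1", "S1", "S2"],
        "enzyme_activity": [
            "oxidoreductase", "oxidoreductase", "oxidoreductase",
            "hydrolase", None, "transferase",
        ],
        "genesymbol": ["geneA", "geneB", "geneA", "geneC", "geneD", "geneE"],
    }
)

profile = enzyme_activity_profile(df, "S1")
print(profile)
```

In this toy input, the repeated oxidoreductase/geneA row and the row with a missing activity label do not contribute, so the S1 profile has oxidoreductase with 2 unique genes followed by hydrolase with 1.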

How to Read the Plot

  • Dropdown Menu – Sample Selection

    • Choose a sample to analyze.
    • The bar chart updates to show the enzymatic activity profile for that specific sample.

  • X-axis – Enzymatic Activities

    • Each tick corresponds to a distinct enzyme_activity annotated in the selected sample.
    • Examples include hydrolase, monooxygenase, transferase, and oxidoreductase.

  • Y-axis – Gene Diversity per Activity

    • The vertical value for each bar is the count of unique genes (genesymbol) mapped to that activity in the chosen sample.

  • Bars – Genetic Support for Each Activity

    • The height and numeric label of each bar indicate how many distinct genes support that activity.
    • Taller bars represent enzymatic functions backed by a more diverse gene set.

Representative Output

The image below illustrates a representative output generated by this use case using the example dataset.

Representative output for UC-4.9


Interpretation and Key Messages

  • Enzymatic Functions with High Gene Annotation Diversity

    • Tall bars mark activities that are annotated across many distinct genes in the selected sample, meaning that a large number of gene annotations are linked to that activity in the dataset.

  • Concentrated vs. Distributed Annotation Profiles

    • A profile dominated by a small set of activities (e.g., many genes annotated as oxidoreductase or monooxygenase) may indicate concentrated annotation coverage in those functional categories.

  • Breadth vs. Depth of Gene Annotations

    • A wide spread of activities with moderate gene counts may suggest broad annotation coverage across many enzymatic functions.
    • A few activities with very high gene counts may indicate concentrated annotation depth in those specific functional categories.

  • Comparative Profiling Across Samples

    • By switching the selected sample, one can:
      • compare which activities have the most gene annotations across different samples,
      • identify samples with different annotation profiles (e.g., one with many hydrolase annotations, another with many monooxygenase annotations), and
      • generate annotation-based hypotheses for experimental follow-up.

Reproducibility and Assumptions

  • Input Format

    • The analysis requires a semicolon-delimited table with at least:
      • sample,
      • enzyme_activity,
      • genesymbol.

  • Presence and Counting Rules

    • Each bar's value is the number of unique gene symbols associated with that activity for the selected sample.
    • Multiple occurrences of the same (sample, enzyme_activity, genesymbol) combination do not increase the count; they are treated as a single gene providing that activity.

  • Scope and Limitations

    • The chart reflects annotated gene diversity per enzymatic function, not measured expression or activity levels.
    • enzyme_activity labels depend on upstream annotation pipelines; misannotations or incomplete mappings will affect the profile.
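The counting rule stated above (repeated sample, enzyme_activity, genesymbol combinations count once) amounts to set semantics. A small sketch using only the Python standard library, with invented records and hypothetical gene names, might look like this:

```python
from collections import defaultdict

# Invented records: (sample, enzyme_activity, genesymbol).
records = [
    ("S1", "hydrolase", "geneC"),
    ("S1", "hydrolase", "geneC"),  # exact duplicate: must not add to the count
    ("S1", "hydrolase", "geneF"),
    ("S1", "oxidoreductase", "geneA"),
    ("S2", "hydrolase", "geneZ"),  # other sample: ignored for S1
]

selected_sample = "S1"

# A set per activity keeps each gene symbol at most once.
genes_by_activity = defaultdict(set)
for sample, activity, gene in records:
    if sample == selected_sample:
        genes_by_activity[activity].add(gene)

unique_gene_count = {a: len(g) for a, g in genes_by_activity.items()}
print(unique_gene_count)  # {'hydrolase': 2, 'oxidoreductase': 1}
```

Because the counts come from annotations alone, they say nothing about expression or measured activity, in line with the limitations listed above.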

Activity diagram of the use case

Activity diagram of the use case