UC-2.4 – Ranking of Compound Richness by Gene Count per Chemical Class

Module: 2 – Exploratory Analysis: Ranking the Functional Potential of Samples and Compounds
Visualization type: Interactive bar chart (unique gene count per compound, by chemical class)
Primary inputs: BioRemPP results table with compoundclass, compoundname, and genesymbol columns
Primary outputs: Ranked list of compounds by number of unique genes, within a selected chemical class

Scientific Question and Rationale

Question: Within a specific chemical class, which compounds are associated with the greatest diversity of unique genes in the dataset, and what hypotheses might this pattern suggest about the involvement of varied enzymatic functions?

This use case ranks compounds according to the diversity of unique genes co-annotated with them, within a user-selected chemical class. The resulting bar chart highlights which compounds show the greatest gene diversity in their annotations for a selected category.

Data and Inputs

Primary data source: BioRemPP_Results.xlsx or BioRemPP_Results.csv
Key columns:
compoundclass – categorical label defining the chemical class (e.g., Aromatics, Aliphatics, Chlorinated)
compoundname – name of the chemical compound
genesymbol – gene symbol or identifier associated with the interaction
Accepted format: semicolon-delimited text table (.txt or .csv)
Entities of interest: compounds, grouped by chemical class, with counts of unique associated genes
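
The input requirements above can be checked before any ranking is attempted. The following is a minimal sketch using only the Python standard library, assuming a semicolon-delimited text table. The function name `read_results` and the sample rows are hypothetical and serve only as illustration; they are not part of BioRemPP.

```python
import csv
import io

# Column names taken from the "Key columns" list above.
REQUIRED_COLUMNS = ("compoundclass", "compoundname", "genesymbol")

# Hypothetical sample rows, for illustration only (not real BioRemPP output).
SAMPLE_TEXT = """compoundclass;compoundname;genesymbol
Aromatics;compound_A;geneX
Aromatics;compound_A;geneY
Aliphatics;compound_B;geneX
"""


def read_results(handle):
    """Parse a semicolon-delimited table and verify the required columns."""
    reader = csv.DictReader(handle, delimiter=";")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return list(reader)


rows = read_results(io.StringIO(SAMPLE_TEXT))
print(len(rows), sorted({row["compoundclass"] for row in rows}))
```

For a file on disk, the same function could be called with `open(path, newline="", encoding="utf-8")` in place of the in-memory `io.StringIO` buffer; the encoding is an assumption and may need adjusting.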

Analytical Workflow

User Selection
The user selects a compoundclass from an interactive dropdown menu.
Dynamic Filtering
The primary results table (BioRemPP_Results.xlsx or BioRemPP_Results.csv) is filtered to retain only rows corresponding to the selected compoundclass. All further calculations are performed on this subset.
Aggregation
The filtered data are grouped by compoundname. Within each group, the number of distinct gene symbols is computed (e.g., using nunique() on the genesymbol column). This yields a per-compound count of unique genes co-annotated with that compound.
Sorting and Rendering
The aggregated results (compound vs. unique gene count) are sorted in descending order and rendered as a bar chart:
one axis lists Compounds within the selected class,
the other axis represents the count of unique genes associated with each compound, and
bar height/length is proportional to the unique gene count.
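
The filter, aggregate, and sort steps above can be sketched in a few lines. This is an illustrative sketch, not the BioRemPP implementation: the function name `rank_compounds`, the sample rows, and the alphabetical tie-break for equal counts are all assumptions made for the example. A text bar stands in for the interactive chart.

```python
from collections import defaultdict

# Hypothetical rows for illustration; a real run would use the parsed results table.
rows = [
    {"compoundclass": "Aromatics", "compoundname": "compound_A", "genesymbol": "geneX"},
    {"compoundclass": "Aromatics", "compoundname": "compound_A", "genesymbol": "geneY"},
    {"compoundclass": "Aromatics", "compoundname": "compound_A", "genesymbol": "geneX"},
    {"compoundclass": "Aromatics", "compoundname": "compound_B", "genesymbol": "geneZ"},
    {"compoundclass": "Aliphatics", "compoundname": "compound_C", "genesymbol": "geneX"},
]


def rank_compounds(rows, selected_class):
    """Return (compoundname, unique gene count) pairs for one chemical class."""
    genes_by_compound = defaultdict(set)
    for row in rows:
        # Dynamic filtering: keep only rows of the selected class.
        if row["compoundclass"] == selected_class:
            # Aggregation: a set keeps each gene symbol once per compound.
            genes_by_compound[row["compoundname"]].add(row["genesymbol"])
    counts = [(name, len(genes)) for name, genes in genes_by_compound.items()]
    # Sorting: descending by count; ties broken by name (an assumption).
    return sorted(counts, key=lambda item: (-item[1], item[0]))


ranking = rank_compounds(rows, "Aromatics")
for name, count in ranking:
    # Bar length is proportional to the unique gene count.
    print(f"{name:<12} {'#' * count} {count}")
```

With pandas, the same result would typically come from filtering the DataFrame on compoundclass, then calling `groupby("compoundname")["genesymbol"].nunique()` followed by `sort_values(ascending=False)`.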

How to Read the Plot

Dropdown Menu
Use the dropdown to select the Chemical Class (compoundclass) of interest. The plot updates automatically to show only compounds belonging to that class.
Compound Axis
One axis (typically the X-axis) lists the individual Compounds within the selected class.
Gene Count Axis
The other axis (typically the Y-axis) represents the count of unique genes (genesymbol) associated with each compound.
Bar Height and Labels
The height (or length) of each bar, along with optional numeric labels, indicates the total number of unique genes co-annotated with that compound. Taller bars correspond to compounds with a larger number of co-annotated genes in the dataset.

Representative Output

The image below illustrates a representative output generated by this use case using the example dataset.


Interpretation and Key Messages

High Gene Diversity
Compounds with taller bars are associated with a larger and more diverse set of genes in the annotation data. This may suggest the involvement of multiple functional roles in related pathways, though the annotation diversity does not confirm the complexity or number of steps in any biotransformation process.
Low Gene Diversity
Compounds with shorter bars, especially those with a unique gene count of 1, are co-annotated with fewer genes. This may reflect a narrower annotation profile or more limited co-occurrence data, rather than confirmed enzymatic simplicity.
Comparing Chemical Classes
By switching between chemical classes in the dropdown, users can compare gene co-annotation patterns across classes. This can help explore whether certain classes show broader gene diversity in their annotations, generating hypotheses about differences in functional complexity that would require experimental follow-up to validate.

Reproducibility and Assumptions

Input Format
The analysis assumes a semicolon-delimited table containing at least the columns compoundclass, compoundname, and genesymbol.
Uniqueness Definition
The ranking is based on the count of unique genes per compound. If a single gene is associated multiple times with the same compound (e.g., across different samples or conditions), it is counted only once for that compound.
Gene-Level Interpretation
Gene counts reflect annotation co-occurrence density per compound. The visualization does not incorporate expression level, gene copy number, or kinetic parameters, and does not confirm the complexity of any biotransformation process. These aspects would require additional experimental or analytical layers.
Class-Specific Context
All rankings and interpretations are conditional on the selected chemical class. A compound may rank highly for unique gene count in one class yet be absent from another simply because it is not annotated there.
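
The uniqueness and class-context assumptions can be illustrated with a small worked example. The records below are hypothetical, and the sample field is an assumed extra column used only to show how repeated associations might arise; it is not one of the required inputs.

```python
# Hypothetical records: (sample, compoundclass, compoundname, genesymbol).
records = [
    ("sample_1", "Aromatics", "compound_A", "geneX"),
    ("sample_2", "Aromatics", "compound_A", "geneX"),  # same gene, different sample
    ("sample_2", "Aromatics", "compound_A", "geneY"),
]

# Raw association rows for compound_A, duplicates included.
raw_rows = sum(1 for _, _, name, _ in records if name == "compound_A")

# Unique genes for compound_A: the repeated geneX collapses to one entry.
unique_genes = {gene for _, _, name, gene in records if name == "compound_A"}

# Class-specific context: compound_A has no annotations under Aliphatics,
# so it would simply not appear when that class is selected.
in_other_class = {
    gene
    for _, cls, name, gene in records
    if cls == "Aliphatics" and name == "compound_A"
}

print(raw_rows, len(unique_genes), len(in_other_class))
```

Here three association rows yield a unique gene count of two, which is the value the ranking would use.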

Activity diagram of the use case
