UC-5.6 – Compound–Compound Interaction Network (Based on Shared Genes)
Module: 5 – Modeling Interactions of Samples, Genes, and Compounds
Visualization type: Weighted compound–compound network (shared-gene edges, force-directed layout)
Primary inputs: BioRemPP results table with compoundname and genesymbol columns
Primary outputs: Compound–compound interaction network weighted by number of shared genes; node-level connectivity (degree)
Scientific Question and Rationale
Question: Which chemical compounds share the most gene co-annotations across samples, and what co-annotation structure do these compound–compound relationships form?
This use case examines compound–compound co-annotation overlap: it identifies which compounds are annotated with overlapping sets of genes across all biological samples. Compounds that share many genes may warrant investigation as potentially related at the pathway-annotation level, although structural or biochemical similarity requires experimental validation. In the resulting compound–compound network, an edge indicates that two compounds share at least one gene, and its weight is the number of shared genes. The analysis may reveal compound co-annotation clusters, highly connected hub compounds, and bridge compounds that link distinct annotation groups.
Data and Inputs
- Primary data source: BioRemPP_Results.xlsx or BioRemPP_Results.csv
- Key columns:
  - compoundname – name (or identifier) of the chemical compound
  - genesymbol – gene symbol or identifier associated with that compound in at least one sample
- Accepted format: semicolon-delimited text table (.txt or .csv)
- Derived structures:
  - mapping of each compound to its set of unique genes,
  - weighted compound–compound edge list based on the count of shared genes.
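The first derived structure, the compound-to-gene mapping, can be sketched with the Python standard library alone. This is a minimal illustration rather than the BioRemPP implementation: the function name `build_compound_gene_map` is hypothetical, the extra `sample` column is assumed for the demo, and the rows use placeholder compound and gene names, not values from the example dataset.

```python
import csv
import io
from collections import defaultdict


def build_compound_gene_map(handle, delimiter=";"):
    """Map each compound name to the set of unique gene symbols listed with it."""
    compound_genes = defaultdict(set)
    for row in csv.DictReader(handle, delimiter=delimiter):
        compound = (row.get("compoundname") or "").strip()
        gene = (row.get("genesymbol") or "").strip()
        if compound and gene:  # skip rows missing either value
            compound_genes[compound].add(gene)
    return dict(compound_genes)


# Placeholder rows for illustration only (assumed layout, not real annotations).
SAMPLE = """sample;compoundname;genesymbol
S1;CompoundA;gene1
S1;CompoundA;gene2
S2;CompoundA;gene1
S1;CompoundB;gene1
S2;CompoundB;gene3
S2;CompoundC;gene4
"""

compound_genes = build_compound_gene_map(io.StringIO(SAMPLE))
for compound, genes in sorted(compound_genes.items()):
    print(compound, sorted(genes))
```

Because genes are collected into sets, the repeated CompoundA/gene1 row from a second sample does not inflate the profile. To read a file on disk instead, pass an open text handle, for example `open(path, newline="")`.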
Analytical Workflow
- Data Loading
  The primary results table (BioRemPP_Results.xlsx or BioRemPP_Results.csv) is loaded from its semicolon-delimited format.
- Compound-to-Gene Mapping
  For each unique compoundname, a gene set is constructed:
  - all unique genesymbol entries associated with that compound are collected into a set,
  - this set represents the gene co-annotation profile of that compound.
- Graph Construction (Compound–Compound Network)
  A network graph is built where:
  - each unique compound is added as a node,
  - all unique pairs of compounds are evaluated; for each pair:
    - the intersection of their gene sets is computed,
    - if the intersection is non-empty, an edge is added between the two compounds,
    - the edge weight is set to the number of shared unique genes.
- Layout and Styling
  A force-directed layout is used to compute node positions:
  - compounds with many strong connections tend to cluster toward the center,
  - sparsely connected compounds are positioned closer to the periphery.
  Node attributes are then computed:
  - degree (number of connected compound neighbors) is calculated for each node,
  - this degree is mapped to node color to highlight highly connected compounds.
- Rendering
  The network is rendered as an interactive plot:
  - nodes represent individual compounds,
  - edges represent compound–compound links based on shared genes,
  - edge thickness is proportional to edge weight (number of shared genes),
  - node color is proportional to degree (number of compound neighbors), with a color bar indicating the scale.
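The graph-construction and degree steps of the workflow can be sketched as follows. This is an illustrative, standard-library version under stated assumptions: the function names `build_shared_gene_edges` and `node_degrees` are hypothetical, the input dictionary uses placeholder names, and the actual implementation may rely on a graph library instead.

```python
from itertools import combinations


def build_shared_gene_edges(compound_genes):
    """Return {(compound_a, compound_b): weight} for pairs sharing at least one gene."""
    edges = {}
    # combinations() visits every unordered pair exactly once (undirected graph).
    for a, b in combinations(sorted(compound_genes), 2):
        shared = compound_genes[a] & compound_genes[b]  # set intersection
        if shared:  # only non-empty intersections create an edge
            edges[(a, b)] = len(shared)  # weight = number of shared unique genes
    return edges


def node_degrees(compound_genes, edges):
    """Count, for every compound, how many other compounds it is linked to."""
    degrees = {compound: 0 for compound in compound_genes}  # isolated nodes stay at 0
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


# Placeholder gene sets for illustration only.
compound_genes = {
    "CompoundA": {"gene1", "gene2", "gene3"},
    "CompoundB": {"gene2", "gene3"},
    "CompoundC": {"gene3", "gene4"},
    "CompoundD": {"gene5"},
}

edges = build_shared_gene_edges(compound_genes)
degrees = node_degrees(compound_genes, edges)
print(edges)    # A-B has weight 2; A-C and B-C have weight 1
print(degrees)  # CompoundD has degree 0
```

In this toy input CompoundA has three genes and CompoundB has two, yet both have degree 2, which shows that degree counts neighbors rather than annotated genes. The pairwise loop is quadratic in the number of compounds, which is usually acceptable for annotation tables of moderate size.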
How to Read the Plot
- Nodes (Compounds)
  Each point in the graph represents one compound (a unique compoundname value):
  - its position is determined by the force-directed layout,
  - its color encodes its degree (how many other compounds it is connected to).
- Edges (Compound–Compound Links)
  Each line between two nodes represents a shared gene co-annotation link:
  - the two compounds share at least one common gene co-annotation,
  - the thickness of the edge is proportional to the number of shared gene co-annotations (edge weight).
- Node Color Scale
  A color bar indicates the range of node degrees:
  - brighter/warmer colors correspond to high-degree compounds (hubs),
  - cooler or darker colors correspond to compounds with fewer connections.
- Overall Structure
  The spatial arrangement may reflect the organization of compound co-annotation groups:
  - dense regions could correspond to clusters of compounds sharing many gene co-annotations,
  - more isolated nodes might suggest compounds with narrower or unique gene annotation profiles.
Representative Output
The image below illustrates a representative output generated by this use case using the example dataset.
Interpretation and Key Messages
- Compound Co-annotation Clusters
  Dense clusters of interconnected nodes may represent groups of compounds with shared gene co-annotations:
  - they could share structural similarity or annotation patterns in the database,
  - whether they are pathway-connected intermediates or metabolically related requires experimental validation.
- Hub Compounds
  Brightly colored, highly connected nodes may be hub compounds:
  - they share gene co-annotations with many other compounds,
  - they could represent broadly annotated compounds that appear across many gene annotations in the database.
- Bridge Compounds
  Compounds that connect distinct clusters may act as annotation bridges:
  - they could link two different annotation groups or chemical classes through shared gene co-annotations,
  - they may represent annotation points of interest connecting distinct compound subsets.
- Narrowly Annotated Compounds
  Nodes on the periphery with few connections may represent compounds with narrow gene co-annotation profiles:
  - compounds co-annotated with a distinct or limited set of genes,
  - possibly relevant for niche investigation scenarios or specific annotation contexts.
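Visual impressions of clusters and bridges can be cross-checked numerically. The sketch below is one possible approach and not part of the documented workflow: it uses connected components as a coarse stand-in for clusters and flags a compound as a bridge candidate when removing it splits the network into more components (an articulation point, found here by brute force). Function names and the placeholder edge list are hypothetical.

```python
from collections import deque


def connected_components(nodes, edges):
    """Group nodes into connected components using breadth-first search."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node] - seen:
                seen.add(neighbor)
                queue.append(neighbor)
        components.append(component)
    return components


def bridge_candidates(nodes, edges):
    """Nodes whose removal increases the number of connected components."""
    baseline = len(connected_components(nodes, edges))
    found = []
    for n in sorted(nodes):
        rest = [m for m in nodes if m != n]
        kept = {pair: w for pair, w in edges.items() if n not in pair}
        if len(connected_components(rest, kept)) > baseline:
            found.append(n)
    return found


# Placeholder network for illustration only: edge weights are shared-gene counts.
nodes = ["CompoundA", "CompoundB", "CompoundC", "CompoundD", "CompoundE", "CompoundF"]
edges = {
    ("CompoundA", "CompoundB"): 3,
    ("CompoundB", "CompoundC"): 1,
    ("CompoundD", "CompoundE"): 2,
}

components = connected_components(nodes, edges)
bridges = bridge_candidates(nodes, edges)
print(components)
print(bridges)
```

In this toy network CompoundB links CompoundA and CompoundC and is the only bridge candidate, while CompoundF forms its own single-node component. Such labels describe annotation overlap only and carry the same need for experimental validation as the visual reading.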
Reproducibility and Assumptions
- Input Format
  The analysis assumes a semicolon-delimited table containing at least the columns compoundname and genesymbol.
- Link Definition
  - A link between two compounds is defined by the presence of at least one shared gene co-annotation in their annotation sets.
  - Edge weight is the number of shared unique gene co-annotations.
  - Node color reflects connectivity to other compounds (degree), not the total number of unique genes co-annotated with each compound.
- Network Properties
  - The network is typically treated as undirected and weighted: edges encode symmetric relationships based on shared genes, and each edge weight equals the number of shared unique genes.
  - The force-directed layout can be made reproducible by fixing a random seed.
- Interpretation Scope
  - The network captures co-annotation overlap patterns inferred from shared gene annotations; it does not directly encode chemical structure, thermodynamics, or kinetic parameters.
  - Co-connectivity should be treated as hypothesis-generating evidence for chemical relatedness or pathway co-annotation, requiring further structural, biochemical, or environmental validation.
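The point about fixing a random seed can be illustrated with a small force-directed layout written from scratch. This is a simplified Fruchterman-Reingold-style sketch, not the layout routine used to produce the figures: the function name, the default parameters, and the choice to scale attraction by edge weight are all assumptions. If the implementation uses NetworkX (also an assumption), `spring_layout` accepts a `seed` argument for the same purpose.

```python
import math
import random


def force_directed_layout(nodes, edges, seed=42, iterations=50):
    """Return {node: (x, y)}; identical inputs and seed give identical positions."""
    rng = random.Random(seed)  # local generator: the seed fixes the starting positions
    nodes = sorted(nodes)      # fixed order so results do not depend on input ordering
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0 / math.sqrt(max(len(nodes), 1))  # ideal distance between nodes
    temperature = 0.1                        # maximum step per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Attraction along edges, scaled by weight (an assumption of this sketch).
        for (a, b), weight in sorted(edges.items()):
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = weight * dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node along its net displacement, capped by the temperature.
        for n in nodes:
            length = max(math.hypot(disp[n][0], disp[n][1]), 1e-9)
            step = min(length, temperature)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature *= 0.95  # cool down so the layout settles
    return {n: (p[0], p[1]) for n, p in pos.items()}


# Placeholder network for illustration only.
nodes = ["CompoundA", "CompoundB", "CompoundC", "CompoundD"]
edges = {("CompoundA", "CompoundB"): 2, ("CompoundB", "CompoundC"): 1}

layout_first = force_directed_layout(nodes, edges, seed=7)
layout_again = force_directed_layout(nodes, edges, seed=7)
layout_other = force_directed_layout(nodes, edges, seed=8)
print(layout_first == layout_again)  # same seed, same coordinates
```

Only the starting positions are random, so a fixed seed together with a fixed node order is enough to reproduce the coordinates on the same platform. A different seed generally yields a rotated or rearranged picture of the same network, which is why node positions should not be over-interpreted.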
Activity diagram of the use case