UC-5.6 – Compound–Compound Interaction Network (Based on Shared Genes)

Module: 5 – Modeling Interactions of Samples, Genes, and Compounds
Visualization type: Weighted compound–compound network (shared-gene edges, force-directed layout)
Primary inputs: BioRemPP results table with compoundname and genesymbol columns
Primary outputs: Compound–compound interaction network weighted by number of shared genes; node-level connectivity (degree)

Scientific Question and Rationale

Question: Which chemical compounds share the most gene co-annotations across samples, and what co-annotation structure do these compound–compound relationships form?

This use case examines compound–compound co-annotation overlap: it identifies which compounds are annotated with overlapping sets of genes across all biological samples. Compounds that share many genes are candidates for relatedness at the pathway-annotation level, but structural or biochemical similarity cannot be inferred from annotation overlap alone and requires experimental validation. In the resulting network, an edge connects two compounds that share at least one gene, and the edge weight is the number of shared genes. The analysis may reveal compound co-annotation clusters, highly connected hub compounds, and bridge compounds that link distinct annotation groups.

Data and Inputs

Primary data source: BioRemPP_Results.xlsx or BioRemPP_Results.csv
Key columns:
compoundname – name (or identifier) of the chemical compound
genesymbol – gene symbol or identifier associated with that compound in at least one sample
Accepted format: semicolon-delimited text table (.txt or .csv)
Derived structures:
mapping of each compound to its set of unique genes,
weighted compound–compound edge list based on the count of shared genes.
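As a minimal sketch of reading such a table, the snippet below parses semicolon-delimited text with Python's standard `csv` module and keeps only the two key columns. The helper name `load_pairs`, the extra `sample` column, and the toy compound and gene values are assumptions made for illustration; they are not taken from the BioRemPP results file.

```python
import csv
import io

def load_pairs(handle, delimiter=";"):
    """Yield (compound, gene) pairs from a delimited table with a header row."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    for row in reader:
        compound = (row.get("compoundname") or "").strip()
        gene = (row.get("genesymbol") or "").strip()
        if compound and gene:  # skip rows where either value is missing
            yield compound, gene

# Toy stand-in for the results table (illustrative values only).
sample_table = io.StringIO(
    "sample;compoundname;genesymbol\n"
    "S1;Benzene;geneA\n"
    "S1;Toluene;geneA\n"
    "S2;Toluene;geneB\n"
    "S2;Phenol;\n"
)
pairs = list(load_pairs(sample_table))
print(pairs)
```

For a file on disk, the same function could be called on a handle from `open(path, newline="", encoding="utf-8")`.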

Analytical Workflow

Data Loading
The primary results table (BioRemPP_Results.xlsx or BioRemPP_Results.csv) is loaded; text exports are parsed as semicolon-delimited.
Compound-to-Gene Mapping
For each unique compoundname, a gene set is constructed:
all unique genesymbol entries associated with that compound are collected into a set,
this set represents the gene co-annotation profile of that compound.
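The mapping step can be sketched with a dictionary of sets, so that repeated compound–gene rows collapse automatically. The function name `build_gene_sets` and the example pairs are hypothetical.

```python
from collections import defaultdict

def build_gene_sets(pairs):
    """Collect the unique genes annotated to each compound."""
    gene_sets = defaultdict(set)
    for compound, gene in pairs:
        gene_sets[compound].add(gene)  # sets ignore duplicate rows
    return dict(gene_sets)

# Illustrative (compound, gene) rows; Toluene/geneB appears twice on purpose.
pairs = [
    ("Benzene", "geneA"), ("Benzene", "geneB"),
    ("Toluene", "geneB"), ("Toluene", "geneB"), ("Toluene", "geneC"),
    ("Phenol", "geneD"),
]
gene_sets = build_gene_sets(pairs)
print(gene_sets["Toluene"])
```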
Graph Construction (Compound–Compound Network)
A network graph is built where:
each unique compound is added as a node,
all unique pairs of compounds are evaluated; for each pair:
- the intersection of their gene sets is computed,
- if the intersection is non-empty, an edge is added between the two compounds,
- the edge weight is set to the number of shared unique genes.
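The pairwise evaluation above can be sketched with `itertools.combinations` and set intersection. This is one possible implementation under the stated link definition, not necessarily the tool's own code; the function name and the toy gene sets are assumptions.

```python
from itertools import combinations

def shared_gene_edges(gene_sets):
    """Return (compound_a, compound_b, weight) for every pair sharing genes."""
    edges = []
    # Sorting the names makes the output order deterministic.
    for a, b in combinations(sorted(gene_sets), 2):
        shared = gene_sets[a] & gene_sets[b]
        if shared:  # an edge exists only for a non-empty intersection
            edges.append((a, b, len(shared)))
    return edges

gene_sets = {
    "Benzene": {"geneA", "geneB"},
    "Toluene": {"geneA", "geneB", "geneC"},
    "Phenol": {"geneC"},
    "Xylene": {"geneD"},  # shares nothing, so it stays an isolated node
}
edges = shared_gene_edges(gene_sets)
print(edges)  # [('Benzene', 'Toluene', 2), ('Phenol', 'Toluene', 1)]
```

Because every unordered pair is checked once, the number of comparisons grows quadratically with the number of compounds (n(n-1)/2 pairs).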
Layout and Styling
A force-directed layout is used to compute node positions:
compounds with many strong connections tend to cluster toward the center,
sparsely connected compounds are positioned closer to the periphery.
Node attributes are then computed:
degree (number of connected compound neighbors) is calculated for each node,
this degree is mapped to node color to highlight highly connected compounds.
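A minimal sketch of the degree calculation follows. It counts neighbors from a weighted edge list and rescales the counts to the range 0 to 1, which is a common input for a continuous color scale. The rescaling step and all names are assumptions for illustration.

```python
def node_degrees(nodes, edges):
    """Count how many compound neighbors each node has (weights are ignored)."""
    degree = {node: 0 for node in nodes}
    for a, b, _weight in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

def normalize(values):
    """Rescale values to [0, 1] so they can index a continuous colormap."""
    top = max(values.values(), default=0)
    return {key: (value / top if top else 0.0) for key, value in values.items()}

nodes = ["Benzene", "Toluene", "Phenol", "Xylene"]
edges = [("Benzene", "Toluene", 2), ("Phenol", "Toluene", 1)]
degree = node_degrees(nodes, edges)
colors = normalize(degree)
print(degree)  # {'Benzene': 1, 'Toluene': 2, 'Phenol': 1, 'Xylene': 0}
```

Note that degree counts neighbors, not shared genes: Benzene has degree 1 even though its single edge has weight 2.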
Rendering
The network is rendered as an interactive plot:
nodes represent individual compounds,
edges represent compound–compound links based on shared genes,
edge thickness is proportional to edge weight (number of shared genes),
node color encodes degree (number of compound neighbors), with a color bar indicating the scale.
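One simple way to make edge thickness proportional to weight is to scale every weight against the heaviest edge. The sketch below assumes a hypothetical maximum line width of 6.0 (in whatever units the plotting library uses); the actual scaling used for the rendered figure is not specified here.

```python
def edge_widths(edges, max_width=6.0):
    """Map each (a, b, weight) edge to a line width proportional to its weight."""
    heaviest = max((weight for _a, _b, weight in edges), default=0)
    if heaviest == 0:
        return {}
    return {(a, b): max_width * weight / heaviest for a, b, weight in edges}

edges = [("Benzene", "Toluene", 2), ("Phenol", "Toluene", 1)]
widths = edge_widths(edges)
print(widths)  # {('Benzene', 'Toluene'): 6.0, ('Phenol', 'Toluene'): 3.0}
```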

How to Read the Plot

Nodes (Compounds)
Each point in the graph is one compound (a unique compoundname value):
its position is determined by the force-directed layout,
its color encodes its degree (how many other compounds it is connected to).
Edges (Compound–Compound Links)
Each line between two nodes represents a shared gene co-annotation link:
the two compounds share at least one common gene co-annotation,
the thickness of the edge is proportional to the number of shared gene co-annotations (edge weight).
Node Color Scale
A color bar indicates the range of node degrees:
brighter/warmer colors correspond to high-degree compounds (hubs),
cooler or darker colors correspond to compounds with fewer connections.
Overall Structure
The spatial arrangement may reflect the organization of compound co-annotation groups:
dense regions could correspond to clusters of compounds sharing many gene co-annotations,
more isolated nodes might suggest compounds with narrower or unique gene annotation profiles.

Representative Output

The image below illustrates a representative output generated by this use case using the example dataset.


Interpretation and Key Messages

Compound Co-annotation Clusters
Dense clusters of interconnected nodes may represent groups of compounds with shared gene co-annotations:
they could share structural similarity or annotation patterns in the database,
whether they are pathway-connected intermediates or metabolically related requires experimental validation.
Hub Compounds
Brightly colored, highly connected nodes may be hub compounds:
they share gene co-annotations with many other compounds,
they could represent broadly annotated compounds that appear across many gene annotations in the database.
Bridge Compounds
Compounds that connect distinct clusters may act as annotation bridges:
they could link two different annotation groups or chemical classes through shared gene co-annotations,
they may represent annotation points of interest connecting distinct compound subsets.
Narrowly Annotated Compounds
Nodes on the periphery with few connections may represent compounds with narrow gene co-annotation profiles:
compounds co-annotated with a distinct or limited set of genes,
possibly relevant for niche investigation scenarios or specific annotation contexts.

Reproducibility and Assumptions

Input Format
The analysis assumes a semicolon-delimited table containing at least the columns compoundname and genesymbol.
Link Definition
Two compounds are linked when their gene sets have at least one gene in common.
Edge weight is the number of unique genes shared by the two compounds.
Node color reflects connectivity to other compounds (degree), not the total number of unique genes co-annotated with each compound.
Network Properties
The network is undirected and weighted: edges encode symmetric relationships based on shared genes, and each edge weight equals the number of shared genes.
The force-directed layout can be made reproducible by fixing a random seed.
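To illustrate why a fixed seed makes the layout repeatable, here is a simplified Fruchterman-Reingold-style sketch in pure Python. It is not the layout routine used to produce the figure; the function name, constants, and cooling schedule are assumptions. The only randomness is the initial node placement, which comes from a seeded generator, so identical inputs and seed give identical coordinates. Graph libraries typically expose the same idea as a parameter, for example `seed` in NetworkX's `spring_layout`.

```python
import math
import random

def force_layout(nodes, edges, seed=42, iterations=100):
    """Simplified force-directed layout; the same seed gives the same positions."""
    rng = random.Random(seed)  # local generator, global random state untouched
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0 / math.sqrt(max(len(nodes), 1))  # target spacing between nodes
    for step in range(iterations):
        temp = 0.1 * (1 - step / iterations)  # cooling caps per-step movement
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[a][0] += dx / dist * push
                disp[a][1] += dy / dist * push
                disp[b][0] -= dx / dist * push
                disp[b][1] -= dy / dist * push
        # Attraction: linked nodes pull together, more strongly for heavy edges.
        for a, b, weight in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = weight * dist * dist / k
            disp[a][0] -= dx / dist * pull
            disp[a][1] -= dy / dist * pull
            disp[b][0] += dx / dist * pull
            disp[b][1] += dy / dist * pull
        # Move each node along its net displacement, limited by the temperature.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            scale = min(length, temp) / length
            pos[n][0] += disp[n][0] * scale
            pos[n][1] += disp[n][1] * scale
    return {n: tuple(p) for n, p in pos.items()}

nodes = ["Benzene", "Toluene", "Phenol", "Xylene"]
edges = [("Benzene", "Toluene", 2), ("Phenol", "Toluene", 1)]
layout_a = force_layout(nodes, edges, seed=42)
layout_b = force_layout(nodes, edges, seed=42)
print(layout_a == layout_b)  # True
```

A different seed will generally produce a rotated or rearranged picture of the same network, which is why the seed should be recorded alongside the figure.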
Interpretation Scope
The network captures co-annotation overlap patterns inferred from shared gene annotations; it does not directly encode chemical structure, thermodynamics, or kinetic parameters.
Connectivity in this network should be treated as hypothesis-generating evidence for chemical relatedness or pathway co-annotation; it requires further structural, biochemical, or environmental validation.

Activity diagram of the use case
