UC-5.5 — Gene–Gene Interaction Network (Based on Shared Compounds)¶

Module: 5 – Modeling Interactions of Samples, Genes, and Compounds
Visualization type: Weighted gene–gene network (shared-compound edges, force-directed layout)
Primary inputs: BioRemPP results table with genesymbol and compoundname columns
Primary outputs: Gene–gene interaction network weighted by number of shared compounds; node-level connectivity (degree)

Scientific Question and Rationale¶

Question: Which genes share the most compound co-annotations across samples, and what co-annotation structure do these gene–gene relationships form?

This use case examines gene–gene co-annotation overlap by identifying which genes are co-annotated with overlapping sets of chemical compounds across all biological samples. Genes that share many compound co-annotations could warrant investigation as potential functional partners, though experimental validation is required. By constructing a gene–gene network where edges represent shared compound co-annotations and edge weights encode the number of these shared annotations, the analysis may highlight co-annotation clusters, highly connected hub genes, and potential bridge genes that connect distinct annotation subsets.

Data and Inputs¶

Primary data source: BioRemPP_Results.xlsx or BioRemPP_Results.csv
Key columns:
genesymbol – gene symbol or identifier
compoundname – name (or identifier) of the chemical compound associated with that gene in at least one sample
Accepted format: semicolon-delimited text table (.txt or .csv)
Derived structures:
mapping of each gene to its set of unique compounds,
weighted gene–gene edge list based on the count of shared compounds.

Analytical Workflow¶

Data Loading
The primary results table (BioRemPP_Results.xlsx or BioRemPP_Results.csv) is loaded from its semicolon-delimited format.
Gene-to-Compound Mapping
For each unique genesymbol, a compound set is constructed:
all unique compoundname entries associated with that gene are collected into a set,
this set represents the compound co-annotation profile of that gene.
Graph Construction (Gene–Gene Network)
A network graph is built where:
each unique gene is added as a node,
all unique pairs of genes are evaluated; for each pair:
- the intersection of their compound sets is computed,
- if the intersection is non-empty, an edge is added between the two genes,
- the edge weight is set to the number of shared unique compounds.
Layout and Styling
A force-directed layout is used to compute node positions:
genes with many strong connections tend to be drawn closer to one another, forming clusters,
sparsely connected genes are placed closer to the periphery.
Node attributes are then computed:
degree (number of connected gene neighbors) is calculated for each node,
this degree is mapped to node color to highlight highly connected genes.
Rendering
The network is rendered as an interactive plot:
nodes represent individual genes,
edges represent gene–gene links based on shared compounds,
edge thickness is proportional to edge weight (number of shared compounds),
node color is proportional to degree (number of gene neighbors), with a color bar indicating the scale.

How to Read the Plot¶

Nodes (Genes)
Each point in the graph is a Gene Symbol:
the position is determined by the force-directed layout,
the color of a node encodes its degree (how many other genes it is connected to).
Edges (Gene–Gene Links) Each line between two nodes represents a shared compound co-annotation link:
two genes share at least one common compound co-annotation,
the thickness of the edge is proportional to the number of shared compound co-annotations (edge weight).
Color Scale for Nodes
A color bar indicates the range of node degrees:
nodes with brighter/warmer colors correspond to high-degree genes (hubs),
nodes with cooler or darker colors correspond to lower-degree genes.
Overall Structure
The spatial arrangement may reflect the network's modular organization:
dense clusters could indicate groups of genes with many shared compound partners,
sparsely connected or isolated nodes might indicate more specialized or infrequent relationships.

Representative Output¶

The image below illustrates a representative output generated by this use case using the example dataset.

Click on the image to enlarge and explore details.

Interpretation and Key Messages¶

Co-annotation Clusters Dense clusters of interconnected nodes may represent gene co-annotation groups:
genes in the same cluster tend to share many compound co-annotations,
they are candidates for belonging to related annotation contexts, though pathway membership requires experimental validation.
Hub Genes Brightly colored, centrally located nodes with many edges may be broadly co-annotated hub genes:
they share compound co-annotations with many other genes,
they could represent genes with broad annotation coverage across many compounds in the database.
Bridge Genes Genes that connect otherwise distinct clusters may act as annotation bridges:
they could link different annotation groups or chemical families,
they represent genes whose co-annotation patterns span multiple compound subsets.
Peripheral Genes Nodes on the network's periphery with few connections may represent narrowly co-annotated genes:
they could be relevant for rare or niche compound co-annotations,
they may still be important in targeted investigation scenarios even if not highly connected.

Reproducibility and Assumptions¶

Input Format
The analysis assumes a semicolon-delimited table containing at least the columns genesymbol and compoundname.
Link Definition
A link between two genes is defined by the presence of at least one shared compound co-annotation in their annotation sets.
Edge weight is the number of shared unique compound co-annotations.
Node color reflects gene–gene connectivity (degree), not the total number of gene–compound co-annotations.
Network Properties
The network is typically treated as undirected and weighted: directionality is not inferred, but the strength of association is encoded in edge weights.
The layout is based on a force-directed algorithm that can be made reproducible by fixing a random seed.
Interpretation Scope
The network captures association patterns inferred from shared compound targets; it does not directly encode regulatory direction, reaction stoichiometry, or kinetic parameters.
Co-connectivity should be interpreted as hypothesis-generating evidence for functional relationships that require additional biochemical, genomic, or regulatory validation.

Activity diagram of the use case¶

Click on the image to enlarge and explore details.