UC-5.3: Regulatory Relevance of Samples

Module: 5 – Modeling Interactions of Samples, Genes, and Compounds
Visualization type: Chord diagram (bipartite sample–agency interaction network)
Primary inputs: BioRemPP results table with sample and referenceAG columns
Primary outputs: Interaction matrix of samples × regulatory agencies (co-occurrence counts)

Scientific Question and Rationale

Question: Which samples are most co-annotated with compounds monitored by different environmental regulatory agencies?

This use case quantifies how strongly each biological sample is co-annotated with the regulatory context represented in the dataset. By summarizing co-annotation frequencies between samples and environmental or regulatory agencies (referenceAG), the analysis can reveal which samples are most frequently co-annotated with compounds under formal monitoring. A chord diagram is used to provide an integrated, system-level view of sample–agency co-annotation patterns, which may highlight samples with broad or focused regulatory compound coverage.

Data and Inputs

Primary data source: BioRemPP_Results.xlsx or BioRemPP_Results.csv
Key columns:
sample – identifier for each biological sample
referenceAG – regulatory or scientific agency label (e.g., WFD, CONAMA, EPC)
Accepted format: semicolon-delimited text table (.txt or .csv)
Derived structure: interaction matrix with:
rows = samples
columns = regulatory agencies
cell = interaction count for each sample–agency pair
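As an illustration of the derived structure, the following minimal sketch builds a samples × agencies matrix from a handful of made-up (sample, referenceAG) records. The record values and variable names are assumptions chosen for the example and are not taken from the BioRemPP dataset.

```python
from collections import Counter

# Hypothetical (sample, referenceAG) records; the values are illustrative only.
records = [
    ("S1", "WFD"), ("S1", "WFD"), ("S1", "CONAMA"),
    ("S2", "EPC"), ("S2", "WFD"),
]

# Count how many records exist for each sample-agency pair.
pair_counts = Counter(records)
samples = sorted({sample for sample, _ in records})
agencies = sorted({agency for _, agency in records})

# Rows = samples, columns = agencies, cell = record count (0 when the pair is absent).
matrix = [[pair_counts[(s, a)] for a in agencies] for s in samples]

print(agencies)
for sample, row in zip(samples, matrix):
    print(sample, row)
```

Pairs that never occur together are filled with zero, so every sample row has one cell per agency.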

Analytical Workflow

Data Loading
The primary results table (BioRemPP_Results.xlsx or BioRemPP_Results.csv) is loaded; text files (.csv or .txt) are parsed as semicolon-delimited tables.
Filtering
The dataset is filtered to retain only rows containing valid entries for both sample and referenceAG. Incomplete records are discarded.
Aggregation (Interaction Strength)
The filtered data is grouped by unique (sample, referenceAG) pairs:
for each pair, the total number of co-occurrence records (rows) is counted,
this count provides a measure of interaction strength between the sample and the agency's monitored chemical space.
Chord Matrix / Edge List Construction
The aggregated counts are arranged into a matrix or edge list suitable for chord diagram rendering, where:
each sample is treated as one set of nodes,
each regulatory agency (referenceAG) is treated as the other set,
the edge weight between them is the interaction count.
Rendering
A chord diagram is generated:
arcs on the circumference represent both samples and regulatory agencies,
ribbons (chords) connect each sample to the agencies with which it is associated,
chord thickness encodes interaction strength.
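The loading, filtering, and aggregation steps above can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the BioRemPP implementation: the function name build_edge_list, the compoundname column, and the example rows are hypothetical, and a "valid" entry is assumed to mean a non-empty value after trimming whitespace. Rendering is left out because it depends on the plotting library in use.

```python
import csv
import io
from collections import Counter


def build_edge_list(text):
    """Return sorted (sample, referenceAG, count) edges from semicolon-delimited text."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    counts = Counter()
    for row in reader:
        sample = (row.get("sample") or "").strip()
        agency = (row.get("referenceAG") or "").strip()
        # Filtering: discard records that lack either field (assumed meaning of "valid").
        if sample and agency:
            # Aggregation: one more co-occurrence record for this pair.
            counts[(sample, agency)] += 1
    return sorted((s, a, n) for (s, a), n in counts.items())


# Hypothetical input; "compoundname" is an assumed extra column.
example = """sample;compoundname;referenceAG
S1;benzene;WFD
S1;toluene;WFD
S1;benzene;CONAMA
S2;phenol;
;phenol;EPC
S2;phenol;EPC
"""

edges = build_edge_list(example)
for edge in edges:
    print(edge)
```

The two incomplete rows are dropped, and the remaining rows collapse into three weighted edges that a chord renderer could consume.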

How to Read the Plot

Outer Arcs (Nodes)
Each colored arc along the circle represents either:
a Sample, or
a Regulatory Agency (referenceAG).
The length of an arc is proportional to the total number of interactions (sum of counts) associated with that entity.
Chords (Ribbons)
The ribbons spanning between arcs represent Sample–Agency relationships:
one end of the ribbon is anchored at a sample arc,
the other at an agency arc.
Chord Thickness
The thickness of a chord where it connects to an arc is proportional to the interaction strength:
thicker chords denote stronger associations (more co-occurrence records),
thinner chords reflect weaker or less frequent associations.
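The proportionality rules above can be checked numerically. The sketch below assumes a small, made-up edge list and computes the total per arc, the fraction of the circle each arc would occupy, and the share of a sample's arc taken by each chord. Actual renderers may add gaps between arcs, so on-screen proportions can differ slightly.

```python
from collections import defaultdict

# Hypothetical aggregated edges: (sample, referenceAG, count).
edges = [("S1", "CONAMA", 1), ("S1", "WFD", 2), ("S2", "EPC", 1)]

# Arc length is proportional to the sum of counts touching each node.
arc_totals = defaultdict(int)
for sample, agency, count in edges:
    arc_totals[("sample", sample)] += count
    arc_totals[("agency", agency)] += count

# Each record is counted once on the sample side and once on the agency side.
circle_total = sum(arc_totals.values())
arc_fraction = {node: total / circle_total for node, total in arc_totals.items()}

# Share of a sample's arc occupied by each of its chords.
chord_share = {(s, a): n / arc_totals[("sample", s)] for s, a, n in edges}

for node, fraction in arc_fraction.items():
    print(node, round(fraction, 3))
```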

Representative Output

The image below illustrates a representative output generated by this use case using the example dataset.


Interpretation and Key Messages

High Co-annotation Frequency with Regulatory Compounds
A thick chord between a sample and an agency indicates a high co-annotation frequency between that sample and the agency's monitored compounds:
the sample has many records linking it to compounds listed under that agency's purview (not necessarily many distinct compounds),
suggesting it could be a relevant candidate for further investigation in that regulatory context.
Broad vs. Focused Regulatory Coverage
Samples connected by multiple thick chords to several agencies show broad co-annotation coverage across diverse regulatory frameworks.
Samples characterized by one dominant thick chord show more concentrated co-annotation coverage for a specific agency's compound list.
Agency-Level Co-annotation Footprint
Agencies that receive many substantial chords from different samples may have a broad co-annotation footprint in the dataset:
their monitored compound lists are widely co-annotated across the available samples,
suggesting that many samples carry KO annotations linked to compounds on those lists.
Hypothesis Generation for Policy Alignment
The chord diagram can help explore alignment between sample annotations and regulatory priorities:
samples can be compared based on how broadly they cover the compounds of regulatory agencies of interest,
providing an annotation-level basis for prioritizing experimental follow-up.
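The distinction between broad and focused coverage can also be explored numerically. The sketch below is not part of the documented workflow: it assumes a made-up edge list and uses two illustrative summaries per sample, the number of connected agencies and the share of records held by the dominant agency. Any cutoff for calling a sample "broad" or "focused" would be a separate analyst choice.

```python
from collections import defaultdict

# Hypothetical aggregated edges: (sample, referenceAG, count).
edges = [
    ("S1", "CONAMA", 3), ("S1", "EPC", 3), ("S1", "WFD", 4),
    ("S2", "EPC", 1), ("S2", "WFD", 9),
]

# Group counts by sample.
by_sample = defaultdict(dict)
for sample, agency, count in edges:
    by_sample[sample][agency] = count

summary = {}
for sample, counts in by_sample.items():
    total = sum(counts.values())
    top_agency = max(counts, key=counts.get)
    summary[sample] = {
        "agencies": len(counts),                   # breadth of coverage
        "top_agency": top_agency,                  # dominant chord
        "top_share": counts[top_agency] / total,   # concentration on that chord
    }

for sample, stats in summary.items():
    print(sample, stats)
```

In this made-up example S1 spreads its records fairly evenly over three agencies, while S2 concentrates most of its records on a single agency.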

Reproducibility and Assumptions

Input Format
The analysis assumes a semicolon-delimited table containing at least the columns sample and referenceAG.
Interaction Definition
Interaction strength is defined as the total number of co-occurrence records for each (sample, referenceAG) pair in the raw data:
multiple rows linking the same sample and agency (e.g., via different compounds or genes) increase the aggregate count,
the chord diagram therefore reflects overall intensity of association, not unique compound or KO counts.
Scope and Limitations
The visualization summarizes co-annotation frequency with regulatory contexts, not compliance status, toxicity levels, or pathway completeness.
It should be interpreted as a high-level annotation map to prioritize detailed downstream analyses, rather than a standalone compliance assessment.
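To make the Interaction Definition caveat concrete, the sketch below contrasts record counts with unique compound counts for the same (sample, referenceAG) pairs. The ko and compoundname fields, the KO identifiers, and the rows themselves are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical rows: (sample, ko, compoundname, referenceAG).
rows = [
    ("S1", "K00001", "benzene", "WFD"),
    ("S1", "K00002", "benzene", "WFD"),
    ("S1", "K00003", "toluene", "WFD"),
    ("S2", "K00001", "benzene", "WFD"),
]

record_counts = defaultdict(int)     # what the chord thickness encodes
unique_compounds = defaultdict(set)  # what it does not encode

for sample, ko, compound, agency in rows:
    record_counts[(sample, agency)] += 1
    unique_compounds[(sample, agency)].add(compound)

for pair, n_records in record_counts.items():
    print(pair, "records:", n_records, "unique compounds:", len(unique_compounds[pair]))
```

Here S1 has three records for WFD but only two distinct compounds, because one compound is reached through two different KO annotations.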

Activity diagram of the use case
