Introduction to GammaHelix

Welcome to the GammaHelix platform. This guide provides a comprehensive overview of how to format your inputs, select target modalities, and interpret the quantitative metrics generated by the underlying AlphaGenome predictive architecture.


1. System Overview

GammaHelix is a localized, secure interface designed to translate high-dimensional deep learning outputs into actionable, quantitative biological insights.

* Core Engine: AlphaGenome
* Context Window: 1,048,576 base pairs (1 Mb)
* Data Privacy: Stateless architecture. Sequences and API keys are processed in memory and immediately discarded upon session termination.


2. Input Specifications

To ensure accurate structural and functional predictions, please adhere to the following input formatting rules.

Sequence Formatting (FASTA/Raw)

The platform accepts raw DNA sequences (A, T, G, C) or standard FASTA-formatted text.

* Minimum Length: There is no strict minimum, but sequences longer than 1.2 kb are highly recommended for optimal peak detection.
* Maximum Length: 1,048,576 bp.
* Algorithmic Padding: If a sequence shorter than 1 Mb is provided, GammaHelix automatically centers it and pads both flanks with N (unknown nucleotides). This satisfies the model's fixed tensor shape requirement without distorting central regulatory elements.
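The padding rule above can be sketched in a few lines of Python. This is a minimal illustration, not the GammaHelix implementation: the function names `parse_sequence` and `center_pad` are hypothetical, and both the character check and the choice to put an odd leftover base on the right flank are assumptions.

```python
CONTEXT_LENGTH = 1_048_576  # 1 Mb context window stated in this guide


def parse_sequence(text: str) -> str:
    """Drop FASTA header lines and whitespace; return an uppercase sequence."""
    lines = [ln.strip() for ln in text.splitlines()]
    seq = "".join(ln for ln in lines if ln and not ln.startswith(">")).upper()
    invalid = set(seq) - set("ACGT")  # assumption: only A, C, G, T are accepted
    if invalid:
        raise ValueError(f"Unexpected characters in sequence: {sorted(invalid)}")
    return seq


def center_pad(seq: str, target: int = CONTEXT_LENGTH) -> str:
    """Center seq and fill both flanks with N up to the target length."""
    if len(seq) > target:
        raise ValueError(f"Sequence is {len(seq)} bp; maximum is {target} bp")
    missing = target - len(seq)
    left = missing // 2        # assumption: an odd remainder goes to the right
    right = missing - left
    return "N" * left + seq + "N" * right
```

For example, `center_pad(parse_sequence(fasta_text))` returns a string of exactly 1,048,576 characters with the input in the middle.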

Variant Definition


3. Interpreting the Metrics

GammaHelix moves beyond raw numerical arrays by automatically calculating four key metrics for every tissue modality evaluated, utilizing SciPy signal processing on the backend.

Accessibility Change (%)

The relative percentage difference in overall chromatin accessibility or expression between the Reference sequence and the Altered sequence.

* Positive Value (+): Indicates an overall gain in accessibility or expression.
* Negative Value (-): Indicates an overall loss of accessibility or expression (e.g., a disrupted enhancer or silenced promoter).
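One plausible reading of this metric is the change in total signal relative to the reference total. The sketch below assumes that definition; the guide does not state the exact formula, and the function name `accessibility_change_pct` is hypothetical.

```python
from typing import Sequence


def accessibility_change_pct(ref: Sequence[float], alt: Sequence[float]) -> float:
    """Percent change in summed signal from the reference to the altered track."""
    if len(ref) != len(alt):
        raise ValueError("Reference and altered tracks must have the same length")
    ref_total = sum(ref)
    if ref_total == 0:
        raise ValueError("Reference signal sums to zero; relative change is undefined")
    # Assumed formula: (sum(alt) - sum(ref)) / sum(ref) * 100
    return (sum(alt) - ref_total) / ref_total * 100.0
```

Under this definition a result above zero means the altered sequence carries more total signal than the reference, and a result below zero means less.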

Max Delta

The maximum absolute difference between the Reference and Altered predictions at any single base pair within the viewed genomic window. This pinpoints the position where the predicted signal changes most.
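As a rough illustration, the metric can be computed by comparing the two tracks position by position. The function name `max_delta` and the decision to also return the index are assumptions made for this sketch.

```python
from typing import Sequence, Tuple


def max_delta(ref: Sequence[float], alt: Sequence[float]) -> Tuple[float, int]:
    """Return the largest absolute per-base difference and its 0-based index."""
    if len(ref) != len(alt) or len(ref) == 0:
        raise ValueError("Tracks must be non-empty and of equal length")
    diffs = [abs(a - r) for r, a in zip(ref, alt)]
    # On ties, max() keeps the first (left-most) position.
    idx = max(range(len(diffs)), key=diffs.__getitem__)
    return diffs[idx], idx
```

The returned index is relative to the start of the viewed window, so it would need to be offset to obtain a genomic coordinate.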

Peak Status

Uses algorithmic peak-finding to count candidate regulatory binding sites (signal peaks) within the current viewport, then compares the Reference and Altered counts. X is the number of peaks lost or gained.

* Stable: No peaks were created or destroyed.
* Lost (-X): The mutation destroyed existing binding sites.
* Gained (+X): The mutation created novel, anomalous binding sites.
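The status labels can be illustrated with a dependency-free stand-in for the backend's SciPy peak-finding. Everything here is an assumption for illustration: the strict local-maximum rule, the `min_height` threshold of 0.5, and the function names. The sketch also compares net peak counts only, so one peak lost and one gained elsewhere would report as Stable.

```python
from typing import Sequence


def count_peaks(signal: Sequence[float], min_height: float) -> int:
    """Count strict local maxima whose value is at least min_height."""
    return sum(
        1
        for i in range(1, len(signal) - 1)
        if signal[i] >= min_height and signal[i - 1] < signal[i] > signal[i + 1]
    )


def peak_status(ref: Sequence[float], alt: Sequence[float],
                min_height: float = 0.5) -> str:
    """Label the net change in peak count between reference and altered tracks."""
    change = count_peaks(alt, min_height) - count_peaks(ref, min_height)
    if change == 0:
        return "Stable"
    # change is negative for losses, so the minus sign is already included.
    return f"Gained (+{change})" if change > 0 else f"Lost ({change})"
```

A production version would more likely rely on a tuned peak finder with prominence and width criteria, which this simplified rule does not model.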

Confidence Score

A heuristic measure of the signal-to-noise ratio based on the amplitude of the accessibility change.

* High: The variant caused an Accessibility Change > 5% (Strong evidence of impact).
* Moderate: The variant caused an Accessibility Change < 5% (Impact may be subtle or localized to a single base).
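The thresholding can be written as a one-line rule. Two details are assumptions, since the guide does not specify them: the sketch compares the magnitude of the change (so a 12% loss counts as High), and a change of exactly 5% falls into Moderate. The function name `confidence_label` is hypothetical.

```python
def confidence_label(change_pct: float, threshold: float = 5.0) -> str:
    """Map an Accessibility Change (%) to the guide's confidence labels."""
    # Assumptions: sign is ignored, and exactly `threshold` counts as Moderate.
    return "High" if abs(change_pct) > threshold else "Moderate"
```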


4. Supported Modalities & Tissues

The GammaHelix pipeline supports prediction across 5,930 distinct tissue types, cell lines, and biological states, with tracks drawn from assays including ATAC-seq, DNase, and RNA-seq.

Because rendering a list of nearly 6,000 items can cause browser latency, GammaHelix dynamically fetches the active ontology list directly from AlphaGenome when you authenticate your workspace.

Pro-Tip: You can input standard search terms (e.g., pancreas, heart, HeLa) or exact ENCODE/UBERON IDs (e.g., UBERON:0001225) into the target ontologies field. The engine will automatically map your input to the correct predictive track.
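To make the two input styles concrete, the sketch below separates ontology-style IDs from free-text search terms before any lookup. It is illustrative only: the `PREFIX:digits` pattern, the normalization rules, and the function name `classify_term` are assumptions, and the actual mapping to predictive tracks happens inside the engine.

```python
import re
from typing import Tuple

# Assumption: IDs look like a letters-only prefix, a colon, then digits,
# as in the guide's example UBERON:0001225.
ID_PATTERN = re.compile(r"^[A-Za-z]+:\d+$")


def classify_term(term: str) -> Tuple[str, str]:
    """Return ("id", normalized_id) or ("search", lowercased_term)."""
    cleaned = term.strip()
    if not cleaned:
        raise ValueError("Empty ontology term")
    if ID_PATTERN.match(cleaned):
        prefix, number = cleaned.split(":")
        return "id", f"{prefix.upper()}:{number}"
    return "search", cleaned.lower()
```

With this split, `uberon:0001225` would be treated as an exact ID, while `HeLa` would be treated as a case-insensitive search term.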
