GenomicQuery - Natural Language Genomic Analysis

The Vision: Conversational Genomics

The Genomic Query Assistant enables researchers to:

Ask questions in natural language through Claude Desktop
Get immediate answers without any programming
Perform complex multi-dimensional analyses
Access integrated data from authoritative sources
Support multiple languages (English, Spanish, French, German)

Ready to start? Check out the Getting Started tab for installation instructions and quick examples.

Beta Status

Current Status: GenomicQuery is in beta testing. We're actively gathering feedback from biologists and researchers. Contact mglynias@gmail.com to request beta access or report issues.

Getting Started with GenomicQuery

Ask questions about genomic variants in plain English through Claude Desktop. Our system translates your natural language queries into precise genomic analyses.

🗣️ Natural Language

Ask questions in plain English, and the Claude API translates your query into a precise query in a genomic domain-specific language (DSL).

⚡ High Performance

Built with Polars (a Rust-based DataFrame library), it processes millions of variants in seconds.

🔐 Secure & Private

Your data stays private. TLS encryption, API key authentication, role-based access control.

🎯 Accurate Annotations

GENCODE gene annotations, gnomAD frequencies, AlphaMissense predictions, ClinVar data.

🧬 Gene Analysis

Analyze variants in specific genes or lists of genes

⚠️ Pathogenicity

Integrated predictions from AlphaMissense and ClinVar

🌳 GO Analysis

Explore variant impact by biological pathways using Gene Ontology terms

🏥 Clinical Conditions

Map variants to clinical conditions from ClinVar

📊 Comparisons

Compare datasets for kinship, QC, or filtered analysis

Quick Start: Install the MCP client (see MCP Setup tab), restart Claude Desktop, and try asking: "List my genomic datasets" or "How many variants are in NA12877?"

Claude Desktop Integration

GenomicQuery integrates with Claude Desktop via MCP (Model Context Protocol). Follow these steps to get started.

Installation

Get Your Personal Package

Contact mglynias@gmail.com to receive your personalized installer package with API credentials.

Run the Installer

Mac: Double-click install.command

Windows: Double-click install.bat

The installer automatically configures Claude Desktop for you.

Restart Claude Desktop

Completely quit Claude Desktop (Cmd+Q on Mac) and reopen it.

Test the Connection

Ask Claude: "List my genomic datasets" or "Check if the genomic service is working"

What's Included: Fast Go binary (8.5 MB, no dependencies!), 12 MCP tools for queries and uploads, automatic authentication, cross-platform support (Mac Intel/M1/M2 and Windows).
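Claude Desktop discovers MCP servers through the `mcpServers` object in its `claude_desktop_config.json` file (under `~/Library/Application Support/Claude/` on Mac and `%APPDATA%\Claude\` on Windows). The installer's exact edits are not documented here. The sketch below shows what registering a local binary there could look like, assuming the Go binary runs as a stdio MCP server. The server name `genomic-query` and the binary path are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical server name and binary location; the real installer picks its own.
SERVER_NAME = "genomic-query"
BINARY_PATH = str(Path.home() / ".genomic-assistant" / "genomic-mcp")


def add_mcp_server(config: dict, name: str, command: str) -> dict:
    """Return a copy of a Claude Desktop config with one stdio MCP server registered."""
    merged = json.loads(json.dumps(config))  # deep copy so the input is untouched
    servers = merged.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": []}
    return merged


if __name__ == "__main__":
    existing = {"mcpServers": {"other-tool": {"command": "/usr/local/bin/other"}}}
    print(json.dumps(add_mcp_server(existing, SERVER_NAME, BINARY_PATH), indent=2))
```

Merging into the existing object, rather than overwriting the file, keeps any other MCP servers you have already configured.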

Available Tools

Core Tools (7)

query_genomic_data - Execute natural language queries
list_datasets - View all available datasets
check_service_health - Verify connection status
show_capabilities - List supported analyses
share_dataset - Share with collaborators
unshare_dataset - Remove access
delete_dataset - Remove datasets

Upload Tools (5)

initiate_upload - Start VCF upload
upload_chunk - Stream file chunks
check_upload_status - Monitor progress
list_uploads - View active uploads
cancel_upload - Stop upload
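Under the hood, Claude Desktop calls each of these tools with an MCP `tools/call` request: a JSON-RPC 2.0 message that names the tool and passes its arguments. The sketch below builds such messages. The argument name `query` for `query_genomic_data` is a hypothetical placeholder; the real input schema is whatever the server reports in response to `tools/list`.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique ids


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Hypothetical argument name; check the schema returned by tools/list.
print(tool_call("list_datasets", {}))
print(tool_call("query_genomic_data", {"query": "How many variants are in NA12877?"}))
```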

Troubleshooting

Tools not showing?

Ensure you completely restarted Claude Desktop (quit and reopen)
Check Claude Desktop MCP logs (Help → View Logs)
Verify your API key in ~/.genomic-assistant/.env
Email mglynias@gmail.com with logs if issues persist
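To check the API key without opening an editor, a small script can parse the .env file. This is a sketch: the variable name `GENOMIC_API_KEY` is a hypothetical placeholder, so substitute whatever key name your installer actually wrote.

```python
from pathlib import Path

# Hypothetical variable name; look in your own .env for the real one.
KEY_NAME = "GENOMIC_API_KEY"


def read_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def api_key_present(env_path: Path = Path.home() / ".genomic-assistant" / ".env") -> bool:
    """True if the .env file exists and defines a non-empty API key."""
    if not env_path.is_file():
        return False
    return bool(read_env(env_path.read_text()).get(KEY_NAME))


if __name__ == "__main__":
    print("API key found" if api_key_present() else "API key missing")
```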

Query Examples

All queries use natural language. Our system supports 10+ query types with flexible filtering and combinations.

🔢 Basic Variant Counting

"How many variants are there in NA12877?"

Count all variants in a dataset

"Count exonic variants in NA12877"

Filter by genomic region type

"Count variants with quality > 100 in NA12877"

Apply quality filters

"Count substitutions in NA12877"

Filter by mutation type (substitution, insertion, deletion)
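The mutation-type filter can be thought of as a classification on allele lengths. The sketch below uses one common rule (equal-length REF/ALT is a substitution, a longer ALT is an insertion, a shorter ALT is a deletion) on made-up records, combined with a QUAL threshold. GenomicQuery's exact rules, including how it treats multi-allelic sites, are not documented here.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float


def mutation_type(ref: str, alt: str) -> str:
    """Classify a biallelic REF/ALT pair by allele length."""
    if len(ref) == len(alt):
        return "substitution"  # SNVs and equal-length MNVs
    return "insertion" if len(alt) > len(ref) else "deletion"


def count_by_type(variants: Iterable[Variant], min_qual: float = 0.0) -> Counter:
    """Count variants per mutation type, keeping only QUAL > min_qual."""
    return Counter(mutation_type(v.ref, v.alt) for v in variants if v.qual > min_qual)


# Made-up records for illustration only.
toy = [
    Variant("1", 100, "A", "G", 250.0),
    Variant("1", 200, "C", "CAT", 80.0),
    Variant("2", 300, "GTT", "G", 150.0),
    Variant("X", 400, "T", "C", 40.0),
]
print(count_by_type(toy, min_qual=100))
```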

🧬 Gene-Specific Analysis

"How many variants in BRCA1 in NA12877?"

Single gene analysis

"Count variants in BRCA1, BRCA2, TP53 in NA12877"

Multi-gene panel analysis

"Count high quality exonic variants in TTN in NA12877"

Combine gene + quality + region filters
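Gene-level queries reduce to an overlap test between variant positions and gene coordinates (from GENCODE, in GenomicQuery's case). The sketch below uses toy intervals for hypothetical genes GENE_A and GENE_B, not real coordinates, and a simple linear scan; a production system would more likely use an interval index or a precomputed gene column.

```python
from typing import Dict, Iterable, List, Tuple

# Toy gene intervals (chrom, start, end), 1-based inclusive. NOT real coordinates.
GENES: Dict[str, Tuple[str, int, int]] = {
    "GENE_A": ("17", 1_000, 2_000),
    "GENE_B": ("13", 5_000, 9_000),
}


def variants_in_genes(variants: Iterable[Tuple[str, int]], genes: List[str]) -> Dict[str, int]:
    """Count (chrom, pos) variants that fall inside each requested gene."""
    counts = {gene: 0 for gene in genes}
    for chrom, pos in variants:
        for gene in genes:
            g_chrom, start, end = GENES[gene]
            if chrom == g_chrom and start <= pos <= end:
                counts[gene] += 1
    return counts


toy_variants = [("17", 1_500), ("17", 2_500), ("13", 5_000), ("13", 9_001)]
print(variants_in_genes(toy_variants, ["GENE_A", "GENE_B"]))
```

Quality and region filters from the previous examples would simply be applied to the variant list before the overlap step.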

📊 Statistical Analysis

"Mean quality in NA12877"

Calculate aggregate statistics

"Mean quality by mutation type in NA12877"

Grouped statistics

"Test if substitution has higher QUAL than insertion in NA12877"

Statistical significance testing
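The documentation does not say which test backs a query such as "Test if substitution has higher QUAL than insertion". A rank-based test is a reasonable choice because QUAL scores are rarely normally distributed. The sketch below implements a one-sided Mann-Whitney U test with the normal approximation (no tie correction to the variance) using only the standard library; the QUAL values in the demo are made up.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_greater(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """One-sided test of H1: values in x tend to be larger than values in y.

    Returns (U, p), where U counts (x, y) pairs with x > y (ties count 1/2)
    and p comes from the normal approximation.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return u, 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


snv_qual = [310.0, 250.0, 480.0, 120.0, 390.0]  # made-up values
ins_qual = [95.0, 210.0, 60.0, 140.0]
print(mann_whitney_greater(snv_qual, ins_qual))
```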

⚠️ Pathogenicity Analysis

"Show pathogenic variants in NA12877"

All pathogenic predictions (AlphaMissense + ClinVar)

"Count pathogenic variants from alphamissense in BRCA2 in NA12877"

Source-specific predictions for a gene

"Show pathogenic variants with quality > 100 in NA12877"

High-confidence pathogenic variants
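AlphaMissense publishes a pathogenicity score between 0 and 1 with three classes (likely benign below 0.34, likely pathogenic above 0.564, ambiguous in between), while ClinVar records a clinical significance label (CLNSIG in its VCF). The sketch below shows one way to combine the two so that "from alphamissense" selects a single source. Treating an unqualified "pathogenic" as the union of both sources is an assumption, not documented GenomicQuery behavior.

```python
from typing import Optional

# AlphaMissense class cutoffs on the am_pathogenicity score.
AM_BENIGN_MAX = 0.34
AM_PATHOGENIC_MIN = 0.564

CLINVAR_PATHOGENIC = {"Pathogenic", "Likely_pathogenic", "Pathogenic/Likely_pathogenic"}


def alphamissense_class(score: float) -> str:
    if score > AM_PATHOGENIC_MIN:
        return "likely_pathogenic"
    if score < AM_BENIGN_MAX:
        return "likely_benign"
    return "ambiguous"


def is_pathogenic(am_score: Optional[float], clnsig: Optional[str], source: str = "any") -> bool:
    """Flag a variant by AlphaMissense, ClinVar, or either (ASSUMED union for 'any')."""
    am_hit = am_score is not None and alphamissense_class(am_score) == "likely_pathogenic"
    cv_hit = clnsig in CLINVAR_PATHOGENIC
    if source == "alphamissense":
        return am_hit
    if source == "clinvar":
        return cv_hit
    return am_hit or cv_hit
```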

🌳 Gene Ontology (GO) Analysis

"Which biological processes have the most significant impact in NA12877?"

GO enrichment analysis

"Show pathogenic variants in DNA repair genes in NA12877"

GO term filtering (421 genes in DNA repair)

"Show descendants of DNA damage response"

Navigate GO hierarchy
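Over-representation of variant-hit genes in a GO term is classically scored with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), and hierarchy queries amount to a graph traversal over child edges. The sketch below shows both with toy term names and illustrative counts. Which statistic GenomicQuery actually uses, and whether it follows part_of as well as is_a edges, is not documented here.

```python
from collections import deque
from math import comb
from typing import Dict, List, Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the term, n genes hit)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def descendants(term: str, children: Dict[str, List[str]]) -> Set[str]:
    """All terms below `term`, found breadth-first through child edges."""
    seen: Set[str] = set()
    queue = deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Toy hierarchy: names are illustrative, not real GO IDs.
toy_go = {
    "dna_damage_response": ["dna_repair", "checkpoint"],
    "dna_repair": ["mismatch_repair"],
    "checkpoint": ["mismatch_repair"],
}
print(sorted(descendants("dna_damage_response", toy_go)))
# Illustrative counts: 8 of 100 hit genes fall in a 421-gene term out of 20,000.
print(hypergeom_sf(k=8, N=20_000, K=421, n=100))
```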

🏥 Clinical Conditions

"Show condition summary in NA12877"

ClinVar clinical conditions with variant counts

"Show conditions where total_variants > 5 in NA12877"

Filter by variant count threshold

"Using only variants marked pathogenic by alphamissense, show conditions in NA12877"

Combine pathogenicity + conditions
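A condition summary is essentially a group-by over ClinVar's disease-name field (CLNDN in the ClinVar VCF, where multiple conditions are separated by "|"). The sketch below counts variants per condition and applies a "total_variants > N" style threshold. Dropping the not_provided and not_specified placeholders is an assumption about what a useful summary should show.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

PLACEHOLDERS = {"not_provided", "not_specified"}  # ASSUMED to be excluded


def condition_summary(clndn_values: Iterable[Optional[str]], min_variants: int = 0) -> Dict[str, int]:
    """Count variants per ClinVar condition, keeping conditions with count > min_variants."""
    counts: Counter = Counter()
    for value in clndn_values:
        if not value:
            continue  # variant has no ClinVar condition
        for name in set(value.split("|")):  # count each variant once per condition
            if name not in PLACEHOLDERS:
                counts[name] += 1
    return {name: n for name, n in counts.most_common() if n > min_variants}
```

The pathogenicity-restricted example above corresponds to filtering the variants first (for instance with an AlphaMissense class) and then summarizing only the survivors.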

📊 Dataset Comparisons

"compare HG002 to HG003"

Kinship analysis (detects parent-child, siblings, etc.)

"compare HG004 and HG002 for BRCA1 and BRCA2"

Filtered comparison for specific genes

"compare HG004 and HG002 for DNA repair genes"

GO term-based comparison (421 genes)

"compare HG004 and HG002 for clinical conditions"

Compare ClinVar variants only
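The documentation does not say how kinship is inferred. One classic signal is identity-by-state (IBS): a parent and child share at least one allele at every site, so sites where one is homozygous reference and the other homozygous alternate (IBS0) should be nearly absent, whereas unrelated people show many. The sketch below tallies IBS0/1/2 from alt-allele dosages. It assumes that a site missing from one single-sample VCF is homozygous reference, which would need a gVCF to confirm.

```python
from typing import Dict


def ibs_counts(dosage_a: Dict[str, int], dosage_b: Dict[str, int]) -> Dict[str, int]:
    """Tally IBS0/IBS1/IBS2 over the union of variant sites in two samples.

    Dosages are alt-allele counts (0, 1, 2) keyed by site ID. A site absent
    from one sample is ASSUMED to be homozygous reference (dosage 0).
    """
    tally = {"IBS0": 0, "IBS1": 0, "IBS2": 0}
    for site in dosage_a.keys() | dosage_b.keys():
        a, b = dosage_a.get(site, 0), dosage_b.get(site, 0)
        tally[f"IBS{2 - abs(a - b)}"] += 1
    return tally


def ibs0_fraction(dosage_a: Dict[str, int], dosage_b: Dict[str, int]) -> float:
    """Share of compared sites with no allele in common; near 0 for parent-child pairs."""
    tally = ibs_counts(dosage_a, dosage_b)
    total = sum(tally.values())
    return tally["IBS0"] / total if total else 0.0
```

Restricting the comparison to BRCA1/BRCA2, a GO term, or ClinVar sites, as in the filtered examples above, just means passing in the subset of sites that pass the filter.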

👤 Sex Determination

"What sex is NA12877?"

Determine biological sex from variant data
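A common way to infer biological sex from a single VCF is chrX heterozygosity: outside the pseudoautosomal regions (PARs), a male carries one X chromosome and should have almost no heterozygous calls there. The sketch below applies that idea using the GRCh38 PAR coordinates. The 0.2 cutoff is a hypothetical threshold, and real tools often also use chrY calls or coverage, which this sketch ignores.

```python
import math
from typing import Iterable, Tuple

# GRCh38 pseudoautosomal regions on chrX (1-based, inclusive).
PARS_GRCH38 = [(10_001, 2_781_479), (155_701_383, 156_030_895)]


def in_par(pos: int) -> bool:
    return any(start <= pos <= end for start, end in PARS_GRCH38)


def is_het(genotype: str) -> bool:
    """True for genotypes like 0/1 or 0|1; haploid calls such as '1' are never het."""
    alleles = genotype.replace("|", "/").split("/")
    return "." not in alleles and len(set(alleles)) > 1


def x_het_fraction(x_calls: Iterable[Tuple[int, str]]) -> float:
    """Fraction of het calls among non-PAR chrX (pos, genotype) calls; NaN if none."""
    usable = [gt for pos, gt in x_calls if not in_par(pos)]
    if not usable:
        return math.nan
    return sum(is_het(gt) for gt in usable) / len(usable)


def infer_sex(x_calls: Iterable[Tuple[int, str]], threshold: float = 0.2) -> str:
    frac = x_het_fraction(x_calls)
    if math.isnan(frac):
        return "undetermined"
    return "female" if frac > threshold else "male"
```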

Tip: Combine filters freely! Try "Count homozygous pathogenic variants in BRCA2 with quality > 100 in NA12877"

Dataset Management

Manage your genomic datasets, control access, and collaborate with others.

Public vs Private Datasets

📂 Public Datasets

Available to all beta users:

NA12877 (Illumina Platinum Genomes sample, CEPH pedigree 1463)
HG002, HG003, HG004 (Ashkenazi trio)
HG005, HG006, HG007 (Han Chinese trio)

🔒 Private Datasets

Your uploaded datasets:

Only visible to you by default
Can be shared with specific users
Full control over access

Dataset Operations

Ask Claude: "List my genomic datasets"

View all available datasets (public + your private ones)

Ask Claude: "Share my NA12877_copy dataset with colleague@example.com"

Grant read access to a collaborator

Ask Claude: "Stop sharing test_dataset with user@example.com"

Revoke access from a user

Ask Claude: "Delete old_dataset"

Remove a dataset (requires confirmation)

Privacy Note: Only you (the dataset owner) can share or delete your private datasets. Shared users have read-only access.
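The ownership rules in the privacy note map onto a small permission check. The sketch below is a hypothetical model of those rules (the owner may share, unshare, and delete; shared users may only read), not the service's actual implementation; the user identifiers and the public flag are illustrative.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Dataset:
    """Hypothetical model of the sharing rules described in the privacy note."""
    name: str
    owner: str
    public: bool = False
    shared_with: Set[str] = field(default_factory=set)

    def can_read(self, user: str) -> bool:
        return self.public or user == self.owner or user in self.shared_with

    def can_modify(self, user: str) -> bool:
        # Only the owner may share, unshare, or delete; shared users are read-only.
        return user == self.owner

    def share(self, actor: str, user: str) -> None:
        self._require_owner(actor)
        self.shared_with.add(user)

    def unshare(self, actor: str, user: str) -> None:
        self._require_owner(actor)
        self.shared_with.discard(user)

    def _require_owner(self, actor: str) -> None:
        if not self.can_modify(actor):
            raise PermissionError(f"{actor} does not own {self.name}")
```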

Dataset Formats

All datasets are stored in optimized Parquet format with:

Gene annotations (from GENCODE)
Population frequencies (from gnomAD when available)
Pathogenicity predictions (AlphaMissense, ClinVar)
Clinical conditions (ClinVar)
GO term mappings

VCF Upload Process

Upload your VCF files through Claude Desktop. Files are automatically annotated and ready for analysis.

Supported Formats

✅ Accepted Formats

.vcf (uncompressed)
.vcf.gz (gzip compressed)
Single-sample VCFs only

📋 Requirements

Valid VCF header
Human genome (GRCh38/hg38)
Standard chromosomes (1-22, X, Y, MT)
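Before uploading, you can spot-check a file against these requirements. The sketch below checks for the ##fileformat=VCF first line, that the #CHROM header has exactly one sample column (nine fixed columns plus one), and that the first records use standard chromosome names with or without a "chr" prefix. It does not verify the genome build, and accepting M alongside MT is an assumption to cover UCSC-style chrM.

```python
import gzip
from pathlib import Path
from typing import Iterable, List, TextIO

STANDARD_CHROMS = {str(i) for i in range(1, 23)} | {"X", "Y", "MT", "M"}


def open_vcf(path: Path) -> TextIO:
    """Open a .vcf or .vcf.gz file as text."""
    return gzip.open(path, "rt") if path.suffix == ".gz" else open(path, "r")


def validate_vcf_lines(lines: Iterable[str], max_records: int = 1000) -> List[str]:
    """Return the problems found in the header and first records; [] means OK."""
    problems: List[str] = []
    has_format_line = has_chrom_line = False
    records = 0
    for i, raw in enumerate(lines):
        line = raw.rstrip("\r\n")
        if i == 0:
            has_format_line = line.startswith("##fileformat=VCF")
        if line.startswith("##") or not line:
            continue
        if line.startswith("#CHROM"):
            has_chrom_line = True
            n_samples = max(0, len(line.split("\t")) - 9)
            if n_samples != 1:
                problems.append(f"expected 1 sample column, found {n_samples}")
            continue
        chrom = line.split("\t", 1)[0]
        chrom = chrom[3:] if chrom.startswith("chr") else chrom
        if chrom not in STANDARD_CHROMS:
            problems.append(f"non-standard chromosome {chrom!r}")
            break  # one example is enough
        records += 1
        if records >= max_records:
            break  # spot-check only the first records
    if not has_format_line:
        problems.insert(0, "first line is not ##fileformat=VCF")
    if not has_chrom_line:
        problems.append("missing #CHROM header line")
    return problems
```

For example: with open_vcf(Path("genome.vcf.gz")) as fh: print(validate_vcf_lines(fh)).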

Upload Steps

Initiate Upload

Ask Claude: "Upload my genome.vcf.gz"

Claude will use the MCP tools to start the upload process.

Streaming Upload

The file is streamed in chunks over a WebSocket connection, handled by a fast Rust upload service.

You'll see progress updates as it uploads.

Automatic Processing

The server automatically:

Parses VCF format
Adds gene annotations
Integrates pathogenicity predictions
Maps to GO terms
Converts to optimized Parquet format

Ready for Queries

Your dataset is now available! Try: "How many variants are in my_genome?"
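The upload protocol itself (chunk size, WebSocket message format, how upload_chunk acknowledges data) is not documented here. The client-side idea behind chunked streaming can still be sketched: read the file in fixed-size pieces and attach an index, a byte offset, and a checksum to each, so the receiver can verify every piece and reassemble them in order. The 4 MiB chunk size is a hypothetical choice.

```python
import hashlib
import io
from typing import BinaryIO, Iterator, NamedTuple

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB chunk size


class Chunk(NamedTuple):
    index: int
    offset: int
    data: bytes
    sha256: str


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Chunk]:
    """Yield consecutive chunks of a binary stream with per-chunk SHA-256 digests."""
    index = offset = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield Chunk(index, offset, data, hashlib.sha256(data).hexdigest())
        index += 1
        offset += len(data)


# Demo on an in-memory "file"; for a real upload, pass open("genome.vcf.gz", "rb").
demo = list(iter_chunks(io.BytesIO(b"0123456789"), chunk_size=4))
for chunk in demo:
    print(chunk.index, chunk.offset, len(chunk.data), chunk.sha256[:12])
```

Carrying the byte offset with each chunk is also what makes progress reporting and resuming after an interruption straightforward.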

Monitor Uploads

Ask Claude: "Check upload progress"

Monitor active upload status

Ask Claude: "List all active uploads"

View all in-progress uploads

Ask Claude: "Cancel the upload"

Stop an active upload

Performance: Upload time depends on file size and compression. A typical 1 GB compressed VCF uploads in 2-5 minutes and is processed in 5-10 minutes.

Platform Features

Technical Architecture

Integrated Data Sources

AlphaMissense - AI-based pathogenicity predictions for missense variants
ClinVar - Clinical significance and disease associations
gnomAD - Population allele frequencies
GENCODE - Gene annotations and genomic coordinates
Gene Ontology - Biological processes, molecular functions, cellular components

Query Types Supported

Count Queries

23+ variations with flexible filters

Statistics

26+ aggregate and grouped statistics

Comparisons

5+ statistical tests

Pathogenicity

12+ prediction queries

GO Analysis

7+ enrichment queries

Clinical Conditions

11+ condition queries

Dataset Comparison

Kinship + filtered analysis

Sex Determination

Biological sex inference
