🧬 GenomicQuery

Natural Language Genomic Variant Analysis BETA

The Vision: Conversational Genomics

The Genomic Query Assistant enables researchers to:

  • Ask questions in natural language through Claude Desktop
  • Get immediate answers without any programming
  • Perform complex multi-dimensional analyses
  • Access integrated data from authoritative sources
  • Work in multiple languages (English, Spanish, French, German)

Ready to start? Check out the Getting Started tab for installation instructions and quick examples.

Beta Status

Current Status: GenomicQuery is in beta testing. We're actively gathering feedback from biologists and researchers. Contact mglynias@gmail.com to request beta access or report issues.

Getting Started with GenomicQuery

Ask questions about genomic variants in plain English through Claude Desktop. Our system translates your natural language queries into precise genomic analyses.

🗣️ Natural Language

Ask questions in plain English, and the Claude API translates each query into a precise genomic DSL query.

⚡ High Performance

Built with Polars (Rust-based DataFrames) to process millions of variants in seconds.

🔐 Secure & Private

Your data stays private. TLS encryption, API key authentication, role-based access control.

🎯 Accurate Annotations

GENCODE gene annotations, gnomAD frequencies, AlphaMissense predictions, ClinVar data.

🧬 Gene Analysis

Analyze variants in specific genes or lists of genes

⚠️ Pathogenicity

Integrated predictions from AlphaMissense and ClinVar

🌳 GO Analysis

Explore variant impact by biological pathways using Gene Ontology terms

🏥 Clinical Conditions

Map variants to clinical conditions from ClinVar

📊 Comparisons

Compare datasets for kinship, QC, or filtered analysis

Quick Start: Install the MCP client (see MCP Setup tab), restart Claude Desktop, and try asking: "List my genomic datasets" or "How many variants are in NA12877?"

Claude Desktop Integration

GenomicQuery integrates with Claude Desktop via MCP (Model Context Protocol). Follow these steps to get started.

Installation

1. Get Your Personal Package

Contact mglynias@gmail.com to receive your personalized installer package with API credentials.

2. Run the Installer

Mac: Double-click install.command

Windows: Double-click install.bat

The installer automatically configures Claude Desktop for you.
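For reference when troubleshooting: Claude Desktop reads its MCP servers from the `mcpServers` section of its `claude_desktop_config.json` file. The sketch below shows the kind of entry an installer might add. The server name `genomic-query` and the binary path are hypothetical placeholders, not the values the real installer writes.

```python
import json


def add_mcp_server(config: dict, name: str, command: str) -> dict:
    """Return a copy of a Claude Desktop config with one MCP server entry added."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    # "command" and "args" are the standard keys for a local MCP server entry.
    servers[name] = {"command": command, "args": []}
    updated["mcpServers"] = servers
    return updated


# An existing config with some other server already registered (toy example).
existing = {"mcpServers": {"other-tool": {"command": "/usr/local/bin/other", "args": []}}}

# Hypothetical server name and binary path, for illustration only.
merged = add_mcp_server(existing, "genomic-query", "/path/to/genomic-mcp-client")
print(json.dumps(merged, indent=2))
```

The merge keeps existing entries and leaves the input dictionary unmodified.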

3. Restart Claude Desktop

Completely quit Claude Desktop (Cmd+Q on Mac) and reopen it.

4. Test the Connection

Ask Claude: "List my genomic datasets" or "Check if the genomic service is working"

What's Included: Fast Go binary (8.5 MB, no dependencies!), 12 MCP tools for queries and uploads, automatic authentication, cross-platform support (Mac Intel/M1/M2 and Windows).

Available Tools

Core Tools (7)
  • query_genomic_data - Execute natural language queries
  • list_datasets - View all available datasets
  • check_service_health - Verify connection status
  • show_capabilities - List supported analyses
  • share_dataset - Share with collaborators
  • unshare_dataset - Remove access
  • delete_dataset - Remove datasets

Upload Tools (5)
  • initiate_upload - Start VCF upload
  • upload_chunk - Stream file chunks
  • check_upload_status - Monitor progress
  • list_uploads - View active uploads
  • cancel_upload - Stop upload
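Claude Desktop invokes these tools for you, but it can help to see roughly what a call looks like on the wire. MCP is built on JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with the tool name and an arguments object. The sketch below builds such a request for `query_genomic_data`. The argument name `query` is an assumption, since the tool's actual input schema is not documented here.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call_request(tool: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# "query" is a hypothetical argument name for this tool.
request = tool_call_request(
    "query_genomic_data",
    {"query": "How many variants are in NA12877?"},
)
print(json.dumps(request, indent=2))
```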

Troubleshooting

Tools not showing?
  • Ensure you completely restarted Claude Desktop (quit and reopen)
  • Check Claude Desktop MCP logs (Help → View Logs)
  • Verify your API key in ~/.genomic-assistant/.env
  • Email mglynias@gmail.com with logs if issues persist
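To check the API key step without opening the file by hand, a small script can parse `~/.genomic-assistant/.env` and confirm that a key is present. This is a minimal sketch: the variable name `GENOMIC_API_KEY` is a hypothetical placeholder, so substitute whatever name appears in your own file.

```python
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(env_path: Path, key_name: str = "GENOMIC_API_KEY") -> bool:
    """Return True if the file exists and holds a non-empty value for key_name."""
    if not env_path.is_file():
        return False
    return bool(parse_env(env_path.read_text()).get(key_name))


# key_name defaults to a hypothetical variable name; adjust it to match your file.
print(has_api_key(Path.home() / ".genomic-assistant" / ".env"))
```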

Query Examples

All queries use natural language. Our system supports 10+ query types with flexible filtering and combinations.

🔢 Basic Variant Counting
"How many variants are there in NA12877?"
Count all variants in a dataset
"Count exonic variants in NA12877"
Filter by genomic region type
"Count variants with quality > 100 in NA12877"
Apply quality filters
"Count substitutions in NA12877"
Filter by mutation type (substitution, insertion, deletion)

🧬 Gene-Specific Analysis
"How many variants in BRCA1 in NA12877?"
Single gene analysis
"Count variants in BRCA1, BRCA2, TP53 in NA12877"
Multi-gene panel analysis
"Count high quality exonic variants in TTN in NA12877"
Combine gene + quality + region filters

📊 Statistical Analysis
"Mean quality in NA12877"
Calculate aggregate statistics
"Mean quality by mutation type in NA12877"
Grouped statistics
"Test if substitution has higher QUAL than insertion in NA12877"
Statistical significance testing

⚠️ Pathogenicity Analysis
"Show pathogenic variants in NA12877"
All pathogenic predictions (AlphaMissense + ClinVar)
"Count pathogenic variants from alphamissense in BRCA2 in NA12877"
Source-specific predictions for a gene
"Show pathogenic variants with quality > 100 in NA12877"
High-confidence pathogenic variants

🌳 Gene Ontology (GO) Analysis
"Which biological processes have the most significant impact in NA12877?"
GO enrichment analysis
"Show pathogenic variants in DNA repair genes in NA12877"
GO term filtering (421 genes in DNA repair)
"Show descendants of DNA damage response"
Navigate GO hierarchy

🏥 Clinical Conditions
"Show condition summary in NA12877"
ClinVar clinical conditions with variant counts
"Show conditions where total_variants > 5 in NA12877"
Filter by variant count threshold
"Using only variants marked pathogenic by alphamissense, show conditions in NA12877"
Combine pathogenicity + conditions

📊 Dataset Comparisons
"compare HG002 to HG003"
Kinship analysis (detects parent-child, siblings, etc.)
"compare HG004 and HG002 for BRCA1 and BRCA2"
Filtered comparison for specific genes
"compare HG004 and HG002 for DNA repair genes"
GO term-based comparison (421 genes)
"compare HG004 and HG002 for clinical conditions"
Compare ClinVar variants only
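
One common signal behind kinship detection is identity by state (IBS): at each shared biallelic site, two samples share 0, 1, or 2 alleles. A parent and child share at least one allele at every autosomal site (barring genotyping errors and de novo mutations), so their IBS0 fraction is close to zero. The sketch below illustrates that idea on invented toy genotypes. It is not a description of how GenomicQuery computes kinship.

```python
from collections import Counter


def ibs_counts(dosages_a, dosages_b) -> Counter:
    """Count sites by IBS state, given alt-allele dosages (0, 1, or 2) per site."""
    counts = Counter()
    for a, b in zip(dosages_a, dosages_b):
        # For biallelic sites, alleles shared by state = 2 - |dosage difference|.
        counts[2 - abs(a - b)] += 1
    return counts


def ibs0_fraction(dosages_a, dosages_b) -> float:
    """Fraction of compared sites where the two samples share no allele."""
    counts = ibs_counts(dosages_a, dosages_b)
    total = sum(counts.values())
    return counts[0] / total if total else 0.0


# Invented toy dosages at six sites, not real HG002/HG003 genotypes.
child = [0, 1, 2, 1, 0, 2]
parent = [1, 1, 1, 0, 0, 2]
unrelated = [2, 1, 0, 1, 2, 0]

print(ibs0_fraction(child, parent))
print(ibs0_fraction(child, unrelated))
```

In the toy data the child and parent never have opposite homozygous genotypes, while the "unrelated" sample does at four of six sites.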

👤 Sex Determination
"What sex is NA12877?"
Determine biological sex from variant data

Tip: Combine filters freely! Try "Count homozygous pathogenic variants in BRCA2 with quality > 100 in NA12877"
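
Conceptually, a combined query like the one in the tip is a set of independent filters that must all hold for a variant to be counted. The sketch below models that with a small filter object. The field names (`gene`, `qual`, `zygosity`, `pathogenic`) and the `VariantFilter` class are hypothetical illustrations, not the service's actual DSL or schema.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class VariantFilter:
    """Hypothetical structured form of a combined query; unset fields are ignored."""
    gene: Optional[str] = None
    min_quality: Optional[float] = None  # strict ">" threshold
    zygosity: Optional[str] = None
    pathogenic_only: bool = False

    def matches(self, variant: dict) -> bool:
        if self.gene is not None and variant["gene"] != self.gene:
            return False
        if self.min_quality is not None and variant["qual"] <= self.min_quality:
            return False
        if self.zygosity is not None and variant["zygosity"] != self.zygosity:
            return False
        if self.pathogenic_only and not variant["pathogenic"]:
            return False
        return True


def count_variants(variants: Iterable[dict], flt: VariantFilter) -> int:
    """Count the variants that satisfy every active filter."""
    return sum(1 for v in variants if flt.matches(v))


# Invented toy rows, not real NA12877 calls.
toy_variants = [
    {"gene": "BRCA2", "qual": 250.0, "zygosity": "homozygous", "pathogenic": True},
    {"gene": "BRCA2", "qual": 80.0, "zygosity": "homozygous", "pathogenic": True},
    {"gene": "BRCA2", "qual": 300.0, "zygosity": "heterozygous", "pathogenic": True},
    {"gene": "TP53", "qual": 400.0, "zygosity": "homozygous", "pathogenic": False},
]

tip_query = VariantFilter(
    gene="BRCA2", min_quality=100, zygosity="homozygous", pathogenic_only=True
)
print(count_variants(toy_variants, tip_query))
```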

Dataset Management

Manage your genomic datasets, control access, and collaborate with others.

Public vs Private Datasets

📂 Public Datasets

Available to all beta users:

  • NA12877 (CEPH/Utah pedigree 1463 sample)
  • HG002, HG003, HG004 (Ashkenazi trio)
  • HG005, HG006, HG007 (Han Chinese trio)

🔒 Private Datasets

Your uploaded datasets:

  • Only visible to you by default
  • Can be shared with specific users
  • Full control over access

Dataset Operations

Ask Claude: "List my genomic datasets"
View all available datasets (public + your private ones)
Ask Claude: "Share my NA12877_copy dataset with colleague@example.com"
Grant read access to a collaborator
Ask Claude: "Stop sharing test_dataset with user@example.com"
Revoke access from a user
Ask Claude: "Delete old_dataset"
Remove a dataset (requires confirmation)
Privacy Note: Only you (the dataset owner) can share or delete your private datasets. Shared users have read-only access.
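
The access rules in the privacy note can be summarized as two checks: owners and shared users can read, and only owners can manage. The sketch below models that for private datasets. The `Dataset` class and function names are hypothetical, and it says nothing about how the service enforces these rules internally.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical model of a private dataset and its share list."""
    name: str
    owner: str
    shared_with: set = field(default_factory=set)


def can_read(dataset: Dataset, user: str) -> bool:
    """Owners and users on the share list have read access."""
    return user == dataset.owner or user in dataset.shared_with


def can_manage(dataset: Dataset, user: str) -> bool:
    """Only the owner may share, unshare, or delete."""
    return user == dataset.owner


def share(dataset: Dataset, acting_user: str, target_user: str) -> None:
    """Grant read-only access, refusing requests from non-owners."""
    if not can_manage(dataset, acting_user):
        raise PermissionError(f"{acting_user} does not own {dataset.name}")
    dataset.shared_with.add(target_user)


dataset = Dataset(name="NA12877_copy", owner="owner@example.com")
share(dataset, "owner@example.com", "colleague@example.com")
print(can_read(dataset, "colleague@example.com"))
print(can_manage(dataset, "colleague@example.com"))
```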

Dataset Formats

All datasets are stored in optimized Parquet format with:

  • Gene annotations (from GENCODE)
  • Population frequencies (from gnomAD when available)
  • Pathogenicity predictions (AlphaMissense, ClinVar)
  • Clinical conditions (ClinVar)
  • GO term mappings

VCF Upload Process

Upload your VCF files through Claude Desktop. Files are automatically annotated and ready for analysis.

Supported Formats

✅ Accepted Formats

  • .vcf (uncompressed)
  • .vcf.gz (gzip compressed)
  • Single-sample VCFs only

📋 Requirements

  • Valid VCF header
  • Human genome (GRCh38/hg38)
  • Standard chromosomes (1-22, X, Y, MT)
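
Before uploading, you can run a quick local check against these requirements. The sketch below verifies the `##fileformat` first line, counts sample columns on the `#CHROM` header line, and flags chromosome names outside 1-22, X, Y, and MT. Treating a `chr` prefix and `chrM` as acceptable aliases is an assumption about what the server accepts. The script does not check the reference build.

```python
import gzip
from pathlib import Path
from typing import Iterable, List

STANDARD_CHROMS = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}
FIXED_COLUMNS = 9  # #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT


def open_vcf(path: str):
    """Open a .vcf or .vcf.gz file as text."""
    if Path(path).name.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def normalize_chrom(chrom: str) -> str:
    """Assumption: 'chr' prefixes and 'chrM' are accepted as aliases."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return "MT" if name == "M" else name


def check_vcf(lines: Iterable[str]) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    sample_count = None
    unexpected = set()
    for number, raw in enumerate(lines):
        line = raw.rstrip("\r\n")
        if number == 0 and not line.startswith("##fileformat=VCF"):
            problems.append("first line is not a ##fileformat=VCF header")
        if line.startswith("##") or not line:
            continue
        if line.startswith("#CHROM"):
            sample_count = len(line.split("\t")) - FIXED_COLUMNS
            continue
        chrom = normalize_chrom(line.split("\t", 1)[0])
        if chrom not in STANDARD_CHROMS:
            unexpected.add(chrom)
    if sample_count is None:
        problems.append("missing #CHROM column header line")
    elif sample_count != 1:
        problems.append(f"expected 1 sample column, found {sample_count}")
    if unexpected:
        problems.append("non-standard chromosomes: " + ", ".join(sorted(unexpected)))
    return problems
```

To check a file on disk, pass the open handle: `with open_vcf("genome.vcf.gz") as handle: print(check_vcf(handle))`.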

Upload Steps

1. Initiate Upload

Ask Claude: "Upload my genome.vcf.gz"

Claude will use the MCP tools to start the upload process.

2. Streaming Upload

The file is streamed via WebSocket in chunks (a fast Rust service handles this).

You'll see progress updates as it uploads.
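
Chunked streaming means the file is read and sent one fixed-size piece at a time, so memory use stays flat regardless of file size. The sketch below shows only the chunking step. The 4 MiB default chunk size is an assumption, and the actual WebSocket transfer is handled by the MCP client rather than by code like this.

```python
import io
from typing import BinaryIO, Iterator, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # assumed default; the real chunk size is not documented


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, bytes) pairs until the stream is exhausted."""
    index = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield index, chunk
        index += 1


# Ten bytes split into chunks of four: sizes 4, 4, and 2.
data = io.BytesIO(b"x" * 10)
chunks = list(iter_chunks(data, chunk_size=4))
print([(i, len(c)) for i, c in chunks])  # [(0, 4), (1, 4), (2, 2)]
```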

3. Automatic Processing

The server automatically:

  • Parses VCF format
  • Adds gene annotations
  • Integrates pathogenicity predictions
  • Maps to GO terms
  • Converts to optimized Parquet format

4. Ready for Queries

Your dataset is now available! Try: "How many variants are in my_genome?"

Monitor Uploads

Ask Claude: "Check upload progress"
Monitor active upload status
Ask Claude: "List all active uploads"
View all in-progress uploads
Ask Claude: "Cancel the upload"
Stop an active upload
Performance: Upload speed depends on file size and compression. A typical 1 GB compressed VCF uploads in 2-5 minutes and processes in 5-10 minutes.
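
As a rough worked example, the quoted upload times imply an effective throughput of about 3.3 to 8.3 MB/s (taking 1 GB as 1000 MB). The sketch below does that arithmetic and extrapolates to another file size. Linear scaling with size is an assumption, and real speeds depend on your connection.

```python
def throughput_mb_per_s(size_mb: float, minutes: float) -> float:
    """Effective throughput implied by a file size and an upload duration."""
    return size_mb / (minutes * 60)


def estimated_minutes(size_mb: float, mb_per_s: float) -> float:
    """Upload time for another file, assuming time scales linearly with size."""
    return size_mb / mb_per_s / 60


fastest = throughput_mb_per_s(1000, 2)  # 1 GB in 2 minutes
slowest = throughput_mb_per_s(1000, 5)  # 1 GB in 5 minutes
print(f"{slowest:.1f} to {fastest:.1f} MB/s")

# A hypothetical 250 MB file at those rates.
print(f"{estimated_minutes(250, fastest):.2f} to {estimated_minutes(250, slowest):.2f} minutes")
```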

Platform Features

Technical Architecture

  • Claude Desktop (You): talks to the local client over MCP
  • Go MCP Client (Local): 8.5 MB, 12 tools; connects to the cloud services over HTTPS/WSS
  • Python Server (Hetzner Cloud): NL → DSL translation, query engines (Polars)
  • Rust Upload Service (Hetzner Cloud): streaming VCF processing, fast and concurrent

Integrated Data Sources

  • AlphaMissense - AI-based pathogenicity predictions for missense variants
  • ClinVar - Clinical significance and disease associations
  • gnomAD - Population allele frequencies
  • GENCODE - Gene annotations and genomic coordinates
  • Gene Ontology - Biological processes, molecular functions, cellular components

Query Types Supported

Count Queries

23+ variations with flexible filters

Statistics

26+ aggregation and grouped stats

Comparisons

5+ statistical tests

Pathogenicity

12+ prediction queries

GO Analysis

7+ enrichment queries

Clinical Conditions

11+ condition queries

Dataset Comparison

Kinship + filtered analysis

Sex Determination

Biological sex inference
