The Vision: Conversational Genomics
The Genomic Query Assistant enables researchers to:
- Ask questions in natural language through Claude Desktop
- Get immediate answers without any programming
- Perform complex multi-dimensional analyses
- Access integrated data from authoritative sources
- Query in multiple languages (English, Spanish, French, German)
Beta Status
Getting Started with GenomicQuery
Ask questions about genomic variants in plain English through Claude Desktop. Our system translates your natural language queries into precise genomic analyses.
🗣️ Natural Language
Ask questions in plain English, and the Claude API translates your query into a precise genomic DSL (domain-specific language) query.
⚡ High Performance
Built with Polars (a Rust-based DataFrame library), it processes millions of variants in seconds.
🔐 Secure & Private
Your data stays private, protected by TLS encryption, API key authentication, and role-based access control.
🎯 Accurate Annotations
GENCODE gene annotations, gnomAD population frequencies, AlphaMissense pathogenicity predictions, and ClinVar clinical data.
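The exact DSL that the Claude API produces is not documented here. As a rough illustration, a question such as "How many likely pathogenic variants are in BRCA1 in HG002?" might be reduced to a structured query and then evaluated against annotated variant records. The `StructuredQuery` class, its field names, and the `am_class` filter key below are hypothetical, and the records are toy data.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical structured form of a parsed natural-language question.
# Field names are illustrative only; they are not GenomicQuery's actual DSL.
@dataclass
class StructuredQuery:
    dataset: str                      # dataset the question targets, e.g. "HG002"
    operation: str                    # only "count" is sketched here
    genes: list[str] = field(default_factory=list)
    filters: dict[str, Any] = field(default_factory=dict)


def run_query(query: StructuredQuery, variants: list[dict[str, Any]]) -> int:
    """Count variant records that pass the gene list and equality filters.

    Assumes `variants` already holds the records of `query.dataset`.
    """
    if query.operation != "count":
        raise ValueError(f"unsupported operation: {query.operation}")
    return sum(
        1
        for v in variants
        if (not query.genes or v.get("gene") in query.genes)
        and all(v.get(key) == value for key, value in query.filters.items())
    )


if __name__ == "__main__":
    records = [
        {"gene": "BRCA1", "am_class": "likely_pathogenic"},
        {"gene": "BRCA1", "am_class": "likely_benign"},
        {"gene": "TP53", "am_class": "likely_pathogenic"},
    ]
    q = StructuredQuery("HG002", "count", ["BRCA1"], {"am_class": "likely_pathogenic"})
    print(run_query(q, records))  # 1
```

Separating parsing (natural language to structure) from evaluation (structure to result) is what lets one language model front end drive a deterministic, fast query engine.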
🧬 Gene Analysis
Analyze variants in specific genes or lists of genes
⚠️ Pathogenicity
Integrated predictions from AlphaMissense and ClinVar
🌳 GO Analysis
Explore variant impact by biological pathways using Gene Ontology terms
🏥 Clinical Conditions
Map variants to clinical conditions from ClinVar
📊 Comparisons
Compare datasets for kinship, QC, or filtered analysis
Claude Desktop Integration
GenomicQuery integrates with Claude Desktop via MCP (Model Context Protocol). Follow these steps to get started.
Installation
Get Your Personal Package
Contact mglynias@gmail.com to receive your personalized installer package with API credentials.
Run the Installer
Mac: Double-click install.command
Windows: Double-click install.bat
The installer automatically configures Claude Desktop for you.
Restart Claude Desktop
Completely quit Claude Desktop (Cmd+Q on Mac) and reopen it.
Test the Connection
Ask Claude: "List my genomic datasets" or "Check if the genomic service is working"
Available Tools
- query_genomic_data - Execute natural language queries
- list_datasets - View all available datasets
- check_service_health - Verify connection status
- show_capabilities - List supported analyses
- share_dataset - Share with collaborators
- unshare_dataset - Remove access
- delete_dataset - Remove datasets
- initiate_upload - Start VCF upload
- upload_chunk - Stream file chunks
- check_upload_status - Monitor progress
- list_uploads - View active uploads
- cancel_upload - Stop upload
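Claude Desktop issues these tool calls on your behalf. For readers curious about what travels over MCP, the protocol frames a tool invocation as a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. The sketch below builds such a message for the tools listed above. The `query` argument key is an assumption: the tools' argument schemas are not documented here, and an MCP client would normally discover them with `tools/list`.

```python
import json
from typing import Any

# Tool names as listed above; their argument schemas are not documented here.
GENOMIC_TOOLS = frozenset({
    "query_genomic_data", "list_datasets", "check_service_health",
    "show_capabilities", "share_dataset", "unshare_dataset", "delete_dataset",
    "initiate_upload", "upload_chunk", "check_upload_status", "list_uploads",
    "cancel_upload",
})


def make_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    if tool not in GENOMIC_TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # "query" is a hypothetical argument name for query_genomic_data.
    print(make_tool_call(1, "query_genomic_data", {"query": "How many variants are in HG002?"}))
```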
Troubleshooting
- Ensure you completely restarted Claude Desktop (quit and reopen)
- Check Claude Desktop MCP logs (Help → View Logs)
- Verify your API key in ~/.genomic-assistant/.env
- Email mglynias@gmail.com with logs if issues persist
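To check the ~/.genomic-assistant/.env file by hand, a small script can parse it and confirm that the key is present and non-empty. The variable name `GENOMIC_API_KEY` is a hypothetical placeholder; substitute whatever name your installer actually wrote to the file.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def check_api_key(env_path: Path, key_name: str = "GENOMIC_API_KEY") -> str:
    """Return "ok" or a short description of what is wrong.

    GENOMIC_API_KEY is a hypothetical variable name.
    """
    if not env_path.is_file():
        return f"missing file: {env_path}"
    if not parse_env(env_path.read_text()).get(key_name):
        return f"{key_name} is empty or not set in {env_path}"
    return "ok"


if __name__ == "__main__":
    print(check_api_key(Path.home() / ".genomic-assistant" / ".env"))
```

Include the script's output (never the key itself) when emailing logs.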
Query Examples
All queries use natural language. Our system supports 10+ query types with flexible filtering and combinations.
Dataset Management
Manage your genomic datasets, control access, and collaborate with others.
Public vs Private Datasets
📂 Public Datasets
Available to all beta users:
- NA12877 (GIAB sample)
- HG002, HG003, HG004 (Ashkenazi trio)
- HG005, HG006, HG007 (Han Chinese trio)
🔒 Private Datasets
Your uploaded datasets:
- Only visible to you by default
- Can be shared with specific users
- Full control over access
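The public/private rules above can be modeled as a small access check. This is a sketch of the described behavior, not GenomicQuery's implementation: the `Dataset` class is hypothetical, and the rule that only the owner may share or unshare is an assumption (the `share_dataset` and `unshare_dataset` tools imply such operations exist, but their permission checks are not documented here).

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical access model mirroring the public/private rules above."""
    name: str
    owner: str
    public: bool = False
    shared_with: set[str] = field(default_factory=set)

    def can_read(self, user: str) -> bool:
        # Public datasets are readable by every user; private ones only by the
        # owner and by users it has been explicitly shared with.
        return self.public or user == self.owner or user in self.shared_with

    def share(self, actor: str, user: str) -> None:
        self._require_owner(actor)
        self.shared_with.add(user)

    def unshare(self, actor: str, user: str) -> None:
        self._require_owner(actor)
        self.shared_with.discard(user)

    def _require_owner(self, actor: str) -> None:
        if actor != self.owner:
            raise PermissionError(f"{actor} does not own {self.name}")
```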
Dataset Operations
Dataset Formats
All datasets are stored in optimized Parquet format with:
- Gene annotations (from GENCODE)
- Population frequencies (from gnomAD when available)
- Pathogenicity predictions (AlphaMissense, ClinVar)
- Clinical conditions (ClinVar)
- GO term mappings
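Because each dataset carries GO term mappings, an enrichment query can ask whether genes carrying the variants of interest are over-represented in a GO term. One standard way to score this is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). Whether GenomicQuery uses this test, and which gene background and multiple-testing correction it applies, is not stated here, so treat the sketch as illustrative; the function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(n, K)
    # Exact integer arithmetic; math.comb returns 0 when the draw is impossible.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


def go_term_enrichment(hit_genes: set[str], term_genes: set[str],
                       background: set[str]) -> tuple[int, float]:
    """Return (overlap, one-sided p-value) for a single GO term.

    hit_genes: genes carrying the variants of interest
    term_genes: genes annotated with the GO term
    background: all genes considered (the gene universe)
    """
    hits = hit_genes & background
    annotated = term_genes & background
    k = len(hits & annotated)
    return k, hypergeom_sf(k, len(background), len(annotated), len(hits))
```

When many GO terms are tested at once, the resulting p-values would normally be adjusted (for example with a false discovery rate procedure) before being reported.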
VCF Upload Process
Upload your VCF files through Claude Desktop. Files are automatically annotated and ready for analysis.
Supported Formats
✅ Accepted Formats
- .vcf (uncompressed)
- .vcf.gz (gzip compressed)
- Single-sample VCFs only
📋 Requirements
- Valid VCF header
- Human genome (GRCh38/hg38)
- Standard chromosomes (1-22, X, Y, MT)
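Before uploading, you can pre-check a file against the requirements listed above. The sketch below covers only what can be read from the file itself: a `##fileformat=VCF` line, exactly one sample column, standard chromosome names (with or without the `chr` prefix), and, when a `##contig` line for chromosome 1 declares a length, agreement with the GRCh38 length of 248,956,422 bp. It is not the server's validator, and the function names are hypothetical; a file with no contig lines can pass these checks while still being on the wrong build.

```python
import gzip
from typing import Iterable

# Primary human chromosomes, accepting both "1" and "chr1" naming styles.
STANDARD_CHROMS = {str(i) for i in range(1, 23)} | {"X", "Y", "MT", "M"}
GRCH38_CHR1_LENGTH = "248956422"  # chr1 length in GRCh38 (GRCh37 uses 249250621)


def _normalize(chrom: str) -> str:
    return chrom[3:] if chrom.startswith("chr") else chrom


def validate_vcf_lines(lines: Iterable[str]) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    errors: list[str] = []
    header_seen = False
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if lineno == 1 and not line.startswith("##fileformat=VCF"):
            errors.append("line 1: missing ##fileformat=VCF declaration")
        if line.startswith("##contig=<"):
            body = line[len("##contig=<"):].rstrip(">")
            fields = dict(item.split("=", 1) for item in body.split(",") if "=" in item)
            if (_normalize(fields.get("ID", "")) == "1"
                    and fields.get("length", GRCH38_CHR1_LENGTH) != GRCH38_CHR1_LENGTH):
                errors.append(f"line {lineno}: chr1 length {fields['length']} does not match GRCh38")
        elif line.startswith("#CHROM"):
            # 8 fixed columns + FORMAT, then one column per sample.
            n_samples = max(len(line.split("\t")) - 9, 0)
            if n_samples != 1:
                errors.append(f"line {lineno}: expected 1 sample column, found {n_samples}")
            header_seen = True
        elif line and not line.startswith("#"):
            if not header_seen:
                errors.append(f"line {lineno}: data line before #CHROM header")
                break
            chrom = line.split("\t", 1)[0]
            if _normalize(chrom) not in STANDARD_CHROMS:
                errors.append(f"line {lineno}: non-standard chromosome {chrom!r}")
    if not header_seen:
        errors.append("missing #CHROM header line")
    return errors


def validate_vcf(path: str) -> list[str]:
    """Validate a .vcf or .vcf.gz file (gzip also reads bgzip-compressed files)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return validate_vcf_lines(handle)
```

Usage: `validate_vcf("genome.vcf.gz")` returns an empty list when every check passes.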
Upload Steps
Initiate Upload
Ask Claude: "Upload my genome.vcf.gz"
Claude will use the MCP tools to start the upload process.
Streaming Upload
The file is streamed in chunks over a WebSocket connection, handled by a fast Rust service.
You'll see progress updates as it uploads.
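The service's WebSocket message format is not documented here. As an illustration of chunked streaming in general, the sketch below splits a file into numbered, checksummed pieces without loading it all into memory, and reassembles them on the receiving side. The chunk size, field names, SHA-256 checksums, and base64 encoding are all assumptions, not the actual `upload_chunk` protocol.

```python
import base64
import hashlib
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 1024 * 1024  # hypothetical 1 MiB chunk size


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[dict]:
    """Yield numbered, checksummed chunks without reading the whole file."""
    index = offset = 0
    while data := stream.read(chunk_size):
        yield {
            "index": index,
            "offset": offset,
            "size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
            # base64 keeps binary (e.g. .vcf.gz) data safe inside JSON text messages
            "data_b64": base64.b64encode(data).decode("ascii"),
        }
        index += 1
        offset += len(data)


def reassemble(chunks: list[dict]) -> bytes:
    """Receiver side: verify ordering and checksums, then concatenate."""
    parts = []
    for expected, chunk in enumerate(sorted(chunks, key=lambda c: c["index"])):
        if chunk["index"] != expected:
            raise ValueError(f"missing chunk {expected}")
        data = base64.b64decode(chunk["data_b64"])
        if hashlib.sha256(data).hexdigest() != chunk["sha256"]:
            raise ValueError(f"checksum mismatch in chunk {chunk['index']}")
        parts.append(data)
    return b"".join(parts)


if __name__ == "__main__":
    payload = b"##fileformat=VCFv4.2\n" * 1000
    chunks = list(iter_chunks(io.BytesIO(payload), chunk_size=4096))
    assert reassemble(chunks) == payload
    print(f"{len(payload)} bytes sent as {len(chunks)} chunks")
```

With a real file, pass the handle from `open("genome.vcf.gz", "rb")` to `iter_chunks`; per-chunk checksums let a failed piece be resent without restarting the whole upload.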
Automatic Processing
The server automatically:
- Parses VCF format
- Adds gene annotations
- Integrates pathogenicity predictions
- Maps to GO terms
- Converts to optimized Parquet format
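The gene-annotation step amounts to joining each variant position against gene intervals. The sketch below shows one way to do that overlap lookup with sorted intervals and binary search; GTF (GENCODE) and VCF both use 1-based coordinates, so no offset conversion is needed. The class and function names are hypothetical, the gene coordinates in the test are toy values, and the rest of the pipeline (gnomAD, AlphaMissense, ClinVar, and GO joins and the Parquet write) is not shown.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class GeneIndex:
    """Per-chromosome lookup of genes overlapping a position.

    Intervals are 1-based and inclusive, matching GTF and VCF coordinates.
    """

    def __init__(self, genes: list[tuple[str, int, int, str]]):
        grouped: dict[str, list[tuple[int, int, str]]] = defaultdict(list)
        for chrom, start, end, name in genes:
            grouped[chrom].append((start, end, name))
        self._intervals = {c: sorted(ivs) for c, ivs in grouped.items()}
        self._starts = {c: [s for s, _, _ in ivs] for c, ivs in self._intervals.items()}
        # The longest gene bounds how far left an overlapping interval can start.
        self._max_len = {c: max(e - s for s, e, _ in ivs) for c, ivs in self._intervals.items()}

    def genes_at(self, chrom: str, pos: int) -> list[str]:
        if chrom not in self._intervals:
            return []
        starts = self._starts[chrom]
        lo = bisect_left(starts, pos - self._max_len[chrom])
        hi = bisect_right(starts, pos)  # candidates start at or before pos
        return [name for _, end, name in self._intervals[chrom][lo:hi] if end >= pos]


def annotate_line(line: str, index: GeneIndex) -> dict:
    """Turn one VCF data line into a record with overlapping gene names."""
    chrom, pos, _vid, ref, alt = line.rstrip("\n").split("\t")[:5]
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "genes": index.genes_at(chrom, int(pos))}
```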
Ready for Queries
Your dataset is now available! Try: "How many variants are in my_genome?"
Monitor Uploads
Platform Features
Technical Architecture
Integrated Data Sources
- AlphaMissense - AI-based pathogenicity predictions for missense variants
- ClinVar - Clinical significance and disease associations
- gnomAD - Population allele frequencies
- GENCODE - Gene annotations and genomic coordinates
- Gene Ontology - Biological processes, molecular functions, cellular components
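AlphaMissense assigns each missense variant a score between 0 and 1, and its authors group scores into likely benign (below 0.34), ambiguous, and likely pathogenic (above 0.564); confirm these cutoffs against the release you use. The sketch below applies them and flags agreement or disagreement with a ClinVar significance label. The keyword matching on ClinVar labels and the concordance categories are simplifying assumptions, not how GenomicQuery combines the two sources.

```python
from typing import Optional

LIKELY_BENIGN_BELOW = 0.34
LIKELY_PATHOGENIC_ABOVE = 0.564


def am_class(score: float) -> str:
    """Map an AlphaMissense score to its three-way class."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score > LIKELY_PATHOGENIC_ABOVE:
        return "likely_pathogenic"
    if score < LIKELY_BENIGN_BELOW:
        return "likely_benign"
    return "ambiguous"


def clinvar_direction(significance: str) -> Optional[str]:
    """Reduce a ClinVar significance label to "pathogenic", "benign", or None."""
    label = significance.lower()
    if "conflicting" in label:
        return None
    if "pathogenic" in label:
        return "pathogenic"
    if "benign" in label:
        return "benign"
    return None  # e.g. "Uncertain significance"


def concordance(score: float, significance: str) -> str:
    """Compare an AlphaMissense call with a ClinVar label (hypothetical categories)."""
    am = am_class(score)
    cv = clinvar_direction(significance)
    if am == "ambiguous" or cv is None:
        return "not_comparable"
    return "concordant" if am == f"likely_{cv}" else "discordant"
```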
Query Types Supported
Count Queries
23+ variations with flexible filters
Statistics
26+ aggregation and grouped stats
Comparisons
5+ statistical tests
Pathogenicity
12+ prediction queries
GO Analysis
7+ enrichment queries
Clinical Conditions
11+ condition queries
Dataset Comparison
Kinship and filtered analysis
Sex Determination
Biological sex inference
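One common heuristic for inferring biological sex from a single-sample VCF is heterozygosity on chromosome X outside the pseudoautosomal regions (PARs): XY individuals carry one copy of that region, so true heterozygous calls there should be rare. The sketch below uses the GRCh38 PAR coordinates; the 0.2 cutoff, the minimum site count, and the function names are hypothetical, and GenomicQuery's actual method is not described here.

```python
# GRCh38 pseudoautosomal regions on chrX (1-based, inclusive).
PAR_GRCH38 = ((10_001, 2_781_479), (155_701_383, 156_030_895))
HET_FRACTION_CUTOFF = 0.2  # hypothetical threshold; tune on samples of known sex


def in_par(pos: int) -> bool:
    return any(start <= pos <= end for start, end in PAR_GRCH38)


def infer_sex(calls: list[tuple[str, int, str]], min_sites: int = 100) -> str:
    """Classify a sample from chrX genotypes outside the PARs.

    calls: (chrom, pos, GT) tuples, e.g. ("chrX", 5000000, "0/1").
    """
    het = hom = 0
    for chrom, pos, gt in calls:
        if chrom not in ("X", "chrX") or in_par(pos):
            continue
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # skip no-calls
        if len(alleles) == 2 and alleles[0] != alleles[1]:
            het += 1
        else:
            hom += 1  # homozygous diploid or haploid (hemizygous) call
    total = het + hom
    if total < min_sites:
        return "undetermined"
    return "XX-like" if het / total >= HET_FRACTION_CUTOFF else "XY-like"
```

Excluding the PARs matters because both X and Y carry those segments, so heterozygous calls there are expected in XY samples too.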