Repository: github.com/srodriguezloya/omop-development-environment

Introduction

In my previous post, I covered the OHDSI ecosystem, explaining what each tool does, when you need it, and how the components work together. That guide focused on understanding the architecture and making informed deployment decisions for production environments.

This post tackles a different but equally important challenge: how do you actually learn and experiment with the OHDSI stack without breaking the bank?

The OHDSI community provides an excellent quick-start solution called OHDSI-in-a-Box, designed for rapid deployment on AWS. It’s purpose-built for personal learning and training environments—you can have a complete OHDSI stack running in minutes.

There’s just one problem: it costs approximately $40-44 per month to run on AWS.

For personal learning, weekend experimentation, or training workshops, those costs add up quickly. I wanted the same “one-click” deployment experience, but without the monthly AWS bill.

So I set up a local alternative—and deployed it on an old laptop sitting in my closet.

What’s Included

Core Services:

  • PostgreSQL - Database server with multiple schemas
  • WebAPI - Spring Boot REST API for OHDSI tools
  • Atlas - Web-based research interface
  • Achilles - Database characterization tool (via Docker profile)

Supporting Infrastructure:

  • OMOP Vocabulary loading - Automated scripts for vocabulary management
  • Synthea Integration - Generates synthetic patient data and ETL to OMOP CDM
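At a high level, the pieces above are wired together with Docker Compose: a single PostgreSQL container holds the CDM and WebAPI schemas, WebAPI reaches it over the Compose network, and Atlas fronts WebAPI in the browser. The sketch below is hypothetical and only illustrates the wiring; service names, image tags, and port mappings are placeholders, not the repository's actual docker-compose.yml.

```yaml
# Illustrative sketch of the service wiring -- NOT the repository's
# actual compose file. Image tags and names are placeholders.
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}   # set in .env
    ports:
      - "5432:5432"

  webapi:
    image: ohdsi/webapi          # official OHDSI WebAPI image
    depends_on:
      - postgres
    ports:
      - "8080:8080"

  atlas:
    image: ohdsi/atlas           # official OHDSI Atlas image
    depends_on:
      - webapi
    ports:
      - "8081:8080"
```

The key design point is that only ports 5432, 8080, and 8081 are exposed to the host; everything else travels over the private Compose network.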

Quick Start

Requirements

  • Docker and Docker Compose installed
  • Java 11+ (for Synthea data generation)

Installation

1. Clone and configure:

git clone https://github.com/srodriguezloya/omop-development-environment.git 
cd omop-development-environment

# Copy environment file
cp .env.example .env

# Edit .env and change POSTGRES_PASSWORD
vim .env

2. Initial setup:

# Make scripts executable
chmod +x scripts/*.sh

# Run setup (first time only)
./scripts/setup.sh

This creates directories, downloads OMOP CDM DDL files, and prepares the environment.

Download OMOP Vocabularies

CRITICAL: OMOP vocabularies are required for the ETL to work.

  1. Go to https://athena.ohdsi.org/
  2. Create a free account
  3. Select vocabularies (minimum required):
    • SNOMED (required)
    • RxNorm (required for medications)
    • LOINC (required for labs)
    • Optional but recommended: ICD10CM, CPT4
  4. Click “Download Vocabularies”
  5. Wait for email notification (a few minutes)
  6. Download the ZIP file
  7. Extract to ./sample-data/vocabulary/

Note: If disk space is limited, select only SNOMED, RxNorm, and LOINC.
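One gotcha worth knowing before you write any custom loading code: Athena exports each vocabulary table as a tab-delimited file even though it carries a .csv extension, which trips up loaders that assume comma separation. A minimal Python sketch (the concept row below is an illustrative sample, not real exported data):

```python
import csv
import io

# Athena vocabulary files carry a .csv extension but are tab-delimited.
# This sample row is illustrative, not taken from a real export.
sample = (
    "concept_id\tconcept_name\tdomain_id\tvocabulary_id\tconcept_code\n"
    "201826\tType 2 diabetes mellitus\tCondition\tSNOMED\t44054006\n"
)

# Parse with an explicit tab delimiter instead of the comma default.
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
rows = list(reader)

print(rows[0]["concept_name"])   # Type 2 diabetes mellitus
print(rows[0]["vocabulary_id"])  # SNOMED
```

The repository's loading scripts handle this for you; the sketch just shows why a naive `csv.reader(f)` over CONCEPT.csv would silently produce one giant column.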

Start Services

./scripts/start.sh

Wait 2-3 minutes for all services to become healthy. The script will show you when they’re ready.

Configure Data Source

Register your CDM database with WebAPI:

./scripts/add-webapi-source.sh 1 "Synthea OMOP CDM" SYNTHEA
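Once registered, the source should appear in the list returned by WebAPI's `GET /WebAPI/source/sources` endpoint. As a sketch of how to check that JSON from Python, the payload below is hand-written to resemble a WebAPI response (field names follow the WebAPI source model; the values are illustrative, not a live call):

```python
import json

# Hand-written payload shaped like WebAPI's GET /WebAPI/source/sources
# response. Field names follow the WebAPI source model; values are
# illustrative.
payload = json.loads("""
[
  {
    "sourceId": 1,
    "sourceName": "Synthea OMOP CDM",
    "sourceKey": "SYNTHEA",
    "daimons": [
      {"daimonType": "CDM", "tableQualifier": "cdm"},
      {"daimonType": "Results", "tableQualifier": "results"}
    ]
  }
]
""")

def registered_keys(sources):
    """Collect the sourceKey of every registered source."""
    return [s["sourceKey"] for s in sources]

print(registered_keys(payload))  # ['SYNTHEA']
```

If Atlas shows "No Data Sources" later, fetching this endpoint in a browser is a quick way to see whether registration actually took effect.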

Load Sample Data

# Load 50 synthetic patients (recommended for first run)
./scripts/load-synthea-data.sh 50

# Or more patients (takes longer)
./scripts/load-synthea-data.sh 100

# Load 1000 patients (default)
./scripts/load-synthea-data.sh

The first run takes longer because it loads the vocabularies; subsequent runs are faster.

Run Achilles

Generate database characterization statistics:

docker compose --profile tools up achilles

Results appear in Atlas under “Data Sources”. Runtime varies by data size and hardware.
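Achilles sits behind a Compose profile so it runs only on demand rather than with every `start`. A hypothetical fragment showing the mechanism (the image name is a placeholder, not necessarily what the repository uses):

```yaml
# Illustrative: services assigned a profile are skipped by a plain
# `docker compose up` and start only when that profile is requested,
# e.g. `docker compose --profile tools up achilles`.
services:
  achilles:
    image: ohdsi/broadsea-achilles   # placeholder image name
    profiles: ["tools"]
    depends_on:
      - postgres
```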

Access Services

  • Atlas: http://localhost:8081/atlas (OHDSI Atlas, data exploration UI)
  • WebAPI: http://localhost:8080/WebAPI (OHDSI WebAPI, REST API)
  • PostgreSQL: localhost:5432 (database; credentials in .env)

Common Operations

Add More Data

# Add 200 more patients
./scripts/load-synthea-data.sh 200

# Generate patients from different states
./scripts/load-synthea-data.sh 100 California

View Logs

# All services
docker compose logs -f

# Specific service
docker compose logs -f webapi
docker compose logs -f atlas

Database Access

# Using the helper script
./scripts/psql.sh

# Or directly
docker exec -it omop-postgres psql -U ohdsi_admin -d ohdsi

Stop Services

./scripts/stop.sh

# Or with Docker Compose
docker compose down

Reset Everything

docker compose down -v  # Removes volumes (deletes all data!)
./scripts/setup.sh      # Re-initialize
./scripts/start.sh      # Start fresh

Troubleshooting

“No Data Sources” in Atlas

The CDM database hasn’t been registered with WebAPI yet.

Solution:

./scripts/add-webapi-source.sh 1 "Synthea OMOP CDM" SYNTHEA
docker compose restart webapi atlas

Services Won’t Start

Check if ports are already in use:

# Check port 5432 (PostgreSQL)
lsof -i :5432

# Check port 8080 (WebAPI)
lsof -i :8080

# Check port 8081 (Atlas)
lsof -i :8081

Vocabularies Not Loading

Ensure vocabulary files are in the correct location:

ls -la ./sample-data/vocabulary/
# Should show: CONCEPT.csv, VOCABULARY.csv, etc.

Conclusion

Running the OHDSI stack locally removes the financial barrier to learning. Cost: $0/month instead of roughly $40-44/month on AWS.

That old laptop you have sitting around is probably sufficient—mine is a 7-year-old HP Pavilion, and it works great.

Plus, there’s something satisfying about pointing at an old laptop in the corner and saying “that’s my OHDSI research server.” :)