Reproducing CEHR-XGPT: A Beginner's Journey into EHR Foundation Models
Introduction
In my previous post, I set up a local OHDSI development environment with synthetic data from Synthea. As I continued learning about the OMOP Common Data Model, I became interested in a specific question: how can I generate realistic synthetic patient data from an OMOP instance? While searching for approaches, I found CEHR-XGPT (pronounced “seer-ex-gpt”), a foundation model for electronic health records developed by Chao Pang and colleagues at Columbia University. I think it’s a fantastic piece of work: the idea of using time tokens to preserve temporal structure is elegant, and the fact that a single model can handle feature extraction, prediction, and synthetic generation is impressive. ...
Learning ARX: A Hands-On GUI Tutorial
A quick hands-on exercise to understand ARX’s features through its GUI, using the bundled adult dataset (US Census data).
Setup
Download the ARX repository to access the example data:
git clone https://github.com/arx-deidentifier/arx.git
The example dataset is at arx/data/adult.csv (US Census data from the UCI ML Repository).
Creating a Project and Importing Data
In the ARX GUI:
File → New Project
File → Import data → CSV file
Select adult.csv ...
De-identifying OMOP Databases
A survey of existing tools and approaches for de-identifying OMOP data.
Why De-identification Matters
De-identification is the process of removing or transforming data elements that could identify individuals. In healthcare, this typically means addressing the 18 identifiers specified under HIPAA’s Safe Harbor provision, or demonstrating through Expert Determination that re-identification risk is “very small.” The practical payoff: properly de-identified data is no longer considered PHI under HIPAA. That means: ...
Local OHDSI Development Environment
Repository: github.com/srodriguezloya/omop-development-environment
Introduction
In my previous post, I covered the OHDSI ecosystem, explaining what each tool does, when you need it, and how the components work together. That guide focused on understanding the architecture and making informed deployment decisions for production environments. This post tackles a different but equally important challenge: how do you actually learn and experiment with the OHDSI stack without breaking the bank? The OHDSI community provides an excellent quick-start solution called OHDSI-in-a-Box, designed for rapid deployment on AWS. It’s purpose-built for personal learning and training environments: you can have a complete OHDSI stack running in minutes. ...
OHDSI Stack Implementation Guide: Achilles, DQD, WebAPI, Atlas, and ARES for OMOP CDM Deployments
Introduction
If you’re implementing OMOP CDM for your organization, you’ve likely asked: do we need to deploy the full OHDSI stack, or just transform our data to OMOP CDM? The OMOP Common Data Model is a data standard: a specification for how to structure observational healthcare data. The OHDSI stack (Achilles, Data Quality Dashboard, WebAPI, Atlas, ARES) consists of tools built to work with that standardized data. Understanding what each tool provides helps you decide which ones your use case requires. ...
Setting Up a Self-Hosted Infrastructure with Traefik and Docker
I recently migrated my personal infrastructure from nginx to Traefik for reverse proxy and SSL management. Here’s what I learned.
The Setup
Running multiple services on a single VPS:
OpenEMR for family health records
This blog (Hugo static site)
Future services as needed
Why Traefik?
Coming from nginx, Traefik offers several advantages:
Automatic SSL certificate management with Let’s Encrypt
Dynamic service discovery through Docker labels
No manual config edits for new services
The Migration Process
Previously, I was using nginx with cronginx, a Docker container combining cron, nginx, and Certbot for automated Let’s Encrypt certificate management. It worked reliably for quite some time. ...
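The label-driven routing described above can be sketched as a docker-compose fragment. This is a minimal illustration of Traefik’s Docker provider, not the actual configuration from the migration; the domain, the email address, the resolver name, and the nginx image standing in for the Hugo site are all placeholders:

```yaml
# docker-compose.yml — minimal Traefik v2 sketch (hypothetical names/domains)
services:
  traefik:
    image: traefik:v2.11
    command:
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false   # only route explicitly labeled containers
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.letsencrypt.acme.tlschallenge=true
      - --certificatesresolvers.letsencrypt.acme.email=admin@example.com
      - --certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json
    ports:
      - "443:443"
    volumes:
      - ./letsencrypt:/letsencrypt
      - /var/run/docker.sock:/var/run/docker.sock:ro  # lets Traefik discover services

  blog:
    image: nginx:alpine   # placeholder for whatever serves the Hugo build
    labels:
      - traefik.enable=true
      - traefik.http.routers.blog.rule=Host(`blog.example.com`)
      - traefik.http.routers.blog.entrypoints=websecure
      - traefik.http.routers.blog.tls.certresolver=letsencrypt
```

Adding another service is then just another container with its own labels; no reverse-proxy config file is edited and no reload is needed, which is the “dynamic service discovery” advantage over the nginx/Certbot setup.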