Probabilistic Inference

Bayesian Unmixing for Enhanced Rock Weathering

When field measurements—soil chemistry, geochemical assays—carry inherent uncertainty, how do you extract a reliable answer? Standard methods collapse noisy data into a single estimate with no error bars. You can’t tell if it’s trustworthy or how it would change with more evidence.

Figure: Pipeline overview of the Bayesian unmixing framework for Enhanced Rock Weathering, showing end-member characterization, ILR transformation, MCMC sampling, and posterior inference over mixing fractions and weathering extent.

Methodology & Results

A field soil sample after ERW treatment is a physical mixture of three end-members—fresh feedstock rock, weathered residue, and native soil—measured through elemental concentrations (Ca, Mg, Ti) that are inherently compositional and live on the simplex. The framework applies Isometric Log-Ratio (ILR) transforms to map compositions into unconstrained Euclidean space where Gaussian assumptions hold, then jointly infers mixing fractions, end-member ILR means, covariance scales, and total concentrations across an 11-dimensional parameter space using the NUTS sampler within a fully differentiable PyTensor computation graph. Weathering extent τ = f₂ / (f₁ + f₂) is derived from posterior samples of the rock and weathered fractions, yielding full probability distributions rather than point estimates—critical in the high-soil regime (f₃ ≈ 0.88) where small shifts in inferred fractions can flip τ from 33% to 67%, with direct consequences for carbon credit valuation.
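The compositional machinery above can be sketched in a few lines. Below is a minimal NumPy illustration of the ILR transform for a 3-part composition and of deriving τ from posterior fraction samples; the Dirichlet draws stand in for actual MCMC output, and the particular balance basis, seed, and concentration parameters are illustrative assumptions, not the framework's settings.

```python
import numpy as np

def ilr(x):
    """ILR transform of a 3-part composition (components > 0, summing to 1)
    into unconstrained R^2, using one standard balance basis (an assumption;
    any orthonormal basis works)."""
    x = np.asarray(x, dtype=float)
    z1 = np.sqrt(1.0 / 2.0) * np.log(x[0] / x[1])
    z2 = np.sqrt(2.0 / 3.0) * np.log(np.sqrt(x[0] * x[1]) / x[2])
    return np.array([z1, z2])

def ilr_inverse(z):
    """Map a point in R^2 back onto the simplex (inverse of ilr above)."""
    y = np.array([
        np.sqrt(1.0 / 2.0) * z[0] + np.sqrt(1.0 / 6.0) * z[1],
        -np.sqrt(1.0 / 2.0) * z[0] + np.sqrt(1.0 / 6.0) * z[1],
        -np.sqrt(2.0 / 3.0) * z[1],
    ])
    x = np.exp(y)
    return x / x.sum()

# Stand-in for MCMC draws of (rock f1, weathered f2, soil f3): Dirichlet
# samples chosen to mimic the high-soil regime (f3 ~ 0.88 on average).
rng = np.random.default_rng(0)
fractions = rng.dirichlet([1.0, 2.0, 22.0], size=5000)

# Weathering extent per draw, then a 95% credible interval.
tau = fractions[:, 1] / (fractions[:, 0] + fractions[:, 1])
tau_lo, tau_hi = np.percentile(tau, [2.5, 97.5])
```

Because τ is computed per posterior draw, its credible interval propagates the joint uncertainty in f₁ and f₂ rather than dividing two point estimates, which is exactly what matters when f₃ dominates the mixture.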

In Progress

Probabilistic Inference

Multi-Horizon Completion Optimization for the Bakken Field

Prediction models tell you what might happen, but they don’t tell you what to do. How do you turn a production forecast into an optimal design recommendation—automatically, at portfolio scale?

Figure: Pipeline overview showing the three-stage framework: spatial-temporal feature engineering, XGBoost multi-horizon surrogate modeling, and constraint-based genetic algorithm optimization for Bakken completion design.

Methodology & Results

Spatial-temporal feature engineering replaces raw coordinates with physically meaningful inter-well interference metrics—parent-child relationships, formation-specific lateral distances, and temporally aligned cumulative production from 9,000+ neighboring wells—eliminating data leakage while preserving spatial dependence. An XGBoost multi-horizon surrogate trained on these features predicts cumulative production at five intervals (3–24 months), with SHAP-based interpretability confirming that pressure gradient and parent cumulative liquid production are the dominant drivers across all horizons. A constrained genetic algorithm then optimizes three completion decision variables (stages, fluid volume, proppant mass) subject to 12 engineering constraints—including proppant-fluid ratio limits, per-stage loading bounds, and stage spacing—maximizing NPV with explicit cost accounting, yielding a $2.47M average uplift per well and a 3.5:1 marginal return on additional completion investment.
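The optimization stage can be sketched without the trained model. In the toy version below, a made-up smooth response replaces the XGBoost surrogate, a single illustrative constraint (an assumed 10–30 lb/bbl proppant-fluid ratio band) stands in for the twelve engineering constraints, and all bounds, prices, and unit costs are placeholders rather than the paper's values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the trained XGBoost surrogate: maps a completion
# design (stages, fluid bbl, proppant lb) to predicted cumulative oil (bbl).
def surrogate(designs):
    stages, fluid, proppant = designs.T
    return 2e5 * np.tanh(stages / 40) + 30 * np.sqrt(fluid) + 0.5 * np.sqrt(proppant)

def npv(designs, oil_price=60.0):
    stages, fluid, proppant = designs.T
    capex = 75_000 * stages + 8.0 * fluid + 0.06 * proppant  # assumed unit costs
    return oil_price * surrogate(designs) - capex

LOWER = np.array([20.0, 1e5, 2e6])    # bounds on stages, fluid, proppant
UPPER = np.array([60.0, 5e5, 1.2e7])

def penalty(designs):
    # One illustrative engineering constraint: proppant-fluid ratio must
    # stay inside an assumed [10, 30] lb/bbl band.
    ratio = designs[:, 2] / designs[:, 1]
    return 1e7 * (np.maximum(0.0, ratio - 30.0) + np.maximum(0.0, 10.0 - ratio))

def optimize(pop_size=80, generations=120):
    pop = rng.uniform(LOWER, UPPER, size=(pop_size, 3))
    for _ in range(generations):
        fitness = npv(pop) - penalty(pop)
        # binary tournament selection
        pairs = rng.integers(0, pop_size, size=(pop_size, 2))
        winners = np.where(fitness[pairs[:, 0]] >= fitness[pairs[:, 1]],
                           pairs[:, 0], pairs[:, 1])
        parents = pop[winners]
        # uniform crossover with a shuffled mate, then Gaussian mutation
        mates = parents[rng.permutation(pop_size)]
        mask = rng.random((pop_size, 3)) < 0.5
        children = np.where(mask, parents, mates)
        children += rng.normal(0.0, 0.02, (pop_size, 3)) * (UPPER - LOWER)
        pop = np.clip(children, LOWER, UPPER)
    fitness = npv(pop) - penalty(pop)
    best = pop[fitness.argmax()]
    return best, npv(best[None])[0]

best_design, best_npv = optimize()
```

The penalty formulation keeps the GA unconstrained internally while making constraint violations unprofitable, which is one common way (among several) to handle engineering limits in evolutionary search.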

Paper in Submission

Generative Virtualization

Arbitrary-Scale 3D Porous Media Generation Conditioned on Field-Scale Characterization

Field-scale characterization models need high-resolution rock properties as input, but micro-CT scans cost $250K–$500K each, take weeks, and yield only sparse snapshots of the subsurface. How do you fill the resolution gap without exhaustive sampling?

Figure: Pipeline overview showing VQ-VAE tokenization of micro-CT sub-volumes, followed by autoregressive transformer assembly into supra-REV porous media conditioned on spatial porosity fields.

Methodology & Results

A VQ-VAE compresses each 64³ micro-CT sub-volume into discrete codebook tokens drawn from 3,000 learned pore-structure patterns, while an autoregressive transformer assembles token sets sequentially in spatial order—each conditioned on the local porosity extracted from field-scale geological models—to maintain grain fabric and pore connectivity across sub-volume boundaries. Trained on near-REV 8-neighborhood sequences (128³ voxels), the framework extrapolates to supra-REV volumes of up to 729 neighborhoods (576³ voxels), a 91-fold increase in assembled components. Validated on Mt. Simon Sandstone micro-CT data, spatial porosity conditioning achieves absolute permeability MAE of ~54 md across 384³ to 576³ voxel volumes, and accurately reproduces CO₂–brine drainage relative permeability when the generated pore network preserves topological connectivity (Euler characteristic).
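The tokenization step is ordinary vector quantization. A minimal NumPy version follows, with a random codebook and "encoder output" standing in for the trained VQ-VAE; the 4×4×4 latent grid and 64-d embedding are assumed sizes, while the 3,000-entry codebook matches the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random codebook standing in for the 3,000 learned pore-structure patterns.
CODEBOOK_SIZE, EMBED_DIM = 3000, 64
codebook = rng.normal(size=(CODEBOOK_SIZE, EMBED_DIM))

def quantize(z_e):
    """Nearest-neighbour vector quantization: replace each latent vector with
    its closest codebook entry and record the index as the discrete token."""
    flat = z_e.reshape(-1, EMBED_DIM)
    # squared Euclidean distance to every codebook entry, vectorized
    dist = (flat ** 2).sum(1, keepdims=True) - 2.0 * flat @ codebook.T \
           + (codebook ** 2).sum(1)
    tokens = dist.argmin(axis=1)
    z_q = codebook[tokens].reshape(z_e.shape)
    return tokens.reshape(z_e.shape[:-1]), z_q

# Fake encoder output for one 64^3 sub-volume: an assumed 4x4x4 latent grid.
z_e = rng.normal(size=(4, 4, 4, EMBED_DIM))
tokens, z_q = quantize(z_e)
```

The transformer then models the resulting token grid as a sequence in spatial order, conditioned on local porosity, so each generated sub-volume stays consistent with its already-assembled neighbors.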

Paper in Submission · Preprint

Generative Virtualization

Physics-Informed GAN for 3D Rock Microstructure

How do you generate realistic 3D rock microstructures with user-defined physical properties for CO₂ storage simulation, when direct measurements are sparse and expensive?

Figure: GAN–physical simulator interaction workflow, showing iterative latent-vector optimization with pore network model feedback.

Methodology & Results

A Wasserstein GAN with gradient penalty (WGAN-GP) is pre-trained on segmented Berea sandstone micro-CT sub-volumes (128³ voxels) to learn realistic 3D pore-structure distributions. In the post-training stage, a gradual Gaussian deformation scheme iteratively perturbs the latent vector z via trigonometric mixing of independent Gaussian realizations, with each candidate structure evaluated by an OpenPNM pore network model that computes porosity, absolute permeability, and mean pore/throat size distributions. The mismatch between simulated and target properties is back-propagated through gradient descent to update the deformation parameter, converging within ~15 iterations per sample. The resulting framework produces controllable 3D microstructures that honor well-derived rock properties, providing a direct pore-to-field-scale linkage for upscaling multiphase CO₂–brine relative permeability in CCUS reservoir simulations.
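The latent-space search can be sketched without the GAN or OpenPNM. In the toy version below, a fixed random projection stands in for "decode z, then simulate its properties", and a dense scan over the mixing angle replaces the paper's gradient-descent update (a derivative-free stand-in); the point it demonstrates is that z(θ) = z₀·cosθ + z₁·sinθ remains standard normal for every θ, so all candidates stay in the generator's training distribution.

```python
import numpy as np

rng = np.random.default_rng(7)
LATENT_DIM = 128

# Toy stand-in for "generate structure with the WGAN-GP, then evaluate it
# with OpenPNM": a fixed random projection squashed to a porosity-like scalar.
w = rng.normal(size=LATENT_DIM) / np.sqrt(LATENT_DIM)
def evaluate(z):
    return 0.2 + 0.05 * np.tanh(w @ z)

def gaussian_deformation(target, n_angles=400):
    """Trigonometric mixing of two independent Gaussian latents:
    z(theta) = z0*cos(theta) + z1*sin(theta) is standard normal for every
    theta, so every candidate stays on the generator's input distribution.
    (The paper updates theta by gradient descent on the property mismatch;
    here a dense scan over theta serves as a derivative-free stand-in.)"""
    z0 = rng.normal(size=LATENT_DIM)
    z1 = rng.normal(size=LATENT_DIM)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    candidates = np.outer(np.cos(thetas), z0) + np.outer(np.sin(thetas), z1)
    errors = np.array([abs(evaluate(z) - target) for z in candidates])
    return candidates[errors.argmin()]

z_star = gaussian_deformation(target=0.2)  # ask for 20% porosity
```

Keeping candidates on the latent prior is what distinguishes this scheme from naive gradient ascent in z, which can drift off-distribution and produce unrealistic structures.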

Paper in Submission · Preprint

Predictive Analytics

Remaining Useful Life Estimation via Extended Similarity-Based Matching

Predicting when a machine will fail is critical for maintenance planning, but deep learning methods require large training datasets and lose accuracy when observations are sparse. Can a non-parametric approach that leverages historical degradation profiles outperform neural networks in small-data regimes?

Figure: Top-20 matched reference models with uncertainty envelopes (red dashed) overlaid on a test asset's health index trajectory (blue dots), illustrating uncertainty-aware similarity matching.

Red curves: reference machine RUL profile libraries with uncertainty envelopes; Blue dots: observed sensor measurement profile of the test asset.

Methodology & Results

The extended similarity-based method (Extended SBM) improves upon classical KNN-style matching for remaining useful life prediction in two key ways: (1) k-means clustering of the model library into a prediction subspace for each test asset, reducing computation by filtering out low-similarity reference models; and (2) incorporating model uncertainty via mean squared prediction error envelopes around reference degradation curves during similarity computation. Evaluated on the C-MAPSS turbofan benchmark (Dataset #2), Extended SBM achieves MAE of 15.9 and RMSE of 23—outperforming the original SBM (MAE 17.1), RNN (MAE 22.6), and ANN (MAE 23.3)—while also being competitive with state-of-the-art deep learning approaches (Deep LSTM RMSE 24.5, Semi-supervised DL RMSE 22.7) at a fraction of the architectural complexity.
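A compact version of the two extensions, run on synthetic linear degradation curves (all rates, noise levels, envelope values, and cluster counts below are illustrative assumptions, not the thesis's settings):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic reference library: 100 run-to-failure health-index trajectories
# with different (assumed linear) degradation rates.
T = 120
t = np.arange(T)
rates = rng.uniform(0.004, 0.012, size=100)
library = 1.0 - rates[:, None] * t       # health index per reference model
lifetimes = 1.0 / rates                  # cycle at which HI reaches zero
envelope = np.full(100, 0.02)            # assumed per-model MSE envelope

def extended_sbm_rul(obs, k_clusters=5, top_k=20):
    n_obs = len(obs)
    feats = library[:, :n_obs]
    # (1) k-means prefilter: cluster the library on the observed window and
    # keep only the cluster nearest the test asset (the prediction subspace).
    centers = feats[rng.choice(len(feats), k_clusters, replace=False)]
    for _ in range(10):
        assign = ((feats[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for c in range(k_clusters):
            if np.any(assign == c):
                centers[c] = feats[assign == c].mean(0)
    idx = np.where(assign == ((centers - obs) ** 2).sum(-1).argmin())[0]
    # (2) uncertainty-aware similarity: squared distance scaled by each
    # reference's error envelope, so noisier models match more loosely.
    d = ((feats[idx] - obs) ** 2 / envelope[idx, None] ** 2).mean(1)
    order = np.argsort(d)[:top_k]
    weights = np.exp(-d[order])
    rul_refs = lifetimes[idx][order] - n_obs
    return float((weights * rul_refs).sum() / weights.sum())

# Test asset: true rate 0.008 (true RUL = 125 - 50 = 75), 50 noisy readings.
obs = 1.0 - 0.008 * t[:50] + rng.normal(0.0, 0.01, size=50)
rul_hat = extended_sbm_rul(obs)
```

The prefilter bounds the similarity computation by the size of one cluster rather than the whole library, which is where the computational savings over classical SBM come from.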

M.S. Thesis, Pennsylvania State University, 2020 · Full Thesis