Probabilistic Inference

Bayesian Unmixing for Enhanced Rock Weathering

When field measurements—soil chemistry, geochemical assays—carry inherent uncertainty, how do you extract a reliable answer? Standard methods collapse noisy data into a single estimate with no error bars. You can’t tell if it’s trustworthy or how it would change with more evidence.

Figure: Pipeline overview of the Bayesian unmixing framework for Enhanced Rock Weathering, showing end-member characterization, ILR transformation, MCMC sampling, and posterior inference over mixing fractions and weathering extent.

Methodology & Results

A field soil sample after ERW treatment is a physical mixture of three end-members—fresh feedstock rock, weathered residue, and native soil—measured through elemental concentrations (Ca, Mg, Ti) that are inherently compositional and live on the simplex. The framework applies Isometric Log-Ratio (ILR) transforms to map compositions into unconstrained Euclidean space where Gaussian assumptions hold, then jointly infers mixing fractions, end-member ILR means, covariance scales, and total concentrations across an 11-dimensional parameter space using the NUTS sampler within a fully differentiable PyTensor computation graph. Weathering extent τ = f₂ / (f₁ + f₂) is derived from posterior samples of the rock and weathered fractions, yielding full probability distributions rather than point estimates—critical in the high-soil regime (f₃ ≈ 0.88) where small shifts in inferred fractions can flip τ from 33% to 67%, with direct consequences for carbon credit valuation.
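The compositional machinery above can be sketched in a few lines. Below is a minimal NumPy illustration of the ILR transform for a 3-part composition and of deriving τ from posterior fraction samples; the Dirichlet draws stand in for actual MCMC output, and the particular balance basis, seed, and concentration parameters are illustrative assumptions, not the framework's settings.

```python
import numpy as np

def ilr(x):
    """ILR transform of a 3-part composition (components > 0, summing to 1)
    into unconstrained R^2, using one standard balance basis (an assumption;
    any orthonormal basis works)."""
    x = np.asarray(x, dtype=float)
    z1 = np.sqrt(1.0 / 2.0) * np.log(x[0] / x[1])
    z2 = np.sqrt(2.0 / 3.0) * np.log(np.sqrt(x[0] * x[1]) / x[2])
    return np.array([z1, z2])

def ilr_inverse(z):
    """Map a point in R^2 back onto the simplex (inverse of ilr above)."""
    y = np.array([
        np.sqrt(1.0 / 2.0) * z[0] + np.sqrt(1.0 / 6.0) * z[1],
        -np.sqrt(1.0 / 2.0) * z[0] + np.sqrt(1.0 / 6.0) * z[1],
        -np.sqrt(2.0 / 3.0) * z[1],
    ])
    x = np.exp(y)
    return x / x.sum()

# Stand-in for MCMC draws of (rock f1, weathered f2, soil f3): Dirichlet
# samples chosen to mimic the high-soil regime (f3 ~ 0.88 on average).
rng = np.random.default_rng(0)
fractions = rng.dirichlet([1.0, 2.0, 22.0], size=5000)

# Weathering extent per draw, then a 95% credible interval.
tau = fractions[:, 1] / (fractions[:, 0] + fractions[:, 1])
tau_lo, tau_hi = np.percentile(tau, [2.5, 97.5])
```

Because τ is computed per posterior draw, its credible interval propagates the joint uncertainty in f₁ and f₂ rather than dividing two point estimates, which is exactly what matters when f₃ dominates the mixture.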

In Progress

Probabilistic Inference

Multi-Horizon Completion Optimization for the Bakken Field

Prediction models tell you what might happen, but they don’t tell you what to do. How do you turn a production forecast into an optimal design recommendation—automatically, at portfolio scale?

Figure: Pipeline overview showing the three-stage framework: spatial-temporal feature engineering, XGBoost multi-horizon surrogate modeling, and constraint-based genetic algorithm optimization for Bakken completion design.

Methodology & Results

Spatial-temporal feature engineering replaces raw coordinates with physically meaningful inter-well interference metrics—parent-child relationships, formation-specific lateral distances, and temporally aligned cumulative production from 9,000+ neighboring wells—eliminating data leakage while preserving spatial dependence. An XGBoost multi-horizon surrogate trained on these features predicts cumulative production at five intervals (3–24 months), with SHAP-based interpretability confirming that pressure gradient and parent cumulative liquid production are the dominant drivers across all horizons. A constrained genetic algorithm then optimizes three completion decision variables (stages, fluid volume, proppant mass) subject to 12 engineering constraints—including proppant-fluid ratio limits, per-stage loading bounds, and stage spacing—maximizing NPV with explicit cost accounting, yielding a $2.47M average uplift per well and a 3.5:1 marginal return on additional completion investment.
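The optimization stage can be sketched without the trained model. In the toy version below, a made-up smooth response replaces the XGBoost surrogate, a single illustrative constraint (an assumed 10–30 lb/bbl proppant-fluid ratio band) stands in for the twelve engineering constraints, and all bounds, prices, and unit costs are placeholders rather than the paper's values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the trained XGBoost surrogate: maps a completion
# design (stages, fluid bbl, proppant lb) to predicted cumulative oil (bbl).
def surrogate(designs):
    stages, fluid, proppant = designs.T
    return 2e5 * np.tanh(stages / 40) + 30 * np.sqrt(fluid) + 0.5 * np.sqrt(proppant)

def npv(designs, oil_price=60.0):
    stages, fluid, proppant = designs.T
    capex = 75_000 * stages + 8.0 * fluid + 0.06 * proppant  # assumed unit costs
    return oil_price * surrogate(designs) - capex

LOWER = np.array([20.0, 1e5, 2e6])    # bounds on stages, fluid, proppant
UPPER = np.array([60.0, 5e5, 1.2e7])

def penalty(designs):
    # One illustrative engineering constraint: proppant-fluid ratio must
    # stay inside an assumed [10, 30] lb/bbl band.
    ratio = designs[:, 2] / designs[:, 1]
    return 1e7 * (np.maximum(0.0, ratio - 30.0) + np.maximum(0.0, 10.0 - ratio))

def optimize(pop_size=80, generations=120):
    pop = rng.uniform(LOWER, UPPER, size=(pop_size, 3))
    for _ in range(generations):
        fitness = npv(pop) - penalty(pop)
        # binary tournament selection
        pairs = rng.integers(0, pop_size, size=(pop_size, 2))
        winners = np.where(fitness[pairs[:, 0]] >= fitness[pairs[:, 1]],
                           pairs[:, 0], pairs[:, 1])
        parents = pop[winners]
        # uniform crossover with a shuffled mate, then Gaussian mutation
        mates = parents[rng.permutation(pop_size)]
        mask = rng.random((pop_size, 3)) < 0.5
        children = np.where(mask, parents, mates)
        children += rng.normal(0.0, 0.02, (pop_size, 3)) * (UPPER - LOWER)
        pop = np.clip(children, LOWER, UPPER)
    fitness = npv(pop) - penalty(pop)
    best = pop[fitness.argmax()]
    return best, npv(best[None])[0]

best_design, best_npv = optimize()
```

The penalty formulation keeps the GA unconstrained internally while making constraint violations unprofitable, which is one common way (among several) to handle engineering limits in evolutionary search.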

Paper in Submission

Generative Virtualization

Arbitrary-Scale 3D Porous Media Generation Conditioned on Field-Scale Characterization

Field-scale characterization models need high-resolution rock properties as input, but micro-CT scans cost $250K–$500K each, take weeks, and yield only sparse snapshots of the subsurface. How do you fill the resolution gap without exhaustive sampling?

Figure: Pipeline overview showing VQ-VAE tokenization of micro-CT sub-volumes, followed by autoregressive transformer assembly into supra-REV porous media conditioned on spatial porosity fields.

Methodology & Results

A VQ-VAE compresses each 64³ micro-CT sub-volume into discrete codebook tokens drawn from 3,000 learned pore-structure patterns, while an autoregressive transformer assembles token sets sequentially in spatial order—each conditioned on the local porosity extracted from field-scale geological models—to maintain grain fabric and pore connectivity across sub-volume boundaries. Trained on near-REV 8-neighborhood sequences (128³ voxels), the framework extrapolates to supra-REV volumes of up to 729 neighborhoods (576³ voxels), a 91-fold increase in assembled components. Validated on Mt. Simon Sandstone micro-CT data, spatial porosity conditioning achieves absolute permeability MAE of ~54 md across 384³ to 576³ voxel volumes, and accurately reproduces CO₂–brine drainage relative permeability when the generated pore network preserves topological connectivity (Euler characteristic).
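The tokenization step is ordinary vector quantization. A minimal NumPy version follows, with a random codebook and "encoder output" standing in for the trained VQ-VAE; the 4×4×4 latent grid and 64-d embedding are assumed sizes, while the 3,000-entry codebook matches the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random codebook standing in for the 3,000 learned pore-structure patterns.
CODEBOOK_SIZE, EMBED_DIM = 3000, 64
codebook = rng.normal(size=(CODEBOOK_SIZE, EMBED_DIM))

def quantize(z_e):
    """Nearest-neighbour vector quantization: replace each latent vector with
    its closest codebook entry and record the index as the discrete token."""
    flat = z_e.reshape(-1, EMBED_DIM)
    # squared Euclidean distance to every codebook entry, vectorized
    dist = (flat ** 2).sum(1, keepdims=True) - 2.0 * flat @ codebook.T \
           + (codebook ** 2).sum(1)
    tokens = dist.argmin(axis=1)
    z_q = codebook[tokens].reshape(z_e.shape)
    return tokens.reshape(z_e.shape[:-1]), z_q

# Fake encoder output for one 64^3 sub-volume: an assumed 4x4x4 latent grid.
z_e = rng.normal(size=(4, 4, 4, EMBED_DIM))
tokens, z_q = quantize(z_e)
```

The transformer then models the resulting token grid as a sequence in spatial order, conditioned on local porosity, so each generated sub-volume stays consistent with its already-assembled neighbors.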

Paper in Submission · Preprint

Generative Virtualization

Physics-Informed GAN for 3D Rock Microstructure

How do you generate realistic 3D rock microstructures with user-defined physical properties for CO₂ storage simulation, when direct measurements are sparse and expensive?

Figure: GAN–physical simulator interaction workflow, showing iterative latent-vector optimization with pore network model feedback.

Methodology & Results

A Wasserstein GAN with gradient penalty (WGAN-GP) is pre-trained on segmented Berea sandstone micro-CT sub-volumes (128³ voxels) to learn realistic 3D pore-structure distributions. In the post-training stage, a gradual Gaussian deformation scheme iteratively perturbs the latent vector z via trigonometric mixing of independent Gaussian realizations, with each candidate structure evaluated by an OpenPNM pore network model that computes porosity, absolute permeability, and mean pore/throat size distributions. The mismatch between simulated and target properties is back-propagated through gradient descent to update the deformation parameter, converging within ~15 iterations per sample. The resulting framework produces controllable 3D microstructures that honor well-derived rock properties, providing a direct pore-to-field-scale linkage for upscaling multiphase CO₂–brine relative permeability in CCUS reservoir simulations.
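The latent-space search can be sketched without the GAN or OpenPNM. In the toy version below, a fixed random projection stands in for "decode z, then simulate its properties", and a dense scan over the mixing angle replaces the paper's gradient-descent update (a derivative-free stand-in); the point it demonstrates is that z(θ) = z₀·cosθ + z₁·sinθ remains standard normal for every θ, so all candidates stay in the generator's training distribution.

```python
import numpy as np

rng = np.random.default_rng(7)
LATENT_DIM = 128

# Toy stand-in for "generate structure with the WGAN-GP, then evaluate it
# with OpenPNM": a fixed random projection squashed to a porosity-like scalar.
w = rng.normal(size=LATENT_DIM) / np.sqrt(LATENT_DIM)
def evaluate(z):
    return 0.2 + 0.05 * np.tanh(w @ z)

def gaussian_deformation(target, n_angles=400):
    """Trigonometric mixing of two independent Gaussian latents:
    z(theta) = z0*cos(theta) + z1*sin(theta) is standard normal for every
    theta, so every candidate stays on the generator's input distribution.
    (The paper updates theta by gradient descent on the property mismatch;
    here a dense scan over theta serves as a derivative-free stand-in.)"""
    z0 = rng.normal(size=LATENT_DIM)
    z1 = rng.normal(size=LATENT_DIM)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    candidates = np.outer(np.cos(thetas), z0) + np.outer(np.sin(thetas), z1)
    errors = np.array([abs(evaluate(z) - target) for z in candidates])
    return candidates[errors.argmin()]

z_star = gaussian_deformation(target=0.2)  # ask for 20% porosity
```

Keeping candidates on the latent prior is what distinguishes this scheme from naive gradient ascent in z, which can drift off-distribution and produce unrealistic structures.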

Paper in Submission · Preprint

Predictive Analytics

Remaining Useful Life Estimation via Extended Similarity-Based Matching

Predicting when a machine will fail is critical for maintenance planning, but deep learning methods require large training datasets and lose accuracy when observations are sparse. Can a non-parametric approach that leverages historical degradation profiles outperform neural networks in small-data regimes?

Figure: Top-20 matched reference models with uncertainty envelopes (red dashed) overlaid on a test asset's health index trajectory (blue dots), illustrating uncertainty-aware similarity matching.

Red curves: reference machine RUL profile libraries with uncertainty envelopes; Blue dots: observed sensor measurement profile of the test asset.

Methodology & Results

The extended similarity-based method (Extended SBM) improves upon classical KNN-style matching for remaining useful life prediction in two key ways: (1) k-means clustering of the model library into a prediction subspace for each test asset, reducing computation by filtering out low-similarity reference models; and (2) incorporating model uncertainty via mean squared prediction error envelopes around reference degradation curves during similarity computation. Evaluated on the C-MAPSS turbofan benchmark (Dataset #2), Extended SBM achieves MAE of 15.9 and RMSE of 23—outperforming the original SBM (MAE 17.1), RNN (MAE 22.6), and ANN (MAE 23.3)—while also being competitive with state-of-the-art deep learning approaches (Deep LSTM RMSE 24.5, Semi-supervised DL RMSE 22.7) at a fraction of the architectural complexity.
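A compact version of the two extensions, run on synthetic linear degradation curves (all rates, noise levels, envelope values, and cluster counts below are illustrative assumptions, not the thesis's settings):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic reference library: 100 run-to-failure health-index trajectories
# with different (assumed linear) degradation rates.
T = 120
t = np.arange(T)
rates = rng.uniform(0.004, 0.012, size=100)
library = 1.0 - rates[:, None] * t       # health index per reference model
lifetimes = 1.0 / rates                  # cycle at which HI reaches zero
envelope = np.full(100, 0.02)            # assumed per-model MSE envelope

def extended_sbm_rul(obs, k_clusters=5, top_k=20):
    n_obs = len(obs)
    feats = library[:, :n_obs]
    # (1) k-means prefilter: cluster the library on the observed window and
    # keep only the cluster nearest the test asset (the prediction subspace).
    centers = feats[rng.choice(len(feats), k_clusters, replace=False)]
    for _ in range(10):
        assign = ((feats[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for c in range(k_clusters):
            if np.any(assign == c):
                centers[c] = feats[assign == c].mean(0)
    idx = np.where(assign == ((centers - obs) ** 2).sum(-1).argmin())[0]
    # (2) uncertainty-aware similarity: squared distance scaled by each
    # reference's error envelope, so noisier models match more loosely.
    d = ((feats[idx] - obs) ** 2 / envelope[idx, None] ** 2).mean(1)
    order = np.argsort(d)[:top_k]
    weights = np.exp(-d[order])
    rul_refs = lifetimes[idx][order] - n_obs
    return float((weights * rul_refs).sum() / weights.sum())

# Test asset: true rate 0.008 (true RUL = 125 - 50 = 75), 50 noisy readings.
obs = 1.0 - 0.008 * t[:50] + rng.normal(0.0, 0.01, size=50)
rul_hat = extended_sbm_rul(obs)
```

The prefilter bounds the similarity computation by the size of one cluster rather than the whole library, which is where the computational savings over classical SBM come from.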

M.S. Thesis, Pennsylvania State University, 2020 · Full Thesis