1. Introduction

Understanding the complex dynamics of the brain and their neurobiological underpinnings, with the potential to advance precision medicine [1, 2, 3, 4], is a central goal in neuroscience. Modeling these dynamics provides crucial insights into causality and the mechanisms underlying both normal brain function and various neurological disorders [5, 6, 7]. By integrating the average activity of large populations of neurons (e.g., neural mass models; [8, 9, 10, 11, 12, 13]) with information provided by structural imaging modalities (i.e., the connectome; [14, 15, 16, 17]), whole-brain network modeling has proven to be a powerful, tractable approach for simulating brain activity and emergent dynamics as recorded by functional imaging modalities (such as (s)EEG/MEG/fMRI; [18, 19, 20, 21, 22, 23]).

Whole-brain models are well established in network neuroscience [24, 25] for understanding brain structure and function [26, 27, 28, 29, 30, 31, 32] and for investigating the mechanisms underlying brain dynamics at rest [33, 34, 35, 36], in normal aging [37, 38], and in altered states such as anaesthesia and loss of consciousness [39, 40, 41, 42]. This class of computational models, also known as virtual brain models [43, 44, 18, 19, 45, 46], has shown remarkable capability in delineating the pathophysiological causes of a wide range of brain diseases, such as epilepsy [47, 48, 6], multiple sclerosis [49, 50], Alzheimer’s disease [51, 52], Parkinson’s disease [53, 54], neuropsychiatric disorders [55, 56], stroke [57, 58], and focal lesions [59]. In particular, they enable the personalized simulation of both normal and abnormal brain activities, along with their associated imaging recordings, thereby stratifying healthy and diseased states [60, 61, 52] and potentially informing targeted interventions and treatment strategies [47, 62, 6, 45, 23]. Although a few tools are available for forward simulations at the whole-brain level, e.g., the brain network simulator The Virtual Brain (TVB; [44]), there is a lack of tools for addressing the inverse problem, i.e., finding the set of control (generative) parameters that best explains the observed data. This study aims to bridge this gap by addressing the inverse problem in large-scale brain networks, a crucial step toward making these models operable for clinical applications.

Accurately and reliably estimating the parameters of whole-brain models remains a formidable challenge, mainly due to the high dimensionality and nonlinearity inherent in brain activity data, as well as the non-trivial effects of noise and network inputs. A large number of previous studies in whole-brain modeling have relied on optimization techniques to identify a single optimal value from an objective function scoring the model’s performance against observed data [34, 36, 63, 64]. This approach often involves minimizing metrics such as the Kolmogorov-Smirnov distance or maximizing the Pearson correlation between observed and generated data features such as functional connectivity (FC), functional connectivity dynamics (FCD), and/or power spectral density (PSD). Although fast, such an approach results only in point estimates and fails to capture the relationship between parameters and their associated uncertainty. This limits the generalizability of findings and hinders identifiability analysis, which explores the uniqueness of solutions. Furthermore, optimization algorithms can easily get stuck in local extrema, requiring multi-start strategies to address potential parameter degeneracies. These additional steps, while necessary, ultimately increase the computational cost. Critically, the estimation heavily depends on the form of the objective function defined for optimization [65, 66]. These limitations can be overcome by employing Bayesian inference, which naturally quantifies the uncertainty in the estimation and the statistical dependencies between parameters, leading to more robust and generalizable models. Bayesian inference is a principled method for updating prior beliefs with information provided by data through the likelihood function, resulting in a posterior probability distribution that encodes all the information necessary for inferences and predictions. This approach has proven essential for understanding the intricate relationships between brain structure and function [67, 37, 59], as well as for revealing the pathophysiological causes underlying brain disorders [68, 51, 49, 46, 69].

In this context, simulation-based inference (SBI; [70, 71, 68, 69]) has gained prominence as an efficient methodology for conducting Bayesian inference in complex models where traditional inference techniques become inapplicable. SBI leverages computational simulations to generate synthetic data and employs advanced probabilistic machine learning methods to infer the joint distribution over parameters that best explain the observed data, along with the associated uncertainty. This approach is particularly well-suited for Bayesian inference on whole-brain models, which often exhibit complex dynamics that are difficult to retrieve from neuroimaging data with conventional estimation techniques. Crucially, SBI circumvents the need for explicit likelihood evaluation and the Markovian (sequential) property in sampling. Markov chain Monte Carlo (MCMC) is a standard non-parametric technique that is asymptotically exact for sampling from a probability distribution [72]. However, for Bayesian inference on whole-brain models given high-dimensional data, the likelihood function becomes intractable, rendering MCMC sampling computationally prohibitive. SBI offers significant advantages, such as parallel simulation while leveraging amortized learning, making it effective for personalized inference from large datasets [69]. Amortization in artificial neural networks refers to the idea of reusing learned computations across multiple tasks or inputs [73]. Amortization in Bayesian inference refers to the process of training a shared inference network (e.g., a neural network), with an intensive upfront computational cost, to perform fast inference across many different observations. Instead of re-running inference for each new observation, the trained model can rapidly return posterior estimates, significantly reducing computational cost at test time [68]. Following an initial computational cost during simulation and training to learn all posterior distributions, subsequent evaluation of new hypotheses can be conducted efficiently, without additional computational overhead for further simulations [68]. Importantly, SBI sidesteps the convergence issues caused by complex geometries that are often encountered when using gradient-based MCMC methods [74, 75, 76]. It also substantially outperforms approximate Bayesian computation (ABC) methods, which rely on a threshold to accept or reject samples [77, 78, 71]. Such a likelihood-free approach provides generic inference on complex systems as long as we can provide three modules:

  1. A prior distribution, describing the possible range of parameters from which random samples can be easily drawn, i.e., θ ∼ p(θ).

  2. A simulator in computer code that takes parameters as input and generates data as output, i.e., x ∼ p(x | θ).

  3. A set of low-dimensional data features, which are informative of the parameters that we aim to infer.

These elements provide us with a training dataset with a budget of Nsim simulations. Then, using a class of deep neural density estimators (such as masked autoregressive flows [79] or neural spline flows [80]), we can approximate the posterior distribution of parameters given a set of observed data, i.e., qϕ(θ | xobs) ≈ p(θ | xobs). Therefore, a versatile toolkit should be flexible and integrative, adeptly incorporating these modules to enable efficient Bayesian inference over complex models.
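As a concrete illustration, a minimal single-round workflow with these three ingredients might look as follows in Python, assuming the open-source sbi package (the toy simulator and the exact API, circa sbi v0.22, are our assumptions):

    import torch
    from sbi.inference import SNPE
    from sbi.utils import BoxUniform

    # 1) Prior over control parameters: easy to sample, encodes plausible ranges.
    prior = BoxUniform(low=torch.zeros(2), high=3.0 * torch.ones(2))

    # 2) Simulator: parameters in, (feature-level) data out. A toy stand-in for
    #    a whole-brain forward model followed by feature extraction.
    def simulator(theta):
        return theta + 0.1 * torch.randn_like(theta)

    theta = prior.sample((10_000,))          # training budget of N_sim draws
    x = simulator(theta)

    # 3) Train a masked autoregressive flow to approximate p(theta | x).
    inference = SNPE(prior=prior, density_estimator="maf")
    estimator = inference.append_simulations(theta, x).train()
    posterior = inference.build_posterior(estimator)

    # Amortized: posterior sampling for any new observation is nearly instant.
    samples = posterior.sample((1_000,), x=torch.tensor([1.5, 0.5]))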

To address the need for widely applicable, reliable, and efficient parameter estimation from different (source-localized) neuroimaging modalities, we introduce Virtual Brain Inference (VBI), a flexible and integrative toolkit for probabilistic inference at the whole-brain level. This open-source toolkit offers fast simulation through just-in-time (JIT) compilation of various brain models in different programming languages (Python/C++) and on different devices (CPUs/GPUs). It supports space-efficient storage of simulated data (HDF5/NPZ/PT), provides a memory-efficient loader for batched data, and facilitates the extraction of low-dimensional data features (FC/FCD/PSD). Additionally, it enables the training of deep neural density estimators (MAFs/NSFs), making it a versatile tool for inference on neural sources corresponding to (s)EEG, MEG, and fMRI recordings. VBI leverages high-performance computing, significantly enhancing computational efficiency through parallel processing of large-scale datasets, which would be impractical with current alternative methods. Although SBI has been used for low-dimensional parameter spaces [71, 49, 81], we demonstrate that it can scale to whole-brain models with high-dimensional unknown parameters, as long as informative data features are provided. VBI is accessible on the cloud platform EBRAINS (https://ebrains.eu), enabling users to explore more realistic brain dynamics underlying brain (dys)functioning using Bayesian inference.

In the following sections, we provide an overview of the theoretical foundations of whole-brain models and simulation-based inference, describe the architecture and features of the VBI toolkit, and demonstrate its validation through a series of case studies using in-silico data. We explore various whole-brain models corresponding to different types of brain recordings: whole-brain network models of Wilson-Cowan [8], Jansen-Rit [82, 83], and Stuart-Landau [84] for simulating neural activity associated with EEG/MEG signals, the Epileptor [11] related to stereo-electroencephalography (sEEG) recordings, and the Montbrió [12] and Wong-Wang [85, 86] models mapped to fMRI BOLD signals. Although these models represent source signals and could be applied to other modalities (e.g., Stuart-Landau representing generic oscillatory dynamics), we focused on their capabilities to perform optimally in specific contexts. For instance, some are better suited for encephalographic signals (e.g., EEG/MEG) due to their ability to preserve spectral properties, while others have been used for fMRI data, emphasizing their ability to capture dynamic features such as bistability and time-varying functional connectivity.

2. Materials and methods

2.1. The virtual brain models

To build a virtual brain model (see Figure 1), the process begins with parcellating the brain into regions using anatomical data, typically derived from T1-weighted MRI scans. Each region, represented as a node in the network, is then equipped with a neural mass model to simulate the collective behavior of neurons within that area. These nodes are interconnected through a structural connectivity (SC) matrix, typically obtained from diffusion-weighted magnetic resonance imaging (DW-MRI). The connectome is built with a TVB-specific reconstruction pipeline using generally available neuroimaging software [16]. The entire network of interconnected nodes is then simulated using neuroinformatic tools, such as The Virtual Brain (TVB; [18]), generating neural activities at the source level. However, neural sources are not directly observable in real-world experiments, and a projection needs to be established to transform the simulated neural activity into empirically measurable quantities, such as (s)EEG, MEG, and fMRI. This approach offers insights into both normal brain function and neurological disorders [23]. In the following, we describe commonly used whole-brain network models at the source level, which can be mapped to different types of neuroimaging recordings. Note that each model represents one of many possible variants used in the literature, and the choice of model often depends on the specific research question, the spatial and temporal resolution of the available data, and the desired level of biological or mathematical detail.

The workflow of Virtual Brain Inference (VBI).

This probabilistic approach is designed to estimate the posterior distribution of control parameters in virtual brain models from whole-brain recordings. (A) The process begins with constructing a personalized connectome using diffusion tensor imaging and a brain parcellation atlas. (B) The personalized virtual brain model is then assembled. Neural mass models describing the averaged activity of neural populations, in the generic form ẋ(t) = f(x, θ, Iinput), are placed at each brain region and interconnected via the structural connectivity matrix. Initially, the control parameters are randomly drawn from a simple prior distribution. (C) Next, VBI operates as a simulator that uses these samples to generate time series data associated with neuroimaging recordings. (D) We extract a set of summary statistics from the low-dimensional features of the simulations (FC, FCD, PSD) for training. (E) Subsequently, a class of deep neural density estimators is trained on pairs of random parameters and their corresponding data features to learn the joint posterior distribution of the model parameters. (F) Finally, the amortized network allows us to quickly approximate the posterior distribution for new (empirical) data features, enabling us to make probabilistic predictions that are consistent with the observed data.

2.1.1. Wilson-Cowan model

The Wilson-Cowan model [8] is a seminal neural mass model that describes the dynamics of connected excitatory and inhibitory neural populations at the cortical microcolumn level. It has been widely used to understand the collective behavior of neurons and to simulate neural activities recorded by methods such as local field potentials (LFPs) and EEG. The model effectively captures phenomena such as oscillations, wave propagation, pattern formation in neural tissue, and responses to external stimuli, offering insights into various brain (dys)functions, particularly in Parkinson’s disease [87, 88].

The Wilson-Cowan model describes the temporal evolution of the mean firing rates of excitatory (E) and inhibitory (I) populations using nonlinear differential equations. Each population’s activity is governed by a balance of self-excitation, cross-inhibition, external inputs, and network interactions through long-range coupling. The nonlinearity arises from a sigmoidal transfer function Si,e(x), which maps the total synaptic input to the firing rate, capturing saturation effects and thresholds in neural response. In the whole-brain network extension, each neural population at node i receives input from other nodes via a weighted connectivity matrix, allowing the study of large-scale brain dynamics and spatial pattern formation [89, 90, 91].
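One commonly used network realization of these dynamics is given below (the exact variant and parameter naming follow typical TVB-style implementations and should be read as an assumption, not necessarily the precise form behind Table 1):

    τe dEk/dt = −Ek + (1 − re Ek) Se(cee Ek − cei Ik + ge Σl SCkl El + Pk),
    τi dIk/dt = −Ik + (1 − ri Ik) Si(cie Ek − cii Ik),        (1)
    with Se,i(x) = 1 / (1 + exp(−ae,i (x − θe,i))),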

which incorporates both local dynamics and global interactions, modulated by coupling strengths and synaptic weights. Here, SCkl is an element of the (non)symmetric structural connectivity matrix and is nonzero if there is a connection between regions k and l. The nominal parameter values and the prior range for the target parameters are summarized in Table 1.

Parameter descriptions for capturing whole-brain dynamics using the Wilson-Cowan neural mass model.

2.1.2. Jansen-Rit model

The Jansen-Rit neural mass model [92] has been widely used to simulate physiological signals from various recording methods, such as intracranial LFPs and scalp MEG/EEG recordings. For example, it has been shown to recreate responses similar to event-related potentials after a series of impulse stimulations [83, 93], to generate high-alpha and low-beta oscillations (with added recurrent inhibitory connections and spike-rate modulation) [94], and to produce seizure patterns similar to those seen in temporal lobe epilepsy [95]. This biologically motivated model comprises three main populations of neurons: excitatory pyramidal neurons, inhibitory interneurons, and excitatory interneurons. These populations interact with each other through synaptic connections, forming a feedback loop that produces oscillatory activity governed by a set of nonlinear ordinary differential equations [82, 83, 96]:
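One standard network form of these equations (our reconstruction from the cited formulation; the coupling placement is one common convention) is

    dy0i/dt = y3i,
    dy3i/dt = A a S(y1i − y2i) − 2a y3i − a² y0i,
    dy1i/dt = y4i,
    dy4i/dt = A a [P(t) + C2 S(C1 y0i) + G Σj SCij S(y1j − y2j)] − 2a y4i − a² y1i,
    dy2i/dt = y5i,
    dy5i/dt = B b C4 S(C3 y0i) − 2b y5i − b² y2i,
    with S(x) = 2 e0 / (1 + exp(r (v0 − x))),

where A, B and a, b denote the excitatory/inhibitory postsynaptic potential amplitudes and rate constants, and C1–C4 are intrinsic connectivity constants (typically proportional to a single parameter C).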

where y0i, y1i, and y2i denote the average membrane potentials of pyramidal cells, excitatory interneurons, and inhibitory interneurons, respectively. Their corresponding time derivatives, y3i(t), y4i(t), and y5i(t), represent the rates of change of these membrane potentials. P(t) represents an external input current. The sigmoid function S(x) maps the average membrane potential of neurons to their mean action potential firing rate. SC is a normalized structural connectivity matrix. The model’s output at the ith region corresponds to the membrane potential of pyramidal cells and is given by y1i − y2i. The nominal parameter values and the prior range for the target parameters are summarized in Table 2.

Parameter descriptions for capturing whole-brain dynamics using Jansen-Rit neural mass model.

EP: excitatory populations, IP: inhibitory populations, PSP: postsynaptic potential, PSPA: postsynaptic potential amplitude.

2.1.3. Stuart-Landau oscillator

The Stuart-Landau oscillator [84] is a generic mathematical model used to describe oscillatory phenomena, particularly those near a Hopf bifurcation, which is often employed to study the nonlinear dynamics of neural activity [97, 98, 63, 49]. One approach uses this model to capture slow hemodynamic changes in BOLD signal [97], while others apply it to model fast neuronal dynamics, which can be linked directly to EEG/MEG data [98, 63, 49]. Note that this is a phenomenological framework, and both applications operate on completely different time scales.

In the network, each brain region, characterized by an autonomous Stuart-Landau oscillator, can exhibit either damped or limit-cycle oscillations depending on the bifurcation parameter a. If a < 0, the system shows damped oscillations, similar to a pendulum under friction. In this regime, the system, when subjected to perturbation, relaxes back to its stable fixed point through damped oscillations with an angular frequency ω0. The rate of amplitude damping is determined by |a|. Conversely, if a > 0, the system supports limit cycle solutions, allowing for self-sustained oscillations even in the absence of external noise. At a critical value of a = 0, the system undergoes a Hopf bifurcation, i.e., small changes in parameters can lead to large variations in the system’s behavior.

In whole-brain network modeling of EEG/MEG data [49, 63], the oscillators are interconnected via white-matter pathways, with coupling strengths specified by subject-specific DTI fiber counts, i.e., elements of the SC matrix. This adjacency matrix is then scaled by a global coupling parameter G. Note that coupling between regions accounts for finite conduction times, which are often estimated by dividing the Euclidean distance between nodes by an average conduction velocity, Tjk = djk/v.

Given the importance of personalized time-delays [99, 100], we use the distance as a proxy, assuming a constant propagation velocity. The distance itself can be defined as either the length of the tracts or the Euclidean distance. Taking this into account, the activity of each region is given by a set of complex differential equations:
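A standard delayed-coupling realization (our notation, consistent with [49, 63]; the noise and coupling conventions are assumptions) reads

    dZj/dt = Zj (a + iωj − |Zj|²) + G Σk SCjk [Zk(t − Tjk) − Zj(t)] + ξj(t),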

where Z is a complex variable, and Re[Z(t)] is the corresponding time series. In this particular realization, each region has a natural frequency of 40 Hz (ωj = ω0 = 2π · 40 rad/s), motivated by empirical studies demonstrating the emergence of gamma oscillations from the balance of excitation and inhibition, playing a role in local circuit computations [101].

In this study, for the sake of simplicity, a common cortico-cortical conduction velocity is estimated, i.e., the distance-dependent average velocities are set to vjk = v. We also consider a = −5, capturing the highly variable amplitude envelope of gamma oscillations reported in experimental recordings [102, 63]. This choice also best reflects the slowest decay time constants of GABA_B inhibitory receptors, approximately 1 second [103]. Gaussian noise (here denoted by ξi) with an intensity of σ = 10⁻⁴ is added to each oscillator to mimic stochastic fluctuations. The nominal parameter values and the prior range for the target parameters are summarized in Table 3.

Parameter descriptions for capturing whole-brain dynamics using Stuart-Landau oscillator.

2.1.4. Epileptor model

In personalized whole-brain network modeling of epilepsy spread [47], the dynamics of each brain region are governed by the Epileptor model [11]. The Epileptor model provides a comprehensive description of epileptic seizures, encompassing the complete taxonomy of system bifurcations to simultaneously reproduce the dynamics of seizure onset, progression, and termination [104]. The full Epileptor model comprises five state variables that couple two oscillatory dynamical systems operating on three different time scales [11]. Then, motivated by synergetic theory [105, 106] and under time-scale separation [107], the fast variables rapidly collapse onto the slow manifold [108], whose dynamics are governed by the slow variable. This adiabatic approximation yields the 2D reduction of the whole-brain model of epilepsy spread, also known as the Virtual Epileptic Patient (VEP), as follows:
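One standard realization of the reduced equations (as used in VEP studies, e.g., [47, 67]; the scaling conventions shown are a common choice) is

    dxi/dt = 1 − xi³ − 2xi² − zi + Iext,
    dzi/dt = (1/τ) [4(xi − ηi) − zi − G Σj SCij (xj − xi)],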

where xi and zi indicate the fast and slow variables corresponding to the ith brain region, respectively, and the set of unknowns ηi is the spatial map of epileptogenicity to be estimated. SC is a normalized structural connectivity matrix. In real-world epilepsy applications [67, 68, 6], we compute the envelope function from sEEG data to perform inference. The nominal parameter values and the prior range for the target parameters are summarized in Table 4.

Parameter descriptions for capturing whole-brain dynamics using 2D Epileptor neural mass model.

2.1.5. Montbrió model

The exact macroscopic dynamics of a specific brain region (represented as a node in the network) can be analytically derived in the thermodynamic limit of infinitely many all-to-all coupled spiking neurons [12] or in the Θ-neuron representation [109]. By assuming a Lorentzian distribution of excitabilities, with half-width Δ centered at η, in large ensembles of quadratic integrate-and-fire neurons with synaptic weight J, the macroscopic dynamics has been derived in terms of the collective firing activity and mean membrane potential [12]. Then, by coupling the brain regions via an additive current (e.g., in the average membrane potential equations), the dynamics of the whole-brain network can be described as follows [59]:
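In a standard form of this derivation [12, 59] (the placement of the input current and noise terms is one common convention, assumed here):

    τ dri/dt = Δ/(πτ) + 2 ri vi,
    τ dvi/dt = vi² + η + J τ ri + Istim − (π τ ri)² + G Σj SCij rj + ξ(t),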

where vi and ri are the average membrane potential and firing rate, respectively, at the ith brain region, and parameter G is the network scaling parameter that modulates the overall impact of brain connectivity on the state dynamics. The SCij denotes the connection weight between ith and jth regions, and the dynamical noise ξ(t) ∼ 𝒩 (0, σ2) follows a Gaussian distribution with mean zero and variance σ2.

The model parameters are tuned so that each decoupled node is in a bistable regime, exhibiting a down-state stable fixed point (low firing rate) and an up-state stable focus (high firing rate) in the phase space [12, 81]. Bistability is a fundamental property of regional brain dynamics that ensures switching behavior in the data (e.g., to generate FCD), which has been recognized as representative of realistic dynamics observed empirically [110, 111, 112, 59].

The solution of the coupled system yields a neuroelectric dataset that describes the evolution of the variables (ri(t), vi(t)) in each brain region i, providing measures of macroscopic activity. The surrogate BOLD activity for each region is then derived by filtering this activity through the Balloon-Windkessel model [113]. The input current Istim represents the stimulation to selected brain regions, which increases the basin of attraction of the up-state in comparison to the down-state, while the fixed points move farther apart [110, 111, 112, 59].

The nominal parameter values and the prior range for the target parameters are summarized in Table 5.

Parameter descriptions for capturing whole-brain dynamics using Montbrió model.

2.1.6. Wong-Wang model

Another commonly used whole-brain model for the simulation of neural activity is the so-called parameterized dynamic mean-field (pDMF) model [114, 36, 86]. At each region, it comprises a simplified system of two nonlinear coupled differential equations, motivated by the attractor network model that integrates sensory information over time to make perceptual decisions, known as the Wong-Wang model [85]. This biophysically realistic cortical network model of decision making has since been simplified further into a single-population model [86], which has been widely used to understand the mechanisms underpinning brain resting-state dynamics [36, 115, 38]. The pDMF model has also been used to study whole-brain dynamics in various brain disorders, including Alzheimer’s disease [116], schizophrenia [117], and stroke [118]. The pDMF model equations are given as:
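In the standard single-population (reduced Wong-Wang) form [85, 86, 36] (the exact placement of G and J is one common convention, assumed here), the equations read

    dSi/dt = −Si/τs + (1 − Si) γ H(xi) + σ ξi(t),
    H(xi) = (a xi − b) / (1 − exp(−d (a xi − b))),        (6)
    xi = w J Si + G J Σj SCij Sj + I,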

where H(xi), Si, and xi denote the population firing rate, the average synaptic gating variable, and the total input current at the ith brain region, respectively. ξi(t) is uncorrelated standard Gaussian noise, and the noise amplitude is controlled by σ. The nominal parameter values and the prior range for the target parameters are summarized in Table 6.

Parameter descriptions for capturing whole-brain dynamics using the Wong-Wang model.

Following recent studies [36, 38], we can parameterize the set of w, I, and σ as linear combinations of group-level T1w/T2w myelin maps [119] and the first principal gradient of functional connectivity:
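Specifically, each regional parameter is expressed as a linear combination of the two maps (following [36, 38]; the intercept convention shown is assumed):

    wi = aw Myei + bw Gradi + cw,
    Ii = aI Myei + bI Gradi + cI,        (7)
    σi = aσ Myei + bσ Gradi + cσ,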

where Myei and Gradi are the average values of the T1w/T2w myelin map and the first FC principal gradient, respectively, within the ith brain region. Therefore, the set of unknown parameters to estimate includes G and the linear coefficients (aw, bw, cw, aI, bI, cI, aσ, bσ, cσ) ∈ ℝ9, hence 10 parameters in total.

2.1.7. The Balloon-Windkessel model

The Balloon-Windkessel model is a biophysical framework that links neural activity to the BOLD signals detected in fMRI. This is not a neuronal model but rather a representation of neurovascular coupling, describing how neural activity influences hemodynamic responses. The model is characterized by two state variables: venous blood volume (v) and deoxyhemoglobin content (q). The system’s input is blood flow (fin), and the output is the BOLD signal (y):

    y(t) = V0 [k1 (1 − q) + k2 (1 − q/v) + k3 (1 − v)],
    with k1 = 4.3 ϑ0 E0 TE,  k2 = E r0 E0 TE,  k3 = 1 − E,

where V0 represents the resting blood volume fraction, E0 is the oxygen extraction fraction at rest, E is the ratio of intra- to extravascular signals, r0 is the slope of the relationship between the intravascular relaxation rate and oxygen saturation, ϑ0 is the frequency offset at the surface of a fully deoxygenated vessel at 1.5 T, and TE is the echo time. The dynamics of venous blood volume v and deoxyhemoglobin content q are governed by the Balloon model’s hemodynamic state equations:

    τ0 dv/dt = fin(t) − v^(1/α),
    τ0 dq/dt = fin(t) [1 − (1 − E0)^(1/fin)] / E0 − v^(1/α) q/v,

where τ0 is the transit time of blood flow, α reflects the resistance of the venous vessel (stiffness), and fin(t) denotes blood inflow at time t, given by

    dfin/dt = s,

where s is an exponentially decaying vasodilatory signal defined by

    ds/dt = E x(t) − s/τs − (fin − 1)/τf,

where E represents the efficacy of neuronal activity x(t) (i.e., integrated synaptic activity) in generating a signal increase, τs is the time constant for signal decay, and τf is the time constant for autoregulatory blood flow feedback [113]. For parameter values, see Table 7, taken from [113, 120, 121]. The resulting BOLD time series is downsampled to match the TR value in seconds.

Parameter descriptions for the Balloon-Windkessel model to map neural activity to the BOLD signals detected in fMRI.

2.2. Simulation-based inference

In the Bayesian framework [122], parameter estimation involves quantifying and propagating uncertainty through probability distributions placed on the parameters (prior information before seeing data), which are updated with the information provided by the data (likelihood function). The formidable challenge in conducting efficient Bayesian inference is evaluating the likelihood function p(x | θ). This typically involves an intractable integration over all possible trajectories in the latent space: p(x | θ) = ∫ p(x, z | θ) dz, where p(x, z | θ) is the joint probability density of the data x and latent variables z, given parameters θ. For whole-brain network models with high-dimensional and nonlinear latent spaces, the computational cost can be prohibitive, making likelihood-based inference with MCMC sampling challenging to converge [76, 67, 123].

Simulation-based inference (SBI; [70, 71, 68]), or likelihood-free inference [124, 125, 126], addresses the issues with explicit likelihood evaluation in complex systems, where it often becomes intractable. The task of density estimation, one of the most fundamental problems in statistics, is to infer an underlying probability distribution from a set of independently and identically distributed data points drawn from that distribution. Traditional density estimators, such as histograms and kernel density estimators, typically perform well only in low-dimensional settings. Recently, neural network-based approaches have been proposed for conditional density estimation, showing promising results in Bayesian inference problems involving high-dimensional data [127, 79, 125, 71, 128, 129, 68].

Given a prior distribution p(θ) placed on the parameters, random simulations are generated (with samples drawn from the prior), resulting in pairs {(θi, xi)}, i = 1, …, Nsim, where θi ∼ p(θ) and xi is the simulated data given θi. By training a deep neural density estimator F (such as normalizing flows [79, 80]), we can approximate the posterior p(θ | x) with qϕ(θ | x) by minimizing the loss function

    ℒ(ϕ) = − Σi log qϕ(θi | xi),

over the network parameters ϕ. Once the parameters of the neural network F are optimized, for observed data xobs we can readily estimate the target posterior qϕ(θ | xobs) ≈ p(θ | xobs). This allows for rapid approximation and sampling from the posterior distribution for any new observed data through a forward pass in the trained network [68, 69].

This approach uses a class of generative machine learning models known as normalizing flows [130, 131] to transform a simple base distribution into any complex target distribution through a sequence of invertible mappings. Here, generative modeling is an unsupervised machine learning method for modeling a probability distribution given samples drawn from that distribution. State-of-the-art normalizing flows, such as masked autoregressive flows (MAF; [79]) and neural spline flows (NSF; [80]), enable fast and exact density estimation and sampling from high-dimensional distributions. These models learn mappings between input data and probability densities, capturing complex dependencies and multi-modal distributions [131, 132].

In our work, we integrate the implementation of these models from the open-source sbi tool, leveraging both MAF and NSF architectures. The MAF model comprises 5 flow transforms, each with two blocks of 50 hidden units, tanh nonlinearity, and batch normalization after each layer. The NSF model consists of 5 flow transforms, two residual blocks of 50 hidden units each, ReLU nonlinearity, and 10 spline bins. We apply these generative models to virtual brain simulations conducted with random parameters to approximate the full posterior distribution of parameters from low-dimensional data features. Note that we employ a single round of SBI to benefit from the amortization strategy, rather than using a sequential approach that is designed to achieve a better fit, but only for a specific dataset [68, 69].
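For reference, such architectures can be instantiated via sbi’s builder utilities; a minimal sketch (assuming sbi’s posterior_nn signature):

    import torch
    from sbi.inference import SNPE
    from sbi.utils import BoxUniform, posterior_nn

    prior = BoxUniform(low=torch.zeros(2), high=torch.ones(2))

    # MAF: 5 flow transforms with 50 hidden units each (as described above).
    maf_builder = posterior_nn(model="maf", num_transforms=5, hidden_features=50)
    # NSF: 5 transforms, 50 hidden units, 10 spline bins.
    nsf_builder = posterior_nn(model="nsf", num_transforms=5, hidden_features=50,
                               num_bins=10)

    inference = SNPE(prior=prior, density_estimator=maf_builder)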

2.3. Sensitivity analysis

Sensitivity analysis is a crucial step for identifying which model parameters most strongly influence the model’s behavior in response to changes in input [66, 68]. Local sensitivity can be quantified by computing the curvature of the objective function through the Hessian matrix [133, 66]. Using SBI, after estimating the posterior for a specific observation, we can perform sensitivity analysis by computing the eigenvalues and corresponding eigenvectors of the following matrix [134, 135]:

    M = 𝔼θ∼p(θ|xobs) [∇θ p(θ | xobs) ∇θ p(θ | xobs)ᵀ],

which is then eigendecomposed as M = QΛQ⁻¹. A large eigenvalue in the so-called active subspaces [136] indicates that the gradient of the posterior is large in the corresponding direction, suggesting that the system output is sensitive to changes along that eigenvector.

2.4. Evaluation of posterior fit

To assess the reliability of Bayesian inference using synthetic data, we evaluate the posterior z-scores (denoted by z) against the posterior shrinkage (denoted by s), as defined by [75]:

    z = |μpost − θ| / σpost,        (13)

    s = 1 − σ²post / σ²prior,        (14)

where μpost and θ are the posterior mean and the true values, respectively, σprior is the standard deviation of the prior, and σpost is the standard deviation of the posterior.

The z-score quantifies how far the posterior mean of a parameter lies from a reference value (e.g., the true value), scaled by the posterior standard deviation. The shrinkage quantifies how much the posterior distribution has contracted relative to the initial prior distribution after learning from data. A small z-score indicates that the posterior estimate is close to the true value, reflecting accurate inference. A large shrinkage value suggests that the posterior is sharply concentrated, indicating that the parameter is well-identified. According to these definitions, an ideal Bayesian inference is characterized by z-scores close to zero and posterior shrinkage values close to one, reflecting both accuracy and reliability in the inferred parameters.
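From posterior samples, these two diagnostics are straightforward to compute; a minimal helper (our own sketch, not part of VBI’s documented API):

    import numpy as np

    def z_score_and_shrinkage(post_samples, prior_samples, theta_true):
        """Posterior z-score and shrinkage (Eqs. 13-14), per parameter."""
        mu_post = post_samples.mean(axis=0)
        sd_post = post_samples.std(axis=0)
        sd_prior = prior_samples.std(axis=0)
        z = np.abs(mu_post - theta_true) / sd_post    # accuracy of the estimate
        s = 1.0 - (sd_post / sd_prior) ** 2           # contraction from the prior
        return z, s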

2.5. Flexible simulator and model building

A key feature of the VBI pipeline is its modularity and flexibility in integrating various simulators (see Figure 2). The Simulation module of the VBI pipeline is designed to be easily interchangeable, allowing researchers to replace it with other simulators, such as TVB [18], Neurolib [137], Brian [138], or BrainPy [139]. This adaptability supports a wide range of simulation needs and computational environments, making VBI a versatile tool for inference in systems neuroscience. In particular, the Simulation module offers a comprehensive implementation of commonly used whole-brain models. This is a customized version of the implementation from the open-source TVB simulator. While VBI does not encompass all the features of the original TVB, it is mainly designed to leverage the computational power of GPU devices and significantly reduce RAM requirements (see Figure S1A). This optimization ensures that high-performance clusters can be fully utilized, enabling parallel and scalable simulations, as is often required to perform scalable SBI.

Flowchart of the VBI Structure.

This toolkit consists of three main modules: (1) The Simulation module, implementing various whole-brain models, such as Wilson-Cowan (WCo), Jansen-Rit (JR), Stuart-Landau (SL), Epileptor (EPi), Montbrió (MPR), and Wong-Wang (WW), across different numerical computing libraries (C++, Cupy, PyTorch, Numba). (2) The Features module, offering an extensive toolbox for extracting low-dimensional data features, such as spectral, temporal, connectivity, statistical, and information theory features. (3) The Inference module, providing neural density estimators (such as MAF and NSF) to approximate the posterior of parameters.

2.6. Comprehensive feature extraction

VBI offers a comprehensive toolbox for feature extraction across various datasets. The Features module includes, but is not limited to: (1) Statistical features, including the mean (average of elements), variance (spread of the elements around the mean), kurtosis (tailedness of the distribution of elements), and skewness (asymmetry of the distribution of elements), which can be applied to any matrix. (2) Spectral features, such as low-dimensional summary statistics of the power spectral density (PSD). (3) Temporal features, including zero crossings, area under the curve, average power, and envelope. (4) Connectivity features, including functional connectivity (FC), which represents the statistical dependencies or correlations between activity patterns of different brain regions, and functional connectivity dynamics (FCD), which captures the temporal variations and transitions in these connectivity patterns over time. These calculations are performed for the whole brain and/or subnetworks (e.g., the limbic system, resting-state networks). However, since these matrices are still high-dimensional, we use standard dimensionality reduction techniques, such as principal component analysis (PCA) on FC/FCD matrices, to extract their associated low-dimensional summary statistics. (5) Information theory features, such as mutual information and transfer entropy. Following [69], we use the term spatio-temporal data features to refer to both statistical and temporal features derived from time series. In contrast, we refer to the connectivity features extracted from FC/FCD matrices as functional data features.
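As an illustration of the functional data features, the following sketch computes moment-based summaries of FC and FCD from a (regions x time) array; the window length and the particular statistics are illustrative choices, not necessarily those used in VBI’s Features module.

    import numpy as np

    def fc_fcd_summary(bold, win=30, step=5):
        """Low-dimensional summaries of FC and FCD from regional time series."""
        n, t = bold.shape
        iu = np.triu_indices(n, k=1)
        fc = np.corrcoef(bold)[iu]                   # static FC, upper triangle
        # Sliding-window FC vectors, correlated window-by-window to form FCD.
        wins = np.asarray([np.corrcoef(bold[:, s:s + win])[iu]
                           for s in range(0, t - win, step)])
        fcd = np.corrcoef(wins)[np.triu_indices(len(wins), k=1)]
        # Moment-based summary statistics used as training features.
        return np.array([fc.mean(), fc.std(), fcd.mean(), fcd.std()])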

The Features module uses parallel multiprocessing to speed up feature calculation. Additionally, it provides flexibility for users to add their own custom feature calculations with minimal effort and expertise, or to adjust the parameters of existing features based on the type of input time series. The feature extraction module is designed to be interchangeable with existing feature extraction libraries such as tsfel [140], pyspi [141], hctsa [142], and scikit-learn [143]. Note that some lightweight libraries, such as catch22 [144], are directly accessible from the VBI feature extraction module.

2.7. VBI workflow

Figure 1 illustrates an overview of our approach in VBI, which combines virtual brain models and simulation-based inference to make probabilistic predictions on brain dynamics from (source-localized) neuroimaging recordings. The inputs to the pipeline include structural imaging data (for building the connectome), functional imaging data such as (s)EEG/MEG and fMRI as the target for fitting, and prior information as a plausible range over control parameters for generating random simulations. The main computational costs involve model simulations and data feature extraction. The output of the pipeline is the joint posterior distribution of control parameters (such as excitability, synaptic weights, or effective external input) that best explains the observed data. Since the approach is amortized (i.e., it learns across all combinations in the parameter space), it can be readily applied to any new data from a specific subject.

In the first step, non-invasive brain imaging data, such as T1-weighted MRI and diffusion-weighted MRI (DW-MRI), are collected for a specific subject (Figure 1A). T1-weighted MRI images are processed to obtain the brain parcellation, while DW-MRI images are used for tractography. Using the estimated fiber tracts and the defined brain regions from the parcellation, the connectome (i.e., the complete set of links between brain regions) is constructed by counting the fibers connecting all regions. The SC matrix, with entries representing the connection strength between brain regions, forms the structural component of the virtual brain, which constrains the generation of brain dynamics and functional data at arbitrary brain locations (e.g., cortical and subcortical structures).

Subsequently, each brain network node is equipped with a computational model of average neuronal activity, known as a neural mass model (see Figure 1B and subsection 2.1). These can be represented in the generic form of a dynamical system, ẋ(t) = f(x, θ, Iinput), with the system variables x (such as membrane potential and firing rate), the control parameters θ (such as excitability), and the input current Iinput (such as stimulation). This integration of mathematical mean-field modeling (neural mass models) with anatomical information (connectome) allows us to efficiently analyze functional neuroimaging modalities at the whole-brain level.

To quantify the posterior distribution of control parameters given a set of observations, p(θ | xobs), we first need to define a plausible range for the control parameters based on background knowledge, i.e., a simple base distribution known as the prior, p(θ). We draw random samples from the prior and provide them as input to the VBI simulator (implemented by the Simulation module) to generate simulated time series associated with neuroimaging recordings, as shown in Figure 1C. Subsequently, we extract low-dimensional data features (implemented by the Features module), as shown in Figure 1D for FC/FCD/PSD, to prepare the training dataset {(θi, xi)}, with a budget of Nsim simulations. Then, we use a class of deep neural density estimators, such as MAF or NSF models, as schematically shown in Figure 1E, to learn the amortized posterior qϕ(θ | x). Finally, we can readily sample from qϕ(θ | xobs), which determines the probability distribution in parameter space that best explains the observed data.

Figure 2 depicts the structure of the VBI toolkit, which consists of three main modules. The first module, referred to as the Simulation module, is designed for fast simulation of whole-brain models, such as Wilson-Cowan (subsubsection 2.1.1), Jansen-Rit (subsubsection 2.1.2), Stuart-Landau (subsubsection 2.1.3), Epileptor (subsubsection 2.1.4), Montbrió (subsubsection 2.1.5), and Wong-Wang (subsubsection 2.1.6). These whole-brain models are implemented across various numerical computing libraries such as Cupy (GPU-accelerated computing with Python), C++ (a high-performance systems programming language), Numba (a just-in-time compiler for accelerating Python code), and PyTorch (an open-source machine learning library for creating deep neural networks).

The second module, Features, provides a versatile tool for extracting low-dimensional features from simulated time series (see subsection 2.6). The features include, but are not limited to, spectral, temporal, connectivity, statistical, and information-theoretic features, along with the associated summary statistics. The third module focuses on Inference, i.e., training the deep neural density estimators, such as MAF and NSF (see subsection 2.2), to learn the joint posterior distribution of control parameters.
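Conceptually, the three modules chain together as sketched below; every function name here is an illustrative placeholder rather than VBI’s actual API.

    # Illustrative glue for the three-module workflow; `prior_sample`,
    # `simulate_batch`, `extract_features`, and `train_posterior` are
    # hypothetical placeholders, not VBI's documented API.
    import numpy as np

    def run_pipeline(prior_sample, simulate_batch, extract_features,
                     train_posterior, n_sim=10_000):
        theta = prior_sample(n_sim)                              # prior samples
        series = simulate_batch(theta)                           # Simulation module
        x = np.stack([extract_features(ts) for ts in series])    # Features module
        return train_posterior(theta, x)                         # Inference module (MAF/NSF)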

3. Results

In the following, we demonstrate the capability of VBI for inference on state-of-the-art whole-brain network models using in-silico testing, where the ground truth is known. We apply this approach to simulate neural activity and associated measurements, including (s)EEG/MEG and fMRI, while also providing diagnostics for the accuracy and reliability of the estimation. Note that for (s)EEG/MEG neuroimaging, we perform inference at the regional level rather than at the sensor level, whereas for fMRI, neural activity is mapped to BOLD signals using the Balloon-Windkessel model (see subsubsection 2.1.7). The results presented are based on synthetic data generated using a set of predefined parameters, referred to as the ground truth, randomly selected within biologically plausible ranges and incorporating a certain level of heterogeneity.

We first demonstrate inference on the whole-brain network model of Wilson-Cowan (Eq. 1), which is capable of generating a wide range of oscillatory dynamics depending on the control parameters. Specifically, we estimate the bifurcation parameters Pi ∈ ℝ88, representing the external input to each excitatory population, and the global excitatory coupling parameter ge. Figure 3A and B present the observed and predicted EEG-like signals, represented by the activity of the excitatory populations E across regions. Panels C and D show the corresponding power spectral density (PSD), which serves as the data feature. Panels E and F illustrate the inferred posterior distributions for parameters Pi and ge, respectively, given θ = {ge, Pi} ∈ ℝ89. For training, we conducted 250k random simulations from uniform priors ge ∼ 𝒰(0, 3) and Pi ∼ 𝒰(0, 3) (shown in blue; see Table 1). After approximately 2 hours of training using MAF density estimators, posterior sampling was completed within a few seconds. Due to the large number of simulations and the informativeness of the data features, we achieved accurate estimation of the high-dimensional and heterogeneous control parameters. Ground-truth values (shown in green) are well recovered, leading to close agreement between the observed and predicted PSDs of the signals. Finally, Figure 3G reports the posterior shrinkage and z-score metrics used to evaluate the quality of the parameter estimation. The results indicate that the inferred posteriors are both precise and well-centered around the ground-truth values, as reflected by high shrinkage and low z-scores. See Figure S2 for estimation over other configurations. Moreover, Figure S3 and Figure S4 indicate the higher accuracy of NSF, though with substantially higher computational cost for training compared to MAF.

Bayesian inference on heterogeneous control parameters in the whole-brain network of Wilson-Cowan model.

The set of inferred parameters is θ = {G, Pi}, with the global scaling parameter G and the average external input current Pi to the excitatory population per region, given i ∈ {1, 2, …, Nn = 88} parcelled regions. Summary statistics of the power spectral density (PSD) were used for training of the MAF density estimator, with a budget of 260k simulations. (A) and (B) illustrate the observed and predicted neural activities, respectively. (C) and (D) show the observed and predicted PSDs as the data features. (E) and (F) display the posterior distribution of Pi per region, and the global coupling G, respectively. The ground truth and prior are represented by a vertical green line and a blue distribution, respectively. (G) shows the inference evaluation using posterior shrinkage and z-score.

Then, we demonstrate inference on heterogeneous control parameters in the whole-brain network of Jansen-Rit (see subsubsection 2.1.2), commonly used for modeling EEG/MEG data, e.g., in dementia and Alzheimer’s disease [145, 146]. Figure 4A and B show the observed and predicted EEG signals, given by (y1i − y2i) at each region, while Figure 4C and D illustrate the observed and predicted features, such as the PSD, respectively. Figure 4E and F show the estimated posterior distributions of the synaptic connections Ci and the global coupling parameter G, respectively, given the set of unknown parameters θ = {G, Ci}. Here we conducted 50k random simulations with samples drawn from uniform priors G ∼ 𝒰(0, 5) and Ci ∼ 𝒰(100, 650) (shown in blue; see Table 2). After approximately 45 min of training (MAF density estimator), posterior sampling took only a few seconds. With a sufficient number of simulations and informative data features, VBI shows accurate estimation of high-dimensional heterogeneous parameters (given the ground truth, shown in green), leading to a strong correspondence between the observed and predicted PSD of EEG/MEG data. Figure 4G displays the shrinkage and z-score as evaluation metrics, indicating an ideal Bayesian estimation for the Ci parameters, but not for the coupling parameter G. This occurred because the network input did not induce a significant change in the intrinsic frequency of activities at the regional level, resulting in diffuse uncertainty in its estimation for this model.

Bayesian inference on heterogeneous control parameters in the whole-brain network of Jansen-Rit model.

The set of inferred parameters is θ = {G, Ci}, with the global scaling parameter G and the average number of synapses Ci between neural populations per region, given i ∈ {1, 2, …, Nn = 88} parcelled regions. Summary statistics of the power spectral density (PSD) were used for training, with a budget of 50k simulations. (A) and (B) illustrate the observed and predicted neural activities, respectively. (C) and (D) show the observed and predicted data features, such as the PSD. (E) and (F) display the posterior distribution of Ci per region, and the global coupling G, respectively. The ground truth and prior are represented by vertical green lines and a blue distribution, respectively. (G) shows the inference evaluation using the shrinkage and z-score of the estimated posterior distributions.

Note that relying only on the alpha peak while excluding other summary statistics, such as total power (i.e., area under the curve), leads to poor estimation of synaptic connections across brain regions (see Figure S5). This results in less accurate predictions of the PSD, with more dispersion in their amplitudes. This example demonstrates that VBI provides a valuable tool for hypothesis evaluation and improved insight into data features through uncertainty quantification, and into their impact on predictions.

To demonstrate efficient inference on whole-brain time delays from EEG/MEG data, we used a whole-brain network model of coupled generic oscillators (Stuart-Landau model; see subsubsection 2.1.3). This model can establish a causal link between empirical spectral changes and the slower conduction velocities observed in multiple sclerosis patients, resulting from immune system attacks on the myelin sheath [49, 50]. The parameter set to estimate is θ = {G, V}, consisting of the global scaling parameter G and the averaged velocity of signal transmission V. The training was performed using a budget of only 2k simulations, which was sufficient due to the low dimensionality of the parameter space. Figure 5A illustrates the comparison between observed (in green) and predicted neural activities (in red). Figure 5B shows a close agreement between the observed and predicted PSD signals, used as the data feature for training. Figure 5C and D provide visualizations of the posterior distributions for the averaged velocity V and the global coupling G. In these panels, we can see a large shrinkage of the posterior (in red) relative to the uniform prior (in blue), centered around the true values (vertical green lines). Importantly, Figure 5E, presenting the joint posterior distribution, indicates a high correlation of ρ = 0.7 between parameters G and V. This illustrates the advantage of Bayesian estimation in identifying statistical relationships between parameters, which helps to detect degeneracy among them. This is crucial for causal hypothesis evaluation and for guiding conclusions in clinical settings. Finally, Figure 5F illustrates the sensitivity analysis (based on the eigenvalues of the posterior distribution), revealing that the posterior is more sensitive to changes in V than in G. This highlights the relative impact of these parameters on the model’s posterior estimates.

Bayesian inference on global scaling parameter G and the averaged velocity V of signal transmission using the whole-brain network model of Stuart-Landau oscillators.

The set of estimated parameters is θ = {G, V}, and summary statistics of the PSD, with a budget of 2k simulations, were used for training. (A) illustrates exemplary observed and predicted neural activities (in green and red, respectively). (B) shows the observed and predicted PSD signals (in green and red, respectively). (C) and (D) display the posterior distribution of the averaged velocity V and the global coupling G, respectively. The true values and prior are shown as vertical green lines and a blue distribution, respectively. (E) shows the joint posterior distribution, indicating a high correlation between posterior samples. (F) illustrates the sensitivity analysis based on the eigenvalues of the posterior distribution.

Next, we demonstrate inference on a whole-brain model of epilepsy spread, known as the Virtual Epileptic Patient (VEP; [47, 76]), used to delineate the epileptogenic and propagation zone networks from invasive sEEG recordings (subsubsection 2.1.4). Here, we used a large value for the system time constant, τ = 90 ms (see Table 4), to generate slow-fast dynamics in pathological areas, corresponding to the seizure envelope at each brain region. Figure 6 demonstrates the inference on the set of parameters θ = {G, ηi}, with the global scaling parameter G and the spatial map of epileptogenicity ηi, given i ∈ {1, 2, …, Nn = 88} parcelled regions. Figure 6A and B show the observed and predicted envelopes, respectively, at each brain region. Here, the whole-brain regions are classified into two epileptogenic zones (in red, corresponding to high excitability), three propagation zones (in yellow, corresponding to excitability close to bifurcation), and the rest as healthy regions (in green, corresponding to low excitability). Figure 6C and D illustrate the observed and predicted data features as the total power per region, calculated as the area under the curve. Additionally, the seizure onset at each region was used as a data feature for training the MAF density estimator. From these panels, we observe accurate recovery of seizure envelopes in pathological regions. Figure 6E and F show the posterior distributions of the heterogeneous ηi and the global coupling parameter G, respectively, indicating 100% accurate recovery of the true values (in green). Figure 6G confirms the reliability and accuracy of the estimates through shrinkage and z-score diagnostics. With our efficient implementation, generating 10k whole-brain simulations took less than a minute (using 10 CPU cores). The training took approximately 13 minutes to converge, while posterior sampling required only a few seconds. See Figure S6 for a similar analysis with a faster time-scale separation (τ = 10 ms). These results demonstrate an ideal and fast Bayesian estimation, despite the stiffness of the equations in each region and the high dimensionality of the parameters. See Figure S7 and Figure S8 showing the accuracy and reliability of the estimation under different levels of additive and dynamical noise. Note that for the VEP model, the total integration time is less than 100 ms, and due to the model’s stable behavior and large integration time step, the simulation cost is significantly lower compared to other whole-brain models.

Bayesian inference on the spatial map of epileptogenicity across brain regions in the VEP model.

The set of inferred parameters is θ = {G, ηi}, with the global scaling parameter G and the spatial map of epileptogenicity ηi, given i ∈ {1, 2, …, Nn = 88} parcelled regions. (A) The observed seizure envelope generated by the Epileptor model, given two regions as epileptogenic zones (in red) and three regions as propagation zones (in yellow), while the rest are healthy (in green). (B) The predicted seizure envelope, obtained by training the MAF model on a dataset containing 10k simulations, using only the total power and seizure onset per region as the data features. (C) and (D) show the observed and predicted data features, respectively. (E) and (F) show the posterior distributions of the heterogeneous control parameters ηi and the global coupling parameter G, respectively. (G) The posterior z-scores versus posterior shrinkages for the estimated parameters.

Targeting fMRI data, we demonstrate inference on whole-brain dynamics using the Montbrió model (see subsubsection 2.1.5). Figure 7 demonstrates the inference on heterogeneous control parameters of the Montbrió model, operating in a bistable regime (Table 5). Figure 7A and B show the observed and predicted BOLD time series, respectively, while Figure 7C and D illustrate the observed and predicted data features, such as the static and dynamical functional connectivity matrices (FC and FCD, respectively). Figure 7E and F show the estimated posterior distributions of the excitability ηi per brain region, and the global coupling parameter G. Figure 7G displays the reliability and accuracy of the estimation through the evaluation of posterior shrinkage and z-score (see Equations 13 and 14). See Figure S9 for estimation over different configurations of the ground-truth values in this model.

Bayesian inference on heterogeneous control parameters in the whole-brain dynamics using Montbrió model.

The set of inferred parameters is θ = {G, ηi}, with the global scaling parameter G and the excitability ηi per region, given i ∈ {1, 2, …, Nn = 88} parcelled regions. VBI provides accurate and reliable posterior estimation using both spatio-temporal and functional data features for training, with a budget of 500k simulations. (A) and (B) illustrate the observed and predicted BOLD signals, respectively. (C) and (D) show the observed (upper triangular) and predicted (lower triangular) data features (FC and FCD), respectively. (E) and (F) display the posterior distribution of the excitability parameters ηi per region, and the global coupling G, respectively. The true values and prior are shown as vertical green lines and a blue distribution, respectively. (G) shows the inference evaluation by the shrinkage and z-score of the posterior distributions.

Due to the large number of simulations for training and the informativeness of the data features (both spatio-temporal and functional), the results indicate that we achieved accurate parameter estimation and, consequently, a close agreement between the observed and predicted features of the BOLD data. This required 500k simulations for training (see Figure S10), given the uniform priors G ∼ 𝒰(0, 1) and ηi ∼ 𝒰(−6, −3.5). After approximately 10 h of training (of the MAF density estimator), posterior sampling took only 1 min. Our results indicate that training the MAF model was 2 to 4 times faster than the NSF model (see Figure S1B). Note that removing the spatio-temporal features and considering only FC/FCD as the data features (see Figure S11) leads to poor estimation of the excitability parameter across brain regions (see Figure S12). Interestingly, accurate estimation of only the global coupling parameter G from FC/FCD alone requires around 100 simulations (see Figure S13 and Figure S14). This showcase demonstrates the capability of VBI in inferring heterogeneous excitability, given the bistable brain dynamics, for fMRI studies. See Figure S15 and Figure S16 showing the accuracy and reliability of estimation under different levels of additive and dynamical noise.

Finally, in Figure 8, we show the inference on the so-called pDMF model, i.e., a whole-brain network model of the reduced Wong-Wang equation (see subsubsection 2.1.6), comprising 10 control parameters: the global scaling of connections G and the linear coefficients (aw, bw, cw, aI, bI, cI, aσ, bσ, cσ) ∈ ℝ9. These coefficients reparameterize the regional parameters, namely the recurrent connection strength wi, the external input current Ii, and the noise amplitude σi for each region, thereby reducing 264 regional parameters to 9 dimensions (see Equation 7). Here, we used summary statistics of both spatio-temporal and functional data features extracted from simulated BOLD data to train the MAF density estimator, with a budget of 50k simulations. The training took around 160 min to converge, whereas posterior sampling took only a few seconds.
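The linear reparameterization follows the pattern sketched below, assuming, as stated later in the text, that the regional maps are the T1w/T2w myelin map and the first FC principal gradient; the exact functional form of Equation 7 may differ, so this is illustrative only.

    import numpy as np

    def pdmf_regional_params(coeffs, myelin, fc_gradient):
        # Map the 9 linear coefficients to 3 x Nn regional parameters
        # (a sketch of the Equation 7 pattern; signs and scaling illustrative).
        aw, bw, cw, aI, bI, cI, a_s, b_s, c_s = coeffs
        w = aw * myelin + bw * fc_gradient + cw         # recurrent strength w_i
        I_ext = aI * myelin + bI * fc_gradient + cI     # external input I_i
        sigma = a_s * myelin + b_s * fc_gradient + c_s  # noise amplitude sigma_i
        return w, I_ext, sigma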

Bayesian inference on the parametric mean-field model of Wong-Wang (also known as pDMF model), with linear coefficients (aw, bw, cw, aI, bI, cI, aσ, bσ, cσ) ∈ ℝ9, reparameterizing the recurrent connection strength wi, external input current Ii, and noise amplitude σi for each region.

Summary statistics of spatio-temporal and functional data features were used for training, with a budget of 50k simulations. (A) The diagonal panels display the ground-truth values (in green), the uniform prior (in blue), and the estimated posterior distributions (in red). The upper diagonal panels illustrate the joint posterior distributions between parameters, along with their correlation (ρ, in the upper left corners) and the ground-truth values (green stars). High-probability areas are color-coded in yellow, while low-probability areas are shown in black. (B) The observed and predicted BOLD time series (in green and red, respectively). (C) The observed and predicted data features, such as the FC/FCD matrices. (D) The inference evaluation by calculating the shrinkage and z-score of the estimated posterior distributions.

The diagonal panels in Figure 8A show the estimated posterior distributions (in red), along with the prior (in blue) and true values (in green). The upper diagonal panels illustrate the joint posterior distributions between parameters (i.e., their statistical dependencies). Figure 8B illustrates the observed and predicted BOLD time series, generated by the true and estimated parameters (in green and red, respectively). From Figure 8C, we can see a close agreement between the observed and predicted data features (FC/FCD matrices). Note that due to the stochastic nature of the generative process, we do not expect an exact element-wise correspondence between these features, but rather a match in their summary statistics, such as the mean, variance, and higher-order moments (see Figure S17). Figure 8D shows the posterior z-score versus shrinkage, indicating less accurate estimation for the coefficients cw, cI, and cσ compared to the others, as they are not informed by anatomical data such as the T1w/T2w myelin map and the first FC principal gradient (see Equation 7). This showcase demonstrates the advantage of Bayesian inference over optimization in assessing the accuracy and reliability of parameter estimation, whether or not the parameters are informed by anatomical data.

Note that in the whole-brain network of the Wong-Wang model (Equation 6), the global scaling parameter G and the synaptic coupling J exhibit structural non-identifiability, meaning their combined effects on the system cannot be uniquely disentangled (see Figure S18 and Figure S19). This is evident in the parameter estimations corresponding to selected observations, where the posterior distributions appear diffuse. The joint posterior plots reveal a nonlinear dependency (banana shape) between G and J, arising from their product in the neural mass equation. Such a nonlinear relationship between parameters poses challenges for deriving causal conclusions, as often occurs in other neural mass models. This demonstrates how Bayesian inference exposes such degeneracies directly in the joint posterior, facilitating causal hypothesis testing without requiring a separate non-identifiability analysis.
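Such degeneracies can be read off directly from the joint posterior samples. The snippet below, assuming a hypothetical array of posterior draws for (G, J), quantifies the dependency: the marginals stay diffuse while the product G*J is tightly constrained.

    import numpy as np

    # samples: hypothetical posterior draws of shape (n_samples, 2) for (G, J).
    G, J = samples[:, 0], samples[:, 1]

    print("std(G)   =", G.std())
    print("std(J)   =", J.std())
    print("std(G*J) =", (G * J).std())              # far smaller than either marginal
    print("corr(G, J) =", np.corrcoef(G, J)[0, 1])  # strong negative correlation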

4. Discussion

This study introduces Virtual Brain Inference (VBI), a flexible and integrative toolkit designed to facilitate probabilistic inference on complex whole-brain dynamics using connectome-based models (forward problem) and simulation-based inference (inverse problem). The toolkit leverages a high-performance programming language (C++) and just-in-time compilation of Python code, alongside the computational power of parallel processors (GPUs), to significantly enhance the speed and efficiency of simulations. Additionally, VBI integrates popular feature-extraction libraries with parallel multiprocessing to efficiently convert simulated time series into low-dimensional summary statistics. Moreover, VBI incorporates state-of-the-art deep neural density estimators (such as the MAF and NSF generative models) to estimate the posterior density of control parameters within whole-brain models given low-dimensional data features. Our results demonstrated the versatility and efficacy of the VBI toolkit across commonly used whole-brain network models, such as the Wilson-Cowan, Jansen-Rit, Stuart-Landau, Epileptor, Montbrió, and Wong-Wang equations placed at each region. The ability to perform parallel and rapid simulations, coupled with a taxonomy of feature extraction, allows for detailed and accurate parameter estimation from associated neuroimaging modalities such as (s)EEG/MEG/fMRI. This is crucial for advancing our understanding of brain dynamics and the underlying mechanisms of various brain disorders. Overall, VBI represents a substantial improvement over alternative methods, offering a robust framework for both simulation and parameter estimation, and contributing to the advancement of network neuroscience and, potentially, precision medicine.

The alternatives for parameter estimation include optimization techniques [66, 34, 36, 63, 64], approximate Bayesian computation (ABC), and MCMC sampling. Optimization techniques are sensitive to the choice of the objective function (e.g., minimizing distance error or maximizing correlation) and do not provide estimates of uncertainty. Although multiple runs and thresholding can be used to address these issues, such methods often fall short in revealing relationships between parameters, such as identifying degeneracy, which is crucial for reliable causal inference. ABC, in contrast, compares observed and simulated data using a distance measure based on summary statistics [147, 148, 149]. It is known that ABC methods suffer from the curse of dimensionality, and their performance depends critically on the tolerance level of the acceptance/rejection scheme [71, 70]. Self-tuning variants of MCMC sampling have also been used for model inversion at the whole-brain level [76, 67]. Although MCMC is unbiased and exact in the limit of infinite samples, it can be computationally prohibitive, and sophisticated reparameterization methods are often required to facilitate convergence at the whole-brain level [123, 150]. This becomes more challenging for gradient-based MCMC algorithms, due to the bistability and stiffness of neural mass models. Tailored to Bayes' rule, SBI sidesteps these issues by training expressive deep neural density estimators (such as MAF and NSF) on low-dimensional data features to efficiently approximate the posterior distribution of model parameters. Taking spiking neurons as generative models, this approach has demonstrated superior performance compared to alternative methods, as it does not require the model or data features to be differentiable [71, 81].
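For contrast, a bare-bones ABC rejection scheme looks like the sketch below; its accuracy hinges on the tolerance eps, and acceptance rates collapse as the dimension of the summary statistics grows, which is precisely the curse of dimensionality noted above.

    import numpy as np

    def abc_rejection(sample_prior, simulate, summarize, s_obs, eps, n_draws):
        # Accept parameter draws whose simulated summaries fall within
        # a Euclidean distance eps of the observed summaries.
        accepted = []
        for _ in range(n_draws):
            theta = sample_prior()
            s_sim = summarize(simulate(theta))
            if np.linalg.norm(s_sim - s_obs) < eps:
                accepted.append(theta)
        return np.array(accepted)  # approximate posterior sample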

In previous studies, we demonstrated the effectiveness of SBI on virtual brain models of neurological [68, 49, 50, 54], and neurodegenerative diseases [69, 51, 23] as well as focal intervention [59] and healthy aging [37]. In this work, we extended this probabilistic methodology to encompass a broader range of whole-brain network models, highlighting its flexibility and scalability in leveraging diverse computational resources, from CPUs/GPUs to high-performance computing facilities.

Our results indicated that the VBI toolkit effectively estimates posterior distributions of control parameters in whole-brain network modeling, offering a deeper understanding of the mechanisms underlying brain activity. For example, using the Montbrió and Wong-Wang models, we achieved a close match between observed and predicted FC/FCD matrices derived from BOLD time series (Figure 7 and Figure 8). Additionally, the Jansen-Rit and Stuart-Landau models provided accurate inferences of PSD from neural activity (Figure 4 and Figure 5), while the Epileptor model precisely captured the spread of seizure envelopes (Figure 6). These results underscore the toolkit's capability to manage complex, high-dimensional data with precision. Uncertainty quantification using VBI can illuminate and combine the informativeness of data features (e.g., FC/FCD) and reveal the causal drivers behind interventions [37, 69, 59]. This adaptability ensures that VBI can be applied across various (source-localized) neuroimaging modalities, accommodating different computational capabilities and research needs.

Note that there is no specific rule for determining the optimal number of simulations required for training. In general, a larger number of simulations, subject to the available computational resources, tends to improve the quality of posterior estimation, and the requirement also depends critically on the parameter dimensionality. When using synthetic data, we can monitor the posterior z-score and shrinkage to assess the accuracy and reliability of the inferred parameters (see Figure S10). For instance, in estimating only the global coupling parameter, a maximum of 300 simulations sufficed for accurate estimation across models and different configurations (see Figure S20), except for the Jansen-Rit model, where coupling did not induce a significant change in the intrinsic frequency of regional activity. Importantly, the choice of data features is critical, and factors that corrupt feature calculation can cause the method to fail. For instance, high levels of observational or dynamical noise can compromise the accurate calculation of data features, undermining the inference process (see Figure S7, Figure S8, Figure S15, Figure S16). Identifying the set of low-dimensional data features that are relevant to the control parameters for each case study is another challenge in effectively applying SBI; nevertheless, the uncertainty of the posterior informs us about the predictive power of these features. Statistical moments of time series could be effective candidates for most models. However, this poses a formidable challenge for inference from empirical data, as certain moments, such as the mean and variance, may be lost during preprocessing steps. Hyperparameter and noise estimation can also be challenging for SBI.
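With synthetic data, the two diagnostics can be computed directly from posterior samples; a minimal sketch following the standard definitions (cf. Equations 13 and 14) is shown below, with the prior variance available in closed form for uniform priors.

    import numpy as np

    def posterior_shrinkage(post_samples, prior_var):
        # s = 1 - var(posterior) / var(prior); s close to 1 means the
        # data were informative about this parameter.
        return 1.0 - post_samples.var() / prior_var

    def posterior_zscore(post_samples, theta_true):
        # z = |mean(posterior) - theta_true| / std(posterior); small z means
        # the ground truth lies well inside the posterior mass.
        return abs(post_samples.mean() - theta_true) / post_samples.std()

    # Example for G ~ U(0, 1): the variance of a uniform on [a, b] is (b - a)^2 / 12.
    # shrinkage = posterior_shrinkage(g_samples, prior_var=1.0 / 12.0)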

Various sequential methods, such as SNPE [125], SNLE [151], and SNRE [152], have been proposed to reduce the computational cost of SBI by iteratively refining the fit to specific targets. These approaches aim for more precise parameter estimation by progressively adjusting the model based on each new data set or subset, potentially enhancing the accuracy of the fit at reduced computational effort. The choice of method depends on the specific characteristics and requirements of the problem being addressed [128]. Our previous study indicates that for inferring whole-brain dynamics of epilepsy spread, the SNPE method outperforms alternative approaches [68]. Nevertheless, sequential methods can become unstable, with simulators potentially diverging and causing probability mass to leak into regions that lack prior support [68]. In this study, we used single-round training to benefit from an amortization strategy. This approach brings the costs of simulation and network training upfront, enabling inference on new data to be performed rapidly (within seconds). This strategy facilitates personalized inference at the subject level, as the generative model is tailored by the SC matrix, thereby allowing for rapid hypothesis evaluation specific to each subject (e.g., in delineating the epileptogenic and propagation zones). Note that model comparison across different configurations or model structures, as well-established in dynamic causal modeling [153, 154, 155], has yet to be explored in this context.
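In code, amortization reduces subject-level inference to a conditioning step: the trained posterior is simply evaluated at each new subject's feature vector, with no retraining. A hypothetical sketch (the feature vectors x_subj1 and x_subj2 are placeholders, and posterior is a trained object as in the earlier sketch):

    # Single-round training is done once, upfront; then, for each subject:
    for x_subject in (x_subj1, x_subj2):
        # Subject-specific posterior samples in seconds, no new optimization.
        samples = posterior.sample((10_000,), x=x_subject)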

Deep learning algorithms are increasingly gaining traction in the context of whole-brain modeling. The VBI toolkit leverages a class of deep generative models, called Normalizing Flows (NFs; [130, 131]), to model probability distributions given samples drawn from those distributions. Using NFs, a base probability distribution (e.g., a standard normal) is transformed into a complex (potentially multimodal) distribution through a sequence of invertible transformations. Variational autoencoders (VAEs; [156, 157]) are a class of deep generative models that encode data into a latent space and then decode it back to reconstruct the original data. Recently, Sip et al. [158] introduced a method using VAEs for nonlinear dynamical system identification, enabling the inference of neural mass models and region- and subject-specific parameters from functional data. VAEs have also been employed for dimensionality reduction of whole-brain functional connectivity [42], and to investigate various pathologies and their severity by analyzing the evolution of trajectories within a low-dimensional latent space [52]. Additionally, Generative Adversarial Networks (GANs; [159, 160]) have demonstrated remarkable success in mapping latent space to data space by learning a manifold induced from a base density [129]. This method merits further exploration within the context of whole-brain dynamics. To fully harness the potential of deep generative models in large-scale brain network modeling, integrating VAEs and GANs into the VBI framework would be beneficial. This would elucidate their strengths and limitations within this context and guide future advancements in the field.
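The mechanism underlying NFs is the change-of-variables formula for probability densities. As a minimal illustration, the sketch below pushes a standard normal base density through a single fixed affine map rather than a learned flow; stacking many such invertible maps with learnable parameters is what lets a flow represent complex, multimodal posteriors.

    import numpy as np
    from scipy.stats import norm

    # Base density p_z = N(0, 1); invertible map x = f(z) = a * z + b.
    a, b = 2.0, 1.0
    x = np.linspace(-5.0, 7.0, 5)

    # Change of variables: p_x(x) = p_z(f^{-1}(x)) * |d f^{-1} / dx|.
    p_x = norm.pdf((x - b) / a) / abs(a)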

In summary, VBI offers fast simulations, a taxonomy of feature extraction, and deep generative models, making it a versatile tool for model-based inference from different neuroimaging modalities and helping researchers to explore brain (dys)function in greater depth. This advancement not only enhances our theoretical understanding but also holds promise for practical applications in diagnosing and treating neurological conditions.

5. Glossary of technical terms

  • Bayes' rule: A fundamental belief-updating principle that calculates the probability of a hypothesis given new evidence.

  • Deep neural density estimators: A class of artificial neural network-based approaches that are used to learn and approximate the underlying probability distribution from a given dataset.

  • Control (generative) parameters: The bifurcation parameters, settings, or configurations within a generative model that control the synthesis of data and potentially represent causal relationships.

  • Generative model: A statistical, machine learning, or mechanistic model that represents the underlying data distribution to generate new data resembling the original dataset.

  • Likelihood: The conditional probability of observing the evidence given a particular hypothesis.

  • Markov chain Monte Carlo: A family of stochastic algorithms used for uncertainty quantification by drawing random samples from probability distributions, in which the sampling process does not require knowledge of the entire distribution.

  • Prior: The initial probability assigned to a hypothesis before considering new evidence.

  • Probability distribution: The statistical description of potential outcomes of random events, where a numerical measure is assigned to the possibility of each specific outcome.

  • Probability density estimation: The process of inferring the underlying probability distribution of a random event based on observed data.

  • Posterior: The updated probability of a hypothesis after taking into account both prior beliefs and observed evidence.

  • Simulation-based inference: A machine learning approach that generates synthetic data through forward simulations to make inferences about complex systems, typically when the likelihood function is analytically or computationally intractable.

  • Whole-brain or virtual brain models: Computational models informed by personalized anatomical data, i.e., a set of equations describing regional brain dynamics placed at each node, coupled through the structural connectivity matrix.

Information Sharing Statement

All code is available on GitHub (https://github.com/ins-amu/vbi).

Supplementary figures

(A) Benchmarking the simulation cost of the whole-brain network of the Montbrió model, with 2.5M time steps, on an Intel(R) Core(TM) i9-10900 CPU at 2.80 GHz (in red) and an NVIDIA RTX A5000 GPU (in blue). GPUs deliver substantial speedups of up to 100x over multi-core CPUs. (B) Comparing the MAF and NSF density estimators: MAF is typically 2-4 times faster than NSF during training.

Inference across different configurations of heterogeneous generative parameters in the whole-brain network model based on the Wilson-Cowan framework.

The set of unknown control parameters is θ = {G, Pi}, comprising the global coupling parameter G and the external input current Pi per region. We observe accurate posterior estimation across different scenarios obtained by varying the external input current map across brain regions. Features derived from the power spectral density were used for training, with a simulation budget of 260k. The regions are color-coded by their current values: blue indicates low, green medium, and red high external input current, and the size of each region reflects the shrinkage of the posterior distribution of the corresponding inferred parameter. Top: axial view. Bottom: sagittal view. The three configurations (from left to right) correspond to: randomly assigned heterogeneous Pi values, increased Pi for the five strongest nodes, and increased Pi for the five weakest nodes.

Bayesian inference on heterogeneous control parameters in the whole-brain network of the Wilson-Cowan model, as in Figure 3, but ignoring the spatial information of the data features.

Here, we observe that the MAF density estimator leads to a multimodal posterior distribution for low values of Pi, when trained with a reduced set of data features, suggesting increased uncertainty or potential non-identifiability under limited information.

Same as Figure S3, but using the NSF density estimator.

Here, we observe more accurate estimation; however, NSF required approximately 6 hours of training compared to 20 minutes for MAF.

Inference on heterogeneous generative parameters in the whole-brain network of the Jansen-Rit model (Equation 2).

The set of inferred parameters is θ = {G, Ci} ∈ ℝ89, consisting of the global scaling parameter G and the average number of synapses between neural populations Ci per region, with i ∈ {1, 2, …, Nn = 88} parcelled regions. For panels A to C, only the area under the curve of the power spectral density (PSD) in the [0, 30] Hz band was used, while for panels D to F, additional statistical features of the PSD were included in the training, with a total budget of 50k simulations. (A) and (D) illustrate the observed and predicted EEG signals. (B) and (E) show the observed and predicted PSD signals. (C) and (F) display the posterior distribution of Ci for each region. The true values are indicated by vertical green lines, and the prior distribution is shown in blue.

Bayesian inference on the spatial map of epileptogenicity across brain regions in the VEP model (Equation 4), with slow time scale separation, τ = 10 ms.

(A) The observed seizure envelope generated by the 2D Epileptor model, given two regions as epileptogenic zones (in red) and three regions as propagation zones (in yellow), while the rest are healthy (in green). (B) The predicted seizure envelope, obtained by training the MAF density estimator on a dataset containing 10k simulations, using only the total power per region as the data features. (C) and (D) show the observed and predicted data features, respectively. (E) and (F) show the posterior distribution of the heterogeneous ηi, and the global coupling parameter G, respectively. (G) The posterior z-scores versus posterior shrinkages for estimated parameters.

Evaluating the performance of estimation of the global coupling parameter under different levels of additive noise in the VEP model.

(A) Simulated time series under different levels of additive noise. (B) and (C) Posterior shrinkage and z-score, respectively, as a function of noise level σ. (D) Ground-truth (in green), prior (in blue), and posterior distributions (in red) for the global coupling parameter. We observe that at higher noise levels (e.g., σ = 0.3 and σ = 0.6), the ground truth is not captured within the posterior, leading to increased z-scores compared to the noise-free case σ = 0.

Evaluating the accuracy of dynamical noise estimation in the VEP model.

(A) Simulated time series generated under different levels of dynamical noise. (B) and (C) Posterior shrinkage and z-score, respectively, across different scenarios, demonstrating the accuracy and reliability of estimating the dynamical noise parameter. (D) Ground-truth (in green), prior (in blue), and posterior distributions (in red) for different realizations of the dynamical noise parameter σ in the system. We observe robust estimation even at high levels of dynamical noise, though for σ ≥ 1.6 the posterior sampling fails (shaded area).

Inference over different configurations of heterogeneous generative parameters in the whole-brain network of the Montbrió model (see Figure 7).

The set of inferred parameters is θ = {G, ηi} ∈ ℝ89, comprising the global scaling parameter G and the excitability ηi per region, with i ∈ {1, 2, …, Nn = 88} parcelled regions.

Evaluating the reliability and accuracy of estimation of parameter G with respect to the number of simulations used for training the MAF/NSF density estimators: (A) posterior shrinkage and (B) posterior z-score versus the number of simulations.

Scatter plot of summary statistics from the FC/FCD matrices in the whole-brain network of Montbrió model (Equation 5).

Inference on heterogeneous generative parameters in the whole-brain network of the Montbrió model (Equation 5), estimating the parameter set θ = {G, ηi} from FC/FCD matrices.

Here, G is the global scaling parameter, and ηi represents the excitability of each brain region, where i ∈ {1, 2, …, Nn = 88} corresponds to the parcelled regions. Relying only on functional data features (FC/FCD) for training proves insufficient to accurately estimate the posterior distribution of heterogeneous excitabilities, even with a budget of 500k simulations. In (A), the posterior estimates are shown in red, the true values of ηi in green, and the prior distribution in blue. Panels (B) and (C) show the observed and predicted FC/FCD matrices, respectively, as the data features. The simulations to generate the training data took approximately 24 hours, utilizing 10 GPUs concurrently. The training process lasted around 8 hours, while sampling required just a few seconds.

Inference of the global coupling parameter in the whole-brain network of the Montbrió model (Equation 5) using functional data features, with a budget of only 100 simulations.

(A) The uniform prior distribution of G (in blue), the estimated posterior (in red), and the true value (in green). (B) and (C) The observed and predicted FC/FCD matrices, respectively.

Inference of the global coupling parameter in the whole-brain model of Montbrió (Equation 5) using the TVB simulator, with the summation of FC and FCD matrix elements as low-dimensional data features, constrained by a budget of only 100 simulations.

(A) The uniform prior distribution of G (in blue), the estimated posterior (in red), and the true value (in green). (B) and (C) show the observed and predicted FC/FCD matrices, respectively.

Evaluating the performance of estimation of the global coupling parameter under different levels of noise added to the BOLD signal in the Montbrió model.

(A) and (B) Simulated time series and the corresponding FCD matrices, respectively, under different levels of additive noise. (C) and (D) Posterior shrinkages and z-scores, respectively, for different noise values. (E) Ground-truth (in green), prior (in blue), and posterior distributions (in red) for the estimation of the coupling parameter G under different levels of noise added to the BOLD signals, indicating robust inference. However, for very high noise values (σ ≥ 0.45), the posterior sampling fails (shaded area).

Evaluating the accuracy of dynamical noise estimation in the Montbrió model.

(A) and (B) Simulated time series and the corresponding FCD matrices, respectively, under different levels of dynamical noise. (C) and (D) Posterior shrinkages and z-scores, respectively, for different noise values. (E) Ground-truth (in green), prior (in blue), and posterior distributions (in red) for different realizations of the dynamical noise parameter σ in the system. We observe robust estimation even at high levels of dynamical noise.

The distribution of observed and predicted functional connectivity (FC) in panel (A) and of functional connectivity dynamics (FCD) in panel (B), as demonstrated in Figure 8.

The non-identifiability in estimating G and J in the Wong-Wang model (Equation 6).

A total of 16k simulations and functional data features (see Figure S18) were used for training. The upper row illustrates the FCD variance and the sum of FC matrices plotted against the parameters G and J, respectively, for three selected observations marked by red circles. The lower panels display the parameter estimations corresponding to the selected observations. Due to structural non-identifiability in the product of G and J, the posterior distributions appear diffuse, showing a nonlinear dependency (banana shape) in the joint posterior plots.

Scatter plot of summary statistics from the FC/FCD matrices in the whole-brain network of Wong-Wang model (Equation 6).

Estimation of the global coupling parameter across whole-brain models, evaluated for multiple configurations (each represented by a violin plot).

The MAF density estimator was trained using the same features as described for the heterogeneous case in the main text. The simulation budget was limited to a maximum of 300. The green dashed line indicates an ideal estimation, while the gray dashed line represents a linear regression based on the mean values of the posterior sampling distributions. We observe accurate posterior estimation of the coupling parameter across different models and configurations, except for the Jansen-Rit model, where coupling did not induce a significant change in the intrinsic frequency of regional activity. Note that the Montbrió model shows more diffuse estimations compared to the others, due to switching dynamics driven by noise.

Acknowledgements

This project/research has received funding from the European Union's Horizon Europe Programme under the Specific Grant Agreement No. 101147319 (EBRAINS 2.0 Project), No. 101137289 (Virtual Brain Twin Project), No. 101057429 (project environMENTAL), and from a government grant managed by the Agence Nationale de la Recherche under reference ANR-22-PESN-0012 (France 2030 program). We acknowledge the use of Fenix Infrastructure resources, which are partially funded from the European Union's Horizon 2020 research and innovation programme through the ICEI project under the grant agreement No. 800858. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Additional information

Author contributions

Conceptualization: V.J. and M.H. Methodology: A.Z., M.W., L.D., and M.H. Software: A.Z., M.W., L.D., and M.H. Investigation: A.Z. Visualization: A.Z. Supervision: V.J., S.P., and M.H. Funding acquisition: V.J. and M.H. Writing - original draft: A.Z. and M.H. Writing - review & editing: A.Z., M.W., L.D., S.P., V.J., and M.H.

Abbreviations

  • VBI: virtual brain inference

  • BOLD: blood-oxygen-level-dependent

  • fMRI: functional magnetic resonance imaging

  • EEG: electroencephalography

  • MEG: magnetoencephalography

  • sEEG: stereoelectroencephalography

  • SC: structural connectivity

  • FC: functional connectivity

  • FCD: functional connectivity dynamics

  • PSD: power spectral density

  • SBI: simulation-based inference

  • MAF: masked autoregressive flow

  • NSF: neural spline flow

  • MCMC: Markov chain Monte Carlo