Every Model, Every Formula, Runnable Code
You're staring at a Polymarket contract. "Will the Fed cut rates in March?" YES is trading at $0.62.
Your instinct says: that's a 62% probability. Maybe you think it should be 70%. So you buy.
Congratulations. You just did what every retail trader does. You treated a prediction market contract like a coin flip with a known bias, estimated your own bias, and bet the difference.
Problems:
A coin flip has one parameter: p.
A prediction market contract embedded in a portfolio of correlated events, with time-varying information flow, order book dynamics, and execution risk, has dozens.
Every simulation in this article ultimately reduces to Monte Carlo: draw samples from a distribution, compute a statistic, repeat.
The estimator for event probability \( p = P(A) \) is just the sample mean.
The Central Limit Theorem gives you the convergence rate: \( O(N^{-1/2}) \), with variance:
$$\text{Var}(\hat{p}_N) = \frac{p(1-p)}{N}$$

The variance is maximized at \( p = 0.5 \): a contract trading at 50 cents (the most uncertain, most actively traded contract on the platform) is exactly where your Monte Carlo estimates are least precise.
To hit ±0.01 precision at 95% confidence when \( p = 0.50 \): \( N \approx 9{,}604 \)
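The sample-size arithmetic behind that figure is worth having as a one-liner. A minimal sketch (the function name `required_samples` is ours): invert the CLT half-width \( z\sqrt{p(1-p)/N} \le \epsilon \).

```python
import numpy as np
from scipy.stats import norm

def required_samples(p, epsilon, confidence=0.95):
    """Samples needed so the CLT half-width z * sqrt(p(1-p)/N) <= epsilon."""
    z = norm.ppf(0.5 + confidence / 2)  # 1.96 for 95%
    return int(np.ceil(p * (1 - p) * (z / epsilon)**2))

print(required_samples(0.50, 0.01))  # worst case, p = 0.5 → 9604
print(required_samples(0.10, 0.01))  # much cheaper away from 0.5
```

Note how the budget drops with \( p(1-p) \): a contract at 10 cents needs roughly a third of the samples a 50-cent contract does for the same precision.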
Goal: Estimate the probability that an asset-linked binary contract pays off (e.g., "Will AAPL close above $200 by March 15?")
import numpy as np

def simulate_binary_contract(S0, K, mu, sigma, T, N_paths=100_000):
    """
    Monte Carlo simulation for a binary contract.

    S0: Current asset price
    K: Strike / threshold
    mu: Annual drift
    sigma: Annual volatility
    T: Time to expiry in years
    N_paths: Number of simulated paths
    """
    # Simulate terminal prices via GBM
    Z = np.random.standard_normal(N_paths)
    S_T = S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)

    # Binary payoff
    payoffs = (S_T > K).astype(float)

    # Estimate and confidence interval
    p_hat = payoffs.mean()
    se = np.sqrt(p_hat * (1 - p_hat) / N_paths)
    ci_lower = p_hat - 1.96 * se
    ci_upper = p_hat + 1.96 * se

    return {
        'probability': p_hat,
        'std_error': se,
        'ci_95': (ci_lower, ci_upper),
        'N_paths': N_paths
    }

# Example: AAPL at $195, strike $200, 20% vol, 30 days
result = simulate_binary_contract(S0=195, K=200, mu=0.08, sigma=0.20, T=30/365)
print(f"P(AAPL > $200) ≈ {result['probability']:.4f}")
print(f"95% CI: ({result['ci_95'][0]:.4f}, {result['ci_95'][1]:.4f})")
This works. For one contract, with one underlying, assuming lognormal dynamics. Real prediction markets break every one of those assumptions.
def brier_score(predictions, outcomes):
    """Evaluate simulation calibration: mean squared error of probability forecasts."""
    return np.mean((np.array(predictions) - np.array(outcomes))**2)

# Compare two models
model_A_preds = [0.7, 0.3, 0.9, 0.1]  # sharp, confident
model_B_preds = [0.5, 0.5, 0.5, 0.5]  # always uncertain
actual_outcomes = [1, 0, 1, 0]

print(f"Model A Brier: {brier_score(model_A_preds, actual_outcomes):.4f}")  # 0.05
print(f"Model B Brier: {brier_score(model_B_preds, actual_outcomes):.4f}")  # 0.25
A Brier score below 0.20 is good. Below 0.10 is excellent. The best election forecasters (538, Economist) historically achieve 0.06–0.12 on presidential races.
Polymarket hosts contracts on extreme events. "Will the S&P 500 drop 20% in one week?" is trading at $0.003. With crude Monte Carlo at 100,000 samples, you might see zero or one hit.
Your estimate is either 0.00000 or 0.00001 — both useless.
Importance sampling replaces the original probability measure with one that oversamples the rare region, then corrects the bias with a likelihood ratio (Radon-Nikodym derivative).
The practical workhorse is exponential tilting. If your underlying follows a random walk with increments \( \Delta_i \) having moment generating function \( M(\gamma) = E[e^{\gamma\Delta}] \), you tilt the distribution, choosing \( \gamma \) to make the rare event typical. For a contract that pays off when a sum exceeds a large threshold, \( \gamma \) solves the Lundberg equation \( M(\gamma) = 1 \) at its nontrivial root, which exists whenever the increments drift away from the threshold, since \( M \) is convex with \( M(0) = 1 \).
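For a concrete case: with Gaussian increments \( \Delta \sim N(\mu, \sigma^2) \), \( \mu < 0 \), the nontrivial root has the closed form \( \gamma^* = -2\mu/\sigma^2 \), and a root-finder recovers it. A sketch (the function name is ours):

```python
import numpy as np
from scipy.optimize import brentq

def lundberg_root(mu, sigma):
    """Nontrivial root of M(gamma) = 1 for N(mu, sigma^2) increments, mu < 0."""
    cgf = lambda g: mu * g + 0.5 * sigma**2 * g**2  # log M(gamma)
    # Bracket away from the trivial root gamma = 0
    return brentq(cgf, 1e-9, 10 * abs(mu) / sigma**2 + 1.0)

mu, sigma = -0.02, 0.15
gamma_star = lundberg_root(mu, sigma)
print(gamma_star, -2 * mu / sigma**2)  # both ≈ 1.778
```

For non-Gaussian increments there is rarely a closed form, but the same one-dimensional root-find works as long as you can evaluate \( \log M \).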
def rare_event_IS(S0, K_crash, sigma, T, N_paths=100_000):
    """
    Importance sampling for extreme downside binary contracts.
    Example: P(S&P drops 20% in one week)
    """
    K = S0 * (1 - K_crash)  # e.g., 20% crash threshold

    # Original drift (risk-neutral)
    mu_original = -0.5 * sigma**2

    # Tilted drift: shift the mean toward the crash region
    log_threshold = np.log(K / S0)
    mu_tilt = log_threshold / T  # center the distribution on the crash

    Z = np.random.standard_normal(N_paths)

    # Simulate under the TILTED measure
    log_returns_tilted = mu_tilt * T + sigma * np.sqrt(T) * Z
    S_T_tilted = S0 * np.exp(log_returns_tilted)

    # Log likelihood ratio: original density / tilted density
    log_LR = (
        -0.5 * ((log_returns_tilted - mu_original * T) / (sigma * np.sqrt(T)))**2
        + 0.5 * ((log_returns_tilted - mu_tilt * T) / (sigma * np.sqrt(T)))**2
    )
    LR = np.exp(log_LR)

    # IS estimator
    payoffs = (S_T_tilted < K).astype(float)
    is_estimates = payoffs * LR
    p_IS = is_estimates.mean()
    se_IS = is_estimates.std() / np.sqrt(N_paths)

    # Compare with crude MC
    Z_crude = np.random.standard_normal(N_paths)
    S_T_crude = S0 * np.exp(mu_original * T + sigma * np.sqrt(T) * Z_crude)
    p_crude = (S_T_crude < K).mean()
    se_crude = np.sqrt(p_crude * (1 - p_crude) / N_paths) if p_crude > 0 else float('inf')

    return {
        'p_IS': p_IS, 'se_IS': se_IS,
        'p_crude': p_crude, 'se_crude': se_crude,
        'variance_reduction': (se_crude / se_IS)**2 if se_IS > 0 else float('inf')
    }

result = rare_event_IS(S0=5000, K_crash=0.20, sigma=0.15, T=5/252)
print(f"IS estimate: {result['p_IS']:.6f} ± {result['se_IS']:.6f}")
print(f"Crude estimate: {result['p_crude']:.6f} ± {result['se_crude']:.6f}")
print(f"Variance reduction factor: {result['variance_reduction']:.1f}x")
On extreme contracts, IS can reduce variance by factors of 100–10,000x. 100 IS samples give better precision than 1,000,000 crude samples.
For real-time tracking, a particle filter maintains a cloud of hypotheses about the latent event probability. The state evolves via a logit random walk, which keeps probabilities bounded in (0, 1).
import numpy as np
from scipy.special import expit, logit  # sigmoid and its inverse

class PredictionMarketParticleFilter:
    """
    Sequential Monte Carlo filter for real-time event probability estimation.
    """
    def __init__(self, N_particles=5000, prior_prob=0.5, process_vol=0.05, obs_noise=0.03):
        self.N = N_particles
        self.process_vol = process_vol
        self.obs_noise = obs_noise
        logit_prior = logit(prior_prob)
        self.logit_particles = logit_prior + np.random.normal(0, 0.5, N_particles)
        self.weights = np.ones(N_particles) / N_particles
        self.history = []

    def update(self, observed_price):
        """Incorporate a new observation (market price, poll result, etc.)"""
        # 1. Propagate: random walk in logit space
        noise = np.random.normal(0, self.process_vol, self.N)
        self.logit_particles += noise

        # 2. Convert to probability space
        prob_particles = expit(self.logit_particles)

        # 3. Reweight: likelihood of observation given each particle
        log_likelihood = -0.5 * ((observed_price - prob_particles) / self.obs_noise)**2
        log_weights = np.log(self.weights + 1e-300) + log_likelihood
        log_weights -= log_weights.max()
        self.weights = np.exp(log_weights)
        self.weights /= self.weights.sum()

        # 4. Check ESS and resample if needed
        ess = 1.0 / np.sum(self.weights**2)
        if ess < self.N / 2:
            self._systematic_resample()

        self.history.append(self.estimate())

    def _systematic_resample(self):
        """Systematic resampling - lower variance than multinomial."""
        cumsum = np.cumsum(self.weights)
        u = (np.arange(self.N) + np.random.uniform()) / self.N
        indices = np.searchsorted(cumsum, u)
        self.logit_particles = self.logit_particles[indices]
        self.weights = np.ones(self.N) / self.N

    def estimate(self):
        """Weighted mean probability estimate."""
        probs = expit(self.logit_particles)
        return np.average(probs, weights=self.weights)

    def credible_interval(self, alpha=0.05):
        """Weighted quantile-based credible interval."""
        probs = expit(self.logit_particles)
        sorted_idx = np.argsort(probs)
        sorted_probs = probs[sorted_idx]
        sorted_weights = self.weights[sorted_idx]
        cumw = np.cumsum(sorted_weights)
        lower = sorted_probs[np.searchsorted(cumw, alpha/2)]
        upper = sorted_probs[np.searchsorted(cumw, 1 - alpha/2)]
        return lower, upper

# --- Simulate election night ---
pf = PredictionMarketParticleFilter(prior_prob=0.50, process_vol=0.03)
observations = [0.50, 0.52, 0.55, 0.58, 0.61, 0.63, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]

print("Election Night Tracker:")
print(f"{'Time':>6} {'Observed':>10} {'Filtered':>10} {'95% CI':>20}")
print("-" * 52)
for t, obs in enumerate(observations):
    pf.update(obs)
    ci = pf.credible_interval()
    print(f"{t:>5}h {obs:>10.3f} {pf.estimate():>10.3f} ({ci[0]:.3f}, {ci[1]:.3f})")
The particle filter smooths noise and propagates uncertainty. When the market spikes from $0.58 to $0.65 on a single trade, the filter tempers the update based on how volatile the observation process has been.
Antithetic variates: pair each draw \( Z \) with \( -Z \). When the payoff function is monotone (which binary payoffs always are in the underlying), the variance reduction is guaranteed. Typical reduction: 50–75%. Zero extra computational cost.
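The pairing takes three extra lines. A minimal sketch (the function name is ours), reusing the GBM setup from earlier:

```python
import numpy as np

def antithetic_binary_mc(S0, K, sigma, T, N=100_000, seed=0):
    """Evaluate the monotone binary payoff at Z and -Z, then average each
    negatively correlated pair to cut estimator variance."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(N // 2)
    drift = -0.5 * sigma**2 * T
    S_plus = S0 * np.exp(drift + sigma * np.sqrt(T) * Z)
    S_minus = S0 * np.exp(drift - sigma * np.sqrt(T) * Z)
    # Each pair average is one lower-variance sample
    pair_means = 0.5 * ((S_plus > K).astype(float) + (S_minus > K).astype(float))
    p_hat = pair_means.mean()
    se = pair_means.std() / np.sqrt(N // 2)
    return p_hat, se

p, se = antithetic_binary_mc(S0=100, K=105, sigma=0.20, T=30/365)
print(f"{p:.4f} ± {se:.4f}")
```

The indicator at \( Z \) is increasing and the one at \( -Z \) is decreasing, so their covariance is non-positive: that sign is the whole guarantee.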
Control variates: if simulating a binary contract under stochastic volatility (no closed form), use the Black-Scholes digital price (which does have a closed form) as a control variate, subtracting the control's simulated deviation from its known mean.
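A sketch of the mechanics, under stated assumptions: the lognormally randomized volatility here is a crude stand-in for a real stochastic-vol simulator, and all names are ours. Simulate the payoff `Y` under the rough model, push the same normals through constant-vol Black-Scholes to get the control `C`, and correct by the control's known mean.

```python
import numpy as np
from scipy.stats import norm

def digital_with_control_variate(S0, K, sigma0, T, vol_of_vol=0.5, N=100_000, seed=0):
    """Binary-contract probability under a randomized-volatility model,
    variance-reduced using the closed-form Black-Scholes digital as control."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(N)
    # Rough stand-in for stochastic vol: lognormally randomized sigma per path
    W = rng.standard_normal(N)
    sigma = sigma0 * np.exp(vol_of_vol * np.sqrt(T) * W - 0.5 * vol_of_vol**2 * T)
    S_T = S0 * np.exp(-0.5 * sigma**2 * T + sigma * np.sqrt(T) * Z)
    Y = (S_T > K).astype(float)
    # Control: same Z through constant-vol GBM; its mean is known in closed form
    S_T_bs = S0 * np.exp(-0.5 * sigma0**2 * T + sigma0 * np.sqrt(T) * Z)
    C = (S_T_bs > K).astype(float)
    EC = norm.cdf((np.log(S0 / K) - 0.5 * sigma0**2 * T) / (sigma0 * np.sqrt(T)))
    beta = np.cov(Y, C, ddof=0)[0, 1] / C.var()  # optimal coefficient
    p_cv = Y.mean() - beta * (C.mean() - EC)
    se_cv = (Y - beta * C).std() / np.sqrt(N)
    se_crude = Y.std() / np.sqrt(N)
    return p_cv, se_cv, se_crude

p_cv, se_cv, se_crude = digital_with_control_variate(S0=195, K=200, sigma0=0.20, T=30/365)
print(f"CV estimate {p_cv:.4f} (se {se_cv:.5f} vs crude {se_crude:.5f})")
```

The variance reduction scales with the squared correlation between `Y` and `C`: the closer the closed-form proxy tracks the true model, the more you save.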
Stratified sampling: partition the sampling space into \( J \) strata, sample within each, combine. Maximum gain comes from Neyman allocation, \( n_j \propto \omega_j \sigma_j \) (oversample strata with high variance).
from scipy.stats import norm

def stratified_binary_mc(S0, K, sigma, T, J=10, N_total=100_000):
    """Stratified MC for binary contract pricing."""
    n_per_stratum = N_total // J
    stratum_means, stratum_vars = [], []
    for j in range(J):
        # Uniforms confined to stratum j, mapped through the normal quantile
        U = np.random.uniform(j/J, (j+1)/J, n_per_stratum)
        Z = norm.ppf(U)
        S_T = S0 * np.exp((-0.5*sigma**2)*T + sigma*np.sqrt(T)*Z)
        hits = (S_T > K).astype(float)
        stratum_means.append(hits.mean())
        stratum_vars.append(hits.var())
    # Equal-probability strata: average the stratum means. The estimator's
    # variance sums the within-stratum sampling variances (the between-strata
    # spread is removed by construction, which is the whole point).
    p_stratified = np.mean(stratum_means)
    se_stratified = np.sqrt(np.sum(stratum_vars)) / (J * np.sqrt(n_per_stratum))
    return p_stratified, se_stratified

p, se = stratified_binary_mc(S0=100, K=105, sigma=0.20, T=30/365)
print(f"Stratified estimate: {p:.6f} ± {se:.6f}")
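The code above splits `N_total` evenly. Neyman allocation from a cheap pilot run redirects the budget toward the noisy strata instead. A sketch (`neyman_allocation` is our name):

```python
import numpy as np

def neyman_allocation(pilot_vars, weights, N_total):
    """n_j proportional to w_j * sigma_j, rounded and floored at 1 sample."""
    alloc = np.asarray(weights) * np.sqrt(np.asarray(pilot_vars, dtype=float))
    if alloc.sum() == 0:
        return np.full(len(pilot_vars), N_total // len(pilot_vars))
    return np.maximum(1, np.round(N_total * alloc / alloc.sum())).astype(int)

# Equal-probability strata: the high-variance stratum gets most of the budget
n = neyman_allocation(pilot_vars=[0.25, 0.04, 0.0001], weights=[1/3, 1/3, 1/3], N_total=9_000)
print(n)
```

In practice you spend a few percent of the budget on the pilot, estimate each stratum's variance, then allocate the remainder this way.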
Stack all three: antithetic variates inside each stratum, with a control variate correction → routinely achieve 100–500x variance reduction over crude MC. This is table stakes in production.
The hierarchical Bayesian model implicitly encodes correlation through shared national swing parameters. But what about tail dependence — the tendency for extreme co-movements that don't show up in linear correlation?
In 2008, the Gaussian copula's failure to model tail dependence contributed to the global financial crisis.
Sklar's theorem factors any joint distribution into its marginals and a dependence function:

$$F(x_1, \dots, x_d) = C\big(F_1(x_1), \dots, F_d(x_d)\big)$$

where \( C \) is the copula (pure dependency structure) and \( F_i \) are the marginal CDFs.
import numpy as np
from scipy.stats import norm, t as t_dist

def simulate_correlated_outcomes_gaussian(probs, corr_matrix, N=100_000):
    """Gaussian copula — no tail dependence."""
    d = len(probs)
    L = np.linalg.cholesky(corr_matrix)
    Z = np.random.standard_normal((N, d))
    X = Z @ L.T
    U = norm.cdf(X)
    outcomes = (U < np.array(probs)).astype(int)
    return outcomes

def simulate_correlated_outcomes_t(probs, corr_matrix, nu=4, N=100_000):
    """Student-t copula — symmetric tail dependence."""
    d = len(probs)
    L = np.linalg.cholesky(corr_matrix)
    Z = np.random.standard_normal((N, d))
    X = Z @ L.T
    S = np.random.chisquare(nu, N) / nu
    T = X / np.sqrt(S[:, None])
    U = t_dist.cdf(T, nu)
    outcomes = (U < np.array(probs)).astype(int)
    return outcomes

def simulate_correlated_outcomes_clayton(probs, theta=2.0, N=100_000):
    """Clayton copula (exchangeable, via the Marshall-Olkin frailty
    construction) — lower tail dependence."""
    V = np.random.gamma(1/theta, 1, N)
    E = np.random.exponential(1, (N, len(probs)))
    U = (1 + E / V[:, None])**(-1/theta)
    outcomes = (U < np.array(probs)).astype(int)
    return outcomes

# --- Compare tail behavior ---
probs = [0.52, 0.53, 0.51, 0.48, 0.50]  # 5 swing state probabilities
state_names = ['PA', 'MI', 'WI', 'GA', 'AZ']
corr = np.array([
    [1.0, 0.7, 0.7, 0.4, 0.3],
    [0.7, 1.0, 0.8, 0.3, 0.3],
    [0.7, 0.8, 1.0, 0.3, 0.3],
    [0.4, 0.3, 0.3, 1.0, 0.5],
    [0.3, 0.3, 0.3, 0.5, 1.0],
])

N = 500_000
gauss_outcomes = simulate_correlated_outcomes_gaussian(probs, corr, N)
t_outcomes = simulate_correlated_outcomes_t(probs, corr, nu=4, N=N)

p_sweep_gauss = gauss_outcomes.all(axis=1).mean()
p_sweep_t = t_outcomes.all(axis=1).mean()
p_sweep_indep = np.prod(probs)

print("Joint Outcome Probabilities:")
print(f"{'':>25} {'Independent':>12} {'Gaussian':>12} {'t-copula':>12}")
print(f"{'P(sweep all 5)':>25} {p_sweep_indep:>12.4f} {p_sweep_gauss:>12.4f} {p_sweep_t:>12.4f}")
print(f"\nt-copula increases sweep probability by {p_sweep_t/p_sweep_gauss:.1f}x vs Gaussian")
The t-copula with \( \nu=4 \) routinely shows 2–5x higher probability of extreme joint outcomes. If you're trading correlated prediction market contracts without modeling tail dependence, you're running a portfolio that will blow up in exactly the scenarios that matter most.
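For reference, the t-copula's tail dependence coefficient has a closed form (identical in both tails, by symmetry):

$$\lambda = 2\, t_{\nu+1}\!\left(-\sqrt{\frac{(\nu+1)(1-\rho)}{1+\rho}}\right)$$

where \( t_{\nu+1} \) is the Student-t CDF with \( \nu + 1 \) degrees of freedom and \( \rho \) the pairwise correlation. It is strictly positive for any finite \( \nu \) and vanishes as \( \nu \to \infty \), recovering the Gaussian copula's zero tail dependence.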
For \( d > 5 \) contracts, bivariate copulas are insufficient. Vine copulas decompose the \( d \)-dimensional dependency into \( d(d-1)/2 \) bivariate conditional copulas arranged in a sequence of trees. For a D-vine, the density factors as

$$f(x_1, \dots, x_d) = \prod_{k=1}^{d} f_k(x_k) \prod_{j=1}^{d-1} \prod_{i=1}^{d-j} c_{i,\,i+j \,|\, i+1, \dots, i+j-1}$$

with each pair-copula density \( c \) evaluated at the appropriate conditional CDFs.
Build maximum spanning trees ordered by \( |\tau_{\text{Kendall}}| \), select pair-copula families via AIC, estimate sequentially. Implementations: pyvinecopulib (Python), VineCopula (R).
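Step 1 of that recipe, the first tree, fits in a few lines of numpy/scipy. A sketch (the function name is ours; a real fit would continue with pair-copula selection on each edge and then the conditional trees):

```python
import numpy as np
from scipy.stats import kendalltau
from scipy.sparse.csgraph import minimum_spanning_tree

def first_vine_tree(U):
    """Maximum spanning tree on |Kendall's tau| over a data matrix U (n x d).
    Kendall's tau is rank-based, so the margins don't matter."""
    d = U.shape[1]
    tau = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            tau[i, j] = abs(kendalltau(U[:, i], U[:, j])[0])
    # Min spanning tree on negated weights == max spanning tree on |tau|
    mst = minimum_spanning_tree(-tau).toarray()
    return sorted((int(i), int(j)) for i, j in zip(*np.nonzero(mst)))

# Toy data: columns 0-1 strongly dependent, column 2 nearly independent
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
U = np.column_stack([x, x + 0.3 * rng.standard_normal(2000), rng.standard_normal(2000)])
edges = first_vine_tree(U)
print(edges)  # d-1 = 2 edges, with (0, 1) among them
```

The spanning tree picks the strongest pairwise links, so the most informative dependencies get modeled unconditionally; everything else is pushed into conditional copulas in later trees.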
Everything so far assumes you know the data-generating process. But prediction markets are populated by heterogeneous agents — informed traders, noise traders, market makers, and bots.
Markets can be efficient even when every single trader is completely irrational.
Gode & Sunder (1993): zero-intelligence agents achieve near-100% allocative efficiency in a continuous double auction.
Farmer, Patelli & Zovko (2005): explained 96% of cross-sectional spread variation on the London Stock Exchange. One parameter. 96%.
import numpy as np

class PredictionMarketABM:
    """
    Agent-based model of a prediction market order book.

    Agent types:
    - Informed: know the true probability, trade toward it
    - Noise: random trades
    - Market maker: provides liquidity around current price
    """
    def __init__(self, true_prob, n_informed=10, n_noise=50, n_mm=5):
        self.true_prob = true_prob
        self.price = 0.50
        self.price_history = [self.price]
        self.best_bid = 0.49
        self.best_ask = 0.51
        self.n_informed = n_informed
        self.n_noise = n_noise
        self.n_mm = n_mm
        self.volume = 0
        self.informed_pnl = 0
        self.noise_pnl = 0

    def step(self):
        total = self.n_informed + self.n_noise + self.n_mm
        r = np.random.random()
        if r < self.n_informed / total:
            self._informed_trade()
        elif r < (self.n_informed + self.n_noise) / total:
            self._noise_trade()
        else:
            self._mm_update()
        self.price_history.append(self.price)

    def _informed_trade(self):
        signal = self.true_prob + np.random.normal(0, 0.02)
        if signal > self.best_ask + 0.01:
            size = min(0.1, abs(signal - self.price) * 2)
            self.price += size * self._kyle_lambda()
            self.volume += size
            self.informed_pnl += (self.true_prob - self.best_ask) * size
        elif signal < self.best_bid - 0.01:
            size = min(0.1, abs(self.price - signal) * 2)
            self.price -= size * self._kyle_lambda()
            self.volume += size
            self.informed_pnl += (self.best_bid - self.true_prob) * size
        self.price = np.clip(self.price, 0.01, 0.99)
        self._update_book()

    def _noise_trade(self):
        direction = np.random.choice([-1, 1])
        size = np.random.exponential(0.02)
        self.price += direction * size * self._kyle_lambda()
        self.price = np.clip(self.price, 0.01, 0.99)
        self.volume += size
        self.noise_pnl -= abs(self.price - self.true_prob) * size * 0.5
        self._update_book()

    def _mm_update(self):
        spread = max(0.02, 0.05 * (1 - self.volume / 100))
        self.best_bid = self.price - spread / 2
        self.best_ask = self.price + spread / 2

    def _kyle_lambda(self):
        sigma_v = abs(self.true_prob - self.price) + 0.05
        sigma_u = 0.1 * np.sqrt(self.n_noise)
        return sigma_v / (2 * sigma_u)

    def _update_book(self):
        spread = self.best_ask - self.best_bid
        self.best_bid = self.price - spread / 2
        self.best_ask = self.price + spread / 2

    def run(self, n_steps=1000):
        for _ in range(n_steps):
            self.step()
        return np.array(self.price_history)

# --- Simulation ---
np.random.seed(42)
sim = PredictionMarketABM(true_prob=0.65, n_informed=10, n_noise=50, n_mm=5)
prices = sim.run(n_steps=2000)

print("Agent-Based Prediction Market Simulation")
print(f"True probability: {sim.true_prob:.2f}")
print("Starting price: 0.50")
print(f"Final price: {prices[-1]:.4f}")
print(f"Price at t=500: {prices[500]:.4f}")
print(f"Price at t=1000: {prices[1000]:.4f}")
print(f"Total volume: {sim.volume:.1f}")
print(f"Informed P&L: ${sim.informed_pnl:.2f}")
print(f"Noise trader P&L: ${sim.noise_pnl:.2f}")
print(f"Convergence error: {abs(prices[-1] - sim.true_prob):.4f}")
The production stack is a five-stage pipeline:

1. Ingestion: WebSocket feed from the Polymarket CLOB API (real-time prices, volumes); news/poll feeds (NLP-processed into probability signals); on-chain event data (Polygon).
2. Probability modeling: hierarchical Bayesian model (Stan/PyMC); particle filter for real-time updating; jump-diffusion SDE path simulation; an ensemble taking a weighted average of model outputs.
3. Dependence modeling: vine copula pairwise dependencies; factor model for shared risk factors; tail dependence estimation via t-copula.
4. Risk management: EVT-based VaR and Expected Shortfall; reverse stress testing; correlation stress; liquidity risk via order book depth monitoring.
5. Monitoring: Brier score tracking; P&L attribution; drawdown alerts; model drift detection.
Eyeballing a probability from a market price is like testing one happy path and shipping to prod. You checked one dimension of a multi-dimensional problem.
A coin flip has one config value: p. A real prediction market contract is like a microservice with dozens of environment variables — drift, volatility, correlation, liquidity, time decay — all changing at runtime. You wouldn't deploy a service with 90% of its config undefined. Don't trade that way either.
The whole article is about simulation as a substitute for closed-form answers. In software terms: when you can't solve the equation analytically, you brute-force it with computation. That's literally what Monte Carlo is. The art is in making the brute force efficient.
Monte Carlo simulation is exactly property-based testing / fuzz testing. You generate random inputs (price paths), run them through your system (payoff function), and aggregate the results. The CLT is the mathematical guarantee that your test suite converges — like saying "if you run enough fuzz iterations, your coverage approaches the true behavior."
The \( O(N^{-1/2}) \) convergence rate means to halve your error, you need 4x the samples. This is like database query performance — doubling your hardware doesn't double your throughput if you have an O(√N) bottleneck. Every technique in later sections is about beating this rate.
Brier score is the MSE between your predictions and reality. It's the accuracy_score of probability forecasting. Below 0.20 = passing tests. Below 0.10 = production-grade. Think of it as your simulation's SLA — the number you track in your dashboard.
Variance peaks at p=0.5 — meaning your estimates are least reliable exactly when the market is most uncertain. This is like your load balancer being least accurate at peak traffic. The system degrades precisely when you need it most.
Crude Monte Carlo for rare events is like running your integration tests and hoping they randomly trigger the one-in-a-million race condition. Importance sampling is chaos engineering: you deliberately inject the failure mode you care about, then mathematically correct for the fact that you biased the test. Netflix's Chaos Monkey doesn't wait for random failures — it forces them, just like IS forces the rare event region.
The "Radon-Nikodym derivative" is just a correction factor. You sampled from a biased distribution on purpose; the likelihood ratio un-biases the results. It's like running weighted A/B tests — you oversample the minority group, then reweight when computing metrics. Same math.
100–10,000x variance reduction means IS gives you the precision of a million random samples from just a hundred targeted ones. It's the difference between SELECT * and a query with the right index.
Imagine a load balancer with 5,000 backend servers (particles). Each server has a "health score" (weight). When a new health check comes in (observation), you: jitter every server's state slightly (propagation), re-score each server against the fresh check (reweighting), and, when too many scores collapse toward zero, retire the dying servers and clone the healthy ones (resampling).
That's literally the particle filter algorithm. The "Effective Sample Size" check is your circuit breaker — when too many particles have near-zero weight, you resample to avoid degeneracy.
Working in logit space is the probability equivalent of using logarithmic scaling for your metrics. Just as you'd never plot latency on a linear axis, you don't do random walks on raw probabilities (they'd escape [0,1]). The logit transform is the Math.log() of Bayesian updating.
The particle filter is a streaming data pipeline for probability. Data in → state update → estimate out. It's Apache Kafka meets Bayesian inference. Each observation is a message, each particle is a parallel hypothesis, and resampling is garbage collection for bad hypotheses.
The key insight: these techniques stack multiplicatively. Antithetic inside stratified with control variate correction → 100–500x speedup. This is like layering CDN + caching + database indexes. Each technique is independent, and the combined effect multiplies. In simulation, "table stakes" means you MUST use all three.
Crude Monte Carlo is O(N) brute force. Variance reduction gets you the same answer in O(N/500). You'd never ship an O(n²) algorithm when an O(n log n) exists. Same principle.
In microservices, Service A going down doesn't just affect A — it takes down B and C too. Correlation tells you they're related. Tail dependence tells you they fail together under extreme stress. A Gaussian copula says: "services are correlated but won't cascade." A t-copula says: "when things go wrong, they go wrong everywhere at once." The 2008 crisis was a real-world demo of using the wrong copula.
Sklar's theorem is the dependency injection of probability. It says you can separate "what each variable looks like alone" (marginals) from "how they relate to each other" (copula). Swap the copula and you change correlation structure without touching the individual models. It's like changing your message broker without rewriting your services.
Using a Gaussian copula for correlated risks is like running your distributed system without a circuit breaker and assuming independent failure probabilities. Everything's fine until a correlated failure hits and your "impossible" joint failure probability turns out to be 100x more likely than your model predicted.
When you have 5+ correlated contracts, vine copulas decompose the dependency graph into pairwise connections — like a service mesh. C-vine is hub-and-spoke (one central service), D-vine is a chain (A→B→C), R-vine is a general graph. You pick the topology that matches your dependency structure, just like you'd design your service mesh.
Agent-based modeling is load testing where each virtual user has a different persona. Informed traders are power users who know the optimal path. Noise traders are random clickers. Market makers are your CDN — providing liquidity/caching. The emergent behavior (market price) arises from their interaction, not from any single agent's strategy. It's Locust or k6 with heterogeneous user profiles.
The mind-blowing result: random agents produce efficient markets. This is like discovering that a randomly-configured distributed system with the right protocol (continuous double auction) self-organizes into optimal behavior. The protocol matters more than the participants. 96% of spread variance explained by market structure, not trader intelligence. Your architecture matters more than your code.
Kyle's lambda measures how much a trade moves the price — it's the rate limiter of the market. High lambda = illiquid, every trade moves the price (like a system under heavy load where each request degrades performance). Low lambda = liquid, trades are absorbed smoothly (like auto-scaling handling traffic spikes).
Strip away the quant jargon and this is a standard 5-stage data pipeline: ingest, model, correlate, risk-manage, monitor.
If you've built a real-time ML pipeline, you've built 80% of this. The math is different, the architecture is identical.
Brier score tracking IS model performance monitoring. P&L attribution IS feature importance analysis. Drawdown alerts ARE anomaly detection. This entire monitoring layer maps 1:1 to standard MLOps practices. If you know how to monitor an ML model in production, you know how to monitor a quant desk.
Quantitative simulation is property-based testing + streaming pipelines + chaos engineering + service mesh modeling + load testing with user personas, glued together with statistics instead of assertions. Every technique maps to something you already know. The math is the implementation detail.
The Farmer et al. (2005) paper on zero-intelligence traders is the most accessible — it's essentially about emergent behavior in distributed systems. Kyle (1985) is foundational for understanding price impact (rate limiting). Everything else is domain-specific math.