Benchmarking Deterministic Computation on Probability Distributions Against Monte Carlo Methods

General

Monte Carlo methods are the most common methods for performing uncertainty quantification and are therefore the standard benchmark against which to compare Signaloid's UxHw technology. Unlike Monte Carlo, which repeatedly samples from some set of distributions and then performs a computation on those samples, Signaloid's UxHw technology performs a single-shot execution of a model while carrying out arithmetic operations on representations of probability distributions rather than on samples. Signaloid's UxHw technology is inherently deterministic, producing the same answer every time. Monte Carlo methods, on the other hand, are inherently stochastic: even large, million-iteration Monte Carlo evaluations exhibit variations in their output distributions when repeated.

As a result, comparisons between Signaloid's technology and Monte Carlo evaluations must choose an appropriate number of Monte Carlo iterations to match both the quality and the low variance of the Signaloid UxHw-based solution. Once a competitive evaluation has calculated the equivalent number of Monte Carlo samples required to achieve the same accuracy as a Signaloid UxHw solution, the timing information of the respective processes allows a direct comparison between the two methods at equivalent fidelity.

This technology explainer details how we benchmark an application that solves a problem that is today commonly solved using Monte Carlo methods (e.g., path-dependent option pricing using a correlated Brownian motion model), comparing the accuracy and performance of Monte Carlo implementations against implementations running on Signaloid's UxHw technology. It details the methodology we have developed, the tools we have implemented, and the insights we have gained in our effort to allow our customers to obtain a realistic evaluation of the performance and accuracy advantages of Signaloid's UxHw technology.

In all the comparisons detailed below, the hardware platforms used to execute the Monte Carlo and Signaloid UxHw implementations have equivalent hardware resources (equivalent number of transistors, equivalent clock speeds, equivalent memory bandwidth, etc.), and any speedup reported for Signaloid's UxHw technology results from fundamental algorithmic and architectural advantages of the technology, independent of any silicon implementation or brute hardware advantage.

Figure 1: Sketch outlining the benchmarking process. For each Signaloid UxHw-based application implementation evaluated, the benchmarking process performs 100 different multi-iteration Monte Carlo evaluations, each with a sufficiently large number of iterations (e.g., 100k iterations) so that the variance across the 100 Monte Carlo evaluations is small. The benchmarking process also ensures that the output distributions from the Signaloid UxHw-based execution match the output distribution of the Monte Carlo evaluations. Timing measurements then allow calculation of the effective speedup of the Signaloid UxHw-based implementation.

Why It Matters

Accurate benchmarking allows our customers (and our engineering teams) to get a realistic picture of the benefits of Signaloid's UxHw technology. Accurate benchmarking against Monte Carlo is an important input for organizations that want to maximize the performance and accuracy of computations they previously had no choice but to solve using Monte Carlo methods.

The Technical Details

How We Benchmark

To benchmark an application that solves a problem that is today commonly solved using Monte Carlo methods, we break the problem down into two main components: accuracy of the application output and timing performance. For a specific computation performed on a platform implementing Signaloid's UxHw technology, we calculate the number of Monte Carlo samples required to achieve the same fidelity (with respect to some ground truth, detailed below) as Signaloid's output. We refer to this as the equivalent Monte Carlo iteration count.

We perform this analysis for a range of Signaloid core types (i.e., different UxHw representation sizes) to provide an overview of how different variants of the UxHw technology perform, in each case measuring the execution time when running on Signaloid's Cloud Computing Engine (SCCE), which implements the Signaloid UxHw technology, and averaging the runtime across multiple executions. Because the Signaloid UxHw execution is deterministic, this repetition is primarily to account for variations in the OS-level timing measurements.

We run the Monte Carlo variants of the applications on Amazon Web Services (AWS) high-performance r7iz EC2 instances and, because of the inherent variability of the output distributions of sampling-based Monte Carlo, measure their runtimes averaged across 100 repetitions of each equivalent-iteration-count Monte Carlo evaluation.

We then calculate the relative Signaloid speedup as the ratio of the execution times. For example, if we determine that an application running on SCCE is on average as accurate as a 500,000-iteration Monte Carlo evaluation, and the average execution time of 100 such 500,000-iteration Monte Carlo evaluations is 10 times the SCCE execution time, we report a 10-fold speedup.

Due to the inherent variability of the output distribution of sampling-based Monte Carlo, we run (typically) 100 repetitions of the equivalent-iteration-count Monte Carlo evaluations, which we refer to as adversaries. For each adversary, we calculate an accuracy (using a metric distance to a ground truth distribution, detailed further below), leading to a distribution of accuracy values. For each UxHw technology configuration we evaluate, we report the equivalent Monte Carlo count as the smallest number of Monte Carlo samples for which the mean of the distribution of distances is less than or equal to the distance of the Signaloid output from the ground truth; for any smaller sample count, Signaloid on average outperforms Monte Carlo in terms of the distance to the ground truth. The following sections detail how we define the accuracy and the ground truth.
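As an illustrative sketch of this final step (the runtimes and variable names such as scce_runtime_s are hypothetical, not part of our tooling), the speedup calculation reduces to a ratio of mean execution times:

import numpy as np

# Hypothetical measurements: SCCE runtime (averaged across repeated executions) and
# runtimes of the equivalent-iteration-count Monte Carlo adversaries (100 in practice).
scce_runtime_s = 0.8
adversary_runtimes_s = np.array([8.1, 7.9, 8.3, 8.0])

speedup = np.mean(adversary_runtimes_s) / scce_runtime_s
print(f"Signaloid speedup: {speedup:.1f}x")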

How Do We Define Accuracy?

Defining accuracy requires deciding what objects we are comparing. Most commonly, these objects are distributions, as the accuracy of the entire distributional output is usually of the most interest. When comparing distributions, we quantify accuracy (or closeness) via a distance metric that metrizes the weak convergence of probability measures. For the particular choice of metric, we typically opt for the family of Wasserstein distances, as they have a range of advantages over other metrics: they handle distances over the underlying space well and they have a rich duality theory, in particular for finite-mean distributions, where there is a strong connection with Stein's method. The Wasserstein-1 distance, which is the metric we use in our typical benchmarking procedure, incorporates much of the geometry of the underlying space; in particular, the map x↦δx is an isometry. Using this distance metric, we measure accuracy with respect to some reference distribution, which we refer to as the ground truth. The ground truth is normally the result of a large (typically more than 1–3 million sample) Monte Carlo evaluation. In cases where the distribution has an analytic solution, we can instead construct a ground truth distribution using weighted samples from the analytic density function. Utilizing analytic formulae leads to a far higher-fidelity ground truth for a given number of samples.
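As a minimal illustration of the metric itself (not of our benchmarking tooling), SciPy's wasserstein_distance computes the Wasserstein-1 distance between empirical or weighted distributions; for two point masses δx and δy it reduces to |x − y|, reflecting the isometry mentioned above:

import numpy as np
from scipy.stats import wasserstein_distance

# Wasserstein-1 distance between two point masses equals |x - y| (isometry of x -> delta_x).
print(wasserstein_distance(u_values=[0.0], v_values=[3.0]))  # prints 3.0

# Wasserstein-1 distance between two empirical sample sets: for two unit-variance
# Gaussians the distance is simply the difference of the means, here ~0.5.
samples_a = np.random.normal(loc=0.0, scale=1.0, size=10_000)
samples_b = np.random.normal(loc=0.5, scale=1.0, size=10_000)
print(wasserstein_distance(u_values=samples_a, v_values=samples_b))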

Verifying the Robustness of our Benchmarking Process

It is essential to verify that our benchmarking protocol leads to correct results. Despite the straightforward process outlined so far, there are subtleties and potential pitfalls in implementing this procedure. Conveniently, for certain distance metrics there exist central limit theorems that allow us to compute the asymptotic distribution of distances for a given number of Monte Carlo samples. We can often express this asymptotic distribution as an integral of a Brownian bridge and use such an expression to compare directly against the distribution of distances that we measure [1]. If these distributions differ too much, it indicates that the analysis is not of high enough fidelity. In addition, analytic expressions exist for the mean distance of this asymptotic distribution for multiple metrics.
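A minimal sketch of this check for a uniform distribution over [0, 1] (for which the quantile measure is simply Lebesgue measure on [0, 1]); the grid sizes and names such as bridge_integral are illustrative choices, not part of our tooling:

import numpy as np
from scipy.stats import wasserstein_distance

n_samples = 100_000  # Monte Carlo samples per adversary

rng = np.random.default_rng(0)

def bridge_integral(rng, n_steps=2_000):
    # Discretized standard Brownian bridge B(t) = W(t) - t * W(1) on [0, 1];
    # the mean of |B| over the grid approximates the integral of |B(t)| dt.
    t = np.arange(1, n_steps + 1) / n_steps
    W = np.cumsum(rng.normal(scale=np.sqrt(1.0 / n_steps), size=n_steps))
    B = W - t * W[-1]
    return np.mean(np.abs(B))

# Scaled empirical Wasserstein-1 distances sqrt(n) * W1(F_n, F) for Uniform[0, 1];
# the fine grid below stands in for the continuous uniform ground truth.
ground_truth_grid = np.linspace(0.0, 1.0, 1_000_000)
scaled_empirical = [
    np.sqrt(n_samples) * wasserstein_distance(rng.uniform(size=n_samples), ground_truth_grid)
    for _ in range(100)
]

# Asymptotic prediction from the central limit theorem: the integral of |B(t)| dt.
asymptotic = [bridge_integral(rng) for _ in range(1_000)]

# Both means should be close to sqrt(pi / 32), approximately 0.3133, for the uniform distribution.
print(np.mean(scaled_empirical), np.mean(asymptotic), np.sqrt(np.pi / 32))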

The most common cause of low-fidelity benchmarking results is an inaccurate ground truth. The required fidelity of the ground truth depends on what equivalent Monte Carlo counts we are considering. For example, a 1-million-sample Monte Carlo ground truth may be suitable for benchmarking a Signaloid UxHw implementation with a representation size of 128 Dirac deltas, but that same ground truth may be insufficient for one with a representation size of 256 Dirac deltas. If the ground truth is from a large Monte Carlo evaluation, the slow inverse-square-root convergence of Monte Carlo means that the benchmarking will require orders of magnitude more samples to achieve a noticeable increase in fidelity. For this reason, utilizing analytic formulae for creating ground truths is often a better option, as the benchmarking process can achieve the same fidelity with far fewer samples. Once we have generated a ground truth, we can determine whether the ground truth size is suitable for the maximum adversary size via inspection of the log-log plot of Wasserstein-1 distance versus sample count. Deviation from the inverse-square-root scaling law predicted by the central limit theorem indicates an inaccurate ground truth.
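A minimal sketch of this sanity check, assuming per-size mean distances collected in a sweep such as the loop in the code example further below (the arrays here are hypothetical); a fitted log-log slope far from −1/2 suggests the ground truth is not accurate enough:

import numpy as np

# Hypothetical inputs: Monte Carlo adversary sizes and the corresponding mean
# Wasserstein-1 distances to the ground truth, averaged across adversaries.
adversary_sizes = np.array([1_000, 3_000, 10_000, 30_000, 100_000])
mean_distances = np.array([1.1e-2, 6.2e-3, 3.5e-3, 2.0e-3, 1.1e-3])

# Fit distance ~ C * N^slope on a log-log scale; the central limit theorem predicts slope ~ -0.5.
slope, _ = np.polyfit(np.log(adversary_sizes), np.log(mean_distances), deg=1)
print(f"Fitted scaling exponent: {slope:.2f}")
if slope > -0.4:
    print("Warning: deviation from the inverse-square-root law; the ground truth may be too coarse.")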

Figure 2: Plot of Wasserstein-1 distance between a high fidelity ground truth and Monte Carlo and TTR approximations of the unit Gaussian distribution as a function of representation size, N. We average the Monte Carlo distances across 100 Monte Carlo simulations and we detail the scaling of both representations on the right of the plot. Dashed vertical lines indicate how one could estimate the equivalent number of Monte Carlo iterations for a particular TTR representation size.

Relevant Code Example
import numpy as np
from scipy.stats import wasserstein_distance

ground_truth_size = 1_000_000
adversary_size_min = 1
adversary_size_max = 100_000
adversary_size_step = 1000
num_adversaries = 100

def unit_gaussian(x):
    # Probability density function of the unit (standard) Gaussian distribution.
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# Analytic ground truth: weighted samples of the unit Gaussian density on a regular grid.
support = np.linspace(-8, 8, ground_truth_size)
ground_truth_weights = unit_gaussian(support) * (support[1] - support[0])

# Wasserstein-1 distance of the Signaloid UxHw output to the ground truth (example value).
signaloid_wasserstein_distance = 0.01

# Sweep the adversary sample counts until Monte Carlo, on average, matches the Signaloid output.
adversary_sizes = np.arange(start=adversary_size_min, stop=adversary_size_max, step=adversary_size_step)
for mc_size in adversary_sizes:
    distances = []
    for _ in range(num_adversaries):
        # Each adversary is an independent Monte Carlo evaluation with mc_size samples.
        mc_samples = np.random.normal(loc=0, scale=1, size=mc_size)
        distance = wasserstein_distance(u_values=mc_samples, v_values=support, v_weights=ground_truth_weights)
        distances.append(distance)

    average_distance = np.mean(distances)

    if average_distance < signaloid_wasserstein_distance:
        print(f"Equivalent Monte Carlo count is {mc_size}.")
        break

Figure 3: Plot of the distribution of the Wasserstein-1 distances between 10,000 adversary Monte Carlo simulations, each comprised of 100,000 samples, and the ground truth distribution for a uniform distribution over [0,1] (hatched region). The plot overlays the exact asymptotic distribution described in [2] (solid black line).

Figure 4: Plot of Brownian bridge approximation of Wasserstein-1 distances between an asymptotically large number of Monte Carlo simulations each comprised of 100,000 samples, and the ground truth distribution for a uniform distribution over [0,1] (hatched region). The plot overlays the exact asymptotic distribution described in [2] (solid black line).

Figure 5: Mathematical descriptions relating to the Wasserstein-1 distance of empirical processes. The first equation defines the Wasserstein-1 distance between a distribution and a Monte Carlo simulation of said distribution with n samples. The second (middle) equation shows how the Wasserstein-1 distance multiplied by the square root of the sample size converges in distribution to the integral of the absolute value of the Brownian bridge with quantile measure. The last (bottom) equation gives the formula for the expected value of the scaled Wasserstein-1 distribution as a function of the true cumulative density function, F [1].
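For reference, a LaTeX sketch of the relations the caption above refers to, as they are commonly stated in the literature on empirical processes (cf. [1]); here F_n denotes the empirical distribution function of n Monte Carlo samples from a distribution with cumulative distribution function F, and B a standard Brownian bridge:

\[
W_1(F_n, F) = \int_{-\infty}^{\infty} \left| F_n(x) - F(x) \right| \, \mathrm{d}x,
\]
\[
\sqrt{n}\, W_1(F_n, F) \;\xrightarrow{d}\; \int_{0}^{1} \left| B(t) \right| \, \mathrm{d}F^{-1}(t),
\]
\[
\lim_{n \to \infty} \sqrt{n}\, \mathbb{E}\!\left[ W_1(F_n, F) \right] = \int_{-\infty}^{\infty} \sqrt{\frac{2\, F(x) \left( 1 - F(x) \right)}{\pi}} \, \mathrm{d}x.
\]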

The Takeaway

Comparing different methods for benchmarking an application that solves a problem that is today commonly solved using Monte Carlo methods is a non-trivial process and contains subtleties. We invested significant time and resources to develop the methodology, software tools, and test harnesses that allow us to be confident in our assessment of the accuracy and performance of Signaloid's deterministic UxHw technology relative to sampling-based Monte Carlo. This benchmarking infrastructure and know-how allows us to reliably and fairly quantify the advantages of our technology over sampling-based Monte Carlo implementations and provides one important indicator of the benefit of the Signaloid UxHw technology.

Schedule a Demo Call
Request Whitepaper