Generating Samples from the TTR family of Digital Distribution Representations

General

General

Signaloid's cloud and hardware platforms support several different representations, each comprising a fixed set of bits, for representing arbitrary empirical or analytic probability distributions. When applications require individual sample values from these distributions, the underlying technology generates these samples using methods that depend on the representation in question. This technology explainer details one method for generating samples from the UxHw telescoping torques representation (TTR).

Figure 1: TTR representation of a Gaussian distribution (green, hatched) and a depiction of eight samples (black dashed lines) generated according to this distribution.

Why It Matters

Signaloid's compute platforms perform arithmetic directly on representations of probability distributions, without processing individual samples. There are however algorithms (e.g., approximating integrals or estimating expectations) that require explicit sampling from these distributions.

The Technical Details

The UxHw TTR representation uses a mixture of Dirac deltas, each of some specified mass, placed across the support (i.e., domain) of the probability distribution being represented, and can in this manner (approximately) represent any valid probability distribution. The core idea behind sampling from a TTR-represented distribution is to generate values from the support of the distribution such that values with higher probability in the distribution are generated in proportion to their probability mass, appropriately interpolating the impact of the limited number of Dirac deltas that make up the distribution representation.

The image below shows the above idea pictorially. The image shows a Gaussian distribution (green) over which is overlaid eight vertical lines corresponding to eight samples generated according to the distribution. There are more samples (i.e., the lines are more densely packed) in the region of higher probability. The method for sampling from a given distribution achieves this spacing of samples over the support by first transforming the probability distribution representation into a cumulative density function (CDF), followed by uniform sampling in the CDF's range.

Figure 2: Binned representation of a probability density function or distribution.

Binning Followed by Inverse CDF

Starting from the TTR representation (a set of Dirac deltas each with a position and mass), the sampling algorithm first generates an optimal division of the support into variable-sized contiguous sections enclosing the Dirac deltas (variable-width binning). Next, the sampling process uses this binning to compute the CDF, given by:

The first equality above is the definition of a cumulative density for continuous random variables (which also extends to TTR distributions) and the second equality captures the fact that a binned representation is actually a step function. The resulting CDF is a piecewise linear continuous function (Figure 3).

Figure 3: Cumulative Distribution Function of the binned PDD representation in Figure 2.

With the CDF in hand, the sampling algorithm generates a random variate uniformly-distributed between 0 and 1 (points on the vertical axis in Figure 4) and then computes the inverse CDF to obtain a sample of the original distribution intended to be sampled from. Conceptually, you can think of this as reflecting the points on the vertical axis in Figure 4 through the CDF curve to obtain a point on the horizontal axis of Figure 4; in an engineering implementation, there are a variety of methods to achieve this, ranging from analytic methods (when the distribution in question is a parametric or analytically-defined distribution) to numerical root finding algorithms. For sampling from TTR however, it is possible to make an important simplification: the sampling algorithm only needs to find the horizontal bin index and can subsequently compute the required inverse CDF value by a simple computation of line intersections.

Code Example

The UxHw API, accessible to programs running on the cloud-based and hardware module implementations of the Signaloid compute engine, provides two functions for generating samples from the distribution information that the compute engine maintains for all floating-point values in programs:

  • float UxHwFloatSample(float value)/double UxHwDoubleSample(double value). These functions take as argument a float/double value value and return a single sample of the distribution information maintained within the Signaloid compute engine associated with the value.

  • void UxHwFloatSampleBatch(float value, float * destSampleArray, size_t numberOfRandomSamples)/void UxHwDoubleSampleBatch(double value, double * destSampleArray, size_t numberOfRandomSamples). These functions take as argument a float/double value value and desired number of samples N. The functions place the generated samples in destSampleArray.

Figure 4: Uniformly distributed points (vertical axis) and the corresponding samples by inverse CDF search (horizontal axis).

Relevant Code Example

In this first example [1], we draw a single sample from a uniform distribution. Finally, we print the output:

#include <stdio.h>
#include <uxhw.h>

int
main(void)
{
    double  a = UxHwDoubleUniformDist(10.0, 20.0);
    double  sample = UxHwDoubleSample(a);

    printf("Sample of %lf is %lf\n", a, sample);

    return 0;
}
Relevant Code Example

In this second example [2], we draw a batch of 10 samples from a uniform distribution. Finally, we print the output:

#include <stddef.h>
#include <stdio.h>
#include <uxhw.h>

enum
{
    N = 10,
};

int
main(void)
{
    size_t  i;
    double  a;
    double  samples[N] = {0};

    a = UxHwDoubleUniformDist(10.0, 20.0);

    UxHwDoubleSampleBatch(a, samples, N);

    for (i = 0; i < 10; i++)
    {
        printf("Sample %d of %lf is %lf\n", i, a, samples[i]);
    }

    return 0;
}
The Takeaway

Signaloid's cloud and hardware platforms support several different representations, each comprising a fixed set of bits, for representing arbitrary empirical or analytic probability distributions. When applications (e.g., approximating integrals or estimating expectations) require individual sample values from distributions, the UxHw technology can generate samples as needed.

Schedule a Demo Call
Request Whitepaper
Schedule a Demo Call
Request Whitepaper
Schedule a Demo Call
Request Whitepaper