Inference Utilities

Neural Network Tensor Utilities

class pyrtlnet.saved_tensors.QuantizedLayer(input_scale, weight_scale, weight_zero, output_scale, output_zero, weight, bias)

Stores a layer’s weights, biases, and quantization metadata.

This class performs some additional pre-processing on the raw quantization metadata. For example, the layer’s floating-point scale factor m is converted to a fixed-point scale factor m0 and a bitwise right-shift n with normalization_constants().

__init__(input_scale, weight_scale, weight_zero, output_scale, output_zero, weight, bias)

Store a layer’s weights, biases, and quantization metadata.

Parameters:

input_scale (ndarray) – Scale factor for the layer’s input. The first layer’s input is special, and must be retrieved from the model separately. The input for subsequent layers comes from the preceding layer, so subsequent layer inputs use the preceding layer’s scale factor. A layer’s scale factor can be retrieved with scale.
weight_scale (ndarray) – Scale factor for the layer’s weight.
weight_zero (ndarray) – Zero point for the layer’s weight.
output_scale (ndarray) – Scale factor for the layer’s output.
output_zero (ndarray) – Zero point for the layer’s output.
weight (ndarray) – The layer’s weight.
bias (ndarray) – The layer’s bias.

bias: ndarray

The layer’s quantized bias.

m0: ndarray

The layer’s scale can be expressed as a fixed-point scale factor m0 and a bitwise right-shift n. See normalization_constants().

n: ndarray

The layer’s scale can be expressed as a fixed-point scale factor m0 and a bitwise right-shift n. See normalization_constants().

scale: ndarray

The layer output’s floating-point scale factor m.

weight: ndarray

The layer’s quantized weight.

zero: ndarray

The layer output’s zero point z.

class pyrtlnet.saved_tensors.SavedTensors(quantized_model_name)

Loads weights, biases, and quantization metadata saved by save_tensors().

__init__(quantized_model_name)

input_scale: ndarray

Floating-point scale factor for the neural network’s input.

This scale factor can be converted to a fixed-point multiplier by normalization_constants().

input_zero: ndarray

Zero point for the neural network’s input.

layer: list[QuantizedLayer]

List of QuantizedLayer objects containing per-layer weights, biases, and quantization metadata.

pyrtlnet.saved_tensors.normalization_constants(s1, s2, s3)

Normalize multiplier m to fixed-point m0 and bit-shift n.

See Section 2.2 in Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. The multiplier m (Equation 5) is computed from three scale factors s1, s2, s3.

This multiplier m can then be expressed as a pair of (m0, n), where m0 is a fixed-point 32-bit multiplier and n is a bitwise right-shift amount. A floating-point multiplication by m is equivalent to a fixed-point multiplication by m0, followed by a bitwise right-shift by n. This fixed-point multiplication and bitwise shift are done by normalize().

In other words, m == (2 ** -n) * m0, where m0 must be in the interval [0.5, 1).
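As a rough illustration of the equivalence described above, the following sketch multiplies an integer accumulator by m in floating point, then repeats the computation using only an integer multiplication and a right shift. It is not pyrtlnet’s normalize(): the 31 fractional bits used to store m0, the round-to-nearest step, and the names FRACTIONAL_BITS, m0_fixed, and accumulator are all assumptions made for this example.

```python
import math

# Example multiplier m. Real values are derived from a layer's scale factors.
m = 0.0008

# math.frexp returns (mantissa, exponent) with m == mantissa * 2**exponent
# and the mantissa in [0.5, 1), so m0 is the mantissa and n == -exponent.
m0, exponent = math.frexp(m)
n = -exponent

# Assumption: m0 is stored as an integer with 31 fractional bits.
FRACTIONAL_BITS = 31
m0_fixed = round(m0 * (1 << FRACTIONAL_BITS))

# Hypothetical integer accumulator, standing in for a matrix multiplication result.
accumulator = 12345

# Integer-only path: multiply by m0_fixed, add half of the divisor so the
# shift rounds to nearest, then shift right by the fractional bits plus n.
shift = FRACTIONAL_BITS + n
fixed_result = (accumulator * m0_fixed + (1 << (shift - 1))) >> shift

# Floating-point reference path.
float_result = round(accumulator * m)

print(fixed_result, float_result)
```

Both paths agree for this input. In general the two can differ by one unit when the product lands very close to a rounding boundary.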

A layer can have per-axis scale factors, so s1, s2, and s3 are vectors of scale factors. This function returns a vector of fixed-point m0 values and a vector of integer n values. See per-axis quantization for details.

Parameters:

s1 (ndarray) – Scale factors for the matrix multiplication’s left input, which is the layer’s weight matrix.
s2 (ndarray) – Scale factors for the matrix multiplication’s right input, which is the layer’s input matrix.
s3 (ndarray) – Scale factors for the matrix multiplication’s output, which is the layer’s output matrix.

Return type:

tuple[Fxp, ndarray]

Returns:

(m0, n), where m0 is a fixed-point multiplier in the interval [0.5, 1), n is a bitwise right-shift amount, and m == (2 ** -n) * m0.
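The decomposition can be sketched with NumPy, assuming m = s1 * s2 / s3 as in Equation 5 of the referenced paper. This is not the library’s implementation: normalization_constants_sketch is a hypothetical name, the scale factors are made-up values, and m0 is left as a floating-point array here rather than converted to the Fxp fixed-point type that the documented function returns.

```python
import numpy as np


def normalization_constants_sketch(s1, s2, s3):
    """Split m = s1 * s2 / s3 into (m0, n) with m == 2**-n * m0."""
    m = (s1 * s2) / s3
    # np.frexp returns a mantissa in [0.5, 1) for positive inputs and an
    # integer exponent, such that m == mantissa * 2**exponent.
    m0, exponent = np.frexp(m)
    return m0, -exponent


# Made-up per-axis weight scales with scalar-like input and output scales.
s1 = np.array([0.02, 0.005])
s2 = np.array([0.004])
s3 = np.array([0.1])

m0, n = normalization_constants_sketch(s1, s2, s3)
print(m0, n)
```

Because s1 is a vector, broadcasting produces one (m0, n) pair per axis, which matches the per-axis behavior described above.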

Inference Script Utilities

pyrtlnet.inference_util.add_common_arguments(parser)

Add common command line arguments supported by all inference scripts to an ArgumentParser.

This defines all the shared command line flags in one place, so it’s easier to keep them in sync.

Parameters:

parser (ArgumentParser) – Parser to add arguments to.

pyrtlnet.inference_util.batched_images(images, start_image, num_images, batch_size)

Generator that yields batched image data.

Parameters:

images (ndarray) – Image data to group into batches.
start_image (int) – Index of the first image in the first batch, in images.
num_images (int) – Total number of images to yield. The yielded images may be grouped into batches; see batch_size below. The generator always stops at the end of images and does not wrap around to the beginning, so fewer than num_images images will be yielded if there aren’t enough images.
batch_size (int) – Maximum size of each batch. If num_images is not evenly divisible by batch_size, the last batch will be padded out to batch_size with null images. batch_size must be greater than zero.

Raises:

ValueError – If start_image exceeds the number of available images, or if batch_size is less than or equal to zero.

Return type:

Iterator[tuple[int, ndarray]]

Returns:

Yields (batch_start_index, test_batch), where batch_start_index is the index of the first image in the yielded batch, and test_batch is a batch of image data, with shape (batch_size, 12, 12). The last batch may be padded with null images; see the description of batch_size above. images[batch_start_index] is equivalent to test_batch[0].
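The batching and padding behavior described above can be sketched as a small generator. This is an illustration written from the documented behavior, not the library’s code: batched_images_sketch is a hypothetical name, null images are assumed to be all-zero arrays, and the exact boundary used for the start_image check is an assumption.

```python
from collections.abc import Iterator

import numpy as np


def batched_images_sketch(
    images: np.ndarray, start_image: int, num_images: int, batch_size: int
) -> Iterator[tuple[int, np.ndarray]]:
    if batch_size <= 0:
        raise ValueError("batch_size must be greater than zero")
    if start_image > len(images):
        raise ValueError("start_image exceeds the number of available images")

    # Stop at the end of the data set instead of wrapping around.
    end_image = min(start_image + num_images, len(images))
    for batch_start_index in range(start_image, end_image, batch_size):
        batch_end = min(batch_start_index + batch_size, end_image)
        batch = images[batch_start_index:batch_end]
        # Assumption: null images are all-zero images of the same shape.
        padding = batch_size - len(batch)
        if padding > 0:
            null_images = np.zeros((padding, *images.shape[1:]), dtype=images.dtype)
            batch = np.concatenate([batch, null_images])
        yield batch_start_index, batch


# Ten made-up nonzero 12x12 images; request five of them in batches of two.
images = np.arange(10 * 144, dtype=np.float32).reshape(10, 12, 12) + 1
batches = list(batched_images_sketch(images, start_image=2, num_images=5, batch_size=2))
for batch_start_index, test_batch in batches:
    print(batch_start_index, test_batch.shape)
```

With five images and a batch size of two, the third batch holds one real image followed by one null image.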

pyrtlnet.inference_util.load_mnist_data(tensor_path)

Load MNIST test data and labels from mnist_test_data.npz.

mnist_test_data.npz contains a saved copy of the preprocessed MNIST image data and its corresponding labels. This .npz file is generated by tensorflow_training.py, after evaluating the trained model. Loading this .npz file is much faster than calling load_mnist_images(), and avoids a dependency on tensorflow.

Parameters:

tensor_path (str) – Path to mnist_test_data.npz.

Raises:

FileNotFoundError – If mnist_test_data.npz is not found.

Return type:

tuple[ndarray, ndarray]

Returns:

(test_images, test_labels), where test_images has shape (10000, 12, 12) and test_labels has shape (10000,).

pyrtlnet.inference_util.preprocess_image(test_batch, input_scale, input_zero)

Preprocess the raw image data in the batch. This is required by the quantized neural network.

This adjusts the batch image data by input_scale and input_zero. Then, it flattens each 2D image into a 1D column vector and stores them in a matrix of shape (144, batch_size).

Parameters:

test_batch (ndarray) – Batch data to preprocess. This data should have already been normalized to [0.0, 1.0] and resized to (batch_size, 12, 12), usually by load_mnist_images().
input_scale (ndarray) – Scale factor for test_batch.
input_zero (ndarray) – Zero point for test_batch.

Return type:

ndarray

Returns:

Flattened batch data of shape (144, batch_size), adjusted by the quantized neural network’s input_scale and input_zero.
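The shape handling described above can be sketched as follows. The documentation does not spell out how the data is adjusted, so this example assumes the common affine quantization formula q = round(x / scale) + zero with an int8 result. That formula, the clipping range, and the name preprocess_image_sketch are assumptions, not the library’s confirmed behavior.

```python
import numpy as np


def preprocess_image_sketch(test_batch, input_scale, input_zero):
    # Assumption: affine quantization to int8, q = round(x / scale) + zero.
    quantized = np.round(test_batch / input_scale) + input_zero
    quantized = np.clip(quantized, -128, 127).astype(np.int8)
    # Flatten each (12, 12) image into 144 values, then transpose so that
    # each image becomes one column of a (144, batch_size) matrix.
    batch_size = test_batch.shape[0]
    return quantized.reshape(batch_size, -1).T


# Made-up batch of three images with values spanning [0.0, 1.0].
test_batch = np.linspace(0.0, 1.0, 3 * 144).reshape(3, 12, 12)
input_scale = np.array(1 / 255)
input_zero = np.array(-128)

flat_batch = preprocess_image_sketch(test_batch, input_scale, input_zero)
print(flat_batch.shape, flat_batch.dtype)
```

Storing each image as a column lets a layer’s weight matrix multiply the whole batch at once, with the weights as the left input and the batch as the right input.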

Command-line Interface Utilities

class pyrtlnet.cli_util.Accuracy

Update and display accuracy statistics over multiple tests.

__init__()

display()

Display accuracy statistics over all tests.

The printed summary looks like:

9/10 correct predictions, 90.0% accuracy

update(actual, expected)

Update accuracy statistics for a single test.

Records a correct prediction when actual == expected.

Parameters:

actual (int) – Actual outcome of the test. This is the actual output of the neural network.
expected (int) – Expected outcome of the test. This is the output we’re expecting, according to the labeled test data.
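The bookkeeping behind update() and display() can be sketched with two counters. AccuracySketch and its summary() helper are hypothetical names used only for this illustration, and the zero-test case is handled in an assumed way. Only the summary format is taken from the example above.

```python
class AccuracySketch:
    """Minimal stand-in that tracks correct predictions over many tests."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def update(self, actual: int, expected: int) -> None:
        # A test counts as correct when the prediction matches the label.
        self.total += 1
        if actual == expected:
            self.correct += 1

    def summary(self) -> str:
        # Assumption: report 0.0% when no tests have been recorded.
        percent = 100.0 * self.correct / self.total if self.total else 0.0
        return f"{self.correct}/{self.total} correct predictions, {percent:.1f}% accuracy"

    def display(self) -> None:
        print(self.summary())


accuracy = AccuracySketch()
for digit in range(9):
    accuracy.update(actual=digit, expected=digit)
# One wrong prediction: the network said 3, the label says 9.
accuracy.update(actual=3, expected=9)
accuracy.display()
```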

class pyrtlnet.cli_util.PrintElapsedTime(message)

Report how long it takes to run the code in a with statement.

This context manager first prints a message, then runs the code in the with statement, then prints a "done" message followed by the elapsed time. All output is printed on one line.

When an interactive script pauses for more than a second, users will start to wonder if something is wrong. So use PrintElapsedTime to let the user know what’s going on before starting an operation that’s expected to take more than a second.

Example:

import time
from pyrtlnet.cli_util import PrintElapsedTime
with PrintElapsedTime(message="Sleeping"):
    time.sleep(2)

Example output:

Sleeping... done (2.0 seconds)

__init__(message)

Parameters:

message (str) – Message to print before running the code in the with statement.
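A context manager with this behavior can be sketched in a few lines. PrintElapsedTimeSketch is a hypothetical stand-in, not the library’s class. The use of time.perf_counter() and the exact output formatting are assumptions chosen to match the example output above.

```python
import time


class PrintElapsedTimeSketch:
    """Print a message, time the with-block, then print the elapsed time."""

    def __init__(self, message: str) -> None:
        self.message = message

    def __enter__(self):
        # Stay on the same line and flush, so the message is visible
        # while the timed code is still running.
        print(f"{self.message}...", end=" ", flush=True)
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        elapsed = time.perf_counter() - self.start
        print(f"done ({elapsed:.1f} seconds)")
        # Returning False lets any exception from the with-block propagate.
        return False


with PrintElapsedTimeSketch(message="Sleeping"):
    time.sleep(0.1)
```

Flushing in __enter__ matters because the first print has no trailing newline, so a line-buffered terminal might otherwise hold the message back until the operation finishes.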

pyrtlnet.cli_util.display_image(script_name, image, image_index, batch_number, batch_index, verbose)

Print an image and its metadata in a terminal, rendering the image as ASCII art.

A header line is always printed, which looks like:

LiteRT Inference image_index 2 batch_number 1 batch_index 0

This header line displays the script_name (LiteRT Inference), the image_index (2), batch_number (1), and the batch_index (0).

Next, the image is displayed, when verbose is True. The image display requires a terminal that supports 24-bit color.

The image is presented as a 2D array of grayscale pixel values. The pixel values are normalized such that the largest value displays as white, and the smallest value displays as black. One line of terminal output contains up to two rows of pixels.

After the image, a footer line is displayed, when verbose is True:

shape (12, 12) dtype float32

This footer line displays the image’s shape and dtype.

Parameters:

script_name (str) – Name of the script processing the image data.
image (ndarray) – Image to display in the terminal.
image_index (int) – Index of the displayed image in the full test data set.
batch_number (int) – Batch number that the displayed image belongs to. Multiple consecutive images may be grouped into batches for processing. When this grouping occurs, multiple images will share the same batch_number. The first batch processed is batch_number 0, the second batch processed is batch_number 1, and so on.
batch_index (int) – Index of the displayed image in its batch. When batching is disabled, the batch_size is 1, so every batch_index will be 0.
verbose (bool) – When False, only the header line is displayed. When True, the header line, image, and footer line are all displayed.
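One way to fit two rows of pixels on one terminal line is to print an upper half-block character whose foreground color is the top pixel and whose background color is the bottom pixel. The sketch below takes that approach with 24-bit ANSI color escapes. It is an assumption about how such a display could work, not the library’s rendering code, and render_image_sketch is a hypothetical name.

```python
import numpy as np


def render_image_sketch(image: np.ndarray) -> list[str]:
    # Normalize so the smallest value maps to black (0) and the largest
    # value maps to white (255).
    low, high = float(image.min()), float(image.max())
    span = high - low if high > low else 1.0
    gray = ((image - low) / span * 255).astype(int)

    lines = []
    for row in range(0, gray.shape[0], 2):
        line = ""
        for col in range(gray.shape[1]):
            top = gray[row, col]
            # 24-bit foreground color for the upper half of the cell.
            line += f"\x1b[38;2;{top};{top};{top}m"
            if row + 1 < gray.shape[0]:
                bottom = gray[row + 1, col]
                # 24-bit background color for the lower half of the cell.
                line += f"\x1b[48;2;{bottom};{bottom};{bottom}m"
            line += "▀"
        # Reset colors at the end of each line.
        lines.append(line + "\x1b[0m")
    return lines


# Made-up 12x12 gradient image.
image = np.arange(144, dtype=np.float32).reshape(12, 12)
lines = render_image_sketch(image)
print("\n".join(lines))
```

A 12-row image produces six lines of output. An image with an odd number of rows leaves the lower half of its last line uncolored.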

pyrtlnet.cli_util.display_outputs(script_name, layer0_output, layer1_output, expected, actual, verbose)

Display the neural network’s outputs.

Prints the raw outputs of each neural network layer, followed by a bar chart that interprets the final layer’s output as each digit’s un-normalized probability.

Bars for higher probability digits are displayed before bars for lower probability digits.

The bar corresponding to the expected digit is always colored green. If the actual digit is not the same as the expected digit, the bar corresponding to the actual digit will be colored red.

Sample output with colors omitted:

LiteRT Inference layer0 output shape (18,) dtype int8:
[-123 -114 -123  -76 -123  -23  -94 -123  -65  -68 -123   -1  -63 -112 -123 ...]

LiteRT Inference layer1 output shape (10,) dtype int8:
[ 33 -48  29  58 -50  31 -87  93   9  49]

LiteRT Inference layer1 output as bar chart:
7▕          ▄▄▄▄▄▄▄▄▄▄ 93 (expected, actual)
3▕          ▄▄▄▄▄▄ 58
9▕          ▄▄▄▄▄ 49
0▕          ▄▄▄▄ 33
5▕          ▄▄▄▄ 31
2▕          ▄▄▄ 29
8▕          ▄ 9
1▕     ▄▄▄▄▄ -48
4▕     ▄▄▄▄▄ -50
6▕ ▄▄▄▄▄▄▄▄▄ -87

In the sample output above, the digit corresponding to each bar is displayed on the left, so the digit 7 has the highest probability, followed by the digit 3. The model predicted the digit is a 7, and the digit actually was a 7 according to the labeled test data, so the first bar is annotated with (expected, actual).

Parameters:

script_name (str) – Name of the script processing the image data.
layer0_output (ndarray) – Output of the neural network’s first layer.
layer1_output (ndarray) – Output of the neural network’s second layer.
expected (int) – Expected prediction from labeled test data.
actual (int) – Actual prediction from the neural network.
verbose (bool) – When False, just print a summary of the expected and actual predictions. When True, print each layer’s output and an annotated bar chart.
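The ordering and annotation of the bar chart can be sketched without the bars or colors. This simplified example reuses the layer1 values from the sample output above. ranked_outputs_sketch is a hypothetical helper, taking the prediction as the index of the largest output is an assumption, and the single-label annotations "(expected)" and "(actual)" are also assumed, since the sample only shows the combined form.

```python
import numpy as np


def ranked_outputs_sketch(layer1_output, expected, actual):
    # Sort digits from highest to lowest output value.
    digits = sorted(
        range(len(layer1_output)),
        key=lambda digit: int(layer1_output[digit]),
        reverse=True,
    )
    lines = []
    for digit in digits:
        labels = []
        if digit == expected:
            labels.append("expected")
        if digit == actual:
            labels.append("actual")
        annotation = f" ({', '.join(labels)})" if labels else ""
        lines.append(f"{digit} {int(layer1_output[digit])}{annotation}")
    return lines


layer1_output = np.array([33, -48, 29, 58, -50, 31, -87, 93, 9, 49], dtype=np.int8)
# Assumption: the predicted digit is the index of the largest output.
actual = int(np.argmax(layer1_output))
lines = ranked_outputs_sketch(layer1_output, expected=7, actual=actual)
print("\n".join(lines))
```

The resulting digit order matches the sample bar chart: 7, 3, 9, 0, 5, 2, 8, 1, 4, 6.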