LiteRT Inference

This is a reference implementation of quantized inference, based on the LiteRT Interpreter.


Run quantized LiteRT inference on test images from the MNIST dataset.

This module runs inference with the reference LiteRT Interpreter and returns each layer’s tensor output, which is useful for verifying the correctness of NumPy Inference and PyRTL Inference.
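As an illustration of how per-layer outputs can be used for verification, the sketch below compares a reference layer output against a candidate output element by element. The helper name compare_layer_outputs, the int8 dtype, and the stand-in data are assumptions made for this example and are not part of pyrtlnet.

```python
import numpy as np


def compare_layer_outputs(name, reference, candidate):
    """Return the number of elements where two layer outputs differ.

    Hypothetical helper for illustration; not part of pyrtlnet.
    """
    reference = np.asarray(reference)
    candidate = np.asarray(candidate)
    if reference.shape != candidate.shape:
        raise ValueError(
            f"{name}: shape mismatch {reference.shape} vs {candidate.shape}"
        )
    mismatches = np.argwhere(reference != candidate)
    # Print the first few mismatches to help locate the faulty computation.
    for index in mismatches[:5]:
        idx = tuple(int(i) for i in index)
        print(f"{name}{idx}: reference={reference[idx]} candidate={candidate[idx]}")
    return len(mismatches)


# Stand-in data: a (batch_size=2, 18) tensor, assumed here to be int8,
# and a copy with a single element changed.
rng = np.random.default_rng(0)
reference_layer0 = rng.integers(-128, 128, size=(2, 18), dtype=np.int8)
candidate_layer0 = reference_layer0.copy()
candidate_layer0[1, 3] ^= 1  # flip the lowest bit to force one mismatch

num_mismatches = compare_layer_outputs("layer0", reference_layer0, candidate_layer0)
```

Quantized tensors hold integers, so an exact comparison is used here rather than a floating-point tolerance.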

The litert_inference demo uses load_tflite_model() and run_tflite_model() to implement quantized inference with LiteRT.

pyrtlnet.litert_inference.load_tflite_model(tensor_path)

Load the quantized model and return an initialized LiteRT Interpreter.

The quantized model should be produced by quantize_model().

Parameters:

tensor_path – Name of the .tflite file created by quantize_model().

Return type:

Interpreter

Returns:

An initialized LiteRT Interpreter.
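For readers unfamiliar with LiteRT, the following is a minimal sketch of what initializing an Interpreter generally involves. It is not pyrtlnet's implementation. It assumes the ai_edge_litert package is installed and that its Interpreter accepts experimental_preserve_all_tensors, an option that keeps intermediate tensors readable after inference. The helper names load_interpreter and describe_tensor are hypothetical.

```python
import numpy as np


def load_interpreter(model_path):
    """Create a LiteRT Interpreter for a .tflite file and allocate its tensors.

    Hypothetical helper; assumes the ai_edge_litert package is available.
    """
    # Imported lazily so the rest of this example runs without LiteRT installed.
    from ai_edge_litert.interpreter import Interpreter

    interpreter = Interpreter(
        model_path=model_path,
        # Assumption: keep intermediate tensors so each layer can be inspected.
        experimental_preserve_all_tensors=True,
    )
    interpreter.allocate_tensors()
    return interpreter


def describe_tensor(detail):
    """Summarize one entry of get_input_details() or get_output_details()."""
    scale, zero_point = detail["quantization"]
    shape = tuple(int(dim) for dim in detail["shape"])
    dtype = np.dtype(detail["dtype"]).name
    return (
        f"{detail['name']}: shape={shape} dtype={dtype} "
        f"scale={scale} zero_point={zero_point}"
    )


# A hand-written stand-in for one tensor detail dictionary.
example_detail = {
    "name": "input",
    "shape": np.array([1, 12, 12]),
    "dtype": np.int8,
    "quantization": (0.5, -128),
}
summary = describe_tensor(example_detail)
```

With a real model, describe_tensor(interpreter.get_input_details()[0]) would report the input shape and quantization parameters the model expects.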

pyrtlnet.litert_inference.run_tflite_model(interpreter, test_batch)

Run quantized inference on an image batch with a LiteRT Interpreter.

Parameters:
  • interpreter (Interpreter) – An initialized LiteRT Interpreter, produced by load_tflite_model().

  • test_batch (ndarray) – An image batch of shape (batch_size, 12, 12) to run through the Interpreter.

Return type:

tuple[ndarray, ndarray, ndarray]

Returns:

A tuple (layer0_output, layer1_output, actuals). layer0_output is the first layer’s raw tensor output, with shape (batch_size, 18). layer1_output is the second layer’s raw tensor output, with shape (batch_size, 10). actuals contains the predicted digit for each image in the batch, with shape (batch_size,).
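The relationship between the shapes above can be sketched with NumPy alone. The example assumes that a predicted digit is the index of the largest value in each row of the second layer's output. This is the usual convention for a 10-class classifier, but it is an assumption here, not a statement about how pyrtlnet computes actuals. The helper names and the labels list are hypothetical.

```python
import numpy as np


def predicted_digits(layer1_output):
    """Pick the highest-scoring class for each image in the batch."""
    layer1_output = np.asarray(layer1_output)
    if layer1_output.ndim != 2 or layer1_output.shape[1] != 10:
        raise ValueError(f"expected shape (batch_size, 10), got {layer1_output.shape}")
    # argmax works directly on quantized values, because dequantizing with a
    # positive scale preserves their ordering.
    return np.argmax(layer1_output, axis=1)


def accuracy(predictions, labels):
    """Fraction of predictions that match the expected labels."""
    return float(np.mean(np.asarray(predictions) == np.asarray(labels)))


# Stand-in second-layer output for a batch of three images (assumed int8).
layer1_output = np.full((3, 10), -128, dtype=np.int8)
layer1_output[0, 2] = 100
layer1_output[1, 7] = 90
layer1_output[2, 9] = 127

actuals = predicted_digits(layer1_output)  # shape (batch_size,)
labels = [2, 7, 4]  # hypothetical expected digits
acc = accuracy(actuals, labels)
```

Here actuals has one entry per image, matching the (batch_size,) shape described above.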