PyRTL Inference
This is a hardware re-implementation of NumPy Inference in PyRTL. It produces nearly the same output tensors as LiteRT Inference, and exactly the same output tensors as NumPy Inference.
Implement quantized inference with PyRTL.
This does not invoke the LiteRT Inference or NumPy Inference implementations.
This effectively reimplements NumPy Inference in hardware, using the
PyRTL Matrix Library, running in a PyRTL Simulation.
The pyrtl_inference demo uses PyRTLInference to implement quantized
inference with PyRTL.
- class pyrtlnet.pyrtl_inference.PyRTLInference(tensor_path, input_bitwidth, accumulator_bitwidth, axi, initial_delay_cycles=0, batch_size=1)
Convert a quantized model to hardware, and simulate the hardware with a PyRTL
Simulation.
- __init__(tensor_path, input_bitwidth, accumulator_bitwidth, axi, initial_delay_cycles=0, batch_size=1)
Convert the quantized model to PyRTL inference hardware.
This builds the necessary hardware for each layer’s matrix operations. The diagram below shows layer0’s data flow:
    layer0 weight, shape: (18, 144), input_bitwidth
    │
    │   layer0 input image, shape: (144, batch_size), input_bitwidth
    │   │
    ▼   ▼
    ┌─────────────────────────────────────────────────────────┐
    │ layer0 systolic_array (matrix multiplication by weight) │
    └─────────────────────────────────────────────────────────┘
    │
    │ output, shape: (18, batch_size), accumulator_bitwidth
    │
    │   layer0 bias, shape: (18, batch_size), accumulator_bitwidth
    │   │
    ▼   ▼
    ┌───────────────────────────────────┐
    │ layer0 elementwise_add (add bias) │
    └───────────────────────────────────┘
    │
    │ output, shape: (18, batch_size), accumulator_bitwidth
    ▼
    ┌─────────────────────────┐
    │ layer0 elementwise_relu │
    └─────────────────────────┘
    │
    │ output, shape: (18, batch_size), accumulator_bitwidth
    ▼
    ┌────────────────────────────────────────────────┐
    │ layer0 elementwise_normalize (reduce bitwidth) │
    └────────────────────────────────────────────────┘
    │
    │
    ▼
    layer0 output, shape: (18, batch_size), input_bitwidth
And this diagram shows layer1’s data flow, where the layer0 output from layer0 is the second input to layer1’s systolic_array:
    layer1 weight, shape: (10, 18), input_bitwidth
    │
    │   layer0 output, shape: (18, batch_size), input_bitwidth
    │   │
    ▼   ▼
    ┌─────────────────────────────────────────────────────────┐
    │ layer1 systolic_array (matrix multiplication by weight) │
    └─────────────────────────────────────────────────────────┘
    │
    │ output, shape: (10, batch_size), accumulator_bitwidth
    │
    │   layer1 bias, shape: (10, batch_size), accumulator_bitwidth
    │   │
    ▼   ▼
    ┌───────────────────────────────────┐
    │ layer1 elementwise_add (add bias) │
    └───────────────────────────────────┘
    │
    │ output, shape: (10, batch_size), accumulator_bitwidth
    ▼
    ┌────────────────────────────────────────────────┐
    │ layer1 elementwise_normalize (reduce bitwidth) │
    └────────────────────────────────────────────────┘
    │
    │
    ▼
    layer1 output, shape: (10, batch_size), input_bitwidth
- make_systolic_array() performs the matrix multiplication of each layer’s weight and input.
- make_elementwise_add() performs the elementwise addition of each layer’s bias.
- make_elementwise_relu() performs ReLU (only for layer0).
- make_elementwise_normalize() performs normalization to convert from intermediate values with bitwidth accumulator_bitwidth to each layer’s output values with bitwidth input_bitwidth.
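The per-layer data flow in the diagrams can be modeled in a few lines of plain Python, which may help when reasoning about what each hardware stage computes. This is a minimal software sketch, not pyrtlnet's code: matrices are lists of rows, bias has shape (rows, batch_size) as in the diagrams, and `elementwise_normalize` here is a hypothetical stand-in (arithmetic right shift by an assumed `shift`, then saturation to the signed output range). pyrtlnet's real normalization is derived from the quantization parameters in the .npz file and is not reproduced.

```python
# Software sketch of one layer: systolic_array -> elementwise_add
# -> (optional) elementwise_relu -> elementwise_normalize.

def matmul(weight, x):
    """Multiply weight (M x K) by x (K x B), accumulating exact integers."""
    k = len(x)
    b = len(x[0])
    return [[sum(row[i] * x[i][j] for i in range(k)) for j in range(b)]
            for row in weight]

def elementwise_add(a, b):
    """Add two equally shaped matrices elementwise (e.g. adding the bias)."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def elementwise_relu(a):
    """Clamp negative values to zero."""
    return [[max(v, 0) for v in row] for row in a]

def elementwise_normalize(a, shift, output_bitwidth):
    """HYPOTHETICAL normalization: arithmetic right shift, then saturate
    to the signed output_bitwidth range."""
    lo = -(1 << (output_bitwidth - 1))
    hi = (1 << (output_bitwidth - 1)) - 1
    return [[min(max(v >> shift, lo), hi) for v in row] for row in a]

def layer(weight, x, bias, relu, shift, output_bitwidth=8):
    """Run one layer; relu=True for layer0, False for layer1."""
    acc = elementwise_add(matmul(weight, x), bias)
    if relu:
        acc = elementwise_relu(acc)
    return elementwise_normalize(acc, shift, output_bitwidth)

# Tiny example: 2x2 weight, batch_size = 1.
weight = [[1, 2], [3, 4]]
x = [[1], [1]]
bias = [[-10], [5]]
print(layer(weight, x, bias, relu=True, shift=2))   # [[0], [3]]
print(layer(weight, x, bias, relu=False, shift=2))  # [[-2], [3]]
```

Note that the intermediate values stay full-width integers until the final normalize step, mirroring how the diagrams keep accumulator_bitwidth until elementwise_normalize reduces it to input_bitwidth.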
- Parameters:
tensor_path (str) – Path to the .npz file created by quantize_model().
input_bitwidth (int) – Bitwidth of each element in the input matrix. This should generally be 8.
accumulator_bitwidth (int) – Bitwidth of accumulator registers in the systolic array. This should generally be 32, and larger than input_bitwidth.
axi (bool) – If True, receive input batch data via an AXI-Stream, and return the output’s argmax via AXI-Lite, at addresses 0 - batch_size. If False, the input batch data will be loaded in self.flat_batch_memblock via Simulation’s memory_value_map, and the output’s argmax will be inspected as a wire_matrix.
initial_delay_cycles (int, default: 0) – Number of cycles to wait before starting operation. This is a temporary hack that is currently required for correct synthesis with Vivado; ideally, no delay cycles should be required.
batch_size (int, default: 1) – Number of images in each input batch.
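The advice that accumulator_bitwidth should be larger than input_bitwidth follows from how dot products grow. The sketch below computes the smallest signed bitwidth that can hold any dot product of length k, assuming signed two's complement operands of input_bitwidth bits and ignoring the bias addition; k = 144 and k = 18 are taken from the weight shapes in the diagrams. The function name is illustrative, not part of pyrtlnet.

```python
def min_accumulator_bits(input_bitwidth, k):
    """Smallest signed bitwidth holding any length-k dot product of two
    signed input_bitwidth vectors (bias addition ignored)."""
    lo = -(1 << (input_bitwidth - 1))     # most negative input, e.g. -128
    hi = (1 << (input_bitwidth - 1)) - 1  # most positive input, e.g. 127
    max_sum = k * lo * lo  # lo * lo is the largest possible product
    min_sum = k * lo * hi  # lo * hi is the most negative possible product
    bits = 1
    # Grow until [min_sum, max_sum] fits in the signed range of `bits` bits.
    while min_sum < -(1 << (bits - 1)) or max_sum > (1 << (bits - 1)) - 1:
        bits += 1
    return bits

print(min_accumulator_bits(8, 144))  # layer0 dot products: 23
print(min_accumulator_bits(8, 18))   # layer1 dot products: 20
```

Under these assumptions, 8-bit inputs need at least 23 bits for layer0, so the suggested 32-bit accumulator leaves headroom for the bias addition.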
- simulate(test_batch, verilog=False)
Simulate quantized inference on an image batch.
All calculations are done in PyRTL
Simulation, using the hardware generated by __init__().
- Parameters:
test_batch (ndarray) – A batch to run through the PyRTL inference implementation.
verilog (bool, default: False) – If True, export the inference implementation to Verilog. Two Verilog files will be generated, one for the pyrtlnet module itself, and another for its testbench. The pyrtlnet module will be named pyrtl_inference.v, or pyrtl_inference_axi.v when constructed with axi=True. The testbench will be named pyrtl_inference_test.v, or pyrtl_inference_axi_test.v when constructed with axi=True.
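The Verilog file naming rule reduces to a single suffix choice driven by the constructor's axi flag. The helper below is hypothetical (it is not a pyrtlnet function); it only restates the documented names, which can be handy when scripting around the exported files.

```python
def verilog_filenames(axi):
    """Return (module_file, testbench_file) that simulate(verilog=True)
    is documented to write, given the constructor's axi flag."""
    base = "pyrtl_inference_axi" if axi else "pyrtl_inference"
    return base + ".v", base + "_test.v"

print(verilog_filenames(False))  # ('pyrtl_inference.v', 'pyrtl_inference_test.v')
print(verilog_filenames(True))   # ('pyrtl_inference_axi.v', 'pyrtl_inference_axi_test.v')
```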
- Return type:
- Returns:
(layer0_output, layer1_output, argmax), where:
layer0_output is the first layer’s raw tensor output, with shape (batch_size, 18).
layer1_output is the second layer’s raw tensor output, with shape (batch_size, 10).
argmax is a list of predicted digits for each image in the batch, with length batch_size.
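The hardware diagrams lay out each layer's output as (rows, batch_size), while the returned tensors have shape (batch_size, rows), so one column per image becomes one row per image. The sketch below shows that transpose and a per-image argmax over the layer1 scores, assuming that is how the predicted digits relate to layer1_output; the matrix values and helper names are made up for illustration.

```python
def to_batch_major(matrix):
    """Transpose a (rows, batch_size) matrix to (batch_size, rows)."""
    return [list(col) for col in zip(*matrix)]

def argmax_per_image(layer1_output):
    """Index of the largest score in each row; ties go to the lowest index."""
    return [max(range(len(row)), key=row.__getitem__) for row in layer1_output]

# Hypothetical layer1 hardware output: shape (10, batch_size), batch_size = 2.
hw = [[0, 0] for _ in range(10)]
hw[3][0] = 50  # image 0 scores highest on digit 3
hw[7][1] = 40  # image 1 scores highest on digit 7

layer1_output = to_batch_major(hw)      # shape (2, 10)
print(argmax_per_image(layer1_output))  # [3, 7]
```

Breaking ties toward the lowest index matches the behavior of numpy.argmax, which also returns the first occurrence of the maximum.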