

# Mirko Mariotti <sup>1,2</sup> Giulio Bianchini <sup>1</sup> Loriano Storchi <sup>3,2</sup> Giacomo Surace <sup>2</sup> Daniele Spiga <sup>2</sup>

<sup>1</sup>Dipartimento di Fisica e Geologia, Università degli Studi di Perugia

<sup>2</sup>INFN sezione di Perugia

<sup>3</sup>Dipartimento di Farmacia, Università degli Studi G. D'Annunzio

ICTP-IAEA School on FPGA-based SoC 22





#### Requirements:

Linux Workstation

Vivado





## Current challenges in computing

### Von Neumann Bottleneck:

New computational problems show that current architectural models have to be improved or changed to address future payloads.

### Energy Efficient computation:

- Not wasting "resources" (silicon, time, energy, instructions).
- Using the right resource for the specific case.

### Edge/Fog/Cloud Computing:

- Making the computation where it makes sense.
- Avoiding the transfer of unnecessary data.
- Creating consistent interfaces for distributed systems.


A field programmable gate array (FPGA) is an integrated circuit whose logic is re-programmable.

- Parallel computing
- Highly specialized
- Energy efficient

- Array of programmable logic blocks
- Logic blocks configurable to perform complex functions
- The configuration is specified with a hardware description language



The use of FPGAs in computing is growing for several reasons:

- They can potentially deliver great performance via massive parallelism.
- They can address payloads that do not perform well on uniprocessors (Neural Networks, Deep Learning).
- They can efficiently handle non-standard data types.


# Integration of neural networks on FPGA

FPGAs are playing an increasingly important role in industrial sampling and data processing.





**Deep Learning** 

In the industrial field

- Intelligent vision;
- Financial services;
- Scientific simulations;
- Life science and medical data analysis;

In the scientific field

- Real time deep learning in particle physics;
- Hardware trigger of LHC experiments;


And many others ...



On the other hand, the adoption of FPGAs poses several challenges:

- Porting of legacy code is usually hard.
- Interoperability with standard applications is problematic.


# Firmware generation

Many projects have the goal of abstracting the firmware generation and use process.




Today's computer architectures are:

Multi-core: two or more independent processing units execute multiple instructions at the same time.

- The power is given by the number of cores.
- Parallelism has to be addressed.

Heterogeneous: different types of processing units.

- Cell, GPU, Parallella, TPU.
- The power is given by the specialization.
- The data transfer between units has to be addressed.
- The scheduling of payloads has to be addressed.












# Layers, Abstractions and Interfaces

A computing system is a matter of abstractions and interfaces. A lower layer exposes its functionalities (via interfaces) to the layer above, hiding its inner details (abstraction).

The quality of a computing system is determined by how simple its abstractions are and how clean its interfaces are.
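As a minimal sketch of this idea in Go (the source language of the Bondgo compiler presented later), a lower layer can be modelled as an interface, and the upper layer written only against that interface. The names `Adder`, `softwareAdder` and `SumAll` are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// Adder is the interface that a lower layer exposes to the layer above.
type Adder interface {
	Add(a, b uint32) uint32
}

// softwareAdder is one possible lower layer. Its inner details stay hidden
// behind Adder, so it could be swapped for another implementation.
type softwareAdder struct{}

func (softwareAdder) Add(a, b uint32) uint32 { return a + b }

// SumAll is the upper layer: it knows only the Adder interface.
func SumAll(lower Adder, values []uint32) uint32 {
	var total uint32
	for _, v := range values {
		total = lower.Add(total, v)
	}
	return total
}

func main() {
	fmt.Println(SumAll(softwareAdder{}, []uint32{1, 2, 3, 4}))
}
```

Replacing `softwareAdder` with any other type that implements `Add` leaves `SumAll` unchanged, which is the property a clean interface is meant to give.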



[Figure: layer diagram; surviving labels: Register Machine, Opcodes]

#### The second idea: rethinking the stack

Build a computing system with a reduced number of layers, which narrows the gap between HW and SW while keeping a user-friendly way of programming it.

## The BondMachine project

- Architectures handling
- Architectures molding
- Bondgo
- Basm
- API

The BondMachine is a software ecosystem for the dynamic generation of computer architectures that:

- Are composed of many, possibly hundreds, computing cores.
- Have very small cores, not necessarily of the same type (different ISA and ABI).
- Have no fixed way of interconnecting cores.
- May have some elements shared among cores (for example channels and shared memories).




## The computational unit of the BM

The atomic computational unit of a BM is the "connecting processor" (CP), which has:

- Some general purpose registers of size Rsize.
- Some I/O dedicated registers of size Rsize.
- A set of implemented opcodes chosen among many available.
- Dedicated ROM and RAM.
- Three possible operating modes.


#### General purpose registers

$2^R$ registers: r0, r1, r2, r3, ..., r($2^R - 1$)


#### I/O specialized registers

- N input registers: i0, i1, ..., i(N-1)
- M output registers: o0, o1, ..., o(M-1)


#### Full set of possible opcodes

adc, add, addf, addi, and, chc, chw, cil, cilc, cir, cirn, clc, clr, cpy, cset, dec, div, divf, dpc, expf, hit, hlt, i2r, i2rw, incc, inc, j, jc, je, jgt0f, jlt, jlte, jr, jz, lfsr82, lfsr162r, m2r, mod, mulc, mult, multf, nand, nop, nor, not, or, r2m, r2o, r2owa, r2owaa, r2s, r2v, r2vri, ro2r, ro2rri, rsc, rset, sic, s2r, saj, sbc, sub, wrd, wwr, xnor, xor


#### RAM and ROM

- 2<sup>L</sup> RAM memory cells.
- 2<sup>O</sup> ROM memory cells.


#### Operating modes

- Full Harvard mode.
- Full Von Neumann mode.
- Hybrid mode.
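The ingredients of a CP listed above can be summarised as a small data structure. The following Go sketch is only a software model under stated assumptions: the type `CP`, the constructor `NewCP` and the helper `pow2` are hypothetical names, registers and memory cells are fixed at 32 bits for simplicity, and the operating modes are not modelled.

```go
package main

import "fmt"

// CP is a hypothetical software model of a connecting processor.
type CP struct {
	Regs    []uint32 // 2^R general purpose registers
	Inputs  []uint32 // N input registers
	Outputs []uint32 // M output registers
	RAM     []uint32 // 2^L RAM memory cells
	ROM     []uint32 // 2^O ROM memory cells
	Opcodes []string // subset of opcodes implemented by this CP
}

// pow2 returns 2^e.
func pow2(e uint) int { return 1 << e }

// NewCP allocates a CP from the size parameters used in the text:
// r, l and o are exponents, n and m are plain counts.
func NewCP(r, l, o uint, n, m int, opcodes []string) *CP {
	return &CP{
		Regs:    make([]uint32, pow2(r)),
		Inputs:  make([]uint32, n),
		Outputs: make([]uint32, m),
		RAM:     make([]uint32, pow2(l)),
		ROM:     make([]uint32, pow2(o)),
		Opcodes: opcodes,
	}
}

func main() {
	cp := NewCP(3, 8, 8, 3, 2, []string{"clr", "cpy", "dec", "inc", "je", "jz"})
	fmt.Println(len(cp.Regs), len(cp.Inputs), len(cp.Outputs), len(cp.RAM), len(cp.ROM))
}
```

Because every size is a parameter, two CPs built this way can differ in register count, memory and instruction set, which is what makes the cores "not necessarily of the same type".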

## Shared Objects (SO)

The non-computational element of the BM

Alongside CPs, BondMachines include non-computing units called "Shared Objects" (SO). Examples of their purposes are:

- Data storage (Memories).
- Message passing.
- CP synchronization.

A single SO can be shared among different CPs. To use it, CPs have special instructions (opcodes) oriented to the specific SO.

Four kinds of SO have been developed so far: the Channel, the Shared Memory, the Barrier and a Pseudo Random Number Generator.


## Channel

The Channel SO is a hardware implementation of the CSP (communicating sequential processes) channel.

It is a model for inter-core communication and synchronization via message passing.

#### CPs use channels via 4 opcodes

- wrd: Want Read.
- wwr: Want Write.
- chc: Channel Check.
- chw: Channel Wait.

## Shared Memory

The Shared Memory SO is a RAM block accessible from more than one CP.

Different Shared Memories can be used by different CPs, and not necessarily by all of them.

#### CPs use shared memories via 2 opcodes

- s2r: Shared memory read.
- r2s: Shared memory write.
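A software analogue can make the idea concrete. In this Go sketch several goroutines stand in for CPs and access one memory block through `Read` and `Write`, which loosely play the roles of s2r and r2s. The type `SharedMemory`, the function `fillAndSum` and the use of a mutex for arbitration are assumptions made for illustration; the slides do not describe how the hardware arbitrates concurrent accesses.

```go
package main

import (
	"fmt"
	"sync"
)

// SharedMemory is a hypothetical model of a RAM block shared by several CPs.
type SharedMemory struct {
	mu    sync.Mutex
	cells []uint32
}

// NewSharedMemory allocates a shared memory with the given number of cells.
func NewSharedMemory(size int) *SharedMemory {
	return &SharedMemory{cells: make([]uint32, size)}
}

// Write stores v at addr (the role played by r2s).
func (m *SharedMemory) Write(addr int, v uint32) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.cells[addr] = v
}

// Read loads the value at addr (the role played by s2r).
func (m *SharedMemory) Read(addr int) uint32 {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.cells[addr]
}

// fillAndSum lets n concurrent writers store id+1 in their own cell,
// then reads all cells back and returns their sum.
func fillAndSum(n int) uint32 {
	mem := NewSharedMemory(n)
	var wg sync.WaitGroup
	for id := 0; id < n; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			mem.Write(id, uint32(id+1))
		}(id)
	}
	wg.Wait()
	var sum uint32
	for addr := 0; addr < n; addr++ {
		sum += mem.Read(addr)
	}
	return sum
}

func main() {
	fmt.Println(fillAndSum(3)) // 1+2+3
}
```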




## Barrier

When a CP hits a barrier, its execution stops until all the CPs that share the same barrier have hit it.

#### CPs use barriers via 1 opcode

- hit: Hit the barrier.
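The same synchronization pattern can be sketched in Go with a one-shot barrier built on `sync.WaitGroup`. The type `Barrier`, its method `Hit` and the function `allSawEveryone` are hypothetical names chosen for this example; the sketch shows the semantics only and says nothing about the hardware implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// Barrier is a one-shot barrier for a fixed number of participants.
type Barrier struct {
	wg sync.WaitGroup
}

// NewBarrier creates a barrier shared by n participants.
func NewBarrier(n int) *Barrier {
	b := &Barrier{}
	b.wg.Add(n)
	return b
}

// Hit blocks the caller until all n participants have called Hit.
func (b *Barrier) Hit() {
	b.wg.Done()
	b.wg.Wait()
}

// allSawEveryone starts n workers. Each marks its arrival, hits the barrier,
// and then counts how many workers had arrived. It reports whether every
// worker counted all n arrivals, i.e. nobody passed the barrier early.
func allSawEveryone(n int) bool {
	b := NewBarrier(n)
	arrived := make([]bool, n)
	seen := make([]int, n)
	var done sync.WaitGroup
	for i := 0; i < n; i++ {
		done.Add(1)
		go func(i int) {
			defer done.Done()
			arrived[i] = true // work done before the barrier
			b.Hit()
			for _, a := range arrived {
				if a {
					seen[i]++
				}
			}
		}(i)
	}
	done.Wait()
	for _, s := range seen {
		if s != n {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allSawEveryone(4))
}
```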




Having a multi-core architecture that is completely heterogeneous, both in core types and in interconnections.

The BondMachine may have many cores, possibly all different, arbitrarily interconnected and sharing non-computing elements.


The BM computer architecture is managed by a set of tools to:

- build a specific architecture
- modify a pre-existing architecture
- simulate or emulate the behavior
- generate the Hardware Description Language (HDL) code

Processor Builder: selects the single processor, assembles and disassembles, saves on disk as JSON, creates the HDL code of a CP.

BondMachine Builder: connects CPs and SOs together in custom topologies, loads and saves on disk as JSON, creates the BM's HDL code.

Simulation Framework: simulates the behaviour, emulates a BM on a standard Linux workstation.
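To illustrate what saving an architecture as JSON can look like, the Go sketch below serializes a processor description and reads it back. The type `ProcSpec`, its field names and the function `roundTrip` are hypothetical: they are not the file format used by the BondMachine tools, and the example works in memory rather than on disk.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ProcSpec is a hypothetical processor description. The field names are
// illustrative only and do not reproduce the real tool's JSON schema.
type ProcSpec struct {
	RegisterSize int      `json:"register_size"`
	Inputs       int      `json:"inputs"`
	Outputs      int      `json:"outputs"`
	Opcodes      []string `json:"opcodes"`
}

// roundTrip encodes a spec as JSON and decodes it again.
func roundTrip(p ProcSpec) (ProcSpec, error) {
	data, err := json.Marshal(p)
	if err != nil {
		return ProcSpec{}, err
	}
	var out ProcSpec
	if err := json.Unmarshal(data, &out); err != nil {
		return ProcSpec{}, err
	}
	return out, nil
}

func main() {
	spec := ProcSpec{
		RegisterSize: 32,
		Inputs:       3,
		Outputs:      2,
		Opcodes:      []string{"clr", "cpy", "dec", "inc", "je", "jz"},
	}
	out, err := roundTrip(spec)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out.RegisterSize, out.Inputs, out.Outputs, len(out.Opcodes))
}
```

A textual, structured description of this kind is what allows an architecture to be built, modified and reloaded by separate tools.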




### Examples

(32-bit register counter machine)

procbuilder -register-size 32 -opcodes clr,cpy,dec,inc,je,jz

(Input and output registers)

procbuilder -inputs 3 -outputs 2 ...
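To give a feeling for what a counter machine with these six opcodes can compute, here is a small interpreter sketch in Go. The instruction semantics are assumptions made for this example (for instance that `jz` jumps when a register is zero and `je` when two registers are equal) and may differ from the real BondMachine opcodes; `Instr`, `Run` and `addProgram` are hypothetical names.

```go
package main

import "fmt"

// Instr is one instruction of a hypothetical counter machine.
type Instr struct {
	Op     string
	A, B   int // register indexes
	Target int // jump target, used by jz and je
}

// Run executes prog on regs until the program counter leaves the program.
// The semantics below are assumed for illustration only.
func Run(prog []Instr, regs []uint32) []uint32 {
	pc := 0
	for pc >= 0 && pc < len(prog) {
		in := prog[pc]
		pc++
		switch in.Op {
		case "clr":
			regs[in.A] = 0
		case "cpy":
			regs[in.A] = regs[in.B]
		case "inc":
			regs[in.A]++
		case "dec":
			regs[in.A]--
		case "jz":
			if regs[in.A] == 0 {
				pc = in.Target
			}
		case "je":
			if regs[in.A] == regs[in.B] {
				pc = in.Target
			}
		}
	}
	return regs
}

// addProgram adds r1 into r0, leaving r1 at zero. r2 must hold zero so
// that "jz r2" acts as an unconditional jump back to the loop start.
func addProgram() []Instr {
	return []Instr{
		{Op: "jz", A: 1, Target: 4}, // exit when r1 reaches zero
		{Op: "inc", A: 0},
		{Op: "dec", A: 1},
		{Op: "jz", A: 2, Target: 0}, // loop
	}
}

func main() {
	regs := Run(addProgram(), []uint32{3, 4, 0, 0})
	fmt.Println(regs[0]) // 3 + 4
}
```

Even with only increment, decrement and conditional jumps, addition can be expressed as a loop, which is why such a minimal opcode set is enough for a counter machine.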




### Examples

(Create the CP RTL code in Verilog)

procbuilder -create-verilog ...

(Create testbench)

procbuilder -create-verilog-testbench test.v ...




To create a simple processor

To assemble and disassemble code for it

To produce its HDL code

Bondmachine is the tool that composes CPs and SOs to form BondMachines. It covers:

- BM CP insert and remove
- BM SO insert and remove
- BM inputs and outputs
- BM bonding of processors and/or IO
- BM visualizing or HDL

### BM CP insert and remove

(Add a processor)

bondmachine -add-domains proc.json ... ; ... -add-processor 0

(Remove a processor)

bondmachine -bondmachine-file bmach.json -del-processor n


### BM SO insert and remove


(Add a Shared Object)

bondmachine -add-shared-objects specs ...

(Connect an SO to a processor)

bondmachine -connect-processor-shared-object ...

### BM Inputs and Outputs

(Adding inputs or outputs)

bondmachine -add-inputs ... ; bondmachine -add-outputs ...

(Removing inputs or outputs)

bondmachine -del-input ... ; bondmachine -del-output ...

### BM Visualizing or HDL

(Visualizing)

bondmachine -emit-dot ...

(Create RTL code)

bondmachine -create-verilog ...




To create a single-core BondMachine

To attach an external output

To produce its HDL code

# Toolchains

A set of toolchains allows BondMachines to be built and deployed directly to a target device.

### Bondgo Toolchain main targets

A file local.mk contains references to the source code as well as all the build necessities.

- make bondmachine: creates the JSON representation of the BM and assembles its code
- make hdl: creates the HDL files of the BM
- make show: displays a graphical representation of the BM
- make simulate [simbatch]: starts a simulation [batch simulation]
- make bitstream [design\_bitstream]: creates the firmware [accelerator firmware]
- make program: flashes the firmware onto the target device




To explore the toolchain

To flash the board with the code from the previous example



To build a BondMachine with a processor and a shared object

To flash the board




To build a dual-core BondMachine

To connect cores

To flash the board

### BondMachine web front-end

Operations on BondMachines can also be performed via a web front-end that is under development.




# Simulation

An important feature of the tools is the possibility of simulating BondMachine behavior.

An event input file describes how BondMachine elements have to change during the simulation timespan and which ones have to be reported.

The simulator can produce results in the form of:

- Activity log of the BM internals.
- Graphical representation of the simulation.
- Report file with quantitative data, useful to construct metrics.

### Graphical simulation in action




To show the simulation capabilities of the framework



The same engine that simulates BondMachines can be used as an emulator.

Through the emulator, BondMachines can be used on Linux workstations.


As stated before, BondMachines are not general purpose architectures, and to be effective they have to be shaped according to the specific problem.

Several methods (apart from writing in assembly and building a BondMachine from scratch) have been developed to do that:

- *bondgo*: A new type of compiler that creates not only the CPs' assembly but also the architecture itself.
- *basm*: The BondMachine Assembler.
- A set of APIs to create BondMachines that fit specific computational problems.
- An Evolutionary Computation framework to "grow" BondMachines according to some fitness function via simulation.
- A set of tools to use BondMachines in Machine Learning.


#### Mapping specific computational problems to BMs






# Bondgo

Bondgo is the name chosen for the compiler developed for the BondMachine.

As the name suggests, the compiler source language is Go.

#### This is the standard flow when building computer programs

[Figure: the standard build flow, starting from the high level language source]





#### Bondgo does something different from standard compilers ...

[Figure: the bondgo flow, starting from the high level Go source]



































To create a BondMachine from a Go source file

- To build the architecture
- To build the program
- To create the firmware and flash it to the board

#### Bondgo

... it can do even more interesting things when compiling concurrent programs.

[Figure: the bondgo flow for a concurrent program, starting from the high level Go source]





Compiling the code with the bondgo compiler:

`bondgo -input-file ds.go -mpm`

The toolchain performs the following steps:

- Maps the two goroutines to two hardware cores.
- Creates two types of core, each one optimized to execute the assigned goroutine.
- Creates the two binaries.
- Connects the two cores as inferred from the source code, using special IO registers.

The result is a multicore BondMachine.
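The file `ds.go` is not reproduced in this material. Purely as an illustration, the sketch below is an ordinary Go program (not the authors' source, and not bondgo-specific) with two concurrent activities that exchange data, which is the kind of structure the steps above would split across two connected cores. The names `producer`, `consumer` and `run` are hypothetical, and no claim is made that every construct used here (for instance `close` and `range` over a channel) is accepted by bondgo.

```go
package main

import "fmt"

// producer emits the integers 0..n-1 and then closes the channel.
func producer(n int, out chan<- int) {
	for i := 0; i < n; i++ {
		out <- i
	}
	close(out)
}

// consumer adds up everything it receives until the channel is closed.
func consumer(in <-chan int) int {
	sum := 0
	for v := range in {
		sum += v
	}
	return sum
}

// run starts the producer as a goroutine, consumes in the calling
// goroutine and returns the sum computed by the consumer.
func run(n int) int {
	link := make(chan int) // stands in for the connection between the two cores
	go producer(n, link)
	return consumer(link)
}

func main() {
	fmt.Println(run(5)) // 0+1+2+3+4
}
```

On a workstation the Go runtime schedules both goroutines on general purpose cores; the point of the steps listed above is that each goroutine instead gets a core built only for its own code.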



## Compiling Architectures

#### One of the most important results

The architecture creation is part of the compilation process.



To use bondgo to create a chain of interconnected processors

To flash the firmware to the board






# Bondgo

#### Go in hardware

Bondgo implements a sort of "Go in hardware".

High level Go source code is directly mapped to interconnected processors without Operating Systems or runtimes.

## Go in hardware

#### Second idea on the BondMachine

The idea was to build a computing system with fewer layers, resulting in a smaller HW/SW gap.

This would raise the overall performance while keeping a user-friendly way of programming.

Between HW and SW there is only the processor abstraction: no Operating System and no runtime. Despite that, programming is done at a high level.

## Layers, Abstractions and Interfaces

#### and BondMachines

#### Bondgo example

bondgo stream processing example

```go
package main

import (
	"bondgo"
)

func streamprocessor(a *[]uint8, b *[]uint8,
	c *[]uint8, gid uint8) {
	(*c)[gid] = (*a)[gid] + (*b)[gid]
}

func main() {
	a := make([]uint8, 256)
	b := make([]uint8, 256)
	c := make([]uint8, 256)

	// some a and b values fill

	for i := 0; i < 256; i++ {
		go streamprocessor(&a, &b, &c, uint8(i))
	}
}
```

The compilation of this example results in the creation of 257 CPs: 256 are the stream processors executing the code of the function *streamprocessor*, and one is the coordinating CP. Each stream processor is optimized for, and capable only of, additions, since addition is the only operation requested by the source code. The three slices created in the main function are passed by reference to the goroutines, so the *Bondgo* compiler creates a shared RAM available to the generated CPs.
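For comparison, a minimal sketch of the same element-wise addition as a plain Go program that runs on a workstation. It does not use the `bondgo` package; the function name `addStreams` and the `sync.WaitGroup` are additions of this sketch, needed because in ordinary Go the caller has to wait for its goroutines, whereas on a BondMachine each stream processor is a separate core.

```go
package main

import (
	"fmt"
	"sync"
)

// addStreams computes c[i] = a[i] + b[i] with one goroutine per element,
// mirroring the one-stream-processor-per-element layout described above.
// It assumes len(b) >= len(a).
func addStreams(a, b []uint8) []uint8 {
	c := make([]uint8, len(a))
	var wg sync.WaitGroup
	for i := range a {
		wg.Add(1)
		go func(gid int) {
			defer wg.Done()
			c[gid] = a[gid] + b[gid] // uint8 addition wraps modulo 256
		}(i)
	}
	wg.Wait() // wait until every element has been written
	return c
}

func main() {
	a := make([]uint8, 256)
	b := make([]uint8, 256)
	for i := range a {
		a[i] = uint8(i)
		b[i] = 1
	}
	c := addStreams(a, b)
	fmt.Println(c[0], c[10], c[255])
}
```

Each goroutine writes a distinct element of `c`, so no locking is needed beyond the final wait.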

The BondMachine assembler *Basm* is the tool complementary to the compiler. The "fluid" nature of the BondMachine gives the assembler some unique features:

- Support for template based assembly code
- Combining and rewriting fragments of assembly code
- Building hardware from assembly
- Software/Hardware rearrange capabilities

## Abstract Assembly

The Assembly language for the BM has been kept as independent as possible from the particular CP.

Given a specific piece of assembly code, Bondgo can compute the "minimum CP" that can execute that code.

```
i2r r0 i0
i2r r1 i1
add r0 r1
r2o r0 o0
j 0
```

These are Building Blocks for complex BondMachines.

**Builders** API

#### With these Building Blocks

Several libraries have been developed to map specific problems on BondMachines:

- Symbond, to handle mathematical expressions.
- Boolbond, to map boolean expressions.
- Matrixwork, to perform matrix operations.



#### Symbond

`symbond -expression "sum(var(x),const(2))" -save-bondmachine bondmachine.json`

Resulting in:

[Figure: the resulting BondMachine]
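To make the expression `sum(var(x),const(2))` concrete, here is a minimal sketch of it as an expression tree in plain Go. It only illustrates the structure that the expression describes; how symbond turns that structure into processors is not shown here and is not assumed. The types `Node`, `Var`, `Const`, `Sum` and the function `example` are hypothetical names.

```go
package main

import "fmt"

// Node is one element of a symbolic expression.
type Node interface {
	Eval(vars map[string]float64) float64
}

// Const is a constant leaf, Var a named variable leaf, Sum a binary node.
type Const float64
type Var string
type Sum struct{ L, R Node }

func (c Const) Eval(map[string]float64) float64 { return float64(c) }

func (v Var) Eval(vars map[string]float64) float64 { return vars[string(v)] }

func (s Sum) Eval(vars map[string]float64) float64 {
	return s.L.Eval(vars) + s.R.Eval(vars)
}

// example builds the tree for sum(var(x),const(2)).
func example() Node {
	return Sum{L: Var("x"), R: Const(2)}
}

func main() {
	fmt.Println(example().Eval(map[string]float64{"x": 3}))
}
```

With `x = 3` the tree evaluates to 5.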



#### Boolbond

`boolbond -system-file expression.txt -save-bondmachine bondmachine.json`

Resulting in:

[Figure: the resulting BondMachine, with outputs labelled o:var(t) and o:var(l)]

To create complex multi-cores from boolean expressions



#### Matrix multiplication

`if mymachine, ok := matrixwork.Build_M(n, t); ok == nil ...`

## Evolutionary BondMachine















#### So far we saw:

- A user-friendly approach to create processors (single core).
- Optimizing a single device to support intricate computational workflows (multi-core) over a heterogeneous layer.

#### Interconnected BondMachines

What if we could extend this layer to multiple interconnected devices?

The same logic that exists among CPs has been extended to different BondMachines organized in clusters.

Two protocols have been created for the purpose: one over Ethernet called *etherbond* and one over UDP called *udpbond*.

FPGA based BondMachines, standard Linux workstations and emulated BondMachines may join a cluster and contribute to a single distributed computational problem.



#### A distributed example

distributed counter

```go
package main

import (
	"bondgo"
)

func pong() {
	var in0 bondgo.Input
	var out0 bondgo.Output

	in0 = bondgo.Make(bondgo.Input, 3)
	out0 = bondgo.Make(bondgo.Output, 5)
	for {
		bondgo.IOWrite(out0, bondgo.IORead(in0)+1)
	}
}

func main() {
	var in0 bondgo.Input
	var out0 bondgo.Output

	in0 = bondgo.Make(bondgo.Input, 5)
	out0 = bondgo.Make(bondgo.Output, 3)

device_1:
	go pong()
	for {
		bondgo.IOWrite(out0, bondgo.IORead(in0))
	}
}
```
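The behaviour of this counter can be tried on a workstation with a plain Go analogue. The sketch below is not bondgo code: two channels stand in for the numbered IO endpoints (3 and 5), which is an assumption of this illustration, and the helper `runCounter` with its bounded number of rounds is hypothetical, added so that the program terminates.

```go
package main

import "fmt"

// pong receives a value, adds one and sends it back.
func pong(in <-chan uint8, out chan<- uint8) {
	for v := range in {
		out <- v + 1
	}
	close(out)
}

// runCounter bounces a value between the calling loop and pong for the
// given number of rounds and returns the final value.
func runCounter(rounds int) uint8 {
	toPong := make(chan uint8)
	fromPong := make(chan uint8)
	go pong(toPong, fromPong)

	var v uint8
	for i := 0; i < rounds; i++ {
		toPong <- v    // the main loop forwards the value unchanged
		v = <-fromPong // pong returns it incremented by one
	}
	close(toPong)
	return v
}

func main() {
	fmt.Println(runCounter(10))
}
```

Every round trip increments the value once, so the counter advances regardless of where the two halves run; being a `uint8`, it wraps around after 255.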





#### A general result

Parts of the system can be redeployed among different devices without changing the system behavior (only the performance).

Workstations with emulated BondMachines, workstations with etherbond drivers and standalone BondMachines (FPGA) may join these clusters.

















## Interconnection firmware

The input and output buses are the endpoints that we would like to have on the Linux system.

[Figure: the PS (ARM) and the FPGA with the custom HW design]



### The Advanced eXtensible Interface Protocol

AXI is a communication bus protocol defined by ARM as part of the Advanced Microcontroller Bus Architecture (AMBA) standard. There are 3 types of AXI interfaces:

- AXI Full: for high-performance memory-mapped requirements.
- AXI Lite: for low-throughput memory-mapped communication.
- AXI Stream: for high-speed streaming data.


### Linux

Now that we have custom accelerated hardware, we need a Linux distribution to run on it.

### **Common Features**

- Complete system build from source
- Allow choice of kernel and bootloader
- Support for modifying packages with patches or custom configuration files
- Can build cross-toolchains for development
- Convenient support for read-only root filesystems
- Support offline builds
- The build configuration files integrate well with SCM tools

### Yocto

- Convenient sharing of build configuration among similar projects (meta-layers)
- Larger community (Linux Foundation project)
- Can build a toolchain that runs on the target
- A package management system

### Buildroot

- Simple Makefile approach, easier to understand how the build system works
- Reduced resource requirements on the build machine
- Very easy to customize the final root filesystem (overlays)

Credits: https://jumpnowtek.com/linux/Choosing-an-embedded-linux-build-system.html



### kernel module

The accelerator endpoints are exposed via AXI as memory-mapped locations of the ARM processor running Linux.

To properly use the accelerator from user space, the kernel has to handle the accelerator endpoints and make them available to user space.

We developed a kernel module for our accelerators. It manages 3 data flows:

### Kernel from and to user space: char device

The communication goes through the standard read and write system calls on a kernel generated char device.

A language has been implemented for the desired operations.



AXI guarantees consistency and transfer to the firmware input ports. Moreover, the data flow from the kernel cannot saturate the PL part.

### Firmware to kernel: IRQ

The data flow from the FPGA to the PS part is a different story. Data can easily flow fast enough to saturate the PS part and make it completely unusable.

The firmware:

- collects all the changes to send and fills in a list using a dedicated AXI register
- stops accepting new changes from the IP
- sends an interrupt request to the kernel

[Figure: the PS (ARM) running the app on a Linux based OS, and the FPGA with the custom HW design, the interconnect firmware and the wires between them]









Check of the correctness of the accelerator results

Benchmark of the execution
















### Correctness and module debug

To verify the correct computation of the accelerator:

- a tool to monitor the AXI memory
- write directly to AXI memory mapped input addresses (through devmem)
- check the AXI memory mapped output addresses

Address map displayed by the `monitor` tool for base address 0x43c00000:

| Register | Byte 3     | Byte 2     | Byte 1     | Byte 0     |
|----------|------------|------------|------------|------------|
| i0       | 0x43c00003 | 0x43c00002 | 0x43c00001 | 0x43c00000 |
| i1       | 0x43c00007 | 0x43c00006 | 0x43c00005 | 0x43c00004 |
| i2       | 0x43c0000b | 0x43c0000a | 0x43c00009 | 0x43c00008 |
| i3       | 0x43c0000f | 0x43c0000e | 0x43c0000d | 0x43c0000c |
| i4       | 0x43c00013 | 0x43c00012 | 0x43c00011 | 0x43c00010 |
| i5       | 0x43c00017 | 0x43c00016 | 0x43c00015 | 0x43c00014 |
| i6       | 0x43c0001b | 0x43c0001a | 0x43c00019 | 0x43c00018 |
| i7       | 0x43c0001f | 0x43c0001e | 0x43c0001d | 0x43c0001c |
| PS2PL    | 0x43c00023 | 0x43c00022 | 0x43c00021 | 0x43c00020 |
| STATES   | 0x43c00027 | 0x43c00026 | 0x43c00025 | 0x43c00024 |
| o0       | 0x43c0002b | 0x43c0002a | 0x43c00029 | 0x43c00028 |
| o1       | 0x43c0002f | 0x43c0002e | 0x43c0002d | 0x43c0002c |
| o2       | 0x43c00033 | 0x43c00032 | 0x43c00031 | 0x43c00030 |
| o3       | 0x43c00037 | 0x43c00036 | 0x43c00035 | 0x43c00034 |
| o4       | 0x43c0003b | 0x43c0003a | 0x43c00039 | 0x43c00038 |
| o5       | 0x43c0003f | 0x43c0003e | 0x43c0003d | 0x43c0003c |
| o6       | 0x43c00043 | 0x43c00042 | 0x43c00041 | 0x43c00040 |
| o7       | 0x43c00047 | 0x43c00046 | 0x43c00045 | 0x43c00044 |
| bench    | 0x43c0004b | 0x43c0004a | 0x43c00049 | 0x43c00048 |
| PL2PS    | 0x43c0004f | 0x43c0004e | 0x43c0004d | 0x43c0004c |
| CHANGE   | 0x43c00053 | 0x43c00052 | 0x43c00051 | 0x43c00050 |

For example, writing the value 1 to the lowest byte of i0:

`devmem 0x43c00000 b 1`
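The monitor listings in this section place the registers at regular 4-byte steps from the base address: the inputs first, then PS2PL and STATES, then the outputs, then bench, PL2PS and CHANGE. The sketch below computes those addresses for a given number of inputs and outputs. The layout is inferred from the listings only, not taken from the firmware sources, and the `Layout` type with its methods is a hypothetical helper.

```go
package main

import "fmt"

// regSize is the number of byte addresses covered by one register.
const regSize = 4

// Layout describes the register map as inferred from the monitor listings.
type Layout struct {
	Base    uint32 // base address of the AXI memory mapped region
	Inputs  uint32 // number of input registers (i0, i1, ...)
	Outputs uint32 // number of output registers (o0, o1, ...)
}

// Input returns the lowest byte address of input register i.
func (l Layout) Input(i uint32) uint32 { return l.Base + regSize*i }

// PS2PL follows the last input; STATES follows PS2PL.
func (l Layout) PS2PL() uint32  { return l.Base + regSize*l.Inputs }
func (l Layout) States() uint32 { return l.PS2PL() + regSize }

// Output returns the lowest byte address of output register j.
func (l Layout) Output(j uint32) uint32 { return l.States() + regSize*(j+1) }

// Bench, PL2PS and CHANGE follow the last output, in this order.
func (l Layout) Bench() uint32  { return l.Output(l.Outputs) }
func (l Layout) PL2PS() uint32  { return l.Bench() + regSize }
func (l Layout) Change() uint32 { return l.PL2PS() + regSize }

func main() {
	l := Layout{Base: 0x43c00000, Inputs: 8, Outputs: 8}
	fmt.Printf("i0=%#x o0=%#x CHANGE=%#x\n", l.Input(0), l.Output(0), l.Change())
}
```

With 8 inputs and 8 outputs this gives `o0` at 0x43c00028 and `CHANGE` at 0x43c00050; with 13 of each, `CHANGE` moves to 0x43c00078, matching the listings.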

### An example of error

| # ./mor  | hitor -q ( | 0x43c00000 -n | 13       |              |         |                      |          |              |
|----------|------------|---------------|----------|--------------|---------|----------------------|----------|--------------|
| i0:      |            | (0x43c00003)  |          | (0x43c00002) |         | (0x43c00001)         |          | (0x43c00000) |
|          |            | (0x43c00007)  |          | (0x43c00006) |         | (0x43c00005)         |          | (0x43c00004) |
|          |            | (0x43c0000b)  |          | (0x43c0000a) |         | (0x43c00009)         |          | (0x43c00008) |
|          |            | (0x43c0000f)  |          | (0x43c0000e) |         | (0x43c0000d)         |          | (0x43c0000c) |
|          |            | (0x43c00013)  |          | (0x43c00012) |         | (0x43c00011)         |          | (0x43c00010) |
|          |            | (0x43c00017)  |          | (0x43c00016) |         | (0x43c00015)         |          | (0x43c00014) |
|          |            | (0x43c0001b)  |          | (0x43c0001a) |         | (0x43c00019)         |          | (0x43c00018) |
|          |            | (0x43c0001f)  |          | (0x43c0001e) |         | (0x43c0001d)         |          | (0x43c0001c) |
|          |            | (0x43c00023)  |          | (0x43c00022) |         | (0x43c00021)         |          | (0x43c00020) |
|          |            | (0x43c00027)  |          | (0x43c00026) |         | (0x43c00025)         |          | (0x43c00024) |
| i10:     |            | (0x43c0002b)  |          | (0x43c0002a) |         | (0x43c00029)         |          | (0x43c00028) |
|          |            | (0x43c0002f)  |          | (0x43c0002e) |         | (0x43c0002d)         |          | (0x43c0002c) |
|          |            | (0x43c00033)  |          | (0x43c00032) |         | (0x43c00031)         |          | (0x43c00030) |
| PS2PL:   |            | (0x43c00037)  |          | (0x43c00036) |         | (0x43c00035)         |          | (0x43c00034) |
| STATES:  |            | (0x43c0003b)  |          | (0x43c0003a) |         | (0x43c00039)         |          | (0x43c00038) |
| 00:      |            | (0x43c0003f)  |          | (0x43c0003e) |         | (0x43c0003d)         |          | (0x43c0003c) |
|          |            | (0x43c00043)  |          | (0x43c00042) |         | (0x43c00041)         |          | (0x43c00040) |
|          |            | (0x43c00047)  |          | (0x43c00046) |         | (0x43c00045)         |          | (0x43c00044) |
|          |            | (0x43c0004b)  |          | (0x43c0004a) |         | (0x43c000 <u>49)</u> | 00000100 | (0x43c00048) |
| o4:      |            | (0x43c0004f)  |          | (0x43c0004e) |         | (0x43c000ld)         | 00000001 | (0:43c0004c) |
| o5:      |            | (0x43c00053)  |          | (0x43c00052) |         | (0x43c00051)         |          |              |
| 06:      |            | (0x43c00057)  |          | (0x43c00056) |         | (0x43c0005)          |          | (0)43c00054) |
|          |            | (0x43c0005b)  |          | (0x43c0005a) |         | (0x43c00059)         | 00000010 | (0x43c00058) |
| o8:      |            | (0x43c0005f)  |          | (0x43c0005e) |         | (0x43c0005d)         |          | (0x43c0005c) |
| o9:      |            | (0x43c00063)  |          | (0x43c00062) |         | (0x43c00061)         |          | (0x43c00060) |
| o10:     |            | (0x43c00067)  |          | (0x43c00066) | 0000000 | (0x43c00065)         | 00000010 | (0x43c00064) |
| o11:     |            | (0x43c0006b)  | 00000000 | (0x43c0006a) | 0000000 | (0x43c00069)         | 00000110 | (0x43c00068) |
|          |            | (0x43c0006f)  |          | (0x43c0006e) |         | (0x43c0006d)         |          | (0x43c0006c) |
|          |            | (0x43c00073)  |          | (0x43c00072) |         | (0x43c00071)         |          | (0x43c00070) |
| PL2PS:   |            | (0x43c00077)  |          | (0x43c00076) |         | (0x43c00075)         |          | (0x43c00074) |
| CHANGE : |            | (0x43c0007b)  |          | (0x43c0007a) |         | (0x43c00079)         |          | (0x43c00078) |
|          |            |               |          |              |         |                      |          |              |
|          |            |               |          |              |         |                      |          |              |
|          |            |               |          |              |         |                      |          |              |

ICTP-IAEA School on FPGA-based SoC 22

### An example of error

|          | 000000000 |              | (0x43c00002) | (0x43c00001)         |          | (0x43c00000) |
|----------|-----------|--------------|--------------|----------------------|----------|--------------|
|          | 000000000 | (0x43c00007) | (0x43c00006) | (0x43c00005)         |          | (0x43c00004) |
|          |           | (0x43c0000b) | (0x43c0000a) | (0x43c00009)         |          | (0x43c00008) |
|          |           | (0x43c0000f) | (0x43c0000e) | (0x43c0000d)         |          | (0x43c0000c) |
| i4:      |           | (0x43c00013) | (0x43c00012) | (0x43c00011)         |          | (0x43c00010) |
|          |           | (0x43c00017) | (0x43c00016) | (0x43c00015)         |          | (0x43c00014) |
| i6:      |           | (0x43c0001b) | (0x43c0001a) | (0x43c00019)         |          | (0x43c00018) |
|          |           | (0x43c0001f) | (0x43c0001e) | (0x43c0001d)         |          | (0x43c0001c) |
| i8:      |           | (0x43c00023) | (0x43c00022) | (0x43c00021)         |          | (0x43c00020) |
| i9:      |           | (0x43c00027) | (0x43c00026) | (0x43c00025)         |          | (0x43c00024) |
|          |           | (0x43c0002b) | (0x43c0002a) | (0x43c00029)         |          | (0x43c00028) |
| i11:     |           | (0x43c0002f) | (0x43c0002e) | (0x43c0002d)         |          | (0x43c0002c) |
| i12:     |           | (0x43c00033) | (0x43c00032) | (0x43c00031)         |          | (0x43c00030) |
| PS2PL:   |           | (0x43c00037) | (0x43c00036) | (0x43c00035)         |          | (0x43c00034) |
| STATES:  |           | (0x43c0003b) | (0x43c0003a) | (0x43c00039)         |          | (0x43c00038) |
| o0:      |           | (0x43c0003f) | (0x43c0003e) | (0x43c0003d)         |          | (0x43c0003c) |
| o1:      |           | (0x43c00043) | (0x43c00042) | (0x43c00041)         |          | (0x43c00040) |
| o2:      |           | (0x43c00047) | (0x43c00046) | (0x43c00045)         |          | (0x43c00044) |
| o3:      |           | (0x43c0004b) | (0x43c0004a) | (0x43c00049)         | 00000100 | (0x43c00048) |
| o4:      |           | (0x43c0004f) | (0x43c0004e) | (0x43c0004d)         | 00000001 | (0x43c0004c) |
| o5:      |           | (0x43c00053) | (0x43c00052) | (0x43c00051)         |          | (0x43c00050) |
| o6:      |           | (0x43c00057) | (0x43c00056) | (0x43c00055)         | 00000010 | (0x43c00054) |
| o7:      |           | (0x43c0005b) | (0x43c0005a) | (0x43c00059)         | 00000010 | (0x43c00058) |
| o8:      |           | (0x43c0005f) | (0x43c0005e) | (0x43c0005d)         |          | (0x43c0005c) |
| o9:      |           | (0x43c00063) | (0x43c00062) | (0x43c00061)         |          | (0x43c00060) |
| o10:     |           | (0x43c00067) | (0x43c00066) | (0x43c00065)         |          | (0x43c00064) |
| o11:     |           | (0x43c0006b) | (0x43c0006a) | (0x43c00069)         |          | (0x43c00068) |
| o12:     |           | (0x43c0006f) | (0x43c0006e) | (0x43c0006d)         |          | (0x43c0006c) |
| o13:     |           | (0x43c00073) | (0x43c00072) | (0x43c00071)         |          | (0x43c00070) |
| PL2PS:   |           | (0x43c00077) | (0x43c00076) | (0x43c00075)         |          | (0x43c00074) |
| CHANGE : |           | (0x43c0007b) | (0x43c0007a) | (0x43c00079)         |          | (0x43c00078) |
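The dump above places one labelled row every four byte addresses, starting at `0x43c00000`. As a minimal sketch of that layout, the snippet below computes the lowest byte address of a row from its label. The row order (13 inputs, `PS2PL`, `STATES`, 14 outputs, `PL2PS`, `CHANGE`) is an assumption read off the visible labels and addresses, and the names `BASE`, `NAMES` and `register_address` are hypothetical helpers, not part of the BondMachine tooling.

```python
BASE = 0x43C00000  # lowest address visible in the dump
WORD = 4           # each row of the dump spans four byte addresses

# Assumed row order, inferred from the labels and addresses in the dump.
NAMES = (
    [f"i{k}" for k in range(13)]
    + ["PS2PL", "STATES"]
    + [f"o{k}" for k in range(14)]
    + ["PL2PS", "CHANGE"]
)


def register_address(name: str) -> int:
    """Return the lowest byte address of the row labelled `name`."""
    return BASE + WORD * NAMES.index(name)


if __name__ == "__main__":
    for name in ("i4", "PS2PL", "o3", "CHANGE"):
        print(f"{name:>7}: 0x{register_address(name):08x}")
```

With this layout, the non-zero values in the dump sit in the rows of `o3`, `o4`, `o6` and `o7`.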




The FPGA benchmarks do not include the PS part overhead (the comparisons are not really fair)

## Benchmark: the CPU (Golang)

- Time measures: built-in Go facilities
- Energy measures: perf
- CPU: Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz
- Go 1.18.2

|     |            |          |              |
|-----|------------|----------|--------------|
| 2   | 0.00543209 | 259280   | 3.858015-00  |
| 3   | 0.01831868 | 484280   | 2.305478-06  |
| 4   | 0.02399964 | 722280   | L304002-06   |
| 5   | 0.00632906 | 1870400  | 9.34235-07   |
| 6   | 0.00570083 | 1471400  | 0.736234-07  |
| 7   | 0.07363833 | 1835800  | 5.355822-47  |
| 8   | 0.09997730 | 2737800  | 0.05364E-87  |
| 9   | 0.12227912 | 3429000  | 2.515136-07  |
| 10  | 0.36490378 | 4465800  | 2.338396-47  |
| 11  | 0.00173032 | 5530300  | L80822E-87   |
| 12  | 0.34205632 | 6643300  | L505216-87   |
| 13  | 0.38554472 | 1752800  | 1.208238-07  |
| 14  | 0.35400825 | 8954800  | L.13582E-07  |
| 15  | 0.3061176  | 18630508 | 9.40434E-00  |
| 16  | 0.41550508 | 11832200 | 8.416518-00  |
| 17  | 0.5084054  | 13004308 | 7.35042-08   |
| 18  | 0.5063083  | 15124500 | 6.52550-08   |
| 19  | 0.63335665 | 17024400 | 5.306336-00  |
| 20  | 0.708354   | 18718300 | 5.0728-08    |
| 21  | 0.3553206  | 22133800 | 4.517908-00  |
| 22  | 0.0030085  | 22525300 | 4.250706-00  |
| 23  | 0.07467220 | 27348930 | 3.414.714-01 |
| 24  | 1.3031791  | 28358308 | 3.429958-05  |










Benchmarking an IP is not an easy task.

Fortunately we have a custom design and an FPGA.






### Benchmark core clock cycles distributions



## FPGA benchmark summary


|   | N  | Single op time (µs) | Register LUTs | Slice LUTs | Power (W) | Single op energy (pJ) | CPs |
|---|----|---------------------|---------------|------------|-----------|-----------------------|-----|
| 1 | 2  | 0.1044              | 947           | 875        | 0.005     | 522                   | 6   |
| 2 | 4  | 0.1587              | 1457          | 1813       | 0.015     | 2380.5                | 20  |
| 3 | 8  | 0.2819              | 3131          | 4897       | 0.049     | 13813.1               | 72  |
| 4 | 13 | 0.4456              | 6422          | 12819      | 0.138     | 61492.8               | 182 |
| 5 | 16 | 0.5234              | 7950          | 15979      | 0.160     | 83744                 | 272 |
| 6 | 24 | 0.7432              | 10974         | 22669      | 0.199     | 147896.8              | 600 |
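The energy column of the table can be cross-checked from the time and power columns with E = P · t. The short sketch below does that, taking the power values to be in watts, which is the unit that reproduces the tabulated energies; `ROWS` and `energy_pj` are illustrative names only.

```python
# (N, single-op time in microseconds, power) copied from the summary table.
# Assumption: the power column is expressed in watts.
ROWS = [
    (2, 0.1044, 0.005),
    (4, 0.1587, 0.015),
    (8, 0.2819, 0.049),
    (13, 0.4456, 0.138),
    (16, 0.5234, 0.160),
    (24, 0.7432, 0.199),
]


def energy_pj(time_us: float, power_w: float) -> float:
    """Energy of one operation in picojoules, E = P * t."""
    joules = power_w * (time_us * 1e-6)
    return joules * 1e12


if __name__ == "__main__":
    for n, time_us, power_w in ROWS:
        print(f"N={n:2d}  E={energy_pj(time_us, power_w):.1f} pJ")
```

For example, the N = 2 row gives 0.005 W × 0.1044 µs = 522 pJ, matching the table.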


## Benchmark core




## Comparisons: Performance




## Comparisons: Energy






The BondMachine is a software ecosystem for the dynamic generation of computer architectures, starting from several kinds of high-level sources. The generated architectures can be synthesized on FPGAs and used

- as standalone devices,
- as clustered devices,
- as firmware for computing accelerators.


- CCR 2015 First ideas, 2016 Poster, 2017 Talk
- InnovateFPGA 2018 Iron Award, Grand Final at Intel Campus (CA) USA
- Invited lectures at: "Advanced Workshop on Modern FPGA Based Technology for Scientific Computing", ICTP 2019
- Invited lectures at: "NiPS Summer School 2019 - Architectures and Algorithms for Energy-Efficient IoT and HPC Applications"
- Golab 2018 talk and ISGC 2019 PoS
- Article published on Parallel Computing, Elsevier 2022
- PON PHD program


Title slide of the NiPS lecture: "The BondMachine Toolkit: Enabling Machine Learning on FPGA", Mirko Mariotti, Department of Physics and Geology, University of Perugia, and INFN Perugia. NiPS Summer School 2019, "Architectures and Algorithms for Energy-Efficient IoT and HPC Applications", 3-6 September 2019, Perugia.






Parallel Computing, Volume 109, March 2022, 102873: "The BondMachine, a moldable computer architecture", Mirko Mariotti, Daniel Magalotti, Daniele Spiga, Loriano Storchi. https://doi.org/10.1016/j.parco.2021.102873

- Co-design HW/SW of domain-specific architectures via the modern Go language.
- Design of essential processors where only needed components are implemented.
- Creation of heterogeneous processor systems distributed over multiple fabrics.


## Fabrics

The HDL code for the BondMachine has been tested on these devices/systems:

- Digilent Basys3 - Xilinx Artix-7 - Vivado
- Kintex7 Evaluation Board - Vivado
- Digilent Zedboard - Xilinx Zynq 7020 - Vivado
- ZC702 - Xilinx Zynq 7020 - Vivado
- ebaz4205 - Xilinx Zynq 7020 - Vivado
- Linux - Iverilog
- ice40lp1k icefun icebreaker icesugarnano - Lattice - Icestorm
- Terasic De10nano - Intel Cyclone V - Quartus
- Arrow Max1000 - Intel Max10 - Quartus

Within the project other firmware has been written or tested:

- Microchip ENC28J60 Ethernet interface controller.
- Microchip ENC424J600 10/100 Base-T Ethernet interface controller.
- ESP8266 Wi-Fi chip.




## Use cases

- Real time pulse shape analysis in neutron detectors: bringing the intelligence to the edge
- Test beam for space experiments (DAMPE, HERD): increasing testbed operations efficiency



## Machine Learning with BondMachine

Architectures with multiple interconnected processors like the ones produced by the BondMachine Toolkit are a perfect fit for Neural Networks and Computational Graphs.

Several ways to map these structures to BondMachine have been developed:

- A native Neural Network library
- A Tensorflow to BondMachine translator
- An NNEF based BondMachine composer


## Machine Learning with BondMachine: Native Neural Network library

The tool *neuralbond* allows the creation of BM-based neural chips from a Go API.

Neurons are converted to BondMachine connecting processors (CPs).

Tensors are mapped to CP connections.
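The mapping just described can be pictured with a small sketch: one processor per neuron and one connection per weight of each fully connected layer. This is not the *neuralbond* API; `build_topology` and the naming scheme (`in0`, `cp_1_0`, ...) are hypothetical and only illustrate how the counts of CPs and connections follow from the layer sizes.

```python
def build_topology(layer_sizes):
    """Return (processors, links) for a fully connected feed-forward network.

    layer_sizes[0] is the number of network inputs; every later entry is the
    number of neurons in that layer. Each neuron becomes one processor and
    each weight becomes one (source, destination) link.
    """
    n_inputs, *neuron_layers = layer_sizes
    previous = [f"in{k}" for k in range(n_inputs)]
    processors, links = [], []
    for layer, size in enumerate(neuron_layers, start=1):
        current = [f"cp_{layer}_{k}" for k in range(size)]
        processors.extend(current)
        # One link per entry of the weight tensor between the two layers.
        links.extend((src, dst) for src in previous for dst in current)
        previous = current
    return processors, links


if __name__ == "__main__":
    cps, links = build_topology([4, 1, 2])
    print(len(cps), "processors,", len(links), "links")
    for src, dst in links:
        print(f"{src} -> {dst}")
```

In this sketch the network inputs are not counted as processors; whether the real tool allocates CPs for them is not stated in the slides.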




## Machine Learning with BondMachine: NNEF Composer

Neural Network Exchange Format (NNEF) is a standard from the Khronos Group that enables the easy transfer of trained networks among frameworks, inference engines and devices.

The approach of the NNEF BM tool is to descend through NNEF models and build BondMachine multi-cores accordingly.

This approach has several advantages over the previous ones:

- It is not limited to a single framework
- NNEF is a textual file, so no complex operations are needed to read models
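To illustrate why a textual format keeps model reading simple, the sketch below walks a small graph description with a regular expression and lists the operations that would need a processor. The `MODEL` text is a hand-written fragment in the style of NNEF's textual syntax, not a file produced or validated by NNEF tools, and `read_operations` and `compute_nodes` are hypothetical helpers, not the BondMachine NNEF tool.

```python
import re

# Hand-written, illustrative fragment in the style of NNEF's textual syntax.
MODEL = """
version 1.0;
graph net( input ) -> ( output )
{
    input = external(shape = [1, 4]);
    w1 = variable(shape = [1, 4], label = 'w1');
    b1 = variable(shape = [1, 1], label = 'b1');
    hidden = linear(input, w1, b1);
    w2 = variable(shape = [2, 1], label = 'w2');
    b2 = variable(shape = [1, 2], label = 'b2');
    logits = linear(hidden, w2, b2);
    output = softmax(logits);
}
"""

# One statement of the form: result = operation(arguments)
ASSIGN = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\s*\((.*)\)\s*$", re.S)


def read_operations(text):
    """Return a list of (result, operation, positional_inputs) tuples."""
    body = text[text.index("{") + 1 : text.rindex("}")]
    operations = []
    for statement in body.split(";"):
        match = ASSIGN.match(statement)
        if not match:
            continue
        result, op, args = match.groups()
        # Blank out bracketed lists so their commas do not split arguments.
        flat = re.sub(r"\[[^\]]*\]", "[]", args)
        inputs = [a.strip() for a in flat.split(",") if a.strip() and "=" not in a]
        operations.append((result, op, inputs))
    return operations


def compute_nodes(operations):
    """Keep the operations that compute something (not inputs or weights)."""
    return [o for o in operations if o[1] not in {"external", "variable"}]


if __name__ == "__main__":
    for result, op, inputs in compute_nodes(read_operations(MODEL)):
        print(f"{result} = {op}({', '.join(inputs)})")
```

A real composer would also have to load the weight data that accompanies the graph description; that part is left out here.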



## FPGA

- Digilent Zedboard
- SoC: Zynq XC7Z020-CLG484-1
- 512 MB DDR3
- Vivado 2020.2
- 100 MHz
- PYNQ 2.6 (custom build)



## BM inference: A first tentative idea

A neuron of a neural network can be seen as a Connecting Processor (CP) of the BM.

Network diagram: inputs X1, X2, X3, X4; hidden layer H1; output layer S1, S2; outputs Y1, Y2.

The softmax activation of the output neurons is

$\sigma(\vec{z})_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$

Template of the softmax neuron in BondMachine assembly. The listing is incomplete: the directives that close the `if` and `range` blocks are not shown.

    %section softmax .romtext iomode:sync
        entry _start    ; Entry point
    _start:
        mov r8, 0f0.0
    {{range $y := intRange "0" .Params.inputs}}
    {{printf "i2r r1,i%d\n" $y}}
        mov r0, 0f1.0
        mov r2, 0f1.0
        mov r3, 0f1.0
        mov r4, 0f1.0
        mov r5, 0f1.0
        mov r7, {{$.Params.expprec}}
    loop{{printf "%d" $y}}:
        multf r2, r1
        multf r3, r4
        addf r4, r5
        mov r6,r2
        divf r6,r3
        addf r0, r6
        dec r7
        jz r7,exit{{printf "%d" $y}}
        j loop{{printf "%d" $y}}
    exit{{printf "%d" $y}}:
    {{$z := atoi $.Params.pos}}
    {{if eq $y $z}}
        mov r9, r0
    %endsection
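The loop in the assembly template accumulates the terms x^n / n! of the exponential series, with `expprec` setting how many terms are added. The following sketch mirrors that loop register by register and then forms the softmax ratio. The final division and the accumulation of the denominator are assumptions, since that part of the listing is not visible; `exp_taylor` and `softmax_output` are illustrative names.

```python
import math


def exp_taylor(x: float, expprec: int) -> float:
    """Approximate e**x as 1 + sum of x**n / n! for n = 1..expprec.

    Mirrors the visible loop of the template (assumes expprec >= 1).
    """
    total = 1.0      # r0: running sum, starts at the n = 0 term
    power = 1.0      # r2: x**n
    factorial = 1.0  # r3: n!
    n = 1.0          # r4: next factor of the factorial (r5 holds the step 1.0)
    for _ in range(expprec):        # r7 counts the iterations down
        power *= x                  # multf r2, r1
        factorial *= n              # multf r3, r4
        n += 1.0                    # addf r4, r5
        total += power / factorial  # mov r6,r2 ; divf r6,r3 ; addf r0,r6
    return total


def softmax_output(inputs, pos: int, expprec: int) -> float:
    """Softmax value for input number `pos` (assumed final step of the neuron)."""
    exps = [exp_taylor(x, expprec) for x in inputs]
    return exps[pos] / sum(exps)


if __name__ == "__main__":
    for terms in (2, 5, 10):
        approx = exp_taylor(1.0, terms)
        print(f"expprec={terms:2d}  e ~ {approx:.8f}  error={abs(approx - math.e):.2e}")
```

A larger `expprec` costs more loop iterations per input but narrows the gap to the exact exponential, which is the precision/latency trade-off the parameter exposes.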

## From idea to implementation

Starting from high-level code, a NN model is trained with **TensorFlow** and exported in a standard format. The exported model is interpreted by **neuralbond**, which converts the nodes and weights of the network into a set of heterogeneous processors.






## A first test

Dataset info:

- **Dataset name**: Banknote Authentication
- **Description**: Dataset on the distinction between genuine and counterfeit banknotes. The data was extracted from images taken from genuine and fake banknote-like samples.
- **N. features**: 4
- **Classification**: binary
- **Samples**: 1097

Neural network info:

- **Class**: Multilayer perceptron, fully connected
- **Layers**:
  - A hidden layer with 1 linear neuron
  - An output layer with 2 softmax neurons
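As a minimal sketch of what this network computes, the function below runs one forward pass: four features feed a single linear hidden neuron, whose value feeds two softmax output neurons. All weights and the sample input are made-up placeholders, not the trained values from the repository.

```python
import math


def forward(features, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a 4-input, 1 linear hidden neuron, 2 softmax output MLP."""
    # Hidden layer: one linear neuron (weighted sum plus bias, no activation).
    hidden = sum(w * x for w, x in zip(w_hidden, features)) + b_hidden
    # Output layer: two neurons fed by the single hidden value.
    logits = [w * hidden + b for w, b in zip(w_out, b_out)]
    # Softmax, shifted by the maximum logit for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Placeholder parameters and input, chosen only for illustration.
    probabilities = forward(
        features=[1.0, 2.0, 3.0, 4.0],
        w_hidden=[0.5, -0.25, 0.1, 0.0],
        b_hidden=0.1,
        w_out=[1.0, -1.0],
        b_out=[0.0, 0.0],
    )
    print("class probabilities:", probabilities)
```

The two outputs always sum to one, so the predicted class is simply the index of the larger value.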

Graphic representation:




    Command > mkdir Example
    Command > cd Example
    Command > git clone https://github.com/BondMachineHQ/ml-zedboard.git
    Output >
    Cloning into 'ml-zedboard'...
    remote: Enumerating objects: 103, done.
    remote: Counting objects: 100% (103/103), done.
    remote: Compressing objects: 100% (65/65), done.
    remote: Total 103 (delta 38), reused 95 (delta 30), pack-reused 0
    Receiving objects: 100% (103/103), 2.70 MiB | 6.33 MiB/s, done.
    Resolving deltas: 100% (38/38), done.
    Command > cd ml-zedboard


    Command > ls -al
    Output >
    total 69
    drwx------ 8 mirko users    18 Nov  3 23:20 .
    drwx------ 3 mirko users     3 Nov  3 23:20 ..
    drwx------ 7 mirko users    12 Nov  3 23:20 .git
    -rw------- 1 mirko users  9548 Nov  3 23:20 README.md
    -rwx------ 1 mirko users    25 Nov  3 23:20 activate_environment.sh
    -rw------- 1 mirko users  5818 Nov  3 23:20 analyze.py
    -rw------- 1 mirko users  7515 Nov  3 23:20 analyze_output.py
    -rw------- 1 mirko users 18799 Nov  3 23:20 bmtrain.py
    drwx------ 2 mirko users    11 Nov  3 23:20 images
    -rw------- 1 mirko users  2229 Nov  3 23:20 main.py
    drwx------ 2 mirko users     3 Nov  3 23:20 notebooks
    drwx------ 3 mirko users     3 Nov  3 23:20 outputs
    drwx------ 4 mirko users     4 Nov  3 23:20 reports
    -rw------- 1 mirko users    69 Nov  3 23:20 requirements.txt
    drwx------ 2 mirko users     4 Nov  3 23:20 resources
    -rwx------ 1 mirko users    43 Nov  3 23:20 setup_enviroment.sh
    -rw------- 1 mirko users   519 Nov  3 23:20 specifics.json
    -rw------- 1 mirko users   559 Nov  3 23:20 utils.txt
    Command > conda create --name ml-zedboard -y python==3.8.0
    [...]
      libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1 None
      ncurses            pkgs/main/linux-64::ncurses-6.3-h5eee18b_3 None
      openssl            pkgs/main/linux-64::openssl-1.1.1q-h7f8727e_0 None
      pip                pkgs/main/linux-64::pip-22.2.2-py38h06a4308_0 None
      python             pkgs/main/linux-64::python-3.8.0-h0371630_2 None
      readline           pkgs/main/linux-64::readline-7.0-h7b6447c_5 None
      setuptools         pkgs/main/linux-64::setuptools-65.5.0-py38h06a4308_0 None
      sqlite             pkgs/main/linux-64::sqlite-3.33.0-h62c20be_0 None
      tk                 pkgs/main/linux-64::tk-8.6.12-h1ccaba5_0 None
      wheel              pkgs/main/noarch::wheel-0.37.1-pyhd3eb1b0_0 None
      xz                 pkgs/main/linux-64::xz-5.2.6-h5eee18b_0 None
      zlib               pkgs/main/linux-64::zlib-1.2.13-h5eee18b_0 None

    Preparing transaction: done
    Verifying transaction: done
    Executing transaction: done
    #
    # To activate this environment, use
    #
    #     $ conda activate ml-zedboard
    #
    # To deactivate an active environment, use
    #
    #     $ conda deactivate

    Retrieving notices: ...working... done


    Command > conda activate ml-zedboard
    Command > pip3 install -r requirements.txt


Command > ls -al main.py bmtrain.py banknote-authentication\*

Output >

ls: cannot access 'banknote-authentication\*': No such file or directory

-rw------- 1 mirko users 18799 Nov 3 23:20 bmtrain.py

-rw------- 1 mirko users 2229 Nov 3 23:20 main.py


```
Command >
export PYTHONPATH=/tmp/tmptj5_gk0p/Example/ml-zedboard
python-inspect -m bmtrain -o build_model 2> /dev/null | pygmentize -l python | head -n 20
    def build_model(self):
        if self.nn_model_type == "MLP":
            self.model = Sequential()
            self.parse_network_specifics()
            if self.network_spec == None:
                for i in range(0, 24, 3):
                    self.model.add(Dense(i, input_shape=(self.X_train_val.shape[1],)))
                for i in reversed(range(0, 24, 3)):
                    self.model.add(Dense(i, input_shape=(self.X_train_val.shape[1],)))
                opt = Adam(lr=0.0001)
                arch = self.network_spec["network"]["arch"]
                for i in range(0, len(arch)):
                    layer_name = self.network_spec["network"]["arch"][i]["layer_name"]
                    activation_function = self.network_spec["network"]["arch"][i]["activation_function"]
                    neurons = self.network_spec["network"]["arch"][i]["neurons"]
                    if i == 0:
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe
```
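The excerpt above reads each layer's `layer_name`, `neurons` and `activation_function` from `network_spec["network"]["arch"]`. The following is a minimal sketch of that lookup in isolation, assuming the specification is a plain dictionary with exactly those keys. The helper name `layers_from_spec` and the example values are hypothetical and are not taken from `bmtrain.py`.

```python
from typing import Dict, List, Tuple


def layers_from_spec(spec: Dict) -> List[Tuple[str, int, str]]:
    """Collect (layer_name, neurons, activation_function) for every entry
    of spec["network"]["arch"], in the order the layers are listed."""
    layers = []
    for entry in spec["network"]["arch"]:
        layers.append(
            (entry["layer_name"], int(entry["neurons"]), entry["activation_function"])
        )
    return layers


# Hypothetical specification: the values are illustrative only.
example_spec = {
    "network": {
        "arch": [
            {"layer_name": "dense", "neurons": 1, "activation_function": "linear"},
            {"layer_name": "dense_1", "neurons": 2, "activation_function": "softmax"},
        ]
    }
}

for name, neurons, activation in layers_from_spec(example_spec):
    print(f"{name}: {neurons} neuron(s), activation={activation}")
```

Keeping the parsing separate from model construction makes it possible to check a specification file without importing TensorFlow.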

```
Command >
export PYTHONPATH=/tmp/tmptj5_gk0p/Example/ml-zedboard
python-inspect -m bmtrain -o dump_json_for_bondmachine 2> /dev/null | pygmentize -l python | head -n 20
    def dump_json_for_bondmachine(self):
        layers = self.model.layers
        weights = self.model.weights
        to_dump = {}
        weights = []
        nodes = []
        # save weigths
        for i in range(0, len(layers)):
            layer_weights = layers[i].get_weights()
            for m in range(0, len(layer_weights)):
                for w in range(0, len(layer_weights[m])):
                    for v in range(0, len(layer_weights[m][w])):
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe
```
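The nested loops above walk every weight of every layer before dumping them to JSON. As a rough illustration of that traversal, the sketch below flattens per-layer kernel matrices (given as nested Python lists) into a list of records. The record keys (`layer`, `from`, `to`, `value`) are assumptions made for this example and are not the actual BondMachine JSON schema; biases are left out for brevity.

```python
import json
from typing import Dict, List


def flatten_weights(layer_kernels: List[List[List[float]]]) -> List[Dict]:
    """Turn one kernel matrix per layer (rows = inputs, columns = outputs)
    into a flat list of records, one record per weight."""
    records = []
    for layer_index, kernel in enumerate(layer_kernels):
        for input_index, row in enumerate(kernel):
            for output_index, value in enumerate(row):
                records.append(
                    {
                        "layer": layer_index,
                        "from": input_index,
                        "to": output_index,
                        "value": float(value),
                    }
                )
    return records


# Illustrative kernels for a 4 -> 1 -> 2 network (values are made up).
kernels = [
    [[0.5], [-1.0], [0.25], [2.0]],  # 4 inputs feeding 1 neuron
    [[1.5, -0.5]],                   # 1 neuron feeding 2 outputs
]

records = flatten_weights(kernels)
print(len(records), "weights")
print(json.dumps(records[0]))
```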

```
Command >
python3 main.py --dataset banknote-authentication -m MLP
```

Output >

.4695 - val\_acc: 0.9636

\*\*\* dump model

\# INFO: Training finished, saved model path: models/banknote-authentication\_KERAS\_model.h5

Model: "sequential"

| Layer (type)    | Output Shape | Param # |
|-----------------|--------------|---------|
| dense (Dense)   | (None, 1)    | 5       |
| dense_1 (Dense) | (None, 2)    | 4       |

Total params: 9

Trainable params: 9

Non-trainable params: 0

None

/tools/Conda/envs/ml-zedboard/lib/python3.8/site-packages/keras/engine/training\_v1.py:2356: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.

updates=self.state\_updates,

Sofware predicions have been exported in CSV (path is: datasets/banknote-authentication\_swprediction.csv)

\# INFO: Accuracy is 0.9454545454545454

Model has been exported in JSON for Bondmachine (path is: models/banknote-authentication/modelBM.json)
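The parameter counts in the model summary can be checked by hand: a fully connected layer with bias has one weight per input-output pair plus one bias per unit. The sketch below reproduces the figures of the summary, assuming the network receives 4 input features (which is what 5 parameters for a single-unit first layer implies). The function names are illustrative.

```python
from typing import List


def dense_param_count(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


def count_params(n_features: int, units_per_layer: List[int]) -> List[int]:
    """Parameter count of each dense layer in a simple feed-forward stack."""
    counts = []
    fan_in = n_features
    for units in units_per_layer:
        counts.append(dense_param_count(fan_in, units))
        fan_in = units  # the next layer is fed by this layer's outputs
    return counts


counts = count_params(4, [1, 2])
print(counts, sum(counts))  # [5, 4] 9
```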

#### Command >

cp models/banknote-authentication/modelBM.json /tmp/modelBM.json

cp datasets/banknote-authentication\_swprediction.csv /tmp/sw.csv

cp datasets/banknote-authentication\_sample.csv /tmp/sample.csv

Command > conda deactivate ; conda env remove --name ml-zedboard




BM hardware and the BM simulation

sw.csv contains the software predictions over that dataset and will be used to check the BM inference probabilities and predictions

modelBM.json is the trained network that will be used as the BM source in the next demo
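Checking hardware results against sw.csv amounts to comparing two prediction files row by row. The following is a minimal sketch of such a comparison, assuming both files have a header line and store the predicted class in the last column. That layout, the column names and the in-memory sample data are assumptions made for illustration, not the actual format written by `main.py` or by the BM tools.

```python
import csv
import io
from typing import List, TextIO


def read_predictions(handle: TextIO) -> List[int]:
    """Read the predicted class from the last column, skipping the header."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header line
    return [int(float(row[-1])) for row in reader if row]


def agreement(reference: List[int], candidate: List[int]) -> float:
    """Fraction of samples on which the two prediction lists agree."""
    if len(reference) != len(candidate):
        raise ValueError("prediction files have different lengths")
    if not reference:
        return 0.0
    matches = sum(a == b for a, b in zip(reference, candidate))
    return matches / len(reference)


# Made-up sample data standing in for the software and BM result files.
sw_csv = io.StringIO(
    "probability_0,probability_1,classification\n"
    "0.9,0.1,0\n0.2,0.8,1\n0.6,0.4,0\n"
)
bm_csv = io.StringIO(
    "probability_0,probability_1,classification\n"
    "0.88,0.12,0\n0.25,0.75,1\n0.45,0.55,1\n"
)

score = agreement(read_predictions(sw_csv), read_predictions(bm_csv))
print(f"agreement: {score:.3f}")
```

With real files, the `io.StringIO` objects would be replaced by handles obtained with `open(...)`.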

Command > mkdir Example

Command > cd Example

Command > bmhelper create --project\_name mlinfn --board zedboard --project\_type neural\_network --n\_inputs 4 --n\_outputs 3 --source\_neuralbond banknote.json


Command >

cd proj\_mlinfn

ls -al

Output >

total 10

| Permissions | Links | Owner | Group | Size  | Modified     | Name                    |
|-------------|-------|-------|-------|-------|--------------|-------------------------|
| drwx------  | 3     | mirko | users | 20    | Nov 2 22:21  | .                       |
| drwx------  | 3     | mirko | users | 17    | Nov 2 22:21  | ..                      |
| -rw-------  | 1     | mirko | users | 44179 | Nov 2 22:21  | Makefile                |
| -rw-------  | 1     | mirko | users | 397   | Nov 2 22:21  | authorized_keys         |
| -rw-------  | 1     | mirko | users | 1962  | Nov 2 22:21  | banknote.json           |
| -rw-------  | 1     | mirko | users | 150   | Nov 2 22:21  | bmapi.json              |
| -rw-------  | 1     | mirko | users | 242   | Nov 2 22:21  | bmapi.mk                |
| -rw-------  | 1     | mirko | users | 1351  | Nov 2 22:21  | bminfo.json             |
| -rw-------  | 1     | mirko | users | 130   | Nov 2 22:21  | buildroot.mk            |
| -rw-------  | 1     | mirko | users | 129   | Nov 2 22:21  | crosscompile.mk         |
| -rwx------  | 1     | mirko | users | 3613  | Nov 2 22:21  | deploy_jupyter_board.py |
| -rw-------  | 1     | mirko | users | 495   | Nov 2 22:21  | local.mk                |
| -rw-------  | 1     | mirko | users | 145   | Nov 2 22:21  | neuralbondconfig.json   |
| drwx------  | 2     | mirko | users | 20    | Nov 2 22:21  | neurons                 |
| -rw-------  | 1     | mirko | users | 145   | Nov 2 22:21  | simbatch.mk             |
| -rwx------  | 1     | mirko | users | 4059  | Nov 2 22:21  | simbatch.py             |
| -rw-------  | 1     | mirko | users | 24057 | Nov 2 22:21  | simbatch_input.csv      |
| -rw-------  | 1     | mirko | users | 1100  | Nov 2 22:21  | sumapp.go               |
| -rw-------  | 1     | mirko | users | 21319 | Nov 2 22:21  | zedboard.xdc            |
| -rw-------  | 1     | mirko | users | 53    | Nov 2 22:21  | zedboard_maps.json      |

#### Command > cat local.mk

| l ouchar ~                        |                      |                      |                   |           |
|-----------------------------------|----------------------|----------------------|-------------------|-----------|
| WORKING_DIR=working_              | dir                  |                      |                   |           |
| CURRENT_DIR=\$(shell              | pwd )                |                      |                   |           |
| SOURCE_NEURALBOND=bai             | nknote.json          |                      |                   |           |
| NEURALBOND_LIBRARY=n              | eurons               |                      |                   |           |
| NEURALBOND_ARGS=-con <sup>-</sup> | fig-file neuralbondc | onfig.json -operatin | g-mode fragment   |           |
| BMINFO=bminfo.json                |                      |                      |                   |           |
| BOARD=zedboard                    |                      |                      |                   |           |
| MAPFILE=zedboard_map              | s.json               |                      |                   |           |
| SHOWARGS=-dot-detail              | 5                    |                      |                   |           |
| SHOWRENDERER=dot                  |                      |                      |                   |           |
| VERILOG_OPTIONS=-com              | ment-verilog         |                      |                   |           |
| #BASM_ARGS=-d                     |                      |                      |                   |           |
| BENCHCORE=10,p0o0                 |                      |                      |                   |           |
| #HDL_REGRESSION=bond              |                      |                      |                   |           |
| #BM_REGRESSION=bondma             | achine.json          |                      |                   |           |
| include bmapi.mk                  |                      |                      |                   |           |
| include crosscompile              | .mk                  |                      |                   |           |
| include buildroot.mk              |                      |                      |                   |           |
| include simbatch.mk               |                      |                      |                   |           |
| [ Command > ls neuro              | ns                   |                      |                   |           |
| [Output >                         |                      |                      |                   |           |
| frag-linear.basm                  | frag-terminal.basm   |                      | rom-terminal.basm |           |
| frag-relu.basm                    | frag-weight.basm     | rom-relu.basm        | rom-weight.basm   | weight.nb |
| frag-softmax.basm                 | linear.nb            | rom-softmax.basm     | softmax.nb        |           |
| frag-summation.basm               | relu.nb              | rom-summation.basm   | summation.nb      |           |
|                                   |                      |                      |                   |           |

ICTP-IAEA School on FPGA-based SoC 22

The BondMachine Project

#### Output

| WORKING_DIR=working_c  | lir                  |                      |                   |          |
|------------------------|----------------------|----------------------|-------------------|----------|
| CURRENT_DIR=\$(shell p | bwd )                |                      |                   |          |
| SOURCE_NEURALBOND=banknote.json |  |  |  |  |
| NEURALBOND_LIBRARY=neurons |  |  |  |  |
| NEURALBOND_ARGS=-config-file neuralbondconfig.json -operating-mode fragment |  |  |  |  |
| BMINFO=bminfo.json |  |  |  |  |
| BOARD=zedboard |  |  |  |  |
| MAPFILE=zedboard_maps.json |  |  |  |  |
| SHOWARGS=-dot-detail 5 |  |  |  |  |
| SHOWRENDERER=dot |  |  |  |  |
| VERILOG_OPTIONS=-comment-verilog |  |  |  |  |
| #BASM_ARGS=-d |  |  |  |  |
| BENCHCORE=i0,p0o0 |  |  |  |  |
| #HDL_REGRESSION=bondmachine.sv |  |  |  |  |
| #BM_REGRESSION=bondmachine.json |  |  |  |  |
| include bmapi.mk |  |  |  |  |
| include crosscompile.mk |  |  |  |  |
| include buildroot.mk |  |  |  |  |
| include simbatch.mk |  |  |  |  |
| Command > ls neurons |  |  |  |  |
| frag-linear.basm | frag-terminal.basm | rom-linear.basm | rom-terminal.basm | terminal.nb |
| frag-relu.basm | frag-weight.basm | rom-relu.basm | rom-weight.basm | weight.nb |
| frag-softmax.basm | linear.nb | rom-softmax.basm | softmax.nb |  |
| frag-summation.basm | relu.nb | rom-summation.basm | summation.nb |  |
| Command > cat neurons/frag-softmax.basm \| head -n 15 |  |  |  |  |
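The `ls neurons` listing above follows a regular naming pattern: each neuron type appears as `frag-<name>.basm`, `rom-<name>.basm` and `<name>.nb`. The following is a minimal sketch, in Python, that groups the listed file names by neuron to make the pattern explicit. The function name `group_by_neuron` and the reading of the prefix as a "variant" are assumptions made for illustration; they are not part of the BondMachine tools.

```python
from collections import defaultdict

# File names copied from the `ls neurons` listing.
LISTING = """
frag-linear.basm frag-terminal.basm rom-linear.basm rom-terminal.basm terminal.nb
frag-relu.basm frag-weight.basm rom-relu.basm rom-weight.basm weight.nb
frag-softmax.basm linear.nb rom-softmax.basm softmax.nb
frag-summation.basm relu.nb rom-summation.basm summation.nb
""".split()


def group_by_neuron(names):
    """Map each neuron name to the sorted list of variants found for it."""
    groups = defaultdict(list)
    for name in names:
        if name.endswith(".nb"):
            groups[name[: -len(".nb")]].append("nb")
        elif name.endswith(".basm"):
            # Assumption: the text before the first '-' names the variant.
            variant, _, neuron = name[: -len(".basm")].partition("-")
            groups[neuron].append(variant)
    return {neuron: sorted(kinds) for neuron, kinds in groups.items()}


if __name__ == "__main__":
    for neuron, kinds in sorted(group_by_neuron(LISTING).items()):
        print(f"{neuron:10s} {', '.join(kinds)}")
```

Every one of the six neurons in the listing comes with all three files, which is consistent with the Makefile selecting the `frag-` variant through `-operating-mode fragment`.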


```
Command > ls neurons
frag-linear.basm     frag-terminal.basm  rom-linear.basm     rom-terminal.basm  terminal.nb
frag-relu.basm       frag-weight.basm    rom-relu.basm       rom-weight.basm    weight.nb
frag-softmax.basm    linear.nb           rom-softmax.basm    softmax.nb
frag-summation.basm  relu.nb             rom-summation.basm  summation.nb
Command > cat neurons/frag-softmax.basm | head -n 15
%fragment softmax iomode:sync template:true resout:r9
%meta literal resin {{ with $last := adds "10" .Params.inputs }}{{range $y := intRange "10" $last }}{{printf "r%d:" $y}}{{end}}{{end}}
        mov     r8, 0f0.0
{{ with $last := adds "10" .Params.inputs }}
{{range $y := intRange "10" $last}}
{{printf "mov r1,r%d\n" $y}}
        mov     r0, 0f1.0
        mov     r2, 0f1.0
        mov     r3, 0f1.0
        mov     r4, 0f1.0
        mov     r5, 0f1.0
        mov     r7, {{$.Params.expprec}}
loop{{printf "%d" $y}}:
        multf   r2, r1
        multf   r3, r4
```
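The fragment above is a template: the `{{ ... }}` directives are expanded before the assembly is used, so the number of input registers depends on `.Params.inputs`. As a rough illustration of that expansion, here is a minimal Python sketch that mimics the two loops. The helpers `adds` and `int_range` are hypothetical stand-ins for the template functions `adds` and `intRange`; their exact semantics are not shown in the slides, so the sketch assumes string-encoded integers and a half-open range.

```python
def adds(a: str, b: str) -> str:
    """Hypothetical stand-in for the template helper `adds`.

    Assumption: it sums two integers passed as strings.
    """
    return str(int(a) + int(b))


def int_range(start: str, stop: str) -> list:
    """Hypothetical stand-in for the template helper `intRange`.

    Assumption: half-open range, start included and stop excluded.
    """
    return list(range(int(start), int(stop)))


def expand_resin(inputs: str) -> str:
    """Mimic the `%meta literal resin` line for a given input count."""
    last = adds("10", inputs)
    registers = "".join(f"r{y}:" for y in int_range("10", last))
    return f"%meta literal resin {registers}"


def expand_loads(inputs: str) -> list:
    """Mimic the `mov r1,r%d` line emitted once per loop iteration."""
    last = adds("10", inputs)
    return [f"mov r1,r{y}" for y in int_range("10", last)]


if __name__ == "__main__":
    print(expand_resin("3"))  # %meta literal resin r10:r11:r12:
    print(expand_loads("3"))  # ['mov r1,r10', 'mov r1,r11', 'mov r1,r12']
```

Under these assumptions, a softmax neuron with three inputs reads them from `r10`, `r11` and `r12`, and the loop body (with its `loop10`, `loop11`, `loop12` labels) is emitted once per input.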

```
Command > cp /tmp/modelBM.json banknote.json
```

```
Command > make bondmachine
[Project: proj_mlinfn] - [Working directory creation begin] - [Target: working_dir]
mkdir -p working_dir
[Project: proj_mlinfn] - [Working directory creation end]
[Project: proj_mlinfn] - [BondMachine generation begin] - [Target: working_dir/bondmachine_target]
neuralbond -net-file banknote.json -neuron-lib-path neurons -save-basm working_dir/bondmachine.basm -config-file neuralbondconfig.json -operating-mode fragment -bminfo-file bminfo.json ; basm -bminfo-file bminfo.json -o working_dir/bondmachine.json working_dir/bondmachine.basm neurons/*.basm
[Project: proj_mlinfn] - [BondMachine generation end]
Command > ls working_dir
bondmachine.basm  bondmachine.json  bondmachine_target
```
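Comparing the Makefile variables with the `make bondmachine` log suggests how each variable ends up on the `neuralbond` and `basm` command lines. The following Python sketch rebuilds both commands from those variables. The mapping is inferred from this single log and is an assumption; the actual rule is presumably defined in the included `bmapi.mk`, which is not shown here. The function names and the `MAKE_VARS` dictionary are illustrative only.

```python
import shlex

# Values copied from the project Makefile shown earlier.
MAKE_VARS = {
    "SOURCE_NEURALBOND": "banknote.json",
    "NEURALBOND_LIBRARY": "neurons",
    "NEURALBOND_ARGS": "-config-file neuralbondconfig.json -operating-mode fragment",
    "BMINFO": "bminfo.json",
}
WORKING_DIR = "working_dir"


def neuralbond_cmd(v, wd=WORKING_DIR):
    """Rebuild the neuralbond invocation seen in the log (inferred mapping)."""
    return [
        "neuralbond",
        "-net-file", v["SOURCE_NEURALBOND"],
        "-neuron-lib-path", v["NEURALBOND_LIBRARY"],
        "-save-basm", f"{wd}/bondmachine.basm",
        *shlex.split(v["NEURALBOND_ARGS"]),
        "-bminfo-file", v["BMINFO"],
    ]


def basm_cmd(v, wd=WORKING_DIR):
    """Rebuild the basm invocation seen in the log (inferred mapping)."""
    return [
        "basm",
        "-bminfo-file", v["BMINFO"],
        "-o", f"{wd}/bondmachine.json",
        f"{wd}/bondmachine.basm",
        f"{v['NEURALBOND_LIBRARY']}/*.basm",  # glob left for the shell to expand
    ]


if __name__ == "__main__":
    print(" ".join(neuralbond_cmd(MAKE_VARS)))
    print(" ".join(basm_cmd(MAKE_VARS)))
```

Read this way, the step has two stages: `neuralbond` turns the network description `banknote.json` into `working_dir/bondmachine.basm` using the neuron library, and `basm` then assembles that file together with the library fragments into `working_dir/bondmachine.json`.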

|      | {{range \$y := intRange "10" \$last}}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |  |  |
|------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|
|      | {{printf "mov r1,r%d\n" \$y}}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |  |  |
|      | mov r0, 0f1.0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |  |  |
|      | mov r2, 0f1.0 |  |  |
|      | mov r3, 0f1.0 |  |  |
|      | mov r4, 0f1.0 |  |  |
|      | mov r5, 0f1.0 |  |  |
|      | mov r7, {{\$.Params.expprec}} |  |  |
|      | loop{{printf "%d" \$y}}: |  |  |
|      | multf r2, r1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |  |  |
|      | multf r3, r4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |  |  |
|      | [ Command > cp /tmp/modelBM.json banknote.json                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |  |  |
|      | [ Command > make bondmachine                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |  |  |
|      | [Project: proj_mlinfn] - [Working directory creation begin] - [Target: working_dir]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |  |  |
|      | mkdir -p working_dir                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |  |  |
|      | [Project: proj_mlinfn] - [Working directory creation end]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |  |  |
|      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |  |  |
|      | <pre>[Project: proj_mlinfn] - [BondMachine generation begin] - [Target: working_dir/bondmachine_target]</pre> |  |  |
|      | neuralbond -net-file banknote.json -neuron-lib-path neurons -save-basm working_dir/bondmachine.basm -config-file neuralbondconfig.json -operating-mode fragment -bminfo-file bminfo.json ; basm -bminfo-file bminfo.json -o working_dir/bondmachine.json working_dir/bondmachine.basm neurons/*.basm |  |  |
|      | <pre>[Project: proj_mlinfn] - [BondMachine generation end]</pre> |  |  |
|      | [ Command > ls working_dir |  |  |
|      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |  |  |
|      | [ Command > cat working_dir/bondmachine.basm                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |  |  |
|      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |  |  |


```
%meta filinkatt downweightfi_3_1__4_1 fi:weightfi_3_1__4_1, type:input, index:0
%meta filinkatt downweightfi_3_1__4_1 fi:node_3_1, type:output, index:0
%meta filinkatt upweightfi_3_1__4_1 fi:node_4_1, type:input, index:0
%meta filinkatt upweightfi_3_1__4_1 fi:weightfi_3_1__4_1, type:output, index:0
%meta cpdef node_4_0 fragcollapse:node_4_0
%meta cpdef node_1_0 fragcollapse:node_1_0
%meta cpdef node_2_0 fragcollapse:node_2_0
%meta cpdef weightfi_0_3__1_0 fragcollapse:weightfi_0_3__1_0
%meta cpdef weightfi_2_0__3_0 fragcollapse:weightfi_2_0__3_0
%meta cpdef weightfi_0_0__1_0 fragcollapse:weightfi_0_0__1_0
%meta cpdef weightfi_2_1__3_0 fragcollapse:weightfi_2_1__3_0
%meta cpdef node_4_1 fragcollapse:node_4_1
%meta cpdef weightfi_2_1__3_1 fragcollapse:weightfi_2_1__3_1
%meta cpdef node_3_0 fragcollapse:node_3_0
%meta cpdef weightfi_3_0__4_0 fragcollapse:weightfi_3_0__4_0
%meta cpdef node_0_2 fragcollapse:node_0_2
%meta cpdef weightfi_0_2__1_0 fragcollapse:weightfi_0_2__1_0
%meta cpdef node_0_1 fragcollapse:node_0_1
%meta cpdef weightfi_1_0__2_0 fragcollapse:weightfi_1_0__2_0
%meta cpdef weightfi_1_0__2_1 fragcollapse:weightfi_1_0__2_1
%meta cpdef weightfi_2_0__3_1 fragcollapse:weightfi_2_0__3_1
%meta cpdef node_0_0 fragcollapse:node_0_0
%meta cpdef weightfi_0_1__1_0 fragcollapse:weightfi_0_1__1_0
%meta cpdef node_0_3 fragcollapse:node_0_3
%meta cpdef node_3_1 fragcollapse:node_3_1
%meta cpdef weightfi_3_1__4_1 fragcollapse:weightfi_3_1__4_1
%meta cpdef node_2_1 fragcollapse:node_2_1
Command > make hdl
Command > make show
[Project: proj_mlinfn] - [BondMachine diagram show begin] - [Target: show]
bondmachine -bondmachine-file working_dir/bondmachine.json -emit-dot -dot-detail 5 -bminfo-file bminfo.json | dot -Txlib
[Project: proj_mlinfn] - [BondMachine diagram show end]
```
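The `%meta cpdef` names in the listing appear to encode the network topology. The following is a minimal sketch, not part of the BondMachine tools, that recovers nodes and connections from those names. It assumes the naming scheme `node_<layer>_<index>` and `weightfi_<srcLayer>_<srcIndex>__<dstLayer>_<dstIndex>`, which is inferred from the listing and from the `filinkatt` lines; the function name `parse_cpdefs` is hypothetical.

```python
import re
from collections import defaultdict

# A few %meta lines copied from the generated BASM listing.
BASM_TAIL = """\
%meta cpdef node_4_0 fragcollapse:node_4_0
%meta cpdef node_0_3 fragcollapse:node_0_3
%meta cpdef weightfi_0_3__1_0 fragcollapse:weightfi_0_3__1_0
%meta cpdef weightfi_3_1__4_1 fragcollapse:weightfi_3_1__4_1
%meta filinkatt upweightfi_3_1__4_1 fi:node_4_1, type:input, index:0
"""

# Assumed naming conventions (inferred from the listing, not documented here).
NODE_RE = re.compile(r"^node_(\d+)_(\d+)$")
WEIGHT_RE = re.compile(r"^weightfi_(\d+)_(\d+)__(\d+)_(\d+)$")


def parse_cpdefs(text):
    """Return ({layer: {node indexes}}, [((src_layer, src_idx), (dst_layer, dst_idx))])."""
    nodes = defaultdict(set)
    edges = []
    for line in text.splitlines():
        parts = line.split()
        # Only '%meta cpdef <name> ...' lines are considered.
        if len(parts) < 3 or parts[0] != "%meta" or parts[1] != "cpdef":
            continue
        name = parts[2]
        node_match = NODE_RE.match(name)
        if node_match:
            layer, index = map(int, node_match.groups())
            nodes[layer].add(index)
            continue
        weight_match = WEIGHT_RE.match(name)
        if weight_match:
            src_layer, src_idx, dst_layer, dst_idx = map(int, weight_match.groups())
            edges.append(((src_layer, src_idx), (dst_layer, dst_idx)))
    return dict(nodes), edges


nodes, edges = parse_cpdefs(BASM_TAIL)
print(nodes)
print(edges)
```

Run on the full listing, such a parser would give a quick textual check of the graph that `make show` renders.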




All the HDL files needed to build the firmware for the given board.




# Demo - BondMachine simulation

| Command |
|---------|
| `mkdir Example` |
| `cd Example` |
| `bmhelper create --project_name mlinfn --board zedboard --project_type neural_network --n_inputs 4 --n_outputs 3 --source_neuralbond banknote.json` |
| `cd proj_mlinfn` |
| `cp /tmp/sim.csv .` |
| `make simbatch` |


```
Running simulation with inputs: 0f-1.0601420171169955,0f0.3471542056645857,0f-0.4248275125188447,0f-0.04608508181009227
Running simulation with inputs: 0f-0.15228760297525445,0f-0.2821256600040472,0f-0.3931947744117846,0f0.5712245546772439
Running simulation with inputs: 0f1.052405089774165,0f0.7521166535304541,0f-0.7981025904661143,0f0.39395848746270123
Running simulation with inputs: 0f-0.324656804509,0f-0.15459195394199834,0f-0.7235721768066175,0f0.2947911065500622
Running simulation with inputs: 0f1.1202121241061977,0f-0.05885513674190243,0f-0.03632330979904628,0f1.4916348403989907
Running simulation with inputs: 0f-1.5832950470099985,0f0.18817820405272936,0f-0.14704366178162517,0f0.2538807872456312
Running simulation with inputs: 0f-0.2817404268789742,0f-1.2433487678699013,0f0.7539298193063557,0f1.3088205957275203
Running simulation with inputs: 0f-0.9459019049271576,0f-0.32230699215865727,0f0.3011633249327807,0f0.8005548150124344
Running simulation with inputs: 0f0.3527853059363061,0f-0.19267696420599642,0f-0.8155007117971548,0f1.0596199512474536
Running simulation with inputs: 0f1.0634254876935534,0f-1.0567651440981527,0f0.4696066746895383,0f0.6412610081648472
Running simulation with inputs: 0f-0.24940573706008093,0f1.0770592106221761,0f-1.0709577426405883,0f-0.6442337339483407
Running simulation with inputs: 0f0.8249426728456427,0f1.5484150348383172,0f-1.1686087366365605,0f-1.5435897047849427
[Project: proj_mlinfn] - [BondMachine simbatch end]
Command > cat working_dir/simbatch_output.csv
```
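Each "Running simulation with inputs" line carries the four input features of one sample, each written with a `0f` prefix. As a rough sketch, and assuming the `0f` prefix simply marks a floating-point literal and that fields are comma-separated, a log line can be turned back into numbers as follows. The helper name `parse_inputs` is hypothetical.

```python
PREFIX = "Running simulation with inputs:"


def parse_inputs(log_line):
    """Extract the input values from one simbatch log line."""
    if not log_line.startswith(PREFIX):
        raise ValueError("not a simulation input line")
    fields = log_line[len(PREFIX):].strip().split(",")
    values = []
    for field in fields:
        # Assumption: every value is prefixed with '0f' (float literal marker).
        if not field.startswith("0f"):
            raise ValueError(f"unexpected field format: {field!r}")
        values.append(float(field[2:]))
    return values


line = (
    "Running simulation with inputs: "
    "0f-1.0601420171169955,0f0.3471542056645857,"
    "0f-0.4248275125188447,0f-0.04608508181009227"
)
print(parse_inputs(line))
```

The four values match the `--n_inputs 4` option passed to `bmhelper create`.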

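| Probability (output 0) | Probability (output 1) | Prediction |
|------------------------|------------------------|------------|
| 0.71279794 | 0.28720203 | 0 |
| 0.6313562 | 0.36864382 | 0 |
| 0.7589688 | 0.24103124 | 0 |
| 0.6479448 | 0.35205516 | 0 |
| 0.3601988 | 0.63980114 | 1 |
| 0.6425791 | 0.35742098 | 0 |
| 0.5682741 | 0.43172595 | 0 |
| 0.61973804 | 0.38026193 | 0 |
| 0.6914931 | 0.3085069 | 0 |
| 0.6783158 | 0.32168424 | 0 |
| 0.4921839 | 0.5078161 | 1 |
| 0.37793863 | 0.6220614 | 1 |
| 0.66365564 | 0.33634433 | 0 |
| 0.6749563 | 0.32504368 | 0 |
| 0.66059536 | 0.3394046 | 0 |
| 0.4266389 | 0.57336116 | 1 |
| 0.4380828 | 0.56191725 | 1 |
| 0.6834962 | 0.31650382 | 0 |
| 0.4042624 | 0.59573764 | 1 |
| 0.63697994 | 0.3630201 | 0 |
| 0.36208335 | 0.6379167 | 1 |
| 0.403224 | 0.59677607 | 1 |
| 0.40639094 | 0.5936091 | 1 |
| 0.4439535 | 0.5560465 | 1 |
| 0.593614 | 0.40638596 | 0 |
| 0.5749001 | 0.42509994 | 0 |
| 0.77141094 | 0.22858903 | 0 |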




The outcome of this third part of the demo is:

`simbatch_output.csv`, a CSV file produced by the simulation, containing the output probabilities and the prediction for each input row.
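A quick sanity check on this file could look like the sketch below. It assumes each row holds the class probabilities followed by the predicted class index, and that the prediction is the index of the largest probability; both are readings of the sample rows rather than a documented format. The rows in `SAMPLE` are copied from the simulated output, and `check_rows` is a hypothetical helper.

```python
import csv
import io

# Four rows copied from the simulated output.
SAMPLE = """\
0.71279794,0.28720203,0
0.6313562,0.36864382,0
0.3601988,0.63980114,1
0.4921839,0.5078161,1
"""


def check_rows(text, tol=1e-6):
    """For each row return (probabilities sum to 1, prediction equals argmax)."""
    results = []
    for row in csv.reader(io.StringIO(text)):
        *prob_fields, pred_field = row
        probs = [float(p) for p in prob_fields]
        prediction = int(pred_field)
        argmax = max(range(len(probs)), key=probs.__getitem__)
        sums_to_one = abs(sum(probs) - 1.0) < tol
        results.append((sums_to_one, argmax == prediction))
    return results


for ok_sum, ok_pred in check_rows(SAMPLE):
    print(ok_sum, ok_pred)
```

To check a real run, the same function could be applied to the text of `working_dir/simbatch_output.csv`.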


The BondMachine Project


| Command |
|---------|
| `mkdir Example` |
| `cd Example` |
| `bmhelper create --project_name mlinfn --board zedboard --project_type neural_network --n_inputs 4 --n_outputs 3 --source_neuralbond banknote.json` |
| `cd proj_mlinfn` |
| `cat local.mk` |

| Contents of `local.mk` |
|------------------------|
| `WORKING_DIR=working_dir` |
| `CURRENT_DIR=$(shell pwd)` |
| `SOURCE_NEURALBOND=banknote.json` |
| `NEURALBOND_LIBRARY=neurons` |
| `NEURALBOND_ARGS=-config-file neuralbondconfig.json -operating-mode fragment` |
| `BMINFO=bminfo.json` |
| `BOARD=zedboard` |
| `MAPFILE=zedboard_maps.json` |
| `SHOWARGS=-dot-detail 5` |
| `SHOWRENDERER` |
| `VERILOG_OPTIONS=-comment-verilog` |
| `#BASM_ARGS=-d` |
| `BENCHCORE=10,p000` |
| `#HDL_REGRESSION=bondmachine.sv` |
| `#BM_REGRESSION=bondmachine.json` |
| `include bmapi.mk` |
| `include crosscompile.mk` |
| `include buildroot.mk` |
| `include simbatch.mk` |

| Command |
|---------|
| `make accelerator` |
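The variables in `local.mk` seem to map directly onto the `neuralbond` command line logged during BondMachine generation. The sketch below illustrates that mapping; it is an inference from the log, not taken from the project's makefiles. `parse_vars` and `neuralbond_command` are hypothetical helpers, and only a subset of `local.mk` is reproduced.

```python
# Subset of local.mk relevant to the neuralbond invocation.
LOCAL_MK = """\
WORKING_DIR=working_dir
SOURCE_NEURALBOND=banknote.json
NEURALBOND_LIBRARY=neurons
NEURALBOND_ARGS=-config-file neuralbondconfig.json -operating-mode fragment
BMINFO=bminfo.json
#BASM_ARGS=-d
include bmapi.mk
"""


def parse_vars(text):
    """Collect simple KEY=VALUE assignments, skipping comments and includes."""
    variables = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("include "):
            continue
        key, sep, value = line.partition("=")
        if sep:
            variables[key.strip()] = value.strip()
    return variables


def neuralbond_command(v):
    """Assemble the command in the shape seen in the generation log (assumed mapping)."""
    return (
        f"neuralbond -net-file {v['SOURCE_NEURALBOND']} "
        f"-neuron-lib-path {v['NEURALBOND_LIBRARY']} "
        f"-save-basm {v['WORKING_DIR']}/bondmachine.basm "
        f"{v['NEURALBOND_ARGS']} -bminfo-file {v['BMINFO']}"
    )


variables = parse_vars(LOCAL_MK)
print(neuralbond_command(variables))
```

Under this reading, changing a value such as `SOURCE_NEURALBOND` in `local.mk` is what selects a different network for the same make targets.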


```
INFO: [IP_Flow 19-3166] Bus Interface 'S00_AXI': References existing memory map 'S00_AXI'.
# set_property core_revision 4 [ipx::current_core]
# ipx::update_source_project_archive -component [ipx::current_core]
# ipx::create_xgui_files [ipx::current_core]
# ipx::update_checksums [ipx::current_core]
# ipx::save_core [ipx::current_core]
# ipx::move_temp_component_back -component [ipx::current_core]
# close_project -delete
# update_ip_catalog -rebuild -repo_path ${ip_directory}
INFO: [IP_Flow 19-725] Reloaded user IP repository 'ip_repo'
# close_project -delete
# exit
INFO: [Common 17-206] Exiting Vivado at Wed Nov 2 23:03:37 2022...
cp -a working_dir/bondmachine.sv working_dir/ip_repo/bondmachineip_1.0/hdl/bondmachine.sv
# Comments
bash -c "cd working_dir ; ./vivadoAXIcomment.sh"
# Insert the AXI code
bash -c "cd working_dir ; sed -i -e '/Add user logic here/r aux/axipatch.txt' ./ip_repo/bondmachineip_1.0/hdl/bondmachineip_v1_0_S00_AXI.v"
bash -c "cd working_dir ; sed -i -e '/Users to add ports here/r aux/designexternal.txt' ./ip_repo/bondmachineip_1.0/hdl/bondmachineip_v1_0_S00_AXI.v"
bash -c "cd working_dir ; sed -i -e '/Users to add ports here/r aux/designexternal.txt' ./ip_repo/bondmachineip_1.0/hdl/bondmachineip_v1_0.v"
bash -c "cd working_dir ; sed -i -e '/bondmachineip_v1_0_S00_AXI_inst/r aux/designexternalinst.txt' ./ip_repo/bondmachineip_1.0/hdl/bondmachineip_v1_0.v"
[Project: proj_mlinfn] - [Vivado toolchain - IP accelerator creation end]
Command > make design
```
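The `sed -i -e '/pattern/r file'` calls in this log patch the Vivado-generated AXI template: after every line matching the pattern, the contents of the given file are inserted. A minimal Python sketch of the same idea is shown below. It uses a plain substring match instead of a regular expression, and the template and patch lines are made-up placeholders, not the real contents of `aux/axipatch.txt`.

```python
def insert_after_marker(lines, marker, patch_lines):
    """Return a copy of `lines` with `patch_lines` inserted after each line containing `marker`."""
    patched = []
    for line in lines:
        patched.append(line)
        if marker in line:
            patched.extend(patch_lines)
    return patched


# Hypothetical stand-ins for the generated template and the patch file.
template = [
    "\t// Add user logic here",
    "",
    "\t// User logic ends",
]
patch = ["\t// (BondMachine AXI glue logic would go here)"]

for line in insert_after_marker(template, "Add user logic here", patch):
    print(line)
```

Patching at the template's comment markers keeps the generated AXI boilerplate untouched, so the same patch files can be reapplied whenever the IP is regenerated.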

\# make\_wrapper -files [get\_files \${project\_dir}/\${project\_name}.srcs/sources\_1/bd/bm\_design/bm\_design.bd] -top

INFO: [BD 41-1662] The design 'bm\_design.bd' is already validated. Therefore parameter propagation will not be re-run.

Wrote : </tmp/tmpof\_4uxc5/Example/proj\_mlinfn/working\_dir/bmaccelerator/bmaccelerator.srcs/sources\_1/bd/bm\_design/bm\_design.bd>

VHDL Output written to : /tmp/tmpof\_4uxc5/Example/proj\_mlinfn/working\_dir/bmaccelerator/bmaccelerator.srcs/sources\_1/bd/bm\_design/synth/bm\_design.v

VHDL Output written to : /tmp/tmpof\_4uxc5/Example/proj\_mlinfn/working\_dir/bmaccelerator/bmaccelerator.srcs/sources\_1/bd/bm\_design/sim/bm\_design.v

VHDL Output written to : /tmp/tmpof\_4uxc5/Example/proj\_mlinfn/working\_dir/bmaccelerator/bmaccelerator.srcs/sources\_1/bd/bm\_design/hdl/bm\_design\_wrapper.v

\# update\_compile\_order -fileset sources\_1

CRITICAL WARNING: [filemgmt 20-730] Could not find a top module in the fileset sources\_1.

Resolution: With the gui up, review the source files in the Sources window. Use Add Sources to add any needed sources. If the files are disabled, enable them. You can also select the file and choose Set Used In from the pop-up menu. Review if they are being used at the proper points of the flow.

\# add\_files -norecurse -scan\_for\_includes \${project\_dir}/\${project\_name}.srcs/sources\_1/bd/bm\_design/hdl/bm\_design\_wrapper.v

\# update\_compile\_order -fileset sources\_1

\# add\_files -fileset constrs\_1 -norecurse zedboard.xdc

\# update\_compile\_order -fileset sources\_1

\# close\_project

\# exit

INFO: [Common 17-206] Exiting Vivado at Wed Nov 2 23:04:33 2022...

[Project: proj\_mlinfn] - [Vivado toolchain - design creation end]

**Command > make design\_synthesis**

```
INFO: [Project 1-571] Translating synthesized netlist
Netlist sorting complete. Time (s): cpu = 00:00:00.01 ; elapsed = 00:00:00.01 . Memory (MB): peak = 2133.133 ; gain = 0.000 ; free physical = 3826 ; free virtual = 4432
INFO: [Project 1-570] Preparing netlist for logic optimization
INFO: [Opt 31-138] Pushed 0 inverter(s) to 0 load pin(s).
Netlist sorting complete. Time (s): cpu = 00:00:00 ; elapsed = 00:00:00 . Memory (MB): peak = 2140.062 ; gain = 0.000 ; free physical = 3774 ; free virtual = 4381
INFO: [Project 1-111] Unisim Transformation Summary:
No Unisim elements were transformed.
INFO: [Common 17-83] Releasing license: Synthesis
36 Infos, 26 Warnings, 0 Critical Warnings and 0 Errors encountered.
synth_design completed successfully
synth_design: Time (s): cpu = 00:00:49 ; elapsed = 00:00:51 . Memory (MB): peak = 2140.062 ; gain = 55.961 ; free physical = 3905 ; free virtual = 4512
INFO: [Common 17-1381] The checkpoint '/tmp/tmpof_4uxc5/Example/proj_mlinfn/working_dir/bmaccelerator/bmaccelerator.runs/synth_1/bm_design_wrapper.dcp' has been generated.
INFO: [runtcl-4] Executing : report_utilization -file bm_design_wrapper_utilization_synth.rpt -pb bm_design_wrapper_utilization_synth.pb
INFO: [Common 17-206] Exiting Vivado at Wed Nov 2 23:17:23 2022...
[Wed Nov 2 23:17:33 2022] synth_1 finished
wait_on_run: Time (s): cpu = 00:15:18 ; elapsed = 00:12:01 . Memory (MB): peak = 2258.160 ; gain = 0.000 ; free physical = 4618 ; free virtual = 5223
# exit
INFO: [Common 17-206] Exiting Vivado at Wed Nov 2 23:17:34 2022...
[Project: proj_mlinfn] - [Vivado toolchain - design synthesis end]
Command > make design_implementation
```
#### DEMO - BondMachine accelerator creation

```
report_power completed successfully
report_power: Time (s): cpu = 00:00:27 ; elapsed = 00:00:15 . Memory (MB): peak = 3082.508 ; gain = 26.992 ; free physical = 3048 ; free virtual = 3694
INFO: [runtcl-4] Executing : report_route_status -file bm_design_wrapper_route_status.rpt -pb bm_design_wrapper_route_status.pb
INFO: [runtcl-4] Executing : report_timing_summary -max_paths 10 -file bm_design_wrapper_timing_summary_routed.rpt -pb bm_design_wrapper_timing_summary_routed.pb -rpx bm_design_wrapper_timing_summary_routed.rpx -warn_on_violation
INFO: [Timing 38-91] UpdateTimingParams: Speed grade: -1, Delay Type: min_max.
INFO: [Timing 38-191] Multithreading enabled for timing update using a maximum of 4 CPUs
INFO: [runtcl-4] Executing : report_incremental_reuse -file bm_design_wrapper_incremental_reuse_routed.rpt
INFO: [Vivado_Tcl 4-1062] Incremental flow is disabled. No incremental reuse Info to report.
INFO: [runtcl-4] Executing : report_clock_utilization -file bm_design_wrapper_clock_utilization_routed.rpt
INFO: [runtcl-4] Executing : report_bus_skew -warn_on_violation -file bm_design_wrapper_bus_skew_routed.rpt -pb bm_design_wrapper_bus_skew_routed.pb -rpx bm_design_wrapper_bus_skew_routed.rpx
INFO: [Timing 38-91] UpdateTimingParams: Speed grade: -1, Delay Type: min_max.
INFO: [Timing 38-191] Multithreading enabled for timing update using a maximum of 4 CPUs
INFO: [Common 17-206] Exiting Vivado at Wed Nov 2 23:23:42 2022...
[Wed Nov 2 23:23:58 2022] impl_1 finished
wait_on_run: Time (s): cpu = 00:00:02 ; elapsed = 00:05:46 . Memory (MB): peak = 2202.250 ; gain = 0.000 ; free physical = 4625 ; free virtual = 5273
# exit
INFO: [Common 17-206] Exiting Vivado at Wed Nov 2 23:23:58 2022...
 Project: proj_mlinfn] - [Vivado toolchain - design implementation end]
```

**Command > make design\_bitstream**

```
the MREG and PREG registers to be used. If the DSP48 was instantiated in the design, it is suggested to set both the MREG and PREG attributes to 1 when performing multiply functions.
INFO: [Vivado 12-3199] DRC finished with 0 Errors, 84 Warnings
INFO: [Vivado 12-3200] Please refer to the DRC report (report_drc) for more information.
INFO: [Designutils 20-2272] Running write_bitstream with 4 threads.
Loading data files...
Loading site data...
Loading route data...
Processing options...
Creating bitmap...
Creating bitstream...
Writing bitstream ./bm_design_wrapper.bit...
Writing bitstream ./bm_design_wrapper.bin...
INFO: [Vivado 12-1842] Bitgen Completed Successfully.
INFO: [Common 17-83] Releasing license: Implementation
22 Infos, 84 Warnings, 0 Critical Warnings and 0 Errors encountered.
write_bitstream completed successfully
write_bitstream: Time (s): cpu = 00:00:58 ; elapsed = 00:00:44 . Memory (MB): peak = 2909.914 ; gain = 498.211 ; free physical = 3484 ; free virtual = 4141
INFO: [Common 17-206] Exiting Vivado at Wed Nov 2 23:26:13 2022...
[Wed Nov 2 23:26:14 2022] impl_1 finished
wait_on_run: Time (s): cpu = 00:01:47 ; elapsed = 00:01:38 . Memory (MB): peak = 2186.242 ; gain = 0.000 ; free physical = 4694 ; free virtual = 5344
# exit
INFO: [Common 17-206] Exiting Vivado at Wed Nov 2 23:26:14 2022...
 Project: proj_mlinfn] - [Vivado toolchain - design bitstream end]
```

#### DEMO - Standalone BondMachine creation

```
Command > mkdir Example
Command > cd Example
Command > bmhelper create --project_name mlinfn --board zedboard --project_type neural_network --n_inputs 4 --n_outputs 3 --source_neuralbond banknote.json
Command > cd proj_mlinfn
Command > cat zedboard_maps.json
"Assoc" : {
        "clk" : "clk",
        "reset" : "btnC"
Command >
make project
make synthesis
make implementation
make bitstream
```
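The four `make` targets listed above have to run in order, and a later step makes no sense if an earlier one failed. The following is a minimal sketch of such a driver, assuming it is launched from the project directory; `run_steps` and its `run` parameter are hypothetical names introduced here for illustration and are not part of `bmhelper`.

```python
import subprocess

# Targets as listed on the slide, in the order in which they must run.
STEPS = ["project", "synthesis", "implementation", "bitstream"]


def run_steps(steps=STEPS, cwd=".", run=subprocess.call):
    """Run `make <step>` for each step and stop at the first non-zero exit code.

    `run` is injectable so the logic can be exercised without Vivado installed.
    Returns the list of steps that completed successfully.
    """
    done = []
    for step in steps:
        if run(["make", step], cwd=cwd) != 0:
            break
        done.append(step)
    return done
```

Calling `run_steps(cwd="proj_mlinfn")` would launch the real toolchain; passing a stub as `run` allows a dry run.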


#### Notebook on the board - predictions and correctness



- Thanks to PYNQ we can easily load the bitstream and program the FPGA in real time.
- Through the PYNQ APIs we interact with the memory addresses of the BM IP to send data to the inputs and read the outputs (without using the BM kernel module).
- The output results are dumped for future analysis.
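The memory-mapped interaction described above can be sketched as follows. This is only an illustration under stated assumptions: the register offsets `INPUT_BASE` and `OUTPUT_BASE`, the 32-bit float encoding, and the absence of any handshake are hypothetical and are not taken from the BondMachine IP. The `mmio` argument is any object with `write(offset, value)` and `read(offset)` methods, which is the shape of the PYNQ `MMIO` interface; a stand-in class makes the sketch runnable off-board.

```python
import struct

# Hypothetical register layout: NOT taken from the BondMachine IP.
INPUT_BASE = 0x00   # assumed offset of the first 32-bit input register
OUTPUT_BASE = 0x40  # assumed offset of the first 32-bit output register
WORD = 4            # bytes per register


def float_to_word(x):
    """Return the 32-bit pattern of x encoded as IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def word_to_float(w):
    """Inverse of float_to_word."""
    return struct.unpack("<f", struct.pack("<I", w))[0]


def infer(mmio, features, n_outputs):
    """Write one sample to the input registers and read the outputs back."""
    for i, x in enumerate(features):
        mmio.write(INPUT_BASE + i * WORD, float_to_word(x))
    return [word_to_float(mmio.read(OUTPUT_BASE + j * WORD)) for j in range(n_outputs)]


class FakeMMIO:
    """Stand-in for a memory-mapped region: stores words in a dictionary."""

    def __init__(self):
        self.regs = {}

    def write(self, offset, value):
        self.regs[offset] = value

    def read(self, offset):
        return self.regs.get(offset, 0)
```

On the board, the `mmio` object would instead come from PYNQ after loading the bitstream with `pynq.Overlay`; the base address and range of the IP depend on the generated design and are not shown here.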

Open the notebook


## Inference evaluation

Evaluation metrics used:

- **Inference speed**: time taken to predict a sample, i.e. the time between the arrival of the input and the change of the output, measured with the **benchcore**;
- **Resource usage**: LUTs and registers in use;
- **Accuracy**: the average percentage error on the output probabilities.

Inference time distribution: mean 10268.45, $\sigma$ 2875.94; latency 102.68 µs.

Resource usage:

| resource | value | occupancy |
|----------|-------|-----------|
| regs     | 15122 | 28.42%    |
| luts     | 11192 | 10.51%    |
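The reported mean and latency are numerically consistent if the mean is read as a count of clock cycles at 100 MHz. Both the unit and the clock frequency are assumptions made here for illustration; neither is stated on the slide. A minimal sketch of the conversion:

```python
def cycles_to_us(cycles, clock_hz):
    """Convert a clock-cycle count to microseconds."""
    return cycles / clock_hz * 1e6


# Assumption: 100 MHz clock, not stated on the slide.
ASSUMED_CLOCK_HZ = 100e6

mean_us = cycles_to_us(10268.45, ASSUMED_CLOCK_HZ)
sigma_us = cycles_to_us(2875.94, ASSUMED_CLOCK_HZ)
print(f"mean = {mean_us:.2f} us, sigma = {sigma_us:.2f} us")
```

Under this assumption the mean maps to about 102.68 µs, which matches the reported latency.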


Software and BondMachine outputs compared in the notebook:

| Software prob0 | Software prob1 | Software class | BondMachine prob0 | BondMachine prob1 | BondMachine class |
|----------------|----------------|----------------|-------------------|-------------------|-------------------|
| 0.6895         | 0.3104         | 0              | 0.6895            | 0.3104            | 0                 |
| 0.5748         | 0.4251         | 0              | 0.5748            | 0.4251            | 0                 |
| 0.4009         | 0.5990         | 1              | 0.4009            | 0.5990            | 1                 |

The output of the BondMachine corresponds to the software output.
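The agreement between the two outputs can also be checked programmatically. The sketch below uses the first two rows of the comparison and one possible reading of "average percentage error on probabilities", namely the mean of the relative absolute differences; the exact definition used in the notebook is not given here, so this formula is an assumption.

```python
def mean_percentage_error(reference, measured):
    """Average of |measured - reference| / |reference|, expressed in percent."""
    errors = [abs(m - r) / abs(r) * 100.0 for r, m in zip(reference, measured)]
    return sum(errors) / len(errors)


def predicted_class(probs):
    """Index of the largest probability."""
    return max(range(len(probs)), key=lambda i: probs[i])


# (prob0, prob1) pairs from the first two rows of the comparison.
software = [(0.6895, 0.3104), (0.5748, 0.4251)]
bondmachine = [(0.6895, 0.3104), (0.5748, 0.4251)]

# Flatten the pairs so that every probability contributes to the average.
flat_sw = [p for row in software for p in row]
flat_bm = [p for row in bondmachine for p in row]

error = mean_percentage_error(flat_sw, flat_bm)
classes_sw = [predicted_class(row) for row in software]
classes_bm = [predicted_class(row) for row in bondmachine]
```

With identical probabilities the error is zero and the predicted classes coincide, which is what the comparison shows at the displayed precision.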



## A first example of optimization

#### Remember the softmax function?

$$\sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^N e^{z_j}}$$

The exponential is computed as a series truncated at order $K$:

$$e^{x} \simeq \sum_{l=0}^{K} \frac{x^{l}}{l!}$$

```
%section softmax .romtext iomode:sync
        entry start             ; Entry point
start:
        mov     r8, 0f0.0
{{range $y := intRange "0" .Params.inputs}}
{{printf "i2r r1,i%d\n" $y}}
        mov     r0, 0f1.0       ; running sum of the series (l = 0 term)
        mov     r2, 0f1.0       ; x^l
        mov     r3, 0f1.0       ; l!
        mov     r4, 0f1.0       ; next factor of the factorial
        mov     r5, 0f1.0       ; constant 1.0
        mov     r7, {{$.Params.expprec}}        ; number of iterations K
loop{{printf "%d" $y}}:
        multf   r2, r1
        multf   r3, r4
        addf    r4, r5
        mov     r6, r2
        divf    r6, r3          ; r6 = x^l / l!
        addf    r0, r6
        dec     r7
        jz      r7,exit{{printf "%d" $y}}
        j       loop{{printf "%d" $y}}
exit{{printf "%d" $y}}:
{{$z := atoi $.Params.pos}}
{{if eq $y $z}}
        mov     r9, r0
; ...
%endsection
```
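The loop in the fragment above accumulates the truncated series one term at a time, keeping the current power and factorial in registers. The same scheme can be sketched in Python to see how the choice of K affects the result; the function names are illustrative, and the register names in the comments only indicate the correspondence with the assembly.

```python
import math


def taylor_exp(x, k):
    """Approximate e**x with the series truncated after the x**k / k! term."""
    total = 1.0  # r0: running sum, starts at the l = 0 term
    power = 1.0  # r2: x**l
    fact = 1.0   # r3: l!
    n = 1.0      # r4: next factor of the factorial
    for _ in range(k):  # r7 counts the iterations
        power *= x
        fact *= n
        n += 1.0
        total += power / fact
    return total


def softmax(zs, k):
    """Softmax in which every exponential is replaced by its truncated series."""
    exps = [taylor_exp(z, k) for z in zs]
    denom = sum(exps)
    return [e / denom for e in exps]


# Compare the truncated series with the library exponential for a few K.
for k in (1, 2, 5, 10):
    print(k, taylor_exp(1.0, k), math.exp(1.0))
```

Each iteration costs a fixed number of floating-point instructions, so the cost grows linearly with K, while the approximation error shrinks quickly for small |x|. For large or negative inputs a small K can give a poor (even negative) estimate, which is the accuracy side of the trade-off.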











Changing the number $K$ of terms of the exponential series in the softmax function...

Inference times were reduced by a factor of 10 only by decreasing the number of iterations.

#### Fragments composition

- The tools (neuralbond + basm) create a graph of relations among fragments of assembly.
- A fragment does not necessarily have to be mapped to a single CP.
- Fragments can be arbitrarily rearranged into CPs.
- The resulting firmwares are identical in terms of the computing outcome, but differ in occupancy and latency.
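The occupancy/latency trade-off can be illustrated with a toy model. It assumes independent fragments (for instance the neurons of one layer), CPs that run in parallel, fragments that run sequentially inside a CP, and made-up cycle counts; none of these numbers or names come from the BondMachine tools.

```python
def evaluate(mapping, cost):
    """Evaluate a fragment-to-CP mapping under the toy model.

    mapping: list of CPs, each a list of fragment names.
    cost:    cycles needed by each fragment.
    Returns (number of CPs, latency in cycles), where the latency is that of
    the most loaded CP because the CPs are assumed to run in parallel.
    """
    latency = max(sum(cost[f] for f in cp) for cp in mapping)
    return len(mapping), latency


# Hypothetical cycle counts for four independent fragments.
cost = {"n0": 40, "n1": 40, "n2": 40, "n3": 40}

one_cp_per_fragment = [["n0"], ["n1"], ["n2"], ["n3"]]
two_cps = [["n0", "n1"], ["n2", "n3"]]
single_cp = [["n0", "n1", "n2", "n3"]]

for name, mapping in [("4 CPs", one_cp_per_fragment), ("2 CPs", two_cps), ("1 CP", single_cp)]:
    print(name, evaluate(mapping, cost))
```

In this model every mapping executes the same fragments, so the result is the same; only the number of CPs (a proxy for occupancy) and the latency change.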



#### Let's see it live




## Conclusions and Future directions

1. Introduction: FPGA, Architectures, Abstractions
2. The BondMachine project: Architectures handling, Architectures molding, Bondgo, Basm, API
3. Clustering: An example, Video, Distributed architecture
4. Accelerators: Hardware, Software, Tests, Benchmark
5. Misc: Project timeline, Supported boards
6. Machine Learning: Train, BondMachine creation, Simulation, Accelerator, Benchmark
7. Optimizations: Softmax example, Results, Fragments compositions
8. Conclusions and Future directions: Conclusions, Ongoing, Future

#### Conclusions

The BondMachine is a new kind of computing device made possible in practice only by the emergence of new re-programmable hardware technologies such as FPGAs.

The result of this process is a computer architecture that is no longer a static constraint within which computing occurs: its creation becomes part of the computing process, gaining computing power and flexibility.

On top of this abstraction it is possible to create a full computing ecosystem, ranging from small interconnected IoT devices to Machine Learning accelerators.



#### Ongoing

- Documentation
- First DAQ use case
- Complete the inclusion of Intel and Lattice FPGAs
- ML inference in a cloud workflow
- Different data types and operations, especially low and trans-precision
- Support for different boards, especially data center accelerators
- Compare with GPUs
- Include some real power consumption measurements
- **Quantization**
- **More datasets**: test on other datasets with more features and multiclass classification
- **Neurons**: increase the library of neurons to support other activation functions
- **Evaluate results**: compare the results obtained with other technologies (CPU and GPU) in terms of inference speed and energy efficiency



#### Future work

- Include new processor shared objects and currently unsupported opcodes
- Extend the compiler to include more data structures
- Assembler improvements, fragments optimization and others
- Improve the networking, including new kinds of interconnection firmware
- What would an OS for BondMachines look like?

- website: http://bondmachine.fisica.unipg.it
- code: https://github.com/BondMachineHQ
- parallel computing paper: link
- contact email: mirko.mariotti@unipg.it