The BondMachine Toolkit: A novel moldable computer architecture

<u>Mirko Mariotti</u>, Loriano Storchi, Daniele Spiga

Department of Physics and Geology, University of Perugia; INFN Perugia

The International Conference on Parallel Computing (ParCo2019), Prague, Czech Republic, 10-13 September 2019





# The BondMachine Toolkit: A novel moldable computer architecture

In this presentation I will talk about:

- Technological background of the project
- The BondMachine Project: the Architecture
- The BondMachine Project: the Tools
- Use cases



## Current challenges in computing

#### Von Neumann Bottleneck:

New computational problems show that current architectural models have to be improved or changed to address future payloads.

#### Energy-efficient computation:

- Not wasting "resources" (silicon, time, energy, instructions).
- Using the right resource for the specific case.

#### Edge/Fog/Cloud Computing:

- Making the computation where it makes sense.
- Avoiding the transfer of unnecessary data.
- Creating consistent interfaces for distributed systems.









- A field-programmable gate array (FPGA) is an integrated circuit whose logic is re-programmable. It is used to build reconfigurable digital circuits.
- FPGAs contain an array of programmable logic blocks and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together".
- Logic blocks can be configured to perform complex combinational functions.
- The FPGA configuration is generally specified using a hardware description language (HDL).
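To make the idea of a configurable logic block more concrete, the following is a minimal sketch in Go of a two-input lookup table (LUT), a common building block for such logic. The type `LUT2`, its `Eval` method and the `AND`/`XOR` constants are illustrative names chosen for this sketch; they are not part of any FPGA toolchain or of the BondMachine code.

```go
package main

import "fmt"

// LUT2 models a 2-input lookup table: bit i of the value holds the output
// for the input combination encoded as i = (b<<1 | a).
// This is an illustrative model, not a vendor primitive.
type LUT2 uint8

// Eval returns the configured output for inputs a and b.
func (l LUT2) Eval(a, b bool) bool {
	idx := 0
	if a {
		idx |= 1
	}
	if b {
		idx |= 2
	}
	return (l>>uint(idx))&1 == 1
}

// Two "configurations" of the same block.
const (
	AND LUT2 = 0x8 // only input combination 3 (a=1, b=1) yields 1
	XOR LUT2 = 0x6 // input combinations 1 and 2 yield 1
)

func main() {
	for _, b := range []bool{false, true} {
		for _, a := range []bool{false, true} {
			fmt.Printf("a=%v b=%v AND=%v XOR=%v\n", a, b, AND.Eval(a, b), XOR.Eval(a, b))
		}
	}
}
```

Changing the constant changes the function the block computes without changing the block itself, which is the sense in which the logic is re-programmable.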














The use of FPGAs in computing is growing for several reasons. FPGAs:

- can potentially deliver great performance via massive parallelism
- can address payloads that do not perform well on uniprocessors (neural networks, deep learning)
- can efficiently handle non-standard data types






#### On the other hand, the adoption of FPGAs poses several challenges:

- Porting legacy code is usually hard.
- Interoperability with standard applications is problematic.









#### Today's computer architectures are:

- Multi-core: two or more independent processing units execute multiple instructions at the same time.
  - The power is given by the number of cores.
  - Parallelism has to be addressed.
- Heterogeneous: different types of processing units.
  - Cell, GPU, Parallella, TPU.
  - The power is given by the specialization.
  - Data transfer between units has to be addressed.
  - Payload scheduling has to be addressed.



















The BondMachine is a software ecosystem for the dynamic generation of computer architectures that:

- Are composed of many, possibly hundreds of, computing cores.
- Have very small cores, not necessarily of the same type (different ISA and ABI).
- Have no fixed way of interconnecting cores.
- May have some elements shared among cores (for example channels and shared memories).











### Connecting Processor (CP): the computational unit of the BM

The atomic computational unit of a BM is the "connecting processor" (CP), which has:

- Some general purpose registers of size Rsize.
- Some dedicated I/O registers of size Rsize.
- A set of implemented opcodes chosen among many available.
- Dedicated ROM and RAM.
- Three possible operating modes.
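The properties listed above can be pictured as a small data structure. The sketch below is only an illustration in Go: the type `CPSpec`, its field names and its JSON layout are assumptions made for this example and do not reflect the format actually used by the BondMachine tools. The opcode names are taken from the assembly listing shown later in this deck.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CPSpec is a hypothetical description of a connecting processor.
// All field names are illustrative, not the toolkit's real schema.
type CPSpec struct {
	Rsize     int      `json:"rsize"`     // register width in bits
	Registers int      `json:"registers"` // number of general purpose registers
	Inputs    int      `json:"inputs"`    // number of dedicated input registers
	Outputs   int      `json:"outputs"`   // number of dedicated output registers
	Opcodes   []string `json:"opcodes"`   // only the opcodes this CP implements
}

// Supports reports whether the CP implements the given opcode.
func (c CPSpec) Supports(op string) bool {
	for _, o := range c.Opcodes {
		if o == op {
			return true
		}
	}
	return false
}

// JSON serialises the description as indented JSON.
func (c CPSpec) JSON() (string, error) {
	b, err := json.MarshalIndent(c, "", "  ")
	return string(b), err
}

func main() {
	cp := CPSpec{
		Rsize:     8,
		Registers: 3,
		Inputs:    2,
		Outputs:   1,
		Opcodes:   []string{"clr", "rset", "cpy", "jz", "j", "dec"},
	}
	out, err := cp.JSON()
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	fmt.Println("implements dec:", cp.Supports("dec"))
}
```

The point of the sketch is that a CP carries only the registers and opcodes it needs, so two CPs in the same machine can have different descriptions.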





# Shared Objects (SO)

The non-computational element of the BM

Alongside CPs, BondMachines include non-computing units called "Shared Objects" (SO).

Examples of their purposes are:

- Data storage (Memories).
- Message passing.
- CP synchronization.

A single SO can be shared among different CPs. To use it, CPs have special instructions (opcodes) oriented to the specific SO.

Four kinds of SO have been developed so far: the Channel, the Shared Memory, the Barrier and a Pseudo Random Number Generator.
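As a loose software analogy for two of these objects, the sketch below uses plain Go: goroutines stand in for CPs, a mutex-protected slice stands in for a Shared Memory, and a `sync.WaitGroup` serves as a one-shot synchronization point, which is a simplification of a Barrier. The names `sharedMemory` and `runWorkers` are invented for this example and are not BondMachine APIs.

```go
package main

import (
	"fmt"
	"sync"
)

// sharedMemory stands in for a Shared Memory SO; the mutex serialises access.
type sharedMemory struct {
	mu    sync.Mutex
	cells []int
}

func (m *sharedMemory) write(addr, v int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.cells[addr] = v
}

func (m *sharedMemory) read(addr int) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.cells[addr]
}

// runWorkers starts n goroutines (standing in for CPs). Each one stores the
// square of its index in the shared memory; the caller waits for all of them
// before reading the cells back and summing them.
func runWorkers(n int) int {
	mem := &sharedMemory{cells: make([]int, n)}
	var barrier sync.WaitGroup
	barrier.Add(n)
	for i := 0; i < n; i++ {
		go func(id int) {
			defer barrier.Done()
			mem.write(id, id*id)
		}(i)
	}
	barrier.Wait() // synchronization point: every worker has written
	sum := 0
	for i := 0; i < n; i++ {
		sum += mem.read(i)
	}
	return sum
}

func main() {
	fmt.Println(runWorkers(4)) // 0 + 1 + 4 + 9 = 14
}
```

In a BondMachine the equivalent coordination is done in hardware through the SO-specific opcodes rather than through a runtime library.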






The BM computer architecture is managed by a set of tools to:

- build a specific architecture
- modify a pre-existing architecture
- simulate or emulate the behavior
- generate the hardware description language (HDL) code

#### Processor Builder

Selects the single processor, assembles and disassembles, saves on disk as JSON, and creates the HDL code of a CP.

#### BondMachine Builder

Connects CPs and SOs together in custom topologies, loads and saves on disk as JSON, and creates the BM's HDL code.

#### Simulation Framework






#### Mapping specific computational problems to BMs








































The major innovation of the BondMachine Project is its compiler.

Bondgo is the name chosen for the compiler developed for the BondMachine.

As the name suggests, the compiler's source language is Go.





### This is the standard flow when building computer programs








#### bondgo loop example

```
package main

import ()

func main() {
	var reg_aa uint8
	var reg_ab uint8
	// Count reg_aa down from 10 to 1, copying it into reg_ab at each step.
	for reg_aa = 10; reg_aa > 0; reg_aa-- {
		reg_ab = reg_aa
	}
}
```

#### bondgo loop example in asm

| Instruction |
|-------------|
| clr aa      |
| clr ab      |
| rset ac 10  |
| cpy aa ac   |
| cpy ac aa   |
| jz ac 11    |
| cpy ac aa   |
| cpy ab ac   |
| j 11        |
| dec aa      |
| j 4         |



Bondgo does something different from standard compilers ...










### Bondgo A first example





Bondgo may create not only the binaries but also the CP architecture, and it can do even more interesting things when compiling concurrent programs.















### Bondgo A multi-core example

```
package main

import (
	"bondgo" // import path assumed from the bondgo. prefix used below
)

// multproc runs on its own core: it reads a value, multiplies it by 8
// and writes the result to its output.
func multproc() {
	var in0 bondgo.Input
	var out0 bondgo.Output
	var reg_d_in uint8
	var reg_d_out uint8
	in0 = bondgo.Make(bondgo.Input, 4)
	out0 = bondgo.Make(bondgo.Output, 3)
	for {
		reg_d_in = bondgo.IORead(in0)
		reg_d_out = reg_d_in * 8
		bondgo.IOWrite(out0, reg_d_out)
	}
}

// main reads two inputs, adds them and writes the sum to its output.
func main() {
	var in0 bondgo.Input
	var in1 bondgo.Input
	var out0 bondgo.Output
	var reg_d_in0 uint8
	var reg_d_in1 uint8
	var reg_d_out0 uint8
	in0 = bondgo.Make(bondgo.Input, 1)
	in1 = bondgo.Make(bondgo.Input, 2)
	out0 = bondgo.Make(bondgo.Output, 4)
device_0:
	go multproc()
	for {
		reg_d_in0 = bondgo.IORead(in0)
		reg_d_in1 = bondgo.IORead(in1)
		reg_d_out0 = reg_d_in0 + reg_d_in1
		bondgo.IOWrite(out0, reg_d_out0)
	}
}
```
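The data flow of the listing above can be mimicked in standard Go to see what the two cores compute together. This is only a sketch under one assumption: that identifier 4, used as an output in `main` and as an input in `multproc`, denotes a link between the two cores, so the overall result is the sum of the two inputs multiplied by 8. The function `pipeline` and the channel names are invented for this example; ordinary Go channels replace the `bondgo` I/O calls.

```go
package main

import "fmt"

// multproc mirrors the second core: read a value, multiply it by 8, write it out.
func multproc(in <-chan uint8, out chan<- uint8) {
	for v := range in {
		out <- v * 8
	}
	close(out)
}

// pipeline feeds one pair of inputs through the adder and then the multiplier.
func pipeline(a, b uint8) uint8 {
	link := make(chan uint8)   // assumed core-to-core link (id 4 in the listing)
	result := make(chan uint8) // assumed external output (id 3 in the listing)
	go multproc(link, result)
	link <- a + b // the first core adds its two inputs
	close(link)
	return <-result
}

func main() {
	fmt.Println(pipeline(1, 2))   // (1+2)*8 = 24
	fmt.Println(pipeline(20, 20)) // 40*8 = 320, which wraps to 64 in uint8
}
```

The second call shows that, as in the bondgo source, the arithmetic is done on 8-bit values and therefore wraps around at 256.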



# Compiling Architectures

### One of the most important results

The architecture creation is a part of the compilation process.




# Machine Learning with BondMachine

Architectures with multiple interconnected processors, like the ones produced by the BondMachine Toolkit, are a natural fit for neural networks and computational graphs.

Several ways to map these structures to BondMachines have been developed:

- A native Neural Network library
- A TensorFlow to BondMachine translator
- An NNEF-based BondMachine composer






### So far we saw:

- A user-friendly approach to creating processors (single core).
- Optimizing a single device to support intricate computational workflows (multi-core) over a heterogeneous layer.

#### Interconnected BondMachines

What if we could extend this layer to multiple interconnected devices?




The same logic that exists among CPs has been extended to different BondMachines organized in clusters.

Two protocols have been created for this purpose: etherbond, which runs over Ethernet, and udpbond, which runs over UDP.

FPGA-based BondMachines, standard Linux workstations and emulated BondMachines can join a cluster and contribute to a single distributed computational problem.




#### Clustering

### Bondgo A cluster creation example

```
package main

import (
	"bondgo" // import path assumed from the bondgo. prefix used below
)

// multproc reads a value, multiplies it by 8 and writes the result out.
// The device_1 label in main places it on a different device of the cluster.
func multproc() {
	var in0 bondgo.Input
	var out0 bondgo.Output
	var reg_d_in uint8
	var reg_d_out uint8
	in0 = bondgo.Make(bondgo.Input, 4)
	out0 = bondgo.Make(bondgo.Output, 3)
	for {
		reg_d_in = bondgo.IORead(in0)
		reg_d_out = reg_d_in * 8
		bondgo.IOWrite(out0, reg_d_out)
	}
}

// main reads two inputs, adds them and writes the sum to its output.
func main() {
	var in0 bondgo.Input
	var in1 bondgo.Input
	var out0 bondgo.Output
	var reg_d_in0 uint8
	var reg_d_in1 uint8
	var reg_d_out0 uint8
	in0 = bondgo.Make(bondgo.Input, 1)
	in1 = bondgo.Make(bondgo.Input, 2)
	out0 = bondgo.Make(bondgo.Output, 4)
device_1:
	go multproc()
	for {
		reg_d_in0 = bondgo.IORead(in0)
		reg_d_in1 = bondgo.IORead(in1)
		reg_d_out0 = reg_d_in0 + reg_d_in1
		bondgo.IOWrite(out0, reg_d_out0)
	}
}
```



A distributed example

The result is shown in the "BondMachine Clustering" YouTube video.

### A general result

Parts of the system can be redeployed among different devices without changing the system behavior (only the performance).




Two use cases in Physics experiments are currently being developed:

- Real-time pulse shape analysis in neutron detectors: bringing the intelligence to the edge.
- Test beam for space experiments (DAMPE, HERD): increasing testbed operations efficiency.

# Possible other uses

The BondMachine could be used in several types of real-world applications, including:

- IoT and cyber-physical systems.
- Computer science educational applications.

### Computing Accelerator

Our current effort is to make it possible to build computing accelerators that can be used from within standard (Linux) applications.





# Accelerators

#### Example

```
#include "bondmachineip1.h"
#include "bmapi.h"

/* Define the base memaddr of the BM IP core */
#define BM_BASE XPAR_BONDMACHINEIP1_0_S00_AXI_BASEADDR

/* DELAY is not defined in this listing; it is assumed to come from
   one of the headers above or from the build configuration. */

int main(void)
{
  u32 input0 = 0, input1 = 0, output = 0;
  int i = 0, retval, input0_id = 0, input1_id = 1,
      output_id = 0;

  /* Loop on input0 */
  for (input0 = 0; input0 < 5; input0 = input0 + 1) {
    /* Loop on input1 */
    for (input1 = 0; input1 < 5; input1 = input1 + 1) {
      /* Write value to the two accelerator inputs */
      retval = BM_r2o(&input0, input0_id);
      retval = BM_r2o(&input1, input1_id);
      /* run a simple delay to allow changes on output */
      for (i = 0; i < DELAY; i++);
      /* Read the accelerator output back */
      retval = BM_i2r(&output, output_id);
    }
  }
  return 1;
}
```





# Hardware implementation FPGA

The HDL code for the BondMachine is written in Verilog and SystemVerilog, and has been tested on these devices/systems:

- Digilent Basys3 (Xilinx Artix-7, Vivado).
- Kintex-7 Evaluation Board (Vivado).
- Digilent ZedBoard (Xilinx Zynq 7020, Vivado).
- Linux (Iverilog).
- Terasic DE10-Nano (Intel Cyclone V, Quartus).

Within the project, other firmware has been written or tested:

- Microchip ENC28J60 Ethernet interface controller.
- Microchip ENC424J600 10/100 Base-T Ethernet interface controller.
- ESP8266 Wi-Fi chip.



# The Prototype

The project was selected to participate in Maker Faire Rome 2016 (The European Edition), where a prototype was assembled and presented.



#### First run (YouTube video)




# Project History



- May 2016: first tests on the idea.
- October 2016: prototype at Maker Faire Rome 2016.
- July 2018: InnovateFPGA EMEA Silver Award.
- August 2018: presented at the Intel Campus, San Jose (CA).
- August 2018: InnovateFPGA Iron Award in the Grand Final.










The BondMachine is a new kind of computing device made possible in practice only by the emergence of new re-programmable hardware technologies such as FPGAs.

The result of this process is a computer architecture that is no longer a static constraint within which computing occurs: its creation becomes part of the computing process, gaining computing power and flexibility.

On top of this abstraction it is possible to create a full computing ecosystem, ranging from small interconnected IoT devices to Machine Learning accelerators.





- Improve the use of BondMachines as accelerators, integrating them into the ecosystem
- Start making benchmarks
- Integrate low-precision and trans-precision instructions
- Find a way to sustain the project
- Move the repositories to GitHub and open the code to the community





The project is at the stage of a working prototype, so work has to be done in several areas:

- Include new processor shared objects and currently unsupported opcodes.
- Extend the compiler to include more data structures.
- Improve the networking, including new interconnection firmware.

What would an OS for BondMachines look like?





If you have questions or are curious about the project: Mirko Mariotti, mirko.mariotti@unipg.it, http://bondmachine.fisica.unipg.it