Fault-Tolerant Communication over Micronmesh NOC with Micron-Message Passing Protocol

Heikki Kariniemi and Jari Nurmi
Department of Computer Systems, Tampere University of Technology


Future Multi-Processor System-on-Chip (MPSoC) platforms will be more vulnerable to transient and intermittent faults because of physical-level problems in VLSI technologies. Because these faults make the Network-on-Chip (NOC) hardware of the MPSoCs less reliable, they set new fault-tolerance requirements for the messaging-layer software that applications use for communication. This paper presents the Micron Message-Passing (MMP) Protocol, a lightweight protocol designed to improve the fault tolerance of the messaging layer of MPSoCs that use the Micronmesh NOC. Its fault tolerance is implemented with watchdog timers and Cyclic Redundancy Checks (CRC), which can detect packet losses, communication deadlocks, and bit errors. These two mechanisms are necessary because without them the MPSoC software cannot detect faults and recover from them. This paper also presents how the MMP Protocol can be used in applications that are able to recover from communication faults.
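The combination of a watchdog timer and a CRC can be illustrated with a minimal sketch. It is not the MMP Protocol implementation: the framing (payload followed by a 4-byte CRC-32), the tick-based timer, and the names `make_packet`, `check_packet`, `Watchdog`, and `receive` are all assumptions made for illustration, and the CRC polynomial used by the actual protocol may differ. In this sketch a missing packet is flagged by the timer and a corrupted packet by the CRC.

```python
import struct
import zlib


def make_packet(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload (hypothetical framing, not the MMP format)."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def check_packet(packet: bytes):
    """Return the payload if the CRC matches, otherwise None (bit error detected)."""
    if len(packet) < 4:
        return None
    payload = packet[:-4]
    (crc,) = struct.unpack(">I", packet[-4:])
    return payload if zlib.crc32(payload) == crc else None


class Watchdog:
    """Software watchdog that expires after `timeout` ticks without a restart."""

    def __init__(self, timeout: int):
        self.timeout = timeout
        self.remaining = timeout

    def kick(self) -> None:
        """Restart the countdown."""
        self.remaining = self.timeout

    def tick(self) -> bool:
        """Advance one tick; return True once the timer has expired."""
        if self.remaining > 0:
            self.remaining -= 1
        return self.remaining == 0


def receive(channel, timeout_ticks: int):
    """Poll `channel`, an iterator yielding one packet or None per tick.

    Returns ("ok", payload), ("crc_error", None), or ("timeout", None).
    """
    watchdog = Watchdog(timeout_ticks)
    for item in channel:
        if item is not None:
            payload = check_packet(item)
            if payload is None:
                return ("crc_error", None)
            return ("ok", payload)
        if watchdog.tick():
            # Nothing arrived in time: treat it as a lost packet or a deadlock.
            return ("timeout", None)
    return ("timeout", None)
```

An application that is able to recover could, for example, request a retransmission on `"crc_error"` and reset the communication channel on `"timeout"`; these recovery actions are likewise only illustrative.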