Network-on-Chip


Introduction

The integration of an entire system onto the same silicon die (System-on-Chip, SoC) has become technically feasible as a result of the increasing integration densities made available by deep sub-micron technologies and of the computational requirements of the most aggressive applications in the multimedia, automotive and ambient intelligence domains.

SoCs represent high-complexity, high-value semiconductor products that incorporate building blocks from multiple sources (either developed in-house or externally supplied): in particular, general-purpose fully programmable processors, co-processors, DSPs, dedicated hardware accelerators, memory blocks, I/O blocks, etc. Even though commercial products currently integrate only a few cores (e.g., a general-purpose RISC MIPS core and a VLIW Trimedia processor in the Philips Nexperia platform for multimedia applications), in the next few years technology will allow the integration of thousands of cores, making enormous computational power available.

In contrast to past projections, which assumed that technology advances only needed to be linear and that all semiconductor products would deploy them, today the introduction of new technology solutions is increasingly application driven. As an example, consider ambient intelligence, widely regarded as the new paradigm for consumer electronics. Systems designed for ambient intelligence will be based on high-speed digital signal processing, with computational loads ranging from 10 MOPS for lightweight audio processing, through 3 GOPS for video processing and 20 GOPS for multilingual conversation interfaces, up to 1 TOPS for synthetic video generation. This computational challenge will have to be addressed at manageable power levels and affordable costs, and a single processor will not suffice, thus driving the development of ever more complex multi-processor SoCs (MPSoCs).

In this context, the performance of gigascale SoCs will be limited by the ability to efficiently interconnect predesigned and pre-verified functional blocks and to accommodate their communication requirements; that is, it will be communication-dominated rather than computation-dominated. Only an interconnect-centric system architecture will be able to cope with these new challenges.

Current on-chip interconnects consist of low-cost shared communication resources, where arbitration logic is needed to serialize bus access requests: only one master at a time can drive the bus. In spite of its low complexity, the main drawback of this solution is its lack of scalability, which will result in unacceptable performance degradation (e.g., contention-related delays for bus accesses) when the level of SoC integration exceeds a dozen cores. Moreover, connecting new blocks to a shared bus increases its load capacitance, resulting in more energy-consuming bus transactions.



State-of-the-art communication architectures make use of evolutionary approaches, such as full or partial crossbars, allowing a higher degree of parallelism in accessing communication resources, but in the long term more aggressive solutions are required.

A scalable communication infrastructure that better supports the trend of SoC integration consists of an on-chip packet-switched micro-network of interconnects, generally known as a Network-on-Chip (NoC) architecture. The basic idea is borrowed from traditional large-scale multi-processors and the wide-area network domain, and envisions on-chip router (or switch)-based networks on which packetized communication takes place, as depicted in Fig. 1. Cores access the network by means of proper interfaces and have their packets forwarded to their destinations through multihop routing paths.

The scalable and modular nature of NoCs and their support for efficient on-chip communication potentially lead to NoC-based multi-processor systems characterized by high structural complexity and functional diversity.



Design Challenges for On-Chip Communication Architectures

Designing communication architectures for highly integrated deep sub-micron SoCs is a non-trivial task that needs to take into account the main challenges posed by technology scaling and by exponentially increasing system complexity. A few relevant SoC design issues are hereafter discussed:

Technology issues: While gate delays scale down with technology, global wire delays typically increase or remain constant as repeaters are inserted. It is estimated that in 50 nm technology, at a clock frequency of 10 GHz, a global wire delay can be up to 6–10 clock cycles. Therefore, limiting the on-chip distance travelled by critical signals will be key to guaranteeing the performance of the overall system, and will be a common design guideline for all kinds of system interconnects. Global synchronization of cores on future SoCs will be unfeasible due to deep sub-micron effects (clock skew, power associated with clock distribution trees, etc.), and an alternative scenario consists of self-synchronous cores that communicate with one another through a network-centric architecture. Finally, signal integrity issues (cross-talk, power supply noise, soft errors, etc.) will lead to more transient and permanent failures of signals, logic values, devices and interconnects, thus raising the reliability concern for on-chip communication. In many cases, on-chip networks can be designed as regular structures, allowing the electrical parameters of wires to be optimized and well controlled. This leads to lower communication failure probabilities, thus enabling the use of low-swing signalling techniques, and to the capability of exploiting performance optimization techniques such as wavefront pipelining.

Performance issues: In traditional busses, all communication actors share the same bandwidth. As a consequence, performance does not scale with the level of system integration, but degrades significantly. However, once the bus is granted to a master, access occurs with no additional delay. NoCs, on the contrary, can provide much better performance scalability: no delay is incurred in accessing the communication infrastructure, since multiple outstanding transactions originated by multiple cores can be handled at the same time, resulting in more efficient network resource utilization. However, for a given network dimension (e.g., number of instantiated switches), large latency fluctuations for packet delivery could be experienced as a consequence of network congestion. This is unacceptable when hard real-time constraints of an application have to be met, and two solutions are viable: network over-dimensioning (for NoCs designed to support best-effort traffic only) or the implementation of dedicated mechanisms to provide guarantees for timing-constrained traffic (e.g., loss-less data transport, minimal bandwidth, bounded latency, minimal throughput, etc.).

Design productivity issues: It is well known that synthesis and compiler technology development does not keep up with IC manufacturing technology development. Moreover, time-to-market needs to be kept as low as possible. The reuse of complex pre-verified design blocks is an efficient means to increase productivity, and applies both to computation resources and to the communication infrastructure. It would be highly desirable to have processing elements that could be employed in different platforms by means of a plug-and-play design style. To this purpose, a scalable and modular on-chip network represents a more efficient communication infrastructure compared with shared bus-based architectures. Moreover, the reuse of processing elements is facilitated by the definition of standard network interfaces, which also make the modularity property of the NoC effective. The Virtual Socket Interface Alliance (VSIA) has attempted to set the characteristics of this interface industry-wide; OCP is another example of a standard interface socket for cores. It is worth remarking that such network interfaces also decouple the development of new cores from the evolution of new communication architectures. Finally, NoC components (e.g., switches or interfaces) can be instantiated multiple times in the same design (as opposed to the arbiter of traditional shared busses, which is instance-specific) and reused in a large number of products targeting a specific application domain.


NEW DESIGN APPROACH

Network engineers have already gained experience with using stochastic techniques and models for large-scale designs. We propose borrowing models, techniques, and tools from the network design field and applying them to SoC design. We view a SoC as a micronetwork of components. The network is the abstraction of the communication among components and must satisfy quality-of-service requirements—such as reliability, performance, and energy bounds—under the limitation of intrinsically unreliable signal transmission and significant communication delays on wires. We propose using the micronetwork stack paradigm, an adaptation of the protocol stack shown in Figure to abstract the electrical, logic, and functional properties of the interconnection scheme.



SoCs differ from wide area networks in their local proximity and because they exhibit less nondeterminism. Local, high-performance networks—such as those developed for large-scale multiprocessors— have similar requirements and constraints. Some distinctive characteristics, such as energy constraints and design-time specialization, are unique to SoC networks, however.

Whereas computation and storage energy greatly benefit from device scaling, which provides smaller gates and memory cells, the energy for global communication does not scale down. On the contrary, as the “Wiring Delays” sidebar indicates, projections based on current delay optimization techniques for global wires show that global on-chip communication will require increasingly higher energy consumption. Hence, minimizing the energy used for communication will be a growing concern in future technologies. Further, network traffic control and monitoring can help better manage the power that networked computational resources consume. For example, the clock speed and voltage of end nodes can vary according to available network bandwidth.

Another facet of the SoC network design problem, design-time specialization, raises many new challenges. Macroscopic networks emphasize general-purpose communication and modularity. Communication network design has traditionally been decoupled from specific end applications and is strongly influenced by standardization and compatibility constraints in legacy network infrastructures. In SoC networks, these constraints are less restrictive because developers design the communication network fabric on silicon from scratch. Thus, only the abstract network interface for the end nodes requires standardization. Developers can tailor the network architecture itself to the application, or class of applications, the SoC design targets.

We thus envision a vertical design flow in which every layer of the micronetwork stack is specialized and optimized for the target application domain. Such an application-specific on-chip network-synthesis paradigm represents an open and exciting research field. Specialization does not imply complete loss of flexibility, however. From a design standpoint, network reconfigurability will be key in providing plug-and-play component use because the components will interact with one another through reconfigurable protocols.


NoC architectures


SPIN

The Scalable, Programmable, Integrated Network (SPIN) on-chip micronetwork defines packets as sequences of 32-bit words, with the packet header fitting in the first word. SPIN uses a byte in the header to identify the destination, allowing the network to scale up to 256 terminal nodes. Other bits carry packet tagging and routing information, and the packet payload can be of variable size. A trailer—which does not contain data, but a checksum for error detection—terminates every packet. SPIN has a packetization overhead of two words. The payload should thus be significantly larger than two words to amortize the overhead.
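The packet format described above can be sketched in code. The field layout below (destination in the low byte of the header, an additive checksum in the trailer) is an illustrative assumption rather than the actual SPIN bit-level encoding; what it does reproduce faithfully is the two-word packetization overhead.

```python
# Illustrative sketch of SPIN-style packetization: 32-bit words, a one-word
# header whose low byte identifies the destination (up to 256 nodes), a
# variable-size payload, and a one-word trailer carrying a checksum for
# error detection. Field layout and checksum choice are assumptions.

def checksum(words):
    """Simple additive checksum over 32-bit words (illustrative only)."""
    return sum(words) & 0xFFFFFFFF

def packetize(dest, payload):
    """Build a packet: one header word + payload words + one trailer word."""
    assert 0 <= dest < 256            # one header byte addresses 256 nodes
    header = dest & 0xFF              # tagging/routing bits omitted here
    trailer = checksum([header] + payload)
    return [header] + payload + [trailer]

def overhead_ratio(payload_words):
    """Fraction of the packet spent on header + trailer (2 words)."""
    return 2 / (payload_words + 2)
```

A two-word payload yields a four-word packet (50% overhead), which is why the payload should be significantly larger than two words.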



The SPIN micronetwork adopts cut-through switching to minimize message latency and storage requirements in the design of network switches. However, it provides some extra buffering space on output links to store data from blocked packets. Figure B shows SPIN’s fat-tree network architecture, which derives its name from the progressively increasing communication bandwidth toward the root. The architecture is nonblocking when packet size is limited to a single word. Because packets can span more than one switch, SPIN’s blocking is a side effect of cut-through switching alone.

SPIN uses deterministic routing, with routing decisions set by the network architecture. In fat-tree networks, tree routing is the algorithm of choice. The network routes packets from a node, or tree leaf, toward the tree root until they reach a switch that is a common ancestor with the destination node. At that point, the network routes the packet toward the destination by following the unique path between the ancestor and destination nodes.
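Tree routing can be sketched with integer leaf addresses: climbing one level of a 4-ary tree corresponds to dividing the address by 4, and the common ancestor is the first level at which source and destination coincide. The addressing scheme and hop counting below are simplifying assumptions, not the SPIN routing tables.

```python
# Sketch of least-common-ancestor (tree) routing in a 4-ary fat tree:
# a packet climbs toward the root until the source and destination leaves
# fall under a common ancestor switch, then descends along the unique
# downward path. Leaf numbering and arity are illustrative assumptions.

def lca_level(src, dst, arity=4):
    """Smallest tree level at which src and dst share an ancestor switch."""
    level = 0
    while src != dst:          # move up one tree level at a time
        src //= arity
        dst //= arity
        level += 1
    return level

def route_hops(src, dst, arity=4):
    """Link traversals: climb to the common ancestor, then descend."""
    return 2 * lca_level(src, dst, arity)
```

For example, leaves 0 and 1 share their first-level switch (two link hops), while leaves 0 and 5 must climb to level 2 before descending.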

CLICHÉ

Kumar et al. have proposed a mesh-based interconnect architecture called CLICHÉ (Chip-Level Integration of Communicating Heterogeneous Elements). This architecture consists of an m × n mesh of switches interconnecting computational resources (IPs) placed alongside the switches, as shown in Fig. 1b for the particular case of 16 functional IP blocks. Every switch, except those at the edges, is connected to four neighboring switches and one IP block. In this case, the number of switches is equal to the number of IPs. The IPs and the switches are connected through communication channels. A channel consists of two unidirectional links between two switches or between a switch and a resource.



TORUS

Dally and Towles have proposed a 2D torus as an NoC architecture, shown in Fig. The torus architecture is basically the same as a regular mesh; the only difference is that the switches at the edges are connected to the switches at the opposite edge through wrap-around channels. Every switch has five ports, one connected to the local resource and the others connected to the closest neighboring switches. Again, the number of switches is S = N. The long end-around connections can, however, yield excessive delays. This can be avoided by folding the torus, as shown in Fig., which results in a layout better suited to VLSI implementation; consequently, in our further comparative analysis we consider the folded torus.
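The wrap-around property is easy to state precisely: neighbor coordinates are taken modulo the mesh dimension. The following sketch (a square k × k layout is an illustrative assumption) shows why every torus switch, even a corner one, keeps full degree.

```python
# Neighbor computation in a k x k 2D torus: edge switches wrap around to
# the opposite edge, so every switch has exactly four mesh neighbors in
# addition to its local resource port. The square layout is an assumption.

def torus_neighbors(x, y, k):
    """The four neighbors of switch (x, y), with wrap-around channels."""
    return [((x + 1) % k, y), ((x - 1) % k, y),
            (x, (y + 1) % k), (x, (y - 1) % k)]
```

In a plain mesh a corner switch has only two neighbors; in the torus, corner switch (0, 0) of a 4 × 4 network also reaches (3, 0) and (0, 3) through the wrap-around channels.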


OCTAGON

Karim et al. have proposed the OCTAGON MP-SoC architecture. Fig. shows a basic octagon unit consisting of eight nodes and 12 bidirectional links. Each node is associated with a processing element and a switch. Communication between any pair of nodes takes at most two hops within the basic octagonal unit. For a system consisting of more than eight nodes, the octagon is extended to multidimensional space. The scaling strategy is as follows: each octagon node is indexed by a 2-tuple (i, j), with i, j ∈ [0, 7]. For each i = I, I ∈ [0, 7], an octagon is constructed using the nodes {(I, j), j ∈ [0, 7]}, which results in eight individual octagon structures. These octagons are then connected by linking the corresponding i nodes according to the octagon configuration. Each node (I, J) thus belongs to two octagons: one consisting of the nodes {(I, j), j ∈ [0, 7]} and the other consisting of the nodes {(i, J), i ∈ [0, 7]}. Of course, this type of interconnection mechanism may significantly increase the wiring complexity.
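The two-hop property of the basic unit can be checked numerically. The sketch below assumes the 12 bidirectional links are the 8 ring links plus the 4 diagonals joining opposite nodes (a common reading of the Octagon topology, stated here as an assumption); under that wiring, breadth-first search confirms that no pair of nodes is more than two hops apart.

```python
from collections import deque

# Basic octagon unit: 8 nodes, 12 bidirectional links, assumed to be the
# 8 ring links plus the 4 diagonals between opposite nodes. BFS verifies
# the at-most-two-hops claim for every source/destination pair.

def octagon_links():
    ring = [(i, (i + 1) % 8) for i in range(8)]    # 8 ring links
    diagonals = [(i, i + 4) for i in range(4)]     # 4 cross links
    return ring + diagonals                        # 12 links in total

def hops(src, dst, links):
    """Minimum hop count between two nodes, by breadth-first search."""
    adj = {n: [] for n in range(8)}
    for a, b in links:
        adj[a].append(b)
        adj[b].append(a)
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == dst:
            return d
        for n in adj[node]:
            if n not in seen:
                seen.add(n)
                frontier.append((n, d + 1))
```

For instance, nodes 0 and 3 are three ring hops apart, but only two hops via the 0–4 diagonal followed by the 4–3 ring link.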


Butterfly Fat-Tree

As shown in Fig. 1f, in our network the IPs are placed at the leaves and the switches at the vertices. A pair of coordinates, (l, p), labels each node, where l denotes the node’s level and p its position within that level. At the lowest level there are N functional IPs with addresses ranging from 0 to (N-1); the pairs (0, 0) through (0, N-1) therefore denote the IP locations at this lowest level. Each switch, denoted S(l, p), has four child ports and two parent ports. The IPs are connected to N/4 switches at the first level. At the jth level of the tree there are N/2^(j+1) switches. The number of switches in the butterfly fat-tree architecture converges to a constant independent of the number of levels: considering the 4-ary tree of Fig. 1f, with four down links corresponding to child ports and two up links corresponding to parent ports, the number of switches at level j = 1 is N/4, and at each subsequent level the number of required switches reduces by a factor of 2. In this way, the total number of switches approaches S = N/2 as N grows arbitrarily large.
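The switch-count claim is a simple geometric series and can be verified numerically: with N/2^(j+1) switches at level j, the levels contribute N/4 + N/8 + ..., which stays below the N/2 bound and approaches it for large N. The helper names below are illustrative.

```python
# Numerical check of the butterfly fat-tree switch count: N/2**(j+1)
# switches at level j, summed over the levels of the tree.

def switches_at_level(n_ips, j):
    """Number of switches at level j for n_ips functional IPs."""
    return n_ips // 2 ** (j + 1)

def total_switches(n_ips, levels):
    """Total switches across levels 1..levels."""
    return sum(switches_at_level(n_ips, j) for j in range(1, levels + 1))
```

For 64 IPs and three levels this gives 16 + 8 + 4 = 28 switches, just below the N/2 = 32 asymptotic bound.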




SWITCHING METHODOLOGIES

Switching techniques determine when and how internal switches connect their inputs to outputs and the time at which message components may be transferred along these paths. For uniformity, we apply the same approach for all NoC architectures. There are different types of switching techniques, namely,

Circuit Switching
Packet Switching, and
Wormhole Switching

In circuit switching, a physical path from source to destination is reserved prior to the transmission of the data, and the path is held until all the data has been transmitted. The advantage of this approach is that the network bandwidth is reserved for the entire duration of the transmission. However, valuable resources are also tied up for that duration, and the set-up of an end-to-end path causes unnecessary delay.


In packet switching, data is divided into fixed-length blocks called packets and, instead of establishing a path before sending any data, the source transmits a packet whenever it has one ready. The need to store entire packets in a switch makes the buffer requirement of conventional packet switching high. In an SoC environment, however, switches should not consume a large fraction of silicon area compared to the IP blocks.



In wormhole switching, the packets are divided into fixed length flow control units (flits) and the input and output buffers are expected to store only a few flits.



As a result, the buffer space requirement in the switches can be small compared to that generally required for packet switching. Thus, using a wormhole switching technique, the switches will be small and compact. The first flit, i.e., header flit, of a packet contains routing information. Header flit decoding enables the switches to establish the path and subsequent flits simply follow this path in a pipelined fashion. As a result, each incoming data flit of a message packet is simply forwarded along the same output channel as the preceding data flit and no packet reordering is required at destinations. If a certain flit faces a busy channel, subsequent flits also have to wait at their current locations.

One drawback of this simple wormhole switching method is that the transmission of distinct messages cannot be interleaved or multiplexed over a physical channel: messages must cross the channel in their entirety before the channel can be used by another message. This decreases channel utilization if a flit from a given packet is blocked in a buffer. By introducing virtual channels in the input and output ports, we can increase channel utilization considerably. If a flit belonging to a particular packet is blocked in one of the virtual channels, then flits of alternate packets can use the other virtual channel buffers and, ultimately, the physical channel. The canonical architecture of a switch having virtual channels is shown in Fig.
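The difference can be illustrated with a toy model of one physical channel that moves one flit per cycle. Without virtual channels a whole message must cross before the next starts; with two virtual channels served round-robin, flits of distinct packets interleave. This is a deliberately simplified sketch, not a cycle-accurate switch model.

```python
# Toy model of flit transmission over a single physical channel,
# contrasting plain wormhole switching with two virtual channels.

def single_channel(pkt_a, pkt_b):
    """Packet B cannot start until packet A has fully crossed."""
    return list(pkt_a) + list(pkt_b)

def two_virtual_channels(pkt_a, pkt_b):
    """Round-robin the physical channel between two virtual channels."""
    out, queues = [], [list(pkt_a), list(pkt_b)]
    while any(queues):
        for q in queues:          # one flit per VC per round, if ready
            if q:
                out.append(q.pop(0))
    return out
```

If packet A were blocked mid-transfer, the single channel would sit idle, whereas the virtual-channel version could keep draining packet B's flits.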

PERFORMANCE METRICS

To compare and contrast different NoC architectures, a standard set of performance metrics can be used. For example, it is desirable that an MP-SoC interconnect architecture exhibit high throughput, low latency, energy efficiency, and low area overhead. In today’s power-constrained environments, it is increasingly critical to identify the most energy-efficient architectures and to quantify the energy-performance trade-offs. Generally, the additional area overhead due to the infrastructure IPs should be reasonably small. We now describe these metrics in more detail.

1. Message Throughput

Typically, the performance of a digital communication network is characterized by its bandwidth in bits/sec. However, we are more concerned here with the rate at which message traffic can be sent across the network, so throughput is a more appropriate metric. Throughput can be defined in a variety of ways depending on the specifics of the implementation. For message passing systems, we can define message throughput, TP, as follows:

TP = (Total messages completed × Message length) / (Number of IP blocks × Total time)

where Total messages completed refers to the number of whole messages that successfully arrive at their destination IPs, Message length is measured in flits, Number of IP blocks is the number of functional IP blocks involved in the communication, and Total time is the time (in clock cycles) that elapses between the first message generation and the last message reception. Message throughput is thus measured as the fraction of the maximum load that the network is capable of physically handling. An overall throughput of TP = 1 corresponds to all end nodes receiving one flit every cycle; accordingly, throughput is measured in flits/cycle/IP. Throughput signifies the maximum value of the accepted traffic and is related to the peak data rate sustainable by the system.
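The throughput metric can be computed directly from the quantities just defined; the numbers in the example are illustrative.

```python
# Message throughput TP in flits/cycle/IP: the fraction of the peak load
# (one flit per IP per cycle) that the network actually sustained.

def message_throughput(messages_completed, message_length_flits,
                       num_ip_blocks, total_cycles):
    return (messages_completed * message_length_flits) / \
           (num_ip_blocks * total_cycles)

# 1,000 completed 16-flit messages across 16 IPs in 2,000 cycles:
tp = message_throughput(1000, 16, 16, 2000)   # 0.5 flits/cycle/IP
```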

2. Transport Latency

Transport latency is defined as the time (in clock cycles) that elapses between the injection of a message header into the network at the source node and the reception of the tail flit at the destination node. We refer to this simply as latency in the remainder of this paper. In order to reach the destination node from the source node, flits must travel through a path consisting of a set of switches and interconnects, called stages. Depending on the source/destination pair and the routing algorithm, each message may have a different latency. There is also some overhead at the source and destination that contributes to the overall latency. Therefore, for a given message i, the latency Li is:

Li = sender overhead + transport latency + receiver overhead

We use the average latency as a performance metric in our evaluation methodology. Let P be the total number of messages reaching their destination IPs and let Li be the latency of each message i, where i ranges from 1 to P. The average latency, Lavg, is then calculated as:

Lavg = (L1 + L2 + ... + LP) / P
3. Energy

When flits travel on the interconnection network, both the interswitch wires and the logic gates in the switches toggle and this will result in energy dissipation. Here, we are concerned with the dynamic energy dissipation caused by the communication process in the network. The flits from the source nodes need to traverse multiple hops consisting of switches and wires to reach destinations. Consequently, we determine the energy dissipated by the flits in each interconnect and switch hop. The energy per flit per hop is given by

Ehop = Eswitch + Einterconnect

where Eswitch and Einterconnect depend on the total capacitances and signal activity of the switch and each section of interconnect wire, respectively. They are determined as follows:
Eswitch = αswitch Cswitch V²
Einterconnect = αinterconnect Cinterconnect V²

where αswitch, αinterconnect and Cswitch, Cinterconnect are the signal activities and the total capacitances of the switches and wire segments, respectively. The energy dissipated in transporting a packet consisting of n flits over h hops can be calculated as

Epacket = n × h × Ehop

Let P be the total number of packets transported, and let Epacket,i be the energy dissipated by the ith packet, where i ranges from 1 to P. The average energy per packet, Eavg, is then calculated as:

Eavg = (Epacket,1 + Epacket,2 + ... + Epacket,P) / P

The parameters αswitch and αinterconnect capture the fact that the signal activities in the switches and the interconnect segments are data-dependent; for example, there may be long sequences of 1s or 0s that do not cause any transitions. Any of the various low-power coding techniques aimed at minimizing the number of transitions could be applied to any of the topologies described here. For the sake of simplicity, and without loss of generality, we do not consider any specialized coding techniques in our analysis.
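The energy model above reduces to a few multiplications; the sketch below spells it out, with all numeric values as made-up placeholders.

```python
# Per-flit energy model: each hop dissipates the switch plus interconnect
# contribution, and a packet of n flits over h hops dissipates n*h*E_hop.

def e_hop(a_sw, c_sw, a_int, c_int, v):
    """E_hop = alpha_sw*C_sw*V^2 + alpha_int*C_int*V^2."""
    return a_sw * c_sw * v ** 2 + a_int * c_int * v ** 2

def e_packet(n_flits, n_hops, ehop):
    """Energy for one packet of n_flits flits crossing n_hops hops."""
    return n_flits * n_hops * ehop

def e_avg(packet_energies):
    """Average energy per packet over all delivered packets."""
    return sum(packet_energies) / len(packet_energies)
```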

4. Area Requirements

To evaluate the feasibility of these interconnect schemes, we consider their respective silicon area requirements. As the switches form an integral part of the active infrastructure, it is important to determine the amount of relative silicon area they consume. The switches have two main components: the storage buffers and the logic that implements routing and flow control. The storage buffers are the FIFOs at the inputs and outputs of the switch. Another source of silicon area overhead arises from the interswitch wires, which, depending on their lengths, may have to be buffered through repeater insertion to keep the interswitch delay within one clock cycle; this additional buffer area should also be taken into account. A further factor that needs to be considered when analyzing the area overhead is the wiring layout. One of the main advantages of the NoC design methodology is the division of long global wires into smaller segments, characterized by propagation times that are compatible with the clock cycle budget. All the NoC architectures considered here achieve this as a result of their inherent interconnect structure, but the segmented wire lengths vary from one topology to another. Consequently, for each architecture, the layout of interswitch wire segments presents a different degree of complexity. Architectures that possess longer interswitch wires will generally create more routing challenges than those possessing only shorter wire segments: long wires can block wiring channels, forcing the use of additional metal layers and causing other wires to become longer. Determining the distribution of interswitch wire lengths can therefore give a first-order indication of the overall wiring complexity.

5. Evaluation Methodology

In order to carry out a consistent comparison, we developed a simulator employing flit-level event-driven wormhole routing to study the characteristics of the communication-centric parameters of the interconnect infrastructures. In our experiments, the traffic injected by the functional IP blocks followed Poisson and self-similar distributions. In the past, a Poisson-distributed injection rate was frequently used when characterizing the performance of multiprocessor platforms; however, the self-similar distribution has been found to be a better match for real-world SoC scenarios. Each simulation was initially run for 1,000 cycles to allow transient effects to stabilize and was subsequently executed for 20,000 cycles. Using a flit counter at the destinations, we obtain the throughput as the number of flits reaching each destination per unit time. To calculate average latency and energy, we associate an ordered pair, (Lswitch, Eswitch), with each switch and an ordered pair, (Linterconnect, Einterconnect), with each interconnect segment, where Lswitch, Linterconnect and Eswitch, Einterconnect denote the delays and energies dissipated in the switch and interconnect, respectively. The average latency and energy dissipation are then calculated according to the expressions given above.
To estimate the silicon area consumed by the switches, we developed their VHDL models and synthesized them using a fully static, standard cell-based approach for a 0.13 µm CMOS technology library. Starting from this initial estimation, by using an ITRS (International Technology Roadmap for Semiconductors) suggested scaling factor of 0.7, we can project the area overhead in future technology nodes.


INFRASTRUCTURE IP DESIGN CONSIDERATIONS

One common characteristic of the communication-centric architectures described in this paper is that the functional IP blocks communicate with each other with the help of intelligent switches. The switches provide a robust data transport medium for the functional IP modules. To ensure the consistency of the comparisons we later make in this paper, we assume that similar types of switching and routing circuits are used in all cases. These designs are now described in more detail.

Switch Architecture

The different components of the switch port are shown in Fig. The switch mainly consists of input/output FIFO buffers, input/output arbiters, one-of-four MUX and DEMUX units, and a routing block. In order to achieve high throughput, we use a virtual-channel switch, where each port of the switch has multiple parallel buffers.



Each physical input port has more than one virtual channel, uniquely identified by its virtual channel identifier (VCID). Flits may simultaneously arrive at more than one virtual channel; as a result, an arbitration mechanism is necessary to allow only one virtual channel at a time to access a single physical port. If there are m virtual channels corresponding to each input port, we need an m:1 arbiter at the input. Similarly, flits from more than one input port may simultaneously try to access a particular output port: if k is the number of ports in a switch, we need a (k-1):1 arbiter at each output port. The routing logic block determines the output port to be taken by an incoming flit.

The operation of the switch consists of one or more processes, depending on the nature of the flit. In the case of a header flit, the processing sequence is: 1) input arbitration, 2) routing, and 3) output arbitration. In the case of body flits, switch traversal replaces the routing process, since the routing decision based on the header information is maintained for the subsequent body flits. The basic functionality of the input/output arbitration blocks does not vary from one architecture to another, whereas the design of the routing hardware depends on the specific topology and routing algorithm adopted. In order to make the routing logic simple, fast, and compact, we follow different forms of deterministic routing. In our routing schemes, we use distributed source routing, i.e., the source node determines only the neighboring nodes that are involved in message delivery. For the tree-based architectures (SPIN and BFT), the routing algorithm applied is least common ancestor (LCA) routing; for CLICHÉ and the folded torus, we apply e-cube (dimension-order) routing. In the case of Octagon, we adopt the hierarchical address-based routing proposed by Karim et al. The corresponding routing blocks have been implemented for all of the above-mentioned cases.

The arbiter circuit essentially consists of a priority matrix, which stores the priorities of the requesters, and grant generation circuits used to grant resources to requesters. The matrix arbiter stores the priorities among n requesters in a binary n-by-n matrix. Each matrix element [i, j] records the binary priority between a pair of inputs: for example, if requester i has a higher priority than requester j, the matrix element [i, j] is set to 1, while the corresponding element [j, i] is 0. A requester is granted the resource if no other higher-priority requester is bidding for the same resource. Once a requester succeeds in being granted a resource, its priority is updated and set to be the lowest among all requesters. A block diagram of the arbiter and one element of the priority matrix circuit is shown in Fig.
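The matrix-arbiter behavior described above can be modeled in a few lines: store the pairwise priorities, grant the requester that outranks every other active requester, then demote the winner. This is a software sketch of the logic, not the gate-level circuit.

```python
# Behavioral sketch of a matrix arbiter: a binary priority matrix, a grant
# to the highest-priority active requester, and a priority update that
# demotes the winner to lowest priority (giving round-robin-like fairness).

class MatrixArbiter:
    def __init__(self, n):
        # m[i][j] == 1 means requester i currently outranks requester j.
        self.n = n
        self.m = [[1 if i < j else 0 for j in range(n)] for i in range(n)]

    def grant(self, requests):
        """Grant the requester that outranks every other active requester."""
        winner = None
        for i in range(self.n):
            if requests[i] and all(self.m[i][j] or not requests[j]
                                   for j in range(self.n) if j != i):
                winner = i
                break
        if winner is not None:       # demote the winner below everyone else
            for j in range(self.n):
                if j != winner:
                    self.m[winner][j] = 0
                    self.m[j][winner] = 1
        return winner
```

With all three requesters active, a 3-input arbiter grants 0, then 1, then 2: each winner drops to lowest priority, so no requester can starve the others.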



The FIFO buffers are also critical components of the switch. Their operating speed should be high enough not to become a bottleneck in a high-speed network. More specifically, the switches at level one need to be interfaced with the SoC’s constituent IP blocks; hence, the switches should be able to receive and transmit data at the rated speed of the corresponding IPs. Furthermore, the FIFOs should be able to operate with different read and write clocks, as the SoC’s constituent IPs are expected to operate at different frequencies. Instead of using separate counters to implement read and write pointers, two tokens are circulated among the FIFO cells to implement read and write operations: a FIFO cell can be read from or written into only if it holds the corresponding token, and after a token is used in a given cell, it is passed on to the adjacent cell.

Virtual Channel Allocation

The virtual channel allocation determines which output virtual channel is taken by a message at each of the intermediate switch nodes. Each switch input port has a separate queue buffer corresponding to the virtual channels. When a flit first arrives at an input port, its type is decoded. If it is a header flit, then, according to its VCID field, it is stored in the corresponding virtual channel buffer. The routing logic determines the output port to be taken by this flit and assigns the incoming flit to an available output virtual channel. The VCID of the flit is modified accordingly. When the subsequent body flits arrive, they are queued into the buffer of the input virtual channel and subsequently inherit the particular output virtual channel reserved by the header. Instead of reserving output ports for the entire duration of a packet, the switch allocates output ports on a flit-by-flit basis.



Network Interfacing

The success of the NoC design paradigm relies greatly on the standardization of the interfaces between IP cores and the interconnection fabric. The Open Core Protocol (OCP) is an interface standard receiving wide industrial and academic acceptance. Using a standard interface should not impact the methodologies for IP core development. In fact, IP cores wrapped with a standard interface like the OCP interface will exhibit a higher reusability and greatly simplify the task of system integration. The network interface will have two functions:

  1. injecting/absorbing the flits leaving/arriving at the functional IP blocks;
  2. packetizing/depacketizing the signals coming from/reaching OCP-compatible cores in the form of messages/flits.
As shown in Fig. 5, for a core having both master and slave interfaces, the OCP-compliant signals coming out of the functional IP blocks are packetized by a second interface, which sits between the OCP instances and the communication fabric.


