System-on-Chip Design


Introduction

System-on-a-chip or system on chip (SoC or SOC) refers to integrating all components of a computer or other electronic system into a single integrated circuit (chip). It may contain digital, analog, mixed-signal, and often radio-frequency functions – all on one chip. A typical application is in the area of embedded systems.



A typical SoC consists of:
  1. One or more microcontroller, microprocessor or DSP core(s).
  2. Memory blocks including a selection of ROM, RAM, EEPROM and Flash.
  3. Timing sources including oscillators and phase-locked loops.
  4. Peripherals including counter-timers, real-time timers and power-on reset generators.
  5. External interfaces including industry standards such as USB, FireWire, Ethernet, USART, SPI.
  6. Analog interfaces including ADCs and DACs.
  7. Voltage regulators and power management circuits.
System-on-Chip

An IC designed by stitching together multiple stand-alone VLSI designs to provide the full functionality for an application.

  • Made possible by deep-submicron (DSM) technology
  • Mind-boggling design complexity due to the integration of multiple cores and embedded software
  • Conventionally, functionality, delay, power and testability were the main issues
  • New issues include interconnect delays, power and clock distribution, signal integrity, electromigration, transistor leakage, packaging effects, IR drop, etc.
  • By Moore’s law, silicon complexity doubles roughly every 18 months
  • Embedded software complexity is also increasing, at a rate higher than Moore’s law
  • Hence overall system complexity is compounded by the factors above
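
To get a feel for how quickly the doubling mentioned above compounds, here is a small sketch. The 18-month doubling period is taken from the text; the function name and the example time spans are illustrative assumptions, not figures from any roadmap.

```python
def growth_factor(years: float, doubling_period_months: float = 18.0) -> float:
    """Return the multiplicative growth after `years` for a fixed doubling period."""
    doublings = (years * 12.0) / doubling_period_months
    return 2.0 ** doublings


if __name__ == "__main__":
    # Example time spans are arbitrary and chosen only for illustration.
    for years in (1.5, 3, 6, 9):
        print(f"{years} years -> complexity x{growth_factor(years):.0f}")
```

Software complexity growing faster than this simply corresponds to a shorter doubling period passed as the second argument.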

Broad classification


ASIC vendor design – all components in the chip are designed and fabricated by the ASIC vendor

Integrated design – not all components are designed by the ASIC vendor. Cores can come from an IP vendor or foundry

Desktop design – designed by a fabless company using cores from IP vendors, EDA companies, etc.


Reusable Core or Virtual component or Macro


Reusable pre-designed and pre-verified cores form the basis of any SoC

Soft cores: synthesizable RTL, with the user responsible for the actual implementation and layout. E.g. virtually any digital design.

Firm cores: structurally and topologically optimized for performance and area, in the form of a synthesized netlist.

Hard cores: optimized for power, performance and area, and mapped to a particular technology as a fully placed and routed netlist. E.g. memory cells, ADCs, DACs, PLLs, microprocessors.


Design flow

An SoC consists of both the hardware described above and the software that controls the microcontroller, microprocessor or DSP cores, peripherals and interfaces. The design flow for an SoC aims to develop this hardware and software in parallel.



Most SoCs are developed from pre-qualified hardware blocks for the hardware elements described above, together with the software drivers that control their operation. Of particular importance are the protocol stacks that drive industry-standard interfaces like USB. The hardware blocks are put together using CAD tools; the software modules are integrated using a software development environment.

A key step in the design flow is emulation: the hardware is mapped onto an emulation platform based on a field programmable gate array (FPGA) that mimics the behavior of the SoC, and the software modules are loaded into the memory of the emulation platform. Once programmed, the emulation platform enables the hardware and software of the SoC to be tested and debugged at close to its full operational speed.

After emulation the hardware of the SoC follows the place and route phase of the design of an integrated circuit before it is fabricated.

Chips are verified for logical correctness before being sent to the foundry. This process is called ASIC verification. Verilog and VHDL are the typical hardware description languages used for verification. With the growing complexity of chips, hardware verification languages like SystemVerilog, SystemC, e, and OpenVera are also used. The bugs found in the verification stage are reported to the designer. Traditionally, about 70% of the time and effort in the chip design life cycle is said to be spent on verification.

Synthesis-based design vs Full-custom design

  • Full-custom design is used for analog blocks, PHYs for high-speed communication, etc.
  • With today’s tools and methodologies, the performance penalty for synthesis-based standard-cell design is quite small
  • Even in processor designs, only the data path uses hard macros; the control logic is synthesis-based
  • The decision is a balance of economics, time-to-market and performance

SoC Design Flow – Waterfall vs. Spiral

Waterfall
  • The project goes through various phases, never returning to a previous phase
  • HW and SW development are serialized
  • For DSM this model fails as HW & SW have to be concurrently developed
  • Physical design issues should be accounted for early in design process
Spiral
  • Multiple aspects of design are addressed simultaneously
  • Concurrent SW & HW development
  • Parallel verification and synthesis of modules
  • Physically aware synthesis
  • Planned iterations

SoC Design Flow – Top-Down vs. Bottom-Up
  • The top-down model is an ideal one which is never practiced
  • In a top-down methodology, if it is not feasible to design a lowest-level block, the specification process has to be revisited
  • In the real world a mixture of top-down and bottom-up is used
  • Critical low-level blocks can be built when the specs are refined
  • The availability of reusable hard and soft IPs helps facilitate this mixed methodology

System Design Process



1. System specification
  • Defines the behavior of the system
  • Most important part of a design phase
  • Specification documentation should also be done at the initial phase itself
  • Hardware
      1. Functionality
      2. External interfaces
      3. Register definitions
      4. Timing
      5. Area and power
  • Software
      1. Functionality
      2. Timing
      3. Performance
      4. Interface to HW

  • Formal specification
      1. Mechanism for specifying functionality, timing, power, area, etc.
      2. VSPEC
      3. More of a research area

  • Executable specification
      1. An abstract model in C, C++, SystemC, etc.
      2. Addresses functional, timing, area and power requirements
      3. Helps verify the functionality and HW/SW interfaces prior to detailed design


2. Develop a behavioral model

  • An executable specification, written in SystemC, C or C++, to test the basic algorithms at the system level
  • Can act as a golden reference model
3. Refine the behavioral model and develop a verification suite

  • The design may go through multiple representations

4. Hardware/software partitioning
  • Based on cost/performance tradeoffs
  • A library of pre-verified macros and software modules is required

5. Specify and develop a hardware architecture model

  • Memory architecture, bus structure and bandwidth are key factors
  • Architectures are evaluated by running application code
  • Transaction-level models (TLMs) written in SystemC can be used

6. Refine and test the architecture model

  • HW/SW cosimulation is done
  • SW can be debugged even without the actual HW
  • Fast and accurate hardware models are required

7. Specify implementation blocks
  • Function, performance, timing, power and area requirements of individual blocks
  • I/O pins, registers
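
Steps 2 and 3 above describe a behavioral model that serves as a golden reference, plus a verification suite used as the design is refined. A minimal sketch of that idea follows. The moving-sum filter, the function names and the random stimulus are all hypothetical choices made for illustration; the source does not prescribe any particular algorithm or test strategy.

```python
import random


def golden_moving_sum(samples, taps=4):
    """Behavioral (golden) model: sum of the last `taps` samples, written for clarity."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - taps + 1): i + 1]
        out.append(sum(window))
    return out


def refined_moving_sum(samples, taps=4):
    """Refined model, closer to hardware: a running accumulator plus a delay line."""
    delay_line = [0] * taps          # newest sample first, oldest sample last
    acc = 0
    out = []
    for s in samples:
        acc += s - delay_line[-1]    # add the newest sample, drop the oldest
        delay_line = [s] + delay_line[:-1]
        out.append(acc)
    return out


def run_suite(num_tests=100, length=32, seed=1):
    """Verification suite: compare the refined model against the golden model."""
    rng = random.Random(seed)
    for _ in range(num_tests):
        stimulus = [rng.randint(-128, 127) for _ in range(length)]
        if golden_moving_sum(stimulus) != refined_moving_sum(stimulus):
            return False
    return True


if __name__ == "__main__":
    print("refined model matches golden model:", run_suite())
```

The same stimulus and comparison can be reused at each later representation of the design, which is the practical value of keeping a golden reference.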

SoC Design Issues

a. Primary issues

  • Timing closure and functional verification
b. Interfaces and timing closure

  • In DSM interconnect delays dominate
  • Variance between actual delay and estimated delay is large
  • May require architectural changes affecting the entire design process
  • Physical synthesis (combining synthesis with timing-driven placement) and timing-driven routing help alleviate this problem
  • Registered inputs and outputs for a macro make the timing closure problem local – essential for a reusable IP
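
The effect of a large gap between estimated and actual interconnect delay can be illustrated with a simple setup-slack calculation. This is a sketch under simplifying assumptions: clock skew and jitter are ignored, and every delay value below is invented for illustration rather than taken from a real library.

```python
def setup_slack_ns(clock_period_ns, clk_to_q_ns, logic_ns, wire_ns, setup_ns):
    """Setup slack of a register-to-register path; a negative value is a violation."""
    arrival = clk_to_q_ns + logic_ns + wire_ns      # when data reaches the capturing flop
    required = clock_period_ns - setup_ns           # latest time data may arrive
    return required - arrival


if __name__ == "__main__":
    # Hypothetical numbers: pre-layout wire estimate vs. post-layout extracted delay.
    estimated = setup_slack_ns(5.0, 0.3, 2.5, wire_ns=1.0, setup_ns=0.2)
    actual = setup_slack_ns(5.0, 0.3, 2.5, wire_ns=2.4, setup_ns=0.2)
    print(f"slack with estimated wire delay: {estimated:+.2f} ns")
    print(f"slack with actual wire delay:    {actual:+.2f} ns")
```

With registered macro inputs and outputs, the `wire_ns` term between macros is the only thing on the path besides the flops themselves, which is why the problem becomes local to each macro.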

c. Synchronous vs. Asynchronous

  • Designs should be synchronous and flip-flop based
  • Earlier, latches were considered to offer greater density and performance than flops
  • In DSM, interconnect delays dominate and the latch performance advantage is insignificant
  • Latch timing is ambiguous, and designers exploit this through time borrowing to improve timing
  • Timing analysis of latch-based designs is difficult: is the circuit slow, or is it due to time borrowing?
d. Reset strategy

  • Issues
      ◦ Synchronous or asynchronous?
      ◦ Internal or external power-on reset (POR)?
      ◦ Is a soft reset scheme needed?
  • Synchronous reset
      ◦ Treated as just another input
      ◦ Requires a free-running clock
  • Asynchronous reset
      ◦ No free-running clock requirement
      ◦ Requires special reset lines, similar to clock lines
  • Reset should always be synchronously de-asserted so that all flops exit reset on the same clock
  • Asynchronous reset is preferred for reusable macros
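
One common way to combine asynchronous assertion with synchronous de-assertion is a two-flop reset synchronizer. The source does not name a specific circuit, so the behavioral model below is an assumption, and the class and signal names are hypothetical.

```python
class ResetSynchronizer:
    """Two-flop reset synchronizer: asynchronous assertion, synchronous de-assertion.

    All resets are active low. `rst_n_sync` is what the rest of the logic would see.
    """

    def __init__(self):
        self.rst_n_async = 0
        self.ff1 = 0
        self.rst_n_sync = 0

    def set_async_reset(self, rst_n_async: int) -> None:
        """Drive the asynchronous reset pin; assertion takes effect without a clock."""
        self.rst_n_async = rst_n_async
        if rst_n_async == 0:
            self.ff1 = 0
            self.rst_n_sync = 0

    def clock_edge(self) -> int:
        """Model one rising clock edge and return the synchronized reset."""
        if self.rst_n_async == 1:
            self.rst_n_sync = self.ff1   # second flop samples the old first-flop value
            self.ff1 = 1                 # first flop samples the constant 1
        return self.rst_n_sync


if __name__ == "__main__":
    sync = ResetSynchronizer()
    sync.set_async_reset(0)              # assert reset, no clock needed
    sync.set_async_reset(1)              # release reset at an arbitrary time
    print([sync.clock_edge() for _ in range(3)])
```

In this model the internal reset is released only on a clock edge, two edges after the external pin is released, so every flop driven by `rst_n_sync` leaves reset on the same clock.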

e. Timing exceptions

  • Fully synchronous design is desirable
  • Tools tend to support a synchronous design
  • Asynchronous signals, false paths and multi-cycle paths should be fully specified
  • If they are not, the tool may focus on these paths and fail to optimize the real critical paths

f. Clocking

  • Earlier, latches were considered to offer greater density and performance than flops
  • In DSM, interconnect delays dominate and the latch performance advantage is insignificant
  • Latch timing is ambiguous, and designers exploit this through time borrowing to improve timing
  • Timing analysis of latch-based designs is difficult: is the circuit slow, or is it due to time borrowing?

g. No Tristate on-chip buses

  • Tristate buses are used at board level because they reduce the number of wires
  • Multiple drivers active simultaneously can cause reliability issues
  • Floating tristate buses can cause leakage at the receiver
  • It is difficult to guarantee the timing-critical control signals across different technologies, especially at power-on
  • Only mux-based buses should be used on-chip
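
The contrast between the two bus styles can be sketched behaviorally. In the toy model below (function names and values are hypothetical), the tristate bus has two failure modes, contention and floating, while the mux-based bus always has exactly one selected source.

```python
from typing import Optional, Sequence


def tristate_bus(values: Sequence[int], enables: Sequence[bool]) -> Optional[int]:
    """Tristate bus: returns the driven value, or None if the bus is floating.

    Raises RuntimeError when more than one driver is enabled (contention).
    """
    active = [v for v, en in zip(values, enables) if en]
    if len(active) > 1:
        raise RuntimeError("bus contention: multiple drivers enabled")
    return active[0] if active else None


def mux_bus(values: Sequence[int], select: int) -> int:
    """Mux-based bus: the select input always picks exactly one source."""
    if not 0 <= select < len(values):
        raise ValueError("select out of range")
    return values[select]


if __name__ == "__main__":
    sources = [0x10, 0x20, 0x30]
    print(tristate_bus(sources, [False, False, False]))   # floating bus
    print(hex(mux_bus(sources, 1)))                       # always a defined value
```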

h. Guidelines for on-chip bus

  • An on-chip bus architecture should have
      ◦ Separate address, data and control buses
      ◦ Support for multiple masters
      ◦ Fully synchronous, multiple-cycle transactions
  • Choose an industry-standard bus architecture
      ◦ AMBA, IBM CoreConnect, Wishbone, etc.
  • For point-to-point connections, such as from a communication interface to its PHY, use standard interfaces
  • Choose IPs compatible with selected on-chip bus

i. On-chip debug


  • Plan for debug features early in your design cycle; otherwise the penalty in terms of cost and time can be very high
  • Without debug structures it is very difficult to test an SoC
  • Controllability
      ◦ Turn each macro off or on, or put it in debug mode
  • Observability
      ◦ Additional logic to monitor internal nodes and buses to check and detect illegal transactions
      ◦ Bring out internal nodes to existing I/O pins by using a mux


j. Power

  • Static power
      ◦ A function of process and library
      ◦ Today’s low-voltage processes offer multi-Vt libraries
      ◦ Low-Vt libraries offer high performance but also high leakage
      ◦ High-Vt libraries offer low leakage but with a performance penalty
      ◦ Techniques such as turning off power to low-Vt blocks are being explored
  • Dynamic power
      ◦ $P_{dyn} \propto k \, C \, V^2 \, f$
      ◦ k: switching activity
      ◦ C: load capacitance
      ◦ V: supply voltage
      ◦ f: operating frequency
      ◦ In low-power design k, C and V are minimized
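
The dynamic power relation above (power proportional to k·C·V²·f) can be turned into a quick calculation. This is a minimal sketch: the proportionality constant is taken as 1, and the activity, capacitance, voltage and frequency values are made up for illustration.

```python
def dynamic_power_w(k: float, c_farads: float, v_volts: float, f_hz: float) -> float:
    """Dynamic power estimate k * C * V^2 * f (proportionality constant assumed to be 1)."""
    return k * c_farads * v_volts ** 2 * f_hz


if __name__ == "__main__":
    # Hypothetical block: 10% switching activity, 1 nF total load, 100 MHz clock.
    p_high = dynamic_power_w(k=0.1, c_farads=1e-9, v_volts=1.2, f_hz=100e6)
    p_low = dynamic_power_w(k=0.1, c_farads=1e-9, v_volts=0.9, f_hz=100e6)
    print(f"at 1.2 V: {p_high * 1e3:.2f} mW")
    print(f"at 0.9 V: {p_low * 1e3:.2f} mW")
    print(f"ratio: {p_low / p_high:.4f}")
```

Because V enters squared, lowering the supply is the most effective knob, while k and C only scale the result linearly.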


k. Two typical scenarios for power reduction

  • High-performance designs where low power is a secondary issue – laptop computers
  • Extremely low-power designs – handheld devices
l. Power reduction techniques

  • Run the core at the lowest supply voltage
      ◦ Primarily affects timing
      ◦ To compensate for the performance degradation, parallelism, pipelining, etc. can be used, but the performance increase may not be proportional due to the increased area and capacitance
      ◦ Typically the I/O supply will be higher (3.3 V, 5 V) to meet board-level requirements
  • Reducing capacitance and switching activity
      ◦ (a) Appropriate memory architectures
          ◦ Instead of a single memory, partition it into several blocks
          ◦ Only the accessed block needs to be powered
      ◦ (b) Clock distribution
          ◦ The clock distribution network consumes power
          ◦ Single-clock flop-based designs are better than latch-based designs with dual non-overlapping clocks
          ◦ Clock gating – at block level or for an individual flop
          ◦ Clock gating circuits tend to be technology dependent
          ◦ Block-level gating is preferred for reusable designs
  • Gate sizing and synthesis techniques to restructure logic
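
The memory-partitioning technique listed above can be sketched as an address decoder that enables only the bank being accessed. The bank count, bank size and function name are hypothetical; a real design would also have to weigh the area overhead of splitting the memory.

```python
def decode_bank(address: int, num_banks: int, bank_words: int):
    """Split an address into (bank, offset) and return one-hot bank enables.

    Only the enabled bank would need to be powered or clocked for this access.
    """
    if not 0 <= address < num_banks * bank_words:
        raise ValueError("address out of range")
    bank, offset = divmod(address, bank_words)
    enables = [i == bank for i in range(num_banks)]
    return bank, offset, enables


if __name__ == "__main__":
    # Hypothetical memory: 4 banks of 4096 words each.
    bank, offset, enables = decode_bank(0x1234, num_banks=4, bank_words=0x1000)
    print(bank, hex(offset), enables)
```

The same one-hot enables could drive block-level clock gating, which the list above names as the preferred granularity for reusable designs.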

Desired features of a Reusable Macro


Configurability

Ability to meet the needs of different designs
  • Processors with different implementations of multipliers, caches, etc
  • Communication interfaces like USB – with multiple configurations (number of endpoints, FIFO size, low/full/high speed, etc.)
  • Buses and peripherals – configurable address/data widths, arbitration schemes, etc

Standard Interfaces

  • Adopt industry-standard interfaces such as AMBA, CoreConnect, etc.
  • This makes it easier to integrate the core into a system

Compliance to Defensive Design Practices

  • Keep design as simple as possible
  • Follow guidelines for coding, timing closure, ease of verification and packaging for reuse

Deliverables

  • Synthesizable RTL (encrypted or unencrypted)
  • Verification IP for standalone and chip-level testing
  • Synthesis scripts
  • Documentation
Models for Hard Macros

Functional models
  • In the case of a processor, the chip designer needs an accurate functional model, while the software developer needs a fast model for software development
  • Speed and accuracy are traded off differently for different requirements

Bus functional models (BFMs)
  • Useful for system simulation
  • The internal behavior of the macro is abstracted away; the model provides the capability to generate transactions on the interfaces of the macro
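
A bus functional model can be sketched as an object that exposes read and write calls and turns them into bus transactions, without modeling anything inside the macro. The transaction format, the class names and the simple memory-like slave below are all hypothetical; a real BFM would drive the signals of an actual bus protocol.

```python
class SimpleSlave:
    """Stand-in for whatever sits on the other side of the bus (here, a tiny memory)."""

    def __init__(self):
        self.mem = {}

    def access(self, txn):
        if txn["write"]:
            self.mem[txn["addr"]] = txn["data"]
            return None
        return self.mem.get(txn["addr"], 0)   # unwritten locations read as 0


class BusFunctionalModel:
    """Generates transactions on the macro's bus interface; no internal behavior."""

    def __init__(self, slave):
        self.slave = slave
        self.log = []                          # record of every transaction issued

    def write(self, addr, data):
        txn = {"write": True, "addr": addr, "data": data}
        self.log.append(txn)
        self.slave.access(txn)

    def read(self, addr):
        txn = {"write": False, "addr": addr, "data": None}
        self.log.append(txn)
        return self.slave.access(txn)


if __name__ == "__main__":
    bfm = BusFunctionalModel(SimpleSlave())
    bfm.write(0x40, 0xABCD)
    print(hex(bfm.read(0x40)), len(bfm.log))
```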

Behavioral and ISA models

  • A high-level model that reflects the instruction-level (ISA) behavior of the processor
  • Used as a golden reference
  • This fast model is used by SW developers
  • For non-processor designs it may be a high level algorithmic representation
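
To show what "instruction-level behavior" means in practice, here is a sketch of an ISA model for a made-up three-instruction machine. The instruction set (`li`, `add`, `halt`), the tuple encoding and the register count are invented for illustration and do not correspond to any real processor.

```python
def run_isa_model(program, num_regs=4):
    """Execute a program on a toy ISA model and return the final register file.

    The model tracks only architectural state (registers, program counter);
    it has no notion of clock cycles or pipeline stages.
    """
    regs = [0] * num_regs
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "li":                    # li rd, imm : load immediate
            rd, imm = args
            regs[rd] = imm
        elif op == "add":                 # add rd, rs1, rs2
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1
    return regs


if __name__ == "__main__":
    program = [("li", 0, 5), ("li", 1, 7), ("add", 2, 0, 1), ("halt",)]
    print(run_isa_model(program))
```

Because nothing is modeled below the instruction level, such a model runs fast, which is why it suits software development and serves as a golden reference for slower, more detailed models.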


Cycle-accurate models
  • Accurately model the behavior of the macro on a cycle-by-cycle basis
  • Slower than behavioral/ISA models; used for HW verification, not for SW development and debug
  • Zero-delay simulation model which can be used by the macro integrator


Sign-off models

  • Models the complete functionality as well as the timing
  • Typically generated from gate-level netlist with extracted SDF timing


Emulation models

  • Use FPGA-based boards with programmable interconnect for rapid prototyping of large chips
  • Many orders of magnitude faster than simulation
  • Generate a netlist for the target emulator


Hardware models

  • A physical chip interfacing directly to a software simulator
  • Thus the physical chip forms part of the total system being modeled


Timing models

  • Extracted from the SDF back-annotated netlist
  • Provides the setup and hold requirements, capacitance of input pins and clock-to-output delays for the output pins
  • If the design has any state-dependent timing, the internal timing information of the macro has to be retained, which slows down STA runtimes
  • Clock insertion delay should be properly modeled


Power models

  • Not a fully mature area
  • Requires a detailed netlist simulated with vectors accurately representing the level of activity
  • Usually a rough estimate of the power for different supply voltages in mW/MHz is provided with the assumption that about 10% of gates will toggle at a time
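
A mW/MHz figure of the kind described above is used by simple scaling with the clock frequency. In the sketch below the per-voltage figures are invented placeholders, not data for any real macro, and the estimate inherits whatever toggle-rate assumption the figure was characterized with.

```python
# Hypothetical datasheet-style figures: supply voltage (V) -> mW per MHz.
MW_PER_MHZ = {1.2: 0.25, 1.0: 0.18}


def estimate_power_mw(supply_v: float, freq_mhz: float) -> float:
    """Rough power estimate: (mW/MHz at the given supply) * clock frequency in MHz."""
    return MW_PER_MHZ[supply_v] * freq_mhz


if __name__ == "__main__":
    for supply in sorted(MW_PER_MHZ, reverse=True):
        print(f"{supply} V at 200 MHz: {estimate_power_mw(supply, 200):.1f} mW")
```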


Test models

  • For integrating the manufacturing tests for the macro into the overall manufacturing tests
  • The IEEE Core Test Language (CTL, IEEE Std 1450.6) was defined for this purpose
  • Contains information on clocks and synchronous signals, scan ports, scan chain specs, test mode signals, etc.
  • Typical test modes: Normal, Internal scan, InTest, ExTest and Safe

Physical models

  • The chip designer/integrator does the complete physical design of the chip except for the hard macro, which is a black box
  • It is the silicon vendor who integrates the hard macro
  • A floorplanning model of the macro is generated, which provides block, pin-location and similar information
  • A full or abstract LVS netlist is provided for a full-chip layout-versus-schematic check
  • A full or abstract GDSII is provided for running a full-chip DRC
