

| Title | Page |
|-------|------|
| Test Structures and Optimization Methodologies for Electrical Variation in IC Manufacturing | 1-1 |
| A Micropower DSP Architecture for Self-powered Microsensor Applications                                                                                           | 1-2  |
| A Sub-threshold Cell Library and Methodology                                                                                                                      | 1-3  |
| Minimum Energy Tracking Loop with Embedded DC-DC Converter in 65-nm CMOS                                                                                          | 1-4  |
| A 65-nm 8T Sub-threshold SRAM Employing Sense-amplifier Redundancy                                                                                                | 1-5  |
| A 65-nm, Ultra-dynamic Voltage, Scalable SRAM with Operating Range from 300mV to 1.2V for Optimal Performance and Energy                                          | 1-6  |
| Algorithms and Architectures for Ultra-low-power Video Compression                                                                                                | 1-7  |
| An All-digital UWB Transmitter in 90-nm CMOS                                                                                                                      | 1-8  |
| A 3- to 5-GHz Sub-banded UWB Receiver in 90-nm CMOS                                                                                                               | 1-9  |
| Pulsed UWB Transceiver for Small Lightweight Flying Vehicles                                                                                                      | 1-10 |
| Reaching the Optimal Mixed-signal Energy Point                                                                                                                    | 1-11 |
| 18Gb/s Optical IO: VCSEL Driver and TIA in 90-nm CMOS                                                                                                             | 1-12 |
| Reconfigurable Zero-crossing-based Analog Circuits                                                                                                                | 1-13 |
| Design and Characterization of CNT-CMOS Hybrid Systems                                                                                                            | 1-14 |
| A Piecewise-linear Moment-matching Approach to Parameterized Model Order Reduction for Highly Nonlinear Systems                                                   | 1-15 |
| A Quasi-convex Optimization Approach to Parameterized Model-order Reduction                                                                                       | 1-16 |
| Open-loop Digital Predistortion Using Cartesian Feedback for Adaptive RF Power Amplifier Linearization                                                            | 1-17 |
| Wideband Two-point Modulators for Multi-standard Transceivers                                                                                                     | 1-18 |
| An Ultra-low Power CMOS RF Transceiver for Medical Implants                                                                                                       | 1-19 |
| An Integrated Circuit Capable of Rapid Multi-frequency Measurements and a Reconfigurable Electrode Array for Use in Anisotropic Electrical Impedance Myography | 1-20 |
| Equation-based Hierarchical Optimization of a Pipelined ADC                                                                                                       | 1-21 |
| A Hierarchical Bottom-up, Equation-based Optimization Design Methodology for RF Transceivers                                                                      | 1-22 |
| Comparator-based Switched Capacitor Circuits (CBSC)                                                                                                               | 1-23 |
| A Zero-crossing Based, 8b, 200MS/s Pipelined ADC                                                                                                                  | 1-24 |
| Ultra-high Speed A/D Converters Using Zero-crossing-based Circuits                                                                                                | 1-25 |
| High-accuracy Pipelined A/D Converter Based on Zero-crossing Switched Capacitor Circuits                                                                          | 1-26 |
| Low-voltage Comparator-based Switched-capacitor Sigma-delta ADC                                                                                                   | 1-27 |
| Zero-crossing-based ADC for mm-Wave Applications                                                                                                                  | 1-28 |
| Comparator-based Circuits for HBTs                                                                                                                                | 1-29 |
| Massively Parallel ADC with Self-calibration                                                                                                                      | 1-30 |
| Prediction of Time-to-contact for Intelligent Vehicles                                                                                                            | 1-31 |
| Very High-frequency DC-DC Boost Conversion                                                                                                                        | 1-32 |
| Techniques for Low-jitter Clock Multiplication                                                                                                                    | 1-33 |
| A Digitally-enhanced Delta-sigma Fractional-N Synthesizer                                                                                                         | 1-34 |
| Voltage-controlled Oscillator-based A/D Conversion                                                                                                                | 1-35 |
| A Σ∆ ADC with Noise-shaping VCO Quantizer and DEM Circuit                                                                                                         | 1-36 |
| A Sub Picosecond Time-to-digital Converter for On-chip Jitter Measurement                                                                                         | 1-37 |
| Techniques for Highly-digital Implementation of Clock and Data Recovery Circuits                                                                                  | 1-38 |
| Low-power CMOS Rectifier Design for RFID Applications                                                                                                             | 1-39 |
| Low-power Circuits for Brain-machine Interfaces                                                                                                                   | 1-40 |
| A 77-GHz Receiver Front-end for Passive Imaging                                                                                                                   | 1-41 |
| Power Amplifier Design for Millimeter-wave Imaging                                                                                                                | 1-42 |
| A 77-GHz System for Millimeter-wave Active Imaging                                                                                                                | 1-43 |
| Coding in Wideband OFDM Wireless Communications with Adaptive Modulation                                                                                          | 1-44 |
| An Organic Imager for Flexible Large-area Electronics                                                                                                             | 1-45 |
| Channel- and Circuits-aware, Energy-efficient Coding for High-speed Links                                                                                         | 1-46 |
| Design and Optimization of Equalized Interconnects for Energy-efficient On-chip Networks                                                                          | 1-47 |
| System-to-circuit Framework for High-speed Link Design-space Exploration                                                                                          | 1-48 |
| System Architecture Implications of CNT Interconnects                                                                                                             | 1-49 |

# Test Structures and Optimization Methodologies for Electrical Variation in IC Manufacturing

K. Balakrishnan, N. Drego, K. Gettings, D. Lim, D.S. Boning Sponsorship: SRC/FCRP C2S2, SRC/FCRP IFC, Samsung Electronics, IBM Faculty Fellowship

Modern circuit design needs efficient methods to characterize and model circuit variation in order to obtain high-yielding chips. Circuit and mask designers need accurate guidelines to prevent failures due to layout-induced variations. We address this need by contributing methodologies and new test structures to characterize the variations at device, interconnect, and system levels. Our recent work examines the sources of variation that affect the contact and via resistances associated with integrated MOSFET devices. Slight variations in the placement of these metal contacts, caused by lithographic uncertainties, together with the impact of strain-induced regions, may reduce the uniformity of nominally identical contacts on a die. Early results of simulations performed in MEDICI demonstrate the effects of geometric perturbations on the current flow through the contact.

To address both device and interconnect variations, we introduce another methodology that uses a large number of test structures, such as MOSFETs and Charge-Based Capacitance Measurement (CBCM) structures, to model variations in parameters such as threshold voltage and leakage current, based on the architecture proposed by Lefferts [1]. We designed, implemented, and measured test circuits that include a large number of high-performance devices (devices under test, or DUTs) controlled by low-leakage switches and sensors to ensure a nominal value at the DUT terminals, as shown in Figure 1. With this methodology we gather the statistics necessary to identify and model these variations and prevent them from contributing to performance failure.

We also study an efficient method to characterize the variation in circuit parameters and to optimize a circuit based on the variation model. While direct measurement of physical circuit parameters, e.g., threshold voltage and parasitic capacitance, is costly, electrical outputs such as drain currents and oscillation frequencies can be measured efficiently. Combined with a response sensitivity model (RSM) evaluated from a circuit simulator such as SPICE, the statistical properties of the circuit parameter variation, including correlation, can be estimated from the measured electrical observables. This scheme has been tested using a frequency divider that generates self-oscillation frequencies at multiple bias conditions. The knowledge about the uncertainty of process and circuit parameters can be applied to a robust circuit-optimization framework based on convex optimization theory [2].
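As a rough illustration of how parameter statistics can be inferred from electrical observables, the sketch below assumes a first-order (linear) sensitivity model with a square, invertible sensitivity matrix. The matrix entries, covariance values, and function names are hypothetical and are not taken from the work described here.

```python
# First-order sketch: observables y respond linearly to parameter shifts p,
#   dy = S * dp,  so  Cov(y) = S Cov(p) S^T  and  Cov(p) = S^-1 Cov(y) S^-T.
# The 2x2 size and every number below are hypothetical.

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("sensitivity matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def parameter_covariance(sensitivity, observable_cov):
    """Map the covariance of measured observables back to parameter space."""
    s_inv = inverse_2x2(sensitivity)
    return mat_mul(mat_mul(s_inv, observable_cov), transpose(s_inv))

# Hypothetical sensitivities of two observables to two parameters,
# as they might be extracted from circuit simulation.
S = [[2.0, 1.0], [0.5, 3.0]]
true_cov_p = [[4.0, 1.0], [1.0, 9.0]]

# Forward model: what the observable covariance would look like.
cov_y = mat_mul(mat_mul(S, true_cov_p), transpose(S))

# Inverse step: recover the parameter covariance, including correlation.
estimated_cov_p = parameter_covariance(S, cov_y)
```

With more observables than parameters, a least-squares (pseudo-inverse) step would replace the direct 2x2 inverse used here.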

Lastly, a test-structure consisting of a DUT array containing ~140K DUTs to study threshold voltage ( $V_T$ ) variation has been designed and submitted for fabrication. Due to the exponential dependence of drain current on  $V_T$  in the sub-threshold regime of operation,  $V_T$  variation can be effectively isolated from other variation sources (Figure 2). A hierarchical, memory-like access scheme and analog-to-digital converter enable efficient data collection with minimal post-processing required to compute the actual DUT  $V_T$ . The large number of DUTs in a dense array allows determination of spatial correlation with high statistical significance. Additionally, a design consisting of completely digital logic paths is currently being formulated to study spatial correlation at the digital circuit level rather than solely at the device parameter level. These measurements will then be used to study architectural ideas to mitigate the impact of variation.
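The exponential dependence mentioned above can be turned into a simple post-processing step. The sketch below assumes the textbook sub-threshold model $I_D \propto \exp((V_{GS} - V_T)/(n\,kT/q))$; the slope factor `n`, the temperature, and the function names are assumptions for illustration, not values from the test chip.

```python
import math

K_BOLTZMANN = 1.380649e-23       # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C

def thermal_voltage(temp_k=300.0):
    """kT/q in volts."""
    return K_BOLTZMANN * temp_k / ELECTRON_CHARGE

def vt_shift_from_current(i_dut, i_ref, n=1.5, temp_k=300.0):
    """Estimate the V_T shift of a DUT relative to a reference device.

    Both currents are assumed to be measured at the same sub-threshold bias,
    so the current ratio depends only on the threshold-voltage difference:
        i_dut / i_ref = exp(-(dVt) / (n * kT/q)).
    n = 1.5 is an assumed slope factor.
    """
    return -n * thermal_voltage(temp_k) * math.log(i_dut / i_ref)

# A DUT carrying 1/e of the reference current has V_T higher by n*kT/q.
example_shift_v = vt_shift_from_current(i_dut=math.exp(-1.0), i_ref=1.0)
```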



▲ Figure 1: Full  $I_{DS}$  vs.  $V_{GS}$  curves for a transistor that is part of a large array, and with dynamically controlled and monitored terminals.





- [1] R. Lefferts and C. Jakubiec, "An integrated test chip for the complete characterization and monitoring of a 0.25µm CMOS technology that fits into five scribe line structures 150µm by 5,000µm," in Proc. International Conference on Microelectronic Test Structures, Monterey, CA, Mar. 2003, pp. 59-63.
- [2] Y. Xu, L. Pileggi, and S. Boyd, "ORACLE: Optimization with recourse of analog circuits including layout extraction," in Proc. of IEEE Design Automation Conference, San Diego, CA, June 2004, pp. 151-154.

# A Micropower DSP Architecture for Self-powered Microsensor Applications

N. Ickes, A.P. Chandrakasan Sponsorship: DARPA, Texas Instruments

Distributed microsensor networks consist of hundreds or thousands of miniature sensor nodes. Each node individually monitors the environment and collects data as directed by the user, and the network collaborates as a whole to deliver high-quality observations to a central base station. The large number of nodes in a microsensor network enables high-resolution, multi-dimensional observations and fault-tolerance that are superior to more traditional sensing systems. However, the small size and highly distributed arrangement of the individual sensor nodes make aggressive power management a necessity.

The aim of our project is to develop a micropower DSP platform optimized for medium bandwidth microsensor applications, such as acoustic sensing and tracking. These applications require significant signal processing capability at each node within a sensor network, while maintaining a roughly  $100\mu$ W average power consumption to enable self-powered (energy scavenging) operation. As illustrated in Figure 1, our DSP includes a general-purpose processor core with an energy efficient instruction set, as well as coprocessors for accelerating Fourier transforms and FIR filtering. Power consumption in the large (62kB) on-chip memory is reduced by dividing the memory into banks (to reduce access energy) and by power-gating inactive banks (to reduce leakage energy). The CPU, FIR, and FFT cores are also power-gated. The DSP was fabricated in 90-nm CMOS by ST Microelectronics.

As part of ongoing work to develop a lightweight, power-aware operating system, the power-gating mechanisms have been characterized with respect to wakeup and energy break-even times. This information is being used to develop scheduling and memory management mechanisms that efficiently utilize power gating. The goal is to automate the details of power management behind standard programming interfaces, exposing only clear and easy-to-use controls.
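A scheduler could use the characterized wakeup and break-even times along the following lines. This is a minimal sketch under a simple assumed model (a fixed energy overhead per off/on transition and a constant leakage power saved while gated); the function names and numbers are hypothetical.

```python
def break_even_time(transition_energy_j, leakage_power_saved_w):
    """Off-time needed before the leakage saved repays the transition energy."""
    return transition_energy_j / leakage_power_saved_w

def should_power_gate(expected_idle_s, transition_energy_j,
                      leakage_power_saved_w, wakeup_time_s):
    """Gate a domain only if the usable off-time exceeds the break-even time.

    The domain must start waking wakeup_time_s before the idle period ends,
    so only the remainder of the idle period counts as off-time.
    """
    off_time = expected_idle_s - wakeup_time_s
    if off_time <= 0:
        return False
    return off_time > break_even_time(transition_energy_j, leakage_power_saved_w)

# Hypothetical domain: 10 nJ per off/on cycle, 5 uW leakage saved, 100 us wakeup.
t_be = break_even_time(10e-9, 5e-6)                   # 2 ms
gate_long = should_power_gate(10e-3, 10e-9, 5e-6, 100e-6)
gate_short = should_power_gate(1e-3, 10e-9, 5e-6, 100e-6)
```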



▲ Figure 1: DSP architecture, illustrating the twelve independent power domains, controlled by off-chip power switches. When combined with an external nonvolatile memory (for program storage), radio, and ADC, the DSP becomes a complete microsensor node.

# A Sub-threshold Cell Library and Methodology

J. Kwong, A.P. Chandrakasan Sponsorship: Texas Instruments, DARPA

In this work, we develop a sub-threshold library and design methodology that addresses the unique challenges and trade-offs in ultra-low voltage operation. Drive currents become comparable in magnitude to idle leakage currents, causing reduced output swings and possible functional errors. Due to the exponential dependence of sub-threshold currents on threshold voltage, sub-threshold circuits are particularly sensitive to environmental and process variations. Figure 1 compares the delay distributions of an 8-bit adder in the sub-threshold and above-threshold regimes under transistor threshold voltage variation. Circuit performance exhibits much larger variability in sub-threshold, which can be mitigated through device sizing and choice of logic styles. The sub-threshold library employs a device-sizing methodology with functionality as the primary consideration, while implementing appropriate trade-offs between energy, delay, and variability. The theoretical sizing approach is detailed in [1] and validated in a 65-nm CMOS test chip fabricated by Texas Instruments. The test chip, shown in Figure 2, consists of two 16-bit FIR filters synthesized from the custom library using a standard CAD flow. By scaling the power supply from 1.2V to 300mV, the test chip achieves an 8x energy reduction per filtering operation.
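The difference in delay variability between the two regimes can be reproduced qualitatively with a small Monte Carlo experiment. The sketch below models a single gate rather than the 8-bit adder of Figure 1, and it assumes textbook current models (exponential in sub-threshold, alpha-power law above threshold). The slope factor, exponent, and $V_T$ statistics are assumed values.

```python
import math
import random
import statistics

N_SLOPE = 1.5      # assumed sub-threshold slope factor
V_THERMAL = 0.026  # thermal voltage near room temperature, in volts
ALPHA = 1.3        # assumed alpha-power-law exponent

def subthreshold_delay(vdd, vt):
    """Normalized delay ~ V_DD / I_on, with I_on exponential in (V_DD - V_T)."""
    return vdd / math.exp((vdd - vt) / (N_SLOPE * V_THERMAL))

def above_threshold_delay(vdd, vt):
    """Normalized delay ~ V_DD / I_on, with I_on ~ (V_DD - V_T)^alpha."""
    return vdd / (vdd - vt) ** ALPHA

def coefficient_of_variation(delay_fn, vdd, vt_mean=0.4, vt_sigma=0.03,
                             samples=5000, seed=1):
    """Sigma/mean of delay under Gaussian V_T variation (assumed statistics)."""
    rng = random.Random(seed)
    delays = [delay_fn(vdd, rng.gauss(vt_mean, vt_sigma)) for _ in range(samples)]
    return statistics.pstdev(delays) / statistics.mean(delays)

cv_sub = coefficient_of_variation(subthreshold_delay, vdd=0.3)
cv_nominal = coefficient_of_variation(above_threshold_delay, vdd=1.2)
```

Under these assumptions the sub-threshold delay is log-normally distributed, which is why its relative spread is much larger than at nominal supply.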



▲ Figure 1: Delay distribution of 8-bit adder in sub-threshold (top) and nominal  $V_{DD}$  (bottom), under threshold voltage variation. Arrow points to outliers.





## REFERENCES

[1] J. Kwong and A.P. Chandrakasan, "Variation-driven device sizing for minimum energy sub-threshold circuits," in *Proc. International Symposium on Low Power Electronics and Design*, Tegernsee, Germany, Oct. 2006, pp. 8-13.

# Minimum Energy Tracking Loop with Embedded DC-DC Converter in 65-nm CMOS

Y.K. Ramadass, A.P. Chandrakasan Sponsorship: DARPA, Texas Instruments

Minimizing the energy consumption of battery-powered systems is a key focus in integrated circuit design. Switching energy of digital circuits reduces quadratically as  $V_{\rm DD}$  is decreased below  $V_{\rm T}$  (i.e., sub-threshold operation), while the leakage energy increases exponentially. These opposing trends result in a minimum energy point (MEP), defined as the operating voltage at which the total energy consumed per operation ( $\rm E_{op}$ ) is minimized [1]. The MEP can vary widely for a given circuit depending on its workload and environmental conditions, e.g., temperature. Energy savings of 50-100% are demonstrated by tracking the MEP as it varies, and even greater savings can be achieved in circuits dominated by leakage.
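The two opposing trends can be made concrete with a normalized energy model. The sketch below is an assumed, simplified form (switching energy proportional to $V_{DD}^2$ and leakage energy proportional to $V_{DD}^2 \exp(-V_{DD}/(n\,kT/q))$); the `leakage_weight` parameter, which lumps together activity, logic depth, and temperature effects, is hypothetical.

```python
import math

N_SLOPE = 1.5      # assumed sub-threshold slope factor
V_THERMAL = 0.026  # thermal voltage near room temperature, in volts

def energy_per_operation(vdd, leakage_weight):
    """Normalized E_op = switching term + leakage term.

    Switching energy scales as V_DD^2. Leakage energy per operation is the
    leakage power integrated over a delay that grows exponentially as V_DD
    drops, hence the exp(-V_DD / (n * kT/q)) factor.
    """
    switching = vdd ** 2
    leakage = leakage_weight * vdd ** 2 * math.exp(-vdd / (N_SLOPE * V_THERMAL))
    return switching + leakage

def minimum_energy_point(leakage_weight, v_lo_mv=200, v_hi_mv=1000, step_mv=5):
    """Scan a voltage grid and return the V_DD (in volts) with the lowest E_op."""
    candidates = [mv / 1000.0 for mv in range(v_lo_mv, v_hi_mv + 1, step_mv)]
    return min(candidates, key=lambda v: energy_per_operation(v, leakage_weight))

mep_light = minimum_energy_point(leakage_weight=1e4)  # switching-dominated case
mep_heavy = minimum_energy_point(leakage_weight=1e5)  # leakage-dominated case
```

In this model a larger leakage share pushes the MEP to a higher voltage, which is the kind of movement a tracking loop has to follow.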

The energy-minimization circuitry shown in Figure 1 consists of a buck converter [2] that operates in the Pulse Frequency Modulation (PFM) mode. The digital circuit (FIR filter), which operates at the  $V_{DD}$  set by the converter, is clocked by a critical path replica ring oscillator. The energy-sensor circuitry determines the energy consumed per operation at different operating voltages. Based on the energy per operation at a given operating voltage obtained from the energy-sensing circuit, an energy minimization algorithm adjusts the reference voltage to the buck converter, and the system approaches the minimum-energy operating voltage of the digital circuit using a slope-detection strategy. A test chip containing the minimum-energy tracking loop and the embedded DC-DC converter was fabricated in Texas Instruments' 65-nm CMOS process (Figure 2). The area overhead of the minimum energy tracking loop, which comprises the energy-minimizing block and the energy sense capacitors  $C_1$  and  $C_2$ , is just 0.05mm<sup>2</sup>. The digital test circuitry operates at voltages as low as 0.25V. Energy savings of the order of 50-100% were measured while tracking the MEP as it varies with workload and temperature. The DC-DC converter was able to deliver load voltages between 0.25V and 0.7V with an efficiency > 80% at load power levels of the order of  $1\mu$ W and above.
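A slope-detection strategy of this kind can be sketched as a simple hill-descent loop: keep stepping the reference voltage in one direction while the measured energy per operation falls, and reverse when it rises. The code below is an illustrative sketch, not the algorithm implemented on the chip; the 50mV step, the `measure_energy` callback, and the stopping rule are assumptions, while the 0.25V to 0.7V limits follow the converter range quoted above.

```python
def track_minimum_energy(measure_energy, v_start_mv, step_mv=50,
                         v_min_mv=250, v_max_mv=700, max_steps=50):
    """Return the voltage (in mV) reached by slope-detection tracking.

    measure_energy(v_mv) stands in for the energy sensor and returns the
    energy per operation at the given supply voltage.
    """
    v = v_start_mv
    energy = measure_energy(v)
    direction = -1   # try lowering the supply first
    reversals = 0
    for _ in range(max_steps):
        v_next = min(v_max_mv, max(v_min_mv, v + direction * step_mv))
        if v_next != v:
            e_next = measure_energy(v_next)
            if e_next < energy:      # energy still falling: keep going
                v, energy = v_next, e_next
                continue
        # Energy rose (or a range limit was hit): reverse the search direction.
        direction = -direction
        reversals += 1
        if reversals >= 2:           # both neighbours are worse: settle here
            break
    return v

# Hypothetical convex energy curve with its minimum at 400 mV.
def example_energy(v_mv):
    return (v_mv - 400) ** 2 + 10

settled_from_above = track_minimum_energy(example_energy, v_start_mv=600)
settled_from_below = track_minimum_energy(example_energy, v_start_mv=300)
```

A real loop would run continuously so that it can follow the MEP as workload and temperature change; this sketch stops once both neighbouring voltages are worse.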



▲ Figure 1: Block diagram of the minimum-energy tracking loop and embedded DC-DC converter.



▲ Figure 2: Die photo of the test chip in 65-nm CMOS. The EMB is the energy-minimizing block, which comprises the energy-sensor circuitry and the energy-minimization algorithm.

- [1] B.H. Calhoun and A.P. Chandrakasan, "Characterizing and modeling minimum energy operation for subthreshold circuits," *IEEE International Symposium on Low Power Electronics and Design*, Aug. 2004, pp. 90-95.
- [2] J. Xiao, A. Peterchev, J. Zhang, and S. Sanders, "A 4μA-quiescent-current dual-mode buck converter IC for cellular phone applications," in IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, Feb. 2004, pp. 280-281.

# A 65-nm 8T Sub-threshold SRAM Employing Sense-amplifier Redundancy

N. Verma, A.P. Chandrakasan Sponsorship: DARPA, Intel Foundation Ph.D. Fellowship Program, NSERC

Deeply scaled technologies promise greater efficiency for digital circuits. Unfortunately, random device variations compromise the functionality of large SRAM arrays, which traditionally rely on ratioed bit-cell topologies to achieve the highest density. By virtue of greatly reduced leakage and access energy, sub-threshold SRAMs tremendously lower the total system power but require new bit-cell topologies and peripheral assists to manage variation and read-current degradation [1]. This work demonstrates a 256kb SRAM in 65-nm CMOS that uses the bit-cell shown in Figure 1 [2]. The buffered read eliminates the read static noise margin limitation [3]; peripheral footer circuitry eliminates read data signal degradation due to bit-line leakage; peripheral supply drivers weaken the accessed storage cells to enforce the relative device strengths required for write-ability; and sense-amplifier redundancy provides a favorable trade-off between the offset and the area of the sensing network. These techniques are applied in the prototype test-chip shown in Figure 2. The test-chip integrates 256kb in eight 256-row-by-128-column blocks. Test results show that the design achieves full read and write functionality to 350mV, where the leakage power savings are over 20x compared to a 6T SRAM at 1V and over 3x compared to a 6T SRAM operating at its projected lowest voltage. Additionally, sense-amplifier redundancy reduces the probability of error from offsets by a factor of 5 for a given area constraint.
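The offset-versus-area trade-off behind sense-amplifier redundancy can be illustrated with a simple statistical model. The sketch below assumes Gaussian offsets whose standard deviation scales as the inverse square root of device area, a fixed total area split evenly among the redundant amplifiers, and a read failure only when every amplifier's offset exceeds the available signal. These assumptions and the numbers are illustrative and are not meant to reproduce the measured factor of 5.

```python
import math

def single_amp_failure(signal_over_sigma):
    """P(|offset| > signal) for a zero-mean Gaussian offset."""
    return math.erfc(signal_over_sigma / math.sqrt(2.0))

def error_probability(signal_over_sigma_full_area, num_amps):
    """Failure probability with num_amps redundant amplifiers in a fixed area.

    signal_over_sigma_full_area is the signal-to-offset-sigma ratio of one
    amplifier that uses the whole area. Splitting the area N ways makes each
    amplifier's sigma larger by sqrt(N), but the read fails only if all N
    amplifiers are unusable.
    """
    ratio = signal_over_sigma_full_area / math.sqrt(num_amps)
    return single_amp_failure(ratio) ** num_amps

# Hypothetical case: the bit-line signal is 2.5 sigma of a full-area amplifier.
p_one = error_probability(2.5, 1)
p_two = error_probability(2.5, 2)
p_four = error_probability(2.5, 4)
```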



▲ Figure 1: An 8T bit-cell with 2T read-buffer for read stability and peripheral control of buffer-foot and VV<sub>DD</sub> to manage bit-line leakage and write-ability in the presence of variation.



▲ Figure 2: Prototype 256kb SRAM in 65-nm CMOS organized as 8 blocks with 256 rows and 128 columns. Test-chip achieves full read and write functionality to 350mV.

- [1] B. Calhoun and A.P. Chandrakasan, "A 256kb sub-threshold SRAM in 65-nm CMOS," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, San Francisco, CA, Feb. 2006, pp. 480–481.
- [2] N. Verma and A.P. Chandrakasan, "A 65-nm 8T sub-Vt SRAM employing sense-amplifier redundancy," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, San Francisco, CA, Feb. 2007, pp. 328–329.
- [3] E. Seevinck, F.J. List, and J. Lohstroh, "Static-noise margin analysis of MOS SRAM cells," IEEE Journal of Solid-State Circuits, vol. 22, no. 5, pp. 748–754, Oct. 1987.

# A 65-nm, Ultra-dynamic Voltage, Scalable SRAM with Operating Range from 300mV to 1.2V for Optimal Performance and Energy

M.E. Sinangil, N. Verma, A.P. Chandrakasan Sponsorship: DARPA

Memory blocks account for a large fraction of total chip area and power consumption, so a low-power memory is essential for applications where power consumption is crucial. Dynamic voltage scaling (DVS) is a well-known method for reducing the power consumption of a system whose performance requirements change [1]. When the performance requirement varies, the supply voltage, and hence the speed of the system, can be adjusted instead of running the whole circuit at the highest speed and voltage all the time. Since energy consumption depends on the voltage level, significant savings can be achieved. Figure 1 shows the normalized leakage power of a 64kbit SRAM array over different voltages. Operating the memory at 0.3V provides nearly 30X leakage power reduction. Figure 2 shows the normalized frequency of operation for the same memory over the 0.3V to 1.2V range.

Designing a memory for operation both in sub-threshold and above threshold is challenging because the two regions have very different characteristics. A classical 6T cell cannot operate in sub-threshold because of degraded read and write static noise margins (SNM). An 8T design has been shown to be fully operational in the sub-threshold region [2]. In this design, the addition of a read-buffer to the bit-cell, together with the peripheral assists, ensures correct read operation. Correct write operation requires the header of the bit-cell to be driven to ground. At high voltages, however, some of the peripheral assists may not be needed and, more importantly, might severely degrade the performance of the circuit. The bit-cell transistor sizing and peripheral circuitry must therefore be designed by considering their effects on circuit operation at both ends of the voltage range.



▲ Figure 1: Normalized leakage power of a 64kbit-SRAM memory array over the range 0.3V to 1.2V.



▲ Figure 2: Normalized frequency of operation of the memory over the range 0.3V to 1.2V.

- [1] P. Macken, M. Degrauwe, M.V. Paemel, and H. Oguey, "A voltage reduction technique for digital systems," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, San Francisco, CA, Feb. 1990, pp. 238-239.
- [2] N. Verma and A.P. Chandrakasan, "A 65nm 8T Sub-V<sub>t</sub> SRAM employing sense-amplifier redundancy," in IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, Feb. 2007, pp. 328-329.

# Algorithms and Architectures for Ultra-low-power Video Compression

D. Finchelstein, V. Sze, A.P. Chandrakasan Sponsorship: Nokia, Texas Instruments

Multimedia applications, such as video playback, are becoming increasingly pervasive. Since the platforms are often energy-constrained devices (cell phones, iPods), the user experience is enhanced by extending the battery life during video decoding. The latest video coding standard is H.264 [1], which is used in DVB-H and HDTV. While it provides a 50% improvement in compression efficiency over previous standards, this coding efficiency comes at the cost of a decoder complexity that is 4X that of MPEG-2 and 2X that of MPEG-4 Visual Simple Profile. This increased complexity translates to increased energy consumption, which is a critical concern for mobile and handheld devices.

This project aims to build an ASIC decoder that exploits techniques such as pipelining, parallelism, ultra-low voltage operation, and ultra-dynamic voltage scaling [2]. For instance, the IDCT computation can be parallelized as shown in Figure 2. In video decoders, memory consumes a large portion of overall system power. As a result, the number of redundant memory transfers must be minimized and caching data in on-chip SRAMs/registers should be explored. Using these techniques, the goal is to minimize system power, as compared to previously published decoders [3-4]. In addition to optimizing the hardware architecture of the H.264 decoder (Figure 1), we will also focus on the design of future video coding standards, e.g., "H.265". We envision that future algorithms will account for the energy and complexity costs of their hardware implementations. By incorporating the energy-awareness into the algorithm, future video coders can provide an explicit energy/PSNR trade-off, along with the existing bitrate/PSNR trade-off curves.
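To show where the parallelism in the inverse transform comes from, the sketch below implements the 4x4 inverse core transform butterfly of H.264 as a separable row pass followed by a column pass. The four 1-D transforms within each pass are independent of one another, so hardware can evaluate them side by side. This illustrates the data independence only; it is not the architecture of Figure 2, and the function names are hypothetical.

```python
def inverse_transform_1d(d):
    """One 4-point inverse butterfly (adds, subtracts, and shifts only)."""
    e0 = d[0] + d[2]
    e1 = d[0] - d[2]
    e2 = (d[1] >> 1) - d[3]
    e3 = d[1] + (d[3] >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]

def inverse_transform_4x4(block):
    """Apply the butterfly to every row, then to every column.

    block is a 4x4 list of already scaled (dequantized) integer coefficients.
    Each list comprehension below contains four independent 1-D transforms
    that could execute in parallel.
    """
    rows = [inverse_transform_1d(row) for row in block]
    cols = [inverse_transform_1d(list(col)) for col in zip(*rows)]
    # cols[j][i] holds the value for row i, column j; round and scale by 64.
    return [[(cols[j][i] + 32) >> 6 for j in range(4)] for i in range(4)]

# A block with only a DC coefficient reconstructs to a flat residual.
dc_only = [[64, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
flat_residual = inverse_transform_4x4(dc_only)
```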



▲ Figure 1: H.264 video decoder architecture.



▲ Figure 2: Parallel IDCT architecture.

- [1] ITU-T Study Group, "Series H: Audiovisual and multimedia systems: Infrastructure of audiovisual services coding of moving video," in *General Secretariat and Telecom Radiocommunication (ITU-R) Standardization (ITU-T)*, sec. H, no. 264, Sept. 2005.
- [2] B.H. Calhoun and A.P. Chandrakasan, "Ultra-dynamic voltage scaling using sub-threshold operation and local voltage dithering in 90nm CMOS," in Proc. IEEE International Solid-State Circuits Conference, San Francisco, CA, Feb. 2006, pp. 300-301.
- [3] C.C. Lin, J.W. Chen, H.C. Chang, Y.C. Yang, Y.H. Ou Yang, M.C. Tsai, J.I. Guo, J.S. Wang, "A 160K Gates/4.5 KB SRAM H.264 video decoder for HDTV applications," IEEE Journal of Solid-State Circuits, vol. 42, no. 1, pp. 170-182, Jan. 2007.
- [4] T.M. Liu, T.A. Lin, S.Z. Wang, W.P. Lee, J.Y. Yang, K.C. Hou, C.Y. Lee, "A 125 μW, fully scalable MPEG-2 and H.264/AVC video decoder for mobile applications," *IEEE Journal of Solid-State Circuits*, vol. 42, no. 1, pp. 161-169, Jan. 2007.

# An All-digital UWB Transmitter in 90-nm CMOS

D. Wentzloff, A.P. Chandrakasan Sponsorship: SRC/FCRP C2S2, NSF

Pulsed ultra-wideband (UWB) transceivers offer the potential for ultra-low-energy/bit operation because the signals are inherently duty-cycled. By eliminating components with long startup times, such as a phase-locked loop, all components in a pulsed-UWB transceiver can be disabled during the interval between pulses. This work focuses on an all-digital, pulsed-UWB transmitter that requires no analog bias currents, in which the energy is dissipated only in switching events, i.e.,  $CV^2$ , and by sub-threshold leakage currents.

This transmitter supports three channels with center frequencies of 3.45GHz, 4.05GHz, and 4.65GHz, and each channel carries 550-MHz-wide pulses. It communicates with a separate receiver IC [1] that performs energy detection in the desired channel. Pulse position modulation (PPM) is used to encode the data with a pulse repetition frequency (PRF) range of 10kHz-16.7MHz and a fixed PPM delay of 30ns. A block diagram of the transmitter is shown in Figure 1. Pulses are formed by combining a programmable number of equally-delayed edges, similar in operation to a frequency multiplier. During pulsed operation, the last stage of the delay line feeding back to the input is disabled, and the input to the delay line is a clock signal operating at the PRF. Depending on the data, the rising edge of this clock is externally delayed by the PPM interval of 30ns. This clock propagates through a 32-stage differential delay line with a delay controlled by an 8-bit code, which generates a series of edges. The output of each stage is independently masked, and thus only the selected edges are combined to form an RF pulse. The 30 masked edges are combined using interleaved 15-edge combiners that toggle their outputs on falling edges on any of their inputs [2]. Thus, pulses are generated only on the rising edge of the PRF signal. The two combiners' outputs are XOR'ed. The resulting pulse has a center frequency determined by the delay per stage and a pulse width determined by the number of edges selected. The RF pad driver is essentially a CMOS inverter, with some added features for reducing leakage current and digitally controlling the output power (see Figure 2). The pulsed output of the inverter chain has spectral content at DC; therefore an off-chip band-select filter has been used to eliminate the DC content.
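The relationship between stage delay, number of selected edges, and the resulting pulse can be sketched with an idealized model. The code below assumes that even-numbered edges feed one combiner and odd-numbered edges feed the other (one plausible reading of "interleaved"), that each combiner simply toggles on an enabled edge, and that edges are spaced by exactly one stage delay. Under these assumptions the XOR output toggles once per stage delay, giving a center frequency of $1/(2\tau)$ and a pulse width of $N\tau$. All names are hypothetical.

```python
def combine_edges(mask):
    """Return the XOR output level after each delay-line edge.

    mask[k] is 1 if edge k is selected and 0 if it is masked off.
    """
    combiner_a = combiner_b = 0
    output = []
    for k, enabled in enumerate(mask):
        if enabled:
            if k % 2 == 0:
                combiner_a ^= 1   # even edges toggle the first combiner
            else:
                combiner_b ^= 1   # odd edges toggle the second combiner
        output.append(combiner_a ^ combiner_b)
    return output

def stage_delay_for(center_frequency_hz):
    """Stage delay that yields the requested center frequency (ideal model)."""
    return 1.0 / (2.0 * center_frequency_hz)

def pulse_width(num_edges, stage_delay_s):
    """Pulse duration when num_edges consecutive edges are selected."""
    return num_edges * stage_delay_s

# 30 selected edges in the 4.05 GHz channel.
levels = combine_edges([1] * 30)
tau_s = stage_delay_for(4.05e9)     # about 123 ps per stage
width_s = pulse_width(30, tau_s)    # about 3.7 ns
```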

The chip was fabricated in a 90-nm CMOS process; a die photo is shown in Figure 2. The transmitter consumes a fixed  $96\mu W$  due to leakage currents. The active energy added only while pulsing is 37pJ/pulse, independent of the data rate. Pulse generation has been demonstrated in the three desired channels and the transmitter operation was verified in a wireless link using an energy-detection receiver [1]. At 16.7Mb/s, the total energy consumption (active and leakage power) is 43pJ/bit.
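The quoted 43pJ/bit figure follows from the leakage power and the active energy per pulse. The short sketch below reproduces the arithmetic, assuming one pulse per bit (consistent with PPM at a PRF equal to the bit rate); the function name is hypothetical.

```python
def energy_per_bit(leakage_power_w, active_energy_per_pulse_j, bit_rate_bps,
                   pulses_per_bit=1):
    """Total energy per bit = leakage energy during one bit period + active energy."""
    leakage_energy = leakage_power_w / bit_rate_bps
    return leakage_energy + pulses_per_bit * active_energy_per_pulse_j

# 96 uW leakage and 37 pJ/pulse at 16.7 Mb/s: about 5.7 pJ + 37 pJ per bit.
e_fast = energy_per_bit(96e-6, 37e-12, 16.7e6)

# At the lowest PRF of 10 kHz the fixed leakage dominates the energy per bit.
e_slow = energy_per_bit(96e-6, 37e-12, 10e3)
```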



▲ Figure 1: Block diagram of the all-digital transmitter. All blocks are full-swing static CMOS circuits.





- [1] F.S. Lee and A.P. Chandrakasan, "A 2.5nJ/b 0.65V 3-to-5GHz sub-banded UWB receiver in 90-nm CMOS," in Proc. IEEE International Solid-State Circuits Conference, San Francisco, CA, Feb. 2007, pp. 116-117.
- [2] C. Kim, I.-C. Hwang, and S.-M. Kang, "A low-power small-area ±7.28-ps-jitter 1-GHz DLL-based clock generator," IEEE Journal of Solid-State Circuits, vol. 37, no. 11, pp. 1414-1420, Nov. 2002.

# A 3- to 5-GHz Sub-banded UWB Receiver in 90-nm CMOS

F.S. Lee, A.P. Chandrakasan Sponsorship: SRC/FCRP C2S2, NSF

With the proliferation of portable electronics and wireless sensor networks, energy-efficient radios have become an active area of research [1-3]. Though sub-nJ/b data reception is achievable for data rates >100Mbps using optimized coherent architectures [4], there is a need for simple, low-cost, low-power, and scalable radios as specified in the IEEE 802.15.4a task group [5]. This work explores the unique properties of FCC-compliant pulsed UWB signals and scaled CMOS devices to improve the energy per bit of existing low-data-rate GHz-range integrated radios for use in wireless sensor network applications. A non-coherent 0–16Mbps UWB receiver (Figure 1) using 3–5GHz sub-banded PPM signaling is implemented in a 90-nm CMOS process (Figure 2) [6]. The RF and mixed-signal baseband circuits operate at 0.65V and 0.5V. Using duty-cycling, adjustable bandpass filters, and a relative-compare baseband, the receiver achieves 2.5nJ/b at $10^{-3}$ BER with -99dBm sensitivity at 100kbps. The energy efficiency is maintained across three orders of magnitude in data rate. A basic acquisition algorithm is developed on an FPGA platform and a transceiver system demo is assembled using this chip.
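To put the quoted figures in more familiar units, the following sketch converts energy per bit into average power at a given data rate and converts a dBm sensitivity into watts. It assumes the 2.5nJ/b figure holds at the rate of interest, which the text states only approximately ("maintained across three orders of magnitude"); the function names are hypothetical.

```python
def average_power_w(energy_per_bit_j, bit_rate_bps):
    """Average power implied by a constant energy per bit at a given data rate."""
    return energy_per_bit_j * bit_rate_bps


def dbm_to_watts(level_dbm):
    """Convert a power level in dBm to watts."""
    return 1e-3 * 10.0 ** (level_dbm / 10.0)


if __name__ == "__main__":
    print(f"{average_power_w(2.5e-9, 100e3) * 1e6:.0f} uW at 100 kbps")
    print(f"{dbm_to_watts(-99.0):.3e} W at -99 dBm")
```

With a constant energy per bit, average power scales linearly with data rate, which is the practical benefit of duty-cycling the receiver.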







- [1] B.P. Otis, Y.H. Chee, and J. Rabaey, "A 400 µW-RX, 1.6mW-TX superregenerative transceiver for wireless sensor networks," presented at IEEE International Solid-State Circuits Conference, Feb. 2005.
- [2] B. Cook, A. Berny, A. Molnar, S. Lanzisera, and J. Pister, "An ultra-low power 2.4GHz RF transceiver for wireless sensor networks in 0.13µm CMOS with 400mV supply and an integrated passive RX front-end," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, San Francisco, CA, Feb. 2006, pp. 1460-1469.
- [3] J. Ryckaert, M. Badaroglu, V.D. Heyn, G.V. der Plas, P. Nuzzo, A. Baschirotto, S. D'Amico, C. Desset, H. Suys, M. Libois, B.V. Poucke, P. Wambacq, and B. Gyselinckx, "A 16mA UWB 3-to-5GHz 20Mpulses/s quadrature analog correlation receiver in 0.18µm CMOS," in IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, Feb. 2006, pp. 368-377.
- [4] T. Aytur, H.-C. Kang, R. Mahadevappa, M. Altintas, S. ten Brink, T. Diep, C.-C. Hsu, F. Shi, F.-R. Yang, C.-C. Lee, R.-H. Yan, and B. Razavi, "A fully integrated UWB PHY in 0.13µm CMOS," in IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, Feb. 2006, pp. 418-427.
- [5] IEEE 802.15.4a task group website, 2006. [Online]. Available: http://www.ieee802.org/15/pub/TG4a.html
- [6] F.S. Lee and A.P. Chandrakasan, "A 2.5 nJ/bit 0.65 V 3 to 5 GHz subbanded UWB receiver in 90 nm CMOS," in IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, Feb. 2007, pp. 116-117.

# Pulsed UWB Transceiver for Small Lightweight Flying Vehicles

D.C. Daly, M. Bhardwaj, P. Mercier, A.P. Chandrakasan Sponsorship: DARPA, NSERC

Ultra-wideband (UWB) technology has recently gained popularity for low-power, low-data-rate wireless links [1-2]. In January 2007, an amendment to the low-power IEEE 802.15.4 standard was approved that adds support for an alternate, UWB physical layer. The UWB physical layer supports scalable data rates from kbps to Mbps, distances up to 100m, and both non-coherent and coherent signaling. The signaling scheme includes pulse-position modulation (PPM) combined with BPSK pulse bursting. Figure 1 presents a time-domain waveform of the 802.15.4a signaling scheme, in which multiple pulses are BPSK-modulated in a short burst during a single PPM time slot.

Our target wireless application is a small, lightweight flying vehicle. The flying vehicle must be able to communicate wirelessly up to 100 meters at a data rate of tens to hundreds of kbps. As the vehicle is miniature, power consumption, volume, and weight must all be minimized. In such systems, the transceiver must be highly integrated with few if any off-chip components. Non-coherent UWB signaling is used to relax the frequency accuracy requirements of RF circuit blocks, thereby allowing for a highly integrated, low-power implementation. Figure 2 presents the proposed UWB transceiver architecture. The receiver consists of a windowed energy detector and the transmitter consists of an all-digital pulse generator followed by a power amplifier. The lack of phase information associated with non-coherent signaling makes synchronization more challenging, leading to longer preambles. We are designing codes and algorithms to minimize this penalty.
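A windowed energy detector for binary PPM can be sketched behaviorally as follows. This is a minimal model, not the receiver described here: it assumes two equal-length time slots per symbol, perfect synchronization, and a hypothetical test signal generator with Gaussian noise. All names and parameter values are illustrative.

```python
import math
import random


def detect_ppm_bit(samples, slot_len):
    """Decide a binary PPM bit by comparing the energy in two consecutive windows."""
    energy_early = sum(s * s for s in samples[:slot_len])
    energy_late = sum(s * s for s in samples[slot_len:2 * slot_len])
    return 1 if energy_late > energy_early else 0


def make_ppm_symbol(bit, slot_len, pulse_len=8, amplitude=1.0, noise_std=0.1, rng=None):
    """Build one noisy symbol with a short sinusoidal burst in slot 0 or slot 1."""
    if rng is None:
        rng = random.Random(0)
    samples = [rng.gauss(0.0, noise_std) for _ in range(2 * slot_len)]
    start = bit * slot_len
    for n in range(pulse_len):
        samples[start + n] += amplitude * math.sin(2.0 * math.pi * n / 4.0)
    return samples


if __name__ == "__main__":
    rng = random.Random(1)
    sent = [0, 1, 1, 0, 1, 0, 0, 1]
    received = [detect_ppm_bit(make_ppm_symbol(b, 32, rng=rng), 32) for b in sent]
    print(sent, received)
```

Because the decision uses only energy, the carrier phase of the burst is irrelevant, which is what relaxes the frequency accuracy requirement; the same property is why the detector cannot use phase to help with synchronization.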







▲ Figure 2: Proposed 802.15.4a transceiver consisting of a burst-mode UWB transmitter and a non-coherent, energy-detection receiver.

- [1] F.S. Lee and A.P. Chandrakasan, "A 2.5nJ/b 0.65V 3-to-5GHz subbanded UWB receiver in 90nm CMOS," in Proc. IEEE International Solid-State Circuits Conference, San Francisco, CA, Feb. 2007, pp. 116-117.
- [2] D.D. Wentzloff and A.P. Chandrakasan, "A 47pJ/pulse 3.1-to-5GHz all-digital UWB transmitter in 90nm CMOS," in Proc. IEEE International Solid-State Circuits Conference, San Francisco, CA, Feb. 2007, pp. 118-119.

# Reaching the Optimal Mixed-signal Energy Point

B.P. Ginsburg, A.P. Chandrakasan Sponsorship: NDSEG Fellowship, DARPA

Ultra-wideband radio can be used for very high data rate (≥480 Mb/s) communication over short distances. For proper reception, the receiver requires a 500 MS/s analog-to-digital converter (ADC) with 4 bits of resolution. While flash is the typical architecture chosen, successive approximation register (SAR) ADCs offer lower circuit complexity with similar amenability to deep submicron CMOS. Time-interleaving multiple SAR ADCs allows this long-latency architecture to reach the throughput necessary for UWB reception [1]. A comparative energy model concludes that SAR should outperform flash above 5 bits of resolution in 0.18µm CMOS, with its energy advantage improving in more advanced technologies [2]. The SAR architecture is particularly well suited to meet the challenges of design in deep submicron CMOS, including reduced voltage supplies, increased variability, and lower transistor output impedances. It uses only open-loop amplification in a comparator, as opposed to the operational amplifier required by the pipelined architecture. There is significant digital complexity on the critical path in a SAR converter, but digital power and speed directly benefit from the reduced feature sizes.

A prototype 500-MS/s, 5-b, 6-way time-interleaved SAR ADC [3] has been designed and fabricated in Texas Instruments' 65nm CMOS process. The prototype uses a split capacitor array, which conserves charge between bit-cycles to lower the overall switching energy and settles faster because fewer capacitors switch during each period. The array also exhibits improved differential nonlinearity, as seen in Figure 1, with the same integral nonlinearity, which relaxes the capacitor matching required to avoid missing codes. The ADC achieves Nyquist performance and consumes 6 mW from a 1.2 V supply. Its die photo is shown in Figure 2.
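The successive-approximation search and the effect of time-interleaving on the per-channel rate can be sketched behaviorally. The model below is idealized and unipolar: it uses an ideal DAC and comparator and does not model the split capacitor array or any switching energy. Function names are hypothetical.

```python
def sar_convert(v_in, v_ref, num_bits):
    """Idealized unipolar SAR conversion: binary search from MSB to LSB."""
    code = 0
    for bit in range(num_bits - 1, -1, -1):
        trial = code | (1 << bit)
        # Ideal DAC output for the trial code.
        v_dac = v_ref * trial / (1 << num_bits)
        # One comparator decision per bit-cycle keeps or clears the trial bit.
        if v_in >= v_dac:
            code = trial
    return code


def per_channel_rate(aggregate_rate_sps, num_channels):
    """Sample rate each interleaved channel must sustain."""
    return aggregate_rate_sps / num_channels


if __name__ == "__main__":
    print(sar_convert(0.5, 1.0, 5))
    print(f"{per_channel_rate(500e6, 6) / 1e6:.1f} MS/s per channel")
```

Each conversion takes one comparator decision per bit, which is the latency that interleaving hides: six channels at roughly 83 MS/s each deliver the 500-MS/s aggregate rate.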

An understated trend in ADC design is the significant impact of digital circuitry on performance and power, particularly at medium to low resolutions. Even in this advanced technology, half of the total ADC power is consumed by the digital circuitry. A genuine mixed-signal energy optimization that explicitly includes all of the analog and digital blocks, and their interactions, is being developed to minimize the power consumption of this ADC. The optimization uses coupled energy and behavioral models to explicitly define the analog/digital interactions and tradeoffs.



▲ Figure 1: Behavioral simulations comparing the static linearity of the conventional and split capacitor arrays.



▲ Figure 2: Die photograph of 65-nm CMOS ADC. Total die area is 1.2 x 1.8 mm<sup>2</sup>.

- [1] D. Draxelmayr, "A 6b 600MHz, 10mW ADC array in digital 90nm CMOS," in Proc. IEEE International Solid-State Circuits Conference, San Francisco, CA, Feb. 2004, pp. 264-265.
- [2] B.P. Ginsburg and A.P. Chandrakasan, "Dual time-interleaved successive approximation register ADCs for an ultra-wideband receiver," IEEE Journal of Solid-State Circuits, vol. 42, no. 2, pp. 247-257, Feb. 2007.
- [3] B.P. Ginsburg and A.P. Chandrakasan, "500-MS/s 5-bit ADC in 65-nm CMOS with split capacitor array DAC," IEEE Journal of Solid-State Circuits, vol. 42, no. 4, pp. 739-747, Apr. 2007.

# 18Gb/s Optical IO: VCSEL Driver and TIA in 90-nm CMOS

A. Kern, I. Young, A.P. Chandrakasan Sponsorship: SRC/FCRP IFC, Intel Corporation, NSF

Electrical IO is becoming limited by copper interconnect channel losses that depend on frequency and distance. Package-to-package optical interconnect sees negligible frequency-dependent channel losses, but data rates are limited by the intrinsic optical dynamics and electrical parasitics of the optical devices. This abstract summarizes the results of [1] and [2], including a pre-emphasis VCSEL driver and a cross-coupled cascode transimpedance amplifier (TIA) that apply circuit techniques to operate optical components beyond the intrinsic data rates imposed by these bandwidth limits. Die photographs of the fabricated VCSEL driver and TIA are shown in Figure 1A and Figure 1B, respectively.

The presented VCSEL driver operates a standard commercial GaAs VCSEL at 18Gb/s by using pre-emphasis to compensate for the large capacitance and intrinsic optical dynamics of the VCSEL. The driver derives timing information directly from the full-rate input data and generates pre-emphasis pulses with width resolution less than one bit period in a manner that is compatible with full-rate IO architectures. The VCSEL is modulated with the summed output of two current-mode drivers, where the output of the second driver is delayed, inverted, and attenuated with respect to the first. DACs and a digital delay line provide digital control of the pre-emphasis pulse height and pulse width. Optical measurements shown in Figure 2A and Figure 2B demonstrate that pre-emphasis improves the vertical eye opening by 122% and the horizontal eye opening by 76% for modulation from 2mA to 10mA.
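The summing of a main driver with a delayed, inverted, attenuated copy can be illustrated with a sampled waveform model. This is a behavioral sketch with normalized levels, not the driver circuit: the oversampling factor, delay, and tap weight are hypothetical parameters standing in for the delay-line and DAC settings.

```python
def pre_emphasize(bits, samples_per_bit, delay_samples, tap_weight,
                  low_level=0.0, high_level=1.0):
    """Sum a main waveform with a delayed, inverted, attenuated copy of itself."""
    main = []
    for b in bits:
        main.extend([high_level if b else low_level] * samples_per_bit)
    out = []
    for n, x in enumerate(main):
        # Before the first delayed sample is available, hold the initial level.
        delayed = main[n - delay_samples] if n >= delay_samples else main[0]
        out.append(x - tap_weight * delayed)
    return out


if __name__ == "__main__":
    wave = pre_emphasize([0, 1, 1], samples_per_bit=4, delay_samples=2, tap_weight=0.25)
    print(wave)
```

After each transition the output overshoots its settled level for `delay_samples` samples, so the delay sets the pre-emphasis pulse width (here half a bit period) and the tap weight sets its height.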

The differential TIA is based on a proposed core amplifier that uses cross-coupled NMOS cascodes to increase gain and bandwidth. A proposed symmetric feedback method provides near-constant gain from DC to 9GHz in the differential TIA when used in conjunction with a DAC to perform DC offset cancellation. Measurements and simulations demonstrate that the TIA has the required gain and bandwidth to operate at 12.5Gb/s with 260fF input capacitance (Figure 2C) and 18Gb/s with 90fF input capacitance (Figure 2D) for an input current of 200µA.



▲ Figure 1: Die photographs of the fabricated pre-emphasis VCSEL driver (A) and cross-coupled cascode transimpedance amplifier (B).



▲ Figure 2: Measured eye diagrams of the VCSEL driver and transimpedance amplifier. For the VCSEL driver, pre-emphasis (A) improves the original eye (B). The TIA operates at 12.5Gb/s with 260fF (C) and 18Gb/s with 90fF (D) of input capacitance.

- [1] A. Kern, A. Chandrakasan, and I. Young, "18Gb/s optical IO: VCSEL driver and TIA in 90-nm CMOS," presented at VLSI Symposium, Kyoto, Japan, June 2007.
- [2] A. Kern, "CMOS circuits for VCSEL-based optical IO," Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, 2007.

# Reconfigurable Zero-crossing-based Analog Circuits

P. Lajevardi, A.P. Chandrakasan, H.-S. Lee Sponsorship: CICS, DARPA

Switched-capacitor circuits can be used to implement many analog systems such as ADCs, DACs, filters, amplifiers, and integrators. In this research, a reconfigurable switched-capacitor system is proposed to implement different analog systems using the same building blocks. Figure 1 shows the block diagram of the system. Each switched-capacitor block can implement an integrator or a multiplier with a reconfigurable coefficient. Such a system will be useful for software-defined radios and fast prototyping of analog circuits.
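At the behavioral level, a block that can be configured as an integrator or a multiplier with a programmable coefficient can be modeled in discrete time as below. This is only an illustrative sketch: the class name and interface are hypothetical, and the coefficient stands in for whatever capacitor ratio the hardware would program.

```python
class ConfigurableBlock:
    """Discrete-time behavioral model of one reconfigurable analog block."""

    def __init__(self, mode, coefficient):
        if mode not in ("integrator", "multiplier"):
            raise ValueError("mode must be 'integrator' or 'multiplier'")
        self.mode = mode
        self.coefficient = coefficient
        self.state = 0.0  # accumulated output, used only in integrator mode

    def step(self, x):
        """Process one input sample and return the block output."""
        if self.mode == "multiplier":
            return self.coefficient * x
        self.state += self.coefficient * x
        return self.state


if __name__ == "__main__":
    integrator = ConfigurableBlock("integrator", 0.5)
    print([integrator.step(1.0) for _ in range(3)])
    scaler = ConfigurableBlock("multiplier", 2.0)
    print(scaler.step(3.0))
```

Chaining such blocks, with the output of one feeding the input of the next, is enough to express simple filters, which is the sense in which the same building blocks can realize different systems.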

The design of such systems has not been practical since switched-capacitor circuits are op-amp-based. The design of reconfigurable switched-capacitor blocks with op-amps is very challenging if a wide range of speed, accuracy, signal-to-noise ratio (SNR), and power consumption is to be covered. Many different op-amp topologies may be required to cover a large performance and configuration space. While new technology nodes provide transistors with higher $f_T$, the design of op-amps is becoming more challenging as the supply voltage and intrinsic gain of transistors are decreasing. Recently, [1] and [2] proposed zero-crossing circuits to design ADCs. Zero-crossing circuits can replace the op-amp in traditional switched-capacitor design with a combination of a current source and a zero-crossing detector. The power consumption of zero-crossing-based analog circuits scales according to the operating frequency and required SNR. Zero-crossing circuits are used to implement the reconfigurable analog blocks needed for this research. The system can operate at different speeds and SNR requirements while the power consumption is kept at the optimum level.



▲ Figure 1: Block diagram of reconfigurable zero-crossing-based analog circuits. Each configurable analog block can be programmed to perform an integration or multiplication.

- [1] T. Sepke, J. Fiorenza, C.G. Sodini, P. Holloway, and H.-S. Lee, "Comparator-based switched-capacitor circuits for scaled CMOS technologies," in IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, Feb. 2006, pp. 220-221.
- [2] L. Brooks and H.-S. Lee, "A zero-crossing-based 8b 200MS/s pipeline ADC," in IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, Feb. 2007, pp. 460-461.

# Design and Characterization of CNT-CMOS Hybrid Systems

T.S. Cho, K.-J. Lee, T. Pan, J. Kong, A.P. Chandrakasan Sponsorship: SRC/FCRP IFC, Intel

Carbon nanotubes (CNTs) are nanometer-diameter cylinders formed from rolled-up graphene sheets [1]. CNTs have attracted widespread interest due to their excellent electrical properties. In particular, the low density and high electron mobility of CNTs make them attractive for electronic applications. Our investigation of hybrid CMOS-CNT systems attempts to take advantage of the superior properties of CNTs while building on top of existing CMOS technology.

We propose an integrated chemical sensor system to verify the concept of a CNT-CMOS hybrid system design. The CNT changes its conductance when exposed to certain chemicals, and thus we can effectively use CNTs as resistive chemical sensors [2]. Room-temperature operation of the CNT sensors makes them an appealing candidate for low-power chemical sensor application.

However, poor control over the local and global variation of CNT devices, the resolution requirements in resistance measurements, and the changes in resistance due to specific chemicals imply a large dynamic range in the front-end circuitry. We investigate energy-efficient architectures to accommodate this specification (Figure 1). Chip fabrication is done by National Semiconductor.

Another system of interest is a DC-DC power converter circuit. The near-ballistic transport behavior [3] of CNTFETs makes them potential energy-efficient candidates for power applications. In addition, the power transistor size could be greatly reduced if CNTFETs can replace the CMOS power transistors and the CNTs are aligned. Currently, we are looking into ways to model CNTFET behavior and fabricating CNT devices that can support large currents (Figure 2).



▲ Figure 1: Diagram of CMOS interface. The interface architecture includes on-chip calibration functionality. This interface chip and CNT sensors are integrated at the PCB level.



▲ Figure 2: Device schematic of a massively distributed CNTFET. Bundles of CNTs are fabricated to support large currents. Additional chemical or electrical treatment may be required to eliminate metallic CNTs.

- [1] S. Iijima, "Helical microtubules of graphitic carbon," Nature, vol. 354, pp. 56-58, Nov. 1991.
- [2] J. Kong, N. Franklin, C. Zhou, M. Chapline, S. Peng, K. Cho, and H. Dai, "Nanotube molecular wires as chemical sensor," Science, vol. 287, no. 5453, pp. 622-625, Jan. 2000.
- [3] A. Javey, J. Guo, D.B. Farmer, Q. Wang, E. Yenilmez, R.G. Gordon, M. Lundstrom, and H. Dai, "Self-aligned ballistic molecular transistors and electrically parallel nanotube arrays," *Nano Letters*, vol. 4, no. 7, pp. 1319-1322, June 2004.

# A Piecewise-linear Moment-matching Approach to Parameterized Model Order Reduction for Highly Nonlinear Systems

B. Bond, L. Daniel Sponsorship: SRC/FCRP GSRC, NSF, DARPA

The automatic extraction of parameterized macromodels for modern mixed-signal Systems-on-Chip is an extremely challenging task due to the presence of several nonlinear analog circuits and Micro-Electro-Mechanical (MEM) components. The ability to generate Parameterized Reduced Order Models (PROM) of nonlinear dynamical systems could serve as a first step toward the automatic and accurate characterization of geometrically complex components and sub-circuits, eventually enabling their synthesis and optimization.

Our approach to this problem combines elements of a non-parameterized trajectory piecewise linear method [1] for nonlinear systems with a moment matching parameterized technique [2] for linear systems. By building on these two existing methods, we have created an algorithm for generating PROMs for nonlinear systems. The algorithms were tested on three different systems: a MEM switch, shown in Figure 1, and two nonlinear analog circuits. All of the examples contain distributed strong nonlinearities and possess some dependence on several geometric parameters.
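The trajectory piecewise-linear idea can be illustrated on a scalar function: the nonlinear function is replaced by a weighted blend of its linearizations at a set of sample points, with weights that strongly favor the nearest point. The sketch below covers only that blending step. It performs no order reduction and no parameter moment matching, and the exponential weighting with `beta = 25` is an assumption chosen for illustration rather than the weighting used in this work.

```python
import math


def tpwl_eval(f, df, points, x, beta=25.0):
    """Approximate f(x) by a weighted blend of linearizations taken at `points`."""
    dists = [abs(x - p) for p in points]
    d_min = min(dists)
    if d_min == 0.0:
        # x coincides with a linearization point: use that model alone.
        weights = [1.0 if d == 0.0 else 0.0 for d in dists]
    else:
        weights = [math.exp(-beta * d / d_min) for d in dists]
    total = sum(weights)
    return sum(w / total * (f(p) + df(p) * (x - p))
               for w, p in zip(weights, points))


if __name__ == "__main__":
    cubic = lambda v: v ** 3
    cubic_slope = lambda v: 3.0 * v ** 2
    for x in (0.0, 0.45, 0.9, 1.0):
        print(x, tpwl_eval(cubic, cubic_slope, [-1.0, 0.0, 1.0], x), cubic(x))
```

The approximation is exact at the linearization points and degrades between them, which is why the choice of sample points (and, in the parameterized case, of sampled parameter values) matters for accuracy.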

In addition, we have proposed a model-construction procedure in which we approximate the system sensitivity to parameters of interest for the purpose of efficiently sampling important regions of the parameter space. Figure 2 shows the output of one PROM created for the example in Figure 1 and compared to the field solver output of the full nonlinear system at several parameter values. Typical PROMs constructed in this manner can be accurately reduced in size by a factor of 10, yielding a speedup of a factor of 10 in general. For further details on parameter-space accuracy and cost of the algorithms, see [3].



▲ Figure 1: Application example: MEM switch realized by a polysilicon beam fixed at both ends and suspended over a semiconducting pad and substrate expansion



▲ Figure 2: Center point deflection predicted by our parameterized reduced model (crosses) at a series of parameter values, compared to a finite difference detailed simulation (solid lines).

- [1] M. Rewienski and J.K. White, "A trajectory piecewise-linear approach to model order reduction and fast simulation of nonlinear circuits and micromachined devices," in *Proc. IEEE/ACM International Conference on Computer Aided-Design*, San Jose, CA, Nov. 2001, pp. 252-257.
- [2] L.Daniel, C.S. Ong, S.C. Low, K.H. Lee, and J.K. White, "A multi-parameter moment-matching model reduction approach for generating geometrically parameterized interconnect performance models," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, vol. 23, no. 5, pp. 678-693, May 2004.
- [3] B. Bond and L. Daniel, "Parameterized model order reduction of nonlinear dynamical systems," in Proc. IEEE Conference on Computer-Aided Design, San Jose, CA, Nov. 2005, pp. 487-494.

# A Quasi-convex Optimization Approach to Parameterized Model-order Reduction

K.C. Sou, L. Daniel, A. Megretski

Sponsorship: SRC/FCRP GSRC, Semiconductor Research Corporation, NSF

This work proposes an optimization-based model order reduction (MOR) framework [1]. The method involves setting up a quasi-convex program that explicitly minimizes a relaxation of the optimal H-infinity norm MOR problem. The method generates guaranteed stable and passive reduced models and it is very flexible in imposing additional constraints. The proposed optimization approach is also extended to a parameterized model reduction problem (PMOR). The proposed method is compared to existing moment-matching and optimization-based MOR methods in several examples. For example, a 32<sup>nd</sup>-order parameterized reduced model has been constructed for a 7-turn RF inductor with substrate (infinite order) and the error of quality factor matching was less than 5% for all design parameter values of interest.



▲ Figure 1: A 7-turn RF inductor for which a parameterized (with respect to wire width and wire separation) reduced model has been constructed.



▲ Figure 2: Matching of quality factor of 7-turn RF inductor when wire width = 16.5 µm and wire separation = 1, 5, 18, and 20 µm. Blue dashed line: full model. Red solid line: ROM.

## REFERENCES

[1] K. Sou, A. Megretski, and L. Daniel, "A quasi-convex optimization approach to parameterized model order reduction," in *Proc. IEEE/ACM Design Automation Conference*, Anaheim, CA, June 2005, pp. 933-938.

# Open-loop Digital Predistortion Using Cartesian Feedback for Adaptive RF Power Amplifier Linearization

S. Chung, J.W. Holloway, J.L. Dawson Sponsorship: SRC/FCRP C2S2, KOSEF

This work focuses on implementing a new wideband linearization technique for RF power amplifiers (PAs). Linearization is necessary for PAs because they consume the bulk of the power in most transmitter chains. The efficiency improvement offered by linearization translates into significant overall power savings in mobile broadband streaming video systems. Our technique combines the best of two established RF PA linearization techniques: digital predistortion and Cartesian feedback. We get the modeling simplicity of Cartesian feedback, combined with the wideband capability of digital predistortion. Substantial improvement in the PA output spectrum and adjacent channel power ratio can be achieved with little increase in power and die area.

Cartesian feedback is an analog technique, giving the ability to continuously linearize a PA without extensive knowledge of PA characteristics [1], but the bandwidth of classical Cartesian feedback systems is severely limited by the need for a surface-acoustic wave filter. Digital predistortion is an inherently open-loop technique and thus does not suffer from the bandwidth limitation. Nevertheless, digital predistortion requires detailed modeling of a PA and cannot cope with the drift of the PA characteristics [2]. Our technique uses a slow Cartesian feedback loop to train a Cartesian look-up table predistorter to be used for digital predistortion [3], characterizing a PA over the input symbol constellation.
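The training idea can be sketched numerically: for each constellation symbol, a slow integrating loop adjusts the PA drive until the PA output matches the symbol, and the settled drive is stored in a look-up table for later open-loop use. Everything specific in the sketch is an assumption made for illustration, including the memoryless PA model, the loop gain, the iteration count, and the constellation scaling.

```python
import cmath


def pa_model(u):
    """Hypothetical memoryless PA: gain compression plus amplitude-dependent phase."""
    power = abs(u) ** 2
    return u * (1.0 - 0.2 * power) * cmath.exp(1j * 0.3 * power)


def train_lut(symbols, pa, loop_gain=0.5, iterations=200):
    """Let a slow integrating loop settle for each symbol and store the PA drive."""
    lut = {}
    for target in symbols:
        drive = target
        for _ in range(iterations):
            # Integrate the Cartesian (I/Q) error between target and PA output.
            drive += loop_gain * (target - pa(drive))
        lut[target] = drive
    return lut


def qam16(scale=0.5):
    """16-QAM constellation points, scaled to stay below PA saturation."""
    levels = (-1.0, -1.0 / 3.0, 1.0 / 3.0, 1.0)
    return [scale * complex(i, q) for i in levels for q in levels]


if __name__ == "__main__":
    table = train_lut(qam16(), pa_model)
    corner = qam16()[-1]
    print("uncorrected error:", abs(pa_model(corner) - corner))
    print("predistorted error:", abs(pa_model(table[corner]) - corner))
```

Once the table is filled, transmission is open loop: each symbol is replaced by its stored drive value, so the symbol rate is no longer bounded by the loop bandwidth, and retraining can track slow drift in the PA.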

We designed and implemented a 900-MHz RF transmitter with a class-A PA using discrete modules (Figure 1). A measured spectrum of 16-QAM signals having 40 MHz bandwidth (Figure 2) shows approximately 10 dB linearity enhancement and a good noise floor. The measured channel power is 27 dBm. The 1-dB compression point of the PA is 26.5 dBm. The implemented transmitter achieved a 2.3-W reduction in power consumption. Our prototype provides linearization at symbol rates over two orders of magnitude higher than is possible with conventional analog feedback.



▲ Figure 1: Cartesian feedback for predistortion.



▲ Figure 2: Power amplifier output spectrum for 40-MHz bandwidth 16-QAM signals.

- [1] J.L. Dawson and T.H. Lee, "Automatic phase alignment for a fully integrated Cartesian feedback power amplifier system," IEEE Journal of Solid-State Circuits, vol. 38, no. 12, pp. 2269-2279, Dec. 2003.
- [2] K.J. Muhonen, M. Kavehrad, and R. Krishnamoorthy, "Look-up table techniques for adaptive digital predistortion: A development and comparison," IEEE Transactions on Vehicular Technology, vol. 49, no. 5, pp. 1995-2001, Sep. 2000.
- [3] S. Chung, J.W. Holloway, and J.L. Dawson, "Open-loop digital predistortion using Cartesian feedback for adaptive RF power amplifier linearization," IEEE MTT-S International Microwave Symposium Digest, June 2007.

# Wideband Two-point Modulators for Multi-standard Transceivers

S. Rayanakorn, J.L. Dawson Sponsorship: SRC/FCRP C2S2

Two-point modulators are a fundamental building block in polar transmitters, which have the potential to accommodate multiple wireless standards. A primary challenge for polar transmitters, however, is that they demand large baseband bandwidths compared to their Cartesian counterparts. To put polar transmitters into use, the separate amplitude and phase paths have to be extremely broadband. This project addresses the need on the phase path.

The two-point modulator, used to perform phase modulation, is a phase-locked loop (PLL) with two inputs (Figure 1). The input data through the first path is low-pass filtered to the output by the closed-loop transfer function of the PLL. If this terminal were the only input, the speed of the PLL would therefore limit the achievable data rate. However, in a two-point modulator, data injected into the second path is high-pass filtered to the output. The corner frequency of this high-pass filter is exactly equal to the low-pass corner of the PLL's closed-loop transfer function. In theory, the bandwidth of a two-point modulator is therefore unbounded. However, nonlinearity in the voltage-controlled oscillator (VCO) (Figure 2) is a barrier to realizing this potential of two-point modulators. The high-pass second path does not benefit from the linearized VCO tuning characteristic that the PLL provides for the first path. If this nonlinearity goes uncorrected, a wideband two-point modulator can introduce significant phase error.
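The complementary filtering argument can be checked with a simple frequency-domain model. The sketch assumes first-order low-pass and high-pass responses sharing one corner frequency, which is a simplification of a real PLL, and uses a scalar gain error on the second path as a crude stand-in for an uncorrected VCO characteristic. Names and values are hypothetical.

```python
def two_point_response(freq_hz, corner_hz, highpass_gain=1.0):
    """Combined response of matched low-pass and high-pass paths at one frequency."""
    s_norm = 1j * freq_hz / corner_hz
    lowpass = 1.0 / (1.0 + s_norm)                     # first path, through the PLL
    highpass = highpass_gain * s_norm / (1.0 + s_norm) # second path, into the VCO
    return lowpass + highpass


if __name__ == "__main__":
    for f in (1e3, 1e5, 1e7, 1e9):
        ideal = two_point_response(f, 1e5)
        mismatched = two_point_response(f, 1e5, highpass_gain=0.9)
        print(f"{f:.0e} Hz: ideal |H| = {abs(ideal):.3f}, "
              f"10% gain error |H| = {abs(mismatched):.3f}")
```

With matched paths the sum is flat at all frequencies, which is the sense in which the bandwidth is unbounded; any error in the second path shows up directly above the PLL corner, where the loop no longer corrects it.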

Adaptive digital predistortion, a linearization technique commonly applied to RF power amplifiers, is a promising solution. Recent work using analog feedback to train a predistorter has been shown to enable dramatic bandwidth extensions for Cartesian feedback power amplifiers [1]. With this same principle and the observation that the PLL continuously performs VCO linearization, a predistortion block is added in the second data path. The introduction of this predistortion circuit will eliminate the phase error, and it therefore enables the two-point modulator to function as a truly broadband phase path in polar transmitters.



▲ Figure 1: Classical two-point modulator.



▲ Figure 2: Voltage-controlled oscillator nonlinear tuning characteristic (frequency vs. control voltage).

## REFERENCES

[1] S. Chung, J.W. Holloway, and J.L. Dawson, "Open-loop digital predistortion using Cartesian feedback for adaptive RF power amplifier linearization," IEEE MTT-S International Microwave Symposium, Honolulu, HI, June 2007.

# An Ultra-low Power CMOS RF Transceiver for Medical Implants

J. Bohorquez, J.L. Dawson, A.P. Chandrakasan Sponsorship: The Lemelson Foundation Presidential Fellowship

Until recently, few medical implantable devices existed and fewer still provided the capability for wireless transmission of information. Most devices capable of data transmission did so through inductive coupling, which requires physical contact with the base-station and allows for only low data rates. In 1999, the FCC created the Medical Implant Communications Service (MICS) band in the range of 402-405 MHz specifically for medical telemetry [1]. The MICS band plan allows for RF communication between a medical implant and a base-station that is up to two meters away. This research seeks to design a transceiver specifically optimized for low-power, short-distance data transmission in a temperature-regulated environment, i.e., the human body. We do this by pushing as much complexity as possible out of the implant and into the base-station, taking advantage of the attributes of the environment, such as temperature control and slow transients, and incorporating the antenna into the oscillator for reduced power and improved performance. By optimizing the transceiver for reduced volume and power, we hope to extend the battery lifetime and functionality of medical implants for greater comfort and benefits to patients.

Figure 1a shows a conventional direct up-conversion transmitter that comprises a digital baseband, digital-to-analog converters (DAC), low-pass filters, up-conversion mixers, an I/Q phase generator, a power amplifier (PA), a frequency synthesizer, and an antenna. We propose a much simpler, almost all-digital implementation (see Figure 1b) composed of a digital baseband and a digitally controlled oscillator (DCO). Instead of direct I/Q up-conversion, we propose using minimum frequency-shift keying to directly modulate the DCO with baseband information. We exploit the inherent temperature regulation of the human body and the lax frequency stability requirements of the MICS standard to replace the frequency synthesizer with a much slower frequency-control loop, which incorporates the base-station. Furthermore, we create a linear digital-to-frequency converter by using predistorted capacitor banks for coarse and fine frequency tuning. Instead of driving the antenna with a matched PA, we exploit the low radiation power requirement to incorporate a loop antenna into the DCO. The inherently high Q of the antenna leads to improved noise performance for a given amount of power. Figure 2 shows the differential Clapp DCO including coarse- and fine-tuning capacitor banks, a circuit model for the loop antenna, and a cross-coupled pair of transistors for power reduction [2].
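Why the capacitor banks must be predistorted can be seen from the resonance formula. The sketch below assumes a simple LC tank with f = 1/(2π√(LC)) rather than the full Clapp network, and a hypothetical 50-nH antenna inductance; it computes the capacitance each code needs so that the frequency steps are uniform.

```python
import math


def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def predistorted_capacitances(inductance_h, f_start_hz, f_stop_hz, num_codes):
    """Capacitance per code such that successive codes step frequency uniformly."""
    step = (f_stop_hz - f_start_hz) / (num_codes - 1)
    caps = []
    for code in range(num_codes):
        f = f_start_hz + code * step
        caps.append(1.0 / (inductance_h * (2.0 * math.pi * f) ** 2))
    return caps


if __name__ == "__main__":
    inductance = 50e-9  # hypothetical loop-antenna inductance
    caps = predistorted_capacitances(inductance, 402e6, 405e6, 4)
    for code, c in enumerate(caps):
        f = resonant_frequency_hz(inductance, c)
        print(f"code {code}: C = {c * 1e12:.4f} pF, f = {f / 1e6:.3f} MHz")
```

Because frequency varies as the inverse square root of capacitance, equal frequency steps require unequal capacitance steps; a uniformly weighted bank would give a nonlinear code-to-frequency characteristic.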







▲ Figure 2: Differential digitally controlled Clapp oscillator topology including a simple antenna model, coarse- and fine-tuning capacitor banks, and a cross-coupled transistor pair for current switching.

- [1] FCC Rules and Regulations, "MICS Band Plan," Part 95, Jan. 2003.
- [2] R. Aparicio and A. Hajimiri, "A CMOS differential noise-shifting Colpitts VCO," in Proc. IEEE International Solid-State Circuits Conference, San Francisco, CA, Feb. 2002, pp. 288-289.

# An Integrated Circuit Capable of Rapid Multi-frequency Measurements and a Reconfigurable Electrode Array for Use in Anisotropic Electrical Impedance Myography

O.T. Ogunnika, M. Scharfstein, J.L. Dawson Sponsorship: CIMIT, NIH

Electrical impedance myography (EIM) is a noninvasive technique for neuromuscular assessment originally developed by Dr. Seward Rutkove [1] of Beth Israel Deaconess Medical Center and Drs. Ronald Aaron and Carl Shiffman of Northeastern University. The technique is capable of detecting degenerative neuromuscular diseases such as amyotrophic lateral sclerosis (Lou Gehrig's disease) and inclusion body myositis. In this technique, a low-intensity alternating current is applied to a muscle and the consequent surface voltage patterns are evaluated [2,4-5]. Although the current system is sufficient to prove the value of EIM, it is too slow and cumbersome, being made up of large discrete components, to achieve its full potential as a diagnostic medical tool.

This project consists of two main parts. For the first part, we will develop an IC that combines a spectrum analyzer and signal generator (Figure 1) capable of making rapid measurements between 100Hz and 10MHz. This will significantly miniaturize the system and reduce its power consumption, allowing us to build a handheld, battery-operated instrument. Our focus will be on greatly increasing measurement speed, which will enable the clinician to do dynamic muscle characterization that is impossible with current technology.

For the second part, we will develop a new, reconfigurable electrode array to make measurements at different orientations on a patient (Figure 2). The electrode array, consisting of many small electrodes combinable into larger virtual electrodes, needs to inject current into a patient's muscle, going through layers of skin and fat, as well as measure voltage precisely and be reusable in a single patient [3]. Through the development of a new electrode array and a new integrated circuit, our goal is to move EIM from a clinically proven concept to a widely applicable, inexpensive, and sophisticated diagnostic tool.
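One common way to measure impedance at a single frequency is to correlate the sampled voltage with a complex exponential at the excitation frequency and divide by the known current amplitude. The sketch below illustrates that approach on synthetic data. It is not a description of the planned IC, and the sample rate, test impedance, and function names are hypothetical.

```python
import cmath
import math


def complex_amplitude(samples, freq_hz, sample_rate_hz):
    """Phasor of a sampled tone, from a single-frequency correlation (one DFT bin).

    Exact when the record holds an integer number of cycles.
    """
    acc = 0j
    for k, v in enumerate(samples):
        acc += v * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
    return 2.0 * acc / len(samples)


def log_spaced_frequencies(f_start_hz, f_stop_hz, points):
    """Logarithmically spaced sweep frequencies, endpoints included."""
    ratio = (f_stop_hz / f_start_hz) ** (1.0 / (points - 1))
    return [f_start_hz * ratio ** i for i in range(points)]


if __name__ == "__main__":
    fs, f, n = 100e3, 1e3, 1000   # ten full cycles in the record
    current_amplitude = 1e-3      # known injected current, in amperes
    z_true = 50.0 * cmath.exp(-0.2j)
    voltage = [abs(z_true) * current_amplitude
               * math.cos(2.0 * math.pi * f * k / fs + cmath.phase(z_true))
               for k in range(n)]
    z_measured = complex_amplitude(voltage, f, fs) / current_amplitude
    print(abs(z_measured), cmath.phase(z_measured))
    print(log_spaced_frequencies(100.0, 10e6, 6))
```

Repeating the measurement over a logarithmic sweep gives the multi-frequency impedance spectrum, and the time per point is set by the record length, which is where measurement speed is won or lost.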



- [1] S. Rutkove, R. Aaron, and C. Shiffman, "Localized bio-impedance analysis in the evaluation of neuromuscular," *Muscle & Nerve*, pp. 390-397, Mar. 2002.
- [2] R. Aaron, M. Huang, and C. Shiffman, "Anisotropy of human muscle via non-invasive impedance measurements," Phys. Med. Biol., pp. 1245-1262, 1997.
- [3] L. Livshitz, J. Mizrahi, and P. Einziger, "Interaction of array of finite electrodes with layered biological tissue: Effect of electrode size and configuration," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 9, no. 4, pp. 355-361, Dec. 2001.
- [4] G. Esper, C. Shiffman, R. Aaron, K. Lee, and S. Rutkove, "Assessing neuromuscular disease with multifrequency electrical impedance myography," *Muscle & Nerve*, vol 34, pp. 595-602, July 2006.
- [5] S. Rutkove, G. Esper, K. Lee, R. Aaron, and C. Shiffman, "Electrical impedance myography in the detection of radiculopathy," *Muscle & Nerve*, vol 32, pp. 335-341, Sept. 2005.

## Equation-based Hierarchical Optimization of a Pipelined ADC

T. Khanna, R. Sredojević, W. Sanchez, V. Stojanović, J.L. Dawson

Sponsorship: SRC/FCRP C2S2, CICS

Much work has been done within the optimization and circuit communities related to the optimization of individual circuit blocks [1-2]. Both equation-based and simulation-based optimization methods have enjoyed recent success for certain problems [3-4]. However, the best of these newest methods are still painfully overwhelmed by the sheer size of the design space typical of even modest-sized mixed-signal systems. Employing hierarchy is a natural way to cope with a large number of design variables, and many hierarchical approaches have been explored to simplify system level optimization [5-7]. Still, an efficient technique that brings value to the designer remains elusive.

We propose decomposing a large system into manageable circuit blocks and adopting a hierarchical, bottom-up (H-BU) approach. After efficient exploration of the design space, we generate Pareto-optimal curves for the smaller, less complex blocks. The tradeoffs between system design variables are quantitatively modeled as simple monomial functions. Unlike past works, we use simple models to take advantage of the gentle nature of the tradeoffs, thereby making the system formulation less complex and more solver-friendly. This work applies the H-BU method to a 10-stage pipeline ADC in a 0.18-µm CMOS process as proof of concept.
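To make the monomial modeling step concrete, the sketch below fits a single-variable monomial, y = c·x^a, to tradeoff data by ordinary least squares in log-log space, where a monomial becomes a straight line. The data points, the function name `fit_monomial`, and the restriction to one variable are assumptions for illustration; the models in this work relate several system variables at once.

```python
import math

def fit_monomial(xs, ys):
    """Fit y = c * x**a by linear least squares on (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (w - mean_y) for u, w in zip(lx, ly))
    a = sxy / sxx                      # exponent = slope in log-log space
    c = math.exp(mean_y - a * mean_x)  # coefficient = exp(intercept)
    return c, a

# Hypothetical Pareto points: stage power versus required slew rate.
slew_rates = [100.0, 200.0, 400.0, 800.0]
powers = [2e-3 * s ** 1.5 for s in slew_rates]
c, a = fit_monomial(slew_rates, powers)
print(c, a)
```

A monomial fit is attractive here because monomials are the native function class of geometric programming, so the fitted block models can be used directly in a system-level formulation.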

The pipeline was optimized to achieve a sampling frequency of 100 MHz with an SNR greater than 60 dB. Figures 1 and 2 plot the operating point of each stage in the pipeline against the Pareto surfaces. Both figures are plotted in the dimensions of the system design variables: Figure 1 plots the power against the slew rate, slewRate, and noise power, nPO, of each pipeline stage and Figure 2 plots power against the load capacitance,  $C_L$ , and settling time constant,  $\tau_{hold}$ , of each pipeline stage.



▲ Figure 1: Pareto surface and power of each pipeline stage in flat and hierarchical solutions against slewRate and nPO.



▲ Figure 2: Pareto surface and power of each pipeline stage in flat and hierarchical solutions against  $C_L$  and  $\tau_{hold}$ .

- [1] S. Boyd and L. Vandenberghe. (2004) Introduction to convex optimization with engineering applications. [Online]. Available: http://www.stanford.edu/~boyd/cvxbook/
- [2] E. Zitzler, "Evolutionary algorithms for multi-objective optimization: Methods and applications," PhD thesis, Swiss Federal Institute of Technology, Zurich, Switzerland, Nov. 1999.
- [3] J. Zou, D. Mueller, H. Graeb, and U. Schlichtmann, "A CPPLL hierarchical optimization methodology considering jitter, power and locking time," in Proc. Design Automation Conference, San Francisco, CA, pp. 19–24. July 2006.
- [4] M. Hershenson, S. Boyd, and T. Lee, "Optimal design of a CMOS opamp via geometric programming," *IEEE Transactions CAD*, vol. 20, pp. 1–21, Jan. 2001.
- [5] T. Eeckelaert, T. McConaghy, and G. Gielen, "Efficient multiobjective synthesis of analog circuits using hierarchical Pareto-optimal performance hypersurfaces," in Proc. of the 42nd Annual Conference on Design Automation and Test in Europe Conference, Anaheim, CA, June 2005, pp. 1070–1075.
- [6] T. Eeckelaert, R. Schoofs, G. Gielen, M. Steyaert, and W. Sansen, "Hierarchical bottom-up analog optimization methodology validated by a delta-sigma A/D converter design for the 802.11a/b/g standard," in Proc. Design Automation Conference, San Francisco, CA, July 2006, pp. 25–30.
- [7] F. Bernardinis, P. Nuzzo, and A. Vincentelli, "Robust system level design with analog platforms," presented at IEEE/ACM International Conference on Computer-Aided Design, 2006.

## A Hierarchical Bottom-up, Equation-based Optimization Design Methodology for RF Transceivers

W. Sanchez, J.L. Dawson

Sponsorship: Gates Millennium Scholars Graduate Fellowship

Over the last decade, the use of mixed-signal circuits on system-on-chip integrated circuits (ICs) has increased at a steady rate. The challenges associated with large-scale analog system-level exploration, including early-stage tradeoff analysis, create the bottleneck for mixed-signal system design. Equation-based and simulation-based optimization techniques have been predominant in the exploration of a circuit's design space. However, these tools are practical only for small electronic systems. We show that this need not be a limitation by establishing a general design methodology for large systems. By decomposing a system into smaller, less complex building blocks, thereby adopting a Hierarchical, Bottom-up (H-BU) approach, we keep the problem tractable, and the tradeoff space can be constructed in a piecewise manner up to the system level. This idea is illustrated via a transmitter segment in Figure 1.

The H-BU methodology and a suitably chosen optimization framework (i.e., geometric programming) [1-2] are coupled with an equation-based design philosophy and are intended to provide an alternative analog IC system design process. Once the initial system decomposition is settled at the system level, each circuit designer produces a set of Pareto-optimal (PO) surfaces to send to the system-level designer for system-level allocation. Whether a simulation- or equation-based approach is used in the generation of the surfaces, an equation-based approach should prevail at the system level. The reasons for using this approach are several: to eliminate the common convoluted use of spreadsheets, to systematize a straightforward formulation strategy, and to take advantage of the gentle surfaces, which are amenable to equation fitting. These surfaces fully describe the block-level space in an optimal sense. At the system level, the system designer can select the operating points for each block so as to optimally distribute the allocated resources. Moreover, fitting the PO surfaces into functions amenable to an MP allows the system designer to formulate a system-level MP and to produce the PO surface characterizing the system over any range of interest. Figure 2 shows a PO curve for the transmitter segment.
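The notion of a Pareto-optimal set can be illustrated with a small filter that keeps only non-dominated design points. This is a generic sketch, not the tool flow used in this work; the candidate points (power in mW, noise in arbitrary units) are made up, and both objectives are assumed to be minimized.

```python
def pareto_front(points):
    """Return the non-dominated points when every objective is minimized.

    A point q dominates p if q is no worse than p in every objective
    and differs from p (so it is strictly better in at least one).
    """
    front = []
    for p in points:
        dominated = any(
            q != p and all(q_k <= p_k for q_k, p_k in zip(q, p))
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))

# Hypothetical block-level designs: (power in mW, noise in arbitrary units).
candidates = [(1.0, 9.0), (2.0, 5.0), (3.0, 6.0), (4.0, 2.0), (5.0, 2.5)]
print(pareto_front(candidates))  # [(1.0, 9.0), (2.0, 5.0), (4.0, 2.0)]
```

Only the surviving points need to be passed up to the system level, since any dominated design can be replaced by one that is at least as good in every respect.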

For large systems composed of many blocks, tradeoff information between the various blocks is of enormous value. The value of this information lies in readily exposed relations hidden underneath strong nonlinearities, indirect parameter correlations, subsystem trans-coupling, and otherwise non-intuitive interactions between blocks. This information will allow designers to make the best decisions on how to choose optimal allocation of the available resources. Moreover, if each block is kept within a geometric programming (GP) framework, then each can be optimized efficiently. Regardless of adherence to a GP framework at the bottom level, the system-level formulation is always amenable to GP.



▲ Figure 1: Hierarchical, bottom-up design methodology. By generating Pareto-optimal curves for each block, the system-level PO space can be created.



▲ Figure 2: A Pareto-optimal surface for transmitter segment.

- [1] S. Boyd, S.J. Kim, L. Vandenberghe, and A. Hassibi, "A Tutorial on Geometric Programming," Optimization and Engineering, Sept. 2005.
- [2] S. Boyd and L. Vandenberghe, (1997) "Introduction to convex optimization with engineering applications." Information Systems Laboratory, Stanford University. [Online]. Available: http://www.stanford.edu/class/ee364.

## Comparator-based Switched Capacitor Circuits (CBSC)

J.K. Fiorenza, T. Sepke, P. Holloway, H.-S. Lee, C.G. Sodini

Sponsorship: SRC/FCRP C2S2, CICS

Two side effects of technology scaling that have a significant impact on analog circuit design are the reduced signal swing and the decrease in intrinsic device gain. Gain is important in feedback-based, analog signal processing systems because it determines the accuracy of the output value. Cascoded amplifier stages have been a popular solution to increase amplifier gain, but they further reduce the signal swings of scaled technologies. An alternative method for achieving high gain in an operational amplifier without reducing signal swing is to cascade several lower-gain amplifiers. Nested-Miller compensation approaches [1] can be used to stabilize the cascaded feedback system, but the frequency response of the closed-loop system is significantly sacrificed to ensure stability. In this project [2-3], a new comparator-based switched capacitor (CBSC) circuit design methodology has been explored that eliminates the use of op-amps in sampled-data systems.

A sampled-data system typically operates in two phases, a sampling phase  $(\phi_1)$  and a charge transfer phase  $(\phi_2)$ . An important property of these systems is that the output voltage needs to be accurate only at the moment the output is sampled. No constraint is placed on how the stage gets to the final output value. Feedback systems use a high-gain operational amplifier to force a virtual ground condition at the op-amp input. The circuit in Figure 1a shows the conventional op-amp-based switched-capacitor gain stage. The circuit in Figure 1b shows the proposed CBSC approach, in which a comparator and a current source have replaced the op-amp. Assuming the comparator input  $V_x$  starts below the common-mode voltage at  $V_{xo}$ , the current source charges the output circuit until the comparator detects the virtual ground condition and turns the current source off. At this instant, the output is sampled on  $C_L$ . Because the CBSC design ensures the same virtual ground condition as the op-amp-based design, both circuits produce the same output value at the sampling instant. This is demonstrated by waveforms for the two circuits shown in Figure 2.
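The charge-transfer phase described above can be mimicked with a very simplified behavioral model. The sketch below assumes an ideal, delay-free comparator, a perfectly linear output ramp, and a capacitive feedback factor that maps output movement onto the comparator input; all names and numeric values are hypothetical and are not taken from the prototype.

```python
def cbsc_charge_transfer(vx0, vo0, vcm, feedback_factor, ramp_v_per_s, dt):
    """Ramp the output until the comparator input reaches the virtual ground.

    vx0, vo0        : comparator-input and output voltages at the start
    vcm             : common-mode (virtual ground) level
    feedback_factor : fraction of an output step that appears at the input
    Returns the sampled output voltage and the elapsed time.
    """
    vx, vo, t = vx0, vo0, 0.0
    while vx < vcm:                      # comparator has not tripped yet
        step = ramp_v_per_s * dt         # current source charges the output
        vo += step
        vx += feedback_factor * step     # input follows through the capacitors
        t += dt
    return vo, t                         # current source off, output sampled

def ideal_output(vx0, vo0, vcm, feedback_factor):
    """Output an ideal op-amp stage would settle to (virtual ground forced)."""
    return vo0 + (vcm - vx0) / feedback_factor

vo, t = cbsc_charge_transfer(0.6, 0.0, 0.9, 0.5, 1e7, 1e-10)
print(vo, ideal_output(0.6, 0.0, 0.9, 0.5), t)
```

In this model the sampled output differs from the ideal value by at most about one time step of ramp, which reflects the point made in the text: only the value at the sampling instant matters, not the trajectory taken to reach it.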

The CBSC concept is general and can be applied to any sampled-data analog circuit. For example, the CBSC design approach can be applied to a pipelined ADC. A prototype 1.5b/stage CBSC pipeline ADC was constructed and operates similarly to the op-amp version of the ADC. The prototype CBSC ADC was implemented in a 0.18µm CMOS technology. The active die area of the ADC is 1.2mm<sup>2</sup>. At a 7.9MHz sampling frequency, the DNL is +0.33/-0.28LSB, and the INL is +1.59/-1.13LSB. It achieves an SFDR of 62dB, an SNDR of 53dB and an ENOB of 8.7b for input frequencies up to the Nyquist rate. The core ADC power consumption of all 10 stages of the pipeline converter is 2.5mW at a 1.8V power supply, resulting in a 0.8pJ/b figure of merit.
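The quoted figure of merit can be sanity-checked with a few lines of arithmetic. The sketch assumes the common definition FOM = P / (f<sub>s</sub> · 2<sup>ENOB</sup>), which this abstract does not state explicitly; the function name is illustrative.

```python
def adc_figure_of_merit(power_w, sample_rate_hz, enob_bits):
    """Energy per conversion step, assuming FOM = P / (fs * 2**ENOB)."""
    return power_w / (sample_rate_hz * 2.0 ** enob_bits)

# Values reported above: 2.5 mW, 7.9 MHz sampling, 8.7 effective bits.
fom = adc_figure_of_merit(2.5e-3, 7.9e6, 8.7)
print(f"{fom * 1e12:.2f} pJ per conversion step")
```

Under that assumed definition the result is roughly 0.76 pJ per step, in line with the rounded 0.8 pJ figure reported above.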



▲ Figure 1: (a) Traditional op-amp based multiply-by-two amplifier versus (b) proposed comparator-based multiply-by-two amplifier.

▲ Figure 2: Multiply-by-two waveforms.

- [1] J. Huijsing and D. Linebarger, "Low-voltage operational amplifier with rail-to-rail input and output ranges," *IEEE Journal of Solid-State Circuits*, vol. 20, no. 6, pp. 1144-1150, Dec. 1985.
- [2] T. Sepke, J.K. Fiorenza, C.G. Sodini, P. Holloway, and H.-S. Lee, "Comparator-based switched-capacitor circuits for scaled CMOS technologies," in Proc. IEEE International Solid-State Circuits Conference, San Francisco, CA, Feb. 2006, pp. 220-221.
- [3] J.K. Fiorenza, T. Sepke, C.G. Sodini, P. Holloway, H.-S. Lee, "Comparator-based switched-capacitor circuits for scaled CMOS technologies," IEEE Journal of Solid-State Circuits, vol. 41, no. 12, pp. 2658-2668, Dec. 2006.

## A Zero-crossing Based, 8b, 200MS/s Pipelined ADC

L. Brooks, H.-S. Lee

Sponsorship: NDSEG Fellowship, CICS

Technology scaling is creating significant issues for switched capacitor circuit design. Decreasing device gain and voltage supplies make traditional implementations of high-gain, high-speed operational amplifiers (op-amps) increasingly difficult and less power-efficient. A comparator-based switched capacitor (CBSC) circuit technique was introduced in [2] that replaces the functionality of an op-amp with a comparator and current source to help with these issues. The current source sweeps the output node with a voltage ramp until the comparator detects that the virtual ground condition has been realized. Whereas an op-amp-based implementation *forces* the virtual ground condition, CBSC circuits *detect* the virtual ground condition to realize the same precision charge transfer.

The comparator input in a CBSC implementation is a constant-slope voltage ramp, and so the comparator performs a zero-crossing detection. This work generalizes CBSC by replacing the general-purpose comparator of CBSC circuits with a zero-crossing detector to realize a new architecture called zero-crossing based circuits (ZCBC) [1]. Two stages of the implementation of the 1.5 bit/stage ZCBC pipelined analog-to-digital converter (ADC) are shown in Figure 1. Devices  $M_1$  and  $M_2$  make a dynamic zero-crossing detector that is fast, simple, and amenable to scaling. It draws no static current and thus realizes a power-efficient threshold detector. To improve linearity and output swing, the single current source of the previous design was split to create current sources  $I_1$ ,  $I_2$ ,  $I_3$ , and  $I_4$ . In this topology the capacitors are no longer charged through a series switch, so the associated nonlinear voltage drop is eliminated. Furthermore, the traditional bit decision comparators in a pipelined ADC have been replaced with bit decision flip-flops for improved speed.

To demonstrate these techniques, an 8b, 200MS/s ZCBC pipelined ADC was implemented in a 0.18-µm CMOS technology in an active die area of 0.05mm<sup>2</sup>. The differential non-linearity (DNL) and integral non-linearity (INL) are 0.75LSB<sub>8</sub> and 1.0LSB<sub>8</sub>, respectively. The measured effective number of bits (ENOB) is 6.4b. The ADC consumes 8.5mW (2.9/5.6mW analog/digital) from a 1.8V power supply. It draws only dynamic CV<sup>2</sup>f power, as there are no statically biased circuits in the complete ADC. The corresponding figure of merit (FOM = P/(2f<sub>in</sub>·2<sup>ENOB</sup>)) is 510 fJ/step at 200MS/s. This represents best-in-class power efficiency among published ADCs in its class.
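As a worked check of the figure of merit, the reported numbers can be substituted into the stated definition. The input frequency is not given explicitly here, so the calculation assumes a Nyquist-rate input, $f_{in} = 100$ MHz, which makes $2f_{in}$ equal to the 200 MS/s sampling rate:

$$
\mathrm{FOM} = \frac{P}{2 f_{in} \cdot 2^{\mathrm{ENOB}}}
= \frac{8.5 \times 10^{-3}\ \mathrm{W}}{(2 \times 10^{8}\ \mathrm{s^{-1}}) \times 2^{6.4}}
\approx \frac{8.5 \times 10^{-3}\ \mathrm{W}}{1.69 \times 10^{10}\ \mathrm{s^{-1}}}
\approx 0.50\ \mathrm{pJ/step}
$$

This is close to the reported 510 fJ/step; the small gap is plausibly due to rounding of the ENOB to 6.4b.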



▲ Figure 1: Two stages of the 1.5 bit/stage zero-crossing based pipelined ADC.

- [1] L. Brooks and H.-S. Lee, "A zero-crossing based 8b, 200MS/s pipelined ADC," in Proc. IEEE International Solid-State Circuits Conference, San Francisco, CA, Feb. 2007, pp. 460–461.
- [2] J.K. Fiorenza, T. Sepke, P. Holloway, C.G. Sodini, and H.-S. Lee, "Comparator-based switched-capacitor circuits for scaled CMOS technologies," IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2658–2668, Dec. 2006.

## Ultra-high Speed A/D Converters Using Zero-crossing-based Circuits

A. Chow, H.-S. Lee

Sponsorship: SRC/FCRP C2S2

With an increasing need for higher data rates, both wireless applications and data links are demanding higher-speed analog-to-digital converters (ADCs) with medium resolution. In particular, this work will investigate ADCs with sampling rates up to 10 GS/s and 6-8 bits of resolution. Time-interleaved converters achieve their high sampling rate by placing several converters in parallel. Each individual converter, or channel, has a delayed sampling clock and operates at a reduced sampling rate. Therefore each channel is responsible for digitizing a different time slice. This method requires that the individual converters that make up the parallel combination be matched. Mismatches and non-idealities, such as gain error, timing error, and voltage offset, degrade the performance. Therefore channel matching is an important design consideration for time-interleaved ADCs.
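The round-robin operation of a time-interleaved converter, and the effect of a channel mismatch, can be sketched as follows. This is a behavioral illustration only: the channel count, the gain and offset values, and the function name are assumptions, and quantization and timing skew are left out.

```python
import math

def interleaved_sample(signal, n_samples, sample_rate_hz, gains, offsets):
    """Sample `signal` with M channels taking turns (M = len(gains)).

    Channel k handles samples k, k+M, k+2M, ... so each channel runs at
    sample_rate_hz / M. Per-channel gain and offset model mismatch.
    """
    m = len(gains)
    out = []
    for n in range(n_samples):
        ch = n % m                       # channel responsible for this time slice
        t = n / sample_rate_hz
        out.append(gains[ch] * signal(t) + offsets[ch])
    return out

def tone(t):
    return math.sin(2.0 * math.pi * 1e6 * t)   # hypothetical 1 MHz input

matched = interleaved_sample(tone, 16, 100e6, [1.0] * 4, [0.0] * 4)
mismatched = interleaved_sample(tone, 16, 100e6, [1.0] * 4, [0.0, 0.01, 0.0, 0.0])
errors = [b - a for a, b in zip(matched, mismatched)]
print(errors)
```

The error introduced by the single offset repeats every four samples, which is why channel mismatch shows up as spurious tones at multiples of the per-channel sampling rate.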

Although digital calibration can mitigate many of these non-idealities, timing mismatches are non-linear errors, which are more difficult to remove. At sampling rates up to 10 GS/s, digital calibration would consume a large amount of power. An alternative solution uses a global switch running at the full speed of the converter. This technique works well for medium-to-high-speed ADCs [1-2]. At higher speeds, the ability to turn the switch on and off at the full sampling rate becomes a major challenge. We will investigate the applicability of the global switch technique in 90- or 65-nm CMOS technologies for 10 GS/s operation.

Power optimization is a major design consideration when implementing a time-interleaved ADC. We will lower total power consumption by exploring innovative technologies for implementing the individual ADC in each channel, such as zero-crossing based circuits (ZCBCs) [3], for which a pipelined ADC topology was presented previously. In particular, this work investigates a fast, single-slope architecture (Figure 1). The faster each channel operates, the fewer channels are needed, hence lowering power in clock and buffer circuits. The primary emphasis falls on the development of a highly power-efficient single-slope ZCBC architecture. Since the single-slope architecture is more sensitive to non-idealities such as ramp nonlinearity, we are carefully studying the sources of non-idealities and developing techniques to address the accuracy issues.



Figure 1: One stage of a single-slope ZCBC-based pipelined ADC.

- [1] M. Gustavsson, "A global passive sampling technique for high-speed switched-capacitor time-interleaved ADCs," IEEE Trans. on Circuits and Systems II, vol. 47, no. 9, pp. 821-831, Sept. 2000.
- [2] S. Gupta, M. Choi, M. Inerfield, and J. Wang, "A 1GS/s 11b time-interleaved ADC in 0.13µm CMOS," IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, Feb. 2006, pp. 2360-2369.
- [3] L. Brooks and H.-S. Lee, "A zero-crossing based 8b 200MS/s pipelined ADC," IEEE ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2007, pp. 460-461.

## High-accuracy Pipelined A/D Converter Based on Zero-crossing Switched Capacitor Circuits

M. Markova, P. Holloway, H.-S. Lee

Sponsorship: EECS Fellowship, CICS

Technology scaling poses challenges in designing analog circuits because of the decrease in intrinsic gain and reduced swing. An alternative to using high-gain amplifiers in the implementation of switched-capacitor circuits has been proposed [1] that replaces the amplifier with a current source and a comparator. The new comparator-based switched-capacitor (CBSC) technique has been implemented in two pipelined ADC architectures at 10MHz and 200MHz and 10bit and 8bit accuracy, respectively [1-2].

The purpose of this project is to explore the use of the CBSC technique for very high-precision A/D converters. The goal of the project is a 100MHz 16 bit pipelined ADC. First, we are investigating multiphase CBSC operation to improve the power-linearity tradeoff of the A/D conversion [3]. We are also developing linearization techniques for the ramp waveforms. Linear ramp waveforms require fewer phases, thus allowing faster operation. Techniques for improving linearity beyond using a cascoded current source are explored. A linear ramp generator, which decouples the current source from the output ramp through a Miller capacitor, is proposed to improve the linearity of the ramp waveform in all phases. This ramp generator improves the range by improving linearity through compensation of the gate-to-source voltage of the current source without the use of a cascode. In addition, it lends itself to a symmetric differential implementation for the final phase to ensure adequate noise rejection. At the target resolution of 16 bits, power supply and substrate noise coupling can limit the performance. We are studying their effects in CBSC circuits. For reduced sensitivity to power supply and substrate noise, we are developing a differential CBSC architecture. Other techniques that we are presently developing include power-efficient offset cancellation in comparators and exploiting *a priori* information from previous stages in the pipeline structure to increase linearity and speed.

- [1] T. Sepke, J.K. Fiorenza, C.G. Sodini, P. Holloway, and H.-S. Lee, "Comparator-based switched-capacitor circuits for scaled CMOS technologies," IEEE Journal of Solid State Circuits, vol. 41, no. 12, pp. 2658-2668, Dec. 2006.
- [2] L. Brooks and H.-S. Lee, "A zero-crossing-based 8b 200MS/s pipelined ADC," in IEEE International Solid State Circuits Conference Digest of Technical Papers, San Francisco, CA, Feb. 2007, pp 460-461.
- [3] J.K. Fiorenza, "A comparator-based switched-capacitor pipelined analog-to-digital converter," Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, 2007.

## Low-voltage Comparator-based Switched-capacitor Sigma-delta ADC

M. Guyton, H.-S. Lee

Sponsorship: CICS, DARPA

Many analog signal-processing circuits use operational amplifiers (op-amps) in a negative feedback topology. The amount of error in these feedback systems is inversely proportional to the gain of the op-amp. Because scaled CMOS technologies use shorter channel lengths and require lower power supply voltages, it becomes more difficult to implement high gain op-amps. Recently, a comparator-based switched-capacitor (CBSC) technique was proposed [1] that uses a comparator rather than an op-amp to implement switched-capacitor topologies.

In this project, we investigate very-low-voltage sigma-delta converters. One of the biggest challenges of low-voltage circuits is the transmission gates that must pass the signal. If the signal is near the middle of the power supply range, neither the NMOS nor the PMOS transistor has sufficient gate drive to pass the signal properly. The switched-op-amp technique [2] was proposed to mitigate this problem. In this technique, the output of the op-amp is directly connected to the next sampling capacitor without a transmission gate to perform charge transfer. During the charge transfer phase, the op-amp is switched off, and the output is grounded. Similar to the standard switched-capacitor technique, CBSC circuits use two-phase clocking, having both sampling and charge-transfer clock phases. Unlike a standard switched-capacitor circuit, in a CBSC circuit all current sources connected to the output node are off at the end of the charge-transfer phase. Therefore, there is no op-amp or current source to turn off to accommodate the charge transfer without a transmission gate. Thus, the CBSC technique is inherently better suited to low-voltage applications than switched-op-amp circuit topologies. Although the previous CBSC implementation was a single-ended version, many high-resolution systems require fully differential implementation for better power supply and substrate noise rejection properties. Since CBSC is a new technique without an op-amp, existing fully differential circuitry cannot be applied. In this program, we are developing fully differential CBSC topologies for applications in high-resolution data conversion. Figure 1 shows a fully differential low-voltage CBSC integrator stage using the combined techniques. We are designing a fourth-order sigma-delta ADC for operation at a 1V power supply using this integrator stage.
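The transmission-gate problem can be seen from a first-order overdrive calculation. The sketch below assumes threshold voltages of 0.55 V for both devices, ignores body effect, and uses hypothetical function names; the numbers are chosen only to illustrate a 1V supply, not taken from any process.

```python
def transmission_gate_overdrive(v_signal, vdd, vtn, vtp_abs):
    """First-order gate overdrive of each device in a CMOS transmission gate.

    NMOS: gate at VDD, source at the signal -> overdrive = VDD - Vsig - Vtn
    PMOS: gate at ground, source at the signal -> overdrive = Vsig - |Vtp|
    A device conducts (in this simple model) only if its overdrive is positive.
    """
    nmos = vdd - v_signal - vtn
    pmos = v_signal - vtp_abs
    return nmos, pmos

for v_sig in (0.1, 0.5, 0.9):
    n_od, p_od = transmission_gate_overdrive(v_sig, vdd=1.0, vtn=0.55, vtp_abs=0.55)
    conducts = n_od > 0 or p_od > 0
    print(f"Vsig={v_sig:.1f} V  NMOS={n_od:+.2f} V  PMOS={p_od:+.2f} V  conducts={conducts}")
```

With these assumed thresholds, the switch conducts near either rail but neither device has positive overdrive at mid-supply, which is the dead zone that motivates avoiding transmission gates in the signal path.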



Figure 1: Fully-differential comparator-based switched-capacitor integrator. The input of the next integrator stage is also shown. Common-mode feedback circuits are not shown.

- [1] T. Sepke, J.K. Fiorenza, C.G. Sodini, P. Holloway, and H.-S. Lee, "Comparator-based switched-capacitor circuits for scaled CMOS technologies," in IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, Feb. 2006, pp. 812-821.
- [2] J. Crols and M. Steyaert, "Switched-op-amp: An approach to realize full CMOS switched-capacitor circuits at very low power supply voltages," IEEE Journal of Solid-State Circuits, vol. 29, no. 8, pp. 936-942, Aug. 1994.

## Zero-crossing-based ADC for mm-Wave Applications

J. Chu, H.-S. Lee

Sponsorship: SRC/FCRP C2S2

In an mm-wave imaging system, the resolution depends on the phase accuracy of the signal. Our system design uses digital beam forming with extensive digital processing to reduce the phase variance and extract accurate phase information from an array of receivers. This design choice demands a high-speed, medium-resolution ADC to digitize the signal. Since there are up to 1000 receivers in the imaging system, each requiring two ADCs for in-phase and quadrature signals, the power consumption of each ADC is also a major constraint. We are investigating a time-interleaved ADC operating at 4 GS/s with 8-10 bits of resolution and less than 50mW of power consumption. This work is also useful in many other applications, including wireless and wireline communications.

Each individual channel will be implemented using a zero-crossing-based circuit (ZCBC) [1], which is an extension of the comparator-based switched-capacitor circuit (CBSC) design methodology [2]. The focus of the project is to explore novel circuit structures based on ZCBC to improve the FOM and power consumption of A/D converters. In particular, we are investigating the use of a dynamically biased zero-crossing detector. The idea is to use the most power when the signal is crossing the threshold; this extra power decreases the delay and reduces the noise. At other times, power can be reduced without degrading the ADC performance.

Time interleaving will be used to achieve the speed requirement. In a time-interleaved structure, matching between the different channels will be very important to maintain the desired performance. Any mismatch in non-idealities such as gain error, offset, and timing errors can greatly degrade the performance. We plan to use a global sampling technique, which mitigates the timing errors [3-4]. Careful design and layout will be needed to reduce the other mismatches.

- [1] L. Brooks, H.-S. Lee, "A zero-crossing-based 8b 200MS/s pipelined ADC," IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, Feb. 2007.
- [2] T. Sepke, J.K. Fiorenza, C.G. Sodini, P. Holloway and H.-S. Lee, "Comparator-based switched-capacitor circuits for scaled CMOS technologies," IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, Feb. 2006, pp. 812-821.
- [3] M. Gustavsson, "A global passive sampling technique for high-speed switched-capacitor time-interleaved ADCs," IEEE Transactions on Circuits and Systems II, vol. 47, no.9, pp. 821-831, Sept. 2000.
- [4] S. Gupta, M. Choi, M. Inerfield, and J. Wang, "A 1GS/s 11b time-interleaved ADC in 0.13µm CMOS," IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, Feb. 2006, pp. 2360-2369.

## Comparator-based Circuits for HBTs

J. Feng, H.-S. Lee

Sponsorship: BAE Systems

Recently, comparator-based switched-capacitor (CBSC) circuits and zero-crossing-based circuits (ZCBC) were introduced [1,2] as a viable alternative to op-amp-based circuits. The use of op-amps in analog signal processing circuits is becoming more difficult due to the decreased intrinsic device gains and reduced signal swings obtained in scaled CMOS technologies. Op-amps rely upon high gain in the negative feedback mode in sampled-data systems because the gain determines the accuracy of the output value. CBSC and ZCBC circuits replace the op-amp with a comparator and a current source (see Figure 1), and therefore do not require high gain and stability simultaneously as in op-amp-based circuits. Since comparators can be designed without the use of complementary devices, these techniques can be applied to a variety of transistor technologies. In this work, we explore the use of heterojunction bipolar transistors (HBTs) in comparator-based circuits for sampled-data systems.

HBTs offer much higher device speeds than CMOS devices and have demonstrated the fastest transistor speeds to date with cutoff frequencies as high as  $f_T$ =710 GHz using a pseudomorphic InGaAs/InP HBT [3]. Commercially, silicon germanium (SiGe) based HBTs have been developed with a cutoff frequency of  $f_T$ =200 GHz as shown in Figure 2. The faster device speeds that HBTs offer can help meet the demand for very high speed, high-resolution analog-to-digital converters (ADC) for various applications including wireless and wireline communications and radar systems. HBTs also have a more constant  $g_m/I$  ratio over the normal operating range, lower 1/f noise, and better device-matching of differential pairs than CMOS devices.

This project focuses on the development of innovative circuits and architectures to design a 12-bit pipelined ADC operating at 2 GHz using either an HBT-only or SiGe BiCMOS process. The first goal of the project is to adapt switched emitter-follower sample-and-hold circuits for switched-capacitor applications. Ultimately, the project will culminate in the design of a prototype ADC chip.



▲ Figure 1: (a) Traditional op-amp based multiply-by-two amplifier versus (b) comparator-based multiply-by-two amplifier.



Figure 2: Unity cutoff frequency  $f_T$  vs. collector current at  $V_{CE}$ =1.5 V for 0.12 µm width SiGe HBT at different device lengths.

- [1] T. Sepke, J.K. Fiorenza, C.G. Sodini, P. Holloway, and H.-S. Lee, "Comparator-based switched-capacitor circuits for scaled CMOS technologies," IEEE Int'l Solid-State Circuits Conf. Dig. of Tech. Papers, San Francisco, CA, Feb. 2006, pp. 220-221.
- [2] L. Brooks and H.-S. Lee, "A zero-crossing-based 8b 200MS/s pipelined ADC," IEEE Int'l Solid-State Circuits Conf. Dig. of Tech. Papers, San Francisco, CA, Feb. 2007, pp. 15-17.
- [3] W. Hafez, W. Snodgrass, and M. Feng, "12.5 nm base pseudomorphic heterojunction bipolar transistors achieving f<sub>T</sub>=710 GHz and f<sub>MAX</sub>=340 GHz," Applied Physics Letters, 87, 252109, 2005.

## Massively Parallel ADC with Self-calibration

M. Spaeth, H.-S. Lee

Sponsorship: CICS

In this program we are developing an analog-to-digital converter (ADC) that can quantize a wideband 150-MHz signal at 600 mega-samples per second, with signal-to-noise ratio and linearity in excess of 75dB (12 bits). Use of a massively parallel, time-interleaved architecture with 128 active ADC channels reduces the requisite speed for each channel and enables the devices to be biased in the sub-threshold region for an extremely low-power (<50mW, core) solution. In a parallel time-interleaved system, any mismatches between channels result in undesired spurious tones. Most existing time-interleaved ADCs either employ a low degree of parallelism, such that the tones appear outside the signal band, or are low enough in resolution that the tones are below the quantization noise floor. In this design, however, all inherent gain, offset, and timing skew mismatches must be calibrated away to achieve the stated high-performance goals.
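The headline numbers above can be related with two short calculations: the sampling rate each channel must sustain, and the resolution implied by the signal-to-noise target. The sketch assumes the aggregate rate is divided evenly among the 128 active channels and uses the standard ideal-quantizer relation SNR ≈ 6.02·N + 1.76 dB; the function names are illustrative.

```python
def per_channel_rate(total_rate_hz, n_channels):
    """Sampling rate per channel when the work is split evenly."""
    return total_rate_hz / n_channels

def ideal_bits_from_snr(snr_db):
    """Resolution of an ideal quantizer with the given full-scale sine SNR."""
    return (snr_db - 1.76) / 6.02

rate = per_channel_rate(600e6, 128)
bits = ideal_bits_from_snr(75.0)
print(f"{rate / 1e6:.4f} MS/s per channel, about {bits:.1f} bits")
```

Each channel therefore runs at under 5 MS/s, slow enough to make sub-threshold biasing plausible, and 75 dB corresponds to slightly more than 12 bits.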

The 128 14-bit pipeline ADCs are arranged into 16 blocks of 8 channels each, as shown in Figure 1. The hierarchical organization of the design allows individual blocks to be pulled out for background calibration, while the remaining blocks continue to quantize the input signal. Due to the large number of channels to be calibrated, the calibration algorithm must be simple but effective. The sub-radix-2 calibration algorithm [1-2] is very effective in removing offset and linearity errors but poses a challenge due to its complexity when applied to the massively parallel converter. We have modified the algorithm to allow nominal radix-2 operations to be employed, for similar calibration efficiency with reduced complexity. We are also exploring several innovative techniques to calculate and remove systematic timing skew between channels. An additional channel is included in the design to act as a timing reference for some of the timing skew measurement algorithms. A novel token-passing control scheme is used to generate local clock phases for the individual blocks and channels, minimizing the number of clock lines that must be routed across the chip.

The design was fabricated in a 0.18-µm digital CMOS process by National Semiconductor and is currently under test. A micrograph of the finished chip is shown in Figure 2.







Figure 2: Micrograph of the fabricated chip.

- [1] A.N. Karanicolas, H.-S. Lee, and K. Bacrania, "A 15-b 1-Msample/s digitally self-calibrated pipeline ADC," IEEE J. of Solid-State Circuits, vol. 28, no. 12, pp. 1207-1215, Dec. 1993.
- [2] H.-Y. Lee, T.-H. Oh, H.-J. Park, H.-S. Lee, M. Spaeth, and J.-W. Kim, "A 14-b 30MS/s 0.75mm<sup>2</sup> pipelined ADC with on-chip digital self-calibration," to be presented at IEEE Custom Integrated Circuits Conference, San Jose, CA, Sept. 2007.

# **Prediction of Time-to-contact for Intelligent Vehicles**

Y. Fang, B.K.P. Horn, I. Masaki Sponsorship: Intelligent Transportation Research Center

The time-to-contact (TTC) is the time it takes two objects to touch if they continue on their current trajectories. TTC estimation based on video sequences from a single camera provides a simple and convenient way to detect approaching objects and potential danger and to analyze the surrounding environment for automotive and robotic applications. Most current TTC estimation methods depend on the computation of the optical flow as an intermediate result by tracking "interesting feature points" over multiple images [4-6]. However, the optical flow estimation itself is very noisy [1-2], especially when motion between two consecutive frames is large. TTC estimation based on optical flow also takes advantage only of information about moving object boundaries. Thus its accuracy and robustness are limited. We propose a new direct "gradient-based" method [3,7-9] to determine TTC that operates directly on the spatial and temporal derivatives of brightness. The proposed method does not depend on "feature detection" and thus does not require careful calibration of the optical system. The new method enhances robustness and is computationally efficient, which is especially important for providing fast response in vehicle applications. The TTC estimates follow the expected trend, as shown in Figure 1, and are robust to parameter choices. Figure 2 shows the intermediate results of the TTC calculation process for one video sequence.
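To make the idea of working directly on brightness derivatives concrete, the following is a minimal sketch of a gradient-based TTC estimate under strong simplifying assumptions that are not taken from the abstract: motion is purely along the optical axis, the focus of expansion sits at the image center, and the scene is a frontal plane. Under these assumptions the brightness-constancy constraint reduces to C·G + E_t = 0, with G = x·E_x + y·E_y and C = 1/TTC, which can be solved by least squares over all pixels. The test pattern and all function names are hypothetical.

```python
import math


def brightness(x, y):
    """Hypothetical smooth test pattern used to synthesize frames."""
    return math.sin(0.3 * x) + math.cos(0.2 * y)


def make_frame(scale, half):
    """Sample the pattern magnified by `scale` about the image center."""
    return [[brightness(x / scale, y / scale)
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]


def estimate_ttc(frame1, frame2):
    """Least-squares TTC (in frames) from spatial and temporal derivatives.

    Minimizes the sum over pixels of (C*G + E_t)^2, where
    G = x*E_x + y*E_y is the radial brightness gradient and C = 1/TTC.
    No feature detection or optical flow field is computed.
    """
    half = len(frame1) // 2
    num = 0.0
    den = 0.0
    for r in range(1, len(frame1) - 1):
        for c in range(1, len(frame1[0]) - 1):
            x, y = c - half, r - half
            e_x = (frame1[r][c + 1] - frame1[r][c - 1]) / 2.0  # central difference
            e_y = (frame1[r + 1][c] - frame1[r - 1][c]) / 2.0
            e_t = frame2[r][c] - frame1[r][c]                   # frame difference
            g = x * e_x + y * e_y
            num += g * e_t
            den += g * g
    inverse_ttc = -num / den
    return 1.0 / inverse_ttc


if __name__ == "__main__":
    # The second frame is magnified by 1%, i.e. contact in roughly 100 frames.
    first = make_frame(1.00, 20)
    second = make_frame(1.01, 20)
    print("estimated TTC (frames):", estimate_ttc(first, second))
```

Every pixel with a nonzero radial gradient contributes to the two sums, which is one way to see why a direct method can use more of the image than the object boundaries alone.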



▲ Figure 1: Typical time to contact for most applications. The vertical axis shows the number of frames before contact will happen for different frames in a frame sequence.



▲ Figure 2: The TTC calculation process for one video sequence. The circle represents the focus of expansion with coordinate at the bottom right corner. The three bars represent the relative speed between camera and object. The TTC value is at the upper left corner of each image.

- [1] B.K.P. Horn and B.G. Schunck, "Determining optical flow," Artificial Intelligence, vol. 16, no. 1-3, pp. 185-203, Aug. 1981.
- [2] B.D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in *Proc. DARPA Imaging Understanding Workshop*, Apr. 1981, pp. 121-130.
- [3] A.R. Bruss and B.K.P. Horn, "Passive navigation," Computer Vision, Graphics, and Image Processing, vol. 21, no. 1, pp. 3-20, Jan. 1983.
- [4] J.E. Tanner and C.A. Mead, "A correlating optical motion detector," in *Proc. Conference on Advanced Research in VLSI*, Cambridge, MA, Jan. 1984, p. 57-64.
- [5] J.E. Tanner, "Integrated optical motion detection," Ph.D. Thesis, California Institute of Technology, Pasadena, 1986.
- [6] J.E. Tanner and C.A. Mead, "An integrated analog optical motion sensor," in Proc. ASSP Conference on VLSI Signal Processing, Los Angeles, CA, Nov. 1986, pp. 59-76.
- [7] B.K.P. Horn and S. Negahdaripour, "Direct passive navigation," *IEEE Transactions on Pattern Analysis and Machine Intelligence*, vol. PAMI-9, no. 1, pp. 168-176, Jan. 1987.
- [8] B.K.P. Horn, "Motion fields are hardly ever ambiguous," International Journal of Computer Vision, vol. 1, no. 3, pp. 259-274, Oct. 1987.
- [9] B.K.P. Horn and E.J. Weldon, Jr., "Direct methods for recovering motion," International Journal of Computer Vision, vol. 2, no. 1, pp. 51-76, June 1988.

# Very High-frequency DC-DC Boost Conversion

A. Sagneri, R. Pilawa, D. Perreault

Sponsorship: MIT/Industry Consortium on Advanced Automotive Electrical/Electronic Components and Systems, NSF, National Semiconductor, CICS

At current switching frequencies (about 1-10 MHz), the required energy storage in a typical dc-dc converter yields passive component dimensions that are large with respect to integrated processes. Shrinking these components at constant energy storage results in an unacceptable efficiency penalty. Larger size accompanies greater cost, both because more material is required and because batch processing is more complicated. This is especially true of magnetic materials, where deposited materials exhibit relatively poor characteristics and add significant cost. On the other hand, the continual improvement of semiconductor devices is a key enabler for the focus of this research: exploring converter operation in the very high frequency (VHF, 30-300 MHz) regime. At VHF, reduced energy storage translates directly to smaller passive components. Significantly, air-core inductors can be implemented in a small volume. These components are simple, small, and ready for batch fabrication. Further, without the frequency-dependent loss of a magnetic core, passive components no longer limit switching frequency. The other loss mechanisms, switching and gating loss, are dealt with through resonant circuit techniques. The resulting dc-dc boost converters shown here are capable of nominal power levels above 20 watts and efficiency in excess of 87% while switching at 110 MHz (Figure 1).

The circuit topology in Figure 2 consists of a resonant  $\Phi_2$  inverter coupled to a resonant rectifier [1,3-5]. The result is a resonant dc-dc boost converter operating at VHF (Figure 1). The inverter uses wave-shaping to minimize peak voltage stress (about 2.3 V<sub>IN</sub>) [1], in this case staying within typical integrated power process breakdown voltage limits (50 V) for input voltage over the automotive range (8-18 V). A 110-MHz converter using a commercial MOSFET achieved greater than 87% efficiency. The converter nominally supplies 23 W output power, over an 8-18 V input range and a 22-33 V output range. A 50-MHz converter using a MOSFET fabricated in an integrated power process achieved 75% efficiency over the same voltage ranges with a 19 W nominal output power. The largest inductors in each design are 33 nH and 56 nH, respectively. Both converters use resonant gate drives for high efficiency and operate in bang-bang modulation under hysteretic voltage mode control [3-4,6]. This control scheme takes advantage of exceptional transient performance to achieve wide load range at high efficiency [2]. Greater load range can be achieved by adding more cells. Current work focuses on designing optimal devices for a given power process and co-packaging of passive components.
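Two of the numbers above lend themselves to a quick check: the peak device stress across the automotive input range, and the way bang-bang (on/off) modulation sets the average delivered power. The sketch below is a first-order model only. It assumes an ideal cell that delivers a fixed power whenever it is enabled and a bulk output capacitor that charges and discharges linearly across the hysteresis band; the capacitor value, band width, and load power are hypothetical.

```python
def peak_drain_stress(v_in, stress_factor=2.3):
    """Approximate peak switch voltage, using the ~2.3 x V_IN figure from the text."""
    return stress_factor * v_in


def hysteretic_modulation(p_cell_w, p_load_w, v_out, c_out_f, band_v):
    """On-fraction and modulation frequency of an on/off modulated cell.

    While the cell is on, the capacitor charges with (i_cell - i_load);
    while it is off, the capacitor discharges with i_load. Each transition
    spans the hysteresis band `band_v`. Requires 0 < p_load_w < p_cell_w.
    """
    i_cell = p_cell_w / v_out
    i_load = p_load_w / v_out
    t_on = c_out_f * band_v / (i_cell - i_load)
    t_off = c_out_f * band_v / i_load
    period = t_on + t_off
    return t_on / period, 1.0 / period


if __name__ == "__main__":
    for v_in in (8.0, 18.0):  # automotive input range from the text
        print(f"V_IN = {v_in:4.1f} V -> peak stress ~ {peak_drain_stress(v_in):.1f} V")

    # Hypothetical operating point: half load, 10-uF bulk capacitor, 50-mV band.
    duty, f_mod = hysteretic_modulation(23.0, 11.5, 33.0, 10e-6, 0.05)
    print(f"on-fraction = {duty:.2f}, modulation frequency = {f_mod / 1e3:.0f} kHz")
```

In this idealized model the on-fraction equals the ratio of load power to cell power, so the cell always runs at its designed operating point while the load varies.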



▲ Figure 1: Picture of a 110 MHz dc-dc boost converter. Vin: 8-18V, Vout: 22-33V, Pnom: 23W, efficiency: 87%.



▲ Figure 2: A VHF dc-dc boost power stage and control scheme. The dc-dc converter cell is modulated by a hysteretic controller ~ 200 kHz while the cell switches ~100 MHz. Ripple is set by the hysteresis band. Power and ripple requirements size the bulk output capacitor.

- [1] J.M. Rivas, Y. Han, O. Leitermann, A.D. Sagneri, and D.J. Perreault, "A high-frequency resonant inverter topology with low voltage stress," presented at 38<sup>th</sup> IEEE Power Electronics Specialist Conference, Orlando, FL, June 2007.
- [2] J.M. Rivas, R.S. Wahby, J.S. Shafran, and D.J. Perreault, "New architectures for radio-frequency dc-dc power conversion," in Proc. 35<sup>th</sup> Annual IEEE Power Electronics Specialist Conference, Aachen, Germany, June 2004, pp. 4074-4084.
- [3] A.D. Sagneri, "The design of a VHF dc-dc boost converter," Master's thesis, Massachusetts Institute of Technology, Cambridge, 2007.
- [4] R. Pilawa, "VHF dc-dc converter design," Master's thesis, Massachusetts Institute of Technology, Cambridge, 2007.
- [5] J.M. Rivas, "Radio frequency dc-dc power conversion," Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, 2006.
- [6] R. Pilawa, A.D. Sagneri, J.M. Rivas, D. Anderson, and D.J. Perreault, "Very high-frequency resonant boost converters," to be presented at 38<sup>th</sup> IEEE Power Electronics Specialist Conference, Orlando, FL, June 2007.

# **Techniques for Low-jitter Clock Multiplication**

B.M. Helal, M.H. Perrott

High-frequency clocks are essential to high-speed digital and wireless applications. The performance of such clocks is measured by the amount of jitter, or phase noise, their outputs exhibit. Phase-Locked Loops (PLLs) are typically used to generate high-frequency clocks. However, a major disadvantage of PLLs is the accumulation of jitter within their Voltage Controlled Oscillators (VCOs) [1]. Multiplying Delay-Locked Loops (MDLLs) have been introduced recently to significantly reduce the problem of jitter accumulation in PLLs and reject VCO phase noise at a higher bandwidth than possible with PLLs [2-3].

Jitter accumulation is reduced in an MDLL by resetting the circulating edge in its ring oscillator using a clean edge from the reference signal. The Select-logic circuitry commands the multiplexer, using the Sel signal, to pass the reference edge instead of the output edge at the proper time, as shown in Figure 1. The major disadvantage of a typical MDLL is static delay offset, which causes its output to exhibit deterministic jitter and reference spurs. Static delay offset is caused mostly by path and current mismatches in the phase detector and charge pump, respectively [2-3]. When the VCO is not perfectly tuned, the last output edge and the reference edge that replaces it occur at a time offset, causing an inconsistent transition time. Figure 2 illustrates the problem of static delay offset in a locked MDLL, showing a deterministic jitter of $\Delta$ seconds peak-to-peak.

A digital correlation technique was developed to eliminate sources of analog mismatches and drastically reduce static delay offset in MDLLs, thereby allowing their use in applications that require low-jitter, high-frequency clocks. A test chip was fabricated using a CMOS 0.13µm process. A highly-digital MDLL prototype, which used a scrambling time-to-digital converter, generated a 1.6GHz output from a 50MHz reference input. Measured results demonstrated reference spurs below -59 dBc and estimated random and deterministic jitter below 1 ps [4].
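The deterministic jitter caused by static delay offset can be pictured with a small timing model. The sketch below is an idealization, not the test chip's behavior: it assumes a locked loop whose oscillator period has settled so that the reference edge replaces the last circulating edge with a fixed offset Δ, and it ignores random jitter. The 2-ps offset is a hypothetical value; the 1.6-GHz output and 50-MHz reference come from the text.

```python
def multiplication_factor(f_out_hz, f_ref_hz):
    """Number of output cycles per reference cycle."""
    return round(f_out_hz / f_ref_hz)


def mdll_output_periods(t_ref, n, static_offset):
    """Output periods over one reference cycle of a locked MDLL.

    The first n-1 periods are set by the ring oscillator. The last one
    is terminated by the reference edge, so it absorbs the static offset.
    """
    t_osc = (t_ref - static_offset) / n
    return [t_osc] * (n - 1) + [t_osc + static_offset]


if __name__ == "__main__":
    n = multiplication_factor(1.6e9, 50e6)
    periods = mdll_output_periods(1.0 / 50e6, n, 2e-12)  # 2 ps is hypothetical
    print("multiplication factor:", n)
    print("peak-to-peak deterministic period jitter (s):",
          max(periods) - min(periods))
```

Because the pattern repeats once per reference cycle, the period disturbance appears in the spectrum as spurs at offsets equal to the reference frequency.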



▲ Figure 1: Classical MDLL block diagram.



▲ Figure 2: Timing diagram illustrating the effect of static delay offset.

- [1] B. Kim, T. Weigandt, and P. Gray, "PLL/DLL system noise analysis for low jitter clock synthesizer design," in *Proc. International Symposium on Circuits and Systems*, London, England, June 1994, pp. 31–38.
- [2] R. Farjad-Rad et al., "A low-power multiplying DLL for low-jitter multi-gigahertz clock generation in highly integrated digital chips," IEEE Journal of Solid-State Circuits, vol. 37, no. 12, pp. 1804-1812, Dec. 2002.
- [3] S. Ye, L. Jansson, and I. Galton, "A multiple-crystal interface PLL with VCO realignment to reduce phase noise," IEEE Journal of Solid-State Circuits, vol. 37, no. 12, pp. 1795–1803, Dec. 2002.
- [4] B. Helal, M. Straayer, G. Wei, and M. Perrott, "A low jitter 1.6 GHz multiplying DLL utilizing a scrambling time-to-digital converter and digital correlation," in IEEE Symposium on VLSI Circuits Digest of Tech. Papers, June 2007, pp. 166-167.

# A Digitally-enhanced Delta-sigma Fractional-N Synthesizer

C.-M. Hsu, M.H. Perrott Sponsorship: SRC/FCRP C2S2

Recent advances in frequency synthesizer architectures have formed the groundwork for a very exciting and active area of research. On the one hand, the need for a wider-bandwidth fractional-N synthesizer has inspired researchers to develop phase noise cancellation techniques to avoid tradeoffs between noise performance and synthesizer bandwidth [1], as shown in Figure 1. On the other hand, the continuing development of deep-submicron CMOS processes has sparked interest in the all-digital phase-locked loop (PLL) [2], which not only leverages the high-speed digital capability available in a deep-submicron process but also avoids problems that a conventional charge-pump PLL may encounter, such as high variation and leakage current. The work in [2] demonstrated that an all-digital synthesizer can meet GSM specifications, but the need for strong DSP capability and a complicated VCO structure prevents it from being a simple solution for many applications. In addition, the bandwidth of [2] is ten times lower than that achievable by analog techniques [1]. Therefore, the goal of this research is to find a digital synthesizer solution that not only is simpler than [2] but also can achieve a high bandwidth comparable to the analog approach [1].

The high in-band phase noise due to the quantization noise of the time-to-digital converter (TDC) in [2] limits the PLL bandwidth to roughly 40 kHz. Recently, a new TDC architecture introduced in [3] demonstrated the possibility of first-order noise shaping the TDC quantization noise. This technique shows the potential to achieve <-110 dBc/Hz in-band phase noise, according to simulation results. With such a low in-band noise floor, we are able to extend the PLL loop bandwidth to roughly 400 kHz without violating the GSM mask.

The digitally controlled oscillator proposed in [2] requires a large, fine-resolution switched capacitor bank, which is a challenging design. An alternative in [4] combines a digital-to-analog converter (DAC) and a conventional analog LC voltage-controlled oscillator into a DAC-controlled oscillator. However, the resistor-string DAC used in [4], which offers an easy implementation, will not support dynamic element matching (DEM) techniques. Applying DEM to remove the resistor mismatch is critical, since the DAC nonlinearity will fold MASH quantization noise to low frequencies and thereby overwhelm the benefit of our low-phase-noise TDC. Therefore, we modified the resistor-string DAC into another form that enables us to apply the DEM technique easily. Just as with the design in [4], this DAC does not require an analog buffer or an op-amp.
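For readers unfamiliar with the MASH quantization noise mentioned above, the following sketch shows a generic third-order MASH 1-1-1 modulator of the kind commonly used to dither the divide value in a fractional-N synthesizer. It is a textbook structure offered for illustration; the abstract does not specify the modulator order or word length, so the 1-1-1 topology, the modulus, and the input word are assumptions.

```python
def mash111(k, modulus, n_samples):
    """Generic MASH 1-1-1 modulator built from three cascaded accumulators.

    k / modulus is the desired fractional value (0 <= k < modulus).
    Each output is an integer in [-3, 4]; the long-run average of the
    outputs approaches k / modulus, and the quantization error is
    pushed toward high frequencies.
    """
    acc1 = acc2 = acc3 = 0
    c2_prev = 0               # previous carry of stage 2
    c3_prev = c3_prev2 = 0    # two previous carries of stage 3
    out = []
    for _ in range(n_samples):
        c1, acc1 = divmod(acc1 + k, modulus)      # carry and residue, stage 1
        c2, acc2 = divmod(acc2 + acc1, modulus)   # stage 2 integrates stage-1 residue
        c3, acc3 = divmod(acc3 + acc2, modulus)   # stage 3 integrates stage-2 residue
        # Noise-cancelling combination: c1 + (1 - z^-1) c2 + (1 - z^-1)^2 c3
        out.append(c1 + (c2 - c2_prev) + (c3 - 2 * c3_prev + c3_prev2))
        c2_prev = c2
        c3_prev2, c3_prev = c3_prev, c3
    return out


if __name__ == "__main__":
    seq = mash111(k=5, modulus=16, n_samples=16000)
    print("first outputs:", seq[:12])
    print("average:", sum(seq) / len(seq), "target:", 5 / 16)
```

The instantaneous output swings over several integer steps even though its average is exact, which is why any nonlinearity in the path that processes this sequence can fold the shaped noise back to low frequencies.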

By combining the techniques stated above, we expect to achieve a digitally-enhanced fractional-N synthesizer with a carrier frequency of 3.6 GHz and a bandwidth of 400 kHz, an order of magnitude higher than the bandwidth offered by [2]. Figure 2 illustrates the block diagram of our architecture. Behavioral simulation verifies that the proposed architecture can still meet the GSM mask even with the high bandwidth.



▲ Figure 1: Conventional phase cancellation fractional-N synthesizer.



▲ Figure 2: Proposed digital enhanced phase cancellation fractional-N synthesizer.

- [1] S. Pamarti, L. Jansson, and I. Galton, "A wideband 2.4 GHz delta-sigma fractional-N PLL with 1 Mb/s in-loop modulation," IEEE J. Solid State Circuits, vol. 39, no. 1, pp. 49-62, Jan. 2004.
- [2] R. B. Staszewski, et al., "All-digital PLL and transmitter for mobile phones," *IEEE J. Solid State Circuits*, vol. 40, no. 12, pp. 2469-2481, Dec. 2005.
- [3] B. M. Helal, M. Z. Straayer, G.-Y. Wei, and M. H. Perrott, "A low-jitter 1.6 GHz multiplying DLL utilizing a scrambling time-to-digital converter and digital correlation," to be presented at Symp. VLSI Circuits, Kyoto, Japan, June 2007.
- [4] M. Ferriss and M. Flynn, "A 14mW fractional-N PLL modulator with an enhanced digital phase detector and frequency switching scheme," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, San Francisco, CA, Feb. 2007, pp. 352-353.

## Voltage-controlled Oscillator-based A/D Conversion

M. Park, M.H. Perrott Sponsorship: NSF

There has recently been increasing interest in developing highly digital analog-to-digital converter (ADC) structures for on-chip testing and ease of integration in future CMOS processes. An intriguing circuit to utilize in such cases is a ring oscillator voltage-controlled oscillator (VCO), which outputs a clock waveform whose frequency is a function of an input tuning voltage. By comparing the clock frequency to that of a separate clock reference using digital counters, one can create an all-digital A/D converter that can be readily utilized for on-chip monitoring of supply voltage variations and other on-chip waveforms [1-2]. A shortcoming of the approach in [1] is that the effective conversion rate must be quite low to achieve high resolution, and the shortcoming of the approach in [2] is that the overall A/D implementation ends up being primarily analog in nature.
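The tradeoff between resolution and conversion rate in a simple counter-based converter follows from how frequency is measured: counting VCO edges for a window T resolves frequency to about 1/T, so distinguishing 2^B levels across a tuning range Δf requires Δf·T ≥ 2^B. The sketch below evaluates this bound; it is a first-order argument rather than a model of the circuit in [1], and the 102.4-MHz tuning range is a hypothetical value.

```python
def counting_window(bits, tuning_range_hz):
    """Shortest counting window that resolves 2**bits frequency levels."""
    return (2 ** bits) / tuning_range_hz


def conversion_rate(bits, tuning_range_hz):
    """Highest conversion rate of a plain edge-counting converter."""
    return 1.0 / counting_window(bits, tuning_range_hz)


if __name__ == "__main__":
    TUNING_RANGE = 102.4e6  # hypothetical VCO tuning range in Hz
    for bits in (6, 8, 10, 12):
        rate = conversion_rate(bits, TUNING_RANGE)
        print(f"{bits:2d} bits -> {rate / 1e3:8.1f} kS/s")
```

Each additional bit halves the achievable rate, which motivates structures that obtain resolution through noise shaping instead of long counting windows.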

In this project, we propose a VCO-based A/D converter structure that allows second-order $\Sigma\Delta$ noise shaping to be achieved with a highly digital structure. We suggest the use of a primarily digital structure that is augmented by a small amount of low-performance analog circuitry to achieve the higher-order noise shaping. Figure 1 shows the proposed second-order $\Sigma\Delta$ ADC architecture using a VCO as a first-stage integrator. The VCO and the dual-modulus divider form a feedback path. A second-stage integrator consists of the charge pump and the capacitor.

The prototype chip was fabricated in a 0.18-µm CMOS process. Figure 2 shows the die photograph. The fabricated ADC achieves 60 dB SNR over 1 MHz bandwidth with a sampling rate of 800 MHz, and the quantization noise is second-order noise-shaped. The highly digital architecture makes it possible to realize ADCs with only a small number of analog circuits, and it could potentially be useful for on-chip signal monitoring in ASICs, especially in future CMOS technologies.



▲ Figure 1: Proposed second-order $\Sigma\Delta$ ADC employing a VCO.



▲ Figure 2: Die photograph of prototype IC.

- [1] E. Alon, V. Stojanović, and M.A. Horowitz, "Circuits and techniques for high-resolution measurement of on-chip power supply noise," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 4, pp. 820–828, Apr. 2005.
- [2] A. Iwata, N. Sakimura, M. Nagata, and T. Morie, "An architecture of delta sigma A-to-D converters using a voltage controlled oscillator as a multi-bit quantizer," *IEEE Transactions on Circuits and Systems II*, vol. 46, no. 7, pp. 941–945, July 1999.

# A ΣΔ ADC with Noise-shaping VCO Quantizer and DEM Circuit

M. Straayer, M.H. Perrott Sponsorship: Lincoln Laboratory

A combined 5-bit, 1st order noise-shaped quantizer and dynamic element matching (DEM) circuit running at 950 MHz based on a multi-phase voltage controlled oscillator (VCO) is presented. This quantizer structure is the key element in a 3rd-order noise-shaped analog-to-digital converter (ADC) with 2nd order loop dynamics and a single op-amp.

Figure 1 shows the basic operation of the multi-phase VCO-based quantizer. In this circuit, the VCO integrates an input voltage into a phase, and a digital quantizing register captures the phase state of the VCO. The quantization is particularly efficient due to the digital form of the ring oscillator. When the quantized phase is differentiated, a digital output proportional to the input voltage results. However, because the quantization error is also differentiated, the VCO-quantizer achieves first-order noise-shaping.
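The integrate, quantize, and differentiate sequence described above can be sketched behaviorally. The model below is a simplification under stated assumptions: it tracks unwrapped phase (a real register captures the ring state modulo one cycle), treats the VCO as perfectly linear, and uses hypothetical values for the center frequency, tuning gain, and number of distinguishable phase states per cycle. Only the 950-MHz sample rate comes from the text.

```python
import math


def vco_quantizer(v_in, f0_hz, kv_hz_per_v, fs_hz, n_phases):
    """Behavioral multi-phase VCO quantizer.

    The VCO integrates the input into phase, the phase is quantized to
    1/n_phases of a cycle at each sample, and a first difference turns
    the quantized phase back into a number proportional to the input.
    """
    phase_cycles = 0.0   # unwrapped VCO phase, in cycles
    q_prev = 0
    out = []
    for v in v_in:
        phase_cycles += (f0_hz + kv_hz_per_v * v) / fs_hz  # integration
        q = math.floor(phase_cycles * n_phases)            # phase quantization
        out.append(q - q_prev)                             # differentiation
        q_prev = q
    return out


if __name__ == "__main__":
    FS = 950e6                       # sample rate from the text
    F0, KV, N = 475e6, 200e6, 31     # hypothetical VCO parameters
    samples = vco_quantizer([0.1] * 1000, F0, KV, FS, N)
    ideal = N * (F0 + KV * 0.1) / FS
    print("distinct output codes:", sorted(set(samples)))
    print("average output:", sum(samples) / len(samples), "ideal:", ideal)
```

Because the outputs are differences of one running quantized phase, their sum telescopes: the accumulated error never exceeds one quantization step, which is the time-domain signature of first-order noise shaping.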

To improve the linearity and quantization noise performance of the converter, a sigma-delta feedback loop is formed with second-order loop dynamics. Figure 2 shows a simplified block diagram of this architecture. Interestingly, the rotation of the VCO phase can be utilized to perform dynamic element matching on the feedback DAC elements.

The authors wish to acknowledge MIT Lincoln Laboratory for research support through the Lincoln Scholars Program.



- [1] M. Straayer and M. Perrott, "A 10-bit 20MHz 38mW 950MHz CT ΣΔ ADC with a 5-bit noise-shaping VCO-based quantizer and DEM circuit in 0.13u CMOS," to be presented at VLSI Symposium, Kyoto, Japan, June 2007.
- [2] A. Iwata, N. Sakimura, M. Nagata, and T. Morie, "The architecture of delta sigma analog-to-digital converters using a VCO as a multibit quantizer," IEEE Transactions on Circuits and Systems II, vol. 46, no. 7, pp. 941-945, July 1999.
- [3] J. Kim and S. Cho, "A time-based analog-to-digital converter using a multi-phase VCO," in Proc. International Symposium on Circuits and Systems, Island of Kos, Greece, May 2006, pp. 3934-3937.
- [4] R. Naiknaware, H. Tang, and T. Fiez, "Time-referenced single-path multi-bit delta sigma ADC using a VCO-based quantizer," IEEE TCAS II, vol. 47, no. 7, pp. 596-602, July 2000.
- [5] E. Alon, V. Stojanović, and M. Horowitz, "Circuits and techniques for high-resolution measurement of on-chip power supply noise," IEEE Journal of Solid-State Circuits, vol. 40, no. 4, pp. 820-828, Apr. 2005.

## A Sub-picosecond Time-to-digital Converter for On-chip Jitter Measurement

K. Johnson, M.H. Perrott Sponsorship: SRC/FCRP C2S2

Digital chip clock distribution consumes a major portion of the chip power budget. On-chip jitter and phase noise measurement promises the ability to control the jitter and power consumption in real time. Current on-chip jitter measurement systems measure either a histogram with fine resolution [1] or transient jitter using a time-to-digital converter (TDC) with coarser resolution [2]. A finer-resolution TDC would result in a single circuit that satisfies both purposes.

In a related application space, recent innovations in frequency synthesizers [3] make increased use of digital components including a TDC as a phase detector. The resolution of the TDC sets an upper limit on the bandwidth of the frequency synthesizer. A finer resolution TDC would result in lower in-band phase noise, wider bandwidth, and greater frequency agility.
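The link between TDC resolution and in-band phase noise can be estimated with a widely used approximation for a uniformly quantizing TDC in a digital PLL: L ≈ ((2π)²/12)·(Δt·f_out)²/f_ref. This expression is not given in the abstract, and the operating points in the sketch below are illustrative assumptions rather than values from this work.

```python
import math


def tdc_inband_phase_noise_dbc_hz(t_res_s, f_out_hz, f_ref_hz):
    """Approximate in-band phase noise set by TDC quantization.

    Assumes uniform quantization with step t_res_s, white quantization
    noise, and one TDC measurement per reference cycle.
    """
    step_in_cycles = t_res_s * f_out_hz
    noise = ((2.0 * math.pi) ** 2 / 12.0) * step_in_cycles ** 2 / f_ref_hz
    return 10.0 * math.log10(noise)


if __name__ == "__main__":
    F_OUT, F_REF = 3.6e9, 50e6  # hypothetical synthesizer operating point
    for t_res in (20e-12, 10e-12, 5e-12, 1e-12):
        level = tdc_inband_phase_noise_dbc_hz(t_res, F_OUT, F_REF)
        print(f"resolution {t_res * 1e12:4.0f} ps -> {level:7.1f} dBc/Hz")
```

In this approximation each halving of the TDC step lowers the in-band noise by about 6 dB, which is the headroom that can be traded for a wider loop bandwidth.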

Our architecture uses dividers and delay stages to present the delay to the digitizer (Figure 1). The remainder of the system is discrete time. Our work has a time resolution below the minimum inverter delay. All signals are full-swing digital signals with no information stored as low-frequency analog voltages to increase immunity to supply noise. The architecture is suitable for digital standard cells, which allows for simple migration as technology scales.



▲ Figure 1: The TDC architecture. The dividers and the delay chain present edges to the digitizer.

- [1] K.A. Jenkins, A.P. Jose, and D.F. Heidel, "An on-chip jitter measurement circuit with sub-picosecond resolution," in *Proc. European Solid-State Circuit Conference*, Grenoble, France, Sept. 2005, pp. 157-160.
- [2] K. Nose, M. Kajita, and M. Mizuno, "A 1-ps resolution jitter-measurement macro using interpolated jitter oversampling," IEEE Journal of Solid-State Circuits, vol. 41, no. 12, pp. 2911-2920, Dec. 2006.
- [3] R.B. Staszewski, S. Vemulapalli, P. Vallur, J. Wallberg, and P.T. Balsara, "1.3V 20 ps time-to-digital converter for frequency synthesis in 90-nm CMOS," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 53, no. 3, pp. 220-224, Mar. 2006.

# Techniques for Highly-digital Implementation of Clock and Data Recovery Circuits

C. Lau, M.H. Perrott Sponsorship: SRC/FCRP C2S2

A clock and data recovery (CDR) circuit is an essential building block in a chip-to-chip communication system. Its key functions are to extract the clock signal and retime the data from an incoming non-return-to-zero (NRZ) data stream. As shown in Figure 1, conventional designs of CDR circuits typically employ a phase-locked loop (PLL), which consists of analog components such as a phase detector (PD), charge pump, loop filter, and voltage controlled oscillator (VCO). Although this analog implementation works well in most modern applications, we have started to see its limitations as we migrate to deep-submicron CMOS IC processes. For example, this analog system relies on low-leakage capacitors to hold values when the phase-locked loop is locked. The input of the VCO must be held stable in order to minimize frequency drift and jitter in the recovered clock. However, as the leakage current problem worsens in new generations of CMOS processes, it demands a great deal of power and chip area to maintain not only the CDR's performance but also its functionality.

In view of the above challenge, it would be useful to pursue a new mixed-signal CDR architecture that minimizes its analog content and takes advantage of digital circuits. In line with this goal, recent research on digitally-controlled oscillators (DCOs) has demonstrated the feasibility of achieving fine resolution in frequency synthesis through digital control [1]. Therefore, we propose a highly-digital CDR circuit that leverages digital circuits, as shown in Figure 2. We use a bang-bang phase detector to generate error pulses of fixed width, which are then directly treated as digital signals in the subsequent digital blocks in the major loop. In this way, we can preserve the digital nature in the control path to the DCO, thus alleviating the need for high-performance, low-leakage analog components. We also utilize a simple analog feedback loop to linearize the bang-bang phase detector's nonlinear dynamics. Simulation results show that the achievable recovered clock jitter is around 2 ps RMS and verify that this architecture meets the OC-48 SONET specification. This design has been implemented in a 0.18-µm CMOS process and is being tested.
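The behavior of a bang-bang detector in a digital loop can be sketched in a few lines. The example below assumes an Alexander-style detector (one common bang-bang implementation; the abstract does not name the detector type), an alternating data pattern so that every bit has a transition, and a first-order loop that moves the clock phase by a fixed step per decision. It omits the analog linearizing loop mentioned above, and all names and values are hypothetical.

```python
def alexander_pd(prev_bit, edge_sample, curr_bit):
    """Early/late decision from two data samples and one edge sample.

    Returns 0 when there is no data transition, -1 when the clock is
    early (edge sample still equals the old bit), +1 when it is late.
    """
    if prev_bit == curr_bit:
        return 0
    return -1 if edge_sample == prev_bit else 1


def simulate_loop(initial_error_ui, step_ui, n_bits):
    """First-order bang-bang loop tracking a static phase error.

    A positive error means the recovered clock is late. Each decision
    moves the clock phase by a fixed step in the correcting direction.
    """
    error = initial_error_ui
    prev_bit = 0
    history = []
    for _ in range(n_bits):
        curr_bit = 1 - prev_bit                       # alternating data
        edge_sample = curr_bit if error > 0 else prev_bit
        decision = alexander_pd(prev_bit, edge_sample, curr_bit)
        error -= step_ui * decision                   # fixed-size correction
        history.append(error)
        prev_bit = curr_bit
    return history


if __name__ == "__main__":
    trace = simulate_loop(initial_error_ui=0.3, step_ui=0.01, n_bits=200)
    print("error after 10 bits (UI):", trace[9])
    print("final error (UI):", trace[-1])
```

The loop slews toward lock at a constant rate and then dithers within one step of zero, a nonlinear behavior of the kind the linearizing feedback loop in the proposed architecture is meant to address.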

We acknowledge National Semiconductor for providing the fabrication services.







Figure 2: The proposed digital CDR architecture.

## REFERENCES

[1] R.B. Staszewski, et al., "A first digitally-controlled oscillator in a deep-submicron CMOS process for multi-GHz wireless applications," in Proc. IEEE Radio Frequency Integrated Circuits Symposium, June 2003, pp. 81-84.

## Low-power CMOS Rectifier Design for RFID Applications

S. Mandal, R. Sarpeshkar Sponsorship: Symbol Technologies

We have developed a general theory for far-field RF power extraction (or harvesting) systems. Such systems consist of an antenna and impedance-matching network that capture and efficiently transfer radiated RF power to a rectifier that converts it to DC for powering other circuits [1]. We have studied how fundamental physical relationships that link the operating bandwidth and range of such systems are related to technology-dependent quantities like transistor threshold voltage and parasitic capacitances. An important conclusion is that major improvements in rectifier efficiency are possible when advanced CMOS processes are used to fabricate them. The availability of high-Q capacitors and transistors with lower gate resistance, threshold voltage and parasitic capacitances, i.e., higher f<sub>T</sub>, in such processes proves to be crucial.

We have used our theory to accurately model far-field power extraction systems for passive RFID tags operating at UHF (850-950MHz). Efficient planar antennas, coupled resonator impedance matching networks and low power all-MOS rectifiers fabricated in standard CMOS technologies (0.5µm and 0.18µm) have been individually designed and later combined to form complete power extraction systems. One of our systems was found to have power-up thresholds of 6µW±10% (at 1µW load) and 8.5µW±10% (at 2µW load) while operating around 950MHz, closely matching values predicted by theory (5.2µW and 8µW, respectively). These low values of the power-up threshold allow the operating range of passive RFID tags to be extended without increasing the transmitted power. As far as we know, our experimental results constitute the best performance reported from a far-field power extraction system built in standard CMOS to date.
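The connection between power-up threshold and operating range can be illustrated with the free-space Friis relation, in which available power falls with the square of distance. The sketch below is a rough estimate only: it assumes free-space propagation, perfect polarization and impedance match, a 4-W EIRP reader, and a tag antenna gain of 1.5, none of which are taken from the abstract. The threshold values are the measured ones quoted above.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def dbm_to_watts(p_dbm):
    """Convert a power level in dBm to watts."""
    return 10.0 ** (p_dbm / 10.0) / 1000.0


def free_space_range(p_threshold_w, eirp_w, tag_gain, freq_hz):
    """Distance at which the available power equals the power-up threshold.

    Friis: P_available = EIRP * G_tag * (wavelength / (4*pi*d))**2,
    solved for d.
    """
    wavelength = SPEED_OF_LIGHT / freq_hz
    return wavelength / (4.0 * math.pi) * math.sqrt(eirp_w * tag_gain / p_threshold_w)


if __name__ == "__main__":
    EIRP_W = 4.0      # assumed reader EIRP
    TAG_GAIN = 1.5    # assumed tag antenna gain (linear)
    FREQ = 950e6      # operating frequency from the text
    for threshold in (6e-6, 8.5e-6):  # measured thresholds from the text
        d = free_space_range(threshold, EIRP_W, TAG_GAIN, FREQ)
        print(f"threshold {threshold * 1e6:.1f} uW -> range ~ {d:.1f} m")
    print("-24.7 dBm in watts:", dbm_to_watts(-24.7))
```

Under these assumptions the range scales as the inverse square root of the threshold, so halving the power-up threshold extends the range by about 41% at fixed transmitted power.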



▲ Figure 1: Photograph of completed power extraction system. The planar antenna and impedance-matching network occupies most of the area. The packaged chip containing the MOS rectifier is also visible.



▲ Figure 2: Measured load curves of the power extraction system. The load current was varied from 10nA to 10µA for available power levels P<sub>A</sub> ranging from -24.7±0.5dBm (3.4µW±10%) to -17.7±0.5dBm (17.2µW±10%).

## REFERENCES

[1] K. Finkenzeller, *RFID Handbook: Fundamentals and Applications in Contactless Smart Cards and Identification*, second ed. Chichester: John Wiley & Sons, Inc., 2003.

## Low-power Circuits for Brain-machine Interfaces

R. Sarpeshkar, W. Wattanapanitch, B.I. Rapoport, S.K. Arfin, M.W. Baker, S. Mandal, M. Fee, R.A. Andersen, S. Musallam

This work involves the development of ultra-low-power circuits for brain-machine interfaces with applications for paralysis prosthetics, prosthetics for the blind, and experimental neuroscience systems. The circuits developed include a micropower neural amplifier with adaptive power biasing for use in multi-electrode arrays (Figure 1); an analog linear decoding and learning architecture for data compression; radio-frequency (RF) impedance modulation for low-power data telemetry; a wireless link for efficient power transfer; mixed-signal system integration for efficiency, robustness, and programmability; and circuits for wireless stimulation of neurons. Experimental results have been obtained from chips that have recorded from and stimulated neurons in the zebra-finch brain (Figure 2) and from RF power-link systems. Circuit simulations have also successfully processed prerecorded data from a monkey brain and from an RF data telemetry system.



Figure 1: The adaptive micropower neural amplifier circuit.



▲ Figure 2: Recordings obtained from the RA region of a zebra-finch brain using the amplifier circuit shown in Figure 1.

# A 77-GHz Receiver Front-end for Passive Imaging

J.D. Powell, H. Kim (Lincoln Laboratory), C.G. Sodini Sponsorship: SRC/FCRP C2S2, Lincoln Laboratory

The area of millimeter-wave (MMW) system research and design has become increasingly popular in recent years, as advanced silicon processes have enabled integrated circuit operation in the MMW regime. The SiGe process used here features an f<sub>T</sub> above 200 GHz and includes fully modeled passive elements. Several applications exist for MMW design, including wireless communications at 60 GHz, collision-avoidance radar imaging at 77 GHz, and concealed weapons detection imaging at 77 GHz and higher. Significant advances have been made in these areas using SiGe technology [1]. This research focuses on a passive imager front-end that has been developed and tested for the application of concealed weapons detection.

The Passive Imager enables detection of concealed weapons, given that they are composed of materials that possess emissive properties that contrast with those of the human body. The integrated Passive Imager operates from 73-81 GHz. Compared with current mm-wave research, this system is a wideband receiver that is fully differential, which allows many receivers to be tightly packaged in an array for the passive imager. The Passive Imager RF Front-End is composed of a low noise amplifier (LNA) tuned to the RF frequency band of 73-81 GHz; a double-balanced mixer, which down-converts the RF frequency to the intermediate frequency (IF) range of 1-9 GHz; and an on-chip cross-coupled voltage controlled oscillator (VCO), which provides a local oscillator frequency of 72 GHz. The LNA is a two-stage independently biased cascode design, which achieves 4-6 dB NF, 20-26 dB gain, and excellent impedance matching. The mixer design is a double-balanced Gilbert cell with IF amplifier. The stand-alone mixer has a broadband 180° hybrid at LO and RF input ports for testing purposes. The mixer achieves 12-14 dB NF, 20-26 dB conversion gain and P1dB of -26 dBm. The VCO core is a cross-coupled pair incorporating capacitive coupling and independent base biasing. The capacitors in feedback act as capacitive dividers and enable higher output power from the VCO core, as well as a substantially higher output frequency. The VCO achieves output power of -2 to 0 dBm, center frequency of 72 GHz and phase noise of approximately -93 dBc/Hz.

The Passive Imager RF Front-End block diagram is shown in Figure 1. The integrated chip, implemented in 0.13-µm SiGe, is shown in Figure 2. It achieves particularly impressive conversion gain for wide-bandwidth applications, with approximately 46-36 dB conversion gain and 7-10 dB NF from 1-9 GHz. The P1dB is approximately -38 dBm at 76 GHz RF.



Figure 1: Block diagram of RF front-end.



▲ Figure 2: Die photo of front-end receiver composed of the LNA, VCO and mixer.

## REFERENCES

[1] S. Reynolds, B.A. Floyd, U.R. Pfeiffer, T.J. Beukema, T. Zwick, J. Grzyb, D. Liu, and B.P. Gaucher, "Progress toward a low-cost millimeter-wave silicon radio," in *Proc. IEEE Custom Integrated Circuits Conference*, San Jose, CA, Sept. 2005, pp. 563-570.

## **Power Amplifier Design for Millimeter-wave Imaging**

K.M. Nguyen, C.G. Sodini Sponsorship: SRC/FCRP C2S2

This research investigates the challenges of designing a power amplifier (PA) that could be used in a millimeter-wave (MMW) imaging system. A 130-nm SiGe BiCMOS process was used to develop an understanding of the specification limits. At MMW frequencies, the operating frequencies are pushing towards the  $f_T$  (200 GHz) of the devices. Furthermore, the low breakdown voltage of the bipolar devices limits the voltage swing and output power of the PA. To overcome this, a cascode topology was used in which the DC base resistance of the cascode transistor was reduced to increase the breakdown voltage of the transistor and allow more voltage swing. The reduced Miller effect from the cascode gave an increase in power gain.

A simulation study was conducted to determine the device parameters that limited the performance of MMW PAs. The operating frequency was pushed to 120 GHz, and a nominal PA was designed and simulated. Parameters within the model file were systematically changed, and the nominal PA was redesigned to compensate for the adjusted parameters. The change in performance could be attributed to the specific parameter. We found that the most significant parameters were the intrinsic and extrinsic base-collector capacitances rather than the base transit time. This showed that reducing the line widths of the bipolar devices provides greater gains in PA performance than reducing the base widths.

A test chip was submitted for fabrication in December 2006. Due to limitations of available test equipment, the operating frequency was reduced to 110 GHz. Using a 2-stage cascode design, the PA achieves a simulated maximum output power of 10.7 dBm and 6.7 dBm 1-dB compression point. The maximum power added efficiency is 5.1%, and it has a 13.6 dB power gain. Figure 1 shows a 77 GHz PA that was submitted for fabrication in January 2007 using a topology similar to the 110 GHz design. Simulation results of this PA are shown in Figure 2. It shows a maximum output power of 12 dBm and an 8.1 dBm 1-dB compression point. The maximum power added efficiency is 6.0%, and it has a 15-dB power gain at 77 GHz.



▲ Figure 1: Layout of the 2-stage 77 GHz PA that was submitted in January 2007 in a 120-nm BiCMOS process. The die dimension is 1.1mm x 0.75mm.





# A 77-GHz System for Millimeter-wave Active Imaging

A. Accardi, J. Chu, K. Nguyen, J. Powell, H. Kim (Lincoln Laboratory), G. Wornell, H.-S. Lee, C.G. Sodini Sponsorship: SRC/FCRP C2S2

Due to advances in silicon and digital processing technology, low-cost millimeter-wave (MMW) imaging solutions with high antenna array density are now viable. While millimeter resolution or better is desirable for many applications, this wavelength is large enough to avoid scattering by tiny interfering particles. Furthermore, a large bandwidth can be supported at this high carrier frequency. The MMW technology is therefore well suited for applications such as automotive collision avoidance and concealed weapons detection.

By superimposing the signals recorded at antennas configured in an array, the imaging receiver can be focused on a portion of the scene corresponding to a particular pixel. This process, called beam forming, makes use of constructive interference at the carrier frequency and allows the receiver to be "electronically steered" without any moving parts. However, very low phase noise is required for fine resolution at long range. Traditionally, beam-formers at such high frequencies are fabricated using custom analog technology to ensure precise phase control. Our system performs digital beam forming, allowing for low-cost, large-scale production and low power consumption. We address the phase noise by over-sampling, averaging, and employing feedback. That is, we correct for phase noise introduced in the analog and data conversion circuitry in the digital domain, thereby driving research with high data rate, low phase noise, and low power consumption requirements. Figure 1 illustrates the system and indicates the components we plan to fabricate. Our goal is to justify the system architecture and establish a proof of concept by implementing the most challenging components.
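The beam-forming step described above can be illustrated with a minimal narrowband sketch. It assumes a uniform linear array and an ideal plane wave; the element count, spacing, and function names are hypothetical and are not taken from the system in Figure 1.

```python
import cmath
import math

C = 3.0e8  # speed of light in m/s


def steering_weights(num_antennas, spacing_m, carrier_hz, steer_deg):
    """Phase weights that focus a uniform linear array on angle steer_deg.

    Angles are measured from broadside. Each weight is the conjugate of the
    phase a plane wave from steer_deg would have at that element, so the
    weighted signals add constructively for that direction.
    """
    k = 2.0 * math.pi * carrier_hz / C  # wavenumber at the carrier
    s = math.sin(math.radians(steer_deg))
    return [cmath.exp(-1j * k * n * spacing_m * s) for n in range(num_antennas)]


def array_response(weights, spacing_m, carrier_hz, arrival_deg):
    """Normalized magnitude of the weighted sum for a unit plane wave."""
    k = 2.0 * math.pi * carrier_hz / C
    s = math.sin(math.radians(arrival_deg))
    total = sum(
        w * cmath.exp(1j * k * n * spacing_m * s) for n, w in enumerate(weights)
    )
    return abs(total) / len(weights)


if __name__ == "__main__":
    carrier = 77e9                      # carrier frequency from the section title
    spacing = 0.5 * C / carrier         # assumed half-wavelength element spacing
    weights = steering_weights(8, spacing, carrier, steer_deg=20.0)
    for angle in (-30.0, 0.0, 20.0, 40.0):
        print(angle, round(array_response(weights, spacing, carrier, angle), 3))
```

Changing `steer_deg` re-points the receiver purely by recomputing the weights, which is the sense in which the array is electronically steered. Phase errors in the weights or in the received samples reduce the constructive sum, which is why phase noise matters for resolution.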





# Coding in Wideband OFDM Wireless Communications with Adaptive Modulation

F. Edalat, C.G. Sodini Sponsorship: CICS, NSF, Texas Instruments Fellowship

To achieve high-speed wireless communications, such as streaming of next-generation Gigabit Internet or HDTV, orthogonal frequency division multiplexing (OFDM) has been proven as the enabling technology. In an indoor environment where reflections from the surrounding objects result in multiple copies of the transmitted signal arriving at the receiver, the channel is highly frequency-selective over a wide bandwidth. An OFDM system decomposes such a channel into multiple flat fading sub-bands by transmitting the high data-rate signal in multiple parallel lower data-rate blocks. Furthermore, an OFDM system can exploit this channel characteristic to maximize the data rate by adapting the modulation per bin based on the estimated Signal-to-Noise Ratio (SNR), as shown in Figure 1 [1]. In addition, channel coding is necessary to achieve the required system performance with a limited transmit power. In this work, we determine suitable codes in an adaptive modulation OFDM system to achieve highest throughput with a constrained latency. In particular, we analyze the benefits and tradeoffs of such codes as convolutional coding, trellis-coded modulation, and capacity-approaching low-density parity-check codes used in current OFDM systems.

To measure the performance gain from coding with adaptive modulation in an indoor wireless environment, we have implemented a transceiver prototype. In this prototype, the adaptive modulation takes place in three steps. In step 1, the transmitter sends a training sequence, and the receiver measures the SNR on each bin. If the true SNR is known for each sub-band, the most efficient sub-band modulation that yields an uncoded bit error rate (BER) smaller than $10^{-3}$ is selected, as shown in Figure 2. However, errors in channel estimation and in time and frequency synchronization result in a loss in SNR that increases the SNR thresholds in Figure 2. Next, in step 2, the receiver feeds back to the transmitter the assigned modulation scheme, which the transmitter uses to send the data packet in step 3.
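The per-bin selection in step 1 can be sketched as follows. This is an illustration only: it assumes an AWGN channel within each bin, textbook BER approximations for BPSK and square QAM, and a hypothetical candidate set of constellations. The thresholds it implies are not the measured thresholds of Figure 2.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def uncoded_ber(bits_per_symbol, snr_db):
    """Approximate uncoded BER on AWGN; snr_db is the per-symbol SNR (Es/N0).

    Uses the exact BPSK expression for 1 bit and the standard nearest-neighbor
    approximation for square M-QAM with Gray mapping otherwise.
    """
    snr = 10.0 ** (snr_db / 10.0)
    if bits_per_symbol == 1:
        return q_function(math.sqrt(2.0 * snr))
    m = 2 ** bits_per_symbol
    scale = (4.0 / bits_per_symbol) * (1.0 - 1.0 / math.sqrt(m))
    return scale * q_function(math.sqrt(3.0 * snr / (m - 1)))


# Assumed candidate set: BPSK, QPSK, 16-QAM, 64-QAM (bits per symbol).
CANDIDATE_BITS = (1, 2, 4, 6)


def assign_modulation(snr_db_per_bin, target_ber=1e-3):
    """Pick, per bin, the densest candidate whose uncoded BER is below target.

    A result of 0 means the bin is left unused.
    """
    assignment = []
    for snr_db in snr_db_per_bin:
        best = 0
        for bits in CANDIDATE_BITS:
            if uncoded_ber(bits, snr_db) < target_ber:
                best = bits
        assignment.append(best)
    return assignment


if __name__ == "__main__":
    measured_snr_db = [0.0, 8.0, 12.0, 20.0, 30.0]  # hypothetical training result
    print(assign_modulation(measured_snr_db))
```

An SNR loss from estimation or synchronization errors can be modeled here by subtracting a margin from each measured SNR before the lookup, which is equivalent to raising every threshold by that margin.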







▲ Figure 2: Sub-band modulation assignment at target uncoded BER of 10<sup>-3</sup>.

#### REFERENCES

[1] F. Edalat, J.K. Tan, M. Nguyen, N. Matalon, and C.G. Sodini, "Measured data rate from adaptive modulation in wideband OFDM systems," in *Proc. 2006 IEEE International Conference on Ultra-Wideband*, Boston, USA, Sept. 2006, pp. 195-200.

# An Organic Imager for Flexible Large-area Electronics

I. Nausieda, K. Ryu, A.I. Akinwande, V. Bulović, C.G. Sodini Sponsorship: SRC/FCRP C2S2

Interest in organic semiconductors is sustained in part by the promise of large-area and flexible electronics. Early work in this field has focused on the fabrication and testing of discrete devices, such as an organic field effect transistor (OFET), or organic photoconductor (OPD). Creating electronic systems, however, requires an integrated approach, for both fabrication and testing.

In this work, a 4x4 addressable imager consisting of OFET switches with OPDs was fabricated and tested, using a near-room temperature (<95°C) process [1]. The individual pixel circuit is shown in the top left of Figure 1. The OFET and lateral OPD were sized in order for the OPD to determine the pixel conductance while the OFET was on and for the pixel conductance to be dominated by the OFET while it was biased off. Since the OFET acts only as a switch, the circuit is robust to process variation and degradation over time. Measurements indicate a pixel responsivity of $6 \times 10^{-5}$ A/W and an on/off ratio of 880 at a luminance of 5 mW cm<sup>-2</sup>. The conductivity versus luminance is pictured in Figure 1.

The fabricated active matrix imager is seen in Figure 2. It occupies an area of 10.24 mm<sup>2</sup> and uses a 25-V power supply. The imager was demonstrated to correctly image a "T" pattern after a first-order calibration.



▲ Figure 2: Optical micrograph of the 4x4 active matrix imager.

▲ Figure 1: Pixel conductivity versus luminance, for the cases of the OFET switch biased on and off.

## REFERENCES

[1] I. Nausieda, K. Ryu, I. Kymissis, A.I. Akinwande, V. Bulović, and C.G. Sodini, "An organic imager for flexible large area electronics," in *IEEE International Solid State Circuits Conference Digest*, San Francisco, CA, Feb. 2007, pp. 72-73.

## Channel- and Circuits-aware, Energy-efficient Coding for High-speed Links

N. Blitvic, L. Zheng, V. Stojanović Sponsorship: SRC/FCRP C2S2

In order to achieve high throughput while satisfying energy and density constraints, both the data rates and the energy efficiency of high-speed chip-to-chip interconnects need to increase. This project aims to extend the link system design to incorporate energy-efficient channel coding techniques. To enable the systematic characterization of different codes, we have developed a new statistical simulator [1]. The simulator employs a divide-and-conquer approach to fully account for the effect of long residual interference on the received voltage for systematic binary-linear-block-coded transmissions. It is therefore the first link simulator to accurately account for dependencies in transmitted data instead of approximating the transmissions as independent (uncoded). This method is of most use for high-rate codes, but it remains valid for both arbitrarily long channel lengths and block lengths. The resulting probability distributions are computed analytically and are therefore accurate at the low bit-error-rates (BER) required in high-speed links.

Integrating the voltage probability distributions computed through the previously described technique yields individual cross-over probabilities for different bit locations in a codeword. The resulting difference in the cross-over probabilities of individual bits compared to the cross-over probability of an uncoded system indicates the extent to which coupling residual interference with the data correlation due to binary linear block coding affects the performance of the system. Figures 1 and 2 illustrate this effect for two different link channels. Although the present focus is on developing simple codes tailored for interference-dominated environments, the current framework also allows for the characterization of classical error-detecting codes. For links with relatively well-compensated interference and low noise correlation, the channel can be approximated as binary-symmetric. The individual crossover probabilities are therefore sufficient to compute the error probabilities after error correction. This property was verified in [1] for some typical link channels and down to BERs achievable by Monte Carlo simulation.
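As an illustration of the last step, the sketch below computes the probability that a codeword contains more than `t` bit errors from per-position crossover probabilities. It assumes bit errors are independent across positions (the binary-symmetric approximation above) and a hypothetical decoder that handles up to `t` errors per codeword. The function name and the example values are illustrative and do not come from the simulator in [1].

```python
def prob_more_than_t_errors(crossover_probs, t):
    """P(more than t of the n bits are in error) for independent bit errors.

    crossover_probs[i] is the crossover probability of bit position i, so the
    positions may have unequal error rates. The error-count distribution is
    built one bit at a time, and the tail is summed directly so that very
    small probabilities are not lost to cancellation.
    """
    dist = [1.0]  # dist[k] = P(exactly k errors among the bits seen so far)
    for p in crossover_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # this bit is received correctly
            nxt[k + 1] += mass * p       # this bit is flipped
        dist = nxt
    return sum(dist[t + 1:])


if __name__ == "__main__":
    # Hypothetical crossover probabilities for a 10-bit codeword.
    probs = [1e-12, 3e-12, 1e-12, 2e-12, 1e-12, 5e-12, 1e-12, 1e-12, 2e-12, 1e-12]
    print(prob_more_than_t_errors(probs, t=0))  # any error in the codeword
    print(prob_more_than_t_errors(probs, t=1))  # residual after one correction
```

With `t = 0` the result is the probability that the codeword contains any error, which is the relevant quantity when the code is used only for detection.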



▲ Figure 1: Voltage distributions for the set of all (10,8) linear block codes. Shown are the cumulative mass functions (CMF) for two different channels: Peters B3 operating at 5 Gbps [top] and Peters B32 operating at 10 Gbps [bottom]. The plots also show the voltage CMF computed under the assumption that the data is uncoded.



▲ Figure 2: Bit crossover probabilities for the uncoded case and two different (10,8) linear block codes on a B3 channel [top] and B32 channel [bottom]. Code 2 was chosen to yield the maximum deviation from the uncoded CMF, while Code 1 was chosen roughly in between the two extremes. The results are computed under the common assumption that the link noise is additive, white and Gaussian with  $\sigma \approx 3$ mV.

#### REFERENCES

[1] N. Blitvic and V. Stojanović, "A new statistical simulator for block-coded channels with long residual interference," to be presented at *IEEE International Conference on Communications*, Glasgow, Scotland, June 2007.

## Design and Optimization of Equalized Interconnects for Energy-efficient On-chip Networks

B. Kim, V. Stojanović Sponsorship: NEC Fund, IBM Faculty Award

In recent high-performance processor design, multi-hierarchical co-optimization of on-chip network and overall chip architecture improves the performance-power efficiency significantly [1]. Though equalized on-chip interconnects have been proposed to improve the network efficiency [2-3], the multi-hierarchical co-optimization of equalized interconnect has been a difficult problem due to its design complexity.

This work presents a modeling and tool framework for fast design space exploration of equalized on-chip interconnects by exporting abstracted low-level design parameters to a link model. Using this framework, we can explore how the transistor and wire parameters affect link performance; equalization coefficients; and architecture-friendly metrics like delay, power, and area throughput density. With this approach, we are able to find the best link design for target throughput, power, and area constraints, thus enabling the architectural optimization of energy-efficient on-chip networks.

Figure 1 shows the hierarchical simulation framework. The lower level models are abstracted into the higher-level models. For example, RLGC matrices of the wire's transmission line are used to derive the through- and cross-talk closed-form transfer functions of the channel. At the top level, the behavioral model simulator uses the transfer functions to compute the link metrics and provides interconnect metrics for a low common mode (LCM) type equalized interconnect [4]. Figure 2 shows optimization results comparing interconnect metrics between the LCM and the repeated interconnects. Our simulation shows that the equalized LCM interconnect is much more power-efficient than the repeated interconnect for given target throughput density.
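To make the abstraction from wire parameters to a channel transfer function concrete, here is a minimal sketch for a single, isolated line with matched terminations. It uses scalar per-unit-length R, L, G, and C values rather than the coupled RLGC matrices of the framework, so it gives no crosstalk term. The function name and the numeric values are hypothetical.

```python
import cmath
import math


def line_transfer(freq_hz, length_m, r, l, g, c):
    """Transfer function and characteristic impedance of a matched line.

    r, l, g, c are per-unit-length resistance (ohm/m), inductance (H/m),
    conductance (S/m), and capacitance (F/m). Returns (H, Z0), where
    H = exp(-gamma * length) and gamma = sqrt((r + jwl)(g + jwc)).
    """
    if freq_hz <= 0.0:
        raise ValueError("freq_hz must be positive")
    omega = 2.0 * math.pi * freq_hz
    series_z = complex(r, omega * l)
    shunt_y = complex(g, omega * c)
    gamma = cmath.sqrt(series_z * shunt_y)  # principal root: non-negative real part
    z0 = cmath.sqrt(series_z / shunt_y)
    return cmath.exp(-gamma * length_m), z0


if __name__ == "__main__":
    # Hypothetical on-chip wire: strongly resistive, 10 mm long.
    for f in (1e8, 1e9, 5e9):
        h, z0 = line_transfer(f, 10e-3, r=50e3, l=400e-9, g=0.0, c=200e-12)
        print(f, 20.0 * math.log10(abs(h)))
```

The magnitude of `H` falls with frequency for a resistive wire, which is the loss that an equalizer has to compensate. A behavioral link model can consume samples of such a function directly.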



Figure 1: Hierarchical simulation framework.





#### REFERENCES

- [1] R. Kumar, V. Zyuban, and D.M. Tullsen, "Interconnections in multi-core architectures: understanding mechanisms, overheads and scaling," in *Proc. 32nd International Symposium on Computer Architecture*, 2005.
- [2] A.P. Jose, G. Patounakis, and K.L. Shepard, "Near speed-of-light on-chip interconnects using pulsed current-mode signalling," VLSI Symposium on Circuits Digest of Technical Papers, June 2005, pp. 108-111.
- [3] D. Schinkel, E. Mensink, E.A. Klumperink, E. van Tuijl, and B. Nauta, "A 3-Gb/s/ch transceiver for 10-mm uninterrupted RC-limited global on-chip interconnects," IEEE Journal of Solid-State Circuits, vol. 41, no. 1, pp. 297-306, Jan. 2006.
- [4] H. Hatamkhani, K.J. Wong, R. Drost and C.K. Yang, "A 10-mW 3.6-Gbps I/O transmitter," VLSI Symposium on Circuits Digest of Technical Papers, June 2003, pp. 97-98.

## System-to-circuit Framework for High-speed Link Design-space Exploration

R. Sredojević, T. Khanna, V. Stojanović, J.L. Dawson

Currently, we aim to bridge the gap between analog/mixed-signal circuit and system design by providing a framework for fast design space exploration at the system-to-circuit level, based on the bottom-up information from the underlying circuits and process technology. This gap is particularly severe in high-speed link I/O circuits, which are rapidly growing into mini-communication systems due to the bandwidth limitations of packages and board traces [1]. Finding the best methods to compensate intersymbol interference and minimize timing noise while running circuits at the lowest possible power and maximum possible data rate is a difficult balancing act that requires an extremely tight connection between circuit and system levels.

We try to provide a missing link between the system and circuit levels by formulating the system-to-circuit high-speed link description. This framework intimately connects circuit level parameters with block and system-level link specification, providing a direct vertical link from transistor sizes and parasitics to top-level link metrics: data rate, power, and bit-error-rate. We want this framework to provide answers to questions that link designers often ask: Which equalization method should be used (transmit pre-emphasis, linear analog receiver equalizer, decision-feedback equalizer)? What is the power/data-rate trade-off? One of the most interesting descriptions of a high-speed link design space is a power vs. data rate trade-off for a given BER. Different link architectures, driver styles, and equalization methods can be plugged into our system-to-circuit framework to enable design space exploration. In Figure 1 we show the trade-off for a high-speed link with one tap of transmit pre-emphasis and with receiver pre-amplifier equalization. When coupled with transmit pre-emphasis, receiver amplification improves the power-data rate trade-off since the receiver pre-amplifier can drive a larger on-chip impedance.

Next, in Figure 2 we show the transmitter tap coefficients w for the data rates in Figure 1. At lower data rates, where channel attenuation is not too strong, the receive equalizer is very efficient, taking on most of the equalization and amplification roles. At higher loss conditions, the residual ISI in the channel saturates the input range of the receiver equalizer (the non-linearity limit) and transmit pre-emphasis has to increase to narrow down the dynamic range of the signal at the input to the receiver (transmit pre-emphasis attenuates the DC value of the received signal).
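The remark that transmit pre-emphasis attenuates the DC value of the received signal can be checked with a small sketch. It assumes a two-tap transmit filter (one main tap and one post-cursor tap of weight `w`) under a peak-swing constraint in which the tap magnitudes sum to one. That normalization is an assumption for illustration and is not stated in the text.

```python
def preemphasis_taps(w):
    """Two-tap transmit FIR [main, post-cursor] with |main| + |post| = 1."""
    if not 0.0 <= w < 0.5:
        raise ValueError("w must satisfy 0 <= w < 0.5")
    return (1.0 - w, -w)


def dc_gain(taps):
    """Filter gain for a constant (all-ones) data pattern."""
    return sum(taps)


def nyquist_gain(taps):
    """Filter gain magnitude for an alternating +1/-1 data pattern."""
    return abs(sum(t * (-1) ** k for k, t in enumerate(taps)))


if __name__ == "__main__":
    for w in (0.0, 0.1, 0.25, 0.4):
        taps = preemphasis_taps(w)
        print(w, dc_gain(taps), nyquist_gain(taps))
```

Under this normalization the DC gain is `1 - 2w` while the gain for the alternating pattern stays at one, so a larger `w` narrows the gap between low-frequency and high-frequency amplitudes at the receiver input at the cost of a smaller overall signal.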







Figure 2: Transmit side tap coefficient.

#### REFERENCES

[1] B. Casper, M. Haycock, and R. Mooney, "An accurate and efficient analysis method for multi-Gb/s chip-to-chip signaling schemes," VLSI Symposium on Circuits Digest of Technical Papers, June 2002, pp. 54-57.

## System Architecture Implications of CNT Interconnects

F. Chen, V. Stojanović, A.P. Chandrakasan Sponsorship: SRC/FCRP IFC

As CMOS processes scale into the nanometer regime, lithography limitations, electromigration problems that increase resistivity, and relative delay of copper interconnects have driven the need to find alternative interconnect solutions. Carbon Nanotube (CNT) interconnects have emerged as a potential candidate to supplant copper interconnects because of their purported ballistic transport and ability to carry large current densities in the absence of electromigration. There have been several investigations [1-2] that assess the potential use of CNTs as interconnects in scaled VLSI applications. However, these works primarily focus on the relative interconnect delay of CNTs to copper for forthcoming technology nodes and do not address any higher-level issues.

In this work, we intend to investigate the relative system impact of using CNTs in order to gain insight as to how CNTs should be integrated into future processes. The CNTs are in effect a material similar to copper but with superior conducting properties as shown in Figure 1. The use of CNTs presents an opportunity to rescale the cross-sectional dimensions of the interlayer dielectric (ILD) stack-up to take advantage of CNT properties. Figure 2 shows the energy-delay tradeoff curves for several ILD and wire cross-sections for hybrid CNT and copper interconnects. The W and H values shown represent the wire width and dielectric height values normalized to the minimum copper wire width for the process and the nominal dielectric height, respectively. The CNT interconnects are assumed to have a resistivity close to that of bulk copper, which is roughly 2X better than that of copper wires at sub-45-nm technology nodes. Results of the study indicate there is potential for improvements in both delay and energy by introducing a combination of CNT wires and CNT vias. Initial studies investigating the impact of using CNT interconnects on dense, buffered, on-chip routing networks show nearly a 2X increase in aggregate routing bandwidth and 3X longer routing distance before aggregate bandwidth saturates as compared to copper at the same technology node.
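A first-order way to see how wire resistivity enters the energy-delay tradeoff is the standard Elmore-based estimate for a driver, a distributed RC wire, and a lumped load. The sketch below is a generic textbook model with hypothetical component values, not the model used to produce Figure 2.

```python
def wire_delay_s(r_driver, r_wire, c_wire, c_load):
    """Approximate 50% delay of a driver + distributed RC wire + load.

    r_driver: driver output resistance (ohm)
    r_wire, c_wire: total wire resistance (ohm) and capacitance (F)
    c_load: receiver input capacitance (F)
    """
    return (
        0.69 * r_driver * (c_wire + c_load)
        + 0.38 * r_wire * c_wire
        + 0.69 * r_wire * c_load
    )


def switching_energy_j(c_wire, c_load, vdd):
    """Energy drawn from the supply to charge the wire and load once."""
    return (c_wire + c_load) * vdd ** 2


if __name__ == "__main__":
    # Hypothetical values for a long minimum-pitch wire.
    r_drv, r_cu, c_w, c_l, vdd = 2e3, 10e3, 20e-15, 2e-15, 1.0
    r_cnt = 0.5 * r_cu  # caption assumption: CNT has half the copper resistivity
    print("copper delay:", wire_delay_s(r_drv, r_cu, c_w, c_l))
    print("CNT delay:   ", wire_delay_s(r_drv, r_cnt, c_w, c_l))
    print("energy:      ", switching_energy_j(c_w, c_l, vdd))
```

At fixed geometry, halving the wire resistance lowers only the delay terms that contain `r_wire`, and the switching energy is unchanged. In this simple model an energy benefit appears only if the cross-section is rescaled so that the wire capacitance drops, with the lower resistivity absorbing the resulting increase in resistance.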



Figure 1: Effective resistivity of copper (Cu) interconnects and carbon nanotubes (CNTs) as a function of technology node. The Cu values are taken from the ITRS roadmap and include grain boundary scattering effects. The three CNT curves show the effective resistivity for ideally contacted CNTs and for CNTs with $50\,\text{k}\Omega$ of contact resistance at lengths of 10 times and 100 times the minimum metal pitch for the process node.



▲ Figure 2: Energy vs. delay curves for various inverter-driven copper and CNT wiring cross-section configurations at the 45-nm node. The load being driven is an 8X minimum-sized buffer load, and the length of the wires is 1000 times the minimum wire pitch. Variable H is the dielectric height normalized to the nominal ILD height, and W is the wire width normalized to the minimum wire width for the process. Curves marked as "CNT" are assumed to have 1/2 the resistivity of copper wires.

#### REFERENCES

- [1] A. Naeemi and J.D. Meindl, "Design and performance modeling for single-walled carbon nanotubes as local, semi-global, and global interconnects in gigascale integrated systems," *IEEE Trans. on Electron Devices*, vol. 54, no. 1, pp. 26-37, Jan. 2007.
- [2] N. Srivastava and K. Banerjee, "Performance analysis of carbon nanotube interconnects for VLSI applications," IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, Nov. 2005, pp. 383-390.