# **CIRCUITS & SYSTEMS**



▲ Transceiver architecture (D.C. Daly, A.P. Chandrakasan, p. 8).

| Title | Page |
|-------|------|
| Error-correcting Codes that Minimize Receiver Turn-on Time | 3 |
| A Robust Digital Baseband for UWB Communications | 4 |
| CAD for Tile-based 3-D Field Programmable Gate Arrays | 5 |
| A Power-efficient Multi-local PLL Clock Distribution | 6 |
| Carbon Nanotube - CMOS Chemical Sensor Integration | 7 |
| An Energy Efficient Transceiver for Wireless Micro-Sensor Applications | 8 |
| Prediction of Variation in Advanced Process Technology Nodes | 9 |
| Deep Sub-micron CMOS Analog-to-Digital Conversion for Ultra-wideband Radio | 10 |
| Fine-grain Power Control for Field Programmable Gate Arrays | 11 |
| A Micropower DSP Architecture for Self-powered Microsensor Applications | 12 |
| CMOS Circuit Techniques for Data-rate Enhancement in VCSEL-based Chip-to-chip Optical Links | 13 |
| A Sub-threshold Cell Library and Methodology | 14 |
| Design of a High-speed, High-resolution DAC in 2-D and 3-D Processes | 15 |
| An Energy-efficient Ultra-wideband Radio Receiver | 16 |
| 3-D SRAM Architecture and Circuits | 17 |
| An Energy-optimal Power Supply for Digital Circuits | 18 |
| An Energy-efficient Digital Baseband Processor for Pulsed UWB Using Extreme Parallelization | 19 |
| An Ultra Low-power ADC for Wireless Micro Sensor Applications | 20 |
| An All-digital, Pulsed-UWB Transmitter in 90-nm CMOS | 21 |
| Parameterized Model Order Reduction of Nonlinear Circuits and MEMS | 22 |
| Development of Specialized Basis Functions and Efficient Substrate Integration Techniques for Electromagnetic Analysis of Interconnect and RF Inductors | 23 |
| A Quasi-convex Optimization Approach to Parameterized Model-order Reduction | 24 |
| RF PA Linearization: Open-loop Digital Predistortion Using Cartesian Feedback for Adaptive PA Characterization | 25 |
| Convex Optimization of Integrated Systems Using Geometric Programming | 26 |
| High-Resolution, Pipelined Analog-to-Digital Conversion Using Comparator-Based Switched-Capacitor Techniques | 27 |
| High Speed Time-Interleaved Comparator-Based Switch Capacitor ADC | 28 |
| Low-Voltage Comparator-Based Switched-Capacitor Sigma-Delta ADC | 29 |
| Noise Analysis of Threshold Detection Comparators | 30 |
| Massively Parallel ADC with Self-Calibration | 31 |
| Intelligent Night-Vision Human Detection System | 32 |
| Vision-Based System for Occupancy and Posture Analysis | 33 |
| Techniques for Low-jitter Clock Multiplication | 34 |
| Advanced Delay-locked Loop Architecture for Chip-to-Chip Communication | 35 |
| Digital Techniques for the Linearization of RF Transmitters | 36 |
| Techniques for Highly Digital Implementation of Clock and Data Recovery Circuits | 37 |
| Digital Implementation and Calibration Technique for High-speed Continuous-time Sigma-Delta A/D Converters | 38 |
| High-performance Time-to-Digital Conversion and Applications | 39 |
| Fast Cochlear Amplification with Slow Outer Hair Cells | 40 |
| Circuits for an RF Cochlea | 41 |
| An Analog Storage Cell with 5 Electrons/sec Leakage | 42 |
| A Time-based Energy-efficient Analog-to-Digital Converter | 43 |
| Optimization of System and Circuit Parameters in Wideband OFDM Systems | 44 |
| Comparator-based Switched-capacitor Circuits (CBSC) | 45 |
| A Wideband $\Delta\Sigma$ Digital-RF Modulator | 46 |
| Area- and Power-Efficient Integrated Transceivers for Gigabit Wireless LAN | 47 |
| Optical-feedback OLED Display Using Integrated Organic Technology | 48 |
| A 77-GHz Receiver for Millimeter Wave Imaging | 49 |
| Realization of Baseband DSP Core for the Wireless Gigabit LAN | 50 |
| Channel-and-Circuits-Aware, Energy-Efficient Coding for High-speed Links | 51 |
| Efficiency of High-speed On-Chip Interconnect: Trade-off and Optimization | 52 |

# for additional reading...

| Title | Page |
|-------|------|
| Characterization of Organic Field-Effect Transistors for OLED Displays | 246 |

# **Error-correcting Codes that Minimize Receiver Turn-on Time**

M. Bhardwaj, A.P. Chandrakasan Sponsorship: NSF, IBM Fellowship

The profusion of wireless client devices and sensor networks has led to a need for communication links that consume the lowest energy per information bit under a specified range and reliability constraint. To achieve an aggressive metric like 1 nJ/bit over tens of meters with a 10<sup>-5</sup> BER, we need to carefully pick physical layer (PHY) parameters such as modulation, coding, and signaling rate. Communications theory arguments that optimize transmit power or spectral efficiency typically drive such PHY choices. However, in systems like short-range, low-data-rate, pulse-based ultra-wideband (UWB) operating in the unlicensed bands, these are irrelevant metrics. The FCC Tx power limit for such systems is about 40 $\mu$W per 500 MHz, and spectral efficiencies are on the order of 1/1000. For such systems, total energy is determined by the complexity of the receiver, a quantity that classical information theory disregards. Furthermore, since receive energy is proportional to receiver turn-on time, Rx rate (the number of information bits per bit the Rx looks at) is a more relevant metric than the classical definition of rate (the number of information bits per bit the transmitter sends). Hence, the objective of our work is to redefine classical communication-theoretic problems in terms of receive rates rather than transmit rates. We believe that doing so will lead to fundamental lower bounds on the energy efficiency of a class of receivers.

Thus far our work has revealed that simple repetition codes allow Rx rates to be dropped by 2-3x with only an insignificant increase in Tx rates. We are now working to completely characterize the class of linear block codes that would minimize receive rates. Subsequently, we hope to characterize both systematic and non-systematic convolutional codes.
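To make the distinction between Tx rate and Rx rate concrete, the sketch below considers an odd-length repetition code over a binary symmetric channel and a receiver that stops listening as soon as the majority vote is decided. The early-stopping rule, the channel model, and the function names are illustrative assumptions; they are not the codes or receivers characterized in this work.

```python
from functools import lru_cache


def expected_bits_examined(n: int, p: float) -> float:
    """Expected number of received bits a majority-vote receiver examines
    before one value reaches (n + 1) // 2 votes, for an odd-length
    repetition code on a binary symmetric channel with crossover p."""
    if n % 2 == 0:
        raise ValueError("n must be odd")
    need = (n + 1) // 2

    @lru_cache(maxsize=None)
    def remaining(correct: int, wrong: int) -> float:
        # Once either value has a majority, the receiver can turn off.
        if correct == need or wrong == need:
            return 0.0
        return (1.0
                + (1.0 - p) * remaining(correct + 1, wrong)
                + p * remaining(correct, wrong + 1))

    return remaining(0, 0)


def rates(n: int, p: float):
    """Return (Tx rate, Rx rate) in information bits per channel bit."""
    tx_rate = 1.0 / n                            # bits the transmitter sends
    rx_rate = 1.0 / expected_bits_examined(n, p)  # bits the receiver looks at
    return tx_rate, rx_rate


if __name__ == "__main__":
    for n in (3, 5, 7):
        tx, rx = rates(n, p=0.05)
        print(f"n={n}: Tx rate={tx:.3f}, Rx rate={rx:.3f}")
```

Under these assumptions the receiver examines fewer bits than the transmitter sends, so the two rates differ even for the same code.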

# A Robust Digital Baseband for UWB Communications

R. Blázquez, A.P. Chandrakasan Sponsorship: NSF

The FCC approved the use of ultra-wideband signals for communication purposes in February 2002 in the band from 3.1 GHz to 10.6 GHz, effectively opening 7.5 GHz of free unlicensed bandwidth. The purpose of this baseband is to achieve 100 Mbps at 10 m in a robust pulsed UWB receiver using 500 MHz of this bandwidth while providing the programmability to expose useful functionality knobs.

Because of the large bandwidth of UWB signals, multipath becomes very relevant as the data rate increases into the hundreds of megabits per second. The current multipath model, used for the development of IEEE standard 802.15.3a, is a modified Saleh-Valenzuela model with a root-mean-square duration of the impulse response from 5 to 25 ns [1]. This constraint implies that the receiver must cope with multipath in the form of significant intersymbol interference (ISI) that requires, for a pulsed signal, an MMSE demodulator [2]. A digital back-end for UWB applications has been designed that estimates the channel impulse response and uses this information to compensate for it with a RAKE receiver and an MMSE demodulator. Its block diagram is shown in Figure 1. It is also fully programmable, so its complexity and power dissipation can be reduced whenever the channel quality is high, and it can provide robust demodulation when the channel quality is low. Figure 2 shows the trade-off between the complexity of the RAKE receiver and the SNR loss.
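The trade-off between RAKE complexity and SNR loss can be illustrated with a small sketch. It assumes perfect channel estimates, maximal-ratio combining in white noise, and a finger-selection rule that keeps every tap whose magnitude is at least a normalized threshold times the strongest tap. The function name, the selection rule, and the example tap values are hypothetical and are not taken from the fabricated baseband.

```python
import math


def select_rake_fingers(channel_taps, normalized_threshold):
    """Keep taps whose magnitude is at least normalized_threshold times the
    strongest tap. Returns the kept tap indices and the SNR loss in dB
    relative to a receiver that combines every tap."""
    peak = max(abs(h) for h in channel_taps)
    kept = [i for i, h in enumerate(channel_taps)
            if abs(h) >= normalized_threshold * peak]
    # With maximal-ratio combining, output SNR scales with captured energy.
    total_energy = sum(abs(h) ** 2 for h in channel_taps)
    captured_energy = sum(abs(channel_taps[i]) ** 2 for i in kept)
    loss_db = -10.0 * math.log10(captured_energy / total_energy)
    return kept, loss_db


if __name__ == "__main__":
    taps = [1.0, 0.6, 0.3, 0.1]  # made-up channel estimate
    for threshold in (0.0, 0.25, 0.5, 0.75):
        fingers, loss = select_rake_fingers(taps, threshold)
        print(f"threshold={threshold:.2f}: {len(fingers)} fingers, "
              f"loss={loss:.2f} dB")
```

Raising the threshold drops weak fingers, which saves computation at the cost of a larger SNR loss.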

The signal processing algorithms for this digital back-end have been tested in a discrete prototype that was also designed in the Microsystems Technology Laboratory. We acknowledge National Semiconductor for providing the IC fabrication services.



▲ Figure 1: Block diagram of the UWB baseband.



▲ Figure 2: Losses in the modified RAKE receiver as a function of the normalized threshold and the channel model.

#### REFERENCES

- [1] J. Foerster, "Channel modeling sub-committee report final," IEEE P802.15 Working Group for Wireless Personal Area Networks (WPANs), Feb. 2002.
- [2] G.D. Forney, "Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference," IEEE Trans. on Information Theory, vol. 18, pp. 363-378, May 1972.

# CAD for Tile-based 3-D Field Programmable Gate Arrays

V. Chandrasekhar, A.P. Chandrakasan, D. Troxel Sponsorship: DARPA

The performance of integrated circuits is limited mainly by the growing interconnect delay that results from increasing circuit complexity. Three-dimensional integration helps reduce the interconnect delay by bringing the circuit blocks physically closer to each other. By using the appropriate routing architecture, the relatively small distance between two adjacent vertical layers can be exploited to significantly reduce the interconnect delay of the circuit (see Figure 1) [1]. This work analyzes the benefits of 3-D integration in terms of performance and power consumption in field programmable gate arrays. Instead of partitioning a circuit into layers, the entire circuit is mapped onto the asymmetric 3-D grid of the FPGA. The VPR CAD tool [2] for 2-D FPGAs is modified to route circuits on a 3-D FPGA architecture. The circuit blocks are placed using a simulated annealing algorithm, and the PathFinder algorithm is used for routing the nets in the placed circuit.

We are also exploring the use of asymmetric switch matrix architectures (Figure 2) for the 3-D FPGA instead of extending the standard 2-D switch architectures such as the disjoint switch or the Wilton switch matrix. Since the vertical interconnect channel is much shorter, the router tends to route more nets through the vertical channels. However, such routing increases the capacitance seen by the drivers of the vertical channels inside the switch matrix, which increases the interconnect delay. Since the simulated annealing algorithm places the circuit blocks close together, congestion is more common in the central parts of the FPGA layers. By allowing more routing flexibility in these parts of the FPGA and using sparse routing resources elsewhere, we can significantly reduce the critical path delay of the circuit without losing routability. Moreover, the power consumption can be significantly reduced by using a lower supply voltage to achieve the same operating frequency as a 2-D FPGA.
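The simulated-annealing placement step described above can be sketched as a toy example. It places blocks on a small 3-D grid and minimizes a bounding-box wirelength in which the vertical span is weighted less than the horizontal span, reflecting the short inter-layer distance. The cost function, the `vertical_weight` value, the grid size, and the fixed geometric cooling schedule are all assumptions made for illustration; the modified VPR tool uses its own cost function and schedule.

```python
import math
import random


def wirelength(placement, nets, vertical_weight=0.2):
    """Sum of bounding-box spans over all nets; the z span is scaled by
    vertical_weight to model cheap inter-layer connections (assumed)."""
    total = 0.0
    for net in nets:
        xs, ys, zs = zip(*(placement[block] for block in net))
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
        total += vertical_weight * (max(zs) - min(zs))
    return total


def anneal_placement(num_blocks, nets, grid=(4, 4, 2), seed=0,
                     t_start=5.0, t_end=0.01, cooling=0.95,
                     moves_per_temp=200):
    """Return (best placement, best cost, initial cost)."""
    rng = random.Random(seed)
    sites = [(x, y, z) for x in range(grid[0])
             for y in range(grid[1]) for z in range(grid[2])]
    if num_blocks > len(sites):
        raise ValueError("more blocks than sites")
    rng.shuffle(sites)  # random initial placement
    occupant = list(range(num_blocks)) + [None] * (len(sites) - num_blocks)
    placement = {block: sites[block] for block in range(num_blocks)}

    def swap(i, j):
        # Exchange whatever sits at sites i and j (a block or nothing).
        occupant[i], occupant[j] = occupant[j], occupant[i]
        for s in (i, j):
            if occupant[s] is not None:
                placement[occupant[s]] = sites[s]

    cost = initial_cost = wirelength(placement, nets)
    best, best_cost = dict(placement), cost
    temperature = t_start
    while temperature > t_end:
        for _ in range(moves_per_temp):
            i, j = rng.randrange(len(sites)), rng.randrange(len(sites))
            swap(i, j)
            new_cost = wirelength(placement, nets)
            delta = new_cost - cost
            # Always accept improvements; accept uphill moves with
            # probability exp(-delta / temperature).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(placement), cost
            else:
                swap(i, j)  # reject the move by undoing it
        temperature *= cooling
    return best, best_cost, initial_cost


if __name__ == "__main__":
    example_nets = [(0, 1, 2), (2, 3), (3, 4, 5), (0, 5), (1, 4), (6, 7), (7, 0)]
    _, final_cost, start_cost = anneal_placement(8, example_nets)
    print(f"wirelength: {start_cost:.1f} -> {final_cost:.1f}")
```

Because vertical hops are cheap in this cost model, the annealer tends to stack connected blocks on adjacent layers, which mirrors the router's preference for vertical channels noted above.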



#### REFERENCES

- [1] Y.S. Kwon, P. Lajevardi, A.P. Chandrakasan, F. Honoré, and D.E. Troxel, "A 3-D FPGA wire resource-prediction model validated using a 3-D placement and routing tool," Proc. of the 2005 Int'l. Workshop on System-level Interconnect Prediction, Apr. 2005, pp. 65-72.
- [2] V. Betz and J. Rose, "VPR: A new packing, placement and routing tool for FPGA research," Int'l. Workshop on Field-Programmable Logic and Applications, pp. 213-222, Aug. 1997.
- [3] P. Lajevardi, "Design of a 3-dimension FPGA," Master's thesis, Massachusetts Institute of Technology, Cambridge, 2005.

# A Power-efficient Multi-local PLL Clock Distribution

F. Chen, V. Stojanović, A.P. Chandrakasan Sponsorship: MARCO IFC

In recent generations of high-performance microprocessors, the use of phase-locked loops (PLLs) for clock distribution has become more common. As microprocessor operating frequencies increase, the relative impact of clock jitter on the performance of the clock distribution network also increases. Because PLLs attenuate some of the jitter presented at their input, there have been several investigations [1, 2] into the use of multiple PLLs to distribute the clock. However, these investigations concluded that the marginal cost in power and complexity of additional PLLs was too great to warrant more than a single PLL. In this work, we propose a scalable methodology that uses multiple PLLs to reduce the picoseconds of jitter per milliwatt of power in the clocking network. A block diagram of the distribution methodology for a single branch is shown in Figure 1. Distributing the global clock at a lower frequency and locally multiplying up the delivered clock can reduce the cost associated with power, shown in Figure 2, while regulating the buffers local to the PLL can offset jitter accumulation in the PLL. Inserting the PLL deeper into the clock distribution network reduces the number of unregulated repeaters that follow the PLL, resulting in a net reduction in clock jitter.



▲ Figure 1: The diagram shows the effective change for a given branch of the clock network for the proposed distribution method.



▲ Figure 2: Plot of the incremental change in power for each PLL inserted versus the number of PLLs inserted into the network.

#### REFERENCES

- [1] M. Saint-Laurent et al., "Optimal clock distribution with an array of phase-locked loops for multiprocessor chips," *Circuits and Systems, 2001. MWSCAS 2001*, vol.1, pp.454-457, 2001.
- [2] V. Gutnik and A.P. Chandrakasan, "Active GHz clock network using distributed PLLs," IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 1553-1560, Nov. 2000.

# **Carbon Nanotube - CMOS Chemical Sensor Integration**

T.S. Cho, A.P. Chandrakasan Sponsorship: MARCO IFC, Samsung Lee Kun Hee Scholarship Foundation

In this research, we propose an energy-efficient architecture to interface with carbon nanotube (CNT) chemical sensors and develop a signal-processing algorithm that reliably infers the chemical concentration in air from the sensor read-out results. The CNT changes its conductance when exposed to certain chemicals [1] (Figure 1), and thus we can effectively utilize CNTs as resistive chemical sensors. The room-temperature operation of the chemical-sensing mechanism makes the CNT an appealing candidate for low-power chemical sensor applications.

However, poor control over the CNT process, the resolution requirements in conductance measurements, and the changes in conductance due to specific chemicals in air require that the front-end circuitry have a dynamic range of more than 18 bits. While such accuracy is power-consuming to attain [2], the reduction in power-supply voltage further aggravates the dynamic-range limitations in analog circuits. To surmount these problems, we are developing a new architecture suitable for this application. The stochastic nature of CNT chemical sensors calls for multiple deployments of CNT sensors in one sensor node. This constraint, in turn, requires an efficient algorithm to infer the concentration of the chemical of interest. Thus, this research will also delve into developing an energy-efficient algorithm that can be operated in real time.
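As a rough sense of scale for the 18-bit requirement, the sketch below converts a resolution in bits to a dynamic range in decibels and, conversely, estimates the bits needed to cover a given ratio between the largest value and the smallest resolvable step. The helper names and the example ratio are hypothetical; only the 18-bit figure comes from the text.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Dynamic range of an ideal N-bit measurement, as 20*log10(2**N)."""
    return 20.0 * math.log10(2 ** bits)


def bits_for_ratio(max_value: float, min_resolvable: float) -> int:
    """Smallest whole number of bits whose 2**N levels span the ratio
    between the largest value and the smallest resolvable step."""
    return math.ceil(math.log2(max_value / min_resolvable))


if __name__ == "__main__":
    print(f"18 bits is about {dynamic_range_db(18):.1f} dB")
    # Hypothetical example: a 1,000,000:1 conductance ratio.
    print(f"1e6:1 ratio needs {bits_for_ratio(1e6, 1.0)} bits")
```

An 18-bit range corresponds to roughly 108 dB, which helps explain why attaining it in a low-voltage analog front end is costly in power.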

This project is currently being carried out in collaboration with Kyeongjae Lee and Professor Jing Kong from the Department of Electrical Engineering and Computer Science at MIT to design an integrated gas sensor.



▲ Figure 1: Conductance change in response to gas detection. Courtesy: J. Kong et al, Science [1].

#### REFERENCES

- [1] J. Kong, N.R. Franklin, C. Zhou, M.G. Chapline, S. Peng, K. Cho, and H. Dai, "Nanotube molecular wires as chemical sensors," Science, vol. 287, Jan. 2000.
- [2] M. Grassi, P. Malcovati, and A. Baschirotto, "A 0.1% Accuracy 100 Ohm-20 Mega Ohm dynamic range integrated gas sensor interface circuit with 13+4 bit digital output," Proc. of ESSCIRC, Grenoble, France, 2005.

# An Energy Efficient Transceiver for Wireless Micro-Sensor Applications

D.C. Daly, A.P. Chandrakasan Sponsorship: DARPA Power Aware Computing/Communication Program

Large-scale wireless sensor networks require a low-power, energy-efficient transceiver that can operate for years on a single battery. To meet this demand for microwatt average power consumption, the transceiver must be scalable, support duty cycling, and be energy-efficient when "on." A key metric for measuring the energy efficiency of the transceiver is the energy-per-bit ratio, which is equal to the power consumption of the transmitter or receiver divided by the instantaneous data rate. We have fabricated a custom radio for wireless micro-sensor applications that achieves energy-per-bit ratios down to 0.5 nJ/bit in receive mode and 3.8 nJ/bit in transmit mode [1]. These low ratios, combined with a fast receiver startup time of 2.5 $\mu$s, allow for energy-efficient operation.

The transceiver operates at 1 Mbps in a single channel centered at 916.5 MHz and employs on-off keying (OOK) modulation. A non-coherent, envelope-detection receiver architecture removes the need for a local oscillator and allows for a fast receiver startup time. Figure 1 shows a block diagram of the transceiver. The RF front end supports several gain settings so that the power consumption of the receiver can be reduced in the presence of large input signals. The transmitter supports 7 digitally controlled output power levels to enable power scaling based on node proximity. The receiver power consumption scales from 0.5 mW to 2.6 mW, with an associated sensitivity ranging from -37 dBm to -65 dBm at a bit error rate of 10<sup>-3</sup>. The transmitter supports output power levels from -11.4 dBm to -2.2 dBm.
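The energy-per-bit definition above (power divided by instantaneous data rate) can be checked against the reported numbers with a short calculation. The sketch also estimates the energy to receive one packet including the startup interval, under the simplifying assumption that the receiver draws its full active power during startup. The packet length and the function names are hypothetical.

```python
def energy_per_bit_nj(power_mw: float, data_rate_mbps: float) -> float:
    """Energy per bit in nJ: mW divided by Mbps gives nJ/bit."""
    return power_mw / data_rate_mbps


def packet_receive_energy_nj(power_mw: float, data_rate_mbps: float,
                             payload_bits: int, startup_us: float) -> float:
    """Startup energy plus payload energy, assuming (for illustration)
    full active power is drawn throughout the startup interval."""
    startup_energy = power_mw * startup_us  # mW * us = nJ
    payload_energy = payload_bits * energy_per_bit_nj(power_mw, data_rate_mbps)
    return startup_energy + payload_energy


if __name__ == "__main__":
    # Receiver power range and data rate from the text: 0.5-2.6 mW at 1 Mbps.
    for power in (0.5, 2.6):
        print(f"{power} mW -> {energy_per_bit_nj(power, 1.0):.1f} nJ/bit")
    # Hypothetical 100-bit packet with the reported 2.5 us startup time.
    print(f"{packet_receive_energy_nj(0.5, 1.0, 100, 2.5):.2f} nJ per packet")
```

At the lowest receiver power setting, 0.5 mW at 1 Mbps gives the reported 0.5 nJ/bit, and the short startup time keeps the startup energy a small fraction of even a short packet.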

The chip was fabricated using 0.18 $\mu$m CMOS technology; a chip micrograph is shown in Figure 2. We acknowledge National Semiconductor for providing the IC fabrication services and NSERC for funding. Denis Daly is partially supported by an NSERC fellowship.





▲ Figure 2: Chip micrograph of the transceiver in 0.18 $\mu$m CMOS.

#### REFERENCES

- [1] D.C. Daly and A.P. Chandrakasan, "An energy-efficient OOK transceiver for wireless sensor networks," IEEE Radio Frequency Integrated Circuits Symposium, June 11-13, 2006.

# **Prediction of Variation in Advanced Process Technology Nodes**

N. Drego, A.P. Chandrakasan, D. Boning Sponsorship: MARCO C2S2

As Moore's Law forces the semiconductor industry into the sub-50-nm regime, process variability is proportionally becoming larger. Design cycles must simultaneously accommodate this increase in process variability and are thus being extended in order to ensure product robustness in the face of such manufacturing uncertainty. To facilitate the designer's need to accurately model and simulate circuits in the face of variation, we seek to provide predictive statistical models for advanced technology nodes and/or novel transistor architectures. Coupled with predictive technology models (PTM) [1], these statistical models will allow designers to simulate designs in a robust manner during, or even prior to, the development phase of a new process technology.

As a basis for providing such models, we are developing simple digital circuits that ease the measurement and extraction of parameters. An example of such a circuit appears in Figure 1. This circuit employs a delay-based measurement to determine the current drive of each of the transistors highlighted in red. If the transistor is biased in the sub-threshold regime and the DIBL coefficient of the process is small enough, we can use the delay measurement as a proxy for threshold-voltage ($V_T$) variation (i.e., variations in the time to charge the capacitor and disable the counter are dominated by $V_T$ variation, as Figure 2 shows). However, if the DIBL coefficient is large enough, $\Delta V_T$ is no longer as dominant a source of current variation because of the increasing effect of channel-length variation ($\Delta L$). As a result, the circuit no longer functions as a proxy for $V_T$ variation. Nevertheless, the same circuit can be used to determine $I_{ON}/I_{OFF}$ and the sub-threshold slope of a given transistor. In advanced process technologies, a primary factor in determining the viability of the process will be its performance with regard to short-channel effects (SCE), among which the $I_{ON}/I_{OFF}$ ratio and sub-threshold slope are extremely significant. Furthermore, the SCE performance of novel transistor architectures such as the FinFET depends heavily on new critical dimensions such as body (fin) thickness [2]. The ability to efficiently measure variability due to such critical dimensions will enable quick determination of process feasibility. Future work includes fabrication of the aforementioned circuit on both mature and novel processes, such as a FinFET process. Additionally, we would like to identify other circuits capable of such variation measurement to enable us to build complete statistical models.
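The idea of using a charging delay as a proxy for $V_T$ can be sketched with a textbook sub-threshold current model. The sketch assumes a constant charging current that depends exponentially on the gate overdrive, negligible DIBL, and made-up values for the slope factor `n`, the prefactor `i0`, the capacitance, and the trip voltage. None of these values describe the actual test circuit.

```python
import math

BOLTZMANN_OVER_Q = 8.617333e-5  # volts per kelvin


def thermal_voltage(temp_k: float = 300.0) -> float:
    return BOLTZMANN_OVER_Q * temp_k


def subthreshold_current(vgs, vt, i0=1e-7, n=1.5, temp_k=300.0):
    """Simplified sub-threshold current, I = i0 * exp((vgs - vt) / (n * vT)).
    DIBL and the drain-voltage roll-off term are neglected (assumption)."""
    return i0 * math.exp((vgs - vt) / (n * thermal_voltage(temp_k)))


def charge_time(vgs, vt, cap=1e-12, v_trip=0.5, **model):
    """Time for a constant current to charge cap up to v_trip."""
    return cap * v_trip / subthreshold_current(vgs, vt, **model)


def delta_vt_from_delays(t_device, t_reference, n=1.5, temp_k=300.0):
    """Invert the exponential model: a delay ratio maps to a V_T shift."""
    return n * thermal_voltage(temp_k) * math.log(t_device / t_reference)


if __name__ == "__main__":
    t_ref = charge_time(vgs=0.2, vt=0.40)
    t_dev = charge_time(vgs=0.2, vt=0.41)  # device with a 10 mV higher V_T
    print(f"delay ratio: {t_dev / t_ref:.3f}")
    print(f"extracted delta V_T: {1e3 * delta_vt_from_delays(t_dev, t_ref):.2f} mV")
```

In this idealized model a 10 mV threshold shift changes the delay by roughly 30%, which is why a simple counter can resolve small $V_T$ differences. With significant DIBL, channel-length variation would contribute to the same delay ratio and the inversion would no longer isolate $V_T$.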








#### REFERENCES

- [1] Y. Cao, T. Sato, D. Sylvester, M. Orshansky, and C. Hu, "New paradigm of predictive MOSFET and interconnect modeling for early circuit design," in *Proc. of CICC*, pp. 201-204, 2000.
- [2] S. Xiong and J. Bokor, "Sensitivity of double-gate and FinFET devices to process variations," IEEE Trans. on Electron Devices, pp. 2255-2261, Nov. 2003.

# Deep Sub-micron CMOS Analog-to-Digital Conversion for Ultra-wideband Radio

B.P. Ginsburg, A.P. Chandrakasan Sponsorship: NDSEG Fellowship, DARPA

Ultra-wideband radio can be used for very high data rate (≥480 Mb/s) communication over short distances. For proper reception, the receiver requires a 500-MS/s analog-to-digital converter (ADC) with 4 bits of resolution. While flash is the typical architecture chosen, time-interleaved successive approximation register (SAR) ADCs that operate at these specifications with very low power have recently been demonstrated [1-3]. As feature sizes in CMOS technology continue to scale, new challenges arise for the design of analog and mixed-signal circuits, including reduced voltage supplies, increased variability, and lower transistor output impedances. The SAR architecture is well suited to meet these challenges. It uses only open-loop amplification in a comparator, as opposed to the operational amplifier required by the pipelined architecture. There is significant digital complexity on the critical path in a SAR converter, but the reduced feature sizes directly improve the digital logic's performance.

A prototype 500-MS/s, 5-b, 6-way time-interleaved SAR ADC [4] has been designed and fabricated in Texas Instruments' 65-nm CMOS process. The prototype includes the first implementation of the split capacitor array [5], seen in Figure 1. This array conserves charge between bit-cycles to lower the overall switching energy, and it settles faster because fewer capacitors switch during each period. The prototype also includes a variable delay line to optimize the instant of latch strobing and lengthen the maximal settling time available for the preamplifiers. The ADC achieves Nyquist performance and consumes 6 mW from a 1.2 V supply. Its die photo is shown in Figure 2.
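For readers unfamiliar with the SAR architecture, the following sketch shows the basic bit-cycling loop of an ideal, unipolar SAR converter, along with the per-channel rate of a time-interleaved array. It models only the binary search; it does not model the split capacitor array, comparator settling, or the variable delay line of the prototype, and the function names and reference voltage are assumptions.

```python
def sar_convert(vin: float, vref: float = 1.0, bits: int = 5) -> int:
    """Ideal successive-approximation conversion: test each bit from the
    MSB down and keep it if the trial DAC level does not exceed vin."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        dac_level = trial * vref / (1 << bits)
        if vin >= dac_level:  # one comparator decision per bit cycle
            code = trial
    return code


def per_channel_rate_msps(aggregate_rate_msps: float, ways: int) -> float:
    """Sample rate each converter must sustain in an interleaved array."""
    return aggregate_rate_msps / ways


if __name__ == "__main__":
    for vin in (0.0, 0.3, 0.5, 0.99):
        print(f"vin={vin:.2f} V -> code {sar_convert(vin)}")
    print(f"{per_channel_rate_msps(500, 6):.1f} MS/s per channel")
```

A 5-bit conversion takes five comparator decisions, and 6-way interleaving lets each channel run at about 83 MS/s while the array delivers 500 MS/s.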



▲ Figure 1: Block diagram of the split capacitor array. The MSB capacitor has been split into an identical copy of the rest of the array, which improves both switching energy and speed.





#### REFERENCES

- [1] D. Draxelmayr, "A 6b 600MHz, 10mW ADC array in digital 90nm CMOS," in Proc. IEEE Int'l. Solid-State Circuits Conf., San Francisco, CA, Feb. 2004, pp. 264-265.
- [2] B.P. Ginsburg and A.P. Chandrakasan, "Dual scalable 500MS/s, 5b time-interleaved SAR ADCs for UWB applications," in Proc. IEEE Custom Integrated Circuits Conf., San Jose, CA, Sep. 2005, pp. 403-406.
- [3] S.-W.M. Chen and R.W. Brodersen, "A 6b 600MS/s 5.3mW asynchronous ADC in 0.13µm CMOS," in Proc. IEEE Int'l. Solid-State Circuits Conf., San Francisco, CA, Feb. 2006, pp. 574-575.
- [4] B.P. Ginsburg and A.P. Chandrakasan, "A 500MS/s 5b ADC in 65nm CMOS," in IEEE Symposium on VLSI Circuits Digest of Tech. Papers, June 2006, pp. 174-175.
- [5] B.P. Ginsburg and A.P. Chandrakasan, "An energy-efficient charge recycling approach for a SAR converter with capacitive DAC," IEEE Int'l. Symposium on Circuits and Systems, Kobe, Japan, May 2005, pp. 184-187.

# **Fine-grain Power Control for Field Programmable Gate Arrays**

F. Honoré, A.P. Chandrakasan, D. Troxel Sponsorship: MARCO IFC

Implementation flexibility through hardware reconfiguration has become an important factor in the design of digital systems. Field programmable gate arrays (FPGAs) are extending their application area from system prototyping to custom application implementation, but they are much slower and less power-efficient than ASIC systems. We have developed a power- and performance-scalable multi-$V_{DD}$ FPGA. The interconnect overhead for FPGAs accounts for a large fraction of the power and delay because of the use of programmable switch elements. Fine-grain voltage domains allow low-energy operation in non-critical areas of logic and routing segments.

We modified a public-domain FPGA place-and-route tool to handle the assignment of the voltage domains for non-critical paths. By selecting either a low or high voltage for each domain, this method achieves an average 2x improvement in power for the same performance, as shown in Figure 1. The high $V_{DD}$ is kept at 1.8 V, and the low $V_{DD}$ can vary depending on the application. Low-overhead level converters provide voltage conversion between domains when necessary. The area overhead for the power switches and level converters is less than 10%. With these fine-grain controls, the software is able to reduce dynamic power while maintaining performance.
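A first-order view of why dual supply voltages save power follows from the quadratic dependence of dynamic power on supply voltage. The sketch below assumes that dynamic power scales as switched capacitance times $V_{DD}^2$ at a fixed frequency and ignores level-converter, power-switch, and leakage overheads. The function names and the notion of a single "low-voltage fraction" are simplifications introduced here, not outputs of the CAD flow.

```python
def relative_dynamic_power(low_fraction: float,
                           vdd_high: float = 1.8,
                           vdd_low: float = 0.9) -> float:
    """Dynamic power relative to an all-high-VDD design when low_fraction
    of the switched capacitance runs from vdd_low (P proportional to C*V^2)."""
    scale = (vdd_low / vdd_high) ** 2
    return (1.0 - low_fraction) + low_fraction * scale


def low_fraction_for_saving(saving: float,
                            vdd_high: float = 1.8,
                            vdd_low: float = 0.9) -> float:
    """Fraction of switched capacitance that must move to vdd_low to reach
    a given fractional power saving under the same model."""
    return saving / (1.0 - (vdd_low / vdd_high) ** 2)


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 1.0):
        print(f"{fraction:.0%} on low VDD -> "
              f"{relative_dynamic_power(fraction):.2f}x power")
    print(f"52% saving needs about {low_fraction_for_saving(0.52):.0%} on low VDD")
```

Under this idealized model, halving the supply cuts the dynamic power of the affected domains by 4x, so a saving near 50% requires roughly two thirds of the switched capacitance to sit on non-critical paths.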

We have fabricated and tested a 3 mm x 3 mm chip (Figure 2) using a semi-custom ASIC flow to validate the approach and have developed custom CAD tools to automate the implementation of some of these techniques. The test chip contains 64 tiles of logic. Testing confirmed functionality at a range of voltages from 1.8 V down to 550 mV.

The chip was fabricated using 0.18 $\mu$m CMOS technology. We acknowledge National Semiconductor for providing IC fabrication services.



▲ Figure 1: Benchmark results showing an average improvement of 52% at a $V_{DDH}$ of 1.8 V and $V_{DDL}$ of 0.9 V.





# A Micropower DSP Architecture for Self-powered Microsensor Applications

N. Ickes, D. Finchelstein, A.P. Chandrakasan Sponsorship: DARPA Power Aware Computing/Communication Program

Distributed microsensor networks consist of hundreds or thousands of miniature sensor nodes. Each node individually monitors the environment and collects data as directed by the user, and the network collaborates as a whole to deliver high-quality observations to a central base station. The large number of nodes in a microsensor network enables high-resolution, multi-dimensional observations and fault tolerance that are superior to more traditional sensing systems. However, the small size and highly distributed arrangement of the individual sensor nodes make aggressive power management a necessity.

The aim of our project is to build a highly integrated yet versatile sensor system with a strong focus on energy efficiency and agility. Tracking the optimal operating point in the dynamic environments typical for sensor networks requires hardware that can vary clock rates, power supply voltages, and other circuit parameters in real time. We have developed an architecture consisting of a micropower DSP surrounded by dedicated accelerator blocks for frequently used, complex functions such as Fourier transforms and FIR filtering. This architecture of highly optimized, on-demand hardware support for energy-intensive tasks allows for ultra-low-power data manipulation and lowers the processing burden on the general-purpose DSP core.

An initial implementation of the architecture was fabricated in 0.18 $\mu$m CMOS by National Semiconductor. This implementation included a custom 16-bit DSP core, an FFT accelerator, and interfaces to custom ADC and radio chips, as illustrated in Figure 1. The fabricated chip operates at supply voltages as low as 0.5 V, consuming only 110 $\mu$W. A die photo of the chip is shown in Figure 2. A complete, miniature microsensor node was built around this chip, incorporating custom ADC and radio chips developed by other students in our group. A second-generation architecture is currently being fabricated in 90 nm CMOS by ST Microelectronics. This version features a streamlined datapath and extensive power- and clock-gating for further power reduction.



▲ Figure 1: First-generation DSP architecture.



▲ Figure 2: Die photo of first-generation DSP chip.

# CMOS Circuit Techniques for Data-rate Enhancement in VCSEL-based Chip-to-chip Optical Links

A.M. Kern, A.P. Chandrakasan Sponsorship: MARCO IFC, Intel, NSF

Optical chip-to-chip signaling promises to replace electrical serial links when data-rate requirements increase beyond the bandwidth limits of conventional copper traces. Recent advances in vertical-cavity surface-emitting laser (VCSEL) technology have dramatically increased the performance of short-distance optical links, and VCSELs nominally suitable for 10 Gb/s links are now commercially available. However, substantially higher data rates may be achieved by designing optical transceiver architectures and circuits to compensate for VCSEL bandwidth limitations.

The designed 90 nm CMOS VCSEL driver uses pre-emphasis to enable 20 Gb/s data rates with a 10 Gb/s VCSEL. The driver uses dual-edge pre-emphasis of the modulation current to compensate for the bandwidth limitation introduced by the significant parasitic capacitance of the VCSEL. The VCSEL is modulated with the summed output of two current-mode drivers, where the output of the second driver is delayed, inverted, and attenuated with respect to the first in order to introduce current emphasis at data transitions. The pre-emphasis portion of the output current is primarily absorbed by the parasitic capacitor, thereby reducing the charging time and opening the eye of the VCSEL junction current.
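The summing of a main driver with a delayed, inverted, and attenuated copy behaves like a two-tap filter. The sketch below generates a normalized drive-current waveform for a bit sequence under that view. Currents are expressed relative to the bias point, the VCSEL dynamics are not modeled, and the sample counts, current values, and function name are hypothetical.

```python
def pre_emphasized_current(bits, samples_per_bit=4, delay_samples=2,
                           i_mod=1.0, i_pre=0.3):
    """Sum of a main driver (+/- i_mod) and a second driver that is
    delayed by delay_samples, inverted, and attenuated to i_pre."""
    data = [1.0 if bit else -1.0
            for bit in bits for _ in range(samples_per_bit)]
    waveform = []
    for n, level in enumerate(data):
        # Before the delayed copy arrives, hold the initial data level.
        delayed = data[n - delay_samples] if n >= delay_samples else data[0]
        waveform.append(i_mod * level - i_pre * delayed)
    return waveform


if __name__ == "__main__":
    wave = pre_emphasized_current([0, 1, 1, 0])
    print(" ".join(f"{sample:+.1f}" for sample in wave))
```

For `delay_samples` after each transition, the two drivers add (±1.3 in this normalization); once the delayed copy catches up they partially cancel (±0.7), which produces the emphasis pulse on both rising and falling edges. The pulse height is set by `i_pre` and its width by `delay_samples`.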

The driver design includes digital controls to allow for post-fabrication programming of the pre-emphasis pulse duration, the modulation current, and the pre-emphasis current. The modulation and pre-emphasis current are controlled independently by two 5-bit DACs and the width of the pre-emphasis pulse is controlled by a 4-tap digital delay line. This programmability improves robustness to variations and allows characterization of the effect of varying the height and width of the pre-emphasis pulse. The VCSEL driver simulation results, based on extracted layout, demonstrate 20 Gb/s operation without the use of inductors. Preliminary testing of the fabricated chip is ongoing at the time of publication.

# A Sub-threshold Cell Library and Methodology

J. Kwong, A.P. Chandrakasan Sponsorship: Texas Instruments, DARPA

In this work, we develop a sub-threshold library and design methodology that addresses the unique challenges and tradeoffs in ultra-low voltage operation. Drive currents become comparable in magnitude to idle leakage currents, causing reduced output swings and possible functional errors. Due to the exponential dependence of sub-threshold currents on threshold voltage, sub-threshold circuits are particularly sensitive to environmental and process variations. Figure 1a compares the delay distributions of an 8-bit adder in the sub-threshold and above-threshold regimes under transistor threshold voltage variation. Figure 1b performs the same comparison for energy. Circuit performance exhibits much larger variability in the sub-threshold regime, which can be mitigated through device sizing and choice of logic styles. The sub-threshold library employs a device sizing methodology with functionality as the primary consideration while implementing appropriate trade-offs among energy, delay, and variability. Different logic styles are also evaluated during cell design. When device sizes are adjusted for equal functional yield, a transmission gate full adder [1] offers an energy benefit because it avoids the large device stacks in static CMOS logic. Particular attention is given to the robustness of memory storage elements, where idle leakage can significantly degrade data retention capabilities. This library serves as a platform for further exploration of error-resilient architectures in the sub-threshold regime.



▲ Figure 1a: Delay distribution of 8-bit adder at sub-threshold (top) and nominal (bottom) power supplies, under threshold voltage variation. Arrow points to outliers.




▲ Figure 1b: Energy distribution of 8-bit adder at sub-threshold (top) and nominal (bottom) power supplies, under threshold voltage variation. Arrow points to outliers.

#### REFERENCES

[1] J.M. Rabaey, A.P. Chandrakasan, and B. Nikolic, *Digital Integrated Circuits: A Design Perspective*, 2<sup>nd</sup> ed. Prentice Hall, 2003.

# Design of a High-speed, High-resolution DAC in 2-D and 3-D Processes

P. Lajevardi, A.P. Chandrakasan Sponsorship: DARPA

A 2-D 16-b 100-MS/s digital-to-analog converter (DAC) is being designed for multi-carrier communication applications. In this type of DAC, the important challenge is to minimize the harmonic distortion (HD) and intermodulation distortion (IMD) and to maximize the spurious-free dynamic range (SFDR). Important factors that affect the HD, SFDR, and output noise of a DAC include glitches on the clock and control signals, jitter on the control signals, mismatch, settling error, clock feed-through (CFT), and circuit noise.

Analog and digital techniques such as dynamic element matching (DEM) and calibration have been explored to optimize the performance of DACs [1]. A 3-D version of a DAC is expected to show additional improvement in performance. Since timing mismatch between the control signals is one of the main sources of dynamic distortion in DACs, 3-D fabrication reduces the harmonic distortion by providing more connectivity for active cells and allowing shorter distances between cells. In addition, calibration blocks can be placed on the upper die above each analog cell to provide more feedback. Substrate isolation between analog and digital circuits also reduces the coupling noise.

Current-steering DACs have shown good performance for high-speed, high-resolution applications. The DAC architecture is partitioned into 3 active layers. The partitioning is being optimized to maximize SFDR, which is the main performance metric in high-speed, high-resolution DACs.



Figure 1: Block diagram of a current steering DAC.


#### REFERENCES

[1] K.L. Chan and I. Galton, "A 14b 100MS/s DAC with fully segmented dynamic element matching," ISSCC Dig. Tech. Papers, pp. 258-259, Feb. 2006.

# An Energy-efficient Ultra-wideband Radio Receiver

F.S. Lee, A.P. Chandrakasan Sponsorship: NSF

The development of energy-efficient short-range (30 cm to 10 m) wireless radio transceivers has become an active area of research with the growth of high data-rate wireless battery-operated appliances, as well as with the onset of low data-rate ad-hoc wireless networks [1]. The primary figure of merit for energy-efficient radios is energy/bit. Within reasonable energy costs, it is also desirable to keep the bit-error-rate (BER) below the worst-case value of 10<sup>-3</sup>. This research explores the use of ultra-wideband (UWB) signals to improve wireless receiver energy/bit consumption by an order of magnitude or more at low data-rates.

Figure 1 plots the energy/bit of recent receivers against data-rate. As the data-rate increases, the energy/bit decreases because the fixed costs of radio energy consumption in the oscillator and the bias currents of the analog circuits are amortized over more bits per unit time. Therefore, to reduce energy/bit at low data-rates, it is necessary to build circuits that can be duty-cycled with low settling time and to choose a signaling architecture that does not require power-hungry high-frequency oscillators and clock buffers. Because UWB signals are essentially short impulses in time, they inherently lend themselves to receiver architectures that can be deeply duty-cycled.
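The amortization argument can be illustrated with a first-order model. This is only a sketch under stated assumptions: the function, its parameter names, and the numbers in the usage lines are hypothetical and are not measurements from this work.

```python
def energy_per_bit(data_rate_bps, p_fixed_w, e_active_j,
                   duty_cycled=False, t_on_per_bit_s=0.0):
    """First-order receiver energy/bit model (all values are illustrative).

    p_fixed_w      -- power of oscillator and bias circuits while they are on
    e_active_j     -- energy spent actually processing one bit
    t_on_per_bit_s -- on-time per bit when the fixed circuits are duty-cycled,
                      including settling time
    """
    bit_period = 1.0 / data_rate_bps
    if duty_cycled:
        # Fixed-power circuits are on only for a short window around each pulse.
        fixed = p_fixed_w * min(t_on_per_bit_s, bit_period)
    else:
        # Always-on circuits burn power for the whole bit period.
        fixed = p_fixed_w * bit_period
    return fixed + e_active_j


if __name__ == "__main__":
    for rate in (1e4, 1e5, 1e6):
        always_on = energy_per_bit(rate, 1e-3, 1e-10)
        gated = energy_per_bit(rate, 1e-3, 1e-10, duty_cycled=True, t_on_per_bit_s=1e-7)
        print(f"{rate:9.0f} b/s  always-on {always_on:.3e} J/bit  duty-cycled {gated:.3e} J/bit")
```

In the always-on case the fixed term grows as the data-rate falls, while in the duty-cycled case it stays constant, which is the behavior the text attributes to deeply duty-cycled UWB receivers.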

This work focuses on developing an extremely low energy receiver operating at low data-rates using UWB signaling. Figure 2 shows a block diagram of the receiver architecture. Proper choices of the signaling technique, adjustable filters to address in-band interferers and channel selectivity, amplifier and mixer circuit innovations, a novel low-power digitally-configurable analog processor (D-CAP) for mixed-signal correlation, and duty-cycling of circuits all work in tandem to realize a sub-nJ/bit, 10 kbps – 1 Mbps receiver in a 0.5 V, 90 nm CMOS process.





#### REFERENCES

- [1] A.P. Chandrakasan, R. Min, M. Bhardwaj, S.-H. Cho, and A. Wang, "Power aware wireless microsensor systems," in *IEEE Proceedings of the European Solid-State Circuits Conference*, 2002.
- [2] D. Daly, "An energy efficient rf transceiver for wireless microsensor networks," Master's thesis, Massachusetts Institute of Technology, 2005.

# **3-D SRAM Architecture and Circuits**

T. Pan, A.P. Chandrakasan Sponsorship: DARPA

Long global interconnect limits the performance of high-performance SRAM. Large leakage power dissipation is another concern in scaled SRAM design. 3-D technology is a promising approach to reduce interconnect delay and power dissipation by replacing long global wires with short vertical interconnects. In preliminary simulations using a 3-D layout, we see a 20% reduction in access time compared to a conventional 2-D implementation. The use of 3-D technology decreases the number of buffers needed to drive long global wires, further reducing propagation delay. Our architecture introduces a layer selection signal in addition to the block, row, and column decoders. The layer selection can be used to reduce active and leakage power. Since transistors are closer together in 3-D, blocks and columns can share part of the control circuits, such as the sense-amplifier. Figure 1 shows the block diagram of the SRAM memory.

We are developing a model that takes both horizontal and vertical parameters into account and gives the optimized memory architecture once the total size of the memory is specified. An optimum partition between layers and the block size will be generated. The optimum buffer insertion will also be determined. An SRAM layout generator based on the 0.18 $\mu$m Lincoln Labs SOI 3-D integration technology is being developed.



# An Energy-optimal Power Supply for Digital Circuits

Y.K. Ramadass, A.P. Chandrakasan Sponsorship: DARPA, Texas Instruments

Substantial savings in the energy consumed by a digital circuit can be obtained by operating the circuit below the threshold voltage of its devices. The variation of the energy consumed per operation with the operating voltage for an FFT circuit is shown in Figure 1. This curve is dynamic in nature and changes with temperature, the workload of the circuit, the nature of the operations performed by the circuit, and the data handled. The optimum energy point shifts widely as the curve changes, which necessitates a circuit to track the optimum energy point under changing conditions. The optimal supply voltage for minimum-energy operation usually falls into the sub-threshold region of operation of digital circuits (i.e., below the device thresholds). At these voltages, the circuits operate substantially more slowly. A minimum energy feedback loop would fit well with the energy-driven class of circuits, where performance is not a key issue.

The energy minimization circuitry (Figure 2) consists of a buck converter that operates in the pulse frequency modulation (PFM) mode. The digital circuit under test, which operates at the  $V_{dd}$  set by the converter, is clocked by a critical path replica ring oscillator. The energy-sensor circuitry determines the energy consumed per operation at different operating voltages. Based on the energy per operation at a given operating voltage obtained from the energy-sensing circuitry, an energy minimization algorithm changes the reference voltage to the buck converter suitably and the system approaches the minimum energy operating voltage of the digital circuit using a slope-detection strategy. A test chip containing the minimum energy tracking loop and an embedded DC-DC converter has been fabricated in Texas Instruments' 65 nm CMOS process.
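A slope-detection search of this kind can be sketched as a simple hill-descent on the measured energy per operation. The code below is an illustrative sketch only, not the algorithm implemented on the test chip: `measure_energy` stands in for the energy-sensor circuitry, and the step size, voltage limits, and stopping rule are assumptions.

```python
def track_minimum_energy(measure_energy, v_start, v_step=0.05,
                         v_min=0.2, v_max=1.2, max_iters=50):
    """Step the converter reference voltage in the direction that lowers energy/operation.

    measure_energy(v) is a hypothetical callback returning energy per operation at supply v.
    """
    v = v_start
    e = measure_energy(v)
    direction = -1      # begin by lowering the supply voltage
    reversals = 0
    for _ in range(max_iters):
        v_next = min(max(v + direction * v_step, v_min), v_max)
        # A step that is blocked by a voltage limit is treated as "no improvement".
        e_next = measure_energy(v_next) if v_next != v else float("inf")
        if e_next < e:
            v, e = v_next, e_next      # slope still favorable: keep going
        else:
            direction = -direction     # energy rose: the slope changed sign
            reversals += 1
            if reversals == 2:
                break                  # both neighbors are worse: minimum found
    return v, e


if __name__ == "__main__":
    # Hypothetical convex energy curve with its minimum at 0.4 V.
    curve = lambda v: (v - 0.4) ** 2 + 1.0
    print(track_minimum_energy(curve, v_start=1.0))
```

Because the energy curve moves with temperature and workload, a real loop would rerun such a search periodically rather than once.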



▲ Figure 1: Estimated minimum energy point for an FFT using a typical transistor in a 0.18-µm technology (from [1]).



▲ Figure 2: Block diagram of the energy minimization loop with the DC-DC converter.

#### REFERENCES

[1] A. Wang and A.P. Chandrakasan, "A 180-mV subthreshold FFT processor using a minimum energy design methodology," IEEE J. of Solid-State Circuits, pp. 310-319, Jan. 2005.

# An Energy-efficient Digital Baseband Processor for Pulsed UWB Using Extreme Parallelization

V. Sze, R. Blázquez, A.P. Chandrakasan Sponsorship: DARPA

The use of ultra-wideband (UWB) as a medium for high-data-rate last-meter links creates a need for integrating UWB radios into battery-operated devices such as mobile phones, handheld devices, and sensor nodes. Consequently, there is a strong demand for an energy-efficient UWB system. We propose using parallelism in the digital baseband processor to reduce the energy required to receive UWB packets.

An energy-efficient baseband can be achieved by exploiting two forms of parallelism. First, the supply voltage of the digital baseband can be lowered so that the correlator operates near its minimum energy point, which occurs below the threshold voltage, placing the circuit in the sub-threshold region [1]. Figure 1 shows the energy per operation of a single correlator for various supply voltages. The correlator and the rest of the baseband must then be parallelized to maintain a throughput of 500 MSPS at this reduced voltage. While sub-threshold operation is traditionally used for low-energy, low-frequency applications such as wrist-watches, this work examines how sub-threshold operation can be applied to low-energy, high-performance applications.

Second, the correlators can be further parallelized for a significant reduction in the synchronization time. The reduced synchronization time allows the baseband and the rest of the receiver to be turned off earlier, resulting in a system-wide reduction in energy. The architecture shown in Figure 2 will be implemented in STMicroelectronics' 90 nm process. The baseband processor will be designed to deliver a maximum of 100 Mbps.



Figure 1: Simulated energy plot for correlators [2].



▲ Figure 2: System level diagram of UWB digital baseband. The N and M are parameters for the different forms of parallelism.

#### REFERENCES

- [1] B.H. Calhoun and A.P. Chandrakasan, "Characterizing and modeling minimum energy operation for sub-threshold circuits," Int'l. Symposium on Low Power Electronics and Design, 2004.
- [2] V. Sze, R. Blázquez, M. Bhardwaj, and A.P. Chandrakasan, "An energy-efficient sub-threshold baseband processor architecture for pulsed ultra-wideband communications," Int'l. Conf. on Acoustics, Speech, and Signal Processing, 2006.

# An Ultra Low-power ADC for Wireless Micro Sensor Applications

N. Verma, A.P. Chandrakasan Sponsorship: DARPA Power Aware Computing/Communication Program

Autonomous micro-sensor nodes rely on low-power circuits to enable energy harvesting as a means of sustaining long-term, maintenance-free operation. This work presents the design of an ultra low-power analog-to-digital converter (ADC) whose sampling rate and resolution can be scaled to dynamically recover power savings [1].

The ADC has a sampling rate of 0-100 kS/s and a resolution of either 12 or 8 bits. The design is based on the successive approximation register (SAR) architecture, which is shown in Figure 1. Several techniques improve the efficiency of the ADC: analog offset calibration in the latch improves the comparator power-delay product; weak-inversion operation increases preamplifier $g_m/I$; robust self-timing eases settling requirements; sub-DAC gain adjustment compensates non-linearities from top-plate parasitics; and switched-capacitor auto-zero reference generation maximizes common-mode rejection.

The ADC was fabricated in a 0.18 $\mu$m, 5M2P CMOS process. In 12b mode, the measured SNDR, with a 48 kHz input tone, is 65 dB (10.55 ENOB), and the SFDR is 71 dB. The total power consumption of the ADC is 25 $\mu$W at 100 kS/s and decreases linearly towards zero as the sampling rate is reduced. This corresponds to a figure-of-merit (P/(2F<sub>IN</sub>2<sup>ENOB</sup>)) of 165 fJ/conversion-step, which, as shown in Figure 2, is the best reported among medium- to high-resolution ADCs.
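The figure-of-merit expression can be evaluated directly from the quoted numbers. The sketch below uses the standard relation ENOB = (SNDR − 1.76)/6.02, which is a textbook formula and not stated in the text; the function names are illustrative. With the rounded values quoted above the result lands within about 5% of the reported figure, and exact agreement depends on the unrounded SNDR and input frequency, which are not given here.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (standard textbook relation)."""
    return (sndr_db - 1.76) / 6.02


def fom_joules_per_step(power_w, f_in_hz, enob):
    """Figure of merit P / (2 * F_IN * 2**ENOB), in joules per conversion step."""
    return power_w / (2.0 * f_in_hz * 2.0 ** enob)


if __name__ == "__main__":
    print(f"ENOB from 65 dB SNDR: {enob_from_sndr(65.0):.2f} bits")
    fom = fom_joules_per_step(25e-6, 48e3, 10.55)
    print(f"FOM with quoted values: {fom * 1e15:.0f} fJ/conversion-step")
```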

We acknowledge National Semiconductor for providing the IC fabrication services.





▲ Figure 2: Figure-of-merit comparison with state of the art ADCs.


#### REFERENCES

[1] N. Verma and A.P. Chandrakasan, "A 25µW 100kS/s ADC for wireless micro-sensor applications," ISSCC Dig. Tech. Papers, pp. 222-223, Feb. 2006.

# An All-digital, Pulsed-UWB Transmitter in 90-nm CMOS

D.D. Wentzloff, A.P. Chandrakasan Sponsorship: NSF

A common metric for comparing the performance of energy-constrained wireless radios is energy consumed per bit transmitted. As the maximum data rate is reduced in a typical wireless link, the energy/bit increases due to the increased on-time of the analog electronics. For low data-rate applications such as RFID tags and wireless sensor nodes, the energy/bit of the wireless link is optimized by employing a very high data-rate radio with a low duty-cycle. This radio can be undesirable from a system perspective when considering network latency and baseband processing. Furthermore, the finite startup time of the analog electronics limits the minimum energy/bit that can be obtained. Conversely, pulsed ultra-wideband (UWB) radios can exploit the inherent duty-cycled nature of their signaling to overcome the data rate/on-time tradeoff that leads to increased energy/bit in other radios [1]. The pulsed signaling also makes UWB transmitters well-suited for an all-digital implementation, resulting in energy/bit proportional to $CV_{dd}^{2}$, which scales with process technology. The proposed transmitter will achieve sub-nJ/bit energy consumption with a data rate variable from 1 kb/s to 1 Mb/s. The data rate may be reduced with very little penalty in energy/bit by avoiding the use of any constant-biased analog circuits such as local oscillators.

The transmitter is designed to operate in a custom transceiver architecture that trades off spectral efficiency for total energy/bit. The frequency plan utilizes the 3.1-5.0-GHz UWB band, divided into three non-overlapping channels of 550 MHz each, as shown in Figure 1. Binary pulse-position modulated (PPM) square pulses are generated in the selected channel at a variable pulse-repetition frequency (PRF) of 1 kHz-1 MHz. The spectrum of PPM signals contains spectral lines that reduce the spectral efficiency [2]. Therefore, PPM signals are phase scrambled in order to eliminate these lines. Conventional BPSK scrambling requires differential signaling, adding to the complexity and energy consumption of the transmitter. However, spectral lines may be sufficiently reduced by scrambling with a phase delay, as shown in Figure 1. This delay can be fully synthesized and requires no analog components, keeping complexity and energy low.

Figure 2 shows the transmitter architecture. Pulses are synthesized by combining a variable number of edges of a delay line clocked at the PRF. The center frequency of the pulse is selected by calibrating the delay/stage in the delay line with an off-line digital calibration loop. The digital pulse is amplified by an inverter chain with power gating for gain control and leakage reduction. The transmitter is capable of driving a 50-$\Omega$ UWB antenna with 800-mVppk while consuming <1 nJ/bit.



▲ Figure 1: Spectrum of the three-channel frequency plan with the FCC indoor and outdoor spectral masks (top). Illustration of phase-delay scrambling of PPM pulses to reduce spectral lines (bottom).




#### REFERENCES

- [1] T. Terada, S. Yoshizumi, M. Muqsith, Y. Sanada, and T. Kuroda, "A CMOS ultra-wideband impulse radio transceiver for 1-Mb/s data communications and ±2.5-cm range finding," *IEEE J. of Solid-State Circuits*, vol. 41, no. 4, pp. 891-898, Apr. 2006.
- [2] Y.-P. Nakache and A.F. Molisch, "Spectral shape of UWB signals influence of modulation format, multiple access scheme and pulse shape," Vehicular Tech. Conf., vol. 4, Apr. 2003, pp. 2510-2514.

# Parameterized Model Order Reduction of Nonlinear Circuits and MEMS

B. Bond, L. Daniel Sponsorship: MARCO GSRC, NSF

The presence of several nonlinear analog circuits and micro-electro-mechanical (MEM) components in modern mixed-signal system-on-chips (SoC) makes the fully automatic synthesis and optimization of such systems an extremely challenging task. Our research focuses on developing techniques for generating parameterized reduced-order models (PROM) of nonlinear dynamical systems. These reduced-order models could serve as a first step towards the automatic and accurate characterization of geometrically complex components and subcircuits, eventually enabling their synthesis and optimization. Our approach combines elements of an existing non-parameterized trajectory piecewise linear method [1] for nonlinear systems with an existing moment matching parameterized technique [2] for linear systems. By building on these two existing methods, we have created four different algorithms for generating PROMs for nonlinear systems. The algorithms were tested on three different systems: a MEM switch, shown in Figure 1, and two nonlinear analog circuits. All of the examples contain distributed strong nonlinearities and depend on several geometric parameters.

The reduced-order models can be constructed to possess strong local or global accuracy in the parameter-space, depending on which algorithm is used. Figure 2 shows the output of one PROM created for the example in Figure 1, compared to the field solver output of the full nonlinear system. In this example, the system was parameterized in the width of the device and simulated at a parameter value different from the values at which the model was created. We found that in general the best algorithm is application-specific, but the PROMs are very accurate over a practical range of parameter values. For further details on parameter-space accuracy and cost of the algorithms, see [3].



▲ Figure 1: Application example: MEM switch realized by a polysilicon beam fixed at both ends and suspended over a semiconducting pad and substrate expansion.



<sup>▲</sup> Figure 2: Center point deflection predicted by our parameterized reduced model of order 40, compared to a finite difference detailed simulation.

#### REFERENCES

- [1] M. Rewienski and J.K. White, "A trajectory piecewise-linear approach to model order reduction and fast simulation of nonlinear circuits and micromachined devices," in Proc. of IEEE/ACM International Conf. on Computer Aided-Design, pp. 252-257, Nov. 2001.
- [2] L. Daniel, C.S. Ong, S.C. Low, K.H. Lee, and J.K. White, "A multiparameter moment matching model reduction approach for generating geometrically parameterized interconnect performance models," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, 23(5), pp. 678-693, May 2004.
- [3] B. Bond and L. Daniel, "Parameterized model order reduction of nonlinear dynamical systems," in Proc. of the IEEE Conf. on Computer-Aided Design, pp. 487-494, 2005.

# Development of Specialized Basis Functions and Efficient Substrate Integration Techniques for Electromagnetic Analysis of Interconnect and RF Inductors

X. Hu, T.A.E. Moselhy, J. White, L. Daniel Sponsorship: SRC, MARCO GSRC, NSF

The performance of several mixed-signal and RF-analog platforms depends on substrate effects that need to be represented in the library model with critical field solver accuracy. For instance, substrate-induced currents in RF inductors can severely degrade the quality factor and hence RF filter selectivity. We have developed an efficient approach to full-wave impedance extraction that accounts for substrate effects through the use of two-layer media Green's functions in a mixed-potential-integral-equation (MPIE) solver. In particular, we have developed accelerated techniques for both volume and surface integrations in the solver.

In this work, we have also introduced a technique for the numerical generation of basis functions that are capable of parameterizing the frequency-variant nature of cross-sectional conductor current distributions. Hence, skin and proximity effects can be captured using many fewer basis functions than with the prevalently used piecewise-constant basis functions. One important characteristic of these basis functions is that they need to be precomputed only once for a frequency range of interest per unique conductor cross-sectional geometry, and they can be stored off-line with a minimal associated cost. In addition, the robustness of these frequency-independent basis functions is enforced using an optimization routine.

We have shown in [2] that the cost of solving a complex interconnect system using our new basis functions can be reduced by a factor of 170 when compared to the use of piecewise-constant basis functions over a wide range of operating frequencies. Furthermore, our volume and surface integration routines result in an additional efficiency improvement by a factor of 9.8, as shown in [1]. Our solver accuracy is validated against measurements taken on fabricated devices.



▲ Figure 1: Measured and simulated Q-factors for a square RF inductor with an area of 15 mm x 15 mm and surrounded by a ground ring.



▲ Figure 2: Our basis functions avoid the expensive cross-sectional discretization shown in the figure, which is otherwise necessary to account for trapezoidal cross-sections or skin and proximity effects.


#### REFERENCES

- [1] X. Hu, J.H. Lee, J. White, and L. Daniel, "Analysis of full-wave conductor-system-impedance over substrate using novel integration techniques," in Proc. of the IEEE/ACM Design Automation Conference, June 2005.
- [2] X. Hu, T.A.E. Moselhy, J. White, and L. Daniel, "Novel development of optimization-based, frequency-parameterizing basis functions for the efficient extraction of interconnect system impedance," submitted to the IEEE/ACM Design Automation Conference, July 2006.

# A Quasi-convex Optimization Approach to Parameterized Model-order Reduction

K.C. Sou, L. Daniel, A. Megretski Sponsorship: MARCO GSRC, SRC, NSF

This work proposes an optimization-based model-order-reduction (MOR) framework. The method involves setting up a quasi-convex program that explicitly minimizes a relaxation of the optimal H-infinity norm MOR problem. The method generates guaranteed stable and passive reduced models, and it is very flexible in imposing additional constraints. The proposed optimization approach is also extended to the parameterized model reduction problem (PMOR). The proposed method is compared to existing moment-matching and optimization-based MOR methods in several examples. For example, a 32<sup>nd</sup>-order parameterized reduced model has been constructed for a 7-turn RF inductor with substrate (infinite order), and the error in quality-factor matching was less than 5% for all design parameter values of interest.




▲ Figure 1: A 7-turn RF inductor for which a parameterized (with respect to wire width and wire separation) reduced model has been constructed.



Figure 2: Matching of quality factor of 7-turn RF inductor when wire width = $16.5 \mu m$, wire separation = 1, 5, 18, and 20 $\mu m$. Blue dashed line: Full model. Red solid line: ROM.

#### REFERENCES

[1] K. Sou, A. Megretski, and L. Daniel, "A quasi-convex optimization approach to parameterized model order reduction," *IEEE/ACM Design Automation Conference*, Anaheim, CA, 2005.

# **RF PA Linearization: Open-loop Digital Predistortion Using Cartesian Feedback for Adaptive PA Characterization**

J.W. Holloway, S. Chung, J. Huang, J.L. Dawson Sponsorship: MARCO C2S2

This work combines the advantages of two different RF power amplifier (PA) linearization techniques: digital predistortion (DPD) and Cartesian feedback (CFB). Cartesian feedback, an extension of classical continuous-time feedback, is limited by the bandwidth of its loop transfer function; this bandwidth, in turn, puts an upper limit on the bandwidth of the data input. However, this limitation gives one the ability to continuously linearize the PA without extensive knowledge of the PA characteristics [1].

Digital predistortion is an inherently open-loop technique and thus does not suffer from bandwidth limitations. This technique requires detailed modeling or characterization of the PA to produce the new, distorted baseband symbols [2]. One can use CFB to characterize the PA over the input symbol constellation, creating a digital lookup table (Figure 1) to be used for open-loop DPD [3]. Behavioral simulations have shown substantial improvement in the PA output spectrum (Figure 2) and the adjacent-channel power ratio (ACPR). These advantages can be had for little increase in power or die area. In addition, techniques to speed training time are being investigated (i.e., describing the tradeoff between accuracy in the lookup table and the speed at which the table is produced). Moreover, the DPD scheme used is much less computationally intensive than most adaptive digital predistortion schemes in the literature [2].
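The idea of training a lookup table in closed loop and then using it open loop can be sketched behaviorally. Everything in the code below is an assumption made for illustration: `pa_model` is a hypothetical memoryless amplifier with made-up coefficients, and the discrete-time iteration in `train_lut` only mimics what a feedback loop does (driving the output error to zero); it is not a model of the actual Cartesian feedback hardware.

```python
import cmath


def pa_model(x, a3=0.1, phi3=0.2):
    """Hypothetical memoryless PA: gain compression (AM/AM) plus phase rotation (AM/PM)."""
    r2 = abs(x) ** 2
    return x * (1.0 - a3 * r2) * cmath.exp(1j * phi3 * r2)


def train_lut(constellation, pa, iterations=100, step=0.5):
    """For each symbol, adjust the PA input until the PA output matches the symbol."""
    lut = {}
    for s in constellation:
        u = s  # initial guess: the undistorted symbol
        for _ in range(iterations):
            u += step * (s - pa(u))  # feedback-style update on the output error
        lut[s] = u                   # store the predistorted symbol
    return lut


if __name__ == "__main__":
    levels = (-0.6, -0.2, 0.2, 0.6)
    constellation = [complex(i, q) for i in levels for q in levels]
    lut = train_lut(constellation, pa_model)
    s = complex(0.6, 0.6)
    print("error without DPD:", abs(pa_model(s) - s))
    print("error with DPD:   ", abs(pa_model(lut[s]) - s))
```

Once the table is filled, transmission needs only a lookup per symbol, which is consistent with the low computational cost noted above. The number of training iterations per entry is one place where the accuracy-versus-training-time tradeoff shows up in this toy model.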



▲ Figure 1: A schematic representation of the CFB-created lookup table, showing a simple IQ constellation distortion due to a nonlinear PA.



▲ Figure 2: Results from a behavioral simulation of the CFB-trained DPD system showing improved linearity.

#### REFERENCES

- [1] J.L. Dawson and T.H. Lee, "Automatic phase alignment for a fully integrated Cartesian Feedback power amplifier system," IEEE J. of Solid-State Circuits, vol. 38, no. 12, pp. 2269-2279, Dec. 2003.
- [2] K. Muhonen, M. Kavehrad, and R. Krishnamoorthy, "Look-up table techniques for adaptive digital predistortion: A development and comparison," IEEE Transactions on Vehicular Technology, vol. 49, no. 5, pp. 1995-2001, Sept. 2000.
- [3] J.L. Dawson, "Feedback linearization of RF power amplifiers," Ph.D. Dissertation, Stanford University, Stanford, 2003.

# **Convex Optimization of Integrated Systems Using Geometric Programming**

T. Khanna, R. Sredojevic, J.L. Dawson, V. Stojanović Sponsorship: MIT Lincoln Laboratory Advanced Concepts Committee

In system design, the allocation of circuit resources, like power and noise budgets, is a problem with an often unclear solution, and it results in long negotiations between circuit and system designers. It is difficult to know the optimal distribution or even the feasible set of distributions of resources. This uncertainty results in an iterative approach with frequent re-design of circuit blocks for different distribution schemes. Insight into the trade-offs among resources within each circuit block can aid in finding the optimal distribution and eliminate the need for re-design, ultimately speeding up the design cycle.

Thus far, work done in analog circuit optimization has applied convex optimization techniques, specifically geometric programming (GP), in order to formulate and solve for optimality. Geometric programming is convenient because there is a specific formulation that can efficiently be solved [1]. We plan to follow the style of past circuit optimization attempts [2-4] but reformulate them in our more general hierarchical approach to optimize a fully integrated system.
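For reference, the standard form of a geometric program, as given in convex optimization texts such as [1], is reproduced below. The symbols $x$, $f_i$, and $g_j$ are generic placeholders and not the circuit variables used in this work.

$$
\begin{aligned}
\text{minimize}\quad & f_0(x) \\
\text{subject to}\quad & f_i(x) \le 1, \quad i = 1, \dots, m, \\
& g_j(x) = 1, \quad j = 1, \dots, p,
\end{aligned}
$$

where $x > 0$, each $f_i$ is a posynomial of the form $\sum_k c_k x_1^{a_{1k}} \cdots x_n^{a_{nk}}$ with $c_k > 0$, and each $g_j$ is a monomial (a single such term). The change of variables $y_i = \log x_i$, together with taking the logarithm of the objective and constraints, turns this into a convex problem, which is why the formulation can be solved efficiently.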

With a hierarchical optimization, GP is used to formulate and optimize each circuit block in a given system. We stress that formulation is not trivial and requires circuit design experience for correctness. From this optimization, we create trade-off curves describing the performance specifications. In other words, the trade-off curves are a continuous representation of the design space for each block. The generated trade-off curves are then related in the system formulation to produce optimal performance criteria for optimal system design. Figures 1 and 2 show this optimization flow. We also anticipate that a hierarchical approach will allow for an interchange of block topologies, which has not been allowed in the past.



▲ Figure 1: Using a given topology, circuit performance trade-off curves are generated.



▲ Figure 2: System optimization for performance specifications. Circuit blocks are abstracted into trade-off curves, and performance specifications are optimized.

#### REFERENCES

- [1] S. Boyd and L. Vandenberghe, Convex Optimization, New York: Cambridge University Press, 2004.
- [2] M.M. Hershenson, "Efficient description of the design space of analog circuits," Design Automation Conf., 2003.
- [3] M.M. Hershenson, "Design of pipeline analog-to-digital converters via geometric programming," ICCAD, 2002.
- [4] Y. Xu, L. Pileggi, and S. Boyd, "ORACLE: Optimization with recourse of analog circuits including layout extraction," Design Automation Conf., June 2004.

# High-Resolution, Pipelined Analog-to-Digital Conversion Using Comparator-Based Switched-Capacitor Techniques

L. Brooks, H.-S. Lee Sponsorship: NDSEG Fellowship, CICS

Recently, a comparator-based switched-capacitor circuit (CBSC) design methodology was proposed [1]. CBSC replaces op-amps with comparators and current sources and offers many advantages. This research seeks to realize several of these advantages with the application to high resolution pipelined analog-to-digital converters (ADC). The specific goal is to design and fabricate a 12-bit, 1-GHz, 100-mW ADC. This type of ADC has applications such as software radio, general test equipment, wide bandwidth modems, smart radios for wireless communications, advanced radar systems, multi-beam adaptive digital beam-forming array transceivers, and anti-jam GPS receivers.

Theoretically, CBSC circuits offer more than an order of magnitude improvement in figure of merit (FOM) over traditional op-amp-based pipelined ADCs. This means, for example, that for the same speed and resolution, a CBSC ADC can operate with more than an order of magnitude lower power consumption. In switched-capacitor circuits, a charge transfer phase must be realized. In an op-amp-based implementation, the op-amp drives or forces the exact charge transfer via a high-gain, high-bandwidth feedback loop. In CBSC, a current source provides the charge, and a comparator detects the time when the transfer is complete and turns off the current source. So where op-amps drive the charge transfer, CBSC searches for the correct charge transfer by sweeping the output over the voltage range and shutting off the current when the correct charge transfer is found. A two-phase search can help to maximize the FOM. The first phase is a coarse, fast search, and the second phase is a fine, slower search over a much smaller range. The FOM advantages of CBSC come from reduced bandwidth requirements, reduced device count, reduced complexity, increased voltage range, and more power-efficient biasing.
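The two-phase search can be illustrated with a small behavioral sketch. It assumes, for illustration only, that the comparator trips a fixed delay after the ramp crosses the target, so each phase overshoots by the ramp rate times that delay, and that the fine phase ramps back in the opposite direction. The function name, parameters, and units are hypothetical and do not describe the fabricated circuit.

```python
def cbsc_two_phase(v_target, coarse_rate, fine_rate, comparator_delay, v_start=0.0):
    """Behavioral model of a coarse-then-fine charge-transfer search.

    Assumption (not from the source): the current source is switched off one
    fixed comparator delay after the ramp crosses v_target, so each phase
    overshoots by (ramp rate * comparator_delay).
    """
    # Coarse phase: fast upward ramp that stops one comparator delay late.
    t_coarse = (v_target - v_start) / coarse_rate + comparator_delay
    v_after_coarse = v_start + coarse_rate * t_coarse

    # Fine phase: slow downward ramp from the overshoot back across the target.
    t_fine = (v_after_coarse - v_target) / fine_rate + comparator_delay
    v_final = v_after_coarse - fine_rate * t_fine

    return {
        "coarse_error": v_after_coarse - v_target,
        "final_error": v_final - v_target,
        "total_time": t_coarse + t_fine,
    }


# Illustrative numbers in arbitrary units: the fine ramp is 10x slower.
result = cbsc_two_phase(v_target=0.5, coarse_rate=1000.0, fine_rate=100.0,
                        comparator_delay=1e-4)
print(result)
```

Under these assumptions the residual error scales with the fine ramp rate rather than the coarse one, which is the motivation for searching a small range slowly after covering the large range quickly.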

This project develops and optimizes innovative circuits and architectures to achieve an aggressive design goal. The work focuses on the design of two prototype chips. First, we are fabricating a single-ended 10-bit CBSC ADC with a single-phase (coarse-phase) search only. This single-phase design embodies novel techniques and requires no static current. For this reason, this implementation overcomes the FOM shortfalls of the single-phase search. The goal of this chip is to focus on speed at a lower resolution. The second prototype will use the first prototype as the back-end but will add fully-differential front-end stages with an additional fine search phase that improves the resolution to 12 bits. In addition, several channels will be time-interleaved to achieve the desired speed.


#### REFERENCES

- [1] T. Sepke, J.K. Fiorenza, C.G. Sodini, P. Holloway, and H.-S. Lee, "Comparator-based switched-capacitor circuits for scaled CMOS technologies," IEEE Int'l. Solid-State Circuits Conf. Digest of Tech. Papers, 2006, pp. 220-221.

# High Speed Time-Interleaved Comparator-Based Switch Capacitor ADC

A. Chow, H.-S. Lee Sponsorship: MARCO C2S2

With an increasing need for higher data rates, both wireless applications and data links are demanding higher-speed analog-to-digital converters (ADCs) with medium resolution. In particular, this work will investigate ADCs with sampling rates up to 10 Gs/s and 6-8 bits of resolution. Time-interleaved converters achieve their high sampling rate by placing several converters in parallel. Each individual converter, or channel, has a delayed sampling clock and operates at a reduced sampling rate, so each channel is responsible for digitizing a different time slice of the input. The reduced sampling rate of each channel allows transistors to be biased in a more power-efficient region, thus reducing overall power consumption. This method requires that the individual converters that make up the parallel combination be matched. Mismatches in non-idealities, such as gain error, timing error, and voltage offset, greatly degrade the performance of such systems. Channel matching is therefore an important design consideration for time-interleaved ADCs.
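A minimal sketch of why channel mismatch matters is given below. It models only round-robin sample assignment with per-channel gain and offset; the function name and the mismatch values are made up for illustration, and timing skew is not modeled.

```python
def interleaved_samples(signal, offsets, gains):
    """Digitize ideal samples with M round-robin channels.

    Sample n is handled by channel n % M, which applies that channel's own
    gain and offset (quantization is omitted to keep the sketch short).
    """
    m = len(offsets)
    return [gains[n % m] * x + offsets[n % m] for n, x in enumerate(signal)]


# A constant input exposes offset mismatch as a pattern that repeats every
# M = 4 samples. Offsets below are arbitrary illustrative values.
out = interleaved_samples([1.0] * 8,
                          offsets=[0.0, 0.01, -0.02, 0.005],
                          gains=[1.0] * 4)
print(out)
```

Because the error pattern repeats every M samples, it shows up in the output spectrum as tones at multiples of the sampling rate divided by M, even though the input contains no such components.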

Although digital calibration can mitigate many of these non-idealities, timing mismatch is a non-linear error, which is more difficult to remove. At sampling rates up to 10 Gs/s, such complicated digital calibration would consume large amounts of power. An alternative solution uses a global switch running at the full speed of the converter to determine the sampling instant. This technique works well for medium- to high-speed ADCs [1-2]. At higher speeds, the ability to turn the switch on and off at the full sampling rate becomes a major challenge. We will investigate whether the global clocking can function satisfactorily at a 10 Gs/s sampling rate in scaled technologies.

Power optimization is a major design consideration when implementing a time-interleaved ADC. If the power efficiency of the individual channels can be optimized, then the power dissipation of the entire system can be optimized. We are exploring ways to lower power consumption in high-speed ADCs by adapting innovative circuit topologies. In particular, we will further investigate the use of the comparator-based switched-capacitor (CBSC) technique for high-speed applications. This work is investigating a fast, single-slope architecture (Figure 1). The faster each channel can operate, the lower the number of channels and hence the lower the power in clock and buffer circuits. The primary emphasis is the development of a highly power-efficient single-slope CBSC architecture. Since the single-slope architecture is more sensitive to non-idealities such as ramp nonlinearity, we are carefully studying the sources of non-idealities and developing techniques to address the accuracy issues.



Figure 1: One stage of a single slope CBSC based pipelined ADC.

#### REFERENCES

- [1] M. Gustavsson, "A global passive sampling technique for high-speed switched-capacitor time-interleaved ADCs," IEEE Transactions on Circuits and Systems II, pp. 821-831, 2000.
- [2] S. Gupta, et al., "A 1Gs/s 11b time-interleaved ADC in 0.13-µm CMOS," Dig. ISSCC 2006, pp.576-577, 2006.

# Low-Voltage Comparator-Based Switched-Capacitor Sigma-Delta ADC

M. Guyton, H.-S. Lee Sponsorship: CICS

Many analog signal processing circuits use operational amplifiers (op-amps) in a negative feedback topology. Error in these feedback systems is inversely proportional to the gain of the op-amp. Because scaled CMOS technologies use smaller channel lengths and require lower power supply voltages, it becomes more difficult to implement high gain op-amps. Recently, a comparator-based switched-capacitor (CBSC) technique was proposed [1] that uses a comparator rather than an op-amp to implement switched-capacitor topologies. One of the biggest challenges of low voltage circuits is the transmission gates that must pass the signal. If the signal is near the middle of the power supply range, neither the NMOS nor the PMOS transistor has sufficient gate drive to pass the signal properly. The switched-op-amp technique [2] was proposed to mitigate this problem. In this technique, the output of the op-amp is directly connected to the next sampling capacitor without a transmission gate. During the charge transfer phase, the op-amp is switched off, and the output is grounded.

Much like the standard switched-capacitor technique, CBSC circuits use two-phase clocking, having both sampling and evaluation clock phases. Unlike a standard switched-capacitor circuit, in a CBSC circuit all current sources connected to the output node are off at the end of the evaluation phase. Thus, the CBSC technique is inherently better suited to low-voltage applications than switched-op-amp circuit topologies. Although the previous CBSC implementation was a single-ended version, many high-resolution systems require fully differential implementation for better power supply and substrate noise rejection properties. Since the CBSC is a new technique without an op-amp, existing fully differential circuitry cannot be applied. In this program, we are developing fully-differential CBSC topologies for applications in high resolution data conversion. Figure 1 shows a fully-differential low-voltage CBSC integrator stage using the combined techniques. We recently designed a fourth-order sigma-delta ADC for operation at 1-V power supply using this integrator stage.



▲ Figure 1: Fully-differential comparator-based switched-capacitor integrator. The input of the next integrator stage is also shown. Common-mode feedback circuits are not shown.

#### REFERENCES

- [1] T. Sepke, J.K. Fiorenza, C.G. Sodini, P. Holloway, and H.-S. Lee, "Comparator-based switched-capacitor circuits for scaled CMOS technologies," in IEEE Int'l. Solid-State Circuits Conf. Digest of Tech. Papers, San Francisco, CA, Feb. 2006, p. 220.
- [2] J. Crols and M. Steyaert, "Switched-op-amp: An approach to realize full CMOS switched-capacitor circuits at very low power supply voltages," IEEE J. of Solid-State Circuits, vol. 29, no. 8, pp. 936-942, Aug. 1994.

# **Noise Analysis of Threshold Detection Comparators**

T. Sepke, J.K. Fiorenza, P. Holloway, C.G. Sodini, H.-S. Lee Sponsorship: MARCO C2S2, CICS

Recently, a comparator-based switched-capacitor circuit (CBSC) design methodology was proposed [1]. A fundamental limitation to the accuracy of CBSC systems is the noise of the threshold-detection comparator. Unlike traditional clocked comparators that compare voltages at a specific point in time, the virtual ground threshold-detection comparators must detect the time a voltage ramp crosses the virtual ground condition and open the output sampling switch. Threshold-detection comparators are usually thought of as a wide-bandwidth, high-gain amplifier, possibly implemented as a cascade of low-gain amplifiers. The first stage of the cascaded amplifier typically dominates the input-referred noise power spectral density. Due to the rather large noise bandwidth of the cascaded amplifiers, the input referred noise of such a comparator can be quite large.



▲ Figure 1: Threshold detection comparator with ideal band-limiting preamplifier to lower the input-referred noise of the threshold detection comparator.

One possible method for lowering the input-referred noise of the comparator is to add a preamplifier as shown in Figure 1. The noise of the comparator is improved if the preamplifier has a lower input-referred noise than the threshold-detection comparator alone and if the preamplifier has enough gain to dominate the input-referred noise of the comparator.

In linear small-signal amplifiers, the frequency of the transfer function poles determines speed. However, in preamplifiers for threshold-detection comparators, the time it takes the output to reach a threshold voltage determines speed because the preamplifier does not require small-signal steady-state conditions. If the band-limiting preamplifier output is always clamped to the same voltage for the same load capacitance and transconductance, the preamplifier with the highest gain is the fastest to a given output threshold [2]. Intuitively, a band-limiting stage should be lower noise than a broadband stage, but care must be taken in applying knowledge of small-signal amplifier noise behavior to systems that do not necessarily reach steady state. A non-stationary noise analysis for the preamplifier shows that the noise bandwidth of an ideal band-limiting preamplifier is inversely proportional to twice the time it takes the preamplifier to transition from its clamped state to the comparator threshold (Figure 2).
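The last statement can be illustrated with a short worked derivation. It assumes, purely for illustration, that the preamplifier acts as an ideal integrator with transconductance $G_m$ driving a load capacitance $C$, released from its clamped state at $t = 0$, and that its input-referred noise is white with single-sided power spectral density $S_n$. The symbols $G_m$, $C$, and $S_n$ are introduced here and do not appear in the original analysis.

$$
\begin{aligned}
A(t_i) &= \frac{G_m\, t_i}{C}, \\
\sigma_{\text{out}}^2(t_i) &= \frac{S_n}{2} \left(\frac{G_m}{C}\right)^{2} t_i, \\
\sigma_{\text{in}}^2 &= \frac{\sigma_{\text{out}}^2(t_i)}{A(t_i)^2} = \frac{S_n}{2\, t_i} = S_n \cdot \mathrm{NBW},
\qquad \mathrm{NBW} = \frac{1}{2\, t_i}.
\end{aligned}
$$

Under these assumptions the signal gain grows linearly with the integration time $t_i$ while the output noise variance grows only linearly as well, so the input-referred noise variance falls as $1/t_i$, consistent with a noise bandwidth inversely proportional to twice the integration time.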

To measure and verify the noise analysis and modeling of the threshold detection comparator, the prototype CBSC pipeline ADC in [1] is being used. Because the converter was implemented as a cascade of identical stages, the total input-referred noise of the ADC less the kT/C noise of the input sampler is proportional to the comparator noise. Therefore, a spectral analysis of the converter output codes is a measure of the comparator noise.



▲ Figure 2: Threshold-detection comparator timing showing preamplifier noise integration time $t_{i}$ that determines the preamplifier noise bandwidth. Total comparator delay is $t_{d}$.

#### REFERENCES

- [1] T. Sepke, J.K. Fiorenza, C.G. Sodini, P. Holloway, and H.-S. Lee, "Comparator-based switched-capacitor circuits for scaled CMOS technologies," IEEE Int'l. Solid-State Circuits Conf. Digest of Technical Papers, San Francisco, CA, Feb. 2006, pp. 220-221.
- [2] B.J. McCarroll, C.G. Sodini, and H.-S. Lee, "A high-speed CMOS comparator for use in an ADC," IEEE J. of Solid-State Circuits, vol. 23, no. 1, pp. 159-165, Feb. 1988.

# Massively Parallel ADC with Self-Calibration

M. Spaeth, H.-S. Lee Sponsorship: CICS

In this program we are developing an analog-to-digital converter (ADC) that can quantize a wideband 150-MHz signal at 600 mega-samples per second, with signal-to-noise ratio and linearity in excess of 75 dB (12 bits). Use of a massively parallel, time-interleaved architecture with 128 active ADC channels reduces the requisite speed for each channel and enables the devices to be biased in the sub-threshold region for an extremely low-power (<50 mW, core) solution. In a parallel time-interleaved system, any mismatches between channels result in undesired spurious tones. Most existing time-interleaved ADCs either employ a low degree of parallelism, such that the tones appear outside the signal band, or are low enough in resolution that the tones are below the quantization noise floor. In this design, however, all inherent gain, offset, and timing skew mismatches must be calibrated away to achieve the stated high-performance goals.

The 128 14-bit pipeline ADCs are arranged into 16 blocks of 8 channels each, as shown in Figure 1. The hierarchical organization of the design allows individual blocks to be pulled out for background calibration, while the remaining blocks continue to quantize the input signal. Due to the large number of channels to be calibrated, the calibration algorithm must be simple but effective. The sub-radix-2 calibration algorithm [1] is very effective in removing offset and linearity errors but poses a challenge due to its complexity when applied to the massively parallel converter. We have modified the algorithm to allow nominal radix-2 operations to be employed, for similar calibration efficiency with reduced complexity. Also, we are exploring several innovative techniques to calculate and remove systematic timing skew between channels. An additional channel is included in the design to act as a timing reference for some of the timing skew measurement algorithms. A novel token-passing control scheme is used to generate local clock phases for the individual blocks and channels, minimizing the number of clock lines that must be routed across the chip.
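As a quick check on the numbers quoted above, and assuming the aggregate rate is shared equally among the 128 active channels (an idealization that ignores how calibration scheduling is handled), the per-channel sampling rate works out to:

$$
16 \times 8 = 128 \ \text{channels}, \qquad
f_{\text{channel}} = \frac{600\ \text{MS/s}}{128} \approx 4.69\ \text{MS/s}.
$$

A per-channel rate of a few mega-samples per second is what makes sub-threshold biasing of the individual pipeline ADCs plausible.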

The design was fabricated in a 0.18-µm digital CMOS process by National Semiconductor and is currently under test.

A micrograph of the finished chip is shown in Figure 2.



Figure 1: Top-level block diagram of the IMPACT ADC architecture.



Figure 2: Micrograph of the fabricated chip.


#### REFERENCES

- [1] A.N. Karanicolas, H.-S. Lee, and K. Bacrania, "A 15b 1 Ms/s digitally self-calibrated pipeline ADC," IEEE J. of Solid-State Circuits, vol. 28, no. 12, pp. 1207-1215, Dec. 1993.

# **Intelligent Night-Vision Human Detection System**

Y. Fang, I. Masaki, B.K.P. Horn Sponsorship: Intelligent Transportation Research Center

Our objective is to apply machine vision techniques to develop a new generation of night-vision systems with intelligent human detection and identification functions. Infrared-based night-vision systems are increasingly being mounted on vehicles to enhance drivers' visual ability. Such systems do allow drivers to see better, but they also introduce new safety concerns: a driver needs to switch attention between the windshield and a separate infrared display screen. For senior drivers in particular, it remains difficult to identify an abnormal scenario or potential danger in its early stage. For safety purposes, an intelligent human detection and identification system based on infrared video sequences is expected to automatically track pedestrians' locations and to detect potential dangers based on the targets' actions in a monitored environment.

Compared with conventional shape-based pedestrian detection, our new "shape-independent" detection methods include the following two innovations. First, we propose an original "horizontal-first, vertical-second" segmentation scheme that divides infrared images into several vertical image stripes and then searches for pedestrians only within these image stripes. Second, we have defined unique new shape-independent multi-dimensional classification features. We demonstrated both the similarities of these features among pedestrian image regions with different poses and the differences of these features between pedestrian and non-pedestrian regions of interest (ROI). Our preliminary test results (Figures 1 and 2), based on limited sample images, were very encouraging in terms of reliability and accuracy when our algorithms were applied to detect pedestrians with arbitrary poses. Our overall goal is to design systems for future transportation that make driving safer and less stressful for all travelers regardless of age and ability.



#### REFERENCES

- [1] Y. Fang, K. Yamada, Y. Ninomiya, B. Horn, and I. Masaki, "Comparison between infrared-image-based and visible-image-based approaches for pedestrian detection," Proc. of the IEEE Intelligent Vehicles Symposium, 2003, pp. 505-510.
- [2] Y. Fang, K. Yamada, Y. Ninomiya, B. Horn, and I. Masaki, "A shape-independent-method for pedestrian detection with far infrared-images," Special issue on "In-Vehicle Computer Vision Systems," IEEE Transactions on Vehicular Technology, vol. 53, no. 6, pp. 1679-1697, Nov. 2004.

# **Vision-Based System for Occupancy and Posture Analysis**

M. Farrell, I. Masaki, B.K.P. Horn Sponsorship: Intelligent Transportation Research Center

Over the past few years, advances in computer vision have given hope for robust systems for safety applications in cars. In particular, we seek to develop a way to deploy a passenger-side airbag that is aware of the occupant in the passenger seat. The use of such devices may help avoid extensive injury due to airbag deployment in multiple classes of passengers: babies, children, and adults.

Our approach is to use a combination of two computer vision methods: invisible structured lighting and correlation-based stereo matching. The environment of an automobile poses challenges for these methods. Computer vision relies heavily on intensity values to function properly, and the interior of a car is subject to many different lighting conditions. To work around this problem, the monochrome cameras cut light below 850 nm. This filtering restricts the illumination we must consider to the near infrared and produces a monochrome image of the target. Structured lighting improves texture on the passenger seat and further constrains the depth of the target [1]. Projecting a sine-wave grating with stripes of random width onto the scene improves accuracy. Then, using correlation-based window matching together with the brightness values in each image, we avoid "phase ambiguity" when determining which pixel belongs to a specific stripe in each image. This method eliminates the problem of poor depth resolution that arises with unstructured lighting.

Stereo matching applies a correlation window to the "reference" image and scans the other image in the stereo pair for a matching brightness pattern [3]. The correlation window achieves its best results when it is larger than the largest stripe in the image. This constraint on window size avoids areas in the disparity map where depth information is lost because the window is too small. For a good treatment of correlation-window-based matching, see [3].
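The window-matching step can be sketched on a single scanline as follows. This is a simplified illustration, not the implementation used in this work: it uses the sum of absolute differences (SAD) as the similarity measure in place of a correlation score, works in one dimension, and the function name and test data are hypothetical.

```python
def match_disparity(left_row, right_row, x, half_window, max_disparity):
    """Return the disparity d that minimizes the SAD cost on one scanline.

    A 1-D window centred on left_row[x] (the reference image) is compared
    with windows in right_row shifted left by d = 0..max_disparity.
    """
    lo, hi = x - half_window, x + half_window + 1
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if lo - d < 0 or hi > len(left_row):
            continue  # shifted window would fall off the row
        cost = sum(abs(left_row[i] - right_row[i - d]) for i in range(lo, hi))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# The left row is the right row shifted by 3 pixels (made-up brightness values).
right = [0, 0, 5, 9, 2, 7, 1, 0, 0, 0, 0, 0]
left = [0, 0, 0, 0, 0, 5, 9, 2, 7, 1, 0, 0]
print(match_disparity(left, right, x=6, half_window=2, max_disparity=5))
```

A window that is narrower than the projected stripes would see nearly uniform brightness and produce many equally good candidates, which is the failure mode the window-size constraint above is meant to avoid.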



▲ Figure 1: Example of a source target used to obtain depth maps, as in Figure 2. Image was taken with near-infrared sensitivity and no visible light cuts.



▲ Figure 2: Example of disparity map obtained from two-frame stereo matching under structured lighting. The correlation window size is 32 x 32 pixels with maximum disparity of 60 pixels. Brighter colors imply a short distance to the camera and darker colors imply the opposite. Values are in pixels and indicate disparity.

#### REFERENCES

- [1] D. Scharstein and R. Szeliski, "High-accuracy stereo depth maps using structured light," IEEE Proc. of CVPR, 2003, pp. 195-202.
- [2] Y. Zhang, S.J. Kiselewich, and W. Bauson, "A monocular vision-based occupant classification approach for smart airbag deployment," IEEE Intelligent Vehicles Conf. Proc., 2005, pp. 632-637.
- [3] T. Kanade and M. Okutomi, "A stereo-matching algorithm with an adaptive window: Theory and experiment," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 920-932, 1994.

# **Techniques for Low-jitter Clock Multiplication**

B. Helal, M.H. Perrott Sponsorship: MARCO C2S2 (partial)

High-frequency clocks are essential to high-speed digital and wireless applications. The performance of such clocks is measured by the amount of jitter, or phase noise, their outputs exhibit. Phase-locked loops (PLLs) are typically used to generate high-frequency clocks. However, a major disadvantage of PLLs is the accumulation of jitter within their voltage controlled oscillators (VCOs) [1]. Multiplying delay-locked loops (MDLLs) have been developed in recent years to drastically reduce the problem of jitter accumulation in PLLs [2].

Jitter accumulation is reduced in an MDLL by resetting the circulating edge in its ring oscillator using a clean edge from the reference signal. The Select-logic circuitry commands the multiplexer, using the Edge\_select signal, to pass the reference edge instead of the output edge at the proper time, as shown in Figure 1.

The major drawback of a typical MDLL is that it suffers from static delay offset, which causes its output to exhibit deterministic jitter. Static delay offset is caused mostly by phase offset in the phase detector and by various device mismatches. Figure 2 illustrates the problem of static delay offset in a locked MDLL, showing a deterministic jitter of $\Delta$ seconds peak-to-peak.
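The effect can be illustrated numerically with a toy model. It assumes, for illustration only, that in lock the ring oscillator produces N - 1 equal periods per reference cycle and that the period ending at the reference-edge reset absorbs the entire static offset; the function and parameter names are hypothetical.

```python
def mdll_periods(t_ref, n, delta):
    """Output periods over one reference cycle of a locked MDLL (toy model).

    Assumption: N - 1 periods equal t_osc, and the period that ends with the
    reference-edge reset is longer by the static delay offset delta, so that
    the N periods still add up to one reference period.
    """
    t_osc = (t_ref - delta) / n
    return [t_osc] * (n - 1) + [t_osc + delta]


# Illustrative numbers in arbitrary time units: multiply-by-4 with offset 0.2.
periods = mdll_periods(t_ref=10.0, n=4, delta=0.2)
print(periods, max(periods) - min(periods))
```

The periods repeat identically every reference cycle, so the resulting period variation is deterministic rather than random, with a peak-to-peak value equal to the static offset in this model.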

The goal of this research is to develop a technique that detects and cancels static delay offset in MDLLs, thereby allowing their use in applications that require low-jitter, high-frequency clocks. Behavioral simulations were used to validate the feasibility of the technique, and a test chip implementing the proposed approach will be fabricated using National Semiconductor's 0.18-µm process.

We acknowledge National Semiconductor for providing the fabrication services.



Figure 1: MDLL block diagram.



▲ Figure 2: Timing diagram illustrating the problem of static delay offset. Transition time, D seconds, is less than ideal, causing the transition of Out, after the edge reset, to be longer by $\Delta$ seconds.

#### REFERENCES

- [1] B. Kim, T. Weigandt, and P. Gray, "PLL/DLL system noise analysis for low jitter clock synthesizer design," in Proc. Int'l. Symp. Circuits and Systems, vol. 4, 1994, pp. 31-38.
- [2] R. Farjad-Rad et al., "A low-power multiplying DLL for low-jitter multigigahertz clock generation in highly integrated digital chips," IEEE J. Solid-State Circuits, vol. 37, pp. 1804–1812, Dec. 2002.

## Advanced Delay-locked Loop Architecture for Chip-to-Chip Communication

C.-M. Hsu, M.H. Perrott Sponsorship: MARCO C2S2

A challenging component in high-speed data links is the clock and data recovery circuit (CDR). Two primary functions of a CDR are to extract the clock corresponding to the input data and then to resample the input data. The conventional technique uses a phase-locked loop (PLL) to tune the frequency and phase of a voltage-controlled oscillator (VCO) to match that of the input data. In some applications, such as chip-to-chip communication, a reference clock that is perfectly matched in frequency to the signal sequence is available. However, the clock and data signals are often mismatched in phase due to different propagation delays on the PC board. In such cases, using a delay-locked loop (DLL), as shown in Figure 1, instead of a PLL allows for much simpler design, since only a phase adjustment is necessary [1].

The aim of this research is to develop advanced DLL architectures for chip-to-chip communication. In order to provide a fine-resolution, wide-range delay, a digitally adjustable delay element consisting of a sigma-delta fractional-N frequency synthesizer is proposed, as shown in Figure 2 [2]. This new architecture also provides low sensitivity to process, temperature, and voltage variations compared to conventional techniques using analog adjustable delay elements, as shown in Figure 1. In addition, a new sigma-delta modulator architecture is proposed to provide a compact design with reasonable power dissipation.
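To show how a fractional-N divider obtains fine resolution from integer division ratios, the sketch below uses a first-order accumulator as the modulator. This is a generic textbook structure chosen for illustration; it is not the new modulator architecture proposed in this work, and the function and parameter names are hypothetical.

```python
def divider_sequence(n_int, frac_num, frac_den, cycles):
    """Divide-ratio sequence from a first-order (accumulator) modulator.

    Each reference cycle the accumulator adds frac_num; when it reaches
    frac_den it wraps and the divider uses n_int + 1 for that cycle,
    otherwise n_int. Requires 0 <= frac_num < frac_den.
    """
    acc, seq = 0, []
    for _ in range(cycles):
        acc += frac_num
        carry = acc >= frac_den
        if carry:
            acc -= frac_den
        seq.append(n_int + (1 if carry else 0))
    return seq


# Target ratio 10 + 1/4: the divider alternates between 10 and 11.
seq = divider_sequence(n_int=10, frac_num=1, frac_den=4, cycles=8)
print(seq, sum(seq) / len(seq))
```

The average ratio is set by a digital word (here `frac_num` and `frac_den`), which is the property that makes such a synthesizer attractive as a digitally adjustable element.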



Figure 1: DLL-based data recovery circuit with an analog adjustable delay element.



#### REFERENCES

- [1] T.H. Lee and J.F. Bulzacchelli, "A 155-MHz clock recovery delay- and phase-locked loop," IEEE J. Solid-State Circuits, vol. 27, no. 12, pp. 1736-1746, Dec. 1992.
- [2] M.H. Perrott, M.D. Trott, and C.G. Sodini, "A modeling approach for sigma-delta fractional-N frequency synthesizers allowing straightforward noise analysis," IEEE J. Solid State Circuits, vol. 37, pp. 1028-1038, Aug. 2002.

# **Digital Techniques for the Linearization of RF Transmitters**

K. Johnson, M.H. Perrott

Linear and power-efficient transmitters improve mobile communications systems. In a mobile device the power consumption of the transmitter directly affects the battery life of the device. Hence, it is desirable to operate the transmitter at the highest efficiency possible. However, nonlinearity causing distortion and spectral re-growth increases at higher drive levels. Specifications limiting distortion and spectral re-growth force the transmitter to operate at a back-off from the optimal efficiency.

Improving the linearity of the transmitter reduces the required back-off and increases the overall efficiency. Linearization techniques exist for a variety of transmit architectures: IQ modulation, linear amplification using non-linear components (LINC), and envelope elimination and restoration. They all require an additional high linearity analog down-conversion path, increasing the complexity of the transmitter. The more sophisticated algorithms require extensive DSP, limiting the application to more expensive solutions.

We propose a highly digital, algorithmically simple linearization technique suitable for mobile devices with limited DSP capability. The highly digital down-conversion architecture reduces the complexity and chip area cost of the down-conversion path. In combination, a simple DSP algorithm takes advantage of a priori architectural information to estimate the transmitter transfer function.

# Techniques for Highly Digital Implementation of Clock and Data Recovery Circuits

C. Lau, M.H. Perrott Sponsorship: MARCO C2S2

Clock and data recovery (CDR) is a critical function in high-speed digital communication systems. Data received in these systems are asynchronous and noisy, so they must be properly recovered. The CDR circuits must also satisfy stringent specifications defined by communication standards such as the SONET specification. Other desirable performance metrics, such as fast acquisition time, must also be considered.

A conventional CDR, as shown in Figure 1, employs a phase-locked loop with analog components including a phase detector, charge pump, loop filter, and a voltage-controlled oscillator (VCO). Although this analog implementation works well in most current applications, its limitations are already becoming apparent with the scaling of CMOS fabrication technology. For example, this analog system relies on low-leakage capacitors to hold values when the phase-locked loop is locked. The input of the VCO must be held stable in order to minimize frequency drift and jitter in the recovered clock. However, the leakage problem is becoming more significant as CMOS process technology continues to scale.

In view of this problem, we propose a highly digital CDR circuit that leverages digital circuits to achieve high performance; specifically, the circuit achieves fast acquisition and low-jitter performance. As shown in Figure 2, we use a bang-bang phase detector to generate error pulses of fixed width, which are then treated directly as a digital signal in the subsequent digital blocks in the major loop. In this way, we can preserve the digital nature of the control path to the VCO, thus alleviating the need for high-performance, low-leakage analog components. We also utilize a simple analog feedback loop to linearize the nonlinear dynamics of the bang-bang phase detector. Simulation results show that the achievable recovered clock jitter is around 2 ps RMS and verify that this architecture meets the OC-48 SONET specification. The circuit is being designed in a 0.18-µm CMOS process.
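The binary nature of a bang-bang phase detector can be sketched with the common early/late (Alexander-style) decision rule. This rule is assumed here for illustration; the abstract does not specify which bang-bang detector is used, and the function name is hypothetical.

```python
def bang_bang_decision(prev_bit, edge_sample, curr_bit):
    """Early/late decision from three consecutive samples of the data.

    edge_sample is the data value captured midway between the two bit
    samples. Returns -1 if the clock is early, +1 if it is late, and 0 when
    there is no data transition and hence no timing information.
    """
    if prev_bit == curr_bit:
        return 0
    # If the edge sample still equals the previous bit, it was taken before
    # the data transition, so the sampling clock is early.
    return -1 if edge_sample == prev_bit else +1


print(bang_bang_decision(0, 0, 1))  # clock early
print(bang_bang_decision(0, 1, 1))  # clock late
print(bang_bang_decision(1, 1, 1))  # no transition
```

The output carries only the sign of the phase error, not its magnitude, which is why the detector's dynamics are nonlinear and why its output can be handled directly by digital blocks.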




# Digital Implementation and Calibration Technique for High-speed Continuous-time Sigma-Delta A/D Converters

M. Park, M.H. Perrott Sponsorship: MARCO C2S2, Applied Materials

A/D converters are essential building blocks for many applications. Mobile communication devices require low-cost, low-power, high-performance A/D converters. A sigma-delta A/D converter is often chosen for wireless applications because high resolution and wide bandwidth are achievable by increasing the oversampling ratio and designing the appropriate loop filter. A high oversampling ratio is relatively easy to achieve with state-of-the-art digital technology. However, implementing a low-power discrete-time loop filter becomes challenging as the sampling frequency increases. Therefore, a continuous-time sigma-delta A/D converter is better suited to a mobile application than its discrete-time counterpart because of its low power consumption.

Device mismatch is a serious issue for a continuous-time loop filter, however. Since the mismatch of passive and active elements directly degrades the performance, calibration or compensation is necessary to implement a high-resolution, wide-bandwidth A/D converter. In this work, we propose implementing an automatic calibration and compensation technique for a continuous-time loop filter. The proposed architecture is shown in Figure 1.

The core technique is an algorithm that estimates the values of individual components of the loop filter. The spectrum of the output digital signal from the sigma-delta converter contains the quantization noise that is shaped by a noise transfer function, which can be estimated by system identification techniques. A DSP building block is designed to evaluate the parameters of passive and active elements from the estimated noise transfer function. Then, a feedback loop calibrates the passive and active elements. An adaptive digital filter is also employed to deal with non-idealities that cannot be calibrated because of technology limitations, such as finite signal rise and fall times.



Figure 1: Continuous-time sigma-delta A/D converter using digital calibration and compensation.

# **High-performance Time-to-Digital Conversion and Applications**

M. Straayer, M.H. Perrott Sponsorship: Lincoln Laboratory

Time-to-digital converter (TDC) structures are used to quantify time information of a signal event with respect to a reference event. Traditionally, TDCs have found application in experimental physics and laser range-finding. More recently, fully integrated TDCs have attracted significant commercial interest as a core building block for a variety of clocking and phase-locked loop systems and applications [1]. The basic operation of a TDC is shown in Figure 1.

When a TDC is used in closed-loop feedback such as a phase-locked loop (PLL), the resolution of the TDC can limit the noise performance of the system. A simple TDC implementation with digital circuits has a resolution limited to a single inverter delay, as shown in Figure 2(a). More intricate circuit techniques such as Vernier delay lines [2], illustrated in Figure 2(b), achieve more precision, but at the expense of design complexity. This research aims to improve the overall resolution of a TDC with simple and elegant circuit techniques.
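The difference in resolution between the two structures can be sketched with a few lines of arithmetic. The delay values below are arbitrary illustrative numbers in picoseconds, not measurements, and the function names are hypothetical.

```python
def delay_line_tdc(interval_ps, inverter_delay_ps):
    """Code from a simple delay-line TDC: whole inverter delays that fit."""
    return interval_ps // inverter_delay_ps


def vernier_tdc(interval_ps, slow_delay_ps, fast_delay_ps):
    """Code from a Vernier line: the step is the difference of the two delays."""
    step = slow_delay_ps - fast_delay_ps
    return interval_ps // step


interval = 137  # time between the reference event and the signal event
print(delay_line_tdc(interval, 30))   # step of 30 ps
print(vernier_tdc(interval, 30, 25))  # step of 30 - 25 = 5 ps
```

In the simple delay line the quantization step equals one stage delay, whereas in the Vernier arrangement it equals the difference between two stage delays, which can be made much smaller at the cost of a second, matched delay chain.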

The authors wish to acknowledge MIT Lincoln Laboratory for research support through the Lincoln Scholars Program.







▲ Figure 2: Common time-to-digital converter implementations.

#### REFERENCES

- [1] R.B. Staszewski et al., "All-digital PLL and transmitter for mobile phones," IEEE J. of Solid-State Circuits, vol. 40, no. 12, pp. 2469-2482, Dec. 2005.
- [2] P. Dudek, S. Szczepanski, and J.V. Hatfield, "A high-resolution CMOS time-to-digital converter utilizing a Vernier delay line," IEEE J. of Solid-State Circuits, vol. 35, no. 2, pp. 240-247, Feb. 2000.

## Fast Cochlear Amplification with Slow Outer Hair Cells

T.K. Lu, S. Zhak, P. Dallos, R. Sarpeshkar Sponsorship: NSF, David and Lucille Packard Foundation, ONR, Howard Hughes Medical Institute

In mammalian cochleas, outer hair cells (OHCs) produce mechanical amplification over the entire audio-frequency range (up to 100 kHz). Under the "somatic electromotility" theory, mechano-electrical transduction modulates the OHC transmembrane potential, driving an OHC mechanical response that generates cycle-by-cycle mechanical amplification. Yet, though the OHC motor responds up to at least 70 kHz, the OHC membrane RC time constant (in vitro upper limit  $\sim 1000$  Hz) reduces the potential driving the motor at high frequencies. Thus, the mechanism for high-frequency amplification with slow OHCs has been a two-decade-long mystery. Previous models that fit experimental data incorporated slow OHCs but did not explain how the OHC time constant limitation was overcome. Our key contribution is showing that negative feedback due to organ-of-Corti functional anatomy with adequate OHC gain significantly extends closed-loop system bandwidth and increases resonant gain [1]. Figure 1 shows our macromechanical model of the cochlea. The OHCs implement negative feedback by exerting a corrective force on the reticular lamina (designated "rl" in Figure 1) that opposes changes to the system output caused by changes in the input stimuli. Our model produces realistic results (Figure 2) and demonstrates that the OHC gain-bandwidth product, not just bandwidth, determines whether high-frequency amplification is possible. Due to the cochlea's collective traveling-wave architecture, the gain of a single OHC need not be great. The OHC piezoelectricity increases the effectiveness of negative feedback but is not essential for amplification. Thus, emergent closed-loop network dynamics differ significantly from open-loop component dynamics, a generally important principle in complex biological systems.
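The bandwidth-extension argument can be illustrated with the textbook single-pole feedback relation. This is a deliberately simplified sketch, not the macromechanical model of Figure 1: it assumes a first-order open-loop response with a 1-kHz corner (matching the quoted in vitro limit), and the DC gain of 69 and unity feedback factor are hypothetical values. It does not capture the resonant gain increase reported in the text.

```python
def closed_loop(dc_gain: float, corner_hz: float, beta: float):
    """Closed-loop DC gain and bandwidth of A/(1 + s/w0) with feedback factor beta."""
    loop_gain = dc_gain * beta
    gain = dc_gain / (1.0 + loop_gain)
    bandwidth_hz = corner_hz * (1.0 + loop_gain)
    return gain, bandwidth_hz

# Hypothetical values: a slow 1-kHz element inside a loop with gain 69.
open_gain, open_corner_hz = 69.0, 1_000.0
cl_gain, cl_bw_hz = closed_loop(open_gain, open_corner_hz, beta=1.0)

print(f"closed-loop bandwidth: {cl_bw_hz / 1e3:.0f} kHz")
print(f"gain-bandwidth product: open {open_gain * open_corner_hz:.0f}, "
      f"closed {cl_gain * cl_bw_hz:.0f}")
```

In this simplified picture the gain-bandwidth product is conserved, which is why a slow element with enough gain can still support a response at frequencies far above its open-loop corner.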



▲ Figure 1: Macromechanical model of the cochlea composed of local micromechanical sections coupled by fluid. The local micromechanical section is enclosed in the dotted box and is repeated consecutively to simulate the traveling-wave response of the cochlea to input stimuli ( $l_{in}$ ). Standard electrical representation of acoustic analogs was used for simulations. The dependent voltage source is the OHC-force generator with an RC time constant.



▲ Figure 2: Results from simulation of the macromechanical model of the cochlea composed of local micromechanical sections coupled by fluid. (a) Basilar-membrane (BM)-to-stapes-velocity ratio from the model at section 320 (*solid line, blue*) compares favorably with experimental chinchilla data [2] (*red circles*). Volume velocity to linear velocity conversion is 62.7 dB. (b) Phase response from the model matches experimental results [2] (*red circles, brown squares*). Note that the experimental phase data shows "the full range of variation in BM phase data" [2] and is not drawn from the same animal as the magnitude data in (a).

#### REFERENCES

- [1] T.K. Lu, S. Zhak, P. Dallos, and R. Sarpeshkar, "Fast cochlear amplification with slow outer hair cells," *Hear. Res.*, vol. 214, pp. 45-67, 2006.
- [2] M.A. Ruggero, N.C. Rich, L. Robles, and B.G. Shivapuja, "Middle-ear response in the chinchilla and its relationship to mechanics at the base of the cochlea," J. Acoust. Soc. Am., vol. 87, pp. 1612-1629, 1990.

# **Circuits for an RF Cochlea**

S. Mandal, S. Zhak, R. Sarpeshkar Sponsorship: NSF, Center for Bits and Atoms

The RF cochlea uses ideas from the biological cochlea and extends them into RF for performing fast, broadband, low-power spectrum analysis [3, 4]. We have designed and built the first RF cochlea on silicon. Our inspiration, the biological cochlea, is a sophisticated signal processing system that acts as a traveling-wave spectrum analyzer. In healthy humans, it has 120 dB of input-referred dynamic range and consumes only about 14  $\mu$ W of power [1]. The cochlea spatially separates frequency components in incoming sound signals, thereby performing a frequency-to-place transformation. High (or low) frequencies excite peak responses towards the beginning (or end) of the structure.

Electrically, the cochlea can be modeled as an active, nonlinear transmission line with properties that scale exponentially with position [2]. Nonlinear behavior is important in the biological cochlea, particularly for spectral masking and gain control. We have developed a simplified cochlear model that consists of a cascade of unidirectional lowpass filters with exponentially decreasing cutoff frequencies (see Figures 1 and 2). There are several advantages of such a biologically-inspired system. Firstly, exponentially tapered traveling-wave architectures like the cochlea are more hardware-efficient than banks of bandpass filters for performing spectral analysis [1]. As a result, they are simpler and faster than conventional spectral analysis techniques with comparable resolution. Secondly, the RF cochlea has inherently higher dynamic range than audio-frequency silicon cochleas, mainly because integrated passive inductors can be used at RF. Active inductors, which produce  $Q^2$  times as much noise as passive inductors with the same quality factor Q, must be used at audio. Finally, the RF cochlea is a complex signal-processing system that uses collective computation to reduce power consumption and improve dynamic range. It allows us to explore the design, calibration, and control of large systems with many interacting components.



▲ Figure 1: Measured frequency response of five individual cochlear stages with center frequencies one octave apart. The stages were fabricated in 0.18-µm CMOS technology.



▲ Figure 2: Layout of complete RF cochlea chip, to be fabricated in 0.13-µm CMOS technology. The chip contains 46 filter stages, with center frequencies ranging from 8 GHz to 400 MHz.
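The exponential taper can be made concrete with a short calculation. The sketch below assumes the 46 center frequencies are spaced exactly geometrically between the 8-GHz and 400-MHz endpoints quoted in the caption; the actual chip spacing is not stated in the text, so the per-stage ratio here is an inference rather than a design value.

```python
import math

def stage_frequencies(f_start_hz: float, f_end_hz: float, n_stages: int):
    """Geometrically spaced center frequencies, endpoints included."""
    ratio = (f_end_hz / f_start_hz) ** (1.0 / (n_stages - 1))
    return [f_start_hz * ratio ** k for k in range(n_stages)]

N_STAGES = 46
freqs_hz = stage_frequencies(8e9, 400e6, N_STAGES)

# Number of stage-to-stage steps needed to halve the center frequency.
stages_per_octave = (N_STAGES - 1) / math.log2(8e9 / 400e6)

print(f"first: {freqs_hz[0] / 1e9:.2f} GHz, last: {freqs_hz[-1] / 1e6:.0f} MHz")
print(f"about {stages_per_octave:.1f} stages per octave")
```

Under this assumption the cascade covers a little over four octaves with roughly ten stages per octave.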

#### REFERENCES

- [1] R. Sarpeshkar, R.F. Lyon, and C.A. Mead, "A low-power wide-dynamic-range analog VLSI cochlea," *Analog Integrated Circuits and Signal Processing*, vol. 16, no. 3, pp. 245-274, Aug. 1998.
- [2] R.F. Lyon and C.A. Mead, "An analog electronic cochlea," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 36, no. 7, pp. 1119-1134, July 1988.
- [3] S. Zhak, S. Mandal, and R. Sarpeshkar, "A proposal for an RF cochlea," in *Proc. Asia-Pacific Microwave Conference*, New Delhi, India, Dec. 2004.
- [4] S. Mandal, S. Zhak, and R. Sarpeshkar, "Circuits for an RF cochlea," in Proc. IEEE Symposium on Circuits and Systems (ISCAS), Kos, Greece, May 2006.

# An Analog Storage Cell with 5 Electrons/sec Leakage

#### M. O'Halloran, R. Sarpeshkar

Sponsorship: Center for Bits and Atoms, NSF Research Grant, ONR, Catalyst Foundation, David and Lucille Packard Foundation, Swartz Foundation.

Medium-term analog storage offers a compact, accurate, and low-power method of implementing temporary local memory that can be useful in adaptive circuit applications. The performance of these cells is characterized by the sampling accuracy and voltage droop that can be achieved with a fixed level of die area and power. Typically, the droop rate is limited by the OFF-state leakage of a single MOS switch. Past low-leakage switch designs have assumed that subthreshold conduction and drain-to-bulk diode leakage dominate other effects [1-2]. However, measurements of MOS leakage in a 1.5- $\mu$m CMOS process revealed a third important mechanism that can contribute significant leakage [3]. It was demonstrated that incorporating a novel MOS switch topology into a high-accuracy switched-capacitor storage cell can minimize all of the experimentally observed leakages, achieving 10-aA average leakage in a 1.5- $\mu$m process [3]. New experimental data from storage cells fabricated in a 0.5- $\mu$m process (see Figures 1 & 2) exhibit 0.8-aA (5 e-/sec) average leakage, a 100× reduction over the leading alternative cell in the literature [2]. This implies that with a 1-pF storage capacitor and a 3.3-V supply, this cell can store a 12-bit accurate voltage for 14.5 minutes and an 8-bit accurate voltage for 3.9 hours. The leakage reduction between the 1.5- $\mu$m and 0.5- $\mu$m implementations appears to be reasonable based on simple scaling arguments [4].
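The relation between leakage current, electron rate, and hold time can be checked with a few lines of arithmetic. The sketch below assumes a droop budget of one LSB of a 3.3-V full-scale range, which is an assumption and not the authors' stated criterion; it therefore reproduces the order of magnitude and the 16× ratio between the 12-bit and 8-bit hold times, not the exact 14.5-minute and 3.9-hour figures.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19

def electrons_per_second(leakage_a: float) -> float:
    """Convert a leakage current in amperes to electrons per second."""
    return leakage_a / ELEMENTARY_CHARGE_C

def hold_time_s(cap_f: float, droop_budget_v: float, leakage_a: float) -> float:
    """Time for a constant leakage current to droop the capacitor by the budget."""
    return cap_f * droop_budget_v / leakage_a

LEAKAGE_A = 0.8e-18      # 0.8 aA, from the text
CAP_F = 1e-12            # 1 pF, from the text
FULL_SCALE_V = 3.3       # assumption: full supply used as signal range

rate = electrons_per_second(LEAKAGE_A)
t12 = hold_time_s(CAP_F, FULL_SCALE_V / 2 ** 12, LEAKAGE_A)  # 1-LSB budget (assumed)
t8 = hold_time_s(CAP_F, FULL_SCALE_V / 2 ** 8, LEAKAGE_A)

print(f"{rate:.2f} electrons/s")
print(f"12-bit hold: {t12 / 60:.1f} min, 8-bit hold: {t8 / 3600:.1f} h")
```

A smaller usable signal range or a tighter droop budget than the one assumed here would shorten both hold times proportionally.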



▲ Figure 1: Die photograph of 0.5-µm implementation  $(1.5 \text{ mm} \times 1.5 \text{ mm})$ . A differential analog storage cell, which exhibits 5 electrons/sec average leakage current at room temperature is circled in white.





#### REFERENCES

- [1] E. Vittoz, H. Oguey, M.A. Maher, O. Nys, E. Dijkstra, and M. Chevroulet, "Analog storage of adjustable synaptic weights," in VLSI Design of Neural Networks, U. Ramacher and U. Rückert, Eds. Norwell, MA: Kluwer, pp. 47-63, 1991.
- [2] M. Ehlert and H. Klar, "A 12-bit medium-time analog storage device in a CMOS standard process," IEEE J. of Solid-State Circuits, vol. 33, no. 7, pp. 1139-1143, July 1998.
- [3] M. O'Halloran and R. Sarpeshkar, "A 10-nW 12-bit accurate analog storage cell with 10-aA leakage," IEEE J. of Solid-State Circuits, vol. 39, no. 11, pp. 1985-1996, Nov. 2004.
- [4] M. O'Halloran and R. Sarpeshkar, "An analog storage cell with 5 electrons/sec leakage," in Proc. IEEE Int'l. Symposium on Circuits and Systems, pp. 557-560, May 2006.

# A Time-based Energy-efficient Analog-to-Digital Converter

H. Yang, R. Sarpeshkar Sponsorship: Center for Bits and Atoms, NSF Research Grant, ONR

There is an increasing trend in several biomedical applications, such as pulse-oximetry, ECG, PCG, EEG, neural recording, temperature sensing, and blood-pressure monitoring, for signals to be sensed in small portable wireless devices. Analog-to-digital converters for such applications need only modest precision ( $\leq$  8 bits) and modest speed ( $\leq$  40 kHz) but must be very energy-efficient [1-3]. Analog-to-digital converters for implanted medical devices need micropower operation to run on a small battery for decades. We present a bio-inspired analog-to-digital converter that uses successive integrate-and-fire operations, like those of spiking neurons, to perform analog-to-digital conversion on its input current. The proof-of-concept design and implementation in a 0.35-µm process demonstrated very energy-efficient operation [4]. In a 0.18- $\mu$m sub-threshold CMOS implementation, we were able to achieve 8 bits of DNL-limited precision and 7.4 bits of thermal-noise-limited precision at a 45-kHz sample rate with a total power consumption of 960 nW. The energy efficiency of a data converter is derived from the figure of merit presented in [5]. This converter's net energy efficiency of 0.12 pJ/quantization level appears to be the best reported so far. The converter is also very area-efficient (< 0.021 mm<sup>2</sup>) and can be used in applications that need several converters in parallel. Its algorithm allows easy generalization to higher-speed applications through interleaving, to performing polynomial analog computations on its input before digitization, and to direct time-to-digital conversion of event-based cardiac or neural signals.

▲ Figure 1: Die photograph of an 8-bit, 45-kHz A/D converter in the TSMC 0.18- $\mu$m process that consumes 960 nW of total (analog + digital) power. The effective area is 130  $\mu$m × 160  $\mu$m (~0.021 mm<sup>2</sup>).
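The quoted energy per quantization level can be reproduced from the reported power, sample rate, and effective number of bits. The sketch assumes the commonly used form of the figure of merit, power divided by the product of sample rate and 2 raised to the ENOB; the exact definition the authors derive from [5] is not spelled out in the text.

```python
def energy_per_level_j(power_w: float, sample_rate_hz: float, enob_bits: float) -> float:
    """Energy per quantization level: P / (fs * 2**ENOB)."""
    return power_w / (sample_rate_hz * 2.0 ** enob_bits)

# Values reported in the abstract and in the performance table.
fom_j = energy_per_level_j(power_w=960e-9, sample_rate_hz=45e3, enob_bits=7.4)
print(f"{fom_j * 1e12:.3f} pJ per quantization level")
```

Using the thermal-noise-limited 7.4 bits rather than the 8-bit DNL-limited precision gives the more conservative number, which agrees with the "Thermal Noise-Limited" row of the table.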

| Performance Metric                                     | Value                               |
|--------------------------------------------------------|-------------------------------------|
| Technology                                             | MOSIS TSMC 0.18 µm                  |
| Voltage Supply                                         |                                     |
| Analog                                                 | 1.2 Volts                           |
| Digital                                                | 0.75 Volts                          |
| Reference Current                                      | 80 nA                               |
| Input Current Range (w/ DC offset)                     | 10 nA to 320 nA                     |
| Integrating Capacitor                                  | 500 fF                              |
| T<sub>clk</sub>                                        | 1 µs                                |
| Sampling Rate                                          | 45 kHz                              |
| INL                                                    | $\leq \pm 1.0$ LSB [8 bits] typical |
| DNL                                                    | $\leq \pm 0.5$ LSB [8 bits] typical |
| SNR                                                    | 47 dB                               |
| SFDR                                                   | 51 dB                               |
| ENOB                                                   | 7.4 bits                            |
| Power Dissipation                                      |                                     |
| Analog                                                 | 360 nW                              |
| Digital                                                | 600 nW                              |
| Thermal Noise-Limited<br>Energy per Quantization Level | 0.12 pJ/State                       |
| Active Area                                            | $0.021 \text{ mm}^2$                |

Figure 2: Summary of performance specifications.

#### REFERENCES

- [1] V. Aksenov et al., "Biomedical data acquisition systems based on sigma-delta analogue-to-digital converters," 23rd Annual Int. Conf. IEEE EMBS, vol. 4, pp. 3336-3337, Oct. 2001.
- [2] S. Led, J. Fernandez, and L. Serrano, "Design of a wearable device for ECG continuous monitoring using wireless technology," 26th Annual Int. Conf. IEEE EMBS, vol. 2, pp. 3318-3321, Sept. 2004.
- [3] M.J. Moron, E. Casilari, R. Luque, and J.A. Gazquez, "A wireless monitoring system for pulse-oximetry sensors," Systems Communications, pp. 79-84, Aug. 2005.
- [4] H. Yang and R. Sarpeshkar, "A time-based energy-efficient analog-to-digital converter," IEEE J. of Solid-State Circuits, vol. 40, no. 8, pp. 1590-1601, Aug. 2005.
- [5] R.H. Walden, "Analog-to-digital converter survey and analysis," IEEE J. Select. Areas Commun., vol. 17, no. 4, pp. 539-550, Apr. 1999.

#### Optimization of System and Circuit Parameters in Wideband OFDM Systems

F. Edalat, C.G. Sodini Sponsorship: NSF, Texas Instruments

In the wireless gigabit local area network (WiGLAN) research effort, the goal is to achieve gigabit data rates by methods fundamentally different from the proposed IEEE 802.11n, next-generation WLAN. Instead of using multiple antennas for multiplexing to increase the capacity, WiGLAN uses a much wider bandwidth (150 MHz compared to 20 MHz) and adaptive modulation per bin of a multi-carrier system. However, both systems employ orthogonal frequency division multiplexing (OFDM) to combat inter-symbol interference from multipath fading of the indoor channel and to eliminate equalization. We have simulated the WiGLAN system using CppSim [1]. The wideband characteristic of WiGLAN, while enabling high throughput, imposes several challenges. The system simulation is used as one of the initial steps to identify such challenges and to examine optimum system solutions and circuit design techniques. For instance, we are investigating various adaptive modulation techniques to choose the most appropriate one for such systems. In addition to simulation, we have implemented the transceiver's baseband processing on an FPGA (Xilinx Virtex 4) to examine the implementation practicality and performance of the adaptive modulation algorithms in a real wireless environment using our testbed WiGLAN nodes.

In such multi-carrier systems with a frequency-selective channel, higher capacity can be obtained by adapting the modulation of each bin to the channel response over its band (Figure 1). The modulation per bin is selected (Figure 2) based on the estimated signal-to-noise ratio (SNR) per bin at the input of the detector at the receiver and the target bit-error rate (BER) of the overall system. In addition to achieving higher capacity, since adaptive modulation is based on the SNR of each bin independent of other bins, it can avoid interference and, as a result, enable coexistence of two or more wireless systems.
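A per-bin selection rule of this kind can be sketched as follows. The sketch is an illustration under stated assumptions and not the WiGLAN algorithm of Figure 2: it uses the standard approximate BER expression for Gray-coded square QAM on a Gaussian channel, treats SNR as the per-symbol SNR in one bin, and uses a hypothetical target BER of 1e-5. Only the set of constellations (4-, 16-, 64-, and 256-QAM) comes from the text.

```python
import math

QAM_ORDERS = (4, 16, 64, 256)  # constellations listed in Figure 1

def q_function(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def qam_ber(snr_db: float, order: int) -> float:
    """Approximate BER of Gray-coded square QAM at a given per-symbol SNR."""
    snr = 10.0 ** (snr_db / 10.0)
    bits = math.log2(order)
    return (4.0 / bits) * (1.0 - 1.0 / math.sqrt(order)) * q_function(
        math.sqrt(3.0 * snr / (order - 1)))

def select_order(snr_db: float, target_ber: float):
    """Largest constellation meeting the target BER, or None to leave the bin unused."""
    best = None
    for order in QAM_ORDERS:
        if qam_ber(snr_db, order) <= target_ber:
            best = order
    return best

# Hypothetical per-bin SNR estimates in dB.
for snr_db in (0.0, 20.0, 27.0, 40.0):
    print(snr_db, select_order(snr_db, target_ber=1e-5))
```

Returning `None` for a bin whose SNR cannot support even 4-QAM corresponds to leaving that bin empty, which is how per-bin adaptation can steer around a narrowband interferer.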



▲ Figure 1: Adaptive modulation per bin in WiGLAN based on the channel response over each bin. The modulation scheme is chosen from 4-, 16-, 64-, and 256-rectangular QAM modulations.



▲ Figure 2: How the Adaptive modulation algorithm dictates the modulation scheme for each bin in WiGLAN.


#### REFERENCES

- [1] M. Perrott, "CppSim behavioral simulator" [Online]. Available: http://www-mtl.mit.edu/researchgroups/perrottgroup/tools.html.

# **Comparator-based Switched-capacitor Circuits (CBSC)**

J.K. Fiorenza, T. Sepke, P. Holloway, H.-S. Lee, C.G. Sodini Sponsorship: MARCO C2S2, CICS

Two side effects of technology scaling that have a significant impact on analog circuit design are the reduced signal swing and the decrease in intrinsic device gain. Gain is important in feedback-based analog signal processing systems because it determines the accuracy of the output value. Cascoded amplifier stages have been a popular solution to increase amplifier gain, but they further reduce the signal swings of scaled technologies. An alternative method for achieving high gain in an operational amplifier without reducing signal swing is to cascade several lower-gain amplifiers. Nested-Miller compensation approaches can be used to stabilize the cascaded feedback system, but the frequency response of the closed-loop system is significantly sacrificed to ensure stability. In this project [1-2], we explore a new comparator-based switched-capacitor (CBSC) circuit design methodology that eliminates the use of op-amps in sampled-data systems.

A sampled-data system typically operates in two phases, a sampling phase ( $\phi_1$ ) and a charge-transfer phase ( $\phi_2$ ). An important property of these systems is that the output voltage needs to be accurate only at the moment the output is sampled. No constraint is placed on how the stage gets to the final output value. Feedback systems use a high-gain operational amplifier to force a virtual ground condition at the op-amp input. Figure 1a shows the conventional op-amp-based switched-capacitor gain stage. Figure 1b shows the proposed CBSC approach, where a comparator and a current source have replaced the op-amp. Assuming the comparator input  $v_X$  starts below the common-mode voltage at  $v_{X0}$ , the current source charges the output circuit until the comparator detects the virtual ground condition and turns the current source off. At this instant, the output is sampled on  $C_L$ . Because the CBSC design ensures the same virtual ground condition as the op-amp-based design, both circuits produce the same output value at the sampling instant. This property of the CBSC technique is demonstrated by the waveforms for the two circuits shown in Figure 2.

▲ Figure 1: (a) Traditional op-amp-based multiply-by-two amplifier versus (b) proposed comparator-based multiply-by-two amplifier.
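The charge-transfer behaviour described above can be sketched with an idealized behavioural model. Everything specific here is an assumption made for illustration: the capacitor names `c1` and `c2`, the sign conventions, the output preset to 0 V before the ramp, the ideal comparator with zero delay, and the voltage values. The sketch only shows that ramping until the virtual ground condition is detected lands on the same output as an ideal op-amp stage.

```python
def opamp_output(vin, vcm=0.5, vref=0.5, c1=1.0, c2=1.0):
    """Output of an ideal op-amp stage that forces vx == vcm."""
    return vcm + (c1 / c2) * (vcm - vref) + (1.0 + c1 / c2) * (vin - vcm)

def cbsc_output(vin, vcm=0.5, vref=0.5, c1=1.0, c2=1.0,
                ramp_step=1e-4, vo_preset=0.0):
    """Ramp the output until an ideal comparator sees vx reach vcm."""
    # Charge on node X, fixed when both capacitors sample vin against vcm.
    qx = (c1 + c2) * (vcm - vin)
    vo = vo_preset
    while True:
        # Charge conservation at node X during the transfer phase.
        vx = (qx + c1 * vref + c2 * vo) / (c1 + c2)
        if vx >= vcm:          # comparator trips: current source turns off
            return vo          # output is sampled at this instant
        vo += ramp_step        # current source keeps charging the output

vin = 0.6
print(cbsc_output(vin), opamp_output(vin))
```

In this model the residual difference between the two outputs is bounded by the ramp step, which plays the role that finite comparator delay and ramp rate play in a real circuit.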

The CBSC concept is general and can be applied to any sampled-data analog circuit. For example, the CBSC design approach can be applied to a pipelined ADC. A prototype 1.5-b/stage CBSC pipeline ADC was constructed and operates similarly to the op-amp version of the ADC. The prototype CBSC ADC was implemented in a 0.18-µm CMOS technology. The active die area of the ADC is 1.2 mm<sup>2</sup>. At a 7.9-MHz sampling frequency, the DNL is +0.33/-0.28 LSB, and the INL is +1.59/-1.13 LSB. The ADC achieves an SFDR of 62 dB, an SNDR of 53 dB, and an ENOB of 8.7 b for input frequencies up to the Nyquist rate. The core ADC power consumption of all 10 stages of the pipeline converter is 2.5 mW at a 1.8-V power supply, resulting in a 0.8-pJ/b figure of merit.



#### REFERENCES

- [1] T. Sepke, J.K. Fiorenza, C.G. Sodini, P. Holloway, and H.-S. Lee, "Comparator-based switched capacitor circuits for scaled CMOS technologies," in IEEE Int'l. Solid-State Circuits Conf. Dig. of Tech. Papers, San Francisco, CA, Feb. 2006, pp. 220-221.
- [2] J.K. Fiorenza, T. Sepke, C.G. Sodini, P. Holloway, and H.-S. Lee, "Comparator-based switched capacitor circuits for scaled CMOS technologies," IEEE Journal of Solid-State Circuits, Dec. 2006, to be published.

# A Wideband $\Delta \Sigma$ Digital-RF Modulator

A. Jerng, C.G. Sodini Sponsorship: CICS

This research focuses on the implementation of a direct digital-RF transmitter for use in the wireless gigabit local area network (WiGLAN) system that is capable of providing a throughput of 1 Gb/s in the 5.15-5.35 GHz U-NII bands. This architecture takes advantage of digital process scaling trends by replacing high-dynamic-range analog circuits with digital circuits. In the conventional IQ transmitter depicted in Figure 1, the I and Q signal paths from the DAC to the output of the analog mixer must hold noise and distortion at levels that satisfy the required dynamic range of the system. As the baseband signal bandwidth increases, the analog reconstruction filter consumes more power for the same dynamic range. DAC accuracy becomes degraded by dynamic errors at high frequencies rather than static DC errors. Furthermore, as transistors continue to scale and supply voltages continue to decrease, it becomes more challenging to design high-dynamic-range analog circuits over a wide bandwidth.

Direct digital modulation of an RF carrier can eliminate the DAC, reconstruction filter, and analog mixer, resulting in power and area savings. Luschas [1] introduced the RF DAC, which combines a conventional DAC and mixer into one stage. The RF DAC uses one of the high-frequency Nyquist images of the DAC as an RF output. We further develop this concept by modulating an RF carrier using digitally controlled RF phase shifters. In this way, the output power is concentrated at the RF carrier frequency, rather than at DC and at Nyquist image frequencies. Oversampling  $\Delta\Sigma$ concepts are applied to convert digital baseband data into a bitstream of +/-1s, corresponding to phase shifts of 0° and 180°. A 2-level RF phase selector can then be implemented using differential signaling and simple CMOS switches. By applying quadrature RF and baseband components to the phase selectors, we create a quadrature digital modulator capable of arbitrary I/Q modulation, as shown in Figure 2. As the noise-shaping transfer function (NTF) of the baseband  $\Delta\Sigma$  modulators pushes their quantization noise outside the signal bandwidth, a bandpass filter at the output can remove the up-converted quantization noise, acting as an RF reconstruction filter.
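The conversion of baseband data into a ±1 bitstream and then into carrier phases can be sketched with a first-order modulator. The modulator order is an assumption made to keep the example short; the text does not state the order or topology used in the transmitter, and the constant 0.25 input is a hypothetical test signal.

```python
def delta_sigma_bitstream(samples):
    """First-order delta-sigma modulator: inputs in [-1, 1] to a +/-1 bitstream."""
    integrator = 0.0
    bits = []
    for x in samples:
        y = 1 if integrator >= 0.0 else -1   # 1-bit quantizer
        bits.append(y)
        integrator += x - y                  # accumulate the quantization error
    return bits

def phase_degrees(bit: int) -> int:
    """Map a +/-1 bit to the carrier phase selected by the 2-level phase selector."""
    return 0 if bit == 1 else 180

# Hypothetical DC test input: the bitstream average should track it.
n = 4000
bits = delta_sigma_bitstream([0.25] * n)
average = sum(bits) / n
phases = [phase_degrees(b) for b in bits[:8]]
print(average, phases)
```

The running average of the bitstream tracks the input while the quantization error is pushed toward high frequencies, which is the property the output bandpass filter relies on.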

The new transmitter architecture requires circuit design in both the digital and RF domains. The main challenges include designing a high-speed digital  $\Delta\Sigma$  modulator and realizing a high-Q on-chip passive bandpass filter.



#### REFERENCES

- [1] S. Luschas, R. Schreier, and H.-S. Lee, "Radio frequency digital-to-analog converter," IEEE Journal of Solid-State Circuits, vol. 39, no. 9, pp. 1462-1467, Sept. 2004.

# Area- and Power-Efficient Integrated Transceivers for Gigabit Wireless LAN

L. Khuon, C.G. Sodini Sponsorship: MARCO C2S2, CICS

For a given transmit distance and data rate, diversity available from multiple antennas significantly decreases the signal-to-noise ratio (SNR) necessary for low bit error rate transmission. As a result, transceivers for multiple antenna systems with low transmit power, low receiver operating power, and smaller chip area become possible [1-2]. Using multiple antenna systems for wireless LAN increases both data rates and transmission distance. Reduced power and area consumptions for these systems motivate their use for portable applications and allow for a cost-effective on-chip implementation.

As shown in Figure 1, the proper application of this SNR gain balances the decrease in power consumption due to a lower transmit power with the increase in power consumption due to the increase in overhead electronics. However, when overhead electronics power dominates, lowering the receivers' operating power reduces power consumption. Application of SNR gain for area-efficient circuits minimizes area consumption and also reduces power consumption to a lesser degree. An area-efficient 5.22-GHz four-receiver chip, shown in Figure 2, was implemented in 0.18- $\mu$m SiGe BiCMOS. Each receiver has a low-noise amplifier, Q-enhanced image-reject filter, mixer, and local oscillator amplifier, along with distribution circuits for bias and filter tuning. The receivers dissipate 225 mW, occupy 4 mm<sup>2</sup>, and provide 14 dB conversion gain with over 30 dB image rejection [3].







▲ Figure 2: The WiGLAN receivers. Each receiver includes a low noise amplifier, image reject filter, mixer, and local oscillator amplifier but shares the local oscillator and filter tuning signals and bias circuits.

#### REFERENCES

- [1] L. Khuon, E. Huang, C.G. Sodini, and G. Wornell, "Integrated transceiver arrays for multiple antenna systems," in Proc. 61<sup>st</sup> IEEE Vehicular Tech. Conf., Stockholm, Sweden, May 2005, pp. 892-895.
- [2] E. Huang, L. Khuon, C.G. Sodini, and G. Wornell, "An approach for area- and power-efficient low-complexity implementation of multiple antenna transceivers," in Proc. IEEE Radio and Wireless Symposium, San Diego, CA, Jan. 2006, pp. 495-498.
- [3] L. Khuon and C.G. Sodini, "An area-efficient 5-GHz multiple receivers RFIC for MIMO WLAN applications," in *IEEE Radio Frequency Integrated Circuits Symposium*, San Francisco, CA, June 2006, pp. 107-110.

# **Optical-feedback OLED Display Using Integrated Organic Technology**

I. Nausieda, I. Kymissis, V. Bulović, A.I. Akinwande, C.G. Sodini Sponsorship: MARCO C2S2, MARCO MSD

Organic light-emitting diodes (OLEDs) are a promising technology for large, thin, flexible displays. The OLEDs are emissive, thereby removing the need for a backlight and decreasing display thickness and power dissipation. Compared to typical light-valve displays, OLEDs exhibit improved contrast ratio, faster response times, and a larger color gamut. However, OLEDs possess non-linear light output characteristics, and their response drifts over time due to operational degradation. This degradation produces pixel-to-pixel variation in output characteristics and decreases the display's overall lifetime. We propose to drive OLEDs to the desired brightness using optical feedback on the pixel level. Preliminary research [1] has shown that feedback will improve the display lifetime six- to tenfold. This project aims to build a complete system that encompasses the design and fabrication of an integrated silicon control chip and an organic pixel/imaging array, which will together form a stable, usable display.

The integrated silicon control chip is composed of multiple channels, each of which contains two main blocks: the current sensing block and the feedback compensation block (Figure 1). The former is a transimpedance amplifier that converts the organic photodetector output current to a voltage. The feedback compensation block stabilizes the loop, ensuring a desirable response time and phase margin, and is implemented using a National Semiconductor 0.35-µm CMOS process. The organic pixel/imager array consists of organic field effect transistors (OFETs) that select and control OLED pixels and photodetectors. The OFET and photoconductor electrode arrays are fabricated using a photolithographic process [2] (Figure 2). Thermal ink-jet printing of the organic photoconductor is currently being explored as a way to save photolithographic steps.

We acknowledge National Semiconductor for providing the fabrication services.



▲ Figure 1: A sample 3 x 3 portion of the pixel/imager array. The display pixels in a row are driven simultaneously in a column-parallel architecture.



▲ Figure 2: Micrograph of photolithographically fabricated OFET select transistor and photodetector.

#### REFERENCES

- [1] E.T. Lisuwandi, "Feedback circuit for organic LED active-matrix display drivers," Master's thesis, Massachusetts Institute of Technology, Cambridge, 2002.
- [2] I. Kymissis, C.G. Sodini, A.I. Akinwande, and V. Bulović, "An organic semiconductor-based process for photodetecting applications," IEEE Int'l. Electron Devices Meeting Tech. Dig., Dec. 2004.

# A 77-GHz Receiver for Millimeter Wave Imaging

J. Powell, K.M. Nguyen, C.G. Sodini Sponsorship: NSF, Lincoln Laboratory

Millimeter-wave (MMW) integrated circuits have recently generated a great deal of interest for several applications, including automotive radar, concealed weapons detection, and wireless communications in the 60-GHz industrial, scientific, medical (ISM) band. This is due, in part, to the rapid advancement of silicon germanium (SiGe) technology, which achieves cutoff and maximum oscillation frequencies ( $f_T$ ,  $f_{MAX}$ ) exceeding 200 GHz [1]. In this research, a 77-GHz receiver and transmitter will be designed for imaging applications including automotive radar and concealed weapons detection. Several key transceiver circuits have been designed and submitted for fabrication, including a 77-GHz two-stage LNA, a VCO, and a double-balanced mixer. These blocks were also assembled together as a first step toward implementing an RF receiver system (Figure 1). The LNA is expected to achieve less than 6 dB NF at 77 GHz, with a gain of 23 dB; the VCO is expected to achieve a tuning range of greater than 12%, spanning from 68 GHz to 77 GHz. The separate LNA and VCO blocks are depicted in Figure 2. The double-balanced mixer is expected to achieve a noise figure of approximately 13 dB at 77 GHz, with a conversion gain of approximately 9 dB. A 77-GHz class AB power amplifier is currently being designed for the 77-GHz transmitter.
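The expected block-level numbers can be combined into a rough front-end noise figure using the standard Friis cascade formula. This is a back-of-the-envelope sketch, not a result from the abstract: it assumes the LNA drives the mixer directly with matched interfaces, takes the LNA noise figure at its 6-dB upper bound, and uses expected rather than measured values.

```python
import math

def db_to_linear(value_db: float) -> float:
    return 10.0 ** (value_db / 10.0)

def cascaded_noise_figure_db(stages) -> float:
    """Friis formula for a chain of (noise_figure_db, gain_db) stages."""
    total_factor = 1.0
    gain_so_far = 1.0
    for nf_db, gain_db in stages:
        total_factor += (db_to_linear(nf_db) - 1.0) / gain_so_far
        gain_so_far *= db_to_linear(gain_db)
    return 10.0 * math.log10(total_factor)

# Expected values from the text: LNA (6 dB NF, 23 dB gain), mixer (13 dB NF, 9 dB gain).
front_end = [(6.0, 23.0), (13.0, 9.0)]
nf_db = cascaded_noise_figure_db(front_end)
print(f"cascaded noise figure: {nf_db:.2f} dB")
```

Under these assumptions the 23-dB LNA gain suppresses the mixer's contribution to roughly a tenth of a decibel, so the receiver noise figure is set almost entirely by the LNA.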



▲ Figure 1: Layout photo of front end RX system composed of the 77-GHz LNA, VCO and Mixer. (Mixer courtesy of Helen Kim of Lincoln Laboratory.)



Figure 2: VCO layout (top) and LNA layout (bottom).


#### REFERENCES

- [1] B.A. Floyd, S.K. Reynolds, U. Pfeiffer, and T. Beukema, "SiGe bipolar transceiver circuits operating at 60 GHz," IEEE J. of Solid-State Circuits, vol. 40, no. 1, pp. 156-167, Jan. 2005.

# **Realization of Baseband DSP Core for the Wireless Gigabit LAN**

J.K. Tan, K.M. Nguyen, C.G. Sodini Sponsorship: NSF, CICS

The wireless gigabit LAN (WiGLAN) aims to achieve a high data rate of 1 Gbps through the combination of orthogonal frequency division multiplexing (OFDM), a wide bandwidth of 128 MHz, and adaptive modulation. Adaptive modulation decisions are based on the channel conditions, which stay static on the order of tens of milliseconds to a couple of seconds. Hence, to demonstrate the WiGLAN concept, a baseband DSP core is implemented to adapt to the channel conditions in real time.

The hardware platform of choice is a field programmable gate array (FPGA). The baseband design implemented is shown in Figure 1. The DSP core is integrated with an RF front end [1], and wireless measurements are taken. Figure 2 shows that the measured SNR/BER of each sub-carrier closely matches that of a theoretical Gaussian channel.



▲ Figure 2: Theoretical curve for a 16-QAM Gaussian channel and the measured SNR/BER for each 16-QAM sub-channel.
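For reference, a theoretical curve of the kind plotted in Figure 2 can be generated from the standard nearest-neighbor approximation for Gray-coded square 16-QAM in additive white Gaussian noise. The sketch below assumes that SNR means the symbol-energy-to-noise ratio Es/N0 on a sub-carrier; the exact expression and SNR definition used for the figure are not stated in the text, so this is an illustration rather than a reproduction.

```python
import math


def qam16_ber(snr_db):
    """Approximate bit error rate of Gray-coded 16-QAM over an AWGN channel.

    Uses BER ~ (3/4) * Q(sqrt(SNR / 5)), written with erfc as
    (3/8) * erfc(sqrt(SNR / 10)), where SNR = Es/N0 (assumed definition).
    """
    snr = 10.0 ** (snr_db / 10.0)
    return 0.375 * math.erfc(math.sqrt(snr / 10.0))


# Tabulate the curve over a range of sub-carrier SNR values.
for snr_db in range(10, 26, 5):
    print(f"{snr_db} dB -> BER ~ {qam16_ber(snr_db):.2e}")
```

Measured (SNR, BER) pairs for each sub-carrier can then be overlaid on this curve to judge how close the wireless channel is to the Gaussian model.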

#### REFERENCES

[1] N. Matalon, "An implementation of a 5.25-GHz transceiver for high data-rate wireless applications," Master's thesis, Massachusetts Institute of Technology, Cambridge, 2005.

#### Channel-and-Circuits-Aware, Energy-Efficient Coding for High-speed Links

N. Blitvic, M. Lee, L. Zheng, V. Stojanović Sponsorship: MARCO IFC

To achieve high throughput while satisfying energy and density constraints, both the data rates and the energy efficiency of high-speed chip-to-chip interconnects need to increase. In this project, we aim to extend the link system design to incorporate energy-efficient channel coding techniques. Using novel energy-efficient coding techniques for non-Gaussian noise and residual interference, we will increase both the achievable data rates and the energy efficiency of links by drastically off-loading the low-BER target burden and hence decreasing the complexity of the equalization/modulation level (Figure 1). Presently, both a statistical simulator and an experimental setup are being developed to streamline the code design process. The statistical simulator will be the first link simulator to include channel coding and the effects of data correlation. The current focus is on modeling the residual inter-symbol interference (ISI), but the approach will be extended to deal with cross-talk, timing jitter, and other circuit-related effects. Our recent developments have addressed the difficulty of computing ISI probability distributions for realistic channel lengths in the presence of data correlation in the form of a single parity bit. This approach is presently being extended to linear block codes.
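To make the notion of an ISI probability distribution concrete, the sketch below computes it for the simpler baseline case of independent, equiprobable binary symbols, where the distribution is the convolution of the per-tap contributions. This is only the uncorrelated starting point; handling the correlation introduced by a parity bit or a block code, which is the subject of this project, is not attempted here. The tap values and function name are hypothetical.

```python
from collections import defaultdict


def isi_distribution(taps, decimals=12):
    """Probability mass function of residual ISI for i.i.d. +/-1 symbols.

    taps: residual (post-equalization) pulse-response samples, excluding
    the main cursor. Each tap contributes +h or -h with probability 1/2,
    so the PMF is built by convolving one tap at a time.
    Returns a dict mapping ISI value -> probability.
    """
    dist = {0.0: 1.0}
    for h in taps:
        updated = defaultdict(float)
        for value, prob in dist.items():
            for symbol in (-1.0, 1.0):
                # Round so that equal sums reached by different paths merge.
                key = round(value + symbol * h, decimals)
                updated[key] += 0.5 * prob
        dist = dict(updated)
    return dist


# Hypothetical residual taps, normalized to the main cursor.
pmf = isi_distribution([0.10, 0.05, 0.02])
worst_case = max(pmf)
print(f"{len(pmf)} ISI levels, worst case {worst_case:.2f}")
```

Building the distribution tap by tap avoids enumerating all symbol patterns at once, although with distinct tap values the number of levels can still grow with channel length unless the values are binned.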

The resulting simulator will provide the capability to model data correlation either as a plug-in for the existing analytical statistical link simulators or as the basis of time-domain behavioral link simulation software. To mitigate the inadequacies of analytical system models (Figure 2), which are limited by system complexity and link-specific noise sources, we consider advanced statistical methods based on modifications of the standard Monte Carlo technique. The generality of the Monte Carlo technique will allow us to accurately capture the system's complexity in a behavioral time-domain framework, without resorting to the overly restrictive simplifications (such as linearity) necessary in the fully analytical approach. Furthermore, sample-size reduction techniques, such as importance sampling, coupled with conditioning through our interference calculation methods, will allow us to efficiently simulate very low target BERs not reachable by standard Monte Carlo simulation. The promise of this approach lies in large-deviations theory and the theory of asymptotically efficient estimators.
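The benefit of importance sampling at very low error rates can be illustrated on a toy problem: estimating the probability that unit-variance Gaussian noise exceeds a decision margin. The sketch below uses mean shifting, one common biasing choice, and is not the estimator developed in this project; real link noise is not Gaussian, and the function names, sample count, and seed are assumptions made for the example.

```python
import math
import random


def tail_probability_is(threshold, num_samples=20000, seed=0):
    """Estimate P(N > threshold), N ~ Normal(0, 1), by importance sampling.

    Samples are drawn from a Normal(threshold, 1) biasing density so that
    about half of them land in the error region. Each hit is weighted by
    the likelihood ratio p(x)/q(x) = exp(-threshold*x + threshold**2 / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / num_samples


def tail_probability_exact(threshold):
    """Closed-form Gaussian tail, used only to check the estimate."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


estimate = tail_probability_is(6.0)
exact = tail_probability_exact(6.0)
print(f"importance sampling: {estimate:.3e}, exact: {exact:.3e}")
```

A plain Monte Carlo run would need on the order of billions of samples to observe even a few events at this probability, whereas the biased estimator reaches percent-level accuracy with tens of thousands.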



▲ Figure 1: Model of the high-speed link where the decoder/encoder replaces the serializer/deserializer blocks. By relaxing the target BER, channel coding will have the benefit of lowering the energy associated with timing and equalization.



▲ Figure 2: Error incurred in modeling the link noise as additive white Gaussian noise (AWGN). As shown, the simplification can be adequate at high BER but becomes largely inaccurate by the time we reach the target BER range (~$10^{-15}$) [1].

#### REFERENCES

[1] V. Stojanović and M. Horowitz, "Modeling and analysis of high-speed links," in Proc. Custom Integrated Circuits Conference, San Jose, CA, Sept. 2003, pp. 589-594.

### Efficiency of High-speed On-Chip Interconnect: Trade-off and Optimization

B. Kim, V. Stojanović Sponsorship: NEC research fund

Signaling over global on-chip wires has been an increasingly difficult problem for the last several generations of VLSI technologies. As the technology scales, global wires scale poorly, causing a large increase in module-to-module communication latency. Traditionally, repeater insertion [1] is used to overcome the latency problem, but the power consumption of the signaling increases due to the high-speed requirement on the repeaters. To address the limited latency and energy efficiency of repeater chains, alternative techniques such as RF modulation [2] and pulse-width modulation [3] have been suggested.
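For readers unfamiliar with repeater insertion, the textbook delay-optimal design can be sketched with a first-order RC model. The code below uses the classical 0.7/0.4 Elmore-style coefficients and their closed-form optimum; it optimizes delay only and ignores the power and area constraints that motivate this project. All device and wire values are hypothetical placeholders, not data for any particular process.

```python
import math


def repeated_wire_delay(k, h, r0, c0, rw, cw):
    """First-order delay of a wire split into k repeated segments.

    k: number of segments, h: repeater size relative to minimum,
    r0, c0: output resistance and input capacitance of a minimum repeater,
    rw, cw: total wire resistance and capacitance.
    """
    r_drv = r0 / h       # resistance of a repeater h times minimum size
    c_in = c0 * h        # its input capacitance
    r_seg = rw / k
    c_seg = cw / k
    per_segment = (0.7 * r_drv * (c_seg + c_in)
                   + r_seg * (0.4 * c_seg + 0.7 * c_in))
    return k * per_segment


def optimal_repeaters(r0, c0, rw, cw):
    """Closed-form delay-optimal segment count and repeater size."""
    k_opt = math.sqrt(0.4 * rw * cw / (0.7 * r0 * c0))
    h_opt = math.sqrt(r0 * cw / (rw * c0))
    return k_opt, h_opt


# Hypothetical values: 10 kOhm / 1 fF minimum repeater, 1 kOhm / 2 pF wire.
R0, C0, RW, CW = 1e4, 1e-15, 1e3, 2e-12
k_opt, h_opt = optimal_repeaters(R0, C0, RW, CW)
delay = repeated_wire_delay(k_opt, h_opt, R0, C0, RW, CW)
print(f"k = {k_opt:.1f}, h = {h_opt:.0f}, delay = {delay * 1e12:.0f} ps")
```

In practice the segment count is rounded to an integer, and designs that also account for energy typically use fewer and smaller repeaters than this delay-only optimum.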

These past studies, however, have not considered the interconnect as part of a dense on-chip network, which must be optimized for area-normalized metrics such as cross-sectional throughput and power density rather than single-link metrics such as throughput and power consumption. Given global constraints such as power and total die area, the designer must jointly optimize interconnect circuits and wires to find the best trade-off between the energy dissipated in circuits and in wires. In this project, we aim to establish a framework for the analysis and comparison of various interconnect methods under a set of performance and cost metrics including bandwidth, latency, chip area, and power consumption.
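The area-normalized metrics can be made concrete with a small calculation. The sketch below assumes that a single-ended link occupies one wire pitch (width plus spacing) of cross-section, which is one plausible normalization; the exact definition used in this work is not given in the text. The function name and the numbers in the example are hypothetical.

```python
def link_density_metrics(data_rate_gbps, power_mw, wire_width_um, wire_space_um):
    """Normalize single-link throughput and power by the wire pitch.

    Returns data rate density (Gbps/um), power density (mW/um), and
    energy per bit (pJ/bit, since 1 mW / 1 Gbps = 1 pJ/bit).
    """
    pitch_um = wire_width_um + wire_space_um
    return {
        "data_rate_density_gbps_per_um": data_rate_gbps / pitch_um,
        "power_density_mw_per_um": power_mw / pitch_um,
        "energy_per_bit_pj": power_mw / data_rate_gbps,
    }


# Two hypothetical links with the same data rate and power but different pitch.
narrow = link_density_metrics(4.0, 2.0, 0.25, 0.25)
wide = link_density_metrics(4.0, 2.0, 0.5, 0.5)
print(narrow["data_rate_density_gbps_per_um"], wide["data_rate_density_gbps_per_um"])
```

The two links are identical by single-link metrics, yet the narrow one delivers twice the cross-sectional throughput at twice the power density, which is the distinction the area-normalized view is meant to expose.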

Figure 1 shows power density versus data rate density for an optimized repeater-inserted interconnect in a predicted bulk 32-nm CMOS process model, for target delay-to-symbol period ratios Nd = 1, 2, and 4. Figure 2 shows power density versus data rate density for an optimized pulse-width modulation interconnect in the same 32-nm CMOS model. The latency of this point-to-point link is one bit time at the highest data rate (equivalent to the Nd = 1 repeater case). The trade-off curves are calculated with all practical design parameters (such as driver size, wire width, and wire spacing) optimized to meet the given performance specifications. The two figures show that pulse-width modulation is a more energy-efficient signaling method than repeater insertion at comparable data rate density. Our analytical method also identifies the best interconnect design for given performance specifications.



▲ Figure 1: Power density (mW/µm) versus data rate density (Gbps/µm) of repeater-inserted interconnect for given delay-to-symbol period ratios (Nd = Td/Ts = 1, 2, 4).



▲ Figure 2: Power density (mW/µm) versus data rate density (Gbps/µm) of PWP with a delay-to-symbol period ratio of approximately one (Nd ≈ 1).

#### REFERENCES

- [1] R. Ho, K. Mai, and M. Horowitz, "Efficient on-chip global interconnect," Dig. of Tech. Papers, IEEE Int'l. Symposium on VLSI Circuits, June 2003, pp. 271-274.
- [2] R.T. Chang, C.P. Yue, and S.S. Wong, "Near speed-of-light on-chip electrical interconnect," Dig. of Tech. Papers, IEEE Int'l. Symposium on VLSI Circuits, June 2002, pp. 18-21.
- [3] D. Schinkel, E. Mensink, E. Klumperink, E. van Tuijl, and B. Nauta, "A 3Gb/s transceiver for RC-limited on-chip interconnects," in Proc. of IEEE Int'l. Solid State Circuits Conf., Feb. 2005.