# Robust Reference Clock Generator Design for DDR Synchronous Devices <sup>§</sup>

Chua-Chin Wang<sup>†</sup>, Yih-Long Tseng, and Chi-Wen Chen

Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan 80424, email: ccwang@ee.nsysu.edu.tw

## Abstract

The rapidly improving performance of the latest CPUs demands higher clock rates from peripheral devices. Moreover, DDR (double data rate) has become one of the most important methods of increasing system throughput, e.g., in SDRAM. The edges of the reference clock thus become critically important to these high-speed, high-clock-rate applications. In this paper, we present a pulse generator circuit that generates pulses corresponding to the rising edge and the falling edge of a given clock, respectively, without any phase shift or delay. These pulse trains can be used to synchronize the peripherals. Noise rejection is also demonstrated when the given clock is coupled with 10% noise. The proposed circuit can be applied to clock rates beyond 133 MHz as long as the sizes of the delay elements are properly tuned.

**Key words :** synchronous devices, DDR (double data rate), clock generation, phase shift, delay cancellation

<sup>†</sup>The contact author

## **1. Introduction**

The clock of PC mainboards has been raised from 33 MHz to 133 MHz, while data acquisition has also moved to the DDR scheme. Correct clock edges thus become critical to this kind of high-speed, high-clock-rate application [2], [3]. The conventional method of converting a single clock source into two out-of-phase clocks simply uses one inverter, as shown in Fig. 1. However, the inverter intrinsically adds a parasitic delay to the generated pulses. The delay might cause either a functional error in high-speed applications or a phase shift in certain phase-sensitive peripherals [4], [5], [6], [7]. Hence, it is crucial to generate two out-of-phase pulse trains without any delay when only one clock is given. Meanwhile, the delay caused by the inverter is likely to vary with the noise produced by the process or the supplies, so a noise rejection property is also required. We present a novel design that adds a differential amplifier to the clock edge detection circuit to cope with the mentioned problems [1]. Then, a modified NAND circuit with a delay-adjustable feedback is presented to generate a phase-clocked pulse train. The proposed design is verified using CADENCE and HSPICE, and the entire design is carried out in the TSMC 1P4M 0.35 $\mu$m CMOS technology.

<sup>§</sup>This research was partially supported by the National Science Council under grants NSC 89-2215-E-110-014 and 89-2215-E-110-015.

## 2. Clock Generator Circuitry

To increase the data throughput, the transmission scheme has been upgraded from SDR (single data rate) to DDR. This feature demands that two correctly out-of-phase pulses be generated from a single clock. The generated clock pulses can be used to synchronize internal pipelines or output data buffers. To resolve this problem, the proposed design is partitioned into two parts, a clock edge detection circuit and a pulse generator circuit, as shown in Fig. 2.

#### 2.1. Clock edge detection circuit

Because the inverter in Fig. 1 unavoidably introduces a delay in the pulse clock corresponding to the falling edge of the given clock, we propose another design in Fig. 3, where CLK and CLKB are simultaneously fed into two clock edge detection (CED) circuits followed by their corresponding pulse generators. The CED is composed of a PMOS-pair differential amplifier, as shown in Fig. 3 [1]. The operation of the CED is summarized as follows.

- 1). When $V_{IN} > V_{INB}$, the output node of the differential amplifier is pulled down to a low level. Then, the output of the CED, BUFOUT, is pulled high.
- 2). When $V_{IN} < V_{INB}$, the output node of the differential amplifier is pulled up to a high level. Then, the output of the CED, BUFOUT, is pulled low.
- 3). Since the differential amplifier has a high rejection ratio for common-mode noise, which is introduced either by process variations or by supply noise, the noise immunity is greatly improved.
- 4). The PMOS pair is used instead of an NMOS pair because the PMOS pair can be placed inside a single N-well, which isolates it from the noise coupled through the substrate.
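The CED behavior in items 1) to 3) can be sketched as an ideal comparator. The following is a minimal behavioral sketch in Python, not a model of the transistor-level circuit in Fig. 3: the function name `ced_output` and the noise amplitudes are illustrative assumptions, and the 3.3 V supply is taken from the Fig. 6 caption.

```python
def ced_output(v_in: float, v_inb: float) -> int:
    """Idealized CED: BUFOUT is high (1) when V_IN > V_INB, otherwise low (0)."""
    return 1 if v_in > v_inb else 0


VDD = 3.3  # supply voltage in volts (from the Fig. 6 caption)

# Common-mode noise shifts both inputs by the same amount, so the sign of
# their difference, and therefore BUFOUT, is unchanged in this ideal model.
for noise in (-0.3, 0.0, 0.3):  # illustrative noise amplitudes in volts
    bufout_clk = ced_output(VDD + noise, 0.0 + noise)   # CLK high, CLKB low
    bufout_clkb = ced_output(0.0 + noise, VDD + noise)  # CLK low, CLKB high
    print(f"noise={noise:+.1f} V -> BUFOUT={bufout_clk} (CLK high), "
          f"BUFOUT={bufout_clkb} (CLK low)")
```

A real differential amplifier has a finite common-mode rejection ratio, so this sketch only illustrates the principle; it does not quantify the rejection achieved by the circuit.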

#### 2.2. Pulse generator

The pulse generator, shown in Fig. 4, responds to the output of the CED to generate an internal clock. It consists of a NAND-based responding circuit composed of M8 through M11, a keeper composed of M12 and inverter INV4, an output inverter INV5, and a feedback delay circuit formed by INV2 and INV3. The detailed operation is described as follows.

- 1). When the input signal at PIN is low, M8 and the keeper respond and hold the node PRES at a high voltage. Meanwhile, the output, POUT, goes low. In the steady state, M9 is off and M11 is on.
- 2). As soon as PIN turns high, M10 is turned on while M11 is initially on. Hence, PRES is pulled to ground through M10 and M11, which in turn pulls POUT to the VDD level.
- 3). The low voltage at PRES propagates through INV3 and INV2 to cut off M11 and turn on M9. Hence, the voltage at PRES returns to a high level.
- 4). The positive pulse width, therefore, is determined by the delay of the feedback network, i.e., INV2 and INV3.
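The four steps above can be sketched as a discrete-time behavioral model. This is a hedged illustration only: the function `pulse_generator`, the sampled 0/1 waveforms, and the delay expressed in samples are assumptions made for the sketch, not part of the original design, which is an analog transistor circuit.

```python
def pulse_generator(pin, feedback_delay):
    """Return POUT samples (0/1) for a list of PIN samples (0/1).

    A rising edge on PIN starts a pulse (PRES pulled low, POUT high).
    The pulse ends after `feedback_delay` samples, mimicking the
    INV3/INV2 feedback path that cuts off the pull-down.
    While PIN is low, PRES is held high, so POUT stays low.
    """
    pout = []
    remaining = 0                       # samples left in the current pulse
    previous = pin[0] if pin else 0     # avoids a spurious edge at the start
    for level in pin:
        if level == 1 and previous == 0:
            remaining = feedback_delay  # rising edge detected
        if level == 0:
            remaining = 0               # PIN low forces POUT low
        pout.append(1 if remaining > 0 else 0)
        if remaining > 0:
            remaining -= 1
        previous = level
    return pout


# Two clock periods of 8 samples each, and the complementary clock.
clk = [0, 0, 0, 0, 1, 1, 1, 1] * 2
clkb = [1 - s for s in clk]

rise_pulses = pulse_generator(clk, feedback_delay=2)
fall_pulses = pulse_generator(clkb, feedback_delay=2)
print(rise_pulses)
print(fall_pulses)
```

Feeding the clock and its complement through two identical generators yields one pulse train aligned with the rising edges and one aligned with the falling edges, each with a width set only by the modeled feedback delay.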

The pulse width is predicted to be proportional to the delay of INV2 and INV3. Fig. 5 shows a simulation result using the TSMC 0.35 $\mu$m 1P4M CMOS process to verify this prediction. The width of the PMOS transistors in INV2 and INV3 is set to 4.0 $\mu$m, while that of the NMOS transistors is set to 2.0 $\mu$m. Referring to Fig. 5, the pulse width is almost linear in the length of the MOS transistors. This result simplifies the determination of the duty cycle for a given clock frequency.
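To illustrate how a near-linear width-versus-length relation simplifies sizing, the sketch below converts a target duty cycle into a pulse width and then inverts an assumed linear fit. The fit coefficients `SLOPE` and `INTERCEPT`, the 25% duty-cycle target, and both function names are hypothetical placeholders; the actual values would have to be read from Fig. 5.

```python
def pulse_width_for_duty(duty_cycle: float, f_clk_hz: float) -> float:
    """Pulse width in seconds that gives the requested duty cycle."""
    return duty_cycle / f_clk_hz


def length_for_width(width_s: float, slope_s_per_um: float,
                     intercept_s: float) -> float:
    """Invert an assumed linear fit: width = intercept + slope * length."""
    return (width_s - intercept_s) / slope_s_per_um


# Hypothetical fit coefficients, NOT taken from Fig. 5.
SLOPE = 1.0e-9      # seconds of pulse width per micrometre of length
INTERCEPT = 0.5e-9  # seconds

width = pulse_width_for_duty(0.25, 133e6)          # 25% duty at 133 MHz
length = length_for_width(width, SLOPE, INTERCEPT)
print(f"width = {width * 1e9:.2f} ns, length = {length:.2f} um")
```

Because the relation is close to linear, a single fit of this kind would cover the whole tuning range, which is what makes retargeting the circuit to another clock frequency straightforward.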

## 3. Simulations and Measurement

Using the TSMC (Taiwan Semiconductor Manufacturing Company) 0.35 $\mu$m 1P4M CMOS process, we realize the proposed design to meet the DDR requirement of PC-133 mainboards.

#### 3.1. Post-layout simulations

**Glitch Rejection :** Fig. 6 shows a common scenario in which the external clock is coupled with periodic glitch noise. The proposed design can reject glitches whose magnitude exceeds 1/2 VDD.

**Noise Rejection :** Fig. 7 shows another scenario in which the external clock is contaminated with high-frequency noise. Again, the noise is filtered out before it reaches the internal clocks.

**Power Failure Sensitivity :** The power supply voltage is likely to drop owing to aging problems or defective batteries, and such a drop can drastically affect the performance of the clocks. Fig. 8 shows the result when the supply voltage is reduced by 10%. The sensitivity of the edge jitter is 0.62 ns/V.
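As a rough worked example, the simulated sensitivity can be turned into an expected edge shift for the 10% supply drop. The sketch assumes that the sensitivity is constant over the whole drop (a linear approximation that the text does not state) and uses the 3.3 V supply from the Fig. 6 caption; the variable names are illustrative.

```python
VDD = 3.3                    # nominal supply in volts (from the Fig. 6 caption)
DROP_FRACTION = 0.10         # 10% supply reduction, as in Fig. 8
SENSITIVITY_NS_PER_V = 0.62  # simulated edge-jitter sensitivity
F_CLK_HZ = 133e6             # external clock frequency

delta_v = VDD * DROP_FRACTION                   # supply drop in volts
edge_shift_ns = SENSITIVITY_NS_PER_V * delta_v  # assumes linear behavior
period_ns = 1e9 / F_CLK_HZ                      # clock period in ns

print(f"supply drop   = {delta_v:.2f} V")
print(f"edge shift    = {edge_shift_ns:.3f} ns")
print(f"clock period  = {period_ns:.3f} ns")
print(f"shift/period  = {edge_shift_ns / period_ns:.1%}")
```

Under these assumptions the edge shift stays at a few percent of the 133 MHz clock period.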

#### 3.2. Chip characteristics

A prototype chip (or IP, i.e., intellectual property) is shown in Fig. 9. The physical implementation of the proposed clock generator has been approved for fabrication on silicon by TSMC and CIC under the number S35-90A-15u. Table 1 summarizes the characteristics of the chip.

| Parameter                   | Value                      |
|-----------------------------|----------------------------|
| die area (with pads)        | $983 \times 898\ \mu m^2$  |
| core area (without pads)    | $122 \times 74\ \mu m^2$   |
| max. freq.                  | 178 MHz                    |
| power dissipation @ 133 MHz | 13.3 mW                    |
| power dissipation @ 178 MHz | 15.3 mW                    |
| transistor count            | 44                         |

Table 1 : The characteristics of the chip

Fig. 10 is the die photo, which shows that the proposed design has been realized on silicon. Fig. 11 shows the measured waveforms given a 133-MHz external clock, while Fig. 12 shows the waveforms when Vdd is dropped by 10%. They are adequate to show the functional correctness of the chip. Table 2 lists the measurements of the physical chip obtained with the IMS-200 tester.

Table 2 : Measurements of the chip

| Parameter       | Value    |
|-----------------|----------|
| max. freq.      | 153 MHz  |
| power @ 133 MHz | 131.8 mW |
| max. power      | 138.6 mW |
| sensitivity     | 0.7 ns/V |

## 4. Conclusion

A simple and noise-insensitive design for internal clock generation is presented. It is highly suitable for DDR applications. It can be implemented either as a single chip or as an IP (intellectual property) block to be included in a system chip.

## References

- [1] P. R. Gray and R. G. Meyer, "Analysis and design of analog integrated circuits," 3rd edition, John Wiley & Sons, Inc., 1993.
- [2] P. Larsson and C. Svensson, "Impact of clock slope on true single phase clocked (TSPC) CMOS circuits," *IEEE J. of Solid-State Circuits*, vol. 29, no. 6, pp. 723-726, June 1994.
- [3] T. Takimoto, N. Fukunaga, M. Kubo, and N. Okabayshi, "High speed SI-OEIC (OPIC) for optical pickup," *IEEE Trans. on Consumer Electronics*, vol. 44, no. 1, pp. 137-142, Feb. 1998.
- [4] A. Terukina, T. Nozawa, Y. Suzuki, A. Hino, S. Koyama, and A. Moritani, "A high precision (+/- 100 ppm) CMOS clock generator," *IEEE 1993 Custom Integrated Circuits Conference*, pp. 27.3.1-27.3.4, 1993.
- [5] C.-Y. Yang, G.-K. Dehng, J.-M. Hsu, and S.-I. Liu, "New dynamic flip-flops for high-speed dual-modulus prescaler," *IEEE J. of Solid-State Circuits*, vol. 33, no. 10, pp. 1568-1571, Oct. 1998.
- [6] T. Yoshimura, H. Kondoh, Y. Matsuda, and T. Sumi, "A 622-Mb/s bit/frame synchronizer for high-speed backplane data communication," *IEEE J. of Solid-State Circuits*, vol. 31, no. 7, pp. 1063-1066, July 1996.
- [7] C.-C. Wang, Y.-T. Chien, and Y.-P. Chen, "A practical load-optimized VCO design for low-jitter 5V 500 MHz digital phase-locked loop," *VLSI Design*, vol. 11, no. 2, pp. 107-113, June 2000.



Figure 1: Traditional out-of-phase clock generator



Figure 2: Block diagram of the proposed DDR clock generator



Figure 3: Schematic view of the clock edge detector (CED)



Figure 4: Schematic view of the NAND-based pulse train generator



Figure 5: Diagram of delay vs. length of MOSs in the feedback inverters



Figure 6: Post-layout simulation waveforms of rejecting glitches

(3.3V VDD, 133 MHz external clock)







Figure 8: Post-layout simulation waveforms of tolerating 10% VDD drop



Figure 9: Chip Layout



Figure 10: Die photo



Figure 11: Measured waveforms given a 133 MHz clock

(Tester timing-diagram screenshot, window title "Timing Diagrams - cicims01"; 7.500 ns cycle, 1.500 ns step size)

Figure 12: Measured waveform when Vdd drops 10%