Frontiers of Information Technology & Electronic Engineering www.jzus.zju.edu.cn; engineering.cae.cn; www.springerlink.com ISSN 2095-9184 (print); ISSN 2095-9230 (online) E-mail: jzus@zju.edu.cn



## Supplementary materials for

Hadi JAHANIRAD, 2023. Dynamic power-gating for leakage power reduction in FPGAs. *Front Inform Technol Electron Eng*, 24(4):582-598. https://doi.org/10.1631/FITEE.2200084

## 1 Effects of power-domain granularity on energy saving

In this section, we investigate the effect of power-domain granularity on power-gating efficiency. Suppose that a power domain consumes leakage power  $P_{\text{leakage}}(\text{on})$  in active mode and  $P_{\text{leakage}}(\text{off})$  in sleep mode, where  $P_{\text{leakage}}(\text{off}) \leq P_{\text{leakage}}(\text{on})$ . Suppose further that each transition from active to sleep mode, or vice versa, requires an energy of  $E_{\text{tr}}$  joules. This energy is due to the inrush current and to charging and discharging the internal nodes of the modules in the power domain (the worst-case scenario).

Furthermore, the power consumed by the power controller,  $P_{\text{ov}}(\text{PC})$ , and by the other additional resources should be included in the calculations. In Eq. (S2),  $P_{\text{ov}}(\text{tile})$  and  $P_{\text{PCS}}(\text{tile})$  are the power consumption of the retention latches and of the routing resources of the power control signal (PCS), respectively. Let  $T_{\text{idle}}$  be the idle time of the power domains during a system operation time of  $T_{\text{total}}$ . Because of the additional resources required for power gating (PC, PCS, footer/header transistors, retention latches, etc.),  $P_{\text{leakage-PG}}(\text{on})$  and  $P_{\text{dynamic-PG}}$  are greater than  $P_{\text{leakage-ungated}}$  and  $P_{\text{dynamic-ungated}}$  of the conventional field programmable gate array (FPGA) architecture, respectively.

Suppose that the power-domain granularity can be as small as one tile (Fig. S1), which comprises a configurable logic block (CLB), its neighboring switch boxes (SBs), and the related connection boxes (CBs) (H\_CB and V\_CB). The energy consumed in  $T_{\text{total}}$  seconds can then be calculated for a conventional (ungated) tile and for a power-gated tile (1-tile granularity) according to Eqs. (S1) and (S2), respectively.

$$E_{\text{ungated}} = (P_{\text{leakage-ungated}} + P_{\text{dynamic-ungated}}) \times T_{\text{total}}, \tag{S1}$$

$$E_{1\text{-tile}} = P_{\text{leakage}}(\text{off}) \times T_{\text{idle}} + (P_{\text{ov}}(\text{PC}) + P_{\text{ov}}(\text{tile}) + P_{\text{PCS}}(\text{tile})) \times T_{\text{total}} + N_{\text{tr}} \times E_{\text{tr}} + (P_{\text{leakage-PG}}(\text{on}) + P_{\text{dynamic-PG}}) \times (T_{\text{total}} - T_{\text{idle}}), \tag{S2}$$

$$E_{N_{\text{PD}}\text{-tile}} = \beta_1 P_{\text{leakage}}(\text{off}) \times T'_{\text{idle}} + (\alpha_1 P_{\text{ov}}(\text{PC}) + \alpha_2 P_{\text{ov}}(\text{tile}) + \alpha_3 P_{\text{PCS}}(\text{tile})) \times T_{\text{total}} + N'_{\text{tr}} \times E'_{\text{tr}} + (\beta_2 P_{\text{leakage-PG}}(\text{on}) + \beta_3 P_{\text{dynamic-PG}}) \times (T_{\text{total}} - T'_{\text{idle}}). \tag{S3}$$

The first term in Eqs. (S2) and (S3) is the leakage energy consumed during the idle time, and the second term is the energy overhead of the power-gating modules that are always on. The third term is the energy of the  $N_{\text{tr}}$  transitions, and the fourth term is the leakage and dynamic energy consumed during the active time.
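Comparing Eqs. (S1) and (S2) term by term gives a rough break-even condition for 1-tile power gating. The shorthand symbols  $P_{\text{act-ungated}}$ ,  $P_{\text{act-PG}}$ , and  $P_{\text{ovh}}$  are introduced here only for compactness and are not part of the original notation; the rearrangement assumes  $P_{\text{act-PG}} > P_{\text{leakage}}(\text{off})$ .

$$
\begin{aligned}
P_{\text{act-ungated}} &= P_{\text{leakage-ungated}} + P_{\text{dynamic-ungated}},\\
P_{\text{act-PG}} &= P_{\text{leakage-PG}}(\text{on}) + P_{\text{dynamic-PG}},\\
P_{\text{ovh}} &= P_{\text{ov}}(\text{PC}) + P_{\text{ov}}(\text{tile}) + P_{\text{PCS}}(\text{tile}),\\
E_{1\text{-tile}} < E_{\text{ungated}} &\iff T_{\text{idle}} > \frac{(P_{\text{act-PG}} - P_{\text{act-ungated}} + P_{\text{ovh}}) \times T_{\text{total}} + N_{\text{tr}} \times E_{\text{tr}}}{P_{\text{act-PG}} - P_{\text{leakage}}(\text{off})}.
\end{aligned}
$$

In words, the idle time must be long enough to recoup the always-on overhead, the extra active-mode power of the gated tile, and the transition energy.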

The parameters and variables in Eqs. (S1)–(S3) are defined as follows.  $E_{\text{ungated}}$ ,  $P_{\text{leakage-ungated}}$ , and  $P_{\text{dynamic-ungated}}$  are the energy, leakage power, and dynamic power consumed by an ungated tile, respectively.  $T_{\text{total}}$  and  $T_{\text{idle}}$  are the total operation time and the idle time of the tile, respectively.  $P_{\text{leakage}}(\text{off})$  and  $P_{\text{leakage-PG}}(\text{on})$  are the leakage power consumed by the tile in sleep mode and active mode, respectively.  $P_{\text{dynamic-PG}}$  is the dynamic power consumed by a tile in its active mode.  $P_{\text{ov}}(\text{PC})$ ,  $P_{\text{ov}}(\text{tile})$ , and  $P_{\text{PCS}}(\text{tile})$  are the power consumption of the power controller, of the overhead modules of a tile, and of the PCS module, respectively.  $N_{\text{tr}}$  and  $E_{\text{tr}}$  are the number of transitions and the energy per transition of a tile, respectively. When the power-domain granularity increases to  $N_{\text{PD}}$  tiles, the parameters change as expressed in Eq. (S3). We take  $\alpha_1$ ,  $\alpha_2$ , and  $\alpha_3$  as the reduction coefficients for  $P_{\text{ov}}(\text{PC})$ ,  $P_{\text{ov}}(\text{tile})$ , and  $P_{\text{PCS}}(\text{tile})$ , respectively. Moreover,  $\beta_1$ ,  $\beta_2$ , and  $\beta_3$  are the corresponding coefficients for  $P_{\text{leakage}}(\text{off})$ ,  $P_{\text{leakage-PG}}(\text{on})$ , and  $P_{\text{dynamic-PG}}$ , respectively.
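As a minimal sketch, Eqs. (S1) and (S2) can be evaluated with a few lines of Python. The class name `TilePower`, the function names, and all numerical values below are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class TilePower:
    """Per-tile power figures in watts (all example values are hypothetical)."""
    leak_ungated: float  # P_leakage-ungated
    dyn_ungated: float   # P_dynamic-ungated
    leak_off: float      # P_leakage(off)
    leak_on_pg: float    # P_leakage-PG(on)
    dyn_pg: float        # P_dynamic-PG
    ov_pc: float         # P_ov(PC)
    ov_tile: float       # P_ov(tile)
    pcs_tile: float      # P_PCS(tile)


def energy_ungated(p: TilePower, t_total: float) -> float:
    """Eq. (S1): energy of a conventional (ungated) tile in joules."""
    return (p.leak_ungated + p.dyn_ungated) * t_total


def energy_one_tile(p: TilePower, t_total: float, t_idle: float,
                    n_tr: int, e_tr: float) -> float:
    """Eq. (S2): energy of a power-gated tile with 1-tile granularity."""
    sleep = p.leak_off * t_idle                              # first term
    overhead = (p.ov_pc + p.ov_tile + p.pcs_tile) * t_total  # second term
    transitions = n_tr * e_tr                                # third term
    active = (p.leak_on_pg + p.dyn_pg) * (t_total - t_idle)  # fourth term
    return sleep + overhead + transitions + active


# Hypothetical example: 1 s of operation, 60% idle, 100 mode transitions.
example = TilePower(leak_ungated=10e-6, dyn_ungated=20e-6, leak_off=1e-6,
                    leak_on_pg=11e-6, dyn_pg=21e-6,
                    ov_pc=0.5e-6, ov_tile=0.3e-6, pcs_tile=0.2e-6)
e_ungated = energy_ungated(example, t_total=1.0)
e_gated = energy_one_tile(example, t_total=1.0, t_idle=0.6, n_tr=100, e_tr=1e-9)
print(f"ungated: {e_ungated:.3e} J, 1-tile gated: {e_gated:.3e} J")
```

Keeping the four terms of Eq. (S2) as separate variables makes it easy to see which one dominates for a given idle fraction.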

The idle time of this power domain is the overlap of the idle-time durations of all  $N_{\text{PD}}$  tiles, so it is less than the idle time of its individual tiles ( $T'_{\text{idle}} < T_{\text{idle}}$ ). A granularity of  $N_{\text{PD}}$  tiles reduces the number of power-controller outputs, and therefore the share of the power-controller overhead ( $\alpha_1 < 1$ ). The energy of a transition between sleep and active modes may be less than  $N_{\text{PD}} \times E_{\text{tr}}$ . For instance, using one footer transistor for the whole  $N_{\text{PD}}$ -tile domain, instead of the  $N_{\text{PD}}$  separate footers of the 1-tile granularity, reduces the transition energy because of the lower capacitance. The power overhead due to data retention ( $P_{\text{ov}}(\text{tile})$ ) remains roughly constant ( $\alpha_2 \approx 1$ ), because the same data must be retained in the 1-tile and  $N_{\text{PD}}$ -tile granularities. On the other hand, increasing the granularity of the power domain reduces the PCS routing resources, so the related power overhead ( $P_{\text{PCS}}(\text{tile})$ ) decreases accordingly ( $\alpha_3 < 1$ ). The off-state leakage power of one tile in the  $N_{\text{PD}}$ -tile granularity is less than that of its 1-tile counterpart ( $\beta_1 < 1$ ), and so is the on-state leakage power ( $\beta_2 < 1$ ). The dynamic power of  $N_{\text{PD}}$  tiles in the 1-tile granularity is slightly larger than in the  $N_{\text{PD}}$ -tile granularity ( $\beta_3 < 1$ ), because the former uses more resources (more footer/header transistors, etc.).
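A minimal Python sketch of Eq. (S3) shows how the coefficients and the shorter domain idle time interact. The function name, the coefficient values, and all power and time figures are hypothetical; the transition energy passed in is treated as the per-tile share, which is an assumption of this sketch. With all coefficients set to 1 the function reduces to Eq. (S2).

```python
def energy_pd_tile(p_off, p_ov_pc, p_ov_tile, p_pcs, p_leak_on, p_dyn,
                   t_total, t_idle, n_tr, e_tr,
                   alpha=(1.0, 1.0, 1.0), beta=(1.0, 1.0, 1.0)):
    """Eq. (S3): per-tile energy (J) for a power domain of N_PD tiles.

    alpha scales P_ov(PC), P_ov(tile), P_PCS(tile); beta scales
    P_leakage(off), P_leakage-PG(on), P_dynamic-PG. t_idle, n_tr and e_tr
    are the domain-level T'_idle, N'_tr and E'_tr.
    """
    a1, a2, a3 = alpha
    b1, b2, b3 = beta
    sleep = b1 * p_off * t_idle
    overhead = (a1 * p_ov_pc + a2 * p_ov_tile + a3 * p_pcs) * t_total
    transitions = n_tr * e_tr
    active = (b2 * p_leak_on + b3 * p_dyn) * (t_total - t_idle)
    return sleep + overhead + transitions + active


# Hypothetical per-tile powers in watts, times in seconds.
powers = dict(p_off=1e-6, p_ov_pc=0.5e-6, p_ov_tile=0.3e-6, p_pcs=0.2e-6,
              p_leak_on=11e-6, p_dyn=21e-6, t_total=1.0)

# 1-tile granularity: all coefficients equal 1, tile idle for 0.6 s.
e_one = energy_pd_tile(**powers, t_idle=0.6, n_tr=100, e_tr=1e-9)

# N_PD-tile granularity: smaller overheads, but the overlapped idle time
# drops to 0.4 s (made-up values for alpha, beta, N'_tr and E'_tr).
e_pd = energy_pd_tile(**powers, t_idle=0.4, n_tr=60, e_tr=0.8e-9,
                      alpha=(0.5, 1.0, 0.5), beta=(0.9, 0.95, 0.98))
print(f"1-tile: {e_one:.3e} J, N_PD-tile: {e_pd:.3e} J")
```

With these made-up numbers the N_PD-tile estimate comes out higher than the 1-tile one: the loss of idle time outweighs the savings from the reduction coefficients. Other parameter choices can reverse the ordering.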

Increasing the granularity of the power domain lowers all of the above parameters, which is desirable, except for the idle time, whose reduction is not. Because the idle time decreases, almost all the tiles benefit less from power gating. If all the tiles in a power domain belong to a single power-gating region (PGR), the most efficient granularity is the one that covers all of these tiles. If modules of more than one PGR occupy the tiles of a power domain, the idle time of the domain is the overlap of the idle times of those PGRs, which reduces the benefit of power gating in that domain. Consequently, an efficient granularity is one that maximizes the match between the PGRs and the power domains.
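The overlap of idle times can be sketched as an interval intersection. The function names and the idle intervals below are hypothetical, and the sketch assumes that each PGR's idle periods are given as a sorted list of disjoint `(start, end)` pairs in arbitrary time units.

```python
from functools import reduce


def intersect(a, b):
    """Intersect two sorted lists of disjoint (start, end) intervals."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance the list whose current interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def domain_idle(idle_per_pgr):
    """Idle intervals of a power domain shared by several PGRs."""
    return reduce(intersect, idle_per_pgr)


def total_time(intervals):
    """Total duration covered by a list of disjoint intervals."""
    return sum(end - start for start, end in intervals)


# Hypothetical idle periods of three PGRs that share one power domain.
pgr_a = [(0, 40), (60, 100)]
pgr_b = [(20, 70)]
pgr_c = [(10, 30), (50, 90)]

shared = domain_idle([pgr_a, pgr_b, pgr_c])
print("domain idle intervals:", shared)
print("domain idle time:", total_time(shared))
print("per-PGR idle times:", [total_time(p) for p in (pgr_a, pgr_b, pgr_c)])
```

The domain can sleep only while every PGR it hosts is idle, so its idle time can never exceed that of the least idle PGR.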

For instance, Fig. S2 illustrates the final placement of an arbitrary circuit containing six modules. Two levels of granularity are shown in this figure: a 1-tile granularity in red and a 16-tile granularity in black. Portions of the M4, M5, and M6 modules are placed in the PD1 power domain, so the idle time of PD1 is reduced to the overlap of the idle times of these modules. Furthermore, despite the low utilization of PD2 ( $\approx$ 20% by M3), the idle time of M3 determines the idle time of all the tiles belonging to this power domain.

## 2 Leakage mechanisms in FPGA

The sub-threshold current occurs when  $V_{GS}$  (voltage difference between gate and source terminals) is less than  $V_{th}$  (threshold voltage of the transistor), and is calculated according to

$$
\begin{aligned}
I_{\text{sub}} &= \mu_0 C_{\text{ox}} \frac{W}{L} (m-1) \times v_{\text{T}}^2 \times e^{(V_{\text{GS}} - V_{\text{th}})/(m v_{\text{T}})} \times (1 - e^{-V_{\text{DS}}/v_{\text{T}}}),\\
m &= 1 + \frac{3t_{\text{ox}}}{W_{\text{dm}}},
\end{aligned}
\tag{S4}
$$

where  $\mu_0$ ,  $C_{\text{ox}}$ ,  $t_{\text{ox}}$ ,  $W$ , and  $L$  are the device mobility, oxide capacitance, oxide thickness, transistor width, and channel length, respectively. Further,  $v_{\text{T}}$  and  $W_{\text{dm}}$  are the thermal voltage and maximum depletion layer width, respectively.
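Eq. (S4) can be evaluated numerically as in the following minimal sketch. The function names and every device value are hypothetical and do not describe any particular process; the sketch also assumes that  $C_{\text{ox}}$  is the oxide capacitance per unit area and that  $v_{\text{T}} = kT/q$ .

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C
EPS_OX = 3.9 * 8.854e-12  # permittivity of SiO2, F/m (assumed gate oxide)


def thermal_voltage(temp_k):
    """v_T = kT/q in volts."""
    return K_B * temp_k / Q_E


def slope_factor(t_ox, w_dm):
    """m = 1 + 3*t_ox/W_dm, the second relation in Eq. (S4)."""
    return 1.0 + 3.0 * t_ox / w_dm


def subthreshold_current(mu0, c_ox, w, l, m, v_t, v_gs, v_th, v_ds):
    """I_sub from Eq. (S4), in amperes when SI units are used."""
    return (mu0 * c_ox * (w / l) * (m - 1.0) * v_t ** 2
            * math.exp((v_gs - v_th) / (m * v_t))
            * (1.0 - math.exp(-v_ds / v_t)))


# Hypothetical device: 2 nm oxide, 20 nm depletion width, W/L = 2, 300 K.
t_ox, w_dm = 2e-9, 20e-9
m = slope_factor(t_ox, w_dm)
v_t = thermal_voltage(300.0)
i_off = subthreshold_current(mu0=0.03, c_ox=EPS_OX / t_ox, w=100e-9, l=50e-9,
                             m=m, v_t=v_t, v_gs=0.0, v_th=0.3, v_ds=1.0)
print(f"m = {m:.2f}, v_T = {v_t * 1e3:.2f} mV, I_sub = {i_off:.3e} A")
```

Under this equation, raising  $V_{\text{GS}}$  by  $m v_{\text{T}} \ln 10$  multiplies the current by ten, which is the usual sub-threshold swing.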

Three main factors affect the sub-threshold current:

1. An increase in temperature increases the sub-threshold current;

2. The sub-threshold current increases as  $V_{\text{th}}$  decreases. Consequently, the body effect (due to non-zero  $V_{\text{BS}}$ ), which increases  $V_{\text{th}}$ , reduces the sub-threshold current;

3. Drain-induced barrier lowering (DIBL) occurs when the drain and source terminals are at different voltages ( $V_{\text{DS}} > 0$ ). DIBL decreases  $V_{\text{th}}$  and therefore increases the sub-threshold current.
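The three trends above can be checked against Eq. (S4) with a short sketch. It makes several simplifying assumptions that are not taken from the paper: the constant prefactor  $\mu_0 C_{\text{ox}} (W/L)(m-1)$  is divided out, the mobility and  $V_{\text{th}}$  are held fixed with temperature, DIBL is modeled to first order as  $V_{\text{th}} = V_{\text{th0}} - \eta V_{\text{DS}}$ , and the body effect is represented by a fixed positive shift of  $V_{\text{th}}$ . The names `eta` and `body_shift` and all values are hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def i_sub_norm(temp_k, v_th, v_gs=0.0, v_ds=1.0, m=1.3):
    """Eq. (S4) divided by the prefactor mu0*C_ox*(W/L)*(m-1), in V^2."""
    v_t = K_B * temp_k / Q_E
    return (v_t ** 2 * math.exp((v_gs - v_th) / (m * v_t))
            * (1.0 - math.exp(-v_ds / v_t)))


def effective_vth(v_th0, v_ds, eta=0.0, body_shift=0.0):
    """First-order threshold model (an assumption of this sketch):
    DIBL lowers V_th by eta*V_DS, the body effect raises it by body_shift."""
    return v_th0 - eta * v_ds + body_shift


base = i_sub_norm(300.0, effective_vth(0.3, 1.0))
hot = i_sub_norm(350.0, effective_vth(0.3, 1.0))                       # factor 1
body = i_sub_norm(300.0, effective_vth(0.3, 1.0, body_shift=0.05))     # factor 2
dibl = i_sub_norm(300.0, effective_vth(0.3, 1.0, eta=0.1))             # factor 3

print(f"hotter / base:      {hot / base:.2f}")
print(f"body effect / base: {body / base:.2f}")
print(f"DIBL / base:        {dibl / base:.2f}")
```

Ratios above 1 indicate more leakage than the baseline and ratios below 1 indicate less, so the temperature and DIBL cases should exceed 1 while the body-effect case should fall below it.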

The second leakage mechanism is related to the gate leakage current, which exists in both the on and off states. The amount of this current in the off state is negligible. Generally, a large  $V_{GS}$  along with a small  $V_{DS}$  results in a larger gate leakage current.



Fig. S1 Internal circuitry of a tile



Fig. S2 A circuit placement and two different granularities



Fig. S3 A typical basic logic element architecture

Table S1 Synthetic circuit properties

| Circuit         | Modules                                              | Number of input–outputs | Number of basic logic elements |
|-----------------|------------------------------------------------------|-------------------------|--------------------------------|
| C2_1            | elliptic+s38417                                      | 155                     | 5268                           |
| C2_2            | clma+s298                                            | 473                     | 4287                           |
| C3_1            | elliptic+s38417+s298                                 | 164                     | 5319                           |
| C3_2            | diffeq+frisc+s38584                                  | 499                     | 9761                           |
| C4_1            | ex5p+s298+apex2+seq                                  | 199                     | 2577                           |
| C4_2            | spla+diffeq+misex3+s38417                            | 267                     | 8483                           |
| C5_1            | apex4+tseng+seq+pdc+clma                             | 785                     | 9574                           |
| C5_2            | s38417+ex5p+diffeq+ex1010+s298                       | 265                     | 7325                           |
| C6_1            | seq+tseng+apex4+pdc+spla+misex3                      | 423                     | 8204                           |
| C6_2            | spla+ex5p+s298+seq+apex2+ex1010                      | 292                     | 5809                           |
| C7_1            | misex3+spla+s38417+pdc+seq+tseng+apex4               | 557                     | 12 838                         |
| C8_1            | clma+ex1010+s298+spla+misex3+spla+apex2+seq          | 781                     | 11 908                         |
| C9_1            | s298+apex2+seq+alu4+elliptic+tseng+s38417+apex4+ex5p | 563                     | 12 427                         |