# Message delay time distribution analysis for controller area network under errors

Lei-ming ZHANG, Yi-chao SUN, Yong LEI ${ }^{\ddagger}$<br>State Key Laboratory of Fluid Power \& Mechatronic Systems, Zhejiang University, Hangzhou 310027, China<br>E-mail: lmzhang@zju.edu.cn; syc_best@163.com; ylei@zju.edu.cn<br>Received Dec. 8, 2017; Revision accepted May 13, 2018; Crosschecked June 11, 2019


#### Abstract

Controller area network (CAN) is a widely used fieldbus protocol in various industrial applications. To understand the network behavior under errors for the optimal design of networked control systems, the message response time of the CAN network needs to be analyzed. In this study, a novel delay time distribution analysis method for the response messages is proposed when considering errors. In this method the complex message queues are decomposed into typical message patterns and cases. First, a stochastic fault model is developed, and the probability factor is defined to calculate the error distribution. Then the message delay time distribution for the single slave node configuration is analyzed based on the error distribution. Next, based on the delay time distribution analysis of typical patterns and cases, an analysis framework of message delay time distribution for the master/slave configuration is developed. The testbed is constructed and case studies are conducted to demonstrate the proposed methodology under different network configurations. Experimental results show that the delay time distributions calculated by the proposed method agree well with the actual observations.


Key words: Controller area network; Message delay; Probability distribution; Errors
https://doi.org/10.1631/FITEE.1700815

## 1 Introduction

Since its standardization in the 1980s, the controller area network (CAN) protocol has gained wide acceptance across various applications, including vehicle systems, networked automation systems, and more recently, airplane sensor-actuator systems (Farsi et al., 1999). However, the probability of a CAN system suffering from faults increases as the complexity of network topologies and system functions grows rapidly. Errors in the system, contributed by factors such as vibration, electromagnetic interference, and the effects of aging on the cables, can delay message transmission, which degrades network performance and reduces the stability of the overall system. To achieve a robust CAN network design with optimal bandwidth usage and to evaluate the real-time performance of networked control systems, research on network behaviors in terms of message queues and message response time distributions under errors is of great importance.

In the literature, message response time analysis for a CAN network has drawn attention since the early 1990s. Since it was difficult to calculate the message response time exactly, the worst-case response time (WCRT) was studied extensively. Tindell et al. (1995b) proposed a method to estimate the bound of the WCRT of a given message. The WCRT analysis method has been extended to account for errors, fixed priority, different controllers, and other circumstances (Tindell and Burns, 1994; Tindell et al., 1994, 1995a). Navet et al. (2000) calculated the worst-case deadline failure probability (WCDFP) considering random errors rather than deterministic errors for the CAN network. Davis et al. (2007) improved the WCRT analysis method by resolving the flaws in the original schedulability analysis for CAN messages. WCRT analysis and optimal priority assignment policies have been introduced (Davis et al., 2011, 2013; Davis and Navet, 2012) under different constraints such as the first in first out (FIFO) queueing policy and arbitrary deadlines. Broster et al. (2005) demonstrated the unreasonable and conservative aspects of the error model proposed in Navet et al. (2000), and conducted the WCRT analysis of messages under a Poisson distribution fault model. Mubeen et al. (2014, 2015) calculated the WCRT for periodic, sporadic, and mixed messages in CAN networks by integrating the effect of hardware and software limitations in the CAN controller. Yomsi et al. (2012) proposed an extendible framework built upon the transaction model to analyze the WCRT of non-preemptive CAN frames with offsets. However, the existing WCRT methods are too conservative in practice, resulting in over-design and under-utilization of network resources and bandwidth.

The message response time analysis from the perspective of probability and statistics for the CAN network has also been studied in the literature. Kumar et al. (2009) proposed a response time distribution analysis method for CAN networks based on the deterministic stochastic Petri net (DSPN), where the message response time distributions in different priorities were analyzed and the results were compared with those of the worst-case analysis. Chen et al. (2012) developed a non-preemptive priority M/G/1 model for the response time of CAN messages, and calculated the distribution function of bit-stuffing and the mean value of the response time. Zeng et al. (2010) proposed a method to compute message response time probability distribution using statistical analysis when only partial information was available on the bus and the assigned message priorities were given.

As can be seen from the literature, although probability distribution approaches have been developed to analyze the response time of CAN messages, the CAN message response time distribution under errors on the bus has not been analyzed. In practical industrial systems, message queueing and error interruptions are inevitable in the CAN network, which in turn makes the design of a CAN network complex. To take full advantage of the bus bandwidth and ensure that message transmissions satisfy the real-time requirements of a system, a new message response time analysis methodology that considers both message arbitrations and error interruptions is needed.

The purpose of this study is to develop a delay time distribution analysis method for the response messages in the CAN network under errors. The advantages of this study are as follows: First, the proposed method considers the interaction of both message arbitrations and error interruptions on the message delay time analysis, which provides a deep insight into the CAN behaviors. Second, in this framework, the complex message queues are decomposed into typical message patterns and cases, which simplifies the analysis procedure and makes it straightforward to calculate the delay time distribution of messages for the master/slave configuration based on the observations of a practical single-node network. The results of this work will enable system engineers to predict and estimate the performance of a network system in its design stage based on the message response time analysis of several basic nodes, and to better understand the network behaviors with stochastic errors, which will ultimately lead to optimal design of the networked control systems.

## 2 Problem definition

In a polling-based CAN network, the transmission time of a message $M$ from a slave node depends on the message queue on the bus (Sun et al., 2015), as shown in Fig. 1.

The top panel of Fig. 1 shows the scenario where only one master device and one slave node are on the bus and the bus is error-free. $M_{P}$ denotes the request frame sent from the master device to the slave node, and $M$ denotes the response message sent from the slave node to the master device without error interruptions. In this scenario, the response time of message $M$, $R_{M}$, is calculated as

$$
\begin{equation*}
R_{M}=J_{M}+C_{M}, \tag{1}
\end{equation*}
$$

where $C_{M}$ and $J_{M}$ respectively denote the transmission time and the jitter time of message $M$ when the bus is error-free. Since the variation of the jitter time is considerably small compared with the response time $R_{M}$, the expectation of $J_{M}$ is adopted in this study.


Fig. 1 Analysis of the delay time for a message $M$

The bottom panel of Fig. 1 shows the scenario where the transmission of message $M$ is delayed by the blocking of other messages and the error interruptions on the bus (Hansson et al., 2002). $M_{P i}$ and $M_{j}$ are examples of other messages that block the transmission of $M$, where $M_{P i}$ denotes the message sent from the master device to other slave nodes, and $M_{j}$ denotes the message sent from other slave nodes. $I_{m}$ is the shortest frame interval defined in the CAN protocol (Bosch, 1991) that distinguishes two consecutive messages, which is equal to the transmission time of seven consecutive recessive bits in this work. In this scenario, the response time of message $M$, $R_{M}^{D}$, is calculated as

$$
\begin{align*}
R_{M}^{D} & =t_{B}+t_{E}+t_{R}+C_{M} \\
& =t_{B}+E_{m}+C_{M} \tag{2}
\end{align*}
$$

where $t_{B}$ is the time interval between the successful transmission of $M_{P}$ and the start of transmission of message $M$, and $E_{m}$ is the delay time caused by error interruptions. $E_{m}$ consists of two parts: the first part, $t_{E}$, is the time already spent transmitting the uncompleted message when the error interruption occurs, and the second part, $t_{R}$, is the bus recovery time, which is equal to the transmission time of 24 bits (Bosch, 1991).

Then the delay time contributed by message blocking and error interruptions can be calculated as

$$
\begin{align*}
t_{D} & =R_{M}^{D}-R_{M} \\
& = \begin{cases}E_{m}, & t_{B} \leq J_{M}, \\
t_{B}-J_{M}+E_{m}=B_{m}+E_{m}, & t_{B}>J_{M},\end{cases} \tag{3}
\end{align*}
$$

where $B_{m}$ denotes the blocking time caused by other messages.
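As a rough illustration of Eqs. (1)-(3), the following Python sketch computes the delay time from the quantities defined above. The function and parameter names are hypothetical, and the numeric values are invented for the example; they are not measurements from the testbed.

```python
def response_time_error_free(j_m, c_m):
    """Eq. (1): R_M = J_M + C_M."""
    return j_m + c_m


def response_time_delayed(t_b, t_e, t_r, c_m):
    """Eq. (2): R_M^D = t_B + t_E + t_R + C_M, with E_m = t_E + t_R."""
    return t_b + (t_e + t_r) + c_m


def delay_time(t_b, e_m, j_m):
    """Eq. (3): delay caused by message blocking and error interruptions."""
    if t_b <= j_m:
        return e_m       # no blocking; only the error-induced delay remains
    b_m = t_b - j_m      # blocking time caused by other messages
    return b_m + e_m


# Illustrative values in microseconds (assumed, not taken from the paper).
print(delay_time(t_b=300.0, e_m=150.0, j_m=100.0))
```

For $t_{B}>J_{M}$, the result equals the difference between Eq. (2) and Eq. (1), which gives a quick consistency check on the inputs.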

Therefore, to determine the delay time distribution of message $M$, the following challenges must be addressed:

1. How should the error interruptions occurring on the bus be modeled, and how is the delay time distribution of a message calculated when error interruptions occur at different parts of the message?
2. How is the delay time distribution of a message calculated when both message blocking and error interruptions on the bus are considered?

The assumptions in this study are as follows: The communication mode of the CAN bus is polling with one master device, and the information about message sequences without errors can be obtained by practical observation or simulation.

## 3 Analysis methodology

### 3.1 Overall analysis framework

The basic idea of the proposed method is that, by analyzing the message sequences, one can decompose complex message delays induced by message arbitrations and error interruptions into simple patterns and cases. The overall procedure for the proposed method is shown in Fig. 2.

First, a stochastic fault model is developed to describe the faults occurring in a CAN network. Then we introduce the probability factor to express the causal relationship between the intermittent connection (IC) fault and the resulting error interruptions on the bus. Third, we conduct a detailed analysis of the message delay for the single slave node configuration, which includes calculating the probability of different numbers of error interruptions occurring during the message transmission, obtaining the probability density functions (PDFs) of the delay time corresponding to the different numbers of error interruptions, and determining the delay time distribution by calculating the expectation of all these PDFs. Finally, we introduce typical patterns and cases of the message sequence for the master/slave configuration, and present a framework to analyze the delay time distribution for the master/slave configuration considering both message blocking and error interruptions. Details of the proposed method are introduced in the following subsections.

### 3.2 Stochastic fault model

During industrial production and control processes, the faults that a CAN system suffers from the surrounding environment are complex and stochastic. Among the various kinds of faults, we consider the IC fault in this study. An IC fault causes a short-term abnormal or chronically unstable connection between the node module and the network bus. If the IC fault occurs during the transmission of a dominant bit, this dominant bit will turn into a recessive bit, which causes an error interruption on the CAN system (Lei et al., 2015).

Fig. 2 Overall framework of the delay time analysis procedure

The arrivals of the IC fault follow a Poisson process (Lei et al., 2014) with arrival rate $\lambda$. Therefore, the probability that the number of IC faults arriving in any interval of length $t$ is equal to $n$ can be obtained as follows:

$$
\begin{equation*}
P(t, n)=P\{N(t)=n\}=\mathrm{e}^{-\lambda t} \frac{(\lambda t)^{n}}{n!}, n=0,1, \cdots \tag{4}
\end{equation*}
$$
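Eq. (4) is the standard Poisson counting probability, and a minimal Python sketch of it is given below. The function name `ic_fault_count_prob` and the example rate and window length are assumptions chosen for illustration only.

```python
import math


def ic_fault_count_prob(rate, t, n):
    """Eq. (4): probability of exactly n IC fault arrivals in an interval of length t."""
    mean = rate * t
    return math.exp(-mean) * mean ** n / math.factorial(n)


# Assumed example: 50 IC faults per second over a 0.5 ms window.
rate, window = 50.0, 0.5e-3
print([ic_fault_count_prob(rate, window, n) for n in range(4)])
```

Summing the probabilities over all $n$ should give one, which is a convenient sanity check when choosing where to truncate the series.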

### 3.3 Error handling of CAN and probability factor calculation

### 3.3.1 CAN error handling mechanisms

There are five different error types that can be detected in the CAN network: bit error, stuff error, cyclic redundancy check (CRC) error, format error, and acknowledgement (ACK) error. Any node that detects an error condition will send an error flag to interrupt the transmission on the bus. Table 1 shows the cause of each type of error, and the output time sequence of the error flag defined in the CAN protocol for each type of error. For example, if an IC fault affects one dominant bit that leads to a CRC validation error, the error flag will occur and generate an error interruption after the ACK delimiter, rather than after this affected dominant bit.

### 3.3.2 Calculation of probability factor $\alpha$

Although the arrivals of an IC fault have been modeled, the distribution of the error interruptions on the bus is difficult to determine. The reasons are twofold: First, the IC fault can affect only the dominant bits of the frame, and it cannot lead to an error interruption when the bus is idle or when the bus is transmitting the recessive bits. Therefore, an arrival of an IC fault may not result in an error interruption on the bus. Second, when the dominant bits are affected by the IC fault, it may not immediately result in an error interruption.

Considering the aforementioned two reasons, a probability factor $\alpha$ is developed to describe the relationship between the IC fault arrivals and the resulting error interruptions on the bus. According to the CAN protocol, a message $M$ can be divided into several segments (Fig. 3).

Table 1 Output time sequences of error flag for different error types in the CAN protocol

| Error type | Cause of the error | Time sequence of error flag |
| :---: | :--- | :---: |
| Bit error | Output level does not agree with the bus monitoring level | Output error flag next to the bit after the error is detected |
| Stuff error | Six consecutive equal bit levels are detected | Output error flag next to the bit after the error is detected |
| Format error | Level detected does not agree with the fixed bit format | Output error flag next to the bit after the error is detected |
| ACK error | Level in ACK slot of sending unit is recessive | Output error flag next to the bit after the error is detected |
| CRC error | CRC calculated does not agree with that received | Output error flag next to the bit after ACK delimiter |


Fig. 3 Probability factor $\alpha$ for different frame segments of message $M$

In every segment of $M$, the probability factor $\alpha_{i}$ represents the probability that the arrival of an IC fault can effectively cause an error interruption, which can be obtained by

$$
\begin{equation*}
\alpha_{i}(M)=\frac{n_{d i}}{n_{i}}+\sum_{\forall j \in E(j i)} \frac{n_{d j}}{n_{j}}, \tag{5}
\end{equation*}
$$

where $n_{i}$ or $n_{j}$ is the number of all bits on each corresponding segment, and $n_{d i}$ or $n_{d j}$ is the number of dominant bits that can be effective on each corresponding segment. The set $E(j i)$ represents all scenarios in which the IC fault arrives at segment $j$ but causes an error interruption on segment $i$.

The calculation of $\alpha_{i}(M)$ includes two parts: the first part denotes the probability that the IC fault arrival is effective and causes an error interruption in segment $i$, and the second part denotes the probability that the IC fault arrival is effective on segment $j$ but causes an error interruption in segment $i$. The expectation of $\alpha_{i}(M)$ can be calculated by

$$
\begin{equation*}
\bar{\alpha}(M)=E\left[\alpha_{i}(M)\right]=\sum_{i=1}^{8} \frac{n_{i}}{n_{\text {total }}} \alpha_{i}(M), \tag{6}
\end{equation*}
$$

where $n_{\text {total }}$ is the total number of bits of message $M$.
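The calculation in Eqs. (5) and (6) can be sketched in Python as follows. This is only one reading of the formulas: the dictionary `spill`, keyed by `(j, i)`, is a hypothetical way of encoding the set $E(ji)$, and the bit counts in the example are invented rather than taken from a real CAN frame.

```python
def alpha_segment(i, n_bits, n_dom, spill):
    """Eq. (5): probability factor of segment i.

    n_bits[i]     : number of bits in segment i (n_i)
    n_dom[i]      : effective dominant bits in segment i (n_di)
    spill[(j, i)] : effective dominant bits in segment j whose fault causes
                    an error interruption in segment i (n_dj, j in E(ji))
    """
    alpha = n_dom[i] / n_bits[i]
    for (j, target), n_dj in spill.items():
        if target == i:
            alpha += n_dj / n_bits[j]
    return alpha


def alpha_mean(n_bits, n_dom, spill):
    """Eq. (6): expectation of alpha_i, weighted by segment length."""
    n_total = sum(n_bits)
    return sum(
        n_bits[i] / n_total * alpha_segment(i, n_bits, n_dom, spill)
        for i in range(len(n_bits))
    )


# Invented two-segment example (a real frame would use eight segments).
n_bits = [10, 10]
n_dom = [4, 5]
spill = {(0, 1): 2}   # faults on 2 bits of segment 0 surface in segment 1
print(alpha_mean(n_bits, n_dom, spill))
```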

### 3.4 Message delay analysis for the single slave node configuration

In this subsection, we introduce the message delay analysis for the single slave node configuration, which serves as the fundamental building block of this work. In the single slave node configuration, only one master device and one slave node are on the bus, the slave node transmits the response message as long as the master device sends the request message, and there are no other messages affecting the transmission. Hence, only the error interruptions can cause delay time for the transmission of the response message sent from the slave node.

Let $M_{P k}$ denote the request message sent from the master device to node $k$, and $M_{N k}$ the response message sent from slave node $k$. Thus, the only message sequence observed on the bus is $\left(M_{P k}, M_{N k}\right)$. As the IC faults may arrive continuously in time, different numbers of error interruptions should be analyzed separately. Two scenarios with different numbers of error interruptions on the bus are shown in Fig. 4, where the top and bottom panels show the message sent by the slave node interrupted by one error and by two successive errors, respectively. Scenarios with more error interruptions on the bus can be analyzed similarly.


Fig. 4 Message queues under different numbers of error interruptions

The probability that different numbers of IC faults arrive during the transmission of $M_{N k}$ can be calculated by Eq. (4), and the probability factor for message $M_{N k}, \bar{\alpha}\left(M_{N k}\right)$, can be calculated by Eq. (6). Then the probability that $n$ error interruptions occur during the transmission interval of $M_{N k}$ can be calculated by

$$
\begin{align*}
P_{M_{N k}}^{(n)}= & P\left\{n \text { error interruptions during } C_{M_{N k}}\right\} \\
= & {\left[\left(1-P\left(C_{M_{N k}}, 0\right)\right) \bar{\alpha}\left(M_{N k}\right)\right]^{n} \cdot\left[P\left(C_{M_{N k}}, 0\right)\right.} \\
& \left.+\left(1-P\left(C_{M_{N k}}, 0\right)\right)\left(1-\bar{\alpha}\left(M_{N k}\right)\right)\right], \tag{7}
\end{align*}
$$

where $C_{M_{N k}}$ denotes the transmission time of $M_{N k}$ when the bus is error-free.
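A minimal Python sketch of Eq. (7) follows; the function name and the example parameters are assumptions made for illustration. Because the second bracket in Eq. (7) equals one minus the term raised to the power $n$, the probabilities over $n=0,1,2, \cdots$ form a geometric distribution and sum to one.

```python
import math


def interruption_count_prob(n, rate, c, alpha_bar):
    """Eq. (7): probability of n error interruptions during transmission time c."""
    p_no_fault = math.exp(-rate * c)             # P(C, 0) from Eq. (4)
    p_hit = (1.0 - p_no_fault) * alpha_bar       # a fault arrives and is effective
    p_stop = p_no_fault + (1.0 - p_no_fault) * (1.0 - alpha_bar)
    return p_hit ** n * p_stop


# Assumed example: 50 faults/s, 0.25 ms frame, mean probability factor 0.4.
print([interruption_count_prob(n, 50.0, 0.25e-3, 0.4) for n in range(3)])
```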

According to the Poisson process theory, the PDF for the $n^{\text {th }}$ IC fault arrival is given as follows:

$$
\begin{equation*}
f_{S_{n}}(t)=\lambda \mathrm{e}^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n-1)!} \tag{8}
\end{equation*}
$$

Then the probability density function of $t_{E}$ corresponding to the IC fault arriving at different frame segments can be obtained:

$$
\begin{align*}
f_{M_{N k}}^{(n)}\left(t_{E}\right) & =\alpha_{i}\left(M_{N k}\right) f_{S_{n}}\left(t_{E}\right) \\
& =\left\{\begin{array}{cl}
\alpha_{1}\left(M_{N k}\right) f_{S_{n}}\left(t_{E}\right), & t_{0} \leq t_{E}<t_{1}, \\
\alpha_{2}\left(M_{N k}\right) f_{S_{n}}\left(t_{E}\right), & t_{1} \leq t_{E}<t_{2}, \\
\vdots & \\
\alpha_{8}\left(M_{N k}\right) f_{S_{n}}\left(t_{E}\right), & t_{7} \leq t_{E} \leq t_{8}
\end{array}\right. \tag{9}
\end{align*}
$$
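Eqs. (8) and (9) can be evaluated with the short Python sketch below. The names `erlang_pdf` and `interrupted_time_pdf`, and the representation of the segment boundaries $t_{0}, \ldots, t_{8}$ as a sorted list `bounds`, are illustrative assumptions; the two-segment example uses invented values.

```python
import bisect
import math


def erlang_pdf(t, n, rate):
    """Eq. (8): PDF of the arrival time of the n-th IC fault."""
    if t < 0:
        return 0.0
    return rate * math.exp(-rate * t) * (rate * t) ** (n - 1) / math.factorial(n - 1)


def interrupted_time_pdf(t_e, n, rate, bounds, alphas):
    """Eq. (9): Erlang PDF scaled by the probability factor of the segment
    containing t_e. bounds = [t_0, ..., t_K], alphas = [alpha_1, ..., alpha_K]."""
    if t_e < bounds[0] or t_e > bounds[-1]:
        return 0.0
    # Segment i covers t_{i-1} <= t_e < t_i; the last segment also includes t_K.
    idx = min(bisect.bisect_right(bounds, t_e) - 1, len(alphas) - 1)
    return alphas[idx] * erlang_pdf(t_e, n, rate)


# Invented example with two segments instead of eight.
print(interrupted_time_pdf(1.0, 1, 2.0, [0.0, 1.0, 2.0], [0.2, 0.8]))
```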

Finally, the delay time PDF of $M_{N k}$ can be obtained by calculating the expectation of all these PDFs (note that $B_{m}$ is zero since there is no competition in the single node case):

$$
\begin{align*}
f_{M_{N k}}\left(t_{D}\right) & =f_{M_{N k}}\left(E_{m}\right)=f_{M_{N k}}\left(t_{E}+t_{R}\right) \\
& =\sum_{i=1}^{n} P_{M_{N k}}^{(i)} f_{M_{N k}}^{(i)}\left(t_{D}-t_{R}\right) . \tag{10}
\end{align*}
$$

Furthermore, the cumulative distribution function (CDF) of delay time $t_{D}$ is obtained:

$$
\begin{equation*}
F_{M_{N k}}(t)=\int_{-\infty}^{t} f_{M_{N k}}\left(t_{D}\right) \mathrm{d} t_{D} \tag{11}
\end{equation*}
$$

According to the stochastic Poisson fault model, the number of IC fault arrivals in any time interval can be infinite theoretically; thus, the number of error interruptions during $C_{M_{N k}}$ can be infinite. However, the scenarios in which more than two consecutive error interruptions occur are rare according to actual observations. Therefore, a probability threshold value $\psi$ can be used to limit the number of error interruptions that need to be considered. If $P_{M_{N k}}^{(i)}<\psi$, then the corresponding $i$-error-interruption case can be ignored.
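The truncated sum in Eq. (10) and a numerical version of Eq. (11) can be sketched as follows. This is a minimal illustration under stated assumptions: `terms` is a hypothetical dictionary mapping the number of interruptions $i$ to a pair of its probability and a callable PDF of $t_{E}$, the CDF is approximated with the trapezoidal rule, and the delay PDF is assumed to be zero below `lower`. The uniform density in the example is a placeholder, not the PDF of Eq. (9).

```python
def delay_pdf(t_d, t_r, terms, psi=0.0):
    """Eq. (10): sum of P^(i) * f^(i)(t_D - t_R), ignoring terms with P^(i) < psi."""
    return sum(p * f(t_d - t_r) for p, f in terms.values() if p >= psi)


def delay_cdf(t, t_r, terms, psi=0.0, lower=0.0, steps=2000):
    """Eq. (11), approximated by the trapezoidal rule on [lower, t]."""
    if t <= lower:
        return 0.0
    h = (t - lower) / steps
    ys = [delay_pdf(lower + k * h, t_r, terms, psi) for k in range(steps + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


# Placeholder density: uniform on [0, 1] (for illustration only).
def unit_pdf(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


terms = {1: (0.1, unit_pdf), 2: (0.001, unit_pdf)}
print(delay_pdf(0.7, 0.2, terms, psi=0.01))   # the two-interruption term is dropped
print(delay_cdf(2.0, 0.2, terms, psi=0.01))
```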

## 4 Message delay analysis for the master/slave configuration

When the single node case is extended to the master/slave case, contention between messages from multiple nodes is inevitable. Thus, the delay time is caused not only by error interruptions, but also by blocking by other messages. Since various message sequences can appear on the bus in the master/slave configuration, the analysis for multiple slave nodes is considerably more complex. In this section, we introduce the typical patterns and cases of the message sequence in the master/slave configuration (Table 2), based on which the message delay time for the master/slave configuration can be analyzed. The details of the patterns and cases are introduced in the following subsections.

### 4.1 Pattern I: no arbitration contention between messages

Similar to the single slave node configuration, let $M_{P k}$ denote the request message sent from the master device to node $k$, and $M_{N k}$ the response message sent from node $k$. In pattern I, message $M_{N k}$ has no arbitration contention with other messages, but may still be influenced by the transmission delay of the messages ahead of $M_{N k}$. As analyzed in the single slave node configuration, there are at most two consecutive error interruptions that are statistically significant. Therefore, the maximum delay time caused by single node message under errors, $\max \left(E_{m}\right)$, is given by

$$
\begin{equation*}
\max \left(E_{m}\right)=\max \left(t_{E}\right)+t_{R}=2\left(C_{M}+t_{R}\right) \tag{12}
\end{equation*}
$$

where $M$ is the message ahead of $M_{N k}$ and $C_{M}$ is the transmission time of $M$ when the bus is error-free. Then, by comparing the interval time between $M$ and $M_{N k}$, denoted $t_{\text {interval }}$, with $\max \left(E_{m}\right)$, we obtain three cases in pattern I.

Table 2 Typical patterns and cases of the message sequence for the master/slave configuration

| Pattern | Case | Scenario(s) |
| :---: | :--- | :--- |
| I: no arbitration contention | Case A: $t_{\text {interval }} \gg \max \left(E_{m}\right)$ |  |
|  | Case B: $t_{\text {interval }} \ll \max \left(E_{m}\right)$ | Scenario 1: $M_{N k}$ is interrupted <br> Scenario 2: $M$ is interrupted |
|  | Case C: $t_{\text {interval }} \approx \max \left(E_{m}\right)$ | Scenario 1: $M_{N k}$ is interrupted <br> Scenario 2: $M$ is interrupted |
| II: existing arbitration contention |  | Scenario a: $M_{N k}$ wins the arbitration <br> Scenario b: $M_{N k}$ loses the arbitration |

### 4.1.1 Case A of pattern I

In this case, $t_{\text {interval }}$ is considerably greater than $\max \left(E_{m}\right)$, so $M_{N k}$ is not affected by $M$ even if $M$ is interrupted by errors. Fig. 5 shows the message queue and the error interruptions in case A.


Fig. 5 Delay time analysis of $M_{N k}$ in case A of pattern I

As shown in the bottom panel of Fig. 5, the delay time distribution of $M_{N k}$ in case A is equivalent to that in the single slave node configuration. Therefore, the delay time PDF considering error interruptions of $M_{N k}$ in case A is

$$
\begin{align*}
f_{M_{N k}}^{\text {case A }}\left(t_{D}\right) & =f_{M_{N k}}\left(E_{m}\right)=f_{M_{N k}}\left(t_{E}+t_{R}\right) \\
& =\sum_{i=1}^{n} P_{M_{N k}}^{(i)} f_{M_{N k}}^{(i)}\left(t_{D}-t_{R}\right) . \tag{13}
\end{align*}
$$

### 4.1.2 Case B of pattern I

In this case, $t_{\text {interval }}$ is equal to the shortest frame interval $I_{m}$, which is far smaller than $\max \left(E_{m}\right)$. In case B, if the error interruptions occur during the transmission of $M$ and cause delay time $E_{m}$ for $M$, then the delay time $E_{m}$ will affect the subsequent message $M_{N k}$ and make the delay time of $M_{N k}$ equal to $E_{m}$. The message queue and the error interruptions in case B are shown in Fig. 6.


Fig. 6 Delay time analysis of $M_{N k}$ in case B of pattern I

Based on the locations where the error interruptions occur, case B can be decomposed into two scenarios:

1. Scenario 1 in case B of pattern I

In this scenario, the error interruptions occur during the transmission of $M_{N k}$; thus, the delay time PDF of $M_{N k}$ is the same as the result obtained in the single slave node configuration, as shown below:

$$
\begin{equation*}
f_{M_{N k}}^{(\mathrm{s} 1)}\left(t_{D}\right)=f_{M_{N k}}^{\mathrm{case} \mathrm{~A}}\left(t_{D}\right) \tag{14}
\end{equation*}
$$

2. Scenario 2 in case B of pattern I

In this scenario, the error interruptions occur during the transmission of $M$ and cause delay time $E_{m}$ for $M$. Then the transmission of $M_{N k}$ is blocked by the delay of $M$, and the blocking time $B_{m}$ is equal to $E_{m}$. Therefore, the delay time PDF of $M_{N k}$ in this scenario can be described by

$$
\begin{align*}
f_{M_{N k}}^{(\mathrm{s} 2)}\left(t_{D}\right) & =f_{M_{N k}}\left(B_{m}\right)=f_{M}\left(E_{m}\right) \\
& =\sum_{i=1}^{n} P_{M}^{(i)} f_{M}^{(i)}\left(t_{D}-t_{R}\right) . \tag{15}
\end{align*}
$$

To calculate the delay time distribution of $M_{N k}$ in case B, the percentage of each scenario appearing on the bus should be determined. Since the two scenarios are caused by the different locations where error interruptions occur, and the arrivals of an IC fault are assumed to follow a Poisson process, the percentage for each scenario can be obtained by combining the length of each message and the corresponding probability factor. The probability of scenario 1 appearing on the bus is given by

$$
\begin{equation*}
\beta=\frac{C_{M_{N k}} \bar{\alpha}\left(M_{N k}\right)}{C_{M_{N k}} \bar{\alpha}\left(M_{N k}\right)+C_{M} \bar{\alpha}(M)}, \tag{16}
\end{equation*}
$$

where $C_{M_{N k}}$ and $C_{M}$ denote the transmission time of $M_{N k}$ and $M$ respectively, and $\bar{\alpha}\left(M_{N k}\right)$ and $\bar{\alpha}(M)$ denote the probability factors of $M_{N k}$ and $M$ respectively.

Thus, the probability of scenario 2 appearing on the bus is $(1-\beta)$. By combining scenarios 1 and 2 , the delay time PDF of $M_{N k}$ in case B can be obtained:

$$
\begin{equation*}
f_{M_{N k}}^{\text {case B }}\left(t_{D}\right)=\beta f_{M_{N k}}^{(\mathrm{s} 1)}\left(t_{D}\right)+(1-\beta) f_{M_{N k}}^{(\mathrm{s} 2)}\left(t_{D}\right) . \tag{17}
\end{equation*}
$$
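Eqs. (16) and (17) amount to a weighted mixture of two scenario PDFs, which the following Python sketch illustrates. The function names are hypothetical, and the constant densities in the example are placeholders rather than the PDFs of Eqs. (14) and (15).

```python
def scenario_weight(c_nk, alpha_nk, c_m, alpha_m):
    """Eq. (16): probability that the interruption hits M_Nk rather than M."""
    weighted_nk = c_nk * alpha_nk
    return weighted_nk / (weighted_nk + c_m * alpha_m)


def case_b_pdf(t_d, beta, pdf_s1, pdf_s2):
    """Eq. (17): mixture of the scenario 1 and scenario 2 delay PDFs."""
    return beta * pdf_s1(t_d) + (1.0 - beta) * pdf_s2(t_d)


# Assumed example: M_Nk is twice as long as M but has half its probability factor.
beta = scenario_weight(c_nk=2.0, alpha_nk=0.5, c_m=1.0, alpha_m=1.0)
print(beta)
print(case_b_pdf(0.3, beta, lambda t: 2.0, lambda t: 4.0))
```

A longer message or a larger probability factor raises the share of interruptions attributed to that message, which matches the intuition behind Eq. (16).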

### 4.1.3 Case C of pattern I

In this case, $t_{\text {interval }}$ is approximately equal to $\max \left(E_{m}\right)$. Thus, if the error interruptions occur during the transmission of $M$ and cause delay time $E_{m}$ for $M$, the delay time $E_{m}$ will affect the subsequent message $M_{N k}$, but the resulting delay time of $M_{N k}$ is different from that in case B. Fig. 7 shows the message queue and the error interruptions in case C.

Similar to the analysis in case B, case C can be decomposed into two scenarios based on the locations where the error interruptions occur.

1. Scenario 1 in case C of pattern I

In this scenario, the error interruptions occur during the transmission of $M_{N k}$; thus, the delay time PDF of $M_{N k}$ is the same as the result obtained in the single slave node configuration, shown as follows:

$$
\begin{equation*}
f_{M_{N k}}^{(\mathrm{s} 1)}\left(t_{D}\right)=f_{M_{N k}}^{\text {case A }}\left(t_{D}\right) \tag{18}
\end{equation*}
$$

2. Scenario 2 in case C of pattern I

In this scenario, the error interruptions occur during the transmission of $M$ and cause delay time $E_{m}$ for $M$. As shown in the middle panel of Fig. 7, if the error interruption occurs near the front of $M$, the retransmission of $M$ has no influence on the transmission of $M_{N k}$, and the blocking time $B_{m}$ is zero. On the other hand, as shown in the bottom panel of Fig. 7, if the error interruption occurs near the rear of $M$, the retransmission of $M$ blocks the transmission of $M_{N k}$, and the resulting blocking time for $M_{N k}$ is $B_{m}=E_{m}-\left(T_{k}-I_{m}\right)$, where $T_{k}$ denotes the interval time between $M_{N k}$ and $M$, which can be obtained by analyzing the message response time of different single nodes when the bus is error-free. Therefore, the delay time PDF of $M_{N k}$ in this scenario can be obtained by

$$
\begin{align*}
f_{M_{N k}}^{(\mathrm{s} 2)}\left(t_{D}\right) & =f_{M_{N k}}\left(B_{m}\right)=f_{M}\left(E_{m}-T_{k}+I_{m}\right) \\
& = \begin{cases}0, & t_{D} \leq 0, \\
\sum_{i=1}^{n} P_{M}^{(i)} f_{M}^{(i)}\left(t_{D}-t_{R}+T_{k}-I_{m}\right), & t_{D}>0 .\end{cases} \tag{19}
\end{align*}
$$



Fig. 7 Delay time analysis of $M_{N k}$ in case C of pattern I
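The shifted PDF of Eq. (19) can be sketched in Python as follows. The name `case_c_s2_pdf`, the dictionary `terms` (mapping the number of interruptions $i$ to a pair of $P_{M}^{(i)}$ and a callable $f_{M}^{(i)}$), and the uniform placeholder density are assumptions made for illustration.

```python
def case_c_s2_pdf(t_d, t_r, t_k, i_m, terms):
    """Eq. (19): delay PDF of M_Nk when M is interrupted in case C."""
    if t_d <= 0:
        return 0.0                      # M_Nk is not blocked
    shift = t_r - t_k + i_m             # argument is t_D - t_R + T_k - I_m
    return sum(p * f(t_d - shift) for p, f in terms.values())


# Placeholder density: uniform on [0, 1] (for illustration only).
def unit_pdf(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


terms = {1: (0.5, unit_pdf)}
print(case_c_s2_pdf(0.3, t_r=0.2, t_k=0.5, i_m=0.1, terms=terms))
```

Only the part of the delay of $M$ that exceeds the idle gap $T_{k}-I_{m}$ reaches $M_{N k}$, so the density of $M$ is shifted by that gap before it is evaluated.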

In case C, the percentage of each of the two scenarios appearing on the bus can also be calculated by Eq. (16). Therefore, the delay time PDF of $M_{N k}$ in case C can be obtained by combining the two scenarios:

$$
\begin{equation*}
f_{M_{N k}}^{\mathrm{case} \mathrm{C}}\left(t_{D}\right)=\beta f_{M_{N k}}^{(\mathrm{s} 1)}\left(t_{D}\right)+(1-\beta) f_{M_{N k}}^{(\mathrm{s} 2)}\left(t_{D}\right) \tag{20}
\end{equation*}
$$

### 4.2 Pattern II: arbitration contention existing between messages

In pattern II, there are arbitration contentions between message $M_{N k}$ and other messages. In this subsection, we present an analysis procedure for the situation where there is an arbitration contention between $M_{N k}$ and a single message $M$. The situation where there are arbitration contentions between $M_{N k}$ and multiple messages can be analyzed in a similar way.

Fig. 8 shows two different message queues on the bus where a different message wins the arbitration and the interval time between $M_{N k}$ and $M$ in both of the message queues is the shortest frame interval $I_{m}$. To analyze the delay time distribution of $M_{N k}$ in pattern II, the message queues shown in Fig. 8 can be decomposed into two scenarios based on the arbitration results of the messages.


Fig. 8 Message analysis of $M_{N k}$ in pattern II

### 4.2.1 Scenario a of pattern II

As shown in the top panel of Fig. 8, message $M_{N k}$ wins the arbitration and starts the transmission first. In this scenario, message $M$ has no influence on the transmission of $M_{N k}$, and the delay time distribution of $M_{N k}$ depends on the message sequence ahead of $M_{N k}$, which is one of the three cases shown in pattern I. As long as the message sequence ahead of $M_{N k}$ is determined, the delay time PDF of $M_{N k}$ considering error interruptions in scenario a can be obtained:

$$
\begin{equation*}
f_{M_{N k}}^{(\mathrm{sa})}\left(t_{D}\right)=f_{M_{N k}}^{\text {case } u}\left(t_{D}\right), u \in\{\mathrm{~A}, \mathrm{~B}, \mathrm{C}\} . \tag{21}
\end{equation*}
$$

### 4.2.2 Scenario b of pattern II

As shown in the bottom panel of Fig. 8, message $M_{N k}$ loses the arbitration and is blocked by message $M$. In this scenario, the blocking time $B_{m}$ caused by $M$ is the sum of the transmission time $C_{M}$ and the jitter time $J_{M}$ of message $M$.

If the error interruptions occur during the transmission of $M_{N k}$, then the delay time PDF of $M_{N k}$ is given by

$$
\begin{align*}
f_{M_{N k}}^{(\mathrm{sb} 1)}\left(t_{D}\right) & =f_{M_{N k}}\left(E_{m}+B_{m}\right) \\
& =\sum_{i=1}^{n} P_{M_{N k}}^{(i)} f_{M_{N k}}^{(i)}\left(t_{D}-t_{R}-B_{m}\right) \tag{22}
\end{align*}
$$

On the other hand, if the error interruptions occur during the transmission of $M$, similar to the analysis in case B of pattern I, the delay time PDF of message $M$ is $f_{M}^{\text {case B }}$. Since $M_{N k}$ is blocked by message $M$ with block time $B_{m}$, the delay time PDF of $M_{N k}$ can be calculated by

$$
\begin{equation*}
f_{M_{N k}}^{(\mathrm{sb} 2)}\left(t_{D}\right)=f_{M}^{\text {case B }}\left(t_{D}-B_{m}\right) \tag{23}
\end{equation*}
$$

Then the delay time PDF of $M_{N k}$ in scenario b can be obtained:

$$
\begin{equation*}
f_{M_{N k}}^{(\mathrm{sb})}\left(t_{D}\right)=\beta f_{M_{N k}}^{(\mathrm{sb} 1)}\left(t_{D}\right)+(1-\beta) f_{M_{N k}}^{(\mathrm{sb} 2)}\left(t_{D}\right) \tag{24}
\end{equation*}
$$

where $\beta$ and $(1-\beta)$ denote the percentages of error interruptions occurring during the transmission of $M_{N k}$ and $M$, respectively, which can be calculated by Eq. (16).
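Eqs. (23) and (24) can be sketched in Python, assuming the delay time PDFs are available as discrete distributions on a microsecond grid. The dictionary representation, the helper names `shift_pdf` and `mix_pdfs`, and every numerical value below are illustrative assumptions, not results from the testbed; the output of Eq. (22) is taken as given.

```python
def shift_pdf(pdf, delay_us):
    """Return g(t) = f(t - delay_us) for a discrete PDF stored as {time_us: probability}."""
    return {t + delay_us: p for t, p in pdf.items()}


def mix_pdfs(weight, pdf_a, pdf_b):
    """Return weight * pdf_a + (1 - weight) * pdf_b on the union of both supports."""
    times = sorted(set(pdf_a) | set(pdf_b))
    return {t: weight * pdf_a.get(t, 0.0) + (1.0 - weight) * pdf_b.get(t, 0.0)
            for t in times}


# Hypothetical inputs (placeholders only).
B_m = 100                              # block time caused by M, in microseconds
f_sb1 = {150: 0.5, 200: 0.5}           # result of Eq. (22), assumed already computed
f_M_case_B = {60: 0.25, 90: 0.75}      # delay PDF of M from case B of pattern I
beta = 0.4                             # share of interruptions occurring during M_Nk

f_sb2 = shift_pdf(f_M_case_B, B_m)     # Eq. (23): shift by the block time
f_sb = mix_pdfs(beta, f_sb1, f_sb2)    # Eq. (24): weighted combination
print(f_sb)
```

Because the two weights sum to one, the combined distribution still sums to one whenever both inputs do.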

To calculate the delay time distribution of $M_{N k}$ in pattern II, the probability of each scenario appearing on the bus should be determined. The probability that scenario a appears on the bus, i.e., that message $M_{N k}$ wins the arbitration, is given by

$$
\begin{equation*}
\gamma=\sum_{t_{k} \leq t_{v}} \phi_{N}\left(t_{k}\right) \phi_{M}\left(t_{v}\right) \tag{25}
\end{equation*}
$$

where $\phi_{N}\left(t_{k}\right)=P\left\{t=t_{k}\right\}$ $(k \in\{1,2, \ldots\})$ denotes the discrete distribution of the time interval between $M_{P k}$ and $M_{N k}$, and $\phi_{M}\left(t_{v}\right)=P\left\{t=t_{v}\right\}$ $(v \in\{1,2, \ldots\})$ denotes the discrete distribution of the time interval between $M_{P k}$ and $M$. Both $\phi_{N}\left(t_{k}\right)$ and $\phi_{M}\left(t_{v}\right)$ can be measured from practical observations when the bus is error-free. Therefore, the probability that scenario b appears on the bus, i.e., that message $M_{N k}$ loses the arbitration, is $(1-\gamma)$.
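The double sum in Eq. (25) can be sketched as follows. The product form suggests that the two intervals are treated as independent, and the sketch makes the same assumption; the function name and the two example distributions are hypothetical.

```python
def arbitration_win_probability(phi_N, phi_M):
    """Eq. (25): sum of phi_N(t_k) * phi_M(t_v) over all pairs with t_k <= t_v.

    Both arguments map an interval length (microseconds) to its probability.
    """
    return sum(
        p_k * p_v
        for t_k, p_k in phi_N.items()
        for t_v, p_v in phi_M.items()
        if t_k <= t_v
    )


# Hypothetical interval distributions measured on an error-free bus.
phi_N = {100: 0.5, 120: 0.5}   # interval between M_Pk and M_Nk
phi_M = {110: 0.5, 130: 0.5}   # interval between M_Pk and M

gamma = arbitration_win_probability(phi_N, phi_M)
print(gamma, 1 - gamma)        # 0.75 0.25
```

Only the pair (120, 110) violates $t_{k} \leq t_{v}$ here, so three of the four equally likely pairs contribute to $\gamma$.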

By combining scenarios a and b, the delay time PDF of $M_{N k}$ in pattern II can be obtained:

$$
\begin{equation*}
f_{M_{N k}}\left(t_{D}\right)=\gamma f_{M_{N k}}^{(\mathrm{sa})}\left(t_{D}\right)+(1-\gamma) f_{M_{N k}}^{(\mathrm{sb})}\left(t_{D}\right) \tag{26}
\end{equation*}
$$
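Substituting Eqs. (21) and (24) into Eq. (26) makes the three-component mixture explicit. This is only a restatement of the equations above and introduces no new assumptions:

$$
\begin{align*}
f_{M_{N k}}\left(t_{D}\right)= & \gamma f_{M_{N k}}^{\text {case } u}\left(t_{D}\right)+(1-\gamma) \beta f_{M_{N k}}^{(\mathrm{sb} 1)}\left(t_{D}\right) \\
& +(1-\gamma)(1-\beta) f_{M_{N k}}^{(\mathrm{sb} 2)}\left(t_{D}\right), \quad u \in\{\mathrm{A}, \mathrm{B}, \mathrm{C}\} .
\end{align*}
$$

The three weights $\gamma$, $(1-\gamma) \beta$, and $(1-\gamma)(1-\beta)$ sum to one, so the result integrates to one whenever each component PDF does.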

### 4.3 Message delay analysis procedure for the master/slave configuration

Based on the two typical patterns presented above, the message delay time distribution for the master/slave configuration can be analyzed in the following steps:

Step 1: Classify the message of interest into the corresponding pattern (pattern I or II) and case (case A, B, or C of pattern I) based on its practical location in the message queue when the bus is error-free.

Step 2: Analyze the delay time distribution of the message of interest according to the corresponding pattern and case methods shown in Sections 4.1 and 4.2, and obtain the delay time PDF of the message of interest.

Step 3: Based on the delay time PDF, the CDF of the delay time for the message of interest can be obtained by

$$
\begin{equation*}
F_{M_{N k}}(t)=\int_{-\infty}^{t} f_{M_{N k}}\left(t_{D}\right) \mathrm{d} t_{D} \tag{27}
\end{equation*}
$$
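For a delay time PDF sampled at discrete times, the integral in Eq. (27) reduces to a running sum. The following is a minimal sketch under that assumption; the helper name `cdf_from_pdf` and the example values are hypothetical.

```python
from itertools import accumulate


def cdf_from_pdf(pdf):
    """Discrete counterpart of Eq. (27): running sum of the PDF over sorted times."""
    times = sorted(pdf)
    cumulative = list(accumulate(pdf[t] for t in times))
    return list(zip(times, cumulative))


# Hypothetical discrete delay time PDF, {delay_us: probability}.
pdf = {48: 0.25, 100: 0.5, 150: 0.25}
cdf = cdf_from_pdf(pdf)
print(cdf)  # [(48, 0.25), (100, 0.75), (150, 1.0)]
```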

## 5 Testbed setup and case studies

To illustrate the procedure for analyzing message delay time distribution, a testbed was constructed and two case studies were conducted. In case study 1, we analyzed the delay time distribution for the single slave node configuration, and compared it with practical observations. Furthermore, in case study 2, we analyzed the delay time distribution of each response message for the master/slave configuration, and compared it with the corresponding practical observations.

### 5.1 Testbed setup

The schematic layout of the experimental setup is illustrated in Fig. 9, and the constructed testbed for the case studies, which consists of three modules, is shown in Fig. 10. The first module is the DeviceNet network, which uses CAN as its physical layer and data link layer protocol. The communication mode was set to polling and the communication speed was $500 \mathrm{~kb} / \mathrm{s}$; thus, the transmission time of one bit, i.e., $\tau_{\text {bit }}$, is $2 \mu \mathrm{~s}$. Therefore, $t_{R}=24 \tau_{\text {bit }}=48 \mu \mathrm{~s}$, and $I_{m}=7 \tau_{\text {bit }}=14 \mu \mathrm{~s}$.
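The timing constants above follow directly from the bit rate. The short check below reproduces the arithmetic; the variable names are illustrative, and the multipliers 24 and 7 are taken from the text.

```python
bit_rate_bps = 500_000                    # 500 kb/s, from the testbed description
tau_bit_us = 1_000_000 / bit_rate_bps     # transmission time of one bit, in microseconds
t_R_us = 24 * tau_bit_us                  # t_R = 24 bit times
I_m_us = 7 * tau_bit_us                   # shortest frame interval I_m = 7 bit times
print(tau_bit_us, t_R_us, I_m_us)         # 2.0 48.0 14.0
```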

The second module is the in-house developed fault injection system. A controlled high-speed on-off switch was applied to generate the IC faults occurring on the cable. The arrivals of the IC fault follow a Poisson process with the arrival rate $\lambda_{\mathrm{IC}}$. The third module is the data acquisition module based on a CAN-bus analyzer, which records the data link layer information on the bus.
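Poisson fault arrivals of this kind can be imitated in software by drawing exponentially distributed inter-arrival times. The sketch below only generates arrival instants; it does not model the switch hardware or the duration of each fault, and the function name, seed, and horizon are assumptions chosen for illustration.

```python
import random


def poisson_arrivals(rate_per_s, horizon_s, rng):
    """Arrival times (seconds) of a homogeneous Poisson process on [0, horizon_s)."""
    arrivals = []
    t = rng.expovariate(rate_per_s)       # first inter-arrival time
    while t < horizon_s:
        arrivals.append(t)
        t += rng.expovariate(rate_per_s)  # next exponential gap
    return arrivals


rng = random.Random(0)                    # fixed seed for repeatability
arrivals = poisson_arrivals(1000.0, 1.0, rng)
print(len(arrivals))                      # expected count is rate * horizon = 1000
```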


Fig. 9 Schematic layout of the testbed


Fig. 10 Constructed testbed for case studies

### 5.2 Case study 1: delay time analysis for the single slave node configuration

In this case study, there are two nodes on the bus: the master device and the slave node with address 9. The drop cable of node 9 was set to experience the IC faults at an injection rate of $\lambda_{\mathrm{IC}}=1000$ faults/s. The bit stream information of the response message sent from node 9 when the bus was error-free is shown in Fig. 11, where "0" and "1" represent the dominant bit and recessive bit, respectively, and the bit pointed to by the triangle is the stuffing bit.

Then the probability factor $\alpha_{i}$ can be obtained by Eq. (5); the resulting values are listed in Table 3.

Finally, the delay time cumulative distribution of the response message sent from node 9 can be obtained, which is plotted by the solid line in Fig. 12. Moreover, the practical observation of the delay time is plotted by the dashed line in Fig. 12.


Fig. 11 Bit stream information of the message sent from node 9 when the bus is error-free

Table 3 Probability factor $\alpha_{i}$ for the single node case

| $\alpha_{1}$ | $\alpha_{2}$ | $\alpha_{3}$ | $\alpha_{4}$ | $\alpha_{5}$ | $\alpha_{6}$ | $\alpha_{7}$ | $\alpha_{8}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0.033 | 0.046 | 0.085 | 0.097 | 0.207 | 0.346 | 0.751 | 0.267 |



Fig. 12 Delay time comparison between the fitted value and practical observation in case study 1 (root mean square error: 0.0083; maximum absolute error: 0.0361)

As can be seen from Fig. 12, the delay probability distribution calculated by the proposed method agrees well with the practical observation. The curve begins after $50 \mu \mathrm{~s}$ due to the existence of the bus recovery time, and becomes steep after $150 \mu$ s because most error interruptions occur near the CRC frame check field according to the CAN protocol.
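The two error figures quoted in the caption of Fig. 12 can be computed as sketched below. The text does not state on which time grid the curves were compared, so this sketch assumes that both cumulative distributions are sampled at the same time points; the function name and sample values are hypothetical.

```python
import math


def cdf_errors(fitted, observed):
    """Root mean square error and maximum absolute error between two CDFs
    sampled on the same time grid."""
    if len(fitted) != len(observed) or not fitted:
        raise ValueError("both CDFs must be non-empty and sampled on the same grid")
    diffs = [f - o for f, o in zip(fitted, observed)]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    max_abs = max(abs(d) for d in diffs)
    return rmse, max_abs


# Hypothetical CDF samples (not the measured data).
fitted = [0.0, 0.2, 0.6, 1.0]
observed = [0.0, 0.25, 0.55, 1.0]
rmse, max_abs = cdf_errors(fitted, observed)
print(round(rmse, 4), round(max_abs, 4))
```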

### 5.3 Case study 2: delay time analysis for the master/slave configuration

In this case study, there are 10 nodes on the bus: the master device and the slave nodes whose addresses are from one to nine. The drop cable of node 5 was set to experience the IC faults at an injection rate of $\lambda_{\mathrm{IC}}=334$ faults/s. The physical waveform graph of the messages transmitted on the bus is shown in Fig. 13, where the message queue from left to right is $\left(M_{P 1}, M_{P 2}, M_{N 1}, M_{P 3}, M_{N 2}, M_{P 4}, M_{N 3}, M_{P 5}, M_{N 4}, M_{N 5}, M_{P 6}, M_{P 7}, M_{N 6}, M_{P 8}, M_{N 7}, M_{P 9}, M_{N 8}, M_{N 9}\right)$.

Following the procedure given in Section 4.3, the detailed analysis for the master/slave configuration is presented in the following subsections.

### 5.3.1 Step 1

Classify the response message sent from the slave node into its corresponding pattern (Fig. 13 and Table 4).

In Table 4, the Roman numerals represent the types of patterns, and the capital letters in the brackets represent the different cases in pattern I.

### 5.3.2 Step 2

Calculate the delay time distribution of each response message according to the corresponding pattern's method shown in Table 2. For example, the pattern of message $M_{N 1}$ is case A of pattern I; thus, the delay time distribution of $M_{N 1}$ can be calculated using the procedure discussed in Section 4.1.1. The calculation of the delay time distribution for other response messages can be conducted in a similar way.

### 5.3.3 Step 3

Finally, the delay time cumulative distribution of response messages can be obtained. In this subsection, we take the delay time cumulative distribution of three messages (i.e., $M_{N 1}, M_{N 2}$, and $M_{N 3}$) as examples.

Fig. 13 Physical waveform graph of all the messages in case study 2

Figs. 14-16 show the delay time distribution comparisons between the fitted result and practical observation for response messages $M_{N 1}, M_{N 2}$, and $M_{N 3}$, respectively.

As can be seen from Figs. 14-16, the delay time distributions calculated using the proposed method agree well with the practical observations. The calculated delay time distribution in Fig. 15 can be separated into two parts according to its trend: the first part is from $0 \mu \mathrm{~s}$ to around $200 \mu \mathrm{~s}$, and the other part is the rest until $300 \mu \mathrm{~s}$. These two parts correspond to the two scenarios decomposed in pattern II. Furthermore, the calculated delay time distribution in Fig. 16 can be separated into two parts according to its trend: the first part is from $0 \mu \mathrm{~s}$ to around $120 \mu \mathrm{~s}$, and the other part is the rest until $300 \mu \mathrm{~s}$. These two parts correspond to the two scenarios decomposed in case C of pattern I. The potential explanations for the slight gaps between the fitted result and practical observation around some time intervals in Figs. 14-16 are the unpredictability of the duration of the IC fault arrival, the variation of the sample point, and the different degrees of concentration of the dominant bits of a message.

Fig. 14 Delay time comparison between the fitted value and practical observation for $M_{N 1}$ in case study 2 (root mean square error: 0.0073; maximum absolute error: 0.0308)

Fig. 15 Delay time comparison between the fitted value and practical observation for $M_{N 2}$ in case study 2 (root mean square error: 0.0140; maximum absolute error: 0.0423)

Fig. 16 Delay time comparison between the fitted value and practical observation for $M_{N 3}$ in case study 2 (root mean square error: 0.0121; maximum absolute error: 0.0345)

Table 4 Corresponding pattern of each message sent from the slave node in case study 2

| Message | $M_{N 1}$ | $M_{N 2}$ | $M_{N 3}$ | $M_{N 4}$ | $M_{N 5}$ | $M_{N 6}$ | $M_{N 7}$ | $M_{N 8}$ | $M_{N 9}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Pattern | I(A) | II | I(C) | II | I(A) | II | I(C) | II | I(A) |

In conclusion, the method proposed in this study is effective for calculating the delay time distributions of the response messages for both the single slave node configuration and the master/slave configuration.

## 6 Conclusions and future work

In this paper, a novel message delay time distribution analysis method under errors has been proposed for the CAN network. The arrivals of an IC fault were modeled by the Poisson process, and the probability factor was developed to describe the causal relationship between the IC fault arrivals and the error interruptions on the bus. Then the message delay time distribution considering errors for the single slave node configuration was analyzed in detail. After elaborately analyzing the delay time distribution for typical patterns and cases of the message queues, the analysis framework of the message delay time distribution for the master/slave configuration considering both message blocking and error interruptions was proposed. A testbed was constructed and case studies were carried out to demonstrate and verify the delay time distribution analysis method. As shown in the case studies, the delay time distributions calculated using the proposed method agreed well with the actual observations for both single and master/slave configurations. Future work includes developing a systematic method to estimate delay distributions under complex stochastic errors and developing a model to describe the influence of the degree of concentration of the dominant bits on the probability factor.

## Compliance with ethics guidelines

Lei-ming ZHANG, Yi-chao SUN, and Yong LEI declare that they have no conflict of interest.

## References

Bosch R, 1991. CAN Specification Version 2.0. Technical Report, Robert Bosch GmbH. http://esd.cs.ucr.edu/webres/can20.pdf
Broster I, Burns A, Rodríguez-Navas G, 2005. Timing analysis of real-time communication under electromagnetic interference. Real-Time Syst, 30(1-2):55-81. https://doi.org/10.1007/s11241-005-0504-z
Chen X, Liu LY, Lü WJ, et al., 2012. Modeling and analysis of response time of CAN bus based on queueing theory. J Tianjin Univ, 45(3):228-235 (in Chinese). https://doi.org/10.3969/j.issn.0493-2137.2012.03.007
Davis RI, Navet N, 2012. Controller area network (CAN) schedulability analysis for messages with arbitrary deadlines in FIFO and work-conserving queues. Proc 9th IEEE Int Workshop on Factory Communication Systems, p.33-42. https://doi.org/10.1109/WFCS.2012.6242538
Davis RI, Burns A, Bril RJ, et al., 2007. Controller area network (CAN) schedulability analysis: refuted, revisited and revised. Real-Time Syst, 35(3):239-272. https://doi.org/10.1007/s11241-007-9012-7
Davis RI, Kollmann S, Pollex V, et al., 2011. Controller area network (CAN) schedulability analysis with FIFO queues. Proc 23rd Euromicro Conf on Real-Time Systems, p.45-56. https://doi.org/10.1109/ECRTS.2011.13
Davis RI, Kollmann S, Pollex V, et al., 2013. Schedulability analysis for controller area network (CAN) with FIFO queues priority queues and gateways. Real-Time Syst, 49(1):73-116. https://doi.org/10.1007/s11241-012-9167-8
Farsi M, Ratcliff K, Barbosa M, 1999. An overview of controller area network. Comput Contr Eng J, 10(3):113-120. https://doi.org/10.1049/cce:19990304

Hansson HA, Nolte T, Norström C, et al., 2002. Integrating reliability and timing analysis of CAN-based systems. IEEE Trans Ind Electron, 49(6):1240-1250. https://doi.org/10.1109/TIE.2002.804970

Kumar M, Kumar A, Srividya VA, 2009. Response-time modeling of controller area network (CAN). In: Garg V, Wattenhofer R, Kothapalli K (Eds.), Distributed Computing and Networking, Springer Berlin Heidelberg, p.163-174. https://doi.org/10.1007/978-3-540-92295-7_20
Lei Y, Yuan Y, Zhao JZ, 2014. Model-based detection and monitoring of the intermittent connections for CAN networks. IEEE Trans Ind Electron, 61(6):2912-2921. https://doi.org/10.1109/TIE.2013.2272277
Lei Y, Xie H, Yuan Y, et al., 2015. Fault location for the intermittent connection problems on CAN networks. IEEE Trans Ind Electron, 62(11):7203-7213. https://doi.org/10.1109/TIE.2015.2442518
Mubeen S, Mäki-Turja J, Sjödin M, 2014. Extending worst case response-time analysis for mixed messages in controller area network with priority and FIFO queues. IEEE Access, 2:365-380. https://doi.org/10.1109/ACCESS.2014.2319255
Mubeen S, Mäki-Turja J, Sjödin M, 2015. Integrating mixed transmission and practical limitations with the worst-case response-time analysis for controller area network. J Syst Softw, 99:66-84. https://doi.org/10.1016/j.jss.2014.09.005
Navet N, Song YQ, Simonot F, 2000. Worst-case deadline failure probability in real-time applications distributed over controller area network. J Syst Archit, 46(7):607-617. https://doi.org/10.1016/S1383-7621(99)00016-8

Sun YC, Yang F, Lei Y, 2015. Message response time distribution analysis for controller area network containing errors. Chinese Automation Congress, p.1052-1057. https://doi.org/10.1109/CAC.2015.7382654
Tindell K, Burns A, 1994. Guaranteed Message Latencies for Distributed Safety-Critical Hard Real-Time Control Networks. Technical Report, Real-Time System Research Group, Department of Computer Science, University of York, England.
Tindell K, Burns A, Wellings AJ, 1994. An extendible approach for analyzing fixed priority hard real-time tasks. Real-Time Syst, 6(2):133-151. https://doi.org/10.1007/BF01088593
Tindell K, Burns A, Wellings AJ, 1995a. Analysis of hard real-time communications. Real-Time Syst, 9(2):147-171. https://doi.org/10.1007/BF01088855

Tindell K, Burns A, Wellings AJ, 1995b. Calculating controller area network (CAN) message response times. Contr Eng Pract, 3(8):1163-1169. https://doi.org/10.1016/0967-0661(95)00112-8
Yomsi PM, Bertrand D, Navet N, et al., 2012. Controller area network (CAN): response time analysis with offsets. Proc 9th IEEE Int Workshop on Factory Communication Systems, p.43-52. https://doi.org/10.1109/WFCS.2012.6242539
Zeng HB, di Natale M, Giusto P, et al., 2010. Using statistical methods to compute the probability distribution of message response time in controller area network. IEEE Trans Ind Inform, 6(4):678-691. https://doi.org/10.1109/TII.2010.2050143


[^0]:    $\ddagger$ Corresponding author

    * Project supported by the National Natural Science Foundation of China (Nos. 51475422 and 51521064)
    \# A preliminary version was presented at the Chinese Automation Congress, China, November 27-29, 2015
    ORCID: Yong LEI, http://orcid.org/0000-0003-0235-5203
    © Zhejiang University and Springer-Verlag GmbH Germany, part of Springer Nature 2019

