Theory--non-parametric system identification

\(\newcommand{\Gz}{G_z}\) \(\newcommand{\Gs}{\ensuremath{G_s}\xspace}\) \(\newcommand{\w}{\omega}\) \(\newcommand{\ejw}{\ensuremath{e^{j\omega_o t}}}\) \(\newcommand{\Hjw}{\ensuremath{H(j\omega_o)}}\) \(\newcommand{\Gjw}{\ensuremath{G(j\omega_o)}}\) \(\newcommand{\F}{\ensuremath{\mathcal{F}}}\) \(\newcommand{\wo}{\ensuremath{\omega_o}}\)

System identification is a huge topic. What is described here is necessarily limited, and is more an exposition of what I do for the AFM than it is a general overview. For the kind of control that I want to do, I think of system identification in two parts:

  1. Non-parametric system identification. For my purposes, this means obtaining a non-parametric frequency response function (NP-FRF). Non-parametric means that our description of the transfer function from input to output is a list of data points, rather than, e.g., a \(z\)-domain transfer function like we often think of it. While control designs can certainly be made using an NP-FRF, we are limited in what we can do.
  2. Parametric system identification. In our context, this means fitting a parametric model to the NP-FRF data. What we fit will be, e.g., transfer function coefficients or the parameters of a state-space model. This will allow us to throw the full force of control theory behind our designs.

Two Fundamental Ideas

Non-parametric frequency response identification for linear systems leverages two crucial facts.

  1. Any reasonable signal \(y(t)\) which is \(T\)-periodic (meaning \(y(t) = y(t+T)\)) can be expressed as a linear combination of the complex exponentials \(\varphi_k(\w_o) = e^{j\w_okt}\), where \(\w_o = 2\pi/T\). Moreover, the \(\varphi_k(\w_o)\) form an orthogonal basis over \(\mathbb{T}_T\).
  2. For a linear system with impulse response \(h(t)\), \(\varphi_k\) is an eigenfunction. The eigenvalue associated with \(\varphi_k(\w_o)\) is the system's frequency response at \(k\w_o\), \(H(jk\w_o)\).

It is rather involved to prove that the \(\varphi_k(\w_o)\) form a basis for \(\mathbb{T}_T\) 1. It is, however, pretty easy to show that they are orthonormal, by which we mean that \(\langle\varphi_k, \varphi_{\ell}\rangle = \delta_{k\ell}\). Explicitly, using the inner product over \(\mathbb{T}_T\), we have

\begin{align} \langle \varphi_k, \varphi_{\ell}\rangle &= \frac{1}{T}\int_{T} e^{-j\w_okt}e^{j\w_o\ell t} dt\\ &= \frac{1}{T}\int_{T} e^{j\w_ot (\ell - k)} dt\\ &= \begin{cases} 1 & k=\ell\\ 0 & k\neq\ell. \end{cases} \end{align}

Now, let's first think about the idea that \(\varphi_k(\w_o)\) is an eigenfunction of \(h(t)\). If we drive \(h(t)\) with a complex exponential of frequency \(\w_o\), the output is given by the convolution

\begin{align} h(t)*e^{j\w_ot} &=\int_{\mathbb{R}} h(\tau)e^{j\w_o(t-\tau)}d\tau\\ &= e^{j\w_ot}\int_{\mathbb{R}} h(\tau)e^{-j\w_o\tau}d\tau\\ &= H(j\w_o)e^{j\w_ot}. \end{align}

Since \(h(t)\) is a linear system, and convolution is a linear operator, we can represent this as the linear mapping \(\mathcal{A}:~\varphi_k\mapsto \Hjw \varphi_k\) or

\begin{equation} \mathcal{A}e^{j\w_ot} = H(j\w_o)e^{j\w_ot}. \end{equation}

In system identification, our goal is to find \(H(j\w_o)\). To do this, we experimentally let our system operate on the eigenfunction \(e^{j\w_ot}\) and try to determine the eigenvalue \(H(j\w_o)\). Of course, we can't measure \(H(j\w_o)\) directly. What we measure instead is the system output

\begin{equation} y(t) = H(j\w_o)e^{j\w_ot}.\label{eqn:out1} \end{equation}

How do we get \(\Hjw\) from \eqref{eqn:out1}? We recognize that \(\ejw\) is a member of the Fourier basis. It follows that the only non-zero coefficient in the Fourier series expansion of \(y(t)\) is \(y_1=\Hjw\). In other words,

\begin{align} y(t) & = \sum_{k=-\infty}^{\infty} y_k e^{j\w_okt} \\ y_k= &\begin{cases} H(\pm j\w_o), & k=\pm1\\ 0, & \text{else} \end{cases} \end{align}

So our job is to calculate the first Fourier coefficient of \(y(t)\), which is given by

\begin{align} y_1 = \langle e^{j\w_o 1 t}, y(t)\rangle = \frac{1}{T}\int_{T} e^{-j\w_o t } y(t)dt.\label{eqn:calcy1} \end{align}

Of course, in practice, we only measure \(y(t)\) at discrete points, so we will approximate \eqref{eqn:calcy1} with a sum, which we will discuss further in the Practical Calculation section below.

Let's take another look at \eqref{eqn:out1}. Pretend for a minute that these are finite dimensional vectors. To aid in our suspension of disbelief, write \(y(t)\) as \(y\), \(\Hjw\) as \(\lambda\), and \(\ejw\) as \(v\). Then \eqref{eqn:out1} looks like

\begin{equation} y = \lambda v. \end{equation}

We want to solve for \(\lambda\). We know how to do that: we just multiply both sides by \(v^T\).

\begin{align} v^Ty &= v^Tv \lambda\\ \lambda &=\frac{1}{v^Tv}v^Ty. \end{align}

Or in terms of the fancy inner product,

\begin{equation} \lambda = \frac{1}{\langle v, v \rangle} \langle v, y \rangle. \end{equation}

This is exactly what we did with the Fourier integral in \eqref{eqn:calcy1}. It just so happens that, because our inner product on \(\mathbb{T}_T\) is normalized by \(1/T\), \(\langle v,v\rangle = 1\) (the \(e^{j\w_o k t}\) are an orthonormal basis).
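
As a quick finite-dimensional sanity check of this projection formula (the vectors and the "true" eigenvalue below are made up for illustration), note that with complex vectors the transpose becomes a conjugate transpose:

  import numpy as np

  v = np.array([1 + 1j, 2 - 1j, 0.5j])   # a made-up "eigenvector"
  lam_true = 0.3 - 0.7j                   # a made-up "eigenvalue"
  y = lam_true * v

  lam = np.vdot(v, y) / np.vdot(v, v)     # <v, y> / <v, v>, conjugation on the first argument
  print(np.allclose(lam, lam_true))       # True
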

Now, let's complicate things slightly. Suppose that we change the amplitude and phase of our input signal from \(\ejw\) to \(M e^{j(\w_o t + \phi)}\), i.e., apply an amplitude scaling and a phase shift. This is just multiplication by a complex number

\begin{equation} u(t) = M e^{j(\w_o t + \phi)} = (M e^{j\phi}) \ejw \end{equation}

Now, because the mapping \(\mathcal{A}: u(t) \mapsto h(t) * u(t)\) is linear, we will find that

\begin{align} h(t)*u(t) & = (M e^{j\phi}) \Hjw \ejw\\ & = y(t). \end{align}

Following the same argument as before, we see that

\begin{align} y(t) &= y_1\ejw\\ y_1 &= \langle\ejw, y(t)\rangle\\ & = (Me^{j\phi}) \Hjw. \end{align}

In practice, we may not know precisely what \(Me^{j\phi}\) is, especially when we are doing things in closed loop. But we do know that

\begin{align} u(t) & = \sum_{k=-\infty}^{\infty} u_k e^{j\w_okt}, \\ u_1 &= Me^{j\phi},\\ u_{k\neq 1} &= 0. \end{align}

Thus \(y_1 = u_1 \Hjw\), so that

\begin{equation} \Hjw = \frac{y_1}{u_1} = \frac{ \langle\ejw, y(t)\rangle}{\langle\ejw, u(t)\rangle}. \end{equation}
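
As a concrete sketch of this ratio: given an integer number of periods of sampled input and output, we can approximate both inner products with sums and divide. The function name and signal parameters below are my own, chosen purely for illustration.

  import numpy as np

  def freq_resp_estimate(u, y, wo, Ts):
      """Estimate H(j*wo) = <e^{j wo t}, y> / <e^{j wo t}, u>.

      Assumes u and y contain an integer number of periods of a
      sinusoid at frequency wo, sampled every Ts seconds.
      """
      n = np.arange(len(u))
      phi = np.exp(-1j * wo * Ts * n)      # conjugate of the Fourier basis vector
      y1 = np.sum(phi * y) / len(y)        # fundamental Fourier coefficient of y
      u1 = np.sum(phi * u) / len(u)        # fundamental Fourier coefficient of u
      return y1 / u1

  # Example: a fake "system" that scales by 0.5 and shifts the phase by 45 degrees.
  Ts, wo = 1e-3, 2 * np.pi * 10            # 10 Hz test frequency
  t = np.arange(0, 1.0, Ts)                # exactly 10 periods
  u = np.cos(wo * t)
  y = 0.5 * np.cos(wo * t - np.pi / 4)
  H = freq_resp_estimate(u, y, wo, Ts)
  print(abs(H), np.angle(H))               # ~0.5 and ~-0.785 rad
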

What about noise?

Up to this point, we have thought about a perfectly linear, time-invariant system with perfect measurements. Consequently, our calculations have been exact. Suppose instead we have measurement noise, which we express as

\begin{equation} y(t) = h(t)*u(t) + \eta(t), \end{equation}

where \(\eta(t)\) is the noise.

noisy_periodic_extension.svg
Figure 1: Noisy sinusoid with periodic extension. We have collected the blue data and, for the sake of the Fourier integral, pretend that this data is periodic (orange).

Of course, it is unlikely that \(\eta(t)\) will also be \(T\)-periodic. Nonetheless, we can pretend that it is, making use of what's known as a periodic extension (or think of working over \(L^2_{[0,T]}\), for which the \(\varphi_k\) are still a perfectly valid basis [1], instead of on the torus \(\mathbb{T}_T\)). The idea is illustrated in Fig. 1. The blue curve in that figure is a sinusoid with additive noise. This is the data we collect from our ADC. We want to estimate the amplitude and phase of that data using the method we developed above. So, to satisfy Mr. Fourier and our need for \(T\)-periodic signals, we pretend that if we had measured the previous period and the next period we would have gotten exactly the same data.

Our goal now is to find the best estimate of \(y(t)\) projected onto the one-dimensional subspace spanned by the Fourier vector \(\ejw\). I will call this subspace \(\F_1\). What lives in that subspace? Any sinusoid with natural frequency \(\w_o\). We can scale that basis vector by any complex number and it will still be a member of \(\F_1\). This is the same thing as noting that, in \(\mathbb{R}^2\), \(2[1, ~0]^T\) still lies on the \(x\)-axis. Scaling \(\ejw\) by a complex number means we can change both its amplitude and phase, but not its frequency.

Our goal is to find the complex number which scales \(\ejw\) to be as close as possible to \(y(t)\). In other words, we want to solve

\begin{equation} \min_{\Hjw} || y(t) - \Hjw\ejw||.\label{eqn:fopt} \end{equation}

Hopefully, this looks remarkably similar to a least squares problem to you.

I make the following claim:

Suppose (for simplicity) that \(u(t) = \ejw\). Then

\begin{equation} \Hjw = \langle\ejw, y(t)\rangle \end{equation}

solves \eqref{eqn:fopt}.

Let's prove that claim. Let \(y^* = \Hjw\ejw\) and let \(\hat y\) be any other vector in \(\F_1\). Then

\begin{align} ||y - \hat y||^2 & = || y - \hat y + y^* - y^*||^2\label{eqn:optpf1}\\ & = \langle (y - y^*) + (y^* - \hat y), (y - y^*) + (y^* - \hat y)\rangle\label{eqn:optpf2}\\ & = ||y - y^*||^2 + 2\mathrm{Re}\langle y-y^*, y^* - \hat y\rangle + ||y^* - \hat y||^2 \label{eqn:optpf3}\\ & \geq ||y - y^*||^2 + 2\mathrm{Re}\langle y-y^*, y^* - \hat y\rangle. \label{eqn:optpf4} \end{align}

The first line follows because \(y^*-y^* = 0\). The second line is an expansion of the norm into the inner product. The third line follows from the additivity of the inner product in each argument and because \(\langle a, b\rangle = \overline{\langle b, a\rangle}\). And the fourth line follows because \( ||y^* - \hat{y}||^2 \geq 0\). Now, all we have to do is show that the second term of \eqref{eqn:optpf4} is always zero.

Why should that be the case? Because both \(y^*\) and \(\hat y\) are in \(\F_1\), \(z=y^* - \hat y \in \F_1\). We also see from the expansion of \(y(t)\) in the Fourier basis, and our calculation of \(y^*\), that

\begin{align} y(t) - y^* &= \left(\sum_{k=-\infty}^{\infty} y_k e^{j\w_o k t} \right) - \Hjw\ejw\\ & = \sum_{k=-\infty, k\neq 1}^{\infty} y_k e^{j\w_o k t}. \end{align}

If we want to be pedantic, we write

\begin{align} \langle y-y^*, z\rangle &= \lim_{N\rightarrow \infty} \sum_{k=-N, k\neq1}^{N}\langle y_k e^{j\w_o k t}, z\rangle\\ &= 0, \end{align}

since \(z\in\F_1\) and every \(e^{j\w_o k t}\) with \(k\neq 1\) is orthogonal to \(\F_1\).

Thus, \((y(t) - y^*) \perp (y^* - \hat y)\), which proves the result.

Will we recover \(\Hjw\) exactly with this procedure? In general, no. It is unlikely that the first Fourier coefficient of \(\eta\) is zero, so that will be included in our estimate. We will discuss two ways to mitigate this effect in the sections below on dealing with the non-ideal case.

Practical Calculation

Given a sequence of measured input/output data, we must approximately compute the Fourier integral. The two most straightforward methods are to approximate the integral with a rectangle (square) rule or a trapezoid rule. I will show here that if we have a \(T\)-periodic signal sampled at regular intervals \(T_s\), and if \(T/T_s \in \mathbb{N}\), then the trapezoid approximation is exactly equal to the square approximation. First, consider integrating a signal, not necessarily periodic, from \(t=0\) to \(t=T\). This is illustrated in Figure 2. Assume the signal has been sampled with sampling interval \(T_s\) and that \(N = T/T_s\) is an integer.

trapz_vs_square.svg
Figure 2: Illustration of square vs trapezoidal approximation for a non-periodic signal.

Approximating the integral with a square rule requires \(N\) samples, while the trapezoidal approximation requires \(N+1\) samples. The trapezoid approximation is given by

\begin{align} \int_0^T y(t) dt &\approx \sum_{k=1}^{N} \frac{T_s}{2} (y_k + y_{k-1})\nonumber\\ &= \frac{T_s}{2}\sum_{k=0}^N y_k + \frac{T_s}{2} \sum_{\ell=0}^{N}y_{\ell} - \frac{T_s}{2}(y_0+y_N)\nonumber\\ &= T_s\sum_{k=0}^N y_k - \frac{T_s}{2}(y_0+y_N). \label{eqn:trapzsimp} \end{align}

Now, suppose that \(y(t)\) is periodic with period \(T\), so that for all \(t\), \(y(t) = y(t+T)\), and in particular, \(y(0)=y(T)\), i.e., \(y_0 = y_N\). Then \eqref{eqn:trapzsimp} becomes

\begin{align*} T_s\sum_{k=0}^N y_k - \frac{T_s}{2}(y_0+y_N) &=T_s \sum_{k=0}^{N-1} y_k + T_sy_N - \frac{T_s}{2}(y_N+y_N)\\ &= T_s\sum_{k=0}^{N-1} y_k. \end{align*}
trapz_vs_square_periodic.svg
Figure 3: Illustration of square vs trapezoidal approximation for a periodic signal.

This is precisely the sum we would have if we approximated the integral with a square rule instead of a trapezoid rule. Putting this together with the Fourier integral, we have

\begin{align} y_k &= \frac{1}{T} \int_0^T e^{-j\w_o k t}y(t) dt \approx \frac{T_s}{T}\sum_{\ell=0}^{N-1} e^{-j\w_o k T_s\ell}\,y(\ell T_s)\nonumber\\ &=\frac{1}{N}\sum_{\ell=0}^{N-1} e^{-j\w_o k T_s\ell}\,y(\ell T_s).\label{eqn:dft_omega} \end{align}

Unsurprisingly, this is precisely the definition of the Discrete Fourier Transform (DFT). The form of \eqref{eqn:dft_omega} can be put into the standard form of the DFT by noting that \(\w_o T_s = \frac{2\pi}{NT_s}T_s = \frac{2\pi}{N}\), which gives

\begin{align} y_k &=\frac{1}{N}\sum_{\ell=0}^{N-1} e^{\frac{-j 2\pi k}{N} \ell}\,y(\ell T_s).\label{eqn:dft_normal} \end{align}

It is rather fascinating that, for an evenly sampled, strictly periodic signal \(y(t)\), the sum \eqref{eqn:dft_normal} exactly yields the integral \eqref{eqn:calcy1} (and its counterparts for the other harmonics) for the first \(\pm N/2\) coefficients.
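
Here is a small numerical check of these claims, using a made-up periodic test signal: the trapezoid sum with the endpoint sample equals the plain rectangle sum without it, and the rectangle-rule Fourier sum recovers the fundamental coefficient exactly (up to round-off).

  import numpy as np

  T, N = 0.1, 100
  Ts = T / N
  wo = 2 * np.pi / T
  t = np.arange(N + 1) * Ts                        # N+1 samples, including t = T
  y = 1.5 * np.cos(wo * t + 0.4) + 0.3 * np.cos(3 * wo * t)

  # Trapezoid rule (N+1 samples) vs rectangle rule (N samples) for the periodic y.
  trap = Ts * (0.5 * y[0] + np.sum(y[1:-1]) + 0.5 * y[-1])
  rect = Ts * np.sum(y[:-1])
  print(trap - rect)                               # zero up to round-off

  # The rectangle-rule Fourier sum recovers the first coefficient exactly.
  y1 = np.sum(np.exp(-1j * wo * t[:-1]) * y[:-1]) / N
  print(y1, 0.75 * np.exp(0.4j))                   # both ~0.75*e^{j0.4}
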

The DFT from a linear algebra perspective

It is pretty easy to derive the DFT strictly from a linear algebra perspective. Previously, we talked about a signal \(y(t)\). Now, we think of a vector \(y\in \mathbb{C}^N\). If you like, this is the sampled signal \(y(T_sk)\). We claim that the set of vectors

\begin{equation} \{v_k\}_{k=0}^{N-1}, \qquad v_k = \begin{bmatrix} e^{jk\w_oT_s\cdot 0}\\ e^{jk\w_oT_s\cdot 1}\\ \vdots\\ e^{jk\w_oT_s(N-1)} \end{bmatrix} = \begin{bmatrix} e^{\frac{j2\pi k}{N}0}\\ e^{\frac{j2\pi k}{N}1}\\ \vdots\\ e^{\frac{j2\pi k}{N}(N-1)}\\ \end{bmatrix} \end{equation}

form a basis for \(\mathbb{C}^N\). In contrast to the infinite dimensional case (the continuous-time Fourier series), because we are now working in a finite dimensional space, it is sufficient to show that \(v_k\perp v_{\ell}\) for \(k\neq\ell\) to show that the \(N\) vectors \(\{v_k\}\) form a basis. We proceed by direct calculation:

\begin{align} \langle v_k,~v_{\ell}\rangle &= \sum_{n=0}^{N-1} e^{\frac{-j2\pi k}{N}n} e^{\frac{j2\pi \ell}{N} n}\\ &= \sum_{n=0}^{N-1} e^{\frac{j2\pi (\ell - k)}{N}n}. \end{align}

It should be clear that if \(k=\ell\), then the exponential becomes one, so that the sum is equal to \(N\). If \(k\neq\ell\), then we use the geometric sum to find that

\begin{align} \sum_{n=0}^{N-1} e^{\frac{j2\pi (\ell - k)}{N}n} = \frac{1 - e^{j2\pi (\ell - k)}} {1 - e^{\frac{j2\pi (\ell - k)}{N}}} = 0 \end{align}

Thus, the \(v_k\) form a basis for \(\mathbb{C}^N\). Given a vector \(y\in\mathbb{C}^N\), we can represent it in this basis as the linear combination of \(v_k\)

\begin{equation} y = \frac{1}{N}\sum_{k=0}^{N-1} Y_kv_k, \end{equation}

where the coefficients \(Y_k\) are found by projecting \(y\) onto each basis vector (the factor \(1/N\) accounts for the fact that \(\langle v_k, v_k\rangle = N\) rather than 1),

\begin{equation} Y_k = \langle v_k,~ y\rangle. \end{equation}

In other words

\begin{equation} \begin{bmatrix} Y_0\\Y_1\\\vdots\\Y_{N-1} \end{bmatrix} = \begin{bmatrix} ---& v_0^*& ---\\ ---&v_1^*&---\\ &\vdots&\\ ---&v_{N-1}^*&--- \end{bmatrix} y \end{equation}

This is the "SFT" (Slow Fourier Transform) of \(y\). The FFT (Fast Fourier Transform) just does this same thing with high computational efficiency.
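
A minimal sketch of that matrix picture (the variable names are my own): build the conjugated basis vectors as the rows of a matrix and compare the resulting matrix-vector product against numpy's FFT, which computes the same numbers far more efficiently.

  import numpy as np

  N = 8
  n = np.arange(N)
  # Row k is v_k^*, i.e. e^{-j 2*pi*k*n/N} for n = 0..N-1.
  W = np.exp(-2j * np.pi * np.outer(n, n) / N)

  y = np.random.randn(N)
  Y_slow = W @ y                    # the "SFT": an O(N^2) matrix-vector product
  Y_fast = np.fft.fft(y)            # the FFT: the same numbers, O(N log N)
  print(np.allclose(Y_slow, Y_fast))   # True
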

Dealing with the non-ideal case in practice

In general, when we hit our (supposedly) linear system \(g(\cdot)\) with an eigenvector \(\ejw\), we will not get back a perfect \(\Hjw\ejw\). Among the issues causing this are

  1. Transients. The exponentials \(\ejw\) are only eigenvectors if they are applied for all time (starting infinitely long ago).
  2. Noise. We will always collect a signal \(z(t) = y(t) + \eta(t)\), where \(\eta(t)\) stems from some random process.
  3. Non-linearity. Our system will never be perfectly linear. In the frequency response, we expect this to show up as non-trivial amounts of energy at the higher harmonics, i.e., at \(e^{jn\w_o t}\) for \(n > 1\).

We consider each item in turn.

Transients

The exponential \(\ejw\) is only an eigenvector of \(g(\cdot)\) if it is applied from \(t=-\infty\) onward. Such a situation is not practical. Instead, we must apply \(\ejw\) for long enough that the transients have died out; applying the sinusoid for, say, 10 times the slowest time constant of the system is a good rule of thumb. We should only use data after this ``settling'' period in our Fourier calculations.
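
A minimal sketch of this bookkeeping, assuming the dominant pole (and hence the slowest time constant) is roughly known; the pole value and record length below are placeholders.

  import numpy as np

  Ts = 1e-3
  p = 0.98                                   # dominant discrete-time pole (assumed known)
  tau = -Ts / np.log(p)                      # equivalent time constant, ~0.05 s here
  n_settle = int(np.ceil(10 * tau / Ts))     # rule of thumb: discard ~10 time constants

  # u_raw, y_raw: full records from the experiment (placeholders for illustration)
  u_raw, y_raw = np.zeros(5000), np.zeros(5000)
  u, y = u_raw[n_settle:], y_raw[n_settle:]  # only settled data enters the Fourier sums
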

Noise

Even with a perfectly linear system and no transients left, we will always have at least some measurement noise. There are at least two ways to deal with this. The first is to integrate over more than one period. In other words, we change from \eqref{eqn:calcy1} to

\begin{equation} \frac{1}{MT_o}\int_0^{MT_o}e^{-j\omega_o t}y(t)dt, \end{equation}

where \(M\) is some integer.
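
A sketch of this longer integration, with made-up signal parameters: exactly \(M\) whole periods go into a single Fourier sum, and the noise contribution averages down.

  import numpy as np

  Ts, wo, M = 1e-3, 2 * np.pi * 5, 20
  T_o = 2 * np.pi / wo
  N = int(round(M * T_o / Ts))              # samples in M whole periods
  t = np.arange(N) * Ts
  rng = np.random.default_rng(0)
  y = 0.7 * np.cos(wo * t + 0.3) + 0.1 * rng.standard_normal(N)

  # One long Fourier sum over all M periods (discrete form of the integral above).
  y1 = np.sum(np.exp(-1j * wo * t) * y) / N
  print(abs(y1), np.angle(y1))              # ~0.35 and ~0.3 rad, noise suppressed by averaging
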

The other method, broadly speaking, is to average the Fourier coefficients. The standard way to do this is with the auto and cross power spectra, which calculates \(G(j\omega_o)\) as

\begin{equation} G(j\omega_o) = \frac{P_{yu^*}(j\omega_o)}{P_{uu^*}(j \omega_o)}. \label{eqn:GPyuPuu} \end{equation}

Suppose we have collected \(M\) periods at frequency \(\omega_o\) and have computed the Fourier coefficients of \(y(t)\) and \(u(t)\) for each period, so that we have \(M\) coefficients of each. Label these coefficients as \(Y_k^m(j\omega_o)\) and \(U_k^m(j\omega_o)\). The superscript \(m\) runs from \(m=1\dots M\) and indexes our \(M\) periods. For this discussion, we are primarily interested in \(k=1\), the first (fundamental) Fourier coefficient. Then we have

\begin{align} P_{yu^*} &= E[Y(j\omega_o)U^*(j\omega_o)] \approx \frac{1}{M}\sum_{m=1}^M Y_1^m(j\omega_o)\overline{U_1^m(j\omega_o)}\\ P_{uu^*} &= E[U(j\omega_o)U^*(j\omega_o)] \approx \frac{1}{M}\sum_{m=1}^M U_1^m(j\omega_o)\overline{U_1^m(j\omega_o)} \end{align}

where the bar indicates complex conjugation and \(E[\cdot]\) is the expectation. In the following, we assume that the input is noise free and that only the measured output \(y(t)\) is corrupted by noise. For a more comprehensive development, see [2]. We have then

\begin{equation} Y(j\wo) = \Gjw U(j\wo) + N(j\wo),\label{eqn:Yjw_N} \end{equation}

which, if we perform the naive calculation, yields,

\begin{equation} \Gjw =\frac{Y(j\wo)}{ U(j\wo)} - \frac{N(j\wo)}{U(j\wo)}. \end{equation}

Consider instead multiplying \eqref{eqn:Yjw_N} by \(U^*\) and taking the expectation. We obtain

\begin{align} E[Y(j\wo)U^*(j\wo)] &= E\left[\left(\Gjw U(j\wo) + N(j\wo)\right)U^*(j\wo)\right]\\ &= \Gjw~E[U(j\wo)U^*(j\wo)] + E[N(j\wo)U^*(j\wo)]\\ &= \Gjw~E[U(j\wo)U^*(j\wo)] \end{align}

where the second line follows from the linearity of expectation and the third because the measurement noise and input are uncorrelated. Thus,

\begin{equation} \Gjw = \frac{E[Y(j\wo)U(j\wo)^*]}{E[U(j\wo)U(j\wo)^*]} \end{equation}
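
A hedged sketch of \eqref{eqn:GPyuPuu}: compute the fundamental Fourier coefficient of \(u\) and \(y\) one period at a time, then average the cross and auto products. The function names and the assumption of exactly \(M\) whole periods in the records are mine.

  import numpy as np

  def fundamental_coeff(x, wo, Ts):
      """First Fourier coefficient of one period of x, sampled every Ts seconds."""
      n = np.arange(len(x))
      return np.sum(np.exp(-1j * wo * Ts * n) * x) / len(x)

  def G_from_spectra(u, y, wo, Ts, M):
      """Estimate G(j*wo) = P_yu* / P_uu* from M whole periods of u and y."""
      N = len(u) // M                        # samples per period
      Pyu, Puu = 0.0, 0.0
      for m in range(M):
          U1 = fundamental_coeff(u[m*N:(m+1)*N], wo, Ts)
          Y1 = fundamental_coeff(y[m*N:(m+1)*N], wo, Ts)
          Pyu += Y1 * np.conj(U1)            # accumulate cross spectrum
          Puu += U1 * np.conj(U1)            # accumulate auto spectrum
      return Pyu / Puu                       # the 1/M factors cancel in the ratio
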

What's the Practical Difference?

So what is the difference in practice between integrating over \(M\) periods versus using the cross and auto spectra? Let us investigate. Consider here a simple plant given by

\begin{equation} G(z) = \frac{4T_{s}}{z -0.9995} \end{equation}

with a sampling period \(T_{s} = 0.001\).

There are several things we might compare here. The most obvious is to see which method gives a better estimate of \(\Gjw\) when we add some Gaussian noise to the measurement.
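
Below is a sketch of the kind of simulation behind such a comparison (not the exact script used for Figure 4): drive the plant with a sinusoid, discard the transient, add Gaussian measurement noise, and compute both the single long Fourier sum and the period-by-period averaged spectra. The excitation amplitude, test frequency, and record lengths are arbitrary choices of mine.

  import numpy as np
  from scipy.signal import lfilter

  Ts = 1e-3
  b, a = [0.0, 4 * Ts], [1.0, -0.9995]          # G(z) = 4*Ts / (z - 0.9995)

  wo = 2 * np.pi * 10                            # test frequency (10 Hz)
  N = int(round(2 * np.pi / (wo * Ts)))          # samples per period (100 here)
  M, M_settle = 50, 250                          # periods kept / discarded as transient

  t = np.arange((M_settle + M) * N) * Ts
  rng = np.random.default_rng(1)
  u = 5.0 * np.sin(wo * t)
  y = lfilter(b, a, u) + 0.2 * rng.standard_normal(len(t))

  u_ss, y_ss = u[M_settle * N:], y[M_settle * N:]        # steady-state data only
  t_ss = t[M_settle * N:]

  G_true = 4 * Ts / (np.exp(1j * wo * Ts) - 0.9995)      # exact response at wo

  # Method 1: one long Fourier sum over all M steady-state periods.
  phi = np.exp(-1j * wo * t_ss)
  G_long = np.sum(phi * y_ss) / np.sum(phi * u_ss)

  # Method 2: averaged cross/auto spectra, period by period.
  e1 = np.exp(-1j * wo * Ts * np.arange(N))
  Pyu = Puu = 0.0
  for m in range(M):
      sl = slice(m * N, (m + 1) * N)
      U1 = np.sum(e1 * u_ss[sl]) / N
      Y1 = np.sum(e1 * y_ss[sl]) / N
      Pyu += Y1 * np.conj(U1)
      Puu += U1 * np.conj(U1)
  G_avg = Pyu / Puu

  print(abs(G_long - G_true), abs(G_avg - G_true))       # estimation error of each method
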

M_vs_syy.svg
Figure 4: Comparison of integrating over \(M\) periods versus estimating the expectations \(P_{yu^*}\) and \(P_{uu^*}\) from \(M\) single-period coefficients.

The left plot illustrates the difference between the two schemes when there is no noise present, for \(M\) between 2 and 200. The \(y\)-axis is the \(\log_{10}\) of the norm of the error. Integration over all \(M\) periods evidently suffers from accumulation error, while the other method largely buffers us from this. Either way, the errors are small: on the order of \(10^{-13}\), which is, at least in this case, negligible.

The right plot in Figure 4 makes the same comparison when the additive measurement noise has a standard deviation of \(\sigma=0.2\), for different values of \(M\). It is important that, in these simulations, the random noise for each trial is taken as a slice of the same random vector.

My conclusion from this is that, as far as the accuracy of estimating \(\Gjw\) is concerned, there is little if any benefit to the extra complexity of calculating \(P_{yu^*}\) and \(P_{uu^*}\). One advantage of the power spectra method is that it allows us to compute things like the coherence and other statistics.

Non-linearities

The stepped-sine method described here deals with non-linearities in our system more gracefully than competing frequency response identification methods.

References

[1] E. Kreyszig, Introductory Functional Analysis with Applications. Hoboken, NJ: John Wiley and Sons, 1989.
[2] K. Godfrey, Perturbation Signals for System Identification. Hertfordshire, UK: Prentice Hall, 1993.
[3] J. K. Hunter and B. Nachtergaele, Applied Analysis. Hackensack, NJ: World Scientific Publishing Co., 2001.

Footnotes:

1

When we say that the \(\varphi_k\) form a basis, we do not mean that \(y(t)=\sum_{k\in\mathbb{Z}} \varphi_k y_k\) pointwise. What we mean is that

\begin{equation} \lim_{N\rightarrow\infty} ||y(t) - \sum_{k=-N}^N\varphi_k y_k||_{\mathbb{T}} = 0. \end{equation}

This implies that the signal \(y(t)\) and its representation in the Fourier basis can differ at a countable number of points, since integration over those points contributes nothing. If you are brave, see [3] or [1] for details.