Recent works such as the S4 paper and the Hyena Hierarchy have brought the equivalence between linear state-space models and convolutional models to the attention of the ML community. Here, we explore the correspondence between state-space models and *autoregressive* models. This relationship can be understood through algebraic manipulation of the shift operator.

A linear discrete-time state-space model is represented by:

\begin{align} x(t+1) &= Ax(t) + Bu(t) \\ y(t) &= Cx(t) + Du(t) \end{align}

where $u\in\mathbb{R}^m$ is the input, $x\in\mathbb{R}^n$ is the state, and $y\in\mathbb{R}^p$ is the output. The matrices $A, B, C,$ and $D$ are of compatible dimensions.
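As a concrete illustration, the recursion above can be stepped forward directly. The following is a minimal NumPy sketch rather than a reference implementation; the function name `simulate_state_space`, the zero default initial state, and the `(T, m)` layout of the input array are assumptions made for this example.

```python
import numpy as np

def simulate_state_space(A, B, C, D, u, x0=None):
    """Step x(t+1) = A x(t) + B u(t), y(t) = C x(t) + D u(t) over the inputs.

    u has shape (T, m); the returned outputs have shape (T, p).
    The initial state defaults to zero (an assumption of this sketch).
    """
    A, B, C, D = (np.atleast_2d(np.asarray(M, dtype=float)) for M in (A, B, C, D))
    u = np.asarray(u, dtype=float)
    x = np.zeros(A.shape[0]) if x0 is None else np.asarray(x0, dtype=float)
    ys = []
    for u_t in u:
        ys.append(C @ x + D @ u_t)   # output equation
        x = A @ x + B @ u_t          # state update
    return np.array(ys)

# Scalar example (n = m = p = 1) driven by a unit impulse.
u = np.array([[1.0], [0.0], [0.0], [0.0]])
y = simulate_state_space([[0.5]], [[1.0]], [[1.0]], [[0.0]], u)
# y holds the impulse response 0, 1, 0.5, 0.25
```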

An $n$-th order autoregressive model expresses the current output $y(t)$ as a function of the past $n$ outputs, and past-and-current inputs $u(t)$, formulated as:

\begin{align} y(t) = \sum_{i=1}^n \alpha_iy(t-i) + \sum_{j=0}^n \beta_ju(t-j) \end{align}

with $\alpha_i\in\mathbb{R}^{p\times p}$ and $\beta_j\in\mathbb{R}^{p\times m}$. Note that the indices $i$ and $j$ range differently.

This is sometimes called an **ARX** model (autoregressive with external inputs).
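The ARX recursion can be evaluated just as directly. The sketch below assumes the scalar case ($p=m=1$) and that all signals are zero before $t=0$; the name `simulate_arx` and the list layout of `alpha` and `beta` are hypothetical choices for this example.

```python
import numpy as np

def simulate_arx(alpha, beta, u):
    """Scalar ARX recursion (p = m = 1).

    alpha = [alpha_1, ..., alpha_n] weights the past outputs,
    beta  = [beta_0, ..., beta_n] weights the current and past inputs.
    Signals are assumed to be zero before t = 0.
    """
    u = np.asarray(u, dtype=float)
    y = np.zeros(len(u))
    for t in range(len(u)):
        # Autoregressive part: sum_{i=1}^n alpha_i y(t - i)
        ar = sum(a * y[t - i] for i, a in enumerate(alpha, start=1) if t - i >= 0)
        # Input part: sum_{j=0}^n beta_j u(t - j)
        ex = sum(b * u[t - j] for j, b in enumerate(beta) if t - j >= 0)
        y[t] = ar + ex
    return y

# y(t) = 0.5 y(t-1) + u(t-1), driven by a unit impulse.
y = simulate_arx(alpha=[0.5], beta=[0.0, 1.0], u=[1.0, 0.0, 0.0, 0.0])
# y holds 0, 1, 0.5, 0.25
```

Here the index ranges are visible in the two sums: the output sum starts at lag 1, while the input sum starts at lag 0.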

The goal is to convert state-space equations to ARX equations.

The shift operator $q$ advances a signal by one time step, so $qx(t) = x(t+1)$. Treating $q$ as an algebraic variable, the state equation reads $qx(t) = Ax(t) + Bu(t)$, that is, $(qI-A)x(t) = Bu(t)$. Solving for $x(t)$ gives:

\begin{align} x(t) &= (qI-A)^{-1}Bu(t) \end{align}

Substituting this into the output equation gives:

\begin{align} y(t) &= \left(C(qI-A)^{-1}B + D \right)u(t) \\ y(t) &= H(q)u(t) \end{align}

where $H(q)$ is the **transfer function**. This derivation is valid if the operator $qI-A$ is boundedly invertible and if $u(t)$ is bounded for all $t$.

The transfer function $H(q)$ includes the inverse $(qI-A)^{-1}$. The inverse of a matrix $M$ is its adjugate divided by its determinant, $M^{-1} = \operatorname{adj}(M)/\operatorname{det}(M)$. Applying this identity gives:

\begin{equation} H(q) = \frac{C\operatorname{adj}(qI-A)B + D\operatorname{det}(qI-A)}{\operatorname{det}(qI-A)} \end{equation}

Expanding the adjugate and the determinant shows that each entry of the matrix $H(q)$ is a ratio of polynomials in $q$. Each polynomial has degree at most $n$, and the degree of the numerator never exceeds the degree of the denominator, so each entry is a proper rational function of $q$.
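To make the degree claims concrete, consider a hypothetical single-input, single-output example with $n=2$ in companion form. The symbols $a_0, a_1, c_0, c_1, d$ are introduced only for this illustration:

\begin{align} A = \begin{bmatrix} 0 & 1 \\ -a_0 & -a_1 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad C = \begin{bmatrix} c_0 & c_1 \end{bmatrix}, \quad D = d \end{align}

Working through the determinant and adjugate of $qI-A$ gives:

\begin{align} \operatorname{det}(qI-A) &= q^2 + a_1q + a_0 \\ \operatorname{adj}(qI-A) &= \begin{bmatrix} q + a_1 & 1 \\ -a_0 & q \end{bmatrix} \\ H(q) &= \frac{dq^2 + (c_1 + da_1)q + (c_0 + da_0)}{q^2 + a_1q + a_0} \end{align}

The numerator reaches degree $2$ only when $d \neq 0$; without the feedthrough term its degree is at most $1$.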

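All entries of $H(q)$ share the same denominator but generally have different numerators. For notational simplicity we assume $p=m=1$ (single-input, single-output) and write $H(q) = \frac{\sum_{i=0}^n f_iq^i}{q^n + \sum_{j=0}^{n-1}g_jq^j}$. Multiplying both sides of $y(t) = H(q)u(t)$ by the denominator and applying the higher-order shifts $q^iy(t) = y(t+i)$ gives the autoregressive equation:

\begin{align} y(t+n) = \sum_{i=0}^n f_iu(t+i) - \sum_{j=0}^{n-1}g_jy(t+j) \end{align}

Replacing $t$ with $t-n$ recovers the ARX form above, with $\alpha_i = -g_{n-i}$ and $\beta_j = f_{n-j}$.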

The coefficients $f_i,g_j\in\mathbb{R}$ can be expressed in terms of the state-space parameters $A,B,C,D$ (again, via application of the adjugate and determinant formulas).
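The conversion can also be carried out numerically. The sketch below handles only the single-input, single-output case and returns the ARX coefficients $\alpha_i$ and $\beta_j$ directly. Instead of expanding the adjugate, it uses a shortcut via the matrix determinant lemma, $C\operatorname{adj}(qI-A)B = \operatorname{det}(qI-A+BC) - \operatorname{det}(qI-A)$, which is a substitution made for this example and not part of the derivation above. The function name `ss_to_arx` is hypothetical; `np.poly` returns the characteristic polynomial coefficients of a square matrix, highest power first.

```python
import numpy as np

def ss_to_arx(A, B, C, D):
    """Convert a SISO state-space model to ARX coefficients.

    Returns (alpha, beta) with alpha = [alpha_1, ..., alpha_n] and
    beta = [beta_0, ..., beta_n] such that
    y(t) = sum_i alpha_i y(t-i) + sum_j beta_j u(t-j).
    """
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.asarray(B, dtype=float).reshape(-1, 1)   # column vector, n x 1
    C = np.asarray(C, dtype=float).reshape(1, -1)   # row vector, 1 x n
    d = np.asarray(D, dtype=float).item()           # scalar feedthrough

    # Denominator: coefficients of det(qI - A), highest power first, leading 1.
    den = np.poly(A)
    # Numerator: C adj(qI - A) B + d det(qI - A), with the first term
    # obtained from the matrix determinant lemma.
    num = np.poly(A - B @ C) - den + d * den

    # Shifting the equation back by n steps: the denominator coefficients
    # (sign-flipped) weight past outputs, the numerator weights the inputs.
    alpha = -den[1:]
    beta = num
    return alpha, beta

alpha, beta = ss_to_arx([[0.0, 1.0], [-0.2, 0.3]], [0.0, 1.0], [0.05, 0.8], 1.0)
# alpha is approximately [0.3, -0.2]; beta is approximately [1.0, 0.5, 0.25]
```

Going through eigenvalues, as `np.poly` does, is convenient for small examples but can lose accuracy for large or ill-conditioned $A$, so this is best read as an illustration of the algebra.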

Converting an autoregressive model to a state-space model is called the **realization problem**. Because a similarity transform of the state leaves the input-output behavior unchanged, a given autoregressive model has infinitely many realizations. In practice one picks a canonical realization, such as the controllable or observable canonical form.
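One such choice is the controllable canonical (companion) form. The sketch below builds it for a scalar ARX model $y(t) = \sum_{i=1}^n \alpha_iy(t-i) + \sum_{j=0}^n \beta_ju(t-j)$; the function name `arx_to_state_space` and the coefficient layout are assumptions for this example, and it is only one of the infinitely many valid realizations.

```python
import numpy as np

def arx_to_state_space(alpha, beta):
    """Controllable canonical realization of a scalar ARX model.

    alpha = [alpha_1, ..., alpha_n], beta = [beta_0, ..., beta_n].
    Returns (A, B, C, D) with shapes (n, n), (n, 1), (1, n), (1, 1).
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    n = len(alpha)

    # Companion matrix: ones on the superdiagonal, and the negated
    # denominator coefficients (alpha in reverse order) in the last row.
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)
    A[-1, :] = alpha[::-1]

    B = np.zeros((n, 1))
    B[-1, 0] = 1.0

    # Feedthrough is beta_0; the remaining numerator, after removing
    # beta_0 times the denominator, fills C (lowest power first).
    D = np.array([[beta[0]]])
    C = (beta[1:] + beta[0] * alpha)[::-1].reshape(1, n)
    return A, B, C, D

A, B, C, D = arx_to_state_space([0.3, -0.2], [1.0, 0.5, 0.25])
# The impulse response D, CB, CAB, ... starts 1, 0.8, 0.29, which matches
# running the ARX recursion on a unit impulse.
```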

The discrete-time analog of the Laplace transform is the Z-transform, which maps a time-domain signal to a function of a complex variable $z$. For a shifted signal, $\mathcal{Z}(x(t+1)) = zX(z)$, assuming zero initial conditions ($x(0)=0$) when the one-sided transform is used. The entire derivation above can be redone with $z$ in place of $q$, yielding $Y(z) = H(z)U(z)$, where $H(z)$ is again the transfer function.

[1] Verhaegen, M., & Verdult, V. (2007). Filtering and system identification: A least squares approach. Cambridge University Press.