In this paper we consider the problem of learning a nonlinear dynamical system model with multiple outputs y(t) and multiple inputs u(t) (when they exist). Generally, this identification problem can be tackled using different model structures, with the class of linear models being arguably the most well studied in engineering, statistics and econometrics (Barber, 2012; Bishop, 2006; Box et al., 2015; Ljung, 1998; Söderström & Stoica, 1988).
Linear models are often used even when the system is known to be nonlinear (Enqvist, 2005; Schoukens et al., 2016). However, certain nonlinearities, such as saturations, cannot always be neglected. In such cases, using block-oriented models is a popular approach to capture static nonlinearities (Giri & Bai, 2010). Recently, such models have been given semiparametric formulations and identified using machine learning methods, cf. Pillonetto (2013) and Pillonetto, Dinuzzo, Chen, De Nicolao, and Ljung (2014). To model nonlinear dynamics, a common approach is to use NARMAX models (Billings, 2013; Sjöberg et al., 1995).
In this paper we are interested in recursive identification methods (Ljung & Söderström, 1983). In cases where the model structure is linear in the parameters, recursive least-squares can be applied. For certain models with nonlinear parameters, the extended recursive least-squares method has been used (Chen, 2004). Another popular approach is the recursive prediction error method, which has been developed, e.g., for Wiener models, Hammerstein models, and polynomial state-space models (Mattsson & Wigren, 2016; Tayamon et al., 2012; Wigren, 1993).
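To make the recursive least-squares update concrete, below is a minimal sketch for a model that is linear in the parameters, y(t) ≈ φ⊤(t)θ. It is standard background rather than the method of this paper; the regressor vector `phi` and the forgetting factor `lam` are illustrative assumptions.

```python
import numpy as np

def rls_update(theta, P, phi, y, lam=1.0):
    """One recursive least-squares step for the model y ~ phi @ theta.

    theta : (n,)   current parameter estimate
    P     : (n,n)  current "covariance" matrix
    phi   : (n,)   regressor vector for this sample
    y     : scalar observation
    lam   : forgetting factor in (0, 1]; lam = 1 gives ordinary RLS
    """
    Pphi = P @ phi
    gain = Pphi / (lam + phi @ Pphi)      # gain vector
    err = y - phi @ theta                 # one-step prediction error
    theta = theta + gain * err            # parameter update
    P = (P - np.outer(gain, Pphi)) / lam  # covariance update (P stays symmetric)
    return theta, P
```

Starting from, e.g., theta = 0 and P = I/δ for a small δ > 0, the pair (theta, P) is updated once per sample in O(n²) operations and without storing past data, which is what makes the recursive setting attractive.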
Nonparametric models are often based on weighted sums of the observed data (Roll, Nazin, & Ljung, 2005). The weights vary for each predicted output, and the number of weights increases with each observed data point. The weights are typically obtained in a batch manner; in Bai and Liu (2007) and Bijl, van Wingerden, Schön, and Verhaegen (2015) they are computed recursively, but must be recomputed for each new prediction of the output.
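As an illustration of this weighted-sum structure, the following is a minimal kernel-smoother sketch (one simple instance of the class, not the specific estimators cited above); the Gaussian kernel and bandwidth `h` are assumptions made for the example.

```python
import numpy as np

def weighted_prediction(X, Y, x_new, h=1.0):
    """Predict the output at x_new as a weighted sum of observed outputs.

    X : (t, d) past regressors,  Y : (t,) past outputs.
    The weights depend on x_new, so they must be recomputed for every
    new prediction, and their number grows with each observed data point.
    """
    d2 = np.sum((np.asarray(X) - x_new) ** 2, axis=1)  # squared distances
    w = np.exp(-d2 / (2.0 * h ** 2))                   # Gaussian kernel weights
    return w @ np.asarray(Y) / np.sum(w)               # normalized weighted sum
```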
For many nonlinear systems, however, linear models work well as an initial approximation. The strategy in Paduart et al. (2010) exploits this fact by first finding the best linear approximation using a frequency domain approach. Then, starting from this approximation, a nonlinear polynomial state-space model is fitted by solving a nonconvex problem. This two-step method cannot be readily implemented recursively and it requires input signals with appropriate frequency domain properties.
In this paper, we start from a nominal model structure. This class can be based on insights about the system, e.g., that linear model structures can approximate a system around an operating point. Given a record of past outputs y(t) and inputs u(t), that is, $\mathcal{D}_t \triangleq \{y(1), u(1), \dots, y(t), u(t)\}$, a nominal model yields a predicted output $y_0(t+1)$ which differs from the output $y(t+1)$. The resulting prediction error is denoted $\varepsilon(t+1)$ (Ljung, 1999). By characterizing the nominal prediction errors in a data-driven manner, we aim to develop a refined predictor model of the system. Thus we integrate classic and data-driven system modeling approaches in a natural way.
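To fix ideas, the sketch below instantiates this setup with a hypothetical linear ARX nominal model fitted by least squares; the lag orders `na`, `nb` and the fitting procedure are assumptions for illustration, not the paper's method. It returns the nominal predictions $y_0(t+1)$ and the prediction errors $\varepsilon(t+1) = y(t+1) - y_0(t+1)$ that the proposed approach then characterizes in a data-driven manner.

```python
import numpy as np

def nominal_arx_errors(y, u, na=2, nb=2):
    """Fit a linear ARX nominal model by least squares; return its
    one-step predictions y0 and the prediction errors eps = y - y0."""
    y, u = np.asarray(y, float), np.asarray(u, float)
    m = max(na, nb)
    # Regressor at time t: [y(t-1), ..., y(t-na), u(t-1), ..., u(t-nb)]
    Phi = np.array([np.concatenate((y[t - na:t][::-1], u[t - nb:t][::-1]))
                    for t in range(m, len(y))])
    target = y[m:]
    theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    y0 = Phi @ theta    # nominal one-step predictions
    eps = target - y0   # nominal prediction errors
    return y0, eps
```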
The general model class and problem formulation are introduced in Section 2. Then in Section 3 we apply the principle of maximum likelihood to derive a statistically motivated learning criterion. In Section 4 this nonconvex criterion is minimized using a majorization–minimization approach that gives rise to a convex, user-parameter-free method. We derive a computationally efficient recursive algorithm for solving the convex problem, which can be applied to large data sets as well as online learning scenarios. In Section 5, we evaluate the proposed method using both synthetic and real data examples.
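For context, the majorization–minimization principle invoked above is standard background: given the current iterate $\theta_k$, one constructs a surrogate $g(\theta \mid \theta_k)$ satisfying
$$g(\theta \mid \theta_k) \ge f(\theta) \;\;\text{for all } \theta, \qquad g(\theta_k \mid \theta_k) = f(\theta_k),$$
and sets $\theta_{k+1} = \arg\min_{\theta} g(\theta \mid \theta_k)$, which guarantees the monotone decrease
$$f(\theta_{k+1}) \le g(\theta_{k+1} \mid \theta_k) \le g(\theta_k \mid \theta_k) = f(\theta_k).$$
Each step thus replaces the nonconvex criterion $f$ with a simpler, here convex, problem.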
In a nutshell, the contribution of the paper is a modeling approach and identification method for nonlinear multiple input–multiple output systems that:
- explicitly separates modeling based on application-specific insights from general data-driven modeling,
- circumvents the choice of regularization parameters and initialization points,
- learns parsimonious predictor models,
- admits a computationally efficient implementation.
Notation: $E_{i,j}$ denotes the $ij$th standard basis matrix. $\otimes$ and $\odot$ denote the Kronecker and Hadamard products, respectively. $\mathrm{vec}(\cdot)$ is the vectorization operation. $\|x\|_2$, $\|x\|_1$ and $\|X\|_W = \sqrt{\mathrm{tr}\{X^\top W X\}}$, where $W \succ 0$, denote $\ell_2$-, $\ell_1$- and weighted norms, respectively. The Moore–Penrose pseudoinverse of $X$ is denoted $X^\dagger$.
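As a concrete reference for this notation, the snippet below numerically verifies the standard identity $\mathrm{vec}(AXB) = (B^\top \otimes A)\,\mathrm{vec}(X)$ together with the Hadamard product and pseudoinverse definitions, assuming NumPy; note that $\mathrm{vec}(\cdot)$ stacks columns, i.e., column-major ("F") flattening.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
X = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 5))

vec = lambda M: M.flatten(order="F")   # vec(.) stacks the columns of M

# Kronecker identity: vec(A X B) = (B^T kron A) vec(X)
assert np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X))

# Hadamard product: elementwise multiplication of same-size matrices
C = rng.normal(size=(3, 4))
assert np.allclose(X * C, np.multiply(X, C))

# Moore-Penrose pseudoinverse satisfies X X^+ X = X
assert np.allclose(X @ np.linalg.pinv(X) @ X, X)
```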