ORIGINAL ARTICLE Year : 2022  Volume : 9  Issue : 3  Page : 96106 Design of magnetoencephalographybased brain–machine interface control methodology through timevarying cortical neural connectivity and extreme learning machine Caglar Uyulan Department of Mechanical Engineering, Faculty of Engineering and Architecture, İzmir Katip Celebi University, İzmir, Turkey Correspondence Address: Introduction: Humanmachine interfaces (HMIs) can improve the quality of life for physically disabled users. This study proposes a noninvasive BMI design methodology to control a robot arm using MEG signals acquired during the user's imagined wrist movements in four directions. Methods: The BMI uses the partial directed coherence measure and a timevarying multivariate adaptive autoregressive model to extract taskdependent features for mental task discrimination. An extreme learning machine is used to generate a model with the extracted features, which is used to control the robot arm for rehabilitation or assistance tasks for motorimpaired individuals. Results: The classification results show that the proposed BMI methodology is a feasible solution with good performance and fast learning speed. Discussion: The proposed BMI methodology is a promising solution for rehabilitation or assistance systems for motorimpaired individuals. The BMI provides satisfactory classification performance at a fast learning speed.
Introduction Brain–machine interfaces (BMIs) are built as communication devices that encode brain activity to a machine command signal, not involving muscles. The utilization of BMIs is quite helpful for users having diseases or traumatic injuries which cause muscle control degradation or motor disabilities (termed as a lockedin syndrome). The lockedin syndromes may comprise a wide spectrum of diseases and traumata, i.e., amyotrophic lateral sclerosis, cerebral palsy, muscular dystrophy, multiple sclerosis, brainstem stroke, and brain or spinal cord injury. Practical applications of brain–computer interfaces (BCIs) cover electrophysiological signals, such as electroencephalography (EEG), electrocorticography, and MEG. BCIs generally control a robotic end system via these aforementioned signals recorded from the synchronized activity of neuron groups inside the brain of subjects. The subjects can be people having different ages, gender, and physical characteristics. The aim here is to establish a userindependent, generic model of the BMI and to search for robustadaptive algorithms that minimize the variance of classification errors concerning the various kinds of users. The combination of “feature extraction/dimension reduction/feature selection/classification” algorithms with the best classification performance and the least possible error variance concerning users becomes the main model. As a result of this main model obtained, i.e., the robot trajectory control can be realized. The strong hypothesis (model) that is generated, is expected to guarantee the mental task classification of random users by limiting their errors to a lower band. As a result of this research, the BMI can be brought to a level that competes with commercial applications. The design procedure is divided into three basic modules: data acquisition, signal processing, and machine control. The majority of brain waves along the scalp are collected, and the wave information is transferred to a central processing unit (CPU) for processing in the data acquisition process. The CPU filters the signals and runs algorithms that extract features to control the machine. MEG can decrease training time while increasing the robustness of a BMI. As compared to the EEG, MEG has a larger number of sensors. Therefore, the spatial resolution is more. The detection of the frequency information can be above 40 Hz, which is not capable in EEG recordings.[1],[2] BMI architecture to be designed: (1) should adapt to nonstationary brain dynamics, (2) should generate neural signals from general brain states independently of the users, (3) should have a generalizable structure that will allow an easy transition to control applications, and (4) should demonstrate high classification performance.[3],[4],[5],[6] EEGbased BCIs suffer the problem that the recorded signal is nonstationary, that is, the data are subject to covariate shift. The nonstationarity can occur from the transition of the training model without feedback to online use with feedback, the shift of sensors to different brain regions due to head movements in MEG, or variations in mental status over time. These nonstationary cases are valid for a classifier trained on training data showing this mismatch.[7],[8] Since the MEG is a nonstationary signal, many feature extraction methods that originated from timedomain, frequencydomain, and timefrequency domain analysis, are utilized in the literature.[9],[10],[11],[12] Timedomain features are extracted directly from the signal and focus on the (averaged) time course. Frequencydomain features highlight the signal power in frequency band components. Timefrequency domain features analyze how the power spectrum changes with respect to time. The shorttime Fourier transform and the wavelet transform are the most applicable.[13],[14],[15],[16] Spatial filtering, which extracts signals from multiple sensors to look at the activity localized in a particular brain region, is also applied. Some of the common spatial filtering methods can be listed as: bipolar montage, where bipolar channels are evaluated by subtracting the signals from two collocated electrodes;[17] common average reference, which subtracts the average value of the full electrode montage from that of the specific channel;[18] Laplacian method, which evaluates for each electrode location the second derivative of the instantaneous spatial voltage distribution by combining the value at that location with the values of a set of surrounding electrodes;[19] and common spatial patterns, which analyzes multichannel data based on savings from two tasks. The filters are utilized to optimize the variance for one task and minimize it for the other tasks.[20] Other techniques have also been utilized in electrophysiological feature extraction, such as the Kalman filter, fractal dimensions, and entropy.[21],[22],[23] The AAR algorithm based on the Kalman filtering approach has also been applied to analyze the electrophysiological signals. The parametric model has an adaptive closedloop controller with a recursive Bayesian estimator and a linearsquare regulator. It estimates states and system parameters from different mental tasks and feeds them back to the optimal controller. The performance of BMIs is enhanced by closedloop solver adaptation or multiplicative recurrent artificial neural network (ANN) solver coupled with KF. Thanks to this solver, the learned training datasets become more robust and provide a wide variety of neuralkinematic mapping learning.[24],[25],[26] Coming to the classification stage, the use of ANNs is quite common. However, the learning rates of feedforward ANNs, including deep learning applications, are quite low. Among the main reasons for this are the use of slow gradientbased learning algorithms in the training of ANNs and the iterative updating of all parameters of the networks in this way at each stage. ELMs have succeeded in solving this problem by proving that hidden nodes do not necessitate learning and are recursively set. ELM consists of generalized single or several hidden layer feedforward networks. ELMs give importance to feature representations in hidden layers compared to support vector machines (SVMs). These models can demonstrate good generalization performance and learn extremely faster than backpropagation networks. Random input layer weights add to the generalization capabilities of a linear output layer solution as it has nearly orthogonal (weakly correlated) hidden layer properties. If the weight range is constrained, the orthogonal inputs yield a larger solution space volume with these constrained weights. Thanks to the smallweight norms, the system is more stable and robust to noise. The random hidden layer generates weakly correlated hidden layer features, proposing a solution with a low norm and strong generalization performance. Various types of ELM have been proposed in the literature.[27],[28],[29] To learn effectively from various data types, ELMs need to be modified to suit the problem. When new samples are added to the dataset and grown, the ELM is retrained, but retraining the network is inefficient because the proportion of new incoming data is small. Therefore, Matias et al.[30] proposed an online sequential ELM (OSELM). The basic idea behind OSELM is to avoid retraining on previous examples using a sequential method. OSELM reconstructs settings using new instances sequentially after startup. Huang et al.[31] developed an incremental ELM (IELM). When a new hidden node is introduced, IELM randomly adds nodes to the hidden layer one by one, freezing the output weights of the existing hidden nodes. IELM is effective for singlelayer feedforward networks (SLFNs) with piecewise continuous activation functions. Rong et al.[32] proposed a pruned ELM (PELM) algorithm as a systematic and automated strategy for building ELM networks. Using a small number of hidden nodes can cause under/overfitting problems in model classification. Compared to traditional ELM, simulation results showed that PELM results in compact network classifiers that produce fast responses and strong prediction accuracy on unseen data. Feng et al.[33] proposed an error minimizationbased method (EMELM) that determines the number of hidden nodes in generalized SLFNs by growing hidden nodes one by one or group by group. As the networks grow, the output weights are gradually changed, significantly reducing the computational burden. The simulation results in sigmoidtype hidden nodes showed that this method can greatly decrease the computational cost of ELM. Nodes are randomly added to the network until they reach a certain error value of ε. A new learning algorithm called an evolutionary ELM (EELM) has been developed to optimize input weights and latent biases and determine output weights. Output weights are determined analytically using the MP generalized inverse.[34] The stability and generalization performance of the ELM was investigated. This system consists of two stages. In the first step, a forward iterative algorithm is applied to select hidden nodes from randomly generated candidates at each step and then hidden nodes are added until the stopping criterion is matched. In the second stage, unimportant nodes are removed.[35],[36] A kernelbased ELM (KELM) inspired by SVM has been developed and the main kernel function used in ELMs is the radial basis function. KELMs are used to design deep ELMs (DELMs).[37] DELMs utilize KELM as an output layer.[38] Votingbased ELM (VELM) has been proposed to improve performance on classification tasks. In VELM, not just one network is trained, but also many are trained and then the fittest is chosen according to the majority voting method.[39] In this study, a signal processing methodology was developed for the subjects who control the movement of a robotic arm using four motor imagery signals related to the following wrist movement states, respectively: right, forward, left, and backward. Section 2 gives a background on the MEG acquisition system and the robot arm, followed by a discussion on the classification and control strategy implemented in this study. In Section 3, the analysis results were presented. Section 4 includes the discussion and conclusive summary part. Materials and Methods There is no need for ethics committee approval. MEG data acquisition and preprocessing The data set was provided by the Brain Machine Interfacing Initiative, AlbertLudwigsUniversity Freiburg, the Bernstein Center for Computational Neuroscience Freiburg, and the Institute of Medical Psychology and Behavioral Neurobiology, the University of Tübingen collected for the BCI Competition IV (https://www.bbci.de/competition/iv/). The data set comprises directionally modulated lowfrequency MEG activity that was acquired from two healthy, righthanded subjects. They performed wrist movements in four various directions, i.e., we have four different classes. The task was to move a joystick from a center position toward one of four targets located radially at 90° intervals (fourclass centerout paradigm) utilizing exclusively the right hand and wrist. The MEG device has 10 channels and 625 Hz. sampling rate. The data were bandpass filtered (0.5 to 100 Hz., Butterworth, 3rd order) and resampled at 400 Hz. The data comprise signals from ten MEG channels which were positioned above the motor areas given in [Figure 1].{Figure 1} The details about the data acquisition and signal processing processes can be accessed at https://www.bbci.de/competition/iv/. Data were collected from two separate subjects including training sessions during 40 s/trials for each mental task, and testing sessions during 73 trials by executing random tasks. The test data will be used as external validation for assessing the performance of the proposed classifier. The training data were merged as the size of the matrix for each mental taskbased class is 40 (number of trials)×800 (number of sampling data)×10 (number of channels). The train data matrix was reshaped into a twodimensional form as 32000 (number obtained from multiplication of trials with the sampling data)×(number of channels). The total size of the training data is 128000 × 10. The testing data including the random sequence of mental tasks were merged as the size of the matrix is 73 (number of trials)×800 (number of sampling data)×10 (number of channels). The test data matrix was reshaped into a twodimensional form as 58400 (number obtained from multiplication of trials with the sampling data)×10 (number of channels). Feature extraction The feature extraction process includes the transformation of the preprocessed signal into a feature matrix by attenuating noise and focusing on important data. The ARFIT package is utilized to find an optimum model order for the timevarying multivariate autoregressive (MVAR) model. The timeinvariant parameter and the order of the model can be estimated using ARFITpackage. “Schwarz's Bayesian Criterion” is applied for the order estimation and then the model order is fixed for further analysis. Timevarying partial directed coherences (PDCs) are evaluated through the timevarying MVAR model matched with the signal using an AAR algorithm, which uses linear Kalman filtering for parameter estimation. A surrogate data method having 50 realizations is performed to choose the most essential quantities of the measure at a 99% confidence level. Surrogates are constructed by randomizing all samples of the signal to remove the causal relations among them.[40],[41] The algorithm embeds the linear Kalman filtering to update the MVAR parameters for each time sample. A ddimensional timevarying MVAR process is given in Eq. 1. [INLINE:1] where p is the model order, wk∈Rd is a zeromean white process noise vector, and Ak(r) ∈ Rd×d is the matrix of autoregressive coefficients at each lag r and time point k=p+1,…,N. A form of state space equations is built from the MVAR equations by reorganizing all matrix parameters into a state vector of the dynamical system and focusing the nonstationary signal as the measurement is given in Eq. 2.[42],[43] [INLINE:2] where xk is the parameter vector (state vector), kk is the Kalman gain, Hk is the measurement matrix (observation matrix), ek is the onestep prediction error, and yk is the estimated vector. xk and Hk are stated in Eq. 3. [INLINE:3] where yT(k1)=[yT(k1)…yT(kp))]. The elements of the state vector are estimated via the Kalman filtering approach. The process and observation noise covariance matrices (w(k),v(k))are updated using a specific method given in Eq. 4.[44] [INLINE:4] where ζ is the update coefficient, z(k) is the aposteriori correlation matrix. I is the identity matrix, and × implies the matrix product operator. The update coefficient determines the time resolution and the smoothing of the AR estimates. The Ak(r) matrices given in Eq. 1 are in the following form given in Eq. 5. [INLINE:5] for r = 1,...,p and their elements are predicted utilizing the adaptive method given in Eq. 14. Based on that information, adaptive timevarying connectivity measures are defined on the following Ztransform of the MVAR parameters “Ak(r)” to the frequency domain as in Eq. 6. [INLINE:6] PDC, which is a timevarying connectivity measure, is defined in Eq. 7.[45],[46] [INLINE:7] PDC quantifies the direct influence from time series to time series, after discounting the effect of all the other time series. The square exponents enhance the accuracy and stability of the estimates while the denominator part permits the normalization of outgoing connections by the inflows.[47],[48],[49] Classification Extracted features are then defined as input neurons to the classification algorithm. The output layer should include four neurons for the four classes that represent the four wrist movement states. The number of neurons in the input layer changes according to the length of the feature vector. NNs and SVMs play key roles in machine learning and data analysis. However, it is known that there exist some challenging issues with them such as intensive human intervention, slow learning speed, and poor learning scalability. In this article, we use ELM for the classification. ELMs are a kind of feedforward neural network, which does not necessitate gradientbased backpropagation for learning. It utilizes MP generalized inverse to set its weights. ELM not only learns up to tens of thousands faster than NNs and SVMs but also provides unified implementation for regression, binary, and multiclass applications. ELM is efficient for time series, online sequential, and incremental applications. ELM is also efficient for large datasets. A singlehidden layer feedforward NN is shown in [Figure 2].{Figure 2} The algorithm of a singlehidden layer feedforward NN can be listed as multiplication inputs by weights, adding bias, implementing the activation function, repeating the first three steps with a number of layers times, evaluating output, backpropagating, and repeating every step, respectively. The ELM differs from the feedforward NN by removing the repeating steps and replacing the backpropagation step with a matrix inverse operation. The output of the ELM is evaluated as in Eq. 8. [INLINE:8] where L is the number of hidden units, N is the number of training samples, βi is the weight vector between ith hidden layer and output, w is the weight vector between the input and hidden layer, g(.) is an activation function, b is a bias vector, and x is an input vector. β is a special matrix due to the pseudoinverse operation. Eq. 8 can be represented in a compact form as in Eq. 9. T = Hβ [INLINE:9] where m is the number of outputs, H is called the hidden layer output matrix, and T is a training data target matrix. Since there are no certain rules for choosing the number of hidden neurons, the EMELM method was used for finding the optimal configuration. Machine interface design JACO robotic arm, which is a generic 6axis robotic manipulator with a threefingered hand, can be used for the physical realization and machine interface of the proposed algorithm. The arm has six degrees of freedom in total with a maximum reach of 90 cm radius sphere and a maximum speed of 30 cm/s. It is made of three sensors: force, position, and acceleration. This arm should be suitable for a person with a disability of the upper arm and can be placed in a wheelchair. The upper arm of the robot is made of three links which are similar to the upper limb of the human body, as shown in [Figure 3].{Figure 3} An API, which gives freedom of control to users, is provided by the manufacturers. The subject needs to control the movement of the robot arm toward a given target by using four mental commands: Forward (F), Backward (B), Left (L), Right (R), and No Movement command. To end the movement of the robot arm, the subject would generate a “No Movement” command by taking no action. The arm could move on two axes (x and y) and in four directions (forward (+y), backward (−y), left (+x), and right (−x)). The control signals are generated according to the mental commands. To move the arm forward, the subject imagines the forward wrist movement, and performs backward wrist movement to move the arm backward. The subject imagines moving his/her right wrist to move the robotic arm right and imagines moving his/her left wrist to move the robotic arm left. Forward differential kinematics gives the relation between joint velocities and tip velocity. This relation is in matrix form called Jacobian. To form Jacobian, first propagation matrices are created. The propagation matrix given in Eq. 10 produces the relation between sequenced joints. [INLINE:10] For fixed base manipulators, the robot propagation matrix can be constructed by using these joint relations. This matrix is called φ and gives all joint velocity vectors. To reach tip velocity, Eq. 11 should be evaluated. [INLINE:11] By using this equation, the robot tip velocity can be calculated concerning joint velocities. Hence, the Jacobian is given in Eq. 12. [INLINE:12] MATLAB Simulink Code is created by using the above equations. First, the Jacobian is implemented, after that according to Rodrigues' formula, the rotation matrices for each joint are built. The coordinate frames for each joint change when the robot moves. For this reason, coordinate frames are updated at each sample time by using joint velocities, sample period, and rotation axes. However, this update mechanism is correct if the joint velocity is constant along the sample period. The selected sample time is 1 ms. The Simulink block diagram of the robot Jacobian is shown in [Figure 4].{Figure 4} Inputs are base positions (6 × 1) and Robot Joint Angular Positions (6 × 1), and Outputs are Tip Point Position and Load Center Position. Tip and load center positions are fed to these shapes at the VRML file to confirm the operation of the moving base kinematic code. Only the angular part of the base position is used. These angles are Euler XYZ angles and converted to vectorangle representation and then fed to VRML file. Two common ways to create a dynamical model of a robot are Newton–Eulerbased and Euler–Lagrangebased dynamical modeling. The Newton–Euler method, which is a vectorial approach, was explained in this article for dynamical modeling. The dynamic model stated in Eq. 13 describes the relationship between joint forces and joint accelerations. The forces acting on a joint are the sequenced joint force and linear and angular inertia of the joint. [INLINE:13] The compact form of Eq. 13 can be given as in Eq. 14. [INLINE:14] If the robot joint dynamic equations are put together, Eq. 15 is obtained. [INLINE:15] The above equation includes all forces acting on the robot joints. To reach forces that make the robot joint accelerate, the force vector F_ is multiplicated with the rotation axis matrix as in Eq. 16. [INLINE:16] Based on MATLAB kinematic code for moving baserobot systems, a dynamic model is built. The torque expressions are related to Jacobian terms. Because of that dynamic model code can be easily improved by starting with kinematic code. The first addition to kinematic code is creating μ the matrix according to base velocities, base accelerations, and fixed base dual robot “μ” The fixed base dual robot “μ” requires robot joint masses, inertia tensors, position vector of the center of mass, and joint angular velocity vectors. Masses, inertia tensors, and center of gravities concerning joint frames are found by investigating the SolidWorks drawing of the robot arm. Joint angular velocity vectors are provided from the previous iteration with zero initial points. After evaluating the fixed base “μ” the moving base features are added to the “μ” and a new “μ” is created. These additions are base mass matrix which depends on base mass, inertia tensor, the center of gravity, and baserobot mass matrices. C matrix is created based on H, φ, a vector, b vector, and mass matrix. After calculating these matrices, the forward dynamics of the robot with a moving base are simulated. Above torque, expression is prepared as torque input and acceleration output given in [Figure 5] and Eq. 17.{Figure 5} [INLINE:17] The inputs are Tip Forces (12 × 1), Base Forces (6 × 1), Joint Torques (12 × 1), and Joint Angular Velocity Vectors (12 × 1), respectively. The outputs are Accelerations (18 × 1), Tip Point Velocities (12 × 1), and Load Velocity (6 × 1), respectively. Results After applying the adaptive ARbased PDC feature extraction method to the train data matrix for each class, the timevariant estimated MVAR parameters were obtained. The size of the parameter vector for each class is 32000 × 200. The rows represent time in terms of data samples, and the columns show the parameter values estimated over samples. The most significant values of the adaptive PDC measures at a 99% level of significance after applying the surrogate data method are given in [Figure 6] for each class of wrist movement states.{Figure 6} The adaptive PDC results show the connectivity between channels. It can be inferred from [Figure 6] that the most significant direct coupling activities emerge from channel 3 to channels 1, 2, 4, and 5, respectively. The model parameters are accurately tracked and each class has different patterns. The adaptive ARbased PDC feature matrices are concatenated and the feature train matrix is built, and the obtained train matrix is fed into the ELM. The size of the feature train matrix is 128000 × 200. A principle component analysisbased dimension reduction algorithm, which is an orthogonal transformation constructing the relevant features, is applied to the feature train matrix because the number of extracted features is too much. The first four components having the most variance in the feature data are selected. After the dimensionality reduction process, the size of the feature train matrix becomes 128000 × 4. The obtained data are randomly divided into training, testing, and validation sets. Every time the system is executed, samples were used for each task (70400 samples [55%] are used for training, 25,600 samples are used for validation [20%], and the remaining 32000 trials were used for the test [25%]). ELM has trained for 100 epochs (iterations) by incrementing the number of neurons in the hidden layer (hidden node). For each hidden node, the classification accuracy was evaluated in terms of the “root mean squared error” criteria. [Figure 7] represents the performance curve sensitivity concerning the hidden node.{Figure 7} The rmse error approaches the equilibrium state while increasing the hidden nodes. Therefore, the number of hidden neurons is chosen as 100. The training ratio is selected using a trial and error process as 0.7. The activation function of the output layer is chosen as “sigmoid.” The confusion matrices are given in [Figure 8].{Figure 8} Test data are used for the external validation process. The size of the reduced feature test matrix is 58400 × 4. After feeding the test data to the trained ELM model, the external validation accuracy of the ELM is found as 84.88%. A comparative analysis was conducted over the external validation results using various classification methods and given in [Table 1].{Table 1} According to the external validation results, the ELM outperforms the other machine learning models, and also the computational speed is extremely fast. Discussion and Conclusive Summary The obtained results clarify that by applying the proposed methodology, control signals from an individual's MEG signals can control the robotic arm. Timevarying cortical neural connectivity features are strong biomarkers for characterizing the MEG patterns, which originated from wrist movements. According to the external validation classification results, ELM outperforms the other classifiers by achieving the highest accuracy. The accuracy of which these control signals are extracted from the MEG signals is extensively measured using the wide industryemployed receiver operator characteristic curves. After the signal processing part is taken care of, it is necessary to establish the dynamic model of the robot, create and simulate the solid model, eliminate the errors caused by the system dynamics in the robot joints, and synchronize with the brain signals. MEGbased BCI systems are capable to be robustly utilized as a control device through the proposed framework. Patient informed consent There is no need for patient informed consent. Ethics committee approval There is no need for ethics committee approval. Conflicts of interest There are no conflicts of interest to declare. Financial support and sponsorship No funding was received. Author contribution subject and rate: Caglar Uyulan (100%): Design the research, data collection, and analyses and wrote the whole manuscript. References


