This study presents an AI-based constitutive modelling framework in which the prediction model learns directly from triaxial testing data by combining discrete element modelling (DEM) and deep learning. A constitutive learning strategy is proposed based on the generally accepted frame-indifference assumption in constructing material constitutive models. Low-dimensional principal stress-strain sequence pairs, measured from DEM simulations of triaxial testing, are used to train recurrent neural networks, and the predicted principal stress sequence is then augmented to a general (high-dimensional) stress tensor via coordinate transformation. Detailed hyperparameter investigations show that long short-term memory (LSTM) and gated recurrent unit (GRU) networks have similar prediction performance in constitutive modelling problems, and both satisfactorily predict the stress responses of granular materials subjected to an unseen strain path. Finally, the unique merits and ongoing challenges of data-driven constitutive models for granular materials are discussed.

Granular materials are ubiquitous in nature and in industrial and engineering activities. Predicting how granular materials respond to various external loads is not only intricate but also important for many engineering problems. The complexity of granular media can be partially attributed to their unique features, such as inherent anisotropy and heterogeneity [

Over the past decades, analytical or phenomenological characterisation of the elastic-plastic behaviour of granular materials has undoubtedly been the most common approach. However, although numerous attempts have been made to capture the constitutive behaviour of granular materials, developing a unified theoretical model remains an ongoing challenge [

Instead of relying on phenomenological constitutive models, some researchers are seeking a methodological shift in predicting the constitutive behaviour of granular materials. An attractive alternative is the

Overall, past work has mainly focused on developing advanced learning models or algorithms that learn an accurate prediction model from as few specimens as possible. However, when a neural network trained with triaxial testing data is taken into practical applications, where complex stress-strain states are usually encountered, a direct challenge arises: how to enable deep neural networks (DNNs) trained with principal stress-strain data from triaxial tests to predict general stress responses in practice.

This work aims to address the above-mentioned challenge by presenting an AI-based constitutive modelling strategy which leverages

Constitutive models are mathematical formulations that relate the stress responses of a material to its strain states at the element level. The state of stress or strain at a point under a general loading condition is defined by a second-order tensor with 9 components (reduced to 6 independent components when the tensor is symmetric). Without invoking any mechanical assumptions, DNNs connect these stress components directly to the strain components through a series of linear and nonlinear mathematical operations. Since the application of deep learning to predicting the elastic-plastic behaviour of granular materials is still at an early stage of development, an introduction to the fundamentals of conventional neural networks is given in Appendix A.
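To make the reduction from 9 to 6 components concrete, the sketch below packs a symmetric 3×3 tensor into a 6-component vector and unpacks it again. The Voigt ordering (11, 22, 33, 23, 13, 12) is an assumed convention chosen for illustration, not necessarily the one used in this work; engineering shear strains are also often stored as 2ε_{ij}, which this sketch does not do.

```python
import numpy as np

# Assumed Voigt ordering: 11, 22, 33, 23, 13, 12 (zero-based index pairs)
VOIGT_INDEX = [(0, 0), (1, 1), (2, 2), (1, 2), (0, 2), (0, 1)]

def to_voigt(tensor):
    """Pack a symmetric 3x3 tensor into its 6 independent components."""
    t = np.asarray(tensor, dtype=float)
    if not np.allclose(t, t.T):
        raise ValueError("tensor must be symmetric")
    return np.array([t[i, j] for i, j in VOIGT_INDEX])

def from_voigt(vec):
    """Rebuild the full symmetric 3x3 tensor from its 6 Voigt components."""
    t = np.zeros((3, 3))
    for value, (i, j) in zip(vec, VOIGT_INDEX):
        t[i, j] = t[j, i] = value   # fill both off-diagonal entries
    return t

# Hypothetical stress state (kPa) with shear components
sigma = np.array([[200.0, 10.0, 5.0],
                  [10.0, 150.0, 2.0],
                  [5.0, 2.0, 120.0]])
v = to_voigt(sigma)
```

The round trip `from_voigt(to_voigt(sigma))` returns the original tensor, which is why only 6 of the 9 components need to be learned for symmetric stress and strain.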

The constitutive behaviour of materials is essentially a sequence problem. Within the family of deep learning, recurrent neural networks (RNNs) extend conventional fully connected neural networks and are particularly suited to sequence prediction. LSTM and GRU networks are two popular RNNs for long-sequence learning problems, and both are used to train the constitutive models in this study. The detailed internal structures of GRU and LSTM cells and their mathematical operations can be found in [
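As a rough illustration of how a recurrent cell carries loading history through a strain sequence, the NumPy sketch below implements the forward pass of a GRU cell in one textbook formulation. The weights are random rather than trained, and the input sequence is hypothetical. Framework implementations differ in details; for example, the Keras `GRU` layer blends the states as z·h + (1 − z)·h̃ and by default applies the reset gate after the recurrent matrix product.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell (one common formulation); weights are random, not trained."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(n_hidden)
        # One (W, U, b) triple per gate: update "z", reset "r", candidate "h"
        self.params = {
            gate: (rng.uniform(-scale, scale, (n_hidden, n_in)),
                   rng.uniform(-scale, scale, (n_hidden, n_hidden)),
                   np.zeros(n_hidden))
            for gate in ("z", "r", "h")
        }
        self.n_hidden = n_hidden

    def step(self, x, h):
        Wz, Uz, bz = self.params["z"]
        Wr, Ur, br = self.params["r"]
        Wh, Uh, bh = self.params["h"]
        z = sigmoid(Wz @ x + Uz @ h + bz)             # update gate
        r = sigmoid(Wr @ x + Ur @ h + br)             # reset gate
        h_cand = np.tanh(Wh @ x + Uh @ (r * h) + bh)  # candidate state
        return (1.0 - z) * h + z * h_cand             # blend old and new state

    def run(self, sequence):
        """Feed a (timesteps, n_in) sequence and return all hidden states."""
        h = np.zeros(self.n_hidden)
        states = []
        for x in sequence:
            h = self.step(x, h)
            states.append(h)
        return np.stack(states)

# Hypothetical input: 50 steps of 3 principal strains
strains = np.linspace(0.0, 0.12, 50)[:, None] * np.array([1.0, -0.3, -0.3])
hidden = GRUCell(n_in=3, n_hidden=20).run(strains)
```

Each hidden state depends on the whole strain history up to that step, which is what allows an RNN to represent path-dependent (e.g., unloading-reloading) behaviour.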

Triaxial tests are commonly used to measure the stress-strain responses of granular materials, on the assumption that the triaxial specimen is a representative volume element of the material being tested. Under this loading condition, the three loading directions coincide with the principal stress/strain directions, and the increments of principal stress and strain can be measured experimentally during true triaxial testing. In the development of conventional constitutive models, these triaxial tests serve to calibrate free parameters and to validate the applicability of a new analytical model.

However, when triaxial testing measurements are used to train a deep neural network and develop a data-driven constitutive model, some extra work is necessary. The reason is that a deep neural network can only approximate the mapping between its direct inputs (principal strains) and outputs (principal stresses), whereas a constitutive model used to analyse boundary value problems (BVPs) usually needs to incorporate shear stress and strain components. To resolve this issue,

Frame-indifference is a generally accepted assumption in the study of constitutive models [

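$$\sigma'_{ij} = Q_{im}\,\sigma_{mn}\,Q^{T}_{nj}$$

where $\sigma_{mn}$ and $\sigma'_{ij}$ are the stress components in the original (principal) and rotated coordinate frames, respectively, and $Q_{im}$ and $Q^{T}_{nj}$ are components of the rotation matrix $\mathbf{Q}$ and its transpose, given by

$$Q_{mn} = \cos(\mathbf{e}'_{m}, \mathbf{e}_{n}) = \mathbf{e}'_{m}\cdot\mathbf{e}_{n}$$

in which $\mathbf{e}'_{m}$ and $\mathbf{e}_{n}$ are the unit base vectors of the rotated and original frames, respectively. Its inverse transformation is:

$$\sigma_{mn} = Q^{T}_{mi}\,\sigma'_{ij}\,Q_{jn}$$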

The above formulations can achieve the interchangeable transformation between the principal stress tensor in triaxial testing conditions and a 3D tensor incorporating shear components. The transformation of a strain tensor follows the same rule.
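The transformation σ'_{ij} = Q_{im} σ_{mn} Q_{jn} and its inverse can be checked numerically. The sketch below rotates a hypothetical triaxial principal stress state (axial 400 kPa along x1, lateral 200 kPa) by 30° about the x3 axis. The single-axis rotation is a simplifying assumption; a general rotation would compose three such rotations. Shear components appear in the rotated frame, while the invariants (trace, principal values) are unchanged, which is the frame-indifference property being exploited.

```python
import numpy as np

def rotation_about_x3(theta):
    """Q_mn = e'_m . e_n for a frame rotated by theta (radians) about x3."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s, 0.0],
                     [-s, c, 0.0],
                     [0.0, 0.0, 1.0]])

def to_rotated(sigma, Q):
    return Q @ sigma @ Q.T      # sigma'_ij = Q_im sigma_mn Q_jn

def to_original(sigma_rot, Q):
    return Q.T @ sigma_rot @ Q  # sigma_mn = Q_im sigma'_ij Q_jn

# Hypothetical triaxial principal stress state (kPa)
sigma_p = np.diag([400.0, 200.0, 200.0])
Q = rotation_about_x3(np.deg2rad(30.0))
sigma_rot = to_rotated(sigma_p, Q)      # now has a non-zero 12 shear component
sigma_back = to_original(sigma_rot, Q)  # recovers the principal tensor
```

The same functions apply unchanged to strain tensors.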

On the basis of frame indifference, two strategies are available to develop a data-driven constitutive model. The first is to augment the principal stress-strain pairs by rotating the original coordinate frame. Depending on the chosen interval of the rotation angle, thousands of data specimens containing both normal and shear components can be generated artificially from a single principal stress-strain sequence pair. All the augmented specimens are then used to train the deep neural networks, so the resulting AI model naturally predicts stress responses from a 3D strain tensor containing both shear and normal components. Because this strategy augments the data first and trains the model later, we name it “Augmentation First and Training Later” (AFTL). The basic workflow of the AFTL strategy can be found in
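A minimal sketch of AFTL-style augmentation is given below, assuming for simplicity that the frame is rotated only about the x3 axis over 0° to 180°; a full scheme would sample all three rotation angles, and the input sequences here are hypothetical. Each rotation angle turns one principal sequence of shape (T, 3) into a general sequence with 6 components (11, 22, 33, 23, 13, 12, an assumed ordering), so the number of training specimens grows with the inverse of the angle interval.

```python
import numpy as np

def rotation_about_x3(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

def augment_sequence(principal_seq, interval_deg):
    """Rotate a (T, 3) principal sequence into (n_angles, T, 6) general components."""
    rows, cols = [0, 1, 2, 1, 0, 0], [0, 1, 2, 2, 2, 1]  # 11, 22, 33, 23, 13, 12
    angles = np.deg2rad(np.arange(0.0, 180.0, interval_deg))
    # Diagonal (principal) tensors for every time step: shape (T, 3, 3)
    diag = np.einsum("ti,ij->tij", principal_seq, np.eye(3))
    out = np.empty((len(angles), len(principal_seq), 6))
    for k, theta in enumerate(angles):
        Q = rotation_about_x3(theta)
        rotated = Q @ diag @ Q.T          # matmul broadcasts over the time axis
        out[k] = rotated[:, rows, cols]   # pick the 6 independent components
    return out

# Hypothetical principal strain and stress sequences of length 100
strain_p = np.linspace(0.0, 0.12, 100)[:, None] * np.array([1.0, -0.3, -0.3])
stress_p = 200.0 + np.linspace(0.0, 300.0, 100)[:, None] * np.array([1.0, 0.0, 0.0])
aug_strain = augment_sequence(strain_p, interval_deg=5.0)  # 36 rotated copies
aug_stress = augment_sequence(stress_p, interval_deg=5.0)
```

Even this single-axis version multiplies one sequence pair into 36 specimens at a 5° interval, which illustrates how quickly AFTL inflates the training set.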

The other strategy is to train the neural networks only with the principal stress and strain data.
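For the TFAL strategy discussed in the next paragraph, the abstract describes training on principal pairs only and recovering a general stress tensor through coordinate transformation. A minimal inference sketch under strong assumptions follows: the principal strain directions are taken to be fixed along the loading path (coaxial loading), and `principal_model` is a hypothetical stand-in for the trained RNN. Note that `np.linalg.eigh` returns eigenvalues in ascending order, so a real model would also need a convention mapping them to the axial and lateral inputs it was trained on.

```python
import numpy as np

def tfal_predict(strain_seq, principal_model):
    """Predict general stress tensors from a (T, 3, 3) strain-tensor sequence.

    Assumes the principal directions are identical at every step (coaxial loading).
    """
    # Principal directions from the final strain state; columns of vecs are eigenvectors
    _, vecs = np.linalg.eigh(strain_seq[-1])
    Q = vecs.T  # rows are the principal directions e'_m in the global frame
    # eps'_ij = Q_im eps_mn Q_jn, which is diagonal in the principal frame
    strain_principal = np.einsum("im,tmn,jn->tij", Q, strain_seq, Q)
    eps_p = np.diagonal(strain_principal, axis1=1, axis2=2)  # (T, 3)
    sig_p = principal_model(eps_p)                            # (T, 3)
    sig_principal = np.einsum("ti,ij->tij", sig_p, np.eye(3))
    # Rotate back to the global frame: sigma_mn = Q_im sigma'_ij Q_jn
    return np.einsum("im,tij,jn->tmn", Q, sig_principal, Q)

def principal_model(eps_p):
    """Hypothetical stand-in for a trained RNN: linear isotropic response."""
    return 2.0 * eps_p

# Hypothetical coaxial strain path in an arbitrarily rotated global frame
rng = np.random.default_rng(1)
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
steps = np.linspace(0.0, 1.0, 20)[:, None] * np.array([0.03, 0.01, 0.02])
strain_seq = np.einsum("ij,tj,kj->tik", R, steps, R)  # R diag(steps_t) R^T
stress_seq = tfal_predict(strain_seq, principal_model)
```

With the linear placeholder model the result is simply twice the strain tensor, which makes the rotate, predict and rotate-back round trip easy to verify.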

Recalling that a coordinate transformation does not change the stress or strain state experienced by a point, all these artificially generated specimens represent exactly the same state as the original one. The AFTL strategy therefore carries substantial information redundancy. Moreover, the marked increase in the number of training specimens inevitably leads to a larger DNN and a higher training cost. In addition, the interval of the rotation angle becomes a new hyperparameter that has to be tuned manually. By contrast, the TFAL strategy avoids these drawbacks and is therefore the preferred scheme in this study.

As a high-dimensional surrogate model, a deep neural network normally requires a large amount of training data. Laboratory experiments are not only expensive but also time-consuming. In contrast, DEM has proven capable of capturing the salient behaviour of real granular materials [

In this study, a total of 220 numerical triaxial specimens, each containing 4037 spherical particles, are generated via DEM. The particle radii are uniformly distributed between 2 and 4 mm. The normal and tangential contact stiffnesses, interparticle frictional coefficient, particle density, viscous damping ratio are 10^{5} N/m, ^{3} and 0.5, respectively. The specimens are isotropically consolidated to a confining pressure of 200 kPa. The maximum loading strain is limited to 12%, and each loading path incorporates several (mostly one or two) unloading-reloading loops. The unloading and reloading strain values all differ from one another and are sampled randomly, subject to the physical restriction that each reloading strain is lower than its preceding unloading strain.
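The loading-path sampling described above can be sketched as follows. The distributions are assumptions made for illustration (unloading strains drawn uniformly between 10% and 100% of the maximum strain, reloading strains drawn uniformly below their unloading strain), as is the strain increment `step`; only the 12% limit, the one-or-two-loop count and the reloading-below-unloading restriction come from the text.

```python
import numpy as np

def sample_loading_path(rng, max_strain=0.12, max_loops=2, step=1e-4):
    """Hypothetical sampler of a loading path with unloading-reloading loops.

    Returns the strain path plus the sampled unloading/reloading strains.
    """
    n_loops = int(rng.integers(1, max_loops + 1))  # 1..max_loops loops
    # Unloading strains: distinct, increasing, kept away from zero (assumption)
    unload = np.sort(rng.uniform(0.1 * max_strain, max_strain, size=n_loops))
    # Physical restriction: each reloading strain is below its unloading strain
    reload = unload * rng.uniform(0.0, 1.0, size=n_loops)
    turning = [0.0]
    for u, r in zip(unload, reload):
        turning += [u, r]
    turning.append(max_strain)
    # Piecewise-linear path between turning points at roughly `step` spacing
    path = [turning[0]]
    for a, b in zip(turning[:-1], turning[1:]):
        n = max(int(abs(b - a) / step), 1)
        path.extend(np.linspace(a, b, n + 1)[1:])
    return np.array(path), unload, reload

rng = np.random.default_rng(42)
path, unload, reload = sample_loading_path(rng)
```

Sorting the unloading strains keeps each loading branch monotonic, so every loop unloads from a new, higher strain level before the path finally reaches the maximum strain.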

Before training, all the data specimens from DEM are transformed with the MinMaxScaler in the scikit-learn package, which scales each feature to the range [0, 1]. This scaling helps reduce the risk of getting stuck in local optima and speeds up training. The scaled data specimens are then shuffled with a fixed random seed and partitioned into training, validation and test datasets of 96, 24 and 100 specimens, respectively. The three datasets are disjoint, so no validation or test specimen is seen during training.
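The preprocessing steps can be sketched in plain NumPy. The scaling below reproduces the formula of scikit-learn's `MinMaxScaler` with its default `feature_range=(0, 1)`, (x − min)/(max − min) per feature, including treating constant features as having unit range. Reshaping sequence data to (samples × timesteps, features) before scaling and the seed value are assumptions for illustration.

```python
import numpy as np

def minmax_scale(data):
    """Per-feature min-max scaling of (n_specimens, n_steps, n_features) data to [0, 1]."""
    flat = data.reshape(-1, data.shape[-1])
    lo, hi = flat.min(axis=0), flat.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    return (data - lo) / span, lo, span

def shuffle_split(data, sizes=(96, 24, 100), seed=0):
    """Shuffle specimens with a fixed seed and split into disjoint subsets."""
    idx = np.random.default_rng(seed).permutation(len(data))
    a, b = sizes[0], sizes[0] + sizes[1]
    return data[idx[:a]], data[idx[a:b]], data[idx[b:b + sizes[2]]]

# Hypothetical data: 220 specimens, 50 steps, 6 features (strains and stresses)
X = np.random.default_rng(0).normal(size=(220, 50, 6))
X_scaled, lo, span = minmax_scale(X)
train, val, test = shuffle_split(X_scaled)
```

Keeping `lo` and `span` allows predictions to be mapped back to physical units with `y * span + lo`.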

The LSTM and GRU neural networks are built on the TensorFlow platform, and all training is accelerated on NVIDIA GPUs. The prediction accuracy of the deep learning models is evaluated by the scaled mean absolute error (SMAE), which serves both as the cost function when training a DNN model and as the metric for evaluating the final trained model. The average SMAE is calculated as:

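$$\mathrm{SMAE} = \frac{1}{N}\sum_{j=1}^{N}\left|\hat{y}^{j} - y^{j}\right|$$

where $y^{j}$ and $\hat{y}^{j}$ are the scaled true and predicted values of the $j$-th data point, and $N$ is the number of data points.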
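Assuming the SMAE is the mean absolute error computed on min-max scaled values, it can be evaluated as in the sketch below, both over a whole dataset and per specimen (the per-specimen values are what a best/worst comparison would rank). The array shapes and values are hypothetical.

```python
import numpy as np

def smae(y_true_scaled, y_pred_scaled):
    """Mean absolute error over all scaled data points."""
    y_true_scaled = np.asarray(y_true_scaled, dtype=float)
    y_pred_scaled = np.asarray(y_pred_scaled, dtype=float)
    return float(np.mean(np.abs(y_pred_scaled - y_true_scaled)))

def smae_per_specimen(y_true_scaled, y_pred_scaled):
    """SMAE of each specimen for (n_specimens, timesteps, n_outputs) arrays."""
    err = np.abs(np.asarray(y_pred_scaled) - np.asarray(y_true_scaled))
    return err.mean(axis=(1, 2))

# Hypothetical scaled targets and predictions for one specimen
y_true = np.array([[[0.1, 0.2], [0.3, 0.4]]])
y_pred = np.array([[[0.1, 0.3], [0.2, 0.4]]])
overall = smae(y_true, y_pred)
per_specimen = smae_per_specimen(y_true, y_pred)
```

When the targets are already scaled, this quantity coincides with the value of Keras's built-in mean absolute error loss (`loss="mae"`), so no custom loss function is strictly needed for training.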

The approximation capability of a deep learning model depends on its architecture and network configuration. To find a suitable network configuration for the present constitutive modelling problem, a detailed parametric study is conducted. The candidate networks consist of one or two RNN layers, followed by zero or one dense layer, before the output layer. The number of neurons in each hidden layer varies from 0 to 120 in steps of 20. The final network architecture is selected by jointly considering (1) the SMAE and (2) the complexity of the architecture. Specifically, if two SMAEs differ by less than 10^{−5}, the simpler architecture with fewer trainable parameters is selected, because a simpler model carries less risk of overfitting. The SMAEs of different network architectures can be found in
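The selection rule can be written down explicitly. The sketch below enumerates the candidate grid (one or two RNN layers of 20 to 120 units, then a dense layer of 0 to 120 units, where 0 is taken to mean no dense layer), counts trainable parameters, and picks the smallest model whose SMAE lies within 10^{−5} of the best. The parameter formulas are those of a standard LSTM layer and of a Keras `GRU` layer with its default `reset_after=True`; the input and output sizes (3 principal strains in, 3 principal stresses out) are assumptions.

```python
import itertools

def lstm_params(n_in, units):
    # Four gates, each with input weights, recurrent weights and one bias
    return 4 * ((n_in + units) * units + units)

def gru_params(n_in, units):
    # Keras GRU default (reset_after=True) carries two bias vectors per gate
    return 3 * ((n_in + units) * units + 2 * units)

def dense_params(n_in, units):
    return (n_in + 1) * units

def count_params(cell, rnn_units, dense_units, n_in=3, n_out=3):
    """Trainable parameters of RNN layer(s) -> optional dense layer -> output layer."""
    count_rnn = lstm_params if cell == "lstm" else gru_params
    total, width = 0, n_in
    for units in rnn_units:
        total += count_rnn(width, units)
        width = units
    if dense_units:  # 0 means no hidden dense layer
        total += dense_params(width, dense_units)
        width = dense_units
    return total + dense_params(width, n_out)

def candidate_configs(cell, widths=range(20, 121, 20)):
    """One or two RNN layers, then zero or one dense layer (0 = absent)."""
    for n_rnn in (1, 2):
        for rnn_units in itertools.product(widths, repeat=n_rnn):
            for dense_units in (0, *widths):
                yield (cell, rnn_units, dense_units)

def select_config(results, tol=1e-5):
    """results maps config -> SMAE; pick the smallest model within tol of the best."""
    best = min(results.values())
    near_best = [cfg for cfg, err in results.items() if err - best <= tol]
    return min(near_best, key=lambda cfg: count_params(*cfg))
```

In practice `results` would be filled by training each configuration from `candidate_configs` and recording its validation SMAE.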

Once a suitable network configuration has been selected, the network must be trained to fully exploit the predictive ability of this surrogate model. This requires examining the influence of other important hyperparameters, such as the number of timesteps, the batch size and the learning rate. Empirically, the batch size is usually chosen as a power of 2 (i.e., 2^{n}), and the learning rate is typically tried at 0.001, 0.01 and 0.1.

With all these considerations, an investigation of SMAE against different timesteps and batch sizes is shown in
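The "timesteps" hyperparameter controls how much loading history the network sees per sample. One common way to build such samples, assumed here purely for illustration (the exact windowing used in this work is not specified), is a many-to-one sliding window: each sample pairs the last `timesteps` strain states with the stress at the window's final step.

```python
import numpy as np

def make_windows(strain, stress, timesteps):
    """Slice one specimen's (T, n_in) strain and (T, n_out) stress histories
    into overlapping many-to-one samples of length `timesteps`."""
    T = len(strain)
    if timesteps > T:
        raise ValueError("timesteps exceeds sequence length")
    # Row k holds the indices k, k+1, ..., k+timesteps-1
    idx = np.arange(timesteps)[None, :] + np.arange(T - timesteps + 1)[:, None]
    X = strain[idx]               # (T - timesteps + 1, timesteps, n_in)
    y = stress[timesteps - 1:]    # stress at the last step of each window
    return X, y

# Hypothetical 10-step histories with 3 components each
strain = np.arange(30, dtype=float).reshape(10, 3)
stress = 2.0 * strain
X, y = make_windows(strain, stress, timesteps=4)
```

Longer windows give the network more history but produce fewer, larger samples per specimen, which interacts with the choice of batch size.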

The learning curves of both LSTM and GRU models with the selected hyperparameters are shown in

Note that hyperparameter selection for a deep neural network is essentially a very high-dimensional combinatorial optimisation problem, so searching all possible combinations is impractical with the available computational resources. Although the parametric study covers only a limited subset of combinations, it provides a reasonably reliable network configuration and parameter set within the searched parameter space.

The LSTM model predicts the 100 unseen test specimens with an average SMAE of 0.0193; the smallest SMAE is 0.007 and the largest is 0.051. For the GRU model, the average SMAE on the test specimens is 0.0189; the best prediction has an SMAE of 0.007, while the worst has an SMAE of 0.054. Some typical predictions given by the trained LSTM and GRU models are shown in

On the one hand, the results confirm that LSTM and GRU networks have similar prediction performance on the stress-strain behaviour of granular materials. On the other hand, even for the worst and second-worst prediction cases, the overall trend of the stress responses is satisfactorily captured by both the LSTM and GRU models. The results demonstrate that the prediction accuracy of the trained models is acceptable and that the DNN models can predict complex cases with more than two unloading-reloading cycles, which is very challenging for phenomenological constitutive models.

Conventional continuum-based elastic-plastic constitutive models normally use (1) yield surfaces to describe plasticity and (2) associative or non-associative flow rules to characterise the direction in which plastic deformation evolves. This work offers an alternative to currently used phenomenological models for granular materials by training a data-driven constitutive model with recurrent neural networks on triaxial testing data. Coordinate transformation is adopted to convert the principal stress or strain tensor into a general 3D tensor incorporating both normal and shear components, and a constitutive training strategy is summarised. Both LSTM and GRU networks are used to train the stress-strain prediction model. The results demonstrate that the LSTM and GRU models achieve similar prediction accuracy and that both predict the macroscopic elastic-plastic responses of granular materials well.

The data-driven paradigm has unique advantages in the constitutive modelling of materials. First, it can be embedded in macroscopic numerical methods (such as FEM or MPM) for practical applications without extra simplifications. Second, its predictive ability can be further improved as new data specimens are added. Third, the DNN model can predict complex stress-strain responses accurately and efficiently. It not only inherits the merits of DEM in naturally capturing the stress-strain relations of granular materials undergoing large deformation and shear localisation, but also greatly accelerates the stress-strain predictions given by

It should be noted that a constitutive model for a specific granular material should be associated with its material and state parameters, such as particle size distribution, mineralogical composition and porosity. More advanced deep learning models that account for these physics-invariant properties can be found in [

Deep learning is a subset of machine learning based on deep (multiple-layer) artificial neural networks (ANNs). As shown in

where _{j}^{th} neuron in the (^{(j −1)} is the bias in the (

Assuming that the

where

The essence of deep learning is to discover a hypothesis function, or “surrogate model”, based on a DNN architecture that represents a certain mapping or relation. Initially, the weights and biases of the DNN are randomly initialised, so the prediction produced by forward propagation is normally far from the ground truth. The difference between the prediction and the ground truth is quantified by a loss function. For a given network architecture, the loss function is therefore a function of the weights and biases.

To train a reliable hypothesis model, the learning problem is converted into an optimisation problem whose goal is to minimise the loss function. The network weights and biases are normally optimised iteratively by a gradient-based algorithm such as gradient descent. The required gradients are computed by the backpropagation algorithm, which uses automatic differentiation (AD) to evaluate the derivatives by repeatedly applying the chain rule to the elementary arithmetic operations that make up the loss function.

During training, the updated weights and biases are used in the next forward propagation to re-evaluate the loss function. Backpropagation and the gradient-based optimisation algorithm then adjust the weights and biases again for the following forward propagation. After this process has been repeated for a sufficiently large number of training cycles, the neural network is usually able to predict the results satisfactorily. The weights and network architecture can then be saved for future use.
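The forward propagation, loss evaluation, backpropagation and gradient-descent cycle described above can be illustrated with a toy example. The sketch below fits a hypothetical 1-16-1 tanh network to y = sin(x) with hand-derived chain-rule gradients. For brevity it uses a mean squared error loss and plain full-batch gradient descent, whereas the models in this work are trained in TensorFlow with the SMAE loss and automatic differentiation; the network size, learning rate and epoch count are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1D mapping to fit: y = sin(x) on [-2, 2]
x = np.linspace(-2.0, 2.0, 64)[:, None]
y = np.sin(x)

# Randomly initialised weights and biases of a 1-16-1 network
W1 = rng.normal(0.0, 0.5, (1, 16))
b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, (16, 1))
b2 = np.zeros(1)
lr = 0.05

def forward(x):
    """Forward propagation: hidden tanh layer followed by a linear output."""
    z1 = x @ W1 + b1
    a1 = np.tanh(z1)
    return a1, a1 @ W2 + b2

initial_loss = float(np.mean((forward(x)[1] - y) ** 2))

for epoch in range(3000):
    a1, y_hat = forward(x)                 # forward pass with current weights
    # Backpropagation: chain rule applied from the loss back to each parameter
    d_yhat = 2.0 * (y_hat - y) / len(x)    # dL/dy_hat for the MSE loss
    dW2 = a1.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_a1 = d_yhat @ W2.T
    d_z1 = d_a1 * (1.0 - a1 ** 2)          # tanh'(z) = 1 - tanh(z)^2
    dW1 = x.T @ d_z1
    db1 = d_z1.sum(axis=0)
    # Gradient-descent update; the next forward pass uses the new values
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

final_loss = float(np.mean((forward(x)[1] - y) ** 2))
```

After training, `final_loss` is well below `initial_loss`; the trained `W1`, `b1`, `W2` and `b2` are what would be saved for later use.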