In oil recovery studies, experiments are usually carried out on core samples, and the resulting data are fitted to a non-dimensional equation called a scaling-law. This scaling-law is essential for inferring the behavior of actual reservoirs. The global non-dimensional time-scale is the parameter that allows realistic oil-field behavior to be predicted from laboratory data. This universal non-dimensional time depends on a set of primary parameters that describe the properties of the reservoir fluids and rocks, together with the injection velocity, which reflects the dynamics of the process. Gradient boosting (GB) regression is one of the practical machine learning (ML) techniques for regression and classification problems. GB builds a prediction model as an ensemble of weak prediction models; at each iteration, a least-squares base-learner is fitted to the current pseudo-residuals. Adding a randomization step increases both the execution speed and the accuracy of GB. Hence, in this study, we developed a stochastic gradient boosting (SGB) regression model to forecast oil recovery. Different non-dimensional time-scales were used to generate the data for the machine learning techniques. Among the techniques tested, SGB was found to be the best for predicting the non-dimensional time-scale, which depends on oil and rock properties.

Many hydrocarbon reservoirs are naturally fractured and therefore consist of two main parts, fractures and matrix blocks (

In this study, we use the SGB machine learning technique to forecast oil recovery and its scaling-law. In general, other ML techniques, including Artificial Neural Networks (ANNs), k-nearest neighbor (k-NN), and Support Vector Machine (SVM), can also be used to predict oil recovery. However, many well-established machine learning techniques are complex, and their training is time-consuming. Rule-based Decision Tree (DT) methods and tree-based ensemble (TBE) methods such as Random Forest (RF), Extremely Randomized Trees (ERT), and SGB are powerful and robust forecasting algorithms. ML tools have recently been used widely in the oil/gas industry [

The SGB regression algorithm has been developed to predict oil recovery from oil reservoirs based on laboratory measurements and analytical models. To the best of our knowledge, machine learning tools have not previously been used for oil recovery prediction. The prediction performance of the SGB model is assessed, and other machine learning techniques, including k-NN, ANN, SVM, and RF, are applied to oil recovery prediction alongside it. Another significant quantity in oil recovery prediction, namely the universal dimensionless time, has also been predicted along with oil recovery in a generalized scaling-law [

In this paper, a novel ensemble prediction scheme based on the SGB model is used for oil recovery prediction. The rest of the paper is arranged as follows: Section 2 discusses scaling laws, Section 3 explains the machine learning techniques, Section 4 provides the results and discussion, and Section 5 presents the conclusions.

Aronofsky et al. [

where the recovery and the ultimate recovery (subscript _{im}) are both given in [m^{3}].
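As a rough illustration of how a recovery curve approaches its ultimate value, the sketch below assumes the exponential form commonly attributed to Aronofsky et al., R(t) = R_ult (1 - exp(-lam t)). The function name, the rate constant `lam`, and all numerical values are illustrative assumptions and are not taken from this study.

```python
import math

def aronofsky_recovery(t, ultimate_recovery, lam):
    """Recovery at time t for an assumed exponential model:
    R(t) = R_ult * (1 - exp(-lam * t))."""
    return ultimate_recovery * (1.0 - math.exp(-lam * t))

# Hypothetical values: ultimate recovery of 1.0 m^3, rate constant 0.05 per unit time.
for t in (0.0, 10.0, 50.0, 200.0):
    print(t, round(aronofsky_recovery(t, 1.0, 0.05), 4))
```

Under this assumed form, recovery starts at zero and rises monotonically toward the ultimate recovery as time increases.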

Ref. | Scaling-Laws | Definitions |
---|---|---|
Bruce et al. [ | | K [m^{2}] is the permeability |
El Ouahed et al. [ | | |
Aronofsky et al. [ | | |
Mattax et al. [ | | |
Gupta et al. [ | | |
El-Amin et al. [ | | _{i}_{c} |

The generalized formula of dimensionless time by El-Amin et al. [

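Wettability | _{i} |
---|---|
Strong water-wet | 3.09E-8 |
Strong water-wet | 1.27E-7 |
Strong water-wet | 3.20E-7 |
Strong water-wet | 4.43E-6 |
Intermediate-wet | 3.30E-7 |
Intermediate-wet | 6.60E-7 |
Intermediate-wet | 1.32E-6 |
Intermediate-wet | 2.65E-6 |
Intermediate-wet | 4.65E-6 |
Weak water-wet | 3.3E-7 |
Weak water-wet | 1.32E-6 |
Weak water-wet | 4.65E-6 |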

The following subsections present the selected machine learning methods: k-NN, ANNs, SVM, RF, and SGB.

The k-NN is a nonparametric technique that can be utilized for classification and regression [
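A minimal sketch of k-NN used for regression, assuming Euclidean distance and a plain average over the k nearest training points. The function name `knn_predict` and the toy data are hypothetical and do not come from the study's dataset.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict a value for `query` as the mean target of its k nearest training points."""
    # Pair each training target with its Euclidean distance to the query point.
    distances = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Toy one-feature example (illustrative values only).
train_x = [(0.0,), (1.0,), (2.0,), (3.0,)]
train_y = [0.0, 1.0, 4.0, 9.0]
print(knn_predict(train_x, train_y, (1.1,), k=2))
```

Because no model is fitted in advance, all of the work happens at prediction time, which is why k-NN is described as nonparametric.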

The idea of ANNs is to mimic the human brain’s task of learning from experience and then identify predictive patterns [

The SVM (Support Vector Machine) method is a linear/nonlinear data classification algorithm [

A group of tree classifiers, each of which depends on a random vector, is called a random forest (RF) [

Grabczewski [
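The stochastic gradient boosting idea used in this paper (a least-squares base-learner fitted to the current pseudo-residuals, with randomization at each iteration) can be sketched as follows. This is an illustrative sketch, not the implementation used in the study: the decision-stump base-learner, the function names, the subsampling fraction, and the toy data are all assumptions.

```python
import random

def fit_stump(xs, residuals):
    """Least-squares decision stump: one (feature, threshold) split with mean leaf values."""
    best = None
    for j in range(len(xs[0])):
        for threshold in sorted({x[j] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[j] <= threshold]
            right = [r for x, r in zip(xs, residuals) if x[j] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lm, rm)
    if best is None:  # no valid split: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]

def stump_predict(stump, x):
    j, threshold, left_value, right_value = stump
    return left_value if x[j] <= threshold else right_value

def fit_sgb(xs, ys, n_rounds=100, learning_rate=0.1, subsample=0.5, seed=0):
    """Boost stumps on squared loss; each round sees only a random subsample."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    m = min(len(ys), max(2, int(subsample * len(ys))))
    for _ in range(n_rounds):
        idx = rng.sample(range(len(ys)), m)          # the "stochastic" step
        residuals = [ys[i] - preds[i] for i in idx]  # pseudo-residuals for squared loss
        stump = fit_stump([xs[i] for i in idx], residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps

def sgb_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

# Toy data (illustrative only): y = x^2 on a 1-D grid.
xs = [(i / 10,) for i in range(40)]
ys = [x[0] ** 2 for x in xs]
model = fit_sgb(xs, ys, n_rounds=200, learning_rate=0.1, subsample=0.5, seed=0)
mse = sum((sgb_predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)
print("training MSE:", mse)
```

Fitting each base-learner on a random subsample is what distinguishes the stochastic variant from plain gradient boosting; it reduces the work per iteration and tends to act as a regularizer.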

The dataset and input variables used in this study were mainly experimental data extracted from many published papers [

It is well known that performance on the training set is not a good indicator of performance on an independent test set, because the classifier has been learned from that very training data. To predict a classifier's performance on new data, the error must be estimated on a dataset (called the test set) that played no role in forming the classifier. Both the training data and the test data are assumed to be representative samples. In general, three types of datasets are distinguished: training, validation, and test data. One or more learning schemes use the training data to create classifiers. The validation data are used to tune specific classifier parameters or to choose among classifiers. The test data are then used to estimate the error of the final optimized technique and must not be used to generate the classifier. To obtain a reliable performance estimate, the sets should be chosen independently, so that the test set differs from the training set. In many cases, the test data are classified manually, which reduces the amount of data available. In the holdout procedure, a subset of the data is held out for testing, the rest is used for training, and sometimes a part is set aside for validation [
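The holdout procedure described above can be sketched as a single shuffled split into disjoint training, validation, and test subsets. The split fractions, the function name `holdout_split`, and the use of integer sample indices are illustrative assumptions; the study's actual split is not reproduced here.

```python
import random

def holdout_split(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle once, then cut into disjoint training, validation, and test subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(train_frac * len(shuffled))
    n_val = int(val_frac * len(shuffled))
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # never used to fit or tune the model
    return train, validation, test

# Hypothetical dataset of 100 sample indices.
train, validation, test = holdout_split(range(100))
print(len(train), len(validation), len(test))
```

Fixing the random seed keeps the split reproducible, so that different techniques are evaluated on the same held-out data.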

To evaluate the quality of each ML technique, a number of statistical measures are used; they are listed in the table below.

Given the following expressions:

Measure | Equation |
---|---|
Correlation coefficient (R) | |
Mean-absolute error (MAE) | |
Relative absolute error (RAE) | |
Root relative squared error (RRSE) | |
Root mean-squared error (RMSE) | |
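A minimal sketch of the five measures, assuming their standard textbook definitions: R is the Pearson correlation between predicted and actual values, and RAE and RRSE normalize the error by that of a predictor that always outputs the mean of the actual values. The function name and the toy values are hypothetical.

```python
import math

def regression_metrics(predicted, actual):
    """Return R, MAE, RMSE, RAE, and RRSE (RAE and RRSE as fractions, not percent)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for p, a in zip(predicted, actual))
    sq_err = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    s_pa = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    s_p = sum((p - mean_p) ** 2 for p in predicted)
    s_a = sum((a - mean_a) ** 2 for a in actual)
    return {
        "R": s_pa / math.sqrt(s_p * s_a),
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": abs_err / sum(abs(a - mean_a) for a in actual),
        "RRSE": math.sqrt(sq_err / s_a),
    }

# Toy values (illustrative only).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(predicted, actual)
print(metrics)
```

Multiplying RAE and RRSE by 100 gives the percentage form in which such measures are usually reported.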

In this study, several machine learning algorithms are utilized to forecast the dimensionless time and oil recovery in terms of primary physical parameters of rocks and fluids. In this regard, diverse prediction models are developed, and the predictors’ performance is examined. As shown in

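Case | Classifier | R | MAE | RMSE | RAE | RRSE |
---|---|---|---|---|---|---|
Oil recovery, intermediate water-wet | ANN | 0.8185 | 15.8604 | 19.0643 | 53.38% | 56.45% |
Oil recovery, intermediate water-wet | k-NN | 0.9799 | 3.8663 | 6.6369 | 13.01% | 19.65% |
Oil recovery, intermediate water-wet | SVM | 0.5669 | 16.9334 | 31.3068 | 56.99% | 92.69% |
Oil recovery, intermediate water-wet | RF | 0.9754 | 5.0363 | 8.0282 | 16.95% | 23.77% |
Oil recovery, intermediate water-wet | SGB | 0.9869 | 3.4592 | 5.6901 | 11.64% | 16.85% |
Dimensionless time, intermediate water-wet | ANN | 0.8390 | 85.2091 | 218.9916 | 39.25% | 54.00% |
Dimensionless time, intermediate water-wet | k-NN | 0.9413 | 50.2303 | 167.8127 | 23.14% | 41.38% |
Dimensionless time, intermediate water-wet | SVM | 0.9887 | 13.4459 | 60.6137 | 6.19% | 14.95% |
Dimensionless time, intermediate water-wet | RF | 0.9468 | 54.8753 | 175.6228 | 25.28% | 43.31% |
Dimensionless time, intermediate water-wet | SGB | 0.9906 | 10.3747 | 55.1585 | 4.78% | 13.60% |
Oil recovery, strong water-wet | ANN | 0.9689 | 5.3295 | 8.2838 | 19.05% | 24.80% |
Oil recovery, strong water-wet | k-NN | 0.9936 | 2.3153 | 3.9147 | 8.28% | 11.72% |
Oil recovery, strong water-wet | SVM | 0.9600 | 5.7694 | 9.3702 | 20.62% | 28.05% |
Oil recovery, strong water-wet | RF | 0.9936 | 3.1598 | 4.9892 | 11.29% | 14.94% |
Oil recovery, strong water-wet | SGB | 0.9958 | 1.8362 | 2.9497 | 7.07% | 9.52% |
Dimensionless time, strong water-wet | ANN | 0.9204 | 5.6367 | 8.1003 | 32.54% | 39.19% |
Dimensionless time, strong water-wet | k-NN | 0.9896 | 1.9435 | 2.9788 | 11.22% | 14.41% |
Dimensionless time, strong water-wet | SVM | 0.9442 | 4.0743 | 6.812 | 23.52% | 32.95% |
Dimensionless time, strong water-wet | RF | 0.9939 | 1.6865 | 2.5251 | 9.74% | 12.22% |
Dimensionless time, strong water-wet | SGB | 0.9956 | 1.2462 | 2.0136 | 7.20% | 9.74% |
Oil recovery, weak water-wet | ANN | 0.7425 | 18.6878 | 25.2042 | 55.06% | 64.70% |
Oil recovery, weak water-wet | k-NN | 0.9875 | 3.4529 | 6.0695 | 10.17% | 15.58% |
Oil recovery, weak water-wet | SVM | 0.665 | 16.8976 | 30.7061 | 49.79% | 78.83% |
Oil recovery, weak water-wet | RF | 0.9767 | 5.9246 | 8.8212 | 17.46% | 22.64% |
Oil recovery, weak water-wet | SGB | 0.9875 | 3.4529 | 6.0695 | 10.17% | 15.58% |
Dimensionless time, weak water-wet | ANN | 0.9847 | 65.2591 | 133.8567 | 13.27% | 17.69% |
Dimensionless time, weak water-wet | k-NN | 0.9725 | 67.9595 | 178.3904 | 13.82% | 23.57% |
Dimensionless time, weak water-wet | SVM | 0.9986 | 10.4821 | 40.0731 | 2.13% | 5.30% |
Dimensionless time, weak water-wet | RF | 0.9561 | 91.5358 | 240.5619 | 18.61% | 31.79% |
Dimensionless time, weak water-wet | SGB | 0.9989 | 8.0802 | 35.1254 | 1.64% | 4.64% |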

In this work, several ML techniques were used to predict the dimensionless time and oil recovery, and their predictions were compared against the scaling-law (actual) values for the strong, intermediate, and weak water-wet cases. When comparing the SGB algorithm with other techniques (

Comparing the predicted dimensionless time and oil recovery against the scaling-law (actual) values for the strong, intermediate, and weak water-wet cases, the proposed SGB method achieved the best results in every case, tied with k-NN in one of them. ANN, k-NN, SVM, and RF also performed well in some cases. For oil recovery in the intermediate water-wet case, SGB achieved R = 0.9869, MAE = 3.4592, and RMSE = 5.6901, and k-NN came close with R = 0.9799, MAE = 3.8663, and RMSE = 6.6369. For dimensionless time in the intermediate water-wet case, SGB achieved R = 0.9906, MAE = 10.3747, and RMSE = 55.1585, and SVM came close with R = 0.9887, MAE = 13.4459, and RMSE = 60.6137. For oil recovery in the strong water-wet case, SGB achieved R = 0.9958, MAE = 1.8362, and RMSE = 2.9497, and k-NN came close with R = 0.9936, MAE = 2.3153, and RMSE = 3.9147. For dimensionless time in the strong water-wet case, SGB achieved R = 0.9956, MAE = 1.2462, and RMSE = 2.0136, and RF came close with R = 0.9939, MAE = 1.6865, and RMSE = 2.5251. For oil recovery in the weak water-wet case, SGB and k-NN achieved identical results (R = 0.9875, MAE = 3.4529, and RMSE = 6.0695). For dimensionless time in the weak water-wet case, SGB achieved R = 0.9989, MAE = 8.0802, and RMSE = 35.1254, and SVM came close with R = 0.9986, MAE = 10.4821, and RMSE = 40.0731. In all cases, the SGB model matched or outperformed the other models. This suggests that SGB is a robust model that copes well with noisy conditions. Overall, the results indicate that the SGB technique can effectively handle oil recovery data, delivering better performance together with good generalization.
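The comparison can be checked mechanically from the reported RMSE values. The sketch below is only an illustration: the dictionary layout and the function name `best_models` are hypothetical, and the assignment of each block of values to a case follows the pairing of figures given in the discussion above.

```python
# RMSE values as reported in the results table, grouped by case.
rmse = {
    "oil recovery, intermediate water-wet":
        {"ANN": 19.0643, "k-NN": 6.6369, "SVM": 31.3068, "RF": 8.0282, "SGB": 5.6901},
    "dimensionless time, intermediate water-wet":
        {"ANN": 218.9916, "k-NN": 167.8127, "SVM": 60.6137, "RF": 175.6228, "SGB": 55.1585},
    "oil recovery, strong water-wet":
        {"ANN": 8.2838, "k-NN": 3.9147, "SVM": 9.3702, "RF": 4.9892, "SGB": 2.9497},
    "dimensionless time, strong water-wet":
        {"ANN": 8.1003, "k-NN": 2.9788, "SVM": 6.812, "RF": 2.5251, "SGB": 2.0136},
    "oil recovery, weak water-wet":
        {"ANN": 25.2042, "k-NN": 6.0695, "SVM": 30.7061, "RF": 8.8212, "SGB": 6.0695},
    "dimensionless time, weak water-wet":
        {"ANN": 133.8567, "k-NN": 178.3904, "SVM": 40.0731, "RF": 240.5619, "SGB": 35.1254},
}

def best_models(scores):
    """Return every model that attains the lowest RMSE, so ties are visible."""
    lowest = min(scores.values())
    return sorted(name for name, value in scores.items() if value == lowest)

for case, scores in rmse.items():
    print(case, "->", best_models(scores))
```

Returning all models that share the minimum, rather than a single winner, makes the tie between SGB and k-NN in the weak water-wet oil recovery case explicit.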

Because the dimensionless scaled-time law is fundamental to predicting oil recovery from laboratory data, we examined several ML techniques (k-NN, ANN, SVM, RF, and SGB) for predicting the dimensionless scaling-law from oil and rock physical properties. The performance of the techniques was compared using R, MAE, RMSE, RAE, and RRSE. SGB regression was found to be the best ML method for predicting the dimensionless scaling-time: it achieved higher prediction accuracy and lower MAE, RMSE, RAE, and RRSE than the k-NN, ANN, SVM, and RF regression models in nearly all cases, and was never outperformed.