Rate of penetration optimization using machine learning technique

Hussein Najem Al-Janabi; Abdelalim Hashem Elsayed

Abstract

Rate of penetration (ROP) optimization is one of the most important factors in improving drilling efficiency, especially when oil prices are low. It is crucial in the well planning and exploration phases, where the selection of drilling bits and parameters has a significant impact on the total cost and time of the drilling operation. Optimizing ROP is difficult because the relationship between the drilling variables and ROP is complex, which motivates the use of high-performance computing, predictive models, and algorithms. This research develops a predictive ROP model to facilitate optimal bit selection, using a Support Vector Machine (SVM) to estimate ROP. The factors that affect ROP were identified from previously published research. The data set comprised more than 13,500 records of drilling parameters from three wells in a Middle East oilfield. These records were analyzed statistically to eliminate outliers and low-quality entries; more than 11,000 records were retained and normalized in a preprocessing stage before the model was built. The SVM model performed well in predicting ROP, achieving correlation coefficients of 0.89 for both training and validation and 0.90 for testing.

Article text

Introduction

A key challenge confronting the oil industry, particularly in drilling, is the high cost of drilling, which has received significant attention in recent decades. Several factors influence drilling costs, the most significant being the well's drilling time, which can substantially raise expenses. Reducing drilling time is therefore a primary objective for drilling engineers and one of the key goals of drilling optimization. Two approaches have been suggested to achieve this goal: selecting the optimal drilling parameters in advance (such as the appropriate drill bit and drilling fluid) and performing real-time analysis to fine-tune operational factors such as rotary speed and weight on bit during drilling [1, p. 1-25]. Various approaches have been presented for optimal bit selection. The traditional method relies on historical data, including cost per foot (CPF), specific energy (SE), bit wear, offset-well bit performance, and geological details [2, p. 1225-1229]. Most of these methods use the rate of penetration (ROP) as one of their inputs; ROP prediction has therefore become a valuable means of optimizing bit selection to reduce drilling time and thus cost. Because ROP is the primary factor influencing drilling duration, the accuracy of the ROP model is essential. Several factors influence ROP, such as the properties of the drilling mud, the characteristics of the formation, the rotation speed, and the features of the bit [3]. Some factors, like the characteristics of the formation, are beyond control, while others are controllable, including the weight on bit (WOB), drillstring rotation speed (RPM), bit type and size, pump flow rate, and drilling mud properties. The impact of individual factors on ROP, such as WOB, RPM, and rock strength, can be analyzed separately [1, p. 1-25].
A study analyzing various mathematical models of ROP discovered that the majority of them incorporate factors such as the bit diameter, rotation, and weight on bit, which are either exponentially or linearly adjusted by constants as inputs. The model suggested by Bourgoyne and Young, nevertheless, remains the most widely recognized. Certain parameters they utilize are not always readily accessible, making their model suitable primarily when an extensive amount of well data is available.

Data overview and methodology

A comprehensive dataset of 13,575 data points was collected from three wells in a Middle East field. It comprises historical drilling data covering diverse drilling parameters, including rotary speed, weight on bit, and mud density, along with the corresponding ROP values. For this study, 12 drilling parameters were selected as those with the greatest effect on ROP. Figure 1 shows the heatmap of all drilling parameters for the three wells.

Fig. 1. Heatmap chart for 13,575 data points for the three wells

The parameters identified from this heatmap as the most important, and used as inputs to the SVM ROP Model, are listed below:

True Vertical Depth (TVD) (m).
Bit Rotary Speed (RPM) (1/min).
Torque (T) (Ft*Lb).
Hook Load (HKLD) (Ton).
Mud Weight (sg).
Stand Pipe Pressure (psi).
Pump Flow Rate (SPM).
Weight On Bit (WOB) (Ton).
Mud Temperature In (℉).
Mud Flow In (gpm).
Equivalent Circulation Depth (ECD) (sg).
D Exponent.

Data cleaning

Errors that may have been introduced while the data were captured or edited before storage must be minimized to maximize the usefulness and significance of the data. Failure to perform routine data cleaning can result in accumulated errors, which may hinder efficiency and create future challenges [4]. There are many methods for data cleansing. This research used a statistical method that examines the range, standard deviation, mean, and similar statistics of the data, with the aim of reducing the standard deviation of the selected parameters while keeping the median and mean values as close as possible to the original values. Data pre-processing and cleaning are needed before feeding any ROP model, because the quality of the data is directly linked to the outcomes the model produces. The last stage of this process involves identifying and eliminating outliers, which are data points that deviate significantly from the majority of the dataset [5, p. 181-185]. Such points are to be expected in extensive experimental datasets, and they can reduce the precision and dependability of models [6, p. 694-701]. Identifying them is therefore crucial for model development [7, p. 88-100]. When outliers are easily recognizable, they can be removed manually; beyond that stage, more advanced techniques become essential. In this research, the Interquartile Range (IQR) method was used. Figure 2 shows box plots of some of the input data, which were used to determine the outliers, mean, and median of these data.

Fig. 2. Box plot of dataset
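The IQR screening behind a box plot can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the fence multiplier k = 1.5 is the conventional box-plot value and is an assumption here (the paper does not state it), and the function names and sample readings are hypothetical.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(records, column, k=1.5):
    """Keep only the records whose `column` value lies inside the fences."""
    lower, upper = iqr_fences([r[column] for r in records], k)
    return [r for r in records if lower <= r[column] <= upper]


# Hypothetical WOB readings in tons; only 59.67 (the pre-cleaning maximum
# in Table 1) is taken from the paper, the rest are made up for illustration.
records = [{"WOB": w} for w in (5.9, 6.4, 6.8, 7.1, 7.4, 8.0, 8.9, 59.67)]
cleaned = drop_outliers(records, "WOB")
print(len(records), "->", len(cleaned))
```

Applying the filter one column at a time keeps the sketch simple; in a multi-parameter dataset a record is dropped as soon as any screened column falls outside its fences.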

This study also uses the leverage approach to identify outliers. With this method, the discrepancy between the predicted values and the corresponding experimental data is calculated.
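As a rough illustration of the leverage idea, the sketch below computes hat values for a single predictor and flags points above the commonly used warning leverage 3(p + 1)/n. The restriction to one predictor, the threshold, the function names, and the sample readings are all assumptions made for illustration; the paper does not give these details, and a 12-input model would need the full hat matrix.

```python
def leverages(x):
    """Hat values h_i = 1/n + (x_i - mean)^2 / Sxx for one predictor."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((xi - mean) ** 2 for xi in x)
    return [1 / n + (xi - mean) ** 2 / sxx for xi in x]


def high_leverage_indices(x, n_predictors=1):
    """Indices whose leverage exceeds the warning value 3(p + 1)/n."""
    threshold = 3 * (n_predictors + 1) / len(x)
    return [i for i, h in enumerate(leverages(x)) if h > threshold]


# Hypothetical bit RPM readings; 321 is the pre-cleaning maximum in Table 1.
rpm = [108, 109, 110, 110, 110, 111, 112, 110, 109, 321]
flagged = high_leverage_indices(rpm)
print(flagged)
```

Points flagged this way are candidates for removal; in practice they are usually inspected together with their residuals before being discarded.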

Data pre-processing (normalization)

Before the input-output data were fed into the training phase of the model, all datasets were scaled to the range [-1, 1] using Equation (1), where $X$ is the input parameter, $X_n$ is the normalized value of the input parameter, and $X_{\max}$ and $X_{\min}$ are the maximum and minimum values of the input parameter, respectively. Scaling the data to this range is important to prevent computational issues and to meet algorithmic requirements [8, p. 339-354].

$$X_n = 2\,\frac{X - X_{\min}}{X_{\max} - X_{\min}} - 1, \quad (1)$$
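Scaling a column to [-1, 1] from its minimum and maximum can be sketched as follows. The code assumes that Equation (1) is the standard min-max formula; the function names are hypothetical, and the three sample values are the minimum, median, and maximum ROP after cleaning from Table 1.

```python
def scale_to_unit_range(values):
    """Map values linearly so that min -> -1 and max -> +1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant column cannot be scaled")
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]


def unscale(scaled, lo, hi):
    """Invert the scaling, e.g. to report predictions in original units."""
    return [(s + 1) * (hi - lo) / 2 + lo for s in scaled]


# ROP minimum, median, and maximum after cleaning (Table 1).
rop = [0.43, 9.275, 27.87]
scaled = scale_to_unit_range(rop)
restored = unscale(scaled, min(rop), max(rop))
print(scaled)
```

The minimum and maximum should be taken from the training data and reused for the validation and test sets, so that all three are mapped with the same transformation.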

Outlier removal is part of the data cleansing process, whose results are summarized in Table 1. The processed data, deemed valid and exhibiting a normal distribution, were considered reliable and were therefore used to build and train the ROP Model. Once data screening, filtering, and outlier removal were finished, the dataset used to develop the SVM ROP Model consisted of 11,336 records.

Table 1

Parameters cleansing statistical data

Before Data Cleansing	Mean	Std Dev	Min	Max	Median	Mode
TVD	2332.17	1306.61	70	4640	2332	420
Bit RPM	116.29	29.77	9	321	110	110
TORQUE	6359.68	1766.98	53	12753	6489	6076
SPP	2495.63	718.56	80	3771	2500	2497
HKLD	109.76	29.6	27.45	156.63	118.87	139.02
WOB	6.73	2.88	0	59.67	6.82	8.92
MFI	928.37	246.07	413	1361	931	1203
MWIN	1.4	0.2	1.1	1.75	1.28	1.28
MT in	56.51	6.18	21	66	58	59
ECD	1.43	0.23	1.1	1.86	1.3	1.29
DXPC	0.87	0.167	0	1.61	0.87	0.91
Pumps	184.09	30.69	90	244	193	211
ROP	11.48	7.38	0.43	69.12	9.8	8.72
After Data Cleansing	Mean	Std Dev	Min	Max	Median	Mode
TVD	2276	1197.13	183	4542	2250.5	420
Bit RPM	109.93	6.798	89	132	110	110
TORQUE	6609	1686	780	12753	6826	7003
SPP	2503.95	556.497	874	3771	2488	2334
HKLD	110.47	26.93	39.51	156.63	118.37	140
WOB	6.93	2.82	0	15.73	7.05	9.23
MFI	965.1	218.88	476	1361	982	1203
MWIN	1.39	0.19	1.12	1.73	1.28	1.28
MT in	57.22	4.53	27	66	58	59
ECD	1.41	0.21	1.12	1.86	1.3	1.29
DXPC	0.88	0.16	0	1.61	0.88	0.91
Pumps	190.35	25.27	90	244	197	211
ROP	10.56	6.017	0.43	27.87	9.275	5.12

Support vector machine (SVM)

In this research, we developed an SVM model with a Radial Basis Function (RBF) kernel to forecast the Rate of Penetration (ROP) in drilling operations. The model's performance was assessed by testing different combinations of the cost parameter (C) and gamma (γ) [9]. The objective was to determine the combination of these parameters that minimizes the Root Average Squared Error (RASE) and maximizes the R² value, indicating a better fit and generalization capability of the model.

Fig. 3. Concept of ε-insensitivity in the linear data analysis [10, p. 515-535]
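The two building blocks named above, the RBF kernel controlled by γ and the ε-insensitive loss, can be written out directly. This is only a sketch of the standard definitions, not the authors' implementation: the value of ε and the sample vectors are hypothetical, and γ = 0.24 is borrowed from Model 7 in Table 2.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2) between two input vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def epsilon_insensitive_loss(measured, predicted, epsilon):
    """Zero inside the epsilon tube, linear in the excess error outside it."""
    return max(0.0, abs(measured - predicted) - epsilon)


GAMMA = 0.24    # gamma of Model 7 in Table 2
EPSILON = 0.1   # hypothetical tube half-width; not stated in the paper

# Hypothetical normalized input vectors.
a, b, c = (0.2, -0.5, 0.1), (0.2, -0.5, 0.1), (0.9, 0.4, -0.7)
same = rbf_kernel(a, b, GAMMA)   # identical points give similarity 1.0
far = rbf_kernel(a, c, GAMMA)    # similarity decays with distance

inside = epsilon_insensitive_loss(10.0, 10.05, EPSILON)   # error within tube
outside = epsilon_insensitive_loss(10.0, 12.0, EPSILON)   # error beyond tube
print(same, far, inside, outside)
```

A larger γ makes the kernel decay faster with distance, so each training point influences a smaller neighborhood, while the cost parameter C sets how strongly errors outside the tube are penalized.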

Performance measures

To check how well the developed model represents the data, its forecasting capacity was determined using two performance criteria: the mean square error (MSE) and the determination coefficient ($R^2$). The $R^2$ and MSE values can be calculated using Eqs. (2) and (3), respectively.

$$R^2 = \frac{\left[\sum_{i=1}^{n}(p_i - \bar{p})(m_i - \bar{m})\right]^2}{\sum_{i=1}^{n}(p_i - \bar{p})^2 \, \sum_{i=1}^{n}(m_i - \bar{m})^2}, \quad (2)$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(p_i - m_i)^2, \quad (3)$$

where $\bar{p}$ and $\bar{m}$ are the means of the recorded and simulated values, respectively; $p_i$ and $m_i$ are the recorded and simulated values at step $i$, respectively; and $n$ is the number of samples in the database [11].
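The performance criteria can be sketched in plain Python. Two assumptions are made here: R² is computed as the squared Pearson correlation between recorded and simulated values, and the RASE reported in the results is taken to be the square root of the MSE. The function names and sample values are hypothetical.

```python
import math


def mse(recorded, simulated):
    """Mean square error between recorded (p_i) and simulated (m_i) values."""
    n = len(recorded)
    return sum((p - m) ** 2 for p, m in zip(recorded, simulated)) / n


def rase(recorded, simulated):
    """Root average squared error, taken here as sqrt(MSE)."""
    return math.sqrt(mse(recorded, simulated))


def r_squared(recorded, simulated):
    """Squared Pearson correlation between recorded and simulated values."""
    p_bar = sum(recorded) / len(recorded)
    m_bar = sum(simulated) / len(simulated)
    cross = sum((p - p_bar) * (m - m_bar) for p, m in zip(recorded, simulated))
    ss_p = sum((p - p_bar) ** 2 for p in recorded)
    ss_m = sum((m - m_bar) ** 2 for m in simulated)
    return cross ** 2 / (ss_p * ss_m)


# Hypothetical ROP values, for illustration only.
recorded = [8.0, 10.0, 12.0, 14.0]
simulated = [9.0, 9.0, 13.0, 13.0]
print(mse(recorded, simulated), rase(recorded, simulated),
      r_squared(recorded, simulated))
```

Because RASE is expressed in the same units as ROP, it is easier to interpret than MSE when comparing models.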

Results and discussion

A detailed comparison between the SVM ROP Model output and real-life measured data was performed, supported by graphical and statistical representations of the model outputs against the measured data. The differences between the model output and the measured data were calculated to quantify the error, which is a measure of the quality, precision, and uncertainty of the developed model. This error analysis is used to determine the accuracy of the SVM model for ROP prediction and to examine its suitability for simulating the physical behavior.

Support vector machine (SVM) ROP model

The model's performance was assessed for different combinations of the cost parameter (C) and gamma (γ), as shown in Table 2. The SVM model was developed and validated using a comprehensive dataset, with the C and γ values systematically varied ten times, producing ten models, to determine their impact on the model's performance.

Table 2

SVM ROP model selection after building 10 models

Method	Cost	Gamma	Training RASE	Validation RASE	Test RASE	Validation R²
Model 1	4.12	0.37	1.39	2.04	1.91	0.88
Model 2	2.09	0.14	1.8	1.99	1.87	0.89
Model 3	0.71	0.47	1.67	2.05	1.98	0.88
Model 4	1.43	0.33	1.62	1.97	1.9	0.89
Model 5	0.46	0	2.91	2.8	2.83	0.78
Model 6	2.91	0.28	1.55	1.96	1.87	0.89
Model 7	4.72	0.24	1.97	1.95	1.85	0.89
Model 8	2.48	0.44	1.41	2.05	1.94	0.88
Model 9	3.53	0.04	2	2.08	1.95	0.88
Model 10	0.05	0.22	2.4	2.41	2.31	0.84

Upon evaluating the performance metrics, it was evident that Model 7, with cost (C) = 4.72 and gamma (γ) = 0.24, performed best. This combination yielded the highest R² value, approximately 0.8945, together with the lowest validation and test RASE values. Specifically, the Training RASE was 1.97, the Validation RASE was 1.95, and the Test RASE was 1.84 (listed as 1.85 in Table 2). Several other models reached a lower Training RASE, but the three values for Model 7 are nearly equal, which suggests little overfitting. These findings suggest that the model is accurate and generalizes well, making it the preferred choice for predicting ROP.
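The selection can be reproduced from the values in Table 2. The sketch below picks the model with the lowest validation RASE and also reports the gap between validation and training RASE as a simple overfitting indicator; using validation RASE as the sole criterion is an assumption about how the choice was made, and the names are hypothetical.

```python
from collections import namedtuple

Run = namedtuple("Run", "cost gamma train_rase val_rase test_rase val_r2")

# Values copied from Table 2.
runs = {
    "Model 1": Run(4.12, 0.37, 1.39, 2.04, 1.91, 0.88),
    "Model 2": Run(2.09, 0.14, 1.80, 1.99, 1.87, 0.89),
    "Model 3": Run(0.71, 0.47, 1.67, 2.05, 1.98, 0.88),
    "Model 4": Run(1.43, 0.33, 1.62, 1.97, 1.90, 0.89),
    "Model 5": Run(0.46, 0.00, 2.91, 2.80, 2.83, 0.78),
    "Model 6": Run(2.91, 0.28, 1.55, 1.96, 1.87, 0.89),
    "Model 7": Run(4.72, 0.24, 1.97, 1.95, 1.85, 0.89),
    "Model 8": Run(2.48, 0.44, 1.41, 2.05, 1.94, 0.88),
    "Model 9": Run(3.53, 0.04, 2.00, 2.08, 1.95, 0.88),
    "Model 10": Run(0.05, 0.22, 2.40, 2.41, 2.31, 0.84),
}

# Choose on validation error so the test set stays untouched by selection.
best = min(runs, key=lambda name: runs[name].val_rase)

# Validation minus training RASE: a large positive gap hints at overfitting.
gap = {name: round(r.val_rase - r.train_rase, 2) for name, r in runs.items()}

print(best, runs[best], gap[best])
```

Model 1, for example, has the lowest training RASE but the largest gap to its validation RASE, which is why a low training error alone is not a good selection criterion.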

Fig. 4. Cross-Plot of SVM ROP Model Outputs vs. Real Data (Training Data Set)

Fig. 5. Cross-Plot of SVM ROP Model Outputs vs. Real Data (Validation Data Set)

Fig. 6. Cross-Plot of SVM ROP Model Outputs vs. Real Data (Testing Data Set)

Evaluating the residuals (the difference between predicted and actual ROP) is a valuable way to identify shortcomings in the ROP Model. Figure 7 shows that the mean residual error for the testing data set is almost zero, with a standard deviation of about ±2, which means that the predictions are essentially unbiased and have a relatively small error.

Fig. 7. Residual analyses for testing data set
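A residual check of this kind reduces to the mean and standard deviation of the prediction errors. The sketch below uses hypothetical ROP values and function names; it only illustrates the calculation and does not reproduce the figures behind Figure 7.

```python
from statistics import mean, stdev


def residual_summary(measured, predicted):
    """Return (bias, spread): mean and sample std of predicted - measured."""
    residuals = [p - m for m, p in zip(measured, predicted)]
    return mean(residuals), stdev(residuals)


# Hypothetical measured and predicted ROP values, for illustration only.
measured = [8.0, 10.0, 12.0, 14.0]
predicted = [9.0, 9.0, 13.0, 13.0]
bias, spread = residual_summary(measured, predicted)
print(bias, spread)
```

A bias close to zero indicates that the model neither over-predicts nor under-predicts on average, while the spread indicates the typical size of an individual error.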

Conclusions

SVM was successfully utilized to develop models for forecasting ROP in a Middle East field.
The SVM ROP Model demonstrated a good correlation coefficient (0.893 to 0.90) and a low RASE (1.84 to 1.97).
Sensitivity analysis revealed that parameters such as "mud flow in" and "pump rate" positively impact ROP, whereas other factors have a negative effect. Among the input variables, the D exponent (DXPC) has the greatest influence on ROP.

References

1. Riazi M. et al. “Modelling rate of penetration in drilling operations using RBF, MLP, LSSVM, and DT models,” Scientific Reports, Vol. 12, No. 1, P. 1-25, 2022, doi: 10.1038/s41598-022-14710-z.
2. Rabia H. “Specific energy as a criterion for bit selection,” Journal of Petroleum Technology, Vol. 37, No. 07, P. 1225-1229, 1985.
3. Amer M.M., Aramco S., Abdel P., Dahab S., Abdel-alim P. “An ROP Predictive Model in Nile Delta Area Using Artificial Neural Networks,” SPE-187969-MS, 2017.
4. Aliyev R., Paul D. “A novel application of artificial neural networks to predict rate of penetration,” SPE Western Regional Meeting Proceedings, Vol. 2019, 2019, doi: 10.2118/195268-ms.
5. Mohammadi A.H., Eslamimanesh A., Gharagheizi F., Richon D. “A novel method for evaluation of asphaltene precipitation titration data,” Chemical Engineering Science, Vol. 78, P. 181-185, 2012.
6. Gramatica P. “Principles of QSAR models validation: internal and external,” QSAR & Combinatorial Science, Vol. 26, No. 5, P. 694-701, 2007.
7. Hemmati-Sarapardeh A., Ameli F., Dabir B., Ahmadi M., Mohammadi A.H. “On the evaluation of asphaltene precipitation titration data: Modeling and data assessment,” Fluid Phase Equilibria, Vol. 415, P. 88-100, 2016.
8. Zhang C., Selinus O. “Statistics and GIS in environmental geochemistry–some problems and solutions,” Journal of Geochemical Exploration, Vol. 64, No. 1-3, P. 339-354, 1998.
9. Bishop C.M. “Pattern recognition and machine learning,” Springer, New York, 2010.
10. Gholami R., Fakhari N. “Support vector machine: principles, parameters, and applications,” in Handbook of Neural Computation, Elsevier, 2017, P. 515-535.
11. Hadi F.A., Nygaard R. “Shear wave prediction in carbonate reservoirs: Can artificial neural network outperform regression analysis?” in ARMA US Rock Mechanics/Geomechanics Symposium, ARMA, 2018, p. ARMA-2018.
