Digital core: neural network recognition of textual geological and geophysical information
https://doi.org/10.31660/0445-0108-2023-2-35-54
Abstract
An algorithm for analog-to-digital conversion of primary geological and geophysical information is presented, using the identification of rock lithotypes from text descriptions of physical core as an example.
The work combines three types of scientific research (exploratory, interdisciplinary, and applied) in building the initial base of qualitative data.
Common algorithms for text classification and a mechanism for preprocessing the initial data by tokenization are described.
The concept of text pattern recognition is implemented using artificial intelligence methods.
The neural network model for recognizing textual geological and geophysical information was built in Python using convolutional neural networks for text classification (TextCNN), bidirectional long short-term memory networks (BiLSTM), and Bidirectional Encoder Representations from Transformers (BERT).
After the basic version of the model for recognizing qualitative information was developed and tested, this technology stack provided an acceptable level of performance for the algorithm of digital transformation of text data.
The best result (model version 1.0; more than 3,000 examples for training and testing) was achieved with the BERT-based text recognition algorithm: validation accuracy of ~0.830173 at the 25th epoch, validation loss of ~0.244719, training loss of ~0.000984, and a recognition probability above 95 % for the studied rock lithotypes.
Ways of modifying the code to further improve the accuracy of text prediction with the created neural network were identified.
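The tokenization-based preprocessing mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the helper names (`tokenize`, `build_vocab`, `encode`), the `<pad>`/`<unk>` tokens, the example core descriptions, and the sequence length are all assumptions made for illustration. It shows one common way to turn free-text core descriptions into fixed-length integer sequences that a classifier such as TextCNN or BiLSTM could consume; BERT models normally ship their own subword tokenizer instead.

```python
import re
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def tokenize(text):
    """Lowercase a description and split it into word tokens,
    keeping hyphenated terms such as 'fine-grained' together."""
    return re.findall(r"\w+(?:-\w+)*", text.lower())


def build_vocab(texts, min_count=1):
    """Map each token to an integer id, most frequent first
    (ties broken alphabetically so the mapping is deterministic)."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    vocab = {PAD: 0, UNK: 1}
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab, max_len):
    """Convert a description to token ids, truncated or padded to max_len."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(text)][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Hypothetical core descriptions with lithotype labels (not from the paper).
samples = [
    ("Sandstone, fine-grained, grey, oil-saturated", "sandstone"),
    ("Siltstone, dark grey, dense", "siltstone"),
]
vocab = build_vocab([text for text, _ in samples])
encoded = [encode(text, vocab, max_len=8) for text, _ in samples]
```

Tokens not seen when the vocabulary was built map to the `<unk>` id, so new descriptions can still be encoded at prediction time.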
About the Authors
Yu. E. Katanov
Russian Federation
Yuri E. Katanov, Candidate of Geology and Mineralogy, Associate Professor at the Department of Applied Geophysics, Leading Researcher at Well Workover Technology and Production Stimulation Laboratory
Tyumen
A. I. Aristov
Russian Federation
Artyom I. Aristov, Assistant at the Laboratory of Digital Research in the Oil and Gas Industry
Tyumen
A. K. Yagafarov
Russian Federation
Alik K. Yagafarov, Doctor of Geology and Mineralogy, Professor
Tyumen
O. D. Novruzov
Russian Federation
Orchan D. Novruzov, Assistant at the Laboratory of Digital Research in the Oil and Gas Industry
Tyumen
For citations:
Katanov Yu.E., Aristov A.I., Yagafarov A.K., Novruzov O.D. Digital core: neural network recognition of textual geological and geophysical information. Oil and Gas Studies. 2023;(2):35-54. (In Russ.) https://doi.org/10.31660/0445-0108-2023-2-35-54