Interdisciplinary Journal of Environment and Computer Innovations

Data-Driven Prediction of Emission Factors in Nano-Additive Biofuel Engines: A Comparative Machine Learning Approach

Claire M. Sacro, Jose C. Agoylo Jr., Lalaine G. Melgazo, Jimson A. Olaybar, Jorton A. Tagud, Rosemarie Y. Saligue

Volume: 1, Issue: 1, Pages: 16-25, Published: 2026-06-01
Online ISSN: 3116-5850
Publisher: Bohol Island State University Candijay Campus College of Sciences

Abstract

Accurate prediction of emission factors in biofuel-powered internal combustion engines is critical for improving energy efficiency and reducing environmental impact. This study investigates how effectively machine learning techniques model emission behavior, using experimental data from nano-additive biofuel engine systems. The dataset comprises engine operating parameters and exhaust emission characteristics: engine load, speed, fuel blend ratio, nanoparticle concentration, injection pressure, brake thermal efficiency (BTE), and emissions (CO, NOx, HC, CO₂). Three regression models were developed and evaluated: Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Support Vector Regression (SVR). Data preprocessing included cleaning, encoding of categorical variables, feature scaling, and an 80:20 train–test split to assess model generalization. Model performance was assessed using Root Mean Square Error (RMSE) and the coefficient of determination (R²). The Random Forest model outperformed the other two, achieving the lowest RMSE (0.1029) and highest R² (0.9997), followed by XGBoost (RMSE = 0.2130, R² = 0.9989) and SVR (RMSE = 0.3421, R² = 0.9972). Feature importance analysis consistently identified BTE and CO₂ emissions as the most influential predictors, highlighting the significance of combustion efficiency and exhaust composition in emission modeling. The findings demonstrate that ensemble-based learning methods, particularly Random Forest, provide highly accurate and robust predictions for complex nonlinear emission systems. However, the extremely high accuracy values call for further validation to check for potential overfitting. This study contributes to the advancement of data-driven approaches for sustainable engine design and emission optimization in biofuel technologies.
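For readers unfamiliar with the evaluation setup named in the abstract, the following is a minimal sketch of an 80:20 split and the two reported metrics, written in plain Python. The function names, the seed parameter, and the sample values are illustrative assumptions; they are not taken from the paper, which does not state how the split or the metrics were implemented.

```python
import math
import random


def train_test_split_80_20(rows, seed=0):
    """Shuffle a copy of rows and split it 80:20 into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up illustrative values, not data from the study.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(rmse(y_true, y_pred), r2_score(y_true, y_pred))
```

Both metrics would be computed on the held-out 20% only. An R² very close to 1 on that portion, as reported here, is the kind of result that cross-validation or an independent dataset could help confirm.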

Keywords

BTE, Random Forest, XGBoost, Support Vector Regression, Carbon Oxide.
