
Modeling retroreflectivity degradation of pavement markings across the US with advanced machine learning algorithms


Retroreflectivity is the primary metric that controls the visibility of pavement markings at night and in adverse weather conditions. Maintaining the minimum level of retroreflectivity specified by the Federal Highway Administration (FHWA) is crucial for motorist safety. The key objective of this study was to develop robust retroreflectivity prediction models that transportation agencies can use to reliably predict the retroreflectivity of their pavement markings from the initially measured retroreflectivity and other key project conditions. A total of 49,632 transverse skip retroreflectivity measurements of seven types of marking materials were retrieved from the eight most recent test decks covered under the National Transportation Product Evaluation Program (NTPEP). Decision Tree (DT) and Artificial Neural Network (ANN) algorithms were considered for developing performance prediction models to estimate retroreflectivity at different prediction horizons of up to three years. The models were trained on a randomly selected 80% of the data points and tested on the remaining 20%. Sequential ANN models performed better on the testing data than sequential DT models. The training and testing R2 ranges of the sequential ANN models were 0.76 to 0.96 and 0.55 to 0.94, respectively, which were significantly higher than the R2 range (0.14 to 0.75) of the regression models proposed in past studies. Initial or predicted retroreflectivity, snowfall, and traffic were found to be the most important inputs to model predictions.


Pavement markings are retroreflective longitudinal and transverse lines installed on pavement surfaces to delineate the roadway [1, 2]. The marking materials most commonly used in the U.S. are either non-durable (waterborne paint) or durable (thermoplastic, epoxy, polyurea, methyl methacrylate [MMA], and tape) [3]. Markings are a critical component of overall traffic control because they define the boundaries between moving vehicles, and a well-maintained marking system enhances motorist safety during the daytime, at night, and in poor visibility conditions [4].

According to the Federal Highway Administration (FHWA), over 50% of all traffic fatalities occur at night, even though the majority of travel happens during the daytime [5]. The visibility of pavement markings depends primarily on their retroreflectivity [6]. Retroreflectivity is the property of a marking whereby light from vehicle headlights that strikes the visible marking surface is returned, in substantial part, toward the eyes of the motorist [7]. Retroreflectivity is measured by the coefficient of retroreflected luminance (RL) in millicandelas per square meter per lux (mcd/m2/lux) and is provided by transparent glass beads partially embedded in the markings [6]. An adequate level of RL can reduce nighttime crashes by 50% and fatalities in dark, rainy, or snowy weather conditions by 28% [8].

RL degrades over time depending on the type of marking material, glass bead properties, climatic conditions, traffic loading, road surface type, and snowplow frequency [9]. Recently, FHWA published a new rule that established national standards for minimum RL levels for longitudinal marking lines on all roads depending on speed limits [10]. This final rule requires all state agencies to implement a method before 2027 to maintain the specified minimum RL levels and to consider retroreflectivity in future restriping activities, which highlights the importance of regularly monitoring the retroreflectivity of marking lines. However, instead of continuously monitoring markings and restriping them when RL drops below the minimum threshold, many transportation agencies restripe markings on a fixed schedule or based on visual inspection [11]. These restriping strategies are not optimal because markings are often restriped before or after the end of their service life, which respectively wastes available funds or compromises motorist safety. As such, modeling retroreflectivity degradation is critical for estimating the service life of markings and planning future restriping activities accordingly.

Objective and scope

The objective of this study was to develop machine learning models that can be used by U.S. transportation agencies to predict, with superior accuracy, the retroreflectivity of pavement markings, over a period of 3 years, based on the initial measured retroreflectivity and other key project conditions. Two different machine learning algorithms, Decision Tree and Artificial Neural Network algorithms, were utilized to develop performance prediction models for seven types of pavement marking materials (waterborne paint, thermoplastic, preformed thermoplastic, permanent polymeric tape, epoxy, polyurea, and MMA) located in three different U.S. climate zones (Southeast, Northeast, and Upper Midwest). The proposed models are expected to provide a scientific basis for transportation agencies in predicting the service life of pavement markings based on local conditions and only one initial retroreflectivity measurement, thereby eliminating the need for the costly monitoring of the retroreflectivity of marking products.


This section provides a brief background on the NTPEP, the source of the data used in this study. It then presents the results of previous studies that modeled the retroreflectivity degradation of pavement markings, which serve as a baseline for comparison with the results of this study. Finally, the section concludes with the shortcomings in the current state of knowledge and how the models developed in this study address them.

Overview of the National Transportation Product Evaluation Program (NTPEP)

The American Association of State Highway and Transportation Officials (AASHTO) implements a consensus-based work plan every year through the National Transportation Product Evaluation Program (NTPEP) to evaluate the field performance of a variety of pavement marking products [12]. NTPEP selects test decks across the U.S. that represent various traffic and geographical conditions and installs marking products on them in the transverse direction [13]. In a typical test deck, four marking lines are installed side by side on an asphalt or concrete surface, from the inner side of the “edge” line to the far side of the “skip” line [12] (see Fig. 1). For each line, retroreflectivity measurements are collected with a handheld retroreflectometer from both the “skip” area (within the first 9 in. from the “skip” line, known as transverse-skip retroreflectivity, RS) and the “wheel” area (9 in. on both sides of the left wheel path, known as transverse-wheel retroreflectivity, RW) [11]. These readings are collected at 12 different times (after 0, 1, 2, 3, 11, 12, 15, 21, 24, 27, 33, and 36 months) over up to 3 years. Retroreflectivity degrades faster in the “wheel” area because of continuous friction with vehicle tires, which does not mimic the service life of an actual longitudinal marking stripe [11]. Traffic conditions in the “skip” area are more representative of skip-line stripes in the field [14]; therefore, only RS measurements were considered for modeling in this study.

Fig. 1

Typical configuration of an NTPEP test deck and retroreflectivity measurement locations

Previous retroreflectivity degradation models

A literature review was conducted on past studies that proposed retroreflectivity degradation models to estimate the service life of pavement markings, as summarized in Table 1. Previous studies mostly adopted a parametric approach, specifically regression models, to predict future retroreflectivity [9, 15,16,17,18,19,20,21]. Regression models make stringent assumptions about the shape of the mapping function, which often differs significantly from the true relationship between the inputs and the output [22]. The datasets adopted in these studies typically exhibit high dimensionality (i.e., many inputs) and high multicollinearity (i.e., high correlations between input variables) [23]. As a result, the input variables of past regression models were not independent of each other, violating one of the fundamental assumptions of the parametric approach. Multicollinearity inflates the variance of the estimated regression coefficients, which weakens the statistical power of the models. High variability in retroreflectivity data also made it challenging for regression models to estimate the service life of markings with a high level of statistical confidence, even when more data were collected [24]. Due to these limitations, past regression models [9, 15,16,17,18,19,20,21] predicted retroreflectivity with relatively low accuracy, with coefficients of determination (R2) ranging from 0.14 to 0.75.
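The variance-inflation argument above can be checked numerically with the variance inflation factor (VIF), a standard multicollinearity diagnostic that is not part of the reviewed studies. The minimal NumPy sketch below regresses each column on the others and reports VIF_j = 1/(1 − R_j²). The three synthetic columns are hypothetical and only illustrate how two nearly collinear inputs produce large VIFs while an independent input stays near 1.

```python
import numpy as np

def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """Return VIF_j = 1 / (1 - R_j^2) for every column j of X.

    R_j^2 is the coefficient of determination of an ordinary least-squares
    regression of column j on all remaining columns plus an intercept.
    """
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Hypothetical synthetic inputs (not NTPEP data).
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)  # nearly collinear with x1
x3 = rng.normal(size=500)                  # independent of x1 and x2
vif = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vif.round(1))
```

A VIF above about 10 is a commonly used rule of thumb for serious multicollinearity; that cut-off is a convention, not a finding of this study.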

Table 1 Summary of the literature review on retroreflectivity degradation modeling

Because of the poor reliability of past regression models, more recent studies developed retroreflectivity degradation models using supervised machine learning algorithms [11, 25, 26]. Machine learning algorithms construct models using a “non-parametric” approach that makes no explicit assumptions about the shape of the mapping function. These models can efficiently capture complex patterns in the data and, in general, achieve higher prediction accuracy than regression models [22]. Karwa and Donnell [25] proposed an Artificial Neural Network model to estimate the service life of thermoplastic markings without considering climatic conditions (i.e., rainfall and/or snowplow activities) as input variables. Mousa et al. [11] predicted retroreflectivity for up to three years with reasonable accuracy (R2 = 0.83 to 0.98) using a Categorical Boosting model, but its applicability was limited to waterborne paint markings. More recently, Idris et al. [26] developed Genetic Algorithm models (R2 = 0.64 to 0.93) to predict retroreflectivity; however, these models were developed only for thermoplastic markings and predicted retroreflectivity for up to one year only.

Advancements based on previous research

As Table 1 shows, most previous retroreflectivity prediction models had limited scope because they were site-specific and material-specific. These models were developed with only a few input variables (e.g., time, traffic, line lateral location, and/or initial retroreflectivity). To address these shortcomings, retroreflectivity degradation models were developed in this study using two machine learning algorithms, Decision Tree and Artificial Neural Network, considering all the significant input variables affecting retroreflectivity. In addition, the scope of the proposed models extends to seven commonly used marking materials and to the traffic and climatic conditions of different geographical regions across the U.S.

Other common machine learning algorithms besides Decision Tree and Artificial Neural Network, including Support Vector Machine, LightGBM, and K-Nearest Neighbors, were initially implemented in this study to develop retroreflectivity degradation models. Decision Tree and Artificial Neural Network yielded the highest prediction accuracy among them; therefore, only the results of the Decision Tree and Artificial Neural Network models are presented in this study.

Data collection

In this study, the measured RS and other relevant variables were retrieved from NTPEP’s online data repository and assembled into a dataset. This dataset included RS readings from the following eight recent NTPEP test decks distributed over three different U.S. climate zones (Southeast, Northeast, and Upper Midwest):

  • Minnesota: 2010 and 2013

  • Pennsylvania: 2011 and 2014

  • Florida: 2012, 2015, and 2019

  • Wisconsin: 2017

A total of 517 marking products were considered in this study. Each product was installed as eight transverse lines, four on an asphalt surface and four on a concrete surface, resulting in a total of 4,136 marking lines (517 products × 2 pavement surfaces × 4 marking lines = 4,136 lines). For each of these 4,136 lines, RS measurements were collected at 12 time intervals (after 0, 1, 2, 3, 11, 12, 15, 21, 24, 27, 33, and 36 months), resulting in a total of 49,632 RS values (517 products × 8 lines per product × 12 RS values per line = 49,632). The retrieved variables for each marking are described in Table 2.
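The row count follows from a Cartesian product of products, pavement surfaces, lines, and measurement months. The sketch below enumerates one long-format row per RS reading; the field names are hypothetical, and the actual RS values would be joined from the NTPEP repository.

```python
from itertools import product

N_PRODUCTS = 517
SURFACES = ("asphalt", "concrete")
LINES_PER_SURFACE = 4
ELAPSED_MONTHS = (0, 1, 2, 3, 11, 12, 15, 21, 24, 27, 33, 36)

# One row per RS reading: (product, surface, line, elapsed month E).
# Field names are hypothetical; RS values would be joined from NTPEP data.
rows = [
    {"product_id": p, "surface": s, "line": line, "E": e}
    for p, s, line, e in product(
        range(N_PRODUCTS), SURFACES, range(1, LINES_PER_SURFACE + 1), ELAPSED_MONTHS
    )
]

n_lines = N_PRODUCTS * len(SURFACES) * LINES_PER_SURFACE
print(n_lines, len(rows))  # 4136 49632
```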

Table 2 Description of the variables in the assembled dataset

Exploratory data analysis

Descriptive statistics

The general descriptive statistics (i.e., minimum and maximum values, interquartile range, mean, median, and outliers) of the numerical variables were calculated and are presented in Fig. 2 as box-whisker plots. The outliers in MRS, shown in Fig. 2d, represent the true variability of RS measurements. Therefore, these outliers were not removed, so as to maintain the original statistical distribution of the study dataset, which is a common practice for handling outliers [27].
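The box-whisker statistics in Fig. 2 can be computed along the following lines. This sketch assumes the common Tukey convention (outlier fences at 1.5 × IQR beyond the quartiles), which the text does not state explicitly, and the sample values are hypothetical. Flagged points are reported rather than removed, matching the treatment described above.

```python
import numpy as np

def boxplot_summary(values):
    """Box-whisker statistics using Tukey's 1.5 x IQR fences (an assumption)."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = x[(x >= low_fence) & (x <= high_fence)]
    return {
        "min": x.min(), "max": x.max(), "mean": x.mean(), "median": median,
        "q1": q1, "q3": q3, "iqr": iqr,
        # Whiskers end at the most extreme points still inside the fences.
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        # Points beyond the fences are flagged as outliers but not removed.
        "outliers": x[(x < low_fence) | (x > high_fence)],
    }

# Hypothetical RS readings in mcd/m2/lux.
summary = boxplot_summary([100, 120, 130, 140, 150, 160, 900])
print(summary["iqr"], summary["outliers"])
```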

Fig. 2

Descriptive statistics of (a) TH, (b) SN, (c) TR, and (d) MRS

Correlation analysis

A correlation analysis was conducted to determine the degree of association between all the collected variables. Pearson’s R [28], the eta (η) coefficient [29], and Cramér’s V [30] were used to evaluate the association between numeric-numeric, numeric-categorical, and categorical-categorical variable pairs, respectively. Pearson’s R ranges from -1.0 (a perfect, decreasing, linear association) to 1.0 (a perfect, increasing, linear association), while the η coefficient and Cramér’s V range from 0 (no association) to 1 (a perfect association). The resulting correlation matrix is presented in Fig. 3. Among the numeric-categorical variable pairs, (TH, T) had the highest η coefficient of 0.9. Among the numeric-numeric variable pairs, (E, TR) showed the highest Pearson’s R of 0.7. Among the categorical-categorical variable pairs, (M, T) exhibited the highest Cramér’s V of 0.6. These high values of the η coefficient, Pearson’s R, and Cramér’s V indicate high multicollinearity in the compiled dataset.
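The three association measures can be sketched in NumPy as follows; Pearson’s R is available directly as `np.corrcoef`. The η (correlation ratio) and Cramér’s V functions are minimal textbook forms (Cramér’s V without bias correction) and are assumptions about how the measures were computed, not the authors’ code. The toy inputs are hypothetical.

```python
import numpy as np

def correlation_ratio(categories, values):
    """Eta: sqrt(between-category sum of squares / total sum of squares)."""
    cats = np.asarray(categories)
    y = np.asarray(values, dtype=float)
    grand_mean = y.mean()
    ss_between = sum(
        np.sum(cats == c) * (y[cats == c].mean() - grand_mean) ** 2
        for c in np.unique(cats)
    )
    ss_total = np.sum((y - grand_mean) ** 2)
    return float(np.sqrt(ss_between / ss_total)) if ss_total > 0 else 0.0

def cramers_v(a, b):
    """Cramer's V from the chi-square statistic of the contingency table."""
    _, a_idx = np.unique(np.asarray(a), return_inverse=True)
    _, b_idx = np.unique(np.asarray(b), return_inverse=True)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(table, (a_idx, b_idx), 1)  # count each (a, b) pair
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = np.sum((table - expected) ** 2 / expected)
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else 0.0

# Hypothetical toy data.
print(correlation_ratio(["a", "a", "b", "b"], [1, 1, 5, 5]))  # 1.0
print(cramers_v(["x", "x", "y", "y"], ["p", "p", "q", "q"]))  # 1.0
print(np.corrcoef([1, 2, 3], [2, 4, 6])[0, 1])               # Pearson's R
```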

Fig. 3

Correlation matrix

Data preprocessing

Neither the DT nor the ANN algorithm can process categorical data directly [31]. Therefore, as a data preprocessing step, the categorical variables were converted into numerical form using the Label Encoding technique [32], which assigns each unique category an integer between 0 and X − 1, where X is the total number of unique categories of the categorical variable.
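Label encoding as described can be reproduced with `np.unique(..., return_inverse=True)`, which assigns codes 0 to X − 1 in sorted category order (scikit-learn’s `LabelEncoder` also sorts its classes). The material names come from the marking types in this study; the function name `label_encode` is hypothetical.

```python
import numpy as np

def label_encode(column):
    """Assign each unique category an integer in 0..X-1 (sorted category order)."""
    categories, codes = np.unique(np.asarray(column), return_inverse=True)
    return codes, categories

codes, categories = label_encode(["epoxy", "waterborne paint", "thermoplastic", "epoxy"])
print(categories)  # ['epoxy' 'thermoplastic' 'waterborne paint']
print(codes)       # [0 2 1 0]
```

Label encoding imposes an arbitrary order on the categories. Tree splits can isolate any group of codes over several splits, whereas an ANN treats the codes as numeric magnitudes; one-hot encoding is a common alternative when that ordering matters.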

Model development

This section provides technical details of the machine learning algorithms implemented in this study, followed by an overview of the model development process.

Machine learning algorithms

Two machine learning algorithms were utilized in this study for model development, as described in the following sub-sections.

  a)

    Decision Tree (DT) is a machine learning algorithm that builds models in the form of tree-like structures consisting of a root node, internal nodes, and leaves connected through branches [33]. A simple DT model with two predictors (X1 and X2) is illustrated in Fig. 4a. DT uses a series of splitting rules (X1 ≤ s1, X2 ≤ s2, …, X2 ≤ s4) to divide the training observations into regions of the input space using the recursive binary splitting technique (see Fig. 4b) [22]. This process is applied recursively, with each split chosen to optimize the objective function, until the leaves (R1, R2, …, R5) are established [34]. The mean response value of the training observations falling in each leaf (C1, C2, …, C5) is used as the prediction for that leaf [35]. The most important hyperparameters that control the model architecture and require tuning during the training of a DT model are maximum depth (D), minimum samples split (S), minimum samples leaf (L), and maximum features (F) [36]. The general mathematical form of a DT model and its objective function are presented in Eqs. 1 and 2, respectively.

Fig. 4

The schematic representation of a Decision Tree (DT) model

$$f\left(x\right)={C}_{q\left(x\right)},\quad q:{\mathbb{R}}^{m}\to \{1, 2, \dots , t\},\quad C\in {\mathbb{R}}^{t}$$


q(x) = function that maps an input x to the index of its leaf, as defined by the splitting rules

m = number of input variables

t = total number of leaves

\({C}_{q\left(x\right)}\) = mean response of a leaf

$$Obj\left(T\right)={\sum }_{i=1}^{N}({{y}_{i}-{\widehat{y}}_{i})}^{2}+\alpha (T)$$


\(Obj\left(T\right)\) = objective function

\({\sum }_{i=1}^{N}({{y}_{i}-{\widehat{y}}_{i})}^{2}\) = loss function

\(\alpha (T)\) = regularization term

T = the tree (its splitting rules and leaf values)

N = number of training observations
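A minimal sketch of recursive binary splitting for a regression tree, matching Eqs. 1 and 2 without the regularization term α(T): each split x_f ≤ s is chosen to minimize the squared-error loss of the two child regions, and each leaf predicts the mean response C of its training observations. The `max_depth`, `min_samples_split`, and `min_samples_leaf` arguments play the roles of D, S, and L; maximum features (F) and pruning are omitted. This is an illustrative NumPy implementation with hypothetical toy data, not the library used in the study.

```python
import numpy as np

def best_split(X, y, min_samples_leaf=1):
    """Search all (feature f, threshold s) pairs for the split x_f <= s that
    minimizes the summed squared error of the two child regions."""
    best_f, best_s, best_sse = None, None, np.inf
    for f in range(X.shape[1]):
        for s in np.unique(X[:, f])[:-1]:  # the largest value cannot split
            left = X[:, f] <= s
            if min(left.sum(), (~left).sum()) < min_samples_leaf:
                continue
            sse = (np.sum((y[left] - y[left].mean()) ** 2)
                   + np.sum((y[~left] - y[~left].mean()) ** 2))
            if sse < best_sse:
                best_f, best_s, best_sse = f, float(s), sse
    return best_f, best_s

def grow_tree(X, y, depth=0, max_depth=3, min_samples_split=2, min_samples_leaf=1):
    """Recursive binary splitting; each leaf stores its mean response C."""
    if depth >= max_depth or len(y) < min_samples_split:
        return {"leaf": float(y.mean())}
    f, s = best_split(X, y, min_samples_leaf)
    if f is None:  # no admissible split
        return {"leaf": float(y.mean())}
    mask = X[:, f] <= s
    args = (depth + 1, max_depth, min_samples_split, min_samples_leaf)
    return {"feature": f, "threshold": s,
            "left": grow_tree(X[mask], y[mask], *args),
            "right": grow_tree(X[~mask], y[~mask], *args)}

def predict(tree, x):
    """Follow the splitting rules q(x) down to a leaf and return its C."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

# Hypothetical toy data with two predictors (X1, X2).
X = np.array([[1.0, 5.0], [2.0, 6.0], [3.0, 5.0], [4.0, 6.0]])
y = np.array([10.0, 10.0, 20.0, 20.0])
tree = grow_tree(X, y, max_depth=1)
print(predict(tree, [1.5, 5.5]), predict(tree, [3.5, 5.5]))  # 10.0 20.0
```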

  b)

    Artificial Neural Network (ANN) is another machine learning algorithm; it builds a network with a multilayer perceptron (MLP) architecture, which here consists of an input layer, up to two hidden layers, and an output layer [37, 38]. A simple ANN model with one hidden layer is illustrated in Fig. 5. The primary processing elements in each layer are neurons (N1, N2, …, Nn), which are interconnected by weights. The network is trained through a two-stage process, the “forward pass” and “backpropagation” [39]. In the forward pass, the inputs (X1, X2, …, Xj) are multiplied by their associated weights (W1,1, W2,1, …, Wh,k), summed, and added to a bias term \(\left({a}_{h}\right)\) to produce a linear output. This linear output is passed through an activation function \(\left({\phi }_{h}\right)\) (e.g., sigmoid, hyperbolic tangent, or the Rectified Linear Unit (ReLU)) to obtain a non-linear output [22]. The outputs of the hidden-layer neurons act as inputs to the neuron in the output layer (K). In backpropagation, the gradient of the prediction error with respect to each weight is propagated from the output layer back toward the input layer, and an optimizer (e.g., Adam or Stochastic Gradient Descent (SGD)) uses these gradients to update the weights [40]. The forward pass and backpropagation are repeated until the objective function is optimized [22]. The most important hyperparameters requiring tuning for an ANN model include the number of hidden layers (H), number of neurons (N), batch size (B), activation function (A), optimizer (O), and learning rate (Lr) [36]. The mathematical formulations of an ANN model and its objective function are presented in Eqs. 3 and 4, respectively.

Fig. 5

The schematic representation of an Artificial Neural Network (ANN) model

$${\widehat{y}}_{i}={\phi }_{o}\left({a}_{k}+\sum _{h=1}^{N}{w}_{hk}\,{\phi }_{h}\left({a}_{h}+\sum _{j=1}^{J}{w}_{jh}\,{x}_{ij}\right)\right)$$


\({\widehat{y}}_{i}\) = model prediction for observation i

\({w}_{jh}, {w}_{hk}\) = weights between the input and hidden layers and between the hidden and output layers, respectively

\({x}_{ij}\) = value of input j for observation i

J = number of inputs

\({a}_{h}, {a}_{k}\) = bias terms in the hidden-layer neuron and output neuron, respectively

N = number of neurons in the hidden layer

\({\phi }_{h}, {\phi }_{o}\) = activation functions in the hidden-layer neuron and output neuron, respectively.

$$Obj\left(\theta \right)=\frac{1}{M}\sum _{i=1}^{M}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}+\alpha \,\Omega \left(\theta \right)$$


\(Obj\left(\theta \right)\) = objective function

θ = model parameters (weights and biases)

M = number of training observations

\(\frac{1}{M}\sum _{i=1}^{M}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}\) = loss function

\(\alpha \,\Omega \left(\theta \right)\) = regularization term (an L2 penalty on the weights, with strength α).
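Eqs. 3 and 4 can be evaluated directly with NumPy for a one-hidden-layer network. This sketch covers the forward pass and the objective only (no backpropagation); the ReLU hidden activation, identity output activation, and the unscaled L2 penalty α·Σw² are assumptions, and library implementations may scale the penalty differently. The parameter values are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def identity(z):
    return z

def mlp_forward(X, W_in, a_h, w_out, a_k, phi_h=relu, phi_o=identity):
    """Eq. 3 for one hidden layer with N neurons.

    X:     (M, J) inputs x_ij
    W_in:  (J, N) weights w_jh (input -> hidden)
    a_h:   (N,)   hidden-layer biases
    w_out: (N,)   weights w_hk (hidden -> output neuron k)
    a_k:   scalar output bias
    """
    hidden = phi_h(X @ W_in + a_h)      # phi_h(a_h + sum_j w_jh x_ij)
    return phi_o(a_k + hidden @ w_out)  # phi_o(a_k + sum_h w_hk * hidden_h)

def objective(y, y_hat, weights, alpha):
    """Eq. 4: mean squared error plus alpha times an L2 penalty on the weights."""
    mse = np.mean((np.asarray(y) - y_hat) ** 2)
    l2 = sum(np.sum(w ** 2) for w in weights)
    return mse + alpha * l2

# Hypothetical parameters: J = 2 inputs, N = 2 hidden neurons.
X = np.array([[1.0, 2.0]])
W_in = np.eye(2)
a_h = np.zeros(2)
w_out = np.array([1.0, 1.0])
y_hat = mlp_forward(X, W_in, a_h, w_out, a_k=0.5)
print(y_hat)  # [3.5]
print(objective([4.0], y_hat, [W_in, w_out], alpha=0.1))
```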

Overview of model development process

In this research, two model development strategies (Strategy A and Strategy B) were used, and their results were compared to identify the better strategy for constructing RS prediction models. In Strategy A, two integral models were developed, one with DT and one with ANN, each predicting RS at every elapsed time, i.e., after one month (PRS1), two months (PRS2), three months (PRS3), 11 months (PRS11), and similarly after 12, 15, 21, 24, 27, 33, and 36 months, using S, T, C, M, TH, b, B, E, TR, and SN as inputs and MRS as the target variable. Under this strategy, the general formulation of the DT and ANN models is presented in Eq. 5.

$${PR}_{S}= f(S, T, C, M, TH, b, B, E, TR, SN)$$


\({PR}_{S}\) = Predicted RS

\(S, T, C, M, TH, b, B\) = Time-independent inputs

\(E\) = Elapsed time since installation (months)

\(TR, SN\) = Time-dependent inputs
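Under Strategy A, a single model covers every prediction horizon because E is itself an input. The sketch below expands one marking line into Strategy A training rows in the input order of Eq. 5; the record layout and field names are hypothetical, and since the text does not say whether E = 0 rows were included, the sketch simply keeps every measured month.

```python
# Time-independent inputs, in the order used in Eq. 5 (field names hypothetical).
STATIC_INPUTS = ("S", "T", "C", "M", "TH", "b", "B")

def strategy_a_rows(line_record):
    """Expand one marking line into Strategy A rows.

    line_record is a hypothetical dict with the time-independent inputs and
    per-month dicts "TR", "SN", and "MRS" keyed by elapsed month E. Each row is
    [S, T, C, M, TH, b, B, E, TR_E, SN_E]; its target is the measured RS at E.
    """
    static = [line_record[k] for k in STATIC_INPUTS]
    rows, targets = [], []
    for e in sorted(line_record["MRS"]):
        rows.append(static + [e, line_record["TR"][e], line_record["SN"][e]])
        targets.append(line_record["MRS"][e])
    return rows, targets

# Hypothetical, already label-encoded record with two measurement months.
record = {"S": 0, "T": 1, "C": 2, "M": 3, "TH": 20, "b": 1, "B": 0,
          "TR": {0: 5000, 1: 5000}, "SN": {0: 0.0, 1: 4.5},
          "MRS": {0: 300, 1: 290}}
rows, targets = strategy_a_rows(record)
print(rows[1], targets)
```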

Strategy B, on the other hand, uses sequential RS prediction models: two separate sets of 11 models (DT-A through DT-K and ANN-A through ANN-K) were developed, one set for each algorithm. The schematic of the model inputs and outputs for the DT models is illustrated in Fig. 6 as an example. As shown in Fig. 6, Model DT-A uses the initially measured RS (MRS0) and the other key input variables at the time of installation (S, T, C, M, TH, b, B, TR0, and SN0) to predict RS after E = 1 month. For Model DT-B, the output of Model DT-A (PRS1) is combined with the other inputs at E = 1 month to predict RS after E = 2 months. This process is repeated for the remaining models, each using the prediction of the preceding model as an input. Based on the framework illustrated in Fig. 6, the general formulation of the DT and ANN models is presented in Eq. 6.

Fig. 6

Schematic representation of the inputs and output for the DT models

$${PR}_{S,{E}_{i}}=f\left(S, T, C, M, TH, b, B, {TR}_{{E}_{i-1}},{SN}_{{E}_{i-1}}, {X}_{{E}_{i-1}}\right),\quad i=1, 2, \dots , 11$$


\({E}_{i}\) = Elapsed times, \({E}_{i}\in \{0, 1, 2, 3, 11, 12, 15, 21, 24, 27, 33, 36\}\) months for \(i = 0, 1, \dots , 11\)

\({X}_{{E}_{i-1}}\) = \({MR}_{S0}\) for \(i = 1\), or \({PR}_{S,{E}_{i-1}}\) for \(i > 1\)

\({PR}_{S,{E}_{i}}\) = Predicted RS at elapsed time \({E}_{i}\)

\(S, T, C, M, TH, b, B\) = Time-independent inputs

\({TR}_{{E}_{i-1}},{SN}_{{E}_{i-1}}\) = Time-dependent inputs at the previous elapsed time (\({E}_{i-1}\))

\({MR}_{S0}\) = Initially measured RS

\({PR}_{S,{E}_{i-1}}\) = Predicted RS at the previous elapsed time (\({E}_{i-1}\))
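The chaining in Eq. 6 can be sketched as a loop in which each model’s prediction becomes the X input of the next model. Here `models` is a hypothetical list of 11 callables standing in for DT-A to DT-K or ANN-A to ANN-K, the feature order follows Eq. 6, and the toy model in the usage lines is not a trained model.

```python
ELAPSED = (0, 1, 2, 3, 11, 12, 15, 21, 24, 27, 33, 36)  # E_0 ... E_11 (months)

def predict_sequence(models, static, tr, sn, mrs0):
    """Chain the Strategy B models as in Eq. 6.

    models: 11 callables; models[i - 1] maps the inputs at E_{i-1} to PR_S at E_i
    static: [S, T, C, M, TH, b, B] (time-independent inputs)
    tr, sn: dicts {elapsed month: traffic / snowfall input}
    mrs0:   initially measured RS at E_0 = 0
    """
    x_prev = mrs0  # X_{E_0} = MR_S0 feeds the first model
    predictions = {}
    for i in range(1, len(ELAPSED)):
        e_prev, e_now = ELAPSED[i - 1], ELAPSED[i]
        features = list(static) + [tr[e_prev], sn[e_prev], x_prev]
        x_prev = models[i - 1](features)  # PR_S at E_i becomes the next X input
        predictions[e_now] = x_prev
    return predictions

# Toy stand-in model (hypothetical): each step keeps 90% of the previous RS.
toy_models = [lambda f: 0.9 * f[-1]] * 11
inputs_by_month = {e: 0.0 for e in ELAPSED}
pred = predict_sequence(toy_models, [0] * 7, inputs_by_month, inputs_by_month, mrs0=100.0)
print(round(pred[36], 2))
```

Because each model consumes the previous prediction, an error made by an early model is carried into every later one, which is consistent with the lower accuracy at longer horizons discussed under Model training.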

Model training

Training DT models

DT models, developed with Strategy A or Strategy B, were trained on the training dataset (a randomly selected 80% of the total data points), from which the models learned patterns. DT-based models are largely unaffected by multicollinearity [41, 42]. They are also insensitive to the scale of the inputs because each node is split on a single input using a threshold, so rescaling an input changes only the threshold value and not the resulting partition [43]. Therefore, the inputs were not scaled for training the DT models.
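The random 80/20 partition can be sketched with a NumPy permutation of row indices. The seed is an arbitrary assumption, and because the text does not state whether the split was made per reading or per marking line, this sketch simply splits per row.

```python
import numpy as np

def split_indices(n_rows, test_fraction=0.2, seed=42):
    """Randomly assign row indices to a training set (80%) and a testing set (20%)."""
    rng = np.random.default_rng(seed)  # the seed value is an arbitrary assumption
    perm = rng.permutation(n_rows)
    n_test = int(round(test_fraction * n_rows))
    return perm[n_test:], perm[:n_test]

train_idx, test_idx = split_indices(49_632)
print(len(train_idx), len(test_idx))  # 39706 9926
```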

Training ANN models

ANN models, developed with Strategy A or Strategy B, were also trained on the training dataset. ANN uses gradient-based optimization to minimize the objective function, and scaling the input variables helps the optimizer converge faster [44]. As such, Standard Scaler [45] was implemented to standardize the input variables by removing the mean and scaling to unit variance. Each model was allowed to train for up to 1,000 iterations, with early stopping that terminated training when the validation R2 did not improve for 100 consecutive iterations.
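The input standardization and early-stopping rule can be sketched as follows. `Standardizer` is a hypothetical minimal stand-in for Standard Scaler (fit on the training inputs only, then reused), and `train_with_early_stopping` assumes caller-supplied `step` and `validation_r2` functions; only the 1,000-iteration cap and the 100-iteration patience come from the text.

```python
import numpy as np

class Standardizer:
    """Minimal stand-in for Standard Scaler: subtract the training mean and
    divide by the training standard deviation."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)
        self.scale_[self.scale_ == 0] = 1.0  # leave constant columns unscaled
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.scale_

def train_with_early_stopping(step, validation_r2, max_iter=1000, patience=100):
    """Call step() up to max_iter times; stop once validation R2 has not
    improved for `patience` consecutive iterations. Returns (best_r2, best_iter)."""
    best_r2, best_iter, since_best = -np.inf, 0, 0
    for it in range(1, max_iter + 1):
        step()
        r2 = validation_r2()
        if r2 > best_r2:
            best_r2, best_iter, since_best = r2, it, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_r2, best_iter

# Fit the scaler on training inputs only, then reuse it for other inputs.
scaler = Standardizer().fit([[1.0, 10.0], [3.0, 10.0]])
print(scaler.transform([[2.0, 10.0], [3.0, 10.0]]))

# Hypothetical validation-R2 trace that plateaus after three iterations.
trace = iter([0.10, 0.20, 0.30] + [0.30] * 10)
print(train_with_early_stopping(lambda: None, lambda: next(trace), patience=3))
```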

Hyperparameter tuning

For both DT and ANN models, hyperparameters were tuned during training under both Strategy A and Strategy B. For the DT models, maximum depth (D), minimum samples split (S), minimum samples leaf (L), and maximum features (F) were tuned to prevent overfitting. For the ANN models, the number of hidden layers (H) and neurons (N) were tuned to control model complexity and prevent overfitting, while batch size (B), activation function (A), optimizer (O), and learning rate (Lr) were tuned to improve predictive performance. Moreover, L2 regularization was applied by tuning the alpha (α) hyperparameter to mitigate the effect of the high multicollinearity in the assembled dataset.

For both Strategy A and Strategy B, hyperparameters were tuned by combining grid search with 10-fold cross-validation [11]. Grid search evaluated every combination of values within the defined hyperparameter space to identify the combination with the highest accuracy. For each combination, 10-fold cross-validation split the training dataset into ten subsets; the model was trained on nine subsets and validated on the remaining one, and this was repeated ten times so that each subset served once as the validation set. The average R2 over the ten folds was used to evaluate the performance of each combination. The hyperparameter spaces and optimal hyperparameter combinations for the models developed with Strategy A and Strategy B are presented in Tables 3, 4, 5 and 6.
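Grid search with k-fold cross-validation can be sketched in plain NumPy as below. To keep the example self-contained, a closed-form ridge regression stands in for the DT and ANN models, and its `alpha` grid and the synthetic data are hypothetical; any `fit_predict` function with the same signature could be substituted.

```python
import itertools
import numpy as np

def r2_score(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def grid_search_cv(X, y, param_grid, fit_predict, k=10, seed=0):
    """Score every hyperparameter combination by mean k-fold validation R2
    and return the best (params, score)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    names = sorted(param_grid)
    best_params, best_score = None, -np.inf
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for i, val_idx in enumerate(folds):
            # Train on the other k - 1 folds, validate on fold i.
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            y_hat = fit_predict(X[train_idx], y[train_idx], X[val_idx], params)
            scores.append(r2_score(y[val_idx], y_hat))
        mean_score = float(np.mean(scores))
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

def ridge_fit_predict(X_tr, y_tr, X_val, params):
    """Stand-in model: closed-form ridge regression (not DT or ANN)."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    penalty = params["alpha"] * np.eye(A.shape[1])
    penalty[0, 0] = 0.0  # do not penalize the intercept
    coef = np.linalg.solve(A.T @ A + penalty, A.T @ y_tr)
    return np.column_stack([np.ones(len(X_val)), X_val]) @ coef

# Hypothetical synthetic data and grid.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)
best_params, best_score = grid_search_cv(X, y, {"alpha": [0.0, 1.0, 1000.0]}, ridge_fit_predict)
print(best_params, round(best_score, 3))
```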

Table 3 Hyperparameter space for the models developed with Strategy A
Table 4 Optimal combination of the hyperparameters for Strategy A
Table 5 Hyperparameter space for the models developed with Strategy B
Table 6 Optimal combination of the hyperparameters for the models developed with Strategy B

The training performance of the models developed with Strategy A and Strategy B was evaluated using the coefficient of determination (R2), mean absolute percentage error (MAPE), and root mean square error (RMSE), as presented in Table 7 and Fig. 7, respectively. Table 7 shows that, under Strategy A, the ANN model provided better training performance than the DT model, with a higher R2 (0.82), lower MAPE (44.6%), and lower RMSE (106.9 mcd/m2/lux).
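The three metrics can be computed as follows. This is a minimal sketch using the standard definitions with hypothetical values; note that MAPE is undefined when any measured value is zero and grows large when measured RS values are small.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R2, MAPE (%), and RMSE (same units as RS, mcd/m2/lux)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    mape = 100.0 * np.mean(np.abs(residual / y_true))  # undefined if any y_true == 0
    rmse = np.sqrt(np.mean(residual ** 2))
    return r2, mape, rmse

# Hypothetical measured vs. predicted RS values.
r2, mape, rmse = regression_metrics([100, 200, 300], [110, 190, 300])
print(round(r2, 3), round(mape, 2), round(rmse, 2))
```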

Table 7 Training performance of the models developed with Strategy A
Fig. 7

Training performance of the developed models

In Strategy B, on the other hand, the training R2, MAPE, and RMSE values were similar for the DT and ANN models up to a prediction horizon of 3 months (i.e., DT-A to DT-C and ANN-A to ANN-C) (see Fig. 7). When the prediction horizon was extended from 3 to 36 months (i.e., DT-D to DT-K or ANN-D to ANN-K), R2 generally trended downward and MAPE upward, whereas RMSE showed no clear trend. The decrease in training accuracy with longer prediction horizons in Strategy B was expected: each model uses the predictions of the preceding model as an input, so prediction errors accumulate along the sequence. Under Strategy B, the DT and ANN models fitted the training data with average R2, MAPE, and RMSE of 0.89/27.1%/53.9 mcd/m2/lux and 0.86/38.1%/61.6 mcd/m2/lux, respectively, and therefore demonstrated better training performance than the models developed with Strategy A. The ranges of training R2, MAPE, and RMSE for the DT models were 0.77 to 0.96, 9.7 to 57.6%, and 46.0 to 67.3 mcd/m2/lux, respectively, while for the ANN models these ranges were 0.76 to 0.96, 10.1 to 62.3%, and 53.5 to 73.5 mcd/m2/lux, respectively. Both DT and ANN models fitted the training data well under Strategy B, with R2 values higher than those of the models listed in Table 1. Overall, the DT models demonstrated better training performance than the ANN models in Strategy B.

Model testing

The testing dataset (the remaining 20% of the data points) was used to evaluate the performance of the developed models on unseen data. The testing performance of the models developed with Strategy A is presented in Fig. 8. As with training, the ANN model provided better testing performance than the DT model, with a higher R2 (0.76), lower MAPE (50.5%), and lower RMSE (121.6 mcd/m2/lux).

Fig. 8

Testing performance of the models developed with Strategy A

The testing performance of the DT and ANN models developed with Strategy B is shown in Figs. 9 and 10, respectively. As with training accuracy, the testing accuracy of these models decreased as the prediction horizon increased beyond 3 months (Figs. 9 and 10). Under Strategy B, the DT and ANN models provided average R2, MAPE, and RMSE of 0.76/41.9%/79.7 mcd/m2/lux and 0.77/45.5%/74.8 mcd/m2/lux, respectively, and therefore demonstrated better testing performance than the models developed with Strategy A. The ranges of testing R2, MAPE, and RMSE for the DT models developed with Strategy B were 0.54 to 0.93, 12.6 to 95.1%, and 67.9 to 95.2 mcd/m2/lux, respectively, while for the ANN models these ranges were 0.55 to 0.94, 10.6 to 80.7%, and 61.2 to 92.9 mcd/m2/lux, respectively. These models provided close RS estimates at different prediction horizons on data that were not used during training, thereby demonstrating their robustness. Most of the ANN models exhibited better testing performance than the corresponding DT models. The testing R2 values of these models were also generally higher than the R2 of most past regression models predicting retroreflectivity for up to 3 years. These R2 values indicate that the proposed machine learning models developed with Strategy B can predict RS more accurately than the traditional regression models.

Fig. 9

Testing performances of the models developed with DT algorithm (Strategy B)

Fig. 10

Testing performances of the models developed with ANN algorithm (Strategy B)

Machine learning aims to build models that generalize well to unseen data, ensuring their effectiveness in real-world applications [46]. As such, testing performance is typically prioritized over training performance when selecting the best models [22]. Under Strategy B, the DT models exhibited better training performance than the ANN models, while the ANN models demonstrated superior testing performance and hence better generalization to unseen data. Therefore, ANN was selected over DT for the adopted dataset, and the ANN models developed with Strategy B were carried forward for further evaluation.

Feature importance study

The importance of each input to every ANN model was assessed using SHapley Additive exPlanations (SHAP) values [47]. The SHAP value of an input represents that input’s contribution to the difference between the model’s prediction for an observation and the expected (average) prediction, averaged over all possible orderings in which the inputs can be added. A larger mean absolute SHAP value indicates a greater impact on model predictions. For every ANN model, the SHAP value of each input was calculated as the weighted average of that input’s marginal contributions. The mean absolute SHAP values of the ANN-A inputs are presented in Fig. 11 as an example; those for the remaining models are reported in Table 8.
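For a model with only a few inputs, Shapley values can be computed exactly by enumerating every coalition of inputs and weighting each marginal contribution. The sketch below assumes a single baseline vector for “absent” inputs, which is a simplification; the `shap` package approximates this computation for larger models, and this code is not the procedure used in the study. Feature importance is then the mean absolute SHAP value over observations. The linear model and observations are hypothetical.

```python
import itertools
import math
import numpy as np

def exact_shapley(f, x, baseline):
    """Exact Shapley values of each input for one observation x.

    Inputs outside a coalition take their baseline value (a single-reference
    simplification). Cost grows as 2^m, so this suits only a few inputs.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    m = len(x)

    def value(coalition):
        z = baseline.copy()
        for j in coalition:
            z[j] = x[j]
        return f(z)

    phi = np.zeros(m)
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight |S|! (m - |S| - 1)! / m! for coalitions of this size
            weight = math.factorial(size) * math.factorial(m - size - 1) / math.factorial(m)
            for subset in itertools.combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi

# Hypothetical linear model, where the exact answer is w_i * (x_i - baseline_i).
w = np.array([2.0, -1.0, 0.5])
model = lambda z: float(w @ z)
observations = np.array([[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]])
baseline = np.zeros(3)
shap_matrix = np.array([exact_shapley(model, obs, baseline) for obs in observations])
importance = np.abs(shap_matrix).mean(axis=0)  # mean absolute SHAP value per input
print(shap_matrix[0], importance)
```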

Fig. 11

Feature importance values of the input variables to ANN-A

Table 8 Feature importance of the input variables for models ANN-B through ANN-K

For ANN-A, the initially measured retroreflectivity (MRS at E = 0) was the most important input (mean absolute SHAP value = 166.9) (Fig. 11). For the remaining models (Table 8), the predicted skip retroreflectivity (PRS) had the highest impact on model predictions. In addition, SN followed by TR were the most important inputs after MRS or PRS for all ANN models (Fig. 11 and Table 8). Most retroreflectivity degradation models in the literature did not consider these key inputs simultaneously, which might explain their relatively low accuracy.

Illustrative implementation of the proposed models

Before using a pavement marking product in a project, a transportation agency might want to determine the expected service life of a specific marking product for a specified minimum retroreflectivity threshold. As an example, the process of implementing the proposed models to estimate the service life of one of the marking products (included in the testing dataset) is explained in the following steps with the aid of Tables 9 and 10.

Table 9 Example results for models ANN-A through ANN-F
Table 10 Example results for models ANN-G through ANN-K

Step 1: Initial retroreflectivity measurement

First, the initial RS of the marking line immediately after field installation needs to be determined. This value could be measured in the field at the project level or assumed based on previous similar projects. In the example shown in Table 9, it was 329 mcd/m²/lux.

Step 2: Collection of inputs over 3 years

In Step 2, all the other inputs listed in Tables 9 and 10 should be collected at each elapsed time E (namely, E = 0, 1, 2, 3, 11, 12, 15, 21, 24, 27, and 33 months). S, T, C, M, Th, b, and B are time-independent variables that can be readily determined for a specific product. TR and SN vary with E and could be determined, predicted, or assumed based on historical data. For accurate predictions, all numerical inputs applied at each E should lie within the ranges illustrated in Fig. 2.
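The range check described in Step 2 can be automated before any model is run. In this sketch, the bounds in `TRAINING_RANGES` and the per-month `schedule` are hypothetical placeholders (they are not the ranges shown in Fig. 2), and only TR and SN are checked; the elapsed times are those listed in Step 2.

```python
ELAPSED_MONTHS = [0, 1, 2, 3, 11, 12, 15, 21, 24, 27, 33]

# HYPOTHETICAL bounds for illustration only; substitute the ranges shown in Fig. 2.
TRAINING_RANGES = {"TR": (0.0, 50_000.0), "SN": (0.0, 100.0)}


def out_of_range(inputs, ranges=TRAINING_RANGES):
    """Return the names of inputs whose value falls outside its training range."""
    flagged = []
    for name, value in inputs.items():
        if name in ranges:
            low, high = ranges[name]
            if not low <= value <= high:
                flagged.append(name)
    return flagged


def check_schedule(schedule, ranges=TRAINING_RANGES):
    """Map each elapsed month E to its out-of-range inputs; months that pass are omitted."""
    problems = {}
    for month in ELAPSED_MONTHS:
        bad = out_of_range(schedule[month], ranges)
        if bad:
            problems[month] = bad
    return problems


# Hypothetical inputs that grow with E (illustrative values, not project data).
schedule = {e: {"TR": 1_500.0 * e, "SN": 4.0 * e} for e in ELAPSED_MONTHS}
print(check_schedule(schedule))  # {27: ['SN'], 33: ['SN']}
```

Any month that appears in the result marks an extrapolation, where model predictions should be treated with caution.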

Step 3: Sequential use of the proposed models

In this step, the user assigns all the input data at E = 0 and uses ANN-A to calculate the PRS at month 1 (369 mcd/m²/lux in Table 9). The user then feeds all the input data at E = 1, together with the PRS at month 1 (369 mcd/m²/lux), into ANN-B to calculate the PRS at month 2 (343 mcd/m²/lux in Table 9). This process is repeated for the remaining values of E and the remaining models in Tables 9 and 10 until the PRS at month 36 (96 mcd/m²/lux in Table 10) is obtained. Fig. 12 compares the measured RS (as collected from NTPEP) with the RS predicted by the proposed models.
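The chaining logic of Step 3 can be sketched as follows: each stage's prediction becomes the retroreflectivity input of the next stage. The hypothetical `make_stand_in` helper and its simple decay rule are placeholders for the trained ANN-A to ANN-K models, so the printed values are not meant to reproduce Tables 9 and 10; only the elapsed-time grid and the initial value of 329 mcd/m²/lux come from the example.

```python
from typing import Callable, Dict, List, Tuple

# A stage model maps (RS at the start month, other inputs at that month) to RS at the end month.
StageModel = Callable[[float, Dict[str, float]], float]

START_MONTHS = [0, 1, 2, 3, 11, 12, 15, 21, 24, 27, 33]
END_MONTHS = START_MONTHS[1:] + [36]


def predict_sequentially(
    initial_rs: float,
    stages: List[Tuple[int, int, StageModel]],
    inputs_by_month: Dict[int, Dict[str, float]],
) -> Dict[int, float]:
    """Run stage models in time order, feeding each prediction into the next stage."""
    trajectory = {stages[0][0]: initial_rs}
    rs = initial_rs
    for start, end, model in stages:
        rs = model(rs, inputs_by_month[start])  # PRS at month `end`
        trajectory[end] = rs
    return trajectory


def make_stand_in(months: int) -> StageModel:
    """HYPOTHETICAL placeholder for a trained ANN stage: a simple decay rule."""
    def model(rs: float, inputs: Dict[str, float]) -> float:
        return rs * (1 - 0.02 * months) - 0.05 * inputs["SN"]
    return model


stages = [(s, e, make_stand_in(e - s)) for s, e in zip(START_MONTHS, END_MONTHS)]
inputs_by_month = {m: {"SN": 2.0, "TR": 8000.0} for m in START_MONTHS}  # hypothetical
for month, rs in predict_sequentially(329.0, stages, inputs_by_month).items():
    print(f"month {month:2d}: {rs:6.1f} mcd/m²/lux")
```

Because every stage consumes the previous stage's output, errors can accumulate along the chain; this is one reason the later horizons tend to show lower testing R² than the first ones.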

Fig. 12 Comparison of the measured and predicted RS for the example in Tables 9 and 10

Step 4: Transverse to longitudinal retroreflectivity conversion

Depending on the agency’s policy, the predicted transverse skip retroreflectivity (RS) values at different E should be converted into longitudinal retroreflectivity (RL), which better reflects actual field conditions. It is widely assumed that RS correlates well with RL [14]. Furthermore, a recent study proposed simple models to convert between RS and RL [48].

Step 5: Service life estimation

Based on the RL values obtained in Step 4 at different E, the service life of the marking line can be estimated as the time at which its RL drops below 100 mcd/m²/lux. According to the model predictions in Fig. 12, the expected service life of this marking product is between 27 and 33 months.
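Step 5 amounts to finding the first evaluation month at which RL falls below the threshold. The sketch below returns that bracketing interval and, as an added assumption not made in the study, a linear-interpolation point estimate inside it. The `rl_by_month` values are hypothetical: months 0, 1, 2, and 36 reuse the values quoted in Steps 1 and 3 (treated directly as RL), and the remaining values are invented placeholders, not Table 10 entries.

```python
from typing import Dict, Optional, Tuple


def service_life_bracket(
    rl_by_month: Dict[int, float], threshold: float = 100.0
) -> Optional[Tuple[Optional[int], int]]:
    """Return (last month before the drop, first month with RL < threshold).

    Returns None if RL never drops below the threshold within the horizon;
    the first element is None if RL is already below it at the first month.
    """
    months = sorted(rl_by_month)
    for k, month in enumerate(months):
        if rl_by_month[month] < threshold:
            return (months[k - 1] if k > 0 else None, month)
    return None


def interpolated_service_life(
    rl_by_month: Dict[int, float], threshold: float = 100.0
) -> Optional[float]:
    """Linearly interpolate the threshold crossing inside the bracket (an assumption)."""
    bracket = service_life_bracket(rl_by_month, threshold)
    if bracket is None or bracket[0] is None:
        return None
    m0, m1 = bracket
    r0, r1 = rl_by_month[m0], rl_by_month[m1]
    return m0 + (r0 - threshold) * (m1 - m0) / (r0 - r1)


# Hypothetical RL trajectory in mcd/m²/lux (see the lead-in for which values are illustrative).
rl_by_month = {0: 329, 1: 369, 2: 343, 3: 330, 11: 250, 12: 240, 15: 210,
               21: 160, 24: 140, 27: 115, 33: 99, 36: 96}
print(service_life_bracket(rl_by_month))       # (27, 33)
print(interpolated_service_life(rl_by_month))  # 32.625
```

With these placeholder values the bracket is (27, 33), mirroring the range read from Fig. 12; the interpolated 32.6 months depends entirely on the invented month-27 and month-33 values and assumes linear degradation between evaluations.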

Summary and conclusions

The objective of this study was to develop machine learning models that U.S. local and state agencies could use to reliably predict the retroreflectivity of pavement markings. To this end, transverse skip retroreflectivity data and other key variables were retrieved from the NTPEP database. DT and ANN algorithms were used to develop models that predict RS sequentially for up to 3 years, following two different model development strategies. All models were trained with 80% of the data points and tested with the remaining 20%. The main findings and conclusions of the study are as follows:

  • Correlation analysis of the collected variables showed that some variables were highly correlated, indicating high multicollinearity in the study dataset.

  • Sequential retroreflectivity prediction models demonstrated higher accuracy than integral retroreflectivity prediction models both in training and testing.

  • Both the sequential DT and ANN models, with overall R² values ranging from 0.54 to 0.96 and from 0.55 to 0.96, respectively, predicted retroreflectivity at different prediction horizons with higher accuracy than the regression models proposed in the literature.

  • ANN models exhibited better testing performance than the DT models and were therefore selected over DT for the adopted dataset.

  • The feature importance analysis of the ANN models revealed that initial or predicted retroreflectivity, followed by snowfall and traffic, were the most important inputs to model predictions.

  • The proposed models are expected to help state agencies and transportation officials determine the service life of pavement marking products and plan future restriping activities accordingly.

Overall, non-parametric supervised machine learning algorithms appear to be a promising alternative to traditional parametric methods for modeling the retroreflectivity degradation of pavement markings. In the future, longitudinal retroreflectivity data should be collected from northern and western states. More sophisticated DT-based ensemble algorithms (e.g., XGBoost and LightGBM) and deep learning techniques (e.g., recurrent neural networks) should be employed to develop marking performance prediction models. Additionally, lightweight machine learning models that require less data should be explored to increase applicability where data are limited. Models compatible with multi-functional road condition detection vehicles, which are increasingly used for pavement surveys, should also be developed. The best-performing models could then be used to build a simple retroreflectivity prediction tool that supports transportation agencies' decisions on future restriping of marking products.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

References

  1. Rasdorf WJ, Hummer JE, Zhang G, Sitzabee W (2009) Pavement marking performance analysis. Fin Rep, NCDOT, Raleigh

  2. Mohamed MM (2019) Evaluation and modeling of pavement marking characteristics based on laboratory and field data. Dissertation, University of Idaho, Ann Arbor

  3. Migletz J, Graham JL, Harwood DW, Bauer KM (2001) Service life of durable pavement markings. Transp Res Rec 1749(1):13–21

  4. Fu H, Wilmot CG (2013) Evaluating alternative pavement marking materials. Public Work Manag Policy 18(3):279–297

  5. Benz RJ, Pike AM, Kuchangi SP, Brackett Q (2009) Serviceable pavement marking retroreflectivity levels. Tech Rep FHWA/TX-09/0-5656-1, TxDOT, Austin

  6. Sitzabee WE, White ED, Dowling AW (2013) Degradation modeling of polyurea pavement markings. Public Work Manag Policy 18(2):185–199

  7. Xu L, Chen Z, Li X, Xiao F (2021) Performance, environmental impact and cost analysis of marking materials in pavement engineering, the-state-of-art. J Clean Prod 294:126302

  8. Hussein M, Sayed T, El-Basyouny K, de Leur P (2020) Investigating safety effects of wider longitudinal pavement markings. Accid Anal Prev 142:105527

  9. Ozelim L, Turochy RE (2014) Modeling retroreflectivity performance of thermoplastic pavement markings in Alabama. J Transp Eng 140(6):05014001

  10. FHWA (2022) National standards for traffic control devices; the manual on uniform traffic control devices for streets and highways; maintaining pavement marking retroreflectivity. Fed Regist 87(150):47921–47931

  11. Mousa MR, Mousa SR, Hassan M, Carlson P, Elnaml IA (2021) Predicting the retroreflectivity degradation of waterborne paint pavement markings using advanced machine learning techniques. Transp Res Rec 2675(9):483–494

  12. Wang S (2010) Comparative analysis of NTPEP pavement marking performance evaluation results. Dissertation, University of Akron, Akron

  13. Thomas GB, Schloz C (2001) Durable, cost-effective pavement markings phase I: synthesis of current research. Fin Rep Project No. TR-454, Iowa DOT, Ames

  14. Zhang Y, Wu D (2010) Methodologies to predict service lives of pavement marking materials. J Transp Res Forum 45(3):5–18

  15. Lee JT, Maleck TL, Taylor WC (1999) Pavement marking material evaluation study in Michigan. ITE J 69(7):44

  16. Abboud N, Bowman BL (2002) Cost- and longevity-based scheduling of paint and thermoplastic striping. Transp Res Rec 1794(1):55–62

  17. Hollingsworth JD (2012) Understanding the impact of bead type on paint and thermoplastic pavement markings. Dissertation, Air Force Institute of Technology, Ohio

  18. Sitzabee WE, Hummer JE, Rasdorf W (2009) Pavement marking degradation modeling and analysis. J Infrastruct Syst 15(3):190–199

  19. Sarasua WA, Clarke DB, Davis WJ (2003) Evaluation of interstate pavement marking retroreflectivity. Fin Rep No. FHWA-SC-03-01, SCDOT, Columbia

  20. Robertson J, Sarasua W, Johnson J, Davis W (2013) A methodology for estimating and comparing the lifecycles of high-build and conventional waterborne pavement markings on primary and secondary roads in South Carolina. Public Work Manag Policy 18(4):360–378

  21. Malyuta DA (2015) Analysis of factors affecting pavement markings and pavement marking retroreflectivity in Tennessee highways. Dissertation, University of Tennessee at Chattanooga, Chattanooga

  22. James G, Witten D, Hastie T, Tibshirani R (2021) An introduction to statistical learning. Springer, New York

  23. Umali J, Barrios E (2014) Nonparametric principal components regression. Commun Stat Simul Comput 43(7):1797–1810

  24. Kopf J (2004) Reflectivity of pavement markings: analysis of retroreflectivity curves. Res Rep WA-RD 592.1, WSDOT, Seattle

  25. Karwa V, Donnell ET (2011) Predicting pavement marking retroreflectivity using artificial neural networks: exploratory analysis. J Transp Eng 137(2):91–103

  26. Idris II, Mousa MR, Hassan M, Dhasmana H (2022) Predicting the retroreflectivity degradation of thermoplastic pavement markings with genetic algorithm. San Antonio

  27. Aguinis H, Gottfredson RK, Joo H (2013) Best-practice recommendations for defining, identifying, and handling outliers. Organ Res Methods 16:270–301

  28. Benesty J, Chen J, Huang Y (2008) On the importance of the Pearson correlation coefficient in noise reduction. IEEE Trans Audio Speech Lang Process 16(4):757–765

  29. Uçar MK (2019) Eta correlation coefficient based feature selection algorithm for machine learning: e-score feature selection algorithm. J Intell Syst Theory Appl 2(1):7–12

  30. Jorge I (2011) The influence of the e-tutor on the development of collaborative critical thinking in a student’s e-forum: association levels with Cramér’s V. In: Old Meets New Media Educ 61st Int Counc Educ Media XIII Int Symp Comput Educ Jt Conf, University of Lisbon, Portugal

  31. Yadav D (2019) Categorical encoding using label-encoding and one-hot-encoder. Accessed 15 Jul 2022

  32. Agajanian S, Oluyemi O, Verkhivker GM (2019) Integration of random forest classifiers and deep convolutional neural networks for classification and biomolecular modeling of cancer driver mutations. Front Mol Biosci 6:44

  33. De Ville B (2013) Decision trees. Wiley Interdiscip Rev Comput Stat 5(6):448–455

  34. Karballaeezadeh N, Mohammadzadeh SD, Moazemi D, Band SS, Mosavi A, Reuter U (2020) Smart structural health monitoring of flexible pavements using machine learning methods. Coatings 10(11):1100

  35. Loh WY (2011) Classification and regression trees. Wiley Interdiscip Rev Data Min Knowl Discov 1(1):14–23

  36. Yang L, Shami A (2020) On hyperparameter optimization of machine learning algorithms: theory and practice. Neurocomputing 415:295–316

  37. Walczak S, Cerpa N (2019) Artificial neural networks. Encycl Phys Sci Technol 631–645

  38. Mostafa B, El-Attar N, Abd-Elhafeez S, Awad W (2020) Machine and deep learning approaches in genome: review article. Alfarama J Basic Appl Sci 2(1):105–113

  39. Jin M, Liao Q, Patil S, Abdulraheem A, Al-Shehri D, Glatz G (2022) Hyperparameter tuning of artificial neural networks for well production estimation considering the uncertainty in initialized parameters. ACS Omega 7(28):24145–24156

  40. Sharma S, Sharma S, Athaiya A (2017) Activation functions in neural networks. Int J Eng Appl Sci Technol 4(12):310–316

  41. Kotsiantis SB (2013) Decision trees: a recent overview. Artif Intell Rev 39:261–283

  42. Badr W (2019) Why feature correlation matters... a lot! Accessed 3 Apr 2023

  43. Thenraj R (2020) Do decision trees need feature scaling. Accessed 3 Apr 2023

  44. Roy B (2020) All about feature scaling. Accessed 15 2022

  45. Thara TDK, Prema PS, Xiong F (2019) Auto-detection of epileptic seizure events using deep neural network with different feature scaling techniques. Pattern Recognit Lett 128:544–550

  46. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge

  47. Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. In: 31st Conf. Neural Inf Process Syst (NIPS 2017), Neural Information Processing Systems Foundation, Inc. (NeurIPS), Long Beach

  48. Pike AM, Songchitruksa P (2015) Predicting pavement marking service life with transverse test deck data. Transp Res Rec 2482(1):16–22



Acknowledgements

Not applicable.


Funding

This research was funded by the National Cooperative Highway Research Program (NCHRP) (Project Number: 20-30/IDEA 237).

Author information

Contributions

MM and III developed the study concept. III collected the data, performed the analysis, developed models, and interpreted the results. III, MM, and MH prepared the draft manuscript. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Marwa Hassan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Idris, I.I., Mousa, M. & Hassan, M. Modeling retroreflectivity degradation of pavement markings across the US with advanced machine learning algorithms. J Infrastruct Preserv Resil 5, 3 (2024).
