
Strengthening network DDOS attack detection in heterogeneous IoT environment with federated XAI learning approach


In this section, we present the experiments and results of detecting DDoS attacks with the FDNN model. The experiments were performed on the CIC-IoT-2023 dataset, which contains a wide range of network-traffic features. The model’s performance was evaluated using standard measures: accuracy, precision, recall, F1-score, and the confusion matrix. The findings show that our federated solution can accurately detect and classify DDoS attacks while preserving data privacy across distributed nodes. The following subsections describe the experimental setup and parameter settings in detail and provide a comprehensive analysis of the results. The data was split with the train-test-split function from the scikit-learn library in Python: 80% of the data was allocated to the training set and the remaining 20% to the testing set. For reproducibility, a random state of 42 was set so that every run of the code uses the same split, ensuring consistency in model evaluation.
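The split described above can be sketched as follows; `X_demo` and `y_demo` are placeholder arrays standing in for the CIC-IoT-2023 features and labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the CIC-IoT-2023 features and labels.
X_demo = np.arange(100).reshape(50, 2)   # 50 samples, 2 features
y_demo = np.array([0, 1] * 25)           # toy binary labels

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo,
    test_size=0.20,      # 20% held out for testing
    random_state=42,     # fixed seed for a reproducible split
)
print(len(X_train), len(X_test))  # 40 10
```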

Evaluation metrics

The DDoS detection model is evaluated using several metrics, including Accuracy, Precision, Recall, F1-Score, and the Confusion Matrix. These measures offer a detailed view of the system’s ability to identify DDoS attacks, whose prevention is essential for both IoT devices and the networks they connect to.

Confusion matrix

The confusion matrix shown in Table 2 is used to describe the performance of a classification model. It outlines the predicted and actual outcomes and contains four key values: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

Table 2 Confusion matrix.
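As an illustration, the four values can be obtained for a binary attack/benign case with scikit-learn’s `confusion_matrix` (the labels below are toy values, not taken from the dataset):

```python
from sklearn.metrics import confusion_matrix

# Toy labels: 1 = attack, 0 = benign (illustrative only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# For binary labels [0, 1] the raveled order is TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)  # 3 3 1 1
```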

Accuracy

The accuracy of the model is the fraction of all predictions that are correct, i.e., the sum of true positives and true negatives divided by the total number of predictions. It is given by Equation 1:

$$\begin{aligned} \text {Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \end{aligned}$$

(1)

The detection accuracy measures how well the system distinguishes DDoS attacks from normal operations. However, on imbalanced datasets, accuracy alone may not provide sufficient diagnostic information.

Precision

Precision, also known as the positive predictive value, is the ratio of correctly predicted positive cases (DDoS attacks) to all predicted positive cases (true and false positives). This indicator tells us how many of the predicted attacks were real, and is computed as Equation 2:

$$\begin{aligned} \text {Precision} = \frac{TP}{TP + FP} \end{aligned}$$

(2)

In DDoS detection, precision indicates how well the model minimizes false alarms, ensuring benign traffic is not falsely flagged as malicious.

Recall

Recall, also referred to as sensitivity, is the proportion of actual positive cases (DDoS attacks) that were correctly predicted as positive. It is defined by Equation 3:

$$\begin{aligned} \text {Recall} = \frac{TP}{TP + FN} \end{aligned}$$

(3)

Recall is important in this application because it reflects the model’s ability to find real attacks, thereby reducing the number of undetected DDoS incidents (false negatives).

F1-score

The F1-Score is the harmonic mean of precision and recall. It provides a single measure that balances the two, which is especially useful when precision and recall trade off against each other (i.e., when high precision comes at the cost of lower recall, or vice versa), as shown in Equation 4.

$$\begin{aligned} \text {F1-Score} = 2 \times \frac{\text {Precision} \times \text {Recall}}{\text {Precision} + \text {Recall}} \end{aligned}$$

(4)

The F1-Score is particularly useful in imbalanced datasets, like in DDoS detection, where the cost of false negatives (undetected attacks) and false positives (benign traffic flagged as attacks) is significant.

Accuracy, Precision, Recall, and F1-Score are the basic tools for judging the performance of our DDoS detection model. Accuracy is a coarse measure of overall correctness; precision reflects how well the model minimizes false positives; recall ensures that it detects most of the real attacks; and the F1-Score gives a balanced measure of precision and recall. Together, these numbers provide a comprehensive view of the model’s ability to identify and characterize DDoS attacks.
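As a minimal sketch, Equations (1)-(4) can be computed directly from illustrative counts (the TP/TN/FP/FN values below are invented for the example, not the paper’s results):

```python
# Invented counts for illustration only.
tp, tn, fp, fn = 90, 95, 5, 10

accuracy  = (tp + tn) / (tp + tn + fp + fn)                # Eq. (1)
precision = tp / (tp + fp)                                 # Eq. (2)
recall    = tp / (tp + fn)                                 # Eq. (3)
f1        = 2 * precision * recall / (precision + recall)  # Eq. (4)

print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
# 0.925 0.947 0.9 0.923
```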

Experimental setup

The research was carried out on a Windows machine, specifically an HP Omen 15 laptop equipped with an Nvidia 1060 GPU for acceleration, using Python 3.8.8 as the programming language. PyCharm served as the integrated development environment, combining related development tools into a single graphical user interface. Windows was chosen for its wide use in deploying software developed in Python. The HP Omen 15 provides a powerful processor and sufficient memory for experimenting with machine learning methods, and the Nvidia 1060 GPU enables rapid and efficient parallel processing, cutting down the time taken to train and evaluate deep learning models. Python 3.8.8 was used because, among other things, it offers the numerous libraries and tools needed for data analysis tasks related to artificial intelligence.

FDNN model results

Table 3 contains the experimental findings of the DNN model on the client side. The Accuracy, Precision, Recall and F1-Score metrics are all presented in this table, showing how well the model performed when detecting and classifying DDoS attacks for three different clients. These results clearly show that such an approach is not only effective but also reliable across various organizations where cyber security threats may differ greatly.

The model’s accuracy is high: it very rarely makes mistakes when identifying DDoS attacks. This is confirmed by precision values nearing 99.80%, implying a low false positive rate, a key attribute for any real-world system where unnecessary alarms must be avoided. Recall results are close to 99.74%, showing that the system can detect almost all types of DDoS attacks, and F1-scores consistently above 99.76% indicate a good balance between precision and recall. These findings support our claim that a deep neural network trained through a federated learning approach works well across different clients. The similar performance metrics achieved at the various endpoints also suggest a stable and reliable training process that yields uniform results on client-specific datasets. In summary, Table 3 reports the numerical results obtained with the DNN at each client’s site; these numbers show how effective the proposed approach is in terms of accuracy, precision, recall, and F1-score when detecting DDoS attacks, indicating its potential for building scalable intrusion detection systems with federated learning frameworks.
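As a rough illustration of the federated side, a single aggregation round in the FedAvg style can be sketched in NumPy. The actual FDNN aggregation procedure is not detailed in this section, so the function and toy weight vectors below are assumptions, not the paper’s implementation:

```python
import numpy as np

# Sketch of one federated-averaging (FedAvg-style) round: client weight
# vectors are averaged, weighted by each client's sample count.
def fedavg(client_weights, client_sizes):
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

w1 = np.array([0.2, 0.4])   # client 1 local weights (toy values)
w2 = np.array([0.4, 0.6])   # client 2
w3 = np.array([0.6, 0.8])   # client 3

# Equal client sizes reduce FedAvg to a plain mean.
global_w = fedavg([w1, w2, w3], client_sizes=[100, 100, 100])
print(global_w)  # ≈ [0.4 0.6]
```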

Table 3 Client-Side DNN experimental results.
Figure 6

Graphical representation of 1st client-side results.

Figure 7

Graphical representation of 2nd client-side results.

  • Client 1 Results: Client 1 achieved a 99.78% accuracy rate, meaning it correctly classified 99.78% of the instances. The precision was 99.77%, showing that the model is very good at reducing false positives, i.e., it has an extremely low false positive rate. The recall was 99.74%, indicating the model’s ability to capture most actual DDoS attacks. The F1-score, which combines precision and recall, was 99.76%, showing balanced performance between the two. The DNN model was used to classify DDoS and DoS attacks into various categories for Client 1. In the matrix shown in Figure 6a, each row represents the actual attack class, while each column represents the predicted class. Ten attack types are given in the matrix: DDoS-ICMP-Flood, DDoS-UDP-Flood, DDoS-TCP-Flood, DDoS-PSHACK-Flood, DDoS-SYN-Flood, DDoS-RSTFINFlood, DDoS-SynonymousIP-Flood, DoS-UDP-Flood, DoS-TCP-Flood, and DoS-SYN-Flood. For DDoS-ICMP-Flood, a total of 71330 instances were correctly identified with few misclassifications; 39 instances were incorrectly labelled as DDoS-SynonymousIP-Flood. Similarly, DDoS-UDP-Flood had 41037 correct identifications, but some misclassifications occurred, such as 40 instances being marked as DDoS-SynonymousIP-Flood. DDoS-TCP-Flood recorded 40581 accurate classifications accompanied by minor errors; for instance, 12 cases were misclassified as DDoS-SYN-Flood. The DDoS-PSHACK-Flood attack was correctly identified 40423 times with a few cases misclassified, notably 47 as DDoS-SYN-Flood. DDoS-SYN-Flood had 35955 true positives with some minor misclassifications, such as 21 instances being labelled as DDoS-PSHACK-Flood. Furthermore, DDoS-RSTFINFlood recorded 44693 correct classifications, but 51 cases were misclassified as DDoS-SynonymousIP-Flood.
Moreover, the DDoS-SynonymousIP-Flood attack was correctly identified in 53744 instances with a small number of errors like 38 misclassifications as DDoS-ICMP-Flood. Similarly, DoS-UDP-Flood had 20479 correct identifications, although there were some misclassifications, for example, 34 instances being marked as DDoS-PSHACK-Flood. Furthermore, DoS-TCP-Flood recorded 26551 true positives, but 22 cases were misclassified as DDoS-RSTFINFlood. Finally, DoS-SYN-Flood had 33199 accurate classifications, with 63 cases being misclassified as DDoS-SynonymousIP-Flood. The training and validation accuracy and loss of the DNN model are shown in Figure 6b and 6c.

  • Client 2 results: Client 2 performed just as well, achieving an accuracy of 99.78%. It had a slightly higher precision rate at 99.80%, which means it was better able to avoid false positives than Client 1. Although still quite strong, Client 2’s recall was 99.72%, lower than that of Client 1. At 99.76%, the F1-score was the same for both clients, showing that the model performed consistently across various clients. The DNN model-based confusion matrix of Client 2, shown in Figure 7a, presents the classification performance against a variety of DDoS and DoS attacks. It has ten different types of attacks in the matrix which are named as follows: DDoS-ICMP-Flood, DDoS-UDP-Flood, DDoS-TCP-Flood, DDoS-PSHACK-Flood, DDoS-SYN-Flood, DDoS-RSTFINFlood, DDoS-SynonymousIP-Flood, DoS-UDP-Flood, DoS-TCP-Flood, DoS-SYN-Flood. The illustration of training and validation accuracy and loss is shown in Figure 7b and 7c.

    For the DDoS-ICMP-Flood attack, there were 71336 accurately classified instances with few misclassifications, 51 of which were mistakenly labelled as DDoS-SynonymousIP-Flood. Similarly, the DDoS-UDP-Flood attack had 41071 correct classifications with minimal misclassifications, 47 of which were misclassified as DDoS-SynonymousIP-Flood, with smaller numbers for other types. In the case of the DDoS-TCP-Flood attack, 40571 instances were identified correctly, but there were some misclassifications, such as 18 cases being tagged as DDoS-SynonymousIP-Flood. Furthermore, for DDoS-PSHACK-Flood, 40430 instances were rightly identified, but a few misclassifications were observed, such as 59 instances being labelled DDoS-SynonymousIP-Flood. The DDoS-SYN-Flood attack had 35951 true positives along with a few misclassifications, of which 35 were labelled as DDoS-SynonymousIP-Flood. On the other hand, DDoS-RSTFINFlood had 44715 correct classifications, though with some errors too; for instance, 77 instances were misclassified as DDoS-SynonymousIP-Flood. Additionally, the DDoS-SynonymousIP-Flood attack was correctly recognized 53801 times, and only 5 cases were falsely identified as DDoS-RSTFINFlood. For DoS-UDP-Flood, 20,473 instances were correctly categorized, though a number of errors were made; for example, 42 instances were misclassified as DDoS-PSHACK-Flood. In the case of the DoS-TCP-Flood attack, the model recorded 26,453 true positives but also made some mistakes, such as 108 incidents being incorrectly labelled as DDoS-RSTFINFlood. Finally, for DoS-SYN-Flood, 33,191 instances were correctly identified, but there were a few errors; for instance, 101 attacks were mislabelled as DDoS-Syn.

  • Client 3 results: The accuracy was 99.78% for Client 3, in line with the other two clients. The precision and recall were also close to those of the other clients, at 99.78% and 99.73%, respectively. Client 3 had an F1-score of 99.75%, which again shows that it performed well across the board when identifying DDoS attacks. The confusion matrix for Client 3 gives a detailed view of the DNN model’s per-class performance for DDoS and DoS attacks, as shown in Figure 8a. It covers ten attack types: DDoS-ICMP-Flood, DDoS-UDP-Flood, DDoS-TCP-Flood, DDoS-PSHACK-Flood, DDoS-SYN-Flood, DDoS-RSTFINFlood, DDoS-SynonymousIP-Flood, DoS-UDP-Flood, DoS-TCP-Flood, and DoS-SYN-Flood. The training and validation accuracy and loss are shown in Figure 8b and 8c. Regarding the DDoS-ICMP-Flood attack, 71320 instances were true positives, and a few were misclassified: 59 instances were wrongly labelled as DDoS-SynonymousIP-Flood. For the DDoS-UDP-Flood attack, 41044 were classified correctly, while 48 were misclassified as DDoS-SynonymousIP-Flood, among others. For the DDoS-TCP-Flood attack, 40552 were correct, with 47 misclassified as DDoS-SynonymousIP-Flood. For the DDoS-PSHACK-Flood attack, 40438 were true positives, with some misclassifications, such as 61 being wrongly marked as DDoS-SynonymousIP-Flood. The DDoS-SYN-Flood attack had 35948 true positives and a small number of misclassifications; for example, 42 were labelled as DDoS-SynonymousIP-Flood. DDoS-RSTFINFlood had 44711 true positives and a few misclassifications, of which 59 were wrongly marked as DDoS-SynonymousIP-Flood. The DDoS-SynonymousIP-Flood attack was identified correctly 53778 times with few misclassifications, such as 29 instances being marked as DoS-SYN-Flood. The DoS-UDP-Flood attack had 20,491 true positives, while 27 instances were misclassified, for example as DDoS-PSHACK-Flood.
In addition, 26,495 instances of DoS-TCP-Flood were correctly classified as true positives, while 69 were misclassified as DDoS-RSTFINFlood. DoS-SYN-Flood had 33,211 cases identified accurately, but it also made mistakes: 70 of them were wrongly labelled DDoS-SynonymousIP-Flood.

Figure 8

Graphical representation of 3rd client-side results.
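The per-client misclassification counts above are read off the confusion matrices. A small sketch of that bookkeeping follows, using a toy 3-class matrix that reuses a few counts quoted for Client 1 and fills the remaining cells with invented values:

```python
import numpy as np

# Reading off-diagonal entries of a multiclass confusion matrix to find the
# most frequent misclassification per class. Diagonal hits and the
# SynonymousIP-Flood errors match the Client 1 narrative; other off-diagonal
# cells are toy filler.
classes = ["DDoS-ICMP-Flood", "DDoS-UDP-Flood", "DDoS-SynonymousIP-Flood"]
cm = np.array([
    [71330,     2,    39],
    [    3, 41037,    40],
    [   38,     4, 53744],
])

worst = []
for i, name in enumerate(classes):
    row = cm[i].copy()
    row[i] = -1                          # mask the diagonal (correct hits)
    j = int(np.argmax(row))              # most frequent wrong label
    worst.append((name, classes[j], int(cm[i, j])))
    print(f"{name}: {cm[i, j]} misclassified as {classes[j]}")
```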

Comparative analysis

Although previous studies have addressed DDoS detection in IoT contexts, the novelty of our research is that no existing effort integrates Federated Learning (FL) with Explainable AI (XAI) for detecting DDoS attacks on the CIC-IoT-2023 dataset. This is an essential difference because the dataset itself brings unique obstacles and characteristics compared to the datasets used in previous research.

In addition, we have carried out a thorough comparative analysis in our work, clearly emphasizing how our technique differs from existing methods. In particular, our method prioritizes privacy preservation through federated learning and delivers transparency and interpretability in DDoS detection using SHAP values for feature explanation, two important properties that are often missing in conventional centralized approaches. Thus, although the broader topic has received attention, our contribution is notable for integrating FL and XAI on a fresh dataset, with a concerted focus on both performance and explainability, areas not covered by previous work.
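For readers without the SHAP tooling at hand, the feature-attribution idea can be illustrated with a much simpler stand-in, permutation importance (this is not SHAP, and the toy model and data below are invented): shuffle one feature and measure how much accuracy drops.

```python
import numpy as np

# Permutation importance as a simple feature-attribution illustration:
# breaking an informative feature hurts accuracy; breaking a noise feature
# does not. SHAP assigns per-sample attributions; this only ranks features.
rng = np.random.default_rng(0)

def permutation_importance(predict, X, y, n_features):
    base = np.mean(predict(X) == y)          # accuracy on intact data
    drops = []
    for f in range(n_features):
        Xp = X.copy()
        Xp[:, f] = rng.permutation(Xp[:, f])  # shuffle feature f only
        drops.append(base - np.mean(predict(Xp) == y))
    return drops

# Toy "model": flags an attack when feature 0 exceeds 0.5; feature 1 is noise.
predict = lambda X: (X[:, 0] > 0.5).astype(int)
X = rng.random((200, 2))
y = (X[:, 0] > 0.5).astype(int)

drops = permutation_importance(predict, X, y, n_features=2)
print(drops[0] > drops[1])  # feature 0 matters more
```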

Our research addresses a significant gap in the current literature: no prior work integrates Federated Learning (FL) and Explainable Artificial Intelligence (XAI) to identify Distributed Denial of Service (DDoS) attacks on the CIC-IoT-2023 dataset, and this research seeks to fill that gap. Although several papers have applied machine learning or deep learning methods to detect DDoS attacks, FL- or XAI-based approaches have not been investigated before. Table 4 compares the efficiency of various intrusion detection approaches on different datasets; each row showcases a different approach, while the columns contain measures such as accuracy, precision, recall (or detection rate), and F1-score. Hu et al. (2021) applied Multiple Kernel Clustering to NSL-KDD, UNSW-NB15, and AWID, obtaining accuracies of 93.80%, 92%, and 95.60%, respectively; precision reached 81.65%, 77.27%, and 88.24%, recall (or detection rate) achieved values between 89% and 90%, and F1 remained within 85.17%-89.11% across the three datasets22. Bhuvaneshwari et al. applied a deep clustering CNN approach to a single dataset, NSL-KDD, attaining 98.71% accuracy without reporting precision, recall, or F1 scores23. Hammad et al. used clustering and classification on UNSW-NB15 only and achieved an accuracy of about 97.59%, with precision, recall, and F1 scores all close to 97.6%24. Similarly, Jony et al. employed LSTM classification on the CICIoT 2023 dataset, reaching 98.75% accuracy, 98.59% precision, 98.75% recall, and a 98.66% F1-score5. The FLBC-IDS model proposed by Govindaram et al. (2024)25 addresses the critical security challenges faced by IoT environments through a novel integration of Horizontal Federated Learning (HFL), Hyperledger Blockchain, and EfficientNet. By leveraging HFL, the model enables secure and privacy-preserving training across multiple IoT devices, enhancing data privacy without centralizing sensitive information. The Hyperledger Blockchain ensures tamper-resistant and transparent recording of model updates, strengthening the integrity of the system, while EfficientNet improves robustness by effectively extracting and categorizing features from network traffic data. The model demonstrates strong performance on the CICIDS-2018 and CICIoT-2023 datasets, achieving an accuracy of 98.89%, recall of 98.04%, precision of 98.44%, and an F1-score of 98.29%. Finally, the proposed FDNN approach performed exceptionally well on the CICIoT 2023 dataset, with an accuracy of 99.78%, precision of 99.80%, recall of 99.74%, and an F1-score of 99.76%. Compared with the other techniques, the proposed FDNN approach delivered better results in all respects, demonstrating its efficiency in identifying DDoS attacks.

Table 4 Comparative analysis of proposed work.
