Big Data Analytics in IoT refers to the process of collecting, managing, and analyzing vast volumes of data generated by interconnected devices within the IoT ecosystem. As IoT devices continue to proliferate, they produce diverse and high-velocity data streams that require advanced analytical techniques to derive meaningful insights and actionable intelligence. The integration of Big Data Analytics into IoT enables real-time monitoring, predictive modeling, and optimized decision-making, enhancing the efficiency and scalability of IoT applications. This convergence has far-reaching implications across industries, including healthcare, smart cities, manufacturing, and transportation, driving innovation and creating opportunities for improved services and operational efficiency.
Self-supervised learning (SSL)
SSL is a machine learning technique that leverages unlabeled data by generating pseudo-labels from the data itself through pretext tasks, allowing models to learn meaningful representations without manual annotation. In the context of Big Data Analytics in IoT, SSL enables the extraction of patterns and features from massive streams of sensor data, logs, or device outputs, where manual labeling is infeasible due to scale and heterogeneity. SSL enhances IoT analytics by improving anomaly detection, device behavior modeling, and predictive maintenance through fine-tuning on small labeled datasets after pretraining on large unlabeled ones.
In IoT-based Big Data Analytics, SSL mathematically models latent representation learning from unlabeled sensory or event data by optimizing pretext tasks derived from the data's natural structure (e.g., temporal order, location, or sensing modality). The learned features are then fine-tuned using minimal labeled data for downstream analytics such as anomaly detection, fault prediction, or behavioral profiling. The mathematical formulation thus emphasizes encoder training via pseudo-labels, followed by supervised adaptation.
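Written out generically (the encoder f_θ, task head g_φ, and dataset symbols here are illustrative notation rather than a formulation taken from a specific cited work), the two stages are: pretraining on the unlabeled IoT data D_U, θ* = argmin_θ L_pretext(f_θ; D_U), followed by supervised adaptation on the small labeled set D_L, where θ is initialized at θ* and both θ and φ are updated to minimize Σ_{(x, y) ∈ D_L} L_task(g_φ(f_θ(x)), y).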
IoT-specific SSL formulations:
Temporal SSL tasks (e.g., sensor forecasting): predict future signals from a context window of past readings, e.g., train the encoder f_θ and a prediction head g_φ to minimize L_temp = ‖g_φ(f_θ(x_{t−w+1}, …, x_t)) − x_{t+1}‖², where w is the context-window length.
Contrastive learning on multi-sensor data (e.g., accelerometers, gyroscopes): define augmented views x̃_i, x̃_j (anchor and positive) from the same IoT device stream, encode them as z_i = f_θ(x̃_i) and z_j = f_θ(x̃_j), and minimize the contrastive loss
ℓ_{i,j} = −log [ exp(sim(z_i, z_j)/τ) / Σ_{k≠i} exp(sim(z_i, z_k)/τ) ],
where sim(⋅,⋅) is cosine similarity, τ is a temperature parameter, and the sum runs over the other windows in the batch; a minimal code sketch of this loss is given below.
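As a minimal sketch of this contrastive objective (the jitter/scaling augmentations, the placeholder linear encoder, and the batch of negative windows below are illustrative assumptions rather than a pipeline from the cited works), the following NumPy snippet computes the loss for one anchor–positive pair drawn from a single device stream:

import numpy as np

def augment(window, rng):
    # Two cheap sensor augmentations: additive jitter and random scaling.
    jitter = window + rng.normal(0.0, 0.05, size=window.shape)
    scaled = window * rng.uniform(0.8, 1.2)
    return jitter, scaled

def encode(window, W):
    # Placeholder linear encoder f_theta; unit-normalized so dot product = cosine similarity.
    z = W @ window.flatten()
    return z / (np.linalg.norm(z) + 1e-8)

def contrastive_loss(z_anchor, z_positive, z_negatives, tau=0.1):
    # sim(.,.) reduces to a dot product because all embeddings are unit-normalized.
    pos = np.exp(np.dot(z_anchor, z_positive) / tau)
    neg = np.sum(np.exp(z_negatives @ z_anchor / tau))
    return -np.log(pos / (pos + neg))

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 3 * 50))                  # toy encoder weights
window = rng.normal(size=(3, 50))                  # one 3-axis, 50-sample sensor window
view_a, view_b = augment(window, rng)              # anchor and positive views
z_i, z_j = encode(view_a, W), encode(view_b, W)
z_neg = np.stack([encode(rng.normal(size=(3, 50)), W) for _ in range(8)])  # other windows as negatives
print(contrastive_loss(z_i, z_j, z_neg))

In practice, the linear encoder would be replaced by a temporal convolutional or recurrent network, and the negatives would simply be the other windows in the same training batch.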
Deng et al. proposed a two-stage self-supervised learning framework for spectrum blind deconvolution in cyber-manufacturing systems. It uses feature extraction and spectral noise estimation to enhance signal quality from unpaired datasets, improving the practicality and robustness of spectral processing for industrial applications involving signal analysis and decision-making. Yu and Sano presented a two-stage semi-supervised framework that uses wearable sensor data to detect stress in real-life environments. It combines autoencoder-based pretraining with consistency regularization and a novel sampling strategy for managing unlabeled data, significantly improving stress detection accuracy over supervised baselines.
Zhang et al. presented PATO, a privacy-aware offloading strategy for IoT systems that utilizes self-supervised feature mapping and a deep reinforcement learning module to make real-time decisions in end-edge-cloud environments. It ensures data privacy through a unidirectional transformation of sensitive data before task offloading, and proves effective at balancing computation load and privacy protection with strong generalization capability. Xu et al. [51] proposed AKGNN, a novel recommendation model tailored to corporate decision-making in B2B volunteer activity matching. It constructs a knowledge graph from organizational attributes and uses extended variational autoencoders and GNNs to learn personalized enterprise preferences, effectively addressing data sparsity and semantic asymmetry in corporate recommendations.
Table 2 highlights and evaluates the research papers mentioned above that have utilized SSL in Big Data Analytics for IoT.
Graph neural networks (GNNs)
Graph Neural Networks (GNNs) are deep learning models designed to operate on graph-structured data, making them particularly effective in IoT environments where devices and sensors form complex, interconnected networks. In Big Data Analytics for IoT, GNNs capture spatial and relational dependencies among devices by aggregating and updating node features through iterative message passing across graph edges. This allows GNNs to detect anomalies, predict device failures, or optimize communication paths by learning from both the topology and features of the IoT network.
In Big Data Analytics for IoT, the mathematical foundation of GNNs is grounded in the representation of IoT environments as graphs G = (V, E), where each node v ∈ V represents an IoT device (e.g., sensor, actuator), and each edge (u, v) ∈ E models the communication or functional relationship between devices. The key goal is to learn a node-level, edge-level, or graph-level representation that captures both topological structure and feature correlations among devices.
At each GNN layer k, the feature vector of a node v is updated using a two-step process:
1. Message aggregation:
m_v^(k) = AGGREGATE^(k)( { h_u^(k−1) : u ∈ N(v) } )
where N(v) is the set of neighboring nodes (i.e., connected devices), and AGGREGATE is a permutation-invariant function (e.g., mean, max, sum, or an attention-based mechanism).
2. Node feature update:
h_v^(k) = σ( W^(k) · COMBINE( h_v^(k−1), m_v^(k) ) )
where W^(k) is a trainable weight matrix, σ is a non-linear activation function (e.g., ReLU), and COMBINE could be a simple concatenation or addition.
This formulation enables the GNN to efficiently encode structural and temporal dependencies across massive IoT networks, allowing scalable learning from large volumes of interconnected streaming data for tasks such as anomaly detection, predictive maintenance, and network optimization.
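For concreteness, the following NumPy sketch applies two such layers to a toy five-device graph; the adjacency list, mean aggregation, additive COMBINE, and random weights are illustrative assumptions rather than an architecture taken from the works discussed below:

import numpy as np

# Toy IoT graph: 5 devices, undirected connectivity stored as an adjacency list.
neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
H = np.random.default_rng(1).normal(size=(5, 8))   # h_v^(k-1): one 8-d feature row per device
W = np.random.default_rng(2).normal(size=(8, 8))   # trainable weight matrix W^(k)

def gnn_layer(H, W, neighbors):
    H_next = np.zeros_like(H)
    for v, nbrs in neighbors.items():
        m_v = H[nbrs].mean(axis=0)                 # AGGREGATE: permutation-invariant mean over N(v)
        combined = H[v] + m_v                      # COMBINE: simple addition
        H_next[v] = np.maximum(0.0, combined @ W)  # sigma: ReLU non-linearity
    return H_next

H1 = gnn_layer(H, W, neighbors)
H2 = gnn_layer(H1, W, neighbors)                   # stacking layers widens each node's receptive field
print(H2.shape)                                    # (5, 8): updated device embeddings

The resulting node embeddings can then feed a downstream classifier, for example to flag anomalous devices or predict failures across the network.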
Wang et al. proposed ENGNN, a GNN architecture enhanced with edge-update capabilities to jointly optimize node and edge variables in radio resource management for wireless networks. It demonstrates performance gains in beamforming and power allocation, especially for real-time deployment, with simulation results showing superior sum-rate performance and lower latency compared to traditional and learning-based methods. Zhou et al. proposed HDM-GNN, a GNN-based model designed for crime prediction using heterogeneous and dynamic urban data. It captures the complex spatial and temporal dependencies of urban environments enabled by IoT sensors and significantly improves prediction accuracy in smart-city applications related to public safety. Xie et al. [54] proposed DCI-PFGL, a decentralized federated learning framework for IoT service recommendation that addresses data privacy and heterogeneity across institutions. It uses graph kernels, smart contracts, and differential privacy to ensure secure and accurate cross-institutional collaboration, with results showing superior accuracy and reduced costs in collaborative IoT environments. Zhou et al. proposed a hierarchical adversarial attack method against GNN-based intrusion detection systems in IoT environments. It constructs a shadow model and a saliency-guided perturbation generation mechanism to target vulnerabilities efficiently, and its evaluation shows that the attack significantly degrades detection accuracy in state-of-the-art models.
Table 3 highlights and evaluates the research papers mentioned above that have utilized GNNs in Big Data Analytics for IoT.
Convolutional neural network (CNN)
Using CNNs in Big Data Analytics for IoT involves leveraging the powerful feature extraction and pattern recognition capabilities of CNNs to analyze the massive and complex data streams generated by IoT devices. CNNs are particularly well-suited for processing unstructured data such as images, video feeds, and sensor data, enabling the extraction of meaningful insights with high accuracy and efficiency. In the IoT context, CNNs have been applied in various domains, including smart surveillance, healthcare monitoring, predictive maintenance, and autonomous vehicle navigation, to enhance data-driven decision-making and system automation. By integrating CNNs with Big Data platforms, IoT ecosystems can process and analyze data in real-time, uncover hidden patterns, and optimize performance for diverse applications.
CNNs are a class of deep learning models designed to process data with a grid-like topology, such as time-series or spatial data, making them highly effective for analyzing unstructured IoT data. CNNs use layers of convolutional operations to automatically extract hierarchical features from input data, such as edges, patterns, or anomalies, by applying filters that capture spatial and temporal dependencies. In the context of Big Data Analytics in IoT, CNNs process large-scale data streams by transforming raw input data into compact feature representations, followed by pooling and fully connected layers that classify or predict outcomes, enabling real-time insights and pattern recognition in complex IoT environments.
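As a minimal sketch of this pipeline on a raw sensor window (the channel count, filter size, pooling width, and two-class linear head are illustrative assumptions), the NumPy snippet below runs one convolution–ReLU–pooling stage followed by a fully connected classifier:

import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(size=(3, 64))      # one window: 3 sensor channels x 64 time steps
filters = rng.normal(size=(4, 3, 5))   # 4 learned filters, each spanning 3 channels x 5 steps

def conv1d(x, filters):
    n_f, _, k = filters.shape
    out = np.zeros((n_f, x.shape[1] - k + 1))
    for f in range(n_f):
        for t in range(out.shape[1]):
            out[f, t] = np.sum(x[:, t:t + k] * filters[f])   # sliding-window convolution
    return np.maximum(0.0, out)                              # ReLU activation

def max_pool(x, size=2):
    trimmed = x[:, :(x.shape[1] // size) * size]
    return trimmed.reshape(x.shape[0], -1, size).max(axis=2) # downsample, keep strongest responses

features = max_pool(conv1d(signal, filters))                       # compact hierarchical feature map
logits = features.flatten() @ rng.normal(size=(features.size, 2))  # fully connected layer, 2 classes
print(logits)                                                      # e.g. normal vs. anomalous window

A deployed model would stack several such convolution–pooling stages and learn the filters and classifier weights by backpropagation on labeled IoT data.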