Social media & Big Data

Social media platforms generate vast amounts of unstructured data daily, providing a rich source of information for big data analytics. By leveraging advanced computational techniques, this data can be analyzed to uncover trends, user behaviors, and emerging societal patterns. The integration of social media data into big data frameworks enables organizations to gain deeper insights for decision-making, marketing, and public policy. This section explores the methodologies and applications of using social media in big data analytics, highlighting its potential to drive innovation and understanding in diverse domains.

Self-supervised learning (SSL)

In Big Data Analytics for Social Media, SSL exploits vast amounts of unlabeled user-generated content (posts, comments, and interactions) by generating pseudo-labels through pretext tasks such as predicting masked words or the next element of a temporal sequence. SSL models learn meaningful data representations by solving these auxiliary tasks without manual annotation, which is crucial for the scalability and cost-efficiency of analyzing massive social media datasets. Once pre-trained, these models can be fine-tuned on smaller labeled datasets for downstream tasks such as sentiment analysis, misinformation detection, or user behavior modeling.
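The masked-word pretext task can be illustrated with a minimal sketch. This is only an illustration under stated assumptions: the `[MASK]` token, the 15% masking rate, the helper name `make_masked_example`, and the sample post are hypothetical choices, not details taken from any of the surveyed papers.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token (assumption, not from the source)

def make_masked_example(tokens, mask_prob=0.15, rng=None):
    """Turn an unlabeled token list into a (masked_tokens, labels) training pair.

    labels[i] holds the original token where position i was masked, else None,
    so the pseudo-labels come from the data itself rather than from annotators.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    # Force at least one masked position so every post yields a training signal.
    if tokens and all(label is None for label in labels):
        i = rng.randrange(len(tokens))
        labels[i] = tokens[i]
        masked[i] = MASK
    return masked, labels

# Hypothetical unlabeled post, split on whitespace for simplicity.
post = "new phone battery lasts all day".split()
masked, labels = make_masked_example(post, rng=random.Random(0))
```

A model trained to recover `labels` from `masked` never sees a human-written annotation, which is the property that lets this kind of pre-training scale to large volumes of posts.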

Zhou, et al. proposed SelfCF, which eliminates the need for negative sampling in collaborative filtering for recommender systems by augmenting latent embeddings rather than raw inputs. It simplifies Siamese network-based architectures and integrates easily with existing models. Experimental results show SelfCF improves recommendation accuracy and efficiency over traditional methods on social interaction datasets. Jiang, et al. introduced LSPSL, a self-supervised learning framework that models both long- and short-term user preferences for next Point-of-Interest (POI) recommendations. It leverages spatio-temporal data and pre-trained contextual embeddings to enhance the representation of user behavior. The method shows improved performance over existing POI recommendation techniques in real-world datasets.

Li, et al. introduced HEAL, a heterogeneous graph learning model designed to enhance social recommendation systems by exploiting higher-order user-item relationships through meta-path-based representations. It integrates semantic and aspect-aware attention mechanisms and uses contrastive learning to improve robustness against noisy feedback. HEAL significantly improves recommendation quality by modeling the complexity and diversity of user interactions in social networks. Sang, et al. proposed BTGCL, a contrastive learning method that jointly models the social and collaborative domains using Graph Neural Networks to improve social recommendation accuracy. It introduces a bi-directional transfer mechanism to align features across domains and mitigate the sparsity of user interactions. The model significantly outperforms baseline methods across multiple social recommendation benchmarks. Du, et al. proposed AKGNN, a novel recommendation model tailored to corporate decision-making in B2B volunteer activity matching. It constructs a knowledge graph from organizational attributes and uses extended variational autoencoders and GNNs to learn personalized enterprise preferences. The system effectively addresses data sparsity and semantic asymmetry in corporate recommendations.
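To make the idea of contrastive learning concrete, the sketch below computes a generic InfoNCE-style loss for a single anchor embedding. It is not the objective used by HEAL or BTGCL, whose exact losses are not given here. The function names, the temperature of 0.1, and the toy two-dimensional embeddings are all assumptions made for illustration, and the code assumes non-zero vectors.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: negative log-softmax score of the positive pair.

    Cosine similarities between the anchor and each candidate act as logits;
    the loss is small when the positive is much closer than the negatives.
    """
    a = normalize(anchor)
    sims = [dot(a, normalize(positive))]
    sims += [dot(a, normalize(n)) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract the max before exponentiating for stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]

# Hypothetical embeddings: two views of the same user vs. two other users.
anchor = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
loss_aligned = info_nce(anchor, [0.9, 0.1], negatives)
loss_misaligned = info_nce(anchor, [0.0, 1.0], negatives)
```

In this toy setting `loss_aligned` is lower than `loss_misaligned`, which is the pressure that pulls two views of the same user or item together and pushes unrelated ones apart.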

Table 9 highlights and evaluates the research papers mentioned above that utilized SSL in Big Data Analytics for Social Media.

Graph neural networks (GNNs)

GNNs in Social Media Big Data Analytics model the complex relationships between users, posts, hashtags, and interactions as graphs, where nodes represent entities and edges represent relationships (e.g., user follows user, user likes post). GNNs propagate and aggregate information across these nodes using neighborhood-based message-passing mechanisms, enabling the model to capture both local and global structural dependencies in social networks. This allows GNNs to perform tasks such as community detection, fake news detection, and influence estimation with higher accuracy and contextual awareness.
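As a small illustration of the graph view described above, the following sketch turns an interaction log into an adjacency structure. The node identifiers, relation names, and the decision to treat every edge as undirected are assumptions made for the example; a real system might keep "follows" directed.

```python
from collections import defaultdict

# Hypothetical interaction log as (source, relation, target) triples.
interactions = [
    ("user:alice", "follows", "user:bob"),
    ("user:alice", "likes", "post:1"),
    ("user:bob", "likes", "post:1"),
    ("post:1", "tagged", "hashtag:ai"),
]

def build_graph(triples):
    """Map each node to a set of (relation, neighbor) pairs.

    Edges are stored in both directions so that information can later flow
    either way during neighborhood aggregation (a simplifying assumption).
    """
    adjacency = defaultdict(set)
    for src, rel, dst in triples:
        adjacency[src].add((rel, dst))
        adjacency[dst].add((rel, src))
    return adjacency

graph = build_graph(interactions)
neighbors_of_post = sorted(neighbor for _, neighbor in graph["post:1"])
# neighbors_of_post == ["hashtag:ai", "user:alice", "user:bob"]
```

Keeping the relation label on each edge preserves the multi-relational structure (follows, likes, tags) that relation-aware GNN variants can exploit.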

The mathematical foundation of GNNs for Big Data Analytics in Social Media lies in representing the social platform as a graph $G = (V, E)$, where $V$ denotes nodes such as users, posts, or hashtags, and $E$ represents edges such as likes, comments, follows, or shares. Each node $v \in V$ is initialized with a feature vector $h_v^{(0)} \in \mathbb{R}^d$, and GNNs iteratively update these features through a message-passing mechanism defined as:

$$h_v^{(k)} = \sigma\left( W^{(k)} \cdot \mathrm{AGGREGATE}^{(k)}\left( \{ h_u^{(k-1)} : u \in N(v) \} \cup \{ h_v^{(k-1)} \} \right) \right) \tag{21}$$

where $N(v)$ is the set of neighbors of node $v$, $W^{(k)}$ is a learnable weight matrix at layer $k$, and $\sigma$ is a non-linear activation function (e.g., ReLU). This formulation enables the model to capture homophily (users with similar behaviors) and influence dynamics across multi-relational, large-scale social media networks, making it effective for tasks like user profiling, community detection, and misinformation propagation analysis.
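One step of the message-passing update in Eq. (21) can be sketched in plain Python. The choices here are assumptions for illustration only: the mean is used as the AGGREGATE function, ReLU as the activation, and the three-user graph, feature vectors, and weight matrix are made-up values rather than anything from the surveyed work.

```python
def relu(x):
    return max(0.0, x)

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mean_aggregate(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def gnn_layer(h, neighbors, W):
    """Apply one message-passing step to every node.

    For each node v, aggregate the previous-layer features of N(v) together
    with v's own features, apply the weight matrix, then the activation.
    """
    new_h = {}
    for v, h_v in h.items():
        messages = [h[u] for u in neighbors.get(v, [])] + [h_v]
        new_h[v] = [relu(z) for z in matvec(W, mean_aggregate(messages))]
    return new_h

# Hypothetical 3-user graph with 2-dimensional initial features.
neighbors = {"alice": ["bob", "carol"], "bob": ["alice"], "carol": ["alice"]}
h0 = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [1.0, 1.0]}
W = [[1.0, 0.0], [0.0, -1.0]]  # made-up weights; learned in a real model
h1 = gnn_layer(h0, neighbors, W)
# h1["bob"] == [0.5, 0.0] and h1["carol"] == [1.0, 0.0]
```

Stacking k such layers lets each node's representation depend on its k-hop neighborhood, which is how local structure accumulates into the broader dependencies described above.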

Gao, et al. presented ICS-GNN, a GNN-based framework for online community search in large-scale social networks. It tackles inefficiencies in existing methods through an interactive approach that iteratively updates communities based on user feedback. The system outperforms traditional rule-based systems in both accuracy and resource efficiency. Yang, et al. proposed RoSGAS, which employs a self-supervised, reinforcement learning-based architecture search to adaptively design GNNs for social bot detection. The model dynamically identifies optimal neighborhood hops and network layers. Experiments on Twitter datasets show improved detection accuracy, efficiency, and generalizability.

Zhang, et al. proposed MCAP, which enhanced academic recommendation systems by integrating matrix completion and low-pass propagation into relation-aware GNNs. It builds user-user and item-item graphs using both structural and semantic information, improving accuracy. Evaluation on four datasets confirms MCAP’s superiority in top-N recommendation metrics. Liu, et al. [91] proposed Pone-GNN, which improved recommender systems by integrating both positive and negative feedback through dual embeddings and contrastive learning. This approach helps filter unwanted recommendations and refine user profiles. Experimental results show better accuracy and relevance compared to existing methods.

Fan, et al. presented Makex, a logic-based approach to explain GNN recommendations through interpretable rules. It identifies decisive user-item interaction features and conditions behind recommendations, offering both global and local explanations. Makex achieves higher fidelity and sparsity than state-of-the-art explainers. Chen, et al. proposed SR-HGNN, a hybrid-order gated GNN designed to overcome oversmoothing in session-based recommendation systems. It applies attention-based hybrid-order propagation to better model dependencies within user sessions. The model achieves superior recommendation accuracy compared to prior GNN-based methods.

The accompanying table highlights and evaluates the research papers mentioned above that utilized GNNs in Big Data Analytics for Social Media.
