Natural Language Process

IEEE 2018 : Natural Language Process


IEEE 2018: A Framework for Real-Time Spam Detection in Twitter

Abstract: With the increased popularity of online social networks, spammers find these platforms easily accessible to trap users in malicious activities by posting spam messages. In this work, we have taken Twitter platform and performed spam tweets detection. To stop spammers, Google SafeBrowsing and Twitter’s BotMaker tools detect and block spam tweets. These tools can block malicious links, however they cannot protect the user in real-time as early as possible. Thus, industries and researchers have applied different approaches to make spam free social network platform. Some of them are only based on user-based features while others are based on tweet based features only. However, there is no comprehensive solution that can consolidate tweet’s text information along with the user based features. To solve this issue, we propose a framework which takes the user and tweet based features along with the tweet text feature to classify the tweets. The benefit of using tweet text feature is that we can identify the spam tweets even if the spammer creates a new account which was not possible only with the user and tweet based features. We have evaluated our solution with four different machine learning algorithms namely - Support Vector Machine, Neural Network, Random Forest and Gradient Boosting. With Neural Network, we are able to achieve an accuracy of 91.65% and surpassed the existing solution [1] by approximately 18%.


IEEE 2018: A Novel Mechanism for Fast Detection of Transformed Data Leakage
Abstract: Data leakage is a growing insider threat in information security among organizations and individuals. A series of methods has been developed to address the problem of data leakage prevention (DLP). However, large amounts of unstructured data need to be tested in the big data era. As the volume of data grows dramatically and the forms of data become much complicated, it is a new challenge for DLP to deal with large amounts of transformed data. We propose an adaptive weighted graph walk model to solve this problem by mapping it to the dimension of weighted graphs. Our approach solves this problem in three steps. First, the adaptive weighted graphs are built to quantify the sensitivity of the tested data based on its context. Then, the improved label propagation is used to enhance the scalability for fresh data. Finally, a low-complexity score walk algorithm is proposed to determine the ultimate sensitivity. Experimental results show that the proposed method can detect leaks of transformed or fresh data fast and efficiently.


IEEE 2018: A Workflow Management System for Scalable Data Mining on Clouds
Abstract: The extraction of useful information from data is often a complex process that can be conveniently modeled as a data analysis workflow. When very large data sets must be analyzed and/or complex data mining algorithms must be executed, data analysis workflows may take very long times to complete their execution. Therefore, efficient systems are required for the scalable execution of data analysis workflows, by exploiting the computing services of the Cloud platforms where data is increasingly being stored. The objective of the paper is to demonstrate how Cloud software technologies can be integrated to implement an effective environment for designing and executing scalable data analysis workflows. We describe the design and implementation of the Data Mining Cloud Framework (DMCF), a data analysis system that integrates a visual workflow language and a parallel runtime with the Software-as-a-Service (SaaS) model. DMCF was designed taking into account the needs of real data mining applications, with the goal of simplifying the development of data mining applications compared to generic workflow management systems that are not specifically designed for this domain. The result is a high-level environment that, through an integrated visual workflow language, minimizes the programming effort, making easier to domain experts the use of common patterns specifically designed for the development and the parallel execution of data mining applications. The DMCF’s visual workflow language, system architecture and runtime mechanisms are presented. We also discuss several data mining workflows developed with DMCF and the scalability obtained executing such workflows on a public Cloud.



IEEE 2018: Machine Learning Methods for Disease Prediction with Claims Data

Abstract: One of the primary challenges of healthcare delivery is aggregating disparate, asynchronous data sources into meaningful indicators of individual health. We combine natural language word embedding and network modeling techniques to learn meaningful representations of medical concepts by using the weighted network adjacency matrix in the GloVe algorithm, which we call Code2Vec. We demonstrate that using our learned embeddings improve neural network performance for disease prediction. However, we also demonstrate that popular deep learning models for disease prediction are not meaningfully better than simpler, more interpretable classifiers such as XGBoost. Additionally, our work adds to the current literature by providing a comprehensive survey of various machine learning algorithms on disease prediction tasks.


IEEE 2018: Review Spam Detection using Machine Learning
Abstract:  Prior to buying a product, people usually inform themselves by reading online reviews. To make more profit sellers often try to fake user experience. As customers are being deceived this way, recognizing and removing fake reviews is of great importance. This paper analyzes spam detection methods, based on machine learning, and presents their overview and results.


IEEE 2017: NetSpam: a Network-based Spam Detection Framework for Reviews in Online Social Media
Abstract: Nowadays, a big part of people rely on available content in social media in their decisions (e.g. reviews and feedback on a topic or product). The possibility that anybody can leave a review provides a golden opportunity for spammers to write spam reviews about products and services for different interests. Identifying these spammers and the spam content is a hot topic of research and although a considerable number of studies have been done recently toward this end, but so far the methodologies put forth still barely detect spam reviews, and none of them show the importance of each extracted feature type. In this study, we propose a novel framework, named NetSpam, which utilizes spam features for modeling review datasets as heterogeneous information networks to map spam detection procedure into a classification problem in such networks. Using the importance of spam features help us to obtain better results in terms of different metrics experimented on real-world review datasets from Yelp and Amazon websites. The results show that NetSpam outperforms the existing methods and among four categories of features; including review-behavioral, user-behavioral, review linguistic, user-linguistic, the first type of features performs better than the other categories.

No comments:

Post a Comment

IEEE 2023: WEB SECURITY OR CYBER CRIME

  IEEE 2023:   Machine Learning and Software-Defined Networking to Detect DDoS Attacks in IOT Networks Abstract:   In an era marked by the r...