The Fourth Colloquium on Analytics, Data Science, and Computing
The 4th Colloquium on Analytics, Data Science, and Computing (CADSCOM 2022) will be held from 8:30 am to 4:30 pm CDT on Saturday, November 19 at Minnesota State University Mankato at Edina (7700 France Ave S; Edina, MN 55435).
In addition to research paper presentations, CADSCOM 2022 will feature a keynote address, panel discussions, and a student project showcase. CADSCOM 2022 has been approved by the Association for Computing Machinery (ACM) as a chapter conference. We are proud to partner with the Minnesota State IT Center of Excellence, Metro State University, Minnesota State University Mankato, and MinnPoly to host CADSCOM 2022. The top three CADSCOM 2022 research papers will be recommended for fast-track review for the Journal of the Midwest Association for Information Systems (JMWAIS).
Title: Implications of Blockchain on Sustainability in the Global Fishing Industry
Author(s) and Affiliation: James Schulz and Steve Sorsen, Metropolitan State University, Mankato
Abstract: Sustainability allows businesses to reinvent their global food supply chain and addresses social concerns and customers’ needs at large. Digitalization, such as blockchain technologies, can help identify issues within the food supply chain and allow real-time interventions. This research reviews case studies in the fishing industry that implement blockchain technologies in digitally enabled food supply chains and examines how blockchain technology contributes to sustainability in the fish food chain. Key findings suggest that sustainability is a priority and is supported by the use of blockchain technology. This research contributes to the supply chain management field and identifies an opportunity in the fishery ecosystem.
Title: Understanding Telemarketing Sales Through Interpretable & Explainable Predictive Model
Author(s) and Affiliation: Sandesh Sharma and Rajeev Bukralia, Minnesota State University, Mankato
Abstract: Telemarketing is a convenient and effective method of selling products and services to customers. If not targeted at the right customers, telemarketing calls may be perceived as irritating, which might instead decrease the company’s value. Using a Portuguese bank telemarketing dataset, we implemented various machine learning algorithms to predict the right customers. We utilized an over-sampling method called SMOTE to mitigate the class imbalance problem. The LGBM model trained on the plain dataset scored the highest AUC (0.80), outperforming the models trained on the over-sampled dataset, which implies that SMOTE might not add any benefit to complex ensemble tree methods. For model explainability, we implemented global as well as local explainers to streamline the decision-making process.
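For readers unfamiliar with SMOTE, the interpolation idea behind it can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' implementation; the function name `smote_sketch`, its parameters, and the toy data are assumptions made for this example.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (hypothetical helper, for illustration only)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the chosen sample (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two numeric features (made-up values).
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_sketch(minority_points, n_new=5)
```

Because every synthetic point lies on a segment between two existing minority samples, over-sampling adds no information from outside the region the minority class already occupies.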
Title: Evaluating Ethics of Loot Boxes in Gaming
Author(s) and Affiliation: Benjamin Vossen and Hamdan Alabsi, Bemidji State University
Abstract: Loot boxes are a common practice in gaming today, generating millions of dollars yearly. This paper examines the scholarly literature on the ethical issues related to loot boxes in gaming and then matches the findings in the journals with popular news articles containing anecdotes of the occurrences described in those journals. This research highlights the technology background as well as the backlash from the public. In addition, this paper analyzes the changes that have occurred in the gaming industry and that help in determining and identifying best ethical practice standards in the industry.
Author(s) and Affiliation: Anthony Sanner and Michael Hart, Minnesota State University, Mankato
Title: A Software Prototype and HIoTR Formula for Evaluating IoT Cybersecurity Risk in Home Networks
Author(s) and Affiliation: Andrew Wilcox and Michael Hart, Minnesota State University, Mankato
Abstract: The Internet of Things (IoT) presents several information security challenges at the network layer of home-based computer networks. To advance solutions, this paper follows the Design Science methodology to construct an application prototype that manages IoT information security within local area network (LAN) topologies. The authors demonstrate the value of this application by proposing and testing a new formula, labeled HIoTR, which is capable of calculating the degree of information security risk of home networks containing IoT devices. Results indicate that the proposed formula is a beneficial measure of information security risk. Although the new risk formula shows promise, testing is limited to a small sample of IoT devices. Further exploration is necessary to test HIoTR on alternative IoT hardware and the corresponding network topologies.
Title: Machine Learning Based Approach to Recommend MITRE ATT&CK Framework for Software Requirements and Design Specifications
Author(s) and Affiliation: Nicholas Lasky, Benjamin Hallis, Mounika Vanamala, Rushit Dave and Naeem Seliya, Minnesota State University at Mankato, University of Wisconsin Eau Claire
Abstract: Engineering more secure software has become a critical challenge in the cyber world. It is very important to develop methodologies, techniques, and tools for developing secure software. To develop secure software, software developers need to think like an attacker, which mining software repositories can support. Mining software repositories aims to analyze and understand the data repositories related to software development; the main goal is to use these repositories to support the decision-making process of software development. There are different vulnerability databases, such as Common Weakness Enumeration (CWE), Common Vulnerabilities and Exposures (CVE), and CAPEC. We utilized the MITRE ATT&CK database. MITRE ATT&CK tactics and techniques have been used in various ways, but tools for utilizing these tactics and techniques in the early stages of the software development life cycle (SDLC) are lacking. In this paper, we use machine learning algorithms to map requirements to the MITRE ATT&CK database and determine the accuracy of each mapping depending on the data split.
Title: How technology may be used for future disease prediction: A Systematic Literature Review
Author(s) and Affiliation: Rich Manprisio, Governors State University in University Park, Illinois
Abstract: With its challenges exacerbated by the current pandemic, our healthcare system continues to struggle with the accuracy and effectiveness of disease treatments. However, despite these growing challenges, technological advancements have aided potential disease prediction. There has been a positive correlation between utilizing technologies and leveraging them for disease predictions. Given our continued reliance on technology and its ongoing advancement, current research shows that technology offers many viable options to aid the healthcare field. This systematic review looks at the current state of how technologies have been and can be used to improve healthcare.
Title: Let’s-Go-A-Phishing: A statistical evaluation of URLs for cybersecurity analytics
Author(s) and Affiliation: Taiwo Olaleye, Agbaegbu JohnBosco, Olayemi O. Sadare, Adekunle M. Azeez, Azeez A. Opatunji, Ayobami A. Tewogbade, Saminu A. Akintunde, Federal University of Agriculture, Abeokuta, Osun State University, Elerinmosa Institute of Technology, The Nigeria Police Force
Abstract: Website phishing continues to dominate discussion in academia and the cyber security industry, despite several state-of-the-art approaches proposed to mitigate the trend. The problem has become prominent in an age of high internet penetration, when innocent users throng the internet for legitimate reasons but are oblivious to the malicious tendencies of criminals who mimic URLs and website domains to make an unsuspecting audience vulnerable to cybercrime. Whereas predictive-analytics-based solutions continue to dominate cyber security studies on detecting phishing, studies seldom consider descriptive statistical analysis of feature attributes prior to modelling. This study is therefore motivated to establish the most prominent attributes in a recently released Mendeley phishing website dataset. The information gain analysis of the dataset returns the five most prominent independent variables, which are used to train Naïve Bayes and neural network models. The statistical analysis identifies the slash (/) character as the most discriminative attribute, with a strong positive correlation with the ground truth. Malicious phishing websites are observed to contain more dot (.) and slash (/) characters, as well as a higher directory length.
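As a rough illustration of the kind of lexical URL attributes mentioned in this abstract, the sketch below counts dots and slashes and measures a directory length. It is not the authors' feature extractor: the function name `url_features` is hypothetical, and defining directory length as the length of the URL path up to and including its last slash is an assumption made for this example.

```python
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Return simple lexical features of a URL (illustrative only)."""
    path = urlparse(url).path
    # Assumed definition: the directory is the path up to and including
    # its last slash, e.g. "/a/b/" for "/a/b/login.php".
    directory = path.rsplit("/", 1)[0] + "/" if "/" in path else ""
    return {
        "dot_count": url.count("."),
        "slash_count": url.count("/"),
        "directory_length": len(directory),
    }


# Made-up URL used purely as an example input.
features = url_features("http://example.com/a/b/login.php")
# features == {"dot_count": 2, "slash_count": 5, "directory_length": 5}
```

Features like these could then be ranked (for example, by information gain) before training a classifier.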
Title: Sentiment-aware Data Analytics for Software Defect Severity Prediction
Author(s) and Affiliation: Wasiu Akanji, Elizabeth Abioye, Taiwo Olaleye, Emmanuel Ezeako, Aanuoluwa Adio, and Ayobami Tewogbade, Lagos State University of Science and Technology, Bells University of Technology, Ota, Federal University of Agriculture Abeokuta, Enugu State University of Science and Technology, Redeemer’s University, Elerinmosa Institute of Technology
Abstract: Quality assurance is an integral factor in the software development life cycle, regardless of the software development process model employed. Attempts to prioritize the correction of identified software defects necessitate the classification of defects into various severity levels. Planning for fixing identified abnormalities, subsequent testing, and proper resource allocation all depend on an accurate severity assessment method. Existing studies have employed diverse techniques for severity classification, particularly natural language processing techniques applied to defect reports. Studies that deploy sentiment analysis for determining severity levels, however, do not factor germane considerations into their conceptual methodologies, which is the motivating factor of this study. This paper proposes a sentiment-aware data science approach to ascertain the implication of word count on the severity class of defect reports with respect to the emotion of the reporter, and to establish the inference when defect titles are employed for predictive analytics. Experimental results demonstrate the efficiency of defect titles for predicting severities and show that the length of a software defect title is directly proportional to the severity level of the defect.
Title: Text Analysis of Diversity Reports for Benchmark Analysis
Author(s) and Affiliation: Esmeralda Perez-Gomez and Frank Lee, Georgia State University
Abstract: This paper analyzes industry Diversity, Equity, and Inclusion (DEI) reports using topic modeling techniques to create a standard of common practices. This study uses Latent Dirichlet Allocation (LDA) to discover topics and their word distributions to help gain an overview of the most addressed areas in DEI.
Title: Comparing Traditional Econometric ARIMA and RNN’s to Forecast CPI
Author(s) and Affiliation: Mohammad Mazhar, Minnesota State University
Abstract: This paper compares a traditional econometric model, the auto-regressive integrated moving average (ARIMA), with Recurrent Neural Networks (RNNs) for forecasting the Consumer Price Index (CPI). Much of macroeconomic decision making, such as laying off employees and increasing or decreasing federal discount rates, depends on inflationary pressures, and these decisions can affect economic growth positively or negatively. Thus, it is necessary to track changes in inflationary pressures by forecasting changes in CPI to make better monetary and fiscal policy decisions. RNN models have proven to be more accurate in the long term, while conventional ARIMA-based econometric models have shown better performance in short-term time series forecasting. Sentiment derived from Twitter has been shown to correlate with federal interest rates. In this study, we analyze data from the S&P 500 index, the 3-month treasury bond rate, the USD index, the GDP growth rate, the unemployment rate, and the federal discount rate using ARIMA and RNN, and we compare the accuracy of both models for forecasting short-term and long-term CPI. In addition, we examine whether Twitter sentiment can influence the forecasting accuracy.
Title: Classification and Prediction of Savant Syndrome using Machine Learning
Author(s) and Affiliation: Zelalem Denekew, Minnesota State University, Mankato
Abstract: The use of machine learning (ML) to identify and classify individuals with autism spectrum disorder (ASD) has recently gained popularity in research. Despite the growing interest in ASD, research has not been as widely conducted on the identification and classification of individuals with Savant Syndrome. There have been various theories as to what causes individuals to obtain savant-level skills. D. A. Treffert’s “the 3 R’s: recruitment, rewiring, release” theory has been the more widely accepted and has been used as a foundation for further research. This work proposes that by implementing methods similar to those used in research on identifying and classifying individuals with ASD, we would be able to identify individuals with savant syndrome, their skills, and the capacity at which those skills would be performed, with some degree of certainty.
Title: Exploration of Machine Learning techniques and Touch Dynamics for Continuous User Authentication
Author(s) and Affiliation: Silverio Mirao, Mounika Vanamala and Rushit Dave, University of Wisconsin Eau Claire, Minnesota State University at Mankato
Abstract: The rise and evolution of mobile technology has led to a greater application of mobile and cellular devices in everyday life. Because a great majority of these devices incorporate touch screen-based technologies, many people commonly use them to access private or personal data such as banking and identity information. Because such information is so prevalent and so easily accessible, these devices have become a target of rising fraudulent behavior. To combat these threats to personal security, a demand for device and user authentication has sparked innovation in new security features for touch screen mobile devices. Touch dynamics, in particular, refers to the biometric behaviors observed in how a user interacts with a touch screen device. Through these behaviors, patterns can be discovered, and in turn users can be identified and recognized as either the user for whom the specified device was intended or a fraud. This can further be used to distinguish human behavior from that of a robot, since it is difficult to imitate human behavior. In this research, a survey is conducted on different types of machine learning algorithms, which will be compared against each other to see which has the best accuracy in detecting authentic users and imposters. Each algorithm will be given some user-made inputs and will be observed to see whether it is able to decide on the correct user.
Title: Overview of a Survey on Deepfake Detection Methods
Author(s) and Affiliation: Natalie Krueger, Mounika Vanamala and Rushit Dave, University of Wisconsin Eau Claire, Minnesota State University at Mankato
Abstract: A deepfake is an engineered photo or video of a person in which their image has been altered or replaced with an image of someone else. Some types of deepfakes include face-swapping (switching an image of a face with another), lip syncing (an audio method where the real audio is replaced), and face synthesis (creating a fake image of a face by altering features of a real face image). Deepfakes have the potential to cause a variety of problems and are often used maliciously. A common usage is altering videos of prominent political figures and celebrities. These deepfakes can portray them making offensive, problematic, and/or untrue statements. Current deepfakes can be very realistic, and when used in this way, can spread panic and even influence elections and political opinions. Even more concerning is that deepfakes are easier to produce than ever, and even someone with very little knowledge of technology can use premade software to create them. So, in this project we will survey a variety of current methods and advances in the field of deepfake detection.
Title: Real-time Object Classifier using TensorFlow Object Detection API
Author(s) and Affiliation: Likhitha Tubati and Rajeev Bukralia, Minnesota State University
Abstract: Object detection is the ability to find objects such as buildings, human faces, and animals in images and videos. Object detection is widely used in surveillance, image retrieval, and self-driving cars. We use the TensorFlow Object Detection API, which is built on top of TensorFlow, a popular machine learning framework, to train a classifier to accurately identify objects in images.
Title: Evaluating Machine Learning Algorithms for Auto Insurance Fraud Detection
Author(s) and Affiliation: Prasanna Muppidi and Rajeev Bukralia, Minnesota State University, Mankato
Abstract: There has been significant growth in fraudulent insurance claims by policyholders. Auto insurance fraud occurs when a customer attempts to obtain financial benefits by submitting false documents relating to injuries or property damage in bogus accidents, or by requesting compensation for previous losses or excessive billing. In this study, we use machine learning to detect fraudulent auto-insurance claims. First, we apply feature selection methods to keep only the important features that best predict auto-insurance claim fraud. We use the SMOTE oversampling method to balance our unbalanced dataset. We analyze data to detect fraudulent auto-insurance claims using multiple algorithms such as Naive Bayes, KNN, Random Forest, SVM, and Logistic Regression and compare the predictive accuracy of these classification algorithms.
Title: Predicting Gold Trend Change using Astrology, and Stochastic Oscillator with Recurrent Neural Network
Author(s) and Affiliation: Marcho Handoko, Minnesota State University
Abstract: Fundamental and technical analyses are commonly used to make predictions in the stock market. However, they only predict the price and not the time. For example, one popular trading method is trend following. The problem with this method is its inability to forecast the timing of a trend change. By the time traders realize it, it’s usually too late. Another problem in trading is not having funds available when the opportunity arises, which causes traders to lose the opportunity to enter the market. The purpose of this research is to predict the date of a gold trend reversal using astrology and the Stochastic Oscillator with neural networks. The astrological feature used in this study is the planetary aspect. The data is trained using an RNN.
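For context, the Stochastic Oscillator's %K line is conventionally computed as 100 * (close - lowest low) / (highest high - lowest low) over a look-back window. The sketch below shows that standard formula only; it is not the authors' pipeline, and the function name `stochastic_k`, the 3-bar window, and the price values are made up for this example.

```python
def stochastic_k(highs, lows, closes, period=14):
    """Return the %K series; entries are None until a full window exists
    or when the window's high equals its low (undefined ratio)."""
    if not (len(highs) == len(lows) == len(closes)):
        raise ValueError("highs, lows and closes must have equal length")
    values = []
    for i in range(len(closes)):
        if i + 1 < period:
            values.append(None)  # not enough history yet
            continue
        window_high = max(highs[i - period + 1 : i + 1])
        window_low = min(lows[i - period + 1 : i + 1])
        if window_high == window_low:
            values.append(None)  # flat window: %K is undefined
        else:
            values.append(
                100.0 * (closes[i] - window_low) / (window_high - window_low)
            )
    return values


# Toy price bars (made-up numbers) with a 3-bar look-back window.
k_values = stochastic_k(
    highs=[10, 12, 11, 13],
    lows=[8, 9, 9, 10],
    closes=[9, 11, 10, 12],
    period=3,
)
# k_values == [None, None, 50.0, 75.0]
```

A series like this could serve as one input feature alongside others when training a sequence model.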
Title: Detection Of SLE Using Synthetic images of Butterfly Malar Rash on CNN: A Review
Author(s) and Affiliation: Shourav Bikash Dey, Sonika Shrestha, Tharushi Modaragamage and Hadja Diomande, Minnesota State University, Mankato
Abstract: Systemic Lupus Erythematosus (SLE) is the most common type of lupus, and according to the CDC it is diagnosed in about 200,000 adults in the United States. Diagnosing SLE is difficult and time-consuming, as its cause is unknown and it mimics symptoms that are heterogeneous in nature. One of the known symptoms of SLE is a butterfly-shaped rash across the cheekbone and the nasal bridge known as the Butterfly Malar Rash (BMR). In this paper, we propose a Convolutional Neural Network (CNN) to detect SLE from facial images. Due to the lack of available images showing the BMR, in this work we present the necessity for a Generative Adversarial Network (GAN) model to artificially generate BMR images from publicly available images for better training of the CNN model.
Title: Customer retention in Telecom Industry using Data Science
Author(s) and Affiliation: Jaswanth Vankayalapati and Rajeev Bukralia, Minnesota State University, Mankato
Abstract: In today’s world, telecom customers have a wide range of options when it comes to networks. They can easily port to other networks. This makes retaining customers one of the challenges faced by the telecom industry. In developing countries, most customers switch to other networks for low monthly deals and other add-ons offered by competing providers. In this paper, we examine the possibilities of customer retention in the telecom industry using machine learning algorithms such as support vector machine (SVM) and logistic regression. We implement these algorithms using a dataset from the telecom industry. The outcome of this research will be helpful in identifying customers who are likely to change their telecom network.
Title: Evaluating Machine Learning Algorithms for Malware Detection in Androids
Author(s) and Affiliation: Venkata Siva Sai Babburi, Aynura Berdyyeva, Prasanna Muppidi and Connolly Spencer, Minnesota State University, Mankato
Abstract: In the modern world, we are reliant on mobile applications for communication, entertainment, banking, and many other aspects of our lives. As these applications have become an integrated part of our lives, malware is also becoming a major security threat in the software world. Open-source platforms like Android have become a target of malware because software packages are not tested before being published in the software ecosystem. In this research we focus on a static approach, specifically on signature-based detection of malware on Android devices. In the static approach, signatures are generated for API calls made by the applications. Based on identified patterns in such signatures, it is possible to detect whether an application is benign or malware. Therefore, our goal is to detect malware in the given dataset of benign and malware applications using multiple algorithms (Naive Bayes, kNN, Random Forest, SVM, and Logistic Regression) and to find the best model for detecting malware by comparing the accuracy of these algorithms.
Title: Multivariate CNN-based Weather Prediction
Author(s) and Affiliation: Matthew Miers and Rajeev Bukralia, Mankato State University, Mankato
Abstract: Machine learning (ML) has not been widely deployed in weather forecasting models, even though there have been some promising studies. This preliminary investigation represents the initial insights into the efficacy and efficiency of a multivariate Convolutional Neural Network (CNN) based model for weather prediction. Through this investigation, this work proposes architectural frameworks specialized for multivariate CNNs. Additionally, this work examines some key limitations relating to the use of CNNs for weather prediction.
Title: Deepfake Media Detection approach based on Deep Learning
Author(s) and Affiliation: Aniruddha Tiwari, Rushit Dave, Mounika Vanamala, Minnesota State University, Mankato, University of Wisconsin, Eau Claire
Abstract: Conspicuous progress in the fields of machine learning (ML) and deep learning (DL) has led to a surge of highly realistic fake media, often referred to as deepfakes. Deepfakes are fabricated media generated by sophisticated AI that are at times very difficult to distinguish from real media. Such media can be uploaded to various social media platforms, so spreading it to the world has become easy, which calls for an efficacious countermeasure. One promising countermeasure against deepfakes is deepfake detection. To address this threat, researchers have created models to detect deepfakes based on ML/DL techniques such as Convolutional Neural Networks (CNNs). This paper aims to explore different methodologies with the intention of achieving a cost-effective model with higher accuracy across different types of datasets, in order to address generalizability across datasets.
Title: Detecting Overlapping Gene Region Using UNET Attention Mechanism
Author(s) and Affiliation: Samuel Lemma, Metro State University, Mankato
Abstract: According to worldwide cancer data, there were an estimated 18.1 million cancer cases around the world in 2020 (Bray et al., 2018). This increases the need for a system to identify, treat, and prevent cancer. Cancer identification can be supported by a system that doctors can utilize as a backup during a diagnosis. One typical problem with this approach is that it frequently misses chromosomes that overlap during the test, which has a significant impact on the results. We expect our research to have an impact on the testing process by using the U-NET attention mechanism of neural networks. By making the procedure less tedious and producing results in a better and faster way, removing this hurdle will have a significant positive influence.
Title: Mixed Reality Testing Framework for Autonomous Vehicles Software
Author(s) and Affiliation: Abdelrahman Elkenawy, Qusai Fannoun, and Suboh Alkhushayni, Minnesota State University, Mankato
Abstract: Testing autonomous vehicles (AVs) in closed testing facilities does not present complex testing scenarios, interactions between automobiles and pedestrians, or the obstacles found in the real world, which makes the testing unreliable. Testing in the real world, on the other hand, includes all the complex elements that a vehicle may encounter, but it carries a higher risk to people’s lives. To improve road safety and eliminate this risk factor, a potential solution is needed. This paper introduces an improved framework that uses a mixed-reality environment to create testing scenarios with complex elements while eliminating the risk of putting someone’s life in danger.
Title: Machine Learning based approach for Secure Software Development
Author(s) and Affiliation: Keith Bryant, Alex Caravella, Mounika Vanamala and Rushit Dave, University of Wisconsin Eau Claire, Minnesota State University at Mankato
Abstract: Cyber security attacks are growing significantly in today’s world of technology and advanced software development. The inclusion of cyber security defense is vital in every phase of software development. Identifying and implementing key relevant cyber security vulnerability controls during the early stages of the software development life cycle, i.e., the requirement phase, is very important. The Common Attack Pattern Enumeration & Classification (CAPEC) is a publicly available software repository from MITRE that currently lists 555 vulnerability attack patterns. As cyber security continues to grow exponentially in complexity, the role of machine learning in automating the identification of vulnerabilities is paramount in helping software developers create protected software. In this research, we propose to develop a system that automatically maps CAPEC attack patterns to software requirements in a Software Development Specification Document using machine learning models.
Title: Applying Neural Network Algorithms to Detect Credit Card Fraudulent Transactions
Author(s) and Affiliation: Queen Booker, Gary Binns and Zara Juta, Metropolitan State University
Abstract: Credit card fraud is a growing problem for both consumers and card issuers. Prior research has shown that neural networks are good candidates to identify fraudulent transactions but the prior research has used a significant number of variables to achieve the outcome. This research study examines and compares a feedforward neural network against regression for detecting fraudulent credit card transactions using only seven variables, specifically data available at point of sale. The preliminary results are promising, showing that the neural network was able to successfully detect an average of 95% of transactions as fraudulent or non-fraudulent with the limited number of variables.
Title: GANs-based image layer identification
Author(s) and Affiliation: Joan Elizabeth Lahiri, Minnesota State University – Mankato
Abstract: Image recognition will benefit from using generative adversarial networks, or GANs, to pinpoint the many layers that filters have added to an image. This research suggests using GANs to recognize the many layers present in a photo that has been digitally manipulated. This will increase overall image recognition accuracy and reveal any or all image manipulations.
All questions about submissions should be emailed to Dr. Ismail Bile Hassan, program chair at Ismail.firstname.lastname@example.org