Our past events

CADSCOM 2022

CADSCOM 2022 Highlights
The 4th Colloquium on Analytics, Data Science, and Computing (CADSCOM 2022) was held from 8:30 am to 4:30 pm CST on Saturday, November 19, 2022, at Minnesota State University Mankato at Edina (7700 France Ave S; Edina, MN 55435).

In addition to research paper presentations, CADSCOM 2022 featured a keynote address, panel discussions, and a student project showcase. CADSCOM 2024 was approved by the Association for Computing Machinery (ACM) as a chapter conference. We are proud to partner with the Minnesota State IT Center of Excellence, Metro State University, Minnesota State University Mankato, and MinnPoly for CADSCOM 2022. The top three CADSCOM 2022 research papers were recommended for fast-track review for the Journal of the Midwest Association for Information Systems (JMWAIS).

Publications Panel: Tips for Enhancing Research and Publications

Panelists: Dr. Alok Gupta, University of Minnesota; Dr. Rassule Hadidi, Metro State and MWAIS Journal; Dr. Mohammed Mokbel, University of Minnesota; Dr. Omar El-Gayar, Dakota State University; Dr. Deepak Khazanchi, University of Nebraska-Omaha; Moderator: Dr. Rajeev Bukralia, Minnesota State University, Mankato

Student Project Showcase: The project showcase is a new component this year, in addition to the peer-reviewed research papers. The showcase will be an opportunity for students to present their projects and connect with industry professionals. To participate in the project showcase, interested students should fill out the Project Showcase Interest Form by November 6, 2022.

Conference Registration: Register for CADSCOM 2022 at Eventbrite by November 12. All participants (student authors and presenters, faculty authors, invited academic and industry guests, panelists, and attendees) must register through the Eventbrite site to attend the conference. The registration fee is $20 for faculty authors. Registration is free for student authors, industry/academic guests, and attendees!

Accepted Papers

Title: Implications of Blockchain on Sustainability in the Global Fishing Industry

Author(s) and Affiliation: James Schulz and Steve Sorsen, Metropolitan State University, Mankato

Abstract: Sustainability allows businesses to reinvent their global food supply chain and addresses social concerns and customers’ needs at large. Digitalization, like blockchain technologies, can help identify issues within the food supply chain and allow real-time interventions. This research reviews case studies in the fishing industry that use blockchain technologies implemented into their digitally-enabled food supply chains and how blockchain technology contributes to sustainability in the fish food chain. Key findings suggest sustainability is a priority and is supported by blockchain technology. This research contributes to the supply chain management field and identifies an opportunity in the fishery ecosystem.

Title: Understanding Telemarketing Sales Through Interpretable & Explainable Predictive Model

Author(s) and Affiliation: Sandesh Sharma and Rajeev Bukralia, Minnesota State University, Mankato

Abstract: Telemarketing is one of the convenient and effective methods of selling products and services to the customer. If not targeted to the right customer, these telemarketing calls may be perceived as irritating, which might instead decrease the company’s value. Using a Portuguese bank telemarketing dataset, we implemented various machine learning algorithms to predict the right customer. We utilized an over-sampling method called SMOTE to mitigate the class imbalance problem. The LGBM model scored the highest AUC, 0.80, on the plain dataset rather than on the over-sampled dataset, implying that SMOTE might not add any benefits to complex ensemble tree methods. For model explainability, we implemented global as well as local explainers to streamline the decision-making process.

Title: Evaluating Ethics of Loot Boxes in Gaming

Author(s) and Affiliation: Benjamin Vossen and Hamdan Alabsi, Bemidji State University

Abstract: Loot boxes are a common practice in gaming today that generates millions of dollars yearly. This paper examines the scholarly literature on the ethical issues related to loot boxes in gaming and then matches the findings in the journals with popular news articles containing anecdotes of the occurrences found in the journals. This research highlights the technology background as well as the backlash in public perception. In addition, this paper analyzes the changes that have occurred in the gaming industry that help in determining and identifying best ethical practice standards in the industry.

Title: Impact of JavaScript Attention Recognition Messaging and Calibration on Hardware Performance

Author(s) and Affiliation: Anthony Sanner and Michael Hart, Minnesota State University, Mankato

Abstract: Enabling users to detect eye contact during video conferencing has several associated challenges. The accuracy of gaze prediction along with the performance of supporting hardware in real-world environments are two issues this study investigates. Additionally, it explores how users are notified when accurate eye detection exists. Using the Design Science research methodology, the authors design and develop an eye tracking notification system in JavaScript with novel gaze calibration and messaging functionality. Calibration feedback allows end users to determine whether their eye contact messages are accurate. A custom benchmark is scripted that tests the new video conferencing features on a variety of workstation hardware and web browsers. Although high-end user hardware performs well with the newly developed eye tracking features, low-end hardware suffers from time delays and video streaming deficiencies. Benchmark results highlight the need for eye tracking libraries to perform more efficiently for a broader array of end user hardware. Results are limited to benchmarks on three popular web browsers. Future studies should compare benchmarks of additional video conferencing software and a larger sample of commodity hardware.

Title: A Software Prototype and HIoTR Formula for Evaluating IoT Cybersecurity Risk in Home Networks

Author(s) and Affiliation: Andrew Wilcox and Michael Hart, Minnesota State University, Mankato

Abstract: Internet of Things (IoT) presents several information security challenges at the network layer of home-based computer networks. To advance solutions, this paper follows the Design Science methodology to construct an application prototype that manages IoT information security within local area network (LAN) topologies. The authors demonstrate the value of this application by proposing and testing a new formula, labeled HIoTR, which is capable of calculating the degree of information security risk of home networks containing IoT devices. Results indicate that the proposed formula is a beneficial measure of information security risk. Although the new risk formula shows promise, testing is limited to a small sample of IoT devices. Further exploration is necessary to test HIoTR on alternative IoT hardware and their coinciding network topology.

Title: Machine Learning Based Approach to Recommend MITRE ATT&CK Framework for Software Requirements and Design Specifications

Author(s) and Affiliation: Nicholas Lasky, Benjamin Hallis, Mounika Vanamala, Rushit Dave and Naeem Seliya, Minnesota State University at Mankato, University of Wisconsin Eau Claire

Abstract: Engineering more secure software has become a critical challenge in the cyber world. It is very important to develop methodologies, techniques, and tools for developing secure software. To develop secure software, software developers need to think like an attacker through mining software repositories. These aim to analyze and understand the data repositories related to software development. The main goal is to use these software repositories to support the decision-making process of software development. There are different vulnerability databases like Common Weakness Enumeration (CWE), Common Vulnerabilities and Exposures database (CVE), and CAPEC. We utilized a database called MITRE. MITRE ATT&CK tactics and techniques have been used in various ways and methods, but tools for utilizing these tactics and techniques in the early stages of the software development life cycle (SDLC) are lacking. In this paper, we use machine learning algorithms to map requirements to the MITRE ATT&CK database and determine the accuracy of each mapping depending on the data split.

Title: How technology may be used for future disease prediction: A Systematic Literature Review

Author(s) and Affiliation: Rich Manprisio, Governors State University in University Park, Illinois

Abstract: Exasperated by the current pandemic, our healthcare system continues to struggle with the accuracy and effectiveness of disease treatments. However, despite these growing challenges, technological advancements have aided potential disease prediction. There has been a positive correlation between utilizing technologies and leveraging them for disease predictions. Thanks to our continued reliance and technological advancement, current research shows that it has many viable options to aid the healthcare field. This systematic review looks at the current state of how technologies have been and can be used to improve healthcare.

Title: Let’s-Go-A-Phishing: A statistical evaluation of URLs for cybersecurity analytics

Author(s) and Affiliation: Taiwo Olaleye, Agbaegbu JohnBosco, Olayemi O. Sadare, Adekunle M. Azeez, Azeez A. Opatunji, Ayobami A. Tewogbade, Saminu A. Akintunde, Federal University of Agriculture, Abeokuta, Osun State University, Elerinmosa Institute of Technology, The Nigeria Police Force

Abstract: The website phishing debacle continues to dominate discussion in academia and the cyber security industry, despite several proposed state-of-the-art approaches conceptualized to mitigate the trend. The development has become prominent in the age of high internet penetration, when innocent users throng the internet for legitimate reasons but are oblivious to the malicious tendencies of criminals who mimic URLs and website domains to make unsuspecting audiences vulnerable to cybercrimes. Whereas predictive-analytics-based solutions continue to dominate cyber security studies with respect to detecting phishing tendencies, studies seldom consider descriptive statistical analysis of feature attributes prior to modelling of conceptual frameworks. This study is therefore motivated by the aforementioned in order to establish the most prominent attributes from a Mendeley phishing website database released recently. The information gain analysis of the dataset returns the five most prominent independent variables, which are used to train Naïve Bayes and a Neural Network. The experimental result of the statistical analysis returns the slash (/) character as the most discriminative attribute, with strong positive correlation with the ground truth. Malicious phishing websites are observed to contain more dot (.) and slash (/) characters, as well as a higher directory length.

Title: Sentiment-aware Data Analytics for Software Defect Severity Prediction

Author(s) and Affiliation: Wasiu Akanji, Elizabeth Abioye, Taiwo Olaleye, Emmanuel Ezeako, Aanuoluwa Adio, and Ayobami Tewogbade, Lagos State University of Science and Technology; Bells University of Technology, Ota; Federal University of Agriculture Abeokuta; Enugu State University of Science and Technology; Redeemer’s University; Elerinmosa Institute of Technology

Abstract: Quality assurance is an integral factor in a software development life cycle, notwithstanding the software development process model employed in the production of a software. Attempts to prioritize the correction of identified software defects necessitate the classification of defects into various severity levels. Planning for fixing identified abnormalities, subsequent testing, and proper resource allocation are all functions of an accurate severity assessment method. Existing studies have variously employed diverse techniques for severity classification, particularly with the adoption of natural language processing techniques on defect reports. Studies that deploy sentiment analysis for determining severity levels, however, do not factor in germane considerations in their conceptual methodologies, which is the motivating factor of this study. This paper proposes a sentiment-aware data science approach to ascertain the implication of word count on the severity class of defect reports with respect to the emotion of the reporter, and to establish the inference when defect titles are employed for predictive analytics. The experimental results demonstrate the efficiency of defect titles for predicting severities and the fact that the length of a software defect title is directly proportional to the severity level of the defect.

Title: Text Analysis of Diversity Reports for Benchmark Analysis

Author(s) and Affiliation: Esmeralda Perez-Gomez and Frank Lee, Georgia State University

Abstract: This paper analyzes industry Diversity, Equity, and Inclusion (DEI) reports using topic modeling techniques to create a standard of common practices. This study uses Latent Dirichlet Allocation (LDA) to discover topics and their word distributions to help gain an overview of the most addressed areas in DEI.

Title: Comparing Traditional Econometric ARIMA and RNN’s to Forecast CPI

Author(s) and Affiliation: Mohammad Mazhar, Minnesota State University

Abstract: This paper attempts to compare a traditional econometric model, the auto-regressive integrated moving average (ARIMA), and Recurrent Neural Networks (RNN) to forecast the Consumer Price Index (CPI). Much of macroeconomic decision making depends on inflationary pressures, such as laying off employees and increasing or decreasing federal discount rates, that can lead to economic growth being affected positively or negatively. Thus, it is necessary to track changes in inflationary pressures by forecasting changes in CPI to make better monetary and fiscal policy decisions. The RNN models have proven to be more accurate in the long term, while the conventional ARIMA-based econometric models have shown better performance in short-term time series forecasting. Sentiment analysis from Twitter is shown to have a correlation with federal interest rates. In this study, we analyze data from the S&P 500 index, 3-month treasury bond rate, USD index, GDP growth rate, unemployment rate, and federal discount rate using ARIMA and RNN, and we compare the accuracy of both models for forecasting short-term and long-term CPI. In addition, we examine whether Twitter sentiments can influence the forecasting accuracy.

Title: Classification and Prediction of Savant Syndrome using Machine Learning

Author(s) and Affiliation: Zelalem Denekew, Minnesota State University, Mankato

Abstract: The use of machine learning (ML) to identify and classify individuals with autism spectrum disorder (ASD) has recently gained more popularity in research. Despite the growing interest in ASD, research has not been as widely conducted in the identification and classification of individuals with Savant Syndrome. There have been various theories as to what causes individuals to obtain Savant level skills. D. A. Treffert’s “the 3 R’s: recruitment, rewiring, release” theory has been more widely accepted and used as a foundation for further research. This work proposes that by implementing methods similar to those used in research on identifying and classifying individuals with ASD, we would be able to identify individuals with savant syndrome, their skills, and the capacity at which those skills would be performed, with some degree of certainty.

Title: Exploration of Machine Learning techniques and Touch Dynamics for Continuous User Authentication

Author(s) and Affiliation: Silverio Mirao, Mounika Vanamala and Rushit Dave, University of Wisconsin Eau Claire, Minnesota State University at Mankato

Abstract: The rise and evolution in mobile technology has led to a greater application of mobile and cellular devices in everyday life. Because a great majority of these devices incorporate touch screen-based technologies, many use these devices to commonly access private or personal data such as banking and identity. As a result of such information being so prevalent and accessible to the common person by such simple means, these devices have become a target of rising fraudulent behavior. To combat these threats to personal security, a demand for device and user authentication has sparked innovation for new security features for touch screen mobile devices. Touch dynamics, in particular, refers to the biometric behaviors observed in how a user interacts with a touch screen device. Through this, patterns can be discovered and, in turn, users can be identified and recognized as either the user for whom the specified device was intended or a fraud. This can be further used to classify human behavior versus that of a robot, since it is difficult to imitate human behavior. In this research a survey is conducted on different types of machine learning algorithms that will be compared against each other to see which has the best accuracy in detecting authentic users and imposters. Each algorithm will be given some user-made inputs and will be observed to see if it is able to decide on the correct user.

Title: Overview of a Survey on Deepfake Detection Methods

Author(s) and Affiliation: Natalie Krueger, Mounika Vanamala and Rushit Dave, University of Wisconsin Eau Claire, Minnesota State University at Mankato

Abstract: A deepfake is an engineered photo or video of a person in which their image has been altered or replaced with an image of someone else. Some types of deepfakes include face-swapping (switching an image of a face with another), lip syncing (an audio method where the real audio is replaced), and face synthesis (creating a fake image of a face by altering features of a real face image). Deepfakes have the potential to cause a variety of problems and are often used maliciously. A common usage is altering videos of prominent political figures and celebrities. These deepfakes can portray them making offensive, problematic, and/or untrue statements. Current deepfakes can be very realistic, and when used in this way, can spread panic and even influence elections and political opinions. Even more concerning is that deepfakes are easier to produce than ever, and even someone with very little knowledge of technology can use premade software to create them. So, in this project we will survey a variety of current methods and advances in the field of deepfake detection.

Title: Real-time Object Classifier using TensorFlow Object Detection API

Author(s) and Affiliation: Likhitha Tubati and Rajeev Bukralia, Minnesota State University

Abstract: Object detection is the ability to find objects such as buildings, human faces, and animals in images and videos. Object detection is widely used in surveillance, image retrieval, and self-driving cars. We use the TensorFlow Object Detection API, which is built on top of TensorFlow, a popular machine learning framework, to train a classifier to accurately identify objects in images.

Title: Evaluating Machine Learning Algorithms for Auto Insurance Fraud Detection

Author(s) and Affiliation: Prasanna Muppidi and Rajeev Bukralia, Minnesota State University, Mankato

Abstract: There has been a significant growth in fraudulent insurance claims by policyholders. Auto insurance fraud occurs when a customer attempts to obtain financial benefits by submitting false documents owing to injuries or property damage in bogus accidents, or by requesting compensation for previous losses or excessive billing. In this study, we use machine learning to detect fraudulent auto-insurance claims. First, we apply feature selection methods to get only important features that can best predict auto-insurance claim frauds. We use the SMOTE oversampling method to balance our unbalanced dataset. We analyze data to detect fraudulent auto-insurance claims using multiple algorithms such as Naive Bayes, KNN, Random Forest, SVM, and Logistic Regression and compare the predictive accuracy of these classification algorithms.

Title: Predicting Gold Trend Change using Astrology, and Stochastic Oscillator with Recurrent Neural Network

Author(s) and Affiliation: Marcho Handoko, Minnesota State University

Abstract: Fundamental and technical analyses are commonly used to make predictions in the stock market. However, they only predict the price and not the time. For example, one popular trading method is following the trend. The problem with this method is its inability to forecast the timing of a trend change. By the time the traders realize it, it’s usually too late. Another problem in trading is not having funds available when the opportunity arises, which causes traders to lose the opportunity to enter the market. The purpose of this research is to predict the gold reversal date using astrology and the Stochastic Oscillator with neural networks. The astrology aspect used in this study is the planetary aspect. The data is trained using an RNN.

Title: Detection Of SLE Using Synthetic images of Butterfly Malar Rash on CNN: A Review

Author(s) and Affiliation: Shourav Bikash Dey, Sonika Shrestha, Tharushi Modaragamage and Hadja Diomande, Minnesota State University, Mankato

Abstract: Systemic Lupus Erythematosus (SLE) is the most common type of Lupus and according to the CDC it is diagnosed in about 200,000 adults in the United States. The diagnosis process of SLE is hard to determine and time-consuming as its cause is unknown and it mimics symptoms that are heterogeneous in nature. One of the known symptoms of SLE is a butterfly-shaped rash across the cheekbone and the nasal bridge known as the Butterfly Malar Rash (BMR). In this paper, we propose a Convolutional Neural Network (CNN) to detect SLE from facial images. Due to the lack of available images that present the BMR, in this work we present the necessity for a Generative Adversarial Network (GAN) model to artificially generate BMR images from publicly available images for better training of the CNN model.

Title: Customer retention in Telecom Industry using Data Science

Author(s) and Affiliation: Jaswanth Vankayalapati and Rajeev Bukralia, Minnesota State University, Mankato

Abstract: In today’s world, telecom customers have a wide range of options when it comes to the network. They can easily port in to other networks. This is one of the challenges faced by the telecom industry in retaining those customers. In developing countries, most customers switch to other networks for low monthly deals and other add-ons offered by competing providers. In this paper, we examine the possibilities of customer retention in the telecom industry using machine learning algorithms such as support vector machine (SVM) and logistic regression. We implement the above algorithms using a dataset from the telecom industry. The outcome of this research will be helpful in identifying customers who are likely to change the telecom network.

Title: Evaluating Machine Learning Algorithms for Malware Detection in Androids

Author(s) and Affiliation: Venkata Siva Sai Babburi, Aynura Berdyyeva, Prasanna Muppidi and Connolly Spencer, Minnesota State University, Mankato

Abstract: In the modern world, we are reliant on mobile applications for communication, entertainment, banking, and many other aspects of our life. As these applications have become an integrated part of our life, malware is also becoming a major security threat in the software world. Open-source platforms like Android have become a target of malware, as distributed software packages are not tested before being published in the software system. In this research we focus on a static approach, specifically on signature-based detection of malware in Android devices. In the static approach, signatures are generated for API calls made by the applications. Based on identified patterns of such signatures, it is possible to detect whether an application is benign or malware. Therefore, our goal is to detect malware from the given dataset of benign and malware applications using multiple algorithms (Naive Bayes, kNN, Random Forest, SVM, and Logistic Regression) and to find the best model for detecting malware by comparing the accuracy of these algorithms.

Title: Multivariate CNN-based Weather Prediction

Author(s) and Affiliation: Matthew Miers and Rajeev Bukralia, Mankato State University, Mankato

Abstract: Machine learning (ML) has not been widely deployed in weather forecasting models, even though there have been some promising studies. This preliminary investigation represents the initial insights into the efficacy and efficiency of a multivariate Convolutional Neural Network (CNN) based model for weather prediction. Through this investigation, this work proposes architectural frameworks specialized for multivariate CNNs. Additionally, this work examines some key limitations relating to the use of CNNs for weather prediction.

Title: Deepfake Media Detection approach based on Deep Learning

Author(s) and Affiliation: Aniruddha Tiwari, Rushit Dave, Mounika Vanamala, Minnesota State University, Mankato, University of Wisconsin, Eau Claire

Abstract: Conspicuous progress in the fields of machine learning (ML) and deep learning (DL) has led to a jump in highly realistic fake media, oftentimes referred to as deepfakes. Deepfakes are fabricated media generated by sophisticated AI that are at times very difficult to set apart from real media. Such media can be uploaded to various social media platforms, so spreading it to the world has become easy, calling for an efficacious countermeasure. Thus, one of the optimistic counter steps against deepfakes would be deepfake detection. To counter this threat, researchers in the past have created models to detect deepfakes based on ML/DL techniques like Convolutional Neural Networks (CNN). This paper aims to explore different methodologies with the intention of achieving a cost-effective model with higher accuracy across different types of datasets, in order to address generalizability across datasets.

Title: Detecting Overlapping Gene Region Using UNET Attention Mechanism

Author(s) and Affiliation: Samuel Lemma, Metro State University, Mankato

Abstract: According to worldwide cancer data, there were an estimated 18.1 million cancer cases around the world in 2020 (Bray et al., 2018). This increases the need for a system to identify, treat, and prevent cancer. The issue of identifying cancer is addressed by a system that doctors can utilize as a backup during a diagnosis. One typical problem with this approach is that it frequently misses chromosomes that are overlapping during the test, which has a significant impact on the results. We expect that our research will have an impact on the testing process using neural networks’ U-NET attention mechanism. By making the procedure less tedious and producing results in a better and faster way, removing this hurdle will have a significant positive influence.

Title: Mixed Reality Testing Framework for Autonomous Vehicles Software

Author(s) and Affiliation: Abdelrahman Elkenawy, Qusai Fannoun, and Suboh Alkhushayni, Minnesota State University, Mankato

Abstract: Testing autonomous vehicles (AVs) using closed testing facilities does not present complex testing scenarios, an interaction between automobiles and pedestrians, or any obstacle in the real world, which makes the testing unreliable. On the other hand, testing in the real world includes all the complex elements that a vehicle may go through, and it has a higher possibility of risking people’s lives. In order to improve road safety and eliminate the risk factor, a potential solution is needed. This paper introduces an improved framework that uses a mixed-reality environment to create testing scenarios with complex elements that eliminate the risk of putting someone’s life in danger.

Title: Machine Learning based approach for Secure Software Development

Author(s) and Affiliation: Keith Bryant, Alex Caravella, Mounika Vanamala and Rushit Dave, University of Wisconsin Eau Claire, Minnesota State University at Mankato

Abstract: Cyber Security attacks are significantly growing in today’s modern world of technology and advanced software development. The inclusion of cyber security defense is vital in every phase of software development. Identifying and implementing key relevant cyber security vulnerability controls during the early stages of the software development life cycle, i.e., the requirement phase, is very important. The Common Attack Pattern Enumeration & Classification (CAPEC) is a publicly available software repository from MITRE that currently lists 555 vulnerability attack patterns. As Cyber Security continues to exponentially grow in complexity, the importance of the Machine Learning role to automate the identification of vulnerabilities for various software development is paramount to aid software developers in creating protected software. In this research, we propose to develop a system to automatically map CAPEC attack patterns to software requirements in a Software Development Specification Document using Machine Learning Models.

Title: Applying Neural Network Algorithms to Detect Credit Card Fraudulent Transactions

Author(s) and Affiliation: Queen Booker, Gary Binns and Zara Juta, Metropolitan State University

Abstract: Credit card fraud is a growing problem for both consumers and card issuers. Prior research has shown that neural networks are good candidates to identify fraudulent transactions, but the prior research has used a significant number of variables to achieve the outcome. This research study examines and compares a feedforward neural network against regression for detecting fraudulent credit card transactions using only seven variables, specifically data available at point of sale. The preliminary results are promising, showing that the neural network was able to successfully classify an average of 95% of transactions as fraudulent or non-fraudulent with the limited number of variables.

Title: GANs-based image layer identification

Author(s) and Affiliation: Joan Elizabeth Lahiri, Minnesota State University – Mankato

Abstract: Image recognition will benefit from using generative adversarial networks, or GANs, to pinpoint the many layers that filters have added to an image. This research suggests using GANs to recognize the many layers present in a photo that has been digitally manipulated. This will increase overall image recognition accuracy and reveal any or all image manipulations.

CADSCOM 2022 Highlights

The 4th Colloquium on Analytics, Data Science, and Computing (CADSCOM 2022) was held from 8:30 am to 4:30 pm CDT on Saturday, November 19 at Minnesota State University Mankato at Edina (7700 France Ave S; Edina, MN 55435). Google Maps Directions

In addition to research paper presentations, CADSCOM 2022 featured a keynote address, panel discussions, and student project showcase. CADSCOM 2024 was approved by the Association for Computing Machinery (ACM) as a chapter conference. We are proud to partner with the Minnesota State IT Center of Excellence, Metro State University, Minnesota State University Mankato, and MinnPoly for CADSCOM 2022. The top three CADSCOM 2022 research papers were recommended for fast-track review for the Journal of the Midwest Association for Information Systems (JMWAIS).

Publications Panel : Tips for Enhancing Research and Publications

Panelists: Dr. Alok Gupta, University of Minnesota; Dr. Rassule Hadidi, Metro State and MWAIS Journal; Dr. Mohammed Mokbel, University of Minnesota; Dr. Omar El-Gayar, Dakota State University; Dr. Deepak Khazanchi, University of Nebraska-Omaha; Moderator: Dr. Rajeev Bukralia, Minnesota State University, Mankato

Student Project Showcase: The project showcase is a new component this year, in addition to the peer-reviewed research papers. The showcase will be an opportunity for students to present their projects and connect with industry professionals. To participate in the project showcase, interested students should fill out the Project Showcase Interest Form by November 6, 2022.

Conference Registration: Register for CADSCOM 2022 at Eventbrite by November 12. All participants (student authors and presenters, faculty authors, invited academic and industry guests, panelists, and attendees) must register through the Eventbrite site to attend the conference. The registration fee is $20 for faculty authors. Free registration for student authors, industry/academic guests, and attendees!

Accepted Papers

Title: Implications of Blockchain on Sustainability in the Global Fishing Industry

Author(s) and Affiliation: James Schulz and Steve Sorsen, Metropolitan State University, Mankato

Abstract: Sustainability allows businesses to reinvent their global food supply chain and addresses social concerns and customers’ needs at large. Digitalization, including blockchain technologies, can help identify issues within the food supply chain and allow real-time interventions. This research reviews case studies in the fishing industry that implement blockchain technologies in digitally enabled food supply chains and examines how blockchain technology contributes to sustainability in the fish food chain. Key findings suggest sustainability is a priority and is supported by blockchain technology. This research contributes to the supply chain management field and identifies an opportunity in the fishery ecosystem.

Title: Understanding Telemarketing Sales Through Interpretable & Explainable Predictive Model

Author(s) and Affiliation: Sandesh Sharma and Rajeev Bukralia, Minnesota State University, Mankato

Abstract: Telemarketing is one of the convenient and effective methods of selling products and services to the customer. If not targeted to the right customer, these telemarketing calls may be perceived as irritating, which might instead decrease the company’s value. Using a Portuguese bank telemarketing dataset, we implemented various machine learning algorithms to predict the right customer. We utilized an over-sampling method called SMOTE to mitigate the class imbalance problem. The LGBM model on the plain dataset scored a higher AUC (0.80) than on the over-sampled dataset, implying that SMOTE might not add any benefit to complex ensemble tree methods. For model explainability, we implemented global as well as local explainers to streamline the decision-making process.

Title: Evaluating Ethics of Loot Boxes in Gaming

Author(s) and Affiliation: Benjamin Vossen and Hamdan Alabsi, Bemidji State University

Abstract: Loot boxes are a common practice in gaming today that generates millions of dollars yearly. This paper examines the scholarly literature on the ethical issues related to loot boxes in gaming and then matches the findings in the journals with popular news articles containing anecdotes of the occurrences described in the journals. This research highlights the technology background as well as the backlash from the public. In addition, this paper analyzes the changes that have occurred in the gaming industry that help in determining and identifying best ethical practice standards in the industry.

Title: Impact of JavaScript Attention Recognition Messaging and Calibration on Hardware Performance

Author(s) and Affiliation: Anthony Sanner and Michael Hart, Minnesota State University, Mankato

Abstract: Enabling users to detect eye contact during video conferencing has several associated challenges. The accuracy of gaze prediction and the performance of supporting hardware in real-world environments are two issues this study investigates. Additionally, it explores how users are notified when accurate eye detection exists. Using the Design Science research methodology, the authors design and develop an eye tracking notification system in JavaScript with novel gaze calibration and messaging functionality. Calibration feedback allows end users to determine whether their eye contact messages are accurate. A custom benchmark is scripted that tests the new video conferencing features on a variety of workstation hardware and web browsers. Although high-end user hardware performs well with the newly developed eye tracking features, low-end hardware suffers from time delays and video streaming deficiencies. Benchmark results highlight the need for eye tracking libraries to perform more efficiently for a broader array of end user hardware. Results are limited to benchmarks on three popular web browsers. Future studies should compare benchmarks of additional video conferencing software and a larger sample of commodity hardware.

Title: A Software Prototype and HIoTR Formula for Evaluating IoT Cybersecurity Risk in Home Networks

Author(s) and Affiliation: Andrew Wilcox and Michael Hart, Minnesota State University, Mankato

Abstract: Internet of Things (IoT) presents several information security challenges at the network layer of home-based computer networks. To advance solutions, this paper follows the Design Science methodology to construct an application prototype that manages IoT information security within local area network (LAN) topologies. The authors demonstrate the value of this application by proposing and testing a new formula, labeled HIoTR, which is capable of calculating the degree of information security risk of home networks containing IoT devices. Results indicate that the proposed formula is a beneficial measure of information security risk. Although the new risk formula shows promise, testing is limited to a small sample of IoT devices. Further exploration is necessary to test HIoTR on alternative IoT hardware and their coinciding network topology.

Title: Machine Learning Based Approach to Recommend MITRE ATT&CK Framework for Software Requirements and Design Specifications

Author(s) and Affiliation: Nicholas Lasky, Benjamin Hallis, Mounika Vanamala, Rushit Dave and Naeem Seliya, Minnesota State University at Mankato, University of Wisconsin Eau Claire

Abstract: Engineering more secure software has become a critical challenge in the cyber world. It is very important to develop methodologies, techniques, and tools for developing secure software. To develop secure software, software developers need to think like an attacker, and mining software repositories can help. Mining software repositories aims to analyze and understand the data repositories related to software development. The main goal is to use these software repositories to support the decision-making process of software development. There are different vulnerability databases, such as Common Weakness Enumeration (CWE), Common Vulnerabilities and Exposures database (CVE), and CAPEC. We utilized the MITRE ATT&CK knowledge base. MITRE ATT&CK tactics and techniques have been used in various ways and methods, but tools for utilizing these tactics and techniques in the early stages of the software development life cycle (SDLC) are lacking. In this paper, we use machine learning algorithms to map requirements to the MITRE ATT&CK database and determine the accuracy of each mapping depending on the data split.

Title: How technology may be used for future disease prediction: A Systematic Literature Review

Author(s) and Affiliation: Rich Manprisio, Governors State University in University Park, Illinois

Abstract: Exacerbated by the current pandemic, our healthcare system continues to struggle with the accuracy and effectiveness of disease treatments. However, despite these growing challenges, technological advancements have aided potential disease prediction. There has been a positive correlation between utilizing technologies and leveraging them for disease predictions. Thanks to our continued reliance on technology and its advancement, current research shows that technology offers many viable options to aid the healthcare field. This systematic review looks at the current state of how technologies have been and can be used to improve healthcare.

Title: Let’s-Go-A-Phishing: A statistical evaluation of URLs for cybersecurity analytics

Author(s) and Affiliation: Taiwo Olaleye, Agbaegbu JohnBosco, Olayemi O. Sadare, Adekunle M. Azeez, Azeez A. Opatunji, Ayobami A. Tewogbade, Saminu A. Akintunde, Federal University of Agriculture, Abeokuta, Osun State University, Elerinmosa Institute of Technology, The Nigeria Police Force

Abstract: The website phishing debacle continues to dominate discourse in academia and the cyber security industry, despite several proposed state-of-the-art approaches conceptualized to mitigate the trend. The problem has become prominent in the age of high internet penetration, when innocent users throng the internet for legitimate reasons but are oblivious of the malicious tendencies of criminals who mimic URLs and website domains to make an unsuspecting audience vulnerable to cybercrimes. Whereas predictive analytics based solutions continue to dominate cyber security studies with respect to detecting phishing tendencies, studies seldom consider descriptive statistical analysis of feature attributes prior to modelling of conceptual frameworks. This study is therefore motivated to establish the most prominent attributes from a recently released Mendeley phishing website database. The information gain analysis of the dataset returns the five most prominent independent variables, which are used to train Naïve Bayes and a Neural Network. The experimental result of the statistical analysis identifies the slash (/) character as the most discriminative attribute, with a strong positive correlation with the ground truth. Malicious phishing websites are observed to contain more dot (.) and slash (/) characters, as well as a higher directory length.

Title: Sentiment-aware Data Analytics for Software Defect Severity Prediction

Author(s) and Affiliation: Wasiu Akanji, Elizabeth Abioye, Taiwo Olaleye, Emmanuel Ezeako, Aanuoluwa Adio, and Ayobami Tewogbade, Lagos State University of Science and Technology, Bells University of Technology, Ota, Federal University of Agriculture Abeokuta, Enugu State University of Science and Technology, Redeemer’s University, Elerinmosa Institute of Technology

Abstract: Quality assurance is an integral factor in a software development life cycle, notwithstanding the software development process model employed in the production of the software. Attempts to prioritize the correction of identified software defects necessitate the classification of defects into various severity levels. Planning for fixing identified abnormalities, subsequent testing, and proper resource allocation are all functions of an accurate severity assessment method. Existing studies have employed diverse techniques for severity classification, particularly natural language processing techniques applied to defect reports. Studies that deploy sentiment analysis for determining severity levels, however, do not factor germane considerations into their conceptual methodologies, which is the motivating factor of this study. This paper proposes a sentiment-aware data science approach to ascertain the implication of word count on the severity class of defect reports with respect to the emotion of the reporter, and to establish the inference when defect titles are employed for predictive analytics. The experimental results demonstrate the efficiency of defect titles for predicting severities and show that the length of a software defect title is directly proportional to the severity level of the defect.

Title: Text Analysis of Diversity Reports for Benchmark Analysis

Author(s) and Affiliation: Esmeralda Perez-Gomez and Frank Lee, Georgia State University

Abstract: This paper analyzes industry Diversity, Equity, and Inclusion (DEI) reports using topic modeling techniques to create a standard of common practices. This study uses Latent Dirichlet Allocation (LDA) to discover topics and their word distributions to help gain an overview of the most addressed areas in DEI.

Title: Comparing Traditional Econometric ARIMA and RNNs to Forecast CPI

Author(s) and Affiliation: Mohammad Mazhar, Minnesota State University

Abstract: This paper attempts to compare the traditional econometric model called auto-regressive integrated moving average (ARIMA) with Recurrent Neural Networks (RNN) for forecasting the Consumer Price Index (CPI). Many macroeconomic decisions, such as laying off employees and increasing or decreasing federal discount rates, depend on inflationary pressures and can affect economic growth positively or negatively. Thus, it is necessary to track changes in inflationary pressures by forecasting changes in CPI to make better monetary and fiscal policy decisions. The RNN models have proven to be more accurate in the long term, while the conventional ARIMA-based econometric models have shown better performance in short-term time series forecasting. Sentiment analysis from Twitter has been shown to correlate with federal interest rates. In this study, we analyze data from the S&P 500 index, 3-month treasury bonds rate, USD index, GDP growth rate, unemployment rate, and federal discount rate using ARIMA and RNN, and we compare the accuracy of both models for forecasting short-term and long-term CPI. In addition, we examine whether Twitter sentiments can influence the forecasting accuracy.

Title: Classification and Prediction of Savant Syndrome using Machine Learning

Author(s) and Affiliation: Zelalem Denekew, Minnesota State University, Mankato

Abstract: The use of machine learning (ML) to identify and classify individuals with autism spectrum disorder (ASD) has recently gained more popularity in research. Despite the growing interest in ASD, research has not been as widely conducted in the identification and classification of individuals with Savant Syndrome. There have been various theories as to what causes individuals to obtain Savant level skills. D. A. Treffert’s “the 3 R’s: recruitment, rewiring, release” theory has been the most widely accepted and is used as a foundation for further research. This work proposes that by implementing methods similar to those used in research on identifying and classifying individuals with ASD, we would be able to identify individuals with savant syndrome, their skills, and the capacity at which those skills would be performed with some degree of certainty.

Title: Exploration of Machine Learning techniques and Touch Dynamics for Continuous User Authentication

Author(s) and Affiliation: Silverio Mirao, Mounika Vanamala and Rushit Dave, University of Wisconsin Eau Claire, Minnesota State University at Mankato

Abstract: The rise and evolution of mobile technology has led to a greater application of mobile and cellular devices in everyday life. Because a great majority of these devices incorporate touch screen-based technologies, many people use these devices to access private or personal data such as banking and identity information. Because such information is so prevalent and accessible by such simple means, these devices have become a target of rising fraudulent behavior. To combat these threats to personal security, a demand for device and user authentication has sparked innovation in new security features for touch screen mobile devices. Touch dynamics, in particular, refers to the biometric behaviors observed in how a user interacts with a touch screen device. Through this, patterns can be discovered and, in turn, users can be identified and recognized as either the user for whom the specified device was intended or a fraud. This can be further used to distinguish human behavior from that of a robot, since it is difficult to imitate human behavior. In this research, a survey is conducted on different types of machine learning algorithms that will be compared against each other to see which has the best accuracy in detecting authentic users and imposters. Each algorithm will be given some user-made inputs and will be observed to see if it is able to decide on the correct user.

Title: Overview of a Survey on Deepfake Detection Methods

Author(s) and Affiliation: Natalie Krueger, Mounika Vanamala and Rushit Dave, University of Wisconsin Eau Claire, Minnesota State University at Mankato

Abstract: A deepfake is an engineered photo or video of a person in which their image has been altered or replaced with an image of someone else. Some types of deepfakes include face-swapping (switching an image of a face with another), lip syncing (an audio method where the real audio is replaced), and face synthesis (creating a fake image of a face by altering features of a real face image). Deepfakes have the potential to cause a variety of problems and are often used maliciously. A common usage is altering videos of prominent political figures and celebrities. These deepfakes can portray them making offensive, problematic, and/or untrue statements. Current deepfakes can be very realistic, and when used in this way, can spread panic and even influence elections and political opinions. Even more concerning is that deepfakes are easier to produce than ever, and even someone with very little knowledge of technology can use premade software to create them. So, in this project we will survey a variety of current methods and advances in the field of deepfake detection.

Title: Real-time Object Classifier using TensorFlow Object Detection API

Author(s) and Affiliation: Likhitha Tubati and Rajeev Bukralia, Minnesota State University

Abstract: Object detection is the ability to find objects such as buildings, human faces, and animals in images and videos. Object detection is widely used in surveillance, image retrieval, and self-driving cars. We use the TensorFlow Object Detection API, which is built on top of TensorFlow, a popular machine learning framework, to train a classifier to accurately identify objects in images.

Title: Evaluating Machine Learning Algorithms for Auto Insurance Fraud Detection

Author(s) and Affiliation: Prasanna Muppidi and Rajeev Bukralia, Minnesota State University, Mankato

Abstract: There has been a significant growth in fraudulent insurance claims by policyholders. Auto insurance fraud occurs when a customer attempts to obtain financial benefits by submitting false documents owing to injuries or property damage in bogus accidents, or by requesting compensation for previous losses or excessive billing. In this study, we use machine learning to detect fraudulent auto-insurance claims. First, we apply feature selection methods to get only important features that can best predict auto-insurance claim frauds. We use the SMOTE oversampling method to balance our unbalanced dataset. We analyze data to detect fraudulent auto-insurance claims using multiple algorithms such as Naive Bayes, KNN, Random Forest, SVM, and Logistic Regression and compare the predictive accuracy of these classification algorithms.

Title: Predicting Gold Trend Change using Astrology, and Stochastic Oscillator with Recurrent Neural Network

Author(s) and Affiliation: Marcho Handoko, Minnesota State University

Abstract: Fundamental and technical analyses are commonly used to make predictions in the stock market. However, they only predict the price and not the time. For example, one popular trading method is trend following. The problem with this method is its inability to forecast the timing of a trend change. By the time traders realize it, it’s usually too late. Another problem in trading is not having funds available when the opportunity arises, which causes traders to lose the opportunity to enter the market. The purpose of this research is to predict the date of a gold trend reversal using astrology and the Stochastic Oscillator with neural networks. The astrological factor used in this study is the planetary aspect. The data is trained using an RNN.

Title: Detection Of SLE Using Synthetic images of Butterfly Malar Rash on CNN: A Review

Author(s) and Affiliation: Shourav Bikash Dey, Sonika Shrestha, Tharushi Modaragamage and Hadja Diomande, Minnesota State University, Mankato

Abstract: Systemic Lupus Erythematosus (SLE) is the most common type of Lupus and, according to the CDC, it is diagnosed in about 200,000 adults in the United States. The diagnosis process of SLE is difficult and time-consuming, as its cause is unknown and it mimics symptoms that are heterogeneous in nature. One of the known symptoms of SLE is a butterfly-shaped rash across the cheekbone and the nasal bridge known as the Butterfly Malar Rash (BMR). In this paper, we propose a Convolutional Neural Network (CNN) to detect SLE from facial images. Due to the lack of available images that present the BMR, in this work we present the necessity for a Generative Adversarial Network (GAN) model to artificially generate BMR images from publicly available images for better training of the CNN model.

Title: Customer retention in Telecom Industry using Data Science

Author(s) and Affiliation: Jaswanth Vankayalapati and Rajeev Bukralia, Minnesota State University, Mankato

Abstract: In today’s world, telecom customers have a wide range of options when it comes to the network. They can easily port in to other networks. This is one of the challenges the telecom industry faces in retaining those customers. In developing countries, most customers switch to other networks for low monthly deals and other add-ons offered by competing providers. In this paper, we examine the possibilities of customer retention in the telecom industry using machine learning algorithms such as support vector machine (SVM) and logistic regression. We implement the above algorithms using a dataset from the telecom industry. The outcome of this research will be helpful in identifying customers who are likely to change their telecom network.

Title: Evaluating Machine Learning Algorithms for Malware Detection in Androids

Author(s) and Affiliation: Venkata Siva Sai Babburi, Aynura Berdyyeva, Prasanna Muppidi and Connolly Spencer, Minnesota State University, Mankato

Abstract: In the modern world, we are reliant on mobile applications for communication, entertainment, banking, and many other aspects of our life. As these applications have become an integrated part of our life, malware is also becoming a major security threat in the software world. Open-source platforms like Android have become a target of malware because software packages are not tested before they are published in the software system. In this research we focus on a static approach, specifically on signature-based detection of malware in Android devices. In the static approach, signatures are generated for API calls made by the applications. Based on identified patterns of such signatures, it is possible to detect whether an application is benign or malware. Therefore, our goal is to detect malware from the given dataset of benign and malware applications using multiple algorithms (Naive Bayes, kNN, Random Forest, SVM, and Logistic Regression) and to find the best model for detecting malware by comparing the accuracy of these algorithms.

Title: Multivariate CNN-based Weather Prediction

Author(s) and Affiliation: Matthew Miers and Rajeev Bukralia, Minnesota State University, Mankato

Abstract: Machine learning (ML) has not been widely deployed in weather forecasting models, even though there have been some promising studies. This preliminary investigation represents the initial insights into the efficacy and efficiency of a multivariate Convolutional Neural Network (CNN) based model for weather prediction. Through this investigation, this work proposes architectural frameworks specialized for multivariate CNNs. Additionally, this work examines some key limitations relating to the use of CNNs for weather prediction.

Title: Deepfake Media Detection approach based on Deep Learning

Author(s) and Affiliation: Aniruddha Tiwari, Rushit Dave, Mounika Vanamala, Minnesota State University, Mankato, University of Wisconsin, Eau Claire

Abstract: Conspicuous progress in the fields of machine learning (ML) and deep learning (DL) has led to a jump in highly realistic fake media, oftentimes referred to as deepfakes. Deepfakes are fabricated media generated by sophisticated AI that are at times very difficult to tell apart from real media. Such media can be uploaded to various social media platforms, which makes spreading it to the world easy and calls for an efficacious countermeasure. Thus, one of the promising countermeasures against deepfakes is deepfake detection. To tackle this threat, researchers in the past have created models to detect deepfakes based on ML/DL techniques like Convolutional Neural Networks (CNN). This paper aims to explore different methodologies with the intention of achieving a cost-effective model with higher accuracy across different types of datasets, in order to address generalizability across datasets.

Title: Detecting Overlapping Gene Region Using UNET Attention Mechanism

Author(s) and Affiliation: Samuel Lemma, Metro State University, Mankato

Abstract: According to worldwide cancer data, there were an estimated 18.1 million cancer cases around the world in 2020 (Bray et al., 2018). This increases the need for a system to identify, treat, and prevent cancer. The issue of identifying cancer is addressed by a system that doctors can utilize as a backup during a diagnosis. One typical problem with this approach is that it frequently misses chromosomes that are overlapping during the test, which has a significant impact on the results. We expect our research to have an impact on the testing process by using neural networks’ U-NET attention mechanism. By making the procedure less tedious and producing results in a better and faster manner, removing this hurdle will have a significant positive influence.

Title: Mixed Reality Testing Framework for Autonomous Vehicles Software

Author(s) and Affiliation: Abdelrahman Elkenawy, Qusai Fannoun, and Suboh Alkhushayni, Minnesota State University, Mankato

Abstract: Testing autonomous vehicles (AVs) in closed testing facilities does not present complex testing scenarios, interactions between automobiles and pedestrians, or real-world obstacles, which makes the testing unreliable. On the other hand, testing in the real world includes all the complex elements that a vehicle may encounter, but it carries a higher risk to people’s lives. To improve road safety and eliminate the risk factor, a potential solution is needed. This paper introduces an improved framework that uses a mixed-reality environment to create testing scenarios with complex elements while eliminating the risk of putting someone’s life in danger.

Title: Machine Learning based approach for Secure Software Development

Author(s) and Affiliation: Keith Bryant, Alex Caravella, Mounika Vanamala and Rushit Dave, University of Wisconsin Eau Claire, Minnesota State University at Mankato

Abstract: Cyber Security attacks are significantly growing in today’s modern world of technology and advanced software development. The inclusion of cyber security defense is vital in every phase of software development. Identifying and implementing key relevant cyber security vulnerability controls during the early stages of the software development life cycle, i.e., the requirement phase is very important. The Common Attack Pattern Enumeration & Classification (CAPEC) is a publicly available software repository from MITRE that currently lists 555 vulnerability attack patterns. As Cyber Security continues to exponentially grow in complexity, the importance of the Machine Learning role to automate the identification of vulnerabilities for various software development is paramount to aid software developers in creating protected software. In this research, we propose to develop a system to automatically map CAPEC attack patterns to software requirements in a Software Development Specification Document using Machine Learning Models.

Title: Applying Neural Network Algorithms to Detect Credit Card Fraudulent Transactions

Author(s) and Affiliation: Queen Booker, Gary Binns and Zara Juta, Metropolitan State University

Abstract: Credit card fraud is a growing problem for both consumers and card issuers. Prior research has shown that neural networks are good candidates to identify fraudulent transactions but the prior research has used a significant number of variables to achieve the outcome. This research study examines and compares a feedforward neural network against regression for detecting fraudulent credit card transactions using only seven variables, specifically data available at point of sale. The preliminary results are promising, showing that the neural network was able to successfully detect an average of 95% of transactions as fraudulent or non-fraudulent with the limited number of variables.

Title: GANs-based image layer identification

Author(s) and Affiliation: Joan Elizabeth Lahiri, Minnesota State University – Mankato

Abstract: Image recognition will benefit from using generative adversarial networks, or GANs, to pinpoint the many layers that filters have added to an image. This research suggests using GANs to recognize the many layers present in a photo that has been digitally manipulated. This will increase overall image recognition accuracy and reveal any or all image manipulations.

CADSCOM2021

CADSCOM 2021 Highlights

The 3rd Colloquium on Analytics, Data Science, and Computing (CADSCOM 2021) was held virtually from 8:30 am to 5:30 pm CDT on March 20. Download the printable copy (PDF) of the CADSCOM 2021 event program.

In addition to research paper presentations, CADSCOM 2021 featured a keynote address, panel discussions, and an invited talk. All research papers were peer reviewed (double anonymous) for quality.

Keynote Address: Dr. Radhika Kulkarni, former VP of R&D of SAS and President-Elect of INFORMS, delivered the keynote address titled “Machine Learning, Artificial Intelligence and Optimization: Opportunities for Inter-Disciplinary Collaboration.”
Panel Discussion I: Data Science & AI: Current Challenges and Future Frontiers (Panelists: Tonio Lora, Microsoft; Mac Noland, phData; James Harroun, SAS; Dan Atkins, MinneAnalytics; Moderator: Dr. Rajeev Bukralia, Minnesota State University, Mankato)
Panel Discussion II: Inclusion & Equity in Data Science and Analytics (Panelists: Kate Bischoff; Alycia Holwerda, IBM; Dr. Bonnie Holub, Teradata; Diana DeSoysa, Optum; Anahita Bahrami, IL Institute of Tech; Moderator: Dr. Queen Booker, Metro State University)
Panel Discussion III: Tips for Enhancing Research and Publications (Panelists: Dr. Rassule Hadidi, Metro State and MWAIS Journal; Dr. Amit Deokar, University of Massachusetts, Lowell; Dr. Mohammed Mokbel, University of Minnesota; Moderator: Dr. Sarah Kruse, Minnesota State University, Mankato)
Invited Talk: Remote Learning and Teaching Post COVID-19 (Speaker: Dr. Mohammed Ali)

Top CADSCOM 2021 Papers: The following three papers were selected after two separate rounds of peer review. These top papers were recommended for fast-track review for the Journal of the Midwest Association for Information Systems (JMWAIS):

Detecting Online Review Fraud Using Sentiment Analysis (Bryn Caron & Rajeev Bukralia)
Using Prototyping to Teach Design Thinking (Mary Lebens)
What Do the Twitter Sentiments Say About the COVID-19 Vaccine? (Ilma Sheriff & Naseef Mansoor)

Keynote Address Video: Dr. Radhika Kulkarni

Accepted Research Papers

Title: Immutable Infrastructure with Actionable Monitoring on Containers (Kubernetes)
Author(s) and Affiliation: Mizan Hemani, Minnesota State University, Mankato
Abstract: With the dawn of cloud computing and the growing popularity of containers that run applications and microservices – it has become easier to build new architectures that are deployable as smaller cohesive segments that are highly scalable. Having this container level deployment makes it easier to manage deployments between different environments, however, it carries forward the existing behaviors of directly interacting with the server, while avoiding the pre-configured deployment pipeline – potentially creating a drift in configuration and exposing the system to security vulnerabilities. In this paper, we explore the lack of immutability in a container infrastructure by monitoring audit level logs of interactions with Kubernetes to perform actions on established policies. By leveraging such policies, this paper proposes a pattern that can ensure an intact infrastructure and re-enforce good security and system maintenance principles.

Title: Delay Tolerant Network Security
Author(s) and Affiliation: Rishabh Yata, Minnesota State University, Mankato
Abstract: A delay-tolerant network (DTN) is a store-and-forward network in which end-to-end connectivity is not assumed and data is transmitted over opportunistic connections between nodes. DTNs are sparse wireless networks that have recently been used alongside existing networks to link devices, or underdeveloped regions, in challenging environments. In protected environments, such as the military, a network security protocol is often required. In a DTN, a complete path from source to destination usually does not exist, which makes routing packets difficult. For large-scale deployment of delay-tolerant networks, protection and privacy are essential. People are hesitant to consider such a new network model without protection and privacy assurances. Therefore, in this paper, I plan to discuss various security and cryptography concepts and protocols that are currently in use and propose some promising enhancements to DTN security.

Title: Using Prototyping to Teach Design Thinking
Author(s) and Affiliation: Mary Lebens, Metropolitan State University
Abstract: Companies using design thinking increase revenues and shareholder returns at almost double the rate of their industry peers, yet more than 90% of companies do not employ design thinking, in part due to a lack of design skills in the workforce. Adding design thinking to the curriculum is imperative to address this skills gap. Most research emphasizes developers and users physically working together, so it is significant to learn whether online students who are never physically present together in the classroom can successfully learn design thinking skills. This study examines whether students in an “asynchronous online” undergraduate systems analysis course can successfully apply user-centered design standards to develop a system prototype. Additionally, the study examines whether students are able to provide substantive feedback to their peers on their prototypes while participating in an iterative review process. The study method employed a model for prototype design, review, and assessment. The study demonstrates that over two course sections, the majority of students in an asynchronous online course successfully developed web prototypes that employed user-centered design and effectively provided feedback to peers on their prototypes during an iterative review process. The implication is that faculty can feel confident in employing design thinking and prototyping in asynchronous online courses to teach these valuable skills.

Title: Evaluation of P2P Loan Default Detection Models
Author(s) and Affiliation: Queen E. Booker, Metropolitan State University and Mousumi Munmun, Metropolitan State University
Abstract: The Peer-to-Peer (P2P) lending model is exploding in the US economy. A robust charge off/default detection method is needed to improve the quality of the P2P lending market and establish a more sustainable industry. The study specifically compares the Zhang (2020) Logistic Regression (LR) model to a Deep Learning Neural Network (DLNN) and Naïve Bayes (NB). However, based on the Lending Club dataset and Zhang’s (2020) variables, no model was particularly effective at detecting potentially bad loans.

Title: What do the Twitter sentiments say about the COVID-19 Vaccine?
Author(s) and Affiliation: Ilma Sheriff, Computer Information Science, Minnesota State University, Mankato and Naseef Mansoor, Minnesota State University, Mankato
Abstract: The coronavirus disease (COVID-19) pandemic led to substantial public discussion. Understanding these discussions can help institutions and individuals navigate this pandemic. In this paper, we analyze and investigate Twitter sentiments toward the COVID-19 vaccine. Starting from a publicly available Twitter dataset on the COVID-19 vaccine from Kaggle, we create a unified dataset containing data about public sentiments, sentiment scores, and COVID-19 cases for various U.S. states. To generate sentiment scores from the tweets, we applied the Valence Aware Dictionary and sEntiment Reasoner (VADER) sentiment analyzer. These scores were then classified into positive, negative, and neutral sentiment classes using a simple threshold-based classifier. From our analysis, we observe that in our dataset around 41.93% of the tweets are positive, 17.64% are negative, and 40.42% are neutral. We also analyzed the data based on the geographic locations of the tweets to answer the following questions: 1) Is there any relationship between the number of tweets and the number of COVID-19 cases? 2) Is there any shift in public sentiment after the approval of the vaccine? Our analysis shows a high correlation between the number of tweets and the number of COVID-19 cases, as well as a decrease in negative sentiment after the approval of the vaccine.
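
The threshold-based classification step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the abstract does not report its cutoffs, so the sketch assumes the conventional VADER compound-score thresholds of ±0.05, and the function names and example scores are hypothetical.

```python
from collections import Counter

# Assumed cutoffs: the conventional VADER compound-score thresholds.
# The paper does not state which thresholds it actually used.
POSITIVE_CUTOFF = 0.05
NEGATIVE_CUTOFF = -0.05


def classify_sentiment(compound_score: float) -> str:
    """Map a compound score in [-1, 1] to a sentiment class."""
    if compound_score >= POSITIVE_CUTOFF:
        return "positive"
    if compound_score <= NEGATIVE_CUTOFF:
        return "negative"
    return "neutral"


def class_shares(scores):
    """Return the fraction of scores in each class (scores must be non-empty)."""
    counts = Counter(classify_sentiment(s) for s in scores)
    total = len(scores)
    return {label: counts[label] / total
            for label in ("positive", "negative", "neutral")}


# Hypothetical compound scores standing in for per-tweet VADER output.
example_scores = [0.62, -0.41, 0.0, 0.03, 0.8]
print(class_shares(example_scores))
```

In practice the compound scores would come from a sentiment analyzer run over each tweet; the class shares computed this way correspond to the kind of percentages the abstract reports.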

Title: Automated stock recommendations using Financial Indicators and Machine Learning (Full Paper PDF)
Author(s) and Affiliation: Utkarsh Sharma, ASET, Amity University and Simran Gogia, ASET, Amity University
Abstract: The stock market is regarded as one of the highest-yielding long-term investments, yet a majority of people do not capitalize on it. Dubious advice and attempts to ‘beat the market’ usually give rise to skepticism and distrust among first-time investors. This paper proposes a subjective, low-risk stock market advising platform that applies Machine Learning clustering (K-Means) to basic Financial Indicators used to track the performance of stocks in the exchange, to serve as an aid in investment decisions, particularly for first-time investors. The results suggest that clustering-powered subjective recommendations can prove to be a low-risk advising tool.
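
As a rough illustration of K-Means clustering on financial indicators, the sketch below groups a few stocks by two indicators. Everything specific here is an assumption for illustration: the abstract does not say which indicators were used, so the two features (price-to-earnings ratio and dividend yield), the sample values, and the simple first-k initialization are all hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-Means: assign each point to its nearest centroid, then
    recompute centroids as cluster means, for a fixed number of passes."""
    # Deterministic initialization (an assumption): the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: mean of each cluster's members (empty clusters keep
        # their previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical (P/E ratio, dividend yield %) pairs for four stocks.
indicators = [(10.0, 4.0), (12.0, 3.5), (40.0, 0.5), (45.0, 0.2)]
labels, centroids = kmeans(indicators, k=2)
print(labels)
```

With real data, the indicators would normally be standardized first so that a feature with a large numeric range does not dominate the distance calculation.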

Title: Strategies that Guide the Availability, Information Security, and Scalability of Future Wireless Sensor Networks (WSNs)
Author(s) and Affiliation: Sapumal Darshana Salpadoru Thuppahi, Minnesota State University, Mankato and Michael Hart, Minnesota State University, Mankato
Abstract: Wireless Sensor Networks (WSNs) give industries the opportunity to manage vast numbers of sensors over various types of computer networks. New WSN research indicates several advantages for industries currently not using its associated technological advancements. To help these industries, the authors outline guidance that helps inform future WSN implementation frameworks. Using this guidance, the authors propose an iteration of a new WSN model for agriculture. The prototype addresses several needs, including high availability, information security, and scalability of wireless sensor networks using commodity hardware often present in this industry.

Title: Twitter Data Analysis about COVID-19 Vaccines using Sentiment Analysis
Author(s) and Affiliation: Maharu Chamara Wickramarathne, Minnesota State University, Mankato
Abstract: The world took tremendous measures to find a cure for COVID-19. After multiple attempts at vaccines against the virus, two vaccines were approved by the Food and Drug Administration (FDA) and the World Health Organization for distribution in the USA: the Pfizer/BioNTech COVID-19 vaccine and the Moderna COVID-19 vaccine. But people have many questions about the vaccines (“What are the side effects?”). Answering these questions and addressing these doubts is necessary for successful vaccination of the population. This research addresses these questions using Twitter data. Two thousand tweets for each vaccine (selected by vaccine-name hashtag) from the state of Minnesota were mined and analyzed. These tweets revealed most people’s opinions about the vaccines and how well they performed. Twitter data mining and cleaning procedures in R were used to gain better insight. Word cloud visualization and sentiment analysis methods helped to explore those questions among people in Minnesota.

Title: The Impact of AES Encryption on SCADA Systems for Electrical Distribution that Contain HDFS Architecture
Author(s) and Affiliation: Justin Wren and Michael Hart, Minnesota State University, Mankato
Abstract: Supervisory Control and Data Acquisition (SCADA) systems for electrical utility companies have an increasing need to provide additional insight into smart grid data. A significant contingency is the ability to design information security and big data architecture into IT infrastructure that demands minimal network latency. This study explores an IT infrastructure design for electrical generating stations that have the capability to stream encrypted internal SCADA data to a Hadoop Distributed File System (HDFS). Using the design science research methodology, the authors designed and implemented an IT critical infrastructure that uses the Advanced Encryption Standard (AES) between primary SCADA systems and intelligent electronic devices (IEDs). Results illustrate a marginal difference in network packet latency between security gateways that load balance individual relays to IEDs and single instance security gateways that handle all relays to IEDs using a LAN substation. Despite the introduction of network latency, the proposed critical IT infrastructure design decreases the amount of unencrypted data in SCADA environments and could allow streaming data securely to HDFS. Findings emphasize that carefully designing security gateways and encryption in SCADA systems is a viable and necessary step when considering streaming data from IEDs to big data environments.

Title: Blockchain in COVID-19 Vaccine Distribution
Author(s) and Affiliation: Tiati Thelen, Minnesota State University, Mankato and Rajeev Bukralia, Minnesota State University, Mankato
Abstract: Supply chain management has started utilizing blockchain technology to access information from the start of production to the consumer. Blockchains create records of consistent information. Recently, blockchain technology has been introduced into the pharmaceutical supply chain to track temperatures of vaccines from production to patient. Additionally, IoT (Internet of Things) assists blockchains by utilizing embedded sensors and software to supply blockchains with the pertinent information. This is vital because vaccines are temperature sensitive. This research provides the foundations to consider these technologies in the domain of the COVID-19 vaccine, which is unique in that many are produced in two doses. This paper contributes a systematic review of previous works and how they can effectively be advanced to the COVID-19 vaccine supply

Title: Detecting Online Review Fraud Using Sentiment Analysis
Author(s) and Affiliation: Bryn Caron, Minnesota State University, Mankato and Rajeev Bukralia, Minnesota State University, Mankato
Abstract: With the exponential increase in e-commerce, online reviews have become integral to the marketing of products and services. Customers are inclined to buy products and services that have received high ratings and positive reviews. Consequently, fake reviews are increasingly becoming a way to mislead customers into trusting, or mistrusting, the credibility and reliability of a product or service. Though online fake reviews have garnered some attention from the media and research communities, there is a need for effective technical solutions for detecting, and therefore mitigating, fraudulent reviews to improve consumer confidence in e-commerce. The purpose of this study is to explore the use of natural language processing techniques in detecting fake online reviews. We analyze the text of online reviews for various book titles. We investigate the accuracy of the polarity score, a common metric used in sentiment analysis, in the context of the star rating of the reviews. Our findings conclude that the polarity score is not a reliable measure for detecting fake reviews. In addition, the study sheds light on the limitations of sentiment analysis in detecting fake reviews.

Title: Ensemble Learning for Authorship Verification
Author(s) and Affiliation: Abdul Wahab Mohammad, Minnesota State University, Mankato and Dr. Michael Hart, Minnesota State University, Mankato
Abstract: Authorship verification is the task in which the author of a given text is identified. In this paper, the author proposes two novel methods to identify authors of texts on two different benchmark datasets, namely the C50 dataset and the Gutenberg dataset. The author used BERT, a state-of-the-art NLP model, with Siamese networks, and tf-idf with attention models. The BERT model showed very good results on the training data, but it did not generalize well to the testing data. However, the model with tf-idf and an attention mechanism achieved results comparable to the state of the art on the C50 dataset. This paper also discusses how a word2vec-based preprocessing approach works in identifying authors via Siamese networks.

Title: Chatbot Knowledge Retrieval Supported by Forums
Author(s) and Affiliation: Michael A. Nyakonu, Metropolitan State University
Abstract: In this paper, we look at how a chatbot system with a dynamically growing pool of knowledge can be developed. We look at how a forum’s structure can be used as a source of infinite knowledge. The answers will be derived through web crawling. In return, we hope to demonstrate a new model that provides an infinite knowledge base to chatbot developers.

Title: Game Prediction Model(s) for the National Basketball Association
Author(s) and Affiliation: Qin Sun, Minnesota State University, Mankato and Logan Cook, Minnesota State University, Mankato
Abstract: According to Forbes statistics, 750 million families in 212 countries watch National Basketball Association (NBA) games. The NBA has become the most globalized and influential professional sports organization in the world. For a sports league with annual revenue of more than 4 billion U.S. dollars, predicting the outcome of NBA games is an interesting problem with great commercial value. In this article, we selected the team and player data for all NBA seasons from 2004 to 2020. Using the R language, we evaluated each model on thirty different data splits, yielding thirty accuracy values per model. Our results show that the K-Nearest Neighbor classifier has the lowest prediction accuracy among the four models, while the SVM classifier is the most accurate.

Title: Roadmap Comparison: Telehealth and NIST
Author(s) and Affiliation: Pamal Wanigasinghe, Minnesota State University, Mankato and Sarah Klammer Kruse, Minnesota State University, Mankato
Abstract: Telehealth has great potential to increase patient access to health services, decrease costs, and improve individual and public wellbeing. In order to fully realize these advantages, patients need to be assured that their health-related data will be protected, and providers must take responsibility for the security and integrity of the data gathered. Adoption and use of telehealth could be reduced or delayed if security risks are not adequately addressed. As the popularity of telehealth increases, it is important to emphasize information security for this emerging healthcare technology. A successful telehealth security plan should include all aspects of security including the underlying frameworks, policies, and education of providers and patients. This paper explores security and privacy risks of telehealth and compares the telehealth roadmaps from two organizations to the recommendations given in the Roadmap for Advancing the NIST Privacy Framework.

Title: An OS Benchmark Design to Compare SQL Load on Distributed Big Data Systems
Author(s) and Affiliation: Michael Hart, Minnesota State University, Mankato
Abstract: Although vendors publish key benchmarks of big data systems, results can differ under typical industry load and fluctuating network environments. This work develops a SQL load benchmarking process by employing the Design Science methodology. The proposed experimental process measures varied operating systems under normal business load for a popular distributed big data system. Using a modified version of the IBM-supported TPC-DS workflow, the author tests SQL completion times on three separate Apache Spark distributed clusters running Ubuntu Server, Clear Linux, and CentOS Server. Results indicate that load in real-life big data environments has a significant effect on SQL completion times.

Industry Talks 2020 (in partnership with DREAM)
Virtual industry talks on data science, computing, and AI featuring speakers from Optum, Boston Scientific, Google, Wells Fargo, and Teradata.

CADSCOM 2019
Colloquium on Analytics, Data Science and Computing (CADSCOM 2019)

CIACAM 2019 (May 23, 2019)
Colloquium on Information Assurance, Cybersecurity, and Management (CIACAM 2019)

CADSCOM 2018 (October 26-27, 2018)
Colloquium on Analytics, Data Science and Computing (CADSCOM 2018)

Metropolitan State University SAS Day (October 12, 2018)
What exactly is data mining and how can it help your organization more confidently predict the future? This presentation will introduce you to the essential aspects of data mining and give you a guided tour of SAS® Enterprise Miner™, the powerful data mining workbench from SAS.