Applied Machine Learning & Big Data Strategy Assignment

Table of Contents

Introduction: Applied Machine Learning And Big Data Strategy Assignment Sample
Chapter 1:
1.1 Context of research
1.2 Aim
1.3 Objectives
1.4 Research question
1.5 Context of research
1.6 Dissertation structure
1.7 Research gap(s) based on literature review
1.8 Discussion of research questions
Chapter 2: Literature review
2.1 Conceptual framework
2.2 Literature review in the Retailing Sector
2.3 Machine learning technique
2.4 Identify a research gap(s)
2.6 Establish the hypotheses
2.7 Summary
Chapter 3: Methodology
3.1 Background Information of Dataset
3.2 Discussing and justifying analytical techniques
Chapter 4: Findings
4.1 Key descriptive statistic with interpretation
4.2 Data visualizations with interpretation
4.3 Statistical analysis
Chapter 5: Discussion
5.1 Compare the findings against the literature
5.2 Implications based on findings
5.3 Limitations and recommendation of your research


Introduction: Applied Machine Learning And Big Data Strategy Assignment Sample

Chapter 1:

1.1 Context of research

In the 21st century, people depend heavily on digital systems, and this dependence makes security a central concern. Abnormal account activity is a major challenge. Because so many purchases now rely on online transactions, and credit cards play a large part in them, the authenticity of credit card transactions has become an important issue. Many companies that process such transactions take action against fraud and develop advanced approaches to tackle it. A credit limit does not always stop a purchase: a card can be used up to its limit even when the account balance is zero, so this feature is easily misused. To stop such unethical purchases, large organizations need a system that can track every transaction, identify the fraudulent ones, and abort them. Credit card fraud can be detected with data mining techniques that group and search very large volumes of transactions, potentially billions, and then classify each transaction. The main requirements are historical transaction data and an algorithm that fits those data well. This report discusses several types of fraud and how action can be taken against them.


1.2 Aim

The aim of this research is to examine how machine learning techniques and a big data strategy can be used to detect credit card fraud.

1.3 Objectives

To implement machine learning techniques for detecting credit card fraud.
To evaluate the process of using selected machine learning techniques.
To find patterns and dependencies of credit card fraud detection.
To identify and predict upcoming fraud cases.

1.4 Research question

How can a machine detect credit card fraud?
How are the selected machine learning techniques evaluated in credit card fraud detection?
How can the patterns and dependencies in credit card fraud be found?
How can upcoming fraud cases be identified and predicted?

1.5 Context of research

Credit card fraud detection identifies charges that the cardholder did not make or authorize. Fraud can be committed in many ways and across many organizations. A fraud detection system usually combines several detection approaches, so the dataset needs to give a connected overview of both valid and invalid payments.

The decision tree algorithm starts at a root node. Decision nodes split the tree, and leaf nodes give the final decision. The inputs to a decision can include the geolocation of the device and the history of the transaction.

Matplotlib is a comprehensive, cross-platform Python library for creating static and interactive visualizations, and it builds on NumPy, Python's numerical array library. Seaborn is a Python package for statistical graphics that builds on Matplotlib and integrates with pandas; it is used here to explore and understand the data.

A perfect detection rate cannot be expected from this dataset, because it was collected from Kaggle, and further research with it is limited by missing data values. For confidentiality reasons real banking details are not shared on the web, so a substitute dataset has been used for this work, which lowers the achievable performance.

Credit card fraud detection is a data science task that draws on many kinds of data, such as transaction data, product categories, and client behaviour patterns (Carcillo et al. 2021). These patterns are run through a trained model that learns the rules for classifying a transaction as legitimate or fraudulent (Dornadula et al. 2019). The model used for fraud detection must be simple and fast enough to classify anomalies.

No merchant can deny the importance of credit card fraud detection. Credit card companies develop sophisticated tools to detect fraud: they monitor every transaction and use complex computer algorithms to search for unusual ones. People pay through many different methods and terminals, such as debit and credit cards. Cardholders can also detect fraud themselves by checking their statements for charges that they did not make or authorize; an unexpected statement or bill arriving by mail can indicate an unusual charge.
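The root, decision, and leaf nodes mentioned in this section can be inspected on a small fitted tree. The following is a minimal sketch, assuming scikit-learn is available; the toy features (`amount`, `hour`) and the labels are hypothetical and are not taken from the study's dataset.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy transactions: [amount, hour of day]; label 1 = fraud.
X = [[20, 10], [35, 14], [50, 9], [900, 3],
     [1200, 2], [40, 16], [1500, 4], [25, 11]]
y = [0, 0, 0, 1, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# The first split printed is the root node; nested splits are decision
# nodes; lines containing "class:" are leaf nodes.
print(export_text(tree, feature_names=["amount", "hour"]))

# A new transaction is routed from the root down to a leaf.
prediction = tree.predict([[1000, 3]])[0]
print("predicted class:", prediction)
```

On real data the tree would be deeper, and the split thresholds would come from the training transactions rather than from hand-made values.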

1.6 Dissertation structure

Dissertation Structure

(Source: Created by self)

1.7 Research gap(s) based on literature review

Previous work has reviewed how data mining is applied to credit card fraud detection and has identified the main challenges in the area (Dornadula et al. 2019). One gap is the lack of data available for research and for training sets. This problem affects researchers rather than the organizations that hold the data. The opposite problem is also cited: the literature itself is voluminous, with many papers and books on the topic. Surveying it is difficult because the work is scattered, and picking out the best method from it is troublesome.

Feature engineering is an important topic in credit card fraud detection. Banks and firms that process credit cards build rich features about the cardholder, drawing on previous transaction values, to form a card profile. Prepaid payment cards are an exception, because they are not tied to a user (Makki et al. 2019). These cards are rarely recharged, so their life span is short, from several months to several years, and only limited information and features are available for building a card model. According to the author, such a model can still predict card fraud and can be used in simulation.

Scalability is another problem: a robust, scalable system is needed to sustain large transaction volumes. Class imbalance is a further issue, because fraudulent transactions make up a very small fraction of the data. The same imbalance appears in other domains, such as facial detection, medical diagnosis, earthquake prediction, mail tagging, and insurance.

Traditional methods cannot solve these problems, so special measures are needed (Fiore et al. 2019). Most algorithms are not designed for imbalanced classes, so the imbalance has to be addressed either in the algorithm or in the data (Maniraj et al. 2019). At the algorithm level, the learner is adjusted to cope with the imbalance; at the data level, a preprocessing step is added to balance the dataset (Randhawa et al. 2018). Many preprocessing techniques are available, chiefly oversampling, undersampling, and combinations of the two, alongside ensemble learning and cost-sensitive learning techniques.
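The data-level balancing step described above can be illustrated with plain random oversampling and undersampling. This is a minimal sketch, assuming NumPy is available; the class sizes (95 legitimate, 5 fraudulent) and the feature matrix are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: 95 legitimate (0) and 5 fraudulent (1) rows.
y = np.array([0] * 95 + [1] * 5)
X = rng.normal(size=(100, 3))

minority_idx = np.flatnonzero(y == 1)
majority_idx = np.flatnonzero(y == 0)

# Random oversampling: draw minority rows with replacement until both
# classes have the same number of rows.
extra = rng.choice(minority_idx,
                   size=len(majority_idx) - len(minority_idx),
                   replace=True)
over_idx = np.concatenate([majority_idx, minority_idx, extra])
X_over, y_over = X[over_idx], y[over_idx]

# Random undersampling: keep only as many majority rows as minority rows.
kept = rng.choice(majority_idx, size=len(minority_idx), replace=False)
under_idx = np.concatenate([kept, minority_idx])
X_under, y_under = X[under_idx], y[under_idx]

print("oversampled class counts:", np.bincount(y_over))
print("undersampled class counts:", np.bincount(y_under))
```

Resampling should be applied to the training split only, so that the test split keeps the original class ratio. Cost-sensitive learning is a different route to the same goal: many scikit-learn classifiers accept a `class_weight` parameter for that purpose.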

1.8 Discussion of research questions

Much research has been conducted on identifying fraud in credit card transactions (Varmedja et al. 2019). The traditional methods that banks use to prevent credit card fraud rely on manpower: rules describe the standard checks that must be applied to suspicious transactions. These rules are not flexible enough to give proper solutions, and they are costly to maintain and to staff. They are also hard to implement, because every fraud pattern needs its own rule. If an expert cannot identify an anomalous or suspicious transaction, the fraud goes ahead and no one can take measures to prevent it. Fraud patterns also change over time, so some rules become obsolete (Xuan et al. 2018). The usefulness of each rule therefore has to be reviewed from time to time to decide whether it should be kept or discarded. New fraud patterns require new rules, the growing rule set becomes hard to maintain and monitor, and detection accuracy decreases. To solve these problems with rule-based systems, banks need to deploy machine learning, which can be divided into supervised and unsupervised approaches.

Chapter 2: Literature review

Researchers have surveyed many types of fraud detection, covering major areas such as corporate fraud, bank fraud, and insurance fraud. They distinguish two types of transaction: virtual (without a card) and physical (with a card). The techniques applied include regression, logistic regression, classification, neural networks, and k-nearest neighbour. The surveys also explain the theoretical background of data mining techniques such as clustering, classification, outlier detection, visualization, and regression, as well as traditional techniques based on statistics and computational artificial intelligence.

2.1 Conceptual framework

Conceptual Framework

(Source: Created by self)

2.2 Literature review in the Retailing Sector

Credit card fraud first emerged in the US, where it increased steadily from 2011 to 2018. Fraud losses continue to grow, and loopholes in the system remain. Even today, fraud investigators cannot detect every fraud, nor can they verify every detail of each transaction to catch minor theft. The UK is one of the world's largest economies, with a commerce market of $314 billion expected in 2021 (Yousefi et al. 2019). Most people in the UK use online transactions, and this volume increases the opportunities for fraudsters.

2.3 Machine learning technique

A survey of credit card fraud has been carried out with a view to maintaining security in three areas: bank fraud, insurance fraud, and corporate fraud. It distinguishes two types of transaction: cardless and via card. The techniques applied include regression, logistic regression, classification, neural networks, and k-nearest neighbour. The survey discusses the theoretical background of six data mining approaches: regression, visualization, classification, clustering, prediction, and outlier detection. It builds on existing techniques, including artificial immune systems, neural networks, support vector machines, regression, visualization, self-organizing maps, decision trees, and hybrid methods, and reports that they achieve high accuracy in detecting threats. Many organizations are looking for new ways to increase profit at minimal cost, and machine learning is a good method for credit card fraud detection. With this method, training losses of 0.024 and 0.027 were reported on data in which 25% of the records were fraud.

2.4 Identify a research gap(s)

One gap is the lack of data available for research and for training sets. This problem affects researchers rather than organizations. The opposite problem is also cited: the literature is voluminous and scattered across many papers and books, so surveying it and picking out the best method is troublesome.

Feature engineering is an important topic in credit card fraud detection. Banks and firms that process credit cards build rich features about the cardholder, drawing on previous transaction values, to form a card profile. Prepaid payment cards are an exception, because they are not tied to a user. These cards are rarely recharged, so their life span is short, from several months to several years, and only limited information and features are available for building a card model. According to the author, such a model can still predict card fraud and can be used in simulation.

Mohammud Zamini proposed a fraud detection method based on autoencoder clustering. The autoencoder is an auto-associative neural network that reduces dimensionality while extracting useful features. The method was trained on a dataset of 284,807 transactions, of which 0.17% were fraud, with the following parameters: 300 iterations, 2 clusters, k-means++ initialization, a tolerance of 0.01, a learning rate of 0.1, 200 epochs, and ReLU activation.
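The clustering parameters listed above (2 clusters, k-means++, 300 iterations, 0.01 tolerance) map directly onto scikit-learn's `KMeans`. The sketch below covers only the clustering stage and assumes scikit-learn and NumPy are available; the autoencoder is not implemented, and the `codes` array is a hypothetical stand-in for the low-dimensional features an autoencoder would produce.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical stand-in for autoencoder output: a large group of
# ordinary points and a small, well-separated group.
codes = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(200, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(10, 2)),
])

kmeans = KMeans(
    n_clusters=2,        # 2 clusters
    init="k-means++",    # k-means++ initialization
    max_iter=300,        # 300 iterations
    tol=0.01,            # 0.01 tolerance
    n_init=10,           # assumption: not stated in the text
    random_state=0,
)
labels = kmeans.fit_predict(codes)

print("cluster sizes:", np.bincount(labels))
```

The learning rate, epoch count, and ReLU activation from the parameter list belong to the autoencoder training stage, which this sketch leaves out.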

2.6 Establish the hypotheses

Almost 56% of people in the UK pay by card, and fraud on retailers' websites exceeds 400 million dollars. This figure has increased by 25%, and every country faces credit card fraud. A machine learning approach is needed to fight it. Sahil Dhankhar applied supervised machine learning algorithms to a real-world dataset and built a super classifier by ensemble stacking. The techniques compared included random forest, Naive Bayes, KNN, the XGB classifier, and decision trees, evaluated on precision and recall; logistic regression performed better than the other techniques for fraud prediction.

2.7 Summary

A feature selection method based on a genetic algorithm (GA) has been proposed in combination with RF, DT, NB, and LR classifiers and applied to a dataset of European cardholders, with five optimal features generated. GA-RF achieved an overall accuracy of 99.28%, and GA-DT achieved 99.92%. These results are far better than those of earlier traditional approaches. The framework was also run on a synthetic credit card dataset to check the results obtained on the European dataset. With the improved details, the experiment showed outstanding performance, with 100% fraud detection, an AUC of 1, and 100% accuracy; GA-ANN placed second with an AUC of 0.94. Future work needs more data to improve recent frameworks.

Chapter 3: Methodology

3.1 Background Information of Dataset

The dataset contains credit card transactions made by American cardholders in May 2018. It covers 10 days of transactions and includes 6,636 frauds out of 30,000 transactions. The dataset is imbalanced: the positive class (fraud) accounts for about 22% of all transactions. It contains only numerical input variables, which are the result of a PCA transformation. Features V1 to V28 are the principal components obtained from PCA; only the Time and Amount features have not been transformed. Time contains the seconds elapsed between each transaction and the first transaction in the dataset. Amount is the transaction amount, which can be used for example-dependent cost-sensitive learning. Class is the response variable and takes only the values 0 and 1, where 1 means fraud and 0 otherwise. The precision-recall curve is used to measure performance. The dataset was collected and analysed in a research collaboration between Worldline and the Machine Learning Group of ULB.

Chart Studio was installed with pip; it is a free, open-source Plotly library for Python that is used to draw charts such as scatter plots, line plots, area charts, and bar charts. CatBoost was installed to automate and simplify a task that must be performed consistently and on time. Automation reduces employees' workload, lets them focus on other important work, and helps customers who would otherwise wait a long time for a response. Pandas, NumPy, Matplotlib, and other packages were also installed.
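The precision-recall evaluation mentioned in this section can be sketched with scikit-learn's metrics module. This is a minimal illustration, assuming scikit-learn is available; the labels and scores are hypothetical and are not outputs of the study's model.

```python
from sklearn.metrics import average_precision_score, precision_recall_curve

# Hypothetical true labels (1 = fraud) and model scores for eight transactions.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.80, 0.95]

# Each threshold on the score gives one (precision, recall) pair.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# Average precision summarizes the curve in a single number.
ap = average_precision_score(y_true, y_score)

for p, r in zip(precision, recall):
    print(f"precision={p:.2f}  recall={r:.2f}")
print(f"average precision = {ap:.4f}")
```

On imbalanced data this curve is usually more informative than plain accuracy, because a model that labels every transaction as legitimate can still score a high accuracy.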

3.2 Discussing and justifying analytical techniques

This dataset was chosen mainly because it is imbalanced, which makes it a realistic basis for the study. Many banks now deploy advanced security measures, which makes it much harder for an attacker to get into their systems, but an attacker who finds a vulnerability will exploit it. That is why most transactions in the dataset are normal and only a few are fraudulent. The dataset comes from a real bank and covers European cardholders in 2013. For security reasons the actual data cannot be shared, so it is provided in a PCA-transformed version; otherwise RFE or RFECV would be recommended for feature selection, with VIF scores checked to justify the best features for a model.

The necessary packages are installed at the beginning to simplify the program and make it run successfully. Pandas is an open-source Python package used for data science and machine learning tasks. NumPy is installed to support multi-dimensional arrays. Matplotlib is a comprehensive, cross-platform library for creating static and interactive visualizations in Python, and it builds on NumPy. Seaborn is a Python package for statistical graphics that builds on Matplotlib and integrates with pandas; it is used to explore and understand the data. The Plotly figure factory is used where a plot is difficult to build from graph objects directly; it helps to create specific plot types. LightGBM is used for tree-based models because it is a fast, high-performance, distributed framework used for classification, ranking, and other machine learning tasks. The os module is used to create and remove directories and to fetch their contents. After these packages were installed, the dataset was imported into Python with pd.read_csv, which reads a CSV file.

Chapter 4: Findings

4.1 Key descriptive statistic with interpretation

Package install

(Source: Created by self)

Pandas, NumPy, Matplotlib (matplotlib.pyplot), Seaborn, Plotly graph objects, and the Plotly figure factory have been imported. Pandas, imported as pd, is used for data science and machine learning tasks and helps to analyse the dataset. NumPy provides multi-dimensional arrays, a powerful data structure that improves the speed and efficiency of calculations; array calculations make it practical to solve large, high-level mathematical problems on enormous data. Matplotlib is a cross-platform library for static and interactive data visualization in Python. It draws 2D plots from data in arrays and provides an object-oriented API for embedding plots in applications built with Python GUI toolkits. Seaborn is a Python package for statistical graphics that builds on Matplotlib and integrates with pandas. It is used to explore and understand the data, handles pandas data frames and arrays conveniently, and produces attractive plots from basic information. Matplotlib figures are organized into figure objects and axes, and the library offers several APIs for plotting. Plotly is used to make a range of charts and includes tools for building dashboards. Its figure factory module contains wrapper functions for specific chart types that are difficult to build directly from graph objects.

LightGBM (Light Gradient Boosting Machine) is used for tree-based models because it is fast, high-performing, and distributed; it is used for classification, ranking, and other machine learning tasks. Among the Python packages for classification and ranking that work with decision tree algorithms, it is one of the best. LightGBM grows trees leaf-wise, whereas other algorithms grow them depth-wise or level-wise, and leaf-wise growth tends to give better and more efficient results in boosting. The os module is used to create and remove directories and to fetch their contents. After these packages were installed, the dataset was imported into Python with pd.read_csv, which reads a CSV file. CatBoost is an algorithm for gradient boosting on decision trees. It is used in machine learning applications such as search, recommendation systems, self-driving cars, and weather forecasting. CatBoost is used here rather than XGBoost because it is reported to be 3.5 times faster.

4.2 Data visualizations with interpretation

Installing other packages

(Source: created by self)

Scikit-learn is one of the most widely used libraries in Python. It offers many tools for statistical modelling and machine learning, including dimensionality reduction, classification, clustering, and regression. Its model selection module is used to split the data into training and test sets in order to assess how well the model generalizes. The training set is used to fit the model, and the test set is used to make predictions. Holding out test data in this way helps to detect overfitting. K-fold cross-validation is used to estimate performance on data that the model has not seen; it resamples the data without replacement. Because the data are treated as categorical and randomly distributed, a random forest classifier is used to split them. An AdaBoost classifier is used to boost model performance.
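The split, fit, and cross-validation steps described above can be sketched as follows. This is an illustrative outline, assuming scikit-learn is available; the data come from `make_classification` and are a synthetic stand-in, not the study's dataset, and the hyperparameters are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in data with roughly 20% positives.
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.8, 0.2], random_state=0)

# Hold out 25% of the rows; stratify keeps the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Fit on the training set, score on the unseen test set.
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X_train, y_train)
test_accuracy = rf.score(X_test, y_test)

# 5-fold cross-validation on the training set for both ensembles.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
rf_scores = cross_val_score(rf, X_train, y_train, cv=cv)
ada_scores = cross_val_score(
    AdaBoostClassifier(n_estimators=50, random_state=0),
    X_train, y_train, cv=cv)

print(f"random forest test accuracy: {test_accuracy:.3f}")
print(f"random forest CV mean: {rf_scores.mean():.3f}")
print(f"AdaBoost CV mean: {ada_scores.mean():.3f}")
```

For strongly imbalanced labels, `StratifiedKFold` is a common substitute for `KFold`, since it keeps the fraud ratio similar in every fold.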

Column formatting

(Source: created by self)

In the program, the variable ranges from 0 to 25,690.26. Standardization is used to scale it to unit variance and to remove the mean, after which about 69% of the values lie between -1 and 1. As the picture above shows, the column display is formatted and the number of displayed columns is set to 100. The os module is imported to create a directory (folder), change the current directory, and fetch its contents. The dataset is then imported using data = pd.read_csv(r"D:\creditcard.csv").
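The standardization step described above subtracts the mean and divides by the standard deviation. The following is a minimal sketch, assuming NumPy is available; the `amount` values are hypothetical, normally distributed numbers rather than the real column.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical column of values (not the study's data).
amount = rng.normal(loc=100.0, scale=30.0, size=10_000)

# Standardization: remove the mean, then scale to unit variance.
amount_scaled = (amount - amount.mean()) / amount.std()

# Share of standardized values that fall inside [-1, 1].
share_within_one = np.mean(np.abs(amount_scaled) <= 1)

print(f"mean after scaling: {amount_scaled.mean():.6f}")
print(f"std after scaling:  {amount_scaled.std():.6f}")
print(f"share within [-1, 1]: {share_within_one:.3f}")
```

For roughly normal data about 68% of standardized values land in that interval; a heavily skewed column would give a different share. scikit-learn's `StandardScaler` applies the same formula column by column.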

Output of Data head

(Source: created by self)

The image above shows the first rows of the data, displayed using data.head().

Data information

(Source: Created by self)

The image above shows the information about the data returned by data.info(): the index, non-null count, and data type of every column.

Describe the data

(Source: Created by self)

The data.describe() command is used to describe the data. Its output gives summary statistics such as the mean, standard deviation, and percentiles for the numerical columns of the data frame.
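The inspection commands used in this section (data.head(), data.info(), data.describe()) can be tried on a small frame. This is a minimal sketch, assuming pandas is available; the four-row CSV text is a hypothetical excerpt that stands in for the imported file.

```python
import io

import pandas as pd

# Hypothetical three-column excerpt standing in for the imported CSV file.
csv_text = """Time,Amount,Class
0,149.62,0
1,2.69,0
2,378.66,1
4,123.50,0
"""

data = pd.read_csv(io.StringIO(csv_text))

print(data.head())          # first rows of the frame
data.info()                 # column names, non-null counts, dtypes
summary = data.describe()   # count, mean, std, min, quartiles, max
print(summary)
```

Reading from `io.StringIO` keeps the example self-contained; with a real file, the path would be passed to `pd.read_csv` instead.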

Return missing value

(Source: Created by self)

In the image above, data.isnull().sum() returns the number of missing values in each column of the dataset. The simplest way to handle data that contain missing values is to drop the rows that contain them.
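Counting and dropping missing values, as described above, can be sketched with pandas. The small frame below is hypothetical and only serves to show the calls.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing entry in each column.
data = pd.DataFrame({
    "Amount": [10.0, np.nan, 30.0, 40.0],
    "Class": [0, 0, np.nan, 1],
})

missing_per_column = data.isnull().sum()   # number of NaN values per column
has_missing = data.isna().any().any()      # True if any cell is missing
cleaned = data.dropna()                    # drop rows with any missing value

print(missing_per_column)
print("any missing:", has_missing)
print("rows before:", len(data), "rows after:", len(cleaned))
```

Dropping rows is the simplest option; when many rows are affected, filling the gaps (for example with `fillna`) may preserve more data.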

Bar plot

(Source: Created by self)

As the image above illustrates, a pandas data frame is a two-dimensional, size-mutable, heterogeneous tabular data structure with labelled rows and columns.

Result of fraud and not fraud

(Source: Created by self)

The image above shows the bar graph of fraud and non-fraud cases in the imported data. This is the final output of the program, which runs correctly: 23,364 transactions are not fraud and 6,636 are fraud.
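The class counts behind the bar graph can be reproduced with `value_counts`. This is a minimal sketch, assuming pandas is available; the label series is rebuilt from the two counts reported above rather than read from the original file.

```python
import pandas as pd

# Labels rebuilt from the reported counts: 23364 non-fraud, 6636 fraud.
labels = pd.Series([0] * 23364 + [1] * 6636, name="Class")

counts = labels.value_counts().sort_index()
fraud_share = counts[1] / counts.sum()

print(counts)
print(f"fraud share: {fraud_share:.2%}")

# counts.plot(kind="bar") would draw the bar chart if matplotlib is installed.
```

The share works out to 6636 / 30000 = 22.12%, which matches the figure of about 22% given for this dataset.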

4.3 Statistical analysis

In this model, correlation and regression have been carried out with the help of machine learning.

Finding null value

As the image above shows, the data contain no null values. The data.isna().sum() command was used to count the null values in the dataset.

Define age and sex value

The image above confirms that there are no null values, which is why the output is False. data.isna().any().any() is used to check whether any null value exists. The age and sex values are defined with reference to the column axis.

Linear Regression

As the image above shows, linear regression was performed on the x and y values. The model was created with the command reg = linear_model.LinearRegression(). To recheck the regression, the LinearRegression command was written again.

Coefficient of each target

In the image above, the coefficient of each target is obtained with the command reg.coef_.

Prediction of Array

The image above shows the predictions for an array, obtained with the command reg.predict(AGEtest).
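The three commands named in this section (linear_model.LinearRegression(), reg.coef_, reg.predict(AGEtest)) fit together as in the sketch below. It assumes scikit-learn and NumPy are available; `AGEtrain`, `target`, and the values in `AGEtest` are hypothetical and follow y = 2x + 1 exactly so that the fitted coefficient is easy to check.

```python
import numpy as np
from sklearn import linear_model

# Hypothetical training data following y = 2x + 1 exactly.
AGEtrain = np.array([[20], [30], [40], [50]])
target = np.array([41, 61, 81, 101])

# Hypothetical test inputs.
AGEtest = np.array([[25], [60]])

reg = linear_model.LinearRegression()
reg.fit(AGEtrain, target)

coefficients = reg.coef_             # one coefficient per input feature
predictions = reg.predict(AGEtest)   # predicted targets for the test inputs

print("coefficient:", coefficients)
print("intercept:", reg.intercept_)
print("predictions:", predictions)
```

Linear regression predicts a continuous value; for a binary fraud label, a classifier such as logistic regression is the more usual choice.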

Chapter 5: Discussion

5.1 Compare the findings against the literature

The whole project is done in two parts. The classification was carried out in the first part, where F is defined as F = {t1, t2, …, tn}. Of the algorithms in this project, ANN and RF give the highest test accuracy. In this dataset 22% of the cases are fraud, whereas the dataset used in the journal literature contains only 0.17% fraudulent cases.

5.2 Implications based on findings

The RF method was applied to the whole dataset and reached an accuracy of 97%. RF is therefore the best of the decision tree based methods. The process was completed once the class percentages had been obtained. With the dataset of European banks, a fraud rate of 22% was determined. The dataset was imported into a Jupyter notebook, and packages were imported to read the data and display the class percentages in the output. A machine learning approach based on decision tree regression was then used to complete the program and its design.

5.3 Limitations and recommendation of your research

The decision tree algorithm starts at a root node; decision nodes split the tree, and leaf nodes give the final decision. A perfect detection rate cannot be achieved with this dataset, because it was collected from Kaggle, and further research with it is limited by missing data values. For confidentiality reasons real banking details are not shared on the web, so a substitute dataset was used for this work, which lowers the achievable performance.

To improve this program, a proper dataset and appropriate machine learning methods are required. With that knowledge, every industry and banking-related sector should improve its security by applying machine learning to credit card fraud detection, which makes it more likely that card fraud can be prevented.

References

Carcillo, F., Le Borgne, Y.A., Caelen, O., Kessaci, Y., Oblé, F. and Bontempi, G., 2021. Combining unsupervised and supervised learning in credit card fraud detection. Information Sciences, 557, pp.317-331.
Dornadula, V.N. and Geetha, S., 2019. Credit card fraud detection using machine learning algorithms. Procedia Computer Science, 165, pp.631-641.
Fiore, U., De Santis, A., Perla, F., Zanetti, P. and Palmieri, F., 2019. Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Information Sciences, 479, pp.448-455.
Makki, S., Assaghir, Z., Taher, Y., Haque, R., Hacid, M.S. and Zeineddine, H., 2019. An experimental study with imbalanced classification approaches for credit card fraud detection. IEEE Access, 7, pp.93010-93022.
Maniraj, S.P., Saini, A., Ahmed, S. and Sarkar, S., 2019. Credit card fraud detection using machine learning and data science. International Journal of Engineering Research and, 8(09).
Randhawa, K., Loo, C.K., Seera, M., Lim, C.P. and Nandi, A.K., 2018. Credit card fraud detection using AdaBoost and majority voting. IEEE Access, 6, pp.14277-14284.
Taha, A.A. and Malebary, S.J., 2020. An intelligent approach to credit card fraud detection using an optimized light gradient boosting machine. IEEE Access, 8, pp.25579-25587.
Varmedja, D., Karanovic, M., Sladojevic, S., Arsenovic, M. and Anderla, A., 2019, March. Credit card fraud detection-machine learning methods. In 2019 18th International Symposium INFOTEH-JAHORINA (INFOTEH) (pp. 1-5). IEEE.
Xuan, S., Liu, G., Li, Z., Zheng, L., Wang, S. and Jiang, C., 2018, March. Random forest for credit card fraud detection. In 2018 IEEE 15th international conference on networking, sensing and control (ICNSC) (pp. 1-6). IEEE.
Yousefi, N., Alaghband, M. and Garibay, I., 2019. A comprehensive survey on machine learning techniques and user authentication approaches for credit card fraud detection. arXiv preprint arXiv:1912.02629.