Since the pandemic limited physical contact and in-person shopping, many businesses have been forced to transform digitally to keep up with the rise in e-commerce. Fraud has risen significantly alongside this shift (LexisNexis report on the fraud trend). According to a report published by Aite Group, losses from identity theft increased by 42% from 2019 to 2020. In the US, fraud attempts increased by 25% during the first four months of 2021 compared with the last four months of 2020 (report by TransUnion). The same TransUnion report recorded a dramatic shift in fraud targets: financial services saw the largest increase in fraud, at about 150%, followed by travel & leisure at 25.03%. The most frequent types of fraud in these two industries are identity theft and credit card fraud. Conversely, fraud decreased substantially in logistics and insurance (-32.74% and -16.35%, respectively).
Fraud is more sophisticated with ML
Fraud is becoming more sophisticated and harder to detect, especially in the digital era, when most transactions (retail, finance, banking, etc.) are made online. With new technology, fraudsters can write tools that automatically find matching credentials and exploit them, and can build automated agents to scam victims. In the UK, fraudsters have reportedly used automated bots to contact victims. These bots typically pretend to represent an authority, ask victims for their personal information, and threaten them with a court order or warrant if they do not cooperate. In more advanced attempts, fraudsters use Deep Learning (a black-box approach based on neural networks) to generate fake identities. In one recently reported case, an energy firm lost €220,000 after fraudsters used AI to mimic the CEO’s voice and order a transfer of funds to a Hungarian supplier.
Why machine learning
To combat increasingly sophisticated fraud methods, organizations must quickly adopt the latest technologies, especially AI. Traditional fraud detection, built on hand-crafted conditional rules and human review, tends to be slow and may not be powerful enough to catch modern fraud methods, which are growing in volume. Machine Learning can help detect fraud, enabling fraud investigators to identify fraudulent activity much faster and more efficiently.
Over the past few years, many banks and financial institutions have started adopting AI as an integral part of their fraud detection capability. Traditional binary classification algorithms, such as ensemble methods (bagging, boosting) or even Logistic Regression, have been widely used to build predictive models that learn from large, complex transaction histories. Some organizations have also had great success with unsupervised learning, using it to find anomalies that can then be flagged as potential fraud. In more advanced cases, Deep Learning has been causing a buzz in the fraud detection landscape thanks to its strong predictive power.
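As a small illustration of the unsupervised idea, the sketch below flags unusual transaction amounts with a robust (median/MAD-based) modified z-score instead of a full anomaly-detection model such as an Isolation Forest. It assumes a single numeric feature per customer; the function names, the sample amounts, and the 3.5 cutoff (the rule of thumb suggested by Iglewicz and Hoaglin) are illustrative choices rather than a production recipe.

```python
import math
import statistics


def robust_z_scores(values):
    """Modified z-scores (Iglewicz & Hoaglin): 0.6745 * (x - median) / MAD."""
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that a few huge outliers cannot inflate.
    mad = statistics.median(abs(x - median) for x in values)
    if mad == 0:
        # More than half the values equal the median, so any deviation counts as extreme.
        return [0.0 if x == median else math.copysign(math.inf, x - median) for x in values]
    return [0.6745 * (x - median) / mad for x in values]


def flag_anomalies(values, threshold=3.5):
    """Indices of values whose |modified z-score| exceeds the threshold."""
    return [i for i, z in enumerate(robust_z_scores(values)) if abs(z) > threshold]


if __name__ == "__main__":
    # Hypothetical per-customer transaction amounts; the 980.00 charge stands out.
    amounts = [42.0, 38.5, 40.0, 45.2, 39.9, 41.3, 980.0, 43.1]
    print(flag_anomalies(amounts))
```

The median and MAD are used instead of the mean and standard deviation because a single large fraudulent charge would otherwise inflate the spread and partly mask itself.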
Machine Learning models (either supervised or unsupervised) can perform exceptionally well on the dataset used to build them. However, as fraud becomes more sophisticated and changes quickly over time, models in production tend not to perform as well as they do in the lab, producing a high number of false alarms. Furthermore, building a model takes data scientists considerable time, effort, and experimentation, so by the time it reaches production the data it learned from is already aging, and the model starts to decay as soon as it is deployed.
Banks and financial institutions also require transparency in decision-making. In particular, when an application is rejected, they need valid reasons to explain the rejection to the applicant. This becomes very challenging as Machine Learning models grow more complex (e.g., using many features or deep learning).
To address model decay, banks and financial organizations must shorten the process of building and productionizing their Machine Learning models. Infrastructure built on CI/CD and MLOps becomes essential for shortening the cycle from training a new model to serving it. Models should therefore be retrained continuously in an automated data pipeline so that they stay relevant to new data.
Banks and financial organizations must also closely monitor their in-production models over time to detect any degradation in performance. Alongside model monitoring, drift monitoring (covering both feature drift and concept drift) is crucial for promptly spotting changes in the incoming data, which are generally what causes models to decay.
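Feature drift is often tracked with the Population Stability Index (PSI), which compares how a feature's values are distributed across bins in a reference sample (e.g. training data) versus recent traffic. The sketch below is a minimal, stdlib-only version under assumed choices: decile bins taken from the reference sample, a small `eps` floor for empty bins, and the hypothetical function name `population_stability_index`.

```python
import math
from bisect import bisect_right


def population_stability_index(reference, recent, n_bins=10, eps=1e-6):
    """PSI of one numeric feature: sum over bins of (recent% - ref%) * ln(recent% / ref%)."""
    if not reference or not recent:
        raise ValueError("both samples must be non-empty")
    ref_sorted = sorted(reference)
    # Interior bin edges at reference quantiles, so each bin holds roughly 1/n_bins of the reference.
    edges = [ref_sorted[len(ref_sorted) * k // n_bins] for k in range(1, n_bins)]

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(c / len(values), eps) for c in counts]

    ref_shares, recent_shares = bin_shares(reference), bin_shares(recent)
    return sum((r - e) * math.log(r / e) for e, r in zip(ref_shares, recent_shares))


if __name__ == "__main__":
    training_amounts = [float(i) for i in range(1000)]    # hypothetical reference sample
    this_week = [x + 500 for x in training_amounts]       # hypothetical shifted traffic
    print(round(population_stability_index(training_amounts, this_week), 2))
```

A frequently quoted rule of thumb reads PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as a significant shift; these cutoffs are heuristics, not a standard. PSI only covers feature drift: concept drift (a change in how features relate to fraud) has to be caught by tracking model metrics such as precision and recall as confirmed fraud labels arrive.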
Building a data pipeline for model serving is equally important. New models should be registered, and their performance in serving should be logged thoroughly. Additionally, a model interpretation pipeline (using SHAP and LIME) is needed to quickly provide applicants with reasons when a decision goes against them. Google has recently published an architecture for building and serving models with MLOps, CI/CD, and more.
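SHAP's explanations are Shapley values from cooperative game theory: each feature's average marginal contribution to the score across all coalitions of the other features. For a model with only a few features they can be computed exactly by brute force, as in the sketch below. Here, features outside a coalition are replaced by a single baseline point (a simplification of SHAP's background dataset), and `risk_score`, its three features, and the baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature for model(x) relative to model(baseline).

    Features outside a coalition take their baseline value. All 2^n coalitions are
    enumerated, so this is only practical for a handful of features.
    """
    n = len(x)

    def coalition_value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (coalition_value(s | {i}) - coalition_value(s))
        phi.append(contribution)
    return phi


# Hypothetical risk model over [amount_vs_usual, new_device, foreign_ip].
def risk_score(features):
    amount_ratio, new_device, foreign_ip = features
    return 0.5 * amount_ratio + 2.0 * new_device + 1.5 * foreign_ip + 1.0 * new_device * foreign_ip


if __name__ == "__main__":
    names = ["amount_vs_usual", "new_device", "foreign_ip"]
    applicant = [3.0, 1, 1]
    typical = [1.0, 0, 0]  # hypothetical "typical customer" baseline
    phi = shapley_values(risk_score, applicant, typical)
    # Rank features by how much they pushed the score up: candidate reason codes.
    for name, value in sorted(zip(names, phi), key=lambda p: p[1], reverse=True):
        print(f"{name}: {value:+.2f}")
```

The top-ranked contributions can serve as reason codes for a rejected applicant. In this toy model, the interaction term `new_device * foreign_ip` (worth 1.0) is split evenly between its two features, giving 2.5 and 2.0 rather than their standalone weights of 2.0 and 1.5, and the contributions always sum to `risk_score(applicant) - risk_score(typical)`. Because the loop visits every coalition, real systems use SHAP's approximations (or model-specific algorithms such as TreeSHAP) instead.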
Beyond detecting fraud, Machine Learning can also indirectly benefit the fraud-handling process. For instance, some applications use NLP to suggest text for case handlers to write, speeding up their write-ups. Machine Learning can also suggest similar cases so that they can all be assigned to the same case handler, making the review process more efficient. Within fraud detection itself, reinforcement learning can be used to tackle imbalanced data by learning to generate fraud-like data.
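One simple way to suggest similar cases is to compare the identifiers they share (device, IP address, card BIN, and so on). The sketch below uses Jaccard similarity over sets of such attributes as a stand-in for whatever learned similarity a production system might use; the `type:value` attribute format, the case structure, and the 0.5 threshold are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity (size of intersection / size of union); 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_handler(new_case, open_cases, threshold=0.5):
    """Return (handler, case_id, score) for the most similar open case, or None below threshold."""
    best_id, best_score = None, 0.0
    for case_id, case in open_cases.items():
        score = jaccard(new_case, case["attributes"])
        if score > best_score:
            best_id, best_score = case_id, score
    if best_id is None or best_score < threshold:
        return None
    return open_cases[best_id]["handler"], best_id, best_score


if __name__ == "__main__":
    # Hypothetical open cases keyed by case ID; attributes are "type:value" identifiers.
    open_cases = {
        "C-101": {"handler": "alice", "attributes": {"device:d-9f2", "ip:203.0.113.7", "bin:411111"}},
        "C-102": {"handler": "bob", "attributes": {"device:d-1aa", "ip:198.51.100.4", "bin:550000"}},
    }
    new_case = {"device:d-9f2", "ip:203.0.113.7", "bin:400000"}
    print(suggest_handler(new_case, open_cases))
```

Here the new case shares a device and an IP address with `C-101` (2 of 4 distinct attributes, a similarity of 0.5), so it would be routed to the same handler; a case sharing nothing returns `None` and would go through normal assignment.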
Machine Learning has become an important part of the fraud detection landscape. However, as fraud grows more sophisticated and fraudsters quickly adopt new technologies, including Machine Learning, it is critical that banks and financial organizations act promptly to build scalable, powerful Machine Learning pipelines that identify fraud efficiently and effectively. Consumers also need to protect themselves in the digital era. Enabling two-factor authentication is a must on any website that stores personal information, especially bank account or card details. Consumers should also never share personal information over the phone unless the caller's identity can be verified.
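Most authenticator apps used for two-factor authentication generate time-based one-time passwords (TOTP, RFC 6238), which apply HMAC-based one-time passwords (HOTP, RFC 4226) to a counter derived from the current time. The sketch below shows how such a code is derived and checked with only the Python standard library; it assumes the default SHA-1, 30-second step, and 6-digit settings, uses the RFC test secret, and is meant to illustrate the mechanism rather than serve as a hardened implementation.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 of an 8-byte big-endian counter, dynamically truncated."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low nibble of the last byte picks a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP over the number of whole time steps since the Unix epoch."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify_totp(secret: bytes, code: str, at: Optional[float] = None,
                step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept a code from the current time step or up to `window` steps either side."""
    now = time.time() if at is None else at
    counter = int(now // step)
    return any(
        hmac.compare_digest(hotp(secret, c, digits), code)
        for c in range(counter - window, counter + window + 1)
        if c >= 0
    )


if __name__ == "__main__":
    secret = b"12345678901234567890"  # RFC 4226 / RFC 6238 test secret, never a real one
    print(totp(secret))
```

The small `window` tolerates clock skew between the phone and the server, and `hmac.compare_digest` avoids leaking timing information during the comparison.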