<span id="version" style="color: grey; float: right">Version 1.1.1</span>

# Context
 
Anti-Money Laundering (AML) processes usually chain together a few sequential components, including [KYC](https://en.wikipedia.org/wiki/Know_your_customer), alert generation, investigation and case management. A major pain point in this process is the large share of false positives produced by alert generation. By regulation, all alerts must be processed, and modifying rule-generated alerts can be a complex task. Finding a way to prioritize alerts therefore lets compliance officers work far more efficiently, which is the purpose of this project.
 
# Alert Generation
 
One of the main parts of an AML setup is the transaction monitoring that triggers alerts. Customer information is first collected and organised using KYC and Entity Resolution. This first step makes it possible to group customers with similar behaviours into segments to which the same rules can apply. Compliance officers then use their expertise to engineer the features that help discriminate legitimate transactions from illegal ones, and set the thresholds that trigger the alerts. An additional layer of intelligence can be added with machine learning rules based on unsupervised learning techniques for anomaly detection. This process runs in regular batches and produces alerts that investigators must either escalate, meaning report to the regulatory authorities, or close.
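
The rule-based part of this step can be illustrated with a minimal sketch. It assumes a single engineered feature (the total amount moved in a batch window) and one threshold per customer segment. The segment names, threshold values, `CustomerActivity` class and `generate_alerts` function are all hypothetical and are not taken from this project.

```python
from dataclasses import dataclass

# Hypothetical per-segment thresholds on one engineered feature.
# Real thresholds are set by compliance officers; these values are illustrative.
SEGMENT_THRESHOLDS = {"retail": 10_000.0, "small_business": 50_000.0}


@dataclass
class CustomerActivity:
    customer_id: str
    segment: str
    total_amount: float  # feature computed over the batch window


def generate_alerts(batch):
    """Return the activities whose feature exceeds their segment's threshold."""
    alerts = []
    for activity in batch:
        threshold = SEGMENT_THRESHOLDS[activity.segment]
        if activity.total_amount > threshold:
            alerts.append(activity)
    return alerts


# One batch run over made-up data.
batch = [
    CustomerActivity("c1", "retail", 2_500.0),
    CustomerActivity("c2", "retail", 12_000.0),
    CustomerActivity("c3", "small_business", 30_000.0),
    CustomerActivity("c4", "small_business", 75_000.0),
]
alerts = generate_alerts(batch)
print([a.customer_id for a in alerts])  # ['c2', 'c4']
```

Keeping thresholds per segment means the same rule can behave differently for customers with different expected behaviours, which is the reason segmentation comes first.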
 
# Alert Prioritization
 
Once alerts are triggered, compliance teams are legally obliged to process them all. Unfortunately, most alerts (around 95%) are false alerts, so a disproportionate amount of time is spent closing irrelevant ones. As alerts are investigated, they can be labelled over time with the investigator's conclusion. These labelled alerts make it possible to build a two-class classification model that discriminates between real and false alerts. The model also outputs a probability, which is compared with a threshold to produce the two-class prediction. The probability of an alert being escalated can be interpreted as a priority score: alerts with values closest to one should be investigated first, while those close to zero can be left in the backlog.
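
As a rough sketch of how the probability serves both purposes, the snippet below assumes the classifier has already scored a handful of alerts. The alert identifiers, the scores, the `0.5` threshold and the helper names `classify` and `prioritize` are illustrative assumptions, not values or functions from this project.

```python
# Hypothetical model outputs: alert id -> predicted probability of escalation.
scores = {
    "A-101": 0.03,
    "A-102": 0.91,
    "A-103": 0.47,
    "A-104": 0.08,
    "A-105": 0.66,
}

THRESHOLD = 0.5  # illustrative cut-off; in practice tuned on validation data


def classify(probability, threshold=THRESHOLD):
    """Turn a probability into the two-class prediction."""
    if probability >= threshold:
        return "likely escalation"
    return "likely false alert"


def prioritize(scores):
    """Order alert ids from highest to lowest escalation probability."""
    return sorted(scores, key=scores.get, reverse=True)


predictions = {alert_id: classify(p) for alert_id, p in scores.items()}
queue = prioritize(scores)
print(queue)  # ['A-102', 'A-105', 'A-103', 'A-104', 'A-101']
```

The ordering only changes when alerts are reviewed, not whether they are reviewed: every alert stays in the queue, with the least likely escalations at the end.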
 
# Model Explainability
  
AML in financial institutions is a regulated domain, where models are required to be extensively explained, understood and documented. Model explainability for AML is described in this [document](https://www.researchgate.net/publication/352145844_Deep_Learning_and_Explainable_Artificial_Intelligence_Techniques_Applied_for_Detecting_Money_Laundering-A_Critical_Review). This does not mean that models must be white box: more complex models can be considered, but they must be analysed to understand precisely how they work. Performance is therefore not the only criterion that is assessed. Models are described through global explanations for the big picture, then investigated further with local explanations. Model drift is also monitored over time to make sure performance remains acceptable.
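
Drift monitoring can take many forms. One simple possibility, sketched below, is to recompute a performance metric on each new batch of investigated alerts and flag the periods where it falls below an agreed floor. The choice of recall as the metric, the monthly batches, the `0.5` threshold and the `0.6` floor are all assumptions made for illustration.

```python
# Hypothetical monthly batches of investigated alerts:
# each item is (predicted probability, did the investigator escalate?).
batches = {
    "2023-01": [(0.9, True), (0.8, True), (0.2, False), (0.1, False), (0.6, False)],
    "2023-02": [(0.7, True), (0.4, True), (0.3, False), (0.2, False), (0.9, True)],
    "2023-03": [(0.3, True), (0.2, True), (0.6, False), (0.1, False), (0.8, True)],
}

THRESHOLD = 0.5   # illustrative classification cut-off
MIN_RECALL = 0.6  # illustrative minimum acceptable recall


def recall(batch, threshold=THRESHOLD):
    """Share of escalated alerts that the model scored at or above the threshold."""
    positives = [score for score, escalated in batch if escalated]
    if not positives:
        return None  # recall is undefined without any escalated alert
    caught = sum(1 for score in positives if score >= threshold)
    return caught / len(positives)


recall_by_month = {month: recall(batch) for month, batch in batches.items()}
drifting = [
    month
    for month, value in recall_by_month.items()
    if value is not None and value < MIN_RECALL
]
print(drifting)  # ['2023-03']
```

Recall is used here because a missed escalation is usually the costlier error in this setting; other metrics, or a comparison of feature distributions between periods, could be monitored in the same way.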
 