Fraud detection analysis

Fraud is a billion-dollar business, and it grows every year. Data analysis methods are used to detect and prevent fraud, but performing this kind of analysis requires detailed domain knowledge of financial, economic, and business practices, as well as the law.

Techniques used for fraud detection include:

  • Data processing and validation techniques to detect and correct invalid data and to handle missing values.
  • Statistical modelling techniques to compute aggregates such as the mean and standard deviation across attributes such as time and geography.
  • Computing a baseline value for each user profile and statistically modelling base-case scenarios.
  • Time series analysis to track how transaction behaviour changes over time.
  • Clustering and classification techniques to find patterns and associated groups of data.
  • Matching algorithms to detect anomalies in transaction behaviors as compared with previous base case models and profiles.
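As a concrete illustration of the last point, a minimal sketch of matching new transactions against a user's baseline profile, using a z-score against the historical mean and standard deviation (the function name and the threshold of 3.0 are illustrative choices, not a standard):

```python
from statistics import mean, stdev

def flag_anomalies(history, new_amounts, z_threshold=3.0):
    """Flag new transaction amounts that deviate strongly from a
    user's historical baseline (mean/standard deviation profile)."""
    mu = mean(history)
    sigma = stdev(history)
    return [amt for amt in new_amounts
            if sigma > 0 and abs(amt - mu) / sigma > z_threshold]

# A user who normally spends around 50 suddenly spends 5000.
baseline = [45.0, 52.0, 48.0, 55.0, 50.0, 47.0]
print(flag_anomalies(baseline, [49.0, 5000.0]))  # [5000.0]
```

In practice the profile would be computed per attribute (time of day, geography, merchant category) rather than over raw amounts alone, as the bullets above suggest.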

Fraud detection requires various machine learning and data mining techniques such as:

  • Supervised learning (e.g. neural networks, support vector machines, and decision trees) to model fraudulent activities.
  • Unsupervised learning techniques, such as clustering, can be used to model fraud; examples include:
    • Peer group analysis
    • Break point analysis
    • Three level profiling
    • Multivariate normal distribution modelling of base cases where labelled fraud data is very sparse.
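One of the techniques above, break point analysis, looks for the moment an account's behaviour departs sharply from its own past. A minimal sketch, comparing a recent window against the account's earlier history (the window size and threshold are illustrative assumptions):

```python
from statistics import mean, stdev

def break_point(amounts, window=4, z_threshold=3.0):
    """Break point analysis: return a z-score if the mean of the most
    recent `window` transactions departs sharply from the account's
    earlier behaviour, else None."""
    if len(amounts) <= window:
        return None
    history = amounts[:-window]
    recent = amounts[-window:]
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return None
    z = abs(mean(recent) - mu) / sigma
    return z if z > z_threshold else None

# Spending jumps from ~20 per transaction to ~200: a clear break point.
print(break_point([20, 22, 19, 21, 20, 23, 21, 20, 200, 210, 195, 205]))
```

Peer group analysis works the same way, except the comparison baseline is the account's peer group rather than its own history.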

The various types of analysis employed to detect fraudulent data are:


Duplicate transaction analysis

  • Exact duplicates check: All fields are identical within a date range.
  • Fuzzy duplicates check: Some fields are identical, while at least one other field is similar but not identical.
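A sketch of the fuzzy duplicates check using the standard library's difflib for the similarity comparison (the field names and the 0.85 similarity threshold are hypothetical):

```python
from difflib import SequenceMatcher

def is_fuzzy_duplicate(rec_a, rec_b, exact_fields, fuzzy_fields, threshold=0.85):
    """Two records are fuzzy duplicates when the exact fields match
    and every fuzzy field is sufficiently similar."""
    if any(rec_a[f] != rec_b[f] for f in exact_fields):
        return False
    return all(
        SequenceMatcher(None, rec_a[f].lower(), rec_b[f].lower()).ratio() >= threshold
        for f in fuzzy_fields
    )

# Same amount and invoice number, vendor name off by one character.
a = {"vendor": "Acme Supplies Inc", "amount": "1200.00", "invoice": "INV-1001"}
b = {"vendor": "Acme Supplies Inc.", "amount": "1200.00", "invoice": "INV-1001"}
print(is_fuzzy_duplicate(a, b, exact_fields=["amount", "invoice"],
                         fuzzy_fields=["vendor"]))  # True
```

The exact duplicates check is the degenerate case where every field is in `exact_fields`.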

Data quality analysis

  • Key data elements are missing or invalid
  • Attribute values fall outside of the normal range.
  • Sequence gaps found in key fields, such as the check or payment number.
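Detecting sequence gaps in a key field such as the check number is straightforward; a minimal sketch (function name and sample data are illustrative):

```python
def sequence_gaps(numbers):
    """Report gaps in a field that should be sequential,
    e.g. check or payment numbers, as (first_missing, last_missing)."""
    seen = sorted(set(numbers))
    gaps = []
    for prev, curr in zip(seen, seen[1:]):
        if curr - prev > 1:
            gaps.append((prev + 1, curr - 1))
    return gaps

print(sequence_gaps([1001, 1002, 1003, 1007, 1008]))  # [(1004, 1006)]
```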

Transaction limits

  • Single and multiple accumulated values exceed limits.
  • Transaction amounts exceed, or are just below, the authorization limit.
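A sketch of the just-below-the-limit test, a common structuring pattern used to dodge approval thresholds (the 5% tolerance is an illustrative choice):

```python
def near_limit(amounts, limit, tolerance=0.05):
    """Flag amounts that exceed the authorization limit or sit just
    below it (within `tolerance` of the limit)."""
    floor = limit * (1 - tolerance)
    return [a for a in amounts if a > limit or floor <= a <= limit]

# 4990 sits just under a 5000 limit; 5200 exceeds it outright.
print(near_limit([100.0, 4990.0, 5200.0, 3000.0], limit=5000.0))
# [4990.0, 5200.0]
```

The accumulated-value test is analogous, applied to per-user or per-vendor sums over a period rather than single amounts.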

Character pattern matching analysis

  • Prohibited key words
  • Prohibited vendors/employees
    • Percent of names matched against a list of restricted names
  • Phonetic string match
  • Fuzzy address match
    • A portion of the address value is matched against a list of restricted addresses.
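The phonetic string match is commonly done with Soundex, which encodes names so that similar-sounding spellings hash to the same code. A compact sketch of the classic American Soundex encoding (a simplified implementation for illustration):

```python
def soundex(name):
    """American Soundex: first letter plus three digits encoding the
    following consonant sounds; similar-sounding names share a code."""
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    name = name.upper()
    encoded = [name[0]]
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            encoded.append(code)
        if ch not in "HW":  # H and W do not separate duplicate codes
            prev = code
    return ("".join(encoded) + "000")[:4]

print(soundex("Smith"), soundex("Smyth"))  # S530 S530 - a phonetic hit
```

Screening a payee file against a restricted-names list then reduces to comparing Soundex codes instead of raw strings.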

Segregation of duties (SoD) analysis

  • Performed at the security table level to identify potential conflicts
  • Performed at the transaction level to identify violations that occurred
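A sketch of the transaction-level SoD check (the duty names and the conflict pairs are a hypothetical policy; real conflict matrices come from the ERP security tables):

```python
# Pairs of duties that one person should not perform on the same transaction.
CONFLICTS = {("create_po", "approve_po"), ("create_invoice", "approve_payment")}

def sod_violations(actions):
    """actions: iterable of (user, transaction_id, duty) tuples.
    Return (user, transaction_id, duty_pair) wherever one user
    performed both sides of a conflicting pair on one transaction."""
    by_user_txn = {}
    for user, txn, duty in actions:
        by_user_txn.setdefault((user, txn), set()).add(duty)
    hits = []
    for (user, txn), duties in by_user_txn.items():
        for a, b in CONFLICTS:
            if a in duties and b in duties:
                hits.append((user, txn, (a, b)))
    return hits

# Alice both created and approved PO T1; T2 was properly split.
log = [("alice", "T1", "create_po"), ("alice", "T1", "approve_po"),
       ("bob", "T2", "create_po"), ("carol", "T2", "approve_po")]
print(sod_violations(log))  # [('alice', 'T1', ('create_po', 'approve_po'))]
```

The security-table-level check is the same idea applied to role assignments (what a user *could* do) rather than the transaction log (what a user *did*).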

Aging analysis

  • Single record age (number of days between Create Date and Approval Date)
  • Multiple files aging (Invoice Create Date prior to PO Create Date)
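Both aging checks are simple date arithmetic; a minimal sketch (function names and dates are illustrative):

```python
from datetime import date

def age_in_days(create_date, approval_date):
    """Single-record aging: days between Create Date and Approval Date."""
    return (approval_date - create_date).days

def invoice_before_po(invoice_date, po_date):
    """Multi-file aging red flag: an invoice created before its PO."""
    return invoice_date < po_date

print(age_in_days(date(2023, 3, 1), date(2023, 3, 15)))        # 14
print(invoice_before_po(date(2023, 2, 20), date(2023, 3, 1)))  # True
```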

Numeric pattern matching analysis

  • Benford analysis: transaction amounts fail to follow expected digit frequencies.
  • Numeric sequences or gaps, such as gaps in sequences of check numbers.
  • Frequent transactions with round (even-dollar) amounts.
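A sketch of the Benford first-digit test: under Benford's law the leading digit d appears with frequency log10(1 + 1/d), so fabricated amounts (which people tend to start with middling or high digits) stand out. The deviation measure below is a crude illustration, not a formal goodness-of-fit test:

```python
import math
from collections import Counter

def benford_deviation(amounts):
    """Return the largest absolute gap between the observed first-digit
    frequencies of the amounts and those predicted by Benford's law."""
    first_digits = [int(str(abs(a)).lstrip("0.")[0]) for a in amounts if a]
    counts = Counter(first_digits)
    n = len(first_digits)
    worst = 0.0
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)
        observed = counts.get(d, 0) / n
        worst = max(worst, abs(observed - expected))
    return worst

# Amounts that all start with 9 deviate wildly from Benford's law,
# which predicts a leading 9 only about 4.6% of the time.
print(benford_deviation([900, 950, 990, 910]))
```

Real deployments compare the full distribution with a chi-square or similar test over a large sample; single transactions cannot fail Benford.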

Date/time matching analysis

  • Transaction dates occur on a weekend or holiday.
  • Transactions occur at odd hours.
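A sketch of both date/time checks (the business-hours cut-offs are illustrative, not a standard, and a real check would also consult a holiday calendar):

```python
from datetime import datetime

def is_suspicious_time(ts, start_hour=6, end_hour=22):
    """Flag transactions on weekends or outside normal business hours."""
    weekend = ts.weekday() >= 5          # Monday=0 ... Saturday=5, Sunday=6
    odd_hour = not (start_hour <= ts.hour < end_hour)
    return weekend or odd_hour

print(is_suspicious_time(datetime(2023, 3, 11, 14, 0)))  # True: a Saturday
print(is_suspicious_time(datetime(2023, 3, 8, 3, 30)))   # True: 03:30
print(is_suspicious_time(datetime(2023, 3, 8, 10, 0)))   # False
```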

Variance tests and analysis

  • Comparison of the number and amount of variances to a yearly average:
    • Is there a product price variance spike?
    • Is there an excessive spike in vendor invoice counts?
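A sketch of the spike test for vendor invoice counts, comparing each month against the yearly average (the factor of 2.0 and the sample counts are illustrative assumptions):

```python
from statistics import mean

def variance_spike(monthly_counts, factor=2.0):
    """Return the 1-based month numbers whose count exceeds
    `factor` times the yearly average - a crude spike test."""
    avg = mean(monthly_counts)
    return [i + 1 for i, c in enumerate(monthly_counts) if c > factor * avg]

# Invoice counts hover around 40 all year, then September hits 180.
counts = [40, 42, 38, 41, 39, 40, 43, 40, 180, 41, 39, 42]
print(variance_spike(counts))  # [9]
```

The same function applied to monthly price-variance totals answers the product price variance question above.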