Fraud detection analysis

Fraud is a billion-dollar business, and it grows every year. Data analysis methods are used to detect and prevent fraud, but performing this kind of analysis requires detailed domain knowledge of financial, economic, and business practices, as well as the law.

Techniques used for fraud detection include:

  • Data processing and validation techniques to detect and correct invalid data and to handle missing values.
  • Statistical modelling techniques to compute aggregates such as the mean and standard deviation across attributes such as time and geography.
  • Computing a baseline value for each user profile and statistically modelling base-case scenarios.
  • Time series analysis to track how transaction behaviour changes over time.
  • Clustering and classification techniques to find patterns and associated groups of data.
  • Matching algorithms to detect anomalies in transaction behaviors as compared with previous base case models and profiles.
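As a concrete illustration of the last point, a minimal sketch of matching new transactions against a user's baseline profile, using a z-score against the historical mean and standard deviation (the function name and the threshold of 3.0 are illustrative choices, not a standard):

```python
from statistics import mean, stdev

def flag_anomalies(history, new_amounts, z_threshold=3.0):
    """Flag new transaction amounts that deviate strongly from a
    user's historical baseline (mean/standard deviation profile)."""
    mu = mean(history)
    sigma = stdev(history)
    return [amt for amt in new_amounts
            if sigma > 0 and abs(amt - mu) / sigma > z_threshold]

# A user who normally spends around 50 suddenly spends 5000.
baseline = [45.0, 52.0, 48.0, 55.0, 50.0, 47.0]
print(flag_anomalies(baseline, [49.0, 5000.0]))  # [5000.0]
```

In practice the profile would be computed per attribute (time of day, geography, merchant category) rather than over raw amounts alone, as the bullets above suggest.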

Fraud detection requires various machine learning and data mining techniques such as:

  • Supervised learning (e.g. neural networks, support vector machines, and decision trees) to model fraudulent activities.
  • Unsupervised learning techniques, such as clustering, can be used to model fraud; examples include:
    • Peer group analysis
    • Break point analysis
    • Three level profiling
    • Multivariate normal distribution modelling of base cases where labelled fraud data is very sparse.
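One of the techniques above, break point analysis, looks for the moment an account's behaviour departs sharply from its own past. A minimal sketch, comparing a recent window against the account's earlier history (the window size and threshold are illustrative assumptions):

```python
from statistics import mean, stdev

def break_point(amounts, window=4, z_threshold=3.0):
    """Break point analysis: return a z-score if the mean of the most
    recent `window` transactions departs sharply from the account's
    earlier behaviour, else None."""
    if len(amounts) <= window:
        return None
    history = amounts[:-window]
    recent = amounts[-window:]
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return None
    z = abs(mean(recent) - mu) / sigma
    return z if z > z_threshold else None

# Spending jumps from ~20 per transaction to ~200: a clear break point.
print(break_point([20, 22, 19, 21, 20, 23, 21, 20, 200, 210, 195, 205]))
```

Peer group analysis works the same way, except the comparison baseline is the account's peer group rather than its own history.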

The various types of analysis employed to detect fraudulent data are:


Duplicate transaction analysis

  • Exact duplicates check: All fields are identical within a date range.
  • Fuzzy duplicates check: Some fields are identical, while at least one other field is similar but not identical.
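A sketch of the fuzzy duplicates check using the standard library's difflib for the similarity comparison (the field names and the 0.85 similarity threshold are hypothetical):

```python
from difflib import SequenceMatcher

def is_fuzzy_duplicate(rec_a, rec_b, exact_fields, fuzzy_fields, threshold=0.85):
    """Two records are fuzzy duplicates when the exact fields match
    and every fuzzy field is sufficiently similar."""
    if any(rec_a[f] != rec_b[f] for f in exact_fields):
        return False
    return all(
        SequenceMatcher(None, rec_a[f].lower(), rec_b[f].lower()).ratio() >= threshold
        for f in fuzzy_fields
    )

# Same amount and invoice number, vendor name off by one character.
a = {"vendor": "Acme Supplies Inc", "amount": "1200.00", "invoice": "INV-1001"}
b = {"vendor": "Acme Supplies Inc.", "amount": "1200.00", "invoice": "INV-1001"}
print(is_fuzzy_duplicate(a, b, exact_fields=["amount", "invoice"],
                         fuzzy_fields=["vendor"]))  # True
```

The exact duplicates check is the degenerate case where every field is in `exact_fields`.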

Data quality analysis

  • Key data elements are missing or invalid
  • Attribute values fall outside of the normal range.
  • Sequence gaps found in key fields, such as the check or payment number.
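Detecting sequence gaps in a key field such as the check number is straightforward; a minimal sketch (function name and sample data are illustrative):

```python
def sequence_gaps(numbers):
    """Report gaps in a field that should be sequential,
    e.g. check or payment numbers, as (first_missing, last_missing)."""
    seen = sorted(set(numbers))
    gaps = []
    for prev, curr in zip(seen, seen[1:]):
        if curr - prev > 1:
            gaps.append((prev + 1, curr - 1))
    return gaps

print(sequence_gaps([1001, 1002, 1003, 1007, 1008]))  # [(1004, 1006)]
```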

Transaction limits

  • Single and multiple accumulated values exceed limits.
  • Transaction amounts exceed, or are just below, the authorization limit.
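A sketch of the just-below-the-limit test, a common structuring pattern used to dodge approval thresholds (the 5% tolerance is an illustrative choice):

```python
def near_limit(amounts, limit, tolerance=0.05):
    """Flag amounts that exceed the authorization limit or sit just
    below it (within `tolerance` of the limit)."""
    floor = limit * (1 - tolerance)
    return [a for a in amounts if a > limit or floor <= a <= limit]

# 4990 sits just under a 5000 limit; 5200 exceeds it outright.
print(near_limit([100.0, 4990.0, 5200.0, 3000.0], limit=5000.0))
# [4990.0, 5200.0]
```

The accumulated-value test is analogous, applied to per-user or per-vendor sums over a period rather than single amounts.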

Character pattern matching analysis

  • Prohibited key words
  • Prohibited vendors/employees
    • Percent of names matched against a list of restricted names
  • Phonetic string match
  • Fuzzy address match
    • A portion of the address value is matched against a list of restricted addresses.
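The phonetic string match is commonly done with Soundex, which encodes names so that similar-sounding spellings hash to the same code. A compact sketch of the classic American Soundex encoding (a simplified implementation for illustration):

```python
def soundex(name):
    """American Soundex: first letter plus three digits encoding the
    following consonant sounds; similar-sounding names share a code."""
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    name = name.upper()
    encoded = [name[0]]
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            encoded.append(code)
        if ch not in "HW":  # H and W do not separate duplicate codes
            prev = code
    return ("".join(encoded) + "000")[:4]

print(soundex("Smith"), soundex("Smyth"))  # S530 S530 - a phonetic hit
```

Screening a payee file against a restricted-names list then reduces to comparing Soundex codes instead of raw strings.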

Segregation of duties (SoD) analysis

  • Performed at the security table level to identify potential conflicts
  • Performed at the transaction level to identify violations that occurred
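A sketch of the transaction-level SoD check (the duty names and the conflict pairs are a hypothetical policy; real conflict matrices come from the ERP security tables):

```python
# Pairs of duties that one person should not perform on the same transaction.
CONFLICTS = {("create_po", "approve_po"), ("create_invoice", "approve_payment")}

def sod_violations(actions):
    """actions: iterable of (user, transaction_id, duty) tuples.
    Return (user, transaction_id, duty_pair) wherever one user
    performed both sides of a conflicting pair on one transaction."""
    by_user_txn = {}
    for user, txn, duty in actions:
        by_user_txn.setdefault((user, txn), set()).add(duty)
    hits = []
    for (user, txn), duties in by_user_txn.items():
        for a, b in CONFLICTS:
            if a in duties and b in duties:
                hits.append((user, txn, (a, b)))
    return hits

# Alice both created and approved PO T1; T2 was properly split.
log = [("alice", "T1", "create_po"), ("alice", "T1", "approve_po"),
       ("bob", "T2", "create_po"), ("carol", "T2", "approve_po")]
print(sod_violations(log))  # [('alice', 'T1', ('create_po', 'approve_po'))]
```

The security-table-level check is the same idea applied to role assignments (what a user *could* do) rather than the transaction log (what a user *did*).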

Aging analysis

  • Single record age (number of days between Create Date and Approval Date)
  • Multiple files aging (Invoice Create Date prior to PO Create Date)
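Both aging checks are simple date arithmetic; a minimal sketch (function names and dates are illustrative):

```python
from datetime import date

def age_in_days(create_date, approval_date):
    """Single-record aging: days between Create Date and Approval Date."""
    return (approval_date - create_date).days

def invoice_before_po(invoice_date, po_date):
    """Multi-file aging red flag: an invoice created before its PO."""
    return invoice_date < po_date

print(age_in_days(date(2023, 3, 1), date(2023, 3, 15)))        # 14
print(invoice_before_po(date(2023, 2, 20), date(2023, 3, 1)))  # True
```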

Numeric pattern matching analysis

  • Benford analysis: transaction amounts fail to follow expected digit frequencies.
  • Numeric sequences or gaps, such as gaps in sequences of check numbers.
  • Frequent transactions with round (even-dollar) amounts.
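A sketch of the Benford first-digit test: under Benford's law the leading digit d appears with frequency log10(1 + 1/d), so fabricated amounts (which people tend to start with middling or high digits) stand out. The deviation measure below is a crude illustration, not a formal goodness-of-fit test:

```python
import math
from collections import Counter

def benford_deviation(amounts):
    """Return the largest absolute gap between the observed first-digit
    frequencies of the amounts and those predicted by Benford's law."""
    first_digits = [int(str(abs(a)).lstrip("0.")[0]) for a in amounts if a]
    counts = Counter(first_digits)
    n = len(first_digits)
    worst = 0.0
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)
        observed = counts.get(d, 0) / n
        worst = max(worst, abs(observed - expected))
    return worst

# Amounts that all start with 9 deviate wildly from Benford's law,
# which predicts a leading 9 only about 4.6% of the time.
print(benford_deviation([900, 950, 990, 910]))
```

Real deployments compare the full distribution with a chi-square or similar test over a large sample; single transactions cannot fail Benford.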

Date/time matching analysis

  • Transaction dates occur on a weekend or holiday.
  • Transactions occur at odd hours.
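A sketch of both date/time checks (the business-hours cut-offs are illustrative, not a standard, and a real check would also consult a holiday calendar):

```python
from datetime import datetime

def is_suspicious_time(ts, start_hour=6, end_hour=22):
    """Flag transactions on weekends or outside normal business hours."""
    weekend = ts.weekday() >= 5          # Monday=0 ... Saturday=5, Sunday=6
    odd_hour = not (start_hour <= ts.hour < end_hour)
    return weekend or odd_hour

print(is_suspicious_time(datetime(2023, 3, 11, 14, 0)))  # True: a Saturday
print(is_suspicious_time(datetime(2023, 3, 8, 3, 30)))   # True: 03:30
print(is_suspicious_time(datetime(2023, 3, 8, 10, 0)))   # False
```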

Variance tests and analysis

  • Comparison of the number and amount of variances to a yearly average:
    • Is there a product price variance spike?
    • Is there an excessive spike in vendor invoice counts?
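A sketch of the spike test for vendor invoice counts, comparing each month against the yearly average (the factor of 2.0 and the sample counts are illustrative assumptions):

```python
from statistics import mean

def variance_spike(monthly_counts, factor=2.0):
    """Return the 1-based month numbers whose count exceeds
    `factor` times the yearly average - a crude spike test."""
    avg = mean(monthly_counts)
    return [i + 1 for i, c in enumerate(monthly_counts) if c > factor * avg]

# Invoice counts hover around 40 all year, then September hits 180.
counts = [40, 42, 38, 41, 39, 40, 43, 40, 180, 41, 39, 42]
print(variance_spike(counts))  # [9]
```

The same function applied to monthly price-variance totals answers the product price variance question above.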