ATM Fraud detection with Kafka and KSQL

A presentation at IT Tage in December 2018 in Frankfurt, Germany by Robin Moffatt

Slide 1

Slide 1

with Apac he Kafka and KSQL @rmoff Photo by Freddie Collins on Unsplash M T A D d u a r F n o i t c e et

Slide 2

Slide 2

hoto by Mirza Babic on Unsplash Spotting fraud in realtime @rmoff ATM Fraud Detection with Apache Kafka and KSQL

Slide 3

Slide 3

Photo by Lasaye Hommes on Unsplash

Slide 4

Slide 4

Inbound stream of ATM data @rmoff • Account id • Location • Amount • https://github.com/rmoff/gess ATM Fraud Detection with Apache Kafka and KSQL

Slide 5

Slide 5

@rmoff Demo! ATM Fraud Detection with Apache Kafka and KSQL

Slide 6

Slide 6

Spot patterns within this stream Ac. ID A42 Transaction ID Time xxx116d91d6-ef17 11:56:58 @rmoff ATM Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds ATM Fraud Detection with Apache Kafka and KSQL

Slide 7

Slide 7

@rmoff Spot patterns within this stream Ac. ID A42 Transaction ID Time xxx116d91d6-ef17 11:56:58 ATM Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds Legit Legit ATM Fraud Detection with Apache Kafka and KSQL

Slide 8

Slide 8

@rmoff Spot patterns within this stream Ac. ID A42 Transaction ID Time xxx116d91d6-ef17 11:56:58 ATM Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds Legit Dodgy! Legit ATM Fraud Detection with Apache Kafka and KSQL

Slide 9

Slide 9

@rmoff Spot patterns within this stream Ac. ID A42 Transaction ID Time xxx116d91d6-ef17 11:56:58 ATM Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds Legit Dodgy! Legit ATM Fraud Detection with Apache Kafka and KSQL

Slide 10

Slide 10

Inbound stream of ATM data @rmoff • Account id • Location • Amount • https://github.com/rmoff/gess ATM Fraud Detection with Apache Kafka and KSQL

Slide 11

Slide 11

KSQL : Stream Processing with SQL @rmoff SELECT TXN_ID, ATM, CUSTOMER_NAME, CUSTOMER_PHONE FROM ATM_POSSIBLE_FRAUD; ATM Fraud Detection with Apache Kafka and KSQL

Slide 12

Slide 12

@rmoff ATM Fraud Detection with Apache Kafka and KSQL

Slide 13

Slide 13

@rmoff Elasticsearch Customer details ATM fraud txns with customer details Notification service

  1. Spot fraud in stream of transactions 2.Enrich transaction events with customer data ATM Fraud Detection with Apache Kafka and KSQL

Slide 14

Slide 14

@rmoff KSQL is the Streaming SQL Engine for Apache Kafka ATM Fraud Detection with Apache Kafka and KSQL

Slide 15

Slide 15

KSQL for Real-Time Monitoring @rmoff • Log data monitoring, tracking and alerting • syslog data • Sensor / IoT data CREATE STREAM SYSLOG_INVALID_USERS AS SELECT HOST, MESSAGE FROM SYSLOG WHERE MESSAGE LIKE ‘%Invalid user%’; http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting ATM Fraud Detection with Apache Kafka and KSQL

Slide 16

Slide 16

KSQL for Streaming ETL @rmoff Joining, filtering, and aggregating streams of event data CREATE STREAM vip_actions AS SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id WHERE u.level = ‘Platinum’; ATM Fraud Detection with Apache Kafka and KSQL

Slide 17

Slide 17

KSQL for Anomaly Detection @rmoff Identifying patterns or anomalies in real-time data, surfaced in milliseconds CREATE TABLE possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 SECONDS) GROUP BY card_number HAVING count() > 3; ATM Fraud Detection with Apache Kafka and KSQL

Slide 18

Slide 18

KSQL for Data Transformation @rmoff Make simple derivations of existing topics from the command line CREATE STREAM pageviews WITH (PARTITIONS=4, VALUE_FORMAT=’AVRO’) AS SELECT * FROM pageviews_json; ATM Fraud Detection with Apache Kafka and KSQL

Slide 19

Slide 19

@rmoff KSQL in Development and Production Interactive KSQL for development and testing Headless KSQL for Production REST Desired KSQL queries have been identified “Hmm, let me try out this idea…” ATM Fraud Detection with Apache Kafka and KSQL

Slide 20

Slide 20

Stream Stream joins Orders @rmoff Shipments order.id = shipment.order_id Leadtime shipment_ts - order_ts Which orders haven’t shipped? ATM Fraud Detection with Apache Kafka and KSQL

Slide 21

Slide 21

Stream Stream joins @rmoff ATM transactions ATM Fraud Detection with Apache Kafka and KSQL

Slide 22

Slide 22

Stream Stream joins @rmoff ATM transactions ATM Fraud Detection with Apache Kafka and KSQL

Slide 23

Slide 23

@rmoff Demo! ATM Fraud Detection with Apache Kafka and KSQL

Slide 24

Slide 24

@rmoff Self-Join (Cartesian product) T Ac. ID Transaction ID A42 A42 A42 T1 Ac. ID Transaction ID A42 A42 A42 Time Time ATM xxx116d91d6-ef17 11:56:58 Midland 116d91d6-ef17 11:58:19 Halifax 09c2f660-ef17 19:31:11 Lloyds ATM xxx116d91d6-ef17 11:56:58 Midland 116d91d6-ef17 11:58:19 Halifax 09c2f660-ef17 19:31:11 Lloyds T2 Ac. ID Transaction ID A42 A42 A42 Time ATM xxx116d91d6-ef17 11:56:58 Midland 116d91d6-ef17 11:58:19 Halifax 09c2f660-ef17 19:31:11 Lloyds ATM Fraud Detection with Apache Kafka and KSQL

Slide 25

Slide 25

Self-Join (Cartesian product) T1 Ac. ID Transaction ID A42 Time ATM xxx116d91d6-ef17 11:56:58 Midland @rmoff T2 Ac. ID Transaction ID A42 Time ATM xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds A42 09c2f660-ef17 19:31:11 Lloyds ATM_TXNS T1 INNER JOIN ATM_TXNS T2 ON T1.ACCOUNT_ID = T2.ACCOUNT_ID ATM Fraud Detection with Apache Kafka and KSQL

Slide 26

Slide 26

Self-Join (Cartesian product) T1 Ac. ID Transaction ID A42 Time ATM xxx116d91d6-ef17 11:56:58 Midland @rmoff T2 Ac. ID Transaction ID A42 Time ATM xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds A42 09c2f660-ef17 19:31:11 Lloyds FROM ATM_TXNS T1 INNER JOIN ATM_TXNS T2 WITHIN 10 MINUTES ON T1.ACCOUNT_ID = T2.ACCOUNT_ID ATM Fraud Detection with Apache Kafka and KSQL

Slide 27

Slide 27

@rmoff Self-Join T1 Txn ID T2 Txn ID T1 Time T2 Time T1 ATM T2 ATM 11:56:58 11:58:19 Midland Halifax xxx116d91d6-ef17 116d91d6-ef17 116d91d6-ef17 xxx116d91d6-ef17 11:58:19 11:56:58 Halifax Midland xxx116d91d6-ef17 xxx116d91d6-ef17 11:56:58 11:56:58 Midland Midland 116d91d6-ef17 116d91d6-ef17 11:58:19 Halifax Halifax 11:58:19 ATM Fraud Detection with Apache Kafka and KSQL

Slide 28

Slide 28

@rmoff Self-Join T1 Txn ID T2 Txn ID T1 Time T2 Time T1 ATM T2 ATM 11:56:58 11:58:19 Midland Halifax xxx116d91d6-ef17 116d91d6-ef17 116d91d6-ef17 xxx116d91d6-ef17 11:58:19 11:56:58 Halifax Midland xxx116d91d6-ef17 xxx116d91d6-ef17 11:56:58 11:56:58 Midland Midland 116d91d6-ef17 116d91d6-ef17 11:58:19 Halifax Halifax 11:58:19 Self join on same txn IDs ATM Fraud Detection with Apache Kafka and KSQL

Slide 29

Slide 29

@rmoff Exclude joins on the same txn T1 Txn ID T2 Txn ID T1 Time T2 Time T1 ATM T2 ATM 11:56:58 11:58:19 Midland Halifax 11:56:58 Halifax Midland xxx116d91d6-ef17 116d91d6-ef17 116d91d6-ef17 xxx116d91d6-ef17 11:58:19 WHERE T1.TRANSACTION_ID != T2.TRANSACTION_ID ATM Fraud Detection with Apache Kafka and KSQL

Slide 30

Slide 30

@rmoff Exclude joins on the same txn T1 Txn ID T2 Txn ID T1 Time T2 Time T1 ATM T2 ATM 11:56:58 11:58:19 Midland Halifax 11:56:58 Halifax Midland xxx116d91d6-ef17 116d91d6-ef17 116d91d6-ef17 xxx116d91d6-ef17 11:58:19 Duplicate results (A:B / B:A) ATM Fraud Detection with Apache Kafka and KSQL

Slide 31

Slide 31

@rmoff Join Windows T1 Ac. ID Transaction ID A42 Time ATM xxx116d91d6-ef17 11:56:58 Midland T2 Ac. ID Transaction ID A42 Time ATM xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds A42 09c2f660-ef17 19:31:11 Lloyds WHERE WITHIN 10 MINUTES T1.TRANSACTION_ID != T2.TRANSACTION_ID ATM Fraud Detection with Apache Kafka and KSQL

Slide 32

Slide 32

@rmoff Join Windows T1 Ac. ID Transaction ID A42 Time ATM xxx116d91d6-ef17 11:56:58 Midland T2 Ac. ID Transaction ID A42 Time ATM xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds A42 09c2f660-ef17 19:31:11 Lloyds WHERE WITHIN 10 MINUTES T1.TRANSACTION_ID != T2.TRANSACTION_ID ATM Fraud Detection with Apache Kafka and KSQL

Slide 33

Slide 33

@rmoff Join Windows T1 Ac. ID Transaction ID A42 Time ATM xxx116d91d6-ef17 11:56:58 Midland T2 Ac. ID Transaction ID A42 Time ATM xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds A42 09c2f660-ef17 19:31:11 Lloyds WHERE WITHIN 10 MINUTES T1.TRANSACTION_ID != T2.TRANSACTION_ID ATM Fraud Detection with Apache Kafka and KSQL

Slide 34

Slide 34

@rmoff Only join forward T1 Ac. ID Transaction ID A42 Time ATM xxx116d91d6-ef17 11:56:58 Midland T2 Ac. ID Transaction ID A42 Time ATM xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds A42 09c2f660-ef17 19:31:11 Lloyds WITHIN (0 MINUTES, 10 MINUTES) WHERE T1.TRANSACTION_ID != T2.TRANSACTION_ID ATM Fraud Detection with Apache Kafka and KSQL

Slide 35

Slide 35

@rmoff Only join forward T1 Ac. ID Transaction ID A42 Time ATM xxx116d91d6-ef17 11:56:58 Midland T2 Ac. ID Transaction ID A42 Time ATM xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds A42 09c2f660-ef17 19:31:11 Lloyds WITHIN (0 MINUTES, 10 MINUTES) WHERE T1.TRANSACTION_ID != T2.TRANSACTION_ID ATM Fraud Detection with Apache Kafka and KSQL

Slide 36

Slide 36

@rmoff Only join forward T1 Txn ID T2 Txn ID xxx116d91d6-ef17 116d91d6-ef17 T1 Time T2 Time T1 ATM T2 ATM 11:56:58 11:58:19 Midland Halifax WITHIN (0 MINUTES, 10 MINUTES) Ignore events in the right-hand stream prior to those in the left ATM Fraud Detection with Apache Kafka and KSQL

Slide 37

Slide 37

@rmoff Only join forward T1 Txn ID xxx116d91d6-ef17 T2 Txn ID 116d91d6-ef17 T1 Time T2 Time T1 ATM T2 ATM 11:56:58 11:58:19 Midland Halifax Legit Dodgy! ATM Fraud Detection with Apache Kafka and KSQL

Slide 38

Slide 38

Photo by Esteban Lopez on Unsplash @rmoff ATM Fraud Detection with Apache Kafka and KSQL

Slide 39

Slide 39

Calcuate distance between ATMs @rmoff TX1 TX2 GEO_DISTANCE(TX1.location->lat, TX1.location->lon, TX2.location->lat, TX2.location->lon, ‘KM’) ATM Fraud Detection with Apache Kafka and KSQL

Slide 40

Slide 40

Calculate time between transactions @rmoff TX2.ROWTIME - TX1.ROWTIME AS MILLISECONDS_DIFFERENCE (TX2.ROWTIME - TX1.ROWTIME) / 1000 / 60 / 60 AS HOURS_DIFFERENCE ATM Fraud Detection with Apache Kafka and KSQL

Slide 41

Slide 41

@rmoff Photo by Esteban Lopez on Unsplash GEO_DISTANCE(…) / HOURS_DIFFERENCE AS KMH_REQUIRED ATM Fraud Detection with Apache Kafka and KSQL

Slide 42

Slide 42

@rmoff So speaking of time… ksql> PRINT ‘atm_txns_gess’ ; Format:JSON Kafka message { “ROWTIME”: 1544116309152, timestamp “ROWKEY”: “null”, 2018-12-06 17:11:49 “account_id”: “a218”, “timestamp”: “2018-12-06 17:09:58 +0000”, “atm”: “HSBC”, …} Event time ATM Fraud Detection with Apache Kafka and KSQL

Slide 43

Slide 43

CREATE STREAM ATM_TXNS_GESS (account_id VARCHAR, timestamp VARCHAR, … WITH (KAFKA_TOPIC=’atm_txns_gess’, TIMESTAMP=’timestamp’, TIMESTAMP_FORMAT= ‘yyyy-MM-dd HH:mm:ss X’); ksql> PRINT ‘atm_txns_gess’ ; Format:JSON { “ROWTIME”: 1544116309152, “ROWKEY”: “null”, “account_id”: “a218”, “timestamp”: “2018-12-06 17:09:58 +0000”,

Slide 44

Slide 44

But what about the account holder? 👤 @rmoff ATM Fraud Detection with Apache Kafka and KSQL

Slide 45

Slide 45

Photo by Samuel Zeller on Unsplash @rmoff ATM Fraud Detection with Apache Kafka and KSQL

Slide 46

Slide 46

@rmoff Elasticsearch Customer details ATM fraud txns with customer details Notification service

  1. Enrich transaction events with customer data ATM Fraud Detection with Apache Kafka and KSQL

Slide 47

Slide 47

@rmoff Streaming Integration with Kafka Connect syslog flat file CSV JSON Sources MQTT Tasks Workers Kafka Connect Kafka Brokers ATM Fraud Detection with Apache Kafka and KSQL

Slide 48

Slide 48

@rmoff Streaming Integration with Kafka Connect Amazon S3 Sinks MQTT Tasks Workers Kafka Connect Kafka Brokers ATM Fraud Detection with Apache Kafka and KSQL

Slide 49

Slide 49

@rmoff Streaming Integration with Kafka Connect Amazon S3 syslog flat file CSV JSON Sources Sinks MQTT MQTT Tasks Workers Kafka Connect Kafka Brokers ATM Fraud Detection with Apache Kafka and KSQL

Slide 50

Slide 50

Confluent Hub @rmoff • One-stop place to discover and download : • Connectors • Transformations • Converters hub.confluent.io ATM Fraud Detection with Apache Kafka and KSQL

Slide 51

Slide 51

Demo Time! Customer details @rmoff t c e n n o C a m k u f i a z K e b e D ATM Fraud Detection with Apache Kafka and KSQL

Slide 52

Slide 52

@rmoff Do you think that’s a table you are querying? ATM Fraud Detection with Apache Kafka and KSQL

Slide 53

Slide 53

Time The Table Stream Duality Stream Account ID Amount 12345 + €50 12345

  • €25 12345 -€60 @rmoff Account ID Balance Table 12345 €50 Account ID Balance 12345 €75 Account ID Balance 12345 €15 ATM Fraud Detection with Apache Kafka and KSQL

Slide 54

Slide 54

@rmoff The truth is the log. The database is a cache of a subset of the log. —Pat Helland Immutability Changes Everything http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf Photo by Bobby Burch on Unsplash ATM Fraud Detection with Apache Kafka and KSQL

Slide 55

Slide 55

@rmoff Spot patterns within this stream Ac. ID A42 Transaction ID Time xxx116d91d6-ef17 11:56:58 ATM Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds Legit Dodgy! Legit ATM Fraud Detection with Apache Kafka and KSQL

Slide 56

Slide 56

@rmoff Suspect Transactions Ac. ID T1 Time ATM T2 Time ATM A42 11:56:58 Midland 11:58:19 Halifax Dodgy! ATM Fraud Detection with Apache Kafka and KSQL

Slide 57

Slide 57

@rmoff Suspect Transactions Name Phone Robin M 1234 567 Ac. ID T1 Time ATM T2 Time ATM A42 11:56:58 Midland 11:58:19 Halifax ATM Fraud Detection with Apache Kafka and KSQL

Slide 58

Slide 58

@rmoff Elasticsearch Customer details ATM fraud txns with customer details Notification service

  1. Spot fraud in stream of transactions 2.Enrich transaction events with customer data ATM Fraud Detection with Apache Kafka and KSQL

Slide 59

Slide 59

@rmoff Elasticsearch atm_txns_gess accounts Customer details ATM fraud txns ATM_POSSIBLE_FRAUD_ENRICHED with customer details Notification service

  1. Spot fraud in stream of transactions 2.Enrich transaction events with customer data ATM Fraud Detection with Apache Kafka and KSQL

Slide 60

Slide 60

Photo by Joshua Rodriguez on Unsplash What can we do with it? @rmoff ATM Fraud Detection with Apache Kafka and KSQL

Slide 61

Slide 61

Realtime Operations View & Analysis @rmoff ATM Fraud Detection with Apache Kafka and KSQL

Slide 62

Slide 62

Push notification to the customer @rmoff ATM Fraud Detection with Apache Kafka and KSQL

Slide 63

Slide 63

@rmoff Confluent Community Components Apache Kafka with a bunch of cool stuff! For free! Log Events Database Changes loT Data Web Events … Confluent Platform Data Integration Real-time Applications Monitoring & Administration Confluent Control Center | Security Confluent Platform Transformations Hadoop Operations Replicator | Auto Data Balancing Custom Apps Database Data Compatibility Schema Registry SQL Stream Processing KSQL Analytics Data Warehouse Development and Connectivity Clients | Connectors | REST Proxy | CLI CRM Monitoring Apache Kafka® Core | Connect API | Streams API … CUSTOMER SELF-MANAGED Datacenter Public Cloud … CONFLUENT FULLY-MANAGED Confluent Cloud ATM Fraud Detection with Apache Kafka and KSQL

Slide 64

Slide 64

@rmoff Free Books! https://www.confluent.io/apache-kafka-stream-processing-book-bundle ATM Fraud Detection with Apache Kafka and KSQL

Slide 65

Slide 65

https://www.confluent.io/ksql http://cnfl.io/slack https://cnfl.io/demo-scene @rmoff robin@confluent.io

Slide 66

Slide 66

@rmoff Resources • CDC Spreadsheet #EOF • Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC • #partner-engineering on Slack for questions • BD team (#partners / partners@confluent.io) can help with introductions on a given sales op ATM Fraud Detection with Apache Kafka and KSQL