🗓️ April 2020👀 loading

Data Models for Financial Transactions

We'll cover some recommended techniques for modelling financial transactions within your database, including various Postgres model schemas and useful SQL queries.

financemodellingpostgres

When a customer buys an item on your website, it's usually a good idea to save some information about the transaction into your database, like:

the time it happened
the amount and currency
a short description of the purchased item
some kind of ID reference of the payment at Stripe or PayPal

That's why we're going to look at a few different ways of modelling financial transactions in your system, and how you might persist them in your database.

Basics
Transaction Lifecycle
Basic Data Model
Full Data Model

Basics

Fundamentally, a financial transaction is nothing more than a movement of funds into or out of your (digital) wallet or bank account, at a certain point in time. This means that any model, at the very least, would account for the following properties in some way, shape or form:

amount
currency
created (timestamp)

Like we discussed in the introduction, it probably makes sense to also include a description or identifier of the thing which was purchased:

description

And since you'll probably be processing your payments through some kind of gateway like Stripe or PayPal, it makes sense to also include the following attributes:

gateway
method (optional)

Here, the method attribute could be useful if you've configured your gateway integration to support multiple payment methods (cards, iDeal, SEPA direct).

There's much more to it, though. Financial transactions transition through multiple stages before they're finalized. And they're never really finalized either, since transactions can usually be reversed through refunds or credit card chargebacks, many months later sometimes. That's why it's important to be aware of the transaction lifecycle (which we'll be discussing next), and why it makes sense to also include some kind of enum attribute like

status,

indicating whether the transaction is pending, failed, captured or refunded.

Transaction Lifecycle

Most financial transactions transition through the following stages:

Authorization - The customer is asked to authorize the transaction via their chosen payment method; if successful, this will put a temporary hold on the customers' funds, but doesn't yet move them into your company's account
Capture - After the customer has authorized the transaction, you'll be able to capture the authorized funds; this will actually move the money from the customer's account into your company's account
Refund - If you're unlucky, a previously captured transaction will be refunded, in which case the funds will move back from your company's account into the customer's account

Even though it's not so easy to design a one-size-fits-all model which supports any type of transaction and any type of payment method, the lifecycle overview above does capture the general idea pretty well.

To prove this, we'll have a look at the flows of the following payment methods, with Stripe and PayPal as our gateways, and see how they compare against this rather simple authorize-capture-refund lifecycle above.

payment method	gateway
card	Stripe
iDeal	Stripe
PayPal	PayPal

Stripe Card Payments Flow

Stripe recommends using their Payment Intents API to accept card payments. With this API, you're supposed initiate the payment flow server-side by creating a new Payment Intent object.

Upon doing so, Stripe will provide you with an id of this Payment Intent object, and a client_secret which you're supposed to pass to the browser. Stripe's JavaScript SDK then uses this client_secret to authenticate the Card payment, where the user might be prompted with a pop-up to confirm their purchase via SMS or password or some other authentication method chosen by their bank (this is completely handled by Stripe).

After the user has authorized the payment, Stripe will (by default) automatically try to charge the card. You're recommended to setup a server-side webhook event listener for Stripe's charge.succeeded and charge.failed events, informing your application of the charge attempt's outcome.

It's also recommended to set up a server-side webhook event listener for charge.refunded events, allowing you to automatically process refunds or chargebacks (by revoking customer access to your website, for example).

Below you'll find a summary of Stripe's Payment Intents flow, and how it connects to the transaction lifecycle.

Authorization - Create a Payment Intent via Stripe's API and set the status of your own transaction to pending
Authorization + Capture - Have the customer confirm this Payment Intent via the browser
Capture - Listen for the charge.succceeded webhook event and transition your own transaction's status from pending to captured if it comes in
Refund - Listen for the charge.refunded webhook event and transition your own transaction's status from captured to refunded if it comes in

Stripe iDeal Payments Flow

A very convenient aspect about the Payment Intents flow for Dutch websites is that it can also be used for iDeal payments (iDeal being the single most-used payment method in the Netherlands).

The only real difference with the Payment Intents card flow, is that when confirming an iDeal payment via the browser, you're supposed to provide Stripe with a return_url where the user will be redirected after confirming the iDeal payment on their bank's website or app.

Apart from this small difference, the same authorization, capture and refund logic applies. You can even subscribe to the same webhook events.

PayPal Payments Flow

With PayPal's Orders API, the idea is to initiate the transaction by creating a new Order object from your server. Two important parameters you should specify when creating this order are the return_url and the cancel_url where the customer will be redirected after they've confirmed or cancelled their purchase.

Upon doing so, PayPal will provide you with an Order id and an approval_url, where you should redirect the use customer so they can authorize their payment on PayPal's website. Once finished, PayPal will redirect them back to the return_url you've specified before.

Because PayPal makes a clear distinction between authorizations and captures, you're still required to manually capture the payment after it's been authorized by the customer. You can either wait for the customer to be redirected back to your website, but I recommend listening for PayPal's CHECKOUT.ORDER.APPROVED webhook event and using this as the trigger for capturing the payment (through a different API request).

By setting up webhook listeners for PayPal's PAYMENT.CAPTURE.REFUNDED and PAYMENT.CAPTURE.REVERSED events, you'll also be able to automatically process refunds.

Below you'll find a summary of PayPal's Order flow, and how it connects to the transaction lifecycle.

Authorization - Create an Order via PayPal's API and set the status of your own transaction to pending
Authorization - Redirect the customer to the Order's approval_url and wait for them to approve the payment
Capture - Listen for the CHECKOUT.ORDER.APPROVED webhook event and use this as a trigger to capture the previously created Order, transitioning your own transaction's status from pending to captured upon success
Refund - Listen for the PAYMENT.CAPTURE.REFUNDED and PAYMENT.CAPTURE.REVERSED wehbhook events, and transition your own transaction's status from captured to refunded if either one comes in

Basic Data Model

With regards to the transaction lifecycle we've just discussed, I would like to make the case for two different database models for financial transactions:

a basic model where all information is stored in a single transaction table
a full model which distinguishes between a transaction table and a transaction_event table (a transaction event being an authorization, a capture or a refund)

The basic (single-table) model is easier to work with, but operates on the important assumption that you'll process transactions in such a way that you'll have:

at most one authorization event per transaction
at most one capture event per transaction
at most one refund event per transaction

One counter-example where these assumptions wouldn't necessarily hold is the hotel industry, where it's customary to authorize the full amount (for the entire stay) on the guest's credit card, and increase this authorization if he or she orders room service, resulting in multiple authorization events against a single transaction.

A hotel transaction is then usually captured at the end of the guest's stay (when the final amount is known), but could later still be disputed by the guest, potentially resulting in one or many (partial) refund events against this single transaction.

If you aren't planning to issue partial captures or partial refunds, however, and if you're planning to authorize, capture and refund at most once against any transaction, then I would like to present the following database model.

/**********************/
/* PostgreSQL example */
/**********************/
 
CREATE TABLE transaction (
  id                UUID PRIMARY KEY,
  gateway           VARCHAR NOT NULL,  /* stripe, paypal, adyen */
  method            VARCHAR NOT NULL,  /* card, ideal, paypal */
  status            VARCHAR NOT NULL,  /* pending, failed, captured, refunded */
  amount            INTEGER NOT NULL,
  currency          VARCHAR NOT NULL,
  description       VARCHAR,
  metadata          JSONB,
  created           TIMESTAMP NOT NULL,
  authorization_id  VARCHAR, /* gateway specific ID */
  captured          TIMESTAMP,
  capture_id        VARCHAR, /* gateway specific ID */
  refunded          TIMESTAMP,
  refund_id         VARCHAR  /* gateway specific ID */
);
/*
 * Webhook events always contain a reference to gateway specific IDs,
 * so I recommend indexing the following columns for fast lookups
 */
CREATE INDEX ON transaction (created);
CREATE INDEX ON transaction (authorization_id);
CREATE INDEX ON transaction (capture_id);
CREATE INDEX ON transaction (refund_id);

Consider the model above to be a starting point which you're more than welcome to extend with columns from your own domain (it probably makes sense to include some kind of customer_id column, for example).

As for the gateway specific IDs I recommend using, please refer to the following Stripe overview:

authorization_id: the id of the Payment Intent object
capture_id: the id of the charge.succeeded webhook event resource
refund_id: the id of the first element under the refunds property of the charge.refunded webhook event resource

And the following PayPal overview:

authorization_id: the id of the Order object
capture_id: the id of the first element under the captures property of the capture HTTP response
refund_id: the id of the PAYMENT.CAPTURE.REFUNDED or PAYMENT.CAPTURE.REVERSED webhook event resource

Full Data Model

The full data model which I'm about to showcase basically separates the transaction itself from its events, allowing you to keep track and process multiple authorizations, captures and/or refunds for a single transaction.

/**********************/
/* PostgreSQL example */
/**********************/
 
CREATE TABLE transaction (
  id                UUID PRIMARY KEY,
  gateway           VARCHAR NOT NULL,  /* stripe, paypal, adyen */
  method            VARCHAR NOT NULL,  /* card, ideal, paypal */
  description       VARCHAR,
  metadata          JSONB,
  created           TIMESTAMP NOT NULL
);
CREATE INDEX ON transaction (created);
 
CREATE TABLE transaction_event (
  id                UUID PRIMARY KEY,
  transaction_id    UUID NOT NULL REFERENCES transaction (id),
  type              VARCHAR NOT NULL,  /* authorization, capture, refund */
  amount            INTEGER NOT NULL,
  currency          VARCHAR NOT NULL,
  gateway_id        VARCHAR,
  created           TIMESTAMP NOT NULL
);
CREATE INDEX ON transaction_event (created);
CREATE INDEX ON transaction_event (gateway_id);

While being more flexible, this full data model can also be a bit more difficult to work with. That's why I've included some example queries for common tasks, like figuring out how much has been captured for a particular transaction, or finding the latest event type for a particular transaction.

/***********************/
/* PostgreSQL examples */
/***********************/
 
/*
 * This query finds all transactions created in January 2020 for
 * which there's been a capture, like this:
 *
 * | id                | gateway | method | captured_amount |
 * | ----------------- | ------- | ------ | --------------- |
 * | 6e94f915-ac87-... | stripe  | card   | -10000          |
 * | 8a2e0fef-e188-... | paypal  | paypal | -16400          |
 * | dea8a23a-d015-... | stripe  | card   | -7800           |
 * | 9a7abdb9-a240-... | stripe  | card   | -4000           |
 */
SELECT
  t.id,
  t.gateway,
  t.method,
  sum(e.amount) AS captured_amount
FROM transaction t
  JOIN transaction_event e ON t.id = e.transaction_id
WHERE
  t.created BETWEEN '2020-01-01' AND '2020-02-01'
  AND e.type = 'capture'
GROUP BY
  t.id;
 
/*
 * This query finds the latest transaction event type for each
 * transaction created in January 2020, like this:
 *
 * | id                | gateway | method | latest_event  |
 * | ----------------- | ------- | ------ | ------------- |
 * | 6e94f915-ac87-... | stripe  | card   | capture       |
 * | 8a2e0fef-e188-... | paypal  | paypal | capture       |
 * | dea8a23a-d015-... | stripe  | card   | refund        |
 * | 9a7abdb9-a240-... | stripe  | card   | authorization |
 * | e2b445a8-9893-... | stripe  | ideal  | capture       |
 *
 * It includes a CASE-based ORDER BY clause in the `string_agg`
 * aggregate function for handling the (common) scenario of
 * authorizations and captures occurring at the exact same time,
 * in which case it makes sense to assume that 'authorization' ->
 * 'capture' -> 'refund' is the correct chronological order
 *
 * Please note that the `string_agg` and `split_part` functions
 * are specific to PostgreSQL; for MySQL-based dialects, refer to
 * the similar `group_concat` and `substring_index` functions
 */
SELECT
  t.id,
  t.gateway,
  t.method,
  split_part(string_agg(e.type, ','
     ORDER BY
      e.created DESC,
      CASE
        WHEN e.type = 'authorization' THEN 0
        WHEN e.type = 'capture' THEN 1
        WHEN e.type = 'refund' THEN 2
      END DESC
  ), ',', 1) AS latest_event
FROM transaction t
  JOIN transaction_event e ON t.id = e.transaction_id
WHERE
  t.created BETWEEN '2020-01-01' AND '2020-02-01'
GROUP BY
  t.id;