Phoenix favicon

Apache Phoenix

Features

Atomic Upsert

Use UPSERT ... ON DUPLICATE KEY for atomic row-level updates in Phoenix, with examples and limitations.

To support atomic upsert, an optional ON DUPLICATE KEY clause, similar to the MySQL syntax, has been incorporated into the UPSERT VALUES command as of Phoenix 4.9. The general syntax is described here. This feature provides a superset of the HBase Increment and CheckAndPut functionality to enable atomic upserts. On the server-side, when the commit is processed, the row being updated will be locked while the current column values are read and the ON DUPLICATE KEY clause is executed. Given that the row must be locked and read when the ON DUPLICATE KEY clause is used, there will be a performance penalty (much like there is for an HBase Put versus a CheckAndPut).

In the presence of the ON DUPLICATE KEY clause, if the row already exists, the VALUES specified will be ignored and instead either:

  • the row will not be updated if ON DUPLICATE KEY IGNORE is specified or
  • the row will be updated (under lock) by executing the expressions following the ON DUPLICATE KEY UPDATE clause.

Multiple UPSERT statements for the same row in the same commit batch will be processed in the order of their execution. Thus the same result will be produced when auto commit is on or off.

Examples

For example, to atomically increment two counter columns, you would execute the following command:

UPSERT INTO my_table(id, counter1, counter2) VALUES ('abc', 0, 0)
ON DUPLICATE KEY UPDATE counter1 = counter1 + 1, counter2 = counter2 + 1;

To only update a column if it doesn't yet exist:

UPSERT INTO my_table(id, my_col) VALUES ('abc', 100)
ON DUPLICATE KEY IGNORE;

Note that arbitrarily complex expressions may be used in this new clause:

UPSERT INTO my_table(id, total_deal_size, deal_size) VALUES ('abc', 0, 100)
ON DUPLICATE KEY UPDATE
    total_deal_size = total_deal_size + deal_size,
    approval_reqd = CASE WHEN total_deal_size < 100 THEN 'NONE'
    WHEN total_deal_size < 1000 THEN 'MANAGER APPROVAL'
    ELSE 'VP APPROVAL' END;

ON DUPLICATE KEY UPDATE_ONLY

UPDATE_ONLY is a third variant of the ON DUPLICATE KEY clause, available from Phoenix 5.3.0 (PHOENIX-7648). Unlike UPDATE, it only runs the update-clause expressions when the row already exists — it never inserts. If the row is missing, the supplied VALUES are discarded and the statement is a no-op. Use it when an UPSERT should be able to update an existing row but must not create a new one.

UPSERT INTO inventory(id, qty) VALUES ('sku-42', 0)
ON DUPLICATE KEY UPDATE_ONLY qty = qty - 1;

If id = 'sku-42' already exists, its qty is decremented under the row lock. If it doesn't, the supplied VALUES are discarded and nothing is written.

The three forms compared:

ClauseRow missingRow present
ON DUPLICATE KEY IGNOREInsert from VALUESNo-op
ON DUPLICATE KEY UPDATE ...Insert from VALUESRun the update
ON DUPLICATE KEY UPDATE_ONLY ...No-op (VALUES discarded)Run the update

Returning the affected row

A single-row UPSERT or DELETE can append RETURNING * to return the affected row as a JDBC ResultSet in the same round-trip, available from Phoenix 5.3.0 (PHOENIX-7651). The returned row reflects the server-side state after the mutation — including any values computed by an ON DUPLICATE KEY UPDATE clause under the row lock.

This collapses two common patterns into one round-trip:

  • Atomic read-modify-write counters — increment, then read the new value without a follow-up SELECT (and without risking a different writer slipping in between).
  • Tombstoning / queue consumers — atomically delete a row and read the payload you just removed, useful for idempotent replay and "claim the next task" patterns.
-- Atomic increment that also returns the post-update row.
UPSERT INTO counters(id, hits) VALUES ('home', 0)
ON DUPLICATE KEY UPDATE hits = hits + 1
RETURNING *;

-- Atomic delete returning the row that was removed.
DELETE FROM tasks WHERE id = ? RETURNING *;

The statement must affect a single row when RETURNING * is used. Drivers iterate the returned ResultSet exactly like the result of a regular SELECT.

Java JDBC thick-client API

Java applications using the Phoenix thick JDBC client can drive the same atomic-return semantics directly through typed methods on PhoenixStatement and PhoenixPreparedStatement, without putting RETURNING * in the SQL:

  • executeAtomicUpdateReturnRow — returns the row state after the mutation (or the unchanged row when the statement was a no-op). Equivalent to RETURNING * in SQL.
  • executeAtomicUpdateReturnOldRow — returns the row state before the mutation, regardless of whether the mutation was applied. There is no SQL form for this — only the typed API exposes the pre-image, which is the natural fit for audit logs and compare-and-swap-style flows.

Both return a Pair<Integer, ResultSet> where the integer is 1 if the row was mutated and 0 for a no-op, so callers know the outcome without having to inspect cell contents. Both require an auto-commit connection.

Limitations

The following limitations are enforced for the ON DUPLICATE KEY clause usage:

  • Primary key columns may not be updated, since this would essentially be creating a new row.
  • Transactional tables may not use this clause as atomic upserts are already possible through exception handling when a conflict occurs.
  • Immutable tables may not use this clause as by definition there should be no updates to existing rows.
  • The CURRENT_SCN property may not be set on connection when this clause is used as HBase does not handle atomicity unless the latest value is being updated.
  • The same column should not be updated more than once in the same statement.
  • No aggregation or references to sequences are allowed within the clause.
Edit on GitHub

On this page