Atomic Upsert
Use UPSERT ... ON DUPLICATE KEY for atomic row-level updates in Phoenix, with examples and limitations.
To support atomic upsert, an optional ON DUPLICATE KEY clause, similar to the MySQL syntax, has been
incorporated into the UPSERT VALUES command as of Phoenix 4.9. The general syntax is described
here. This feature provides a superset of the HBase Increment and
CheckAndPut functionality to enable atomic upserts. On the server-side, when the commit
is processed, the row being updated will be locked while the current column values are read and the
ON DUPLICATE KEY clause is executed. Given that the row must be locked and read when the ON DUPLICATE KEY
clause is used, there will be a performance penalty (much like there is for an HBase Put versus a CheckAndPut).
In the presence of the ON DUPLICATE KEY clause, if the row already exists, the VALUES specified will
be ignored and instead either:
- the row will not be updated if
ON DUPLICATE KEY IGNOREis specified or - the row will be updated (under lock) by executing the expressions following the
ON DUPLICATE KEY UPDATEclause.
Multiple UPSERT statements for the same row in the same commit batch will be processed in the order of their
execution. Thus the same result will be produced when auto commit is on or off.
Examples
For example, to atomically increment two counter columns, you would execute the following command:
UPSERT INTO my_table(id, counter1, counter2) VALUES ('abc', 0, 0)
ON DUPLICATE KEY UPDATE counter1 = counter1 + 1, counter2 = counter2 + 1;To only update a column if it doesn't yet exist:
UPSERT INTO my_table(id, my_col) VALUES ('abc', 100)
ON DUPLICATE KEY IGNORE;Note that arbitrarily complex expressions may be used in this new clause:
UPSERT INTO my_table(id, total_deal_size, deal_size) VALUES ('abc', 0, 100)
ON DUPLICATE KEY UPDATE
total_deal_size = total_deal_size + deal_size,
approval_reqd = CASE WHEN total_deal_size < 100 THEN 'NONE'
WHEN total_deal_size < 1000 THEN 'MANAGER APPROVAL'
ELSE 'VP APPROVAL' END;ON DUPLICATE KEY UPDATE_ONLY
UPDATE_ONLY is a third variant of the ON DUPLICATE KEY clause, available from
Phoenix 5.3.0 (PHOENIX-7648).
Unlike UPDATE, it only runs the update-clause expressions when the row already
exists — it never inserts. If the row is missing, the supplied VALUES are discarded
and the statement is a no-op. Use it when an UPSERT should be able to update an
existing row but must not create a new one.
UPSERT INTO inventory(id, qty) VALUES ('sku-42', 0)
ON DUPLICATE KEY UPDATE_ONLY qty = qty - 1;If id = 'sku-42' already exists, its qty is decremented under the row lock.
If it doesn't, the supplied VALUES are discarded and nothing is written.
The three forms compared:
| Clause | Row missing | Row present |
|---|---|---|
ON DUPLICATE KEY IGNORE | Insert from VALUES | No-op |
ON DUPLICATE KEY UPDATE ... | Insert from VALUES | Run the update |
ON DUPLICATE KEY UPDATE_ONLY ... | No-op (VALUES discarded) | Run the update |
Returning the affected row
A single-row UPSERT or DELETE can append RETURNING * to return the affected row
as a JDBC ResultSet in the same round-trip, available from Phoenix 5.3.0
(PHOENIX-7651). The returned
row reflects the server-side state after the mutation — including any values
computed by an ON DUPLICATE KEY UPDATE clause under the row lock.
This collapses two common patterns into one round-trip:
- Atomic read-modify-write counters — increment, then read the new value
without a follow-up
SELECT(and without risking a different writer slipping in between). - Tombstoning / queue consumers — atomically delete a row and read the payload you just removed, useful for idempotent replay and "claim the next task" patterns.
-- Atomic increment that also returns the post-update row.
UPSERT INTO counters(id, hits) VALUES ('home', 0)
ON DUPLICATE KEY UPDATE hits = hits + 1
RETURNING *;
-- Atomic delete returning the row that was removed.
DELETE FROM tasks WHERE id = ? RETURNING *;The statement must affect a single row when RETURNING * is used. Drivers iterate
the returned ResultSet exactly like the result of a regular SELECT.
Java JDBC thick-client API
Java applications using the Phoenix thick JDBC client can drive the same
atomic-return semantics directly through typed methods on PhoenixStatement and
PhoenixPreparedStatement, without putting RETURNING * in the SQL:
executeAtomicUpdateReturnRow— returns the row state after the mutation (or the unchanged row when the statement was a no-op). Equivalent toRETURNING *in SQL.executeAtomicUpdateReturnOldRow— returns the row state before the mutation, regardless of whether the mutation was applied. There is no SQL form for this — only the typed API exposes the pre-image, which is the natural fit for audit logs and compare-and-swap-style flows.
Both return a Pair<Integer, ResultSet> where the integer is 1 if the row was
mutated and 0 for a no-op, so callers know the outcome without having to inspect
cell contents. Both require an auto-commit connection.
Limitations
The following limitations are enforced for the ON DUPLICATE KEY clause usage:
- Primary key columns may not be updated, since this would essentially be creating a new row.
- Transactional tables may not use this clause as atomic upserts are already possible through exception handling when a conflict occurs.
- Immutable tables may not use this clause as by definition there should be no updates to existing rows.
- The
CURRENT_SCNproperty may not be set on connection when this clause is used as HBase does not handle atomicity unless the latest value is being updated. - The same column should not be updated more than once in the same statement.
- No aggregation or references to sequences are allowed within the clause.