Configuration
Phoenix provides many knobs and dials for configuring and tuning the system to run optimally on your cluster. Configuration is done through a series of Phoenix-specific properties specified in both the client-side and server-side hbase-site.xml files. In addition to these properties, all of the standard HBase configuration properties apply as well, with the most important ones documented here.
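As a minimal sketch of what such an entry looks like, the fragment below sets two of the client-side properties described on this page in a client-side hbase-site.xml. The property names come from the table; the values are illustrative assumptions, not recommendations.

```xml
<configuration>
  <!-- Client-side query timeout in milliseconds (illustrative value: 2 min; table default is 600000) -->
  <property>
    <name>phoenix.query.timeoutMs</name>
    <value>120000</value>
  </property>
  <!-- Threads in the client-side thread pool executor (illustrative value; table default is 128) -->
  <property>
    <name>phoenix.query.threadPoolSize</name>
    <value>256</value>
  </property>
</configuration>
```

Properties described as server-side follow the same name/value layout but belong in the hbase-site.xml read by the region servers.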
The table below outlines the full set of Phoenix-specific configuration properties and their defaults.
Property | Description | Default | ||||||||||
data.tx.snapshot.dir | Server-side property specifying the HDFS directory used to store snapshots of the transaction state. No default value. | None | ||||||||||
data.tx.timeout | Server-side property specifying the timeout in seconds for a transaction to complete. Default is 30 seconds. | 30 | ||||||||||
phoenix.query.timeoutMs | Client-side property specifying the number of milliseconds after which a query will time out on the client. Default is 10 min. | 600000 | ||||||||||
phoenix.query.keepAliveMs | Maximum time in milliseconds that excess idle threads will wait for new tasks before terminating when the number of threads is greater than the cores in the client side thread pool executor. Default is 60 sec. | 60000 | ||||||||||
phoenix.query.threadPoolSize | Number of threads in client side thread pool executor. As the number of machines/cores in the cluster grows, this value should be increased. | 128 | ||||||||||
phoenix.query.queueSize | Max queue depth of the bounded round robin backing the client side thread pool executor, beyond which an attempt to queue additional work is rejected. If zero, a SynchronousQueue is used instead of the bounded round robin queue. The default value is 5000. | 5000 | ||||||||||
phoenix.stats.guidepost.width | Server-side parameter that specifies the number of bytes between guideposts. A smaller amount increases parallelization, but also increases the number of chunks which must be merged on the client side. The default value is 100 MB. | 104857600 | ||||||||||
phoenix.stats.guidepost.per.region | Server-side parameter that specifies the number of guideposts per region. If set to a value greater than zero, then the guidepost width is determined by the table's MAX_FILESIZE / phoenix.stats.guidepost.per.region. Otherwise, if not set, then the phoenix.stats.guidepost.width parameter is used. No default value. | None | ||||||||||
phoenix.stats.updateFrequency | Server-side parameter that determines the frequency in milliseconds at which statistics will be refreshed from the statistics table and subsequently used by the client. The default value is 15 min. | 900000 | ||||||||||
phoenix.stats.minUpdateFrequency | Client-side parameter that determines the minimum amount of time in milliseconds that must pass before statistics may again be manually collected through another UPDATE STATISTICS call. The default value is phoenix.stats.updateFrequency / 2. | 450000 | ||||||||||
phoenix.stats.useCurrentTime | Server-side parameter that if true causes the current time on the server-side to be used as the timestamp of rows in the statistics table when background tasks such as compactions or splits occur. If false, then the max timestamp found while traversing the table over which statistics are being collected is used as the timestamp. Unless your client is controlling the timestamps while reading and writing data, this parameter should be left alone. The default value is true. | true | ||||||||||
phoenix.query.spoolThresholdBytes | Threshold size in bytes after which the results of queries executed in parallel are spooled to disk. Default is 20 MB. | 20971520 | ||||||||||
phoenix.query.maxSpoolToDiskBytes | Maximum size in bytes up to which the results of queries executed in parallel may be spooled to disk. Above this size, the query will fail. Default is 1 GB. | 1024000000 | ||||||||||
phoenix.query.maxGlobalMemoryPercentage | Percentage of total heap memory (i.e. Runtime.getRuntime().maxMemory()) that all threads may use. Only coarse-grained memory usage is tracked, mainly accounting for memory usage in the intermediate map built during group by aggregation. When this limit is reached, clients block while attempting to get more memory, essentially throttling memory usage. Defaults to 15%. | 15 | ||||||||||
phoenix.query.maxGlobalMemorySize | Max size in bytes of total tracked memory usage. Not specified by default. If present, the lower of this parameter and phoenix.query.maxGlobalMemoryPercentage will be used. | None | ||||||||||
phoenix.query.maxGlobalMemoryWaitMs | Maximum amount of time that a client will block while waiting for more memory to become available. After this amount of time, an InsufficientMemoryException is thrown. Default is 10 sec. | 10000 | ||||||||||
phoenix.query.maxTenantMemoryPercentage | Maximum percentage of phoenix.query.maxGlobalMemoryPercentage that any one tenant is allowed to consume. After this percentage, an InsufficientMemoryException is thrown. Default is 100% | 100 | ||||||||||
phoenix.query.dateFormat | Default pattern to use for conversion of a date to/from a string, whether through the TO_CHAR(<date>) or TO_DATE(<date-string>) functions, or through resultSet.getString(<date-column>). Default is yyyy-MM-dd HH:mm:ss.SSS | yyyy-MM-dd HH:mm:ss.SSS | ||||||||||
phoenix.query.dateFormatTimeZone | A timezone id that specifies the default time zone in which date, time, and timestamp literals should be interpreted when interpreting string literals or using the TO_DATE function. A time zone id can be a timezone abbreviation such as “PST”, or a full name such as “America/Los_Angeles”, or a custom offset such as “GMT-9:00”. The time zone id “LOCAL” can also be used to interpret all date, time, and timestamp literals as being in the current timezone of the client. | GMT | ||||||||||
phoenix.query.timeFormat | Default pattern to use for conversion of TIME to/from a string, whether through the TO_CHAR(<time>) or TO_TIME(<time-string>) functions, or through resultSet.getString(<time-column>). Default is yyyy-MM-dd HH:mm:ss.SSS | yyyy-MM-dd HH:mm:ss.SSS | ||||||||||
phoenix.query.timestampFormat | Default pattern to use for conversion of TIMESTAMP to/from a string, whether through the TO_CHAR(<timestamp>) or TO_TIMESTAMP(<timestamp-string>) functions, or through resultSet.getString(<timestamp-column>). Default is yyyy-MM-dd HH:mm:ss.SSS | yyyy-MM-dd HH:mm:ss.SSS | ||||||||||
phoenix.query.numberFormat | Default pattern to use for conversion of a decimal number to/from a string, whether through the TO_CHAR(<decimal-number>) or TO_NUMBER(<decimal-string>) functions, or through resultSet.getString(<decimal-column>). Default is #,##0.### | #,##0.### | ||||||||||
phoenix.mutate.maxSize | The maximum number of rows that may be batched on the client before a commit or rollback must be called. | 500000 | ||||||||||
phoenix.mutate.batchSize | The number of rows that are batched together and automatically committed during the execution of an UPSERT SELECT or DELETE statement. This property may be overridden at connection time by specifying the UpsertBatchSize property value. Note that the connection property value does not affect the batch size used by the coprocessor when these statements are executed completely on the server side. | 1000 | ||||||||||
phoenix.query.maxServerCacheBytes | Maximum size (in bytes) of a single sub-query result (usually the filtered result of a table) before compression and conversion to a hash map. Attempting to hash an intermediate sub-query result of a size bigger than this setting will result in a MaxServerCacheSizeExceededException. Default 100MB. | 104857600 | ||||||||||
phoenix.coprocessor.maxServerCacheTimeToLiveMs | Maximum living time (in milliseconds) of server caches. A cache entry expires after this amount of time has passed since last access. Consider adjusting this parameter when a server-side IOException(“Could not find hash cache for joinId”) happens. Getting warnings like “Earlier hash cache(s) might have expired on servers” might also be a sign that this number should be increased. | 30000 | ||||||||||
phoenix.query.useIndexes | Client-side property determining whether or not indexes are considered by the optimizer to satisfy a query. Default is true | true | ||||||||||
phoenix.index.failure.handling.rebuild | Server-side property determining whether or not a mutable index is rebuilt in the background in the event of a commit failure. Only applicable for indexes on mutable, non transactional tables. Default is true | true | ||||||||||
phoenix.index.failure.block.write | Server-side property determining whether or not writes to the data table are disallowed in the event of a commit failure until the index can be caught up with the data table. Requires that phoenix.index.failure.handling.rebuild is true as well. Only applicable for indexes on mutable, non transactional tables. Default is false | false | ||||||||||
phoenix.index.failure.handling.rebuild.interval | Server-side property controlling the millisecond frequency at which the server checks whether or not a mutable index needs to be partially rebuilt to catch up with updates to the data table. Only applicable for indexes on mutable, non transactional tables. Default is 10 seconds. | 10000 | ||||||||||
phoenix.index.failure.handling.rebuild.overlap.time | Server-side property controlling how many milliseconds to go back from the timestamp at which the failure occurred when a partial rebuild is performed. Only applicable for indexes on mutable, non transactional tables. Default is 1 millisecond. | 1 | ||||||||||
phoenix.index.mutableBatchSizeThreshold | Number of mutations in a batch beyond which index metadata will be sent as a separate RPC to each region server as opposed to included inline with each mutation. Defaults to 5. | 5 | ||||||||||
phoenix.schema.dropMetaData | Determines whether or not an HBase table is dropped when the Phoenix table is dropped. Default is true | true | ||||||||||
phoenix.groupby.spillable | Determines whether or not a GROUP BY over a large number of distinct values is allowed to spill to disk on the region server. If false, an InsufficientMemoryException will be thrown instead. Default is true | true | ||||||||||
phoenix.groupby.spillFiles | Number of memory mapped spill files to be used when spilling GROUP BY distinct values to disk. Default is 2 | 2 | ||||||||||
phoenix.groupby.maxCacheSize | Size in bytes of pages cached during GROUP BY spilling. Default is 100 MB | 102400000 | ||||||||||
phoenix.groupby.estimatedDistinctValues | Number of estimated distinct values when a GROUP BY is performed. Used to perform initial sizing with growth of 1.5x each time reallocation is required. Default is 1000 | 1000 | ||||||||||
phoenix.distinct.value.compress.threshold | Size in bytes beyond which aggregate operations which require tracking distinct value counts (such as COUNT DISTINCT) will use Snappy compression. Default is 1 MB | 1024000 | ||||||||||
phoenix.index.maxDataFileSizePerc | Percentage used to determine the MAX_FILESIZE for the shared index table for views relative to the data table MAX_FILESIZE. The percentage should be estimated based on the anticipated average size of a view index row versus the data row. Default is 50%. | 50 | ||||||||||
phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs | Time in milliseconds after which the server-side metadata cache for a tenant will expire if not accessed. Default is 30 min. | 1800000 | ||||||||||
phoenix.coprocessor.maxMetaDataCacheSize | Max size in bytes of total server-side metadata cache after which evictions will begin to occur based on least recent access time. Default is 20 MB | 20480000 | ||||||||||
phoenix.client.maxMetaDataCacheSize | Max size in bytes of total client-side metadata cache after which evictions will begin to occur based on least recent access time. Default is 10 MB | 10240000 | ||||||||||
phoenix.sequence.cacheSize | Number of sequence values to reserve from the server and cache on the client when the next sequence value is allocated. Only used if not defined by the sequence itself. Default is 100 | 100 | ||||||||||
phoenix.clock.skew.interval | Delay interval (in milliseconds) when opening SYSTEM.CATALOG to compensate for possible clock skew when SYSTEM.CATALOG moves among region servers. | 2000 | ||||||||||
phoenix.index.failure.handling.rebuild | Boolean flag that turns on or off the automatic rebuild of a failed index when some updates fail to be applied to the index. | true | ||||||||||
phoenix.index.failure.handling.rebuild.interval | Time interval (in milliseconds) at which the background index rebuild job checks whether there is an index to be rebuilt. | 10000 | ||||||||||
phoenix.index.failure.handling.rebuild.overlap.time | The index rebuild job rebuilds an index starting from the time at which it failed minus this time interval (in milliseconds), creating a time overlap that prevents missing updates when clock skew exists. | 300000 | ||||||||||
phoenix.query.force.rowkeyorder | Whether or not a non aggregate query returns rows in row key order for salted tables. For versions prior to 4.4, use phoenix.query.rowKeyOrderSaltedTable instead. Default is true. | true | ||||||||||
phoenix.connection.autoCommit | Whether or not a new connection has auto-commit enabled when it is created. | false | ||||||||||
phoenix.table.default.store.nulls | The default value of the STORE_NULLS flag used for table creation which determines whether or not null values should be explicitly stored in HBase. This is a client side parameter. Available starting from Phoenix 4.3. | false | ||||||||||
phoenix.table.istransactional.default | The default value of the TRANSACTIONAL flag used for table creation which determines whether or not a table is transactional. This is a client side parameter. Available starting from Phoenix 4.7. | false | ||||||||||
phoenix.transactions.enabled | Determines whether or not transactions are enabled in Phoenix. A table may not be declared as transactional if transactions are disabled. This is a client side parameter. Available starting from Phoenix 4.7. | false | ||||||||||
phoenix.mapreduce.split.by.stats | Determines whether to use the splits determined by statistics for MapReduce input splits. Default is true. This is a server side parameter. Available starting from Phoenix 4.10. Set to false to enable behavior from previous versions. | true | ||||||||||
phoenix.log.level | Client-side property enabling query logging (SELECT statements only). The logs are written to the SYSTEM.LOG table, which requires the user to have W access on the SYSTEM.LOG table. WARNING: Enabling this feature may leak sensitive information to anyone who can access the SYSTEM.LOG table. | OFF | ||||||||||
phoenix.log.sample.rate | Client-side property controlling the probability of logging a query to the query log. Set to a value between 0.0 (no queries) and 1.0 (100% of queries). Available starting from Phoenix 4.14. | 1.0 |
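Some of the properties above depend on one another. As a hedged sketch, the server-side hbase-site.xml fragment below enables write blocking on index commit failure, which per the table requires that background index rebuild is enabled as well. The rebuild interval shown simply restates the documented default. Whether these settings suit a given cluster is an assumption to verify in your own environment.

```xml
<configuration>
  <!-- Rebuild a mutable index in the background after a commit failure (documented default: true) -->
  <property>
    <name>phoenix.index.failure.handling.rebuild</name>
    <value>true</value>
  </property>
  <!-- Disallow writes to the data table until the index has caught up (documented default: false) -->
  <property>
    <name>phoenix.index.failure.block.write</name>
    <value>true</value>
  </property>
  <!-- How often, in milliseconds, the server checks for indexes needing a partial rebuild -->
  <property>
    <name>phoenix.index.failure.handling.rebuild.interval</name>
    <value>10000</value>
  </property>
</configuration>
```

These properties apply only to indexes on mutable, non transactional tables, as noted in the table.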