TritonCPUConfig()

Generates a configuration for running Towhee pipelines with a Triton inference server on the CPU. See Towhee Pipeline in Triton for details.

TritonCPUConfig(num_instances_per_device=1, max_batch_size=None, batch_latency_micros=None, preferred_batch_size=None)

Parameters

  • num_instances_per_device - int

    • Number of model instances to run on the CPU.

    • The value defaults to 1, indicating that there is one model instance running on the CPU.

  • max_batch_size - int or None

    • A maximum batch size that the model in the pipeline supports for the types of batching that can be exploited by Triton. See Maximum Batch Size for details.

    • The value defaults to None, leaving Triton to generate the value.

  • batch_latency_micros - int or None

    • Latency for Triton to process the delivered batch, in microseconds.

    • The value defaults to None, leaving Triton to generate the value.

  • preferred_batch_size - list[int] or None

    • A list of batch sizes that Triton should attempt to create.

    • The value defaults to None, leaving Triton to generate the value.
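For orientation, these parameters correspond roughly to fields in a Triton model configuration (config.pbtxt). The fragment below is an assumed mapping based on Triton's documented options, not output generated by Towhee; in particular, the correspondence of batch_latency_micros to max_queue_delay_microseconds is an assumption.

```
# Hypothetical config.pbtxt fragment (assumed mapping, for illustration):
max_batch_size: 128                      # max_batch_size
instance_group [
  { count: 3, kind: KIND_CPU }           # num_instances_per_device
]
dynamic_batching {
  preferred_batch_size: [ 8, 16 ]        # preferred_batch_size
  max_queue_delay_microseconds: 100000   # batch_latency_micros (assumed)
}
```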

Returns

A TowheeConfig object whose server field is a dictionary containing the specified parameters and their values, with device_ids set to None.

Examples

from towhee import pipe, ops, AutoConfig

auto_config1 = AutoConfig.TritonCPUConfig()
auto_config1.config
# {'server': {'device_ids': None, 'num_instances_per_device': 1,
#             'max_batch_size': None, 'batch_latency_micros': None,
#             'triton': {'preferred_batch_size': None}}}

# You can also set the configuration explicitly:
auto_config2 = AutoConfig.TritonCPUConfig(num_instances_per_device=3,
                                          max_batch_size=128,
                                          batch_latency_micros=100000,
                                          preferred_batch_size=[8, 16])
auto_config2.config
# {'server': {'device_ids': None, 'num_instances_per_device': 3,
#             'max_batch_size': 128, 'batch_latency_micros': 100000,
#             'triton': {'preferred_batch_size': [8, 16]}}}

# Configurations can also be added together:
auto_config3 = AutoConfig.LocalCPUConfig() + AutoConfig.TritonCPUConfig()
auto_config3.config
# {'device': -1, 'server': {'device_ids': None, 'num_instances_per_device': 1,
#                           'max_batch_size': None, 'batch_latency_micros': None,
#                           'triton': {'preferred_batch_size': None}}}
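The additive behavior in the last example can be sketched as a plain dictionary merge. The sketch below assumes, based only on the outputs shown above, that LocalCPUConfig contributes {'device': -1} and TritonCPUConfig contributes the server dictionary; it illustrates the merge semantics and is not Towhee's actual implementation.

```python
# Illustrative sketch of config addition as a dict merge
# (not Towhee's actual implementation).

def merge_configs(*configs):
    """Merge config dicts left to right; later keys win on conflict."""
    merged = {}
    for cfg in configs:
        merged.update(cfg)
    return merged

# Values taken from the examples above.
local_cpu = {'device': -1}
triton_cpu = {'server': {'device_ids': None,
                         'num_instances_per_device': 1,
                         'max_batch_size': None,
                         'batch_latency_micros': None,
                         'triton': {'preferred_batch_size': None}}}

combined = merge_configs(local_cpu, triton_cpu)
# combined now holds both the 'device' key and the 'server' key,
# mirroring auto_config3.config above.
```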