Default for SupervisedTrainer slows down training almost 2x #8541

Description

@id-b3

decollate: whether to decollate the batch-first data to a list of data after model computation,
recommend `decollate=True` when `postprocessing` uses components from `monai.transforms`.
default to `True`.

The default behaviour of the SupervisedTrainer slows down each training step considerably. The cause of the slowdown is not apparent until each step is profiled. With larger batch sizes, the delay from the default decollation step grows, as more tensors have to move between GPU and CPU.
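To make the cost concrete, here is a simplified, pure-Python sketch of what decollating a batch-first dictionary involves. It is only an illustration, not MONAI's implementation: `decollate_sketch` is a hypothetical helper, and nested lists stand in for tensors. The point is that the work is done once per sample, so it scales with batch size.

```python
from typing import Any, Dict, List


def decollate_sketch(batch: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Split a batch-first dict into a list with one dict per sample.

    Hypothetical stand-in for the decollation step; lists replace tensors.
    """
    # The batch size is the length of any batched field.
    batch_size = len(next(iter(batch.values())))
    # One new dict per sample, so the work grows linearly with batch size.
    return [
        {key: values[i] for key, values in batch.items()}
        for i in range(batch_size)
    ]


batch = {"image": [[0.1, 0.2], [0.3, 0.4]], "label": [0, 1]}
items = decollate_sketch(batch)
```

In a real training loop each of those per-sample entries is a tensor, which is where the extra time per step would come from.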

[Image: flame chart] This flame chart shows two training steps, each consisting of a forward pass, a backward pass, and the decollate.

It may be better to change the default for `decollate` in the SupervisedTrainer to `False`, or to add a warning that the default behaviour can significantly increase training time (in our case, disabling decollation gave a 2x speed-up, from 4 days to 2 days).
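Until the default changes, the workaround is to pass `decollate=False` explicitly. A minimal sketch follows, assuming MONAI is installed and that the trainer is imported from `monai.engines`. The `build_trainer` helper and its argument names are hypothetical and exist only for illustration; `decollate` is the parameter described in the docstring quoted above.

```python
def build_trainer(network, optimizer, loss_function, train_loader, device,
                  max_epochs=10, decollate=False):
    """Create a SupervisedTrainer with decollation disabled by default.

    Hypothetical helper: only the `decollate` flag is the point here.
    """
    # Imported lazily so this module loads even where MONAI is not installed.
    from monai.engines import SupervisedTrainer

    return SupervisedTrainer(
        device=device,
        max_epochs=max_epochs,
        train_data_loader=train_loader,
        network=network,
        optimizer=optimizer,
        loss_function=loss_function,
        decollate=decollate,  # MONAI's own default is True
    )
```

Note that the docstring recommends `decollate=True` when `postprocessing` uses components from `monai.transforms`, so this only fits setups that do not rely on that.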
