Troubleshoot | A visual recipe job log says “Computation will not be distributed”
In a visual recipe, two common causes of slow performance are an inefficient execution engine and a data format that prevents the optimal execution engine from being used.
Suboptimal dataset formats
Another way to tell if a visual recipe is not optimized is by looking in the job log for any reference to “Computation will not be distributed.” That’s an indicator that there is something suboptimal in your input/output dataset format, the engine you’ve selected, or the permissions on the input/output dataset connection.
For example, using the fast-path when writing to an S3 CSV dataset requires that the output dataset does not have a header row configured. If the output S3 CSV dataset does have one, the job log reports that the fast-path cannot be used and that the computation will not be distributed, which can degrade performance:
[2022/01/21-17:47:35.980] [null-err-43] [INFO] [dku.utils] - [2022/01/21-17:47:35.978] Cannot use Csv write fast-path for Csv-S3 dataset: Csv fast-path output is disabled in configuration
[2022/01/21-17:47:35.982] [null-err-43] [INFO] [dku.utils] - [2022/01/21-17:47:35.978] Writing S3 dataset as remote dataframe. Computation will not be distributed
In each of the above cases, it’s usually best to modify your Flow in a way that will allow you to use the fast-path and preferred engine.
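If you have the job log saved as plain text, a quick scan for these messages can be scripted. The following is a minimal sketch rather than part of the product: the function name `find_engine_warnings` and the `MARKERS` tuple are hypothetical, and the two phrases are simply copied from the log excerpt above. It does not call any Dataiku API; it only searches text you supply.

```python
# Phrases copied from the job log excerpt above (assumption: other
# recipes or connections may log different wording, so extend as needed).
MARKERS = (
    "Computation will not be distributed",
    "Cannot use Csv write fast-path",
)


def find_engine_warnings(log_text):
    """Return (line_number, marker, line) for each log line containing a marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        for marker in MARKERS:
            if marker in line:
                hits.append((number, marker, line.strip()))
    return hits


if __name__ == "__main__":
    # Sample text built from the two log entries shown in this article.
    sample = (
        "[2022/01/21-17:47:35.980] [null-err-43] [INFO] [dku.utils] - "
        "Cannot use Csv write fast-path for Csv-S3 dataset: "
        "Csv fast-path output is disabled in configuration\n"
        "[2022/01/21-17:47:35.982] [null-err-43] [INFO] [dku.utils] - "
        "Writing S3 dataset as remote dataframe. "
        "Computation will not be distributed\n"
    )
    for number, marker, line in find_engine_warnings(sample):
        print(f"line {number}: found '{marker}'")
```

Any hit suggests checking the dataset format, the selected engine, and the connection permissions for that recipe.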