WIP: Fix cross dependent path #5

Draft: wants to merge 31 commits into master

Conversation

@andreaskuster (Collaborator) commented Sep 11, 2021

  • check ComputeGraph: setup_internal_buffers
  • check KernelChainGraph: compute_kernel_latency
  • check KernelChainGraph: compute_delay_buffer
  • apply persistent fix & remove manual fix for horidiff

@andreaskuster self-assigned this Sep 11, 2021
@andreaskuster (Collaborator, Author) commented Sep 11, 2021

@definelicht

Are the op_latency values in stencilflow/compute_graph.config accurate, or do we have to make them dynamic, i.e. a function of the device type or IP block we invoke?

Furthermore, we model the pipelines as processing 1 element per cycle (one push/pop per cycle). Is this accurate in all cases?

@andreaskuster requested review from definelicht and removed the request September 11, 2021 19:56
@definelicht (Contributor) commented:

In practice they are dynamic, but I think our current approach of just estimating them conservatively is "good enough". The true buffer space consumption comes from having to buffer large delays, which will always be orders of magnitude bigger than a conservative estimate of the operation latency. We could even just estimate any arithmetic operation on floating point numbers to be 16 cycles and be done with it :-)
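
For illustration, a minimal sketch of this conservative estimation, assuming a simple per-operation latency table; the operation names, values, and the estimate_kernel_latency helper are hypothetical, and the real estimates live in stencilflow/compute_graph.config:

```python
# Hypothetical sketch of conservative per-operation latency estimation.
# Operation names, values, and the helper are illustrative assumptions;
# the real estimates live in stencilflow/compute_graph.config.

CONSERVATIVE_OP_LATENCY = {"add": 16, "sub": 16, "mult": 16}
DEFAULT_FP_LATENCY = 16  # flat guess for "any arithmetic operation on floating point numbers"

def estimate_kernel_latency(ops):
    """Sum conservative latencies along a kernel's chain of operations."""
    return sum(CONSERVATIVE_OP_LATENCY.get(op, DEFAULT_FP_LATENCY) for op in ops)

print(estimate_kernel_latency(["mult", "mult", "add"]))  # 48 cycles
```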

@definelicht (Contributor) left a comment:

What did you want me to review here? I could only find some debugging code

@andreaskuster (Collaborator, Author) commented:

> In practice they are dynamic, but I think our current approach of just estimating them conservatively is "good enough". The true buffer space consumption comes from having to buffer large delays, which will always be orders of magnitude bigger than a conservative estimate of the operation latency. We could even just estimate any arithmetic operation on floating point numbers to be 16 cycles and be done with it :-)

Well, that might be an issue, since we have several paths: setting the arithmetic operation latency too high might create interlocks, because the delay buffer of the other path would then be over-estimated (i.e. its size set too high).

andreaskuster and others added 2 commits September 13, 2021 20:34
Co-authored-by: definelicht <definelicht@inf.ethz.ch>
Co-authored-by: definelicht <definelicht@inf.ethz.ch>
@andreaskuster (Collaborator, Author) commented:

> What did you want me to review here? I could only find some debugging code

Sorry about that: it is still WIP. I pushed the review button by mistake and removed the request immediately, but I guess the email notification had already been sent out.

@definelicht (Contributor) commented Sep 14, 2021

> Well, that might be an issue, since we have several paths: setting the arithmetic operation latency too high might create interlocks, because the delay buffer of the other path would then be over-estimated (i.e. its size set too high).

I don't think I understand why this is an issue. As long as we use FIFOs (instead of, say, fixed depth shift registers), isn't it fine if they're too big? Then they will just never use their full capacity.

Adjust delay buffer computation.
@andreaskuster (Collaborator, Author) commented:

@definelicht FYI: the dace submodule link on master is invalid, i.e. it returns a 404.

@definelicht (Contributor) commented:

> @definelicht FYI: the dace submodule link on master is invalid, i.e. it returns a 404.

Fixed

@andreaskuster (Collaborator, Author) commented:

Furthermore, we have mpi4py as a Python dependency, but we do not check whether the MPI development headers are installed (for me, mpi.h is missing on fpga1).

@definelicht (Contributor) commented:

> Furthermore, we have mpi4py as a Python dependency, but we do not check whether the MPI development headers are installed (for me, mpi.h is missing on fpga1).

This is okay for me, but perhaps we can make it optional so it only fails if trying to run something distributed?

@andreaskuster (Collaborator, Author) commented:

> Furthermore, we have mpi4py as a Python dependency, but we do not check whether the MPI development headers are installed (for me, mpi.h is missing on fpga1).

> This is okay for me, but perhaps we can make it optional so it only fails if trying to run something distributed?

Yep, that makes sense
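
For illustration, a minimal sketch of the deferred-import idea, assuming a hypothetical run_distributed entry point; this is not the project's actual code:

```python
# Hypothetical sketch: require mpi4py (and hence mpi.h / an MPI installation)
# only when a distributed run is actually requested.

def run_distributed(program):
    try:
        from mpi4py import MPI  # deferred import: fails here, not at module load time
    except ImportError as err:
        raise RuntimeError(
            "Distributed execution requires mpi4py and the MPI development "
            "headers (mpi.h); install them or run in single-node mode."
        ) from err
    comm = MPI.COMM_WORLD
    print(f"rank {comm.Get_rank()} of {comm.Get_size()} running {program}")
    # ... the actual distributed execution would go here ...
```

With this pattern, importing the package and running single-node emulation never touch MPI at all; only a distributed invocation can fail with the missing-header error.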

@andreaskuster (Collaborator, Author) commented:

> Well, that might be an issue, since we have several paths: setting the arithmetic operation latency too high might create interlocks, because the delay buffer of the other path would then be over-estimated (i.e. its size set too high).

> I don't think I understand why this is an issue. As long as we use FIFOs (instead of, say, fixed depth shift registers), isn't it fine if they're too big? Then they will just never use their full capacity.

To my understanding, the example below might stall in both cases, i.e. if the delay buffer is too small, but also if it is too big, right?

[Attached image: signal-2021-09-14-145557_001]

@definelicht (Contributor) commented:

> To my understanding, the example below might stall in both cases, i.e. if the delay buffer is too small, but also if it is too big, right?

The size of the delay buffer should have no influence on when the data arrives from c to k3. These are just FIFOs, so they can be read early. k3 will start consuming as soon as it has data available on both inputs, regardless of the size of the delay buffer. How did you calculate the 2048-cycle arrival at k3?
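
For illustration, a toy Python model (not StencilFlow code) of this point: an oversized FIFO hands its first element to the consumer as soon as it is produced, while a fixed-depth shift register delays every element by exactly its depth.

```python
# Toy illustration: FIFO capacity bounds occupancy, not latency.
from collections import deque

def fifo_first_arrival(capacity):
    """Cycle at which the consumer first pops from a FIFO of the given capacity."""
    fifo = deque(maxlen=capacity)
    for cycle in range(10_000):
        fifo.append(cycle)   # producer pushes one element per cycle
        if fifo:             # consumer pops as soon as anything is available
            fifo.popleft()
            return cycle

def shift_register_first_arrival(depth):
    """Cycle at which the first valid element exits a fixed-depth shift register."""
    reg = deque([None] * depth, maxlen=depth)
    for cycle in range(10_000):
        out = reg[0]
        reg.append(cycle)    # every element spends exactly `depth` cycles inside
        if out is not None:
            return cycle

print(fifo_first_arrival(16), fifo_first_arrival(2048))                      # 0 0
print(shift_register_first_arrival(16), shift_register_first_arrival(2048))  # 16 2048
```

In other words, over-provisioning a FIFO costs memory but never adds cycles.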

@andreaskuster (Collaborator, Author) commented:

> To my understanding, the example below might stall in both cases, i.e. if the delay buffer is too small, but also if it is too big, right?

> The size of the delay buffer should have no influence on when the data arrives from c to k3. These are just FIFOs, so they can be read early. k3 will start consuming as soon as it has data available on both inputs, regardless of the size of the delay buffer. How did you calculate the 2048-cycle arrival at k3?

OK, yes, you are right, but we are basically wasting memory.

@definelicht (Contributor) commented:

> OK, yes, you are right, but we are basically wasting memory.

Yep, this is the "safe" solution. Keep in mind that the buffer sizes from this will be tiny compared to buffering a slice of the 2D domain, so I'm not concerned about this being a significant factor :-)
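
For a rough sense of scale, a back-of-envelope comparison with hypothetical numbers (domain size, latency estimate, and operation count are assumptions, not measurements):

```python
# Hypothetical back-of-envelope: delay buffers that absorb conservative
# operation latencies vs. the row buffers needed to cover a stencil's extent.

nx, ny = 4096, 4096      # assumed 2D domain
op_latency = 16          # conservative per-operation latency estimate
ops_on_path = 8          # assumed number of operations along a path

delay_buffer_elems = ops_on_path * op_latency  # 128 elements
row_buffer_elems = nx                          # one row of the domain: 4096 elements

print(delay_buffer_elems, row_buffer_elems)    # 128 4096
```

Even with these modest assumptions, buffering a slice of the domain dominates the latency-compensation buffers.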

@andreaskuster (Collaborator, Author) commented:

```
Compiling reference SDFG...
Loading input arrays...
Initializing output arrays...
Executing DaCe program...
Finished running program.
Executing reference DaCe program...
Finished running program.
Results saved to results/horidiff
Reference results saved to results/horidiff/reference
Comparing to reference SDFG...
Results verified.
bin/run_program.py test/stencils/horidiff.json emulation -compare-to-referenc  47.84s user 11.63s system 214% cpu 27.689 total
```

It seems to be fixed; please reproduce this finding before we merge :)

@andreaskuster (Collaborator, Author) commented:

> [emulation output quoted above]
>
> It seems to be fixed; please reproduce this finding before we merge :)

This is for a minimum channel depth of 1024.
