Ubuntu support #789

Open
natewise opened this issue May 11, 2022 · 19 comments

Comments

@natewise

For reference, I'm using the pod_X1Y1_ruche_X16Y8_hbm machine and the bigblade-verilator platform.

Here is a screenshot of the error I'm getting:
[image]

Previous commands had the include -I/mnt/c/Projects/bsg_bladerunner/verilator/include/vltstd, which is where this svdpi.h is located:
[image]
But for some reason the command causing the issue doesn't have this include, which would seem to be the cause of the error.

I'm not entirely sure how to move forward in solving this issue since it seems to be caused by something internally, so I would appreciate any help. Thanks!

@drichmond
Collaborator

Hi @natewise, can you try this PR? #787

@natewise
Author

Do you mean re-trying the command while in the vcs-verilator-svdpi-fix branch? I gave that a go but make regression still failed.

/mnt/c/Projects/bsg_bladerunner/bsg_replicant/libraries/bsg_manycore_features.h:34:2: error: #error "_BSD_SOURCE not defined: required for bsg_manycore_runtime"
34 | #error "_BSD_SOURCE not defined: required for bsg_manycore_runtime"

@drichmond
Collaborator

Ah, you're on Ubuntu

@natewise
Author

Well WSL 2 technically, but yes Ubuntu xD

@drichmond
Collaborator

We haven't run through the steps on Ubuntu in a while; it's on our to-do list. @dpetrisko was going to take a look last week, but I know he's busy.

To solve the error above, the following diff will work:

diff --git a/libraries/bsg_manycore_features.h b/libraries/bsg_manycore_features.h
index 271a4b66..f7b21730 100644
--- a/libraries/bsg_manycore_features.h
+++ b/libraries/bsg_manycore_features.h
@@ -29,10 +29,12 @@
 #define BSG_MANYCORE_FEATURES_H
 // <features.h> sorts out many of these defines based on compile time flags (e.g. -std=c++11)
 #include <features.h>
-// check _BSG_SOURCE
+// check _BSD_SOURCE
 #ifndef _BSD_SOURCE
+#ifndef _DEFAULT_SOURCE
 #error "_BSD_SOURCE not defined: required for bsg_manycore_runtime"
 #endif
+#endif

@drichmond
Collaborator

The next issue you will probably run into is a linking error with zlib1g-dev, which I haven't solved yet. I think the link flags are in the wrong order. This is a workaround, for now:

diff --git a/libraries/platforms/bigblade-verilator/link.mk b/libraries/platforms/bigblade-verilator/link.mk
index 4b16d1c0..8db5b8d6 100644
--- a/libraries/platforms/bigblade-verilator/link.mk
+++ b/libraries/platforms/bigblade-verilator/link.mk
@@ -207,7 +207,6 @@ $(SIMSCS): %/simsc : %/bsg_manycore_simulator.o %/V$(BSG_DESIGN_TOP)__ALL.a
 # regression tests can build them before launching parallel
 # compilation and execution
 REGRESSION_PREBUILD += $(BSG_MACHINExPLATFORM_PATH)/exec/simsc
-REGRESSION_PREBUILD += $(BSG_MACHINExPLATFORM_PATH)/debug/simsc
 REGRESSION_PREBUILD += $(BSG_MACHINExPLATFORM_PATH)/profile/simsc
 REGRESSION_PREBUILD += $(BSG_PLATFORM_PATH)/libbsgmc_cuda_legacy_pod_repl.so
 REGRESSION_PREBUILD += $(BSG_PLATFORM_PATH)/libbsg_manycore_runtime.so

After that, the |& notation used in our scripts isn't supported by Ubuntu's default shell (dash, which is /bin/sh there). You can rewrite it as 2>&1 | (that is, cmd |& other becomes cmd 2>&1 | other).
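
A minimal sketch of that rewrite, assuming a POSIX shell such as dash. Here noisy_step is a hypothetical stand-in for a build step that writes to both stdout and stderr; it is not a command from the repository.

```shell
# Hypothetical stand-in for a build step that writes to both streams.
noisy_step() {
    echo "compiling"            # goes to stdout
    echo "warning: demo" >&2    # goes to stderr
}

# bash-only shorthand (a syntax error under dash):
#   noisy_step |& grep -c .
# Portable equivalent: merge stderr into stdout first, then pipe.
merged_lines=$(noisy_step 2>&1 | grep -c .)
echo "lines captured: $merged_lines"
```

Both lines reach grep only because stderr is redirected before the pipe; with a plain | the warning would bypass grep and go straight to the terminal.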

Last week when we went down this path, the executable segfaulted. We haven't had time to look into that yet, so your mileage may vary.

(We use CentOS 7 for our development, but I've been meaning to try on Ubuntu for a while. Haven't had time.)

@natewise
Author

I made all those changes and am running the command again. If all goes well, I'll close out this issue; otherwise I will send another comment. Thank you for your prompt responses, they were very much appreciated and helpful!

@drichmond
Collaborator

If they work, keep this issue open, but retitle to Ubuntu support

@natewise
Author

So I ran the command overnight, and it looks like my computer crashed sometime during the night. I just reran the command, and it looks like those steps got me much further, but it still ended in an error:
[image]

@natewise changed the title from "make regression issue" to "Ubuntu support" on May 12, 2022

@drichmond
Collaborator

It's likely because of how our regression works. It searches for a success message using grep and if it doesn't find it, it fails.

In other words, it is probably searching the stale log left in test_vcache_flush from when your computer crashed. Run make clean in that directory and try again.
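
To illustrate why a stale log fails the check, here is a rough sketch of a grep-based pass/fail test. The log names and the DEMO_SUCCESS marker are hypothetical; the real regression scripts may use different files and a different success string.

```shell
# Report PASS only if the success marker appears in the log.
# Both arguments are illustrative, not the names the real scripts use.
check_log() {
    log_file=$1
    marker=$2
    # -q: print nothing, exit status 0 only when the marker is found
    if grep -q -- "$marker" "$log_file" 2>/dev/null; then
        echo PASS
    else
        echo FAIL
    fi
}

tmp_dir=$(mktemp -d)
# A log from an interrupted run never reached the success message.
printf 'starting test\n' > "$tmp_dir/stale.log"
# A log from a completed run ends with it.
printf 'starting test\nDEMO_SUCCESS\n' > "$tmp_dir/fresh.log"

stale_result=$(check_log "$tmp_dir/stale.log" "DEMO_SUCCESS")
fresh_result=$(check_log "$tmp_dir/fresh.log" "DEMO_SUCCESS")
echo "stale: $stale_result, fresh: $fresh_result"
rm -rf "$tmp_dir"
```

If the build system considers the stale log up to date, it will not regenerate it, so the check keeps failing until the log is removed.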

You might also consider running the pod_X1Y1_ruche_X8Y4_hbm machine. It is faster to compile and simulate.

@natewise
Author

I went ahead and switched to the pod_X1Y1_ruche_X8Y4_hbm machine, ran make clean, and then reran make regression (it takes forever!), but unfortunately:
[image]

@drichmond
Collaborator

You can run make regression -jN, where N is the number of parallel jobs (some reasonable number for your machine).
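
One way to pick N, as a sketch: ask nproc (part of GNU coreutils) for the number of available cores. The fallback of 2 is an arbitrary assumption, and on a memory-limited setup a smaller value than the core count may be safer.

```shell
# Use the core count if nproc is available; otherwise fall back to 2.
n_jobs=$(nproc 2>/dev/null || echo 2)

# Print the command rather than running it, so this snippet is safe anywhere.
echo "make regression -j$n_jobs"
```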

Verilator is a lot slower than VCS at the moment. There are optimizations, but we haven't explored them due to lack of resources.

Ignore this failure for now. I need to look into this, and it should not be related to Verilator. I would say that your system is working.

If you want more assurance, try running make regression from inside of the examples/cuda directory.

@dpetrisko
Collaborator

> it should not be related to Verilator.

Presumably it's related to 8x4? It is concerning that it is a hardware assertion and not a test failure, though.

@drichmond
Collaborator

I don't think it's related to 8x4. I think it's related to the icache, which now has to be written in series (and in blocks of 4 words). This test was likely not updated and we somehow didn't catch it.

@dpetrisko
Collaborator

Oh I see, this test may be manually writing I$ incorrectly and that would trigger the assertion. I was concerned because I thought it was a demand fill coming back out of order

@drichmond
Collaborator

Yeah, it's failing in VCS but not causing simulation to terminate. The test still passes. Will look into it.

@natewise
Author

Unfortunately neither command worked for me.

Here is the output of make regression -j4:
[image]

And here is the output of make regression in examples/cuda:
[image]

@drichmond
Collaborator

I bet the latter triggered the former, or there is probable cause.

In the end, I would call your installation working. My impression is there are some issues in our code that don't show up on our system so we'll have to spin up Ubuntu and iron them out there.

But you should be safe to develop.

However, if you're willing to poke a bit, can you comment out this line and re-run test_binary_load_buffer?

@drichmond
Collaborator

> I went ahead and switched to the pod_X1Y1_ruche_X8Y4_hbm machine, did make clean and then reran make regression (it takes forever!), but unfortunately: [image]

This issue is related to padding in the final RISC-V executable. I have diagnosed the issue and it should not show up in normal development. It does not cause VCS to fail.

(Breadcrumb for anyone following along: look at the output of nm in bsg_manycore/software/spmd/bsg_loader_suite/loopback_big_text. The text section is not aligned to 4 words/16 bytes. In part, this is because of the asm(".zero 4192"). With asm(".zero 4188") it passes.)
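
For anyone checking addresses from nm by hand, a small sketch of a 16-byte alignment test in shell arithmetic. The addresses below are made up for illustration and are not taken from the real binary. The two padding sizes above differ by one 4-byte word, so changing the padding shifts everything after it by one word; whether the result lands on a 16-byte boundary depends on the rest of the layout, which is not shown here.

```shell
# Succeeds (exit status 0) if the byte address is a multiple of 16.
is_aligned16() {
    [ $(( $1 % 16 )) -eq 0 ]
}

# Hypothetical addresses, written in hex the way nm prints them (with 0x added).
for addr in 0x2000 0x2004; do
    if is_aligned16 "$addr"; then
        echo "$addr aligned"
    else
        echo "$addr misaligned by $(( $addr % 16 )) bytes"
    fi
done
```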
