rust non-determinism

Richard Purdie richard.purdie at linuxfoundation.org
Tue Aug 6 08:03:38 UTC 2024


On Tue, 2024-08-06 at 09:06 +0200, Bernhard M. Wiedemann via rb-general wrote:
> On 05/08/2024 18.37, John Gilmore wrote:
> > Bernhard M. Wiedemann <bernhardout at lsmod.de> wrote:
> > > => https://github.com/rust-lang/rust/issues/128675
> > 
> > Two Rustc developers closed it within 8 hours as "already completed",
> > even though it isn't.
> > 
> > They also said "CGU partitioning is very deliberately designed to be
> > deterministic."  Implying that therefore there is no bug, because design
> > and implementation are the same thing?
> > 
> > Rust has 36 open reproducibility bugs (not including this closed one).
> > It'd be worth seeing what other ones they closed unfixed.
> 
> my workaround for openSUSE at
> https://github.com/Firstyear/cargo-packaging/pull/11 was also rejected.
> 
> It is a complex ecosystem. pop-launcher pulls in 258 modules and many
> have their own build.rs that could break reproducibility. There are
> also `crate` and `just` as build tools that can cause issues.
> In the past there were also LLVM bugs that caused non-determinism.
> 
> So it is often hard to pin-point the source of non-determinism with 
> precision and confidence.
> 
> And it does not help that rust HashMaps have non-deterministic order by 
> default.
> 
> 
> I'm getting closer to thinking that llvm's LTO and rust's
> codegen-units=16 might be deterministic by themselves, but (similar to
> PGO[1]) amplify other sources of non-determinism, which makes it harder
> to debug.
> 
> If that is indeed the case, we can disable them during debugging r-b but 
> leave them on when the package-specific issue is fixed.
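
To illustrate the HashMap point quoted above, here is a minimal sketch
(not taken from any of the packages mentioned; the function name
`render_sorted` and the sample keys are made up). It assumes the
non-determinism comes from iterating a std HashMap directly, and shows
one common way around it: go through a BTreeMap before emitting output.

```rust
use std::collections::{BTreeMap, HashMap};

/// Render "key=value" pairs in sorted key order, independent of the
/// HashMap's internal (randomly seeded) iteration order.
fn render_sorted(map: &HashMap<String, u32>) -> String {
    // A BTreeMap always iterates in key order.
    let sorted: BTreeMap<&String, &u32> = map.iter().collect();
    sorted
        .iter()
        .map(|(k, v)| format!("{k}={v}"))
        .collect::<Vec<_>>()
        .join(",")
}

fn main() {
    let mut features = HashMap::new();
    for (i, name) in ["serde", "alloc", "std", "derive"].iter().enumerate() {
        features.insert(name.to_string(), i as u32);
    }

    // Direct iteration may differ between runs, because the default
    // hasher is seeded randomly per process.
    let unordered: Vec<&String> = features.keys().collect();
    println!("hash order:   {:?}", unordered);

    // The sorted rendering is the same on every run.
    println!("sorted order: {}", render_sorted(&features));
}
```

A build.rs that writes generated code or file lists from a HashMap
would be one place where this kind of ordering could leak into the
output.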

You might find this interesting:

https://git.yoctoproject.org/poky/commit/?id=e2e7017350d0b5324811fef3b841f98b00273887

Yocto Project has been chasing a problem where rust itself was
reproducible but rustdoc was not. We found it only happened if the
builds were in paths of different lengths. We had a pid in the build
path, so this sometimes happened and sometimes did not; we've fixed
things so the paths are always different lengths now.

We found we had to both disable LTO and use codegen-units=1 to make
things always reproduce. It took us months to work that out!
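
As a rough sketch of the same two settings for an ordinary Cargo
project (this is not the Yocto change itself, which is in the commit
linked above), the snippet below builds a `cargo build --release`
command with LTO off and a single codegen unit. It assumes Cargo's
CARGO_PROFILE_<name>_<key> environment overrides are the relevant
knobs; the helper name `conservative_cargo_build` is made up.

```rust
use std::process::Command;

/// Prepare `cargo build --release` with LTO disabled and one codegen
/// unit. The command is only constructed here, not executed.
fn conservative_cargo_build() -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["build", "--release"])
        // Cargo reads profile overrides from CARGO_PROFILE_<NAME>_<KEY>.
        .env("CARGO_PROFILE_RELEASE_LTO", "off")
        .env("CARGO_PROFILE_RELEASE_CODEGEN_UNITS", "1");
    cmd
}

fn main() {
    let cmd = conservative_cargo_build();
    // Print the overrides instead of running the build, so the sketch
    // works outside a Cargo project.
    for (key, value) in cmd.get_envs() {
        let value = value
            .map(|v| v.to_string_lossy().into_owned())
            .unwrap_or_default();
        println!("{}={}", key.to_string_lossy(), value);
    }
}
```

Calling `.status()` on the returned command inside a project directory
would run the build with those overrides.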

We're not sure exactly where the bug is, but it does mean we have a
working workaround for now.

Cheers,

Richard




