DALEQ: An Open-Source Tool for Assessing Java Binary Equivalence
Justin Cappos
jc4946 at nyu.edu
Fri Aug 8 12:40:17 UTC 2025
Wow, this is excellent to hear about! It looks like we have something fun
to dig into this weekend!
Thanks,
Justin
On Fri, Aug 8, 2025 at 5:03 AM Jens Dietrich via rb-general <
rb-general at lists.reproducible-builds.org> wrote:
> Introducing DALEQ: An Open-Source Tool for Assessing Java Binary
> Equivalence
>
> We’re excited to announce the release of DALEQ — a new open-source tool
> for analyzing and comparing Java binaries. DALEQ is designed to help
> developers, security researchers, and build engineers assess whether two
> .jar files built from the same source code are semantically equivalent,
> even when they’re not bitwise identical. This is particularly useful for
> comparing jars from Maven Central and jars produced via reproducible
> builds, or generated by services like Oracle’s build-from-source or
> Google’s Assured OSS. Although tools like diff or hash-based checks can
> detect binary differences, they don’t explain why binaries differ, or
> whether those differences matter. Bytecode-level differences can be caused
> by changes in compilers or build pipelines — not necessarily by compromised
> builds. DALEQ helps distinguish harmless variation from meaningful
> divergence.
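
For readers who want to see what a hash-based check does and does not tell you, here is a minimal sketch in Python. It is not part of DALEQ; the helper names (`make_jar`, `entry_digests`, `differing_entries`) are made up for illustration. It reports which jar entries differ bitwise, and nothing about why they differ or whether the difference matters.

```python
from __future__ import annotations

import hashlib
import io
import zipfile


def make_jar(entries: dict[str, bytes]) -> bytes:
    """Build an in-memory jar (a zip archive) from name -> content pairs.

    Hypothetical helper, used here only to create demo inputs.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for name, data in entries.items():
            jar.writestr(name, data)
    return buf.getvalue()


def entry_digests(jar_bytes: bytes) -> dict[str, str]:
    """Map each non-directory entry name to the SHA-256 of its content."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return {
            info.filename: hashlib.sha256(jar.read(info)).hexdigest()
            for info in jar.infolist()
            if not info.is_dir()
        }


def differing_entries(jar_a: bytes, jar_b: bytes) -> list[str]:
    """Return sorted names of entries that are missing on one side
    or whose content hashes differ."""
    a, b = entry_digests(jar_a), entry_digests(jar_b)
    return sorted(n for n in a.keys() | b.keys() if a.get(n) != b.get(n))
```

Hashing per entry rather than hashing the whole jar at least narrows a mismatch down to individual files, but the result is still only a yes/no per file.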
>
> How DALEQ Works
>
> DALEQ focuses on Java bytecode comparison, though it can also analyze
> resources and metadata in jars. At its core, DALEQ uses a datalog engine
> (Soufflé) — the same kind of logic-based analysis engine used in systems
> like CodeQL — to normalize and compare bytecode structures. Key features
> include:
>
> - Bytecode normalization to reduce irrelevant build differences
> - Semantic diffing that identifies and explains non-equivalent instructions
> - Provenance tracking: for equivalent files, DALEQ shows how equivalence
> was derived via datalog rules; for non-equivalent files, it provides
> bytecode-level diffs
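
The general idea of normalizing before comparing can be sketched with a toy example in Python. This is an assumption-laden illustration, not DALEQ's actual rules, which the announcement says are expressed in datalog and run on Soufflé. The `Instruction` type, the simplified constant pools, and the choice of constant-pool numbering as the "irrelevant" difference are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Instruction(NamedTuple):
    """Toy bytecode instruction: an opcode plus an optional constant-pool index."""
    opcode: str
    cp_index: Optional[int] = None


def normalize(code: List[Instruction],
              pool: Dict[int, str]) -> List[Tuple[str, Optional[str]]]:
    """Replace each constant-pool index with the value it refers to,
    so that the numbering of pool entries no longer affects comparison."""
    return [
        (ins.opcode, pool[ins.cp_index] if ins.cp_index is not None else None)
        for ins in code
    ]


def equivalent(code_a: List[Instruction], pool_a: Dict[int, str],
               code_b: List[Instruction], pool_b: Dict[int, str]) -> bool:
    """Two methods count as equivalent here if their normalized forms match."""
    return normalize(code_a, pool_a) == normalize(code_b, pool_b)


# Two hypothetical builds of the same method whose constant pools
# happen to be numbered differently.
PRINTLN = "java/io/PrintStream.println:(Ljava/lang/String;)V"
pool_a = {2: PRINTLN, 3: "hello"}
code_a = [Instruction("ldc", 3), Instruction("invokevirtual", 2), Instruction("return")]
pool_b = {2: "hello", 3: PRINTLN}
code_b = [Instruction("ldc", 2), Instruction("invokevirtual", 3), Instruction("return")]
```

Here `code_a` and `code_b` differ as raw instruction lists, yet `equivalent(code_a, pool_a, code_b, pool_b)` holds because both resolve to the same constants. A real tool would also need to record which rule justified each step, which is the provenance aspect described above.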
>
> DALEQ also verifies whether the underlying source code inputs are the same
> (or at least equivalent, tolerating some variations in comments and
> formatting) and includes integrations with existing tools like the standard
> javap disassembler. It supports extensibility through a plugin system.
>
> Real-World Evaluation
>
> DALEQ builds on our earlier research into levels of binary equivalence. We
> evaluated the tool using real-world .jar files from Oracle and Google, both
> of whom independently rebuild Java packages from source. The results are
> encouraging: DALEQ was able to classify 85–90% of .class files that were
> not bitwise identical as still being semantically equivalent, with
> supporting provenance.
>
> Learn More
>
> You can try out DALEQ now on GitHub: https://github.com/binaryeq/daleq/
> A detailed technical paper describing DALEQ and our evaluation:
> https://arxiv.org/abs/2508.01530
> A technical paper describing the conceptual approach of levels of binary
> equivalence: https://arxiv.org/abs/2410.08427 (to be presented at ICSME’25
> <https://conf.researchr.org/home/icsme-2025>)
>
>
> Jens Dietrich (Associate Professor at Victoria University of Wellington)
>
> Behnaz Hassanshahi (Principal Researcher and Tech Lead at Oracle, Oracle
> Labs Brisbane)
>