This pull request introduces workflows and updates to ensure reproducible builds for the Lighthouse project. It adds two GitHub Actions workflows for building and testing reproducible Docker images and binaries, updates the `Makefile` to streamline reproducible build configurations, and modifies the `Dockerfile.reproducible` to align with the new build process. Additionally, it removes the `reproducible` profile from `Cargo.toml`.
### New GitHub Actions Workflows:
* [`.github/workflows/docker-reproducible.yml`](diffhunk://#diff-222af23bee616920b04f5b92a83eb5106fce08abd885cd3a3b15b8beb5e789c3R1-R145): Adds a workflow to build and push reproducible multi-architecture Docker images for releases, including support for dry runs without pushing an image.
### Build Configuration Updates:
* [`Makefile`](diffhunk://#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52L85-R143): Refactors reproducible build targets, centralizes environment variables for reproducibility, and updates Docker build arguments for `x86_64` and `aarch64` architectures.
* [`Dockerfile.reproducible`](diffhunk://#diff-587298ff141278ce3be7c54a559f9f31472cc5b384e285e2105b3dee319ba31dL1-R24): Updates the base Rust image to version 1.86, removes hardcoded reproducibility settings, and delegates build logic to the `Makefile`.
* Switch to using jemalloc-sys from Debian repos instead of building it from source. A Debian version is [reproducible](https://tests.reproducible-builds.org/debian/rb-pkg/trixie/amd64/jemalloc.html) which is [hard to achieve](https://github.com/NixOS/nixpkgs/issues/380852) if you build it from source.
### Profile Removal:
* [`Cargo.toml`](diffhunk://#diff-2e9d962a08321605940b5a657135052fbcef87b5e360662bb527c96d9a615542L289-L295): Removes the `reproducible` profile, simplifying build configurations and relying on external tooling for reproducibility.
Co-Authored-By: Moe Mahhouk <mohammed-mahhouk@hotmail.com>
Co-Authored-By: chonghe <44791194+chong-he@users.noreply.github.com>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
Follow-up to:
- https://github.com/sigp/lighthouse/pull/7764
The `heaptrack` feature added in my previous PR was ineffective, because the jemalloc feature was turned on by the Linux target-specific dependency.
This PR tweaks the features such that:
- The jemalloc feature is just used to control whether jemalloc is compiled in. It is enabled on Linux by the target-specific dependency (see `lighthouse/Cargo.toml`), and completely disabled on Windows.
- If the `sysmalloc` feature is set on Linux then it overrides jemalloc when selecting an allocator, _even if_ the jemalloc feature is enabled (and the jemalloc dep was compiled).
NA
Bumps the `ethereum_ssz` version, along with other crates that share the dep.
Primarily, this give us bitfields which can store 128 bytes on the stack before allocating, rather than 32 bytes (https://github.com/sigp/ethereum_ssz/pull/38). The validator count has increase massively since we set it at 32 bytes, so aggregation bitfields (et al) now require a heap allocation. This new value of 128 should get us to ~2m active validators.
#5244
Pass `JEMALLOC_SYS_WITH_LG_PAGE=16` env to aarch64 cross-compilation to support systems with up to 64-KiB page sizes. This is backwards-compatible for the current (most usual) 4-KiB systems.
## Issue Addressed
Synchronize dependencies and edition on the workspace `Cargo.toml`
## Proposed Changes
with https://github.com/rust-lang/cargo/issues/8415 merged it's now possible to synchronize details on the workspace `Cargo.toml` like the metadata and dependencies.
By only having dependencies that are shared between multiple crates aligned on the workspace `Cargo.toml` it's easier to not miss duplicate versions of the same dependency and therefore ease on the compile times.
## Additional Info
this PR also removes the no longer required direct dependency of the `serde_derive` crate.
should be reviewed after https://github.com/sigp/lighthouse/pull/4639 get's merged.
closes https://github.com/sigp/lighthouse/issues/4651
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Proposed Changes
Another `tree-states` motivated PR, this adds `jemalloc` as the default allocator, with an option to use the system allocator by compiling with `FEATURES="" make`.
- [x] Metrics
- [x] Test on Windows
- [x] Test on macOS
- [x] Test with `musl`
- [x] Metrics dashboard on `lighthouse-metrics` (https://github.com/sigp/lighthouse-metrics/pull/37)
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Proposed Changes
I did some gardening 🌳 in our dependency tree:
- Remove duplicate versions of `warp` (git vs patch)
- Remove duplicate versions of lots of small deps: `cpufeatures`, `ethabi`, `ethereum-types`, `bitvec`, `nix`, `libsecp256k1`.
- Update MDBX (should resolve#3028). I tested and Lighthouse compiles on Windows 11 now.
- Restore `psutil` back to upstream
- Make some progress updating everything to rand 0.8. There are a few crates stuck on 0.7.
Hopefully this puts us on a better footing for future `cargo audit` issues, and improves compile times slightly.
## Additional Info
Some crates are held back by issues with `zeroize`. libp2p-noise depends on [`chacha20poly1305`](https://crates.io/crates/chacha20poly1305) which depends on zeroize < v1.5, and we can only have one version of zeroize because it's post 1.0 (see https://github.com/rust-lang/cargo/issues/6584). The latest version of `zeroize` is v1.5.4, which is used by the new versions of many other crates (e.g. `num-bigint-dig`). Once a new version of chacha20poly1305 is released we can update libp2p-noise and upgrade everything to the latest `zeroize` version.
I've also opened a PR to `blst` related to zeroize: https://github.com/supranational/blst/pull/111
## Proposed Changes
Lots of lint updates related to `flat_map`, `unwrap_or_else` and string patterns. I did a little more creative refactoring in the op pool, but otherwise followed Clippy's suggestions.
## Additional Info
We need this PR to unblock CI.
## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/2857
## Proposed Changes
Explicitly set GNU malloc's MMAP_THRESHOLD to 128KB, disabling dynamic adjustments. For rationale see the linked issue.
## Proposed Changes
Add `mallinfo2` behind a feature flag so that we can get accurate memory metrics during debugging. It can be enabled when building Lighthouse like so (so long as the platform supports it):
```
cargo install --path lighthouse --features "malloc_utils/mallinfo2"
```
## Proposed Changes
While investigating memory usage I noticed that the malloc metrics were going negative once they passed 2GiB. This is because the underlying `mallinfo` function returns a `i32`, and we were casting it straight to an `i64`, preserving the sign.
The long-term fix will be to move to `mallinfo2`, but it's still not yet widely available.
## Issue Addressed
NA
## Proposed Changes
I've noticed some of the SigP Prater nodes struggling on v1.4.0-rc.0. I suspect this is due to the changes in #2296. Specifically, the trade-off which lowered the memory footprint whilst increasing runtime on some functions.
Presently, this PR is documenting my testing on Prater.
## Additional Info
NA
## Issue Addressed
NA
## Proposed Changes
Modify the configuration of [GNU malloc](https://www.gnu.org/software/libc/manual/html_node/The-GNU-Allocator.html) to reduce memory footprint.
- Set `M_ARENA_MAX` to 4.
- This reduces memory fragmentation at the cost of contention between threads.
- Set `M_MMAP_THRESHOLD` to 2mb
- This means that any allocation >= 2mb is allocated via an anonymous mmap, instead of on the heap/arena. This reduces memory fragmentation since we don't need to keep growing the heap to find big contiguous slabs of free memory.
- ~~Run `malloc_trim` every 60 seconds.~~
- ~~This shaves unused memory from the top of the heap, preventing the heap from constantly growing.~~
- Removed, see: https://github.com/sigp/lighthouse/pull/2299#issuecomment-825322646
*Note: this only provides memory savings on the Linux (glibc) platform.*
## Additional Info
I'm going to close#2288 in favor of this for the following reasons:
- I've managed to get the memory footprint *smaller* here than with jemalloc.
- This PR seems to be less of a dramatic change than bringing in the jemalloc dep.
- The changes in this PR are strictly runtime changes, so we can create CLI flags which disable them completely. Since this change is wide-reaching and complex, it's nice to have an easy "escape hatch" if there are undesired consequences.
## TODO
- [x] Allow configuration via CLI flags
- [x] Test on Mac
- [x] Test on RasPi.
- [x] Determine if GNU malloc is present?
- I'm not quite sure how to detect for glibc.. This issue suggests we can't really: https://github.com/rust-lang/rust/issues/33244
- [x] Make a clear argument regarding the affect of this on CPU utilization.
- [x] Test with higher `M_ARENA_MAX` values.
- [x] Test with longer trim intervals
- [x] Add some stats about memory savings
- [x] Remove `malloc_trim` calls & code