Split debug symbols into a separate filesystem layer

So in the lead-up to the release of Unreal Engine 4.27 and the impending availability of prebuilt container images for all licensees, I’ve been thinking about the experience for developers who are pulling these images and what will happen when they pull different image variants. As can be seen in the generation script and build script for the official images, there will be two development images available which are based on ue4-minimal:

  • ghcr.io/epicgames/unreal-engine:dev-4.27.0, which includes debug symbols and templates
  • ghcr.io/epicgames/unreal-engine:dev-slim-4.27.0, which excludes debug symbols and templates

As the Dockerfile currently stands, most of the data in the largest filesystem layer (the one containing the Installed Build of the Engine) will be duplicated when pulling both image variants. The debug symbols are removed before the Installed Build is copied into a new build stage, so the version of the COPY layer that includes debug symbols is completely unrelated to the version that excludes them. This isn’t ideal, since users who want both image variants will be pulling the same data twice.

The ideal situation is one where the debug symbols are in a separate layer that stacks on top of the Installed Build, ensuring any given file is only pulled once. This should be feasible if we move the files to a separate directory rather than deleting them, and then merge them in using a second COPY directive in the subsequent build stage. Unfortunately, we can’t just perform a straight merge this way, as evidenced when testing a simple example Dockerfile:

# Create some files and directories
FROM ubuntu:20.04 as first
RUN mkdir /root
RUN mkdir /root/a && mkdir /root/b && mkdir /root/a/c
RUN echo '1' > /root/a/1.txt
RUN echo '2' > /root/b/2.txt
RUN echo '3' > /root/a/c/3.txt

# Create additional files and directories under the same parent directory
FROM ubuntu:20.04 as second
RUN mkdir /root
RUN mkdir /root/d && mkdir -p /root/a/e
RUN echo '4' > /root/d/4.txt
RUN echo '5' > /root/a/e/5.txt

# Copy the first set of files and directories
FROM ubuntu:20.04 as final
COPY --from=first /root /root

# Attempt to merge the second set of files and directories with the first
# (These directives both fail with the error `mkdir: cannot create directory '/root': File exists`)
COPY --from=second /root /root
COPY --from=second /root/**/*.txt /root

I can think of two potential approaches to work around this limitation:

  • Copy the files to a separate parent directory in the COPY directive and then hardlink or symlink them into the correct locations using a RUN directive, since performing a move operation here will duplicate the file data in the new filesystem layer. (OCI container image filesystem layers have no concept of renaming a file or directory, just adding, modifying and removing.) This approach should be compatible across both Linux and Windows containers.

  • Use the --mount=type=bind feature from BuildKit to mount the separate parent directory from the previous build stage and then copy the files into place. This approach is far cleaner, but is only compatible with Linux containers until the efforts to add Windows container support to BuildKit are complete.
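
As a rough sketch of how the two workarounds might look when applied to a toy example (not the real ue4-minimal Dockerfile): the stage names `final-symlink` and `final-bindmount` and the paths `/opt/demo` and `/opt/staging` are hypothetical, chosen only for illustration. The symlink variant assumes GNU `cp` and so only covers Linux containers, and the `RUN --mount=type=bind` form requires BuildKit.

```dockerfile
# syntax=docker/dockerfile:1

# Base files (stand-in for the Installed Build without debug symbols)
FROM ubuntu:20.04 AS first
RUN mkdir -p /opt/demo/a/c /opt/demo/b && \
    echo '1' > /opt/demo/a/1.txt && \
    echo '2' > /opt/demo/b/2.txt && \
    echo '3' > /opt/demo/a/c/3.txt

# Extra files under the same directory tree (stand-in for the debug symbols)
FROM ubuntu:20.04 AS second
RUN mkdir -p /opt/demo/d /opt/demo/a/e && \
    echo '4' > /opt/demo/d/4.txt && \
    echo '5' > /opt/demo/a/e/5.txt

# Approach 1: COPY into a separate parent directory, then symlink into place
FROM ubuntu:20.04 AS final-symlink
COPY --from=first /opt/demo /opt/demo
COPY --from=second /opt/demo /opt/staging
# GNU cp: -r recurses into directories, -s creates symlinks instead of copying file data
RUN cp -rs /opt/staging/. /opt/demo/

# Approach 2: bind-mount the previous stage and copy the files into place
FROM ubuntu:20.04 AS final-bindmount
COPY --from=first /opt/demo /opt/demo
# The bind mount only exists for the duration of this RUN instruction
RUN --mount=type=bind,from=second,source=/opt/demo,target=/mnt/second \
    cp -a /mnt/second/. /opt/demo/
```

Either variant can be selected at build time with `docker build --target final-symlink .` or `docker build --target final-bindmount .`, which makes it straightforward to compare the resulting layers side by side.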

It is worth noting that this improvement will do nothing to address the frustrations related to Windows filesystem size limit bugs, since the debug symbols will still be committed in the same filesystem layer as the Installed Build of the Engine in the build stage that creates them, and the split layers only exist in the final image.

@slonopotamus @TBBle what are your thoughts on the best way to approach this? Is there value in using the same approach across Linux and Windows for the sake of consistency, or should we use what works best for each platform? Are there any alternative approaches that I’ve overlooked?

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 15 (5 by maintainers)

Top GitHub Comments

adamrehn commented, Aug 11, 2021 (2 reactions)

Initial implementation is in commit 4e6e64b, currently testing to ensure I haven’t inadvertently broken anything.

adamrehn commented, Aug 12, 2021 (1 reaction)

As I noted in my initial comment, the debug symbols are still committed in the same filesystem layer as the Installed Build itself when they’re created; it’s only afterwards that we split them out. Sadly, I think we’ll still be bottlenecked by those bugs on the original filesystem layer before we ever get to the point of splitting the symbols out.
