Georg's Log

Fri 22 July 2016

LD_LIBRARY_PATH considered harmful

Posted by Georg Sauthoff in C   

The purpose of the LD_LIBRARY_PATH environment variable is to instruct the linker to consider additional directories when searching for libraries. Its valid use case is the test of alternative library versions installed in non-standard locations. In contrast to that, globally setting the LD_LIBRARY_PATH (e.g. in the profile of a user) is harmful because there is no setting that fits every program. The directories in the LD_LIBRARY_PATH environment variable are considered before the default ones and the ones specified in the binary executable. Thus, a - say - system command that is supposed to use a system library easily gets linked at runtime with an API incompatible version. Also, having a program that relies on a certain LD_LIBRARY_PATH setting creates the maintenance burden to always accurately document that setting and distribute that documentation with the binary. Instead, to avoid these issues, the additional directories (if any) that should be searched by the runtime linker should be specified via linker options (e.g. -rpath or -R) at build-time. This results in those directories being written to an ELF attribute that is considered by the runtime linker (i.e. the runpath).

Usenet discussions about the misuse of LD_LIBRARY_PATH go back as early as 1993. In 1994, Casper H.S. Dik (who later posted as a Sun engineer) concludes his answer in comp.unix.solaris with 'LD_LIBRARY_PATH: just say no'. For context, the first Solaris version that comes with ELF executables and shared libraries seems to be Solaris 2.0, which was released in 1992. Linux has supported ELF since 1995.

Around 1999, David Barr published the article Why LD_LIBRARY_PATH is bad. It gives two examples that detail how LD_LIBRARY_PATH causes harm, motivates valid uses and describes better alternatives already available on Solaris 7. The page LD_LIBRARY_PATH Is Not The Answer references David Barr's article and calls globally setting the LD_LIBRARY_PATH a 'complete hack'.

Also referenced by this page is Rod Evans' 2004 blog post LD_LIBRARY_PATH - just say no. Writing on his sun.com blog as a Sun employee at that time, he describes the sister variables LD_LIBRARY_PATH_32 and LD_LIBRARY_PATH_64 that are available on Solaris in addition to LD_LIBRARY_PATH. His acroread example shows how they can complicate the situation such that even more harm is done. His conclusion also is to use the runpath and, where necessary, the $ORIGIN linker variable - a variable that the runtime linker substitutes with the path where the executable is located.

Similar to this example is the war story Purging LD_LIBRARY_PATH, written in 2010 by Joseph D. Darcy on his Oracle blog. He describes the 'messy' way the JDK used and manipulated LD_LIBRARY_PATH until version 7. Again, $ORIGIN is found to be a better alternative mechanism for that use case.

Another (then) Sun colleague, Ali Bahrami, follows up on Evans with Avoiding LD_LIBRARY_PATH: The Options, in 2007. He calls LD_LIBRARY_PATH a 'crude tool' and argues that it is probably the '#1 one way to get yourself into trouble in an ELF environment'. As an alternative he describes the elfedit tool available in Solaris 11 and later Solaris 10 patch levels.

The Shared Library HowTo also references David Barr's article and concludes that it

is handy for development and testing, but shouldn't be modified by an installation process for normal use by normal users.

As an alternative, it includes an example of how (on Linux) the runtime linker /lib/ld-linux.so.2 can be explicitly invoked to execute a given binary with an alternative search path.

The Sun Studio 12 Fortran Programming Guide (!) warns about using the LD_LIBRARY_PATH for anything but test scenarios:

Use of the LD_LIBRARY_PATH environment variable with production software is strongly discouraged. Although useful as a temporary mechanism for influencing the runtime linker’s search path, any dynamic executable that can reference this environment variable will have its search paths altered. You might see unexpected results or a degradation in performance.

(emphasis theirs)

Linux distributions usually don't include any package that relies on a certain LD_LIBRARY_PATH setting - they install the packaged libraries into the standard locations. But even a package distribution like OpenCSW (that installs all its packages into a non-standard path) has a policy against LD_LIBRARY_PATH for all the right reasons:

It is not necessary to set it for OpenCSW binaries. All of them are built with the -R flag, so each binary itself knows where to look for the shared objects.

You do not need to set LD_LIBRARY_PATH system-wide; and if you do, you will likely break your system, even to the point of locking yourself out. Some of the library names clash between /usr/lib and /opt/csw/lib, and if you run the Solaris openssh daemon with LD_LIBRARY_PATH set to /opt/csw/lib, /usr/lib/ssh/sshd will try to load libcrypto from /opt/csw/lib and fail to start.

They also reference Rod Evans' blog article.

The title of this article is inspired by the considered harmful meme. See for example Go To Statement Considered Harmful and Recursive Make Considered Harmful. As with the LD_LIBRARY_PATH that has legitimate uses, goto has them as well, cf. Structured Programming with go to (Knuth, 1974).

Possible Roots

Looking at the documented harmfulness of LD_LIBRARY_PATH one might wonder why it is popular in certain circles. One reason probably can be traced back to the standard install note printed when installing a package that uses Libtool (often used with Autoconf/Automake):

----------------------------------------------------------------------
Libraries have been installed in:
   $PREFIX/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following: 
   - add LIBDIR to the `LD_LIBRARY_PATH' environment variable
 during execution
   - add LIBDIR to the `LD_RUN_PATH' environment variable
 during linking
   - use the `-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------

This note contains two pieces of bad advice:

  • the use of LD_LIBRARY_PATH
  • the use of LD_RUN_PATH (which is in effect similar to LD_LIBRARY_PATH but only considered if the -rpath option isn't supplied)

It is unfortunate because the ramifications of the alternatives aren't qualified and LD_LIBRARY_PATH is even mentioned first.

Thus, a developer or sysadmin who doesn't know much about linking might be tempted to see LD_LIBRARY_PATH as THE standard way and, because it works for one package, wrongly internalize that as this-is-how-it-is-done-on-unix.

In addition to that, some vendors that distribute binary executables and libraries just give bad advice in their install instructions. For example Oracle, the well-known 'enterprise' DB vendor:

Add the name of the directory containing the Instant Client libraries to LD_LIBRARY_PATH.

(SQLPlus® User's Guide and Reference, Oracle 11g2, Configuring SQLPlus Instant Client)

Before you can connect Instant Client (including Instant Client Light) to an Oracle database, ensure that the LD_LIBRARY_PATH environment variable specifies the directory that contains the Instant Client libraries.

(Database Client Installation Guide, Oracle 11g2, Recommended Postinstallation Tasks)

The instantclient_12_1 directory must be on the LD_LIBRARY_PATH before linking the application.

(Oracle C++ Call Interface Programmer's Guide, Oracle 12c, Installation and Upgrading)

Last but not least, a quick Google search regarding some cannot-start-program-library-not-found error might turn up low-quality forum posts where setting the LD_LIBRARY_PATH is suggested.

Harm at compile time

At compile time, the linker ld is usually called by the compiler such that all object files a binary executable (or library) consists of are linked together and dependent libraries are referenced. How LD_LIBRARY_PATH influences the linking differs on Linux and Solaris.

Linux

The LD_LIBRARY_PATH directories aren't considered when ld searches for libraries specified via -l. But, the LD_LIBRARY_PATH is considered when shared library dependencies of linked shared libraries are resolved (cf. -rpath-link in ld(1)). In that case, the LD_LIBRARY_PATH directories are searched after the ones specified with -rpath-link and -rpath but before ones specified by ELF attributes and the default ones (e.g. /lib and /usr/lib).

Solaris

On Solaris, in contrast to Linux, the LD_LIBRARY_PATH directories are searched by ld when searching for libraries specified via -l. Those directories are appended to the search path resulting from any -L option.

Makefiles

Even if the LD_LIBRARY_PATH is not globally set, it still may be in effect because a poorly written makefile assigns this environment variable.

Also, when make is called from an IDE (like emacs) the environment of that process is inherited - thus, an LD_LIBRARY_PATH setting in the start script of that IDE may induce harm.

Harm at runtime

At runtime, the runtime linker (e.g. on Linux this is ld.so) searches the LD_LIBRARY_PATH directories before the ones specified by the DT_RUNPATH ELF attribute and before the default ones.

On Linux, the DT_RPATH ELF attribute (which is documented as deprecated) is considered before the LD_LIBRARY_PATH if and only if the binary doesn't also have the DT_RUNPATH attribute set. If DT_RUNPATH is set, the DT_RPATH ELF attribute is ignored.

The writing of these two ELF attributes is system dependent:

System   Compiler switch                          ELF attribute
Linux    -Wl,-rpath,SOMEDIR                       DT_RPATH = SOMEDIR
Linux    -Wl,-RSOMEDIR                            DT_RPATH = SOMEDIR
Linux    -Wl,--enable-new-dtags,-rpath,SOMEDIR    DT_RUNPATH = SOMEDIR
Solaris  -RSOMEDIR                                DT_RUNPATH = DT_RPATH = SOMEDIR
Solaris  -Wl,-RSOMEDIR                            DT_RUNPATH = DT_RPATH = SOMEDIR
Solaris  -Wl,-rpath,SOMEDIR                       DT_RUNPATH = DT_RPATH = SOMEDIR

Note that:

  • The compiler option -Wl instructs the compiler to pass the option following the first comma directly to the linker. All following commas are interpreted as argument delimiter.
  • On Linux, -Wl,-rpath,SOMEDIR and -Wl,-RSOMEDIR are equivalent due to option parsing magic - for compatibility reasons -R is overloaded. If the argument of -R is a filename the option has a different effect.
  • On Solaris, the compiler and the linker both understand -R such that -Wl,-RSOMEDIR is equivalent to -Wl,-rpath,SOMEDIR.
  • The Solaris 10 ld also understands -rpath although this isn't documented in all versions of the SunOS 5.10 ld(1) man page. It is documented in the 2011 version of that page, though.

Conclusion

Globally setting the LD_LIBRARY_PATH is never a good idea. The narrow original use case of LD_LIBRARY_PATH is the quick test of alternate libraries. When dealing with properly created executables, setting the LD_LIBRARY_PATH is redundant in the best case, but it breaks things in the common case. There are better mechanisms and tools than LD_LIBRARY_PATH available to instruct the linker how to search for the correct libraries at build-time and at runtime.

Recommendations

Verify Environment Settings

Verify that LD_LIBRARY_PATH (or one of its variants LD_LIBRARY_PATH_32, LD_LIBRARY_PATH_64 or LD_RUN_PATH) in fact isn't globally set via shell run control files like /etc/profile or /etc/bashrc. Also check that it isn't set in user dotfiles like ~/.bashrc, ~/.profile etc. Such a setting would be bad in the config of a development user, but exorbitantly more so in the profile of a production user.

Whether any running process still has the LD_LIBRARY_PATH set can be verified by looking at its environment. For example, on Linux via /proc:
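This check of the run control files can be scripted. The following is a minimal sketch, assuming a POSIX shell and grep; the function name scan_ld_settings and the list of candidate files are assumptions for illustration and have to be adapted to the shells actually in use:

```shell
#!/bin/sh
# scan_ld_settings FILE...: print every line that mentions LD_LIBRARY_PATH,
# LD_LIBRARY_PATH_32, LD_LIBRARY_PATH_64 or LD_RUN_PATH as file:line:text.
scan_ld_settings() {
    for f in "$@"; do
        # Silently skip files that don't exist or aren't readable.
        [ -r "$f" ] || continue
        # The extra /dev/null operand makes grep always print the file name.
        grep -n -E 'LD_(LIBRARY_PATH(_32|_64)?|RUN_PATH)' "$f" /dev/null
    done
    return 0
}

# Candidate files (an assumed, incomplete list).
scan_ld_settings /etc/profile /etc/bashrc /etc/bash.bashrc /etc/profile.d/*.sh \
    "$HOME/.profile" "$HOME/.bash_profile" "$HOME/.bashrc"
```

No output means that none of the listed files mentions one of the variables. A hit is not necessarily an assignment (it may be a comment), so each reported line still needs a look.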

$ < /proc/$SOMEPID/environ tr '\0' '\n' | grep LD_LIBRARY_PATH

Or on Solaris via pargs:

$ pargs -e $SOMEPID | grep LD_LIBRARY_PATH
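To check all processes at once on Linux, the /proc lookup can be wrapped in a loop. This is only a sketch: the function name list_ld_library_path_procs is made up, unprivileged users only see their own processes, and /proc/PID/environ reflects the environment at the time the program was executed, not later changes the process made itself:

```shell
#!/bin/sh
# Linux only: print "PID<TAB>LD_LIBRARY_PATH=..." for every readable process
# whose initial environment contains LD_LIBRARY_PATH.
list_ld_library_path_procs() {
    for envfile in /proc/[0-9]*/environ; do
        [ -r "$envfile" ] || continue
        pid=${envfile#/proc/}
        pid=${pid%/environ}
        # Processes may exit or be unreadable while we iterate: skip those.
        value=$( { tr '\0' '\n' < "$envfile"; } 2>/dev/null |
                 grep '^LD_LIBRARY_PATH=' ) || continue
        printf '%s\t%s\n' "$pid" "$value"
    done
}

list_ld_library_path_procs
```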

Set the runtime library search path at build time

Analogously to the -LSOMEDIR option that adds a directory to the build-time library search path, the option -Wl,-rpath,SOMEDIR (or -Wl,-RSOMEDIR) adds a directory to the runtime library search path. That means that the resulting path is written by the linker into the DT_RPATH and/or DT_RUNPATH ELF attribute of the resulting binary.

The thus set attributes can be printed on Linux via readelf, e.g.

$ readelf -d my_binary_or_so | grep PATH
 0x000000000000001d (RUNPATH)            Library runpath: [SOMEDIR]

And on Solaris via elfdump:

$ elfdump my_binary_or_so | grep PATH
       [4]  RUNPATH           0x128               SOMEDIR
       [5]  RPATH             0x128               SOMEDIR

Other interesting attributes dumped by those tools and that are relevant in this context are NEEDED (i.e. the dependent shared libraries specified via -l or as absolute path) and SONAME (i.e. the name of a shared library that is copied from the library to the NEEDED attribute of the binary that depends on that library).

The effect of different runpaths and specified libraries can be verified by calling ldd my_binary_or_so. When its output contains lines like

libxyz.so => not found

on Linux, or on Solaris

libxyz.so =>     (file not found)

then the path is still incomplete or incorrect and the runtime linker will abort the program start with a message like this

./my_binary: error while loading shared libraries: libxyz.so: cannot open shared object file: No such file or directory

on Linux and on Solaris:

ld.so.1: my_binary: fatal: libxyz.so: open failed: No such file or directory
Killed

The Linux runtime linker exits with exit status 127, while the Solaris runtime linker exits with 137.
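Filtering the ldd output for unresolved dependencies is easy to automate, e.g. as part of a post-install check. A small sketch, assuming awk is available; the helper name missing_libs is hypothetical and it simply matches the 'not found' text shown above for both systems:

```shell
#!/bin/sh
# missing_libs: read ldd output on stdin and print the name of every
# dependency the runtime linker could not resolve.
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Example: check the shell itself (any ELF binary or shared object works).
if command -v ldd >/dev/null 2>&1; then
    ldd /bin/sh 2>/dev/null | missing_libs
fi
```

An empty result means that every NEEDED entry could be resolved with the current search path.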

When the start of a binary has succeeded, one can verify which libraries were actually linked at runtime via pldd, which is available on Linux and Solaris. Or, as an alternative, via lsof.
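Where neither tool is installed, on Linux the same information can be read from /proc/PID/maps, whose sixth column holds the path of each mapped file. A rough sketch (the function name libs_in_maps is an assumption, and paths containing spaces are not handled):

```shell
#!/bin/sh
# libs_in_maps FILE: print the distinct shared objects listed in a
# /proc/PID/maps style file.
libs_in_maps() {
    awk '$6 ~ /\.so/ { print $6 }' "$1" | sort -u
}

# Example: the libraries mapped into the current shell (Linux only).
if [ -r "/proc/$$/maps" ]; then
    libs_in_maps "/proc/$$/maps"
fi
```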

Use $ORIGIN

The linker variable $ORIGIN is expanded by the runtime linker with the current 'origin' of the ELF binary. The origin is the directory where the binary is stored.

Thus, using this variable in a path specified via -Wl,-rpath,SOMEDIR (or via -Wl,-RSOMEDIR) allows for path specifications that are relative to the location of the binary.

The obvious use case is a binary that is supposed to be installed inside a non-standard prefix (e.g. /opt/foo) together with some of its needed libraries. The directory could then be specified like this (in a shell):

-Wl,-rpath,'$ORIGIN/../lib64'

Note that the linker variable $ORIGIN is enclosed in single quotes such that it isn't expanded by the shell (e.g. bash).
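The difference the quoting makes can be seen without involving a compiler at all. In this small illustration the shell variable ORIGIN and its value /oops are made up:

```shell
#!/bin/sh
# A shell variable that merely happens to be called ORIGIN (hypothetical).
ORIGIN=/oops

# Double quotes: the shell substitutes $ORIGIN before the compiler sees it.
expanded="-Wl,-rpath,$ORIGIN/../lib64"
# Single quotes: the string reaches the linker unchanged.
literal='-Wl,-rpath,$ORIGIN/../lib64'

printf '%s\n' "$expanded" "$literal"
```

The first line printed is -Wl,-rpath,/oops/../lib64, the second one is -Wl,-rpath,$ORIGIN/../lib64. Without such a shell variable the unquoted form silently degrades to /../lib64, which is just as wrong.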

When using this in a makefile, in addition to the single quotes, the dollar sign has to be escaped such that make doesn't expand it as make variable:

-Wl,-rpath,'$$ORIGIN/../lib64'

The linker variable $ORIGIN is understood by the Linux and by the Solaris runtime linker.

On Linux, the runtime linker also expands a few other variables.
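An end-to-end illustration of an $ORIGIN-relative runpath might look like the following sketch. It assumes a Linux system with a C compiler available as cc and readelf from binutils; all file names (libhello.so, main, the scratch directory) are made up for the example:

```shell
#!/bin/sh
# Build a tiny shared library plus a program that locates it via
# $ORIGIN/../lib, then run the program without LD_LIBRARY_PATH.
set -e
command -v cc >/dev/null 2>&1 || { echo "no C compiler found, skipping"; exit 0; }
command -v readelf >/dev/null 2>&1 || { echo "no readelf found, skipping"; exit 0; }

work=$(mktemp -d)
mkdir "$work/lib" "$work/bin"

cat > "$work/hello.c" <<'EOF'
#include <stdio.h>
void hello(void) { puts("hello from libhello"); }
EOF
cat > "$work/main.c" <<'EOF'
void hello(void);
int main(void) { hello(); return 0; }
EOF

cc -shared -fPIC -o "$work/lib/libhello.so" "$work/hello.c"
# Single quotes keep $ORIGIN away from the shell.
cc -o "$work/bin/main" "$work/main.c" -L"$work/lib" -lhello \
    -Wl,-rpath,'$ORIGIN/../lib'

# Shows an RPATH or RUNPATH entry, depending on the linker defaults.
readelf -d "$work/bin/main" | grep PATH

unset LD_LIBRARY_PATH
out=$("$work/bin/main")
echo "$out"
rm -rf "$work"
```

Since the path is relative to the binary, the bin and lib directories can be moved together to another prefix and the program still starts.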

On Solaris, use -Wl,-i

The option -Wl,-i instructs the linker to ignore any LD_LIBRARY_PATH environment variable. Thus, this option can be used as a safety net in case LD_LIBRARY_PATH accidentally is still set.

Unfortunately, ld on Linux interprets -i differently (i.e. as: link incrementally).

Patch existing ELF binaries

In case one doesn't have access to the source code, it is still an option to rewrite the DT_RPATH and/or DT_RUNPATH attributes of an existing ELF binary. The tool patchelf supports this.

For example, to fix some Oracle executables and the client library that are part of the Oracle 11g2 'Instant Client' (for Linux):

$ patchelf --set-rpath '$ORIGIN/..' /path/to/instantclient_11_2/sdk/proc
$ patchelf --set-rpath '$ORIGIN'    /path/to/instantclient_11_2/sqlplus
$ patchelf --set-rpath '$ORIGIN'    /path/to/instantclient_11_2/libclntsh.so.11.1

The effectiveness of such changes can be verified with the usual tools, e.g.:

$ patchelf --print-rpath mybinary   # or:
$ readelf -d mybinary | grep PATH
$ ldd mybinary

After the change, the ldd utility shouldn't print any 'not found' lines anymore.

Patchelf is packaged for the major Linux distributions and should also be portable to other ELF platforms.

Solaris 11 (and apparently later Solaris 10 patch levels) comes with the tool elfedit that can also be used to edit the DT_RUNPATH/DT_RPATH ELF attributes. Example:

$ elfedit -e 'dyn:runpath $ORIGIN/lib' mybinary

However (in contrast to patchelf) it has some limitations (cf. its man page or Changing ELF Runpaths) - e.g. elfedit may not find enough space for path edits. This is especially an issue with binaries created on earlier Solaris 10 (or even older Solaris) versions. Later versions reserve some space (512 bytes, it seems) at build-time, so the room for edits is of fixed size. Thus, it is easy to construct a path that patchelf has no issue adding but where elfedit fails with:

elfedit: [0: .dynstr]: String table does not have room to add string

Also, the dependency management of Solaris 10 doesn't seem very complete such that a system may provide elfedit but still miss some libraries for it:

ld.so.1: elfedit: fatal: liblddbg.so.4: version 'SUNWprivate_4.83' not found (required by file /usr/bin/elfedit)
ld.so.1: elfedit: fatal: liblddbg.so.4: open failed: No such file or directory

Obviously, when dealing with such poorly created binaries, created by an overpaid vendor, one may see this as an indicator of the general quality of the provided software and service. And perhaps one comes to the conclusion that there are better alternatives out there, built by people who know what they are doing. For our initial Oracle example the obvious alternative would be PostgreSQL. It is arguably of better quality than Oracle, implements features Oracle doesn't have and it is ridiculously easy to install (in comparison to Oracle) because it is available from the distributions' package repositories.

Quarantine legacy LD_LIBRARY_PATH settings

As a last resort, when re-linking or patching an existing ELF binary is not an option one should at least restrict the scope of LD_LIBRARY_PATH to that binary, i.e. to a start script of that binary.

For example, if the original legacy binary is located under /opt/sware/bin/foo one limits the harm of LD_LIBRARY_PATH via putting it in a start script like this:

$ mv /opt/sware/bin/foo /opt/sware/bin/foo.orig
$ cat <<'EOF' > /opt/sware/bin/foo
#!/bin/sh
export LD_LIBRARY_PATH=/opt/sware/lib
exec /opt/sware/bin/foo.orig "$@"
EOF
$ chmod 755 /opt/sware/bin/foo

Thus, its effect is limited to the legacy binary. This is a significant improvement over globally setting it.

In case the process forks any child processes, the LD_LIBRARY_PATH setting is inherited, though.
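Both effects - the limited scope and the inheritance by child processes - can be tried out safely in a scratch directory. In the following sketch foo.orig is just a stand-in shell script instead of a real legacy binary, and the temporary prefix replaces /opt/sware:

```shell
#!/bin/sh
# Scratch prefix standing in for /opt/sware (assumption for illustration).
prefix=$(mktemp -d)
mkdir "$prefix/bin" "$prefix/lib"

# Quoted delimiter: nothing is expanded while the file is written.
cat > "$prefix/bin/foo.orig" <<'EOF'
#!/bin/sh
echo "foo.orig sees: $LD_LIBRARY_PATH"
for arg in "$@"; do echo "arg: $arg"; done
sh -c 'echo "child sees: $LD_LIBRARY_PATH"'
EOF

# Unquoted delimiter: $prefix is expanded now, the escaped \$@ at run time.
cat > "$prefix/bin/foo" <<EOF
#!/bin/sh
LD_LIBRARY_PATH=$prefix/lib
export LD_LIBRARY_PATH
exec "$prefix/bin/foo.orig" "\$@"
EOF
chmod 755 "$prefix/bin/foo" "$prefix/bin/foo.orig"

unset LD_LIBRARY_PATH
result=$("$prefix/bin/foo" 'one two' three)
printf '%s\n' "$result"
echo "caller sees: ${LD_LIBRARY_PATH-unset}"
rm -rf "$prefix"
```

The stand-in and its child both report the lib directory below the scratch prefix, the arguments arrive unchanged (including the one that contains a space), and the calling shell still reports the variable as unset.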

This can be avoided via directly invoking the runtime linker and supplying the search path as an argument. A Linux example:

$ mv /opt/sware/bin/foo /opt/sware/bin/foo.orig
$ cat <<'EOF' > /opt/sware/bin/foo
#!/bin/sh
exec /lib64/ld-linux-x86-64.so.2 --library-path /opt/sware/lib \
  /opt/sware/bin/foo.orig "$@"
EOF
$ chmod 755 /opt/sware/bin/foo

This is the runtime linker for a 64 bit binary; for a 32 bit binary one would use /lib/ld-linux.so.2.

A Solaris example:

$ mv /opt/sware/bin/foo /opt/sware/bin/foo.orig
$ cat <<'EOF' > /opt/sware/bin/foo
#!/bin/sh
exec /lib/64/ld.so.1 -e LD_LIBRARY_PATH=/opt/sware/lib \
  /opt/sware/bin/foo.orig "$@"
EOF
$ chmod 755 /opt/sware/bin/foo