Dive into ELF files using readelf command

Posted by Unknown, Sunday, 30 December 2012
http://mylinuxbook.com/readelf-command


Studying the ELF format teaches you about its headers, sections and segments. Beyond the theory, though, it helps to verify that layout against a real binary, i.e. the way the machine actually sees it. The open source community provides several tools for dissecting an ELF binary, such as readelf and objdump. In this article we explore the readelf command on Linux.

Please note that a prior understanding of the ELF format will help readers of this article.

Linux readelf command

Introduction

readelf is a Linux utility that reads and interprets ELF files, whether they are object files, executables or shared objects.
It can display all sorts of information about an ELF file: the file header, the section headers, the sections themselves, the symbols and so on. One may wonder why a programmer would ever need such details. They are a great help when debugging “unresolved symbol” linking errors, investigating a crash or perhaps hacking an executable. What matters most is knowing how and when to use readelf.

The Usage

Here is the syntax in its abstract form:
$ readelf <option(s)> elffile...
readelf offers numerous options for many scenarios. The man page is the best place to get familiar with them.
As described in the man page:
NAME
readelf - Displays information about ELF files.

SYNOPSIS
readelf [-a|--all]
[-h|--file-header]
[-l|--program-headers|--segments]
[-S|--section-headers|--sections]
[-g|--section-groups]
[-t|--section-details]
[-e|--headers]
[-s|--syms|--symbols]
[--dyn-syms]
[-n|--notes]
[-r|--relocs]
[-u|--unwind]
[-d|--dynamic]
[-V|--version-info]
[-A|--arch-specific]
[-D|--use-dynamic]
[-x <number or name>|--hex-dump=<number or name>]
[-p <number or name>|--string-dump=<number or name>]
[-R <number or name>|--relocated-dump=<number or name>]
[-c|--archive-index]
[-w[lLiaprmfFsoRt]|
--debug-dump[=rawline,=decodedline,=info,=abbrev,=pubnames,=aranges,=macro,=frames,=frames-interp,=str,=loc,=Ranges,=pubtypes,=trace_info,=trace_abbrev,=trace_aranges,=gdb_index]]
[--dwarf-depth=n]
[--dwarf-start=n]
[-I|--histogram]
[-v|--version]
[-W|--wide]
[-H|--help]
elffile...
In the following sections we discuss a few of the readelf options, how to read their output and how to use them to understand the ELF format. The C source below is our test program; it is used, along with its object file and executable, throughout this article.
tstProgram.c
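#include <stdio.h>

int d = 1;          /* initialized global: placed in .data */
const int N = 48;   /* read-only global: placed in .rodata */

int main(void)
{
    char c;

    c = d + N;      /* 1 + 48 = 49, the ASCII code of '1' */
    printf("Char is %c\n", c);

    return 0;
}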
Create its object file and executable:
$ gcc -c tstProgram.c
$ gcc -Wall tstProgram.c -o tstProgram

The Top-level ELF Header

Every ELF file starts with a top-level ELF header which, like any other header, describes what follows.
For our test program, we can view the ELF header using the ‘-h’ option:
$ readelf -h ./tstProgram
What we get is:
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8048310
  Start of program headers:          52 (bytes into file)
  Start of section headers:          4400 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         8
  Size of section headers:           40 (bytes)
  Number of section headers:         29
  Section header string table index: 26
Let's understand what all these pieces of information mean.
First of all, we see some bytes of data at the beginning of the ELF header:
7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
The first four bytes are the “Magic Number”, which identifies the type of file. Here, these bytes are
7f 45 4c 46
that is, 0x7f followed by the ASCII codes for ‘E’, ‘L’ and ‘F’.
The remaining bytes hold metadata about the file, such as its class (32 or 64 bit), data encoding and version.
We shall discuss only some of the listed information, as most of the header fields are self-explanatory, like
  Data:                              2's complement, little endian
which states that the data in the file is stored in 2’s complement form with little endian byte order. The header also tells us that this ELF file has 8 program headers (segments) and 29 section headers.
One important thing to note is the
  Type:                              EXEC (Executable file)
It specifies the kind of ELF file, here an executable. An ELF file can also be a relocatable file (i.e. an object file), a shared object, a core file or a processor-specific type. A Linux kernel object is of type relocatable.
Next is the entry point.
  Entry point address:               0x8048310
Beginner programmers are usually told that execution of a program starts at the function main(). In reality, the entry point of a C executable is the function _start().
The hex number next to ‘Entry point address’, i.e. 0x8048310, is the address of ‘_start’, where the instruction pointer begins executing. Note that this is a virtual address.
To confirm that entry happens through ‘_start()’, let's run a simple test: write a program without main() and try to build it.
#include <stdio.h>

/* Deliberately no main(): the link step is expected to fail. */
int function(void)
{
    printf("In function \n");
    return 1;
}
How about trying to compile and link it to get an executable? Let's try:
$ gcc empty.c -o empty
/usr/lib/gcc/i686-linux-gnu/4.6/../../../i386-linux-gnu/crt1.o: In function `_start':
(.text+0x18): undefined reference to `main'
collect2: ld returned 1 exit status
Check out the error: it says “In function `_start'”, which confirms that ‘_start’, the entry point, runs first and tries to call ‘main()’. Since ‘main()’ is not available, the linker reports the error.
One can also confirm this through disassembly with the Linux tool ‘objdump’, which is outside the scope of this article.
The next items in the ELF header are
  Start of program headers:          52 (bytes into file)
  Start of section headers:          4400 (bytes into file)
These give the offsets, from the beginning of the ELF file, of the program header table and the section header table. The program header table describes the segments that need to be created in the run-time process image, whereas the section header table describes the sections in the ELF binary. Together they determine which section goes into which segment.
Moving on to
  Flags:                             0x0
  Section header string table index: 26
The flags field holds processor-specific flags. The section header string table contains the null-terminated strings that are the names of the sections; in our case, it is the section at index ‘26’ of the section header table.

Sections

To see which sections lie inside the ELF file, we use the ‘-S’ option:
$ readelf -S ./tstProgram
With the -S option, readelf lists the section headers of the ELF file, along with the offset at which each section starts.
In our case we get
There are 29 section headers, starting at offset 0x1130:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        08048134 000134 000013 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            08048148 000148 000020 00   A  0   0  4
  [ 3] .note.gnu.build-i NOTE            08048168 000168 000024 00   A  0   0  4
  [ 4] .gnu.hash         GNU_HASH        0804818c 00018c 000020 04   A  5   0  4
  [ 5] .dynsym           DYNSYM          080481ac 0001ac 000050 10   A  6   1  4
  [ 6] .dynstr           STRTAB          080481fc 0001fc 00004c 00   A  0   0  1
  [ 7] .gnu.version      VERSYM          08048248 000248 00000a 02   A  5   0  2
  [ 8] .gnu.version_r    VERNEED         08048254 000254 000020 00   A  6   1  4
  [ 9] .rel.dyn          REL             08048274 000274 000008 08   A  5   0  4
  [10] .rel.plt          REL             0804827c 00027c 000018 08   A  5  12  4
  [11] .init             PROGBITS        08048294 000294 000030 00  AX  0   0  4
  [12] .plt              PROGBITS        080482c4 0002c4 000040 04  AX  0   0  4
  [13] .text             PROGBITS        08048310 000310 00018c 00  AX  0   0 16
  [14] .fini             PROGBITS        0804849c 00049c 00001c 00  AX  0   0  4
  [15] .rodata           PROGBITS        080484b8 0004b8 000018 00   A  0   0  4
  [16] .eh_frame         PROGBITS        080484d0 0004d0 000004 00   A  0   0  4
  [17] .ctors            PROGBITS        08049f14 000f14 000008 00  WA  0   0  4
  [18] .dtors            PROGBITS        08049f1c 000f1c 000008 00  WA  0   0  4
  [19] .jcr              PROGBITS        08049f24 000f24 000004 00  WA  0   0  4
  [20] .dynamic          DYNAMIC         08049f28 000f28 0000c8 08  WA  6   0  4
  [21] .got              PROGBITS        08049ff0 000ff0 000004 04  WA  0   0  4
  [22] .got.plt          PROGBITS        08049ff4 000ff4 000018 04  WA  0   0  4
  [23] .data             PROGBITS        0804a00c 00100c 00000c 00  WA  0   0  4
  [24] .bss              NOBITS          0804a018 001018 000008 00  WA  0   0  4
  [25] .comment          PROGBITS        00000000 001018 00002a 01  MS  0   0  1
  [26] .shstrtab         STRTAB          00000000 001042 0000ee 00      0   0  1
  [27] .symtab           SYMTAB          00000000 0015b8 000420 10     28  44  4
  [28] .strtab           STRTAB          00000000 0019d8 000206 00      0   0  1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
Looking through this output, one can map out the structure of the ELF file, with addresses and offsets.
As the output shows, every section has a name and a type. Each type has a meaning; the important ones are as follows:
  • PROGBITS : This section holds data defined by the program. Examples are sections like .text, .data, etc.
  • NOTE : This section holds information that annotates the file but is not used by the program itself. In the above output, “.note.gnu.build-i” (the truncated name of .note.gnu.build-id) is a NOTE section. It holds a build ID, which may be useful for the build maintenance of the project this source is part of, but is not needed by the application at all.
  • SYMTAB : This section holds the symbol table. As an exercise, compare this section when the executable is built with the debug option ‘-g’ and without it.
  • REL : This section holds the relocation entries.
  • NOBITS : This section occupies no space in the file, although it has a size in memory. The .bss section is an example.
  • STRTAB : This section holds a string table.
  • DYNAMIC : This section holds details regarding dynamic linking.
  • NULL : This marks an inactive section header that is associated with no section.
After the section type come the address at which the section resides in memory, its offset in the file and its size.
The remaining columns are the entry size (ES), the flags (Flg), the link (Lk) and info (Inf) fields, whose meaning depends on the section type, and the alignment (Al). The flags signify the following:
A allocatable
X executable
W writable
M mergeable
S holds null terminated strings
G member of section group
T used for thread local storage

Segments

Segments play the same role in the execution image of an ELF file that sections play in its linking image. While a process started from an ELF executable is running, all its instructions, data, etc. are held in segments. When execution starts, the contents of the sections are loaded into memory as segments, according to a fixed mapping.
To view the segments and this mapping, we use
$ readelf -l ./tstProgram
In our case, the output we see is
Elf file type is EXEC (Executable file)
Entry point 0x8048310
There are 8 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x00100 0x00100 R E 0x4
  INTERP         0x000134 0x08048134 0x08048134 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x004d4 0x004d4 R E 0x1000
  LOAD           0x000f14 0x08049f14 0x08049f14 0x00104 0x0010c RW  0x1000
  DYNAMIC        0x000f28 0x08049f28 0x08049f28 0x000c8 0x000c8 RW  0x4
  NOTE           0x000148 0x08048148 0x08048148 0x00044 0x00044 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
  GNU_RELRO      0x000f14 0x08049f14 0x08049f14 0x000ec 0x000ec R   0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag .note.gnu.build-id
   06
   07     .ctors .dtors .jcr .dynamic .got
Note that it lists every segment with its offset, virtual and physical address, sizes, etc.
Moreover, the bottom half of the output shows how sections are mapped to each segment. For example, segment 02, i.e. the first LOAD segment, is created from the sections
.interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame

Symbol Resolution

Compiling a source file gives us an object file. This object file may contain symbols with undefined references, i.e. symbols whose definition is still unknown. These symbols get resolved during linking: if a function is called, the call site is updated with the function’s address so that it can jump to the definition during execution. This is called symbol resolution. If, for any reason, the definition is missing, the linker complains.
Let's get more insight into symbol resolution using readelf.
We need a two-file test program to explore symbol resolution.
NOTE: This example source code is used only in this section, “Symbol Resolution”.
main.c
#include <stdio.h>

char toChar(int num);   /* defined in ch.c */

int main(void)
{
    int num = 3;
    char ch;

    ch = toChar(num);
    printf("Char is %c \n", ch);

    return 0;
}
ch.c
#include <stdio.h>

#define CONST 48    /* ASCII code of '0' */

char toChar(int num)
{
    char c;

    c = num + CONST;    /* turns a digit 0..9 into its character */
    return c;
}
Obtaining the object files and the final executable
$ gcc -c main.c -o main.o
$ gcc -c ch.c -o ch.o
$ gcc main.c ch.c -Wall -o main
Do you think there are any unresolved symbols in the object files?
readelf helps us find out by peeking into the symbol table:
$ readelf -s main.o
What do we see?
Symbol table '.symtab' contains 12 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS main.c
     2: 00000000     0 SECTION LOCAL  DEFAULT    1
     3: 00000000     0 SECTION LOCAL  DEFAULT    3
     4: 00000000     0 SECTION LOCAL  DEFAULT    4
     5: 00000000     0 SECTION LOCAL  DEFAULT    5
     6: 00000000     0 SECTION LOCAL  DEFAULT    7
     7: 00000000     0 SECTION LOCAL  DEFAULT    8
     8: 00000000     0 SECTION LOCAL  DEFAULT    6
     9: 00000000    62 FUNC    GLOBAL DEFAULT    1 main
    10: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND toChar
    11: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
The last column gives the name of each symbol. All the global variables and functions that are part of our program, including main(), are listed in the symbol table.
Notice
    10: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND toChar
    11: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
The ‘UND’ in the ‘Ndx’ column for printf and toChar is how readelf marks an undefined symbol. That is correct: the standard function ‘printf()’ is defined in the library ‘libc’, which is not yet linked, and ‘toChar()’ is defined in a separate object file.
Let’s concentrate only on the symbol ‘toChar’, as resolving ‘printf’ requires knowledge of dynamic linking and much more, which is beyond the scope of this article.
Now, let's look at the symbol table of the final executable,
$ readelf -s main
and zoom in on the symbol ‘toChar’:
    52: 08048424    21 FUNC    GLOBAL DEFAULT   13 toChar
It is no longer undefined: when the executable was created, main.o was linked with ch.o, which holds the definition of ‘toChar’, so the symbol got resolved in the final executable.

Relocation

Before the linking phase, the object files are relocatable. By relocatable, we mean that all symbol references use addresses relative to the start of their section rather than final addresses. When the program is actually loaded into memory, those addresses will be different.
Relocation involves:
  1. Once symbol resolution is done, combining the sections of all the object files into one section of each kind for the executable. For example, every object file has a .bss section, but the executable has just one .bss, which combines the contents from all the object files.
  2. Updating all the symbol addresses with their load-time addresses.
Now let's dig into relocation using readelf. We go back to our own test program, tstProgram.c, from the section “The Usage”.
To have a look at the relocation section of the object file:
$ readelf -r tstProgram.o

Relocation section '.rel.text' at offset 0x398 contains 4 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0000000a  00000801 R_386_32          00000000   d
00000011  00000901 R_386_32          00000000   N
00000022  00000501 R_386_32          00000000   .rodata
0000002e  00000b02 R_386_PC32        00000000   printf
These are the relocation entries. Each entry is described by
offset
r_info
addend
In a REL section such as ‘.rel.text’ only the offset and r_info are stored in the entry; the addend is taken from the bytes at the location being relocated.
The offset is the location, relative to the start of the section, of the storage unit to which the relocation must be applied.
The r_info field serves two purposes. First, it gives the index, in the symbol table, of the symbol with respect to which the relocation is made.
The index is extracted with the following macros for 32 bit and 64 bit, which are defined in /usr/src/linux-2.6.39/include/linux/elf.h in my case.
/* The following are used with relocations */
#define ELF32_R_SYM(x) ((x) >> 8)
#define ELF64_R_SYM(i) ((i) >> 32)
Second, it gives the type of relocation.
#define ELF32_R_TYPE(x) ((x) & 0xff)
#define ELF64_R_TYPE(i) ((i) & 0xffffffff)
Let's pick the symbol ‘N’ from the relocation entries.
00000011  00000901 R_386_32          00000000   N
For symbol ‘N’,
offset = 0x00000011
r_info = 0x901
The location that needs to be relocated is at offset 0x11 from the start of the section.
First, let’s work out which symbol the relocation refers to. Its index in the symbol table is computed using the macro mentioned above.
For 32 bit,
#define ELF32_R_SYM(x) ((x) >> 8)
Here ‘x’ is ‘r_info’, which is 0x901 in hex; in binary it comes out to be
r_info = 100100000001
r_info >> 8 i.e. 100100000001 >> 8
= 1001
= 9 in decimal
Hence, we need to go to index 9 of the symbol table. How do we see the symbol table?
$ readelf -s tstProgram.o

Symbol table '.symtab' contains 12 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS tstProgram.c
     2: 00000000     0 SECTION LOCAL  DEFAULT    1
     3: 00000000     0 SECTION LOCAL  DEFAULT    3
     4: 00000000     0 SECTION LOCAL  DEFAULT    4
     5: 00000000     0 SECTION LOCAL  DEFAULT    5
     6: 00000000     0 SECTION LOCAL  DEFAULT    7
     7: 00000000     0 SECTION LOCAL  DEFAULT    6
     8: 00000000     4 OBJECT  GLOBAL DEFAULT    3 d
     9: 00000000     4 OBJECT  GLOBAL DEFAULT    5 N
    10: 00000000    57 FUNC    GLOBAL DEFAULT    1 main
    11: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
Check out the symbol entry at index 9, which is
     9: 00000000     4 OBJECT  GLOBAL DEFAULT    5 N
From here, we need to go to the relevant section, which is identified by the ‘Ndx’ value. The ‘Ndx’ value is ‘5’ for symbol index ‘9’.
Ndx = 5
To see which section that is, we need to look at the section headers.
$ readelf -S tstProgram.o
There are 11 section headers, starting at offset 0x100:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 000039 00  AX  0   0  4
  [ 2] .rel.text         REL             00000000 000398 000020 08      9   1  4
  [ 3] .data             PROGBITS        00000000 000070 000004 00  WA  0   0  4
  [ 4] .bss              NOBITS          00000000 000074 000000 00  WA  0   0  4
  [ 5] .rodata           PROGBITS        00000000 000074 000010 00   A  0   0  4
  [ 6] .comment          PROGBITS        00000000 000084 00002b 01  MS  0   0  1
  [ 7] .note.GNU-stack   PROGBITS        00000000 0000af 000000 00      0   0  1
  [ 8] .shstrtab         STRTAB          00000000 0000af 000051 00      0   0  1
  [ 9] .symtab           SYMTAB          00000000 0002b8 0000c0 10     10   8  4
  [10] .strtab           STRTAB          00000000 000378 00001e 00      0   0  1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
The ‘Ndx’ value of a symbol is the ‘Nr’ value in this section header table. Hence, our symbol lives in the section whose ‘Nr’ value is ‘5’, which is .rodata.
So now we can say that the relocation for the storage unit at offset 0x11 refers to the symbol ‘N’, which resides in the ELF section ‘.rodata’ at offset 0x0 (the ‘Value’ taken from the symbol table). The exact address to store is computed in a well-defined way, which depends on the type of the relocation and the underlying architecture.
Here is one such table, listing the computation for the Intel architecture:
Name           | Value | Field  | Calculation
R_386_NONE     | 0     | none   | none
R_386_32       | 1     | word32 | S + A
R_386_PC32     | 2     | word32 | S + A - P
R_386_GOT32    | 3     | word32 | G + A - P
R_386_PLT32    | 4     | word32 | L + A - P
R_386_COPY     | 5     | none   | none
R_386_GLOB_DAT | 6     | word32 | S
R_386_JMP_SLOT | 7     | word32 | S
R_386_RELATIVE | 8     | word32 | B + A
R_386_GOTOFF   | 9     | word32 | S + A - GOT
R_386_GOTPC    | 10    | word32 | GOT + A - P
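Where,
S = the value of the symbol whose index resides in the relocation entry
A = the addend used to compute the value of the relocatable field
P = the place (section offset or address) of the storage unit being relocated
G = the offset into the Global Offset Table at which the address of the symbol's entry resides
L = the place of the Procedure Linkage Table entry for the symbol
GOT = the address of the Global Offset Table
B = the base address at which a shared object is loaded into memory during execution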
To find the type of the relocation, we use the 32-bit macro
#define ELF32_R_TYPE(x) ((x) & 0xff)
that is, the lowest byte of r_info, which is 0x1 here.
Relocation type 1 on the Intel architecture is R_386_32, which is computed as
S + A

Conclusion

This was all about playing with readelf to understand and absorb the ELF format. Besides being a learning aid, readelf is really useful for debugging linking problems and many other complicated issues caused by intricacies of the ELF format. It is also a great tool for debugging Linux kernel objects. It is one of those tools that may be difficult to learn but offers a plethora of features and interesting ways to use them.
Finally, I would be happy to hear about your experiences with readelf: how it helped you, which options you used and in what way.
