Segmentation fault while typecasting unsigned int to pointer: Solved

TL;DR: On 64-bit systems, casting a pointer to an unsigned int and back can cause a segmentation fault, because an int (4 bytes) is smaller than a pointer (8 bytes). To fix this, store the address in a uintptr_t instead of an unsigned or signed int.

While going through the typecasting section in the wonderful book by Jon Erickson, I ran into a segmentation fault. Oddly, I used the exact code from Erickson’s book, yet my compiler gave me multiple warnings, while Erickson’s code ran successfully without warnings or errors. So why was this happening? The answer lies in the way 64-bit systems work compared to 32-bit systems. Let’s look at the code.

hackypointer.c

#include <stdio.h>

int main() {
    int i;

    char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
    int int_array[5] = {1, 2, 3, 4, 5};

    unsigned int hacky_pointer;

    hacky_pointer = (unsigned int) char_array;

    for(i=0; i < 5; i++) {
        printf("[hacky_pointer] points to %p, which contains the character %c\n", hacky_pointer, *((char *) hacky_pointer));
        hacky_pointer = hacky_pointer + sizeof(char);
    }

    hacky_pointer = (unsigned int) int_array;

    for(i=0; i < 5; i++) {
        printf("[hacky_pointer] points to %p, which contains the integer %d\n", hacky_pointer, *((int *) hacky_pointer));
        hacky_pointer = hacky_pointer + sizeof(int);
    }
}

One thing you might notice, and which might feel off about this code, is that I am using an unsigned int for the hacky_pointer variable instead of declaring it as a pointer with int *hacky_pointer. This is deliberate, because we are learning typecasting. Typecasting is the C feature that lets us tell the compiler to treat the value of an expression as a different datatype for that one use. For example, look at the following code from the same book.

#include <stdio.h>

int main() {
    int a, b;
    
    float c;
    
    a = 13;
    b = 5;
    
    c = (float) a / (float) b;
    printf("%f", c); // prints 2.600000
}

In the above code, we typecast the integers a and b to float before dividing. Without the casts, 13 / 5 would be an integer division and yield 2; with them we get the more accurate result of 2.600000.

Now, back to the original problem. In this code we declare an unsigned int variable, hacky_pointer. Next, we typecast the address of the character array char_array into an unsigned int and assign it to hacky_pointer. Then we loop five times, first printing the address held in hacky_pointer with %p and then the character stored there with %c.

We dereference the pointer with *((char *) hacky_pointer), thus accessing the actual data stored at this address. Finally, we increment hacky_pointer by sizeof(char) so the address advances by 1 byte and the next iteration prints the correct address and data. Ideally, a sample result of this code should be something like this:

[hacky_pointer] points to 0x7fff3b63dc1b, which contains the character a

However, this is where the program runs into an error. When I compiled the above code, I received the following warnings:

$ gcc -g -o hacky_pointer hacky_pointer.c
hacky_pointer.c: In function ‘main’:
hacky_pointer.c:11:21: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   11 |     hacky_pointer = (unsigned int) char_array;
      |                     ^
hacky_pointer.c:14:100: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
   14 |         printf("[hacky_pointer] points to %p, which contains the character %c\n", hacky_pointer, *((char *) hacky_pointer));
      |                                                                                                    ^
hacky_pointer.c:18:21: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   18 |     hacky_pointer = (unsigned int) int_array;
      |                     ^
hacky_pointer.c:21:98: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
   21 |         printf("[hacky_pointer] points to %p, which contains the integer %d\n", hacky_pointer, *((int *) hacky_pointer));
      |  

These warnings tell me that a pointer is being cast to an integer of a different size, and back again. Although the program compiles without errors, the warnings signal that it may misbehave at run time. Sure enough, I ran into a segmentation fault when executing the program.

Let’s debug the binary and see what the problem is.

$ gdb -q ./hacky_pointer 
(gdb) break main
Breakpoint 1 at 0x1141: file hacky_pointer.c, line 6.
(gdb) run
Starting program: /home/kali/Documents/c-program/hacky_pointer 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, main () at hacky_pointer.c:6
6           char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
(gdb) disassemble main
Dump of assembler code for function main:
   0x0000555555555139 <+0>:     push   rbp
   0x000055555555513a <+1>:     mov    rbp,rsp
   0x000055555555513d <+4>:     sub    rsp,0x30
=> 0x0000555555555141 <+8>:     mov    DWORD PTR [rbp-0xd],0x64636261
   0x0000555555555148 <+15>:    mov    BYTE PTR [rbp-0x9],0x65
   0x000055555555514c <+19>:    mov    DWORD PTR [rbp-0x30],0x1
   0x0000555555555153 <+26>:    mov    DWORD PTR [rbp-0x2c],0x2
   0x000055555555515a <+33>:    mov    DWORD PTR [rbp-0x28],0x3
   0x0000555555555161 <+40>:    mov    DWORD PTR [rbp-0x24],0x4
   0x0000555555555168 <+47>:    mov    DWORD PTR [rbp-0x20],0x5
   0x000055555555516f <+54>:    lea    rax,[rbp-0xd]
   0x0000555555555173 <+58>:    mov    DWORD PTR [rbp-0x8],eax
   0x0000555555555176 <+61>:    mov    DWORD PTR [rbp-0x4],0x0
   0x000055555555517d <+68>:    jmp    0x5555555551a9 <main+112>
   0x000055555555517f <+70>:    mov    eax,DWORD PTR [rbp-0x8]
   0x0000555555555182 <+73>:    movzx  eax,BYTE PTR [rax]
   0x0000555555555185 <+76>:    movsx  edx,al
   0x0000555555555188 <+79>:    mov    eax,DWORD PTR [rbp-0x8]
   0x000055555555518b <+82>:    mov    esi,eax
   0x000055555555518d <+84>:    lea    rax,[rip+0xe74]        # 0x555555556008
   0x0000555555555194 <+91>:    mov    rdi,rax
   0x0000555555555197 <+94>:    mov    eax,0x0
   0x000055555555519c <+99>:    call   0x555555555030 <printf@plt>
   0x00005555555551a1 <+104>:   add    DWORD PTR [rbp-0x8],0x1
   0x00005555555551a5 <+108>:   add    DWORD PTR [rbp-0x4],0x1
   0x00005555555551a9 <+112>:   cmp    DWORD PTR [rbp-0x4],0x4
   0x00005555555551ad <+116>:   jle    0x55555555517f <main+70>
...SNIP...
   0x00005555555551e5 <+172>:   cmp    DWORD PTR [rbp-0x4],0x4
   0x00005555555551e9 <+176>:   jle    0x5555555551bf <main+134>
   0x00005555555551eb <+178>:   mov    eax,0x0
   0x00005555555551f0 <+183>:   leave
   0x00005555555551f1 <+184>:   ret
End of assembler dump.

I will allow the program to run until the instruction at <+70>. Before this instruction, the program is assigning variables and initialising the for-loop. The actual problem should start after the for-loop is initialised and we enter the printf instruction. As expected, the program runs into an error at the following instruction.

(gdb) nexti

Program received signal SIGSEGV, Segmentation fault.
main () at hacky_pointer.c:14
14              printf("[hacky_pointer] points to %p, which contains the character %c\n", hacky_pointer, *((char *) hacky_pointer));

To understand why this is happening, we need to start at the very beginning and go through the instructions step-by-step.

In the first few instructions, the program writes the array values onto the stack. The characters “a, b, c, d” are stored as a single DWORD at rbp-0xd. They take up 4 bytes of memory, and the program then stores the final byte “e” at rbp-0x9. Next, the program stores the integers “1, 2, 3, 4, 5”. As each integer takes up 4 bytes of memory, the offsets move in steps of 4 bytes: rbp-0x30, rbp-0x2c, rbp-0x28 and so on.

Now, the program loads the address rbp-0xd into the register rax. This is where the problem begins. On 64-bit systems, the general-purpose registers are 64 bits long, so rax can hold a value up to 64 bits long. rax now holds the address of char_array; we can confirm this by examining the register.

(gdb) x/8b $rbp-0xd
0x7fffffffdd83: 0x61    0x62    0x63    0x64    0x65    0xb0    0xda    0xff
(gdb) x/8b $rax
0x7fffffffdd83: 0x61    0x62    0x63    0x64    0x65    0xb0    0xda    0xff

So far so good. However, the next instruction sets us up for a disaster. Right after loading the array address into rax, the program stores only eax, the lower 32 bits of rax, into hacky_pointer at rbp-0x8.

(gdb) nexti
13          for(i=0; i < 5; i++) {
(gdb) x/8b $rbp-0x8
0x7fffffffdd88: 0x83    0xdd    0xff    0xff    0xff    0x7f    0x00    0x00

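Look closely: read as 8 bytes, the value at rbp-0x8, where eax was stored, looks like the full address held in rax, and thus the address of char_array. Only the first 4 bytes belong to hacky_pointer, though. The last 4 bytes sit at rbp-0x4, which is the loop counter i. It has not been initialised yet and most likely still holds leftover stack data that happens to look like the top half of a stack address.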

(gdb) x/8b $rax
0x7fffffffdd83: 0x61    0x62    0x63    0x64    0x65    0x83    0xdd    0xff
(gdb) x/8b 0x007fffffffdd83
0x7fffffffdd83: 0x61    0x62    0x63    0x64    0x65    0x83    0xdd    0xff

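Great! It looks like we have our data and the pointer set up. But the very next instruction, mov DWORD PTR [rbp-0x4],0x0, initialises i to 0 and wipes the 4 bytes sitting right after hacky_pointer. This reveals what was really stored: only the lower 4 bytes of the address. hacky_pointer still contains part of the address of char_array, but this address is incomplete. When the program later loads it back with mov eax,DWORD PTR [rbp-0x8], the upper half of rax is zeroed, so if we try to access the memory location now we will get an error.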

(gdb) nexti
13          for(i=0; i < 5; i++) {
(gdb) x/8b $rbp-0x8
0x7fffffffdd88: 0x83    0xdd    0xff    0xff    0x00    0x00    0x00    0x00
(gdb) x/8b 0x0000ffffdd83
0xffffdd83:     Cannot access memory at address 0xffffdd83

This is the problem our program encounters. It tries to dereference the incomplete address 0xffffdd83, which is not mapped in the process, and runs into an error.

Okay, but why is this even happening? The answer lies in the 64-bit architecture. We are declaring our pointer as an unsigned int, which is 4 bytes long on this platform. Therefore, when the program casts the pointer into an unsigned int, the address is truncated to 4 bytes. However, on a 64-bit architecture a pointer is 8 bytes long. So, when the program casts the int back into a pointer and dereferences it, it reads from the wrong address and runs into a segmentation fault. On a 32-bit system, where ints and pointers are both 4 bytes long, nothing is lost, which would explain why Erickson’s code ran cleanly. A detailed explanation can be found in the following resources.

Segmentation fault in typecasting from unsigned int to char pointer

Size of pointer in C

A simple way to fix this error in our scenario is to use a datatype big enough to hold the pointer. In C, we can use uintptr_t from <stdint.h>, an unsigned integer type that is wide enough to hold a pointer.

uintptr_t and Other Helpful Types (Solaris 64-bit Developer’s Guide)

Let’s make the desired changes and see if our program runs successfully now.

hackypointer.c

#include <stdio.h>
#include <stdint.h>

int main() {
    int i;

    char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
    int int_array[5] = {1, 2, 3, 4, 5};

    uintptr_t hacky_pointer;        // change from unsigned int to uintptr_t

    hacky_pointer = (uintptr_t) char_array;

    for(i=0; i < 5; i++) {
        printf("[hacky_pointer] points to %p, which contains the character %c\n", (void *) hacky_pointer, *((char *) hacky_pointer));  // %p expects a void *
        hacky_pointer = hacky_pointer + sizeof(char);
    }

    hacky_pointer = (uintptr_t) int_array;

    for(i=0; i < 5; i++) {
        printf("[hacky_pointer] points to %p, which contains the integer %d\n", (void *) hacky_pointer, *((int *) hacky_pointer));
        hacky_pointer = hacky_pointer + sizeof(int);
    }
}

Let’s compile and run this.

$ gcc -g -o hacky_pointer hacky_pointer.c

$ ./hacky_pointer
[hacky_pointer] points to 0x7fffa19e4b2b, which contains the character a
[hacky_pointer] points to 0x7fffa19e4b2c, which contains the character b
[hacky_pointer] points to 0x7fffa19e4b2d, which contains the character c
[hacky_pointer] points to 0x7fffa19e4b2e, which contains the character d
[hacky_pointer] points to 0x7fffa19e4b2f, which contains the character e
[hacky_pointer] points to 0x7fffa19e4b10, which contains the integer 1
[hacky_pointer] points to 0x7fffa19e4b14, which contains the integer 2
[hacky_pointer] points to 0x7fffa19e4b18, which contains the integer 3
[hacky_pointer] points to 0x7fffa19e4b1c, which contains the integer 4
[hacky_pointer] points to 0x7fffa19e4b20, which contains the integer 5

Wonderful! The error is fixed and our program is working fine. Let’s also debug this and look at the difference in instructions.

(gdb) break main
Breakpoint 1 at 0x1141: file hacky_pointer.c, line 7.
(gdb) run
Starting program: /home/kali/Documents/c-program/hacky_pointer 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, main () at hacky_pointer.c:7
7           char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
(gdb) disassemble main
Dump of assembler code for function main:
   0x0000555555555139 <+0>:     push   rbp
   0x000055555555513a <+1>:     mov    rbp,rsp
   0x000055555555513d <+4>:     sub    rsp,0x30
=> 0x0000555555555141 <+8>:     mov    DWORD PTR [rbp-0x15],0x64636261
   0x0000555555555148 <+15>:    mov    BYTE PTR [rbp-0x11],0x65
   0x000055555555514c <+19>:    mov    DWORD PTR [rbp-0x30],0x1
   0x0000555555555153 <+26>:    mov    DWORD PTR [rbp-0x2c],0x2
   0x000055555555515a <+33>:    mov    DWORD PTR [rbp-0x28],0x3
   0x0000555555555161 <+40>:    mov    DWORD PTR [rbp-0x24],0x4
   0x0000555555555168 <+47>:    mov    DWORD PTR [rbp-0x20],0x5
   0x000055555555516f <+54>:    lea    rax,[rbp-0x15]
   0x0000555555555173 <+58>:    mov    QWORD PTR [rbp-0x10],rax
   0x0000555555555177 <+62>:    mov    DWORD PTR [rbp-0x4],0x0
   0x000055555555517e <+69>:    jmp    0x5555555551ae <main+117>
   0x0000555555555180 <+71>:    mov    rax,QWORD PTR [rbp-0x10]
   0x0000555555555184 <+75>:    movzx  eax,BYTE PTR [rax]
   0x0000555555555187 <+78>:    movsx  edx,al
   0x000055555555518a <+81>:    mov    rax,QWORD PTR [rbp-0x10]
   0x000055555555518e <+85>:    mov    rsi,rax
   0x0000555555555191 <+88>:    lea    rax,[rip+0xe70]        # 0x555555556008
   0x0000555555555198 <+95>:    mov    rdi,rax
   0x000055555555519b <+98>:    mov    eax,0x0
   0x00005555555551a0 <+103>:   call   0x555555555030 <printf@plt>
...SNIP...
   0x00005555555551e1 <+168>:   call   0x555555555030 <printf@plt>
   0x00005555555551e6 <+173>:   add    QWORD PTR [rbp-0x10],0x4
   0x00005555555551eb <+178>:   add    DWORD PTR [rbp-0x4],0x1
   0x00005555555551ef <+182>:   cmp    DWORD PTR [rbp-0x4],0x4
   0x00005555555551f3 <+186>:   jle    0x5555555551c5 <main+140>
   0x00005555555551f5 <+188>:   mov    eax,0x0
   0x00005555555551fa <+193>:   leave
   0x00005555555551fb <+194>:   ret
End of assembler dump.

Stepping through the same instructions as before, we can see that the 4 bytes of i at rbp-0x4 are still being zeroed out, but hacky_pointer is now stored as a full QWORD at rbp-0x10, so the complete memory address is untouched.

(gdb) x/8b $rax
0x7fffffffdd7b: 0x61    0x62    0x63    0x64    0x65    0x7b    0xdd    0xff
(gdb) x/8b $rbp-0x10
0x7fffffffdd80: 0x7b    0xdd    0xff    0xff    0xff    0x7f    0x00    0x00

The solution was pretty simple but it taught me a lot about the size of data types and the difference between 32-bit and 64-bit architecture. Thank you for reading, I will see you in the next post!