Segmentation fault while typecasting unsigned int to pointer: Solved
TL;DR: When a pointer is stored in an unsigned int and later typecast back into a pointer, programs on 64-bit operating systems can run into a segmentation fault because an int is smaller than a pointer. To fix this error, store the pointer value in a uintptr_t instead of an unsigned or signed int.
While going through the typecasting section in the wonderful book by Jon Erickson, I ran into a segmentation fault. The strange thing is that I followed the code from Erickson’s book exactly, yet my compiler gave me multiple warnings, while Erickson’s code ran successfully without warnings or errors. So why was this happening? The answer lies in how 64-bit systems differ from 32-bit systems. Let’s look at the code.
One thing you might notice, and which might feel off about this code, is that I am using an unsigned int datatype for the hacky_pointer variable instead of simply declaring it as a pointer with char *hacky_pointer. This is deliberate, because we are learning typecasting. Typecasting refers to the feature in C where we can instruct the program to temporarily treat a previously declared variable as a different datatype. For example, look at the following code from the same book.
In the above code, we are temporarily typecasting the integers a and b into the float datatype. This gives us the more accurate result of 2.600000 for the division of 13 by 5, instead of the integer result 2, where the fractional part is discarded.
Now, back to the original problem. In this code, we declare an unsigned int variable hacky_pointer. Next, we typecast the character array char_array into an unsigned int and assign it to hacky_pointer. Finally, we loop using hacky_pointer, on each iteration first printing the memory address it holds with %p and then the character stored at that address with %c.
We dereference the pointer with *((char *) hacky_pointer), which typecasts hacky_pointer back into a char pointer and accesses the actual data stored at that address. Finally, we increment hacky_pointer by sizeof(char), so the memory address moves forward by 1 byte and the next iteration prints the correct address and data. Ideally, a line of output from this code should look something like this:
[hacky_pointer] points to 0x7fff3b63dc1b, which contains the character a
However, this is where the program runs into an error. When I compiled the above code, I received the following warnings:
These warnings tell me that the pointer is being cast to an integer of a different size. Although the program compiles without errors, the warnings signal that it may misbehave at runtime. Sure enough, I ran into a segmentation fault when I executed the program.
Let’s debug the binary and see what the problem is.
I will allow the program to run until the instruction at <+70>. Before this instruction, the program is assigning variables and initialising the for loop. The actual problem should start after the for loop is initialised and we reach the printf call. As expected, the program runs into an error at the following instruction.
To understand why this is happening, we need to start at the very beginning and go through the instructions step-by-step.
In the first few instructions, the program assigns values to memory. The characters “a, b, c, d” are stored together as a single DWORD at rbp-0xd. These values take up 4 bytes of memory, so the program then stores the final byte “e” at rbp-0x9. Next, the program assigns the integers “1, 2, 3, 4, 5”. As each integer takes up 4 bytes of memory, we can see the addresses moving in steps of 4 bytes: 0x30, 0x2c, 0x28 and so on.
Now, the program loads the address rbp-0xd, the start of char_array, into the register rax. This is where the problem begins. On 64-bit systems, the general-purpose registers are 64 bits wide, so rax can hold a value up to 64 bits long. The register rax is currently holding the address of char_array, which we can confirm by examining the register.
So far so good? However, the next few instructions set us up for disaster. Right after loading the array address into rax, the program stores the pointer through eax, which is only the lower 32 bits of rax.
Look closely: the value at rbp-0x8, where the eax register is moved, matches the address held in rax, and thus the address of char_array, but only in its lower 4 bytes, because that is all a 4-byte unsigned int can hold.
Great! We have our data and the pointer set up. But when this value is read back into a register, it is loaded into eax, and on x86-64 writing to a 32-bit register zeroes the upper 4 bytes of the full 64-bit register. The register still contains part of the address of char_array, but this address is now incomplete. If we try to access the memory location now, we will get an error.
This is the problem our program encounters: it dereferences the truncated address, which points to memory the program is not allowed to access, so it crashes with a segmentation fault.
Okay, but why is this even happening? The answer lies in the 64-bit architecture. We are declaring our pointer variable as an unsigned int, and an unsigned int is 4 bytes long. However, on a 64-bit architecture a pointer is 8 bytes long, so when the program casts the pointer into an unsigned int, the address is truncated to its lower 4 bytes. When the program later casts that int back into a pointer and dereferences it, it reads from the wrong address and runs into a segmentation fault. On a 32-bit system an int and a pointer are both 4 bytes long, which is why the same code works there. A detailed explanation can be found in the following resources.
Segmentation fault in typecasting from unsigned int to char pointer
A simple way to fix this error for our scenario is to use a data type large enough to hold the pointer. In C, we can use uintptr_t from <stdint.h>, an unsigned integer type that can hold any valid pointer to void converted to it and back without loss.
uintptr_t and Other Helpful Types (Solaris 64-bit Developer’s Guide)
Let’s make the desired changes and see if our program runs successfully now.
Let’s compile and run this.
Wonderful! The error is fixed and our program is working fine. Let’s also debug this and look at the difference in instructions.
Stepping through the same instructions as before, we can see that 4 bytes are still being zeroed out, but this time the memory address stays intact.
The solution was pretty simple, but it taught me a lot about the sizes of data types and the difference between 32-bit and 64-bit architectures. Thank you for reading, I will see you in the next post!