Assembly on OSX is quite weird. I previously had experience with Windows on IA-32/x86, and I learned some new things today.
On OSX x86_64, everything seems to follow this “ABI” I’d never heard of before (apparently it’s called the System V AMD64 ABI). I’m not sure whether it’s standard for every OS on x86_64, but it would make sense. Either way, this thing makes everything a pain in the ass.
First, it specifies how parameters should be passed to functions. That would all be great, except it says that integer-like (whatever that means) parameters are passed in registers and not on the stack. There is a specific order: the first one goes into RDI, the second into RSI, then RDX, RCX, R8, and R9. Of course it specifies a shitload of other rules for other kinds of parameters too. Generally speaking this is good, since passing arguments through registers is faster than using the stack, but it’s kind of a pain when you have to read parameters in your asm function. There are also rules about which registers must be preserved, but I haven’t even bothered with those yet. The other problem I had is that cdecl is “supposed to” pass all parameters on the stack (according to Wikipedia), but the clang compiler on OS X doesn’t seem to agree and just follows the ABI no matter which calling convention you use. Even for variadic functions like printf, lol.
The other thing that kind of sucks is that clang on OS X only seems to support the AT&T syntax for inline assembler in C, and AT&T syntax is DISGUSTING. It really starts to mess with your mind after a few hours if you’re used to Intel syntax, because the operand order is reversed. And you have to put those ugly % signs everywhere, yuck. At least I learned how to pass operands to asm in inline assembly, basically:
asm("mov %1, %0" : "=r"(var1) : "r"(var2));
This assigns the value of var2 to var1. I really hate this syntax because I always want to put a comma after the assembly string instead of a colon. Whoever thought of this syntax sucks big time; it has nothing in common with C, and I don’t know where the colon came from. Otherwise the feature works pretty well, since it also automatically inserts the extra asm instructions to copy variables into registers and so on.
But then I had another problem. I had a naked function without its own stack frame (which means that if you use any local variables it will mess up the caller’s frame big time), and I wanted to execute some C code in it. As long as I don’t do anything that touches the base pointer (rbp/ebp) I’m good. The problem is: how do you read register values and stack contents from C code without local variables? GCC lets you put a “register” keyword in front of your variable definition, like this: register int var asm("rax"); which at first glance looks like it binds the variable to the rax register. Unfortunately this shit is not as useful as it looks: the binding is only honored inside inline assembly. So it’s really just a hint, “if I use var as an inline-asm operand, put it in rax”. Even if you do basically nothing with var, the compiler will happily say “let’s write whatever to the stack for var!” = segmentation fault, nice.
Well, I found a “solution” for this. As long as I compile my code with a high enough optimization level (I only tried -O3, and it’s fine), the compiler will basically say “ok, I don’t have that many local variables, I’ll keep them in registers instead of on the stack”. It reaches the same conclusion for parameters. So now I have code that only works when compiled with a high optimization level, but at least I can mix C code and asm code in my naked functions without any problems.
But here comes the sad part of OSX asm, and I think the situation is similar on Linux: there is no good asm debugger GUI (maybe IDA Pro, but I haven’t tried it and don’t know whether it can debug or only disassemble). On Windows I always use OllyDbg and it’s the bomb. But on OSX I’m stuck with stupid gdb. It’s really lame; it doesn’t even show the current instruction, and you have to issue commands to do anything. Typical dumb Linux tool… It took me ages to debug anything with it. I tried the DDD GUI, which is *kind of* an improvement, but it’s also really clunky compared to Olly. That’s why I never liked working with Linux: every tool makes your life a living hell (after you read a 1000-page manual first). So I really wish for something like Olly on OSX. Remember, I am talking about an asm debugger, not the Xcode debugger for Objective-C or anything like that. Maybe if I have time I will attempt to write an Olly clone in Cocoa, probably using gdb as the backend.
After a lot of screwing around I finally got some working code. But then I changed one little parameter and I got a segfault again, with some weird error about stack alignment. I was really lucky when googling to stumble upon a guy talking about this fucking ABI, who mentioned that it requires, for some unknown reason, that the stack be 16-byte aligned when calling any external function. I don’t understand the point of this: 16 bytes is 128 bits, what does that have to do with anything? (Apparently SSE instructions like movaps operate on 128-bit values and require 16-byte-aligned operands, which would explain it.) So before I call anything external from my naked C functions, I must check whether the stack is aligned, and if not, push something just for the fun of it. Really stupid.
So now I am one step closer. I am trying to implement lambda-like behavior in C. I know C++11 has this, but I want it in C, and I just need it for Ruby stuff. I am trying to compile Ruby bytecode (which is actually not that complex) into machine code, so I need a mechanism to handle lambdas/blocks in C. Basically, I am now at the point where I can use the native stack as the Ruby stack, and lambdas can look into this stack to read local variables. Just before a function ends, I copy the part of the stack holding the local variables somewhere else so that blocks can still use it. I will have to come up with a mechanism for knowing when this is necessary and when I can free that memory. I could copy only what the lambda actually uses, but that would only work if I have control over the lambda, which may not be the case, because I also plan to accept lambdas from interpreted Ruby. In that case, though, I will have to teach the interpreter how to access my native stack. There is some more fun to be had.