IBM z Systems port of ocamlopt #275
Import of Bill O'Farrell's port, rebased from 4.02.1 to trunk.
Plus a few simplifications:
- emit.mlp: assume target is ELF
- proc.ml & emit.mlp: use asm register names for 'register_name'
Using the low bit of return addresses to mark already-scanned stack frames improves GC time on architectures that ignore this bit in 'return' instructions, like Power. On architectures that do not ignore it, as is the case for z Systems, the bit must be cleared before every 'return' instruction, which costs too much in running time.
- asmrun/stack.h: turn off the marking of return addresses for z
- asmcomp/s390x/emit.mlp: suppress clearing of low bit of return addresses
- Ibased addressing is removed. The code generated for an Ibased load/store is no better than the code we generate for an Iindexed load/store preceded by an Iconst_symbol instruction that loads the address of the global variable. Plus, we now get opportunities for CSE of the Iconst_symbol.
- Iindexed2 addressing is extended with a constant displacement, to take full advantage of the ofs(%r1, %r2) addressing mode of the processor.
- During instruction selection, make sure that the constant displacement of Iindexed and Iindexed2 is within range (20-bit signed).
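The displacement check described above can be sketched as follows. This is a minimal illustration, not the code from the port: the type is a cut-down stand-in for the real addressing-mode type, and `is_disp20` and `offset_addressing` are hypothetical names.

```ocaml
(* Hypothetical, simplified addressing modes: only the constant
   displacement is modelled, not the registers involved. *)
type addressing_mode =
  | Iindexed of int    (* reg + displacement *)
  | Iindexed2 of int   (* reg + reg + displacement *)

(* A signed 20-bit displacement covers -2^19 .. 2^19 - 1. *)
let is_disp20 n = n >= -0x80000 && n <= 0x7FFFF

(* Fold an extra constant offset into an addressing mode, but only
   when the resulting displacement is still encodable. *)
let offset_addressing addr delta =
  match addr with
  | Iindexed n when is_disp20 (n + delta) -> Some (Iindexed (n + delta))
  | Iindexed2 n when is_disp20 (n + delta) -> Some (Iindexed2 (n + delta))
  | Iindexed _ | Iindexed2 _ -> None
```

When the function returns `None`, a selector would have to compute the address into a register first instead of folding the constant.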
Instead, do the binary64->binary32 conversion before, and use emit_load_store with %f15 as source register.
Taking a leaf from recent versions of GCC, in PIC mode, we use a PC-relative load with GOTENT relocation to access the global offset table. This way, we don't have to save, set up and reload %r12 as GOT pointer in every function.
Following the previous commit, %r12 becomes usable as a normal register. However it must be saved in caml_call_gc. Independently: change Proc.loc_external_arguments to account for the 160 reserved bytes at bottom of stack. Then, caml_c_call and emission of code for Iextcall(false) no longer need to account for those reserved bytes.
In emit.mlp, write %rN and %fN directly in `...` strings, instead of going through emit_gpr and emit_fpr. Justification: for other ports like Power, several concrete asm syntaxes for register names exist, so it makes sense to abstract over them. This is not the case for z systems under Linux. Plus, using the concrete syntax directly makes it easier to review emit.mlp.
New function emit_stack_adjust, which chooses the shortest instruction that performs the required adjustment. Later, this will be a good place to put cfi_adjust directives.
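One way such a choice could look is sketched below. This is an assumption-laden illustration, not the port's emit_stack_adjust: the function `stack_adjust` is hypothetical, it returns the assembly text rather than emitting it, and the particular set of candidate instructions (la, aghi, lay, agfi) is my own selection based on their encodings.

```ocaml
(* Hypothetical sketch. [delta] is the signed number of bytes to add
   to the stack pointer %r15; assumed to fit in 32 bits. *)
let stack_adjust delta =
  if delta = 0 then None
  else if delta > 0 && delta < 4096 then
    (* la: 4 bytes, unsigned 12-bit displacement *)
    Some (Printf.sprintf "la\t%%r15, %d(%%r15)" delta)
  else if delta >= -0x8000 && delta <= 0x7FFF then
    (* aghi: 4 bytes, signed 16-bit immediate *)
    Some (Printf.sprintf "aghi\t%%r15, %d" delta)
  else if delta >= -0x80000 && delta <= 0x7FFFF then
    (* lay: 6 bytes, signed 20-bit displacement *)
    Some (Printf.sprintf "lay\t%%r15, %d(%%r15)" delta)
  else
    (* agfi: 6 bytes, signed 32-bit immediate *)
    Some (Printf.sprintf "agfi\t%%r15, %d" delta)
```

Note that aghi and agfi set the condition code while la and lay do not, so a real implementation may prefer lay over aghi wherever the condition code is live.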
Move the cold path (the one that calls the GC when alloc_ptr < alloc_limit) as much as possible to the end of the function. Use la and lay to produce shorter code.
lghi is 4 bytes, lgfi is 6.
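The size difference suggests picking the instruction from the range of the constant. A minimal sketch, assuming a 64-bit host so that OCaml's `int` can hold 32-bit constants; `load_immediate` is a hypothetical helper, and constants outside the 32-bit signed range are left unhandled because the commit message does not say how they are loaded.

```ocaml
(* Hypothetical sketch: returns the mnemonic and its encoded size in
   bytes, or None for constants that need some other sequence. *)
let load_immediate n =
  if n >= -0x8000 && n <= 0x7FFF then
    Some ("lghi", 4)          (* signed 16-bit immediate *)
  else if n >= -0x8000_0000 && n <= 0x7FFF_FFFF then
    Some ("lgfi", 6)          (* signed 32-bit immediate *)
  else
    None
```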
…anches This saves one instruction. These cond branches are heavily used by pattern-matching compilation.
This avoids testing the "< 0" case separately.
Use la/lay when possible for add immediate and sub immediate, because these instructions support the case result <> argument. Use 'and/or/xor immediate over low 32 bits' instructions. Do this only if the top 32 bits of the constant are 0 (or/xor) or -1 (and).
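The condition on the top 32 bits can be sketched as below. The type `logical_op` and the function `low32_immediate` are hypothetical; the mnemonics nilf, oilf and xilf are the z/Architecture 'immediate on low 32 bits' forms, which I assume are the ones meant here. The reasoning is that these instructions leave the upper half of the register untouched, which matches a full 64-bit or/xor only if the constant's upper half is 0, and a full 64-bit and only if it is all ones.

```ocaml
type logical_op = And | Or | Xor

(* Hypothetical sketch: returns the low-32-bit instruction that is
   equivalent to the 64-bit operation with constant [n], if any. *)
let low32_immediate op (n : int64) =
  (* Arithmetic shift: yields the sign-extended upper 32 bits. *)
  let high = Int64.shift_right n 32 in
  match op with
  | Or when high = 0L -> Some "oilf"
  | Xor when high = 0L -> Some "xilf"
  | And when high = -1L -> Some "nilf"
  | And | Or | Xor -> None
```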
In PIC mode, Itailcall_imm should jump to the PLT of the called function. Also: use %r7 rather than %r1 to pass the function pointer argument to caml_c_call. It can be that caml_c_call is in a different shared object than the caller. In this case, %r0 and %r1 can be destroyed by PLT stub code, according to the ELF ABI.
To reflect the changes to caml_c_call from commit cc9c12d
Without the special reloading implemented here, a 2-address instruction such as 'x := x + y' could be reloaded as 'x1 := x2 + y' with two different temporaries x1, x2 for x.
- Test multiply-high - Less verbose output - Use our own PRNG instead of rand()
Following Hacker's Delight section 8.3.
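The commit title is not shown here; assuming it concerns signed multiply-high (as the neighbouring test commit suggests), the identity from that section derives the signed high-order product from the unsigned one by subtracting one operand whenever the other is negative. The sketch below checks this on 16-bit operands held in OCaml ints, assuming a 64-bit host; all function names are hypothetical.

```ocaml
let mask = 0xFFFF

(* High 16 bits of the unsigned 16x16 product. *)
let mulhu16 a b = ((a land mask) * (b land mask)) lsr 16

(* Reference: high 16 bits of the signed product, computed directly. *)
let mulhs16_direct a b = ((a * b) asr 16) land mask

(* Signed high product obtained from the unsigned one:
   subtract b if a < 0, and subtract a if b < 0 (modulo 2^16). *)
let mulhs16_corrected a b =
  let h = mulhu16 a b in
  let h = if a < 0 then h - b else h in
  let h = if b < 0 then h - a else h in
  h land mask
```

The same correction applies at 64 bits, where only an unsigned multiply-high result is at hand and the signed one is needed.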
The locgr instruction is not available in z10, the baseline for this port. Instead, generate pedestrian code with a conditional branch. Pass -march=z10 to the assembler to enforce z10 compliance.
The latencies are based on wild guesses for the z10. Since newer z processors are out-of-order, basic-block scheduling could also be turned off entirely.
House rules.
Resolved conflicts: testsuite/tests/asmcomp/mainarith.c
@xavierleroy Could you change the code to restore the allocation pointer when calling the minor GC? This is needed for #297. If I attempt it myself, I'll probably get it wrong.
Here is the diff for young_ptr-restoring allocation. I did not test but it's pretty straightforward. You can either integrate it in your GPR#297 or ask me to integrate it once GPR#297 is merged.
FTR: #297 was changed, so it is not necessary to restore
Fix lazy behaviour for multicore
This is a port of the OCaml native-code compiler to IBM's z Systems architecture under Linux.
z Systems (https://en.wikipedia.org/wiki/IBM_System_z), also known as "s390x" in the GNU/Linux world, is IBM's line of mainframe computers. They are supported by several Linux distributions: RHEL, SUSE, Debian.
The OCaml port was developed by Bill O'Farrell at IBM Toronto, with help from Tristan Amini, based on OCaml 4.02.1. I upgraded the port to the current OCaml trunk and performed some simplifications and fixes. A CLA was signed to cover the reuse of Bill O'Farrell's code.
The test suite passes and the .opt compilers are working. I did not test further.
Concerning future maintenance, I have (time-limited) access to a z VM via IBM's LinuxOne program. Bill O'Farrell agreed to help with testing and maintenance. Finally, it is also possible to run z Linux on a PC through the Hercules emulator (http://www.josefsipek.net/docs/s390-linux/hercules-s390.html), even though it is very slow and there are some problems with floating-point instructions.