"Doors" in Solaris: Lightweight RPC Using File Descriptors (1996)

monocasa · 2024-07-24T11:21:14 1721820074

Writing a top level comment here to hopefully address some of the misconceptions present across this comment section.

Doors at the end of the day aren't message passing based RPC. There is no door_recv(2) syscall or equivalent nor any way for a thread pool in the callee to wait for requests.

Doors at the end of the day are a control transfer primitive. In a very real sense the calling thread is simply transferred to the callee's address space and continues execution there until a door_return(2) syscall transfers it back into the caller address_space.

It truly is a 'door' into another address space.

This is most similar to some of the CPU control transfer primitives. It's most like task gate style constructs like seen on 286/i432/mill CPUs. Arguably it's kind of like the system call instruction itself too, transferring execution directly to another address space/context.

wang_li · 2024-07-24T16:57:43 1721840263

>Doors at the end of the day are a control transfer primitive. In a very real sense the calling thread is simply transferred to the callee's address space and continues execution there until a door_return(2) syscall transfers it back into the caller address_space.

Your phrasing is misleading. A door is a scheduler operation. What would it even mean for a thread to go from a caller to a callee? They are different processes with completely different contexts on the system. Different owners, different address spaces, etc. What the door is doing is passing time on a CPU core from the calling thread to the receiving thread and back.

starspangled · 2024-07-24T11:52:40 1721821960

Maybe you know more about it I don't want to say you're wrong because the only thing I know of it is reading the page linked to. But that page semes to disagree with you (EDIT: sorry I don't know how to best format quoting the article vs quoting your reply):

    Doors Implementation

    [...] the default server threads that are created in response to incoming door requests on the server.

    A new synchronization object, called a shuttle [...] allowing a direct scheduler hand-off operation between two threads.

    [...] a shuttle marks the current thread as sleeping, marks the server thread as running, and passes control directly to the server thread.

    Server Threads

    Server threads are normally created on demand by the doors library [...]
    The first server thread is created automatically when the server issues a door create call.
    [...] Once created, a server thread will place itself in a pool of available threads and wait for a door invocation.

There would be no need for these server threads if the client thread transferred directly to the server process address space.

> There is no door_recv(2) syscall or equivalent nor any way for a thread pool in the callee to wait for requests

It says the thread pool on the server is created by the doors library when the server creates a door. So the process of receiving and processing requests would be carried out internally within the doors library, so there would be no need for the server application to have an API for the server application to accept requests, it's handled by the library.

At least that's what is described in the link, AFAIKS. It's only a conceptual door, underneath it is implemented in some message passing style maybe with some extra sugar for performance and process control nicety with this "shuddle" thing for making the requests.

dan353hehe · 2024-07-24T14:16:10 1721830570

I did some debugging about 8 years ago in SmartOS. There were some processing that were running for way too long without using the cpu. I ended up using kdbg to trace where the processes were hung. I had to jump between different processes to trace the door calls. There was no unified stack in one process that spanned multiple address spaces, and the calling threads were paused.

So, yes. The documentation in right, the calling thread is paused while a different one is used one the other side of the door.

netbsdusers · 2024-07-24T20:27:36 1721852856

Kernel and user threads are distinct on Solaris. The implementation of doors is not by message passing. There needs to be a userland thread context in the server process with a thread control block and all the other things that the userland threading system depends on to provide things like thread-local variables (which include such essentials as errno). I do not know whether there is a full kernel thread (or LWP as they call it in Solaris), but a thread is primarily a schedulable entity, and if the transfer of control is such that the activated server thread is treated in the same way in terms of scheduling as the requesting client thread, then effectively it is still what it says it is.

fanf2 · 2024-07-24T12:09:02 1721822942

> There would be no need for these server threads if the client thread transferred directly to the server process address space.

The client’s stack isn’t present in the server’s address space: the server needs a pool of threads so that a door call does not need to allocate memory for the server stack.

starspangled · 2024-07-24T12:50:25 1721825425

It wouldn't need a pool of threads for a stack, it just needs some memory in the server's address space for the stack.

Together with the other points about marking the current thread sleeping and passing control to the server thread, and about server threads waiting for an invocation, I think what is described is pretty clearly the server threads are being used to execute code on the server on behalf of the client request.

kragen · 2024-07-24T12:53:36 1721825616

can you figure out how this mechanism works? my notes in https://news.ycombinator.com/item?id=41056051 reflect me rooting around and giving up; it's nonobvious

kragen · 2024-07-24T23:34:00 1721864040

wahern did figure it out

kragen · 2024-07-24T11:58:54 1721822334

this is very confusing and now i want to see truss output

man pages like https://docs.oracle.com/cd/E88353_01/html/E37843/door-server... reference literally zero man pages in section 2, so i wonder if there is in fact a door_recv system call and it just isn't documented?

but yeah it sure seems like there's a thread pool of server threads (full-fledged posix threads with their own signal masks and everything) that sit around waiting for door calls

> The door_server_create() function allows control over the creation of server threads needed for door invocations. The procedure create_proc is called every time the available server thread pool is depleted. In the case of private server pools associated with a door (see the DOOR_PRIVATE attribute in door_create()), information on which pool is depleted is passed to the create function in the form of a door_info_t structure. The di_proc and di_data members of the door_info_t structure can be used as a door identifier associated with the depleted pool. The create_proc procedure may limit the number of server threads created and may also create server threads with appropriate attributes (stack size, thread-specific data, POSIX thread cancellation, signal mask, scheduling attributes, and so forth) for use with door invocations.

<https://docs.oracle.com/cd/E88353_01/html/E37843/door-server...>

apparently things like door_create() survived into opensolaris and so they are presumably open source now? even if under the cddl

<https://www.unix.com/man-page/opensolaris/3c/door_create/>

/me git clone https://github.com/kofemann/opensolaris

jesus fuck, 1.4 gigabytes? fuck you very much fw_lpe11002.h

okay so usr/src/lib/libc/port/threads/door_calls.c says the 'raw system call interfaces' are __door_create, __door_return, __door_ucred, __door_unref, and __door_unbind, which, yes, do seem to be undocumented. they seem to have been renamed in current illumos https://github.com/illumos/illumos-gate/blob/master/usr/src/...

unfortunately it's not obvious to me how to find the kernel implementation of the system call here, which would seem to be almost the only resort when it isn't documented? i guess i can look at how it's used

__door_create in particular is called with the function pointer and the cookie, and that's all door_create_cmn does with the function pointer; it doesn't, for example, stash it in a struct so that a function internal to door_calls.c can call it in a loop after blocking on some sort of __door_recv() syscall (which, as i said, doesn't exist)

it does have a struct privdoor_data under some circumstances; it just doesn't contain the callback

i don't know, i've skimmed all of door_calls.c and am still not clear on how these threads wait to be invoked

aha, the kernel implementation is in usr/src/uts/common/fs/doorfs/door_sys.c. door_create_common stashes the function pointer in the door_pc of a door_node_t. then door_server_dispatch builds a new thread stack and stashes the door_pc on it as the di_proc of a door_info_t starting at line 1284: https://github.com/kofemann/opensolaris/blob/master/usr/src/...

this seems to be a structure with layout shared between the kernel and userspace (mentioned in the man page i quoted above): https://github.com/illumos/illumos-gate/blob/master/usr/src/...

...but then door_calls.c never uses di_proc! so i'm still mystified as to how the callback function you pass to door_create ever gets called

probably one hour diving into solaris source code is enough for me for this morning, though it's very pleasantly formatted and cleanly structured and greppable. does anybody else know how this works and how they got those awesome numbers? is door_call web scale?

wahern · 2024-07-24T19:49:07 1721850547

The callback address is loaded and invoked here: https://github.com/illumos/illumos-gate/blob/5d9d909/usr/src... It's why door_server_dispatch copies out door_results to the server thread stack.

The aha moment was when I realized that the door_return(2) syscall is how threads yield and wait to service the next request. In retrospect it make sense, but I didn't see it until I tried to figure out how a user space thread polled for requests--a thread first calls door_bind, which associates it to the private pool, and then calls door_return with empty arguments to wait for the initial call. (See example door_bind at https://github.com/illumos/illumos-gate/blob/4a38094/usr/src... and door_return at https://github.com/illumos/illumos-gate/blob/4a38094/usr/src...)

One of the confusing aspects for me was that there's both a global pool of threads and "private" pool of threads. By default threads are pulled from the global pool to service requests, but if you specify DOOR_PRIVATE to door_create, it uses the private pool bound to the door.

AFAICT, this is conceptually a typical worker thread pooling implementation, with condition variables, etc for waking and signaling. (See door_get_server at https://github.com/illumos/illumos-gate/blob/915894e/usr/src...) Context switching does seem to be optimized so door_call doesn't need to bounce through the system scheduler. (See shuttle_resume at https://github.com/illumos/illumos-gate/blob/2d6eb4a/usr/src...) And door_call/door_return data is copied across caller and callee address spaces like you'd expect, except the door_return magic permits the kernel to copy the data to the stack, adjusting the stack pointer before resuming the thread and resetting it when it returns. (See door.S above and door_args at https://github.com/illumos/illumos-gate/blob/915894e/usr/src...)

This works just as one might expect: no real magic except for the direct thread-thread context switching. But that's a similar capability provided by user space scheduler activations in NetBSD, or the switchto proposal for Linux. The nomenclature is just different.

It's a slightly different story for in-kernel doors, but that's not surprising, either, and there's nothing surprsing there, AFAICT (but I didn't trace it as much).

Thanks for finding those source code links. I cloned the repo and started from there, grep'ing the whole tree to find the user space wrappers, etc.

kragen · 2024-07-24T20:01:06 1721851266

aha! i should have looked in the assembly. in opensolaris it's opensolaris/usr/src/lib/libc/amd64/sys/door.s line 121; as far as i can see the file hasn't changed at all except for being renamed to door.S. and of course the assembly doesn't call the struct field di_proc, it defines a struct field address macro

so door_return (or, in opensolaris, __door_return) is the elusive door_recv syscall, eh?

(i'll follow the rest of your links later to read the code)

i wonder if spring (which was never released afaik) used a different implementation, maybe a more efficient one

this is strongly reminiscent of ipc in liedtke's l4 (which doesn't implicitly create threads) or in keykos (which only has one thread in a domain)

thank you so much for figuring this out!

wahern · 2024-07-24T21:54:49 1721858089

The one thing I wasn't sure about was how the per-process global thread pool for non-private doors was populated. I missed it earlier (wasn't looking closely enough, and the hacks to handle forkall headaches make it look more complicated than it is), but the user space door_create wrapper code--specifically door_create_cmn--invokes door_create_server (through the global door_server_func pointer) on the first non-private door_create call. door_create_server creates a single thread which then calls door_return. On wakeup, before calling the application's handler function, the door_return assembly wrapper conditionally invokes the global door_depletion_cb (defined back in door_calls.c) which can spin up additional threads.

The more I read, the more Doors seems pretty clever. Yes, the secret sauce is very much like scheduler activations or Google's switchto, but you can pass data at the same time, making it useful across processes, not just threads sharing an address space. And you can't get the performance benefit merely by using AF_UNIX sockets because the door_call and door_return have an association for the pendency of the call. The tricky part from an API perspective isn't switching the time slice to the server thread, it's switching it back to the client thread on the return.

Microkernels can do this, of course, but figuring out a set of primitives and an API that fit into the Unix model without introducing anything too foreign, like Mach Ports in macOS, does take some consideration. Now that Linux has process file descriptors, maybe they could be used as an ancillary control message token to get the round-trip time slice and data delivery over AF_UNIX sockets. And memfd objects could be used for messages larger than the receiver's buffer, which seems to be a niche case, anyhow. Or maybe that's over complicating things.

kragen · 2024-07-24T23:32:56 1721863976

yeah, i've been thinking of doing something like your memfd thing for intranin, the bubbleos kernel, which is still vaporware: transmit one or more pages from the sender to the receiver as part of the ipc, so they disappear from the sender's address space and appear in the receiver's address space. my theory is that, that way, you get the efficiency of shared memory (mostly, at least) without the security risks and debugging headaches. in a repetitive client/server kind of relationship, you can pass the same pages back and forth, so it's effectively just shared memory with exclusive locking, and it extends smoothly to zero-copy networking across hosts, at least if the network message size isn't small compared to a page (you don't want to have to zero a 4096-byte page in order to deliver it up to a user process with a 153-byte message in it)

obviously this is the opposite extreme from 'fit into the unix model without introducing anything too foreign' ;) and it may turn out that it's not as good an idea as it seems once i try it. but while some aspects of the unix model are very good, i suspect others were more meritorious on non-networked pdp-11

with respect to starting up a thread to run a callback registered for a message port, i feel like it could be done with a great deal less code than this, though maybe i'm being too optimistic about keeping messy real-world complexity out. in einkornix, which i've only tested as a user-level threading library under linux in emulation, my function for starting up such a thread is einspawn(), which is called (like door_create()) with a callback function and a userdata argument. __door_create() also takes a flags argument; einspawn() instead takes a pointer to the task struct it will finish initializing. it's ten instructions long: http://canonical.org/~kragen/sw/dev3/einkornix.S

cryptonector · 2024-07-25T17:29:57 1721928597

A much better description is that doors is RPC-like but with priority and CPU quantum transfer to the server thread.

This works great as long as the server thread isn't itself doing slow I/O. The moment the server thread does slow I/O you want to get back to async I/O, and the whole point of doors is that it's synchronous because only by being synchronous do you get to transfer priority and CPU quantum from the client to the server. For async RPC-like schemes you cannot transfer priority and CPU quantum to the server because the client will continue executing with that same CPU quantum (that being implied by "async").

Also, this really only works locally. For remote services the server could get priority metadata from the client's authentication credentials, but that's it.

ajross · 2024-07-24T17:11:04 1721841064

> Doors at the end of the day are a control transfer primitive.

This is true, but isomorphic to RPC, so I really don't think I understand the distinction you're trying to draw. A "procedure call" halts the "caller", passes "arguments" into the call, and "returns a value" when it completes. And that's what a door call does.

Nothing about "Remote Procedure Calls" requires thread pools or a recv() system call, or an idea of "waiting for a request". Those are all just implementation details of existing RPC mechanisms implemented without a send+recv call like doors.

Nor FWIW are doors alone in this space. Android Binder is effectively the same metaphor, just with a more opinionated idea of how the kernel should manage the discovery and dispatch layer (Solaris just gives you a descriptor you can call and lets userspace figure it out).

klodolph · 2024-07-24T17:15:53 1721841353

I thought the parent comment was drawing a very clear distinction—that doors do not involve message passing, but a different mechanism.

If it were a message passing system, you would be able to save the message to disk and inspect it later, or transmit the message over the network, etc.

ajross · 2024-07-24T17:45:10 1721843110

Is this a pedantic argument about a specific definition for "message passing"? If not, then it's wrong: door_call() allows passing an arbitrary/opaque buffer of data into the call, and returning arbitrary data as part of the return. (Maybe the confusion stems from the fact that the sample code in the linked article skipped the second argument to door_call(), but you can see it documented in the man page).

If so, then I retreat to the argument above: there is nothing about "Remote Procedure Call" as a metaphor that requires "message passing". That, again, is just an implementation detail of other RPC mechanisms that don't implement the call as directly as doors (or binder) do.

klodolph · 2024-07-24T19:32:51 1721849571

> Is this a pedantic argument about a specific definition for "message passing"?

No. This is just colloquial usage of the term “message passing”.

Yes, you can use doors to pass messages. That is not the only thing you can do.

> If so, then I retreat to the argument above: there is nothing about "Remote Procedure Call" as a metaphor that requires "message passing".

Yeah. Everyone else here agrees with that. The original comment you replied to said, “Doors at the end of the day aren't message passing based RPC.” This absolutely indicates that the poster agrees with you on that point—that not all forms of RPC are based on a message passing system. There are other forms of RPC, and this is one of them.

Ultimately, I think you can conceive of “message passing” broadly enough that, well, everything is message passing. What is a CPU register but something that receives a message and later re-transmits it? I would rather think of the colloquial use of “message passing”.

Likewise, you can probably dig into doors and find some message system somewhere in the implementation. Or at least, you’ll find something you can call a message.

kragen · 2024-07-24T11:47:28 1721821648

what does the stack look like in the callee's address space? is it a stack that previously existed (you seem to be saying it isn't) or a new one created on demand?

maybe https://docs.oracle.com/cd/E88353_01/html/E37843/door-create... is where i should start reading...

robinhouston · 2024-07-24T08:09:26 1721808566

Back in 1998 or so, a colleague and I were tasked with building a system to target adverts to particular users on a website. It obviously needed to run very efficiently, because it would be invoked on every page load and the server hardware of the time was very underpowered by today’s standards.

The Linux server revolution was still a few years away (at least it was for us – perhaps we were behind the curve), so our webservers were all Sun Enterprise machines running Solaris.

We decided to use a server process that had an in-memory representation of the active users, which we would query using doors from a custom Apache module. (I had read about doors and thought they sounded cool, but neither of us had used them before.)

It worked brilliantly and was extremely fast, though in the end it was never used in production because of changing business priorities.

codetrotter · 2024-07-24T08:34:27 1721810067

Solaris was ahead of its time. They had Zones, which was a container technology, years ago. Likewise, FreeBSD way ahead with Jails. Only later did Docker on Linux come into existence and then containerization exploded in popularity.

layer8 · 2024-07-24T09:06:26 1721811986

In the early 2000s I rented a managed VPS based on FreeBSD jails. This was nice because you were root in your jail while the distribution was managed for you (security updates in particular), and you could still override any system files you wanted via the union file system. It was like the best of both worlds between a managed Unix account and a VPS where you can do anything you want as root.

bionsystem · 2024-07-24T09:26:08 1721813168

AND you are running bare metal, without the virtualization layer overhead, which is quite nice especially in the early 2000s :)

My first project as a junior was to deploy Solaris Zones to replace a bunch of old SPARC machines. It's such a great tech and was a fun project.

stormking · 2024-07-24T09:44:04 1721814244

Well, Docker did a few things differently then these older approaches. First, the one-process-per-container mantra, second the layered file system and build process and third, the "wireing together" of multiple containers via exposed ports. Zones and Jails were more like LXC and were mostly used as a "lightweight virtual machine".

hnlmorg · 2024-07-24T09:31:04 1721813464

Linux had a few different containerisation technologies before Docker blew up. But even they felt like a step backwards from Jails and Zones.

Personally I think its more fair to say that Linux was behind the times because containerisation and virtualisation were common concepts in UNIXes long before they became Linux staples.

pjmlp · 2024-07-24T10:59:09 1721818749

While Solaris was/is one of my favourite UNIXes, back in 1999, HP-UX already had HP Vaults, which keeps being forgotten when Solaris Zones are pointed out.

dmd · 2024-07-24T11:01:14 1721818874

We all try to do our best to forget HP-UX. I worked on it at Bristol-Myers Squibb in 1999 and hated every minute of it.

pjmlp · 2024-07-24T11:05:05 1721819105

How much was HP-UX, and how much was the job itself?

dmd · 2024-07-24T11:18:00 1721819880

40 60

ilikejam · 2024-07-24T12:02:46 1721822566

I've said it before, and I'll never stop saying it.

Zones were awful.

Solaris had all of these great ideas, but the OS was an absolute pain in the ass to admin. And I say that from both sides of the SWAN firewall.

EvanAnderson · 2024-07-24T12:14:49 1721823289

My only Solaris experience is using SmartOS. I find Zones there to be fairly easy to deal with. Is this just because SmartOS is papering-over the awfulness?

4ad · 2024-07-24T10:43:59 1721817839

And ZFS, and DTrace.

rbanffy · 2024-07-24T12:18:27 1721823507

Crossbow as well.

wang_li · 2024-07-24T17:20:14 1721841614

And STMF.

panick21_ · 2024-07-24T10:23:38 1721816618

I highly recommend this talk, talking about Jails and Zones. Jails was first, but Zones took a lot of lessens from it and went further.

https://www.youtube.com/watch?v=hgN8pCMLI2U

dprice1 · 2024-07-24T18:34:45 1721846085

Rereading the zones paper now makes me cringe, but I was in my 20s, what can I say. I think the argument we made that holds up is that this was designed to be a technology for server consolidation, and the opening section sets some context about how primitive things were in Sun's enterprise customer base at the time.

I have a lot of admiration for what Docker dared to do-- to really think differently about the problem in a way which changed application deployment for everyone.

Also I can tell you at the time that we were not especially concerned about HP or IBM's solutions in this space; nor did we face those container solutions competitively in any sales situation that I can recall. This tech was fielded in the wake of the dot-com blowout-- and customers had huge estates of servers often at comically low utilization. So this was a good opportunity for Sun to say "We are aligned with your desire to get maximum value out of the hardware you already have."

It's a blast to see this come up from time to time on HN, thanks.

panick21_ · 2024-07-25T13:59:55 1721915995

> Rereading the zones paper now makes me cringe, but I was in my 20s, what can I say.

Still a nice paper!

> to really think differently about the problem in a way which changed application deployment for everyone.

Could that also have been done with Zones? In terms of the developer experience?

Seems to me Docker just thought about the develop-and-deploy pipeline differently.

solarengineer · 2024-07-27T12:08:57 1722082137

I had done the packaging work (like what Docker has done) as part of experiments with Belenix. I used puppet and rpm5 (that I had ported to Belenix) to ensure idempotency, and used ZFS snapshots for the layers. Unfortunately, life happened and I did not take the work live.

panick21_ · 2024-07-31T09:42:11 1722418931

Interesting, never heard of BeleniX. Funny how live turns sometimes. Docker was open-sourced because a PAAS company almost went bust and exploded.

dprice1 · 2024-07-25T18:32:19 1721932339

Yes, technology wise, I think it could have been done; once I left Sun/Oracle I stopped paying attention, so I can't speak to what else was done later.

solarengineer · 2024-07-27T12:10:25 1722082225

Please take some pride in your work. I ended up DTracing Zones and I find your ideas to be excellent.

pjmlp · 2024-07-24T11:00:05 1721818805

Historically, HP-UX 10 Vaults might have been there first.

kragen · 2024-07-24T11:50:50 1721821850

probably lynn wheeler will tell you cp/67 was there first: https://www.garlic.com/~lynn/2003d.html#72

p_l · 2024-07-24T13:23:34 1721827414

CP-67 was a hypervisor, a different model altogether to the chroot/jails/zones/linux namespace evolution, the sidequest of HP Vaults and various workload partitioning schemes on systems like AIX, or the grand-daddy of chroot line, Plan 9 namespace system.

kragen · 2024-07-24T13:25:15 1721827515

yes, i agree, except that doesn't chroot predate plan9 by almost a decade?

p_l · 2024-07-24T13:30:37 1721827837

Thank you, somehow I missed the part of the history! Yes indeed, chroot() starts in 1979 patches ontop of V7 kernel. Plan9 namespaces probably evolved from there as Research Unix V8 was BSD based.

kragen · 2024-07-24T14:06:39 1721829999

thanks! i didn't have any idea 8th edition was bsd-based

pjmlp · 2024-07-24T12:46:01 1721825161

Certainly, I was only focusing on the UNIX side of the history.

kragen · 2024-07-24T12:54:04 1721825644

understandable!

panick21_ · 2024-07-24T13:09:49 1721826589

From what I have read, it was nowhere near as comprehensive and integrated. I only read about them once, so I don't really know.

pjmlp · 2024-07-24T13:58:59 1721829539

I would say it was easy enough for traditional UNIX admins.

We used it for our HP-UX customers in telecoms, using our CRM software stack, based on Apache/Tcl/C.

jiveturkey · 2024-07-24T19:22:31 1721848951

Solaris is still ahead of its time on many things. I wish it were worth porting eBPF to Solaris.

jeffbee · 2024-07-24T23:52:49 1721865169

Still the only Unix where SMB works.

cryptonector · 2024-07-25T17:33:19 1721928799

> I wish it were worth porting eBPF to Solaris.

What on Earth for? Solaris has DTrace.

ahoka · 2024-07-24T12:31:17 1721824277

cgroups came only three years after zones, not a huge difference.

lukeh · 2024-07-24T11:08:04 1721819284

1998 paper on Linux implementation: http://www.rampant.org/doors/linux-doors.pdf

oecumena · 2024-07-24T14:43:47 1721832227

I did another one back in 2001: https://ldoor.sourceforge.net/ .

heinrichhartman · 2024-07-24T06:32:19 1721802739

> Conceptually, during a door invocation the client thread that issues the door procedure call migrates to the server process associated with the door, and starts executing the procedure while in the address space of the server. When the service procedure is finished, a door return operation is performed and the thread migrates back to the client's address space with the results, if any, from the procedure call.

Note that Server/Client refer to threads on the same machine.

While I can see performance benefits of this approach, over traditional IPC (sockets, shared memory), this "opens the door" for potentially worse concurrency headaches you have with threads you spawn and control yourself.

Has anyone here hands-on experience with these and can comment on how well this worked in practice?

wahern · 2024-07-24T08:42:54 1721810574

IIUC, what they mean by "migrate" is the client thread is paused and the server thread given the remainder of the time slice, similar to how pipe(2) originally worked in Unix and even, I think, early Linux. It's the flow of control that "conceptually" shifts synchronously. This can provide surprising performance benefits in alot of RPC scenarios, though less now as TLB, etc, flushing as part of a context switch has become more costly. There are no VM shenanigans except for some page mapping optimizations for passing large chunks of data, which apparently wasn't even implemented in the original Solaris implementation.

The kernel can spin up a thread on the server side, but this works just like common thread pool libraries, and I'm not sure the kernel has any special role here except to optimize context switching when there's no spare thread to service an incoming request and a new thread needs to be created. With a purely userspace implementation there may be some context switch bouncing unless an optimized primitive (e.g. some special futex mode, perhaps?) is available.

Other than maybe the file namespace attaching API (not sure of the exact semantics), and presuming I understand properly, I believe Doors, both functionally and the literal API, could be implemented entirely in userspace using Unix domain sockets, SCM_RIGHTS, and mmap. It just wouldn't have the context switching optimization without new kernel work. (See the switchto proposal for Linux from Google, though that was for threads in the same process.)

I'm basing all of this on the description of Doors at https://web.archive.org/web/20121022135943/https://blogs.ora... and http://www.rampant.org/doors/linux-doors.pdf

monocasa · 2024-07-24T10:27:11 1721816831

Not quite.

There isn't a door_recv(2) systemcall or equivalent.

Doors truly don't transfer messages, they transfer the thread itself. As in the thread that made a door call is now just directly executing in the address space of the callee.

They're more like i432/286/mill cpu task gates.

wahern · 2024-07-24T10:55:01 1721818501

> Doors truly don't transfer messages, they transfer the thread itself. As in the thread that made a door call is now just directly executing in the address space of the callee.

In somewhat anachronistic verbiage (at least in a modern software context) this may be true, but today this statement makes it sounds like code from the caller process is executing in the address space of the callee process, such that miraculously the caller code now can directly reference data in the callee. AFAICT that just isn't the case, and wouldn't even make sense--i.e. how would it know the addresses without a ton of complex reflection that's completely absent from example code? (Caller and callee don't need to have been forked from each other.) And according to the Linux implementation, the "argument" (a flat, contiguous block of data) passed from caller to callee is literally copied, either directly or by mapping in the pages. The caller even needs to provide a return buffer for the callee's returned data to be copied into (unless it's too large, then it's mapped in and the return argument vector updated to point to the newly mmap'd pages). File descriptors can also be passed, and of course that requires kernel involvement.

AFAICT, the trick here pertains to scheduling alone, both wrt to the hardware and software systems. I.e. a lighter weight interface for the hardware task gating mechanism, like you say, reliant on the synchronous semantics of this design to skip involving the system scheduler. But all the other process attributes, including address space, are switched out, perhaps in an optimized matter as mentioned elsethread but still preserving typical process isolation semantics.

If I'm wrong, please correct me with pointers to more detailed technical documentation (Or code--is this still in Illuminos?) because I'd love to dig more into it.

FWIW, Here's the Solaris man page for libdoor: https://docs.oracle.com/cd/E36784_01/html/E36873/libdoor-3li... Did you mean door_call or door_return instead of door_recv?

monocasa · 2024-07-24T11:32:23 1721820743

I didn't imply that the code remains and it's only data that is swapped out. The thread jumps to another complete address space.

It's like a system call instruction that instead of jumping into the kernel, jumps into another user process. There's a complete swap out of code and data in most cases.

Just like with system calls how the kernel doesn't need a thread pool to respond to user requests applies here. The calling thread is just directly executing in the callee address space after the door_call(2).

> Did you mean door_call or door_return instead of door_recv?

I did not. I said there is no door_recv(2) systemcall. The 'server' doesn't wait for messages at all.

kragen · 2024-07-24T11:52:42 1721821962

thanks for finding the man page!

gpderetta · 2024-07-24T09:41:13 1721814073

I think what doors do is rendezvous synchronization: the caller is atomically blocked as the callee is unblocked (and vice versa on return). I don't think there is an efficient way to do that with just plain POSIX primitives or even with Linux specific syscalls (Binder and io_uring possibly might).

xoranth · 2024-07-24T11:25:44 1721820344

Sounds a bit like Google's proposal for a `switchto_switch` syscall [1] that would allow for cooperative multithreading bypassing the scheduler.

(the descendants of that proposal is `sched_ext`, so maybe it is possible to implement doors in eBPF + sched_ext?)

[1]: https://youtu.be/KXuZi9aeGTw?t=900

p_l · 2024-07-24T10:07:35 1721815655

The thread in this context refers to kernel scheduler thread[1], essentially the entity used to schedule user processes. By migrating the thread, the calling process is "suspended", it's associated kernel thread (and thus scheduled time quanta, run queue position, etc.) saves the state into Door "shuttle", picks up the server process, continues execution of the server procedure, and when the server process returns from the handler, the kernel thread picks up the Door "shuttle", restores the right client process state from it, and lets it continue - with the result of the IPC call.

This means that when you do a Door IPC call, the service routine is called immediately, not at some indefinite point in time in the future when the server process gets picked by scheduler to run and finds out an event waiting for it on select/poll kind of call. If the service handler returns fast enough, it might return even before client process' scheduler timeslice ends.

The rapid changing of TLB etc. are mitigated by hardware features in CPU that permit faster switches, something that Sun had already experience with at the time from the Spring Operating System project - from which the Doors IPC in fact came to be. Spring IPC calls were often faster than normal x86 syscalls at the time (timings just on the round trip: 20us on 486DX2 for typical syscall, 11us for sparcstation Spring IPC, >100us for Mach syscall/IPC)

EDIT:

[1] Some might remember references to 1:1 and M:N threading in the past, especially in discussions about threading support in various unices, etc.

The "1:1" originally referred to relationship between "kernel" thread and userspace thread, where kernel thread didn't mean "posix like thread in kernel" and more "the scheduler entity/concept", whether it was called process, thread, or "lightweight process"

creshal · 2024-07-24T07:17:23 1721805443

Sounds like Android's binder was heavily inspired by this. Works "well" in practice in that I can't recall ever having concurrency problems, but I would not bother trying to benchmark the efficiency of Android's mess of abstraction layers piled over `/dev/binder`. It's hard to tell how much of the overhead is required to use this IPC style safely, and how much of the overhead is just Android being Android.

p_l · 2024-07-24T09:01:41 1721811701

Not sure which one came first, but Binder is direct descendant (down to sometimes still matching symbol names and calls) of BeOS IPC system. All the low level components (Binder, Looper, serialization model even) come from there.

creshal · 2024-07-24T21:01:10 1721854870

From what I understand, Sun made their Doors concept public in 1993 and shipped a SpringOS beta with it in 1994, before BeOS released, but it's hard to tell if Sun inspired BeOS, or of this was a natural solution to a common problem that both teams ran into at the same time.

p_l · 2024-07-25T12:04:41 1721909081

I'd expect convergent evolution - both BeOS team and Spring team were very well aware of issues with Mach (which nearly single-handedly coined the idea that microkernels are slow and bad) and worked to design better IPC mechanisms.

Sharing of scheduler slice is an even older idea, AFAIK, and technically something already done whenever you call a kernel (it's not a context switch to a separate process, it's a switch to different address space but running in the same scheduler thread)

stuaxo · 2024-07-24T20:04:26 1721851466

That's really interesting, I wonder if a Haiku on Linux could use it.

Will binder ever make it into mainline Linux?

p_l · 2024-07-25T11:59:36 1721908776

Binders has been in mainline kernel for years, and some projects ended up using it, if only to emulate android environment - both anbox and its AFAIK successor Waydroid use native kernel binder to operate.

You can of course build your own use (depending on what exactly you want to do, you might end up writing your own userland instead of using androids)

creshal · 2024-07-24T21:05:22 1721855122

As far as I understand, it is already mainlined, it's just not built by "desktop" distributions since nobody really cares - all the cool kids want dbusFactorySingletonFactoryPatternSingletons to undo 20 years of hardware performance increases instead.

p_l · 2024-07-25T11:59:53 1721908793

Bunch of desktop distros include it to run anbox/waydroid

kragen · 2024-07-24T12:54:54 1721825694

what's the best introduction to how beos ipc worked?

p_l · 2024-07-24T13:17:35 1721827055

Be Book, Haiku source code, and yes Android low level internals docs.

A quick look through BeOS and Android Binder-related APIs will quickly show how Android side is derived from it (through OpenBinder, which was for a time going to be used in next Palm system based on Linux, at least one of them)

kragen · 2024-07-24T13:21:18 1721827278

thank you very much!

pjmlp · 2024-07-24T11:02:10 1721818930

Binder was inspired by IPC mechanisms in Palm and BeOS, whose engineers joined the original Android team.

rerdavies · 2024-07-24T07:33:45 1721806425

Conceptually. What are they actually doing, and why is it faster than other RPC techniques?

fch42 · 2024-07-24T11:34:58 1721820898

Think of it in terms of REST. A door is an endpoint/path provided by a service. The client can make a request to it (call it). The server can/will respond.

The "endpoint" is set up via door_create(); the client connects by opening it (or receiving the open fd in other ways), and make the request by door_call(). The service sends its response by door_return().

Except that the "handover" between client and service is inline and synchronous, "nothing ever sleeps" in the process. The service needn't listen for and accept connections. The operating system "transfers" execution directly - context switches to the service, runs the door function, context switches to the client on return. The "normal" scheduling (where the server/client sleeps, becomes runnable from pending I/O and is eventually selected by the scheduler) is bypassed here and latency is lower.

Purely functionality-wise, there's nothing you can do with doors that you couldn't do with a (private) protocol across pipes, sockets, HTTP connections. You "simply" use a faster/lower-latency mechanism.

(I actually like the "task gate" comparison another poster made, though doors do not require a hardware-assisted context switch)

p_l · 2024-07-24T15:00:55 1721833255

Well, Doors' speed was derived from hardware-assisted context switching, at least on SPARC. Combination of ASIDs (which allowed task switching with reduced TLB flushing) and WIM register (which marked which register windows are valid for access by userspace) meant that IPC speed could be greatly increased - in fact that was basis for "fast path" IPC in Spring OS from which Doors were ported into Solaris.

fch42 · 2024-07-24T16:59:26 1721840366

I was (more) of a Solaris/x86 kernel guy on that particular level and know the x86 kernel did not use task gates for doors (or any context switching other than the double fault handler). Linux did taskswitch via task gates on x86 till 2.0, IIRC. But then, hw assist or no, x86 task gates "aren't that fast".

The SPARC context switch code, to me, always was very complex. The hardware had so much "sharing" (the register window set could split to multiple owners, so would the TSB/TLB, and the "MMU" was elaborate software in sparcv9 amyway). SPARC's achilles heel always were the "spills" - needless register window (and other cpu state) to/from memory. I'm kinda still curious from a "historical" point of view - thanks!

p_l · 2024-07-24T17:56:52 1721843812

The historical point was that for Spring OS "fast path" calls, if you kept register stack small enough, you could avoid spilling at all.

Switching from task A to task B to service a "fast path" call AFAIK (have no access to code) involved using WIM register to set windows used by task A to be invalid (so their use would trigger a trap), and changing the ASID value - so if task B was already in TLB you'd avoid flushes, or reduce them only to flushing when running out of TLB slots.

The "golden" target for fast-path calls was calls that would require as little stack as possible, and for common services they might be even kept hot so they would be already in TLB.

__d · 2024-07-25T00:36:10 1721867770

Is the Spring source available anywhere?

p_l · 2024-07-25T11:46:24 1721907984

Unfortunately, as far as I know, not. Only published articles and technical reports :(

__d · 2024-07-26T00:50:22 1721955022

I once had the university CD-ROM release, but I've lost it, sadly.

rerdavies · 2024-07-30T08:36:38 1722328598

So if I understand it correctly, the IPC advantage is that they preserve registers across the process context switch, thereby avoiding having to do notoriously expensive register saves and restores? In effect, leaking register contents across the context switch becomes a feature instead of a massive security risk. Brilliant!

Is that it?

grishka · 2024-07-24T14:09:55 1721830195

Why would you care who spawned the thread? If your code is thread-safe, it shouldn't make a difference.

One potential problem with regular IPC I see is that it's nondeterministic in terms of performance/throughput because you can't be sure when the scheduler will decide to run the other side of whatever IPC mechanism you're using. With these "doors", you bypass scheduling altogether, you call straight "into" the server process thread. This may make a big difference for systems under load.

actionfromafar · 2024-07-24T07:10:26 1721805026

Reminds me vaguely of how Linux processes are (or were? I haven't looked at this in ages) elevated to kernel mode during a syscall.

With a "door", a client is elevated to the "server" mode.

blacklion · 2024-07-24T08:55:58 1721811358

Conceptually is key word here.

Later in this article authors says tat Server manage its own pool (optionally bounded) of threads to serve requests.

akira2501 · 2024-07-24T07:53:49 1721807629

> this "opens the door" for potentially worse concurrency headaches you have with threads you spawn and control yourself.

What makes doors "potentially worse" than regular threads?

rbanffy · 2024-07-24T12:17:26 1721823446

I wonder if anyone ended up saving a copy of the Spring operating system.

sillywalk · 2024-07-24T19:51:58 1721850718

bcantrill had a copy in his basement as of 2015.

https://news.ycombinator.com/item?id=10325362

rbanffy · 2024-07-25T12:18:30 1721909910

Is his basement open to public visitation? ;-)

danwills · 2024-07-24T09:07:25 1721812045

Somehow I'm only just noticing the name-conflict with SideFX Houdini's new(ish) USD context, which is also called 'Solaris'! .. Guess I don't search for the old SunSoft kind of Solaris much these days eh!

rbanffy · 2024-07-24T12:19:33 1721823573

You probably noticed that before Oracle's legal dept.

crest · 2024-07-25T15:36:20 1721921780

I wish FreeBSD and Linux had Solaris style doors extended to support kqueue()/epoll() in addition to threads.

cryptonector · 2024-07-25T17:33:49 1721928829

Doors are designed to be synchronous. Therefore doors are not really compatible with kqueue/epoll/event ports. That is, the point of doors is that they are for clients to temporarily and synchronously transfer thread priority and CPU quantum to a server thread that will respond quickly because it does no blocking I/O. This transfer of CPU quantum thing cannot happen for async versions of doors, which would defeat the point of doors, thus there is no point to async doors.

One can implement async doors by having the client transfer the FD to a door where the client is the service, then have the server do what it can synchronously, schedule async work, return immediately, and later call back the client on its door when the slow work is done. But if you'd do this then you might as well just use more traditional (and portable) IPC techniques.

quotemstr · 2024-07-24T17:14:33 1721841273

Nobody has ever done IPC better than Microsoft did with COM/RPC/AIPC. Nobody else even came close. I will die on this hill. The open source world has done itself a tremendous disservice eschewing object capability systems with in-process bypasses.

bux93 · 2024-07-24T11:23:02 1721820182

I wonder if the name is related to BBS doors? https://en.wikipedia.org/wiki/Door_(bulletin_board_system)

kragen · 2024-07-24T11:51:46 1721821906

i don't think so; i think people independently chose the name 'door' for a way to get out of one program and into another in both cases. bbs people and sun labs kernel researchers didn't talk to each other that much unfortunately, and the mechanisms aren't really that similar

debo_ · 2024-07-24T23:21:39 1721863299

I didn't realize "staff engineer" was a common software title as far back as 1996.