
Short subjects: Realtime, Futexes, and ntfs3

By Jonathan Corbet
August 16, 2021
Even in the dog days of (northern-hemisphere) summer, the kernel community is a busy place. There are many developments that show up on your editor's radar, but which, for whatever reason, do not find their way into a full-length feature article. The time has come to catch up with a few of those topics; read on for updates on the realtime patch set, the effort to reinvent futexes, and the ntfs3 filesystem.

Realtime

The realtime preemption story is a long one; it first showed up on LWN in 2004. Over the years, this work has had a significant impact on kernel development as a whole; much of what is just seen as part of the core kernel now had its origins in the realtime tree. The code around which the realtime work was initially built — the preemptible locking infrastructure — remains out of the mainline, though. Without the locking changes, the mainline is not able to offer the sort of response-time guarantees that realtime users need.

The locking infrastructure makes almost all locks, spinlocks included, into sleeping locks; that ensures that a higher-priority task can always take over the processor quickly. It is the sort of change that makes kernel developers nervous, since mistakes in this area can lead to all sorts of subtle problems. For that reason, predicting when the locking code will be merged into the mainline is a fool's game. Your editor knows this well, having confidently predicted that it would be merged within a year — in 2007.

Still, one might be tempted to think that the end might be getting closer. Realtime developer Thomas Gleixner has brought the locking infrastructure back to the mailing lists for consideration; the fifth revision of the 72-part patch set was posted on August 15. Normally configured kernels should behave about the same with these patches applied, but those configured for realtime operation will have realtime-specific versions of mutexes, wound/wait mutexes, reader/writer semaphores, spinlocks, and reader/writer locks.

Commentary on this work has slowed; there does not appear to be much in the way of objections at this point — though it must be noted that Linus Torvalds has not yet made his feelings known on the subject. Unless something surprising comes up, it might just be that the core realtime code will finally find its way into the mainline. Your editor, however, is too old, wise, and cowardly to venture a guess as to when that will happen.

A smaller step for futex2

Perhaps the number of comments on the realtime changes is low because most developers fear the prospect of digging into code of that complexity. There are, however, places in the kernel that are even more frightening; the futex subsystem is surely one of them. Futexes provide fast mutexes for user space; they started out as a simple subsystem but failed to remain that way. Over time, it has become clear that futexes could do with a number of improvements to make them better suited for current workloads and, at the same time, to move beyond the multiplexer futex() system call.
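
For readers who have not met the multiplexer directly, here is a minimal sketch of how user space reaches the existing interface. The C library provides no futex() wrapper, so the call goes through syscall(); the helper names futex_wait() and futex_wake() are illustrative rather than part of any library, and the private (single-process) variants of the operations are assumed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: sleep for as long as *uaddr still holds `expected`.
 * Returns 0 when woken, or -1 with errno set; EAGAIN means the word
 * already held a different value, so the kernel refused to sleep.
 * A NULL timeout waits indefinitely.
 */
static long futex_wait(uint32_t *uaddr, uint32_t expected,
                       const struct timespec *timeout)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT_PRIVATE, expected,
                   timeout, NULL, 0);
}

/*
 * Hypothetical helper: wake up to `nr_waiters` tasks sleeping on uaddr.
 * Returns the number of tasks actually woken.
 */
static long futex_wake(uint32_t *uaddr, int nr_waiters)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE_PRIVATE, nr_waiters,
                   NULL, NULL, 0);
}
```

Calling futex_wait(&word, 1, NULL) while word holds zero fails immediately with EAGAIN instead of sleeping; that compare-before-sleep check is what closes the race between a waiter going to sleep and a waker changing the word. Both helpers funnel through the same system call, with the operation selected by the second argument, which is the multiplexing that futex2 aims to move away from.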

For some time now, André Almeida has been pushing in that direction with the futex2 proposal. This work would split the futex functionality into several single-purpose system calls, support multiple lock sizes, and more. While there has been interest in this work, progress has been slow (to put it charitably); it seems as if the kernel is no closer to a new futex subsystem than it was a year or two ago.

In an attempt to push this project forward, Almeida has posted a new patch set with significantly reduced ambitions. Rather than introduce a whole new subsystem with its own system calls, this series adds exactly one system call that works with existing futexes:

    struct futex_waitv {
        uint64_t val;
        void *uaddr;
        unsigned int flags;
    };

    int futex_waitv(struct futex_waitv *waiters, unsigned int nr_futexes,
                    unsigned int flags, struct timespec *timo);

This function will cause the calling process to wait on several futexes simultaneously, returning when one or more of them can be acquired (or the timeout expires). That functionality is not supported by the current futex API, but it turns out to be especially useful for game engines, which perform significantly better when using the new system call. This documentation patch describes the new API in more detail.
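
As a rough illustration of what waiting on several futexes looks like from user space, consider the sketch below. It does not use the prototype quoted above; it assumes the later form of the interface found in Linux 5.16 and newer, where the address is carried as a 64-bit integer, the structure has a reserved field that must be zero, and the call takes an additional clock ID argument. The structure name waitv_entry, the helper wait_any(), the WAITV_FLAG_U32 constant, the cap of eight entries, and the fallback system-call number are all assumptions made for this example; systems with recent kernel headers provide the real definitions in <linux/futex.h>.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#ifndef SYS_futex_waitv
#define SYS_futex_waitv 449      /* assumption: x86-64 number in Linux 5.16+ */
#endif

#define WAITV_FLAG_U32 2         /* assumption: same value as the kernel's FUTEX_32 */
#define WAIT_ANY_MAX   8         /* arbitrary cap chosen for this sketch */

/* Local stand-in for the kernel's struct futex_waitv (layout assumed). */
struct waitv_entry {
    uint64_t val;        /* value the futex word is expected to hold */
    uint64_t uaddr;      /* address of the 32-bit futex word */
    uint32_t flags;      /* size and sharing flags */
    uint32_t reserved;   /* must be zero */
};

/*
 * Sleep until any of the n futex words is woken.  Returns the index of a
 * woken futex, or -1 with errno set: EAGAIN if a word already differs
 * from its expected value, ETIMEDOUT if the deadline passes, ENOSYS on
 * kernels without the call.  `deadline`, if non-NULL, is an absolute
 * CLOCK_MONOTONIC time.
 */
static long wait_any(uint32_t *const words[], const uint32_t expected[],
                     unsigned int n, const struct timespec *deadline)
{
    struct waitv_entry w[WAIT_ANY_MAX];

    if (n == 0 || n > WAIT_ANY_MAX) {
        errno = EINVAL;
        return -1;
    }
    for (unsigned int i = 0; i < n; i++) {
        w[i].val = expected[i];
        w[i].uaddr = (uint64_t)(uintptr_t)words[i];
        w[i].flags = WAITV_FLAG_U32;
        w[i].reserved = 0;
    }
    return syscall(SYS_futex_waitv, w, n, 0, deadline, CLOCK_MONOTONIC);
}
```

A caller would typically loop: check the futex words in user space, call wait_any() only if none of them is available, and re-check after it returns. The single call replaces what would otherwise require one waiting thread per futex or a heavier mechanism such as eventfd with poll().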

This patch set has drawn no comments in the week since it was posted. Assuming that silence implies a lack of objections rather than a lack of interest, this piece of the futex2 work might make it into a mainline release before too long. Whether the rest of the futex2 work will follow depends on how strong the use cases driving it are; if futex_waitv() solves the worst problems, there might not be much motivation to push the other changes.

Waiting for ntfs3

The kernel has long had an implementation of the NTFS filesystem, but it has always suffered from performance and functionality problems; the user community would gladly trade it for something better. By all accounts, the ntfs3 implementation posted by Konstantin Komarov is indeed something better, but it is still not clear when it will be merged; this work was first posted one year ago, and version 27 of the patch set was posted on July 29.

The delay in accepting this work is proving frustrating to users; this complaint from Neal Gompa is typical:

I know that compared to all you awesome folks, I'm just a lowly user, but it's been frustrating to see nothing happen for months with something that has a seriously high impact for a lot of people.

It's a shame, because the ntfs3 driver is miles better than the current ntfs one, and is a solid replacement for the unmaintained ntfs-3g FUSE implementation.

Torvalds has said that maybe it is time to merge this code, but that still may not happen right away.

The biggest holdup for ntfs3 at the moment would appear to be concerns about the level of development effort behind it. From the public evidence, it seems that ntfs3 is a one-person project, and that makes other filesystem developers nervous. Those developers have been reporting test failures for ntfs3 that have gone unfixed. Meanwhile, Komarov is sometimes unresponsive to questions; various comments on the version 26 posting (from early April) got no answers, for example. This sort of silence gives the impression that ntfs3 does not have a lot of effort behind it. (It's worth noting that some other developers have been happy with the level of response from Komarov.)

Unsurprisingly, the filesystem developers are unenthusiastic about the prospect of taking on a new NTFS implementation that may turn out to have serious problems and which does not come with a promise of reliable support. For ntfs3 to be merged, those fears will need to be addressed somehow. One way for that to happen, as suggested by Ted Ts'o, would be for other developers, perhaps representing one or more distributors that would like to see a better NTFS implementation in the kernel, to start contributing patches to ntfs3 and commit to helping with its maintenance going forward.

Index entries for this article
Kernel: Filesystems/ntfs3
Kernel: Futex
Kernel: Realtime



Short subjects: Realtime, Futexes, and ntfs3

Posted Aug 16, 2021 17:26 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (14 responses)

> This function will cause the calling process to wait on several futexes simultaneously, returning when one or more of them can be acquired (or the timeout expires).
WaitForMultipleObjects, yay!

Short subjects: Realtime, Futexes, and ntfs3

Posted Aug 16, 2021 23:23 UTC (Mon) by itsmycpu (guest, #139639) [Link] (13 responses)

This supports only a subset of WaitForMultipleObjects.

After reading comments on previous patch versions, I find it difficult to imagine that kernel engineers plan on accepting this one, without comment.

Short subjects: Realtime, Futexes, and ntfs3

Posted Aug 17, 2021 17:45 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (12 responses)

Can't you already do most of the other WaitForMultipleObjects things using some combination of select/poll/epoll, signalfd, eventfd, etc.? Or is there some weird use case where you want to mix (very lightweight) futexes with (much heavier) other synchronization/IPC/IO primitives?

Short subjects: Realtime, Futexes, and ntfs3

Posted Aug 18, 2021 5:41 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (11 responses)

You can do that (and that's what Wine does), but for simple mutexes it's about an order of magnitude slower. It's _usually_ not a big deal because WFMO is typically used in top-level event loops that run at most hundreds of times per second.

Short subjects: Realtime, Futexes, and ntfs3

Posted Aug 18, 2021 20:26 UTC (Wed) by itsmycpu (guest, #139639) [Link] (10 responses)

WaitForMultipleObjects is not a good API though.

I think this comment by Thomas Gleixner still applies, even with the attempt to separate the code:

> While all the currently proposed extensions (multiple wait, variable
> size) make sense conceptually, I'm really uncomfortable to just cram
> them into the existing code. They create an ABI which we have to
> maintain forever.

I think any such step should be conceived on a much larger scale, in a much larger context.
In the meantime, the existing futex API plus appropriate userspace code should do fine.
(Perhaps aided by a much simpler WAKE-multiple syscall that would have a much lower maintenance footprint.)

Short subjects: Realtime, Futexes, and ntfs3

Posted Aug 18, 2021 20:30 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (9 responses)

> WaitForMultipleObjects is not a good API though.

I never understood why. It's perfect for what it was designed: waiting on a few objects. It's not a replacement for highly scalable epoll or other APIs.

> Perhaps aided by a much simpler WAKE-multiple syscall that would have a much lower maintenance footprint.

How would this work for the WFMO case?

Short subjects: Realtime, Futexes, and ntfs3

Posted Aug 18, 2021 21:07 UTC (Wed) by itsmycpu (guest, #139639) [Link] (2 responses)

> I never understood why. It's perfect for what it was designed: waiting on a few objects. It's not a replacement for highly scalable epoll or other APIs.

> How would this work for the WFMO case?

Somewhat unfortunately, I've spent a lot of time on a different website/forum to answer such questions and usually this results in exhausting discussions.
So please forgive me for not going into this once more, I understand you'd deserve a better answer. Also your question indicates I'd perhaps basically have to start at the beginning of a longer thing. Allow me to simply state my opinion here without going into details.

Short subjects: Realtime, Futexes, and ntfs3

Posted Aug 19, 2021 9:37 UTC (Thu) by farnz (subscriber, #17727) [Link] (1 responses)

Got a link to a discussion of this that you've had in the past? Would be nice to understand it all.

Short subjects: Realtime, Futexes, and ntfs3

Posted Aug 19, 2021 16:52 UTC (Thu) by itsmycpu (guest, #139639) [Link]

> Got a link to a discussion of this that you've had in the past? Would be nice to understand it all.

Well, are you asking as someone who
a) already knows about WFMO and problems with it (perceived or real), and
b) already would know how to implement wait-for-any with the existing futex API?
Or are these questions new to you?

Regarding b), you might start with the comments on the article linked above as "futex2 proposal". On a quick (re-)glance, I notice @ras, @ncm, and @pbonzini as knowing what they are talking about.

Short subjects: Realtime, Futexes, and ntfs3

Posted Aug 19, 2021 7:53 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (5 responses)

The problem with "a few" is that nobody knows how long their code is going to live. If it's only doing "a few" objects now, it's very tempting to just add one more object to the end of the list. I mean, that's only an O(1) slowdown to initialize the array, right? "A few" plus one is still "a few," right?

And then, once you added a new object today, that sets the precedent that it's OK to do so again tomorrow, and the next day, and then... before you know it, you're bumping up against MAXIMUM_WAIT_OBJECTS (64) and have to* start sharding it out into threads.

*Seriously, the MSDN docs explicitly recommend that solution. As a non-Windows developer, I'm appalled that that's apparently the best suggestion they could come up with.

Short subjects: Realtime, Futexes, and ntfs3

Posted Aug 19, 2021 18:11 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

WFMO is typically used kinda like the "select" statement in Go. E.g. one common usage is to support cancellation:

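HANDLE objects[] = { someLock, cancelSignal };
DWORD which = WaitForMultipleObjects(2, objects, FALSE, INFINITE); /* FALSE: return when either is signaled */
if (which == WAIT_OBJECT_0 + 1) { return -ERRCANCELED; } /* index 1 is cancelSignal */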

Short subjects: Realtime, Futexes, and ntfs3

Posted Aug 20, 2021 1:25 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (3 responses)

OK, but what do you use for the main event loop?

(Assume, for the sake of argument, that this is a non-GUI application such as a server, and so you're not just pumping window messages with GetMessage().)

Short subjects: Realtime, Futexes, and ntfs3

Posted Aug 20, 2021 3:45 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> OK, but what do you use for the main event loop?
WFMO for the GUI apps :)

> (Assume, for the sake of argument, that this is a non-GUI application such as a server, and so you're not just pumping window messages with GetMessage().)

For server applications you should use either a good old thread-per-connection method or overlapped IO if you want asynchronous processing. WFMO was used in some of Ye Olde Servere Software to wait on large arrays of sockets, but that is roughly from the era when Linux only had select().

Short subjects: Realtime, Futexes, and ntfs3

Posted Aug 20, 2021 8:07 UTC (Fri) by njs (guest, #40338) [Link] (1 responses)

The problem is that there are objects that you can *only* wait on using WFMO -- so IOCP isn't enough, you need IOCP *and* WFMO, which is a terrific hassle.

Short subjects: Realtime, Futexes, and ntfs3

Posted Aug 20, 2021 17:09 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Technically, you can use WFSO or WFMO _with_ IOCP to get notified about the signaled state.

What's wrong with ntfs-3g?

Posted Aug 31, 2021 18:29 UTC (Tue) by rfjakob (guest, #95595) [Link]

Looking at https://github.com/tuxera/ntfs-3g , it does not seem unmaintained at all. Last commit yesterday!


Copyright © 2021, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds