Tried lots of things, including slab-allocating requests to reduce
allocations on schedule. Turns out the major win was simply not
calling read/write again when an atomic flag shows it has already
been triggered.
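Roughly the shape of that flag check, as a sketch with illustrative
names (read_state, maybe_submit_read and on_read_complete are not the
real engine API):

    #include <atomic>

    // Only the caller that flips the flag from false to true actually submits
    // the read/write; everyone else sees it is already armed and skips the call.
    struct read_state
    {
        std::atomic<bool> armed{false};
    };

    template<typename Submit>
    void maybe_submit_read(read_state& s, Submit&& submit)
    {
        // exchange() returns the previous value; if it was already true,
        // another path has triggered the read and nothing needs to happen.
        if (!s.armed.exchange(true))
        {
            submit(); // e.g. arm EPOLLIN / issue the read
        }
    }

    // When the read completes, clear the flag so the next request re-arms it.
    void on_read_complete(read_state& s)
    {
        s.armed.store(false);
    }
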
Tuned all the atomic operations with std::memory_order*, using
release/acquire or relaxed as appropriate.
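For example (purely illustrative, not the engine's actual atomics): a
statistics counter can be relaxed, while a flag that publishes data
needs release on the writer and acquire on the reader.

    #include <atomic>
    #include <cstdint>

    std::atomic<std::uint64_t> ops_completed{0};
    std::atomic<bool>          ready{false};
    int                        payload{0};

    void producer()
    {
        payload = 42;                                          // plain write
        ops_completed.fetch_add(1, std::memory_order_relaxed); // counter only, no ordering needed
        ready.store(true, std::memory_order_release);          // publishes payload
    }

    void consumer()
    {
        while (!ready.load(std::memory_order_acquire)) {}      // pairs with the release store
        // payload (42) is guaranteed to be visible here
    }
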
When processing items in the accept queue they are now grabbed
in chunks of 128 tasks and processed inline. This had a monster
speedup effect since the lock is significantly less contended.
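The idea, sketched with illustrative names and a hand-rolled queue
(the real engine's containers and chunk handling may differ):

    #include <array>
    #include <coroutine>
    #include <cstddef>
    #include <deque>
    #include <mutex>

    constexpr std::size_t chunk_size = 128;

    std::mutex                          accept_mutex;
    std::deque<std::coroutine_handle<>> accept_queue;

    void process_accept_queue()
    {
        std::array<std::coroutine_handle<>, chunk_size> chunk;
        while (true)
        {
            std::size_t grabbed = 0;
            {
                // Hold the lock only long enough to swap out up to 128 handles.
                std::scoped_lock lk{accept_mutex};
                while (grabbed < chunk_size && !accept_queue.empty())
                {
                    chunk[grabbed++] = accept_queue.front();
                    accept_queue.pop_front();
                }
            } // lock released before doing any real work
            if (grabbed == 0) { break; }
            for (std::size_t i = 0; i < grabbed; ++i)
            {
                chunk[i].resume(); // processed inline on this thread
            }
        }
    }
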
In all, went from about 1.5 million ops/sec to 4 million ops/sec.
Good fun day.
The scheduler had a 'nice' optimization where any newly
submitted or resumed task would check whether the current thread
was the process-event thread and, if so, directly start or resume
the task rather than pushing it into the FIFO queues. Well, this
has a bad side effect: a recursive task that generates sub-tasks
will eventually cause a stack overflow. To avoid this, submitted
and resumed tasks now go through the normal FIFO queue, which is
slower but removes the recursive function calls.
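A sketch of why the inline fast path overflows (names are
illustrative): each inline resume nests another resume() call on the
same stack, whereas pushing through the FIFO lets every resume return
before the next one starts.

    #include <coroutine>
    #include <deque>

    // task_a resumes -> submits task_b -> task_b resumed inline
    //        -> submits task_c -> ...   (one extra stack frame per task)
    //
    // Always enqueueing instead keeps the stack depth constant:
    std::deque<std::coroutine_handle<>> fifo;

    void submit(std::coroutine_handle<> h)
    {
        fifo.push_back(h); // never resume inline, even on the event thread
    }

    void drain()
    {
        while (!fifo.empty())
        {
            auto h = fifo.front();
            fifo.pop_front();
            h.resume(); // returns before the next handle is resumed
        }
    }
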
Attempted to test an accept task coroutine, but the performance
took a major hit, so that idea is scrapped for now. Currently,
processing events inline on the background thread's epoll loop
appears to be the most efficient approach.
Prioritize resumed tasks over new tasks.
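Roughly this policy, sketched with two queues (illustrative, not the
real scheduler's data structures):

    #include <coroutine>
    #include <deque>
    #include <optional>

    std::deque<std::coroutine_handle<>> resumed_queue; // tasks already mid-flight
    std::deque<std::coroutine_handle<>> new_queue;     // freshly submitted tasks

    std::optional<std::coroutine_handle<>> next_task()
    {
        // Drain resumed tasks first so in-progress work completes before
        // any new work is started.
        if (!resumed_queue.empty())
        {
            auto h = resumed_queue.front();
            resumed_queue.pop_front();
            return h;
        }
        if (!new_queue.empty())
        {
            auto h = new_queue.front();
            new_queue.pop_front();
            return h;
        }
        return std::nullopt;
    }
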
Fixed an issue where calling operator() immediately on a lambda
caused the lambda (and its captures) to go out of scope while the
coroutine still referenced them. Debug builds didn't show a
problem but Release did.
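The hazard looks roughly like this (demo_task is a throwaway type for
the sketch, not the engine's task class, and the engine's actual fix
may differ from the remedy shown):

    #include <coroutine>
    #include <cstdio>
    #include <string>

    // Throwaway task type for the sketch; frame cleanup omitted for brevity.
    struct demo_task
    {
        struct promise_type
        {
            demo_task get_return_object()
            {
                return {std::coroutine_handle<promise_type>::from_promise(*this)};
            }
            std::suspend_always initial_suspend() noexcept { return {}; }
            std::suspend_always final_suspend() noexcept { return {}; }
            void return_void() {}
            void unhandled_exception() {}
        };
        std::coroutine_handle<promise_type> handle;
    };

    // Hazard: operator() is called on a temporary lambda, so the closure (and
    // the captured 'name') is destroyed at the end of the full expression while
    // the suspended coroutine frame still points at it. Debug builds happened
    // to leave the dead storage intact; Release reused it and exposed the bug.
    demo_task broken(std::string name)
    {
        return [name]() -> demo_task
        {
            std::puts(name.c_str()); // dangling capture once the closure is gone
            co_return;
        }();
    }

    // One common remedy: pass state as a parameter; coroutine parameters are
    // copied into the coroutine frame and outlive the temporary closure.
    demo_task fixed(std::string name)
    {
        return [](std::string n) -> demo_task
        {
            std::puts(n.c_str()); // lives in the coroutine frame, safe
            co_return;
        }(std::move(name));
    }
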
This allows for an internal unsafe_yield() which calls
coroutine.resume() directly from the engine's internally
supported yield functions.
This allows for an external yield() which now co_awaits the
event; the event, upon being set, will correctly resume the
awaiting coroutine on the engine thread for the user.
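A minimal sketch of the two paths, assuming an event-like awaitable
(the shape below is illustrative, not the real API; the real engine
also marshals the resume back onto the engine thread, which the
sketch skips):

    #include <coroutine>

    // External path: user code co_awaits this; when the engine later calls
    // set(), the stored handle is resumed and the user's coroutine continues.
    struct yield_event
    {
        std::coroutine_handle<> waiter{};

        bool await_ready() const noexcept { return false; }
        void await_suspend(std::coroutine_handle<> h) noexcept { waiter = h; }
        void await_resume() const noexcept {}

        void set() { if (waiter) { waiter.resume(); } } // called by the engine
    };

    // Internal path: code that already knows it is on the engine thread can
    // skip the event machinery and resume the handle directly.
    inline void unsafe_yield(std::coroutine_handle<> h) { h.resume(); }
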
Turns out that the final_suspend() method is required to return
std::suspend_always, otherwise coroutine_handle<>::done() will not
report completion properly. Refactored the task class to let the
user decide whether to suspend at the beginning, but it now forces
a suspend at the end to guarantee that task.is_ready() will work
properly.
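The promise shape that behavior implies, sketched with illustrative
names (sketch_task is not the real class): final_suspend() always
suspends so the frame stays alive and done() can be queried, while
initial suspension remains a user choice.

    #include <coroutine>

    template<bool suspend_at_start = true>
    struct sketch_task
    {
        struct promise_type
        {
            sketch_task get_return_object()
            {
                return {std::coroutine_handle<promise_type>::from_promise(*this)};
            }

            // User-configurable: lazily started (suspend) or eagerly started.
            auto initial_suspend() noexcept
            {
                if constexpr (suspend_at_start) { return std::suspend_always{}; }
                else                            { return std::suspend_never{}; }
            }

            // Forced: always suspend at the end so the frame is not destroyed
            // automatically and done() reports completion.
            std::suspend_always final_suspend() noexcept { return {}; }

            void return_void() noexcept {}
            void unhandled_exception() {}
        };

        // Works because the frame is still alive after final suspension.
        bool is_ready() const { return handle && handle.done(); }

        std::coroutine_handle<promise_type> handle;
    };
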