* fix up g++ 11 warnings for unused return values
* Add opensuse/leap:15.2 CI build
* Add __attribute__((used)) for template functions
The openSUSE/leap gcc on Release builds appears to aggressively remove
template instantiations that are actually required; the used attribute
forces the compiler to keep them.
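A minimal sketch of the attribute in use, assuming a member function of a class template is the victim; the names here are illustrative, not the library's actual code:

```cpp
template<typename return_type>
struct promise
{
    // Without the attribute, openSUSE/leap's Release-mode gcc discarded
    // instantiations like this one that were still required at link time.
    // __attribute__((used)) forces gcc to emit the function whenever the
    // class template is instantiated, even if it looks unreferenced.
    __attribute__((used)) auto result() -> return_type& { return m_value; }

    return_type m_value{};
};

// Explicit instantiation, purely for illustration.
template struct promise<int>;
```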
* Update docs on io_scheduler for inline processing
Support gcc 10.3.1 (fedora 33 updated)
Update ci.yml to run fedora 32,33,34 and support both
gcc 10.2.1 and 10.3.1
* fedora 32 -> gcc-c++ drop version
* Update ci.yml and test_latch.cpp
The eventfd shutdown_fd was being reset instead of the schedule_fd, which
caused 100% CPU churn when scheduling the first task in inline mode.
This fixes it, and the inline scheduler's throughput has consistently
jumped quite a bit.
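A minimal sketch of the mechanics, assuming eventfd-based wake-ups as described above (not the library's exact code): an eventfd is reset by reading its 8-byte counter, so reading the wrong fd leaves schedule_fd permanently readable and epoll_wait() returns immediately forever.

```cpp
#include <sys/eventfd.h>

void reset_schedule_event(int schedule_fd)
{
    eventfd_t value{0};
    // Drains the counter; with level-triggered epoll the fd stops polling
    // as readable until the next eventfd_write().
    eventfd_read(schedule_fd, &value);
}
```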
* io_scheduler inline support
* add debug info for io_scheduler size issue
* move poll info into its own file
* cleanup for feature
* Fix use after free with inline processing, detected by valgrind
Running the coroutines inline with event processing caused a use after
free bug, detected by valgrind in the inline tcp server/client benchmark
code. Because inline processing resumes a coroutine _inline_ with the
event or the timeout, if a timeout and an event for the same operation
occurred within the same epoll_wait() call, the second one's coroutine
stack frame would already be destroyed upon resuming it, so the
poll_info->processed check would read already freed memory.
The solution was to introduce a vector of coroutine handles that is
appended to on each epoll_wait() iteration over events and timeouts;
only after the events and timeouts are deduplicated are the coroutine
handles resumed (see the sketch below).
This new vector elides a malloc in the timeout function, but there is
still a malloc to extract the poll infos from the timeout multimap data
structure. The vector is a class member and is only ever cleared, so
with a monster set of timeouts it could grow extremely large, but I
think that is worth the price of not re-allocating it.
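A sketch of the dedupe-then-resume approach; the member names are illustrative, not the library's exact code:

```cpp
#include <coroutine>
#include <span>
#include <sys/epoll.h>
#include <vector>

struct poll_info
{
    std::coroutine_handle<> m_awaiting_coroutine;
    bool                    m_processed{false};
};

// Class member in the real code: only ever cleared, so its capacity is
// reused across iterations instead of re-allocating.
std::vector<std::coroutine_handle<>> m_handles_to_resume;

void process_batch(std::span<epoll_event> events)
{
    for (auto& e : events)
    {
        auto* pi = static_cast<poll_info*>(e.data.ptr);
        if (!pi->m_processed) // dedupe: first sighting in this batch wins
        {
            pi->m_processed = true;
            m_handles_to_resume.emplace_back(pi->m_awaiting_coroutine);
        }
    }

    // ... expired timeouts are appended here with the same m_processed guard ...

    // Resuming only after deduplication is what fixes the use after free:
    // resuming earlier could destroy a frame that a duplicate entry for the
    // same poll_info still points at.
    for (auto handle : m_handles_to_resume)
    {
        handle.resume();
    }
    m_handles_to_resume.clear();
}
```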
* Update README with section links
* add # to links
* try event instead of coro::event
* Update section names to remove "::" since markdown anchors don't seem
to link properly with them
* Add coro::mutex example to readme
* explicit lock_operation ctor
* lock_operation await_ready() uses try_lock
This allows the lock operation to skip await_suspend() entirely if the
lock is currently unlocked.
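A minimal sketch of that shape, assuming an atomic-flag mutex; this is not the exact libcoro source. Returning true from await_ready() means the coroutine never suspends, so await_suspend() is skipped on the uncontended path.

```cpp
#include <atomic>
#include <coroutine>

struct mutex // stand-in for coro::mutex's relevant surface
{
    std::atomic<bool> m_locked{false};

    bool try_lock() noexcept { return !m_locked.exchange(true, std::memory_order_acquire); }
    void add_waiter(std::coroutine_handle<>) noexcept { /* queue handle for unlock() */ }
};

struct lock_operation
{
    explicit lock_operation(mutex& m) noexcept : m_mutex(m) {}

    bool await_ready() noexcept
    {
        return m_mutex.try_lock(); // success -> lock held, no suspension
    }

    void await_suspend(std::coroutine_handle<> awaiting) noexcept
    {
        // Contended path: park until unlock() resumes us.  (The real code
        // would re-check the lock here to avoid a lost wake-up.)
        m_mutex.add_waiter(awaiting);
    }

    void await_resume() noexcept {}

private:
    mutex& m_mutex;
};
```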
* io_scheduler uses thread pool to schedule work
Fixes #41
* use task_container in bench tcp server test
* adjust benchmark for github actions CI
* fix io_scheduler tests' cross-thread memory boundaries
* more memory barriers
* sprinkle some shutdowns in there
* update readme
* udp_peer!
Hopefully the udp_peer makes it clearer how udp packets are sent and
received now. Time will tell!
* Fix broken benchmark tcp server listening race condition
* io_scheduler support timeouts
Closes #19
* io_scheduler resume_token<poll_status> for poll()
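A hedged usage sketch of the two bullets above; the signatures are assumed from these notes and may differ from the actual headers. poll() completing with a poll_status lets callers distinguish a ready fd from an expired timeout before attempting the read/write.

```cpp
#include <chrono>
#include <coro/coro.hpp>

auto wait_for_data(coro::io_scheduler& scheduler, int fd) -> coro::task<void>
{
    using namespace std::chrono_literals;

    auto status = co_await scheduler.poll(fd, coro::poll_op::read, 1s);
    switch (status)
    {
        case coro::poll_status::event:   /* fd is readable, safe to recv() */ break;
        case coro::poll_status::timeout: /* nothing arrived within 1s */      break;
        default:                         /* error or peer closed */           break;
    }
    co_return;
}
```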
* io_scheduler read/write now use poll_status + size return
See the issue for more details; in general, attempting to implement a
coro::thread_pool exposed that coro::sync_wait and coro::when_all only
worked if the coroutines executed on that same thread. They should now
have the ability to execute on another thread, to be confirmed in a
later issue.
Fixes #7
Lots of things were tried, including slabbing requests to reduce
allocations on schedule. It turns out that simply not calling read/write
again, by setting an atomic flag once it has already been triggered, was
a major win.
Tuned all the atomic operations with std::memory_order*
to release/acquire or relaxed appropriately.
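A sketch combining those two wins, with illustrative member names (not the scheduler's actual code): an acquire/release flag collapses redundant wake-ups so only the first caller in a burst pays for the eventfd write.

```cpp
#include <atomic>
#include <sys/eventfd.h>

struct wakeup_state
{
    std::atomic<bool> m_triggered{false};
    int               m_schedule_fd{eventfd(0, EFD_NONBLOCK)};

    void trigger()
    {
        // exchange() returns the previous value: only the false -> true
        // transition performs the syscall.  release pairs with the acquire
        // below so the woken thread sees the queued work.
        if (!m_triggered.exchange(true, std::memory_order_release))
        {
            eventfd_write(m_schedule_fd, 1);
        }
    }

    void on_processed()
    {
        // Clear with acquire so queue reads can't drift ahead of the flag.
        m_triggered.exchange(false, std::memory_order_acquire);
    }
};
```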
When processing items in the accept queue they are now grabbed in
128-task chunks and processed inline. This had a monster speedup effect
since the lock is significantly less contended.
In all, throughput went from about 1.5mm ops/sec to 4mm ops/sec.
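A sketch of the chunked drain, with illustrative names: lock once per chunk instead of once per task, then resume outside the critical section.

```cpp
#include <coroutine>
#include <deque>
#include <mutex>
#include <vector>

constexpr std::size_t chunk_size{128};

std::mutex                          m_accept_mutex;
std::deque<std::coroutine_handle<>> m_accept_queue;

void process_accept_queue()
{
    std::vector<std::coroutine_handle<>> chunk;
    chunk.reserve(chunk_size);

    {
        std::scoped_lock lk{m_accept_mutex};
        while (!m_accept_queue.empty() && chunk.size() < chunk_size)
        {
            chunk.push_back(m_accept_queue.front());
            m_accept_queue.pop_front();
        }
    } // lock released before any coroutine runs

    for (auto h : chunk)
    {
        h.resume(); // processed inline on this thread
    }
}
```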
Good fun day.
The scheduler had a 'nice' optimization where any newly submitted or
resumed task would check if the current thread it was executing on was
the event processing thread, and if so directly start or resume the
task rather than pushing it into the FIFO queues. This had a bad side
effect: a recursive task that generates sub-tasks would eventually
cause a stack overflow. To avoid this, submitted and resumed tasks now
go through the normal FIFO queue, which is slower but removes the
recursive function calls.
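A sketch of the trade-off, with illustrative names (not the scheduler's exact code): the removed fast path resumed inline when schedule() was called from the event-loop thread, and each nested resume() added a native stack frame.

```cpp
#include <coroutine>
#include <deque>
#include <thread>

std::deque<std::coroutine_handle<>> m_fifo;
std::thread::id                     m_event_thread_id;

void schedule(std::coroutine_handle<> task)
{
    // Old fast path (stack-overflow hazard for recursive task trees):
    //   if (std::this_thread::get_id() == m_event_thread_id) { task.resume(); return; }
    //
    // New behaviour: always enqueue.  The event loop pops and resumes tasks
    // one at a time, returning to a flat stack between each resume().
    m_fifo.push_back(task);
}
```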