- Remove the edge-triggered I/O hacks we had in place.
- Use level-triggered I/O for the libcurl fds instead.
- Batch all curl events together and process them at the end
of our worker event loop.
A new acme process is created that communicates with the acme servers.
This process does not hold any of your private keys (no account keys,
no domain keys etc).
Whenever the acme process requires a signed payload it will ask the keymgr
process to do the signing with the relevant keys.
This process is also sandboxed with pledge+unveil on OpenBSD and seccomp
syscall filtering on Linux.
The implementation only supports the tls-alpn-01 challenge. This means that
you do not need to open additional ports on your machine.
http-01 and dns-01 are currently not supported (no wildcard support).
A new configuration option "acme_provider" is available and can be set
to the directory URL of the acme server. By default this points to the
live Let's Encrypt environment:
https://acme-v02.api.letsencrypt.org/directory
The acme process can be controlled via the following config options:
- acme_root (where the acme process will chroot/chdir into).
- acme_runas (the user the acme process will run as).
If none are set, the values from 'root' and 'runas' are taken.
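A minimal sketch of these options in a configuration; the path and user
shown here are hypothetical, only the option names and the default
directory URL come from above:

acme_provider https://acme-v02.api.letsencrypt.org/directory
acme_root /var/chroot/acme
acme_runas _acme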
If you want to turn on acme for domains you do it as follows:
domain kore.io {
acme yes
}
You no longer need to specify certkey/certfile; if they are still present
they will be overwritten by the acme system.
The keymgr will store all certificates and keys under its root
(keymgr_root), the account key is stored as "/account-key.pem" and all
obtained certificates go under "certificates/<domain>/fullchain.pem" while
keys go under "certificates/<domain>/key.pem".
Kore will automatically renew certificates that expire in 7 days or less.
Previously Kore needed to be built with NOTLS=1 to be able to serve non-TLS
connections, and it has been that way for years.
It is time to allow non-TLS listeners without having to rebuild Kore.
This commit changes the configuration format and will break the
configuration of existing applications.
Configurations now get listen {} contexts:
listen default {
bind 127.0.0.1 8888
}
The above will create a listener on 127.0.0.1, port 8888 that will serve
TLS (still the default).
If you want to turn off TLS on that listener, specify "tls no" in that
context.
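For example, a sketch of a plaintext listener (the name and address here
are hypothetical):

listen plain {
	bind 127.0.0.1 8080
	tls no
}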
Domains now need to be attached to a listener:
Eg:
domain * {
attach default
}
For the Python API this kills kore.bind() and kore.bind_unix(). They are
replaced with:
kore.listen("name", ip=None, port=None, path=None, tls=True).
With this commit all Kore processes (minus the parent) are running
under seccomp.
The worker processes get the bare minimum allowed syscalls while each module
like curl, pgsql, etc will add their own filters to allow what they require.
New API functions:
int kore_seccomp_filter(const char *name, void *filter, size_t len);
Adds a filter into the seccomp system (must be called before
seccomp is enabled).
New helpful macro:
KORE_SYSCALL_ALLOW(name)
Allow the syscall with a given name, should be used in
a sock_filter data structure.
New hooks:
void kore_seccomp_hook(void);
Called before seccomp is enabled, allows developers to add their
own BPF filters into seccomp.
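A minimal sketch of hooking into this from an application. The header
names, the syscall being allowed and the assumption that len is the
number of filter entries are illustrative, not taken from this text:

#include <kore/kore.h>
#include <kore/seccomp.h>

/* Hypothetical extra filter: allow ioctl for this application. */
static struct sock_filter filter_app[] = {
	KORE_SYSCALL_ALLOW(ioctl),
};

void
kore_seccomp_hook(void)
{
	/* Register our BPF filter before seccomp is enabled. */
	kore_seccomp_filter("app", filter_app,
	    sizeof(filter_app) / sizeof(filter_app[0]));
}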
This change also stops Python coroutines from waking up very
late after their timeout has expired.
In filerefs, don't prime the timer until we actually have something
to expire, and kill the timer when the last ref drops.
Move away from the parent constantly hitting the disk for every
accesslog the workers are sending.
The workers now write their own accesslogs to shared
memory and the parent picks those up. The parent
flushes them to disk once every second or when they grow
larger than 1MB.
This removes the heavy penalty for having access logs
turned on when you are dealing with a large volume
of requests.
Now anyone can schedule events and get a callback, as long
as the user data structure that is added for the event begins
with a kore_event data structure.
All event state is now kept in that kore_event structure, and
CONN_[READ|WRITE]_POSSIBLE has been renamed to KORE_EVENT_[READ|WRITE].
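A sketch of that pattern; the structure, the callback signature and the
flags field used below are illustrative assumptions, and the call that
schedules the event is left out as it depends on the Kore version:

#include <kore/kore.h>

/* Hypothetical user structure; the kore_event member must come first so
 * the event code can treat a pointer to this structure as a kore_event. */
struct my_watcher {
	struct kore_event	evt;	/* must be the first member */
	int			fd;
	/* ... application state ... */
};

/* Callback fired from the event loop for this watcher (assumed signature). */
static void
my_watcher_event(void *arg, int error)
{
	struct my_watcher	*w = arg;

	if (error) {
		/* tear down the watcher */
		return;
	}

	if (w->evt.flags & KORE_EVENT_READ) {
		/* w->fd is readable */
	}

	if (w->evt.flags & KORE_EVENT_WRITE) {
		/* w->fd is writable */
	}
}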
A filemap is a way of telling Kore to serve files from a directory
much like a traditional webserver can do.
Kore filemaps only handle files. Kore does not generate directory
indexes or deal with non-regular files.
The way files are sent to a client differs a bit per platform and
build options:
default:
- mmap() backed file transfer due to TLS.
NOTLS=1
- sendfile() under FreeBSD, macOS and Linux.
- mmap() backed file for OpenBSD.
The opened file descriptors/mmap'd regions are cached and reused when
appropriate. If a file is no longer in use it will be closed and evicted
from the cache after 30 seconds.
New APIs are available allowing developers to use these facilities via:
void net_send_fileref(struct connection *, struct kore_fileref *);
void http_response_fileref(struct http_request *, struct kore_fileref *);
Kore will attempt to match media types based on file extensions. A few
default types are built-in. Others can be added via the new "http_media_type"
configuration directive.
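For example, a sketch of adding extra media types, assuming the directive
takes a media type followed by the file extension it applies to:

http_media_type text/css css
http_media_type image/svg+xml svg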
The HTTP layer used to make a copy of each incoming header and its
value for a request. Stop doing that and make HTTP headers zero-copy
all across the board.
This change comes with some api function changes, notably the
http_request_header() function which now takes a const char ** rather
than a char ** out pointer.
This commit also constifies several members of http_request, beware.
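A small sketch of the updated calling convention; the page handler and the
response header name are hypothetical:

#include <kore/kore.h>
#include <kore/http.h>

int
page(struct http_request *req)
{
	const char	*agent;

	/* The out pointer is now const char *; the value is not copied. */
	if (http_request_header(req, "user-agent", &agent))
		http_response_header(req, "x-agent-seen", agent);

	http_response(req, 200, NULL, 0);
	return (KORE_RESULT_OK);
}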
Additional rework of how the worker processes deal with the accept lock.
Before:
if a worker held the accept lock and it accepted a new connection
it would release the lock for others and back off for 500ms before
attempting to grab the lock again.
This approach worked, but under high load the 500ms back-off becomes
a noticeable delay.
Now:
- workers not holding the accept lock and not having any connections
will wait for a shorter time before returning from kore_platform_event_wait().
- workers not holding the accept lock will no longer blindly wait
an arbitrary amount of time in kore_platform_event_wait() but will look
at how long it is until the next lock grab and base their timeout
on that.
- if a worker's next_lock timeout is up and it failed to grab the
lock, it will try again in half that time.
- when releasing the lock, the worker process holding it will double
check whether it still has room for new connections; if it does,
it keeps the lock until it is full. This prevents the lock from
bouncing between several non-busy worker processes all the time.
Additional fixes:
- Reduce the number of times we check the timeout list, only do it twice
per second rather than every event tick.
- Fix solo worker count for TLS (we actually hold two processes, not one).
- Make sure we don't accidentally miscalculate the idle time causing new
connections under heavy load to instantly drop.
- Swap from gettimeofday() to clock_gettime() now that macOS caught up.
- Change pools to use mmap() for allocating regions.
- Change kore_malloc() to use pools for commonly sized objects.
(split into power-of-two buckets, from 8 bytes up to 8192).
- Rename kore_mem_free() to kore_free().
The preallocated pools will hold up to 128K of elements per block size.
In case a larger object is to be allocated kore_malloc() will use
malloc() instead.
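A tiny sketch of the renamed allocator API; the structure and function here
are hypothetical:

#include <kore/kore.h>

/* Hypothetical application object. */
struct session {
	int	id;
};

void
session_example(void)
{
	struct session	*s;

	/* Common sizes are served from the mmap()-backed pools. */
	s = kore_malloc(sizeof(*s));
	s->id = 1;

	/* This call was previously spelled kore_mem_free(). */
	kore_free(s);
}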
Setting the handle callback allows your application
to take care of network events for the connection.
Look at the connection state and flags to determine
if read/write is possible and go from there.
See kore_connection_handle() for more details.
This configuration option limits the maximum number
of connections a worker process can accept() in a single
event loop.
It can be used to more evenly spread out incoming connections
across workers when new connections arrive in a burst.
In cases where accept() failed Kore would not relinquish the
lock to other worker processes.
This becomes evident when dealing with a high number of concurrent
connections to the point where the fd table gets full. In this scenario
the worker with the full fd table will spin attempting to accept
new connections.
As a bonus, Kore now allows exactly up to worker_max_connections
connections per worker before it stops attempting to grab the
accept lock.
Introduces two new configuration knobs:
* socket_backlog (backlog for listen(2))
* http_request_limit
The second is the more interesting one.
Before, Kore would iterate over all received HTTP requests
in its queue before returning out of http_process().
Under heavy load Kore could spend a considerable amount of time
iterating over that queue. With http_request_limit,
Kore will process at MOST http_request_limit requests before returning
back to the event loop.
This means responses to processed requests are sent out much quicker
and allows kore to handle any other incoming requests more gracefully.
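For example (the values are hypothetical):

socket_backlog 5000
http_request_limit 500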
- The net code no longer has a recv_queue, instead reuse same recv buffer.
- Introduce net_recv_reset() to reset the recv buffer when needed.
- Have the workers spread the load better between them by slightly
delaying their next accept lock and giving them an accept threshold
so they don't keep accepting connections if they constantly end up
winning the race between the workers.
- The kore_worker_acceptlock_release() is no longer available.
- Prepopulate the HTTP server response header that is added to each
response in both normal HTTP and SPDY modes.
- The path and host members of http_request are now allocated on the heap.
Overall these changes result in better performance on a multicore machine;
especially the worker load changes shine through.
Kore no longer passes the accept lock to the "next in line"
worker but instead all workers will attempt to grab the lock
if they can.
Also remember if we had the lock previous iteration of the
event loop and don't constantly disable/enable the accepting sockets.
Makes Kore scale even better across multiple CPUs.
Tasks are now assigned to available threads instead
of a global task list.
You can now pass messages between your page handler
and the created task using the kore_task_channel_*
functions.
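A sketch of passing messages between a page handler and a task. Only the
kore_task_channel_* functions are named above; the surrounding task calls
(kore_task_create(), kore_task_bind_request(), kore_task_run()), the
hdlr_extra bookkeeping and the state handling are assumptions based on
the task facility, not taken from this text:

#include <kore/kore.h>
#include <kore/http.h>
#include <kore/tasks.h>

/* Hypothetical task entry point: echo back whatever the handler sent us. */
static int
echo_task(struct kore_task *t)
{
	char		buf[64];
	u_int32_t	len;

	/* Read the message the page handler pushed onto the channel. */
	len = kore_task_channel_read(t, buf, sizeof(buf));

	/* ... perform the actual work here ... */

	/* Send a result back to the page handler. */
	kore_task_channel_write(t, buf, len);

	return (KORE_RESULT_OK);
}

int
page(struct http_request *req)
{
	u_int32_t		len;
	char			result[64];
	char			msg[] = "hello";
	struct kore_task	*t;

	if (req->hdlr_extra == NULL) {
		/* First pass: create the task, tie it to this request so
		 * the handler runs again when the task has news, start it
		 * and hand it a message over its channel. */
		t = kore_malloc(sizeof(*t));
		req->hdlr_extra = t;

		kore_task_create(t, echo_task);
		kore_task_bind_request(t, req);
		kore_task_run(t);

		kore_task_channel_write(t, msg, sizeof(msg));
		return (KORE_RESULT_RETRY);
	}

	/* Later pass: read what the task sent back and respond. A real
	 * handler would also check the task state and clean up the task. */
	t = req->hdlr_extra;
	len = kore_task_channel_read(t, result, sizeof(result));

	http_response(req, 200, result, len);
	return (KORE_RESULT_OK);
}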
Only one task at a time can be assigned to a request,
but I feel this is probably a bad design choice.
Preferably we'd want to be able to start tasks
regardless of whether we are in a page handler or not.
This not only adds flexibility but seems like
a better choice overall, as it opens up a lot more
possibilities for how tasks can be used.