Redis-inspired key/value store written in PicoLisp
16 Jun 2020You can get it on GitHub.
This program mimics functionality of a Redis™ in-memory database, but is designed specifically for PicoLisp applications with optional on-disk persistence and encryption.
Note: This library DOES NOT use the RESP protocol and thus cannot work with the
redis-cli
or other Redis clients/servers.
The included server.l
and client.l
can be used to send and receive “Redis-like” commands over TCP or UNIX named pipess.
Requirements
- PicoLisp 32-bit/64-bit
v17.12
tov20.6.29
, orpil21
- Linux or UNIX-like OS (with support for named pipes)
Getting Started
This library is written in pure PicoLisp and contains no external dependencies.
To ensure everything works on your system, run the tests first: make check
Using the CLI tools
- Launch a server with:
./server.l --pass <yourpass> --verbose
- Check the server info with:
./client.l --pass <yourpass> INFO
That should return some interesting info about your server. See below for more examples.
Using as a library
Server
- Load the library in your project:
(load "libkv.l")
- Set the server password:
(setq *KV_pass "yourpass")
- Start listening for requests:
(kv-listen)
Client
- Load the client library in your project:
(load "libkvclient.l")
- Set the server password
(setq *KV_pass "yourpass")
- Start the client listener with
(kv-start-client)
- Send your command and arguments with
(kv-send-data '("INFO" "server"))
Received data will be returned as-is (list, integer, string, etc). Wrap the result like: (kv-print Result)
to send the output to STDOUT
:
: (load "libkvclient.l")
-> kv-start-client
: (setq *KV_pass "yourpass")
-> "yourpass"
: (kv-start-client)
-> T
: (kv-send-data '("set" "mykey" 12345))
-> "OK"
: (kv-send-data '("get" "mykey"))
-> 12345
: (kv-send-data '("set" "yourkey" "12345"))
-> "OK"
: (kv-send-data '("get" "yourkey"))
-> "12345"
Feel free to observe the example code in client.l.
Note: Using
(kv-send-data)
will send the data to the server and automatically block the client while waiting for a response.
Usage
This section describes usage information for the CLI tools server.l
and client.l
.
Server
The server listens in the foreground for TCP connections on port 6378
by default. Only the password
, port
, persistence
, and verbosity
are configurable, and a password
is required:
# server.l
Usage: ./server.l --pass <pass> [options]
Example: ./server.l --pass foobared --port 6378 --verbose --persist 60
Options:
--help Show this help message and exit
--binary Store data in binary format instead of text (default: plaintext)
--pass <password> Password used by clients to access the server (required)
--persist <seconds> Number of seconds between database persists to disk (default: disabled)
--port <port> TCP listen port for communication with clients (default: 6378)
--verbose Verbose flag (default: False)
Examples
./server.l --pass yourpass --verbose
Parent PID: 38867
[sibling]=38874
[child]=38873 [parent]=38867
[msg] from client: (pid: 38873) ::ffff:127.0.0.1 ("IDENT" ("id" . "3F21CC32") ("hostname" . "meta.lan"))
[msg] from child : (pid: 38873) ("message" 38873 ("IDENT" ("id" . "3F21CC32") ("hostname" . "meta.lan")))
[msg] to client: "OK 3F21CC32"
[msg] from client: (pid: 38873) ::ffff:127.0.0.1 ("INFO" "server")
[msg] from child : (pid: 38873) ("message" 38873 ("INFO" "server"))
[msg] to client: "^J# Server^Japp_version:0.11.0^Jos:Linux 4.19.34-tinycore64 x86_64^Jarch_bits:64^Jprocess_id:38874^Jtcp_port:6378^Juptime_in_seconds:1^Juptime_in_days:0^Jexecutable:/usr/bin/picolisp^J"
[child]=38873 exiting
[msg] from child : (pid: 38873) ("done" 38873 NIL)
Client
The client handles authentication, identification, and sending of “Redis-like” commands to the server. It then prints the result to STDOUT
and can be parsed by standard *NIX tools. The client receives PLIO data over a TCP socket, or named pipe (if client/server are on the same system).
# client.l
Usage: ./client.l --pass <pass> COMMAND [arguments]
Example: ./client.l --pass foobared --encrypt SET mysecret -- <(echo 'mypass')
Options:
--help Show this help message and exit
--commands Show the full list of commands and exit
--decrypt Enable decryption of values using a GPG public key (default: disabled)
--encrypt Enable encryption of values using a GPG public key (default: disabled)
--name <name> Easily identifiable client name (default: randomly generated)
--host <host> Hostname or IP of the key/value server (default: localhost)
--pass <data> Password used to access the server (required)
--poll <seconds> Number of seconds for polling the key/value server (default: don't poll)
--port <port> TCP port of the key/value server (default: 6378)
-- STDIN Reads an argument from STDIN
COMMAND LIST Commands are case-insensitive and don't always require arguments
APPEND key value Append a value to a key
BGSAVE Asynchronously save the dataset to disk
CLIENT ID Returns the client ID for the current connection
CLIENT KILL ID id [id ..] Kill the connection of a client
CLIENT LIST Get the list of client connections
CONVERT Convert a plaintext database to binary or vice-versa
DEL key [key ..] Delete a key
EXISTS key [key ..] Determine if a key exists
GET key Get the value of a key
GETSET key value Set the string value of a key and return its old value
HDEL key field [field ..] Delete one or more hash fields
HEXISTS key field Determine if a hash field exists
HGET key field Get the value of a hash field
HGETALL key Get all the fields and values in a hash
HKEYS key Get all the fields in a hash
HLEN key Get the number of fields in a hash
HMGET key field [field ..] Get the values of all the given hash fields
HSET key field value [field value ..] Set the string value of a hash field
HSTRLEN key field Get the length of the value of a hash field
HVALS key Get all the values in a hash
INFO [section] Get information and statistics about the server
LINDEX key index Get an element from a list by its index
LLEN key Get the length of a list
LPOP key Remove and get the first element in a list
LPOPRPUSH source destination Remove the first element in a list, append it to another list and return it
LPUSH key element [element ..] Prepend one or multiple elements to a list
LRANGE key start stop Get a range of elements from a list
LREM key count element Remove elements from a list
LSET key index element Set the value of an element in a list by its index
LTRIM key start stop Trim a list to the specified range
MGET key [key ..] Get the values of all the given keys
MSET key value [key value ..] Set multiple keys to multiple values
PING [message] Ping the server
RPOP key Remove and get the last element in a list
RPOLRPUSH source destination Remove the last element in a list, prepend it to another list and return it
RPUSH key element [element ..] Append one or multiple elements to a list
SAVE Synchronously save the dataset to disk
SET key value Set the string value of a key
STRLEN key Get the length of the value stored in a key
Notes and limitations
This section will explain some important technical details about the code, and limitations on what this library can and can’t do.
Technical notes
- All keys are stored under the prefix
*KV/
, example:*KV/mykeys
. This prefix is hardcoded everywhere and shouldn’t be changed. - Requests and commands generate statistics which are stored in memory (and lost when the server exits). Statistics are stored under the
*KV/%stats%/
prefix and are read-only by external clients. - Similar to the WebSockets protocol, a unique UUID:
7672FDB2-4D29-4F10-BA7C-8EAD0E29626E
is used during the handshake sequence between all clients and servers. For compatibility with future tools, please do not change it. - Passwords are hashed using a very simple and collision-prone hashing function. It is not cryptographically secure or used for that purpose. If using this library over a public network, please use stunnel or hitch as a TLS proxy between the client and server.
- Output from the
client.l
uses(prinl)
, so a result"2"
and2
will both appear the same. Lists are concatenated with a,
comma and also output using(prinl)
. Error messages are sent toSTDERR
and the client exits with error code1
. - Named pipes are created in the
(tmp)
directory of the server’s parent process, and will be removed when the parent exits cleanly. Please do not kill the parent process withkill -9
(orkill -KILL
) as it will leave an unresponsive zombie sibling with the TCP socket still open, and the named pipes will not be removed. - A best effort has been made to return the same datatypes and response types as Redis. Example: the
SET
command returnsOK
if the key was set. Not all responses are absolutely identical to Redis though. Please remember this library isn’t designed to be a perfect clone of Redis (see Limitations below).
Limitations
- This library is not used in production and has not been tested extensively (despite all the unit/integration tests). Please use at your own risk.
- This library DOES NOT use the RESP protocol and thus cannot work with the
redis-cli
or other Redis clients. - Since PicoLisp is not event-based, each new TCP connection spawns a new process, which limits concurrency to the host’s available resources.
- Not all Redis commands are implemented, because I didn’t have an immediate need for them. There are plans to slowly add new commands as the need arises.
- Using the
client.l
on the command-line, all values are stored as strings. Please use the TCP socket or named pipe directly to store integers and lists. Unlike Redis, there is no on-disk persistence and all keys will be lost when the server is restarted. This library was originally designed to be used as a temporary FIFO queue, with no need to persist the data. Support for persistence can be added eventually, and I’m open to pull-requests.Support for persistence has been added, see Persistence below.
How it works
For the server, everything starts with the (kv-listen)
function, which is where the TCP server is started:
+------------+ +---------------------+ +----------------+
| TCP client | | (parent) | | (sibling) |
+-----+---+--+ | +-------------+ | | +------------+ |
^ | | | TCP server | | | | Key/Value | |
| +---------> | (kv-listen) | | | | in-memory | |
| | +-------------+ | | | DB | |
| | | | +------------+ |
+-----+-------------> pipe_sibling +------> | |
| | | +-+-+-+----------+
| +--------+ | | | | |
+--+ child1 | <-----+ pipe_child_1 <----------+ | |
| +--------+ | | | |
| | | | |
| +--------+ | | | |
+--+ child2 | <-----+ pipe_child_2 <------------+ |
| +--------+ | | |
| | | |
| +--------+ | | |
+--+ child3 | <-----+ pipe_child_3 <--------------+
+--------+ | |
+---------------------+
Once (kv-listen)
is running, a TCP socket is opened on the configured port. An infinite loop begins and listens for incoming connections, giving each new TCP client its own forked child process for handling the request.
A named pipe called pipe_sibling
, is created in a temporary directory of the top-level parent process. This pipe will be used to communicate with other child processes, leaving the parent process to continue serving new TCP requests.
The parent process then forks another process, which we’ll call the sibling - an older sister if you prefer - and the sibling waits on the pipe_sibling
named pipe, listening for COMMANDS from the child processes.
The forked child processes will each create their own named pipe, called pipe_child_<pid>
, also in a temporary directory of the top-level parent process. The child process will listen on its own named pipe for messages sent by its older sister, the sibling. Once a message is received by the child, the response is sent back to the client over the TCP connection.
The idea is to have the sibling be the holder of all the keys. Every “Redis-like” command will have their data and statistics stored in the memory of the sibling process, and the sibling will handle receiving and sending its memory contents (keys/values) through named pipes to the respective child processes.
Persistence
Similar to Redis, this database implements “snapshotting” (full memory dump to disk) and “AOF” (append-only log file), however both features are tightly coupled, which makes for a much better experience.
- Persistence is disabled by default, but can be enabled with the
--persist N
parameter, whereN
is the number of seconds between eachBGSAVE
(background save to disk). - The database is stored in plaintext by default, but can be stored in binary with the
--binary
parameter. Binary format (PLIO) loads and saves much quicker than plaintext, but it becomes difficult to debug a corrupt entry. - The AOF follows the WAL approach, where each write command is first written to the AOF on disk, and then processed in the key/value memory store.
- The AOF only stores log entries since the previous
SAVE
orBGSAVE
, so it technically shouldn’t grow too large or unmanageable. - The database snapshot on disk is the most complete and important data, and should be backed up regularly.
- fsync is not managed by the database, so the server admin must ensure AOF log writes are actually persisted to disk.
- The AOF on-disk format is always plaintext, to allow easy debugging and repair of a corrupt entry.
- The
SAVE
andBGSAVE
commands can still be sent even if persistence is disabled. This will dump the in-memory data to disk as if persistence was enabled.
How persistence is implemented
Here we’ll assume persistence was previously enabled and data has already been written and saved to disk.
- On server start, some memory is pre-allocated according to the DB’s file size.
- The DB is then fully restored to memory
- If the AOF contains some entries, it is fully replayed to memory
- The DB is saved once more to disk and the AOF gets wiped
- A timer is started to perform periodic background DB saves
- Every new client connection sends the command to the AOF
- When a
BGSAVE
(non-blocking) command is received, a temporay copy of the AOF is made, the current AOF is wiped, and a background process is forked to save the DB to disk - A backup of the DB file is always made before overwriting the current DB file.
- To help handle concurrency and persistence, temporary files are named
kv.db.lock
,kv.db.tmp
,kv.aof.lock
, andkv.aof.tmp
. It’s best not to modify or delete those files while the server is running. They can be safely removed while the server is stopped.
AOF format
The AOF is stored by default in the kv.aof
file as defined by *KV_aof
.
Here are two separate entries in a typical AOF:
("1596099036.281142829" 54042 ("RPUSH" "mytestlist" ("four" "five" "six")))
("1596099059.683596840" 57240 ("RPUSH" "yourtestlist" ("seven" "eight" "nine")))
Each line is a PicoLisp list with only 3 items:
- Item 1:
String
Unix timestamp with nanoseconds for when the entry was created - Item 2:
Integer
Non-cryptographically secure hash (CRC) of the command and its arguments - Item 3:
List
Command name, first argument, and subsequent arguments
When replaying the AOF, the server will ensure the hash of command and arguments match, to guarantee the data is intact. Replaying an AOF can be slow, depending on the number of keys/values.
Note: Manually modifying the AOF will require recomputing and replacing the hash with the result from
(kv-hash)
or PicoLisp(hash)
.
(hash '("RPUSH" "mytestlist" ("four" "five" "zero")))
-> 61453
DB format
The DB is stored by default in the kv.db
file as defined by *KV_db
. When backed up, the new filename contains the suffix .old
.
Here are two separate entries in a typical DB:
("smalldata" ("test1" "test2" "test3" "test4" "test5" "test6"))
("fooh_1000" "test data 1000")
Each line is a PicoLisp list with the key in the (car)
, and values in the (cadr)
. They are quickly replayed and stored in memory with a simple (set)
command.
Differences from Redis
- Unlike Redis, persistence only allows specifying a time interval between each
BGSAVE
. Since the AOF is always enabled, it’s not necessary to “save after N changes”, so the config is much simpler. - Log rewriting is not something that “must be done”, because chances are the AOF will never grow too large. Of course that depends on the number of changes occurring between each
BGSAVE
, but even then the AOF is wiped when aBGSAVE
is initiated (and restored/rewritten if there was an error). - The DB snapshot is used to reconstruct the dataset in memory, not the AOF. The AOF is only used to replay the commands since the last DB save, which is much faster and more efficient, particularly when using
--binary
. - There is no danger of losing data when switching from
RDB
toAOF
, because such a concept doesn’t even exist.