Virtual File System for KDB/Q
qfuse is a virtual file system for KDB/Q that unifies multiple HDBs into a single VDB. It actively maintains a mapping of all files in the source historical databases (HDBs) and collates their content into a single mounted directory that simulates a virtual database (VDB). It is built on FUSE (Filesystem in Userspace), which lets a non-root program implement the file system calls behind a mounted directory.
How did we get here?
A common problem with large kdb infrastructure, especially when HDBs are written by different teams but share common end users, is that users generally want to see all HDBs through one API. There are several patterns for tackling this, with varying degrees of technical complexity and maintenance.
One option leverages par.txt, which was meant for loading one segmented HDB across multiple volumes, to load various HDBs instead. This works if there are no collisions in table names, a BIG if. Here's a previous article on how to implement it.
- Pros: Easy to implement
- Cons: Cannot handle name collisions
Another option is to use symlinks. Scan all source HDBs and collate them into one VDB with symlinks. Name collisions can be handled by renaming the symlinks. The main things to consider are schema prototypes and the date ranges for each table; otherwise you risk creating dead links to non-existent HDBs.
- Pros: Stable and straightforward to implement. You can rename symlinks to handle name collisions.
- Cons: Requires maintenance to expand date partitions in VDB. Need to handle inconsistent dates between databases.
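To make the symlink pattern concrete, here is a minimal sketch of one way it could be scripted. The function name `link_hdb`, the `<namespace>.<table>` link naming, and the date-partition regex are assumptions made for illustration; they are not taken from any existing tool. It links each partitioned table under a dated folder and links top-level files (such as sym files) at the VDB root.

```python
import re
from pathlib import Path

# Assumption: partitions are date folders named like 2025.08.30.
DATE_RE = re.compile(r"^\d{4}\.\d{2}\.\d{2}$")

def link_hdb(namespace, source, vdb, sep="."):
    """Symlink one source HDB into the VDB, renaming tables to avoid collisions."""
    source, vdb = Path(source).resolve(), Path(vdb)
    links = []
    for entry in sorted(source.iterdir()):
        if entry.is_dir() and DATE_RE.match(entry.name):
            # Partitioned tables: <vdb>/<date>/<namespace><sep><table>
            for table in sorted(p for p in entry.iterdir() if p.is_dir()):
                links.append((vdb / entry.name / f"{namespace}{sep}{table.name}", table))
        elif entry.is_file():
            # Top-level files (e.g. sym files) keep their original name.
            links.append((vdb / entry.name, entry))
    for link, target in links:
        link.parent.mkdir(parents=True, exist_ok=True)
        if not link.is_symlink():
            link.symlink_to(target)
    return [link for link, _ in links]
```

A script like this has to be re-run whenever a source HDB gains a new date partition, which is the maintenance cost noted in the cons above.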
The third option is to create your own virtual file system. qfuse uses the FUSE (Filesystem in Userspace) library to register custom implementations of POSIX file system calls (e.g. open, read, write, stat) on a specific mounted directory. When a user runs ls on the mount, the readdir call is routed to qfuse, which can list a custom directory tree structure. Partitioned tables are collated into dated folders, while splayed tables and sym files sit at the root. qfuse looks for directories containing a .d file (i.e. splayed tables) and prefixes their names with the namespace.
- Pros: Easy to configure. Handles name collisions by namespacing tables.
- Cons: Adds overhead when scanning directories and opening file handles.
Source HDBs:
bars/
├── 2025.08.30/
│ ├── daily/
│ └── minute/
└── sym_daily
ref/
├── sec/
└── sym_sec
Target VDB generated by qfuse:
vdb/
├── 2025.08.30/
│ ├── bars.daily/
│ └── bars.minute/
├── ref.sec/
├── sym_daily
└── sym_sec
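The collation rules behind the example trees above can be sketched as a pure function. This is an illustrative reconstruction, not qfuse's actual code: the function name `collate`, the input shape (a list of relative file paths per namespace), and the date regex are all assumptions. A path ending in `.d` marks a table directory, which gets the namespace prefix; single-component paths are treated as sym files and kept at the root.

```python
import re

# Assumption: partitions are date folders named like 2025.08.30.
DATE_RE = re.compile(r"^\d{4}\.\d{2}\.\d{2}$")

def collate(sources):
    """Map virtual VDB paths to their original location under each namespace."""
    mapping = {}
    for namespace, files in sources.items():
        for f in files:
            parts = f.split("/")
            if parts[-1] == ".d":
                table_dir = parts[:-1]
                if len(table_dir) == 2 and DATE_RE.match(table_dir[0]):
                    # Partitioned table: <date>/<namespace>.<table>
                    virtual = f"{table_dir[0]}/{namespace}.{table_dir[1]}"
                elif len(table_dir) == 1:
                    # Splayed table at the root: <namespace>.<table>
                    virtual = f"{namespace}.{table_dir[0]}"
                else:
                    continue  # layouts this sketch does not cover
                mapping[virtual] = "/".join([namespace, *table_dir])
            elif len(parts) == 1:
                # Root-level file such as a sym file keeps its name.
                mapping[f] = f"{namespace}/{f}"
    return mapping

sources = {
    "bars": ["2025.08.30/daily/.d", "2025.08.30/minute/.d", "sym_daily"],
    "ref": ["sec/.d", "sym_sec"],
}
vdb = collate(sources)
```

Here `sorted(vdb)` yields the five entries of the target tree, and each value records where the real data lives.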
Internals
On start up, qfuse reads a config that lists all the source paths and their desired namespace names.
namespace,source
bars,hdb/bars
ref,hdb/ref
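For illustration, a config in this shape can be read with a few lines of Python. The function name `load_config` and the choice to reject duplicate namespaces are assumptions of this sketch; the source does not say how qfuse itself parses or validates the file.

```python
import csv
import io

def load_config(text):
    """Parse 'namespace,source' rows into a {namespace: source path} dict."""
    mounts = {}
    for row in csv.DictReader(io.StringIO(text)):
        namespace, source = row["namespace"].strip(), row["source"].strip()
        if namespace in mounts:
            # Assumption: a namespace must be unique, since it prefixes table names.
            raise ValueError(f"duplicate namespace: {namespace}")
        mounts[namespace] = source
    return mounts

config = load_config("namespace,source\nbars,hdb/bars\nref,hdb/ref\n")
```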
It will iterate through each path, scan the contents, and insert every sub-directory path and its files into a directory tree. A tree was chosen because there are two operations qfuse needs to be good at: listing the contents of a sub-directory and finding the original path of a specific file.
There is a timer thread that periodically re-scans the source directories for new files and removes any files that no longer exist. The child nodes are kept sorted so that lookups can use a binary search.
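A minimal sketch of such a tree follows, assuming each node keeps its child names in a sorted list and uses `bisect` for lookups. The `Node` class and the `insert`, `lookup`, and `listdir` helpers are hypothetical names for illustration; qfuse's real data structure may differ.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str
    source: Optional[str] = None                           # original path, if any
    names: List[str] = field(default_factory=list)         # sorted child names
    children: List["Node"] = field(default_factory=list)   # parallel to names

    def child(self, name):
        """Binary search for a child by name."""
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            return self.children[i]
        return None

    def add(self, name, source=None):
        """Insert a child at its sorted position, or update an existing one."""
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            if source is not None:
                self.children[i].source = source
            return self.children[i]
        node = Node(name, source)
        self.names.insert(i, name)
        self.children.insert(i, node)
        return node

def insert(root, virtual_path, source):
    parts = virtual_path.split("/")
    node = root
    for part in parts[:-1]:
        node = node.add(part)
    return node.add(parts[-1], source)

def lookup(root, virtual_path):
    node = root
    for part in virtual_path.split("/"):
        node = node.child(part)
        if node is None:
            return None
    return node

def listdir(root, virtual_path=""):
    node = lookup(root, virtual_path) if virtual_path else root
    return None if node is None else list(node.names)
```

`listdir` covers the first operation (listing a sub-directory) and `lookup(...).source` covers the second (finding the original path of a file).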
Future Work
The first iteration was about getting the file system operational with decent performance. There are a few performance improvements I'll investigate further, such as:
- Double buffering the directory tree - keep 2 copies of the tree. While users read one copy, the second copy is re-scanned and sorted, then atomically swapped in. This eliminates read/write locks on listdir/open operations.
- File notify - qfuse currently scans the source directories periodically. Instead, it should listen for file changes. It would be interesting to see these changes published to a message queue to notify users when data is ready.
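The double-buffering idea can be sketched as follows. This is only an illustration of the technique under stated assumptions: `TreeHolder` and the `scan` callable are hypothetical, and the sketch relies on the fact that rebinding an attribute in CPython is a single reference store. An implementation in another language would need an atomic pointer swap or equivalent.

```python
class TreeHolder:
    """Holds the active tree; refreshes build a new tree and swap it in."""

    def __init__(self, scan):
        self._scan = scan        # hypothetical callable returning a fresh, sorted tree
        self._active = scan()

    def snapshot(self):
        # Readers grab one reference and keep using it for the whole operation,
        # so they never observe a half-built tree.
        return self._active

    def refresh(self):
        fresh = self._scan()     # slow re-scan and sort happens off to the side
        self._active = fresh     # single reference store; old tree is freed once unused
```

A reader that took a snapshot before the swap keeps working against the old tree, and the next operation picks up the new one.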
As for feature enhancements, adding support for:
- Create VDBs within KDB using the foreign function API. With these functions exposed, a KDB instance could create its own ephemeral VDB at startup.
- Point-in-time - KDB does not natively support multi-part partitions, however if you were to save data in this format:
<date>/<asof>/table/
then qfuse could take an "asof" time parameter and create a VDB with the latest record for each date up to the "asof" time.
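One way the point-in-time selection could work is sketched below. The function name `resolve_asof` and the timestamp format are assumptions: the sketch only requires that the `<asof>` folder names sort lexicographically in time order (for example zero-padded strings like 2025.08.30T1800). For each date it picks the latest `<asof>` folder that is not after the requested time.

```python
import bisect

def resolve_asof(partitions, asof):
    """For each date, choose the latest asof folder that is <= the requested asof."""
    chosen = {}
    for date, stamps in partitions.items():
        ordered = sorted(stamps)
        i = bisect.bisect_right(ordered, asof)  # count of stamps <= asof
        if i:
            chosen[date] = ordered[i - 1]
        # Dates with no version at or before asof are left out of the VDB.
    return chosen

partitions = {
    "2025.08.29": ["2025.08.29T1800", "2025.08.30T0900"],
    "2025.08.30": ["2025.08.30T1800", "2025.09.01T0900"],
}
view = resolve_asof(partitions, "2025.08.31T0000")
```

With the sample data, the view contains the corrected 2025.08.29 partition and the first version of 2025.08.30, since the later correction was saved after the requested time.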