wZD is a powerful storage and database server designed for big data storage systems that hold a mix of small and large files. It dramatically reduces the number of small files, which extends the capabilities of any normal or clustered POSIX-compatible file system.
The server is written in Go and uses a modified version of the BoltDB database as a backend. It saves and distributes any number of small and large files and NoSQL keys/values in compact form inside micro Bolt databases (archives). Files and values are distributed across the BoltDB databases according to the number of directories or subdirectories and the general structure of the directories.
…and billions of files will no longer be a problem.
- Multithreading
- Multiple servers for fault tolerance and load balancing
- Complete file and value search
- Supports HTTPS and IP authorization
- Supported HTTP methods: GET, HEAD, OPTIONS, PUT, POST and DELETE
- Manage read and write behavior through client headers
- Support for customizable virtual hosts
- Linear scaling of read and write using clustered file systems
- Effective methods of reading and writing data
- Supports CRC data integrity when writing or reading
- Support for Range and Accept-Ranges, If-None-Match and If-Modified-Since headers
- Store and share 10,000 times more files than there are inodes on any POSIX-compatible file system, depending on the directory structure
- Support for adding, updating, deleting files and values, and delayed compaction/defragmentation of Bolt archives
- Allows the server to be used as a NoSQL database, with easy sharding based on the directory structure
- Bolt archives support selective reading of a certain number of bytes from a value
- Easy sharding of data over thousands or millions of Bolt archives based on the directory structure
- Mixed mode support, with ability to save large files separately from Bolt archives
- Semi-dynamic buffers for minimal memory consumption and optimal network performance tuning
- Includes the multithreaded wZA archiver for migrating files without stopping the service
Our cluster holds about 250,000,000 small pictures and 15,000,000 directories on separate SATA drives. It uses the MooseFS cluster file system. MooseFS works well with this many files, but its Master servers consume 75 gigabytes of RAM, and the frequent dumps of a large amount of metadata are hard on SSD disks. MooseFS itself also has a limit of about 1 billion files with one replica of each file.
With a fragmented directory structure, most directories hold an average of 10 to 1000 files. After installing wZD and archiving the files into Bolt archives, the number of files dropped about 25-fold, to about 10,000,000. With proper planning of the structure, an even smaller number could have been achieved, but this is not possible when the existing structure has to remain unchanged. Proper planning would result in very large inode savings, low memory consumption of the cluster FS, significant acceleration of MooseFS itself, and a reduction in the actual space occupied on the MooseFS cluster FS. The reason is that MooseFS always allocates a 64 KB block for each file: even a 3 KB file still occupies 64 KB.
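
To make the block-allocation point concrete, here is a rough sketch in Go of the arithmetic. It assumes a simplified model in which every file is rounded up to a whole number of 64 KB blocks, and it assumes, purely for illustration, that all 250,000,000 files are 3 KB each. The function name `allocated` and the uniform file size are not from wZD or MooseFS.

```go
package main

import "fmt"

// blockSize is the 64 KB allocation unit mentioned in the text.
const blockSize = 64 * 1024

// allocated returns the space a file of the given size occupies when
// storage is handed out in whole blocks (simplified model, an assumption).
func allocated(size, block int64) int64 {
	return (size + block - 1) / block * block
}

func main() {
	const files = 250_000_000 // file count quoted in the text
	const fileSize = 3 * 1024 // hypothetical: every file is 3 KB

	perFile := allocated(fileSize, blockSize)
	fmt.Printf("one %d-byte file occupies %d bytes\n", fileSize, perFile)
	fmt.Printf("payload:   %d GiB\n", int64(files)*fileSize>>30)
	fmt.Printf("allocated: %d GiB\n", int64(files)*perFile>>30)
}
```

Under these assumptions the allocated space is more than twenty times the payload, which is why packing many small files into fewer, larger archives reduces the space actually occupied.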
The wZD server was designed for mixed use. It can store not only ordinary files but also generated HTML or JSON documents, and it can be used simply as a sharded NoSQL database made up of a large number of small BoltDB databases, with all sharding carried out through the structure of directories and subdirectories.