Description
lotus is a high-speed web server that serves data files to Greenplum Database.
aster is a streaming server that works together with lotus.
Release
Synopsis
lotus --help
The advantage of lotus is that it is built on a modern web architecture designed around the C10k problem, which makes it easier to handle complicated requirements. Compared with gpfdist, it is more configurable, more adaptable, and faster.
Besides its performance, it can be easily integrated with a message bus such as Kafka or RabbitMQ, or with a data streaming server such as Flink.
We have also developed a streaming server, which is released along with lotus. The streaming server can ingest records/messages from Kafka/RabbitMQ and from log agents such as fluent-bit. Information from all these sources can be aggregated and then fed into Greenplum Database. There is a demo at the bottom of the page.
lotus can replace the older tool gpfdist, which is single-threaded and has a non-configurable web architecture.
For readable external tables, lotus splits the data file and serves the data concurrently to all segment instances in the Greenplum Database system when users SELECT from the external table.
The streaming server aster, which works with Kafka/RabbitMQ/Fluent-bit, is released together with lotus.
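How lotus actually splits a file is not described here. As a rough, hypothetical illustration of the idea, the Python sketch below divides a text file's bytes into ranges that end on record (newline) boundaries, so that each range could be served independently. The function name `split_ranges` and the newline-delimited record format are assumptions made for this example, not lotus internals.

```python
def split_ranges(data: bytes, parts: int) -> list[tuple[int, int]]:
    """Return up to `parts` (start, end) byte ranges that cover `data`.

    Hypothetical helper: every range except possibly the last ends right
    after a newline, so no record is cut in half.
    """
    size = len(data)
    ranges = []
    start = 0
    for i in range(1, parts + 1):
        if start >= size:
            break
        target = size * i // parts  # ideal cut point for an even split
        if i == parts:
            end = size
        else:
            # Move the cut forward to the end of the record it lands in.
            newline = data.find(b"\n", max(target, start))
            end = size if newline == -1 else newline + 1
        ranges.append((start, end))
        start = end
    return ranges


sample = b"a,1\nbb,2\nccc,3\ndddd,4\n"
for begin, end in split_ranges(sample, 2):
    print(begin, end, sample[begin:end])
```

Cutting on record boundaries keeps every chunk independently parseable, at the cost of chunks that are only approximately equal in size.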
Hint
When you turn on the zero-copy feature in lotus, carefully tune the related parameters, such as thread_num and split_size; with a misconfiguration, data copy speed can even get worse.
When tuned properly, the zero-copy feature reduces context switches and improves performance.
When you turn on the ZSTD feature in lotus, you need to add an entry to the external table DDL, such as "Command: sh /tmp/gcurl.sh smdw:8081/dev/shm/lotus.txt". With such a script, the data stream I/O on each segment is redirected to the script. The script is primarily a curl command with special HTTP headers, piped into the unzstd command through a Linux pipe.
A sample script has shipped with the lotus binary since version 1.1.0.
The main bottleneck for servers like lotus or gpfdist is I/O. The benchmark was run with a memory-mapped file, which can be considered high speed, so the bottleneck is primarily the network card.
We have two 1000 Mbps network cards and one 10000 Mbps network card. With lotus listening on 0.0.0.0, all the network cards are 100% loaded if there are no other bottlenecks. The statistics are below.
Server | Time cost | Number of segments in gpdb | Number of records | Size of records |
gpfdist | 24 seconds | 192 | 300,303,000 | 37GB |
lotus | 16.85 seconds | 192 | 300,303,000 | 37GB |
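The figures in this table can be turned into throughput numbers with a little arithmetic. The Python sketch below is illustrative only: it assumes 1 GB = 1024 MB, and the function and variable names are made up for this example.

```python
def throughput_mb_per_s(size_gb: float, seconds: float) -> float:
    """Average throughput in MB/s, assuming 1 GB = 1024 MB."""
    return size_gb * 1024 / seconds


SIZE_GB = 37
RECORDS = 300_303_000
runs = {"gpfdist": 24.0, "lotus": 16.85}  # seconds, taken from the table

for name, seconds in runs.items():
    mb_s = throughput_mb_per_s(SIZE_GB, seconds)
    records_s = RECORDS / seconds
    print(f"{name}: {mb_s:.0f} MB/s, {records_s / 1e6:.1f} million records/s")

speedup = runs["gpfdist"] / runs["lotus"]
print(f"lotus finished about {speedup:.2f}x faster than gpfdist in this run")
```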
The top speed of a typical physical 10000 Mbps NIC is about 10000/1.024/1.024/8 = 1192.09 MB per second, so the time needed to transfer 11GB of data at full speed is about 11*1024/1192 = 9.45 seconds.
We used only one 10000 Mbps network card for this benchmark. With ZSTD compression on, the time consumption is around 3.2 seconds, which is about 3 times faster than the full speed of the NIC without compression. The statistics are below.
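The line-rate arithmetic can be reproduced with a short Python sketch. It assumes the nominal link speed is given in megabits per second (10^6 bits) and that 1 MB = 1024 × 1024 bytes, which is what the 1.024 factors imply; the helper names are hypothetical.

```python
def line_rate_mb_per_s(link_mbps: float) -> float:
    """Convert a nominal link speed in Mbit/s to MB/s (1 MB = 1024*1024 bytes)."""
    return link_mbps / 1.024 / 1.024 / 8


def transfer_seconds(size_gb: float, link_mbps: float) -> float:
    """Best-case time to move size_gb (1 GB = 1024 MB) over the link."""
    return size_gb * 1024 / line_rate_mb_per_s(link_mbps)


rate = line_rate_mb_per_s(10_000)
seconds = transfer_seconds(11, 10_000)
print(f"line rate: {rate:.2f} MB/s, 11 GB takes at least {seconds:.2f} s")
```

This is a theoretical ceiling: it ignores protocol overhead, so real uncompressed transfers would take somewhat longer.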
Server | Time cost | Number of segments in gpdb | Number of records | Size of records |
lotus | 3182.563 milliseconds | 64 | 89,000,000 | 11GB |
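To see where the "about 3 times" figure comes from, the effective throughput of this run can be compared with the raw line rate of the card. The sketch below is a back-of-the-envelope check, assuming a 10000 Mbit/s link, 1 GB = 1024 MB, and that the 11GB refers to the uncompressed data; the constant names are illustrative.

```python
LINK_MBPS = 10_000    # nominal link speed, assumed to be in Mbit/s
SIZE_GB = 11          # uncompressed data size from the table
ELAPSED_S = 3.182563  # 3182.563 milliseconds from the table

# Raw line rate in MB/s (1 MB = 1024 * 1024 bytes).
line_rate = LINK_MBPS / 1.024 / 1.024 / 8

# Effective rate measured on uncompressed bytes delivered to the database.
effective = SIZE_GB * 1024 / ELAPSED_S

ratio = effective / line_rate
print(f"effective: {effective:.0f} MB/s, line rate: {line_rate:.0f} MB/s")
print(f"effective throughput is about {ratio:.2f}x the line rate")
```

The ratio can exceed 1 because only the compressed bytes cross the network, while the throughput is counted on the uncompressed data.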