Baical - P7 library

P7 is an open-source, cross-platform library for high-speed delivery of telemetry and trace data from your application with minimal CPU and memory usage. The package includes exhaustive documentation.

Features:

C++/C/C#/Python support
Cross platform (Linux x86/x64, Windows x86/x64)
Speed is the priority: the library and server are designed and optimized for high load. For example, average performance on an Intel i7-870 (a CPU more than 10 years old) is:

Traces:

0.5% CPU, one core: 450 000 traces per second (binary file or network)
100% CPU, one core: ~5.7 million traces per second to network
100% CPU, one core: ~10 million traces per second to binary file

Telemetry:

0.5% CPU, one core: 600 000 telemetry samples per second (binary file or network)
100% CPU, one core: ~6.8 million telemetry samples per second to network
100% CPU, one core: ~11 million telemetry samples per second to binary file

Small memory footprint (configurable, minimum 16 KB), suitable for embedded devices
Thread safe
Unicode support (UTF-8, UTF-32 for Linux, UTF-16 for Windows)
ANSI char support
No external dependencies
High-resolution timestamps (resolution depends on the hardware high-resolution performance counter, usually 100 ns)
Multiple sinks (transports and storage) are supported:

Network (Baical server)
Binary File
Text File (Linux: UTF-8, Windows: UTF-16)
Console
Syslog (RFC 5424)
Auto (Baical server if reachable, otherwise binary file)
Null

File rotation (by size or time)
Maximum file count (old files are deleted automatically)
Remote management from Baical server (set verbosity per module, enable/disable telemetry counters)
Shared memory is used: create your trace and telemetry channels once and access them from any module or class of the process without passing handles
Crash handler; if you use your own crash handler, a special function flushes all buffers of all P7 objects in the process in case of a crash
Trace and telemetry files use a compact binary format (because of speed requirements; binary files are much more compact than raw text); export to text is available
Configuration via command-line arguments may be used in addition to application parameters
Big/Little endian support
Supported architectures: Intel/AMD, ARM, MCST, PowerPC, Baikal T1, etc.
Supported compilers: GCC, VC++, Clang
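The file rotation and maximum file count settings can be pictured as a small policy object. This is a minimal sketch under stated assumptions, not P7's implementation or configuration API: `RotationPolicy`, the `log_N.bin` naming scheme and the size-only trigger are hypothetical, and the class only tracks file names instead of touching the disk.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical size-based rotation policy with a maximum file count.
// It only tracks names; a real sink would also open, close and delete files.
class RotationPolicy {
public:
    RotationPolicy(std::size_t maxBytes, std::size_t maxFiles)
        : maxBytes_(maxBytes), maxFiles_(maxFiles) {}

    // Accounts for a record of `bytes` bytes and returns the names of the
    // files that must be deleted because the file count limit was exceeded.
    std::vector<std::string> Write(std::size_t bytes) {
        std::vector<std::string> toDelete;
        if (files_.empty() || currentSize_ + bytes > maxBytes_) {
            // Rotation by size: start a new file.
            files_.push_back("log_" + std::to_string(nextIndex_++) + ".bin");
            currentSize_ = 0;
            // Max count: drop the oldest files once the limit is exceeded.
            while (files_.size() > maxFiles_) {
                toDelete.push_back(files_.front());
                files_.pop_front();
            }
        }
        currentSize_ += bytes;
        return toDelete;
    }

    const std::deque<std::string>& Files() const { return files_; }

private:
    std::size_t maxBytes_;
    std::size_t maxFiles_;
    std::size_t currentSize_ = 0;
    std::size_t nextIndex_ = 0;
    std::deque<std::string> files_;
};
```

Rotation by time would follow the same pattern, with the size check replaced by a check on the age of the current file.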

Internally, P7 has a very simple design and consists of a few sub-modules:

Channel - a named data channel used to wrap user data into the internal P7 format. The following channel types are currently available:

Telemetry
Trace

Sink - a module that delivers serialized data from channels to the final destination. The following types are currently available:

Network - delivers data directly to the Baical server
Binary File - writes all user data into a single binary file
Text File - writes all user data into a single text file (Linux: UTF-8, Windows: UTF-16)
Console - writes all user data to the console
Syslog - writes all user data to a UDP socket using the Syslog format
Auto - delivers data to the Baical server if it is reachable, otherwise to a file
Null - drops all incoming data, saving CPU for the hosting process
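The Auto sink's fallback rule can be modeled as a sink that forwards to one of two other sinks. This is a hypothetical sketch, not P7 code: `Sink`, `NullSink`, `MemorySink` and `AutoSink` are illustrative names, reachability is passed in as a flag, and the choice is made once at construction; whether P7 re-checks the server later is not covered here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sink interface (not the P7 API): receives serialized packets.
struct Sink {
    virtual ~Sink() = default;
    virtual void Deliver(const std::string& packet) = 0;
};

// Null sink: drops everything, costing almost no CPU.
struct NullSink : Sink {
    void Deliver(const std::string&) override {}
};

// Stand-in for a network or file sink: keeps packets in memory.
struct MemorySink : Sink {
    std::vector<std::string> packets;
    void Deliver(const std::string& packet) override { packets.push_back(packet); }
};

// Auto sink: forwards to the server sink if the server was reachable when
// the sink was created, otherwise to the file sink.
class AutoSink : public Sink {
public:
    AutoSink(bool serverReachable, Sink& server, Sink& file)
        : target_(serverReachable ? server : file) {}

    void Deliver(const std::string& packet) override { target_.Deliver(packet); }

private:
    Sink& target_;  // chosen once, at construction
};
```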

Client - the core module; it aggregates a sink and channels and manages them. Every client object can handle up to 32 independent channels.

Let's take an example (diagram below): an application has to write 2 independent log (trace) streams and 1 telemetry stream and deliver them directly to Baical. The initialization sequence is:

First, create a P7 client and specify the sink parameters "/P7.Sink=Baical /P7.Addr=127.0.0.1"
Using the client, create:

the first trace channel, named "Core"
the second trace channel, named "Module A"
the telemetry channel, named "CPU, MEM"
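The structure of this example (one client, one sink configured from an argument string, three named channels, a limit of 32 channels) can be mirrored by a toy model. `ParseArgs`, `ToyClient` and `ChannelType` are hypothetical and are not the P7 API; in P7 itself the client is created with `P7_Create_Client`. The sketch assumes the argument string is a space-separated list of `/Key=Value` tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Splits "/Key=Value /Key2=Value2" into a key -> value map (assumed syntax).
std::map<std::string, std::string> ParseArgs(const std::string& args) {
    std::map<std::string, std::string> result;
    std::istringstream in(args);
    std::string token;
    while (in >> token) {  // tokens are separated by whitespace
        std::size_t eq = token.find('=');
        if (token[0] != '/' || eq == std::string::npos) continue;  // skip malformed
        result[token.substr(1, eq - 1)] = token.substr(eq + 1);
    }
    return result;
}

enum class ChannelType { Trace, Telemetry };

struct Channel {
    std::string name;
    ChannelType type;
};

// Hypothetical client model: one sink configuration plus named channels.
class ToyClient {
public:
    static constexpr std::size_t kMaxChannels = 32;  // limit stated in the text

    explicit ToyClient(const std::string& args) : options_(ParseArgs(args)) {}

    // Returns the new channel's index, or -1 if all 32 slots are taken.
    int AddChannel(const std::string& name, ChannelType type) {
        if (channels_.size() >= kMaxChannels) return -1;
        channels_.push_back({name, type});
        return static_cast<int>(channels_.size() - 1);
    }

    std::string Option(const std::string& key) const {
        auto it = options_.find(key);
        return it == options_.end() ? std::string() : it->second;
    }

    std::size_t ChannelCount() const { return channels_.size(); }

private:
    std::map<std::string, std::string> options_;
    std::vector<Channel> channels_;
};
```

Usage follows the example: construct `ToyClient` with `"/P7.Sink=Baical /P7.Addr=127.0.0.1"`, then add `"Core"` and `"Module A"` as trace channels and `"CPU, MEM"` as a telemetry channel.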

From a software engineer's point of view, a trace is a single line of source code (a function with a variable argument list):

...
P7_TRACE(0, TM("Test trace message #%d"), 0);
...

And on the other side it looks like this:

It is very similar to logging, but unlike logging, trace gives you much more freedom: you don't have to choose which information to write. You may write everything (without impacting application performance, for example 50K traces per second at 0.5% CPU; for details see the Speed test chapter in the documentation) and then, during a debugging session, use the flexible filtering engine to find the interesting parts. This way you can be sure that all the necessary information is available.

This approach is possible thanks to P7's performance. The trace module was designed with performance in mind, especially for small embedded systems.

To send this much information, the following optimizations are used:

Duplicated information is not delivered and recorded every time: the heaviest text fields [Format string, Function name, File name, File line number, Module ID] are delivered and recorded only once, on the first call (this information is transmitted once again when a new connection is established)
The trace string is not formatted on the client side: formatting variable arguments is a heavy operation, so it is done on the server side on request
Only the changing data is delivered for every subsequent trace call [variable arguments, sequence number, time with 100 ns granularity, current thread, processor core number]
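The three optimizations can be illustrated with a toy encoder. This is a sketch under assumptions, not P7's wire format: `Descriptor`, `Record` and `TraceEncoder` are hypothetical, a call site is identified here by file, line and format string, and arguments are kept as raw 64-bit integers so that formatting can be deferred to the receiver.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Heavy, static part of a trace call: sent once per call site.
struct Descriptor {
    std::string format;
    std::string function;
    std::string file;
    int line;
    std::uint16_t moduleId;
};

// What goes "on the wire" in this model.
struct Record {
    bool isDescriptor;             // true: one-time descriptor record
    std::uint32_t descriptorId;    // short ID that replaces the heavy fields
    Descriptor descriptor;         // filled only for descriptor records
    std::uint32_t sequence;        // per-call data from here on
    std::uint64_t timestamp100ns;
    std::vector<std::int64_t> args;  // raw arguments; no formatting on the client
};

class TraceEncoder {
public:
    // First call from a call site: descriptor + data record.
    // Later calls: only the compact data record.
    std::vector<Record> Trace(const Descriptor& d, std::uint64_t ts,
                              std::vector<std::int64_t> args) {
        std::vector<Record> out;
        auto key = std::make_tuple(d.file, d.line, d.format);
        auto it = ids_.find(key);
        if (it == ids_.end()) {
            std::uint32_t id = static_cast<std::uint32_t>(ids_.size());
            it = ids_.emplace(key, id).first;
            out.push_back({true, id, d, 0, 0, {}});
        }
        out.push_back({false, it->second, {}, sequence_++, ts, std::move(args)});
        return out;
    }

    // A new connection: the receiver knows no descriptors yet,
    // so each one is sent once more.
    void OnNewConnection() { ids_.clear(); }

private:
    std::map<std::tuple<std::string, int, std::string>, std::uint32_t> ids_;
    std::uint32_t sequence_ = 0;
};
```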

N.B.: The best performance is provided by the C++ and C interfaces (release build); the C# and Python wrappers are less performant.

From a software engineer's point of view, telemetry is a few lines of source code:

#include "P7_Client.h"
#include "P7_Telemetry.h"

IP7_Client    *hClient    = P7_Create_Client(TM("/P7.Sink=Baical /P7.Addr=127.0.0.1"));
IP7_Telemetry *hTelemetry = P7_Create_Telemetry(hClient, TM("AppStatistics"));
tUINT16        wCpuId     = 0;
tUINT16        wMemId     = 0;
tINT64         llCPU      = 0;
tINT64         llMem      = 0;

//create counters (name, min, max, alarm, on, [out] counter ID)
hTelemetry->Create(TM("System/CPU"), 0, 100, 90, 1, &wCpuId);
hTelemetry->Create(TM("System/Mem(mb)"), 0, 500, 450, 1, &wMemId);

while (/* ... */)
{
    //query current CPU & mem values in the loop ...
    //llCPU = Get_CPU_Utilization();
    //llMem = Get_Mem_Utilization();

    //deliver info
    hTelemetry->Add(wCpuId, llCPU);
    hTelemetry->Add(wMemId, llMem);

    //do something ...
}

//release P7 objects when they are no longer needed
hTelemetry->Release();
hClient->Release();

And on the other side it may look like this (thread cyclograms, buffer sizes, delays, handle counts, etc.):

Telemetry is a simple and fast way to record any dynamically changing values for later or real-time analysis on the Baical server side. You can use it for many things: system statistics (CPU, memory, HDD, etc.), buffer fill levels, thread cyclograms or synchronization, mutexes, network delays, packet sizes, etc. There are plenty of possible use cases.

Some facts about telemetry:

Every telemetry channel can handle up to 65k independent counters
No (or minimal) impact on application performance: on modern hardware (2014), processing one telemetry sample takes only 300 ns (Add(...) -> network -> Baical server -> HDD); that is about 220 000 samples per second with about 1% CPU usage
You can enable or disable counters online from the Baical server, which lets you visualize and record only the necessary data
Every telemetry sample contains a 64-bit signed value and a high-resolution timestamp
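These facts can be summarized by a toy telemetry channel. `Sample` and `ToyTelemetry` are hypothetical names, not P7 types or its file format; the 16-bit counter ID mirrors the `tUINT16` IDs in the example code above (hence the 65K limit), and in this model a counter disabled from the server is simply dropped before it reaches the sink.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One telemetry sample in this model.
struct Sample {
    std::uint16_t counterId;  // 16-bit ID: at most 65 536 counters per channel
    std::int64_t  value;      // 64-bit signed value
    std::uint64_t timestamp;  // high-resolution time stamp (ticks)
};

class ToyTelemetry {
public:
    // Registers a counter; returns false once the 16-bit ID space is exhausted.
    bool Create(const std::string& name, std::uint16_t* id) {
        if (names_.size() > UINT16_MAX) return false;
        *id = static_cast<std::uint16_t>(names_.size());
        names_.push_back(name);
        enabled_.push_back(true);
        return true;
    }

    // Models the remote on/off switch coming from the server.
    void SetEnabled(std::uint16_t id, bool on) { enabled_.at(id) = on; }

    // Samples of unknown or disabled counters are dropped in this model.
    void Add(std::uint16_t id, std::int64_t value, std::uint64_t timestamp) {
        if (static_cast<std::size_t>(id) < enabled_.size() && enabled_[id])
            out_.push_back({id, value, timestamp});
    }

    const std::vector<Sample>& Delivered() const { return out_; }

private:
    std::vector<std::string> names_;
    std::vector<bool> enabled_;
    std::vector<Sample> out_;
};
```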

N.B.: The best performance is provided by the C++ and C interfaces (release build); the C# and Python wrappers are less performant.
