In frames of this issue binary BVH tree produced by building algorithms was collapsed into 4-ary BVH (QBVH).
The BVH traversal code in GLSL was modified to process such trees correctly.
This allows to implore thread coherence, decrease BVH memory consumption (~2 times), and use traversal stack of the half size.
As a result, ray tracing scalability is improved, as well as rendering performance. For various setups, speedup is 12-18%.