About myself
Did Visual Basic and C# very early as a child, learned C++ in 2013. First real projects were videogame modding circa 2017-2019, all C++ with Qt and WinAPI.
Then in 2020 I pivoted to machine learning because I was interested in speech synthesis for creativity. Went from zero Python experience to maintainer of TensorFlowTTS in 2021, where I contributed the then-first open source TTS model that output 44.1 kHz audio.
I went to uni 2019-2021 but dropped out to focus on machine learning, and because it felt like it was limiting me. I made TensorVox (C++ ML TTS app) and began freelancing, mostly just training and slightly modifying models, until 2023, when I started working for Storyteller and went deep into model research itself, releasing Convolutional Attention Consistency and developing techniques like LS-ALiBi.
Now I focus on research in model architecture - and by that I mean mostly Transformers - for GenAI (mostly speech, but doing image and text too). Did you know I worked with GPTs before the ChatGPT hype? I finetuned GPT-J-6B back in 2021 on TPUs from the TPU Research Cloud (back then TensorFlow Research Cloud). Also, check out my Substack blog.
"I am not designed to come second or third. I am designed to win."
— Ayrton Senna
Currently 24 years old. I need compute and smart colleagues; email for business/job offers: nika109021@gmail.com
Projects by Year
2024
ReLUGT activation function Python Machine Learning PyTorch
GLU variants have been the dominant activation function for the feedforward layers in Transformers. Here, inspired by an element of KANs, I propose a parametric ReLU.
Early tests show faster convergence than SwiGLU at the cost of slightly slower steps and more VRAM usage.
Blog post 1 (theory and small test)
Blog post 2 (GPT test)
(WIP) Celestia-TTS Python Machine Learning PyTorch
Most modern TTS models are incapable of modeling erratic, emotional speech; using FastSpeech2 as a base, I find that this is because they are not resilient to time series. I also pile on some of my Transformer upgrades and some others.
It introduces a duration predictor with 2xConv and 1xLSTM, paired with a duration discriminator (not shown in paper). Currently incomplete due to lack of compute.
Audio samples and PDF
WinDiffusion C++ Qt ONNX
Many frontends for running Stable Diffusion models on a local PC exist, but practically all of them are in Python and require ~10GB of libraries installed via conda/pip that can break at any time.
This is a Stable Diffusion/SDXL program written in C++/Qt, without Python dependencies. Easy to install and lightweight, it makes local image generation accessible to people with minimal tech knowledge. Supports text-to-image, img2img and inpainting.
Code on GitHub
2023
Tacotron 2 Convolutional Attention Consistency Python Machine Learning PyTorch
Tacotron 2 is a text-to-speech model, still very popular and one of the first to become so. However, its autoregressive RNN architecture makes it prone to attention problems.
Many techniques have already been applied to mitigate this. Here, leveraging recent advancements, I propose my own, which is more flexible and simple to integrate.
Paper page
2021
TensorVox C++ Qt PyTorch TensorFlow
Few programs for using modern text-to-speech models locally exist, and those that do are in Python and require a bunch of dependencies.
Since I had also gotten into freelancing, training and selling voice models, I needed a program for neural TTS that was lightweight and easy to install and use, so I made one in C++/Qt.
Supports both PyTorch (exported with TorchScript) and TensorFlow (TensorFlow SavedModel), multiple languages and phonemizers (ARPA, GlobalPhone, IPA), includes spectrogram and waveform views, and can do multi-threaded generation.
Almost 200 stars on GitHub, and successfully delivered to all kinds of customers for years while I was freelancing.
Code on GitHub
2019
ZMapCharter C++ Qt SFML
I had an interest in statistics, couldn't find a map charter, and liked the challenge, so I made one.
Supports global mapping via static region textures I took from a game, or custom triangulated shapefiles (a format of my own), and supports importing from .csv or .xlsx files. Uses SFML for rendering.
Download binary
Chess Titans clone C++ Qt SFML Irrlicht 3D
A Chess Titans clone made with C++, using Irrlicht for the 3D and SFML for sound, as well as Qt for a launcher (the game itself uses console arguments). Tested in both Windows and Linux (Kubuntu). The AI is not mine.
Download source and binaries
Gameplay video
2018
ZDEditorRS C++ Qt InterBase 5.x
There's a geopolitical strategy game called SuperPower 2, which supports modding.
For modifying the database, there is a provided tool called "GLEditor"; however, it is extremely glitchy and crashy because it is antiquated, and the dialect of C++ it was written in was already long out of support by that time.
With help from the game maker, who provided the original database interaction code, I rewrote both the interface and the "database interaction" in modern C++.
Since its release, it has become the de facto database modding tool for the SuperPower 2 community.
Code on GitHub
Download binary
2016
Plane Installer C#
Automatic installer for a Microsoft Flight Simulator X plane mod.
Overbranding was because I was 16
Download binary
Open Source Contributions
2020
MFA duration extraction for FastSpeech2
I helped implement Montreal Forced Aligner-based duration extraction for FastSpeech2, eliminating the need for a teacher model. I had just started learning Python, so my code was not used directly.
View PR comment
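As a rough illustration of the general idea (not the code from the PR), a forced aligner gives a start and end time per phoneme, and FastSpeech2 needs an integer number of mel frames per phoneme. The function name, the sample rate, and the hop size below are assumptions chosen for the example. Rounding the cumulative boundaries, rather than each interval on its own, keeps the durations summing to the total frame count.

```python
def intervals_to_durations(intervals, sample_rate=22050, hop_size=256):
    """Convert contiguous (start_sec, end_sec) phoneme intervals to mel-frame counts.

    Hypothetical helper: sample_rate and hop_size are example values and
    must match the mel-spectrogram settings actually used for training.
    """
    if not intervals:
        return []
    frames_per_second = sample_rate / hop_size
    # Frame index where the first phoneme starts.
    prev_frame = int(round(intervals[0][0] * frames_per_second))
    durations = []
    for _start, end in intervals:
        # Round the boundary position, then take the difference, so that
        # rounding errors do not accumulate across phonemes.
        end_frame = int(round(end * frames_per_second))
        durations.append(end_frame - prev_frame)
        prev_frame = end_frame
    return durations


# Three phonemes covering 0.5 seconds of audio.
example = [(0.0, 0.10), (0.10, 0.25), (0.25, 0.50)]
print(intervals_to_durations(example))
```

In practice the result usually still needs a final adjustment so that the sum matches the exact length of the extracted mel spectrogram.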
C++ Support for TensorFlowTTS
I added support for exporting text-to-speech models and running them from C++, using the TensorFlow SavedModel C API.
View PR
2021
Update to C++ support for TensorFlowTTS
Updated the C++ support to remove a phonemizer dependency and replace it with my own, much lighter implementation.
View PR
44.1 kHz pretrained model for TensorFlowTTS
Having made a 44.1 kHz copy of the LJSpeech dataset, I trained a model on it and added the pretrained model to the repo, and introduced a new vocoder configuration: Multi-Band MelGAN + HiFi-GAN discriminator.
View PR