The site I made about me

About myself

Did Visual Basic and C# very early as a child, then learned C++ in 2013. My first real projects were videogame mods circa 2017-2019, all in C++ with Qt and WinAPI.

Then in 2020 I pivoted to machine learning because I was interested in speech synthesis for creativity. I went from zero Python experience to maintainer of TensorFlowTTS in 2021, where I contributed what was then the first open source TTS model that output 44.1 kHz audio.

I went to uni from 2019 to 2021 but dropped out to focus on machine learning, and because I felt it was limiting me. I made TensorVox (a C++ ML TTS app) and began freelancing, mostly training and lightly modifying models, until 2023, when I started working for Storyteller and went deep into model research itself, releasing Convolutional Attention Consistency and developing things like LS-ALiBi.

Now I focus on research in model architecture (by that I mean mostly Transformers) for GenAI (mostly speech, but image and text too). Did you know I worked with GPTs before the ChatGPT hype? I finetuned GPT-J-6B back in 2021 on TPUs from the TPU Research Cloud (back then the TensorFlow Research Cloud). Also, check out my Substack blog.

"I am not designed to come second or third. I am designed to win."

— Ayrton Senna

Currently 24 years old. I need compute and smart colleagues; email for business/job offers: nika109021@gmail.com

Projects by Year

2024

ReLUGT activation function Python Machine Learning PyTorch

GLU variants have been the dominant activation functions for the feedforward layers in Transformers. Here, inspired by an element of KANs, I propose a parametric ReLU.

Early tests show faster convergence than SwiGLU, at the cost of slightly slower steps and more VRAM usage.

Blog post 1 (theory and small test) Blog post 2 (GPT test)

(WIP) Celestia-TTS Python Machine Learning PyTorch

Most modern TTS models are incapable of modeling erratic, emotional speech; using FastSpeech2 as a base, I find that this is because they are not resilient to time series. I also pile on some of my own Transformer upgrades, plus a few from others.

I introduce a duration predictor with two convolutional layers and one LSTM, paired with a duration discriminator (not shown in the paper). Currently incomplete due to lack of compute.

Audio samples and PDF

WinDiffusion C++ Qt ONNX

Many frontends for running Stable Diffusion models on a local PC exist, but practically all of them are in Python and need ~10 GB of libraries installed via conda/pip that can break at any time.

This is a Stable Diffusion/SDXL program written in C++/Qt, without Python dependencies. Easy to install and lightweight, it makes local image generation accessible to people with minimal tech knowledge. Supports text-to-image, img2img and inpainting.

Code on GitHub

2023

Tacotron 2 Convolutional Attention Consistency Python Machine Learning PyTorch

Tacotron 2 is a text-to-speech model, still very popular and one of the first to become so. However, its autoregressive RNN architecture makes it prone to attention problems.

Many techniques have already been applied to help with this. Here, leveraging recent advancements, I propose my own, which is more flexible and simple to integrate.

Paper page

2021

TensorVox C++ Qt PyTorch TensorFlow

Few programs for using modern text-to-speech models locally exist, and those that do are in Python and require a bunch of dependencies.
Since I had also gotten into freelancing, training and selling voice models, I needed (and made) a C++/Qt program for neural TTS that was lightweight and easy to install and use.
Supports both PyTorch (exported with TorchScript) and TensorFlow (TensorFlow SavedModel), multiple languages and phonemizers (ARPA, GlobalPhone, IPA), includes spectrogram and waveform views, and can do multi-threaded generation.

Almost 200 stars on GitHub, and successfully delivered to all kinds of customers for years while I was freelancing.

Code on GitHub

2019

ZMapCharter C++ Qt SFML

I had an interest in statistics, couldn't find a map charter, and liked the challenge, so I made one.
Supports global mapping via static region textures I took off a game, or custom triangulated shapefiles (a format I rolled myself), and supports importing from .csv or .xlsx files. Uses SFML for rendering.

Download binary

Chess Titans clone C++ Qt SFML Irrlicht 3D

A Chess Titans clone made with C++, using Irrlicht for the 3D and SFML for sound, as well as Qt for a launcher (the game itself takes console arguments). Tested on both Windows and Linux (Kubuntu). The AI is not mine.

Download source and binaries Gameplay video

2018

ZDEditorRS C++ Qt InterBase 5.x

There's a geopolitical strategy game called SuperPower 2, which supports modding.

For modifying the database, a tool called "GLEditor" is provided; however, it is extremely glitchy and crashy because it is antiquated; its subset of C++ was already long out of support by that time.
With help from the game maker, who provided the original database interaction code, I rewrote both the interface and the "database interaction" in modern C++.
Since its release, it has become the de facto database modding tool for the SuperPower 2 community.

Code on GitHub Download binary

2016

Plane Installer C#

Automatic installer for a Microsoft Flight Simulator X plane mod.

Overbranding was because I was 16

Download binary

Open Source Contributions

2020

MFA duration extraction for FastSpeech2

I helped implement Montreal Forced Aligner-based duration extraction for FastSpeech2, eliminating the need for a teacher model. I had just started learning Python, so my code was not used directly.
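As a rough illustration of the idea (not the code from the PR), here is a minimal sketch of turning forced-aligner phoneme intervals into the per-phoneme frame counts FastSpeech2 trains on. The function name `intervals_to_durations`, the default sample rate and the hop size are all assumptions made for this example; it also assumes the intervals are contiguous and start at 0, with silences given their own tokens.

```python
def intervals_to_durations(intervals, sample_rate=22050, hop_size=256):
    """Convert (start_sec, end_sec) phoneme intervals into frame counts.

    Hypothetical helper: rounds each interval's END boundary to a frame
    index and takes differences, so the durations sum to the frame index
    of the last boundary instead of drifting from per-interval rounding.
    """
    durations = []
    prev_frame = 0
    for _start, end in intervals:
        # Frame index of this phoneme's end boundary.
        end_frame = int(round(end * sample_rate / hop_size))
        # Clamp at zero in case a very short phoneme rounds backwards.
        durations.append(max(end_frame - prev_frame, 0))
        prev_frame = max(end_frame, prev_frame)
    return durations


# Made-up alignment for three phonemes, in seconds.
example = [(0.0, 0.12), (0.12, 0.30), (0.30, 0.55)]
print(intervals_to_durations(example))  # [10, 16, 21]
```

Rounding the cumulative boundaries rather than each interval's length keeps the total aligned with the mel-spectrogram length, which matters because the length regulator expands phonemes by exactly these counts.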

View PR comment

C++ Support for TensorFlowTTS

I added support for exporting text-to-speech models and running them from C++, using the TensorFlow SavedModel C API.

View PR

2021

Update to C++ support for TensorFlowTTS

Updated the C++ support to remove a phonemizer dependency and replace it with my own, much lighter implementation.

View PR

44.1 kHz pretrained model for TensorFlowTTS

Having made a 44.1 kHz copy of the LJSpeech dataset, I trained a model on it, added the pretrained model to the repo, and introduced a new vocoder configuration: Multi-Band MelGAN + HiFi-GAN discriminator.

View PR