Optimized the log-processing algorithm, achieving a tenfold speedup.
Researched parallelization using CUDA.
At the moment I'm working on a distributed, high-load, fault-tolerant message-processing system. We use Scala as the primary development language, in particular the Akka framework. The project is built on the Kafka messaging system.
My goal was to determine why users perform one specific action. During the project I used MapReduce (a custom C++ library), Dremel (the basis for BigQuery, an SQL-like big-data query system) and my analytical skills. As a result, I identified several behavioural models that answer the main question of the project. Some of them were obvious, and some were surprising to everybody. At the end I presented my results to other offices.
Mined data from user logs using MapReduce (mostly a custom Python library). Extracted the most frequent object titles and grouped the titles that name the same object, applying statistical theory. Designed and implemented a backend that serves a random feed of pages to different users. The feed was varied across objects and infinite but stable, i.e. the i-th element of the feed was always the same for a given user. The backend was designed to be distributed.
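One way to get an "infinite but stable" feed is to derive each element from a hash of the user and the position instead of storing the feed. The sketch below illustrates that idea only; it is not the original implementation, and the names `feed_item`, `feed` and the sample page list are hypothetical.

```python
import hashlib
from itertools import islice
from typing import Iterator, Sequence


def feed_item(user_id: str, index: int, pages: Sequence[str]) -> str:
    """Return the page at position `index` of the feed for `user_id`.

    Assumption: `pages` is a fixed, non-empty catalogue shared by all nodes.
    """
    # Hashing (user_id, index) gives a choice that looks random
    # but is reproducible for the same user and position.
    digest = hashlib.sha256(f"{user_id}:{index}".encode("utf-8")).digest()
    return pages[int.from_bytes(digest[:8], "big") % len(pages)]


def feed(user_id: str, pages: Sequence[str]) -> Iterator[str]:
    """Lazily yield the endless feed for one user."""
    index = 0
    while True:
        yield feed_item(user_id, index, pages)
        index += 1


if __name__ == "__main__":
    sample_pages = ["page-a", "page-b", "page-c", "page-d"]  # hypothetical data
    print(list(islice(feed("user-42", sample_pages), 5)))
```

Because each element depends only on the user, the position and the catalogue, any backend node can answer a request for the i-th element without shared state, which fits a distributed design.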
Developing the backend for the project. We use a REST API. Implemented several new features (comments, ratings). Refactored the code base, reducing the business-logic code by more than half. Improved readability.
Distributed systems: Apache Kafka, Apache ZooKeeper with Apache Curator, Apache Cassandra, MapReduce libraries, Aerospike, Samza
Used in work: Apache Curator, Graphite, CUDA, ANTLR4, Samza, MapReduce, LaTeX, Unix utils, nginx
Version control systems: Git, Mercurial
Scala frameworks: Akka
C# frameworks: MVC, Entity Framework
Python frameworks: Django, Django REST framework
CSS frameworks: Bootstrap
Data storage systems: Apache Kafka, Apache ZooKeeper, MySQL, Redis, Aerospike
Build tools: make, gradle, maven, grunt
Serialization: JSON, Protocol buffers (protobuf), Apache Avro
Cloud platforms: Google Compute Engine, Amazon Elastic Compute Cloud, Google Cloud Storage
Algorithmic problem solving, designing distributed fault-tolerant systems