Mark Andreev (Senior Software Engineer)

// Contacts mark.andreev@gmail.com Linkedin Github
// CV: English

// Blog

Deep dive into Apache Parquet Format
How to test Java code structure in unit tests (ArchUnit)
Java Two Way SSL Client (+ Spring example)
[ All snippets, External publications ]

// Talks

"ML in production" at the FunTech ML-meetup
"Streaming vs Batching" at the Conundrum Meetup

// Projects

[App] fix parser
[App] .gitignore generator
[Demo] Time series player
[Demo] Offline lock with Redis
Bigdata Indicators
Tornado-swagger
EnduringNet

// Certificates

CockroachDB Query Performance for Developers
Redis for Java Developers
ScyllaDB. Data Modeling and Application Development
AWS Well-Architected Training
Deep Dive into AWS S3, Glacier, EFS
Deep Dive on Container Security

// Development Stack

Java. Spring: MVC, Data, AMQP, Kafka, Integration, Batch, Security, State machine, Apache Camel, Vert.x, GraalVM.
Python. Pandas, Scikit-learn, XGBoost, LightGBM, Catboost, Matplotlib, Tornado, FastAPI, Flask.
Data Processing. Spark, Flink, Cassandra, Hadoop, Kafka, PostgreSQL.
Third party. Oracle Database, PostgreSQL, Clickhouse, Kafka, Keycloak, MongoDB, RabbitMQ, Redis, Prometheus, Docker, Kubernetes, Helm, Linux, Airflow.
Cloud. AWS {EC2, S3, RDS, CloudFront, SQS, SNS, Lambda, Batch, IAM, Registry}; Azure {VM, BLOB, Registry}.

// Experience summary

Senior Software Engineer specializing in optimizing ML systems and MLOps with Java, Python, Kubernetes, Spring, and Vert.x, including feature stores built on Kafka & Clickhouse. Also proficient in PostgreSQL and Redis. Key results include a 3x speedup of one of the most popular feature store queries, a 5x reduction in CPU load for a Kafka subscription proxy, and a 15x increase in query speed after migrating the feature store to Kafka & Clickhouse. I bring a deep understanding of trade-offs, balancing performance gains against resource efficiency.

Active contributor to open-source projects such as Apache Spark, Apache Airflow, Apache Camel, Apache Ignite, Keycloak, Clickhouse and Tornado Swagger, with contributions ranging from a fix for header override by the Azure Storage Blob download consumer to features such as a target encoding preprocessor and CatBoost inference integration.

Armed with a Master’s Degree in applied mathematics and computer science from Lomonosov Moscow State University, I am passionate about driving innovation in machine learning technologies.

// Experience

Man Group plc, 2024 - now

Senior Software Engineer, May 2024 - now

[Java, Python, Oracle Database, Kafka, RabbitMQ, Spring]

Conundrum.ai, 2017 - 2024

Senior Software Engineer, Sep 2022 - May 2024

Implement low-level optimizations for the feature store on top of Kafka & Clickhouse (projections, application-level query planner) | Speed up one of the most popular queries 3x
Create performance regression tests (Gatling) | Cover 80% of queries to prevent major performance degradations
Create performance optimizations for the Kafka subscription proxy (Java 21 virtual threads, shared subscription) | Decrease CPU load 5x
Implement security improvements for the platform (audit, L4 network policies, L7 network filter) | Apply information security requirements at the network level
Cover the platform's services with health and performance metrics (Prometheus, Grafana, Alerts) | Decrease issue investigation time 3x

[Java, Python, Query optimizations, Load tests, Kafka, Clickhouse, Kubernetes, Feature store, Spring, Vert.x, GraalVM]

Senior Software Engineer, Sep 2019 - Aug 2022

Migrate the feature store to Kafka & Clickhouse (columnar OLAP DB) | Increase query speed 15x
Create low-level connectors for industrial data exchange formats (MQTT, OPC UA, Historian) | Decrease CPU load on the exchange server 3x
Migrate the model serving runtime to Kubernetes (KubeAPI, Helm)
Deploy the platform to AKS (Azure Cloud) & K3s (on-premises, no internet access)

[Java, Python, Kubernetes, Helm, Kafka, Clickhouse, Azure, AKS, OLAP, Feature store, Spring]

Middle Software Engineer, Nov 2017 - Sep 2019

Create a feature store for sensor time series data (Java, Spring, PostgreSQL, TimescaleDB)
Create model serving runtime server (Python, Processes)
Create incident management service (Java, Spring, State machines)
Create ETL based on S3, SQS, S3 SFTP

[Java, Spring, Python, AWS, ETL, Feature store, PostgreSQL, TimescaleDB, State machines, S3, SQS, S3 SFTP]

Junior Machine Learning Engineer, May 2017 - Oct 2017

Airline data clustering. Create an approach to data splitting for offline A/B tests
Telecom data churn. Create a solution for offline churn scoring based on telecom activity data
Web data gender detection. Create a solution for offline gender detection based on web activity
Mobile data geo analysis. Create reports on geo activity based on mobile location data
Industrial time series data. Create a data pipeline for failure prediction

[Python, PySpark, Spark, SQL, Scikit-learn, Pandas, XGBoost]

Big Data Indicators Internship, Oct 2016 - May 2017

Machine Learning Engineer

Create data collection & processing pipeline
Use topic models to discover trends
Create sentiment analysis models for trend prediction

[Data mining, Python, MongoDB, Redis, Machine learning, NLP, Topic models]

// Education

Lomonosov Moscow State University
Master's degree. Computational Mathematics and Cybernetics

Moscow Power Engineering Institute
Bachelor's degree. Institute of automatics and computer science

// Open Source Contributions

Apache Spark
- [SPARK-49044][SQL] ValidateExternalType should return child in error.
- [SPARK-49490][SQL] Add benchmarks for initCap.
- [SPARK-49549][SQL] Assign a name to the error conditions _LEGACY_ERROR_TEMP_3055, 3146.
Apache Airflow
- [AIRFLOW-43853] Add logging support for init containers in KubernetesPodOperator (#42498)
- [AIRFLOW-43847] Add random_name_suffix to SparkKubernetesOperator (#43800) (#43847)
- [AIRFLOW-43840] Fix logs with leading spaces in the Docker operator (#33692) (#43840)
- [AIRFLOW-27553] Add ipc_mode for DockerOperator (#27553)
Apache Ignite
- [IGNITE-13713] Implemented target encoding preprocessor.
- [IGNITE-13714] Implemented catboost inference integration.
- [IGNITE-13386] Implemented new distances (BrayCurtis, Canberra, JensenShannon, etc.).
Keycloak
- [KEYCLOAK-19743] Fix null username in ldap.
Apache Camel
- [CAMEL-16092] Add fix for header override by Azure Storage Blob download consumer.
Clickhouse
- [ClickHouse-37228] Remove group.id from StorageKafka::createWriteBuffer.
Vaadin Flow
- [FLOW-20058] fix: add nodeVersion in gradle plugin settings.
Tornado Swagger
- [REPOSITORY] Swagger API Documentation builder for tornado server.

// Publications

A New Approach to Determining the Attitude of Authors of Short Texts to the Topics Discussed in the Texts on the Example of Estimating the Inflations Expectations, Oct 2017. DATA ANALYTICS AND MANAGEMENT IN DATA INTENSIVE DOMAINS, DAMDID / RCDL’2017. Andreev M.

Big Data approach to measure inflation expectations: the case of the Russian economy, Jul 16, 2017. IFABS 2017 Oxford Conference. Goloshchapova I., Andreev M.

Measuring inflation expectations of the Russian population with the help of machine learning. Voprosy Ekonomiki. 2017;(6):71-93. (In Russ.). Goloshchapova I., Andreev M.
