👋 Welcome to Mind Hacking

Hello there. I'm thrilled to share my learning notes in this blog.

Deep Signal, Part 1: Problem Statement

DeepSignal is an innovative open-source framework designed to redefine real-time video and audio processing on the Apache Spark platform.

Feb 18, 2024 · 5 min · Dima Statz

Aedo: Real-Time Content-Driven Ad Insertion Framework

ChatPal is a new VR game for Oculus, blending language learning with interactive fun. Inspired by Tamagotchi, kids engage with a virtual parrot to improve English skills through conversations and activities. ...

Feb 11, 2024 · 5 min · Dima Statz

ChatPal: English Learning Enhanced with VR

ChatPal is a new VR game for Oculus, blending language learning with interactive fun. Inspired by Tamagotchi, kids engage with a virtual parrot to improve English skills through conversations and activities. ...

Feb 9, 2024 · 5 min · Dima Statz

AI-Driven Ventures: A Framework for Developer Success

AI is a general-purpose technology, meaning it is not useful for just one thing but can be applied to lots of different applications. A good way to think about AI is as a collection of tools: Supervised Learning, ...

Sep 18, 2023 · 5 min · Dima Statz

Semantic Kernel: Chat With Your Data

LLMs are amazing at generating text and have a wide range of applications, but they're not a substitute for domain-specific knowledge and expertise.

Sep 14, 2023 · 5 min · Dima Statz

Building Virtual Assistants using LangChain

ChatGPT has impressive general knowledge and can provide decent answers to various questions. However, when it comes to specific domains, its performance may fall short.

Aug 1, 2023 · 5 min · Dima Statz

Visum: A Cloud Cost Optimization Platform

The worldwide infrastructure as a service (IaaS) market grew 41.4% in 2021, to total $90.9 billion, up from $64.3 billion in 2020. It is expected to be as high as $121.62 billion in 2022.

Sep 28, 2022 · 5 min · Dima Statz

Monitoring Spark Streaming on K8s with Prometheus and Grafana

Cost efficiency and portability are the main reasons to migrate Apache Spark workloads from managed services like AWS EMR, Azure Databricks, or HDInsight to Kubernetes. You can learn more about the migration process from AWS EMR to K8s in the following article.

May 10, 2021 · 5 min · Dima Statz

Benchmarking Graviton2 processors with Apache Spark workloads

Amazon EC2 provides a broad portfolio of compute instances, including many that are powered by the latest-generation Intel and AMD processors. AWS Graviton2 processors add even more choice. AWS Graviton2 processors are custom-built by AWS using 64-bit Arm Neoverse cores to enable the best price-performance for workloads running on Amazon EC2.

Dec 28, 2020 · 7 min · Dima Statz

Processing costs measurement on multi-tenant EMR clusters

One of the 5 pillars of the Well-Architected Framework is Cost Optimization. The Cost Optimization pillar focuses on avoiding unnecessary costs, selecting the most appropriate resource types, analyzing spend over time, and scaling in or out to meet business needs without overspending.

Nov 14, 2020 · 6 min · Dima Statz

Migrating Apache Spark workloads from AWS EMR to Kubernetes

ESG research found that 43% of respondents are considering the cloud as their primary deployment for Apache Spark. This makes a lot of sense because the cloud provides scalability, reliability, availability, and massive economies of scale.

Sep 30, 2020 · 6 min · Dima Statz

Monitoring the performance of software teams using GitHub, Jira, and Grafana

There are a bunch of good articles on the web about transitioning to fully remote work; my favorite one is “The Remote Manifesto” by GitLab. In addition, if you are somewhat like us and are trying to build a data-driven team, you will probably need some good metrics to rely on in order to monitor your team’s performance.

Jul 9, 2020 · 6 min · Dima Statz

Monitoring Distributed Jetty Servers in K8s using Prometheus and Grafana

Monitoring and alerting is a mandatory part of any software system running in a production environment. To keep software systems healthy, to optimize performance and resource utilization, you need a unified operational view, real-time granular data, and historical reference.

May 28, 2020 · 6 min · Dima Statz

No-Code Data Collect API on AWS

This article is all about moving data into Big Data Pipelines running on AWS. Since most data pipelines have 5 steps in common: collection -> storage -> processing -> analysis -> visualization, AWS has a very solid foundation for building all these steps.

May 17, 2020 · 9 min · Dima Statz

Handling Data Skew in Apache Spark

One of the well-known problems in parallel computational systems is data skewness. Usually, in Apache Spark, data skewness is caused by transformations that change data partitioning like join, groupBy, and orderBy.

Apr 30, 2020 · 6 min · Dima Statz

VerticaDB performance test with Locust.io

Locust is all about coding. You can manage all your tests in source control, share them with your team, easily add, remove, or fix any test, and automatically deploy them to any environment.

Feb 3, 2020 · 6 min · Dima Statz

No-Code Data Collect API

Building a data pipeline that handles 1,000,000 or more events per second is not a trivial task. To handle such heavy traffic, all data pipeline components should be designed and implemented properly. Fortunately, not all data pipeline components need to be built from scratch.

Dec 5, 2019 · 8 min · Dima Statz

An honest AWS MSK review - July 2019

AWS MSK is a fully managed service that enables you to build and run applications that use Apache Kafka to process streaming data. Amazon MSK makes it easy to ingest and process streaming data in real time ...

Jul 21, 2019 · 7 min · Dima Statz

A Scala tutorial for Java developers

Scala was first introduced in January 2004 by Martin Odersky. It is a JVM-based, statically typed programming language. Scala supports both object-oriented and functional programming paradigms. The most well-known products written in Scala are Apache Spark, Apache Kafka, Apache Flink ...

Apr 20, 2019 · 8 min · Dima Statz