Mathematics is the foundation on which machine learning is built. We will refresh a few basic concepts to get started.
A vector is a construct that represents both a direction and a magnitude.
Algebraically, a vector is the collection of coordinates that a point has in a given space. Geometrically, a vector is a ray that connects the origin to that point.
The following figure shows examples of vectors in two-dimensional Euclidean space.
The L2 norm calculates the distance of a vector's coordinates from the origin of the vector space. It is also known as the Euclidean norm, as it is calculated as…
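As a quick sketch, the L2 norm is the square root of the sum of squared coordinates; the function name below is ours:

```python
import math

def l2_norm(v):
    """Euclidean (L2) norm: square root of the sum of squared coordinates."""
    return math.sqrt(sum(x * x for x in v))

print(l2_norm([3.0, 4.0]))  # 5.0 -- the classic 3-4-5 right triangle
```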
Apache Storm is a free and open source distributed realtime computation system. Apache Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.
An Apache Storm cluster runs topologies, which process messages forever.
A Storm cluster consists of two kinds of nodes, a master node and worker nodes, coordinated through a ZooKeeper ensemble.
The goal of automated decision-making is to train a machine so that it can make decisions for us. This can be achieved either by an expert system or by machine learning (ML).
An expert system is a computer system that emulates the decision-making ability of a human expert.
Expert systems are also known as rule-based systems. They emulate how a human makes a decision: humans look at inputs (features) and, based on previous experience, decide the output. For example, an expert system for disease diagnosis will use a fact database together with if-then rules to infer the disease from symptoms.
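The if-then style of inference can be sketched in a few lines of Python. The rules and symptom sets below are invented purely for illustration (and are certainly not medical advice):

```python
# A toy rule-based diagnoser: each rule pairs a set of required
# symptoms (the "if" part) with a conclusion (the "then" part).
RULES = [
    ({"fever", "cough", "fatigue"}, "flu"),
    ({"sneezing", "runny nose"}, "common cold"),
]

def diagnose(symptoms):
    """Return the first conclusion whose required symptoms are all present."""
    for required, disease in RULES:
        if required <= symptoms:  # rule fires when all conditions hold
            return disease
    return "unknown"

print(diagnose({"fever", "cough", "fatigue", "headache"}))  # flu
```

A real expert system separates the fact database and inference engine more cleanly, but the core idea is the same: match facts against condition-action rules.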
For many well…
EC2 is one of the most popular AWS offerings. It provides the capabilities of:
Note: Restrict the certificate file's permissions so only its owner can read it (chmod 400 certificate_file); ssh refuses private keys that are accessible to others.
Using the certificate location with the CLI:
ssh -i ~/certificates/aws.pem ec2-user@ec2-instance-url
Alternatively, add the configuration to ~/.ssh/config.
Now use the Host name to ssh
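A minimal ~/.ssh/config entry might look like the following; the host alias my-ec2 is a hypothetical name of our choosing:

```
Host my-ec2
    HostName ec2-instance-url
    User ec2-user
    IdentityFile ~/certificates/aws.pem
```

With this entry in place, `ssh my-ec2` is equivalent to spelling out the identity file, user, and host on the command line.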
It is possible to bootstrap the instance using a user data script. Bootstrapping means launching commands when a machine starts. This script…
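As an illustration, user data is typically a shell script run as root on the instance's first boot. This hypothetical example (the original script is elided above) installs and starts a web server on Amazon Linux:

```shell
#!/bin/bash
# Runs as root on first boot of the instance.
yum update -y
yum install -y httpd
systemctl enable --now httpd
echo "Hello from $(hostname -f)" > /var/www/html/index.html
```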
Initially, big data meant collecting huge volumes of data and processing them in smaller, regular batches using distributed computing frameworks such as Apache Spark. Changing business requirements then demanded results within minutes or even seconds.
This requirement is met by running the jobs at smaller intervals (micro-batches) matching the desired result latency. Several problems arise with smaller intervals: whether to process all the data to produce each result, which is inefficient, or to process incrementally and merge the new results with the earlier ones; and how to ensure that records within an interval are available…
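The incremental option can be sketched in a few lines of Python: keep a running aggregate and merge each micro-batch into it, instead of recomputing over all data seen so far. The word-counting use case and the names here are ours:

```python
from collections import Counter

def process_micro_batch(running: Counter, batch):
    """Merge one micro-batch of records into the running result,
    rather than reprocessing everything from the beginning."""
    running.update(batch)
    return running

totals = Counter()
for batch in [["a", "b", "a"], ["b", "c"]]:  # two micro-batch intervals
    process_micro_batch(totals, batch)
print(totals["a"], totals["b"], totals["c"])  # 2 2 1
```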
Initially, software applications were small and could be deployed on a single computer. Over time, the volume of data processed by these applications grew, and with it the requirements for storage and computing power. These requirements were met by rapid advancements in storage and compute hardware: larger disks and faster CPUs. This way of scaling is termed vertical scaling, and it soon became costly once applications started being consumed over the Internet.
An overview of using Golang modules with CircleCI, which is a continuous integration and delivery platform.
Go modules are Go's dependency management system. They make dependency version information explicit and easier to manage, let you work from any directory rather than only from GOPATH, and allow installing specific versions of a dependency package to avoid breaking changes. The go.mod file lists the project's dependencies so that they need not be distributed with the package.
A module is like a package that you can share with other people. …
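A minimal go.mod might look like the following; the module path is hypothetical, and the pinned version is just an example:

```
module example.com/hello

go 1.21

require github.com/google/uuid v1.3.0
```

Running `go mod init example.com/hello` creates the file, and `go get` or `go mod tidy` fills in the require lines.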
Virtualization solutions allow multiple operating systems and applications to run in independent partitions on a single computer. Using virtualization capabilities, one physical computer system can function as multiple “virtual” systems.
Virtualizing a platform implies more than processor partitioning: it also covers the other important components that make up a platform, e.g. storage, networking, and other hardware resources. In this article we will look at how some of these components are virtualized, specific to the Intel platform.
ISA (Industry Standard Architecture) was the first standard for buses connecting peripheral devices to the CPU. ISA was developed for 16-bit machines and did its job pretty well…
Here is a typical architecture comprising Sources, Sinks, a Connect Cluster, a Kafka Cluster, and Kafka Streams applications.
A Kafka cluster is made up of multiple brokers. The sources hold the data we want to get into the Kafka cluster. In between sits the Connect cluster, which is made up of multiple workers. A worker pulls data from a source using the logic embedded in a connector, specified along with its corresponding configuration, and then pushes this data to the Kafka cluster.
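For example, a worker can be told to run Kafka's built-in FileStreamSource connector with a small properties configuration; the connector name, file path, and topic below are hypothetical:

```
name=local-file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=/tmp/input.txt
topic=connect-test
```

Each line appended to /tmp/input.txt would then be published as a record to the connect-test topic.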
Now, the data may need to be transformed via mapping, aggregation, joins, etc. This is done by using…
Apache Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable. Kafka stores streams of records (messages) in topics. Each record consists of a key, a value, and a timestamp. Producers write data to topics and consumers read from topics.
A topic refers to a particular stream of data; it is similar to a table in a database. A topic is identified by its name.
For each topic, the Kafka cluster maintains a partitioned log. Topics are split into partitions. …
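Records with the same key always land in the same partition. The assignment can be sketched as a hash of the key modulo the partition count; the sketch below uses CRC32 as a simplified stand-in (Kafka's default partitioner actually uses murmur2):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Simplified key-based partition assignment: the same key
    always maps to the same partition for a fixed partition count."""
    return zlib.crc32(key) % num_partitions

# The same key is always routed to the same partition.
p1 = partition_for(b"user-42", 6)
p2 = partition_for(b"user-42", 6)
print(p1 == p2)  # True
```

This is why keyed records preserve per-key ordering: all of a key's records go through one partition.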