Posts

Showing posts from 2018

Plotting Choropleths with Shapefiles in R- ggplot2 tutorial

We all love nicely colored maps that convey characteristics of the data. Many libraries provide built-in maps that can be rendered with one-liners. This post instead builds maps, or choropleths, from shapefiles available on the internet. The clear advantage is that we can build any kind of map, be it a country, a state, or a region within a district or city. The dataset I'm experimenting with here is available at Kaggle. It covers the demographics of the top 500 cities of India, with details such as adult and child populations, sex ratio, literacy figures, and graduate population figures for each city. Through the course of this post, I'll explore the various R functions and libraries that have been brought together, while sharing some interesting facts about the dataset and, yes, all through choropleths. What are choropleths and shapefiles anyway? Accord...

HBase Setup: Standalone Mode on windows machine

HBase is a popular NoSQL database that runs over HDFS. It is a column-oriented database based on Google's BigTable. It can be set up in three modes: Standalone mode (without HDFS), Pseudo-distributed mode (with single-node HDFS), and Fully-distributed mode. Of the three, for now it can be set up on Windows only in standalone mode. But this mode is enough to play around and get familiar with it. It is quite easy to set up, but there are a few caveats worth knowing when troubleshooting, especially if you are new to HBase. I'll mention them in red as I go. So, let's get started with the basic setup: The first step is to download the stable version (found in the folder named "stable" at the mirror location) of HBase as a tarball from the website and unzip it into a suitable location. From the suggested mirror site, look for the hbase- - bin.tar.gz. The next thing to ensure is that Java is installed on your system and JAVA_HOME is set. To verify that open comm...

Multiprocessing in Python: with Windows machine

Multiprocessing generally involves two approaches: via threads or via multiple processes. Although multiple processes come with the overhead of combining the results from the various processes, they make better use of the multiple CPU cores that are commonplace these days. There are three main elements to executing this concept: dividing the workload into logical chunks, passing the chunks to individual processes, and compiling the output from each of those processes. There are various libraries available in Python for achieving real parallel programming, one popular choice being multiprocessing. Various blog posts on the internet describe the nitty-gritty of working with this library (I have listed a couple in the references section). However, this post explores two other choices we can easily use to achieve the same. concurrent.futures: This module provides an interface for asynchronously executing callables. A ...
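The three elements named in the excerpt above (divide, distribute, combine) can be sketched with the standard-library concurrent.futures module. This is a minimal illustration under stated assumptions, not the code from the post itself: the helper names split_into_chunks, sum_of_squares and parallel_sum_of_squares, and the sum-of-squares workload, are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Divide items into at most n_chunks contiguous, near-equal slices."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional item each.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work done by one worker on one chunk (hypothetical workload)."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(numbers, n_workers=4, executor_cls=ProcessPoolExecutor):
    """Divide the input, hand each chunk to a worker, then combine the results."""
    chunks = split_into_chunks(numbers, n_workers)              # 1. divide
    with executor_cls(max_workers=n_workers) as executor:
        partials = list(executor.map(sum_of_squares, chunks))   # 2. distribute
    return sum(partials)                                        # 3. combine


if __name__ == "__main__":
    # The guard matters on Windows, where child processes are spawned and
    # re-import this module instead of being forked.
    print(parallel_sum_of_squares(list(range(100_000))))
```

Passing executor_cls=ThreadPoolExecutor runs the same code on threads instead of processes, since both executors share one interface. For CPU-bound work like this, the process pool is usually the one that benefits from multiple cores.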

Self-Organizing Maps: An interesting Neural Network

Introduction: Kohonen maps, or Self-Organizing Maps (SOM), are based on how our brain cells are organized to form topologically ordered maps that react to different sensory input signals by activating different regions of the cerebral cortex. Along similar lines, a SOM builds a lattice of neurons, with similar neurons lying closer to each other, and uses this lattice to cluster the data in a lower-dimensional space. It is an artificial neural network algorithm developed by Teuvo Kohonen for unsupervised learning. It is based on the principle of competitive learning, where the neurons compete with each other for every input sample, with the winning neuron setting its output to one while all the rest set theirs to zero. Learning happens in two phases: the Ordering Phase, where the neurons order themselves topologically, and the Convergence Phase, where the final update of weights moves them closer to the input space. SOM provides an effective and ea...
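The competitive-learning idea in the excerpt above can be sketched in a few lines of plain Python. This is a minimal illustration, not the post's own implementation. It assumes a 1-D lattice, Euclidean distance for picking the winner, a Gaussian neighbourhood, and exponentially decaying learning rate and radius. The function names and default parameters are hypothetical.

```python
import math
import random


def best_matching_unit(weights, sample):
    """Index of the neuron whose weight vector is closest to the sample (the winner)."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], sample))


def train_step(weights, sample, learning_rate, radius):
    """Move the winner and its lattice neighbours toward the sample, in place."""
    winner = best_matching_unit(weights, sample)
    for i, w in enumerate(weights):
        # Gaussian neighbourhood over lattice distance |i - winner| (assumed 1-D lattice).
        influence = math.exp(-((i - winner) ** 2) / (2 * radius ** 2))
        for d in range(len(w)):
            w[d] += learning_rate * influence * (sample[d] - w[d])
    return winner


def train_som(data, n_neurons, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a 1-D SOM; learning rate and radius shrink as epochs pass."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]
    radius0 = radius0 or n_neurons / 2
    for epoch in range(epochs):
        decay = math.exp(-epoch / epochs)
        # Visit the samples in a fresh random order each epoch.
        for sample in rng.sample(data, len(data)):
            train_step(weights, sample, lr0 * decay, max(radius0 * decay, 0.5))
    return weights
```

In this sketch the wide early neighbourhood loosely plays the role of the ordering phase, and the narrow late neighbourhood that of the convergence phase. Real implementations often treat the two phases with separate schedules.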