Run non-native code in Apache Beam/Dataflow
Running external (non-native) code in Apache Beam/Dataflow is not straightforward, especially with little documentation or support available online. Apache Beam supports Python and Java...
Streaming large files between S3 and GCS (Python)
Streaming large files (1 GB+) between S3 and GCS can sometimes be a pain. To stream data, your first choice should be Google Storage Transfer Service or...
Tips/Tricks to Freelancing
Today, I will share my experience with freelancing. I have been freelancing for about 10 years now, and things have worked pretty well for...
Floating Header in HTML using jQuery
You all know the freeze panes feature of MS Excel. What if you want to implement that in HTML? For example, you have an...
Hadoop Distributed File System (HDFS)
Apache Hadoop is an open-source framework for distributed storage and processing. The distributed storage part of the framework is commonly known as Hadoop Distributed File System...
Bulk Synchronous Parallel Model (BSP)
A high-level introduction to the Bulk Synchronous Parallel Model (BSP), with references