What you need to know about the new web architecture

The traditional 3-tier architecture is dead, or at least it’s dying quickly. In a traditional 3-tier web architecture, the tiers were defined as:

  1. Client: HTML, CSS and JavaScript
  2. Server: A server-side framework in Java, Python, Ruby, PHP, Node.js/JavaScript, etc.
  3. Database: A relational database including stored procedures inside the database

Each tier had a specific job to do:
Client: render the UI
Server: business logic (controller) plus generate updates to the UI (view) based on queries run against the database (model)
Database: data access and storage

So what’s changing? Literally everything. Every layer of the stack is undergoing a massive change that necessitates a change to the architecture.

Client to UI/UX

TL;DR: The client layer has evolved from static HTML to advanced, thick clients built in JavaScript. These new JavaScript apps require a UI/UX API to provide a portion of their functionality. Further, mobile platforms often share the same UI/UX API.

The web client has evolved from HTML, CSS, and JavaScript to JavaScript, CSS, and HTML (where order indicates the importance of the code in delivering a high-quality user experience).

JavaScript-heavy clients have become the norm and are table stakes for today’s modern web apps. The emergence of thick JavaScript web clients (aka Single Page Applications, or SPAs) has given rise to a growing number of advanced UI JavaScript developers who must deliver increasingly sophisticated functionality in their apps. As a result, modern apps demand more from UI/UX developers, and this has driven the need for UI/UX engineers to control their own server-side API.

Node.js has emerged as the go-to solution for UI/UX server-side API development, although other scripting languages such as Python and Ruby remain popular choices. Essentially, a Node.js API (or equivalent) is a thin API layer that represents the View portion of the older, monolithic server-side frameworks.

Further, mobile development for iOS and Android requires a UI/UX API. Consolidating this new API requirement within the UI/UX team allows all customer-facing application development to move at a faster pace.

The last big driver for a dedicated UI/UX API is the fact that the UI/UX API calls a multitude of other internal APIs (for various platform services or data services) and/or external APIs. The UI/UX API tier helps to consolidate these various API calls into a single API endpoint that can be called from JavaScript, Objective-C/Swift, or Java, as sketched below.
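As a rough sketch of that consolidation (assuming Node.js with Express; the route, internal service URLs, and response fields below are hypothetical, not a prescription):

// Hypothetical UI/UX API endpoint that consolidates two internal APIs
// into a single call for web and mobile clients.
var express = require('express');
var request = require('request');
var app = express();

app.get('/api/v1/dashboard/:userId', function (req, res) {
  // Call an internal platform service for the user profile...
  request('http://platform.internal/users/' + req.params.userId, function (err, r1, profile) {
    if (err) return res.status(500).json({ error: 'profile service unavailable' });
    // ...then an internal data service for the user's activity...
    request('http://data.internal/activity/' + req.params.userId, function (err, r2, activity) {
      if (err) return res.status(500).json({ error: 'data service unavailable' });
      // ...and return one consolidated response to the client.
      res.json({ profile: JSON.parse(profile), activity: JSON.parse(activity) });
    });
  });
});

app.listen(3000);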

Server to Services

TL;DR: The older server-side MVC monoliths have been broken apart into specialized functions. The Model layer has been pushed down into Data Services, the View layer has been pushed up to the UI/UX team, and the Controller is now an entire Services API layer that provides common functionality used by multiple apps.

The traditional server tier has been broken apart into specialized functions. Traditional server-side frameworks were built around an MVC architecture (Model, View, Controller). These older applications were monolithic code bases that did everything from querying the data layer and running business logic to rendering UI components.

As discussed above, the View portion of server-side MVC has been taken over by the new UI/UX API server.

The traditional MVC server-side frameworks have given way to a more specialized business logic layer, which consists of APIs that handle various service-oriented functions. The new Services layer comprises common Platform Services plus reusable app Services APIs.

The Model layer, or server-side data access layer, has been pushed down into a new layer known as Data Services, run by the newly evolved data team. We’ll discuss the data layer more below.

While it may appear that the server tier has been reduced in scope, the reality is that the Services API team is the core infrastructure team. Neither the UI/UX layer nor the Data Services layer would be able to develop functionality as quickly as they do without the platform and shared services delivered via the Services API tier.

Database to Data Services

TL;DR: Data storage and querying are undergoing a revolution.

Much as the client layer has undergone an explosion of capability, the data layer now consists of a myriad of technologies.

Life was easy for the data team when relational databases were the only option. RDBMSs provide a fully integrated data environment, complete with the SQL query language, an integrated query engine, stored procedures, logical abstractions, and physical storage.

However, the modern data layer consists of a multitude of specialized data components that often separate the query language, the query engine, logical abstractions and physical storage.

Let’s use Cassandra as a quick example. In Cassandra, data engineers write queries using CQL. However, to actually run CQL, the data engineer must embed it in Java, Python, or another supported language. So now the data engineer requires an execution environment for their query code, and both the Services team and the UI/UX team need access to the query layer. The obvious solution is for the Data Services team to run their own Data Services API layer, which is exactly what has happened. Contrast this with an RDBMS, where a stored procedure is embedded inside the database and the API is the stored proc’s function signature.
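To make that concrete, here is a minimal sketch of a Data Services endpoint that embeds CQL, assuming Node.js with Express and the DataStax cassandra-driver package (the keyspace, table, and route are hypothetical):

// Hypothetical Data Services endpoint: the CQL lives in this tier,
// not inside the database the way an RDBMS stored procedure would.
var express = require('express');
var cassandra = require('cassandra-driver');

var client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  keyspace: 'store'  // hypothetical keyspace
});
var app = express();

app.get('/data/v1/orders/:userId', function (req, res) {
  client.execute(
    'SELECT order_id, total FROM orders WHERE user_id = ?',
    [req.params.userId],
    { prepare: true },
    function (err, result) {
      if (err) return res.status(500).json({ error: err.message });
      res.json(result.rows);  // hand the rows to the Services or UI/UX tier
    }
  );
});

app.listen(4000);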

Summary

The traditional 3-tier architecture of client, server, and database is being replaced by new tiers that more closely align with modern applications:
– UI/UX
– Services
– Data Services

The UI/UX layer now contains a full stack of its own, including rich, thick clients written in JavaScript plus its own server-side API.

The Services layer is now more specialized, as the View layer has been pushed up to UI/UX and the Model layer has been pushed down to Data Services. This enables the Services layer to focus on what it does best: writing advanced business logic and providing platform services that are common across multiple apps.

Data Services, which was previously confined to relational databases, now runs multiple data storage technologies, just one of which is a relational database. Data Services now runs its own API layer as well.

These changes align well with modern application development and help accelerate development cycles. UI/UX can deliver client functionality faster by leveraging the core infrastructure provided by the Services team, and it owns its own server-side API to quickly integrate the data provided by the Data Services team.


Child safety app idea

My wife and I found a lost child at the farmer’s market today. After looking all over for the parents we called 911 to notify the police.

To say the least, it was extremely frustrating to realize that there was no way to communicate with people in the local area about the missing child. I’m sure the child’s parents also wished there was a way to broadcast their search for their child.

This got me thinking. What I needed at that moment was a proximity-based social network where information about the found child could be broadcast, and where the parents could broadcast the fact that their child was missing. Further safety measures could be put in place, such as the child’s and parents’ pictures, physical characteristics, etc. This rich information could then be sent directly to the police. There are numerous other safety features that could be added.

As for a business model, charge parents a monthly fee to be a part of this network.

Hello blog

It’s been a year or so since I’ve written a new blog post, but I’ve been itching to get started again.

I’m currently working on several new technologies and I’m building a new startup, so I have a lot on my mind.

While I work on Node.js and AngularJS day to day, I’ve been learning mobile development for both iOS (Swift) and Android (Java) in parallel. Plus my wife (who’s a data engineer) is getting ready to re-enter the workforce. She’s a fairly kickass engineer and has been brushing up her skills in Cassandra and Python. So, I’ve been learning both so that I don’t sound like an idiot when she talks to me.

I also plan to clean up my blog categories and to mark old articles as such. Thanks for listening.

Akbar

How to create webm videos on Windows (convert mp4 to webm)

Overview

WebM is a free and open video format designed for HTML5. WebM is an open source project sponsored by Google. You can learn more at the WebM website.

Install Miro Video Converter

  1. Open http://www.mirovideoconverter.com.
  2. Click Download. When the download is finished double-click MiroVideoConverter_Setup.msi.
  3. Click Next.
  4. Select Custom Installation.
    1. Uncheck Install the AVG toolbar and set AVG Secure Search as my default search provider.
    2. Uncheck Set AVG Secure Search as my homepage and newly opened tabs.
  5. Click Next.
  6. Click Finish.

Convert mp4 video to webm

  • Open the Miro Video Converter via your Start menu.
  • Click Choose Files… in the Miro UI.
  • Find an mp4 file, or multiple files, on your hard drive. Click Open.
  • Click Format, select Video, then click WebM HD (assuming you want to create an HD video).
  • Click Convert to WebM HD.

Debug a Play Framework 2.0 application with Eclipse

Introduction

Debugging a Play Framework 2.0 application with Eclipse is exceptionally easy to set up. Importantly, using the debugger is integral to developing high-quality, complex applications, as it provides an easy way to step into your code.

YouTube Version

I have created a YouTube video that shows the steps below. You can watch the YouTube video at:

How to attach the Eclipse debugger to a Play Framework 2.0 application (YouTube)

Note: Change the playback quality to 720p with a large window for the best display.

Configure Play

Note: Prototyper is the name of a project that I use for prototyping code. Replace Prototyper with the name of the project that you want to debug.

Open a terminal (Linux) or PowerShell (Windows), then enter the following commands:

cd Prototyper
play clean compile
play debug run
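Note: play debug starts the JVM with the JPDA debug port listening on 9999, which is why the Eclipse configuration below uses port 9999.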

Configure Eclipse

  • Open Eclipse.
  • Select the project (e.g., Prototyper) in the Navigator in the left pane.
  • Select the Run menu, click Debug Configurations…
  • In the Debug Configurations dialog box, double-click on Remote Java Application in the left pane.
  • In the right pane, a new remote Java application configuration will be created for you. Change the Port to 9999.
  • Click Apply.
  • Click Debug.
  • Add a breakpoint in your Java code by pressing Ctrl + Shift + B.
  • Open a web browser to http://localhost:9000 and navigate to the page where the breakpoint will be activated.

Revert an uncommitted file in Git

I’m a relatively recent convert from Subversion to Git, so getting to know the git equivalent of each svn command is challenging.

Reverting a file in git actually uses the checkout command.

For example, if you want to revert your uncommitted changes for a file named package/File.java, then you would use the following command:

git checkout package/File.java
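One caveat: if a branch or tag has the same name as the file, git can misread the argument. Adding -- tells git that everything after it is a file path:

git checkout -- package/File.java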

Hadoop Distributions

The following is a repost of my answer to a question on LinkedIn; I thought it might prove useful to people evaluating Hadoop distributions.

The following is a substantially oversimplified set of choices (in alphabetical order):

Amazon: Apache Hadoop provided as a web service (Elastic MapReduce). A good solution if your data is already collected on Amazon…it saves you the trouble of uploading gigs and gigs of data.

Apache: Apache Hadoop is the core codebase upon which the various distributions are built.

Cloudera: CDH3 is based on Hadoop 1 (the current stable version) and CDH4 is based on Hadoop 2. CDH is based on Apache Hadoop. The only piece that’s not open source (AFAIK) is Cloudera Manager, which allows you to manage up to 50 nodes for free before you move to the paid version. Cloudera is an extremely popular solution that runs on a wide variety of operating systems.

Hortonworks: HDP1 is 100% open source and is based on Hadoop 1. HDP is designed to run on RedHat/CentOS/Oracle Linux.

IBM: IBM BigInsights adds the GPFS file system to Hadoop and is a good choice if your company is already an IBM shop and you need to integrate with other IBM solutions. A free version is available as InfoSphere BigInsights Basic Edition; Basic Edition does not include all of the value-add features found in Enterprise Edition (such as GPFS-SNC).

MapR: MapR uses a proprietary file system plus additional changes to Hadoop that address issues with the platform. They have a shared-nothing architecture for the NameNode and JobTracker. MapR M3 is available for free, while M5 is a paid version with more features (such as the shared-nothing NameNode). People who have used MapR tend to like it.

Understanding the Hadoop High Availability (HA) Options

Once you start to use Hadoop in your day-to-day business operations, you’ll quickly find that uptime is an important consideration. No one wants to explain to the CEO why a report was not delivered. While most of Hadoop’s architecture is designed to keep working in the face of node failure (the DataNodes, for example), other components, such as the NameNode, must be configured with an HA option.

The following is a quick and dirty list of Hadoop HA options:

  • Cloudera CDH4 (free)
    • Uses shared storage
  • Hortonworks (free)
    • Option 1: Use Linux HA (uses shared storage)
    • Option 2: Use VMware
  • IBM BigInsights ($$$)
    • GPFS-SNC: Provides a shared nothing HA option
  • MapR M5 ($$$)
    • Shared nothing HA for both NameNode and JobTracker


If you’re brave, you can also apply Facebook’s patches to Apache Hadoop to get an “Avatar”-based HA option. This is what FB uses in production.

Convert CDH4 from YARN (MRv2) to MRv1

Introduction

I had configured only YARN in my original post on how to Install Cloudera Hadoop (CDH4) with YARN (MRv2) in Pseudo mode on Ubuntu 12.04 LTS.

Importantly, YARN is not ready for production yet, so we’ll go ahead and install MRv1 to get some production development done.

Stop the YARN Daemons

We first have to stop all daemons associated with the YARN-only packages.

sudo service hadoop-yarn-resourcemanager stop
sudo service hadoop-yarn-nodemanager stop
sudo service hadoop-mapreduce-historyserver stop

Install the Missing MRv1 Packages

Next, we’ll install the 2 packages that are required for MapReduce v1 (MRv1) but were not part of the MRv2/YARN installation.

sudo apt-get install hadoop-0.20-mapreduce-jobtracker
sudo apt-get install hadoop-0.20-mapreduce-tasktracker

Start the MapReduce v1 Daemons

sudo service hadoop-0.20-mapreduce-jobtracker start
sudo service hadoop-0.20-mapreduce-tasktracker start
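To confirm that the MRv1 daemons are up, you can list the running Java processes with jps (it ships with the JDK); you should see JobTracker and TaskTracker in the output:

sudo jps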

HBase Command Line Tutorial


Start the HBase Shell

All subsequent commands in this post assume that you are in the HBase shell, which is started via the command listed below.

hbase shell

You should see output similar to:


12/08/12 12:30:52 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.92.1-cdh4.0.1, rUnknown, Thu Jun 28 18:13:01 PDT 2012

Create a Table

We will initially create a table named table1 with one column family named columnfamily1.

Using a long column family name, such as columnfamily1, is a horrible idea in production. Every cell (i.e. every value) in HBase is stored fully qualified: the row key, column family name, column qualifier, and timestamp are written out with every value. This means that long column family names will balloon the amount of disk space required to store your data. In summary, keep your column family names as terse as possible.

create 'table1', 'columnfamily1'

List all Tables

list

You’ll see output similar to:


TABLE
table1
1 row(s) in 0.0370 seconds

Let’s now create a second table so that we can see some of the features of the list command.

create 'test', 'cf1'
list

You will see output similar to:

TABLE
table1
test
2 row(s) in 0.0320 seconds

If we only want to see the test table, or all tables that start with “te”, we can use the following command.

list 'te'

or

list 'te.*'

Manually Insert Data into HBase

If you’re using HBase, then you likely have data sets that are TBs in size. As a result, you’ll never actually insert data manually. However, knowing how to insert data manually could prove useful at times.

To start, I’m going to create a new table named cars. My column family is vi, which is an abbreviation of vehicle information.

The schema that follows is for illustration purposes only and should not be used to create a production schema. In production, you should design a Row ID that uniquely identifies the row and that is likely to be used in your queries. One possibility would be to promote the Make, Model, and Year into the Row ID, as shown in the hypothetical example below.
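Purely as a hypothetical illustration (the rest of this tutorial keeps the simple row1/row2 style for readability), a put against such a composite Row ID might look like:

put 'cars', 'bmw|5series|2012', 'vi:color', 'black'

With that caveat noted, let’s create the table: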

create 'cars', 'vi'

Let’s insert 3 column qualifiers (make, model, year) and the associated values into the first row (row1).

put 'cars', 'row1', 'vi:make', 'bmw'
put 'cars', 'row1', 'vi:model', '5 series'
put 'cars', 'row1', 'vi:year', '2012'

Now let’s add a second row.

put 'cars', 'row2', 'vi:make', 'mercedes'
put 'cars', 'row2', 'vi:model', 'e class'
put 'cars', 'row2', 'vi:year', '2012'

Scan a Table (i.e. Query a Table)

We’ll start with a basic scan that returns all columns in the cars table.

scan 'cars'

You should see output similar to:

ROW           COLUMN+CELL
 row1          column=vi:make, timestamp=1344817012999, value=bmw
 row1          column=vi:model, timestamp=1344817020843, value=5 series
 row1          column=vi:year, timestamp=1344817033611, value=2012
 row2          column=vi:make, timestamp=1344817104923, value=mercedes
 row2          column=vi:model, timestamp=1344817115463, value=e class
 row2          column=vi:year, timestamp=1344817124547, value=2012
2 row(s) in 0.6900 seconds

Reading the output above, you’ll notice that the Row ID is listed under ROW. The COLUMN+CELL field shows the column family after column=, then the column qualifier, a timestamp that is automatically created by HBase, and the value.

Importantly, each line of the results shows an individual row ID + column family + column qualifier combination. Therefore, a single HBase row with multiple columns is displayed across multiple lines of the results.

The next scan we’ll run will limit our results to the make column qualifier.

scan 'cars', {COLUMNS => ['vi:make']}

If you have a particularly large result set, you can limit the number of rows returned with the LIMIT option. In this example I arbitrarily limit the results to 1 row to demonstrate how LIMIT works.

scan 'cars', {COLUMNS => ['vi:make'], LIMIT => 1}
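You can also restrict a scan to a range of Row IDs with the STARTROW and STOPROW options. Note that STOPROW is exclusive, so this example returns only row1:

scan 'cars', {STARTROW => 'row1', STOPROW => 'row2'}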

To learn more about the scan command enter the following:

help 'scan'

Get One Row

The get command allows you to get one row of data at a time. You can optionally limit the number of columns returned.

We’ll start by getting all columns in row1.

get 'cars', 'row1'

You should see output similar to:


COLUMN                   CELL
 vi:make                 timestamp=1344817012999, value=bmw
 vi:model                timestamp=1344817020843, value=5 series
 vi:year                 timestamp=1344817033611, value=2012
3 row(s) in 0.0150 seconds

When looking at the output above, you should notice how the results under COLUMN show the fully qualified column family:column qualifier, such as vi:make.

To get one specific column include the COLUMN option.

get 'cars', 'row1', {COLUMN => 'vi:model'}

You can also get two or more columns by passing an array of columns.

get 'cars', 'row1', {COLUMN => ['vi:model', 'vi:year']}

To learn more about the get command enter:

help 'get'

Delete a Cell (Value)

delete 'cars', 'row2', 'vi:year'

Let’s check that our delete worked.

get 'cars', 'row2'

You should see output that shows 2 columns.


COLUMN     CELL
 vi:make    timestamp=1344817104923, value=mercedes
 vi:model   timestamp=1344817115463, value=e class
2 row(s) in 0.0080 seconds
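As an aside, if you want to remove an entire row rather than a single cell, the shell provides the deleteall command:

deleteall 'cars', 'row2'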

Disable and Delete a Table

disable 'cars'
drop 'cars'
disable 'table1'
drop 'table1'
disable 'test'
drop 'test'

View HBase Command Help

help

Exit the HBase Shell

exit