How to create webm videos on Windows (convert mp4 to webm)
Overview
WebM is a free and open video format designed for HTML5. WebM is an open source project sponsored by Google. You can learn more at the WebM website.
Install Miro Video Converter
- Open http://www.mirovideoconverter.com.
- Click Download. When the download is finished, double-click MiroVideoConverter_Setup.msi.
- Click Next.
- Select Custom Installation.
- Uncheck Install the AVG toolbar and set AVG Secure Search as my default search provider.
- Uncheck Set AVG Secure Search as my homepage and newly opened tabs.
- Click Next.
- Click Finish.
Convert mp4 video to webm
- Open the Miro Video Converter via your Start menu.
- Click Choose Files… in the Miro UI.
- Find an mp4 file, or multiple files, on your hard drive. Click Open.
- Click format, select Video, click WebM HD (assuming you want to create an HD video).
- Click Convert to WebM HD.
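If you have many files to convert, the same mp4 to WebM conversion can also be scripted. The following is only a sketch and is not part of the Miro workflow above: it assumes the separate ffmpeg command-line tool is installed and built with the libvpx (VP8) and libvorbis encoders, and clip.mp4 is a hypothetical file name.

```python
from pathlib import Path


def build_webm_command(src: str) -> list:
    """Return an ffmpeg argument list that writes a .webm file next to src."""
    dst = Path(src).with_suffix(".webm")
    return [
        "ffmpeg",
        "-i", src,            # input file
        "-c:v", "libvpx",     # VP8 video encoder
        "-c:a", "libvorbis",  # Vorbis audio encoder
        str(dst),             # output file, same name with a .webm extension
    ]


# Show the command that would be run for a hypothetical input file.
print(" ".join(build_webm_command("clip.mp4")))

# To actually run the conversion (requires ffmpeg on your PATH):
#   import subprocess
#   subprocess.run(build_webm_command("clip.mp4"), check=True)
```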
Debug a Play Framework 2.0 application with Eclipse
Introduction
Debugging a Play Framework 2.0 application with Eclipse is exceptionally easy to set up. A debugger is integral to developing high-quality, complex applications because it provides an easy way to step through your code.
YouTube Version
I have created a YouTube video that shows the steps below. You can watch the YouTube video at:
How to attach the Eclipse debugger to a Play Framework 2.0 application (YouTube)
Note: Change the playback quality to 720p with a large window for the best display.
Configure Play
Note: Prototyper is the name of a project that I use for prototyping code. Replace Prototyper with the name of the project that you want to debug.
Open a command prompt (Linux) or PowerShell (Windows), then enter the following commands:
cd Prototyper
play clean compile
play debug run
Configure Eclipse
- Open Eclipse.
- Select the project (ex. Prototyper) in Navigator in the left pane.
- Select the Run menu, click Debug Configurations…
- In the Debug Configurations dialog box, double-click on Remote Java Application in the left pane.
- In the right pane, a new remote Java application configuration will be created for you. Change the Port to 9999.
- Click Apply.
- Click Debug.
- Add a breakpoint in your Java code by pressing Ctrl + Shift + B.
- Open a web browser to http://localhost:9000 and navigate to the page where the breakpoint will be activated.
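If Eclipse refuses to attach, it helps to confirm that Play is actually listening on the two ports used above (9999 for the debugger, 9000 for the application). The following is a minimal sketch using plain TCP connections; the function name is my own. Note that probing the debug port opens and immediately closes a connection, which the JVM may log as a failed debugger handshake.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    print("debug port 9999 open:", port_is_open("localhost", 9999))
    print("HTTP port 9000 open:", port_is_open("localhost", 9000))
```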
Revert an uncommitted file in Git
I’m a relatively recent convert from Subversion to Git, so working out the Git equivalent of an svn command can be challenging.
In Git, reverting a file's uncommitted changes is done with the checkout command.
For example, if you want to revert your uncommitted changes for a file named package/File.java, then you would use the following command:
git checkout -- package/File.java
Hadoop Distributions
The following is a repost of my answer to a question on LinkedIn, but I thought it may prove useful to people evaluating Hadoop distributions.
The following is a substantially oversimplified set of choices (in alphabetical order):
Amazon: Apache Hadoop provided as a web service. Good solution if your data is collected on Amazon…saves you the trouble of uploading gigs and gigs of data.
Apache: Apache Hadoop is the core code base upon which the various distributions are built.
Cloudera: CDH3 is based on Hadoop 1 (the current stable version) and CDH4 is based on Hadoop 2. CDH is based on Apache Hadoop. The only piece that’s not open source (AFAIK) is Cloudera Manager, which allows you to install up to 50 nodes for free before you go to the paid version. Cloudera is an extremely popular solution that runs on a wide variety of operating systems.
Hortonworks: HDP1 is 100% open source and is based on Hadoop 1. HDP is designed to run on RedHat/CentOS/Oracle Linux.
IBM: IBM BigInsights adds the GPFS filesystem to Hadoop, and is a good choice if your company already is an IBM shop…and you need to integrate with other IBM solutions. Free version is available as InfoSphere BigInsights Basic Edition. Basic Edition does not include all of the value add features found in Enterprise Edition (such as GPFS-SNC).
MapR: MapR uses a proprietary file system plus additional changes to Hadoop that address issues with the platform. They have a shared nothing architecture for the NameNode and JobTracker. MapR M3 is available for free, while M5 is a paid version with more features (such as the shared nothing NameNode). People who have used MapR tend to like it.
Understanding the Hadoop High Availability (HA) Options
Once you start to use Hadoop in your day-to-day business operations, you’ll quickly find that uptime is an important consideration. No one wants to explain to the CEO why a report is not delivered. While most of Hadoop’s architecture is designed to work in the face of node failure (such as the DataNodes), other components such as the NameNode must be configured with an HA option.
The following is a quick and dirty list of Hadoop HA options:
- Cloudera CDH4 (free)
  - Uses shared storage
- Hortonworks (free)
  - Option 1: Use Linux HA (uses shared storage)
  - Option 2: Use VMware
- IBM BigInsights ($$$)
  - GPFS-SNC: Provides a shared nothing HA option
- MapR M5 ($$$)
  - Shared nothing HA for both NameNode and JobTracker
If you’re brave, you can also apply Facebook’s patches to Apache Hadoop to get an “Avatar”-based HA option. This is what Facebook uses in production.
Convert CDH4 from YARN (MRv2) to MRv1
Introduction
I had configured only YARN in my original post on how to Install Cloudera Hadoop (CDH4) with YARN (MRv2) in Pseudo mode on Ubuntu 12.04 LTS.
Importantly, YARN is not ready for production yet, so we’ll go ahead and install MRv1 to get some production development done.
Stop the YARN Daemons
We first have to stop all daemons associated with the YARN-only packages.
sudo service hadoop-yarn-resourcemanager stop
sudo service hadoop-yarn-nodemanager stop
sudo service hadoop-mapreduce-historyserver stop
Install the Missing MRv1 Packages
Next, we’ll install 2 packages that are required for MapReduce v1, but were not part of the MRv2/YARN installation.
sudo apt-get install hadoop-0.20-mapreduce-jobtracker
sudo apt-get install hadoop-0.20-mapreduce-tasktracker
Start the MapReduce v1 Daemons
sudo service hadoop-0.20-mapreduce-jobtracker start
sudo service hadoop-0.20-mapreduce-tasktracker start
HBase Command Line Tutorial
Introduction
Start the HBase Shell
All subsequent commands in this post assume that you are in the HBase shell, which is started via the command listed below.
hbase shell
You should see output similar to:
12/08/12 12:30:52 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.92.1-cdh4.0.1, rUnknown, Thu Jun 28 18:13:01 PDT 2012
Create a Table
We will initially create a table named table1 with one column family named columnfamily1.
Using a long column family name, such as columnfamily1, is a horrible idea in production. Every cell (i.e. every value) in HBase is stored fully qualified, together with its row key, column family and column qualifier. This means that long column family names will balloon the amount of disk space required to store your data. In summary, keep your column family names as terse as possible.
create 'table1', 'columnfamily1'
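To put a rough number on the column family advice above, here is a back-of-the-envelope sketch in Python. It only counts the bytes of the family name itself, repeated once per cell, and ignores compression, which will usually reduce the real on-disk cost. The cell count is an assumption picked for illustration.

```python
def family_name_overhead(cells: int, long_name: str, short_name: str) -> int:
    """Extra uncompressed bytes spent on the longer family name, one copy per cell."""
    extra_per_cell = len(long_name.encode("utf-8")) - len(short_name.encode("utf-8"))
    return cells * extra_per_cell


# Assumed workload: one billion cells.
extra = family_name_overhead(1_000_000_000, "columnfamily1", "cf1")
print(f"{extra / 1e9:.1f} GB of extra family-name bytes")  # 10.0 GB
```

Ten extra bytes per cell looks harmless, but it adds up once the table holds billions of cells.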
List all Tables
list
You’ll see output similar to:
TABLE
table1
1 row(s) in 0.0370 seconds
Let’s now create a second table so that we can see some of the features of the list command.
create 'test', 'cf1'
list
You will see output similar to:
TABLE
table1
test
2 row(s) in 0.0320 seconds
If we only want to see the test table, or all tables that start with “te”, we can use the following command.
list 'te'
or
list 'te.*'
Manually Insert Data into HBase
If you’re using HBase, then you likely have data sets that are TBs in size. As a result, you’ll never actually insert data manually. However, knowing how to insert data manually could prove useful at times.
To start, I’m going to create a new table named cars. My column family is vi, which is an abbreviation of vehicle information.
The schema that follows below is only for illustration purposes, and should not be used to create a production schema. In production, you should create a Row ID that helps to uniquely identify the row, and that is likely to be used in your queries. Therefore, one possibility would be to shift the Make, Model and Year left and use these items in the Row ID.
create 'cars', 'vi'
Let’s insert 3 column qualifiers (make, model, year) and the associated values into the first row (row1).
put 'cars', 'row1', 'vi:make', 'bmw'
put 'cars', 'row1', 'vi:model', '5 series'
put 'cars', 'row1', 'vi:year', '2012'
Now let’s add a second row.
put 'cars', 'row2', 'vi:make', 'mercedes'
put 'cars', 'row2', 'vi:model', 'e class'
put 'cars', 'row2', 'vi:year', '2012'
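The Row IDs used above (row1, row2) are placeholders. As mentioned in the schema note, a production Row ID could instead be built from the make, model and year. The following is a minimal sketch of that idea; the separator, the lowercasing and the function name are my own assumptions, and a real key would likely need an additional unique suffix since many cars share the same make, model and year.

```python
def make_row_id(make: str, model: str, year: int, sep: str = "|") -> str:
    """Build a composite Row ID from the fields most likely to appear in queries."""
    parts = [
        make.strip().lower(),
        model.strip().lower().replace(" ", "_"),  # avoid spaces inside the key
        str(year),
    ]
    return sep.join(parts)


print(make_row_id("BMW", "5 Series", 2012))       # bmw|5_series|2012
print(make_row_id("Mercedes", "E Class", 2012))   # mercedes|e_class|2012
```

Because HBase sorts rows by Row ID, a key like this keeps all rows for one make next to each other, which makes prefix scans by make (or make and model) cheap.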
Scan a Table (i.e. Query a Table)
We’ll start with a basic scan that returns all columns in the cars table.
scan 'cars'
You should see output similar to:
ROW COLUMN+CELL
row1 column=vi:make, timestamp=1344817012999, value=bmw
row1 column=vi:model, timestamp=1344817020843, value=5 series
row1 column=vi:year, timestamp=1344817033611, value=2012
row2 column=vi:make, timestamp=1344817104923, value=mercedes
row2 column=vi:model, timestamp=1344817115463, value=e class
row2 column=vi:year, timestamp=1344817124547, value=2012
2 row(s) in 0.6900 seconds
Reading the output above you’ll notice that the Row ID is listed under ROW. The COLUMN+CELL field shows the column family after column=, then the column qualifier, a timestamp that is automatically created by HBase, and the value.
Importantly, each row in our results shows an individual row id + column family + column qualifier combination. Therefore, you’ll notice that multiple columns in a row are displayed in multiple rows in our results.
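To make the one-cell-per-line layout concrete, the sketch below regroups the scan output shown above into one dictionary per Row ID. It is plain Python working on the printed text, not an HBase client API, and the function name is hypothetical. It also assumes Row IDs contain no spaces.

```python
import re
from collections import defaultdict

# Matches one cell line of scan output: row id, column, timestamp, value.
CELL = re.compile(
    r"^\s*(?P<row>\S+)\s+column=(?P<col>[^,]+), "
    r"timestamp=(?P<ts>\d+), value=(?P<value>.*)$"
)


def group_cells(lines):
    """Collect cell lines into {row_id: {family:qualifier: value}}."""
    rows = defaultdict(dict)
    for line in lines:
        match = CELL.match(line)
        if match:  # header and summary lines do not match and are skipped
            rows[match.group("row")][match.group("col")] = match.group("value")
    return dict(rows)


scan_output = """\
ROW COLUMN+CELL
row1 column=vi:make, timestamp=1344817012999, value=bmw
row1 column=vi:model, timestamp=1344817020843, value=5 series
row1 column=vi:year, timestamp=1344817033611, value=2012
row2 column=vi:make, timestamp=1344817104923, value=mercedes
row2 column=vi:model, timestamp=1344817115463, value=e class
row2 column=vi:year, timestamp=1344817124547, value=2012
2 row(s) in 0.6900 seconds
""".splitlines()

for row_id, columns in group_cells(scan_output).items():
    print(row_id, columns)
```

Six lines of cell output collapse into two logical rows, which matches the "2 row(s)" summary printed by the shell.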
The next scan we’ll run will limit our results to the make column qualifier.
scan 'cars', {COLUMNS => ['vi:make']}
If you have a particularly large result set, you can limit the number of rows returned with the LIMIT option. In this example I arbitrarily limit the results to 1 row to demonstrate how LIMIT works.
scan 'cars', {COLUMNS => ['vi:make'], LIMIT => 1}
To learn more about the scan command enter the following:
help 'scan'
Get One Row
The get command allows you to get one row of data at a time. You can optionally limit the number of columns returned.
We’ll start by getting all columns in row1.
get 'cars', 'row1'
You should see output similar to:
COLUMN CELL
vi:make timestamp=1344817012999, value=bmw
vi:model timestamp=1344817020843, value=5 series
vi:year timestamp=1344817033611, value=2012
3 row(s) in 0.0150 seconds
When looking at the output above, you should notice how the results under COLUMN show the fully qualified column family:column qualifier, such as vi:make.
To get one specific column include the COLUMN option.
get 'cars', 'row1', {COLUMN => 'vi:model'}
You can also get two or more columns by passing an array of columns.
get 'cars', 'row1', {COLUMN => ['vi:model', 'vi:year']}
To learn more about the get command enter:
help 'get'
Delete a Cell (Value)
delete 'cars', 'row2', 'vi:year'
Let’s check that our delete worked.
get 'cars', 'row2'
You should see output that shows 2 columns.
COLUMN CELL
vi:make timestamp=1344817104923, value=mercedes
vi:model timestamp=1344817115463, value=e class
2 row(s) in 0.0080 seconds
Disable and Delete a Table
disable 'cars'
drop 'cars'
disable 'table1'
drop 'table1'
disable 'test'
drop 'test'
View HBase Command Help
help
Exit the HBase Shell
exit
Debugging HBase: org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable location to assign region -ROOT
Introduction
I ran into an annoying error in HBase due to the localhost loopback. The solution was simple, but took some trial and error.
Error
I was following the HBase logs with the following command:
tail -1000f /var/log/hbase/hbase-hbase-master-freshstart.log
The following error kept popping up in the log file.
org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable location to assign region -ROOT
Solution
sudo vi /etc/hosts
I changed:
127.0.0.1 localhost
127.0.1.1 freshstart
to:
#127.0.0.1 localhost
#127.0.1.1 freshstart
192.168.2.15 freshstart
127.0.0.1 localhost
192.168.2.15 is my internal IP address, and freshstart is my hostname.
At this point I rebooted as a quick and dirty way to restart all Hadoop / HBase services. Alternatively, you can start/stop all services.
Additional Thoughts
Updating the hosts file is an option for me currently since I have everything installed on a single machine. However, it seems that this error is a name resolution related issue, so a properly configured DNS server is likely necessary when deploying Hadoop / HBase in a production cluster.
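The fix above boils down to making the hostname resolve to a non-loopback address. The following is a minimal sketch for checking that from Python, using only the standard library; the function names are my own.

```python
import ipaddress
import socket


def is_loopback(ip: str) -> bool:
    """Return True for addresses in the loopback range (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(ip).is_loopback


def resolves_to_loopback(hostname: str) -> bool:
    """Resolve hostname the way the local resolver does and test the result."""
    return is_loopback(socket.gethostbyname(hostname))


# The two addresses that appeared next to "freshstart" in my hosts file.
for ip in ("127.0.1.1", "192.168.2.15"):
    print(ip, "loopback" if is_loopback(ip) else "not loopback")

# To check the current machine:
#   resolves_to_loopback(socket.gethostname())
```

If the check for your own hostname returns True, you are probably in the same situation that I was in before editing /etc/hosts.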
Install Pyramid on Ubuntu 12.04 LTS in the Rackspace Cloud
Check the Installed Python Version
python --version
You should see the following output:
Python 2.7.3
Install Prerequisites
apt-get install python-setuptools python-pip python-virtualenv virtualenvwrapper
Install Prerequisites for Pyramid Speedups
apt-get install gcc cpp libc6-dev python2.7-dev
Install nginx
apt-get install nginx nginx-full nginx-common
Create a wwwuser that waitress (the web server) will run as
useradd wwwuser -d /home/wwwuser -k /etc/skel -m -s /bin/bash -U
Setup the Virtual Environment
mkdir -p /var/www/delixus.com
mkdir /var/www/environments
cd /var
chown -R wwwuser:wwwuser www
We are now going to switch to the wwwuser user.
su - wwwuser
cd /var/www/environments
virtualenv env_delixus
Install Pyramid
You must perform the following steps as the wwwuser user.
cd /var/www/environments/env_delixus
source bin/activate
You should see the environment name as the prefix in the command prompt, such as:
(env_delixus)wwwuser@ws2:
easy_install Pyramid
pip install waitress
Checkout the Pyramid Project
cd /var/www/delixus.com
Change the SVN checkout command to something that matches your server. If you use git, then change appropriately.
svn checkout https://repo.company.com/source/delixus/tags/1.0 .
Install the delixus.com Pyramid project
cd /var/www/delixus.com/delixus
vi production.ini
Under [app:main], add a [server:main] configuration as follows:
# http://docs.pylonsproject.org/projects/waitress/en/latest/arguments.html
[server:main]
use = egg:waitress#main
host = 127.0.0.1
port = %(http_port)s
# default # of threads = 4
threads = 8
url_scheme = http
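The port = %(http_port)s line is a placeholder that gets its value from the http_port=... argument passed to pserve later in this post. pserve does this through PasteDeploy. The sketch below only imitates the substitution with Python's standard configparser module, which uses the same %(name)s syntax, so treat it as an illustration rather than what pserve runs internally. It is written in Python 3 syntax.

```python
import configparser

SERVER_INI = """\
[server:main]
use = egg:waitress#main
host = 127.0.0.1
port = %(http_port)s
threads = 8
url_scheme = http
"""


def resolve_port(ini_text: str, http_port: str) -> str:
    """Fill in %(http_port)s the way a command-line variable would."""
    # Values passed as defaults are visible to every section for interpolation.
    parser = configparser.ConfigParser(defaults={"http_port": http_port})
    parser.read_string(ini_text)
    return parser.get("server:main", "port")


print(resolve_port(SERVER_INI, "5000"))  # 5000
print(resolve_port(SERVER_INI, "5001"))  # 5001
```

This is what lets a single production.ini serve several processes on different ports.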
I don’t think you need to install the development version of the site, but it seems to be the only way that I get everything to work while debugging…go figure.
python setup.py develop
pserve development.ini
Then open the site in a text-based web browser.
links http://localhost:6543
You should be able to view your site at this point.
Now, let’s install the production version of the site.
python setup.py install
Start Waitress
First we’re going to start and test waitress, then we’ll start it as a daemon.
pserve production.ini http_port=5000
links http://localhost:5000
Again, you should be able to view your site.
pserve production.ini start --daemon --pid-file=/var/www/5000.pid \
    --log-file=/var/www/5000.log --monitor-restart http_port=5000
pserve production.ini start --daemon --pid-file=/var/www/5001.pid \
    --log-file=/var/www/5001.log --monitor-restart http_port=5001
Check the waitress process.
ps -ef | grep pserve
You should see the pserve process running.
Configure nginx as a Proxy for Waitress
The following steps must be performed as root.
cd /etc/nginx/sites-available
vi delixus
Paste the following into the delixus file.
upstream delixus-site {
    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
}
server {
    listen 80;
    server_name localhost www.delixus.com delixus.com;
    access_log /var/log/nginx/delixus.com-access.log;
    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        client_max_body_size 10m;
        client_body_buffer_size 128k;
        proxy_connect_timeout 60s;
        proxy_send_timeout 90s;
        proxy_read_timeout 90s;
        proxy_buffering off;
        proxy_temp_file_write_size 64k;
        proxy_pass http://delixus-site;
        proxy_redirect off;
    }
    location /static {
        root /var/www/delixus.com/delixus/delixus;
        expires 30d;
        add_header Cache-Control public;
        access_log off;
    }
}
rm /etc/nginx/sites-enabled/default
ln -s /etc/nginx/sites-available/delixus /etc/nginx/sites-enabled/delixus
service nginx stop
service nginx start
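Once nginx is running, requests arriving on port 80 are spread across the two pserve processes listed in the upstream block. By default nginx picks upstream servers in round-robin order. The toy sketch below only illustrates that ordering; real nginx also takes weights and failed servers into account, and the function name is my own.

```python
from itertools import cycle

# The two pserve processes from the upstream block.
BACKENDS = ["127.0.0.1:5000", "127.0.0.1:5001"]


def assign_backends(request_count, backends=BACKENDS):
    """Return the backend each request would go to under plain round-robin."""
    picker = cycle(backends)
    return [next(picker) for _ in range(request_count)]


for number, backend in enumerate(assign_backends(4), start=1):
    print(f"request {number} -> {backend}")
```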
A good next step at this point is to set up Supervisor to control pserve/waitress.
Install RazorSQL on Ubuntu 12.04 LTS x64
Introduction
RazorSQL is a GUI tool for working with PostgreSQL.
Install RazorSQL
First, create the razorsql directory.
mkdir ~/Downloads/razorsql
Download RazorSQL into the ~/Downloads/razorsql directory.
- Open a web browser to http://www.razorsql.com/download_linux.html.
- Click Download next to Linux (64-bit).
After the download completes, open a command prompt and enter the following commands. I have assumed that you downloaded the .zip file to the ~/Downloads/razorsql directory.
cd ~/Downloads/razorsql
unzip razorsql5_6_4_linux_x64.zip
mv razorsql ~/bin/
cd ~/bin/razorsql
chmod 755 razorsql.sh
./razorsql.sh
Connect to a Database
The following steps are performed in the RazorSQL GUI.

