Archive for 2013

Apache Hadoop in VirtualBox

Apache Hadoop is a framework for processing large amounts of data in parallel. The Hadoop framework relies heavily on the map and reduce functions of functional programming languages. A user only defines the map and reduce functions; all other operations, such as distributing data and work over the network, re-running failed jobs, and collecting results, are handled automatically by Hadoop.
Hadoop first applies the user-defined map function to key-value pairs; the results of the mappings are sorted and distributed over the nodes according to their key values. Each node then applies the user-defined reduce function to each key-value pair and commonly writes the results to a file on the Hadoop Distributed File System (HDFS).
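As a loose analogy, the map-sort-reduce flow of a word count job can be sketched as an ordinary shell pipeline. This only illustrates the data flow, not how Hadoop itself is invoked:

```shell
# map:  emit one word (key) per line
# sort: group identical keys together (Hadoop's shuffle/sort phase)
# uniq -c: reduce each group of identical keys to a single count
echo "apple banana apple" | tr ' ' '\n' | sort | uniq -c
```

This prints a count next to each distinct word (2 for apple, 1 for banana), which is exactly the shape of output the WordCount example later in this post produces on HDFS.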

Setting Up Hadoop on a Virtual Machine

This setup uses VirtualBox, Hadoop 1.2.1, and Ubuntu 12.04.3 Server.

VirtualBox and Ubuntu Setup

1. Download VirtualBox 4.2.18 or a later version that suits your operating system.

2. Install VirtualBox with the desired settings.

3. Download the Ubuntu 12.04.3 Server edition image to be the guest operating system that runs Hadoop.
           From: http://www.ubuntu.com/download/server

4. Open VirtualBox and click 'New' to create a new virtual machine.

5. Choose 'Type' as 'Linux' and 'Version' as 'Ubuntu', as shown in the picture below.

6. Proceed by selecting the desired options, such as RAM size and disk size.

7. Select the virtual machine you just created and click 'Start'.

8. VirtualBox will ask for a disk image to boot from; select the Ubuntu 12.04.3 Server edition image you downloaded in step 3. If the message 'FATAL: No bootable medium found! System halted.' appears, click the 'Devices' menu -> 'CD/DVD Devices' -> 'Choose a virtual CD/DVD disk file...', then browse to and select the Ubuntu image you downloaded. After selecting the Ubuntu image, click 'Machine' and then 'Reset'.

9. Set up Ubuntu in the virtual machine with the desired options.

Hadoop Setup

10. Set up port forwarding for the virtual machine as described in the 'Virtual Box Port Forwarding For Hadoop' post below.

11. After installing Ubuntu, start the virtual machine and log in with the username and password you chose during setup. Go to your home directory with the "cd ~/" command.

12. Install an SSH client and server for connecting to the virtual machine with ssh and scp, using the following commands.
After this step you can connect to the virtual machine from an ssh client on the host with "ssh -p 2222 <username>@localhost", using the port forwarded in step 10.
    
# Install ssh client and server
sudo apt-get install ssh
sudo apt-get install openssh-server

13. Generate an SSH key and add it to the trusted keys, as follows, so that Hadoop can connect to localhost without a password prompt.

# add ssh key to trusted servers
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
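The same key-trusting idea can be tried out safely in a throwaway directory first. The paths below are temporary and the key type is arbitrary; this is a sketch, not part of the actual setup:

```shell
# generate a passwordless key in a temp dir and mark it as trusted
tmp=$(mktemp -d)
ssh-keygen -t rsa -P '' -f "$tmp/id_rsa" -q
cat "$tmp/id_rsa.pub" >> "$tmp/authorized_keys"
# the public key should now appear in authorized_keys
grep -q 'ssh-rsa' "$tmp/authorized_keys" && echo "key trusted"
rm -rf "$tmp"
```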

14. Install the Java Runtime Environment and Java Development Kit on the Ubuntu server by running the following commands.

# install jre and jdk (java)
cd ~/
wget https://github.com/flexiondotorg/oab-java6/raw/0.3.0/oab-java.sh -O oab-java.sh
chmod +x oab-java.sh
sudo ./oab-java.sh
sudo apt-get install sun-java6-jre
sudo apt-get install sun-java6-jdk
rm oab-java.sh
rm -f oab-java.sh.log

15. Rsync is necessary for Hadoop; if your Ubuntu setup does not include it, run the following command.

# rsync is necessary for hadoop
sudo apt-get install rsync

16. Install Hadoop using the following commands.

# install hadoop
cd ~/
wget http://www.nic.funet.fi/pub/mirrors/apache.org/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
tar -xvf hadoop-1.2.1.tar.gz
rm hadoop-1.2.1.tar.gz

17. Next, configure Hadoop for pseudo-distributed mode. You may run the following commands, or edit the files with a text editor so that they contain the content shown.

# configure hadoop for pseudo-distributed mode
 mv ~/hadoop-1.2.1/conf/core-site.xml ~/hadoop-1.2.1/conf/core-site-backup.xml
 echo "<configuration> 
     <property> 
         <name>fs.default.name</name> 
         <value>hdfs://localhost:9000</value> 
     </property>
</configuration>" > ~/hadoop-1.2.1/conf/core-site.xml 

mv ~/hadoop-1.2.1/conf/hdfs-site.xml ~/hadoop-1.2.1/conf/hdfs-site-backup.xml
echo "<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>" > ~/hadoop-1.2.1/conf/hdfs-site.xml

mv ~/hadoop-1.2.1/conf/mapred-site.xml ~/hadoop-1.2.1/conf/mapred-site-backup.xml
echo "<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>" > ~/hadoop-1.2.1/conf/mapred-site.xml

echo 'export JAVA_HOME=/usr/lib/jvm/java-6-sun' >> ~/hadoop-1.2.1/conf/hadoop-env.sh 
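If the quoting in the echo commands above causes trouble, a heredoc is an equivalent and often safer way to write the same files. The sketch below writes the core-site.xml content to a temporary location purely to demonstrate (the temp path is illustrative, not the real config location):

```shell
# write the same core-site.xml content using a heredoc
tmp=$(mktemp -d)
cat > "$tmp/core-site.xml" <<'EOF'
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF
# sanity-check that the HDFS address made it into the file
grep -q 'hdfs://localhost:9000' "$tmp/core-site.xml" && echo "core-site OK"
rm -rf "$tmp"
```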

18. Set the environment variables needed by the Hadoop setup. You may run the following commands, or just export the variables manually, but in the latter case they will only be valid for the current session.

#set java home and hadoop home in bash
echo 'export JAVA_HOME=/usr/lib/jvm/java-6-sun' >> ~/.bashrc
echo 'export HADOOP_HOME=~/hadoop-1.2.1' >> ~/.bashrc
echo 'export PATH=~/hadoop-1.2.1/bin:$PATH' >> ~/.bashrc

source ~/.bashrc
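You can verify that the variables took effect in the current shell. The check below only inspects PATH and does not require Hadoop to actually be installed:

```shell
# simulate the additions from ~/.bashrc and check that PATH picks them up
export HADOOP_HOME=~/hadoop-1.2.1
export PATH="$HADOOP_HOME/bin:$PATH"
echo "$PATH" | grep -q 'hadoop-1.2.1/bin' && echo "PATH OK"
```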

19. Now you can start and stop Hadoop using the following scripts, respectively. Note that before the very first start, HDFS typically needs to be formatted once with "hadoop namenode -format".

start-all.sh
stop-all.sh


Alternatively, you may download and run the following script on your Ubuntu setup to perform all of the operations listed above.

cd ~/
wget http://www.cs.hut.fi/~dongelr1/hadoopscript.sh
chmod +x hadoopscript.sh
./hadoopscript.sh

WordCount Example

20. Run the following commands to download the WordCount example. If you already ran hadoopscript.sh, you may skip this step.

#get scripts and wordcount example
cd ~/
wget http://www.cs.hut.fi/~dongelr1/WordCount.tar
tar -xvf WordCount.tar
rm WordCount.tar
cd WordCount
chmod +x start_commands
chmod +x run_commands

21. Run start_commands to start Hadoop, as follows.

cd ~/WordCount
./start_commands

22. Check http://localhost:50070/dfshealth.jsp to see whether any data nodes are live. If there are no live nodes, run "stop-all.sh", remove everything in tmp with "rm -r /tmp/*", and redo step 21.

23. Run "run_commands" to execute the WordCount example; you can examine the commands with "cat run_commands". The run_commands script makes an input directory in the Hadoop Distributed File System and puts file1.txt and file2.txt into that directory. It then compiles WordCount.java into the wordcount_classes folder and creates a wordcount.jar file from the compiled classes. Finally, it executes a Hadoop job with wordcount.jar that counts the words in the input/ directory and writes the results to the output/ directory.

cd ~/WordCount
./run_commands
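For reference, the sequence described in step 23 likely corresponds to commands along these lines. This is a hedged reconstruction; the exact flags, file names, and jar paths inside run_commands may differ, so check them with "cat run_commands":

```
# put the input files into HDFS
hadoop fs -mkdir input
hadoop fs -put file1.txt file2.txt input
# compile WordCount.java and package it into a jar
mkdir -p wordcount_classes
javac -classpath ~/hadoop-1.2.1/hadoop-core-1.2.1.jar -d wordcount_classes WordCount.java
jar -cvf wordcount.jar -C wordcount_classes/ .
# run the job: count words in input/ and write results to output/
hadoop jar wordcount.jar WordCount input output
hadoop fs -cat output/part-00000
```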
Friday, October 11, 2013
Posted by Ridvan Döngelci

Virtual Box Port Forwarding For Hadoop

1. Go to the main screen in VirtualBox.
2. Select the virtual machine whose ports you want to forward and click 'Settings'.
3. Select 'Network' from the left panel in the settings window.

4. Click 'Advanced' to see more options.
5. Click 'Port Forwarding'.
6. Click the green plus to add rules.
7. Add the rules seen in the picture.
  (Map host port 2222 to guest port 22, and host ports 50070, 50060, 50075, and 50030 to the same guest ports.)
8. Click 'OK'.
9. Return to the main screen.

Tuesday, October 1, 2013
Posted by Ridvan Döngelci

Bussit

A lightweight public transportation widget for the Helsinki metropolitan area.

Comments and bug reports are welcome, as are feature suggestions.

https://play.google.com/store/apps/details?id=com.bussitpro

Bussit is also available for Tampere:
https://play.google.com/store/apps/details?id=com.bussittampere




F.A.Q

How Can I Add Favorites?

There are several ways to add favorites, but probably the easiest is to tap the heart icon on the results page, as shown below.
Other ways of adding favorites are: tapping entries on the history screen, long-pressing a point on the map and tapping the title to show the menu, and tapping the heart icon on the stop screen.


Thursday, June 13, 2013
Posted by Ridvan Döngelci

Mobile Website Debugging

Manually debugging a mobile webpage takes a lot of effort: you need to run an emulator, which is usually slow, and debugging tools are often missing. Opera has a solution for this problem: Opera Mobile Emulator. It starts quickly, and you can test against many device settings.
 
Furthermore, it supports Opera Dragonfly for remote debugging. This is explained in the reference below, but I would like to give a quick overview. First, run the Opera web browser, open Dragonfly with Ctrl+Shift+I, and start listening on the desired socket. Then open Opera Mobile Emulator and enter opera:debug in the address bar. You will see the following screen:
 
Then connect to Dragonfly, type the address of the webpage you want to test into the same tab, and you will see the data go through Dragonfly. Great!
References: http://www.opera.com/dragonfly/documentation/remote/
Saturday, April 27, 2013
Posted by Ridvan Döngelci

Linearization of Multiplication

So multiplication is generally not linearizable. However, you can linearize binary multiplication. Assume x and y are two binary variables (either 0 or 1); we can then linearize z = x*y as follows:

z <= x; 
z <= y;
z >= x+y-1;
z >= 0;  

Furthermore, you can use this trick to multiply two variables x and y with values in {-1,1}. The key observation is that if x and y are equal, the product z is 1; otherwise it is -1. We can map a {-1,1} variable x to a binary variable by (x+1)/2, or conversely generate a {-1,1} variable from a binary one by 2*binary - 1.

After obtaining the binary variables x' and y', we multiply them as above to get z'. We may also flip x' and y' to get x'' = 1 - x' and y'' = 1 - y', and multiply those in the same way to get z''. If we add z' and z'', we get a sum which is 1 if x and y have the same value (1,1 or -1,-1) and 0 if they differ (-1,1 or 1,-1). We can map sum to the correct result with the formula final_result = 2*sum - 1. Below you may find a truth table:

Truth Table

x'  y'  z'  x''  y''  z''  sum   x    y    z
0   0   0   1    1    1    1    -1   -1    1
0   1   0   1    0    0    0    -1    1   -1
1   0   0   0    1    0    0     1   -1   -1
1   1   1   0    0    0    1     1    1    1
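The binary linearization can also be checked exhaustively. The loop below enumerates every binary x and y and prints each z in {0,1} admitted by the four constraints; exactly one z survives for each pair, and it equals x*y:

```shell
# for each binary x, y, find every z in {0,1} satisfying
#   z <= x, z <= y, z >= x + y - 1, z >= 0
for x in 0 1; do
  for y in 0 1; do
    for z in 0 1; do
      if [ "$z" -le "$x" ] && [ "$z" -le "$y" ] \
         && [ "$z" -ge "$((x + y - 1))" ] && [ "$z" -ge 0 ]; then
        echo "x=$x y=$y -> z=$z"
      fi
    done
  done
done
```

The four printed lines show z = 0 except when x = y = 1, confirming the constraints force z = x*y.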
Friday, April 26, 2013
Posted by Ridvan Döngelci
