Wednesday, January 20, 2016

Hadoop-HBase Installation

STEPS TO INSTALL HADOOP-1.2.0 SINGLE NODE CLUSTER IN UBUNTU 14.04 LTS
1] First log in as the super user from your normal user; only then start the installation.
Ex: praveen@delllaptop] sudo su
      After giving the super user password, the prompt changes to
      root@delllaptop]
2] Connect to internet and Update ubuntu by giving the command
                  sudo apt-get update
3] Install java from internet by giving command
                sudo apt-get install openjdk-7-jdk
    check for its  installation  in the path /usr/lib/jvm/java-1.7.0-openjdk-amd64
4] Install the openssh server, create keys, and configure passwordless SSH
   sudo apt-get install openssh-server
   ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
   cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
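The key-generation step can be rehearsed safely in a throwaway directory first; the temp path below is just for illustration, the real keys go in ~/.ssh as shown above.

```shell
# Generate a passphrase-less RSA key pair in a throwaway directory and
# authorize it, mirroring the real ~/.ssh steps without touching them.
tmp=$(mktemp -d)
ssh-keygen -t rsa -P "" -f "$tmp/id_rsa" -q
cat "$tmp/id_rsa.pub" >> "$tmp/authorized_keys"
ls "$tmp"
```

The directory should now hold id_rsa, id_rsa.pub and authorized_keys; after the real setup, `ssh localhost` should log in without asking for a password.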

5] Create a directory named hadoop in /usr/local/
                  sudo mkdir /usr/local/hadoop
                  Copy the downloaded tar file hadoop-1.2.0.tar.gz from home to the hadoop directory
6] Extract the tar file inside /usr/local/hadoop
                        sudo tar -zxvf hadoop-1.2.0.tar.gz
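The extract step can be tried out with a throwaway archive so nothing real is modified; the file contents here are made up.

```shell
# Build a dummy hadoop-1.2.0.tar.gz in a temp dir, then extract it the
# same way as in step 6.
work=$(mktemp -d)
cd "$work"
mkdir hadoop-1.2.0
echo demo > hadoop-1.2.0/README
tar -zcf hadoop-1.2.0.tar.gz hadoop-1.2.0
rm -r hadoop-1.2.0
tar -zxvf hadoop-1.2.0.tar.gz   # recreates the hadoop-1.2.0 directory
cat hadoop-1.2.0/README         # prints: demo
```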
 7] Set the path variable by editing the bashrc file; type the command
                        sudo nano $HOME/.bashrc
                        go to the end of the file and add these two lines
                        export HADOOP_HOME=/usr/local/hadoop/hadoop-1.2.0
                        export PATH=$PATH:$HADOOP_HOME/bin

 8]    Run the bash shell from the terminal
exec bash
9]    Verify the path at the terminal
                           echo $PATH
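As a quick sanity check, the two exports can be exercised in any shell and the result inspected (assuming the hadoop-1.2.0 install directory used above):

```shell
# Reproduce the two .bashrc lines and confirm the Hadoop bin
# directory is now on PATH.
export HADOOP_HOME=/usr/local/hadoop/hadoop-1.2.0
export PATH=$PATH:$HADOOP_HOME/bin
echo "$PATH" | tr ':' '\n' | grep hadoop   # prints /usr/local/hadoop/hadoop-1.2.0/bin
```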
10] Set the hadoop configuration files by navigating to /usr/local/hadoop/hadoop-1.2.0/conf
            a) hadoop-env.sh ------- to set up the Java environment, i.e. to tell Hadoop which Java installation to use
                        sudo nano hadoop-env.sh
Remove the comment from the JAVA_HOME line and edit the Java installation path
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-i386 (for 32-bit virtual machines)
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64 (for 64-bit desktops)

b) core-site.xml ----to configure the name node and tmp directory
                        sudo nano core-site.xml
                        add these lines
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:10001</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/hadoop-1.2.0/tmp</value>
</property>

c) mapred-site.xml --------- to set the jobtracker
                                    sudo nano mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:10002</value>
</property>


11] Create a tmp directory under /usr/local/hadoop/hadoop-1.2.0
            sudo mkdir tmp
Give permission for the user (replace harish with your own username)
sudo chown harish /usr/local/hadoop/hadoop-1.2.0
sudo chmod 777 /usr/local/hadoop/hadoop-1.2.0/tmp
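The effect of that chmod can be seen on any directory; this sketch uses a temp dir rather than the real install path.

```shell
# Create a tmp dir, open it up with mode 777, and read the bits back.
base=$(mktemp -d)
mkdir "$base/tmp"
chmod 777 "$base/tmp"
stat -c '%a' "$base/tmp"   # prints: 777
```

Mode 777 gives every user read, write, and execute on the directory, which is why the hadoop user can later write temporary files there.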
12] Format the namenode
Go to the bin directory, then run
hadoop namenode -format
13] Start all hadoop daemons by giving the command
            start-all.sh
To start a specific daemon instead, use
            hadoop-daemon.sh start namenode
 14] Check the browser interface by accessing the NameNode web page
Go to a web browser and type http://localhost:50070
 15] To run the default word count program in Ubuntu, follow these steps:
              create a file by giving the command sudo nano one.txt and type some contents in it.
               create an input folder under hdfs by giving the command hadoop fs -mkdir input
               copy one.txt into the input directory by giving the command hadoop fs -copyFromLocal one.txt input
               check for the copied file by giving the command hadoop fs -ls input
               Then run the word count program from /usr/local/hadoop/hadoop-1.2.0 by giving the command
            hadoop jar hadoop-examples-1.2.0.jar wordcount input output
            Then look for the output in the output folder by typing the command
               hadoop fs -ls output
            Then open the part-r-00000 file by typing the command
            hadoop fs -cat output/part-r-00000
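Without a running cluster, the result the wordcount job produces can be previewed with plain shell tools; the sample text here is made up.

```shell
# Build a tiny input file and count words locally, mimicking the
# word<TAB>count lines the job writes to part-r-00000.
cd "$(mktemp -d)"
printf 'hello hadoop\nhello hbase\n' > one.txt
tr -s ' ' '\n' < one.txt | sort | uniq -c | awk '{print $2 "\t" $1}'
```

For this input the output is hadoop 1, hbase 1, hello 2, one word per line, which is exactly the shape of the MapReduce result file.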

Installing HBase in Standalone Mode
1. Download the latest stable version of HBase from http://www.interior-dsgn.com/apache/hbase/stable/ using the "wget" command, and extract it using the tar "zxvf" command. See the following commands:
$ cd /usr/local/
$ wget http://www.interior-dsgn.com/apache/hbase/stable/hbase-0.94.8.tar.gz
$ tar -zxvf hbase-0.94.8.tar.gz
2. Configuring HBase in Standalone Mode:
hbase-env.sh:
Set JAVA_HOME for HBase: open the hbase-env.sh file from the conf folder and change the existing path to your current JAVA_HOME value as shown below:

cd /usr/local/HBase/conf
gedit hbase-env.sh

This will open the env.sh file of HBase. Now replace the existing JAVA_HOME value with your current value as shown below.

export JAVA_HOME=/usr/lib/jvm/java-1.7.0

hbase-site.xml:
This is the main configuration file of HBase. Set the data directory to an appropriate location by opening the HBase home folder in /usr/local/HBase. Inside the conf folder, you will find several files; open the hbase-site.xml file as shown below.

# cd /usr/local/HBase/conf
# gedit hbase-site.xml

Inside the hbase-site.xml file, you will find the <configuration> and </configuration> tags.
Within them, set the HBase directory under the property key with the name "hbase.rootdir" as
shown below.
<configuration>
//Here you have to set the path where you want HBase to store its files.

<property>
<name>hbase.rootdir</name>
<value>file:/home/hadoop/HBase/HFiles</value>
</property>

 //Here you have to set the path where you want HBase to store its built in zookeeper files.
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
</configuration>
With this, the HBase installation and configuration part is successfully complete.
We can start HBase by using start-hbase.sh script provided in the bin folder of HBase. For that, open HBase Home Folder and run HBase start script as shown below:

$ cd /usr/local/HBase/bin
$ ./start-hbase.sh
If everything goes well, the HBase start script will print a message saying that HBase has started:
starting master, logging to /usr/local/HBase/bin/../logs/hbase-tpmaster-localhost.localdomain.out
Starting HBase Shell
After installing HBase successfully, you can start the HBase shell. Open the terminal and log in as the super user.
Start Hadoop File System:
Browse through Hadoop home sbin folder and start Hadoop file system as shown below:
$ cd $HADOOP_HOME/sbin
$ start-all.sh
Start HBase:

Browse through the HBase root directory bin folder and start HBase.

$ cd /usr/local/HBase
$ ./bin/start-hbase.sh
Start HBase Backup Master Server:

The script lives in the same directory. Start it as shown below:

$ ./bin/local-master-backup.sh start 2    (the number identifies the specific backup master)

Start Region Server:

Start the region server as shown below:
$ ./bin/local-regionservers.sh start 3

Start HBase Shell

You can start HBase shell using the following command:
$ cd bin
$ ./hbase shell

This will give you the HBase Shell Prompt as shown below.
2014-12-09 14:24:27,526 INFO [main] Configuration.deprecation:
hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.8-hadoop2, r6cfc8d064754251365e070a10a82eb169956d5fe, Fri
Nov 14 18:26:29 PST 2014
hbase(main):001:0>
HBase Web Interface:

To access the web interface of HBase, type the following url in the browser:

http://localhost:60010

This interface lists your currently running Region servers, backup masters and HBase tables.
Setting Java Environment:

We can also communicate with HBase using Java libraries, but before accessing HBase using Java
API you need to set classpath for those libraries.

Setting the Classpath:

Before proceeding with programming, set the classpath to HBase libraries in .bashrc file.
Open .bashrc in any of the editors as shown below.

$ gedit ~/.bashrc

Set the classpath for the HBase libraries (the lib folder in HBase) in it as shown below.
export CLASSPATH=$CLASSPATH:/home/hadoop/hbase/lib/*

This is to prevent the "class not found" exception while accessing HBase using the Java API.
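A quick way to see what that export expands to; /home/hadoop/hbase/lib is the tutorial's assumed install location, so adjust it to yours.

```shell
# Append the HBase lib wildcard to CLASSPATH and inspect the result.
# The wildcard is kept quoted so the shell does not glob-expand it.
export CLASSPATH="$CLASSPATH:/home/hadoop/hbase/lib/*"
echo "$CLASSPATH"
```

The `lib/*` wildcard is expanded by the JVM itself (Java 6+), so every jar in that folder ends up on the classpath.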
HBase shell commands:
HBase shell commands are mainly categorized into 6 parts.

1)      General  HBase shell commands:

status
Show cluster status. Can be 'summary', 'simple', or 'detailed'.
The default is 'summary'.
hbase> status
hbase> status 'simple'
hbase> status 'summary'
hbase> status 'detailed'

version
Output this HBase version. Usage:

hbase> version

whoami
Show the current hbase user. Usage:
hbase> whoami

2)      Tables Management commands:

alter
Alter column family schema; pass table name and a dictionary
specifying new column family schema.
hbase> alter 't1', NAME => 'f1', VERSIONS => 5
You can operate on several column families:
hbase> alter 't1', 'f1', {NAME => 'f2', IN_MEMORY => true}, {NAME => 'f3', VERSIONS => 5}
To delete the 'f1' column family in table 't1', use one of:
hbase> alter 't1', NAME => 'f1', METHOD => 'delete'
hbase> alter 't1', 'delete' => 'f1'
You can also change table-scope attributes like MAX_FILESIZE, READONLY,
MEMSTORE_FLUSHSIZE, DEFERRED_LOG_FLUSH, etc. These can be put at the end;
for example, to change the max size of a region to 128MB, do:

hbase> alter 't1', MAX_FILESIZE => '134217728'

hbase> alter 't1', CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}

hbase> alter 't1', {NAME => 'f2', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}

You can also remove a table-scope attribute:

hbase> alter 't1', METHOD => 'table_att_unset', NAME => 'MAX_FILESIZE'

hbase> alter 't1', METHOD => 'table_att_unset', NAME => 'coprocessor$1'

There could be more than one alteration in one command:

hbase> alter 't1', { NAME => 'f1', VERSIONS => 3 },
{ MAX_FILESIZE => '134217728' }, { METHOD => 'delete', NAME => 'f2' },
OWNER => 'johndoe', METADATA => { 'mykey' => 'myvalue' }

create
Create table; pass table name, a dictionary of specifications per
column family, and optionally a dictionary of table configuration.
hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase> # The above in shorthand would be the following:
hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
Table configuration options can be put at the end.

describe
Describe the named table.

hbase> describe 't1'

disable
Start disabling the named table.
hbase> disable 't1'

disable_all
Disable all tables matching the given regex.
hbase> disable_all 't.*'

is_disabled
Verifies whether the named table is disabled.
hbase> is_disabled 't1'

drop
Drop the named table. The table must first be disabled.
hbase> drop 't1'

drop_all
Drop all of the tables matching the given regex.
hbase> drop_all 't.*'

enable
Start enabling the named table.
hbase> enable 't1'

enable_all
Enable all of the tables matching the given regex.
hbase> enable_all 't.*'

is_enabled
Verifies whether the named table is enabled.
hbase> is_enabled 't1'

exists
Does the named table exist?
hbase> exists 't1'

list
List all tables in hbase. An optional regular expression parameter can
be used to filter the output.
hbase> list
hbase> list 'abc.*'

show_filters
Show all the filters in hbase.
hbase> show_filters

alter_status
Get the status of the alter command. Indicates the number of regions of the table that have received the updated schema. Pass the table name.
hbase> alter_status 't1'



