Wednesday, January 20, 2016

Hadoop-HBase Installation

STEPS TO INSTALL HADOOP-1.2.0 SINGLE NODE CLUSTER IN UBUNTU 14.04 LTS
1] First log in as the super user from your normal user; only then start the installation.
Ex: praveen@delllaptop] sudo su
      After giving the super user password, the prompt changes to
      root@delllaptop]
2] Connect to internet and Update ubuntu by giving the command
                  sudo apt-get update
3] Install java from internet by giving command
                sudo apt-get install openjdk-7-jdk
    check for its  installation  in the path /usr/lib/jvm/java-1.7.0-openjdk-amd64
4] Install the openssh server, create keys, and configure passwordless SSH
   sudo apt-get install openssh-server
   ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
   cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
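The key-generation step can be rehearsed safely in a throwaway directory first; the temp path below is just for illustration, the real keys go in ~/.ssh as shown above.

```shell
# Generate a passphrase-less RSA key pair in a throwaway directory and
# authorize it, mirroring the real ~/.ssh steps without touching them.
tmp=$(mktemp -d)
ssh-keygen -t rsa -P "" -f "$tmp/id_rsa" -q
cat "$tmp/id_rsa.pub" >> "$tmp/authorized_keys"
ls "$tmp"
```

The directory should now hold id_rsa, id_rsa.pub and authorized_keys; after the real setup, `ssh localhost` should log in without asking for a password.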

5] Create a directory named hadoop in /usr/local/
                  sudo mkdir /usr/local/hadoop
                  Copy the downloaded tar file hadoop-1.2.0.tar.gz from home to the hadoop directory
6] Extract the tar file inside /usr/local/hadoop
                        sudo tar -zxvf hadoop-1.2.0.tar.gz
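The extract step can be tried out with a throwaway archive so nothing real is modified; the file contents here are made up.

```shell
# Build a dummy hadoop-1.2.0.tar.gz in a temp dir, then extract it the
# same way as in step 6.
work=$(mktemp -d)
cd "$work"
mkdir hadoop-1.2.0
echo demo > hadoop-1.2.0/README
tar -zcf hadoop-1.2.0.tar.gz hadoop-1.2.0
rm -r hadoop-1.2.0
tar -zxvf hadoop-1.2.0.tar.gz   # recreates the hadoop-1.2.0 directory
cat hadoop-1.2.0/README         # prints: demo
```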
 7] Set the path variable by editing the bashrc file; type the command
                        sudo nano $HOME/.bashrc
                        go to the end of the file and add these two lines
                        export HADOOP_HOME=/usr/local/hadoop/hadoop-1.2.0
                        export PATH=$PATH:$HADOOP_HOME/bin

 8]    Run the bash shell from the terminal
exec bash
9]    Verify the path at the terminal
                           echo $PATH
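As a quick sanity check, the two exports can be exercised in any shell and the result inspected (assuming the hadoop-1.2.0 install directory used above):

```shell
# Reproduce the two .bashrc lines and confirm the Hadoop bin
# directory is now on PATH.
export HADOOP_HOME=/usr/local/hadoop/hadoop-1.2.0
export PATH=$PATH:$HADOOP_HOME/bin
echo "$PATH" | tr ':' '\n' | grep hadoop   # prints /usr/local/hadoop/hadoop-1.2.0/bin
```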
10] Set the hadoop configuration files by navigating to /usr/local/hadoop/hadoop-1.2.0/conf
            a) hadoop-env.sh ------- to set up the Java environment, i.e. to tell Hadoop which Java installation to use
                        sudo nano hadoop-env.sh
Remove the comment from the JAVA_HOME line and edit the Java installation path
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-i386 (for 32-bit virtual machines)
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64 (for 64-bit desktops)

b) core-site.xml ----to configure the name node and tmp directory
                        sudo nano core-site.xml
                        add these lines
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:10001</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/hadoop-1.2.0/tmp</value>
</property>

c) mapred-site.xml --------- to set the jobtracker
                                    sudo nano mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:10002</value>
</property>


11] Create a tmp directory under /usr/local/hadoop/hadoop-1.2.0
            sudo mkdir tmp
Give permission for the user (replace harish with your own username)
sudo chown harish /usr/local/hadoop/hadoop-1.2.0
sudo chmod 777 /usr/local/hadoop/hadoop-1.2.0/tmp
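The effect of that chmod can be seen on any directory; this sketch uses a temp dir rather than the real install path.

```shell
# Create a tmp dir, open it up with mode 777, and read the bits back.
base=$(mktemp -d)
mkdir "$base/tmp"
chmod 777 "$base/tmp"
stat -c '%a' "$base/tmp"   # prints: 777
```

Mode 777 gives every user read, write, and execute on the directory, which is why the hadoop user can later write temporary files there.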
12] Format the namenode
Go to the bin directory, then run
hadoop namenode -format
13] Start all hadoop daemons by giving the command
            start-all.sh
To start a specific daemon instead, use
            hadoop-daemon.sh start namenode
 14] Check the browser interface by accessing the NameNode web page
Go to a web browser and type http://localhost:50070
 15] To run the default word count program in Ubuntu, follow these steps:
              create a file by giving the command sudo nano one.txt and type some contents in it.
               create an input folder under hdfs by giving the command hadoop fs -mkdir input
               copy one.txt into the input directory by giving the command hadoop fs -copyFromLocal one.txt input
               check for the copied file by giving the command hadoop fs -ls input
               Then run the word count program from /usr/local/hadoop/hadoop-1.2.0 by giving the command
            hadoop jar hadoop-examples-1.2.0.jar wordcount input output
            Then look for the output in the output folder by typing the command
               hadoop fs -ls output
            Then open the part-r-00000 file by typing the command
            hadoop fs -cat output/part-r-00000
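Without a running cluster, the result the wordcount job produces can be previewed with plain shell tools; the sample text here is made up.

```shell
# Build a tiny input file and count words locally, mimicking the
# word<TAB>count lines the job writes to part-r-00000.
cd "$(mktemp -d)"
printf 'hello hadoop\nhello hbase\n' > one.txt
tr -s ' ' '\n' < one.txt | sort | uniq -c | awk '{print $2 "\t" $1}'
```

For this input the output is hadoop 1, hbase 1, hello 2, one word per line, which is exactly the shape of the MapReduce result file.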

Installing HBase in Standalone Mode
1. Download the latest stable version of HBase from http://www.interior-dsgn.com/apache/hbase/stable/ using the "wget" command, and extract it using the tar "zxvf" command. See the following commands:
$ cd /usr/local/
$ wget http://www.interior-dsgn.com/apache/hbase/stable/hbase-0.94.8.tar.gz
$ tar -zxvf hbase-0.94.8.tar.gz
2. Configuring HBase in Standalone Mode:
hbase-env.sh:
Set JAVA_HOME for HBase: open the hbase-env.sh file from the conf folder and change the existing path to your current JAVA_HOME value as shown below:

cd /usr/local/HBase/conf
gedit hbase-env.sh

This will open the env.sh file of HBase. Now replace the existing JAVA_HOME value with your current value as shown below.

export JAVA_HOME=/usr/lib/jvm/java-1.7.0

hbase-site.xml:
This is the main configuration file of HBase. Set the data directory to an appropriate location by opening the HBase home folder in /usr/local/HBase. Inside the conf folder, you will find several files; open the hbase-site.xml file as shown below.

# cd /usr/local/HBase/conf
# gedit hbase-site.xml

Inside the hbase-site.xml file, you will find the <configuration> and </configuration> tags.
Within them, set the HBase directory under the property key with the name "hbase.rootdir" as
shown below.
<configuration>
//Here you have to set the path where you want HBase to store its files.

<property>
<name>hbase.rootdir</name>
<value>file:/home/hadoop/HBase/HFiles</value>
</property>

 //Here you have to set the path where you want HBase to store its built in zookeeper files.
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
</configuration>
With this, the HBase installation and configuration part is successfully complete.
We can start HBase by using start-hbase.sh script provided in the bin folder of HBase. For that, open HBase Home Folder and run HBase start script as shown below:

$ cd /usr/local/HBase/bin
$ ./start-hbase.sh
If everything goes well, the HBase start script will print a message saying that HBase has started:
starting master, logging to /usr/local/HBase/bin/../logs/hbase-tpmaster-localhost.localdomain.out
Starting HBase Shell
After installing HBase successfully, you can start the HBase shell. Open the terminal and log in as the super user.
Start Hadoop File System:
Browse through Hadoop home sbin folder and start Hadoop file system as shown below:
$ cd $HADOOP_HOME/sbin
$ start-all.sh
Start HBase:

Browse through the HBase root directory bin folder and start HBase.

$ cd /usr/local/HBase
$ ./bin/start-hbase.sh
Start HBase Backup Master Server:

The script lives in the same directory. Start it as shown below:

$ ./bin/local-master-backup.sh start 2    (the number identifies the specific backup master)

Start Region Server:

Start the region server as shown below:
$ ./bin/local-regionservers.sh start 3

Start HBase Shell

You can start HBase shell using the following command:
$ cd bin
$ ./hbase shell

This will give you the HBase Shell Prompt as shown below.
2014-12-09 14:24:27,526 INFO [main] Configuration.deprecation:
hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.8-hadoop2, r6cfc8d064754251365e070a10a82eb169956d5fe, Fri
Nov 14 18:26:29 PST 2014
hbase(main):001:0>
HBase Web Interface:

To access the web interface of HBase, type the following url in the browser:

http://localhost:60010

This interface lists your currently running Region servers, backup masters and HBase tables.
Setting Java Environment:

We can also communicate with HBase using Java libraries, but before accessing HBase using Java
API you need to set classpath for those libraries.

Setting the Classpath:

Before proceeding with programming, set the classpath to HBase libraries in .bashrc file.
Open .bashrc in any of the editors as shown below.

$ gedit ~/.bashrc

Set the classpath for the HBase libraries (the lib folder in HBase) in it as shown below.
export CLASSPATH=$CLASSPATH:/home/hadoop/hbase/lib/*

This is to prevent the "class not found" exception while accessing HBase using the Java API.
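A quick way to see what that export expands to; /home/hadoop/hbase/lib is the tutorial's assumed install location, so adjust it to yours.

```shell
# Append the HBase lib wildcard to CLASSPATH and inspect the result.
# The wildcard is kept quoted so the shell does not glob-expand it.
export CLASSPATH="$CLASSPATH:/home/hadoop/hbase/lib/*"
echo "$CLASSPATH"
```

The `lib/*` wildcard is expanded by the JVM itself (Java 6+), so every jar in that folder ends up on the classpath.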
HBase shell commands:
HBase shell commands are mainly categorized into 6 parts.

1)      General  HBase shell commands:

status
Show cluster status. Can be 'summary', 'simple', or 'detailed'.
The default is 'summary'.
hbase> status
hbase> status 'simple'
hbase> status 'summary'
hbase> status 'detailed'

version
Output this HBase version. Usage:

hbase> version

whoami
Show the current hbase user. Usage:
hbase> whoami

2)      Tables Management commands:

alter
Alter column family schema; pass table name and a dictionary
specifying new column family schema.
hbase> alter 't1', NAME => 'f1', VERSIONS => 5
You can operate on several column families:
hbase> alter 't1', 'f1', {NAME => 'f2', IN_MEMORY => true}, {NAME => 'f3', VERSIONS => 5}
To delete the 'f1' column family in table 't1', use one of:
hbase> alter 't1', NAME => 'f1', METHOD => 'delete'
hbase> alter 't1', 'delete' => 'f1'
You can also change table-scope attributes like MAX_FILESIZE, READONLY,
MEMSTORE_FLUSHSIZE, DEFERRED_LOG_FLUSH, etc. These can be put at the end;
for example, to change the max size of a region to 128MB, do:

hbase> alter 't1', MAX_FILESIZE => '134217728'

hbase> alter 't1', CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}

hbase> alter 't1', {NAME => 'f2', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}

You can also remove a table-scope attribute:

hbase> alter 't1', METHOD => 'table_att_unset', NAME => 'MAX_FILESIZE'

hbase> alter 't1', METHOD => 'table_att_unset', NAME => 'coprocessor$1'

There could be more than one alteration in one command:

hbase> alter 't1', { NAME => 'f1', VERSIONS => 3 },
{ MAX_FILESIZE => '134217728' }, { METHOD => 'delete', NAME => 'f2' },
OWNER => 'johndoe', METADATA => { 'mykey' => 'myvalue' }

create
Create table; pass table name, a dictionary of specifications per
column family, and optionally a dictionary of table configuration.
hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase> # The above in shorthand would be the following:
hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
Table configuration options can be put at the end.

describe
Describe the named table.

hbase> describe 't1'

disable
Start disabling the named table.
hbase> disable 't1'

disable_all
Disable all tables matching the given regex.
hbase> disable_all 't.*'

is_disabled
Verifies whether the named table is disabled.
hbase> is_disabled 't1'

drop
Drop the named table. The table must first be disabled.
hbase> drop 't1'

drop_all
Drop all of the tables matching the given regex.
hbase> drop_all 't.*'

enable
Start enabling the named table.
hbase> enable 't1'

enable_all
Enable all of the tables matching the given regex.
hbase> enable_all 't.*'

is_enabled
Verifies whether the named table is enabled.
hbase> is_enabled 't1'

exists
Does the named table exist?
hbase> exists 't1'

list
List all tables in hbase. An optional regular expression parameter can
be used to filter the output.
hbase> list
hbase> list 'abc.*'

show_filters
Show all the filters in hbase.
hbase> show_filters

alter_status
Get the status of the alter command. Indicates the number of regions of the table that have received the updated schema. Pass the table name.
hbase> alter_status 't1'



