Introduction
HBase is a column-oriented database that is an open-source implementation of Google's BigTable storage architecture. It can manage structured and semi-structured data and has built-in features such as scalability, versioning, compression and garbage collection. Since it uses write-ahead logging and distributed configuration, it can provide fault tolerance and quick recovery from individual server failures. HBase is built on top of Hadoop/HDFS, and the data stored in HBase can be manipulated using Hadoop's MapReduce capabilities. Let's now take a look at how HBase (a column-oriented database) differs from some other data structures and concepts that we are familiar with, starting with row-oriented vs. column-oriented data stores. In a row-oriented data store, a row is a unit of data that is read or written together. In a column-oriented data store, the data in a column is stored together and can hence be retrieved quickly.
Go to HBase mode:
$ hbase shell
List all the tables:
hbase> list
Create an HBase table 'cars' with one column family 'vi':
hbase> create 'cars', 'vi'
Let's insert 3 column qualifiers (make, model, year) and the associated values into the first row (row1):
hbase> put 'cars', 'row1', 'vi:make', 'BMW'
hbase> put 'cars', 'row1', 'vi:model', '5 series'
hbase> put 'cars', 'row1', 'vi:year', '2012'
Now let's add a second row:
hbase> put 'cars', 'row2', 'vi:make', 'Ferrari'
hbase> put 'cars', 'row2', 'vi:model', 'e series'
hbase> put 'cars', 'row2', 'vi:year', '2012'
Now let's add a third row:
hbase> put 'cars', 'row3', 'vi:make', 'Honda'
hbase> put 'cars', 'row3', 'vi:model', 'f series'
hbase> put 'cars', 'row3', 'vi:year', '2013'
Scan the table:
hbase> scan 'cars'
The next scan we'll run limits the results to the make column qualifier:
hbase> scan 'cars', {COLUMNS => ['vi:make']}
Limit the scan to 1 row to demonstrate how LIMIT works:
hbase> scan 'cars', {COLUMNS => ['vi:make'], LIMIT => 1}
We'll start by getting all columns in row1:
hbase> get 'cars', 'row1'
You should see output similar to:
COLUMN                CELL
 vi:make              timestamp=1344817012999, value=BMW
 vi:model             timestamp=1344817020843, value=5 series
 vi:year              timestamp=1344817033611, value=2012
3 row(s) in 0.0080 seconds
Disable and drop the table:
hbase> disable 'cars'
hbase> drop 'cars'
Exit the shell:
hbase> exit
Row-oriented data stores:
- Data is stored and retrieved one row at a time and hence could read unnecessary data if only some of the data in a row is required.
- Easy to read and write records
- Well suited for OLTP systems
- Not efficient in performing operations applicable to the entire dataset and hence aggregation is an expensive operation
- Typical compression mechanisms provide less effective results than those on column-oriented data stores
Column-oriented data stores:
- Data is stored and retrieved in columns and hence can read only relevant data if only some data is required
- Read and Write are typically slower operations
- Well suited for OLAP systems
- Can efficiently perform operations applicable to the entire dataset and hence enables aggregation over many rows and columns
- Permits high compression rates due to few distinct values in columns
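To make the difference concrete, here is a small Python sketch (illustrative only, not HBase code) that stores the same records both row-wise and column-wise, and shows why a column-wise layout makes aggregating one field cheap:

```python
# Illustrative sketch: the same records in a row-oriented and a
# column-oriented layout (plain Python, not HBase itself).
records = [
    {"make": "BMW", "model": "5 series", "year": 2012},
    {"make": "Ferrari", "model": "e series", "year": 2012},
    {"make": "Honda", "model": "f series", "year": 2013},
]

# Row-oriented: each row is stored (and read) as one unit.
row_store = list(records)

# Column-oriented: each column's values are stored together.
col_store = {key: [r[key] for r in records] for key in records[0]}

# Aggregating one field from the row store touches every full row...
avg_year_rows = sum(r["year"] for r in row_store) / len(row_store)

# ...while the column store reads only the 'year' column.
avg_year_cols = sum(col_store["year"]) / len(col_store["year"])

assert avg_year_rows == avg_year_cols
print(avg_year_cols)
```

The column store also groups many similar values together (e.g. all years), which is why it compresses so much better than a row layout.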
Relational Databases vs. HBase
When talking of data
stores, we first think of Relational Databases with structured data
storage and a sophisticated query engine. However, a Relational Database
incurs a big penalty to improve performance as the data size increases.
HBase, on the other hand, is designed from the ground up to provide
scalability and partitioning to enable efficient data structure
serialization, storage and retrieval.
Differences between a Relational Database and HBase are:
Relational Database:
- Is Based on a Fixed Schema
- Is a Row-oriented datastore
- Is designed to store Normalized Data
- Contains thin tables
- Has no built-in support for partitioning.
HBase:
- Is Schema-less
- Is a Column-oriented datastore
- Is designed to store Denormalized Data
- Contains wide and sparsely populated tables
- Supports Automatic Partitioning
HDFS vs. HBase
HDFS is a distributed
file system that is well suited for storing large files. It’s designed
to support batch processing of data but doesn’t provide fast individual
record lookups. HBase is built on top of HDFS and is designed to provide
access to single rows of data in large tables.
Differences between HDFS and HBase are
HDFS:
- Is suited for high-latency batch processing operations
- Data is primarily accessed through MapReduce
- Is designed for batch processing and hence doesn’t have a concept of random reads/writes
HBase:
- Is built for Low Latency operations
- Provides access to single rows from billions of records
- Data is accessed through shell commands, Client APIs in Java, REST, Avro or Thrift
HBase Architecture
The HBase Physical
Architecture consists of servers in a Master-Slave relationship as shown
below. Typically, the HBase cluster has one Master node, called HMaster
and multiple Region Servers called HRegionServer. Each Region Server
contains multiple Regions – HRegions.
Just like in a
Relational Database, data in HBase is stored in Tables and these Tables
are stored in Regions. When a Table becomes too big, the Table is
partitioned into multiple Regions. These Regions are assigned to Region
Servers across the cluster. Each Region Server hosts roughly the same
number of Regions.
The HMaster in HBase is responsible for:
- Performing Administration
- Managing and Monitoring the Cluster
- Assigning Regions to the Region Servers
- Controlling the Load Balancing and Failover
On the other hand, the HRegionServers perform the following work:
- Hosting and managing Regions
- Splitting the Regions automatically
- Handling the read/write requests
- Communicating with the Clients directly
Each Region Server
contains a Write-Ahead Log (called HLog) and multiple Regions. Each
Region in turn is made up of a MemStore and multiple StoreFiles (HFile).
The data lives in these StoreFiles in the form of Column Families
(explained below). The MemStore holds in-memory modifications to the
Store (data).
The mapping of Regions
to Region Server is kept in a system table called .META. When trying to
read or write data from HBase, the clients read the required Region
information from the .META table and directly communicate with the
appropriate Region Server. Each Region is identified by the start key
(inclusive) and the end key (exclusive).
HBase Data Model
The Data Model in HBase
is designed to accommodate semi-structured data that could vary in field
size, data type and columns. Additionally, the layout of the data model
makes it easier to partition the data and distribute it across the
cluster. The Data Model in HBase is made of different logical components
such as Tables, Rows, Column Families, Columns, Cells and Versions.
Tables – The
HBase Tables are more like logical collection of rows stored in separate
partitions called Regions. As shown above, every Region is then served
by exactly one Region Server. The figure above shows a representation of
a Table.
Rows – A row is
one instance of data in a table and is identified by a rowkey. Rowkeys
are unique in a Table and are always treated as a byte[].
Column Families –
Data in a row are grouped together as Column Families. Each Column
Family has one or more Columns, and the Columns in a family are stored
together in a low-level storage file known as an HFile. Column Families
form the basic unit of physical storage to which certain HBase features
like compression are applied. Hence it's important that proper care be
taken when designing the Column Families in a table. The table above shows
Customer and Sales Column Families. The Customer Column Family is made
up of 2 columns – Name and City, whereas the Sales Column Family is made
up of 2 columns – Product and Amount.
Columns – A
Column Family is made of one or more columns. A Column is identified by a
Column Qualifier that consists of the Column Family name concatenated
with the Column name using a colon – example: columnfamily:columnname.
There can be multiple Columns within a Column Family and Rows within a
table can have varied number of Columns.
Cell – A Cell
stores data and is essentially a unique combination of rowkey, Column
Family and the Column (Column Qualifier). The data stored in a Cell is
called its value and the data type is always treated as byte[].
Version – The
data stored in a cell is versioned and versions of data are identified
by the timestamp. The number of versions of data retained in a column
family is configurable and this value by default is 3.
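The versioning behavior described above can be sketched in plain Python (an illustrative model, not HBase internals): each cell keeps a list of (timestamp, value) pairs, newest first, trimmed to the configured maximum number of versions.

```python
# Illustrative model of HBase cell versioning (not actual HBase code).
MAX_VERSIONS = 3  # HBase's default number of retained versions

# A cell is a list of (timestamp, value), kept in decreasing timestamp order.
cell = []

def put(cell, timestamp, value, max_versions=MAX_VERSIONS):
    cell.append((timestamp, value))
    cell.sort(key=lambda tv: tv[0], reverse=True)  # newest first
    del cell[max_versions:]                        # drop the oldest extras

def get(cell, timestamp=None):
    """Return the newest value, or the newest value at/before timestamp."""
    for ts, value in cell:
        if timestamp is None or ts <= timestamp:
            return value
    return None

for ts, v in [(1, "a"), (2, "b"), (3, "c"), (4, "d")]:
    put(cell, ts, v)

print(get(cell))   # the newest value
print(len(cell))   # at most MAX_VERSIONS versions are retained
```

As in HBase, reading without a timestamp returns the latest version, and writing a fourth version silently evicts the oldest one.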
Apache HBase
HBase is an open source, non-relational, distributed database
modeled after Google's BigTable and is written in Java. It is developed
as part of Apache Software Foundation's Apache Hadoop project and runs
on top of HDFS (Hadoop Distributed Filesystem), providing BigTable-like
capabilities for Hadoop.
HBase features compression, in-memory operation, and Bloom filters on a per-column basis as outlined in the original BigTable paper. Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop, and may be accessed through the Java API but also through REST, Avro or Thrift gateway APIs.
What is HBase?
HBase is a column-oriented database management system that runs on top of HDFS. It is well suited for sparse data sets, which are common in many big data use cases. Unlike relational database systems, HBase does not support a structured query language like SQL; in fact, HBase isn’t a relational data store at all. HBase applications are written in Java much like a typical MapReduce application. HBase does support writing applications in Avro, REST, and Thrift.
An HBase system comprises a set of tables. Each table contains rows and columns, much like a traditional database. Each table must have an element defined as a Primary Key, and all access attempts to HBase tables must use this Primary Key. An HBase column represents an attribute of an object; for example, if the table is storing diagnostic logs from servers in your environment, where each row might be a log record, a typical column in such a table would be the timestamp of when the log record was written, or perhaps the server name where the record originated. In fact, HBase allows for many attributes to be grouped together into what are known as column families, such that the elements of a column family are all stored together. This is different from a row-oriented relational database, where all the columns of a given row are stored together. With HBase you must predefine the table schema and specify the column families. However, it’s very flexible in that new columns can be added to families at any time, making the schema flexible and therefore able to adapt to changing application requirements.
Just as HDFS has a NameNode and slave nodes, and MapReduce has JobTracker and TaskTracker slaves, HBase is built on similar concepts. In HBase a master node manages the cluster and region servers store portions of the tables and perform the work on the data. In the same way that HDFS has some enterprise concerns due to the availability of the NameNode, HBase is also sensitive to the loss of its master node.
What is a NoSQL Database?
A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Motivations for this approach include simplicity of design, horizontal scaling and finer control over availability. NoSQL databases are often highly optimized key–value stores intended primarily for simple retrieval and appending operations, whereas an RDBMS is intended as a general purpose data store. There will thus be some operations where NoSQL is faster and some where an RDBMS is faster. NoSQL databases are finding significant and growing industry use in big data and real-time web applications.[1] NoSQL systems are also referred to as "Not only SQL" to emphasize that they may in fact allow SQL-like query languages to be used.
HBase Tables and Regions
A table is made up of any number of regions.
A region is specified by its startKey and endKey.
- Empty table: (Table, NULL, NULL)
- Two-region table: (Table, NULL, "com.ABC.www") and (Table, "com.ABC.www", NULL)
Each region may live on a different node and is made up of several HDFS
files and blocks, each of which is replicated by Hadoop. HBase uses HDFS
as its reliable storage layer; HDFS handles checksums, replication and
failover.
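The start/end-key scheme above can be illustrated with a small Python sketch (hypothetical helper names, not the HBase client API): given regions sorted by start key, a row key is routed to the region whose half-open [startKey, endKey) range contains it.

```python
import bisect

# Illustrative region map for the two-region table above
# ("" stands for the NULL start key, None for the NULL end key).
regions = [
    {"start": "", "end": "com.ABC.www", "server": "rs1"},
    {"start": "com.ABC.www", "end": None, "server": "rs2"},
]

def find_region(regions, row_key):
    """Route a row key to its region (start key inclusive, end key exclusive)."""
    starts = [r["start"] for r in regions]
    idx = bisect.bisect_right(starts, row_key) - 1
    region = regions[idx]
    # Sanity check: the key must fall before the region's exclusive end key.
    assert region["end"] is None or row_key < region["end"]
    return region

print(find_region(regions, "com.AAA.www")["server"])
print(find_region(regions, "com.XYZ.www")["server"])
```

This is essentially what an HBase client does after reading the region boundaries from the .META. table: a binary search over start keys, then a direct request to the owning Region Server.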
HBase Tables:
- Tables are sorted by Row in lexicographical order
- Table schema only defines its column families
- Each family consists of any number of columns
- Each column consists of any number of versions
- Columns only exist when inserted, NULLs are free
- Columns within a family are sorted and stored together
- Everything except table names is a byte[]
- HBase table format: (Row, Family:Column, Timestamp) -> Value
HBase consists of:
- A Java API and gateways for REST, Thrift and Avro
- A Master that manages the cluster
- RegionServers that manage data
- ZooKeeper, which serves as the "neural network" and coordinates the cluster
Data is stored in memory and flushed to disk at regular intervals or based on size
- Small flushes are merged in the background to keep the number of files small
- Reads check the memory stores first and then the disk-based files
- Deletes are handled with "tombstone" markers
MemStores:
After data is written to the WAL, the RegionServer saves KeyValues in the memory store
- Flushed to disk based on size; see hbase.hregion.memstore.flush.size
- The default flush size is 64MB
- Uses a snapshot mechanism to write the flush to disk while still serving reads from it and accepting new data at the same time
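A minimal sketch of this flush-on-size behavior in Python (an illustrative model with invented names, not HBase internals): writes accumulate in an in-memory map and are snapshotted out to an immutable "store file" once a size threshold is crossed, so new writes can continue into a fresh memstore.

```python
# Illustrative MemStore model (invented names, not HBase internals).
class MemStore:
    def __init__(self, flush_size=4):
        self.flush_size = flush_size   # flush threshold (entry count here;
                                       # real HBase uses bytes, 64MB default)
        self.active = {}               # in-memory KeyValues
        self.store_files = []          # flushed, immutable "HFiles"

    def put(self, key, value):
        self.active[key] = value
        if len(self.active) >= self.flush_size:
            self.flush()

    def flush(self):
        # Snapshot the active store and write it out as a sorted file,
        # then start over with an empty memstore.
        snapshot, self.active = self.active, {}
        self.store_files.append(dict(sorted(snapshot.items())))

    def get(self, key):
        # Reads check the memstore first, then store files, newest first.
        if key in self.active:
            return self.active[key]
        for sf in reversed(self.store_files):
            if key in sf:
                return sf[key]
        return None

ms = MemStore(flush_size=3)
for i in range(5):
    ms.put(f"row{i}", i)
print(len(ms.store_files), ms.get("row0"), ms.get("row4"))
```

Note how a read can be served from either side of the flush boundary: row0 now lives in a flushed file while row4 is still in memory.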
Compactions:
Two types: Minor and Major Compactions
Minor Compactions:
- Combine the last "few" flushes
- Triggered by the number of storage files
Major Compactions:
- Rewrite all storage files
- Drop deleted data and values exceeding the TTL and/or the number of versions
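A major compaction can be sketched as merging every store file into one, dropping tombstoned cells along the way (illustrative Python, not HBase internals; TOMBSTONE is an invented marker standing in for HBase's delete marker):

```python
# Illustrative major compaction (invented marker, not HBase internals).
TOMBSTONE = object()  # stand-in for HBase's "tombstone" delete marker

def major_compact(store_files):
    """Merge all store files into one; newer files win, tombstones are dropped."""
    merged = {}
    for sf in store_files:            # oldest first, newer values overwrite
        merged.update(sf)
    return {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}

files = [
    {"row1": "a", "row2": "b"},       # oldest flush
    {"row2": "b2", "row3": "c"},      # newer flush updates row2
    {"row1": TOMBSTONE},              # delete of row1 via tombstone
]
compacted = major_compact(files)
print(compacted)
```

A minor compaction would do the same merge over only the last few files and would keep the tombstones, since an older version of the deleted cell might still exist in a file outside the merge.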
Key Cardinality:
The best performance is gained from queries that use the row key, since row keys determine how data is partitioned and looked up.
Fold, Store, and Shift:
All values are stored with their full coordinates, including: Row Key, Column Family, Column Qualifier, and Timestamp
- Folds columns into a "row per column" layout
- NULLs are cost free, as nothing is stored
- Versions are multiple "rows" in the folded table
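This folding can be sketched in Python (illustrative only): each logical row becomes one physical entry per populated column, keyed by the full (row, family:qualifier, timestamp) coordinates, so absent columns store nothing at all.

```python
# Illustrative folding of logical rows into coordinate-keyed entries.
logical_rows = {
    "row1": {"cf:name": "Alice", "cf:city": "Paris"},
    "row2": {"cf:name": "Bob"},   # no city: nothing is stored for it
}

TS = 1344817012999  # example timestamp

# Folded form: (row, family:qualifier, timestamp) -> value
folded = {
    (row, column, TS): value
    for row, columns in logical_rows.items()
    for column, value in columns.items()
}

print(len(folded))  # one entry per populated column only
```

A second version of a cell would simply be another entry with the same row and column but a different timestamp, which is why versions behave like extra "rows" in the folded table.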
HBase Data Model
These six concepts form the foundation of HBase.
Table: HBase organizes data into tables. Table names are Strings and composed of characters that are safe for use in a file system path.
Row: Within a table, data is stored according to its row. Rows are identified uniquely by their rowkey. Rowkeys don't have a data type and are always treated as a byte[].
Column family: Data within a row is grouped by column family. Column families also impact the physical arrangement of data stored in HBase. For this reason, they must be defined up front and aren't easily modified. Every row in a table has the same column families, although a row need not store data in all its families. Column family names are Strings and composed of characters that are safe for use in a file system path.
Column qualifier: Data within a column family is addressed via its column qualifier, or column. Column qualifiers need not be specified in advance. Column qualifiers need not be consistent between rows. Like rowkeys, column qualifiers don't have a data type and are always treated as a byte[].
Cell: A combination of rowkey, column family, and column qualifier uniquely identifies a cell. The data stored in a cell is referred to as that cell's value. Values also don't have a data type and are always treated as a byte[].
Version: Values within a cell are versioned. Versions are identified by their timestamp, a long. When a version isn't specified, the current timestamp is used as the basis for the operation. The number of cell value versions retained by HBase is configured via the column family. The default number of cell versions is three. Versions are stored in decreasing order of timestamp.
HBase Shell Commands
whoami:
Show the current HBase user. Example:
hbase> whoami
alter:
Alter column family schema; pass a table name and a dictionary specifying the new column family schema. The dictionary must include the name of the column family to alter. For example:
To change or add the 'f1' column family in table 't1' from defaults to instead keep a maximum of 5 cell VERSIONS, do:
hbase> alter 't1', {NAME => 'f1', VERSIONS => 5}
To delete the 'f1' column family in table 't1', do:
hbase> alter 't1', {NAME => 'f1', METHOD => 'delete'}
You can also change table-scope attributes like MAX_FILESIZE, MEMSTORE_FLUSHSIZE and READONLY.
For example, to change the max file size of a region to 128MB, do:
hbase> alter 't1', {METHOD => 'table_att', MAX_FILESIZE => '134217728'}
count:
Count the number of rows in a table. This operation may take a LONG time (Run '$HADOOP_HOME/bin/hadoop jar hbase.jar rowcount' to run a counting mapreduce job). Current count is shown every 1000 rows by default. Count interval may be optionally specified. Examples:
hbase> count 't1'
hbase> count 't1', 100000
The same command can be run on a table reference. Suppose t is a reference to table 't1'; examples:
hbase> t.count INTERVAL => 100000
hbase> t.count CACHE => 1000
hbase> t.count INTERVAL => 10, CACHE => 1000
create:
Create table; pass table name, a dictionary of specifications per column
family, and optionally a dictionary of table configuration.
Dictionaries are described below in the GENERAL NOTES section.
Examples:
hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase> # The above in shorthand would be the following:
hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, \
BLOCKCACHE => true}
describe:
Describe the named table
Example:
hbase> describe 't1'
delete:
Put a delete cell value at specified table/row/column and optionally
timestamp coordinates. Deletes must match the deleted cell's
coordinates exactly. When scanning, a delete cell suppresses older
versions. Takes arguments like the 'put' command described below
Example:
hbase> delete 't1', 'r1', 'c1', ts1
deleteall:
Delete all cells in a given row; pass a table name, row, and optionally a column and timestamp. Examples:
hbase> deleteall 't1', 'r1'
hbase> deleteall 't1', 'r1', 'c1'
hbase> deleteall 't1', 'r1', 'c1', ts1
The same commands can also be run on a table reference. Suppose you had a reference t to table 't1'; the corresponding commands would be:
hbase> t.deleteall 'r1'
hbase> t.deleteall 'r1', 'c1'
hbase> t.deleteall 'r1', 'c1', ts1
disable:
Disable the named table.
Example:
hbase> disable 't1'
disable_all:
Disable all tables matching the given regex. Example:
hbase> disable_all 't.*'
drop:
Drop the named table. Table must first be disabled.
Example:
hbase> drop 't1'
drop_all:
Drop all tables matching the given regex. Example:
hbase> drop_all 't.*'
enable:
Enable the named table.
Example:
hbase> enable 't1'
enable_all:
Enable all tables matching the given regex. Example:
hbase> enable_all 't.*'
is_enabled:
Verifies whether the named table is enabled. Example:
hbase> is_enabled 't1'
exists:
Does the named table exist?
Example:
hbase> exists 't1'
exit:
Type "hbase> exit" to leave the HBase Shell
get:
Get row or cell contents; pass table name, row, and optionally a dictionary of column(s), timestamp and versions.
Examples:
hbase> get 't1', 'r1'
hbase> get 't1', 'r1', {COLUMN => 'c1'}
hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, \
VERSIONS => 4}
list:
List all tables in hbase
Example:
hbase> list
hbase> list 'abc.*'
put:
Put a cell 'value' at specified table/row/column and optionally
timestamp coordinates. To put a cell value into table 't1' at row 'r1'
under column 'c1' marked with the time 'ts1', do:
Example:
hbase> put 't1', 'r1', 'c1', 'value', ts1
tools:
Listing of hbase surgery tools
scan:
Scan a table; pass table name and optionally a dictionary of scanner
specifications. Scanner specifications may include one or more of the
following: LIMIT, STARTROW, STOPROW, TIMESTAMP, or COLUMNS. If no
columns are specified, all columns will be scanned. To scan all members
of a column family, leave the qualifier empty as in 'col_family:'.
Examples:
hbase> scan '.META.'
hbase> scan '.META.', {COLUMNS => 'info:regioninfo'}
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, \
STARTROW => 'xyz'}
For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false). By
default it is enabled.
Examples:
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
status:
Show cluster status. Can be 'summary', 'simple', or 'detailed'. The default is 'summary'.
Examples:
hbase> status
hbase> status 'simple'
hbase> status 'summary'
hbase> status 'detailed'
shutdown:
Shut down the cluster.
truncate:
Disables, drops and recreates the specified table.
Example:
hbase>truncate ‘t1′
hbase>truncate ‘t1′
version:
Output this HBase version
Example:
hbase> version
show_filters:
Show all the filters in hbase. Example:
hbase> show_filters
hbase> version
show_filters:
Show all the filters in hbase. Example:
hbase> show_filters
alter_status:
Get the status of the alter command. Indicates the number of regions of
the table that have received the updated schema Pass table name.
Example:
hbase> alter_status ‘t1′
flush:
Flush all regions in passed table or pass a region row to flush an individual region.
Example:
hbase> flush ‘TABLENAME’
hbase> flush ‘REGIONNAME’
major_compact:
Run major compaction on passed table or pass a region row to major
compact an individual region. To compact a single column family within a
region specify the region name followed by the column family name.
Examples:
Compact all regions in a table:
hbase> major_compact ‘t1′
Compact an entire region:
hbase> major_compact ‘r1′
Compact a single column family within a region:
hbase> major_compact ‘r1′, ‘c1′
Compact a single column family within a table:
hbase> major_compact ‘t1′, ‘c1′
split:
Split entire table or pass a region to split individual region. With the
second parameter, you can specify an explicit split key for the region.
Examples:
hbase>split ‘tableName’
hbase>split ‘regionName’ # format: ‘tableName,startKey,id’
hbase>split ‘tableName’, ‘splitKey’
hbase>split ‘regionName’, ‘splitKey’
zk_dump:
Dump status of HBase cluster as seen by ZooKeeper.
Example:
hbase>zk_dump
start_replication:
Restarts all the replication features. The state in which each stream starts in is undetermined.
WARNING: start/stop replication is only meant to be used in critical load situations.
Examples:
hbase> start_replication
stop_replication:
Stops all the replication features. The state in which each stream stops in is undetermined.
WARNING: start/stop replication is only meant to be used in critical load situations.
Examples:
hbase> stop_replication
grant:
Grant users specific rights.
Syntax : grant permissions is either zero or more letters from the set “RWXCA”.
READ(‘R’), WRITE(‘W’), EXEC(‘X’), CREATE(‘C’), ADMIN(‘A’)
Example:
hbase> grant ‘bobsmith’, ‘RWXCA’
hbase> grant ‘bobsmith’, ‘RW’, ‘t1′, ‘f1′, ‘col1′
revoke:
Revoke a user’s access rights.
Syntax : revoke
Example:
hbase> revoke ‘bobsmith’, ‘t1′, ‘f1′, ‘col1′
user_permission:
Show all permissions for the particular user.
Syntax : user_permission
Example:
hbase> user_permission
hbase> user_permission ‘table1′
HBase Shell Commands
Show the current hbase user. Example:
hbase> whoami
Alter column family schema; pass table name and a dictionary specifying new column family schema. Dictionaries are described below in the GENERAL NOTES section. Dictionary must include name
of column family to alter. For example,
To change or add the 'f1' column family in table 't1' from defaults to instead keep a maximum of 5 cell VERSIONS, do:
hbase> alter 't1', {NAME => 'f1', VERSIONS => 5}
To delete the 'f1' column family in table 't1', do:
hbase> alter 't1', {NAME => 'f1', METHOD => 'delete'}
You can also change table-scope attributes like MAX_FILESIZE,
MEMSTORE_FLUSHSIZE and READONLY.
For example, to change the table's maximum file size to 128 MB, do:
hbase> alter 't1', {METHOD => 'table_att', MAX_FILESIZE => '134217728'}
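As a sanity check, the MAX_FILESIZE value above is just 128 MB written out in bytes:

```python
# 128 MB expressed in bytes, as used in the MAX_FILESIZE example above
max_filesize = 128 * 1024 * 1024
print(max_filesize)  # 134217728
```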
count:
Count the number of rows in a table. This operation may take a LONG time (Run '$HADOOP_HOME/bin/hadoop jar hbase.jar rowcount' to run a counting mapreduce job). Current count is shown every 1000 rows by default. Count interval may be optionally specified. Examples:
hbase> count 't1'
hbase> count 't1', 100000
The same command can also be run on a table reference. Suppose you had a reference t to table 't1':
hbase> t.count INTERVAL => 100000
hbase> t.count CACHE => 1000
hbase> t.count INTERVAL => 10, CACHE => 1000
create:
Create table; pass table name, a dictionary of specifications per column
family, and optionally a dictionary of table configuration.
Dictionaries are described below in the GENERAL NOTES section.
Examples:
hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase> # The above in shorthand would be the following:
hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, \
BLOCKCACHE => true}
describe:
Describe the named table
Example:
hbase> describe 't1'
delete:
Put a delete cell value at specified table/row/column and optionally
timestamp coordinates. Deletes must match the deleted cell's
coordinates exactly. When scanning, a delete cell suppresses older
versions. Takes arguments like the 'put' command described below
Example:
hbase> delete 't1', 'r1', 'c1', ts1
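The note above that a delete cell "suppresses older versions" can be modeled as a tombstone check: on read, any version at or below the delete marker's timestamp is hidden. This is a simplified Python sketch of that semantics, illustrative only, not HBase internals:

```python
def visible_versions(versions, delete_ts=None):
    """Return the timestamp->value pairs still visible after a delete marker.

    A delete marker at delete_ts hides every version whose timestamp is
    <= delete_ts (a simplified model of HBase delete semantics).
    """
    if delete_ts is None:
        return dict(versions)
    return {ts: v for ts, v in versions.items() if ts > delete_ts}

versions = {100: 'a', 200: 'b', 300: 'c'}
print(visible_versions(versions, delete_ts=200))  # {300: 'c'}
```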
deleteall:
Delete all cells in a given row; pass a table name, row, and optionally a column and timestamp. Examples:
hbase> deleteall 't1', 'r1'
hbase> deleteall 't1', 'r1', 'c1'
hbase> deleteall 't1', 'r1', 'c1', ts1
The same commands can also be run on a table reference. Suppose you had a reference t to table 't1'; the corresponding commands would be:
hbase> t.deleteall 'r1'
hbase> t.deleteall 'r1', 'c1'
hbase> t.deleteall 'r1', 'c1', ts1
disable:
Disable the named table.
Example:
hbase> disable 't1'
disable_all:
Disable all of the tables matching the given regex. Example:
hbase> disable_all 't.*'
drop:
Drop the named table. Table must first be disabled.
Example:
hbase> drop 't1'
drop_all:
Drop all of the tables matching the given regex. Example:
hbase> drop_all 't.*'
enable:
Enable the named table.
Example:
hbase> enable 't1'
enable_all:
Enable all of the tables matching the given regex. Example:
hbase> enable_all 't.*'
is_enabled:
Verifies whether the named table is enabled. Example:
hbase> is_enabled 't1'
exists:
Does the named table exist?
Example:
hbase> exists 't1'
exit:
Type "hbase> exit" to leave the HBase Shell
get:
Get row or cell contents; pass table name, row, and optionally a dictionary of column(s), timestamp and versions.
Examples:
hbase> get 't1', 'r1'
hbase> get 't1', 'r1', {COLUMN => 'c1'}
hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, \
VERSIONS => 4}
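HBase cells are versioned by timestamp, which is what the TIMESTAMP and VERSIONS options above select on: an exact-timestamp lookup returns one version, while VERSIONS => n returns up to the n most recent. A minimal Python sketch of that lookup logic (the VersionedCell class and its methods are illustrative, not an HBase API):

```python
class VersionedCell:
    """Toy model of an HBase cell: multiple values keyed by timestamp."""

    def __init__(self):
        self.versions = {}  # timestamp -> value

    def put(self, value, ts):
        self.versions[ts] = value

    def get(self, timestamp=None, max_versions=1):
        if timestamp is not None:
            # Exact-timestamp lookup, like {TIMESTAMP => ts1}
            return self.versions.get(timestamp)
        # Newest-first, up to max_versions, like {VERSIONS => n}
        newest = sorted(self.versions, reverse=True)[:max_versions]
        return [self.versions[ts] for ts in newest]

cell = VersionedCell()
cell.put('v1', 100)
cell.put('v2', 200)
cell.put('v3', 300)
print(cell.get(max_versions=2))  # ['v3', 'v2']
print(cell.get(timestamp=100))   # 'v1'
```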
list:
List all tables in HBase. An optional regular expression parameter can be used to filter the output.
Examples:
hbase> list
hbase> list 'abc.*'
put:
Put a cell 'value' at specified table/row/column and optionally
timestamp coordinates. To put a cell value into table 't1' at row 'r1'
under column 'c1' marked with the time 'ts1', do:
Example:
hbase> put 't1', 'r1', 'c1', 'value', ts1
tools:
Listing of hbase surgery tools
scan:
Scan a table; pass table name and optionally a dictionary of scanner
specifications. Scanner specifications may include one or more of the
following: LIMIT, STARTROW, STOPROW, TIMESTAMP, or COLUMNS. If no
columns are specified, all columns will be scanned. To scan all members
of a column family, leave the qualifier empty as in 'col_family:'.
Examples:
hbase> scan '.META.'
hbase> scan '.META.', {COLUMNS => 'info:regioninfo'}
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, \
STARTROW => 'xyz'}
For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false). By
default it is enabled.
Examples:
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
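STARTROW and STOPROW operate on the lexicographic order of row keys: the start row is inclusive and the stop row exclusive, and LIMIT caps the number of rows returned. A simplified in-memory sketch of those scanner semantics (illustrative only, not HBase internals):

```python
def scan(rows, startrow=None, stoprow=None, limit=None):
    """Simplified scan over row keys in lexicographic order.

    startrow is inclusive and stoprow exclusive, mirroring HBase's
    scanner semantics; limit caps the number of rows returned.
    """
    result = []
    for key in sorted(rows):
        if startrow is not None and key < startrow:
            continue
        if stoprow is not None and key >= stoprow:
            break
        result.append((key, rows[key]))
        if limit is not None and len(result) == limit:
            break
    return result

rows = {'r1': 'a', 'r2': 'b', 'r3': 'c', 'r4': 'd'}
print(scan(rows, startrow='r2', stoprow='r4'))  # [('r2', 'b'), ('r3', 'c')]
print(scan(rows, limit=2))                      # [('r1', 'a'), ('r2', 'b')]
```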
status:
Show cluster status. Can be 'summary', 'simple', or 'detailed'. The default is 'summary'.
Examples:
hbase> status
hbase> status 'simple'
hbase> status 'summary'
hbase> status 'detailed'
shutdown:
Shut down the cluster.
truncate:
Disables, drops and recreates the specified table.
Example:
hbase> truncate 't1'
version:
Output this HBase version
Example:
hbase> version
show_filters:
Show all the filters in HBase. Example:
hbase> show_filters
alter_status:
Get the status of the alter command. Indicates the number of regions of
the table that have received the updated schema. Pass table name.
Example:
hbase> alter_status 't1'
flush:
Flush all regions in passed table or pass a region row to flush an individual region.
Example:
hbase> flush 'TABLENAME'
hbase> flush 'REGIONNAME'
major_compact:
Run major compaction on passed table or pass a region row to major
compact an individual region. To compact a single column family within a
region specify the region name followed by the column family name.
Examples:
Compact all regions in a table:
hbase> major_compact 't1'
Compact an entire region:
hbase> major_compact 'r1'
Compact a single column family within a region:
hbase> major_compact 'r1', 'c1'
Compact a single column family within a table:
hbase> major_compact 't1', 'c1'
split:
Split an entire table, or pass a region to split an individual region. With the
second parameter, you can specify an explicit split key for the region.
Examples:
hbase> split 'tableName'
hbase> split 'regionName' # format: 'tableName,startKey,id'
hbase> split 'tableName', 'splitKey'
hbase> split 'regionName', 'splitKey'
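Conceptually, a split partitions a region's row-key range at the split key: keys below it go to one daughter region, keys at or above it to the other. A toy illustration of that partitioning (not HBase internals):

```python
def split_region(row_keys, split_key):
    """Partition sorted row keys at split_key: the left daughter gets
    keys < split_key, the right gets keys >= split_key (a simplified
    model of an HBase region split)."""
    ordered = sorted(row_keys)
    left = [k for k in ordered if k < split_key]
    right = [k for k in ordered if k >= split_key]
    return left, right

print(split_region(['a1', 'b2', 'c3', 'd4'], 'c'))
# (['a1', 'b2'], ['c3', 'd4'])
```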
zk_dump:
Dump status of HBase cluster as seen by ZooKeeper.
Example:
hbase> zk_dump
start_replication:
Restarts all the replication features. The state in which each stream starts is undetermined.
WARNING: start/stop replication is only meant to be used in critical load situations.
Examples:
hbase> start_replication
stop_replication:
Stops all the replication features. The state in which each stream stops is undetermined.
WARNING: start/stop replication is only meant to be used in critical load situations.
Examples:
hbase> stop_replication
grant:
Grant users specific rights.
Syntax: grant <user>, <permissions> [, <table> [, <column family> [, <column qualifier>]]]
Permissions is zero or more letters from the set 'RWXCA':
READ('R'), WRITE('W'), EXEC('X'), CREATE('C'), ADMIN('A')
Example:
hbase> grant 'bobsmith', 'RWXCA'
hbase> grant 'bobsmith', 'RW', 't1', 'f1', 'col1'
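Since a permission string is just a combination of the letters R, W, X, C and A, validating and expanding one is straightforward. A small sketch (the helper function is ours for illustration, not an HBase API):

```python
def parse_permissions(perms):
    """Expand a permission string like 'RW' into named rights,
    rejecting unknown letters (illustrative helper, not an HBase API)."""
    names = {'R': 'READ', 'W': 'WRITE', 'X': 'EXEC',
             'C': 'CREATE', 'A': 'ADMIN'}
    unknown = set(perms) - set(names)
    if unknown:
        raise ValueError("unknown permission letters: %s" % sorted(unknown))
    return [names[letter] for letter in perms]

print(parse_permissions('RWXCA'))
# ['READ', 'WRITE', 'EXEC', 'CREATE', 'ADMIN']
```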
revoke:
Revoke a user's access rights.
Syntax: revoke <user> [, <table> [, <column family> [, <column qualifier>]]]
Example:
hbase> revoke 'bobsmith', 't1', 'f1', 'col1'
user_permission:
Show all permissions for the particular user.
Syntax: user_permission [<table>]
Example:
hbase> user_permission
hbase> user_permission 'table1'
HBase Examples
$ hbase shell
List all the tables
hbase> list
Create an HBase table named 'cars' with one column family, 'vi'
hbase> create 'cars', 'vi'
Let's insert 3 column qualifiers (make, model, year) and the associated values into the first row (row1).
hbase> put 'cars', 'row1', 'vi:make', 'BMW'
hbase> put 'cars', 'row1', 'vi:model', '5 series'
hbase> put 'cars', 'row1', 'vi:year', '2012'
Now let's add a second row
hbase> put 'cars', 'row2', 'vi:make', 'Ferrari'
hbase> put 'cars', 'row2', 'vi:model', 'e series'
hbase> put 'cars', 'row2', 'vi:year', '2012'
Now let's add a third row
hbase> put 'cars', 'row3', 'vi:make', 'Honda'
hbase> put 'cars', 'row3', 'vi:model', 'f series'
hbase> put 'cars', 'row3', 'vi:year', '2013'
Scan the table
hbase> scan 'cars'
The next scan we’ll run will limit our results to the make column qualifier.
hbase> scan 'cars', {COLUMNS => ['vi:make']}
Limit the scan to 1 row to demonstrate how LIMIT works.
hbase> scan 'cars', {COLUMNS => ['vi:make'], LIMIT => 1}
We’ll start by getting all columns in row1.
hbase> get 'cars', 'row1'
You should see output similar to:
COLUMN CELL
vi:make timestamp=1344817012999, value=BMW
vi:model timestamp=1344817020843, value=5 series
vi:year timestamp=1344817033611, value=2012
To get one specific column include the COLUMN option.
hbase> get 'cars', 'row1', {COLUMNS => 'vi:make'}
You can also get two or more columns by passing an array of columns.
hbase> get 'cars', 'row1', {COLUMNS => ['vi:make', 'vi:year']}
Delete a cell (value)
hbase> delete 'cars', 'row2', 'vi:year'
Let's check that our delete worked.
hbase> get 'cars', 'row2'
You should see output that shows 2 columns.
COLUMN CELL
vi:make timestamp=1344817104923, value=Ferrari
vi:model timestamp=1344817115463, value=e series
2 row(s) in 0.0080 seconds
Disable and drop the table
hbase> disable 'cars'
hbase> drop 'cars'
Exit the HBase shell
hbase> exit