HBase Getting Started

1 Environment preparation

1) Install JDK 1.7 or later.
2) Install hadoop-2.2.0 as the development environment.

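2 Downloading the HBase package

Download the package from the HBase download page.

This tutorial uses hbase-0.98.23-hadoop2-bin.tar.gz, so download that file from the corresponding release directory.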

1. Introduction

The quickstart gets you up and running on a single-node, standalone instance of HBase.

3 Deployment types

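2. Quick Start – Standalone HBase

This section describes the setup of a single-node standalone HBase. A standalone instance has all HBase daemons (the Master, RegionServers, and ZooKeeper) running in a single JVM and persisting to the local filesystem. It is the most basic deployment profile. We will show you how to create a table in HBase using the hbase shell CLI, insert rows into the table, perform put and scan operations against the table, enable or disable the table, and start and stop HBase.

Apart from downloading HBase, this procedure should take less than 10 minutes.

Prior to HBase 0.94.x, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions default to 127.0.1.1, which causes problems. See "Why does HBase care about /etc/hosts?" for details.

On Ubuntu, the following /etc/hosts file works correctly for HBase 0.94.x and earlier. Use it as a template if you run into trouble.

127.0.0.1 localhost

127.0.0.1 ubuntu.ubuntu-domain ubuntu

This issue has been fixed in hbase-0.96.0 and later.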
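The loopback pitfall described above can be checked mechanically. The following is a minimal sketch and not part of HBase: the helper name `find_non_loopback_entries` and the sample text are assumptions made for illustration. It scans /etc/hosts-style text and reports host names mapped to a 127.x address other than 127.0.0.1.

```python
def find_non_loopback_entries(hosts_text):
    """Return (address, hostname) pairs mapped to 127.x.x.x but not 127.0.0.1."""
    suspicious = []
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if address.startswith("127.") and address != "127.0.0.1":
            suspicious.extend((address, name) for name in names)
    return suspicious


# Hypothetical sample resembling a default Ubuntu hosts file.
sample = """
127.0.0.1 localhost
127.0.1.1 ubuntu.ubuntu-domain ubuntu
"""
print(find_non_loopback_entries(sample))
```

To check a real machine, pass the contents of /etc/hosts instead of the sample string. An empty result means no host name points at a non-standard loopback address.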

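3.1 HBase standalone deployment

This deployment mode has little practical value. It differs from pseudo-distributed mode only in that a pseudo-distributed deployment adds the following setting: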

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

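In standalone mode all daemons run in the same JVM instance, whereas in a pseudo-distributed deployment each daemon (HMaster, HRegionServer, HQuorumPeer) runs in its own process. We therefore skip the standalone example and go straight to the pseudo-distributed deployment.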

2.1. JDK Version Requirements

HBase requires that a JDK be installed. See the Java section for the JDK versions supported by each HBase release.

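3.2 HBase pseudo-distributed deployment

1) Switch to the hadoop user: su hadoop
2) Go to the /usr/java directory: cd /usr/java
3) Upload the HBase package from your local machine: run rz, then select hbase-0.98.23-hadoop2-bin.tar.gz
4) Unpack the package: tar -zxvf hbase-0.98.23-hadoop2-bin.tar.gz
5) Rename the directory: mv hbase-0.98.23-hadoop2 hbase
6) Switch to the root user.
7) Configure the environment variables. Open /etc/profile (vi /etc/profile) and add the following: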

HBASE_HOME=/usr/java/hbase
PATH=$HBASE_HOME/bin:$PATH
export HBASE_HOME PATH

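Save and exit.
8) Apply the environment variables: source /etc/profile. If running it as root has no effect, switch to the hadoop user and run it again.
9) Go to the conf directory and edit conf/hbase-env.sh.
9.1) Remove the "#" in front of JAVA_HOME and set it to your own Java installation path.

9.2) Remove the "#" in front of HBASE_MANAGES_ZK and set it to true, so that HBase manages its own ZooKeeper and no separate ZooKeeper installation is needed.

hbase-env.sh is a long file, so the setting can be hard to find. After opening the file with vi hbase-env.sh, press /, type HBASE_MANAGES_ZK, and press Enter to jump to the string.

10) vi hbase-site.xml
    The file contains only an empty <configuration></configuration> element. Add the following inside the configuration tag: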

<configuration>

        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://szh:9000/hbase</value>
        </property>
        <property>
                <name>hbase.tmp.dir</name>
                <value>/usr/java/hbase/tmp</value>
        </property>
        <property>
                <name>hbase.zookeeper.quorum</name>
                <value>szh</value>
        </property>
        <property>
                <name>zookeeper.session.timeout</name>
                <value>6000</value>
        </property>
        <property>
                <name>hbase.zookeeper.property.clientPort</name>
                <value>2181</value>
        </property>
        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
        </property>
</configuration>

hdfs://szh:9000/ is the value of fs.default.name defined in Hadoop's core-site.xml, so make sure the value you put in hbase-site.xml matches your own Hadoop setting.
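Hand-editing XML is error-prone, so the same file can be produced from a plain mapping. The sketch below is only an illustration: the function name `build_hbase_site` is an assumption, and the host name szh and port 9000 are the values used in this tutorial, not defaults.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def build_hbase_site(properties):
    """Render a mapping of property names to values as hbase-site.xml text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    compact = ET.tostring(root, encoding="unicode")
    # Re-parse with minidom only to get indented, readable output.
    return minidom.parseString(compact).toprettyxml(indent="  ")


# Values taken from the tutorial above; adjust them to your own cluster.
settings = {
    "hbase.rootdir": "hdfs://szh:9000/hbase",
    "hbase.tmp.dir": "/usr/java/hbase/tmp",
    "hbase.zookeeper.quorum": "szh",
    "zookeeper.session.timeout": 6000,
    "hbase.zookeeper.property.clientPort": 2181,
    "hbase.cluster.distributed": "true",
}
print(build_hbase_site(settings))
```

The printed text can be redirected into conf/hbase-site.xml. Keeping the settings in one mapping makes it easier to see that hbase.rootdir and the Hadoop file system address agree.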

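11) Start Hadoop.
12) Run start-hbase.sh.

13) Open the Web UI by entering szh:60010 in a browser.

The pseudo-distributed setup above is enough to work with HBase. A later installment will demonstrate a fully distributed HBase deployment together with a Hadoop cluster.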

3.3 HBase fully distributed deployment

To be continued.

2.2. Get Started with HBase

Procedure: Download, Configure, and Start HBase in Standalone Mode

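1. Choose a download site from the list of Apache Download Mirrors. Click the suggested link, which takes you to a mirror of HBase releases. Click the folder named stable and download the binary file that ends in .tar.gz to your local filesystem. Do not download the file ending in src.tar.gz.

2. Extract the downloaded file and change to the newly created directory.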

$ tar xzvf hbase-2.0.0-SNAPSHOT-bin.tar.gz

$ cd hbase-2.0.0-SNAPSHOT/

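3. You must set the JAVA_HOME environment variable before starting HBase. You can set the variable via your operating system's usual mechanism, but HBase provides a central mechanism, conf/hbase-env.sh. Edit this file, uncomment the line starting with JAVA_HOME, and set it to the appropriate location for your operating system. The JAVA_HOME variable should be set to a directory which contains the executable file bin/java. Most modern Linux operating systems provide a mechanism, such as /usr/bin/alternatives on RHEL or CentOS, for transparently switching between versions of executables such as Java. In this case, you can set JAVA_HOME to the directory containing the symbolic link to bin/java, which is usually /usr.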

JAVA_HOME=/usr

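4. Edit conf/hbase-site.xml, the main HBase configuration file. At this point you only need to specify the directory on the local filesystem where HBase and ZooKeeper write data. By default, a new directory is created under /tmp. Many servers are configured to delete the contents of /tmp on reboot, so you should store the data elsewhere. The following configuration stores HBase's data in the hbase directory, in the home directory of the user called testuser. In a new HBase install the <configuration> tags are empty; paste the <property> tags beneath them.

Example 1. Example hbase-site.xml for Standalone HBase

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/testuser/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/testuser/zookeeper</value>
  </property>
</configuration>

You do not need to create the HBase data directory. HBase will do this for you. If you create the directory, HBase will attempt to do a migration, which is not what you want.

The hbase.rootdir in the example above points to a directory in the local filesystem. The 'file:/' prefix denotes the local filesystem. To home HBase on an existing HDFS instance, set hbase.rootdir to point at a directory on that instance, for example hdfs://namenode.example.org:8020/hbase. For more on this variant, see the section on Standalone HBase over HDFS.

5. The bin/start-hbase.sh script is a convenient way to start HBase. Issue the command and, if all goes well, a message is logged to standard output showing that HBase started successfully. You can use the jps command to verify that one process called HMaster is running. In standalone mode HBase runs all daemons within this single JVM: the HMaster, a single HRegionServer, and the ZooKeeper daemon. Go to http://localhost:16010 to view the HBase Web UI.

Java needs to be installed and available. If you get an error indicating that Java is not installed, but it is on your system, perhaps in a non-standard location, edit conf/hbase-env.sh and set JAVA_HOME to the directory that contains bin/java.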
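The JAVA_HOME rule above (the directory must contain bin/java) is easy to verify before starting HBase. This is a minimal sketch under stated assumptions: `java_home_is_valid` is a hypothetical helper, and the temporary directory only stands in for a JDK install so the example runs anywhere.

```python
import os
import tempfile


def java_home_is_valid(java_home):
    """Return True if java_home contains a bin/java file (symlinks are followed)."""
    return os.path.isfile(os.path.join(java_home, "bin", "java"))


# Demonstration with a throwaway directory standing in for a JDK install.
with tempfile.TemporaryDirectory() as fake_jdk:
    print(java_home_is_valid(fake_jdk))  # bin/java does not exist yet
    os.makedirs(os.path.join(fake_jdk, "bin"))
    open(os.path.join(fake_jdk, "bin", "java"), "w").close()
    print(java_home_is_valid(fake_jdk))  # bin/java now exists
```

On a real system you would call it with the value you intend to write into conf/hbase-env.sh, for example /usr when bin/java is a symbolic link managed by alternatives.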

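Procedure: Use HBase for the First Time

1) Connect to HBase.

Connect to your running instance of HBase using the hbase shell command, located in the bin/ directory of your HBase install. In this example, some usage and version information that is printed when you start HBase Shell has been omitted. The HBase Shell prompt ends with a > character.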

$ ./bin/hbase shell

hbase(main):001:0>

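2) Display HBase Shell help text.

Type help and press Enter to display basic usage information for HBase Shell, as well as several example commands. Note that table names, rows, and columns must all be enclosed in quote characters.

Create a table

Use the create command to create a new table. You must specify the table name and the ColumnFamily name.

hbase(main):001:0> create 'test', 'cf'

0 row(s) in 0.4170 seconds

=> Hbase::Table - test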

List information about your table

Use the list command to list information about your table.

hbase(main):002:0> list 'test'

TABLE

test

1 row(s) in 0.0180 seconds

=> ["test"]

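Put data into your table

To put data into your table, use the put command.

hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'

0 row(s) in 0.0850 seconds

hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'

0 row(s) in 0.0110 seconds

hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3'

0 row(s) in 0.0100 seconds

Here we insert three values, one at a time. The first insert is at row1, column cf:a, with a value of value1. Columns in HBase consist of a column family prefix, cf in this example, followed by a colon and then a column qualifier suffix, a in this case.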
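The family:qualifier naming rule can be made concrete with a few lines of code. This is an illustrative sketch only: `split_column` is a hypothetical helper, not an HBase API, and it simply splits at the first colon.

```python
def split_column(column):
    """Split an HBase column name such as 'cf:a' into (family, qualifier)."""
    family, separator, qualifier = column.partition(":")
    if not separator or not family:
        raise ValueError("expected 'family:qualifier', got %r" % column)
    return family, qualifier


for name in ("cf:a", "cf:b", "cf:c"):
    print(name, "->", split_column(name))
```

All three columns used above share the family cf and differ only in the qualifier, which is why a single column family was enough when the table was created.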

Scan the table for all data at once

One way to get data from HBase is to scan. Use the scan command to scan the table for data. You can limit your scan, but for now all data is fetched.

hbase(main):006:0> scan 'test'

ROW                                      COLUMN+CELL

row1                                    column=cf:a, timestamp=1421762485768, value=value1

row2                                    column=cf:b, timestamp=1421762491785, value=value2

row3                                    column=cf:c, timestamp=1421762496210, value=value3

3 row(s) in 0.0230 seconds

Get a single row of data

To get a single row of data at a time, use the get command.

hbase(main):007:0> get 'test', 'row1'

COLUMN                                   CELL

cf:a                                    timestamp=1421762485768, value=value1

1 row(s) in 0.0350 seconds
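To see how put, scan, and get relate to each other, it can help to model the table as nested dictionaries. The sketch below is a toy model and an assumption of this rewrite, not how HBase stores data: it ignores timestamps, versions, and regions, and the function names merely mirror the shell commands.

```python
def put(table, row, column, value):
    """Store value under table[row][column], creating the row if needed."""
    table.setdefault(row, {})[column] = value


def get(table, row):
    """Return a copy of all cells in one row (empty if the row is absent)."""
    return dict(table.get(row, {}))


def scan(table):
    """Return every cell as (row, column, value), ordered by row key then column."""
    return [
        (row, column, value)
        for row in sorted(table)
        for column, value in sorted(table[row].items())
    ]


test = {}
put(test, "row1", "cf:a", "value1")
put(test, "row2", "cf:b", "value2")
put(test, "row3", "cf:c", "value3")
print(scan(test))
print(get(test, "row1"))
```

The point of the model is the access pattern: get addresses one row by its key, while scan walks rows in key order, which is the same distinction the two shell commands make.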

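Disable a table

If you want to delete a table or change its settings, as well as in some other situations, you need to disable the table first, using the disable command. You can re-enable it using the enable command.

hbase(main):008:0> disable 'test'

0 row(s) in 1.1820 seconds

hbase(main):009:0> enable 'test'

0 row(s) in 0.1770 seconds

Disable the table again if you tested the enable command above:

hbase(main):010:0> disable 'test'

0 row(s) in 1.1820 seconds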

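Drop the table

To drop (delete) a table, use the drop command.

hbase(main):011:0> drop 'test'

0 row(s) in 0.1370 seconds

Exit the HBase Shell

Use exit to leave the HBase Shell and disconnect from your cluster. HBase is still running in the background.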

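Procedure: Stop HBase

Just as the bin/start-hbase.sh script conveniently starts all HBase daemons, the bin/stop-hbase.sh script stops them.

$ ./bin/stop-hbase.sh

stopping hbase....................

$

After issuing the command, it can take several minutes for the processes to shut down. Use the jps command to be sure that the HMaster and HRegionServer processes are shut down.

The above has shown you how to start and stop a standalone instance of HBase. The next sections give a quick overview of the other deployment modes.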

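2.3. Pseudo-Distributed Local Install

After working your way through the quickstart in standalone mode, you can re-configure HBase to run in pseudo-distributed mode. Pseudo-distributed mode means that HBase still runs completely on a single host, but each HBase daemon (HMaster, HRegionServer, and ZooKeeper) runs as a separate process; in standalone mode all daemons ran in one JVM instance. By default, unless you configure the hbase.rootdir property as described in the quickstart, your data is still stored in /tmp/. In this walk-through, we store your data in HDFS instead, assuming you have HDFS available. You can skip the HDFS configuration to continue storing your data in the local filesystem.

Hadoop Configuration

This procedure assumes that you have configured Hadoop and HDFS on your local system or a remote system, and that they are running and available. It also assumes you are using Hadoop 2. The guide on Setting up a Single Node Cluster in the Hadoop documentation is a good starting point.

1) Stop HBase if it is running.

If you have just finished the quickstart and HBase is still running, stop it. This procedure creates a new directory where HBase will store its data, so any databases you created before will be lost.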

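2) Configure HBase.

Edit the hbase-site.xml configuration. First, add the following property, which directs HBase to run in distributed mode, with one JVM instance per daemon.

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

Next, change hbase.rootdir from the local filesystem to the address of your HDFS instance, using the hdfs:// URI syntax. In this example, HDFS is running on localhost at port 8020.

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>

You do not need to create the directory in HDFS. HBase will do this for you. If you create the directory, HBase will attempt to do a migration, which is not what you want.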

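3) Start HBase.

Use the bin/start-hbase.sh command to start HBase. If your system is configured correctly, the jps command should show the HMaster and HRegionServer processes running.

4) Check the HBase directory in HDFS.

If everything worked correctly, HBase created its directory in HDFS. In the configuration above, it is stored in /hbase/ on HDFS. You can use the hadoop fs command in Hadoop's bin/ directory to list this directory.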

$ ./bin/hadoop fs -ls /hbase

Found 7 items

drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/.tmp

drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/WALs

drwxr-xr-x   - hbase users          0 2014-06-25 18:48 /hbase/corrupt

drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/data

-rw-r--r--   3 hbase users         42 2014-06-25 18:41 /hbase/hbase.id

-rw-r--r--   3 hbase users          7 2014-06-25 18:41 /hbase/hbase.version

drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/oldWALs

5) Create a table and populate it with data.

You can use the HBase Shell to create a table, populate it with data, and scan and get values from it, using the same procedure as in the shell exercises.

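6) Start and stop a backup HBase Master (HMaster) server.

Running multiple HMaster instances on the same hardware does not make sense in a production environment, in the same way that running a pseudo-distributed cluster does not make sense for production. This step is offered for testing and learning purposes only.

The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster servers, which makes 10 total HMasters, counting the primary. To start a backup HMaster, use local-master-backup.sh. For each backup master you want to start, add a parameter representing the port offset for that master. Each HMaster uses three ports (16010, 16020, and 16030 by default). The port offset is added to these ports, so with an offset of 2 the backup HMaster uses ports 16012, 16022, and 16032. The following command starts 3 backup servers using ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035.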

$ ./bin/local-master-backup.sh 2 3 5

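To kill a backup master without killing the entire cluster, you need to find its process ID (PID). The PID is stored in a file with a name like /tmp/hbase-USER-X-master.pid. The only content of the file is the PID. You can use the kill -9 command to kill that PID. The following command kills the master with port offset 1, but leaves the cluster running: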

$ cat /tmp/hbase-testuser-1-master.pid |xargs kill -9
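The offset arithmetic and the PID file naming described in this step can be written down directly. This is a small sketch for illustration: the function names are assumptions, while the base ports and the file name pattern are the ones stated in the text above.

```python
# Default HMaster ports as given in the text above.
MASTER_BASE_PORTS = (16010, 16020, 16030)


def backup_master_ports(offset):
    """Add the port offset to each of the three default HMaster ports."""
    return tuple(port + offset for port in MASTER_BASE_PORTS)


def backup_master_pid_file(user, offset):
    """Build the PID file path following the /tmp/hbase-USER-X-master.pid pattern."""
    return "/tmp/hbase-{}-{}-master.pid".format(user, offset)


# The offsets passed to local-master-backup.sh in the example above.
for offset in (2, 3, 5):
    print(offset, backup_master_ports(offset), backup_master_pid_file("testuser", offset))
```

Listing the ports up front makes it easier to spot collisions with other services on the same host before the backup masters are started.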

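7) Start and stop additional RegionServers.

The HRegionServer manages the data in its StoreFiles as directed by the HMaster. Generally, one HRegionServer runs per node in the cluster. Running multiple HRegionServers on the same system can be useful for testing in pseudo-distributed mode. The local-regionservers.sh command allows you to run multiple RegionServers. It works in a similar way to the local-master-backup.sh command, in that each parameter you provide represents the port offset for an instance. Each RegionServer requires two ports, and the default ports are 16020 and 16030. However, since HBase version 1.0.0 the HMaster is also a RegionServer and uses the default ports, so the base ports for additional RegionServers are 16200 and 16300 instead. You can run 99 additional RegionServers that are not an HMaster or backup HMaster on a server. The following command starts four additional RegionServers, running on sequential ports starting at 16202/16302 (base ports 16200/16300 plus 2).

$ ./bin/local-regionservers.sh start 2 3 4 5

To stop a RegionServer manually, use the local-regionservers.sh command with the stop parameter and the offset of the server to stop.

$ ./bin/local-regionservers.sh stop 3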

8) Stop HBase.

You can stop HBase the same way as in the quickstart procedure, using the bin/stop-hbase.sh command.

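2.4. Advanced – Fully Distributed

In reality, you need a fully distributed configuration to fully test HBase and to use it in real-world scenarios. In a distributed configuration, the cluster contains multiple nodes, each of which runs one or more HBase daemons. These include primary and backup Master instances, multiple ZooKeeper nodes, and multiple RegionServer nodes.

This advanced quickstart adds two more nodes to your cluster. The architecture will be as follows:

Table 1. Distributed Cluster Demo Architecture

Node Name            Master   ZooKeeper   RegionServer
node-a.example.com   yes      yes         no
node-b.example.com   backup   yes         yes
node-c.example.com   no       yes         yes

This quickstart assumes that each node is a virtual machine and that they are all on the same network. It builds on the previous quickstart, Pseudo-Distributed Local Install, assuming that the system you configured in that procedure is now node-a. Stop HBase on node-a before continuing.

Be sure that all the nodes can communicate freely and that no firewall rules prevent them from talking to each other. If you see errors like no route to host, check your firewall.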

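Procedure: Configure Passwordless SSH Access

node-a needs to be able to log in to node-b and node-c (and to itself) in order to start the daemons. The easiest way to accomplish this is to use the same username on all hosts and configure password-less SSH login from node-a to each of the others.

1) On node-a, generate a key pair.

While logged in as the user who will run HBase, generate an SSH key pair using the following command:

$ ssh-keygen -t rsa

If the command succeeds, the location of the key pair is printed to standard output. The default name of the public key is id_rsa.pub.

2) Create the directory that will hold the shared keys on the other nodes.

On node-b and node-c, log in as the HBase user and create a .ssh/ directory in the user's home directory, if it does not already exist. If it already exists, be aware that it may already contain other keys.

3) Copy the public key to the other nodes.

Securely copy the public key from node-a to each of the nodes, using scp or some other secure means. On each of the other nodes, create a new file called .ssh/authorized_keys if it does not already exist, and append the contents of the id_rsa.pub file to the end of it. Note that you also need to do this for node-a itself.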

$ cat id_rsa.pub >> ~/.ssh/authorized_keys

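4) Test password-less login.

If you performed the procedure correctly, you should not be prompted for a password when you SSH from node-a to either of the other nodes using the same username.

5) Since node-b will run a backup Master, repeat the procedure above, substituting node-b everywhere you see node-a. Be sure not to overwrite your existing .ssh/authorized_keys files; append the new key to the existing file using the >> operator rather than the > operator.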
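The warning about appending rather than overwriting can be turned into a small idempotent helper. The following is a sketch under stated assumptions: `append_public_key` is a hypothetical function, the key strings are placeholders rather than real keys, and the file is created in a temporary directory so the example does not touch a real ~/.ssh.

```python
import os
import tempfile


def append_public_key(authorized_keys_path, public_key):
    """Append public_key unless it is already listed; return True if it was added."""
    key = public_key.strip()
    content = ""
    if os.path.exists(authorized_keys_path):
        with open(authorized_keys_path) as handle:
            content = handle.read()
    if key in (line.strip() for line in content.splitlines()):
        return False  # already authorized, nothing to do
    with open(authorized_keys_path, "a") as handle:  # "a" appends, like >>
        if content and not content.endswith("\n"):
            handle.write("\n")  # avoid gluing the key onto the previous line
        handle.write(key + "\n")
    return True


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "authorized_keys")
    # Placeholder key material, for illustration only.
    print(append_public_key(path, "ssh-rsa PLACEHOLDER-KEY hbuser@node-a"))
    print(append_public_key(path, "ssh-rsa PLACEHOLDER-KEY hbuser@node-a"))
```

Running the helper twice leaves a single copy of the key in the file, which is convenient when the procedure is repeated for node-b.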

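Procedure: Prepare node-a

node-a will run your primary master and ZooKeeper processes, but no RegionServers. Stop the RegionServer from starting on node-a.

1) Edit conf/regionservers and remove the line which contains localhost. Add lines with the hostnames or IP addresses for node-b and node-c.

Even if you did want to run a RegionServer on node-a, you should refer to it by the hostname the other servers would use to communicate with it. In this case, that would be node-a.example.com. This enables you to distribute the configuration to each node of your cluster without hostname conflicts. Save the file.

2) Configure HBase to use node-b as a backup master.

Create a new file in conf/ called backup-masters, and add a line to it with the hostname for node-b. In this demonstration, the hostname is node-b.example.com.

3) Configure ZooKeeper.

In reality, you should carefully consider your ZooKeeper configuration. You can find out more about configuring ZooKeeper in the zookeeper section. This configuration directs HBase to start and manage a ZooKeeper instance on each node of the cluster.

On node-a, edit conf/hbase-site.xml and add the following properties.

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/zookeeper</value>
  </property>

4) Everywhere in your configuration that you have referred to node-a as localhost, change the reference to point to the hostname that the other nodes will use to refer to node-a. In these examples, the hostname is node-a.example.com.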
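The three pieces of configuration edited in this procedure all follow from the role table of the demo cluster, so they can be derived from one description. The sketch below is illustrative only: the CLUSTER mapping and the function names are assumptions of this rewrite, not HBase tooling.

```python
# Role table of the demo cluster, as described in Table 1.
CLUSTER = {
    "node-a.example.com": {"master": "primary", "zookeeper": True, "regionserver": False},
    "node-b.example.com": {"master": "backup", "zookeeper": True, "regionserver": True},
    "node-c.example.com": {"master": None, "zookeeper": True, "regionserver": True},
}


def regionservers_file(cluster):
    """Content for conf/regionservers: one RegionServer host per line."""
    return "".join(host + "\n" for host, roles in cluster.items() if roles["regionserver"])


def backup_masters_file(cluster):
    """Content for conf/backup-masters: one backup master host per line."""
    return "".join(host + "\n" for host, roles in cluster.items() if roles["master"] == "backup")


def zookeeper_quorum(cluster):
    """Comma-separated value for hbase.zookeeper.quorum."""
    return ",".join(host for host, roles in cluster.items() if roles["zookeeper"])


print(regionservers_file(CLUSTER), end="")
print(backup_masters_file(CLUSTER), end="")
print(zookeeper_quorum(CLUSTER))
```

Deriving the files from a single table keeps conf/regionservers, conf/backup-masters, and the quorum setting consistent when a node changes role.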

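Procedure: Prepare node-b and node-c

node-b will run a backup master server and a ZooKeeper instance.

1) Download and unpack HBase.

Download and unpack HBase on node-b, just as you did for the standalone and pseudo-distributed quickstarts.

2) Copy the configuration files from node-a to node-b and node-c.

Each node of your cluster needs to have the same configuration information. Copy the contents of the conf/ directory to the conf/ directory on node-b and node-c.

Procedure: Start and Test Your Cluster

1) Be sure HBase is not running on any node.

If you forgot to stop HBase from previous testing, you will have errors. Check whether HBase is running on any of your nodes by using the jps command. Look for the processes HMaster, HRegionServer, and HQuorumPeer. If they exist, kill them.

2) Start the cluster.

On node-a, issue the start-hbase.sh command. Your output will be similar to that below.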

$ bin/start-hbase.sh

node-c.example.com: starting zookeeper, logging to
/home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out

node-a.example.com: starting zookeeper, logging to
/home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out

node-b.example.com: starting zookeeper, logging to
/home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out

starting master, logging to
/home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out

node-c.example.com: starting regionserver, logging to
/home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out

node-b.example.com: starting regionserver, logging to
/home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out

node-b.example.com: starting master, logging to
/home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out

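ZooKeeper starts first, followed by the master, then the RegionServers, and finally the backup masters.

3) Verify that the processes are running.

On each node of the cluster, run the jps command and verify that the correct processes are running on each server. You may see additional Java processes running on your servers as well, if they are used for other purposes.

Example 2. node-a jps Output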

$ jps

20355 Jps

20071 HQuorumPeer

20137 HMaster

Example 3. node-b jps Output

$ jps

15930 HRegionServer

16194 Jps

15838 HQuorumPeer

16010 HMaster

Example 4. node-c jps Output

$ jps

13901 Jps

13639 HQuorumPeer

13737 HRegionServer
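Comparing jps output with the expected roles by eye gets tedious on more than a few nodes. The following minimal sketch shows one way to automate the comparison; `running_daemons` and the EXPECTED mapping are assumptions for illustration, and the sample text is the node-b output shown above.

```python
def running_daemons(jps_output):
    """Extract process names from jps output, ignoring the Jps tool itself."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1])
    names.discard("Jps")
    return names


# Expected daemons per node, following the demo architecture table.
EXPECTED = {
    "node-a": {"HMaster", "HQuorumPeer"},
    "node-b": {"HMaster", "HQuorumPeer", "HRegionServer"},
    "node-c": {"HQuorumPeer", "HRegionServer"},
}

node_b_output = """
15930 HRegionServer
16194 Jps
15838 HQuorumPeer
16010 HMaster
"""
missing = EXPECTED["node-b"] - running_daemons(node_b_output)
print("missing on node-b:", sorted(missing))
```

An empty list means every expected daemon is present. Extra Java processes are ignored on purpose, since other services may legitimately run on the same host.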

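ZooKeeper Process Name

The HQuorumPeer process is a ZooKeeper instance which is controlled and started by HBase. If you use ZooKeeper this way, it is limited to one instance per cluster node and is appropriate for testing only. If ZooKeeper is run outside of HBase, the process is called QuorumPeer. For more about ZooKeeper configuration, including using an external ZooKeeper instance with HBase, see the zookeeper section.

4) Browse to the Web UI.

Web UI Port Changes

In HBase newer than 0.98.x, the HTTP ports used by the HBase Web UI changed from 60010 for the Master and 60030 for each RegionServer to 16010 for the Master and 16030 for the RegionServer.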
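The port change can be captured in a small lookup, which is handy when scripts must work against both old and new clusters. This sketch simply encodes the statement above; the function name `web_ui_ports` and the version parsing are assumptions, and only the first two version components are compared.

```python
def web_ui_ports(version):
    """Return the default Web UI ports for an HBase version string such as '0.98.23'."""
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) > (0, 98):
        return {"master": 16010, "regionserver": 16030}
    return {"master": 60010, "regionserver": 60030}


for version in ("0.98.23", "1.0.0", "2.0.0-SNAPSHOT"):
    print(version, web_ui_ports(version))
```

This also explains why the 0.98.23 tutorial earlier in this article opens szh:60010, while the quickstart text points at port 16010.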

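If everything is set up correctly, you should be able to connect to the UI for the Master at http://node-a.example.com:16010/ or the backup Master at http://node-b.example.com:16010/ using a web browser. If you can connect via localhost but not from another host, check your firewall rules. You can see the web UI for each of the RegionServers at port 16030 of their IP addresses, or by clicking their links in the web UI for the Master.

5) Test what happens when nodes or services disappear.

With a three-node cluster like the one you have configured, things will not be very resilient. You can still test the behavior of the primary Master or a RegionServer by killing the associated processes and watching the logs.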

The original English text follows.


Getting Started

  1. Introduction

The quickstart will
get you up and running on a single-node, standalone instance of HBase.

  2. Quick Start – Standalone HBase

This section describes the setup of a single-node standalone HBase.
A standalone instance has all HBase daemons (the Master, RegionServers,
and ZooKeeper) running in a single JVM persisting to the local
filesystem. It is our most basic deploy profile. We will show you how to
create a table in HBase using the hbase shell CLI, insert rows into the
table, perform put and scan operations against the table, enable or
disable the table, and start and stop HBase.

Apart from downloading HBase, this procedure should take less than 10
minutes.

Prior to HBase 0.94.x, HBase expected the loopback IP address to be
127.0.0.1. Ubuntu and some other distributions default to 127.0.1.1 and
this will cause problems for you. See "Why does HBase care about
/etc/hosts?" for details.

The following /etc/hosts file works correctly for HBase 0.94.x and
earlier, on Ubuntu. Use this as a template if you run into trouble.

127.0.0.1 localhost

127.0.0.1 ubuntu.ubuntu-domain ubuntu

This issue has been fixed in hbase-0.96.0 and beyond.

2.1. JDK Version Requirements

HBase requires that a JDK be installed. See the Java section for
information about supported JDK versions.

2.2. Get Started with HBase

Procedure: Download, Configure, and Start HBase in Standalone Mode

Choose a download site from this list of Apache Download Mirrors.
Click on the suggested top link. This will take you to a mirror of HBase
Releases. Click on the folder named stable and then download the binary
file that ends in .tar.gz to your local filesystem. Do not download the
file ending in src.tar.gz for now.

Extract the downloaded file, and change to the newly-created directory.

$ tar xzvf hbase-2.0.0-SNAPSHOT-bin.tar.gz

$ cd hbase-2.0.0-SNAPSHOT/

You are required to set the JAVA_HOME environment variable before
starting HBase. You can set the variable via your operating system's
usual mechanism, but HBase provides a central
mechanism, conf/hbase-env.sh. Edit this file, uncomment the line
starting with JAVA_HOME, and set it to the appropriate location for your
operating system. The JAVA_HOME variable should be set to a directory
which contains the executable file bin/java. Most modern Linux
operating systems provide a mechanism, such as /usr/bin/alternatives on
RHEL or CentOS, for transparently switching between versions of
executables such as Java. In this case, you can set JAVA_HOME to the
directory containing the symbolic link to bin/java, which is
usually /usr.

JAVA_HOME=/usr

Edit conf/hbase-site.xml, which is the main HBase configuration file.
At this time, you only need to specify the directory on the local
filesystem where HBase and ZooKeeper write data. By default, a new
directory is created under /tmp. Many servers are configured to delete
the contents of /tmp upon reboot, so you should store the data
elsewhere. The following configuration will store HBase's data in
the hbase directory, in the home directory of the user called testuser.
Paste the <property> tags beneath the <configuration> tags, which should
be empty in a new HBase install.

Example 1. Example hbase-site.xml for Standalone HBase

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/testuser/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/testuser/zookeeper</value>
  </property>
</configuration>

You do not need to create the HBase data directory. HBase will do this
for you. If you create the directory, HBase will attempt to do a
migration, which is not what you want.

The hbase.rootdir in the above example points to a directory in
the local filesystem. The 'file:/' prefix is how we denote local
filesystem. To home HBase on an existing instance of HDFS, set
the hbase.rootdir to point at a directory up on your instance,
e.g. hdfs://namenode.example.org:8020/hbase. For more on this variant,
see the section below on Standalone HBase over HDFS.

The bin/start-hbase.sh script is provided as a convenient way to start
HBase. Issue the command, and if all goes well, a message is logged to
standard output showing that HBase started successfully. You can use
the jps command to verify that you have one running process called
HMaster. In standalone mode HBase runs all daemons within this single
JVM, i.e. the HMaster, a single HRegionServer, and the ZooKeeper daemon.
Go to http://localhost:16010 to view the HBase Web UI.

Java needs to be installed and available. If you get an error indicating
that Java is not installed, but it is on your system, perhaps in a
non-standard location, edit the conf/hbase-env.sh file and modify
the JAVA_HOME setting to point to the directory that
contains bin/java on your system.

Procedure: Use HBase For the First Time

Connect to HBase.

Connect to your running instance of HBase using the hbase shell command,
located in the bin/ directory of your HBase install. In this example,
some usage and version information that is printed when you start HBase
Shell has been omitted. The HBase Shell prompt ends with a > character.

$ ./bin/hbase shell

hbase(main):001:0>

Display HBase Shell Help Text.

Type help and press Enter, to display some basic usage information for
HBase Shell, as well as several example commands. Notice that table
names, rows, columns all must be enclosed in quote characters.

Create a table.

Use the create command to create a new table. You must specify the table
name and the ColumnFamily name.

hbase(main):001:0> create 'test', 'cf'

0 row(s) in 0.4170 seconds

=> Hbase::Table - test

List Information About your Table

Use the list command to list information about your table.

hbase(main):002:0> list 'test'

TABLE

test

1 row(s) in 0.0180 seconds

=> ["test"]

Put data into your table.

To put data into your table, use the put command.

hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'

0 row(s) in 0.0850 seconds

hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'

0 row(s) in 0.0110 seconds

hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3'

0 row(s) in 0.0100 seconds

Here, we insert three values, one at a time. The first insert is at row1,
column cf:a, with a value of value1. Columns in HBase are comprised of a
column family prefix, cf in this example, followed by a colon and then a
column qualifier suffix, a in this case.

Scan the table for all data at once.

One of the ways to get data from HBase is to scan. Use the scan command to
scan the table for data. You can limit your scan, but for now, all data
is fetched.

hbase(main):006:0> scan 'test'

ROW                                      COLUMN+CELL

row1                                    column=cf:a, timestamp=1421762485768, value=value1

row2                                    column=cf:b, timestamp=1421762491785, value=value2

row3                                    column=cf:c, timestamp=1421762496210, value=value3

3 row(s) in 0.0230 seconds

Get a single row of data.

To get a single row of data at a time, use the get command.

hbase(main):007:0> get 'test', 'row1'

COLUMN                                  CELL

cf:a                                    timestamp=1421762485768, value=value1

1 row(s) in 0.0350 seconds

Disable a table.

If you want to delete a table or change its settings, as well as in some
other situations, you need to disable the table first, using
the disable command. You can re-enable it using the enable command.

hbase(main):008:0> disable 'test'

0 row(s) in 1.1820 seconds

hbase(main):009:0> enable 'test'

0 row(s) in 0.1770 seconds

Disable the table again if you tested the enable command above:

hbase(main):010:0> disable 'test'

0 row(s) in 1.1820 seconds

Drop the table.

To drop (delete) a table, use the drop command.

hbase(main):011:0> drop 'test'

0 row(s) in 0.1370 seconds

Exit the HBase Shell.

To exit the HBase Shell and disconnect from your cluster, use
the quit command. HBase is still running in the background.

Procedure: Stop HBase

In the same way that the bin/start-hbase.sh script is provided to
conveniently start all HBase daemons, the bin/stop-hbase.sh script stops
them.

$ ./bin/stop-hbase.sh

stopping hbase………………..

$

After issuing the command, it can take several minutes for the processes
to shut down. Use the jps command to be sure that the HMaster and
HRegionServer processes are shut down.

The above has shown you how to start and stop a standalone instance of
HBase. In the next sections we give a quick overview of other modes of
hbase deploy.

2.3. Pseudo-Distributed Local Install

After working your way through the quickstart in standalone
mode, you can re-configure HBase to run in pseudo-distributed mode.
Pseudo-distributed mode means that HBase still runs completely on a
single host, but each HBase daemon (HMaster, HRegionServer, and
ZooKeeper) runs as a separate process: in standalone mode all daemons
ran in one JVM process/instance. By default, unless you configure
the hbase.rootdir property as described in the quickstart,
your data is still stored in /tmp/. In this walk-through, we store your
data in HDFS instead, assuming you have HDFS available. You can skip the
HDFS configuration to continue storing your data in the local
filesystem.

Hadoop Configuration

This procedure assumes that you have configured Hadoop and HDFS on your
local system and/or a remote system, and that they are running and
available. It also assumes you are using Hadoop 2. The guide on Setting
up a Single Node Cluster in the Hadoop documentation is a good starting
point.

Stop HBase if it is running.

If you have just finished the quickstart and
HBase is still running, stop it. This procedure will create a totally
new directory where HBase will store its data, so any databases you
created before will be lost.

Configure HBase.

Edit the hbase-site.xml configuration. First, add the following
property, which directs HBase to run in distributed mode, with one JVM
instance per daemon.

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

Next, change the hbase.rootdir from the local filesystem to the address of
your HDFS instance, using the hdfs:// URI syntax. In this example, HDFS
is running on the localhost at port 8020.

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>

You do not need to create the directory in HDFS. HBase will do this for
you. If you create the directory, HBase will attempt to do a migration,
which is not what you want.

Start HBase.

Use the bin/start-hbase.sh command to start HBase. If your system is
configured correctly, the jps command should show the HMaster and
HRegionServer processes running.

Check the HBase directory in HDFS.

If everything worked correctly, HBase created its directory in HDFS. In
the configuration above, it is stored in /hbase/ on HDFS. You can use
the hadoop fs command in Hadoop's bin/ directory to list this directory.

$ ./bin/hadoop fs -ls /hbase

Found 7 items

drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/.tmp

drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/WALs

drwxr-xr-x   - hbase users          0 2014-06-25 18:48 /hbase/corrupt

drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/data

-rw-r--r--   3 hbase users         42 2014-06-25 18:41 /hbase/hbase.id

-rw-r--r--   3 hbase users          7 2014-06-25 18:41 /hbase/hbase.version

drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/oldWALs

Create a table and populate it with data.

You can use the HBase Shell to create a table, populate it with data,
scan and get values from it, using the same procedure as in the shell
exercises.

Start and stop a backup HBase Master (HMaster) server.

Running multiple HMaster instances on the same hardware does not make
sense in a production environment, in the same way that running a
pseudo-distributed cluster does not make sense for production. This step
is offered for testing and learning purposes only.

The HMaster server controls the HBase cluster. You can start up to 9
backup HMaster servers, which makes 10 total HMasters, counting the
primary. To start a backup HMaster, use local-master-backup.sh. For
each backup master you want to start, add a parameter representing the
port offset for that master. Each HMaster uses three ports (16010,
16020, and 16030 by default). The port offset is added to these ports,
so using an offset of 2, the backup HMaster would use ports 16012,
16022, and 16032. The following command starts 3 backup servers using
ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035.

$ ./bin/local-master-backup.sh 2 3 5

To kill a backup master without killing the entire cluster, you need to
find its process ID (PID). The PID is stored in a file with a name
like /tmp/hbase-USER-X-master.pid. The only content of the file is the
PID. You can use the kill -9 command to kill that PID. The following
command will kill the master with port offset 1, but leave the cluster
running:

$ cat /tmp/hbase-testuser-1-master.pid |xargs kill -9

Start and stop additional RegionServers

The HRegionServer manages the data in its StoreFiles as directed by the
HMaster. Generally, one HRegionServer runs per node in the cluster.
Running multiple HRegionServers on the same system can be useful for
testing in pseudo-distributed mode. The local-regionservers.sh command
allows you to run multiple RegionServers. It works in a similar way to
the local-master-backup.sh command, in that each parameter you provide
represents the port offset for an instance. Each RegionServer requires
two ports, and the default ports are 16020 and 16030. However, the base
ports for additional RegionServers are not the default ports since the
default ports are used by the HMaster, which is also a RegionServer
since HBase version 1.0.0. The base ports are 16200 and 16300 instead.
You can run 99 additional RegionServers that are not an HMaster or backup
HMaster, on a server. The following command starts four additional
RegionServers, running on sequential ports starting at 16202/16302 (base
ports 16200/16300 plus 2).

$ ./bin/local-regionservers.sh start 2 3 4 5

To stop a RegionServer manually, use the local-regionservers.sh command
with the stop parameter and the offset of the server to stop.

$ ./bin/local-regionservers.sh stop 3

Stop HBase.

You can stop HBase the same way as in the quickstart procedure,
using the bin/stop-hbase.sh command.

2.4. Advanced – Fully Distributed

In reality, you need a fully-distributed configuration to fully test
HBase and to use it in real-world scenarios. In a distributed
configuration, the cluster contains multiple nodes, each of which runs
one or more HBase daemons. These include primary and backup Master
instances, multiple ZooKeeper nodes, and multiple RegionServer nodes.

This advanced quickstart adds two more nodes to your cluster. The
architecture will be as follows:

Table 1. Distributed Cluster Demo Architecture

Node Name            Master   ZooKeeper   RegionServer
node-a.example.com   yes      yes         no
node-b.example.com   backup   yes         yes
node-c.example.com   no       yes         yes

This quickstart assumes that each node is a virtual machine and that
they are all on the same network. It builds upon the previous
quickstart, Pseudo-Distributed Local Install,
assuming that the system you configured in that procedure is now node-a.
Stop HBase on node-a before continuing.

Be sure that all the nodes have full access to communicate, and that no
firewall rules are in place which could prevent them from talking to
each other. If you see any errors like no route to host, check your
firewall.

Procedure: Configure Passwordless SSH Access

node-a needs to be able to log in to node-b and node-c (and to itself) in
order to start the daemons. The easiest way to accomplish this is to use
the same username on all hosts, and configure password-less SSH login
from node-a to each of the others.

On node-a, generate a key pair.

While logged in as the user who will run HBase, generate an SSH key pair,
using the following command:

$ ssh-keygen -t rsa

If the command succeeds, the location of the key pair is printed to
standard output. The default name of the public key is id_rsa.pub.

Create the directory that will hold the shared keys on the other nodes.

On node-b and node-c, log in as the HBase user and create a .ssh/ directory
in the user's home directory, if it does not already exist. If it
already exists, be aware that it may already contain other keys.

Copy the public key to the other nodes.

Securely copy the public key from node-a to each of the nodes, by using
scp or some other secure means. On each of the other nodes, create a
new file called .ssh/authorized_keys if it does not already exist, and
append the contents of the id_rsa.pub file to the end of it. Note that
you also need to do this for node-a itself.

$ cat id_rsa.pub >> ~/.ssh/authorized_keys

Test password-less login.

If you performed the procedure correctly, if you SSH from node-a to either
of the other nodes, using the same username, you should not be prompted
for a password.

Since node-b will run a backup Master, repeat the procedure above,
substituting node-b everywhere you see node-a. Be sure not to overwrite
your existing .ssh/authorized_keys files, but concatenate the new key
onto the existing file using the >> operator rather than
the > operator.

Procedure: Preparenode-a

node-awill run your primary master and ZooKeeper processes, but no
RegionServers. . Stop the RegionServer from starting onnode-a.

Editconf/regionserversand remove the line which containslocalhost. Add
lines with the hostnames or IP addresses fornode-bandnode-c.

Even if you did want to run a RegionServer onnode-a, you should refer to
it by the hostname the other servers would use to communicate with it.
In this case, that would benode-a.example.com. This enables you to
distribute the configuration to each node of your cluster any hostname
conflicts. Save the file.

Configure HBase to usenode-bas a backup master.

Create a new file inconf/calledbackup-masters, and add a new line to
it with the hostname fornode-b. In this demonstration, the hostname
isnode-b.example.com.

Configure ZooKeeper

In reality, you should carefully consider your ZooKeeper configuration.
You can find out more about configuring ZooKeeper in the zookeeper
section.
This configuration will direct HBase to start and manage a ZooKeeper
instance on each node of the cluster.

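On node-a, edit conf/hbase-site.xml and add the following properties.

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/zookeeper</value>
  </property>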

Everywhere in your configuration that you have referred
to node-a as localhost, change the reference to point to the hostname that
the other nodes will use to refer to node-a. In these examples, the
hostname is node-a.example.com.

Procedure: Prepare node-b and node-c

node-b will run a backup master server and a ZooKeeper instance.

Download and unpack HBase.

Download and unpack HBase to node-b, just as you did for the standalone
and pseudo-distributed quickstarts.

Copy the configuration files from node-a to node-b and node-c.

Each node of your cluster needs to have the same configuration
information. Copy the contents of the conf/ directory to
the conf/ directory on node-b and node-c.
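The copy can be done with scp in a loop. This is a minimal sketch, assuming HBase is unpacked at the same path on every node and that password-less SSH is already working; the function name copy_conf and the SCP override variable are hypothetical.

```shell
# copy_conf HBASE_DIR HOST...
# Copies every file in HBASE_DIR/conf to the same path on each HOST.
# Assumption: HBase is unpacked at HBASE_DIR on all nodes.
copy_conf() {
    local hbase_dir="$1"
    shift
    local host
    for host in "$@"; do
        # ${SCP:-scp} allows a dry run, for example SCP="echo scp".
        ${SCP:-scp} -r "$hbase_dir"/conf/* "$host:$hbase_dir/conf/" || return 1
    done
}
```

Setting `SCP="echo scp"` before the call prints the commands instead of running them, which is a cheap way to review the target paths first.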

Procedure: Start and Test Your Cluster

Be sure HBase is not running on any node.

If you forgot to stop HBase from previous testing, you will have errors.
Check to see whether HBase is running on any of your nodes by using
the jps command. Look for the processes HMaster, HRegionServer,
and HQuorumPeer. If they exist, kill them.
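Picking the HBase daemons out of the jps listing can be scripted. This is a minimal sketch that filters jps output read from standard input; the function name hbase_pids is hypothetical.

```shell
# hbase_pids
# Reads jps output on stdin and prints the PID of every HMaster,
# HRegionServer, or HQuorumPeer process, one per line.
hbase_pids() {
    awk '$2 == "HMaster" || $2 == "HRegionServer" || $2 == "HQuorumPeer" { print $1 }'
}
```

`jps | hbase_pids` lists the leftover daemons on a node, and the resulting PIDs can then be passed to kill.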

Start the cluster.

On node-a, issue the start-hbase.sh command. Your output will be similar to
that below.

$ bin/start-hbase.sh
node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-b.example.com.out

ZooKeeper starts first, followed by the master, then the RegionServers,
and finally the backup masters.

Verify that the processes are running.

On each node of the cluster, run the jps command and verify that the
correct processes are running on each server. You may see additional
Java processes running on your servers as well, if they are used for
other purposes.

Example 2. node-a jps Output

$ jps

20355 Jps

20071 HQuorumPeer

20137 HMaster

Example 3. node-b jps Output

$ jps

15930 HRegionServer

16194 Jps

15838 HQuorumPeer

16010 HMaster

Example 4. node-c jps Output

$ jps

13901 Jps

13639 HQuorumPeer

13737 HRegionServer
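The per-node check can be automated by testing the jps listing against the processes each node is expected to run. This is a minimal sketch; the function name has_processes is hypothetical, and the expected process sets are the ones shown in the example outputs.

```shell
# has_processes NAME...
# Reads jps output on stdin and succeeds only if every NAME appears in
# the process-name column.
has_processes() {
    local listing name
    listing="$(cat)"
    for name in "$@"; do
        printf '%s\n' "$listing" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' ||
            return 1
    done
}
```

On node-a the check would be `jps | has_processes HQuorumPeer HMaster`; on node-b add HRegionServer, and on node-c use `HQuorumPeer HRegionServer`.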

ZooKeeper Process Name

The HQuorumPeer process is a ZooKeeper instance which is controlled and
started by HBase. If you use ZooKeeper this way, it is limited to one
instance per cluster node, and is appropriate for testing only. If
ZooKeeper is run outside of HBase, the process is called QuorumPeer. For
more about ZooKeeper configuration, including using an external
ZooKeeper instance with HBase, see the zookeeper section.

Browse to the Web UI.

Web UI Port Changes

In HBase newer than 0.98.x, the HTTP ports used by the HBase Web UI
changed from 60010 for the Master and 60030 for each RegionServer to
16010 for the Master and 16030 for the RegionServer.
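The port rule above can be written as a small lookup, which helps when the same script has to reach clusters running different versions. This is a minimal sketch based only on the cutoff stated here (60010 up to 0.98.x, 16010 for anything newer); the function name master_ui_port is hypothetical.

```shell
# master_ui_port VERSION
# Prints the default Master web UI port for an HBase version string such
# as 0.98.23 or 1.2.6: 60010 up to 0.98.x, 16010 for anything newer.
master_ui_port() {
    local major minor
    major="${1%%.*}"      # text before the first dot
    minor="${1#*.}"       # text after the first dot ...
    minor="${minor%%.*}"  # ... cut at the next dot
    if [ "$major" -eq 0 ] && [ "$minor" -le 98 ]; then
        echo 60010
    else
        echo 16010
    fi
}
```

For example, `curl -sI "http://node-a.example.com:$(master_ui_port 1.2.6)/"` requests only the response headers from the Master UI on the port that matches that version.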

If everything is set up correctly, you should be able to connect to the
UI for the Master at http://node-a.example.com:16010/ or the secondary
master at http://node-b.example.com:16010/, using a web browser. If you
can connect via localhost but not from another host, check your firewall
rules. You can see the web UI for each of the RegionServers at port 16030
of their IP addresses, or by clicking their links in the web UI for the
Master.

Test what happens when nodes or services disappear.

With a three-node cluster like you have configured, things will not be
very resilient. Still, you can test what happens when the primary Master
or a RegionServer disappears, by killing the processes and watching the
logs.
