Spring Cloud Official Documentation: Service Discovery - Eureka Server

Hadoop Hands-On Example

The official documentation is at: http://cloud.spring.io/spring-cloud-static/Dalston.SR3/#spring-cloud-eureka-server

 

I have tested the examples in this article; the code is at: http://git.oschina.net/dreamingodd/spring-cloud-preparation

    
    Hadoop is a Java implementation of Google's MapReduce. MapReduce is a simplified distributed programming model that lets a program be automatically distributed and executed in parallel on a very large cluster of commodity machines. Just as Java programmers do not have to think about memory management, the MapReduce runtime takes care of distributing the input data, scheduling execution across the cluster, handling machine failures, and managing communication between machines. This model lets programmers with no experience of concurrent or distributed systems put the resources of a very large distributed system to work.

 

1. Overview

    As a Hadoop programmer, the things you have to do are:
    1. Define a Mapper that processes the input key-value pairs and emits intermediate results.
    2. Define a Reducer (optional) that aggregates the intermediate results and emits the final output.
    3. Define an InputFormat and OutputFormat (optional). The InputFormat converts each line of the input file into a Java type for the Mapper function to use; if you do not define one, the default is String.
    4. Define a main function that creates a Job and runs it.
    

    Everything else is then handled by the system:
    1. Basic concepts: Hadoop's HDFS implements Google's GFS file system. The NameNode runs on the master as the file system's coordinator, and a DataNode runs on every machine. Hadoop likewise implements Google's MapReduce: the JobTracker runs on the master as the overall MapReduce scheduler, and a TaskTracker runs on every machine to execute Tasks.

    2. main(): creates a JobConf; defines the Mapper, Reducer, Input/OutputFormat and the input and output directories; and finally submits the Job to the JobTracker and waits for it to finish.

    3. JobTracker: creates an InputFormat instance and calls its getSplits() method to split the files in the input directory into FileSplits as the Mapper tasks' input, then creates the Mapper tasks and adds them to the queue.

    4. TaskTracker: asks the JobTracker for the next Map/Reduce task.
      
     A Mapper task first has the InputFormat create a RecordReader, loops over the FileSplit's contents to produce keys and values, and passes them to the map function; once processing is done, the intermediate results are written out as a SequenceFile.
     A Reducer task fetches the intermediate data it needs over HTTP from the Jetty server on the TaskTrackers that ran the Mappers (33%), sorts and merges it (66%), runs the reduce function, and finally writes the results to the output directory through the OutputFormat.

      The TaskTracker reports its status to the JobTracker every 10 seconds, and 10 seconds after completing a Task it asks the JobTracker for the next one.

      All of the Nutch project's data processing is built on top of Hadoop; see Scalable Computing with Hadoop.

Service Discovery: Eureka Server

2. The Code You Write

How to Include Eureka Server

To include Eureka Server in your project use the starter with group
org.springframework.cloud and artifact id
spring-cloud-starter-eureka-server. See the Spring Cloud Project page
for details on setting up your build system with the current Spring
Cloud Release Train.

You can refer to http://projects.spring.io/spring-cloud/ for setting up your build system and creating your first Eureka server.

 (You can also look at hadoop-examples-0.20.203.0.jar, which contains a similar grep example.)

    Let's write a simple distributed grep. It does a line-by-line regular-expression match on the input files and, if a line matches, prints it to the output file. Because the output is a simple pass-through, we only need to write the Mapper function; there is no Reducer to write and no Input/Output Format to define.

package demo.hadoop;

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class HadoopGrep {

    public static class RegMapper extends MapReduceBase implements Mapper {

        private Pattern pattern;

        // Read the regular expression from the job configuration.
        public void configure(JobConf job) {
            pattern = Pattern.compile(job.get("mapred.mapper.regex"));
        }

        // Emit the line unchanged if it matches the regular expression.
        public void map(WritableComparable key, Writable value, OutputCollector output, Reporter reporter)
                throws IOException {
            String text = ((Text) value).toString();
            Matcher matcher = pattern.matcher(text);
            if (matcher.find()) {
                output.collect(key, value);
            }
        }
    }

    private HadoopGrep() {
    } // singleton

    public static void main(String[] args) throws Exception {
        JobConf grepJob = new JobConf(HadoopGrep.class);
        grepJob.setJobName("grep-search");
        grepJob.set("mapred.mapper.regex", args[2]);

        grepJob.setInputPath(new Path(args[0]));
        grepJob.setOutputPath(new Path(args[1]));
        grepJob.setMapperClass(RegMapper.class);
        grepJob.setReducerClass(IdentityReducer.class);
        JobClient.runJob(grepJob);
    }
}

         
The RegMapper class's configure() method receives the search string passed in from the main function; map() performs the regular-expression match, with the line position as the key and the line's text as the value, and collects matching lines into the intermediate results.
        main() reads the input and output directories and the match string from the command-line arguments, sets RegMapper as the Mapper and IdentityReducer, which does nothing except copy the intermediate results straight to the final output, as the Reducer, and runs the Job.

The whole program is very simple, with none of the details of distributed programming showing through.

How to Run a Eureka Server

@SpringBootApplication
@EnableEurekaServer
public class Application {
    public static void main(String[] args) {
        new SpringApplicationBuilder(Application.class).web(true).run(args);
    }
}

The server has a home page with a UI, and HTTP API endpoints per the
normal Eureka functionality under /eureka/*.


Eureka background reading: see flux capacitor and google group
discussion.

For more Eureka background reading, see https://github.com/cfregly/fluxcapacitor/wiki/NetflixOSS-FAQ#eureka-service-discovery-load-balancer and https://groups.google.com/forum/?fromgroups#!topic/eureka_netflix/g3p2r7gHnN0

TIP Due to Gradle’s dependency resolution rules and the lack of a
parent bom feature, simply depending on
spring-cloud-starter-eureka-server can cause failures on application
startup. To remedy this the Spring Boot Gradle plugin must be added and
the Spring cloud starter parent bom must be imported like so:

Gradle setup:

build.gradle

buildscript {
  dependencies {
    classpath("org.springframework.boot:spring-boot-gradle-plugin:1.3.5.RELEASE")
  }
}
apply plugin: "spring-boot"
dependencyManagement {
  imports {
    mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.RELEASE"
  }
}
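
The snippet above covers only the Boot plugin and the BOM import that the TIP calls for; the starter itself still has to be declared as a dependency. A minimal sketch of that extra block, assuming the spring-cloud-starter-eureka-server artifact id named earlier:

dependencies {
  // Eureka server starter; its version is resolved by the BOM imported above
  compile "org.springframework.cloud:spring-cloud-starter-eureka-server"
}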

3. Running the Hadoop Program

        Hadoop's documentation on this part is not great. After cross-referencing GettingStartedWithHadoop and the Nutch Hadoop Tutorial, and hitting quite a few snags, I finally got everything running. Notes below:
3.1 Local mode
       No distributed computation at all, and no namenode or datanode is involved; this is the right mode when you first start debugging your code.
       Unpack Hadoop. The conf directory holds the configuration; Hadoop's default settings live in hadoop-default.xml. If you want to change a setting, do not edit that file directly; instead override the property by assigning it a new value in hadoop-site.xml.
       The defaults in hadoop-default.xml already run in local mode and need no changes; the only thing to edit in the configuration directory is the JAVA_HOME setting in hadoop-env.sh.
       Put the compiled HadoopGrep and RegMapper.class into the hadoop/build/classes/demo/hadoop/ directory,

       
or package them into HadoopGrep.jar and put it in the hadoop/build/classes/demo/hadoop/ directory.

        Find a reasonably large xx.log file, then run

        bin/hadoop demo.hadoop.HadoopGrep  input  /tmp/output  "[a-b]"
        (jar run: bin/hadoop jar HadoopGrep.jar  HadoopGrep  input  /tmp/output  "[a-b]")
        where:
         input  is the directory containing the xx.log file
         /tmp/output  is the output directory
         "[a-b]"  is the grep pattern

        Check the results in the output directory, and the run logs under hadoop/logs/.
        Before running again, delete the output directory first.
  

  3.2 Cluster mode

   
(For cluster configuration, see: http://blog.csdn.net/hguisu/article/details/7237395)

      1) Run bin/hadoop dfs to see the file operations it supports.

      2) Create the input directory:
           $ bin/hadoop dfs -mkdir input

      3) Upload xx.log to the input directory:
           $ bin/hadoop dfs -put xx.log input

       4) Run bin/hadoop demo.hadoop.HadoopGrep input output "[a-b]"
             (jar run: bin/hadoop jar HadoopGrep.jar  HadoopGrep  input  /tmp/output  "[a-b]")

       5) View the output files:

 

           Copy the output files from the distributed file system to the local file system and view them:
            $ bin/hadoop fs -get output output
            $ cat output/*

            or view them directly on the distributed file system:
            $ bin/hadoop fs -cat output/*

       6) Before running again, delete the output directory:
            $ bin/hadoop dfs -rmr output

       7) Run hadoop/bin/stop-all.sh to shut everything down.
    

High Availability, Zones and Regions

The Eureka server does not have a backend store, but the service
instances in the registry all have to send heartbeats to keep their
registrations up to date (so this can be done in memory). Clients also
have an in-memory cache of eureka registrations (so they don’t have to
go to the registry for every single request to a service).


By default every Eureka server is also a Eureka client and requires (at
least one) service URL to locate a peer. If you don’t provide it the
service will run and work, but it will shower your logs with a lot of
noise about not being able to register with the peer.

See also below for details of Ribbon support on the client side for
Zones and Regions.

4. Performance

    In my tests, Hadoop is no silver bullet. It depends heavily on the size and volume of the files, the complexity of the processing, the number of machines in the cluster, and the bandwidth connecting them; when all four of these are small, Hadoop shows no clear advantage.
    For example, a simple grep over a 100 MB log file written in plain Java (without Hadoop) takes 4 seconds; running it in Hadoop local mode takes 14 seconds; a single-machine Hadoop cluster takes 30 seconds; and a two-machine cluster over 10M network ports is slower still, embarrassingly so.

Standalone Mode

The combination of the two caches (client and server) and the heartbeats
make a standalone Eureka server fairly resilient to failure, as long as
there is some sort of monitor or elastic runtime keeping it alive (e.g.
Cloud Foundry). In standalone mode, you might prefer to switch off the
client side behaviour, so it doesn’t keep trying and failing to reach
its peers. Example:


application.yml (Standalone Eureka Server)

server:
  port: 8761
eureka:
  instance:
    hostname: localhost
  client:
    registerWithEureka: false
    fetchRegistry: false
    serviceUrl:
      defaultZone: http://${eureka.instance.hostname}:${server.port}/eureka/

Notice that the serviceUrl is pointing to the same host as the local
instance.


Peer Awareness

Eureka can be made even more resilient and available by running multiple
instances and asking them to register with each other. In fact, this is
the default behaviour, so all you need to do to make it work is add a
valid serviceUrl to a peer, e.g.


application.yml (Two Peer Aware Eureka Servers)

---
spring:
  profiles: peer1
eureka:
  instance:
    hostname: peer1
  client:
    serviceUrl:
      defaultZone: http://peer2/eureka/
---
spring:
  profiles: peer2
eureka:
  instance:
    hostname: peer2
  client:
    serviceUrl:
      defaultZone: http://peer1/eureka/

In this example we have a YAML file that can be used to run the same
server on 2 hosts (peer1 and peer2), by running it in different Spring
profiles. You could use this configuration to test the peer awareness on
a single host (there’s not much value in doing that in production) by
manipulating /etc/hosts to resolve the host names. In fact, the
eureka.instance.hostname is not needed if you are running on a machine
that knows its own hostname (it is looked up using java.net.InetAddress
by default).


You can add multiple peers to a system, and as long as they are all
connected to each other by at least one edge, they will synchronize the
registrations amongst themselves. If the peers are physically separated
(inside a data centre or between multiple data centres) then the system
can in principle survive split-brain type failures.

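As an illustration of the "at least one edge" point, here is a sketch of a three-peer application.yml in the same style as the two-peer example above; the peer3 profile and the hostnames are hypothetical, and any topology in which every server can reach at least one other peer would also synchronize:

---
spring:
  profiles: peer1
eureka:
  instance:
    hostname: peer1
  client:
    serviceUrl:
      # each peer lists the other two, so every pair is directly connected
      defaultZone: http://peer2/eureka/,http://peer3/eureka/
---
spring:
  profiles: peer2
eureka:
  instance:
    hostname: peer2
  client:
    serviceUrl:
      defaultZone: http://peer1/eureka/,http://peer3/eureka/
---
spring:
  profiles: peer3
eureka:
  instance:
    hostname: peer3
  client:
    serviceUrl:
      defaultZone: http://peer1/eureka/,http://peer2/eureka/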

Prefer IP Address

In some cases, it is preferable for Eureka to advertise the IP Addresses
of services rather than the hostname. Set
eureka.instance.preferIpAddress to true and when the application
registers with eureka, it will use its IP Address rather than its
hostname.

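The paragraph above names only the property; a minimal application.yml sketch of how it might be set, following the layout of the earlier examples:

eureka:
  instance:
    # register and advertise the host's IP address instead of its hostname
    preferIpAddress: true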

 

This is an original post by dreamingodd; please credit the source if you repost it.
