Realtime Analytics – A Survey of the Open-Source Storm Solution

If you open the Storm website, you will find this passage in its rationale section:

MapReduce, Hadoop, and related technologies have made it possible to store and process data at scales previously unthinkable. Unfortunately, these data processing technologies are not realtime systems, nor are they meant to be. There’s no hack that will turn Hadoop into a realtime system; realtime data processing has a fundamentally different set of requirements than batch processing. Storm fills that hole.

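Storm is not the only real-time processing framework; Flink is a similar one. So what exactly is the difference between the two? I have been quarantined at home recently because of the pandemic, which gave me time to take a close look at two distinctive real-time data processing frameworks for Realtime Analytics scenarios.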

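Apache Storm is an open-source, fault-tolerant, scalable real-time stream computation system. It is a framework for distributed real-time data processing that focuses on event processing, or stream processing. Storm implements a fault-tolerance mechanism for running a computation, or for scheduling multiple computations, over events.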

Storm core architecture

The architecture is shown in the diagram below:

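Storm core concepts and workflow

In brief, the workflow is roughly:

1. The client writes a Storm application, compiles and packages it into a Topology that Storm can recognize, and submits it to the Nimbus master node.
2. Nimbus parses the Topology and distributes its tasks to the Supervisor nodes.
3. Each Supervisor, on receiving its assignment, starts worker processes whose Executors run the Tasks.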

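Nimbus: the daemon that runs on the master node, much like Hadoop's "JobTracker". Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures.

Supervisor: the daemon that runs on each worker node. It listens for work assigned to its machine and starts and stops worker processes as necessary, based on what Nimbus has assigned to it. Each worker process executes a subset of a topology; a running topology consists of many worker processes spread across many machines.

All coordination between Nimbus and the Supervisors goes through a Zookeeper cluster.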

 

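Streams

Beyond these, another core Storm concept is the stream. I looked up how streams work on the official site; roughly, it looks like this:

Some details about streams:

Spout: a spout is the source of a stream. It is responsible for bringing data into the stream; for example, a spout can read from a message queue or an API and emit what it receives for the next step of the computation.

Bolt: a bolt is the consumer of a stream. It is responsible for the actual stream computation, and bolts can be chained when several steps are needed. A bolt can do anything: run functions, filter tuples, aggregate streams, join streams, talk to databases, and so on.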
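To make the division of labour between a source and its chained consumers concrete, here is a minimal sketch in plain Java. It deliberately does not use Storm's API: `StreamSketch`, `wordSource`, `exclaim`, and `run` are hypothetical names chosen for illustration, and a real spout or bolt would implement Storm's own interfaces and run distributed across a cluster.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration only; none of these types come from Storm.
final class StreamSketch {

    // Plays the role of a spout: the source that feeds tuples into the stream.
    static Iterator<String> wordSource(List<String> words) {
        return words.iterator();
    }

    // Plays the role of a bolt: consumes one tuple and emits a transformed one.
    static String exclaim(String word) {
        return word + "!!!";
    }

    // Pulls every tuple from the source and passes it through the chained bolts in order.
    static List<String> run(Iterator<String> source, List<Function<String, String>> bolts) {
        List<String> out = new ArrayList<>();
        while (source.hasNext()) {
            String tuple = source.next();
            for (Function<String, String> bolt : bolts) {
                tuple = bolt.apply(tuple);
            }
            out.add(tuple);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two bolts chained one after the other, both applying the same function.
        List<Function<String, String>> bolts = List.of(StreamSketch::exclaim, StreamSketch::exclaim);
        System.out.println(run(wordSource(List.of("storm", "bolt")), bolts));
    }
}
```

Each word passes through both stages, so every output carries six exclamation marks. The sketch is sequential and single-threaded; the point is only the shape of the data flow.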

 

 

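Topology

Tuple: the unit of data passed through a Topology; a stream is an unbounded sequence of tuples.

Spouts, bolts, and the tuples flowing between them together make up a Topology. A Topology runs forever, until you kill it manually. Storm automatically reassigns any failed tasks, and it guarantees that no data is lost even if machines go down and messages are dropped.

A simple Topology in code: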

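// Wire one spout and two bolts together; the last argument is the parallelism hint.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("words", new TestWordSpout(), 10);
// "exclaim1" consumes the "words" stream, with tuples spread randomly over its executors.
builder.setBolt("exclaim1", new ExclamationBolt(), 3)
        .shuffleGrouping("words");
// "exclaim2" consumes the output of "exclaim1".
builder.setBolt("exclaim2", new ExclamationBolt(), 2)
        .shuffleGrouping("exclaim1");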

I found a diagram online that explains fairly clearly how a Topology runs; it looks roughly like this:

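ACK mechanism

Ack is Storm's message fault-tolerance mechanism, and its use and algorithm are among Storm's most distinctive features.
Every tuple emitted by a Spout is tracked until the last bolt has finished processing it, and is then marked by ack as succeeded or failed. Failed data can be returned to the Spout and processed again, which keeps the data reliable. That is the ack mechanism in Storm.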

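Storm starts a number of dedicated acker tasks to track how tuples are processed. You can set the number of acker tasks for a topology in the topology configuration with Config.TOPOLOGY_ACKERS. By default, Storm sets TOPOLOGY_ACKERS to one task per worker.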

Every Tuple created in a Spout is assigned a 64-bit messageId. Hashing that messageId determines which acker task should be sent the message that tells it to track this Tuple.
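The routing step can be sketched in a few lines of plain Java. This is an illustration, not Storm code: `AckerRouting` and `ackerFor` are hypothetical names, and the sketch assumes simple mod hashing of the 64-bit ID over the number of acker tasks. The property that matters is that the same ID always maps to the same acker, so every update about one tuple tree reaches one place.

```java
import java.util.concurrent.ThreadLocalRandom;

// Plain-Java illustration only; Storm performs this routing internally.
final class AckerRouting {

    // Maps a 64-bit spout tuple id to one of numAckers acker tasks by mod hashing.
    // floorMod keeps the result in [0, numAckers) even when the id is negative.
    static int ackerFor(long spoutTupleId, int numAckers) {
        return (int) Math.floorMod(spoutTupleId, (long) numAckers);
    }

    public static void main(String[] args) {
        long id = ThreadLocalRandom.current().nextLong(); // stand-in for a random 64-bit id
        System.out.println("tuple " + id + " is tracked by acker " + ackerFor(id, 4));
    }
}
```

Because the mapping depends only on the ID, no lookup table or coordination is needed to find the acker responsible for a tuple.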

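When the acker task receives the message, it binds the Spout task that emitted the tuple to that messageId and starts tracking the tuple's processing. Once the tuple has been fully processed, the acker notifies the Spout instance that emitted it, and that Spout's ack() method is called. If the tuple has not been fully processed within a set time, the Spout's fail() method is called instead, telling the Spout that processing failed. The Spout can then re-emit the tuple.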

 

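Every tuple knows the IDs of all the spout tuples in whose tuple trees it exists. When a bolt emits a new tuple, the spout tuple IDs from the tuple's anchors are copied into the new tuple. When a tuple is acked, it sends a message to the appropriate acker task describing how the tuple tree has changed. In effect it tells the acker: "I am now completed within the tree for this spout tuple, and here are the new tuples in the tree that were anchored to me."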

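For example, if tuples "D" and "E" were created based on tuple "C", this is how the tuple tree changes when "C" is acked:

Because "C" is removed from the tree at the same time that "D" and "E" are added to it, the tree can never be completed prematurely.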
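Storm's documentation describes the bookkeeping behind this as a single 64-bit "ack val" per spout tuple: the XOR of every tuple ID created or acked in the tree, which returns to zero exactly when each ID has been seen twice. The following is a minimal sketch of that idea in plain Java, not Storm's implementation. `AckValSketch` and `update` are hypothetical names, and the small hard-coded IDs stand in for the random 64-bit IDs a real topology would use.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of XOR-based tuple tree tracking; not Storm code.
final class AckValSketch {

    // spout tuple id -> running XOR of every tuple id created or acked in its tree
    private final Map<Long, Long> ackVals = new HashMap<>();

    // Folds one update into the ack val. Returns true when the ack val reaches 0,
    // meaning every tuple that was created in the tree has also been acked.
    boolean update(long spoutTupleId, long xorOfIds) {
        long val = ackVals.getOrDefault(spoutTupleId, 0L) ^ xorOfIds;
        if (val == 0L) {
            ackVals.remove(spoutTupleId);
            return true;
        }
        ackVals.put(spoutTupleId, val);
        return false;
    }

    public static void main(String[] args) {
        long root = 1L;                      // id of the spout tuple that owns the tree
        long c = 0x5CL, d = 0x5DL, e = 0x5EL; // illustrative ids for tuples C, D, E
        AckValSketch acker = new AckValSketch();

        System.out.println(acker.update(root, c));         // C created: tree is open
        System.out.println(acker.update(root, c ^ d ^ e)); // C acked, D and E anchored to it
        System.out.println(acker.update(root, d));         // D acked
        System.out.println(acker.update(root, e));         // E acked: ack val is 0, tree complete
    }
}
```

The second update is the case from the example: acking "C" and adding "D" and "E" arrive as one XOR value, so the ack val never passes through zero in between. With random 64-bit IDs, an accidental zero is possible in principle but extremely unlikely.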
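In the Ack mechanism:
1. If the spout receives an ack response within the timeout, the tuple is considered successfully processed.
2. If the spout receives a fail response, or receives no ack response within the timeout, the tuple is considered failed.
3. The timeout is set with Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS and defaults to 30 seconds; a tuple that exceeds it is marked as failed by default.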
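The three rules above amount to tracking, for each pending tuple, when it was emitted. Here is a simplified sketch of that bookkeeping in plain Java. It is not how Storm implements timeouts internally; `PendingTimeouts` and its methods are hypothetical names, and time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of timeout-based failure detection; not Storm code.
final class PendingTimeouts {
    private final long timeoutMillis;
    private final Map<Long, Long> emittedAt = new HashMap<>(); // tuple id -> emit time

    PendingTimeouts(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Records that a tuple was emitted and is now waiting for its ack.
    void emitted(long id, long nowMillis) {
        emittedAt.put(id, nowMillis);
    }

    // Rule 1: an ack arrived in time, so the tuple is no longer pending.
    void acked(long id) {
        emittedAt.remove(id);
    }

    // Rules 2 and 3: returns (and forgets) every tuple that has been pending for
    // at least the timeout. The caller would treat these as failed and replay them.
    List<Long> expire(long nowMillis) {
        List<Long> failed = new ArrayList<>();
        Iterator<Map.Entry<Long, Long>> it = emittedAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= timeoutMillis) {
                failed.add(entry.getKey());
                it.remove();
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        PendingTimeouts pending = new PendingTimeouts(30_000L); // 30 s, matching the default
        pending.emitted(1L, 0L);
        pending.emitted(2L, 0L);
        pending.acked(1L);
        System.out.println(pending.expire(29_999L)); // nothing has timed out yet
        System.out.println(pending.expire(30_001L)); // tuple 2 was never acked
    }
}
```

A tuple is only ever in one of two states here, pending or gone, which is why a missing ack and an explicit fail can be handled the same way by the caller.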

 

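The overall retry strategy when something fails is as follows:

A tuple isn't acked because its task died: in this case the spout tuple IDs at the root of the trees for the failed tuple time out and are replayed.
The acker task dies: in this case all the spout tuples the acker was tracking time out and are replayed.
The Spout task dies: in this case the source that the spout reads from is responsible for replaying the messages. For example, queues like Kestrel and RabbitMQ put all pending messages back on the queue when a client disconnects.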

 

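Besides fault tolerance, the ack mechanism can also be used for throttling:
When bolts cannot keep up with the rate at which the spout produces tuples, you can throttle the spout by setting a pending limit. Once the number of tuples that have received neither an ack nor a fail response reaches the limit, calls to nextTuple are skipped, which limits how much data the spout emits.
The pending limit is set with Config.TOPOLOGY_MAX_SPOUT_PENDING.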
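The throttling rule can be sketched as a small state machine in plain Java. This is an illustration of the idea rather than Storm's implementation: `ThrottledSpoutSketch`, `tryNextTuple`, and `onAckOrFail` are hypothetical names, and the return value of -1 is just a convention of this sketch for "skipped".

```java
import java.util.HashSet;
import java.util.Set;

// Plain-Java illustration of max-pending throttling; not Storm code.
final class ThrottledSpoutSketch {
    private final int maxPending;
    private final Set<Long> pending = new HashSet<>(); // emitted but not yet acked or failed
    private long nextId = 0L;

    ThrottledSpoutSketch(int maxPending) {
        this.maxPending = maxPending;
    }

    // Emits a new tuple id, or returns -1 without emitting when the limit is reached.
    long tryNextTuple() {
        if (pending.size() >= maxPending) {
            return -1L;
        }
        long id = nextId++;
        pending.add(id);
        return id;
    }

    // Either response frees one slot, so the spout may emit again.
    void onAckOrFail(long id) {
        pending.remove(id);
    }

    public static void main(String[] args) {
        ThrottledSpoutSketch spout = new ThrottledSpoutSketch(2);
        System.out.println(spout.tryNextTuple()); // 0
        System.out.println(spout.tryNextTuple()); // 1
        System.out.println(spout.tryNextTuple()); // -1, two tuples are still pending
        spout.onAckOrFail(0L);
        System.out.println(spout.tryNextTuple()); // 2, a slot was freed
    }
}
```

The slower the bolts respond, the longer slots stay occupied, so the emit rate follows the processing rate without any explicit rate setting.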

Conversely, if you do not need tuples to be replayable, you can disable this fault tolerance by setting the number of acker bolts, Config.TOPOLOGY_ACKERS, to 0.

As this shows, Storm's reliability mechanism is fully distributed, scalable, and fault-tolerant. It is one of Storm's core strengths.

 

Parallelism in Storm

The concepts are as follows:

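A Worker executes a subset of a Topology. A worker process belongs to one specific Topology and may run one or more executors for one or more components (spouts or bolts) of that Topology. A running Topology consists of many such processes running on many machines in the Storm cluster.

An Executor is a thread spawned by a worker process. It may run one or more tasks for the same component (spout or bolt).

A Task performs the actual data processing: each spout or bolt implemented in your code runs as a number of tasks across the cluster. The number of tasks for a component stays the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time. That is, #threads ≤ #tasks. By default, the number of tasks is set to the number of executors, so Storm runs one task per thread.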

 

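As you can see, in Storm the Worker is not the smallest unit in which a component executes; the Executor is, and an Executor can be thought of as a thread. When creating a topology, we can set the number of threads that run each spout and each bolt.

Suppose blue-spout, green-bolt, and yellow-bolt are given 10 threads in total (2 + 2 + 6) and the topology is configured with 2 workers. Those 10 threads are then distributed across the 2 workers, typically 5 per worker. The code and the resulting distribution are as follows: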

 

Config conf = new Config();
conf.setNumWorkers(2); // use two worker processes

TopologyBuilder topologyBuilder = new TopologyBuilder();

topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2

topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2) // 2 executors ...
               .setNumTasks(4)                             // ... sharing 4 tasks
               .shuffleGrouping("blue-spout");

topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6) // 6 executors, one task each by default
               .shuffleGrouping("green-bolt");

StormSubmitter.submitTopology(
        "mytopology",
        conf,
        topologyBuilder.createTopology()
);
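The arithmetic behind this configuration can be checked with a short plain-Java sketch. It assumes executors are spread as evenly as possible over workers, and tasks as evenly as possible over executors; `ParallelismMath` and `spread` are hypothetical names for illustration and are not part of Storm.

```java
import java.util.Arrays;

// Plain-Java illustration of the parallelism arithmetic; not Storm code.
final class ParallelismMath {

    // Splits `items` over `buckets` as evenly as possible; earlier buckets get the remainder.
    static int[] spread(int items, int buckets) {
        int[] counts = new int[buckets];
        for (int i = 0; i < buckets; i++) {
            counts[i] = items / buckets + (i < items % buckets ? 1 : 0);
        }
        return counts;
    }

    public static void main(String[] args) {
        int executors = 2 + 2 + 6; // blue-spout + green-bolt + yellow-bolt parallelism hints
        System.out.println(Arrays.toString(spread(executors, 2))); // executors per worker
        System.out.println(Arrays.toString(spread(4, 2)));         // green-bolt tasks per executor
    }
}
```

Under these assumptions each of the 2 workers runs 5 executors, and each green-bolt executor runs 2 of the bolt's 4 tasks, while the spout and yellow-bolt executors run one task each.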

 

 

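Summary

If what you want is a stateless, high-speed event processing system that allows incremental computation, Storm is a strong choice. Its mature ACK fault-tolerance mechanism does as much as possible to guarantee that the work of every concurrent Worker is carried out. Installing and deploying Storm is somewhat tricky, though: it still depends on a Zookeeper cluster to coordinate state, cluster, and Task information.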

 


