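If you open the Storm website, you will find this sentence in its rationale page:
MapReduce, Hadoop, and related technologies have made it possible to store and process data at scales previously unthinkable. Unfortunately, these data processing technologies are not realtime systems, nor are they meant to be. There's no hack that will turn Hadoop into a realtime system; realtime data processing has a fundamentally different set of requirements than batch processing. Storm fills that hole.
Besides Storm there are other real-time processing frameworks such as Flink, so what exactly is the difference between the two? I happen to be quarantined at home because of the pandemic, which gave me the time to take a close look at two distinctive real-time data processing frameworks for Realtime Analytics scenarios.
Apache Storm is an open-source, fault-tolerant, scalable real-time stream processing system. It is a framework for distributed real-time data processing that focuses on event processing, also called stream processing. Storm implements a fault-tolerance mechanism for running a computation, or scheduling multiple computations, over events.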
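Storm core architecture
The architecture diagram is shown below:
Storm core concepts and workflow
In short, the workflow is roughly:
1. The client writes a Storm application, compiles and packages it into a Topology graph that Storm understands, and uploads it to the Nimbus master node.
2. Nimbus parses the Topology and distributes its tasks to the Supervisor nodes.
3. Each Supervisor, on receiving its assignment, starts Executors that run the Tasks.
Nimbus: the daemon that runs on the master node, very similar to Hadoop's "JobTracker". Nimbus is responsible for distributing code around the cluster; its main jobs are assigning tasks to machines and monitoring for failures.
Supervisor: the daemon that runs on each worker node. It listens for work assigned to its machine and starts and stops worker processes as necessary, based on what Nimbus has assigned to it. Each worker process executes a subset of a Topology; a running Topology consists of many worker processes spread across many machines.
All coordination between Nimbus and the Supervisors goes through a Zookeeper cluster.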
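Streams
Beyond these, Storm has another core concept: Streams. I looked up how they work on the official site, and it goes like this:
Some details about streams:
Spout: a Spout is the source of a stream. It is responsible for bringing data into the stream; for example, a Spout can read from a message queue or an API and hand the data to the next step of the computation.
Bolt: a Bolt is the consumer of a stream. It is responsible for the actual stream computation, and Bolts can be chained when several steps are needed. A Bolt can do anything: run functions, filter tuples, aggregate streams, join streams, talk to databases, and so on.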
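Topology
Tuple: the unit of data in a stream; tuples are what gets passed through a Topology.
Spouts and Bolts, together with the Tuples flowing between them, make up a Topology. A Topology runs forever until you kill it manually. Storm automatically reassigns any failed tasks, and it guarantees that no data is lost, even if machines go down and messages are dropped.
A simple Topology in code: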
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("words", new TestWordSpout(), 10);
builder.setBolt("exclaim1", new ExclamationBolt(), 3)
       .shuffleGrouping("words");
builder.setBolt("exclaim2", new ExclamationBolt(), 2)
       .shuffleGrouping("exclaim1");
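To get a feel for the data flow without a cluster, here is a minimal plain-Java sketch of the same words -> exclaim1 -> exclaim2 chain. It uses no Storm classes at all. ToyTopology, EXCLAIM, and run are hypothetical names made up for illustration, and the assumption that each bolt appends "!!!" to the word is mine, not something the snippet above states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Plain-Java stand-in for the topology above; no Storm classes are used.
final class ToyTopology {
    // Assumed bolt logic (hypothetical): append "!!!" to the incoming word.
    static final Function<String, String> EXCLAIM = word -> word + "!!!";

    // Pushes every word from the "spout" through the bolts in order.
    static List<String> run(List<String> spoutWords,
                            List<Function<String, String>> bolts) {
        List<String> out = new ArrayList<>();
        for (String word : spoutWords) {
            String tuple = word;
            for (Function<String, String> bolt : bolts) {
                tuple = bolt.apply(tuple);
            }
            out.add(tuple);
        }
        return out;
    }

    public static void main(String[] args) {
        // words -> exclaim1 -> exclaim2
        System.out.println(run(List.of("storm", "flink"), List.of(EXCLAIM, EXCLAIM)));
    }
}
```

In real Storm the parallelism hints (10, 3, 2) spread this work over many threads and shuffleGrouping distributes tuples randomly among them; the sketch only shows the logical order of the steps.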
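I found a diagram online that explains how a Topology runs fairly clearly; it looks roughly like this:
ACK mechanism
Ack is Storm's message fault-tolerance mechanism. How it is used, and the algorithm behind it, are among Storm's most notable highlights.
Every tuple that leaves a Spout is tracked until the last Bolt has finished with it, and is marked by ack as succeeded or failed. Failed data can be sent back to the Spout for reprocessing, which keeps the data reliable. That is the ack mechanism in Storm.
Storm starts a number of dedicated acker tasks to track tuple processing. You can set the number of acker tasks for a topology with Config.TOPOLOGY_ACKERS in the topology configuration. By default, Storm sets TOPOLOGY_ACKERS to one task per worker.
Every tuple generated in a Spout is assigned a random 64-bit id (called messageId in the rest of this post). Storm applies mod hashing to that id to decide which acker task should be told to watch the tuple.
When the acker task receives the message, it binds the Spout that sent it to that messageId and starts tracking the tuple's processing. If the tuple is fully processed, the acker calls the ack() method on the Spout instance that emitted it. If the tuple has not been fully processed within a certain time, the acker calls the fail() method on that Spout to tell it that processing failed, and the Spout can then re-emit the tuple.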
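The "mod hashing" step can be sketched in a few lines of plain Java. This is only an illustration of the idea, not Storm's internal code: AckerRouting and ackerIndexFor are hypothetical names, and the only property the sketch relies on is that the same id always maps to the same acker task.

```java
// Minimal sketch of routing a tuple id to an acker task by mod hashing.
final class AckerRouting {
    // Maps a 64-bit spout tuple id to one of numAckers acker tasks.
    // floorMod keeps the result non-negative even for negative ids.
    static int ackerIndexFor(long spoutTupleId, int numAckers) {
        return (int) Math.floorMod(spoutTupleId, (long) numAckers);
    }

    public static void main(String[] args) {
        int numAckers = 4; // hypothetical value of TOPOLOGY_ACKERS
        System.out.println(ackerIndexFor(10L, numAckers));
        System.out.println(ackerIndexFor(-1L, numAckers));
    }
}
```

Because the mapping is deterministic, every task in the topology can work out which acker is responsible for a tuple tree without asking anyone.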
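Every tuple knows the ids of all the spout tuples whose tuple trees it belongs to. When a Bolt emits a new tuple, the spout tuple ids from the tuple's anchors are copied into the new tuple. When a tuple is acked, it sends a message to the appropriate acker tasks describing how the tuple tree changed. In effect it tells the acker: "I am now completed within the tree for this spout tuple, and here are the new tuples in the tree that were anchored to me."
For example, if tuples "D" and "E" were created based on tuple "C", here is how the tuple tree changes when "C" is acked:
Because "C" is removed from the tree at the same moment "D" and "E" are added to it, the tree can never be completed prematurely.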
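The Storm documentation describes the acker as keeping, for each spout tuple, a single 64-bit "ack val" that is the XOR of every tuple id created or acked in the tree; the tree is done when that value reaches zero. Below is a minimal plain-Java sketch of that idea, replaying the C, D, E example and treating "C" as the only tuple in the tree at the start. AckerSketch, init, and ack are hypothetical names, and the ids are small fixed numbers chosen for readability, whereas Storm uses random 64-bit ids.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of an acker: one XOR value per tuple tree.
final class AckerSketch {
    // spout tuple id -> running XOR ("ack val") of tuple ids in its tree
    private final Map<Long, Long> ackVals = new HashMap<>();

    // Called when the spout emits the first tuple of a tree.
    void init(long spoutTupleId, long rootTupleId) {
        ackVals.put(spoutTupleId, rootTupleId);
    }

    // Called when a bolt acks ackedId after emitting anchoredIds anchored to it.
    // Returns true when the whole tree has been processed.
    boolean ack(long spoutTupleId, long ackedId, long... anchoredIds) {
        Long current = ackVals.get(spoutTupleId);
        if (current == null) {
            throw new IllegalStateException("unknown spout tuple id: " + spoutTupleId);
        }
        long updated = current ^ ackedId;   // remove the acked tuple
        for (long id : anchoredIds) {
            updated ^= id;                  // add its children in the same step
        }
        if (updated == 0L) {
            ackVals.remove(spoutTupleId);
            return true;
        }
        ackVals.put(spoutTupleId, updated);
        return false;
    }

    public static void main(String[] args) {
        long c = 0x1111L, d = 0x2222L, e = 0x4444L;
        AckerSketch acker = new AckerSketch();
        acker.init(1L, c);
        System.out.println(acker.ack(1L, c, d, e)); // C acked, D and E added: false
        System.out.println(acker.ack(1L, d));       // E still pending: false
        System.out.println(acker.ack(1L, e));       // tree complete: true
    }
}
```

Removing "C" and adding "D" and "E" happen in one update, so the value never passes through zero while work is still outstanding. This is also why the acker needs only a constant amount of memory per tree, however large the tree grows.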
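In the ack mechanism:
1. If the Spout receives an ack response within the allotted time, the tuple is considered successfully processed.
2. If the Spout receives no ack response within the allotted time, or receives a fail response, the tuple is considered failed.
3. A tuple that exceeds the timeout is marked as failed by default. The timeout is set with Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS and defaults to 30 seconds.
The overall retry strategy on failure is:
A tuple is not acked because its task died: in this case the spout tuple ids at the root of the failed tuple's trees time out and are replayed.
An acker task fails: in this case all the spout tuples that the acker was tracking time out and are replayed.
A Spout task fails: in this case the source the Spout reads from is responsible for replaying the messages. For example, queues like Kestrel and RabbitMQ put all pending messages back on the queue when a client disconnects.
Besides data fault tolerance, the ack mechanism can also be used for rate limiting:
When Bolts cannot keep up with the rate at which a Spout produces tuples, you can throttle the Spout by setting a pending limit. Once the number of tuples that have received neither an ack nor a fail response reaches or exceeds the limit, calls to nextTuple are skipped, which stops the Spout from emitting more data.
Config.TOPOLOGY_MAX_SPOUT_PENDING sets the pending limit.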
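The throttling rule above can be sketched in plain Java as follows. This is a toy model under my own assumptions, not Storm's implementation: PendingLimiter and its methods are hypothetical names, and in real Storm it is the framework, not the Spout itself, that decides whether to call nextTuple.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the max-spout-pending rule.
final class PendingLimiter {
    private final int maxPending;               // plays the role of TOPOLOGY_MAX_SPOUT_PENDING
    private final Set<Long> pending = new HashSet<>();
    private long nextId = 0;

    PendingLimiter(int maxPending) {
        this.maxPending = maxPending;
    }

    // Emits a tuple and returns its id, or returns -1 when emission is skipped
    // because too many tuples are still waiting for ack or fail.
    long tryNextTuple() {
        if (pending.size() >= maxPending) {
            return -1L;
        }
        long id = nextId++;
        pending.add(id);
        return id;
    }

    // Both ack and fail free a pending slot; replay on fail is omitted here.
    void ack(long id) { pending.remove(id); }
    void fail(long id) { pending.remove(id); }

    int pendingCount() { return pending.size(); }

    public static void main(String[] args) {
        PendingLimiter spout = new PendingLimiter(2);
        System.out.println(spout.tryNextTuple()); // emitted
        System.out.println(spout.tryNextTuple()); // emitted
        System.out.println(spout.tryNextTuple()); // skipped, limit reached
        spout.ack(0L);
        System.out.println(spout.tryNextTuple()); // emitted again
    }
}
```

The limit only has an effect for tuples that are tracked by the ack mechanism, since untracked tuples never count as pending.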
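Conversely, if you do not need replay guarantees, you can disable this fault tolerance by setting the number of acker tasks, Config.TOPOLOGY_ACKERS, to 0.
As you can see, Storm's reliability mechanism is fully distributed, scalable, and fault-tolerant. It is one of Storm's core strengths.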
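Parallelism in Storm
The concepts are:
A Worker executes a subset of a Topology. A worker process belongs to one specific Topology and may run one or more executors for one or more components (spouts or bolts) of that Topology. A running Topology consists of many such processes running on many machines in the Storm cluster.
An Executor is a thread spawned by a worker process. It may run one or more tasks for the same component (spout or bolt).
A Task does the actual data processing: each spout or bolt implemented in your code runs as that many tasks across the cluster. The number of tasks for a component stays the same for the whole lifetime of a topology, but the number of executors (threads) for a component can change over time. That is, #threads ≤ #tasks. By default the number of Tasks equals the number of Executors, so Storm runs one task per thread.
So in Storm the Worker is not the smallest unit in which a component executes; the Executor is, and an Executor can be thought of as a thread. When creating a topology, we can set how many threads run each spout and each bolt.
Suppose the threads for blue-spout, green-bolt, and yellow-bolt add up to 10 and we configure 2 workers; those 10 threads are then divided between the 2 workers. The code and the resulting assignment are shown below: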
Config conf = new Config();
conf.setNumWorkers(2); // use two worker processes

topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2

topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
               .setNumTasks(4)
               .shuffleGrouping("blue-spout");

topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
               .shuffleGrouping("green-bolt");

StormSubmitter.submitTopology(
    "mytopology",
    conf,
    topologyBuilder.createTopology()
);
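The arithmetic behind that assignment is simple enough to check by hand, and the sketch below does it in plain Java. It assumes Storm spreads executors evenly over workers and tasks evenly over executors, which holds for the numbers in this example; ParallelismMath and its methods are hypothetical helper names, not part of Storm.

```java
// Back-of-the-envelope parallelism math for the example topology.
final class ParallelismMath {
    // Total executors (threads) is the sum of all parallelism hints.
    static int totalExecutors(int... parallelismHints) {
        int total = 0;
        for (int hint : parallelismHints) {
            total += hint;
        }
        return total;
    }

    // Executors per worker, assuming they divide evenly.
    static int executorsPerWorker(int numWorkers, int... parallelismHints) {
        return totalExecutors(parallelismHints) / numWorkers;
    }

    // Tasks run by each executor of one component, assuming an even split.
    static int tasksPerExecutor(int numTasks, int numExecutors) {
        return numTasks / numExecutors;
    }

    public static void main(String[] args) {
        // blue-spout = 2, green-bolt = 2, yellow-bolt = 6, two workers
        System.out.println(totalExecutors(2, 2, 6));        // 10
        System.out.println(executorsPerWorker(2, 2, 2, 6)); // 5
        // green-bolt: 4 tasks over 2 executors
        System.out.println(tasksPerExecutor(4, 2));         // 2
    }
}
```

So each of the two workers runs 5 threads, and each green-bolt thread runs 2 tasks, while blue-spout and yellow-bolt keep the default of one task per thread.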
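Summary
If what you want is a stateless, high-speed event processing system that supports incremental computation, Storm is an excellent choice. Its mature ACK fault-tolerance mechanism goes a long way toward guaranteeing that tuples are fully processed across concurrent Workers. However, Storm is somewhat tricky to install and deploy, and it still depends on a Zookeeper cluster to coordinate state, cluster, and Task information.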