A Few Notes on Building Retry Capability for FaaS Asynchronous Queues

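Anyone who uses FaaS regularly has run into event retries. In SCF today, apart from the two common egress triggers (APIGW and CLB) and one streaming synchronous trigger for Kafka, most triggers are asynchronous. How to make sure asynchronous triggers are retried and consumed sensibly when all kinds of errors occur is the subject of this post. While looking for the answer I spent a stretch of time in near-constant anxiety: challenges from engineering, challenges from customers, cost pressure, and the completely contradictory usage habits of atypical large customers. We waded through nearly every point that could be challenged. The feature is about to go live, so I am using this post to share how I thought about and built retry capability for asynchronous queues in a FaaS setting.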

 

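Good work starts with sharp tools, so let us first look at the common errors that Tencent Cloud SCF (Serverless Cloud Function) returns for asynchronous invocations: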

Class 1, 4xx: this includes 400 (invalid request), 430 (user code execution error), 432 (concurrency limit exceeded), 438 (function stopped), and so on.

Class 2, 5xx: mainly 500 (internal error) and 532 (insufficient compute resources).
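The two classes above can be sketched as a small lookup. This is only an illustration: the status codes and their descriptions come from the list above, while the names `ErrorClass`, `KNOWN_CODES`, and `classify` are hypothetical and not part of any SCF SDK.

```python
from enum import Enum


class ErrorClass(Enum):
    CLIENT = "4xx"
    SERVER = "5xx"
    OTHER = "other"  # anything outside 400-599, e.g. a successful call


# Codes named in the article; descriptions are the article's own wording.
KNOWN_CODES = {
    400: "invalid request",
    430: "user code execution error",
    432: "concurrency limit exceeded",
    438: "function stopped",
    500: "internal error",
    532: "insufficient compute resources",
}


def classify(status_code: int) -> ErrorClass:
    """Map a status code to the coarse class used by the first retry policy."""
    if 400 <= status_code < 500:
        return ErrorClass.CLIENT
    if 500 <= status_code < 600:
        return ErrorClass.SERVER
    return ErrorClass.OTHER
```

Grouping by the leading digit is what made the first policy simple, and, as the rest of the post shows, also what made it too coarse for code 432.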

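Seen this way it is quite clear: there are only two classes of error. After comparing against and borrowing from other cloud vendors, the first version of the retry policy was as follows: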

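4xx errors: retry twice on the user's behalf. If a DLQ (dead letter queue) is configured, the failed message is then delivered to it; if no DLQ is configured, the event is simply discarded.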

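5xx errors: since 5xx errors are generally caused by the system, these events are retried with exponential backoff for 24 hours. If the function has still not recovered after 24 hours, the event is lost. The DLQ applies to 5xx errors as well.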
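A minimal sketch of this first-version policy as a decision function follows. The retry count (two), the 24-hour window, and the DLQ behaviour come from the text. The function name `v1_decide`, the `Decision` type, the one-second base delay, the doubling factor, and the zero delay between 4xx retries are all assumptions made for illustration, since the post does not state them.

```python
from dataclasses import dataclass

V1_4XX_MAX_RETRIES = 2             # from the text
V1_5XX_WINDOW_SECONDS = 24 * 3600  # from the text


@dataclass
class Decision:
    action: str                 # "retry", "dead_letter", or "drop"
    delay_seconds: float = 0.0  # wait before the next attempt


def v1_decide(status_code: int, retries_done: int, age_seconds: float,
              has_dlq: bool, base_delay: float = 1.0) -> Decision:
    """Decide what to do with a failed event under the first-version policy."""
    give_up = Decision("dead_letter" if has_dlq else "drop")
    if 400 <= status_code < 500:
        # 4xx: two retries, then DLQ or drop. Delay is an assumption (none).
        if retries_done < V1_4XX_MAX_RETRIES:
            return Decision("retry")
        return give_up
    if 500 <= status_code < 600:
        # 5xx: exponential backoff until the event is 24 hours old.
        if age_seconds < V1_5XX_WINDOW_SECONDS:
            return Decision("retry", base_delay * 2 ** retries_done)
        return give_up
    raise ValueError(f"not an error status: {status_code}")
```

For example, `v1_decide(432, 2, 120, has_dlq=False)` gives up on a throttled event after its second retry, which is exactly the behaviour that later caused complaints about lost data.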

 

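The first version of the retry policy stayed in place for a long time, until SCF moved concurrency up to be user-configurable. At that point the conflicts erupted.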

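First came the challenge to the default of two retries for 4xx errors: some users stated explicitly that they did not want any retries at all. This was easy to solve. We handed the 4xx retry count over to users, who can choose any value from 0 to 2.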

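Second came the challenge to the SLA. The retry queue previously had no eviction mechanism at all. For example, if a user set reserved concurrency to 0 and pushed 100 events into the async queue, under the old policy those 100 events could never be consumed and would become permanent dirty data. So we needed a uniform eviction mechanism to keep user behaviour from overwhelming the queue. We gave the async queue a retention time capped at 6 hours: events older than 6 hours are discarded outright, or delivered to the DLQ if one is configured.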
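The two changes described above, a user-selectable retry count and a 6-hour retention cap, might look roughly like this. The 0 to 2 range and the 6-hour limit are from the text; `Event`, `validate_retry_attempts`, and `evict_expired` are hypothetical names, and the in-memory `deque` merely stands in for the real queue.

```python
from collections import deque
from dataclasses import dataclass

MAX_EVENT_AGE_SECONDS = 6 * 3600  # retention cap from the text


@dataclass
class Event:
    payload: str
    enqueued_at: float  # seconds since some epoch


def validate_retry_attempts(n: int) -> int:
    """Users may pick 0, 1, or 2 retries for 4xx errors."""
    if n not in (0, 1, 2):
        raise ValueError("retry attempts must be 0, 1, or 2")
    return n


def evict_expired(queue: deque, now: float, dead_letters=None,
                  max_age: float = MAX_EVENT_AGE_SECONDS) -> int:
    """Remove events older than max_age, routing them to the DLQ if one is
    configured (dead_letters is not None). Returns the number evicted."""
    kept = deque()
    evicted = 0
    while queue:
        event = queue.popleft()
        if now - event.enqueued_at > max_age:
            evicted += 1
            if dead_letters is not None:
                dead_letters.append(event)
        else:
            kept.append(event)
    queue.extend(kept)  # surviving events keep their original order
    return evicted
```

With this in place, the 100 events stuck behind a reserved concurrency of 0 no longer stay in the queue forever; they leave after 6 hours at the latest.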

These two measures looked perfect, or at least they solved our users' immediate problems. But they were only the simplest part.

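We all know that the main job of an async queue is to buffer a sudden burst of highly concurrent data for the user. Previously, concurrency matched the number of workers. Once concurrency moved up to the user, how many workers to start became something of a black art. To take the simplest example, the figure below shows a situation we hit often: concurrency limit exceeded. Under the first version of the retry policy, a limit-exceeded error was treated like every other 4xx: retried twice, then discarded.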

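What? Data loss? Around the time concurrency quotas launched, someone complained about lost data almost every day. For a fully managed async queue this simply must not happen. The ideal would be to start as many workers as the user has concurrency, so that limit-exceeded errors never occur, as in the figure below.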

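But, and it is a big but, this is basically impossible, or at least very hard to implement, because it demands too much of resource scheduling. So how else could we guarantee that users do not lose data? I tried nearly everything I could think of, and in the end it was AWS that gave me the inspiration. Below is the AWS monitoring graph of Throttles when concurrency is exceeded: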

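Then it clicked: this is also a job for the retry policy. Why not put limit-exceeded errors back into the queue and retry them? Wouldn't that neatly solve the data loss caused by throttling? No complicated algorithm is needed to handle limit-exceeded errors at all. So the latest retry policy looks like this: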

 

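Execution errors (including user code errors and Runtime errors): when this kind of error occurs, the function platform retries twice by default, or as many times as configured, at a fixed interval of 1 minute. New trigger events continue to be processed normally while automatic retries are in progress. If you have configured a dead letter queue, events that still fail after the retries are exhausted are sent to it; otherwise the platform discards them.
System errors: when this kind of error occurs, the function platform keeps retrying for the maximum wait time you configured (6 hours by default), with the retry interval growing by exponential backoff up to 5 minutes. If you have configured a dead letter queue, events that still fail after the maximum wait time are sent to it for further handling by the user; otherwise the platform discards them.
Limit-exceeded errors: when this kind of error occurs, the function platform keeps retrying for the maximum wait time you configured (6 hours by default), at a retry interval of 1 minute. If you have configured a dead letter queue, events that still fail after the maximum wait time are sent to it for further handling by the user; otherwise the platform discards them.
Invocation request errors and caller errors: when this kind of error occurs, the platform does not retry, because these request errors would not succeed even if retried. The limit-exceeded error (432) is the exception.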
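The four rules above can be condensed into one decision function. This is a sketch under stated assumptions, not the platform's implementation: the intervals, the 6-hour default, and the 5-minute cap come from the text, whereas the mapping of 430 to execution errors and of the remaining 4xx codes to request errors, the one-second base delay, and all names (`RetryConfig`, `next_step`) are assumptions. The text does not say whether non-retried request errors reach the DLQ, so the sketch only reports `"no_retry"` for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryConfig:
    max_retries: int = 2             # execution errors, user-configurable 0-2
    max_event_age: float = 6 * 3600  # maximum wait time in seconds


def next_step(status_code: int, retries_done: int, age_seconds: float,
              cfg: RetryConfig = RetryConfig(), has_dlq: bool = False,
              base_delay: float = 1.0):
    """Return (action, delay_seconds) for a failed asynchronous event."""
    give_up = ("dead_letter" if has_dlq else "drop", 0.0)
    if status_code == 430:
        # Execution error: limited retries at a fixed 1-minute interval.
        if retries_done < cfg.max_retries:
            return ("retry", 60.0)
        return give_up
    if status_code == 432:
        # Limit exceeded: retry every minute until the event is too old.
        if age_seconds < cfg.max_event_age:
            return ("retry", 60.0)
        return give_up
    if 500 <= status_code < 600:
        # System error: exponential backoff capped at 5 minutes.
        if age_seconds >= cfg.max_event_age:
            return give_up
        exponent = min(retries_done, 20)  # keep the power small; cap applies anyway
        return ("retry", min(base_delay * 2 ** exponent, 300.0))
    if 400 <= status_code < 500:
        # Other request or caller errors: retrying would not help.
        return ("no_retry", 0.0)
    raise ValueError(f"not an error status: {status_code}")
```

The key difference from the first version is that 432 is now bounded by event age rather than by retry count, so a throttled event is not given up on after two attempts.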

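The new retry policy removes the risk of data loss that concurrency limits posed to users, and lets the async queue deliver the value it should. Users no longer have to do anything about concurrency limits on asynchronous invocations: within the configured maximum wait time, the function platform retries limit-exceeded errors automatically.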
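To make the difference concrete, here is a toy simulation of one burst of events hitting a concurrency limit. It is a deliberately simplified model with assumptions of its own: time advances in 1-minute ticks (matching the 1-minute retry interval), every successful invocation finishes within one tick, and the function name `drain` and its parameters are hypothetical.

```python
from collections import deque


def drain(num_events: int, concurrency: int,
          max_throttle_retries=None, max_age_minutes: int = 360):
    """Simulate one burst and return (processed, lost).

    max_throttle_retries=2 models the first-version policy (throttled events
    are retried twice, then discarded). None models the new policy, where a
    throttled event is re-queued every minute until it exceeds max_age_minutes.
    """
    queue = deque((i, 0) for i in range(num_events))  # (event_id, retries)
    processed = lost = 0
    minute = 0
    while queue:
        slots = concurrency          # invocations available this minute
        next_round = deque()
        while queue:
            event_id, retries = queue.popleft()
            if slots > 0:
                slots -= 1
                processed += 1
            elif max_throttle_retries is not None and retries >= max_throttle_retries:
                lost += 1            # old policy: retries exhausted
            elif minute + 1 >= max_age_minutes:
                lost += 1            # new policy: event aged out
            else:
                next_round.append((event_id, retries + 1))
        queue = next_round
        minute += 1
    return processed, lost
```

In this model, 100 events against a concurrency of 10 lose 70 events under the old rule (`drain(100, 10, max_throttle_retries=2)` returns `(30, 70)`) and none under the new one (`drain(100, 10)` returns `(100, 0)`). With reserved concurrency set to 0, every event is eventually evicted at the 6-hour mark instead of lingering forever.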

 

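The exploration was gruelling, and I went through countless rounds of sparring with the engineering team. Fortunately we reached our goal in the end, though only after tearing up the design and starting over twice. We are also exploring a more open, per-error-code retry policy, but it still feels too early and needs more validation and research, so I will not cover it here.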

 

These are a few thoughts in summary, brief as they are. Please forgive any bias or inaccuracy.


