Using an SCF Trigger: Ckafka to Elasticsearch in 30 Lines of Code

I only started digging into this Ckafka-to-Elasticsearch approach yesterday, so please point out anything I got wrong. First, let's cover the concepts and the scenario:

SCF trigger: Serverless Cloud Function (SCF) is a typical event-triggered serverless runtime. Its core components are SCF functions and event sources. An event source is a Tencent Cloud service or piece of user code that publishes events; an SCF function is the handler of those events; and a function trigger is the set of mappings between functions and event sources.

Kafka: Apache Kafka is a message queue engine offering a high-throughput, highly scalable message queue service.

Elasticsearch: Elasticsearch Service (ES) is a highly available, scalable, cloud-hosted Elasticsearch service built on the open-source Elasticsearch engine. Elasticsearch itself is a distributed, RESTful search and analytics engine suited to searching and analyzing ever-growing volumes of data.

As everyone knows, the most common stack in log-processing scenarios is ELK (Elasticsearch, Logstash, Kibana). Tencent Cloud's Elasticsearch already bundles Elasticsearch and Kibana, so in the ELK scenario the missing piece is the Logstash role. If that's the case, why not replace Logstash with an SCF Ckafka trigger? The principle is simple: pull the data in the trigger, then pass it to Elasticsearch unchanged.

No sooner said than done — a journey of a thousand miles begins with a single step, so let's first take a look at the Elasticsearch SDK:

It feels fairly clean. One caveat: when picking a version, the SDK must be no lower than the Elasticsearch version you are running, otherwise you'll hit a 406 error — a lesson learned the hard way.

Here I'm using the Python SDK for Elasticsearch 7.x; the link is here: https://pypi.org/project/elasticsearch7/
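Note that the SCF Python runtime doesn't ship this package, so it has to be bundled into the deployment package next to the function code. The usual approach is to install it into the project directory before zipping — a sketch; adjust the target directory to your layout:

pip install elasticsearch7 -t .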

The Elasticsearch SDK is a very thin wrapper — basically a light layer over the REST API. Raw POST requests would satisfy the requirement just as well, if you're willing to do the wrapping yourself.
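For illustration, here's a minimal sketch of hitting the REST API directly with the requests library — the host, credentials, index, and document are all placeholders:

import requests

ES_HOST = "http://xx.xx.xx.xx:9200"  # placeholder ES address

# PUT /<index>/_doc/<id> indexes a single document under an explicit id.
resp = requests.put(
    ES_HOST + "/my_index/_doc/1",
    json={"title": "One", "tags": ["ruby"]},
    auth=("user", "passwd"),
)
print(resp.status_code, resp.json())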

Here I'm using Tencent Cloud's Basic Edition Elasticsearch service for the demo; a self-hosted cluster works the same way as long as you get the VPC networking right. Key point: the cloud function and Elasticsearch must be in the same region, the same VPC, and the same subnet.

With that done, let's look at how to connect to Tencent Cloud ES. The connection requires disabling node sniffing by passing the following three parameters to the Elasticsearch() constructor:

sniff_on_start=False
sniff_on_connection_fail=False
sniffer_timeout=None

Concretely it looks like the snippet below — just fill in your own ES details. A simple single-document insert is written like this:

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://xx.xx.xx.xx:9200"],
                   http_auth=('user', 'passwd'),
                   sniff_on_start=False,
                   sniff_on_connection_fail=False,
                   sniffer_timeout=None)

# doc_type is deprecated in Elasticsearch 7.x and can be omitted.
res = es.index(index="my_index", doc_type="my_type", id=1, body={"title": "One", "tags": ["ruby"]})
print(res)

Once more: make absolutely sure it's in the same region as the cloud function, or this step will definitely fail.

But things are not that simple. We're targeting ELK-scale, high-volume processing, so a plain for loop around es.index definitely won't cut it. In an earlier experiment, writing 100 documents sequentially to the existing my_index index took about 7 seconds — far too slow for large volumes of data.
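For reference, this is a sketch of the naive loop being measured, assuming the es client from the snippet above:

# Sequentially index 100 documents, one HTTP round trip each -- this is the slow path.
for i in range(100):
    es.index(index="my_index", doc_type="doc", body={"title": i})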

Then I found a better way: import helpers from the elasticsearch module and use helpers.bulk to batch-process large volumes of data. First, every document is defined as a dict, with the fields meaning the following (a sample action follows the list):

_index is the index name; the index must already exist.
_type is the type name.
_source is a dict holding each document's fields and values; a document can have multiple fields.
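Put together, a single action looks like this (the values are illustrative):

action = {
    "_index": "my_index",  # target index (must already exist)
    "_type": "doc",        # type name (deprecated in ES 7.x)
    "_source": {           # the document body itself
        "title": "One",
        "tags": ["ruby"],
    },
}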

The first idea: collect every document dict into one big list, then write that list to the es object via helpers.bulk(es, action). But before you run that, consider whether a list of ten million elements will blow up your memory (MemoryError)! The program may well crash from the oversized list before it ever reaches the es write step. So slice the writes into batches instead; the code looks like this:

from elasticsearch import helpers

# Write 100,000 documents in batches of 1,000 so the whole set never sits in memory at once.
for i in range(1, 100001, 1000):
    action = ({
        "_index": "my_index",
        "_type": "doc",
        "_source": {
            "title": k
        }
    } for k in range(i, i + 1000))
    helpers.bulk(es, action)

In the end, I found an approach that hands the generator itself over to the es side instead of materializing everything in the function! This puts even less pressure on Python, though more on es — then again, whether you batch manually or use a generator, es does plenty of work either way; writes are inherently expensive:

def Taobrss():
    """Bulk-write data using a generator."""
    action = ({
        "_index": "my_index",
        "_type": "doc",
        "_source": {
            "title": i
        }
    } for i in range(100000))
    helpers.bulk(es, action)
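Not covered in the original experiment, but worth knowing: the same client also ships helpers.parallel_bulk, which fans the action stream out across worker threads. It returns a lazy generator of (ok, info) tuples, so you must consume it for anything to happen — a sketch under the same es and index assumptions:

from elasticsearch import helpers

def parallel_write():
    """Bulk-write via worker threads; thread_count and chunk_size are tuning knobs."""
    action = ({
        "_index": "my_index",
        "_type": "doc",
        "_source": {"title": i}
    } for i in range(100000))
    for ok, info in helpers.parallel_bulk(es, action, thread_count=4, chunk_size=500):
        if not ok:
            print(info)  # surface any per-document failure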

Putting it all together, the resulting demo looks like this:

#!/usr/bin/python
# -*- coding: UTF-8 -*-
import time
from elasticsearch import Elasticsearch
from elasticsearch import helpers

esServer = "http://172.16.16.53:9200"  # change to your es server address + port, e.g. http://172.16.16.53:9200
esUsr = "elastic"  # change to your es username, e.g. elastic
esPw = "Cc*******"  # change to your es password, e.g. PW2312321321
esIndex = "pre1"  # an index that already exists in es; you can create one via es.indices.create(index='my-index111')

# ... or specify common parameters as kwargs
es = Elasticsearch([esServer],
                   http_auth=(esUsr, esPw),
                   sniff_on_start=False,
                   sniff_on_connection_fail=False,
                   sniffer_timeout=None)

def timer(func):
    """Optional timing decorator; not applied to the handler below."""
    def wrapper(*args, **kwargs):
        start = time.time()
        res = func(*args, **kwargs)
        print('took about {:.2f} seconds'.format(time.time() - start))
        return res

    return wrapper

def main_handler(event, context):
    # Build one bulk action per record in the event's Records field.
    # Data structure: https://cloud.tencent.com/document/product/583/17530
    action = ({
        "_index": esIndex,
        "_source": {
            "msgBody": record["Ckafka"]["msgBody"]  # message body delivered by the Ckafka trigger
        }
    } for record in event["Records"])
    print(action)  # note: this prints only the generator object, not the documents
    helpers.bulk(es, action)
    return "successful!"

Grab the Records field from the event in the trigger, ship it to ES, and you're done. The trigger is configured as follows — you can simply pick the generic template in Ckafka's message dump feature:

CKafka instance: the CKafka instance to connect to; only instances in the same region can be selected.
Topic: a topic that already exists in the CKafka instance.
Max batch size: the maximum number of messages pulled and delivered to the function per invocation, currently capped at 10,000. Depending on message size, write speed, and other factors, the number of messages actually delivered per invocation may not reach the maximum; it varies between 1 and the configured cap.
Start position: where the trigger begins consuming; the default is the latest position. Latest, earliest, and a specified point in time are supported.
Retry count: the maximum number of retries when the function hits a runtime error (including user code errors and Runtime errors).

Check the log and Kibana to confirm the data arrived, and that's a wrap:

Code download: click to download


