{"id":4398,"date":"2023-06-25T21:54:51","date_gmt":"2023-06-25T13:54:51","guid":{"rendered":"\/?p=4398"},"modified":"2023-07-18T21:58:18","modified_gmt":"2023-07-18T13:58:18","slug":"6-spark%e7%9b%b8%e5%85%b3%e8%81%9a%e5%90%88%e7%ae%97%e5%ad%90","status":"publish","type":"post","link":"http:\/\/xinblog.ltd\/?p=4398","title":{"rendered":"6. Spark Aggregation Operators"},"content":{"rendered":"<p>Previously we covered some transformation operators that are unrelated to aggregation, such as map, mapPartitions, flatMap, and filter.<\/p>\n<p>This time we look at the Spark operators related to data aggregation: groupByKey, reduceByKey, aggregateByKey, and sortByKey. All of them involve a Shuffle, which makes them some of the heavier operations in Spark.<\/p>\n<p>These operators all work on a Paired RDD, that is, an RDD of (Key, Value) pairs, and aggregate the data inside that RDD.<\/p>\n<p>So this time we go through how each of them is used.<\/p>\n<p>1.\u00a0\u00a0\u00a0\u00a0 groupByKey (collect by group)<\/p>\n<p>For a Paired RDD, groupByKey first groups the records by Key and then gathers the Values that belong to each Key into a collection. In other words, it turns RDD[(Key, Value)] into RDD[(Key, collection of Values)].<\/p>\n<p>It is not hard to use. Take WordCount as an example, assuming kvRDD is an RDD[(String, Int)] of (word, 1) pairs:<\/p>\n<p>val words: RDD[(String,Iterable[Int])] = 
kvRDD.groupByKey()<\/p>\n<p>As shown above, groupByKey is easy to use, but it has a performance problem: it collects every record in full and shuffles all Values of the same Key to the same partition, with no pre-aggregation on the Map side. With large data volumes this produces heavy disk and network I\/O and can seriously hurt job performance.<\/p>\n<p>In practice groupByKey is used fairly rarely, and other aggregation operators are preferred, so next we look at those commonly used ones.<\/p>\n<p>2.\u00a0\u00a0\u00a0\u00a0 reduceByKey (grouped aggregation)<\/p>\n<p>reduceByKey aggregates by Key: all elements that share a Key are merged into a single element.<\/p>\n<p>The corresponding code looks like this:<\/p>\n<p>val wordCounts: RDD[(String,Int)] = kvRDD.reduceByKey((x:Int,y:Int) =&gt; x + y)<\/p>\n<p>The code above can be read as follows:<\/p>\n<p>for each Key in the Paired RDD, the Values of all records with that Key are added together.<\/p>\n<p>Above we passed an anonymous function. Instead of an anonymous function, we can also declare a named function. For example, to extract the maximum Value for each Key, we could define:<\/p>\n<p>def f(x:Int, y:Int): Int = {<\/p>\n<p>math.max(x,y)<\/p>\n<p>}<\/p>\n<p>and then simply call reduceByKey(f).<\/p>\n<p><img 
decoding=\"async\" loading=\"lazy\" width=\"865\" height=\"432\" class=\"wp-image-4399\" src=\"\/wp-content\/uploads\/2023\/07\/unnamed-file-11.png\" alt=\"Image\" srcset=\"http:\/\/xinblog.ltd\/wp-content\/uploads\/2023\/07\/unnamed-file-11.png 865w, http:\/\/xinblog.ltd\/wp-content\/uploads\/2023\/07\/unnamed-file-11-300x150.png 300w, http:\/\/xinblog.ltd\/wp-content\/uploads\/2023\/07\/unnamed-file-11-768x384.png 768w\" sizes=\"(max-width: 865px) 100vw, 865px\" \/><\/p>\n<p>In addition, because the Map side and the Reduce side use the same aggregation logic, both defined by the function f, Spark can apply an optimization: the data is pre-aggregated on the Map side, and the Reduce side then only needs to aggregate the partial results a second time.<\/p>\n<p>Thanks to this optimization, execution efficiency can often at least double, since less data has to go through the Shuffle.<\/p>\n<p>The limitation of reduceByKey is that the Map stage and the Reduce stage must use the same aggregation logic.<\/p>\n<p>If we want the Map stage and the Reduce stage to use different logic, we can turn to the aggregateByKey operator below.<\/p>\n<p>3.\u00a0\u00a0\u00a0\u00a0 aggregateByKey 
(a more flexible operator)<\/p>\n<p>aggregateByKey takes more parameters. Three must be supplied: an initial value, a Map-side aggregation function f1, and a Reduce-side aggregation function f2.<\/p>\n<p>The basic form is:<\/p>\n<p>rdd.aggregateByKey(initialValue)(f1, f2)<\/p>\n<p>Note the type constraints: the initial value must have the same type as the result of f2; f1 takes the running result as its first parameter and a Value of the Paired RDD as its second, so its second parameter must match the Value type of the Paired RDD; and the parameters of f2 must match the result type of f1.<\/p>\n<p>A concrete example:<\/p>\n<p>def f1(x:Int, y:Int): Int = {<\/p>\n<p>math.max(x,y)<\/p>\n<p>}<\/p>\n<p>def f2(x:Int, y:Int): Int = {<\/p>\n<p>x+y<\/p>\n<p>}<\/p>\n<p>val wordCounts: RDD[(String,Int)] = 
rdd.aggregateByKey(0)(f1,f2)<\/p>\n<p>This benefits from the same Map-side pre-aggregation as reduceByKey while allowing two different aggregation functions.<\/p>\n<p>Finally there is sortByKey, which, as the name suggests, sorts by Key. It is also simple to use: just call sortByKey() on the Paired RDD.<\/p>\n<p>To choose between ascending and descending order, pass true or false. Ascending is the default, so sortByKey() and sortByKey(true) behave the same.<\/p>\n<p>To sort in descending order, pass false, as in sortByKey(false).<\/p>\n<p>To summarize, today we covered four operators: groupByKey, reduceByKey, aggregateByKey, and sortByKey.<\/p>\n<p>We outlined how each one is used and where the performance optimizations lie.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Previously we covered some transformation operators unrelated to aggregation 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[36],"tags":[],"_links":{"self":[{"href":"http:\/\/xinblog.ltd\/index.php?rest_route=\/wp\/v2\/posts\/4398"}],"collection":[{"href":"http:\/\/xinblog.ltd\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/xinblog.ltd\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/xinblog.ltd\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/xinblog.ltd\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4398"}],"version-history":[{"count":0,"href":"http:\/\/xinblog.ltd\/index.php?rest_route=\/wp\/v2\/posts\/4398\/revisions"}],"wp:attachment":[{"href":"http:\/\/xinblog.ltd\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4398"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/xinblog.ltd\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4398"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/xinblog.ltd\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4398"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}