Gerrit review comments on change I5cc675c4d2c6b18b0eee3223c706a66d5eac3d8c
(Implements: blueprint mongodb-aggregation-pipeline)

File: /COMMIT_MSG, Patch Set 1

Line 7 ("Spec for using a aggregation pipeline in MongoDB")
Igor Degtiarov (2015-04-01): s/a/an

File: specs/liberty/mongodb-aggregation-pipeline.rst, Patch Set 1

Line 17 ("Currently, when we make a GET "/v2/meter/<meter_type/statistics"
with MongoDB backend it starts a native map-reduce job in MongoDB instance.
Tests and deep researching show that this job have a lack of performance in
work with huge amount of samples (several millions and above).")
Nadya Shakhat (2015-04-08): job have => job has

Line 18 ("For example, job processes ~10000 samples per second on my test
environment (16 GB RAM, 8 CPU, 1 TB disk, 15000000 samples). So job for 15M
samples work ~1500 seconds.")
Nadya Shakhat (2015-04-08): job => the job.
We are talking about the 'get statistics' one, right?
Line 24 ("Of course, with Gnocchi dispatcher we haven't issue with
statistics, but users which are going to use only MongoDB backend will have
troubles with alarm work and making user reports.")
Chris Dent (2015-04-10): Given the desire for gnocchi to be the future, is
it worth expending effort on adding features to the mongo and sqla
implementations? Shouldn't we instead treat the storage implementations in
the ceilometer core as legacy code and maintain them, fix bugs, and tune
performance where possible, but not add features?
Nadya Shakhat (2015-04-13, in reply): Chris, AFAIU the "feature" described
in this spec is really about improving performance of the ceilometer core
legacy code, in the MongoDB driver. Now we use map-reduce jobs in the
get_meter_statistics method; after applying the change they will be
replaced with another approach that guarantees better performance.
So it's not about new functionality but mostly about performance tuning.

Line 30 ("This will add a implementation of method get_meter_statistics via
MongoDB aggregation pipeline framework.")
Nadya Shakhat (2015-04-08): Maybe start with "Add"? Not clear what "this"
is. And please a => an. And remove the comma.

Line 33 ("This framework modeled on the concept of data processing
pipelines. Documents enter a multi-stage pipeline that transforms the
documents into an aggregated result.")
Nadya Shakhat (2015-04-08): Is this a quote from the Mongo docs? If it is,
please start with "From mongo docs: ..."
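To make the proposal under discussion concrete, a minimal sketch of how the
same statistics could be computed with the aggregation pipeline framework.
Again, this is illustrative pymongo code rather than the actual driver
change; the $match filter and the grouping key are assumptions:

    from pymongo import MongoClient

    meter = MongoClient().ceilometer.meter

    pipeline = [
        # Filter first, so later stages see only the relevant samples.
        {"$match": {"counter_name": "cpu_util"}},
        # One $group stage replaces the map and reduce functions and runs
        # as native MongoDB operations instead of interpreted JavaScript.
        {"$group": {
            "_id": "$counter_name",
            "count": {"$sum": 1},
            "sum": {"$sum": "$counter_volume"},
            "avg": {"$avg": "$counter_volume"},
            "min": {"$min": "$counter_volume"},
            "max": {"$max": "$counter_volume"},
            "duration_start": {"$min": "$timestamp"},
            "duration_end": {"$max": "$timestamp"},
        }},
    ]
    stats = list(meter.aggregate(pipeline))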
Line 45 ("My researches show what aggregation pipeline is faster than
native map-reduce job to ~10 times.")
Nadya Shakhat (2015-04-08): what => that

Line 46 ("So, processing of 15M samples, in same test environment works
128 seconds vs 1500 seconds with map-reduce.")
Nadya Shakhat (2015-04-08): Please remove the commas :) in same => in the
same

Line 55 ("This framework have specified limits. It restricted by 100 MB RAM
or needs to write temporary files at disk. Also this MongoDB mechanisms
have limit in size of finally document in 16 MB, same as map-reduce job.")
Nadya Shakhat (2015-04-08): Not clear about writing files on disk and the
100 MB. Did you mean "It restricted by 100 MB RAM *because* needs to write
temporary files at disk"?
Ilya Tyaptin (2015-04-09, in reply): Nadya, the workflow of this framework
is a sequence of stages; if a stage uses more than 100 MB of RAM, it
creates a temporary file on disk with its intermediate results.

File: specs/liberty/mongodb-aggregation-pipeline.rst, Patch Set 2

Line 100 ("Improve performance of GET "/v2/<meter_name>/statistics"
request.")
gordon chung (2015-04-22): i think you should add a blurb to address this:
https://review.openstack.org/#/c/65962/

we actually tried this at one point and it was broken.

Ilya Tyaptin (2015-06-02, in reply): Hi, Gordon! The original bug is caused
by the aggregation memory limit (10% of all physical memory in
MongoDB <= 2.4 and 100 MB in MongoDB >= 2.6).
We can use the option `allowDiskUse=True` in the aggregation command to
avoid this issue in MongoDB >= 2.6. This option allows intermediate staging
data to be written to temporary files. It seems that the primary risks of
this approach are the need for free disk space and the slower performance
of disk writes and reads.

I researched these points on a lab machine with 4 CPU, 16 GB RAM, and
200 GB of real samples in the `meter` collection.

1. According to my research, the "$sort" command creates the largest amount
of intermediate data for the following stages. With unindexed documents,
about 1 GB of additional data was created. At the same time, indexed fields
(like timestamp in our db) need no additional data on disk: the sort uses
the existing index.

Other commands contain only processed and grouped data and use additional
space only in the worst case (a huge number of resources, grouped by
resource_id).

2. Spilling intermediate data to temporary files is slower than in-memory
aggregation, but it is still faster than map-reduce, by up to 5 times.

gordon chung (2015-06-03, in reply): can we add this to the spec. i think
it'd be good to capture this outside of gerrit
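A sketch of the mitigation described above. Passing allowDiskUse lets any
stage that exceeds the 100 MB in-memory limit spill intermediate results to
temporary files on MongoDB >= 2.6 instead of aborting, and sorting on an
indexed field such as timestamp can let $sort read the index rather than
building that intermediate data at all. The pipeline contents here are
illustrative, not the driver's actual query:

    from pymongo import MongoClient

    meter = MongoClient().ceilometer.meter

    pipeline = [
        {"$match": {"counter_name": "cpu_util"}},
        # With a suitable index on timestamp, this stage can be served
        # from the index, avoiding the large intermediate sort data
        # described in point 1 above.
        {"$sort": {"timestamp": 1}},
        {"$group": {"_id": "$resource_id",
                    "avg": {"$avg": "$counter_volume"}}},
    ]

    # allowDiskUse=True: stages over the 100 MB limit write temporary
    # files instead of failing (MongoDB >= 2.6).
    stats = list(meter.aggregate(pipeline, allowDiskUse=True))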
