DAOS-16076 test: Automate dmg scale test to be run on Aurora #14616
Steps:
1. Format storages
2. System query
3. Create a 100% pool that spans all engines
4. Pool query
5. Pool destroy
6. Create 50 pools spanning all the engines, with each pool using 1/50th of the capacity
7. Pool list
8. Get around 80 pool metrics
9. Destroy all 50 pools
10. System stop
11. System start

Skip-unit-tests: true
Skip-fault-injection-test: true

Signed-off-by: Makito Kano <makito.kano@intel.com>
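The sizing in step 6 can be sketched as plain arithmetic. This is only an illustration: `per_pool_size` and `plan_pools` are hypothetical helpers, the capacity is assumed to be known in bytes, and how the actual test sizes its pools is not shown in this thread.

```python
def per_pool_size(total_bytes: int, denominator: int = 50) -> int:
    """Return the size of one pool that uses 1/denominator of the capacity."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    # Integer division rounds down, so the pools never exceed the capacity.
    return total_bytes // denominator


def plan_pools(total_bytes: int, pool_count: int = 50, denominator: int = 50) -> list:
    """Return a list with one size per pool (hypothetical helper)."""
    size = per_pool_size(total_bytes, denominator)
    if size * pool_count > total_bytes:
        raise ValueError("requested pools do not fit in the capacity")
    return [size] * pool_count
```

With `pool_count` equal to `denominator`, the pools together use at most the full capacity; with a smaller `pool_count`, some capacity is left unallocated.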
Ticket title is 'Automate dmg scale test to be run on Aurora'
Skip-unit-tests: true Skip-fault-injection-test: true Signed-off-by: Makito Kano <makito.kano@intel.com>
… for the remaining 48 pools Skip-unit-tests: true Skip-fault-injection-test: true Signed-off-by: Makito Kano <makito.kano@intel.com>
Skip-unit-tests: true Skip-fault-injection-test: true
"engine_pool_block_allocator_frags_small",
"engine_pool_block_allocator_free_blks",
"engine_pool_ops_key2anchor"
]
Any reason why we want to keep this here? I feel we have to use the metrics list available under TelemetryUtils.py.
I think it's better to keep them here because these are scattered across different variables in TelemetryUtils.py. Also, they can be moved around or removed by someone else in TelemetryUtils.py.
FWIW, @phender and I have gone back and forth on this too. I tend to agree with @shimizukko here: keeping them here makes it much less likely that someone accidentally breaks them in the utils.
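One possible middle ground between the two positions, sketched under assumptions: the test keeps its own list, and a small check reports any name that has disappeared from a shared list. `POOL_METRICS` and `missing_from_shared` are hypothetical names, only the three metric names visible in the excerpt above are listed, and the structure of TelemetryUtils.py is not assumed.

```python
# Test-local copy of the metric names. Only the three names visible in the
# diff excerpt are listed here; the real list is longer.
POOL_METRICS = [
    "engine_pool_block_allocator_frags_small",
    "engine_pool_block_allocator_free_blks",
    "engine_pool_ops_key2anchor",
]


def missing_from_shared(local_names, shared_names):
    """Return the local names that do not appear in the shared list, in order."""
    shared = set(shared_names)
    return [name for name in local_names if name not in shared]
```

An empty result means the local list is still a subset of the shared one; a non-empty result points at names that were renamed or removed elsewhere.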
"""
# This is a manual test and we need to find the durations from job.log, so add "##" to make
# it easy to search. The log is usually over 1 million lines.
self.log_step("## System query")
Better to not put formatting for log_step because it already formats the messages
- self.log_step("## System query")
+ self.log_step("System query")
This is printed as:
==> Step 4: ## System query [elapsed since last step: 0.00s]
If we don't use ##, we could search with ==>, but I'm using ## in other places such as total pool create duration. In my experience, it's easier to search with the same search string across the entire job.log than switching the strings to search different values.
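A minimal sketch of the search described here, assuming the step lines in job.log follow the format quoted above. `find_marked_steps` is a hypothetical helper, not part of the test harness.

```python
import re

# Matches lines such as:
# ==> Step 4: ## System query [elapsed since last step: 0.00s]
STEP_RE = re.compile(
    r"==> Step (\d+): ## (.+?) \[elapsed since last step: ([\d.]+)s\]"
)


def find_marked_steps(lines, marker="##"):
    """Return (step number, step name, elapsed seconds) for each marked step line."""
    results = []
    for line in lines:
        if marker not in line:
            continue  # cheap filter before running the regex on a very large log
        match = STEP_RE.search(line)
        if match:
            results.append(
                (int(match.group(1)), match.group(2), float(match.group(3)))
            )
    return results
```

Passing an open file object as `lines` reads the log one line at a time, which matters for a log of over a million lines.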
Thoughts @phender ? Similar rationale has come up before
I think the cleaner way is to measure the duration for each step, but now we use the harness for some operations such as self.server_managers[0].system_stop(), so measuring the command duration isn't straightforward. Also, this test is manually executed only at RC (4 times in each RC), so I'm not sure if I want to put more effort into it.
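For illustration, per-step timing could look roughly like the sketch below. `timed` is a hypothetical helper; it measures wall-clock time around whatever runs inside the block, so for a harness call it would include harness overhead and not just the underlying command, which is the difficulty raised above.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log=print, marker="##"):
    """Log the wall-clock duration of the enclosed block with a searchable marker."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed steps are still timed.
        elapsed = time.perf_counter() - start
        log(f"{marker} {label} duration: {elapsed:.2f}s")


if __name__ == "__main__":
    with timed("System stop"):
        # Stand-in for a harness call such as self.server_managers[0].system_stop()
        time.sleep(0.01)
```

The `log` parameter defaults to `print` only to keep the sketch self-contained; in a test it would presumably be the test's own logger.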
Remove unnecessary tags. Skip-unit-tests: true Skip-fault-injection-test: true Signed-off-by: Makito Kano <makito.kano@intel.com>
…_start() Also update variable names and comment. Skip-unit-tests: true Skip-fault-injection-test: true Signed-off-by: Makito Kano <makito.kano@intel.com>
LGTM
@shimizukko I didn't think about it until after merging, but couldn't this be made to work in CI so we know if the test is accidentally broken?
…ack#14616) Steps: 1. Format storages 2. System query 3. Create a 100% pool that spans all engines 4. Pool query 5. Pool destroy 6. Create 49 pools spanning all the engines with each pool using a 1/50th of the capacity 7. Pool list 8. Get around 80 pool metrics 9. Destroy all 49 pools 10. System stop 11. System start Signed-off-by: Makito Kano <makito.kano@intel.com>
Steps: 1. Format storages 2. System query 3. Create a 100% pool that spans all engines 4. Pool query 5. Pool destroy 6. Create 49 pools spanning all the engines with each pool using a 1/50th of the capacity 7. Pool list 8. Get around 80 pool metrics 9. Destroy all 49 pools 10. System stop 11. System start Skip-unit-tests: true Skip-fault-injection-test: true Signed-off-by: Makito Kano <makito.kano@intel.com>
…15126 Skip-test: true Skip-build: true Steps: 1. Format storages 2. System query 3. Create a 100% pool that spans all engines 4. Pool query 5. Pool destroy 6. Create 49 pools spanning all the engines with each pool using a 1/50th of the capacity 7. Pool list 8. Get around 80 pool metrics 9. Destroy all 49 pools 10. System stop 11. System start Skip-unit-tests: true Skip-fault-injection-test: true Signed-off-by: Makito Kano <makito.kano@intel.com> Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
…#15126) Steps: 1. Format storages 2. System query 3. Create a 100% pool that spans all engines 4. Pool query 5. Pool destroy 6. Create 49 pools spanning all the engines with each pool using a 1/50th of the capacity 7. Pool list 8. Get around 80 pool metrics 9. Destroy all 49 pools 10. System stop 11. System start Signed-off-by: Makito Kano <makito.kano@intel.com>