Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-16386 utils: Add DDB Feature and RM_POOL Support #15062

Merged
merged 8 commits into from
Nov 8, 2024

Conversation

wangshilong
Copy link
Contributor

This PR enhances the DDB functionality for CR purposes with the following updates:

  1. Pool Behavior Control:

Administrators can now control certain vos pool behaviors, such as skipping vos pool loading or setting a vos pool to immutable mode.

  1. Manual Pool Shard Removal:

A new command ddb rm_pool <vos_pool> has been introduced, allowing administrators to manually remove pool shards.

  1. SPDK Environment Initialization Bug Fix:

Fixed an issue where spdk_env_init() would fail during reinitialization.

These updates aim to improve system flexibility and stability, providing administrators with more robust management capabilities.

Required-githooks: true
Signed-off-by: Wang Shilong shilong.wang@intel.com

Required-githooks: true

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follows the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

Copy link

github-actions bot commented Sep 4, 2024

Ticket title is 'Enhance DDB for CR purpose'
Status is 'In Review'
https://daosio.atlassian.net/browse/DAOS-16386

@daosbuild1
Copy link
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15062/1/testReport/

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16329_1 branch from e1b57a1 to 7011933 Compare September 5, 2024 01:51
Base automatically changed from Nasf-Fan/DAOS-16329_1 to master September 25, 2024 12:21
@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15062/3/testReport/

This PR enhances the DDB functionality for CR purposes with
the following updates:

1. Pool Behavior Control:

Administrators can now control certain vos pool behaviors,
such as skipping vos pool loading or setting a vos pool to immutable  mode.

2. Manual Pool Shard Removal:

A new command ddb rm_pool <vos_pool> has been introduced,
allowing administrators to manually remove pool shards.

3. SPDK Environment Initialization Bug Fix:

Fixed an issue where spdk_env_init() would fail during reinitialization.

These updates aim to improve system flexibility and stability,
providing administrators with more robust management capabilities.

Required-githooks: true
Signed-off-by: Wang Shilong <shilong.wang@intel.com>

Required-githooks: true
@wangshilong wangshilong marked this pull request as ready for review September 29, 2024 06:33
@wangshilong wangshilong requested review from a team as code owners September 29, 2024 06:33
tanabarr
tanabarr previously approved these changes Sep 30, 2024
Copy link
Contributor

@tanabarr tanabarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, is there a reason we couldn't use go-flags for the go cli front-end and instead had to pull in "grumble" package?

@wangshilong
Copy link
Contributor Author

LGTM, is there a reason we couldn't use go-flags for the go cli front-end and instead had to pull in "grumble" package?

I tried to follow @ryon-jensen , maybe he could help answer this.

Copy link
Contributor

@kjacque kjacque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go changes look reasonable to me, but I'm not sure what everything in the ddb C code is doing. I think we need to get someone more experienced with ddb and/or vos in general to review the other portion of the changes.

out:
return rc;
}

int
ddb_main(struct ddb_io_ft *io_ft, int argc, char *argv[])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little confused because I can't figure out what ddb_main is used for. It seems to think it is the main function of ddb, but the Go stuff is the entry point now. No references to ddb_main outside of this directory. Should it (and the superfluous main function in ddb_entry.c) be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, i am also a bit confused about not sure about this, i think maybe we could remove it, but i'd like to keep it is for now, and remove it in the separate discussion if you are fine with this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I won't block either way. It would be great to prune unused code whenever we can.

@@ -20,6 +20,56 @@
#include <daos_srv/dtx_srv.h>
#include <daos_srv/vos_types.h>

#define VOS_POOL_COMPAT_FLAG_IMMUTABLE (1ULL << 0)
#define VOS_POOL_COMPAT_FLAG_SKIP_LOAD (1ULL << 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name "skip_load" seems some confused, skip load for what? According to the patch, seems the pool will not be started if the flag is set. So maybe called "skip_start" or just "skip_pool" or some more clear name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean skip load, let's change it to start start.

#define VOS_POOL_COMPAT_FLAG_IMMUTABLE (1ULL << 0)
#define VOS_POOL_COMPAT_FLAG_SKIP_LOAD (1ULL << 1)
#define VOS_POOL_COMPAT_FLAG_SKIP_REBUILD (1ULL << 2)
#define VOS_POOL_COMPAT_FLAG_SKIP_DTX_RESYNC (1ULL << 3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebuild depends on DTX resync, if DTX resync is skipped, then rebuild also needs to be skipped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we only need FLAG to DTX resync then, it will disable rebuild.

ddb_run_rm_pool(struct ddb_ctx *ctx, struct rm_pool_options *opt)
{
if (ddb_pool_is_open(ctx)) {
ddb_error(ctx, "Must close pool before can open another\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What means? the pool to be destroyed is opened or some other pool is opened?

@@ -48,8 +48,18 @@ struct ddb_array {
};

/* Open and close a pool for a ddb_ctx */
int dv_pool_open(char *path, daos_handle_t *poh);
int
dv_pool_open(const char *path, daos_handle_t *poh, uint32_t flags);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please adjust related format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit annoying, this is auto-format by clang format, let me fix this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git hooks required this..i leave it for now.

@@ -50,6 +49,35 @@ dv_pool_open(char *path, daos_handle_t *poh)
return rc;
}

int
dv_pool_destroy(const char *path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please to add some test cases for the pool destroy via ddb? On the other hand, since the DDB tool can be used for recovery from disaster, if SMD information does not match pool case or say SMD is corrupted, can we still operate on the pool by force?

opt->clear_incompat_flags)
ctx->dc_write_mode = true;
else
ctx->dc_write_mode = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the feature option be used together with open (with -w) option?

Wang Shilong added 2 commits October 24, 2024 21:44
Required-githooks: true

Signed-off-by: Wang Shilong <shilong.wang@intel.com>
@daosbuild1
Copy link
Collaborator

Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15062/6/execution/node/360/log

@daosbuild1
Copy link
Collaborator

@daosbuild1
Copy link
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15062/6/execution/node/273/log

@daosbuild1
Copy link
Collaborator

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15062/6/execution/node/363/log

@daosbuild1
Copy link
Collaborator

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15062/6/execution/node/326/log

@daosbuild1
Copy link
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15062/6/execution/node/323/log

Required-githooks: true

Signed-off-by: Wang Shilong <shilong.wang@intel.com>
if (ddb_pool_is_open(ctx)) {
if (!ctx->dc_write_mode && !opt->show_features)
return -DER_INVAL;
goto skip;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if the pool is opened without -w option, but some {set,clear}_{in}compat_flags is valid? What is the expected behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will return -DER_INVAL? maybe should return NOPERM.

{
if (ddb_pool_is_open(ctx)) {
ddb_error(ctx, "Must close pool before can open another\n");
return -DER_EXIST;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want to express that "the pool cannot be destroyed because it is opened" ? if yes, then -DER_BUSY may be better. Or I mis-understand anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-DER_BUSY is better.

if (str_has_value(pa->pa_r_cmd_run)) {
rc = ddb_parse_cmd_str(ctx, pa->pa_r_cmd_run, open);
if (rc)
return rc;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*open = true for else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be assigned inside ddb_parse_cmd_str().

Wang Shilong added 2 commits October 30, 2024 02:58
Required-githooks: true

Signed-off-by: Wang Shilong <shilong.wang@intel.com>
@daosbuild1
Copy link
Collaborator

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15062/8/testReport/

@daosbuild1
Copy link
Collaborator

Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15062/8/testReport/

tanabarr
tanabarr previously approved these changes Oct 30, 2024

func featureCompleter(prefix string, args []string) []string {
suggestions := []string{"-h", "-e", "--enable", "-d", "--disable", "-s", "--show"}
suggestions = append(suggestions, listDirVos(defMntPrefix)...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L38-L53 is identical to L57-L72 and should be extracted as a helper function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -223,3 +223,36 @@ func ddbDtxActAbort(ctx *DdbContext, path string, dtx_id string) error {
/* Run the c code command */
return daosError(C.ddb_run_dtx_act_abort(&ctx.ctx, &options))
}

func ddbFeature(ctx *DdbContext, path string, enable string, disable string, show bool) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: don't need to specify type for sequential parameters that share type, e.g. can do path, enable, disable string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@Nasf-Fan Nasf-Fan self-requested a review October 31, 2024 02:47
Required-githooks: true

Signed-off-by: Wang Shilong <shilong.wang@intel.com>
Copy link
Contributor

@kjacque kjacque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go changes LGTM.

@wangshilong wangshilong requested a review from a team November 1, 2024 12:05
@NiuYawei NiuYawei merged commit 9aae306 into master Nov 8, 2024
54 of 55 checks passed
@NiuYawei NiuYawei deleted the shilongw/DAOS-16386 branch November 8, 2024 02:29
wangshilong pushed a commit that referenced this pull request Nov 8, 2024
This PR enhances the DDB functionality for CR purposes with
the following updates:

1. Pool Behavior Control:

Administrators can now control certain vos pool behaviors,
such as skipping vos pool loading or setting a vos pool to immutable  mode.

2. Manual Pool Shard Removal:

A new command ddb rm_pool <vos_pool> has been introduced,
allowing administrators to manually remove pool shards.

3. SPDK Environment Initialization Bug Fix:

Fixed an issue where spdk_env_init() would fail during reinitialization.

These updates aim to improve system flexibility and stability,
providing administrators with more robust management capabilities.

Required-githooks: true

Signed-off-by: Wang Shilong <shilong.wang@intel.com>
wangshilong pushed a commit that referenced this pull request Nov 16, 2024
This PR enhances the DDB functionality for CR purposes with
the following updates:

1. Pool Behavior Control:

Administrators can now control certain vos pool behaviors,
such as skipping vos pool loading or setting a vos pool to immutable  mode.

2. Manual Pool Shard Removal:

A new command ddb rm_pool <vos_pool> has been introduced,
allowing administrators to manually remove pool shards.

3. SPDK Environment Initialization Bug Fix:

Fixed an issue where spdk_env_init() would fail during reinitialization.

These updates aim to improve system flexibility and stability,
providing administrators with more robust management capabilities.

Required-githooks: true

Signed-off-by: Wang Shilong <shilong.wang@intel.com>
daltonbohning pushed a commit that referenced this pull request Jan 10, 2025
This PR enhances the DDB functionality for CR purposes with
the following updates:

1. Pool Behavior Control:

Administrators can now control certain vos pool behaviors,
such as skipping vos pool loading or setting a vos pool to immutable  mode.

2. Manual Pool Shard Removal:

A new command ddb rm_pool <vos_pool> has been introduced,
allowing administrators to manually remove pool shards.

3. SPDK Environment Initialization Bug Fix:

Fixed an issue where spdk_env_init() would fail during reinitialization.

These updates aim to improve system flexibility and stability,
providing administrators with more robust management capabilities.

Signed-off-by: Wang Shilong <shilong.wang@intel.com>
@mjmac mjmac mentioned this pull request Mar 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

6 participants