Extend cachebench with value validation #131


Closed

Conversation

igchor
Contributor

@igchor igchor commented Apr 15, 2022

The main purpose of this patch is to better simulate workloads in
cachebench. Setting validateValue to true makes it possible to see the
performance impact of using different mediums for the memory cache.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 15, 2022
@haowu14
Contributor

haowu14 commented Apr 15, 2022

Do you mind sharing the use case for this diff? If the intention is to validate that the value written is the same as the value read, wouldn't consistency checking be enough?

@igchor
Contributor Author

igchor commented Apr 18, 2022

The main intention is to see how much overhead reading a value introduces when using different memory mediums (e.g., DRAM vs. CXL-mem or PMEM). We would like to measure this overhead for different value sizes.

Consistency checking adds some extra overhead and, I think, also imposes some additional requirements on the values.

@sathyaphoenix
Contributor

@haowu14 We discussed this on the call with Intel about merging PRs that are independent and useful. In this case, as @igchor pointed out, using the consistency-checking mode to read the value adds other overheads on top; the intent here is to trigger reading the item in PMem-like scenarios where bandwidth is a concern for a byte-addressable medium.

@@ -352,6 +361,9 @@ class Cache {
// tracker for consistency monitoring.
std::unique_ptr<ValueTracker> valueTracker_;

// expected value of all items in Cache.
std::optional<std::string> expectedValue_;
Contributor

This sounds like we are validating the value, but we are only validating a few bytes in each item. So either it is named incorrectly, or we are intending to do something but the code does not do all of it.

What's the intent of adding this value validation? My understanding was that this was to read the item's payload in its entirety so that we can benchmark any DRAM or PMem bandwidth numbers. If that's the case, why are we reading only a subset of the item?

Contributor Author

Your understanding is correct. I believe we are verifying all bytes of the value right now. expectedValue_ is being set to hardcodedString_ which is 4MB. The std::min<size_t>(expected.size(), getSize(it)) part is just to make sure that we are not reading outside of the item. This is analogous to: https://github.com/facebook/CacheLib/blob/main/cachelib/cachebench/cache/Cache-inl.h#L624

Is this reasoning correct?

Contributor

@sathyaphoenix sathyaphoenix Apr 27, 2022

expectedValue_ is being set to hardcodedString_ which is 4MB.

@igchor The value is set by the caller (stressor) in a certain way, and we are leaking that into the cache. The fact that expectedValue[0 .. item.getSize() - 1] is the expected value is an assumption baked into the cache.

Taking a step back, we are not really building this feature to check the item's content. That functionality is already handled by the consistency-checking feature. We want this feature to simply read the item payload. Can this be done by the CacheStressor instead? We also don't need to compare it against any expected value; the result of that comparison is simply irrelevant, so there is no need to abort if the value is incorrect.

Contributor Author

You are right that we are not really building this to check the item's content. We came up with the idea of using memcmp to make sure that the code that reads the value is not optimized out by the compiler.

I don't think we can easily move the checking to the CacheStressor (all the latency-checking code is inside Cache). However, we can rewrite validateValue to not do any comparison. We can just iterate over the value and add a volatile somewhere to make sure the code is not optimized out (e.g. calculate the number of zero bytes in the value and assign this number to a local volatile var). Would this be acceptable?

Contributor

@igchor yeah, that seems more reasonable and simpler, and we don't need to think about terminating the run based on correctness. There is probably a way to do it without the volatile (check out folly::doNotOptimizeAway).

Contributor Author

Ok, I've changed the implementation. Now I use std::accumulate on the value and folly::doNotOptimizeAway on the sum. Let me know if you think this is OK.

@@ -227,9 +230,15 @@ class Cache {
// @param keys list of keys that the stressor uses for the workload.
void enableConsistencyCheck(const std::vector<std::string>& keys);

// enables validating all values on find. Each value is compared to
// expected Value.
void enableValueValidating(const std::string& expectedValue);
Contributor

It sounds like this is intended to be called before the cache is used for the stress test. Why not make it part of the cache initialization instead?

Contributor

In the case of consistency checking, the enable happens later since the keys are generated by inserting into the cache, and then we enable consistency checking for the keys added to the cache. It is a bit convoluted, but we don't need to do the same here.

Contributor Author

Inside the Cache ctor we do not yet know what the expectedValue is. This is why I call enableValueValidating from CacheStressor, which is responsible for generating the values (hardcodedString_ for now).

Contributor Author

I removed this function and added a bool param to the Cache ctor - is that OK?

Contributor

@sathyaphoenix sathyaphoenix left a comment

@igchor the base rev seems incorrect. PTAL and update. Once that is done, I'll try to import it.

@@ -50,8 +50,10 @@ uint64_t Cache<Allocator>::fetchNandWrites() const {
template <typename Allocator>
Cache<Allocator>::Cache(const CacheConfig& config,
ChainedItemMovingSync movingSync,
std::string cacheDir)
std::string cacheDir,
bool touchValue)
Contributor

Can touchValue be part of the CacheConfig?

Contributor

nvm, I understand why it might not be a good idea to do that. We are really bending over backwards to add this code so that it fits under the place where the find() API latency is measured, and it is looking a bit convoluted. But that's fine for now.

XDCHECK(touchValueEnabled());

auto ptr = reinterpret_cast<const uint8_t*>(getMemory(it));
auto sum = std::accumulate(ptr, ptr + getSize(it), 0ULL);
Contributor

leave a comment here that accumulate is intended to access all bytes of the item and nothing more.

Contributor Author

Done.

@@ -408,13 +419,20 @@ typename Cache<Allocator>::ItemHandle Cache<Allocator>::find(Key key,
// find from cache and wait for the result to be ready.
auto it = cache_->findImpl(key, mode);
it.wait();

Contributor

This code looks different from

tracker = util::LatencyTracker(cacheFindLatency_);

Are you on a stale base commit for this PR?

Contributor Author

Done - rebased.

@facebook-github-bot
Contributor

@sathyaphoenix has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@igchor
Contributor Author

igchor commented Jun 2, 2022

Do you have any more concerns about this PR?

@therealgymmy
Contributor

@igchor: can you rebase to the latest trunk? We recently made some API changes to CacheBench to use Read/Write handles. I will then re-import.

The main purpose of this patch is to better simulate workloads in
cachebench. Setting touchValue to true makes it possible to see the
performance impact of using different mediums for the memory cache.
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@igchor
Contributor Author

igchor commented Jun 15, 2022

@therealgymmy I've rebased, thanks

@facebook-github-bot
Contributor

@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
