Extend cachebench with value validation #131
Conversation
Do you mind sharing the use case scenario for this diff? If the intention is to validate that the value written is the same as the value read, wouldn't consistency checking be enough?
The main intention is to see how much overhead reading a value introduces when using different memory mediums (e.g., DRAM vs. CXL-mem or PMEM). We would like to measure this overhead for different value sizes. For consistency checking, there is some additional overhead and, I think, also some additional requirements on values.
@haowu14 We discussed this in the call with Intel to merge PRs that are independent and useful. In this case, as @igchor pointed out, using the consistency-checking mode to read the value adds other overheads, and the intent here is to trigger reading the item in PMem-like scenarios where bandwidth is a concern for a byte-addressable medium.
cachelib/cachebench/cache/Cache.h
Outdated
@@ -352,6 +361,9 @@ class Cache {
   // tracker for consistency monitoring.
   std::unique_ptr<ValueTracker> valueTracker_;
 
+  // expected value of all items in Cache.
+  std::optional<std::string> expectedValue_;
This sounds like we are validating the value, but we are only validating a few bytes in each item. So either it is named incorrectly or we are intending to do something, but the code does not do all of it.
What's the intent with adding this value validation? My understanding was that this was to read the item's payload in its entirety so that we can benchmark any DRAM or PMem bandwidth numbers. If that's the case, why are we reading only a subset of the item?
Your understanding is correct. I believe we are verifying all bytes of the value right now. expectedValue_ is being set to hardcodedString_, which is 4MB. The std::min<size_t>(expected.size(), getSize(it)) part is just to make sure that we are not reading outside of the item. This is analogous to: https://github.com/facebook/CacheLib/blob/main/cachelib/cachebench/cache/Cache-inl.h#L624
Is this reasoning correct?
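The bounded comparison described above can be sketched as follows. This is a minimal illustration, not the actual Cache code: the function name and parameters are hypothetical, but the clamping with std::min mirrors the expression quoted from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical sketch of the bounded value comparison discussed above.
// Compares the item's payload against the expected value, but never reads
// past the item's own size.
bool valueMatches(const std::string& expected,
                  const char* itemData,
                  std::size_t itemSize) {
  // Clamp the comparison length, mirroring the PR's
  // std::min<size_t>(expected.size(), getSize(it)), so we never read
  // outside the item's allocation.
  const std::size_t len = std::min(expected.size(), itemSize);
  return std::memcmp(expected.data(), itemData, len) == 0;
}
```

Because only min(expected.size(), itemSize) bytes are compared, a 4MB expected value safely covers items of any smaller size.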
expectedValue_ is being set to hardcodedString_ which is 4MB.
@igchor The value is set by the caller (stressor) in a certain way, and we are leaking that into the cache. The fact that expectedValue[0 .. item.getSize() - 1] is the expected value is assumed in the cache.
Taking a step back, we are not really building this feature to check the item's content. That functionality is already handled by the consistency checking feature. We want this feature to simply read the item payload. Can this be done by the CacheStressor instead? We also don't need to compare it against any expected value. As in, the result of that comparison is simply irrelevant, so there is no need to abort if the value is incorrect.
You are right that we are not really building this to check the item's content. We came up with the idea of using memcmp to make sure that the code which reads the value is not optimized out by the compiler.
I don't think we can easily move the checking to the CacheStressor (all the latency checking code is done inside Cache). However, we can rewrite validateValue to not do any comparison. We can just iterate over the value and add a volatile somewhere to make sure the code is not optimized out (e.g. calculate the number of zero bytes in the value and assign this number to a local volatile var). Would this be acceptable?
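The volatile idea above can be sketched like this. It is illustrative only and not the PR's final code; the function name is hypothetical. The point is that storing the loop's result into a volatile local forces the compiler to actually perform the byte reads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of the volatile approach described above: walk every byte of the
// value, count the zero bytes, and publish the count through a volatile
// local so the compiler cannot elide the reads.
std::size_t touchValue(const std::uint8_t* data, std::size_t size) {
  std::size_t zeros = 0;
  for (std::size_t i = 0; i < size; ++i) {
    if (data[i] == 0) {
      ++zeros;
    }
  }
  // The volatile store keeps the loop from being optimized away.
  volatile std::size_t sink = zeros;
  return sink;
}
```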
@igchor yeah, that seems more reasonable and simpler, and we don't need to think about terminating the run based on correctness. There is probably a way to do it without the volatile (check out folly::doNotOptimizeAway).
Ok, I've changed the implementation. Now I use std::accumulate on the value and folly::doNotOptimizeAway on the sum. Let me know if you think this is OK.
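The updated approach is roughly the following sketch. The actual PR keeps the sum alive with folly::doNotOptimizeAway; since folly is not assumed here, a volatile store is used as a portable stand-in, and the function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>

// Sum every byte of the value with std::accumulate, then keep the sum
// alive so the reads are not optimized out. The real code uses
// folly::doNotOptimizeAway(sum); the volatile store below is a portable
// stand-in for illustration.
std::uint64_t touchAllBytes(const std::uint8_t* data, std::size_t size) {
  const std::uint64_t sum =
      std::accumulate(data, data + size, std::uint64_t{0});
  volatile std::uint64_t sink = sum;  // stand-in for folly::doNotOptimizeAway
  return sink;
}
```

Note the 64-bit accumulator: seeding std::accumulate with a plain int would overflow for large values, so the initial value's type matters.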
cachelib/cachebench/cache/Cache.h
Outdated
@@ -227,9 +230,15 @@ class Cache {
   // @param keys list of keys that the stressor uses for the workload.
   void enableConsistencyCheck(const std::vector<std::string>& keys);
 
+  // enables validating all values on find. Each value is compared to
+  // expectedValue.
+  void enableValueValidating(const std::string& expectedValue);
It sounds like this is intended to be called before the cache is used for the stress test. Why not make it part of the cache initialization instead?
In case of consistency checking, the enable happens later since the keys are generated by inserting into the cache and then we enable the consistency checking for the keys added to the cache. It is a bit convoluted, but we don't need to do the same.
Inside the Cache ctor we do not know what the expectedValue is yet. This is why I call enableValueValidating from CacheStressor, which is responsible for generating the values (hardcodedString_ for now).
I removed this function and added a bool param to the Cache ctor - is that OK?
@igchor the base rev seems incorrect. PTAL and update. Once that is done, I'll try to import it.
@@ -50,8 +50,10 @@ uint64_t Cache<Allocator>::fetchNandWrites() const {
 template <typename Allocator>
 Cache<Allocator>::Cache(const CacheConfig& config,
                         ChainedItemMovingSync movingSync,
-                        std::string cacheDir)
+                        std::string cacheDir,
+                        bool touchValue)
can touchValue be part of the CacheConfig ?
nvm. I understand why it might not be a good idea to do that. We are really bending over backwards to add this code so that it fits under the place where the find() API latency is measured, and it is looking a bit convoluted. But that's fine for now.
  XDCHECK(touchValueEnabled());

  auto ptr = reinterpret_cast<const uint8_t*>(getMemory(it));
  auto sum = std::accumulate(ptr, ptr + getSize(it), 0ULL);
leave a comment here that accumulate is intended to access all bytes of the item and nothing more.
Done.
@@ -408,13 +419,20 @@ typename Cache<Allocator>::ItemHandle Cache<Allocator>::find(Key key,
   // find from cache and wait for the result to be ready.
   auto it = cache_->findImpl(key, mode);
   it.wait();
This code looks different from:

tracker = util::LatencyTracker(cacheFindLatency_);

Are you on a stale base commit for this PR?
Done - rebased.
@sathyaphoenix has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Do you have any more concerns about this PR?
@igchor: can you rebase to latest trunk? We recently made some API changes to CacheBench to use Read/Write handles. I will re-import again.
The main purpose of this patch is to better simulate workloads in cachebench. Setting touchValue to true allows us to see the performance impact of using different mediums for memory cache.
@igchor has updated the pull request. You must reimport the pull request before landing.
@therealgymmy I've rebased, thanks.
@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
The main purpose of this patch is to better simulate workloads in cachebench. Setting validateValue to true allows us to see the performance impact of using different mediums for memory cache.