Shorten critical section in findEviction #132

Closed
wants to merge 9 commits

Conversation

igchor
Contributor

@igchor igchor commented Apr 15, 2022

Remove the item from mmContainer and drop the lock before attempting eviction.

The change improves throughput for the default hit_ratio/graph_cache_leader_fbobj config by ~30%. It also reduces p99 latencies significantly. The improvement is even bigger for the multi-tier approach (multiple memory tiers) which we are working on here: https://github.com/pmem/CacheLib

I was not able to find any races/synchronization problems with this approach but there is a good chance I missed something - it would be great if you could review and evaluate this patch.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 15, 2022
@igchor igchor marked this pull request as ready for review April 27, 2022 15:15
@igchor
Contributor Author

igchor commented Apr 28, 2022

The implementation in the first commit suffers from a few problems:

  • the item is added back to the MMContainer if eviction fails (becomes "hot")
  • the item can potentially be destroyed after dropping MMContainer lock (?)
  • there might be some races with SlabRelease

One idea we have for solving these issues is presented in the second patch.

It increments the ref count of the item before dropping the lock, which synchronizes with other threads doing eviction and prevents item destruction.
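Roughly, the idea looks like this (a simplified, standalone sketch of the approach rather than the actual CacheLib code; the types and function names below are made up for illustration):

```cpp
#include <atomic>
#include <list>
#include <mutex>

// Standalone model of the idea, not CacheLib code.
struct Item {
  std::atomic<unsigned> refCount{0};
};

class MMContainer {
 public:
  // Under the container lock: pick the LRU tail, unlink it, and take a
  // reference so the item cannot be destroyed or re-evicted by another
  // thread once we drop the lock.
  Item* removeTailAndAcquireRef() {
    std::lock_guard<std::mutex> g(lock_);
    if (lru_.empty()) {
      return nullptr;
    }
    Item* candidate = lru_.back();
    lru_.pop_back();
    candidate->refCount.fetch_add(1, std::memory_order_acq_rel);
    return candidate;
  }

  void add(Item* it) {
    std::lock_guard<std::mutex> g(lock_);
    lru_.push_front(it);
  }

 private:
  std::mutex lock_;       // stands in for the MMContainer (LRU) lock
  std::list<Item*> lru_;  // front = MRU, back = LRU
};

// Eviction then runs with the lock already released:
//   Item* candidate = mm.removeTailAndAcquireRef();
//   if (candidate) {
//     // remove from the access container, evict to NvmCache, drop our
//     // reference, and free the memory once the refCount reaches 0
//   }
```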

@therealgymmy
Contributor

therealgymmy commented May 23, 2022

@igchor: Hi, I will be reviewing this week. In the meantime, could you run the following cachebench tests? They test the consistency of item values involving chained items.

My concern is around prematurely dropping the eviction lock for a regular item when a concurrent chained-item eviction is happening, and vice versa. An additional concern is the interaction between the slab-release and eviction code paths. I will focus on these during my review.

`cachelib/cachebench/test_configs/consistency/chained-items-stress-moving.json`
`cachelib/cachebench/test_configs/consistency/chained-items-stress-no-moving.json`
`cachelib/cachebench/test_configs/consistency/ram-with-deletes.json`

Contributor

@therealgymmy therealgymmy left a comment

@igchor: Below are the scenarios I went through with this change. There is one potential bug (see (4) and (5)).

When we drop the lock and try to remove an item from mm-container, what happens if:

  1. another thread is trying to evict from the same mmcontainer?
    1. It can’t happen because only one thread can remove an item from AccessContainer. The concurrent thread will fail to remove this item from access-container due to refcount > 1. (Current thread is holding at least 1 refcount, and the concurrent thread must acquire a refcount before proceeding to remove from access-container.)
  2. another thread is trying to evict one of its chained items from the same mmcontainer? And vice-versa?
    1. It can’t happen because only one thread can remove an item from AccessContainer. Either the current thread or the concurrent thread will skip this item.
  3. another thread is trying to evict one of its chained items from a different mmcontainer? And vice-versa?
    1. It can’t happen because only one thread can remove an item from AccessContainer. Either the current thread or the concurrent thread will skip this item.
  4. [BUG] another thread is evicting a slab and trying to move/evict the same item? (CacheAllocator::evictForSlabRelease)
    1. Can an item be marked as moving concurrently as it was evicted from LRU?
      1. This is possible. The intent is that we cannot mark moving unless item is still in LRU. Dropping the lock changes this since we’re doing an atomic compare and exchange with acq_rel, but reading with relaxed. Eventually we can free this item even if the eviction thread had already freed it. This would be a double-free.
        1. Please see “markMoving()” and “isMoving()” in CacheItem-inl.h and Refcount.h
        2. In slab release, we mark items as moving in markMovingForSlabRelease() in CacheAllocator-inl.h
      2. Fix is to make sure the atomic exchange and the read are properly ordered. I’m forgetting the details of the ordering, but at first glance I think reading the moving bit with acquire, and making sure the failure ordering of the exchange is also acquire, should do.
    2. Can an item be moved concurrently as it was evicted from LRU?
      1. Yes. See above.
    3. Can an item be evicted concurrently as it was evicted from LRU?
      1. Yes. See above.
  5. [bug] another thread is evicting a slab and trying to move/evict one of its chained items? And vice versa?
    1. Similar to above. The chained item can be marked as moving even if it was removed from the LRU by the eviction thread. After that we can get into the CacheAllocator-inl.h:2617 if (item.isOnlyMoving()) check, and if the eviction thread had freed the parent and all its chained items, we will pass this check and then free this chained item again. That would be a double-free.

@@ -1267,6 +1274,8 @@ CacheAllocator<CacheTrait>::findEviction(PoolId pid, ClassId cid) {
// from the beginning again
if (!itr) {
itr.resetToBegin();
for (int i = 0; i < searchTries; i++)
Contributor

Doesn't seem like we need this. Is the assumption that the items we previously tried still have an outstanding refcount? (If so, the logic at L1226 should cover it.)

Contributor Author

This was supposed to also cover some other failures (e.g. when the item is marked as moving), but I think we can just check that at the top of the loop.

itr->isChainedItem()
? advanceIteratorAndTryEvictChainedItem(itr)
: advanceIteratorAndTryEvictRegularItem(mmContainer, itr);
auto toReleaseHandle = candidate->isChainedItem()
Contributor

Since we already incremented the refcount at L1231, we no longer need to get a "handle" back. The reason we previously got a handle was to have a positive refcount on the item, so that after we release the lock on the MMContainer, another thread cannot come in and evict this item.

Contributor Author

Right, I reworked this part. I also noticed there was significant code duplication between advanceIteratorAndTryEvictChainedItem and advanceIteratorAndTryEvictRegularItem so I merged them into a single function. Please let me know if this is OK.

@igchor
Contributor Author

igchor commented May 25, 2022

@therealgymmy So, the only needed fix for correctness is to change the memory order on reads (at least for non-chained items)? Why is the relaxed memory order enough right now?

Also, here is my rationale for why the 4th and 5th cases should work (perhaps after fixing the memory order):

In my patch, if the item is removed from the LRU, then it must have also been removed from the AccessContainer. This means that the item cannot be moved, due to this check:

if (!oldItem.isAccessible() || oldItem.isExpired()) {

It can only be evicted.

If the item was already freed in findEviction (before any operation in SlabRelease), markMoving() will fail (we see that the item is freed), so we can just finish.

There could be a problem when SlabRelease calls markMoving() and then evictForSlabRelease just before:

releaseBackToAllocator(itemToRelease, RemoveContext::kEviction,

But at that time, the element is already removed from the LRU, causing markMoving() to fail.

Also, if markMoving() is called after the element is removed from the AccessContainer and before it is removed from the LRU (inside tryEvictRegularItem), then the thread which is doing findEviction will exit before freeing the item (because markMoving is set).

Are there any issues with my reasoning?

@therealgymmy
Contributor

therealgymmy commented Jun 1, 2022

the only needed fix for correctness is to change the memory order on reads (at least for non-chained items)? Why is the relaxed memory order enough right now?

@igchor: please ignore my comments on the mem ordering. I overlooked that after we remove from mm-container, we unmark an item's kLinked bit with acq_rel ordering, which will force the "eviction thread" and "slab release thread" to be correctly ordered.

My concern previously was this:

  1. slab release thread marks an item as moving as it sees item's kLinked bit is still marked
  2. eviction thread sees the item moving bit UNmarked (reading a stale value from (1))

I thought it could happen because L2397 (markMovingForSlabRelease) will mark the bit with acq_rel but L1370 (if (evictHandle->isMoving())) reads it with relaxed. However, the code in L1360, mmContainer.remove(itr), calls Refcount::unmarkInMMContainer(), which unmarks the kLinked bit with acq_rel. This means L1360 and L2397 must be ordered correctly, and thus L1370, which is sequenced after L1360, must be ordered correctly with L2397 as well.
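For reference, here is a minimal standalone model of that argument (not the real RefcountWithFlags code; the bit layout, names, and orderings are only illustrative, assuming, as the discussion above implies, that kLinked and kMoving live in the same atomic word):

```cpp
#include <atomic>
#include <cstdint>

constexpr uint32_t kLinked = 1u << 30;  // "is in MMContainer"
constexpr uint32_t kMoving = 1u << 31;

struct Flags {
  std::atomic<uint32_t> bits{kLinked};

  // Slab-release path (markMovingForSlabRelease): mark moving only while the
  // item is still linked.
  bool markMoving() {
    uint32_t cur = bits.load(std::memory_order_acquire);
    while (cur & kLinked) {
      if (bits.compare_exchange_weak(cur, cur | kMoving,
                                     std::memory_order_acq_rel,
                                     std::memory_order_acquire)) {
        return true;
      }
    }
    return false;  // item already unlinked; slab release gives up
  }

  // Eviction path: mmContainer.remove() clears kLinked with an acq_rel RMW...
  void unmarkInMMContainer() {
    bits.fetch_and(~kLinked, std::memory_order_acq_rel);
  }

  // ...so the isMoving() read the same thread performs afterwards cannot miss
  // a markMoving() that succeeded earlier: both operate on the same atomic
  // word, and a markMoving() that runs after the unlink fails the kLinked
  // check above.
  bool isMoving() const {
    return bits.load(std::memory_order_relaxed) & kMoving;
  }
};
```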

@facebook-github-bot
Contributor

@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@therealgymmy
Contributor

therealgymmy commented Jun 1, 2022

@igchor: can you take care of the comments and rebase this PR to the latest version? I will try importing it again and run through the internal tests.

@igchor igchor force-pushed the optimize_mmcontainer_locking branch from 57c8bd3 to fdb2163 Compare June 2, 2022 12:51
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@igchor
Contributor Author

igchor commented Jun 2, 2022

I've responded to the comments, run the tests you suggested, and reworked the code. I realized there was one other problem: incrementing the ref count under the mmContainer lock was causing races with replaceIf/removeIf. I changed the code to increment the ref count under the AC lock so that it properly synchronizes with predicate evaluation inside replaceIf/removeIf.

I believe that incrementing the refCount without taking the lock is possible but would require changes in other places in the code. The performance impact of taking the AC lock is pretty small (~5% in terms of throughput) for the hit_ratio benchmarks. Performance after this patch is still better than that of the main branch.
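To illustrate the race and the fix (a simplified, standalone sketch; the bucket-locked map below only stands in for the AccessContainer, and none of the names are CacheLib's):

```cpp
#include <atomic>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>

struct Item {
  std::atomic<unsigned> refCount{0};
};

// Stand-in for the AccessContainer: one lock guards the map and, crucially,
// guards predicate evaluation inside removeIf/replaceIf.
class AccessContainer {
 public:
  // removeIf evaluates `pred` while holding the container lock.
  bool removeIf(const std::string& key,
                const std::function<bool(const Item&)>& pred) {
    std::lock_guard<std::mutex> g(lock_);
    auto it = map_.find(key);
    if (it == map_.end() || !pred(*it->second)) {
      return false;
    }
    map_.erase(it);
    return true;
  }

  // Eviction takes its reference under the same lock, so the increment cannot
  // slip in between `pred(...)` returning true and the erase above. Bumping
  // the refCount outside this lock is what allowed the race.
  Item* findAndAcquireRef(const std::string& key) {
    std::lock_guard<std::mutex> g(lock_);
    auto it = map_.find(key);
    if (it == map_.end()) {
      return nullptr;
    }
    it->second->refCount.fetch_add(1, std::memory_order_acq_rel);
    return it->second;
  }

 private:
  std::mutex lock_;
  std::unordered_map<std::string, Item*> map_;
};
```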

@facebook-github-bot
Contributor

@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@therealgymmy
Contributor

therealgymmy commented Jun 7, 2022

@igchor: the following test failed:

cachelib/allocator/tests:cache-allocator-test - BaseAllocatorTest/3.AddChainedItemMultithread

It failed on this line: https://github.com/facebook/CacheLib/blob/main/cachelib/allocator/tests/BaseAllocatorTest.h#L4522

With the following output:

buck-out/dev/gen/aab7ed39/cachelib/allocator/tests/cache-allocator-test#platform010-clang,private-headers/cachelib/allocator/tests/BaseAllocatorTest.h:4522
Expected: (nullptr) != (childItem), actual: (nullptr) vs nullptr
buck-out/dev/gen/aab7ed39/cachelib/allocator/tests/cache-allocator-test#platform010-clang,private-headers/cachelib/allocator/tests/BaseAllocatorTest.h:4522
Expected: (nullptr) != (childItem), actual: (nullptr) vs nullptr
buck-out/dev/gen/aab7ed39/cachelib/allocator/tests/cache-allocator-test#platform010-clang,private-headers/cachelib/allocator/tests/BaseAllocatorTest.h:4522
Expected: (nullptr) != (childItem), actual: (nullptr) vs nullptr

This wasn't a crash; rather, we couldn't allocate a new chained item even after 100 attempts. This shouldn't be possible since we have 3 allocating threads and each thread can hold at most 11 outstanding item handles (1 parent + 10 chained items). By default we walk the bottom 50 items of an eviction queue and evict an item without any refcount, so each of the allocating threads should always succeed in allocating an item.


This also triggered failures in a number of downstream tests from internal services that depend on cachelib. I skimmed over those and they all seem related to allocating chained items.


I changed the code to increment the ref count under AC lock

Did you mean under the hash-table lock? Seems like the change is to get an item handle by calling find()?

auto toReleaseHandle = accessContainer_->find(*candidate);

Contributor

@therealgymmy therealgymmy left a comment

Made a pass at the latest version. I couldn't spot any obvious bugs in the code so far despite the chained item related test failure. Let me know if you can repro it on your side @igchor (I'll poke some more on my side in the meanwhile).

}

auto toReleaseHandle = accessContainer_->find(*candidate);
if (!toReleaseHandle || !toReleaseHandle.isReady()) {
Contributor

We don't need to check isReady(), since we still hold the LRU lock and this item cannot have been evicted to NvmCache.

Comment on lines 1346 to 1349
if (item.isChainedItem())
stats_.evictFailConcurrentFill.inc();
else
stats_.evictFailConcurrentFill.inc();
Contributor

no need for if? we're bumping the same stat


if (toReleaseHandle) {
if (toReleaseHandle->hasChainedItem()) {
bool evicted = tryEvictItem(mmContainer, std::move(toReleaseHandle));
Contributor

Less error-prone if we keep the existing behavior of returning the toReleaseHandle, and then later in this function we explicitly release it and decrement the refcount (near L1267).

++itr;
stats_.evictFailAC.inc();
return evictHandle;
if (item.isChainedItem())
Contributor

This cannot be a chained item since we're now always passing the parent item handle into this function.

(We should find a way to keep this stat updated properly though; maybe we can update it in the findEviction() function instead.)

if (evictHandle->isMoving()) {
stats_.evictFailMove.inc();
return WriteHandle{};
if (item.isChainedItem())
Contributor

Same as L1362

@igchor
Contributor Author

igchor commented Jun 9, 2022

@therealgymmy Thank you for the feedback!

I managed to reproduce the issue locally and will work on fixing it.

Did you mean under the hash-table lock? Seems like the change is to get an item handle by calling find()?

Yes, under the hash-table (Access Container) lock. I actually realized that instead of increasing the refCount (which requires this hash-table lock) we could just mark an item as moving in findEviction (if not already marked). This would prevent any other thread from releasing this item, and we could even reuse the existing evictNormalItemForSlabRelease method (for ChainedItems we would still need a separate code path).

I have just one concern/question regarding this approach. In the current implementation, is it possible for multiple threads to mark the same item as moving? (I don't see any check for that in markMoving().) If not, how is this prevented? Is there an assumption that only one thread can do SlabRelease at any point?

@therealgymmy
Contributor

therealgymmy commented Jun 10, 2022 via email

@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@igchor igchor force-pushed the optimize_mmcontainer_locking branch from 28d84df to f0e8314 Compare June 13, 2022 17:15
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@igchor igchor force-pushed the optimize_mmcontainer_locking branch from f0e8314 to 7891133 Compare June 13, 2022 17:20
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@igchor
Contributor Author

igchor commented Jun 13, 2022

@therealgymmy I found out what the issue was: I was passing candidate to releaseBackToAllocator, which after my changes represented the parent Item. We should instead pass a pointer to the child we actually want to recycle.

Also, instead of increasing the refCount of the item, I now rely on the moving flag. I changed the markMoving function to only succeed if the item is not yet marked as moving. This guarantees proper synchronization with other evicting threads and with the SlabRelease thread. Once the item is marked as moving inside findEviction we can just execute the function which is used in SlabReleaseImpl for regular items - this removes some of the code duplication.

If the eviction fails (due to the refCount being != 0) we unmark the item and check the flags. If the flags are 0, the item is not used anywhere, was not freed by any other thread while it was marked moving, and has already been removed from the AC and MMContainer by another thread, so we can recycle it anyway.
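In (again simplified, non-CacheLib) code, the changed semantics look roughly like this:

```cpp
#include <atomic>
#include <cstdint>

// Illustrative flags word, not CacheLib's RefcountWithFlags: the low bits
// hold the access refcount, the top bits hold the admin flags.
constexpr uint32_t kLinked = 1u << 30;
constexpr uint32_t kMoving = 1u << 31;

struct Refcount {
  std::atomic<uint32_t> bits{kLinked};

  // New semantics: succeeds only if the item is still linked AND not already
  // marked moving, so exactly one thread (evictor or slab release) wins.
  bool markMoving() {
    uint32_t cur = bits.load(std::memory_order_acquire);
    while ((cur & kLinked) && !(cur & kMoving)) {
      if (bits.compare_exchange_weak(cur, cur | kMoving,
                                     std::memory_order_acq_rel,
                                     std::memory_order_acquire)) {
        return true;
      }
    }
    return false;
  }

  // Unmark atomically and hand the resulting word back to the caller.
  uint32_t unmarkMoving() {
    return bits.fetch_and(~kMoving, std::memory_order_acq_rel) & ~kMoving;
  }
};

// Caller side (findEviction), when the eviction attempt fails:
//   uint32_t ref = rc.unmarkMoving();
//   if (ref == 0) {
//     // No refcount and no admin bits left: another thread already removed
//     // the item from the AC and MMContainer but could not free it because
//     // the moving bit was set, so we must recycle it here to avoid a leak.
//   }
```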

After I implemented the above change I also realized it would be quite easy to use combined locking for the eviction iterator, which I also implemented (I took the idea from https://github.com/facebook/CacheLib/blob/main/cachelib/allocator/MM2Q-inl.h#L256). The performance results are pretty good now; we can share the exact numbers with you once we finish running all the benchmarks.
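The combined-locking change, sketched with a plain std::mutex (MM2Q really uses folly::DistributedMutex::lock_combine, which additionally lets the lock holder execute waiters' critical sections; everything below is a simplified illustration):

```cpp
#include <list>
#include <mutex>
#include <utility>

struct Item {};

// Before: getEvictionIterator() handed out an iterator object that kept the
// LRU lock held for as long as the caller kept the iterator alive.
// After: the caller passes a callback and the container runs it under the
// lock, so the critical section lasts exactly as long as the callback.
class Lru {
 public:
  template <typename F>
  void withEvictionIterator(F&& fn) {
    std::lock_guard<std::mutex> g(lruMutex_);
    std::forward<F>(fn)(lru_.rbegin(), lru_.rend());  // iterate from LRU end
  }

  void add(Item* it) {
    std::lock_guard<std::mutex> g(lruMutex_);
    lru_.push_back(it);
  }

 private:
  std::mutex lruMutex_;
  std::list<Item*> lru_;
};

// Usage from findEviction-like code: pick a candidate under the lock, then do
// the heavy eviction work after withEvictionIterator returns.
//   Item* candidate = nullptr;
//   lru.withEvictionIterator([&](auto it, auto end) {
//     if (it != end) { candidate = *it; }
//   });
```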

@facebook-github-bot
Contributor

@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -2727,6 +2627,11 @@ CacheAllocator<CacheTrait>::evictNormalItemForSlabRelease(Item& item) {
auto token = evictToNvmCache ? nvmCache_->createPutToken(item.getKey())
: typename NvmCacheT::PutToken{};

if (skipIfTokenInvalid && evictToNvmCache && !token.isValid()) {
Contributor

I think we can actually remove this logic. I don't remember clearly the rationale behind checking token validity right after creating it in advanceIteratorAndTryEvictRegularItem().

Maybe it was to avoid evicting an item prematurely (when we cannot insert it into nvm-cache). But this should be fairly rare in practice.

Contributor Author

Done (in a separate commit). Should I also remove the corresponding stat evictFailConcurrentFill, or leave it for compatibility?

Also this stat has the following comment next to it: "Eviction failures because this item has a potential concurrent fill from nvm cache. For consistency reason, we cannot evict it. Refer to NvmCache.h for more details."

* An item can only be marked moving when `isInMMContainer` returns true.
* This operation is atomic.
* An item can only be marked moving when `isInMMContainer` returns true and
* the item is not yet marked as moving. This operation is atomic.
Contributor

hah, this is clever!

I will go through the code in more detail to reason about the logic change, but in theory this makes sense.

Minor comment. Please rename this API to markExclusive, since that is our intent now: marking an item as exclusive so cachelib can gain exclusive access to it internally.

Contributor Author

Hm, there is already a function isExclusive in Refcount.h. Should we rename it to something else?

Contributor

I think we can deprecate isExclusive() altogether. This change actually calls for a much bigger refactor in the slab release code path.

The moving bit now denotes "exclusive" access to an item. Slab release should also use it for gaining exclusive access, and the current notion of holding an active refcount to prevent someone else from removing/evicting an item can be fully deprecated.

Will elaborate in separate places about slab release.

Contributor Author

Ok, done.

Contributor Author

I'm just not sure if isOnlyExclusive is a good name... Maybe you have some ideas?


if (toReleaseHandle) {
if (toReleaseHandle->hasChainedItem()) {
if (toReleaseHandle || ref == 0u) {
Contributor

Why can we rely on just ref == 0 to proceed with eviction?

Without a toReleaseHandle, it could mean we didn't remove the item (or its parent item) from access-container. It's not safe to evict it since it can still be accessed.

Contributor Author

Please see my answer below.

I actually think that we might destroy the toReleaseHandle first (unconditionally) and then rely only on ref == 0u. toReleaseHandle will not actually release anything on destruction since we have the moving bit set. And if unmarkMoving() returns 0, we know that either we or some other thread managed to remove the item from the AC and MMContainer and no one is using the item. I'll check this.

return toRecycle;
}
} else if (ref == 0u) {
// it's safe to recycle the item here as there are no more
Contributor

Is this true? If toReleaseHandle is empty, we could have returned early from evictNormalItem() without removing this item from the access-container or mm-container.

Contributor Author

My thinking here was that this case (toReleaseHandle is NULL and ref == 0u) might happen in the following (rare) scenario:

We have failed to remove the item from the AccessContainer (due to its refCount being > 0). However, after that, and before we called unmarkMoving(), someone else removed the element from the AccessContainer and MMContainer and decreased the ref count to zero. That other thread couldn't release the item to the allocator (since it's marked as moving, and this branch will not be taken: https://github.com/facebook/CacheLib/blob/main/cachelib/allocator/CacheAllocator-inl.h#L907). This means that we can (and should, to avoid a memory leak) safely recycle the item (ref == 0u means no outstanding references and the Admin/Control bits are zero).

Contributor

@igchor: ah I see. Yes, you're right. And this is a good point too. In fact, after we have called unmarkMoving, if the ref is 0, then we must free/recycle the memory, because whichever other thread removed it couldn't free the memory due to the moving bit.

? advanceIteratorAndTryEvictChainedItem(itr)
: advanceIteratorAndTryEvictRegularItem(mmContainer, itr);
evictNormalItem(*candidate, true /* skipIfTokenInvalid */);
auto ref = candidate->unmarkMoving();
Contributor

@therealgymmy therealgymmy Jun 17, 2022

Can you take the const auto ref = decRef(itemToRelease); at L1298 and change it to just ref = ...;?

Internal build is failing due to a local shadow variable. (Unfortunately, I cannot change the code of a GitHub-backed PR.) I will reimport to run internal tests.

Contributor Author

Done. I've also added one additional commit which always destroys the handle prior to unmarking the item (simplifies the code a bit).

igchor added 2 commits June 20, 2022 11:01
Remove the item from mmContainer and drop the lock before attempting
eviction.
igchor added 5 commits June 20, 2022 11:01
to avoid races with remove/replace with predicate.

Also, refactor tryEvictRegularItem and tryEvictChainedItem into
a single function.
moving bit is used to give exclusive right to evict the item
to a particular thread.

Originally, there was an assumption that whoever marked the item
as moving will try to free it until it succeeds. Since we don't want
to do that in findEviction (it can potentially take a long time) we need
to make sure that unmarking is safe.

This patch checks the flags after unmarking (atomically) and if ref is
zero it also recycles the item. This is needed as there might be some
concurrent thread releasing the item (and decrementing the ref count). If
the moving bit is set, that thread would not free the memory back to the
allocator, resulting in a memory leak on unmarkMoving().
under MMContainer combined_lock.
Checking token validity should not be needed
right after creating it.
in findEviction, and rely only on the refcount
being zero when releasing items back to the allocator.
@igchor igchor force-pushed the optimize_mmcontainer_locking branch from 7891133 to 782e18b Compare June 20, 2022 15:32
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@igchor igchor force-pushed the optimize_mmcontainer_locking branch from 782e18b to cbbbbe3 Compare June 20, 2022 15:39
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Contributor

@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@therealgymmy therealgymmy left a comment

Changes look great! I left high-level comments on simplifying eviction/slab-release further.

Internal unit tests have passed. I will move on to testing this on prod systems.


Do you have the latest numbers for the cachebench throughput improvement before/after this change?

@@ -2664,7 +2544,7 @@ void CacheAllocator<CacheTrait>::evictForSlabRelease(
auto owningHandle =
item.isChainedItem()
? evictChainedItemForSlabRelease(item.asChainedItem())
Contributor

Can we try to mark a chained item's parent as "moving" and then unify the eviction logic with the regular-item path?

The intent of this function is to remove an item from "access container" and "mm container", and then we will just wait until all active refcounts are drained.

The logic in my mind can look like this:

auto* candidate = &item;
if (item.isChainedItem()) {
  candidate = /* get parent item */;
  
  // if this failed, we busy-loop until chained item isOnlyMoving, as we're now
  // expecting another thread will free parent and this item's other siblings
  if (!candidate->markMoving()) {
    while (!item.isOnlyMoving()) {}
    
    // Free chained item
  }
}

// Remove regular item from access-container and mmcontainer,
// and wait until isOnlyMoving()

Contributor Author

Would you be OK with doing this refactor for Slab Release in a separate PR (after this one is merged)?

I have done some of the refactorings but I have some concerns about their correctness and performance implications (I would need to spend some more time testing them and understanding evictChainedItemForSlabRelease more deeply). I also believe some other parts of the code could be simplified as well after this change. I could work on that incrementally.

One specific concern with the approach you suggested is: when should we call nvmCache->put? If we created the putToken before removing the item from the AccessContainer and MMContainer and issued the actual put after item.isOnlyMoving(), it would result in keeping the key in the inFlightPut map longer - is this OK?

Contributor

Would you be OK with doing this refactor for Slab Release in a separate PR (after this one is merged)?

Yes that is totally fine.

One specific concern with the approach you suggested is: when should we call nvmCache->put? If we created the putToken before removing the item from the AccessContainer and MMContainer and issued the actual put after item.isOnlyMoving(), it would result in keeping the key in the inFlightPut map longer - is this OK?

Oh, we should issue a put to NvmCache earlier, as soon as we have removed the RAM parent item from the access-container. This is because from the DRAM cache's perspective it is already invisible, and it's expected that this item is on the queue to get inserted into NvmCache. (Similar to the existing logic in evictChainedItemForSlabRelease().)

If we delay the insertion into NvmCache until after "isOnlyMoving() == true", we become more at risk of seeing a concurrent lookup of the same key, in which case we will be forced to give up the insertion altogether. (The concurrent lookup has to return quickly to inform the caller whether the key exists or not, and at that point we don't have the item inserted in NvmCache yet, so we are forced to drop the insertion.) This is to avoid a scenario like:

(1) T1 -> Created PutToken (not yet enqueued or still in the queue)
(2) T2 -> GET -> Miss
(3) T1 -> Inserted into NvmCache
(4) T2 -> GET -> Hit <=== Bad. This is a read/write inconsistency. This should return a miss as well.

Today in our logic, we handle this correctly by returning a miss at (4), but it's not ideal. Ideally we want (1) and (3) to be as close as possible to avoid (2) getting in between. Increasing the time gap between (1) and (3) will make (2) more likely to get interleaved.

@@ -2664,7 +2544,7 @@ void CacheAllocator<CacheTrait>::evictForSlabRelease(
auto owningHandle =
item.isChainedItem()
? evictChainedItemForSlabRelease(item.asChainedItem())
: evictNormalItemForSlabRelease(item);
: evictNormalItem(item);
Contributor

@therealgymmy therealgymmy Jun 23, 2022

Here we should always remove the item from the access-container and mm-container.

And then we should wait until the refcount drains.


There's no other thread that can "remove/evict" this item, since when we initially marked it as moving we already gained exclusive access.

In L2567, we check owningHandle->isExclusive(). This can be deprecated, since we can rely on the moving bit to denote exclusiveness. We can wait for isOnlyMoving().

@@ -2721,7 +2601,7 @@ void CacheAllocator<CacheTrait>::evictForSlabRelease(

template <typename CacheTrait>
typename CacheAllocator<CacheTrait>::WriteHandle
CacheAllocator<CacheTrait>::evictNormalItemForSlabRelease(Item& item) {
CacheAllocator<CacheTrait>::evictNormalItem(Item& item) {
Contributor

In L2620, do we still need accessContainer->removeIf to return us a handle?

It seems we just need to return true/false.

template <typename F>
void MM2Q::Container<T, HookPtr>::withEvictionIterator(F&& fun) {
lruMutex_->lock_combine([this, &fun]() {
fun(Iterator{LockHolder{}, lru_.rbegin()});
Contributor

Do we have other call sites of EvictionIterator that require a lock being held?

In this design, we no longer expose the eviction iterator outside the MM-container, so I think we can simplify it to no longer require a lock passed into its constructor.


I.e. can we delete getEvictionIterator()?

Contributor Author

Right, done.

Right now, remove(Iterator& it) is also not used anywhere. However, I think we should keep it. It might be possible to optimize the eviction even further in the future (by removing the element from the MMContainer inside withEvictionIterator).

igchor added 2 commits June 24, 2022 08:35
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@igchor igchor force-pushed the optimize_mmcontainer_locking branch from 386e781 to 215d822 Compare June 24, 2022 17:01
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@igchor igchor changed the title from "RFC: Shorten critical section in findEviction" to "Shorten critical section in findEviction" Jun 29, 2022
@therealgymmy
Contributor

@igchor: cleaning up the slab release logic can be done in a separate PR.

I will need to run this change on a shadow setup for some of our production services. Once I verify the shadows run correctly, we'll be sufficiently confident to approve this PR and merge it in.

@facebook-github-bot
Contributor

@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@igchor
Contributor Author

igchor commented Oct 11, 2022

@therealgymmy I have opened a new PR, #166, which is basically a subset of this one. It would be great if you could take a look.

facebook-github-bot pushed a commit that referenced this pull request Nov 4, 2022
Summary:
This PR refactors the eviction logic (inside `findEviction`) so that it will be easy to add support for multiple memory tiers. The problem with the multi-tier configuration is that the critical section under the MMContainer lock is too long. To fix that we have implemented an algorithm which utilizes WaitContext to shorten the critical section (this will be part of future PRs).

The idea is to use the `moving` (now `exclusive`) bit to synchronize eviction with SlabRelease (and, in the future, with readers). In this PR, I only changed how `findEviction` synchronizes with SlabRelease.

This PR is a subset of: #132
The above PR introduced some performance regressions in the single-memory-tier version which we weren't able to fix yet, hence we decided to first implement this part (which should not affect performance); later we can add a separate path for multi-tier or try to optimize the original patch.

Pull Request resolved: #166

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

CPU (A-B). Verified A/B and B/A have no noticeable cpu difference.
https://pxl.cl/2jDfg

Reviewed By: jaesoo-fb

Differential Revision: D40360742

Pulled By: therealgymmy

fbshipit-source-id: 96b416b07e67172ac797969ca374ecb1267ac5bb
@igchor igchor closed this Nov 4, 2022
facebook-github-bot pushed a commit that referenced this pull request Nov 30, 2022
Summary:
This PR refactors the eviction logic (inside `findEviction`) so that it will be easy to add support for multiple memory tiers. The problem with the multi-tier configuration is that the critical section under the MMContainer lock is too long. To fix that we have implemented an algorithm which utilizes WaitContext to shorten the critical section (this will be part of future PRs).

The idea is to use the `moving` (now `exclusive`) bit to synchronize eviction with SlabRelease (and, in the future, with readers). In this PR, I only changed how `findEviction` synchronizes with SlabRelease.

This PR is a subset of: #132
The above PR introduced some performance regressions in the single-memory-tier version which we weren't able to fix yet, hence we decided to first implement this part (which should not affect performance); later we can add a separate path for multi-tier or try to optimize the original patch.

Pull Request resolved: #166

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Canary on CDN edge cluster where we had allocation failure problems.
Results: https://fburl.com/ods/bxw2bas0

sf canary --sfid traffic_static/bigcache -V fbcdn.bigcache:cd4ae9804d99edbb314cdfd34edfa083 -j maa2c01/ti/edge.bigcache.maa2c01 --task 60 61 62 63 64

Reviewed By: jaesoo-fb

Differential Revision: D41409497

Pulled By: haowu14

fbshipit-source-id: befc189c663778731ada0ea8dddbac5adba8ea36