Shorten critical section in findEviction #132

Closed
wants to merge 9 commits

Conversation

igchor
Contributor

@igchor igchor commented Apr 15, 2022

Remove the item from mmContainer and drop the lock before attempting eviction.

The change improves throughput for the default hit_ratio/graph_cache_leader_fbobj config by ~30%. It also reduces p99 latencies significantly. The improvement is even bigger for the multi-tier approach (multiple memory tiers) which we are working on here: https://github.com/pmem/CacheLib

I was not able to find any races/synchronization problems with this approach but there is a good chance I missed something - it would be great if you could review and evaluate this patch.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 15, 2022
@igchor igchor marked this pull request as ready for review April 27, 2022 15:15
@igchor
Contributor Author

igchor commented Apr 28, 2022

The implementation in the first commit suffers from a few problems:

  • the item is added back to the MMContainer if eviction fails (becomes "hot")
  • the item can potentially be destroyed after dropping MMContainer lock (?)
  • there might be some races with SlabRelease

One idea we have for solving these issues is presented in the second patch.

It increments the ref count of the item before dropping the lock, which synchronizes with other threads doing eviction and prevents item destruction.
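Roughly, the idea looks like this (a simplified, standalone sketch of the approach rather than the actual CacheLib code; the types and function names below are made up for illustration):

```cpp
#include <atomic>
#include <list>
#include <mutex>

// Standalone model of the idea, not CacheLib code.
struct Item {
  std::atomic<unsigned> refCount{0};
};

class MMContainer {
 public:
  // Under the container lock: pick the LRU tail, unlink it, and take a
  // reference so the item cannot be destroyed or re-evicted by another
  // thread once we drop the lock.
  Item* removeTailAndAcquireRef() {
    std::lock_guard<std::mutex> g(lock_);
    if (lru_.empty()) {
      return nullptr;
    }
    Item* candidate = lru_.back();
    lru_.pop_back();
    candidate->refCount.fetch_add(1, std::memory_order_acq_rel);
    return candidate;
  }

  void add(Item* it) {
    std::lock_guard<std::mutex> g(lock_);
    lru_.push_front(it);
  }

 private:
  std::mutex lock_;       // stands in for the MMContainer (LRU) lock
  std::list<Item*> lru_;  // front = MRU, back = LRU
};

// Eviction then runs with the lock already released:
//   Item* candidate = mm.removeTailAndAcquireRef();
//   if (candidate) {
//     // remove from the access container, evict to NvmCache, drop our
//     // reference, and free the memory once the refCount reaches 0
//   }
```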

@therealgymmy
Contributor

therealgymmy commented May 23, 2022

@igchor: Hi, I will be reviewing this week. In the meantime, could you run the following cachebench tests? They test the consistency of item values involving chained items.

My concern is around prematurely dropping the eviction lock for a regular item when a concurrent chained-item eviction is happening, and vice versa. An additional concern is the interaction between the slab-release and eviction code paths. I will focus on these during my review.

`cachelib/cachebench/test_configs/consistency/chained-items-stress-moving.json`
`cachelib/cachebench/test_configs/consistency/chained-items-stress-no-moving.json`
`cachelib/cachebench/test_configs/consistency/ram-with-deletes.json`

Contributor

@therealgymmy therealgymmy left a comment

@igchor: Below are the scenarios I went through with this change. There is one potential bug (see (4) and (5)).

When we drop the lock and try to remove an item from mm-container, what happens if:

  1. another thread is trying to evict from the same mmcontainer?
    1. It can’t happen because only one thread can remove an item from AccessContainer. The concurrent thread will fail to remove this item from access-container due to refcount > 1. (Current thread is holding at least 1 refcount, and the concurrent thread must acquire a refcount before proceeding to remove from access-container.)
  2. another thread is trying to evict one of its chained items from the same mmcontainer? And vice-versa?
    1. It can’t happen because only one thread can remove an item from AccessContainer. Either the current thread or the concurrent thread will skip this item.
  3. another thread is trying to evict one of its chained items from a different mmcontainer? And vice-versa?
    1. It can’t happen because only one thread can remove an item from AccessContainer. Either the current thread or the concurrent thread will skip this item.
  4. [BUG] another thread is evicting a slab and trying to move/evict the same item? (CacheAllocator::evictForSlabRelease)
    1. Can an item be marked as moving concurrently as it was evicted from LRU?
      1. This is possible. The intent is that we cannot mark moving unless item is still in LRU. Dropping the lock changes this since we’re doing an atomic compare and exchange with acq_rel, but reading with relaxed. Eventually we can free this item even if the eviction thread had already freed it. This would be a double-free.
        1. Please see “markMoving()” and “isMoving()” in CacheItem-inl.h and Refcount.h
        2. In slab release, we mark items as moving in markMovingForSlabRelease() in CacheAllocator-inl.h
      2. Fix is to make sure the atomic exchange and the read are properly ordered. I’m forgetting the details of the ordering, but at first glance I think reading the moving bit with acquire, and making sure the failure ordering of the exchange is also acquire, should do.
    2. Can an item be moved concurrently as it was evicted from LRU?
      1. Yes. See above.
    3. Can an item be evicted concurrently as it was evicted from LRU?
      1. Yes. See above.
  5. [bug] another thread is evicting a slab and trying to move/evict one of its chained items? And vice versa?
    1. Similar to above. The chained item can be marked as moving even if it was removed from the LRU by the eviction thread. After that we can get into the CacheAllocator-inl.h:2617 if (item.isOnlyMoving()) check, and if the eviction thread had freed the parent and all its chained items, we will pass this check and then free this chained item again. That would be a double-free.

@@ -1267,6 +1274,8 @@ CacheAllocator<CacheTrait>::findEviction(PoolId pid, ClassId cid) {
// from the beginning again
if (!itr) {
itr.resetToBegin();
for (int i = 0; i < searchTries; i++)
Contributor

Doesn't seem like we need this. Is the assumption that the items we previously tried still have an outstanding refcount? (If so, the logic at L1226 should cover it.)

Contributor Author

This was supposed to also cover some other failures (e.g. when the item is marked as moving), but I think we can just check that at the top of the loop.

itr->isChainedItem()
? advanceIteratorAndTryEvictChainedItem(itr)
: advanceIteratorAndTryEvictRegularItem(mmContainer, itr);
auto toReleaseHandle = candidate->isChainedItem()
Contributor

Since we already incremented the refcount at L1231, we no longer need to get a "handle" back. The reason we previously got a handle was to have a positive refcount on the item, so that after we release the lock on the MMContainer, another thread cannot come in and evict this item.

Contributor Author

Right, I reworked this part. I also noticed there was significant code duplication between advanceIteratorAndTryEvictChainedItem and advanceIteratorAndTryEvictRegularItem so I merged them into a single function. Please let me know if this is OK.

@igchor
Contributor Author

igchor commented May 25, 2022

@therealgymmy So, the only needed fix for correctness is to change the memory order on reads (at least for non-chained items)? Why is the relaxed memory order enough right now?

Also, here is my rationale for why the 4th and 5th cases should work (perhaps after fixing the memory order):

In my patch, if the item is removed from the LRU, then it must have also been removed from the AccessContainer. This means that the item cannot be moved, due to this check:

if (!oldItem.isAccessible() || oldItem.isExpired()) {

It can only be evicted.

If the item was already freed in findEviction (before any operation in SlabRelease), markMoving() will fail (we see that the item is freed), so we can just finish.

There could be a problem when SlabRelease calls markMoving() and then evictForSlabRelease just before:

releaseBackToAllocator(itemToRelease, RemoveContext::kEviction,

But at that time, the element is already removed from the LRU, causing markMoving() to fail.

Also, if markMoving() is called after the element is removed from the AccessContainer and before it is removed from the LRU (inside tryEvictRegularItem), then the thread which is doing findEviction will exit before freeing the item (because markMoving is set).

Are there any issues with my reasoning?

@therealgymmy
Contributor

therealgymmy commented Jun 1, 2022

the only needed fix for correctness is to change the memory order on reads (at least for non-chained items)? Why is the relaxed memory order enough right now?

@igchor: please ignore my comments on the mem ordering. I overlooked that after we remove from mm-container, we unmark an item's kLinked bit with acq_rel ordering, which will force the "eviction thread" and "slab release thread" to be correctly ordered.

My concern previously was this:

  1. slab release thread marks an item as moving as it sees item's kLinked bit is still marked
  2. eviction thread sees the item moving bit UNmarked (reading a stale value from (1))

I thought it could happen because L2397 (markMovingForSlabRelease) will mark the bit with acq_rel but L1370 (if (evictHandle->isMoving())) reads it with relaxed. However, the code in L1360, mmContainer.remove(itr), calls Refcount::unmarkInMMContainer(), which unmarks the kLinked bit with acq_rel. This means L1360 and L2397 must be ordered correctly, and thus L1370, which is sequenced after L1360, must be ordered correctly with L2397 as well.
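For reference, here is a minimal standalone model of that argument (not the real RefcountWithFlags code; the bit layout, names, and orderings are only illustrative, assuming, as the discussion above implies, that kLinked and kMoving live in the same atomic word):

```cpp
#include <atomic>
#include <cstdint>

constexpr uint32_t kLinked = 1u << 30;  // "is in MMContainer"
constexpr uint32_t kMoving = 1u << 31;

struct Flags {
  std::atomic<uint32_t> bits{kLinked};

  // Slab-release path (markMovingForSlabRelease): mark moving only while the
  // item is still linked.
  bool markMoving() {
    uint32_t cur = bits.load(std::memory_order_acquire);
    while (cur & kLinked) {
      if (bits.compare_exchange_weak(cur, cur | kMoving,
                                     std::memory_order_acq_rel,
                                     std::memory_order_acquire)) {
        return true;
      }
    }
    return false;  // item already unlinked; slab release gives up
  }

  // Eviction path: mmContainer.remove() clears kLinked with an acq_rel RMW...
  void unmarkInMMContainer() {
    bits.fetch_and(~kLinked, std::memory_order_acq_rel);
  }

  // ...so the isMoving() read the same thread performs afterwards cannot miss
  // a markMoving() that succeeded earlier: both operate on the same atomic
  // word, and a markMoving() that runs after the unlink fails the kLinked
  // check above.
  bool isMoving() const {
    return bits.load(std::memory_order_relaxed) & kMoving;
  }
};
```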

@facebook-github-bot
Contributor

@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@therealgymmy
Contributor

therealgymmy commented Jun 1, 2022

@igchor: can you take care of the comments and rebase this PR to the latest version? I will try importing it again and run through the internal tests.

@igchor igchor force-pushed the optimize_mmcontainer_locking branch from 57c8bd3 to fdb2163 Compare June 2, 2022 12:51
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@igchor
Contributor Author

igchor commented Jun 2, 2022

I've responded to the comments, run the tests you suggested, and reworked the code. I realized there was one other problem: incrementing the ref count under the mmContainer lock was causing races with replaceIf/removeIf. I changed the code to increment the ref count under the AC lock so that it properly synchronizes with predicate evaluation inside replaceIf/removeIf.

I believe that incrementing the refCount without taking the lock is possible but would require changes in other places in the code. The performance impact of taking the AC lock is pretty small (~5% in terms of throughput) for the hit_ratio benchmarks. Performance after this patch is still better than that of the main branch.
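To illustrate the race and the fix (a simplified, standalone sketch; the bucket-locked map below only stands in for the AccessContainer, and none of the names are CacheLib's):

```cpp
#include <atomic>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>

struct Item {
  std::atomic<unsigned> refCount{0};
};

// Stand-in for the AccessContainer: one lock guards the map and, crucially,
// guards predicate evaluation inside removeIf/replaceIf.
class AccessContainer {
 public:
  // removeIf evaluates `pred` while holding the container lock.
  bool removeIf(const std::string& key,
                const std::function<bool(const Item&)>& pred) {
    std::lock_guard<std::mutex> g(lock_);
    auto it = map_.find(key);
    if (it == map_.end() || !pred(*it->second)) {
      return false;
    }
    map_.erase(it);
    return true;
  }

  // Eviction takes its reference under the same lock, so the increment cannot
  // slip in between `pred(...)` returning true and the erase above. Bumping
  // the refCount outside this lock is what allowed the race.
  Item* findAndAcquireRef(const std::string& key) {
    std::lock_guard<std::mutex> g(lock_);
    auto it = map_.find(key);
    if (it == map_.end()) {
      return nullptr;
    }
    it->second->refCount.fetch_add(1, std::memory_order_acq_rel);
    return it->second;
  }

 private:
  std::mutex lock_;
  std::unordered_map<std::string, Item*> map_;
};
```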

@facebook-github-bot
Contributor

@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@therealgymmy
Contributor

therealgymmy commented Jun 7, 2022

@igchor: the following test failed:

cachelib/allocator/tests:cache-allocator-test - BaseAllocatorTest/3.AddChainedItemMultithread

It failed on this line: https://github.com/facebook/CacheLib/blob/main/cachelib/allocator/tests/BaseAllocatorTest.h#L4522

With the following output:

buck-out/dev/gen/aab7ed39/cachelib/allocator/tests/cache-allocator-test#platform010-clang,private-headers/cachelib/allocator/tests/BaseAllocatorTest.h:4522
Expected: (nullptr) != (childItem), actual: (nullptr) vs nullptr
buck-out/dev/gen/aab7ed39/cachelib/allocator/tests/cache-allocator-test#platform010-clang,private-headers/cachelib/allocator/tests/BaseAllocatorTest.h:4522
Expected: (nullptr) != (childItem), actual: (nullptr) vs nullptr
buck-out/dev/gen/aab7ed39/cachelib/allocator/tests/cache-allocator-test#platform010-clang,private-headers/cachelib/allocator/tests/BaseAllocatorTest.h:4522
Expected: (nullptr) != (childItem), actual: (nullptr) vs nullptr

This wasn't a crash; rather, we couldn't allocate a new chained item even after 100 attempts. This shouldn't be possible since we have 3 allocating threads and each thread can hold at most 11 outstanding item handles (1 parent + 10 chained items). By default we walk the bottom 50 items of an eviction queue and evict an item without any refcount, so each of the allocating threads should always succeed in allocating an item.


This also triggered failures in a number of downstream tests from internal services that depend on cachelib. I skimmed over those and they all seem related to allocating chained items.


I changed the code to increment the ref count under AC lock

Did you mean under the hash-table lock? Seems like the change is to get an item handle by calling find()?

auto toReleaseHandle = accessContainer_->find(*candidate);

Contributor

@therealgymmy therealgymmy left a comment

Made a pass at the latest version. I couldn't spot any obvious bugs in the code so far despite the chained item related test failure. Let me know if you can repro it on your side @igchor (I'll poke some more on my side in the meanwhile).

}

auto toReleaseHandle = accessContainer_->find(*candidate);
if (!toReleaseHandle || !toReleaseHandle.isReady()) {
Contributor

We don't need to check isReady(), since we still hold the LRU lock and this item cannot have been evicted to NvmCache.

Comment on lines 1346 to 1349
if (item.isChainedItem())
stats_.evictFailConcurrentFill.inc();
else
stats_.evictFailConcurrentFill.inc();
Contributor

no need for if? we're bumping the same stat


if (toReleaseHandle) {
if (toReleaseHandle->hasChainedItem()) {
bool evicted = tryEvictItem(mmContainer, std::move(toReleaseHandle));
Contributor

Less error-prone if we keep the existing behavior of returning the toReleaseHandle, and then later in this function we explicitly release it and decrement the refcount (near L1267).

++itr;
stats_.evictFailAC.inc();
return evictHandle;
if (item.isChainedItem())
Contributor

This cannot be a chained item since we're now always passing the parent item handle into this function.

(We should find a way to keep this stat updated properly though; maybe we can update it in the findEviction() function instead.)

if (evictHandle->isMoving()) {
stats_.evictFailMove.inc();
return WriteHandle{};
if (item.isChainedItem())
Contributor

Same as L1362

@igchor
Contributor Author

igchor commented Jun 9, 2022

@therealgymmy Thank you for the feedback!

I managed to reproduce the issue locally and will work on fixing it.

Did you mean under the hash-table lock? Seems like the change is to get an item handle by calling find()?

Yes, under the hash-table (Access Container) lock. I actually realized that instead of increasing the refCount (which requires this hash-table lock) we could just mark an item as moving in findEviction (if not already marked). This would prevent any other thread from releasing this item, and we could even reuse the existing evictNormalItemForSlabRelease method (for ChainedItems we would still need a separate code path).

I have just one concern/question regarding this approach. In the current implementation, is it possible for multiple threads to mark the same item as moving? (I don't see any check for that in markMoving().) If not, how is this prevented? Is there an assumption that only one thread can do SlabRelease at any point?

@therealgymmy
Contributor

therealgymmy commented Jun 10, 2022 via email

@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@igchor igchor force-pushed the optimize_mmcontainer_locking branch from 28d84df to f0e8314 Compare June 13, 2022 17:15
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@igchor igchor force-pushed the optimize_mmcontainer_locking branch from f0e8314 to 7891133 Compare June 13, 2022 17:20
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@igchor
Contributor Author

igchor commented Jun 13, 2022

@therealgymmy I found out what the issue was: I was passing candidate to releaseBackToAllocator, which after my changes represented the parent Item. We should instead pass a pointer to the child we actually want to recycle.

Also, instead of increasing the refCount of the item, I now rely on the moving flag. I changed the markMoving function to only succeed if the item is not yet marked as moving. This guarantees proper synchronization with other evicting threads and with the SlabRelease thread. Once the item is marked as moving inside findEviction we can just execute the function which is used in SlabReleaseImpl for regular items - this removes some of the code duplication.

If the eviction fails (due to the refCount being != 0) we unmark the item and check the flags. If the flags are 0, the item is not used anywhere, was not freed by any other thread while it was marked moving, and has already been removed from the AC and MMContainer by another thread, so we can recycle it anyway.
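In (again simplified, non-CacheLib) code, the changed semantics look roughly like this:

```cpp
#include <atomic>
#include <cstdint>

// Illustrative flags word, not CacheLib's RefcountWithFlags: the low bits
// hold the access refcount, the top bits hold the admin flags.
constexpr uint32_t kLinked = 1u << 30;
constexpr uint32_t kMoving = 1u << 31;

struct Refcount {
  std::atomic<uint32_t> bits{kLinked};

  // New semantics: succeeds only if the item is still linked AND not already
  // marked moving, so exactly one thread (evictor or slab release) wins.
  bool markMoving() {
    uint32_t cur = bits.load(std::memory_order_acquire);
    while ((cur & kLinked) && !(cur & kMoving)) {
      if (bits.compare_exchange_weak(cur, cur | kMoving,
                                     std::memory_order_acq_rel,
                                     std::memory_order_acquire)) {
        return true;
      }
    }
    return false;
  }

  // Unmark atomically and hand the resulting word back to the caller.
  uint32_t unmarkMoving() {
    return bits.fetch_and(~kMoving, std::memory_order_acq_rel) & ~kMoving;
  }
};

// Caller side (findEviction), when the eviction attempt fails:
//   uint32_t ref = rc.unmarkMoving();
//   if (ref == 0) {
//     // No refcount and no admin bits left: another thread already removed
//     // the item from the AC and MMContainer but could not free it because
//     // the moving bit was set, so we must recycle it here to avoid a leak.
//   }
```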

After I implemented the above change I also realized it would be quite easy to use combined locking for the eviction iterator, which I also implemented (I took the idea from https://github.com/facebook/CacheLib/blob/main/cachelib/allocator/MM2Q-inl.h#L256). The performance results are pretty good now; we can share the exact numbers with you once we finish running all the benchmarks.
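The combined-locking change, sketched with a plain std::mutex (MM2Q really uses folly::DistributedMutex::lock_combine, which additionally lets the lock holder execute waiters' critical sections; everything below is a simplified illustration):

```cpp
#include <list>
#include <mutex>
#include <utility>

struct Item {};

// Before: getEvictionIterator() handed out an iterator object that kept the
// LRU lock held for as long as the caller kept the iterator alive.
// After: the caller passes a callback and the container runs it under the
// lock, so the critical section lasts exactly as long as the callback.
class Lru {
 public:
  template <typename F>
  void withEvictionIterator(F&& fn) {
    std::lock_guard<std::mutex> g(lruMutex_);
    std::forward<F>(fn)(lru_.rbegin(), lru_.rend());  // iterate from LRU end
  }

  void add(Item* it) {
    std::lock_guard<std::mutex> g(lruMutex_);
    lru_.push_back(it);
  }

 private:
  std::mutex lruMutex_;
  std::list<Item*> lru_;
};

// Usage from findEviction-like code: pick a candidate under the lock, then do
// the heavy eviction work after withEvictionIterator returns.
//   Item* candidate = nullptr;
//   lru.withEvictionIterator([&](auto it, auto end) {
//     if (it != end) { candidate = *it; }
//   });
```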

@facebook-github-bot
Contributor

@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -2727,6 +2627,11 @@ CacheAllocator<CacheTrait>::evictNormalItemForSlabRelease(Item& item) {
auto token = evictToNvmCache ? nvmCache_->createPutToken(item.getKey())
: typename NvmCacheT::PutToken{};

if (skipIfTokenInvalid && evictToNvmCache && !token.isValid()) {
Contributor

I think we can actually remove this logic. I don't remember clearly the rationale behind checking token validity right after creating it in advanceIteratorAndTryEvictRegularItem().

Maybe it was to avoid evicting an item prematurely (when we cannot insert it into nvm-cache). But this should be fairly rare in practice.

Contributor Author

Done (in a separate commit). Should I also remove the corresponding stat evictFailConcurrentFill, or leave it for compatibility?

Also this stat has the following comment next to it: "Eviction failures because this item has a potential concurrent fill from nvm cache. For consistency reason, we cannot evict it. Refer to NvmCache.h for more details."

* An item can only be marked moving when `isInMMContainer` returns true.
* This operation is atomic.
* An item can only be marked moving when `isInMMContainer` returns true and
* the item is not yet marked as moving. This operation is atomic.
Contributor

hah, this is clever!

I will go through the code in more detail to reason about the logic change, but in theory this makes sense.

Minor comment. Please rename this API to markExclusive, since that is our intent now: marking an item as exclusive so cachelib can gain exclusive access to it internally.

Contributor Author

Hm, there is already a function isExclusive in Refcount.h. Should we rename it to something else?

Contributor

I think we can deprecate isExclusive() altogether. This change actually calls for a much bigger refactor in the slab release code path.

The moving bit now denotes "exclusive" access to an item. Slab release should also use it for gaining exclusive access, and the current notion of holding an active refcount to prevent someone else from removing/evicting an item can be fully deprecated.

Will elaborate in separate places about slab release.

Contributor Author

Ok, done.

Contributor Author

I'm just not sure if isOnlyExclusive is a good name... Maybe you have some ideas?


if (toReleaseHandle) {
if (toReleaseHandle->hasChainedItem()) {
if (toReleaseHandle || ref == 0u) {
Contributor

Why can we rely on just ref == 0 to proceed with eviction?

Without a toReleaseHandle, it could mean we didn't remove the item (or its parent item) from access-container. It's not safe to evict it since it can still be accessed.

Contributor Author

Please see my answer below.

I actually think that we might destroy the toReleaseHandle first (unconditionally) and then rely only on ref == 0u. toReleaseHandle will not actually release anything on destruction since we have the moving bit set. And if unmarkMoving() returns 0, we know that either we or some other thread managed to remove the item from the AC and MMContainer and no one is using the item. I'll check this.

return toRecycle;
}
} else if (ref == 0u) {
// it's safe to recycle the item here as there are no more
Contributor

Is this true? If toReleaseHandle is empty, we could have returned early from evictNormalItem() without removing this item from the access-container or mm-container.

Contributor Author

My thinking here was that this case (toReleaseHandle is NULL and ref == 0u) might happen in the following (rare) scenario:

We have failed to remove the item from the AccessContainer (due to its refCount being > 0). However, after that, and before we called unmarkMoving(), someone else removed the element from the AccessContainer and MMContainer and decreased the ref count to zero. That other thread couldn't release the item to the allocator (since it's marked as moving, and this branch will not be taken: https://github.com/facebook/CacheLib/blob/main/cachelib/allocator/CacheAllocator-inl.h#L907). This means that we can (and should, to avoid a memory leak) safely recycle the item (ref == 0u means no outstanding references and the Admin/Control bits are zero).

Contributor

@igchor: ah I see. Yes, you're right. And this is a good point too. In fact, after we have called unmarkMoving, if the ref is 0, then we must free/recycle the memory, because whichever other thread removed it couldn't free the memory due to the moving bit.

? advanceIteratorAndTryEvictChainedItem(itr)
: advanceIteratorAndTryEvictRegularItem(mmContainer, itr);
evictNormalItem(*candidate, true /* skipIfTokenInvalid */);
auto ref = candidate->unmarkMoving();
Contributor

@therealgymmy therealgymmy Jun 17, 2022

Can you take the const auto ref = decRef(itemToRelease); at L1298 and change it to just ref = ...;?

Internal build is failing due to a local shadow variable. (Unfortunately, I cannot change the code of a GitHub-backed PR.) I will reimport to run internal tests.

Contributor Author

Done. I've also added one additional commit which always destroys the handle prior to unmarking the item (simplifies the code a bit).

igchor added 2 commits June 20, 2022 11:01
Remove the item from mmContainer and drop the lock before attempting
eviction.
igchor added 5 commits June 20, 2022 11:01
to avoid races with remove/replace with predicate.

Also, refactor tryEvictRegularItem and tryEvictChainedItem into
a single function.
moving bit is used to give exclusive right to evict the item
to a particular thread.

Originally, there was an assumption that whoever marked the item
as moving will try to free it until it succeeds. Since we don't want
to do that in findEviction (it can potentially take a long time) we need
to make sure that unmarking is safe.

This patch checks the flags after unmarking (atomically) and if ref is
zero it also recycles the item. This is needed as there might be some
concurrent thread releasing the item (and decrementing the ref count). If
the moving bit is set, that thread would not free the memory back to the
allocator, resulting in a memory leak on unmarkMoving().
under MMContainer combined_lock.
Checking token validity should not be needed
right after creating it.
in findEviction, and rely only on the refcount
being zero when releasing items back to the allocator.
@igchor igchor force-pushed the optimize_mmcontainer_locking branch from 7891133 to 782e18b Compare June 20, 2022 15:32
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@igchor igchor force-pushed the optimize_mmcontainer_locking branch from 782e18b to cbbbbe3 Compare June 20, 2022 15:39
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Contributor

@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@therealgymmy therealgymmy left a comment

Changes look great! I left high-level comments on simplifying eviction/slab-release further.

Internal unit tests have passed. I will move on to testing this on prod systems.


Do you have the latest numbers for the cachebench throughput improvement before/after this change?

@@ -2664,7 +2544,7 @@ void CacheAllocator<CacheTrait>::evictForSlabRelease(
auto owningHandle =
item.isChainedItem()
? evictChainedItemForSlabRelease(item.asChainedItem())
Contributor

Can we try to mark a chained item's parent as "moving" and then unify the eviction logic with the regular-item path?

The intent of this function is to remove an item from "access container" and "mm container", and then we will just wait until all active refcounts are drained.

The logic in my mind can look like this:

auto* candidate = &item;
if (item.isChainedItem()) {
  candidate = /* get parent item */;
  
  // if this failed, we busy-loop until chained item isOnlyMoving, as we're now
  // expecting another thread will free parent and this item's other siblings
  if (!candidate->markMoving()) {
    while (!item.isOnlyMoving()) {}
    
    // Free chained item
  }
}

// Remove regular item from access-container and mmcontainer,
// and wait until isOnlyMoving()

Contributor Author

Would you be OK with doing this refactor for Slab Release in a separate PR (after this one is merged)?

I have done some of the refactorings but I have some concerns about their correctness and performance implications (I would need to spend some more time testing them and understanding evictChainedItemForSlabRelease more deeply). I also believe some other parts of the code could be simplified as well after this change. I could work on that incrementally.

One specific concern with the approach you suggested is: when should we call nvmCache->put? If we created the putToken before removing the item from the AccessContainer and MMContainer and issued the actual put after item.isOnlyMoving(), it would result in keeping the key in the inFlightPut map longer - is this OK?

Contributor

Would you be OK with doing this refactor for Slab Release in a separate PR (after this one is merged)?

Yes that is totally fine.

One specific concern with the approach you suggested is: when should we call nvmCache->put? If we created the putToken before removing the item from the AccessContainer and MMContainer and issued the actual put after item.isOnlyMoving(), it would result in keeping the key in the inFlightPut map longer - is this OK?

Oh, we should issue a put to NvmCache earlier, as soon as we have removed the RAM parent item from the access-container. This is because from the DRAM cache's perspective it is already invisible, and it's expected that this item is on the queue to get inserted into NvmCache. (Similar to the existing logic in evictChainedItemForSlabRelease().)

If we delay the insertion into NvmCache until after "isOnlyMoving() == true", we become more at risk of seeing a concurrent lookup of the same key, in which case we will be forced to give up the insertion altogether. (The concurrent lookup has to return quickly to inform the caller whether the key exists or not, and at that point we don't have the item inserted in NvmCache yet, so we are forced to drop the insertion.) This is to avoid a scenario like:

(1) T1 -> Created PutToken (not yet enqueued or still in the queue)
(2) T2 -> GET -> Miss
(3) T1 -> Inserted into NvmCache
(4) T2 -> GET -> Hit <=== Bad. This is a read/write inconsistency. This should return a miss as well.

Today in our logic, we handle this correctly by returning a miss at (4), but it's not ideal. Ideally we want (1) and (3) to be as close as possible to avoid (2) getting in between. Increasing the time gap between (1) and (3) will make (2) more likely to get interleaved.

@@ -2664,7 +2544,7 @@ void CacheAllocator<CacheTrait>::evictForSlabRelease(
auto owningHandle =
item.isChainedItem()
? evictChainedItemForSlabRelease(item.asChainedItem())
: evictNormalItemForSlabRelease(item);
: evictNormalItem(item);
Contributor

@therealgymmy therealgymmy Jun 23, 2022

Here we should always remove the item from the access-container and mm-container.

And then we should wait until the refcount drains.


There's no other thread that can "remove/evict" this item, since when we initially marked it as moving we already gained exclusive access.

In L2567, we check owningHandle->isExclusive(). This can be deprecated, since we can rely on the moving bit to denote exclusiveness. We can wait for isOnlyMoving().

@@ -2721,7 +2601,7 @@ void CacheAllocator<CacheTrait>::evictForSlabRelease(

template <typename CacheTrait>
typename CacheAllocator<CacheTrait>::WriteHandle
CacheAllocator<CacheTrait>::evictNormalItemForSlabRelease(Item& item) {
CacheAllocator<CacheTrait>::evictNormalItem(Item& item) {
Contributor

In L2620, do we still need accessContainer->removeIf to return us a handle?

It seems we just need to return true/false.

template <typename F>
void MM2Q::Container<T, HookPtr>::withEvictionIterator(F&& fun) {
lruMutex_->lock_combine([this, &fun]() {
fun(Iterator{LockHolder{}, lru_.rbegin()});
Contributor

Do we have other call sites of EvictionIterator that require a lock being held?

In this design, we no longer expose the eviction iterator outside the MM-container, so I think we can simplify it to no longer require a lock passed into its constructor.


I.e. can we delete getEvictionIterator()?

Contributor Author

Right, done.

Right now, remove(Iterator& it) is also not used anywhere. However, I think we should keep it. It might be possible to optimize the eviction even further in the future (by removing the element from the MMContainer inside withEvictionIterator).

igchor added 2 commits June 24, 2022 08:35
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@igchor igchor force-pushed the optimize_mmcontainer_locking branch from 386e781 to 215d822 Compare June 24, 2022 17:01
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@igchor igchor changed the title from "RFC: Shorten critical section in findEviction" to "Shorten critical section in findEviction" Jun 29, 2022
@therealgymmy
Contributor

@igchor: cleaning up the slab release logic can be done in a separate PR.

I will need to run this change on a shadow setup for some of our production services. Once I verify the shadows run correctly, we'll be sufficiently confident to approve this PR and merge it in.

@facebook-github-bot
Contributor

@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@igchor
Contributor Author

igchor commented Oct 11, 2022

@therealgymmy I have opened a new PR, #166, which is basically a subset of this one. It would be great if you could take a look.

facebook-github-bot pushed a commit that referenced this pull request Nov 4, 2022
Summary:
This PR refactors the eviction logic (inside `findEviction`) so that it will be easy to add support for multiple memory tiers. The problem with the multi-tier configuration is that the critical section under the MMContainer lock is too long. To fix that we have implemented an algorithm which utilizes WaitContext to shorten the critical section (this will be part of future PRs).

The idea is to use the `moving` (now `exclusive`) bit to synchronize eviction with SlabRelease (and, in the future, with readers). In this PR, I only changed how `findEviction` synchronizes with SlabRelease.

This PR is a subset of: #132
The above PR introduced some performance regressions in the single-memory-tier version which we weren't able to fix yet, hence we decided to first implement this part (which should not affect performance); later we can add a separate path for multi-tier or try to optimize the original patch.

Pull Request resolved: #166

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

CPU (A-B). Verified A/B and B/A have no noticeable cpu difference.
https://pxl.cl/2jDfg

Reviewed By: jaesoo-fb

Differential Revision: D40360742

Pulled By: therealgymmy

fbshipit-source-id: 96b416b07e67172ac797969ca374ecb1267ac5bb
@igchor igchor closed this Nov 4, 2022
facebook-github-bot pushed a commit that referenced this pull request Nov 30, 2022
Summary:
This PR refactors the eviction logic (inside `findEviction`) so that it will be easy to add support for multiple memory tiers. The problem with the multi-tier configuration is that the critical section under the MMContainer lock is too long. To fix that we have implemented an algorithm which utilizes WaitContext to shorten the critical section (this will be part of future PRs).

The idea is to use the `moving` (now `exclusive`) bit to synchronize eviction with SlabRelease (and, in the future, with readers). In this PR, I only changed how `findEviction` synchronizes with SlabRelease.

This PR is a subset of: #132
The above PR introduced some performance regressions in the single-memory-tier version which we weren't able to fix yet, hence we decided to first implement this part (which should not affect performance); later we can add a separate path for multi-tier or try to optimize the original patch.

Pull Request resolved: #166

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Canary on CDN edge cluster where we had allocation failure problems.
Results: https://fburl.com/ods/bxw2bas0

sf canary --sfid traffic_static/bigcache -V fbcdn.bigcache:cd4ae9804d99edbb314cdfd34edfa083 -j maa2c01/ti/edge.bigcache.maa2c01 --task 60 61 62 63 64

Reviewed By: jaesoo-fb

Differential Revision: D41409497

Pulled By: haowu14

fbshipit-source-id: befc189c663778731ada0ea8dddbac5adba8ea36