Skip to content

Commit 12e9c7a

Browse files
committed
[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map. The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow. Several changes to note: (1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names. (2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) ) (3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing. (4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable. (5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive) Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%. Reviewed By: davidxl, snehasish Differential Revision: https://reviews.llvm.org/D147740
1 parent 6ee1aba commit 12e9c7a

File tree

14 files changed

+515
-153
lines changed

14 files changed

+515
-153
lines changed

llvm/include/llvm/ProfileData/SampleProf.h

+197-17
Original file line numberDiff line numberDiff line change
@@ -318,6 +318,14 @@ struct LineLocationHash {
318318

319319
raw_ostream &operator<<(raw_ostream &OS, const LineLocation &Loc);
320320

321+
static inline hash_code hashFuncName(StringRef F) {
322+
// If function name is already MD5 string, do not hash again.
323+
uint64_t Hash;
324+
if (F.getAsInteger(10, Hash))
325+
Hash = MD5Hash(F);
326+
return Hash;
327+
}
328+
321329
/// Representation of a single sample record.
322330
///
323331
/// A sample record is represented by a positive integer value, which
@@ -630,9 +638,13 @@ class SampleContext {
630638
return getContextString(FullContext, false);
631639
}
632640

633-
uint64_t getHashCode() const {
634-
return hasContext() ? hash_value(getContextFrames())
635-
: hash_value(getName());
641+
hash_code getHashCode() const {
642+
if (hasContext())
643+
return hash_value(getContextFrames());
644+
645+
// For non-context function name, use its MD5 as hash value, so that it is
646+
// consistent with the profile map's key.
647+
return hashFuncName(getName());
636648
}
637649

638650
/// Set the name of the function and clear the current context.
@@ -710,9 +722,12 @@ class SampleContext {
710722
uint32_t Attributes;
711723
};
712724

713-
static inline hash_code hash_value(const SampleContext &arg) {
714-
return arg.hasContext() ? hash_value(arg.getContextFrames())
715-
: hash_value(arg.getName());
725+
static inline hash_code hash_value(const SampleContext &Context) {
726+
return Context.getHashCode();
727+
}
728+
729+
inline raw_ostream &operator<<(raw_ostream &OS, const SampleContext &Context) {
730+
return OS << Context.toString();
716731
}
717732

718733
class FunctionSamples;
@@ -1202,6 +1217,9 @@ class FunctionSamples {
12021217
return !(*this == Other);
12031218
}
12041219

1220+
template <typename T>
1221+
const T &getKey() const;
1222+
12051223
private:
12061224
/// CFG hash value for the function.
12071225
uint64_t FunctionHash = 0;
@@ -1265,12 +1283,176 @@ class FunctionSamples {
12651283
const LocToLocMap *IRToProfileLocationMap = nullptr;
12661284
};
12671285

1286+
template <>
1287+
inline const SampleContext &FunctionSamples::getKey<SampleContext>() const {
1288+
return getContext();
1289+
}
1290+
12681291
raw_ostream &operator<<(raw_ostream &OS, const FunctionSamples &FS);
12691292

1270-
using SampleProfileMap =
1271-
std::unordered_map<SampleContext, FunctionSamples, SampleContext::Hash>;
1293+
/// This class is a wrapper to associative container MapT<KeyT, ValueT> using
1294+
/// the hash value of the original key as the new key. This greatly improves the
1295+
/// performance of insert and query operations especially when hash values of
1296+
/// keys are available a priori, and reduces memory usage if KeyT has a large
1297+
/// size.
1298+
/// When performing any action, if an existing entry with a given key is found,
1299+
/// and the interface "KeyT ValueT::getKey<KeyT>() const" to retrieve a value's
1300+
/// original key exists, this class checks if the given key actually matches
1301+
/// the existing entry's original key. If they do not match, this class behaves
1302+
/// as if the entry did not exist (for insertion, this means the new value will
1303+
/// replace the existing entry's value, as if it is newly inserted). If
1304+
/// ValueT::getKey<KeyT>() is not available, all keys with the same hash value
1305+
/// are considered equivalent (i.e. hash collision is silently ignored). Given
1306+
/// such feature this class should only be used where it does not affect
1307+
/// compilation correctness, for example, when loading a sample profile.
1308+
/// Assuming the hashing algorithm is uniform, the probability of hash collision
1309+
/// with 1,000,000 entries is
1310+
/// (2^64)!/((2^64-1000000)!*(2^64)^1000000) ~= 3*10^-8.
1311+
template <template <typename, typename, typename...> typename MapT,
1312+
typename KeyT, typename ValueT, typename... MapTArgs>
1313+
class HashKeyMap : public MapT<hash_code, ValueT, MapTArgs...> {
1314+
public:
1315+
using base_type = MapT<hash_code, ValueT, MapTArgs...>;
1316+
using key_type = hash_code;
1317+
using original_key_type = KeyT;
1318+
using mapped_type = ValueT;
1319+
using value_type = typename base_type::value_type;
12721320

1273-
using NameFunctionSamples = std::pair<SampleContext, const FunctionSamples *>;
1321+
using iterator = typename base_type::iterator;
1322+
using const_iterator = typename base_type::const_iterator;
1323+
1324+
private:
1325+
// If the value type has getKey(), retrieve its original key for comparison.
1326+
template <typename U = mapped_type,
1327+
typename = decltype(U().template getKey<original_key_type>())>
1328+
static bool
1329+
CheckKeyMatch(const original_key_type &Key, const mapped_type &ExistingValue,
1330+
original_key_type *ExistingKeyIfDifferent = nullptr) {
1331+
const original_key_type &ExistingKey =
1332+
ExistingValue.template getKey<original_key_type>();
1333+
bool Result = (Key == ExistingKey);
1334+
if (!Result && ExistingKeyIfDifferent)
1335+
*ExistingKeyIfDifferent = ExistingKey;
1336+
return Result;
1337+
}
1338+
1339+
// If getKey() does not exist, this overload is selected, which assumes all
1340+
// keys with the same hash are equivalent.
1341+
static bool CheckKeyMatch(...) { return true; }
1342+
1343+
public:
1344+
template <typename... Ts>
1345+
std::pair<iterator, bool> try_emplace(const key_type &Hash,
1346+
const original_key_type &Key,
1347+
Ts &&...Args) {
1348+
assert(Hash == hash_value(Key));
1349+
auto Ret = base_type::try_emplace(Hash, std::forward<Ts>(Args)...);
1350+
if (!Ret.second) {
1351+
original_key_type ExistingKey;
1352+
if (LLVM_UNLIKELY(!CheckKeyMatch(Key, Ret.first->second, &ExistingKey))) {
1353+
dbgs() << "MD5 collision detected: " << Key << " and " << ExistingKey
1354+
<< " has same hash value " << Hash << "\n";
1355+
Ret.second = true;
1356+
Ret.first->second = mapped_type(std::forward<Ts>(Args)...);
1357+
}
1358+
}
1359+
return Ret;
1360+
}
1361+
1362+
template <typename... Ts>
1363+
std::pair<iterator, bool> try_emplace(const original_key_type &Key,
1364+
Ts &&...Args) {
1365+
key_type Hash = hash_value(Key);
1366+
return try_emplace(Hash, Key, std::forward<Ts>(Args)...);
1367+
}
1368+
1369+
template <typename... Ts> std::pair<iterator, bool> emplace(Ts &&...Args) {
1370+
return try_emplace(std::forward<Ts>(Args)...);
1371+
}
1372+
1373+
mapped_type &operator[](const original_key_type &Key) {
1374+
return try_emplace(Key, mapped_type()).first->second;
1375+
}
1376+
1377+
iterator find(const original_key_type &Key) {
1378+
key_type Hash = hash_value(Key);
1379+
auto It = base_type::find(Hash);
1380+
if (It != base_type::end())
1381+
if (LLVM_LIKELY(CheckKeyMatch(Key, It->second)))
1382+
return It;
1383+
return base_type::end();
1384+
}
1385+
1386+
const_iterator find(const original_key_type &Key) const {
1387+
key_type Hash = hash_value(Key);
1388+
auto It = base_type::find(Hash);
1389+
if (It != base_type::end())
1390+
if (LLVM_LIKELY(CheckKeyMatch(Key, It->second)))
1391+
return It;
1392+
return base_type::end();
1393+
}
1394+
1395+
size_t erase(const original_key_type &Ctx) {
1396+
auto It = find(Ctx);
1397+
if (It != base_type::end()) {
1398+
base_type::erase(It);
1399+
return 1;
1400+
}
1401+
return 0;
1402+
}
1403+
};
1404+
1405+
/// This class provides operator overloads to the map container using MD5 as the
1406+
/// key type, so that existing code can still work in most cases using
1407+
/// SampleContext as key.
1408+
/// Note: when populating container, make sure to assign the SampleContext to
1409+
/// the mapped value immediately because the key no longer holds it.
1410+
class SampleProfileMap
1411+
: public HashKeyMap<DenseMap, SampleContext, FunctionSamples> {
1412+
public:
1413+
// Convenience method because this is being used in many places. Set the
1414+
// FunctionSamples' context if its newly inserted.
1415+
mapped_type &Create(const SampleContext &Ctx) {
1416+
auto Ret = try_emplace(Ctx, FunctionSamples());
1417+
if (Ret.second)
1418+
Ret.first->second.setContext(Ctx);
1419+
return Ret.first->second;
1420+
}
1421+
1422+
iterator find(const SampleContext &Ctx) {
1423+
return HashKeyMap<llvm::DenseMap, SampleContext, FunctionSamples>::find(
1424+
Ctx);
1425+
}
1426+
1427+
const_iterator find(const SampleContext &Ctx) const {
1428+
return HashKeyMap<llvm::DenseMap, SampleContext, FunctionSamples>::find(
1429+
Ctx);
1430+
}
1431+
1432+
// Overloaded find() to lookup a function by name. This is called by IPO
1433+
// passes with an actual function name, and it is possible that the profile
1434+
// reader converted function names in the profile to MD5 strings, so we need
1435+
// to check if either representation matches.
1436+
iterator find(StringRef Fname) {
1437+
hash_code Hash = hashFuncName(Fname);
1438+
auto It = base_type::find(Hash);
1439+
if (It != end()) {
1440+
StringRef CtxName = It->second.getContext().getName();
1441+
if (LLVM_LIKELY(CtxName == Fname || CtxName == std::to_string(Hash)))
1442+
return It;
1443+
}
1444+
return end();
1445+
}
1446+
1447+
size_t erase(const SampleContext &Ctx) {
1448+
return HashKeyMap<llvm::DenseMap, SampleContext, FunctionSamples>::erase(
1449+
Ctx);
1450+
}
1451+
1452+
size_t erase(const key_type &Key) { return base_type::erase(Key); }
1453+
};
1454+
1455+
using NameFunctionSamples = std::pair<hash_code, const FunctionSamples *>;
12741456

12751457
void sortFuncProfiles(const SampleProfileMap &ProfileMap,
12761458
std::vector<NameFunctionSamples> &SortedProfiles);
@@ -1316,8 +1498,6 @@ class SampleContextTrimmer {
13161498
bool MergeColdContext,
13171499
uint32_t ColdContextFrameLength,
13181500
bool TrimBaseProfileOnly);
1319-
// Canonicalize context profile name and attributes.
1320-
void canonicalizeContextProfiles();
13211501

13221502
private:
13231503
SampleProfileMap &ProfileMap;
@@ -1363,12 +1543,12 @@ class ProfileConverter {
13631543
SampleProfileMap &OutputProfiles,
13641544
bool ProfileIsCS = false) {
13651545
if (ProfileIsCS) {
1366-
for (const auto &I : InputProfiles)
1367-
OutputProfiles[I.second.getName()].merge(I.second);
1368-
// Retain the profile name and clear the full context for each function
1369-
// profile.
1370-
for (auto &I : OutputProfiles)
1371-
I.second.setContext(SampleContext(I.first));
1546+
for (const auto &I : InputProfiles) {
1547+
// Retain the profile name and clear the full context for each function
1548+
// profile.
1549+
FunctionSamples &FS = OutputProfiles.Create(I.second.getName());
1550+
FS.merge(I.second);
1551+
}
13721552
} else {
13731553
for (const auto &I : InputProfiles)
13741554
flattenNestedProfile(OutputProfiles, I.second);

llvm/include/llvm/ProfileData/SampleProfReader.h

+28-24
Original file line numberDiff line numberDiff line change
@@ -347,7 +347,7 @@ class SampleProfileReader {
347347
public:
348348
SampleProfileReader(std::unique_ptr<MemoryBuffer> B, LLVMContext &C,
349349
SampleProfileFormat Format = SPF_None)
350-
: Profiles(0), Ctx(C), Buffer(std::move(B)), Format(Format) {}
350+
: Profiles(), Ctx(C), Buffer(std::move(B)), Format(Format) {}
351351

352352
virtual ~SampleProfileReader() = default;
353353

@@ -383,8 +383,8 @@ class SampleProfileReader {
383383
/// The implementaion to read sample profiles from the associated file.
384384
virtual std::error_code readImpl() = 0;
385385

386-
/// Print the profile for \p FContext on stream \p OS.
387-
void dumpFunctionProfile(SampleContext FContext, raw_ostream &OS = dbgs());
386+
/// Print the profile for \p FunctionSamples on stream \p OS.
387+
void dumpFunctionProfile(const FunctionSamples &FS, raw_ostream &OS = dbgs());
388388

389389
/// Collect functions with definitions in Module M. For reader which
390390
/// support loading function profiles on demand, return true when the
@@ -410,23 +410,15 @@ class SampleProfileReader {
410410
/// Return the samples collected for function \p F, create empty
411411
/// FunctionSamples if it doesn't exist.
412412
FunctionSamples *getOrCreateSamplesFor(const Function &F) {
413-
std::string FGUID;
414413
StringRef CanonName = FunctionSamples::getCanonicalFnName(F);
415-
CanonName = getRepInFormat(CanonName, useMD5(), FGUID);
416-
auto It = Profiles.find(CanonName);
417-
if (It != Profiles.end())
418-
return &It->second;
419-
if (!FGUID.empty()) {
420-
assert(useMD5() && "New name should only be generated for md5 profile");
421-
CanonName = *MD5NameBuffer.insert(FGUID).first;
422-
}
414+
auto it = Profiles.find(CanonName);
415+
if (it != Profiles.end())
416+
return &it->second;
423417
return &Profiles[CanonName];
424418
}
425419

426420
/// Return the samples collected for function \p F.
427-
virtual FunctionSamples *getSamplesFor(StringRef Fname) {
428-
std::string FGUID;
429-
Fname = getRepInFormat(Fname, useMD5(), FGUID);
421+
FunctionSamples *getSamplesFor(StringRef Fname) {
430422
auto It = Profiles.find(Fname);
431423
if (It != Profiles.end())
432424
return &It->second;
@@ -654,15 +646,16 @@ class SampleProfileReaderBinary : public SampleProfileReader {
654646
/// Read the whole name table.
655647
std::error_code readNameTable();
656648

657-
/// Read a string indirectly via the name table.
658-
ErrorOr<StringRef> readStringFromTable();
649+
/// Read a string indirectly via the name table. Optionally return the index.
650+
ErrorOr<StringRef> readStringFromTable(size_t *RetIdx = nullptr);
659651

660-
/// Read a context indirectly via the CSNameTable.
661-
ErrorOr<SampleContextFrames> readContextFromTable();
652+
/// Read a context indirectly via the CSNameTable. Optionally return the
653+
/// index.
654+
ErrorOr<SampleContextFrames> readContextFromTable(size_t *RetIdx = nullptr);
662655

663656
/// Read a context indirectly via the CSNameTable if the profile has context,
664-
/// otherwise same as readStringFromTable.
665-
ErrorOr<SampleContext> readSampleContextFromTable();
657+
/// otherwise same as readStringFromTable, also return its hash value.
658+
ErrorOr<std::pair<SampleContext, hash_code>> readSampleContextFromTable();
666659

667660
/// Points to the current location in the buffer.
668661
const uint8_t *Data = nullptr;
@@ -682,13 +675,24 @@ class SampleProfileReaderBinary : public SampleProfileReader {
682675
/// the lifetime of MD5StringBuf is not shorter than that of NameTable.
683676
std::vector<std::string> MD5StringBuf;
684677

685-
/// The starting address of NameTable containing fixed length MD5.
678+
/// The starting address of fixed length MD5 name table section.
686679
const uint8_t *MD5NameMemStart = nullptr;
687680

688681
/// CSNameTable is used to save full context vectors. It is the backing buffer
689682
/// for SampleContextFrames.
690683
std::vector<SampleContextFrameVector> CSNameTable;
691684

685+
/// Table to cache MD5 values of sample contexts corresponding to
686+
/// readSampleContextFromTable(), used to index into Profiles or
687+
/// FuncOffsetTable.
688+
std::vector<hash_code> MD5SampleContextTable;
689+
690+
/// The starting address of the table of MD5 values of sample contexts. For
691+
/// fixed length MD5 non-CS profile it is same as MD5NameMemStart because
692+
/// hashes of non-CS contexts are already in the profile. Otherwise it points
693+
/// to the start of MD5SampleContextTable.
694+
const hash_code *MD5SampleContextStart = nullptr;
695+
692696
private:
693697
std::error_code readSummaryEntry(std::vector<ProfileSummaryEntry> &Entries);
694698
virtual std::error_code verifySPMagic(uint64_t Magic) = 0;
@@ -762,10 +766,10 @@ class SampleProfileReaderExtBinaryBase : public SampleProfileReaderBinary {
762766

763767
std::unique_ptr<ProfileSymbolList> ProfSymList;
764768

765-
/// The table mapping from function context to the offset of its
769+
/// The table mapping from a function context's MD5 to the offset of its
766770
/// FunctionSample towards file start.
767771
/// At most one of FuncOffsetTable and FuncOffsetList is populated.
768-
DenseMap<SampleContext, uint64_t> FuncOffsetTable;
772+
DenseMap<hash_code, uint64_t> FuncOffsetTable;
769773

770774
/// The list version of FuncOffsetTable. This is used if every entry is
771775
/// being accessed.

llvm/lib/ProfileData/ProfileSummaryBuilder.cpp

+1-3
Original file line numberDiff line numberDiff line change
@@ -204,9 +204,7 @@ SampleProfileSummaryBuilder::computeSummaryForProfiles(
204204
// profiles before computing profile summary.
205205
if (UseContextLessSummary || (sampleprof::FunctionSamples::ProfileIsCS &&
206206
!UseContextLessSummary.getNumOccurrences())) {
207-
for (const auto &I : Profiles) {
208-
ContextLessProfiles[I.second.getName()].merge(I.second);
209-
}
207+
ProfileConverter::flattenProfile(Profiles, ContextLessProfiles, true);
210208
ProfilesToUse = &ContextLessProfiles;
211209
}
212210

0 commit comments

Comments
 (0)