-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[ASan][libc++] std::basic_string annotations #72677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ASan][libc++] std::basic_string annotations #72677
Conversation
@llvm/pr-subscribers-libcxxabi @llvm/pr-subscribers-libcxx Author: Tacet (AdvenamTacet) ChangesThis commit turns on basic annotations for std::basic_string, similar to those in std::vector and std::deque. Notice: those annotations work if and only if libc++ and libc++abi are instrumented (with ASan) as well! Originally proposed here: https://reviews.llvm.org/D132769 Related patches: Turning on annotations for all allocators: https://reviews.llvm.org/D146214 This revision is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in std::vector, to std::string and std::deque collections. These changes allow ASan to detect cases when the instrumented program accesses memory which is internally allocated by the collection but is still not in-use (accesses before or after the stored elements for std::deque, or between the size and capacity bounds for std::string). The motivation for the research and those changes was a bug, found by Trail of Bits, in a real code where an out-of-bounds read could happen as two strings were compared via a std::equals function that took iter1_begin, iter1_end, iter2_begin iterators (with a custom comparison function). When object iter1 was longer than iter2, read out-of-bounds on iter2 could happen. Container sanitization would detect it. This Pull Request adds annotations for As D132522 extended possibilities of If you have any questions, please email:
Patch is 129.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/72677.diff 83 Files Affected:
diff --git a/libcxx/include/string b/libcxx/include/string
index 9c2efac8006bd72..9362b1bcd16fa19 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -645,6 +645,16 @@ basic_string<char32_t> operator""s( const char32_t *str, size_t len );
_LIBCPP_PUSH_MACROS
#include <__undef_macros>
+#ifndef _LIBCPP_HAS_NO_ASAN
+# define _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS __attribute__((__no_sanitize__("address")))
+// Sometimes string access own memory, which should not be accessed by different parts of program
+// (eg. not-in-use bytes of buffer in short string) and are poisoned by ASan.
+// This macro turns off instrumentation in a function, so memory accesses which normally would crash
+// work as expected.
+#else
+# define _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
+#endif
+#define _LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED false
_LIBCPP_BEGIN_NAMESPACE_STD
@@ -702,6 +712,9 @@ struct __init_with_sentinel_tag {};
template<class _CharT, class _Traits, class _Allocator>
class basic_string
{
+private:
+ using __default_allocator_type = allocator<_CharT>;
+
public:
typedef basic_string __self;
typedef basic_string_view<_CharT, _Traits> __self_view;
@@ -888,34 +901,46 @@ public:
#endif
: __r_(__value_init_tag(), __a) {}
- _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string(const basic_string& __str)
+ _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS basic_string(const basic_string& __str)
: __r_(__default_init_tag(), __alloc_traits::select_on_container_copy_construction(__str.__alloc())) {
if (!__str.__is_long())
+ {
__r_.first() = __str.__r_.first();
+ __annotate_new(__get_short_size());
+ }
else
__init_copy_ctor_external(std::__to_address(__str.__get_long_pointer()), __str.__get_long_size());
}
- _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string(const basic_string& __str, const allocator_type& __a)
+ _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS basic_string(const basic_string& __str, const allocator_type& __a)
: __r_(__default_init_tag(), __a) {
if (!__str.__is_long())
+ {
__r_.first() = __str.__r_.first();
+ __annotate_new(__get_short_size());
+ }
else
__init_copy_ctor_external(std::__to_address(__str.__get_long_pointer()), __str.__get_long_size());
}
#ifndef _LIBCPP_CXX03_LANG
- _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string(basic_string&& __str)
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
+ basic_string(basic_string&& __str)
# if _LIBCPP_STD_VER <= 14
_NOEXCEPT_(is_nothrow_move_constructible<allocator_type>::value)
# else
_NOEXCEPT
# endif
- : __r_(std::move(__str.__r_)) {
+ // Turning off ASan instrumentation for variable initialization with _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
+ // is inconsistent and that initialization may be annotated.
+ // Therefore, to copy __str memory, we have to unpoison it first (if object is poisoned and not external buffer,
+ // so short string case).
+ : __r_( ( (__str.__is_long() ? 0 : (__str.__annotate_delete(), 0)), std::move(__str.__r_)) ) {
__str.__r_.first() = __rep();
}
- _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string(basic_string&& __str, const allocator_type& __a)
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
+ basic_string(basic_string&& __str, const allocator_type& __a)
: __r_(__default_init_tag(), __a) {
if (__str.__is_long() && __a != __str.__alloc()) // copy, not move
__init(std::__to_address(__str.__get_long_pointer()), __str.__get_long_size());
@@ -923,6 +948,9 @@ public:
if (__libcpp_is_constant_evaluated())
__r_.first() = __rep();
__r_.first() = __str.__r_.first();
+ if (!__str.__is_long()) {
+ __str.__annotate_delete();
+ }
__str.__r_.first() = __rep();
}
}
@@ -965,11 +993,11 @@ public:
}
#if _LIBCPP_STD_VER >= 23
- _LIBCPP_HIDE_FROM_ABI constexpr
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS constexpr
basic_string(basic_string&& __str, size_type __pos, const _Allocator& __alloc = _Allocator())
: basic_string(std::move(__str), __pos, npos, __alloc) {}
- _LIBCPP_HIDE_FROM_ABI constexpr
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS constexpr
basic_string(basic_string&& __str, size_type __pos, size_type __n, const _Allocator& __alloc = _Allocator())
: __r_(__default_init_tag(), __alloc) {
if (__pos > __str.size())
@@ -1081,6 +1109,7 @@ public:
#endif // _LIBCPP_CXX03_LANG
inline _LIBCPP_CONSTEXPR_SINCE_CXX20 ~basic_string() {
+ __annotate_delete();
if (__is_long())
__alloc_traits::deallocate(__alloc(), __get_long_pointer(), __get_long_cap());
}
@@ -1098,8 +1127,8 @@ public:
}
#ifndef _LIBCPP_CXX03_LANG
- _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string& operator=(basic_string&& __str)
- _NOEXCEPT_((__noexcept_move_assign_container<_Allocator, __alloc_traits>::value)) {
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS basic_string&
+ operator=(basic_string&& __str) _NOEXCEPT_((__noexcept_move_assign_container<_Allocator, __alloc_traits>::value)) {
__move_assign(__str, integral_constant<bool, __alloc_traits::propagate_on_container_move_assignment::value>());
return *this;
}
@@ -1112,7 +1141,7 @@ public:
#if _LIBCPP_STD_VER >= 23
basic_string& operator=(nullptr_t) = delete;
#endif
- _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string& operator=(value_type __c);
+ _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS basic_string& operator=(value_type __c);
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
iterator begin() _NOEXCEPT
@@ -1329,7 +1358,7 @@ public:
}
#if _LIBCPP_STD_VER >= 20
- _LIBCPP_HIDE_FROM_ABI constexpr
+ _LIBCPP_HIDE_FROM_ABI constexpr _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
void __move_assign(basic_string&& __str, size_type __pos, size_type __len) {
// Pilfer the allocation from __str.
_LIBCPP_ASSERT_INTERNAL(__alloc() == __str.__alloc(), "__move_assign called with wrong allocator");
@@ -1345,7 +1374,7 @@ public:
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
basic_string& assign(const basic_string& __str) { return *this = __str; }
#ifndef _LIBCPP_CXX03_LANG
- _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
basic_string& assign(basic_string&& __str)
_NOEXCEPT_((__noexcept_move_assign_container<_Allocator, __alloc_traits>::value))
{*this = std::move(__str); return *this;}
@@ -1736,7 +1765,7 @@ private:
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __shrink_or_extend(size_type __target_capacity);
- _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
bool __is_long() const _NOEXCEPT {
if (__libcpp_is_constant_evaluated() && __builtin_constant_p(__r_.first().__l.__is_long_)) {
return __r_.first().__l.__is_long_;
@@ -1776,6 +1805,7 @@ private:
value_type* __p;
if (__cap - __sz >= __n)
{
+ __annotate_increase(__n);
__p = std::__to_address(__get_pointer());
size_type __n_move = __sz - __ip;
if (__n_move != 0)
@@ -1802,7 +1832,7 @@ private:
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 allocator_type& __alloc() _NOEXCEPT { return __r_.second(); }
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR const allocator_type& __alloc() const _NOEXCEPT { return __r_.second(); }
- _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
void __set_short_size(size_type __s) _NOEXCEPT {
_LIBCPP_ASSERT_INTERNAL(
__s < __min_cap, "__s should never be greater than or equal to the short string capacity");
@@ -1810,7 +1840,7 @@ private:
__r_.first().__s.__is_long_ = false;
}
- _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
size_type __get_short_size() const _NOEXCEPT {
_LIBCPP_ASSERT_INTERNAL(
!__r_.first().__s.__is_long_, "String has to be short when trying to get the short size");
@@ -1860,6 +1890,45 @@ private:
const_pointer __get_pointer() const _NOEXCEPT
{return __is_long() ? __get_long_pointer() : __get_short_pointer();}
+ // The following functions are no-ops outside of AddressSanitizer mode.
+#ifndef _LIBCPP_HAS_NO_ASAN
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_contiguous_container(
+ const void* __old_mid, const void* __new_mid) const {
+ const void* __begin = data();
+ const void* __end = data() + capacity() + 1;
+ if (!__libcpp_is_constant_evaluated() && __begin != nullptr && is_same<allocator_type, __default_allocator_type>::value)
+ __sanitizer_annotate_contiguous_container(__begin, __end, __old_mid, __new_mid);
+ }
+#else
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
+ __annotate_contiguous_container(const void*, const void*) const {}
+#endif
+
+ // ASan: short string is poisoned if and only if this function returns true.
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 bool __asan_short_string_is_annotated() const _NOEXCEPT {
+ return _LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED && !__libcpp_is_constant_evaluated();
+ }
+
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_new(size_type __current_size) const _NOEXCEPT {
+ if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
+ __annotate_contiguous_container(data() + capacity() + 1, data() + __current_size + 1);
+ }
+
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_delete() const _NOEXCEPT {
+ if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
+ __annotate_contiguous_container(data() + size() + 1, data() + capacity() + 1);
+ }
+
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_increase(size_type __n) const _NOEXCEPT {
+ if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
+ __annotate_contiguous_container(data() + size() + 1, data() + size() + 1 + __n);
+ }
+
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_shrink(size_type __old_size) const _NOEXCEPT {
+ if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
+ __annotate_contiguous_container(data() + __old_size + 1, data() + size() + 1);
+ }
+
template <size_type __a> static
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
size_type __align_it(size_type __s) _NOEXCEPT
@@ -1962,6 +2031,7 @@ private:
}
else
{
+ __annotate_delete();
allocator_type __a = __str.__alloc();
auto __allocation = std::__allocate_at_least(__a, __str.__get_long_cap());
__begin_lifetime(__allocation.ptr, __allocation.count);
@@ -1971,6 +2041,7 @@ private:
__set_long_pointer(__allocation.ptr);
__set_long_cap(__allocation.count);
__set_long_size(__str.size());
+ __annotate_new(__get_long_size());
}
}
}
@@ -2018,18 +2089,28 @@ private:
// Assigns the value in __s, guaranteed to be __n < __min_cap in length.
inline _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string& __assign_short(const value_type* __s, size_type __n) {
+ size_type __old_size = size();
+ if (__n > __old_size)
+ __annotate_increase(__n - __old_size);
pointer __p = __is_long()
? (__set_long_size(__n), __get_long_pointer())
: (__set_short_size(__n), __get_short_pointer());
traits_type::move(std::__to_address(__p), __s, __n);
traits_type::assign(__p[__n], value_type());
+ if (__old_size > __n)
+ __annotate_shrink(__old_size);
return *this;
}
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
basic_string& __null_terminate_at(value_type* __p, size_type __newsz) {
+ size_type __old_size = size();
+ if (__newsz > __old_size)
+ __annotate_increase(__newsz - __old_size);
__set_size(__newsz);
traits_type::assign(__p[__newsz], value_type());
+ if (__old_size > __newsz)
+ __annotate_shrink(__old_size);
return *this;
}
@@ -2136,6 +2217,7 @@ void basic_string<_CharT, _Traits, _Allocator>::__init(const value_type* __s,
}
traits_type::copy(std::__to_address(__p), __s, __sz);
traits_type::assign(__p[__sz], value_type());
+ __annotate_new(__sz);
}
template <class _CharT, class _Traits, class _Allocator>
@@ -2164,6 +2246,7 @@ basic_string<_CharT, _Traits, _Allocator>::__init(const value_type* __s, size_ty
}
traits_type::copy(std::__to_address(__p), __s, __sz);
traits_type::assign(__p[__sz], value_type());
+ __annotate_new(__sz);
}
template <class _CharT, class _Traits, class _Allocator>
@@ -2188,6 +2271,7 @@ void basic_string<_CharT, _Traits, _Allocator>::__init_copy_ctor_external(
__set_long_size(__sz);
}
traits_type::copy(std::__to_address(__p), __s, __sz + 1);
+ __annotate_new(__sz);
}
template <class _CharT, class _Traits, class _Allocator>
@@ -2217,6 +2301,7 @@ basic_string<_CharT, _Traits, _Allocator>::__init(size_type __n, value_type __c)
}
traits_type::assign(std::__to_address(__p), __n, __c);
traits_type::assign(__p[__n], value_type());
+ __annotate_new(__n);
}
template <class _CharT, class _Traits, class _Allocator>
@@ -2243,6 +2328,7 @@ void basic_string<_CharT, _Traits, _Allocator>::__init_with_sentinel(_InputItera
}
catch (...)
{
+ __annotate_delete();
if (__is_long())
__alloc_traits::deallocate(__alloc(), __get_long_pointer(), __get_long_cap());
throw;
@@ -2298,11 +2384,13 @@ void basic_string<_CharT, _Traits, _Allocator>::__init_with_size(
}
catch (...)
{
+ __annotate_delete();
if (__is_long())
__alloc_traits::deallocate(__alloc(), __get_long_pointer(), __get_long_cap());
throw;
}
#endif // _LIBCPP_HAS_NO_EXCEPTIONS
+ __annotate_new(__sz);
}
template <class _CharT, class _Traits, class _Allocator>
@@ -2319,6 +2407,7 @@ basic_string<_CharT, _Traits, _Allocator>::__grow_by_and_replace
size_type __cap = __old_cap < __ms / 2 - __alignment ?
__recommend(std::max(__old_cap + __delta_cap, 2 * __old_cap)) :
__ms - 1;
+ __annotate_delete();
auto __allocation = std::__allocate_at_least(__alloc(), __cap + 1);
pointer __p = __allocation.ptr;
__begin_lifetime(__p, __allocation.count);
@@ -2338,6 +2427,7 @@ basic_string<_CharT, _Traits, _Allocator>::__grow_by_and_replace
__old_sz = __n_copy + __n_add + __sec_cp_sz;
__set_long_size(__old_sz);
traits_type::assign(__p[__old_sz], value_type());
+ __annotate_new(__old_cap + __delta_cap);
}
// __grow_by is deprecated because it does not set the size. It may not update the size when the size is changed, and it
@@ -2360,6 +2450,7 @@ basic_string<_CharT, _Traits, _Allocator>::__grow_by(size_type __old_cap, size_t
size_type __cap = __old_cap < __ms / 2 - __alignment ?
__recommend(std::max(__old_cap + __delta_cap, 2 * __old_cap)) :
__ms - 1;
+ __annotate_delete();
auto __allocation = std::__allocate_at_least(__alloc(), __cap + 1);
pointer __p = __allocation.ptr;
__begin_lifetime(__p, __allocation.count);
@@ -2402,10 +2493,15 @@ basic_string<_CharT, _Traits, _Allocator>::__assign_no_alias(
const value_type* __s, size_type __n) {
size_type __cap = __is_short ? static_cast<size_type>(__min_cap) : __get_long_cap();
if (__n < __cap) {
+ size_type __old_size = __is_short ? __get_short_size() : __get_long_size();
+ if (__n > __old_size)
+ __annotate_increase(__n - __old_size);
pointer __p = __is_short ? __get_short_pointer() : __get_long_pointer();
__is_short ? __set_short_size(__n) : __set_long_size(__n);
traits_type::copy(std::__to_address(__p), __s, __n);
traits_type::assign(__p[__n], value_type());
+ if (__old_size > __n)
+ __annotate_shrink(__old_size);
} else {
size_type __sz = __is_short ? __get_short_size() : __get_long_size();
__grow_by_and_replace(__cap - 1, __n - __cap + 1, __sz, 0, __sz, __n, __s);
@@ -2420,6 +2516,9 @@ basic_string<_CharT, _Traits, _Allocator>::__assign_external(
const value_type* __s, size_type __n) {
size_type __cap = capacity();
if (__cap >= __n) {
+ size_type __old_size = size();
+ if (__n > __old_size)
+ __annotate_increase(__n - __old_size);
value_type* __p = std::__to_address(__get_pointer());
traits_type::move(__p, __s, __n);
return __null_terminate_at(__p, __n);
@@ -2447,11 +2546,15 @@ basic_string<_CharT, _Traits, _Allocator>&
basic_string<_CharT, _Traits, _Allocator>::assign(size_type __n, value_type __c)
{
size_type __cap = capacity();
+ size_type __old_size = size();
if (__cap < __n)
{
size_type __sz = size();
__grow_by_without_replace(__cap, __n - __cap, __sz, 0, __sz);
+ __annotate_increase(__n);
}
+ else if(__n > __old_size)
+ __annotate_increase(__n - __old_size);
value_type* __p = std::__to_address(__get_pointer());
traits_type::assign(__p, __n, __c);
return __null_terminate_at(__p, __n);
@@ -2462,24 +2565,26 @@ _LIBCPP_CONSTEXPR_SINCE_CXX20
basic_string<_CharT, _Traits, _Allocator>&
basic_string<_CharT, _Traits, _Allocator>::operator=(value_type __c)
{
- pointer __p;
- if (__is_long())
- {
- __p = __get_long_pointer();
- __set_long_size(1);
- }
- else
- {
- __p = __get_short_pointer();
- __set_short_size(1);
- }
- traits_type::assign(*__p, __c);
- traits_type::assign(*++__p, value_type());
- return *this;
+ pointer __p;
+ size_type __old_size = size();
+ if (__old_size == 0)
+ __annotate_increase(1);
+ if (__is_long()) {
+ __p = __get_long_pointer();
+ __set_long_size(1);
+ } else {
+ __p = __get_short_pointer();
+ __set_short_size(1);
+ }
+ traits_type::assign(*__p, __c);
+ traits_type::assign(*++__p, value_type());
+ if (__old_size > 1)
+ __annotate_shrink(__old_size);
+ return *this;
}
template <class _CharT, class _Traits, class _Allocator>
-_LIBCPP_CONSTEXPR_SINCE_CXX20
+_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
basic_string<_CharT, _Traits, _Allocator>&
basic_string<_CharT, _Traits, _Allocator>::operator=(const basic_string& __str)
{
@@ -2487,7 +2592,12 @@ basic_string<_CharT, _Traits, _Allocator>::operator=(const basic_string& __str)
__copy_assign_alloc(__str);
if (!__is_long()) {
if (!__str.__is_long()) {
+ size_type __old_size = __get_short_size();
+ if (__get_short_size() < __str.__get_short_size())
+ __annotate_increase(__str.__get_short_size() - __get_short_size());
__r_.first() = __str.__r_.first();
+ if (__old_size > __get_short_size())
+ __annotate_shrink(__old_size);
} else {
return __assign_no_alias<true>(__str.data(), __str.size());
}
@@ -2513,7 +2623,7 @@ basic_string<_CharT, _Traits, _Allocator>::__move_assign(basic_string& __str, fa
}
template <class _CharT, class _Traits, class _Allocator>
-inline _LIBCPP_CONSTEXPR_SINCE_CXX20
+inline _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
void
basic_string<_CharT, _Traits, _Allocator>::__move_assign(basic_string& __str, true_t...
[truncated]
|
|
@ldionne Given that the requirement for sanitized libc++ and libc++abi is already an established practice (Memory Sanitizer), could we proceed with merging this branch after addressing all code review concerns, prior to developing a unified approach for shipping instrumented libraries? Edit: especially that Instrumented libc++ RFC is going into direction of fairly limited changes compared to original plan. And maintaining this patch outside of upstream takes a lot of time |
^ Yes, I think that's a reasonable approach. So we'll assume that libc++ is built with support for asan + container annotations if a user enables these container annotations in their code? Does this mean that |
Thank you!
Now yes, but adding a macro to opt-in or opt-out is trivial, we already have the I'm in favor of opt-out macro, instead of opt-in. I'm aware of problems resulting from this decision, but using sanitized dylib should be the standard anyway and we should push in that direction. We can always change opt-out to opt-in before branching, if we get feedback suggesting it. @ldionne let me know what version you prefer and I will add it. PS |
@philnik777 since your last review at Phabricator, except removing all changes from the file listing ABI functions (as requested on Phabricator), I was doing only maintenance ( If you see nothing else to fix or only minor things, could you greenlight the review? |
b31e1e0
to
14dfcbf
Compare
@ldionne I'd appreciate a review of that PR, please let me know if you have time to look at it anytime soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please write specific tests for your change.
I don't think tacking on an assertion to every existing test is the right way to go. I understand the appeal, but I think specific tests written expressly to test this change is the right way to go.
In the past, we've allowed tests to be added like this, but it's resulted in tests that run too long and tests which don't test a single thing.
Hey @EricWF , that way of testing ASan is already established in We did the same thing with Creating a separate set of tests is not feasible, as it would basically require copy-pasting all tests we have, as almost every function has to be carefully tested. And if a new function is added/overloaded in future, tests should be added in two places. To address performance concerns when running tests with What do you think? |
3e9c22b
to
90e7249
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As-is, landing this patch will break most users who use -fsanitize=address
and use libc++ in their code. Indeed, vendors ship a single version of libc++.dylib
right now and that version is not instrumented, which means that the extern template instantiations in libc++.dylib
will cause problems.
I'm worried that this will be a significant regression. Ideally, we'd address the issue by having a proper plan for shipping an instrumented library and having the driver select it based on -fsanitize=address
. But barring that, we'll need to either
- make the
string
annotations opt-in, or - disable the extern template instantiation declarations under asan
- only enable these annotations when the dylib was built with support for ASAN
(1) basically negates the benefit of this patch since almost nobody will turn it on
(2) is still not correct because of ODR violations, but it should make most basic use cases work. It would still be a temporary solution.
I think (3) is probably what makes the most sense, since that wouldn't be a temporary solution. It would only mean that most platforms don't get this support by default because the dylib isn't built with instrumentation, but the minute we ship a library that does support that, everything would still work out of the box.
This could be done by encoding whether ASAN container annotations are supported in the __config_site
, and that would then become a configuration-time property of the library.
libcxx/test/std/strings/basic.string/string.capacity/reserve_size.asan.pass.cpp
Show resolved
Hide resolved
libcxx/test/std/strings/basic.string/string.modifiers/string_swap/swap.asan.pass.cpp
Show resolved
Hide resolved
libcxx/test/std/strings/basic.string/string.modifiers/string_erase/pop_back.asan.pass.cpp
Outdated
Show resolved
Hide resolved
libcxx/test/std/strings/basic.string/string.cons/initializer_list.pass.cpp
Outdated
Show resolved
Hide resolved
If we encode the fact that libc++ was built with instrumentation in Basically the idea would be to make these container annotations distinct from whether the user is compiling with |
I don't think we can make it work, linking is going to fail in that case. It already cannot work, string annotations have no impact on it. compiler-rt ASan functions used in dylib are missing during linking without the ASan flag.
While I agree, it's far from perfect, we already assume that ASan has no stable ABI. An advantage of this solution is that all users will now have annotated strings. But (3) sounds good, if we just have a makro in Edit: I implemented 3, see a relevant commit. |
14ee035
to
543b283
Compare
This function replaces a call to `__move_assign` (internal function) with two calls to public member functions (`resize` and `erase`). The order of call is chosen for the best performance. This change is required to turn on ASan string annotations for short strings (Short String Optimization - SSO). The `std::basic_string` class's `void __move_assign(basic_string&& __str, size_type __pos, size_type __len)` function operates on uninitialized strings, where it is reasonable to assume that the memory is not poisoned. However, in `sstream` this function is applied to existing strings that may already have poisoned memory. String ASan annotations turned on here: llvm#72677
This function replaces a call to `__move_assign` (internal function) with two calls to public member functions (`resize` and `erase`). The order of calls is chosen for the best performance. This change is required to [turn on ASan string annotations for short strings](#75882) (Short String Optimization - SSO). The `std::basic_string` class's `void __move_assign(basic_string&& __str, size_type __pos, size_type __len)` function operates on uninitialized strings, where it is reasonable to assume that the memory is not poisoned. However, in `sstream` this function is applied to existing strings that already have poisoned memory. String ASan annotations turned on here: #72677
This commit turns on ASan annotations in `std::basic_string` for all allocators by default. Originally suggested here: https://reviews.llvm.org/D146214 This commit is part of our efforts to support container annotations with (almost) every allocator. Annotating `std::basic_string` with default allocator is implemented in llvm#72677. Support in ASan API exests since llvm@dd1b7b7. This patch removes the check in std::basic_string annotation member function (__annotate_contiguous_container) to support different allocators. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec. This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. If you have any questions, please email: - advenam.tacet@trailofbits.com - disconnect3d@trailofbits.com
This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case). Originally suggested here: https://reviews.llvm.org/D147680 String annotations added here: llvm#72677 Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations. Support in ASan API exists since llvm@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec. This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. If you have any questions, please email: advenam.tacet@trailofbits.com disconnect3d@trailofbits.com
…5845) This commit turns on ASan annotations in `std::basic_string` for all allocators by default. Originally suggested here: https://reviews.llvm.org/D146214 String annotations added here: #72677 This commit is part of our efforts to support container annotations with (almost) every allocator. Annotating `std::basic_string` with default allocator is implemented in #72677. Additionally it removes `__begin != nullptr` because `data()` should never return a nullptr. Support in ASan API exists since 1c5ad6d. This patch removes the check in std::basic_string annotation member function (__annotate_contiguous_container) to support different allocators. You can turn off annotations for a specific allocator based on changes from 2fa1bec. The motivation for a research and those changes was a bug, found by Trail of Bits, in a real code where an out-of-bounds read could happen as two strings were compared via a call to `std::equal` that took `iter1_begin`, `iter1_end`, `iter2_begin` iterators (with a custom comparison function). When object `iter1` was longer than `iter2`, read out-of-bounds on `iter2` could happen. Container sanitization would detect it. If you have any questions, please email: - advenam.tacet@trailofbits.com - disconnect3d@trailofbits.com
This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case). Originally suggested here: https://reviews.llvm.org/D147680 String annotations added here: llvm#72677 Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations. Support in ASan API exists since llvm@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec. This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. If you have any questions, please email: advenam.tacet@trailofbits.com disconnect3d@trailofbits.com
This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case). Originally suggested here: https://reviews.llvm.org/D147680 String annotations added here: #72677 Requires to pass CI without fails: - #75845 - #75858 Annotating `std::basic_string` with default allocator is implemented in #72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations. Support in ASan API exists since dd1b7b7. You can turn off annotations for a specific allocator based on changes from 2fa1bec. This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. If you have any questions, please email: advenam.tacet@trailofbits.com disconnect3d@trailofbits.com
This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case). Originally suggested here: https://reviews.llvm.org/D147680 String annotations added here: llvm#72677 Requires to pass CI without fails: - llvm#75845 - llvm#75858 Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations. Support in ASan API exists since llvm@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec. This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. If you have any questions, please email: advenam.tacet@trailofbits.com disconnect3d@trailofbits.com
This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case). Originally suggested here: https://reviews.llvm.org/D147680 String annotations added here: llvm#72677 Requires to pass CI without fails: - llvm#75845 - llvm#75858 Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations. Support in ASan API exists since llvm@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec. This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. If you have any questions, please email: advenam.tacet@trailofbits.com disconnect3d@trailofbits.com
Originally merged here: #75882 Reverted here: #78627 Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the `UniqueFunctionBase` class and in the `JSON.h` file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error. Fixes are implemented in: - #79065 - #79066 Problematic code from `UniqueFunctionBase` for example: ```cpp #ifndef NDEBUG // In debug builds, we also scribble across the rest of the storage. memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize); #endif ``` --- Original description: This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case). Originally suggested here: https://reviews.llvm.org/D147680 String annotations added here: #72677 Requires to pass CI without fails: - #75845 - #75858 Annotating `std::basic_string` with default allocator is implemented in #72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations. Support in ASan API exists since dd1b7b7. You can turn off annotations for a specific allocator based on changes from 2fa1bec. This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. If you have any questions, please email: advenam.tacet@trailofbits.com disconnect3d@trailofbits.com
This is 3rd attempt to upstream short string annotations, it's the same as the previous one, but other PRs fixed issues withing LLVM: - llvm#79489 - llvm#79522 Additionaly annotations were updated (but it shouldn't have any impact on anything): - llvm#79292 Now, as far as I know, all buildbots should work without problems. Both previous reverts were not related to issues with string annotations, but with issues in LLVM/clang. Read PRs above and below for details. --- Previous description: Originally merged here: llvm#75882 Reverted here: llvm#78627 Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the `UniqueFunctionBase` class and in the `JSON.h` file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error. Fixes are implemented in: - llvm#79065 - llvm#79066 Problematic code from `UniqueFunctionBase` for example: ```cpp // In debug builds, we also scribble across the rest of the storage. memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize); ``` --- Original description: This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case). Originally suggested here: https://reviews.llvm.org/D147680 String annotations added here: llvm#72677 Requires to pass CI without fails: - llvm#75845 - llvm#75858 Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations. Support in ASan API exists since llvm@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec. This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. If you have any questions, please email: advenam.tacet@trailofbits.com disconnect3d@trailofbits.com
) This function replaces a call to `__move_assign` (internal function) with two calls to public member functions (`resize` and `erase`). The order of calls is chosen for the best performance. This change is required to [turn on ASan string annotations for short strings](llvm#75882) (Short String Optimization - SSO). The `std::basic_string` class's `void __move_assign(basic_string&& __str, size_type __pos, size_type __len)` function operates on uninitialized strings, where it is reasonable to assume that the memory is not poisoned. However, in `sstream` this function is applied to existing strings that already have poisoned memory. String ASan annotations turned on here: llvm#72677
…vm#75845) This commit turns on ASan annotations in `std::basic_string` for all allocators by default. Originally suggested here: https://reviews.llvm.org/D146214 String annotations added here: llvm#72677 This commit is part of our efforts to support container annotations with (almost) every allocator. Annotating `std::basic_string` with default allocator is implemented in llvm#72677. Additionally it removes `__begin != nullptr` because `data()` should never return a nullptr. Support in ASan API exists since llvm@1c5ad6d. This patch removes the check in std::basic_string annotation member function (__annotate_contiguous_container) to support different allocators. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec. The motivation for a research and those changes was a bug, found by Trail of Bits, in a real code where an out-of-bounds read could happen as two strings were compared via a call to `std::equal` that took `iter1_begin`, `iter1_end`, `iter2_begin` iterators (with a custom comparison function). When object `iter1` was longer than `iter2`, read out-of-bounds on `iter2` could happen. Container sanitization would detect it. If you have any questions, please email: - advenam.tacet@trailofbits.com - disconnect3d@trailofbits.com
Extend `std::basic_string` tests to cover more buffer situations and length in general, particularly non-SSO cases after SSO test cases (changing buffers). This commit is a side effect of working on tests for ASan annotations. Related PR: llvm/llvm-project#72677 NOKEYCHECK=True GitOrigin-RevId: c77cdbac9b121611121adf5806a99aff4812a40c
This commit introduces basic annotations for `std::basic_string`, mirroring the approach used in `std::vector` and `std::deque`. Initially, only long strings with the default allocator will be annotated. Short strings (_SSO - short string optimization_) and strings with non-default allocators will be annotated in the near future, with separate commits dedicated to enabling them. The process will be similar to the workflow employed for enabling annotations in `std::deque`. **Please note**: these annotations function effectively only when libc++ and libc++abi dylibs are instrumented (with ASan). This aligns with the prevailing behavior of Memory Sanitizer. To avoid breaking everything, this commit also appends `_LIBCPP_INSTRUMENTED_WITH_ASAN` to `__config_site` whenever libc++ is compiled with ASan. If this macro is not defined, string annotations are not enabled. However, linking a binary that does **not** annotate strings with a dynamic library that annotates strings, is not permitted. Originally proposed here: https://reviews.llvm.org/D132769 Related patches on Phabricator: - Turning on annotations for short strings: https://reviews.llvm.org/D147680 - Turning on annotations for all allocators: https://reviews.llvm.org/D146214 This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. This Pull Request introduces basic annotations for `std::basic_string`. Long strings exhibit structural similarities to `std::vector` and will be annotated accordingly. Short strings are already implemented, but will be turned on separately in a forthcoming commit. Look at [a comment](llvm/llvm-project#72677 (comment)) below to read about SSO issues at current moment. Due to the functionality introduced in [D132522](llvm/llvm-project@dd1b7b7), the `__sanitizer_annotate_contiguous_container` function now offers compatibility with all allocators. However, enabling this support will be done in a subsequent commit. For the time being, only strings with the default allocator will be annotated. If you have any questions, please email: - advenam.tacet@trailofbits.com - disconnect3d@trailofbits.com NOKEYCHECK=True GitOrigin-RevId: 9ed20568e7de53dce85f1631d7d8c1415e7930ae
This function replaces a call to `__move_assign` (internal function) with two calls to public member functions (`resize` and `erase`). The order of calls is chosen for the best performance. This change is required to [turn on ASan string annotations for short strings](llvm/llvm-project#75882) (Short String Optimization - SSO). The `std::basic_string` class's `void __move_assign(basic_string&& __str, size_type __pos, size_type __len)` function operates on uninitialized strings, where it is reasonable to assume that the memory is not poisoned. However, in `sstream` this function is applied to existing strings that already have poisoned memory. String ASan annotations turned on here: llvm/llvm-project#72677 NOKEYCHECK=True GitOrigin-RevId: 5351ded68d579921a61b26a34e36046c22f668bd
…5845) This commit turns on ASan annotations in `std::basic_string` for all allocators by default. Originally suggested here: https://reviews.llvm.org/D146214 String annotations added here: llvm/llvm-project#72677 This commit is part of our efforts to support container annotations with (almost) every allocator. Annotating `std::basic_string` with default allocator is implemented in llvm/llvm-project#72677. Additionally it removes `__begin != nullptr` because `data()` should never return a nullptr. Support in ASan API exists since llvm/llvm-project@1c5ad6d. This patch removes the check in std::basic_string annotation member function (__annotate_contiguous_container) to support different allocators. You can turn off annotations for a specific allocator based on changes from llvm/llvm-project@2fa1bec. The motivation for a research and those changes was a bug, found by Trail of Bits, in a real code where an out-of-bounds read could happen as two strings were compared via a call to `std::equal` that took `iter1_begin`, `iter1_end`, `iter2_begin` iterators (with a custom comparison function). When object `iter1` was longer than `iter2`, read out-of-bounds on `iter2` could happen. Container sanitization would detect it. If you have any questions, please email: - advenam.tacet@trailofbits.com - disconnect3d@trailofbits.com NOKEYCHECK=True GitOrigin-RevId: 60ac394dc9ed617f802b33c3b9ac8881ca6a940c
This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case). Originally suggested here: https://reviews.llvm.org/D147680 String annotations added here: llvm/llvm-project#72677 Requires to pass CI without fails: - llvm/llvm-project#75845 - llvm/llvm-project#75858 Annotating `std::basic_string` with default allocator is implemented in llvm/llvm-project#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations. Support in ASan API exists since llvm/llvm-project@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm/llvm-project@2fa1bec. This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. If you have any questions, please email: advenam.tacet@trailofbits.com disconnect3d@trailofbits.com NOKEYCHECK=True GitOrigin-RevId: d06fb0b29c7030497e0e6411cf256cabd71940c2
Originally merged here: llvm/llvm-project#75882 Reverted here: llvm/llvm-project#78627 Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the `UniqueFunctionBase` class and in the `JSON.h` file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error. Fixes are implemented in: - llvm/llvm-project#79065 - llvm/llvm-project#79066 Problematic code from `UniqueFunctionBase` for example: ```cpp #ifndef NDEBUG // In debug builds, we also scribble across the rest of the storage. memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize); #endif ``` --- Original description: This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case). Originally suggested here: https://reviews.llvm.org/D147680 String annotations added here: llvm/llvm-project#72677 Requires to pass CI without fails: - llvm/llvm-project#75845 - llvm/llvm-project#75858 Annotating `std::basic_string` with default allocator is implemented in llvm/llvm-project#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations. Support in ASan API exists since llvm/llvm-project@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm/llvm-project@2fa1bec. This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. If you have any questions, please email: advenam.tacet@trailofbits.com disconnect3d@trailofbits.com NOKEYCHECK=True GitOrigin-RevId: cb528ec5e6331ce207c7b835d7ab963bd5e13af7
This is 3rd attempt to upstream short string annotations, it's the same as the previous one, but other PRs fixed issues withing LLVM: - llvm#79489 - llvm#79522 Additionaly annotations were updated (but it shouldn't have any impact on anything): - llvm#79292 Now, as far as I know, all buildbots should work without problems. Both previous reverts were not related to issues with string annotations, but with issues in LLVM/clang. Read PRs above and below for details. --- Previous description: Originally merged here: llvm#75882 Reverted here: llvm#78627 Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the `UniqueFunctionBase` class and in the `JSON.h` file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error. Fixes are implemented in: - llvm#79065 - llvm#79066 Problematic code from `UniqueFunctionBase` for example: ```cpp // In debug builds, we also scribble across the rest of the storage. memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize); ``` --- Original description: This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case). Originally suggested here: https://reviews.llvm.org/D147680 String annotations added here: llvm#72677 Requires to pass CI without fails: - llvm#75845 - llvm#75858 Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations. Support in ASan API exists since llvm@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec. This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. If you have any questions, please email: advenam.tacet@trailofbits.com disconnect3d@trailofbits.com
This is 3rd attempt to upstream short string annotations, it's the same as the previous one, but other PRs fixed issues withing LLVM: - llvm#79489 - llvm#79522 Additionaly annotations were updated (but it shouldn't have any impact on anything): - llvm#79292 Now, as far as I know, all buildbots should work without problems. Both previous reverts were not related to issues with string annotations, but with issues in LLVM/clang. Read PRs above and below for details. --- Previous description: Originally merged here: llvm#75882 Reverted here: llvm#78627 Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the `UniqueFunctionBase` class and in the `JSON.h` file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error. Fixes are implemented in: - llvm#79065 - llvm#79066 Problematic code from `UniqueFunctionBase` for example: ```cpp // In debug builds, we also scribble across the rest of the storage. memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize); ``` --- Original description: This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case). Originally suggested here: https://reviews.llvm.org/D147680 String annotations added here: llvm#72677 Requires to pass CI without fails: - llvm#75845 - llvm#75858 Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations. Support in ASan API exists since llvm@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec. This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. If you have any questions, please email: advenam.tacet@trailofbits.com disconnect3d@trailofbits.com
This is 3rd attempt to upstream short string annotations, it's the same as the previous one, but other PRs fixed issues withing LLVM: - llvm#79489 - llvm#79522 Additionaly annotations were updated (but it shouldn't have any impact on anything): - llvm#79292 Now, as far as I know, all buildbots should work without problems. Both previous reverts were not related to issues with string annotations, but with issues in LLVM/clang. Read PRs above and below for details. --- Previous description: Originally merged here: llvm#75882 Reverted here: llvm#78627 Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the `UniqueFunctionBase` class and in the `JSON.h` file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error. Fixes are implemented in: - llvm#79065 - llvm#79066 Problematic code from `UniqueFunctionBase` for example: ```cpp // In debug builds, we also scribble across the rest of the storage. memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize); ``` --- Original description: This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case). Originally suggested here: https://reviews.llvm.org/D147680 String annotations added here: llvm#72677 Requires to pass CI without fails: - llvm#75845 - llvm#75858 Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations. Support in ASan API exists since llvm@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec. This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. If you have any questions, please email: advenam.tacet@trailofbits.com disconnect3d@trailofbits.com
This pull request is the third iteration aiming to integrate short string annotations. This commit includes: - Enabling basic_string annotations for short strings. - Setting a value of `__trivially_relocatable` in `std::basic_string` to `false_type` when compiling with ASan (nothing changes when compiling without ASan). Short string annotations make `std::basic_string` to not be trivially relocatable, because memory has to be unpoisoned. - Adding a `_LIBCPP_STRING_INTERNAL_MEMORY_ACCESS` modifier to two functions. - Creating a macro `_LIBCPP_ASAN_VOLATILE_WRAPPER` to prevent problematic stack optimizations (the macro modifies code behavior only when compiling with ASan). Previously we had issues with compiler optimization, which we understand thanks to @vitalybuka. This commit also addresses smaller changes in short string, since previous upstream attempts. Problematic optimization was loading two values in code similar to: ``` __is_long() ? __get_long_size() : __get_short_size(); ``` We aim to resolve it with the volatile wrapper. This commit is built on top of two previous attempts which descriptions are below. Additionally, in the meantime, annotations were updated (but it shouldn't have any impact on anything): - #79292 --- Previous PR: #79049 Reverted: a16f81f Previous description: Originally merged here: #75882 Reverted here: #78627 Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the `UniqueFunctionBase` class and in the `JSON.h` file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error. Fixes are implemented in: - #79065 - #79066 Problematic code from `UniqueFunctionBase` for example: ```cpp // In debug builds, we also scribble across the rest of the storage. memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize); ``` --- Original description: This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case). Originally suggested here: https://reviews.llvm.org/D147680 String annotations added here: #72677 Requires to pass CI without fails: - #75845 - #75858 Annotating `std::basic_string` with default allocator is implemented in #72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations. Support in ASan API exists since dd1b7b7. You can turn off annotations for a specific allocator based on changes from 2fa1bec. This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. If you have any questions, please email: - advenam.tacet@trailofbits.com - disconnect3d@trailofbits.com
This pull request is the third iteration aiming to integrate short string annotations. This commit includes: - Enabling basic_string annotations for short strings. - Setting a value of `__trivially_relocatable` in `std::basic_string` to `false_type` when compiling with ASan (nothing changes when compiling without ASan). Short string annotations make `std::basic_string` to not be trivially relocatable, because memory has to be unpoisoned. - Adding a `_LIBCPP_STRING_INTERNAL_MEMORY_ACCESS` modifier to two functions. - Creating a macro `_LIBCPP_ASAN_VOLATILE_WRAPPER` to prevent problematic stack optimizations (the macro modifies code behavior only when compiling with ASan). Previously we had issues with compiler optimization, which we understand thanks to @vitalybuka. This commit also addresses smaller changes in short string, since previous upstream attempts. Problematic optimization was loading two values in code similar to: ``` __is_long() ? __get_long_size() : __get_short_size(); ``` We aim to resolve it with the volatile wrapper. This commit is built on top of two previous attempts which descriptions are below. Additionally, in the meantime, annotations were updated (but it shouldn't have any impact on anything): - llvm/llvm-project#79292 --- Previous PR: llvm/llvm-project#79049 Reverted: llvm/llvm-project@a16f81f Previous description: Originally merged here: llvm/llvm-project#75882 Reverted here: llvm/llvm-project#78627 Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the `UniqueFunctionBase` class and in the `JSON.h` file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error. Fixes are implemented in: - llvm/llvm-project#79065 - llvm/llvm-project#79066 Problematic code from `UniqueFunctionBase` for example: ```cpp // In debug builds, we also scribble across the rest of the storage. memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize); ``` --- Original description: This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case). Originally suggested here: https://reviews.llvm.org/D147680 String annotations added here: llvm/llvm-project#72677 Requires to pass CI without fails: - llvm/llvm-project#75845 - llvm/llvm-project#75858 Annotating `std::basic_string` with default allocator is implemented in llvm/llvm-project#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations. Support in ASan API exists since llvm/llvm-project@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm/llvm-project@2fa1bec. This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. If you have any questions, please email: - advenam.tacet@trailofbits.com - disconnect3d@trailofbits.com NOKEYCHECK=True GitOrigin-RevId: 1a96179596099b8a3839050dbff02bfed94502e5
This commit introduces basic annotations for
std::basic_string
, mirroring the approach used instd::vector
andstd::deque
. Initially, only long strings with the default allocator will be annotated. Short strings (SSO - short string optimization) and strings with non-default allocators will be annotated in the near future, with separate commits dedicated to enabling them. The process will be similar to the workflow employed for enabling annotations instd::deque
.Please note: these annotations function effectively only when libc++ and libc++abi dylibs are instrumented (with ASan). This aligns with the prevailing behavior of Memory Sanitizer.
To avoid breaking everything, this commit also appends
_LIBCPP_INSTRUMENTED_WITH_ASAN
to__config_site
whenever libc++ is compiled with ASan. If this macro is not defined, string annotations are not enabled. However, linking a binary that does not annotate strings with a dynamic library that annotates strings, is not permitted.Originally proposed here: https://reviews.llvm.org/D132769
Related patches on Phabricator:
This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in
std::vector
andstd::deque
collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements instd::deque
, or between thestd::basic_string
's size (including the null terminator) and capacity bounds.The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the
std::equals
function. This function was taking iterators (iter1_begin
,iter1_end
,iter2_begin
) to perform the comparison, using a custom comparison function. When theiter1
object exceeded the length ofiter2
, an out-of-bounds read could occur on theiter2
object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability.This Pull Request introduces basic annotations for
std::basic_string
. Long strings exhibit structural similarities tostd::vector
and will be annotated accordingly. Short strings are already implemented, but will be turned on separately in a forthcoming commit. Look at a comment below to read about SSO issues at current moment.Due to the functionality introduced in D132522, the
__sanitizer_annotate_contiguous_container
function now offers compatibility with all allocators. However, enabling this support will be done in a subsequent commit. For the time being, only strings with the default allocator will be annotated.If you have any questions, please email: