
Memory corruption issue using C++ binding #624

Open
rbdm-qnt opened this issue Feb 12, 2025 · 13 comments
Labels
bug Something isn't working

Comments

@rbdm-qnt

Required information

Operating system:
Linux 6.5.0-1018-aws #18~22.04.1-Ubuntu SMP Fri Apr 5 17:44:33 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Rust version:
rustc 1.75.0 (82e1608df 2023-12-21) (built from a source tarball)

Cargo version:
cargo 1.75.0

iceoryx2 version:
main branch, ICEORYX2_VERSION_STRING="0.5.0", commit hash: 5b45d39

Detailed log output:
Attached below

Observed result or behaviour:
This happened while using the Pub/Sub messaging pattern via the C++ bindings, in the publisher application. We have multiple publisher instances in the same application, in different threads, publishing data on the same bus. The logging level is TRACE. We have had no crashes so far, and accumulate about 400 MB worth of this error in the log files per day. We publish 500 GB+ of data per day via the bus. No errors and no logs on the subscriber side.

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], payload_size: 280 }], data_segment: DataSegment { memory: Static(Memory { storage: Storage { shm: SharedMemory { name: FileName { value: FixedSizeByteString<255> { len: 91, data: "iox2_0354a209029e7d094a819e2d4030ea331e6caaf0_239570042580385605080812073151.publisher_data" } }, size: 10127376534, base_address: 0x7856591a3000, has_ownership: true, file_descriptor: FileDescriptor { value: 17, is_owned: true }, memory_lock: None }, name: FileName { value: FixedSizeByteString<255> { len: 30, data: "239570042580385605080812073151" } }, _phantom_data: PhantomData<iceoryx2_cal::shared_memory::common::details::AllocatorDetails<iceoryx2_cal::shm_allocator::pool_allocator::PoolAllocator>> }, name: FileName { value: FixedSizeByteString<255> { len: 30, data: "239570042580385605080812073151" } }, payload_start_address: 132312257409160, _phantom: PhantomData<iceoryx2_cal::shm_allocator::pool_allocator::PoolAllocator> }) }, port_id: UniquePublisherId(UniqueSystemId { value: 239570042580385605080812073151, pid: 3387583, creation_time: Time { clock_type: Realtime, seconds: 1739220282, nanoseconds: 102216143 } }), config: LocalPublisherConfig { max_loaned_samples: 8192, unable_to_deliver_strategy: Block, degration_callback: None, initial_max_slice_len: 1, allocation_strategy: Static }, service_state: ServiceState { static_config: StaticConfig { service_id: ServiceId(RestrictedFileName { value: FixedSizeByteString<64> { len: 40, data: "b7e671b10f398ee12fd67c143c8e378808d973e3" } }), service_name: ServiceName { value: "Company/App/my_program" }, attributes: AttributeSet([]), messaging_pattern: PublishSubscribe(StaticConfig { max_subscribers: 256, max_publishers: 256, max_nodes: 256, history_size: 0, subscriber_max_buffer_size: 131072, subscriber_max_borrowed_samples: 8192, enable_safe_overflow: true, message_type_details: MessageTypeDetails { header: TypeDetail { variant: FixedSize, type_name: "iceoryx2::service::header::publish_subscribe::Header", size: 24, alignment: 8 }, user_header: TypeDetail { variant: FixedSize, type_name: "v", size: 0, alignment: 1 }, payload: TypeDetail { variant: FixedSize, type_name: "N6my_program18FixedSizeByteArrayE", size: 256, alignment: 1 } } }) }, shared_node: SharedNode { id: NodeId(UniqueSystemId { value: 160319116027617802878482362559, pid: 3387583, creation_time: Time { clock_type: Realtime, seconds: 1739220282, nanoseconds: 100982102 } }), details: NodeDetails { executable: FileName { value: FixedSizeByteString<255> { len: 15, data: "my_program_publisher" } }, name: NodeName { value: "" }, config: Config { global: Global { root_path_unix: Path { value: FixedSizeByteString<255> { len: 14, data: "/tmp/iceoryx2/" } }, root_path_windows: Path { value: FixedSizeByteString<255> { len: 17, data: "c:\Temp\iceoryx2\" } }, prefix: FileName { value: FixedSizeByteString<255> { len: 5, data: "iox2_" } }, service: Service { directory: Path { value: FixedSizeByteString<255> { len: 8, data: "services" } }, publisher_data_segment_suffix: FileName { value: FixedSizeByteString<255> { len: 15, data: ".publisher_data" } }, static_config_storage_suffix: FileName { value: FixedSizeByteString<255> { len: 8, data: 
".service" } }, dynamic_config_storage_suffix: FileName { value: FixedSizeByteString<255> { len: 8, data: ".dynamic" } }, creation_timeout: 500ms, connection_suffix: FileName { value: FixedSizeByteString<255> { len: 11, data: ".connection" } }, event_connection_suffix: FileName { value: FixedSizeByteString<255> { len: 6, data: ".event" } } }, node: Node { directory: Path { value: FixedSizeByteString<255> { len: 5, data: "nodes" } }, monitor_suffix: FileName { value: FixedSizeByteString<255> { len: 13, data: ".node_monitor" } }, static_config_suffix: FileName { value: FixedSizeByteString<255> { len: 8, data: ".details" } }, service_tag_suffix: FileName { value: FixedSizeByteString<255> { len: 12, data: ".service_tag" } }, cleanup_dead_nodes_on_creation: true, cleanup_dead_nodes_on_destruction: true } }, defaults: Defaults { publish_subscribe: PublishSubscribe { max_subscribers: 256, max_publishers: 256, max_nodes: 256, subscriber_max_buffer_size: 131072, subscriber_max_borrowed_samples: 8192, publisher_max_loaned_samples: 8192, publisher_history_size: 0, enable_safe_overflow: true, unable_to_deliver_strategy: Block, subscriber_expired_connection_buffer: 128 }, event: Event { max_listeners: 16, max_notifiers: 16, max_nodes: 36, event_id_max_value: 4294967295, deadline: None, notifier_created_event: None, notifier_dropped_event: None, notifier_dead_event: None }, request_response: RequestResonse { enable_safe_overflow_for_requests: true, enable_safe_overflow_for_responses: true, max_active_responses: 4, max_active_requests: 2, max_borrowed_responses: 4, max_borrowed_requests: 2, max_response_buffer_size: 2, max_request_buffer_size: 4, max_servers: 2, max_clients: 8, max_nodes: 20 } } } }, monitoring_token: UnsafeCell { .. }, registered_services: RegisteredServices { data: Mutex { data: {ServiceId(RestrictedFileName { value: FixedSizeByteString<64> { len: 40, data: "b7e671b10f398ee12fd67c143c8e378808d973e3" } }): (ContainerHandle { index: 3, container_id: 2 }, 1)}, poisoned: false, .. 
} }, signal_handling_mode: HandleTerminationRequests, _details_storage: Storage { name: FileName { value: FixedSizeByteString<255> { len: 4, data: "node" } }, config: Configuration { path: Path { value: FixedSizeByteString<255> { len: 50, data: "/tmp/iceoryx2/nodes/160319116027617802878482362559" } }, suffix: FileName { value: FixedSizeByteString<255> { len: 8, data: ".details" } }, prefix: FileName { value: FixedSizeByteString<255> { len: 5, data: "iox2_" } } }, has_ownership: false, file: File { path: Some(FilePath { value: FixedSizeByteString<255> { len: 68, data: "/tmp/iceoryx2/nodes/160319116027617802878482362559/iox2_node.details" } }), file_descriptor: FileDescriptor { value: 14, is_owned: true }, has_ownership: false }, len: 1471 } }, dynamic_storage: Storage { shm: SharedMemory { name: FileName { value: FixedSizeByteString<255> { len: 94, data: "iox2_305ad9523c6b202364d581359ec3d2c5743e42e7_b7e671b10f398ee12fd67c143c8e378808d973e3.dynamic" } }, size: 38211, base_address: 0x785b3786f000, has_ownership: false, file_descriptor: FileDescriptor { value: 20, is_owned: true }, memory_lock: None }, name: FileName { value: FixedSizeByteString<255> { len: 40, data: "b7e671b10f398ee12fd67c143c8e378808d973e3" } }, _phantom_data: PhantomData<iceoryx2::service::dynamic_config::DynamicConfig> }, static_storage: Storage { name: FileName { value: FixedSizeByteString<255> { len: 40, data: "b7e671b10f398ee12fd67c143c8e378808d973e3" } }, config: Configuration { path: Path { value: FixedSizeByteString<255> { len: 22, data: "/tmp/iceoryx2/services" } }, suffix: FileName { value: FixedSizeByteString<255> { len: 8, data: ".service" } }, prefix: FileName { value: FixedSizeByteString<255> { len: 5, data: "iox2_" } } }, has_ownership: false, file: File { path: Some(FilePath { value: FixedSizeByteString<255> { len: 76, data: "/tmp/iceoryx2/services/iox2_b7e671b10f398ee12fd67c143c8e378808d973e3.service" } }), file_descriptor: FileDescriptor { value: 18, is_owned: true }, has_ownership: false }, len: 762 } }, subscriber_connections: SubscriberConnections { connections: [UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. 
}, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. 
}], port_id: UniquePublisherId(UniqueSystemId { value: 239570042580385605080812073151, pid: 3387583, creation_time: Time { clock_type: Realtime, seconds: 1739220282, nanoseconds: 102216143 } }), shared_node: SharedNode { id: NodeId(UniqueSystemId { value: 160319116027617802878482362559, pid: 3387583, creation_time: Time { clock_type: Realtime, seconds: 1739220282, nanoseconds: 100982102 } }), details: NodeDetails { executable: FileName { value: FixedSizeByteString<255> { len: 15, data: "my_program_publisher" } }, name: NodeName { value: "" }, config: Config { global: Global { root_path_unix: Path { value: FixedSizeByteString<255> { len: 14, data: "/tmp/iceoryx2/" } }, root_path_windows: Path { value: FixedSizeByteString<255> { len: 17, data: "c:\Temp\iceoryx2\" } }, prefix: FileName { value: FixedSizeByteString<255> { len: 5, data: "iox2_" } }, service: Service { directory: Path { value: FixedSizeByteString<255> { len: 8, data: "services" } }, publisher_data_segment_suffix: FileName { value: FixedSizeByteString<255> { len: 15, data: ".publisher_data" } }, static_config_storage_suffix: FileName { value: FixedSizeByteString<255> { len: 8, data: ".service" } }, dynamic_config_storage_suffix: FileName { value: FixedSizeByteString<255> { len: 8, data: ".dynamic" } }, creation_timeout: 500ms, connection_suffix: FileName { value: FixedSizeByteString<255> { len: 11, data: ".connection" } }, event_connection_suffix: FileName { value: FixedSizeByteString<255> { len: 6, data: ".event" } } }, node: Node { directory: Path { value: FixedSizeByteString<255> { len: 5, data: "nodes" } }, monitor_suffix: FileName { value: FixedSizeByteString<255> { len: 13, data: ".node_monitor" } }, static_config_suffix: FileName { value: FixedSizeByteString<255> { len: 8, data: ".details" } }, service_tag_suffix: FileName { value: FixedSizeByteString<255> { len: 12, data: ".service_tag" } }, cleanup_dead_nodes_on_creation: true, cleanup_dead_nodes_on_destruction: true } }, defaults: Defaults { publish_subscribe: PublishSubscribe { max_subscribers: 256, max_publishers: 256, max_nodes: 256, subscriber_max_buffer_size: 131072, subscriber_max_borrowed_samples: 8192, publisher_max_loaned_samples: 8192, publisher_history_size: 0, enable_safe_overflow: true, unable_to_deliver_strategy: Block, subscriber_expired_connection_buffer: 128 }, event: Event { max_listeners: 16, max_notifiers: 16, max_nodes: 36, event_id_max_value: 4294967295, deadline: None, notifier_created_event: None, notifier_dropped_event: None, notifier_dead_event: None }, request_response: RequestResonse { enable_safe_overflow_for_requests: true, enable_safe_overflow_for_responses: true, max_active_responses: 4, max_active_requests: 2, max_borrowed_responses: 4, max_borrowed_requests: 2, max_response_buffer_size: 2, max_request_buffer_size: 4, max_servers: 2, max_clients: 8, max_nodes: 20 } } } }, monitoring_token: UnsafeCell { .. }, registered_services: RegisteredServices { data: Mutex { data: {ServiceId(RestrictedFileName { value: FixedSizeByteString<64> { len: 40, data: "b7e671b10f398ee12fd67c143c8e378808d973e3" } }): (ContainerHandle { index: 3, container_id: 2 }, 1)}, poisoned: false, .. 
} }, signal_handling_mode: HandleTerminationRequests, _details_storage: Storage { name: FileName { value: FixedSizeByteString<255> { len: 4, data: "node" } }, config: Configuration { path: Path { value: FixedSizeByteString<255> { len: 50, data: "/tmp/iceoryx2/nodes/160319116027617802878482362559" } }, suffix: FileName { value: FixedSizeByteString<255> { len: 8, data: ".details" } }, prefix: FileName { value: FixedSizeByteString<255> { len: 5, data: "iox2_" } } }, has_ownership: false, file: File { path: Some(FilePath { value: FixedSizeByteString<255> { len: 68, data: "/tmp/iceoryx2/nodes/160319116027617802878482362559/iox2_node.details" } }), file_descriptor: FileDescriptor { value: 14, is_owned: true }, has_ownership: false }, len: 1471 } }, static_config: StaticConfig { max_subscribers: 256, max_publishers: 256, max_nodes: 256, history_size: 0, subscriber_max_buffer_size: 131072, subscriber_max_borrowed_samples: 8192, enable_safe_overflow: true, message_type_details: MessageTypeDetails { header: TypeDetail { variant: FixedSize, type_name: "iceoryx2::service::header::publish_subscribe::Header", size: 24, alignment: 8 }, user_header: TypeDetail { variant: FixedSize, type_name: "v", size: 0, alignment: 1 }, payload: TypeDetail { variant: FixedSize, type_name: "N6my_program18FixedSizeByteArrayE", size: 256, alignment: 1 } } }, number_of_samples: 35659776, max_number_of_segments: 1 }, subscriber_list_state: UnsafeCell { .. }, history: None, static_config: StaticConfig { service_id: ServiceId(RestrictedFileName { value: FixedSizeByteString<64> { len: 40, data: "b7e671b10f398ee12fd67c143c8e378808d973e3" } }), service_name: ServiceName { value: "Company/App/my_program" }, attributes: AttributeSet([]), messaging_pattern: PublishSubscribe(StaticConfig { max_subscribers: 256, max_publishers: 256, max_nodes: 256, history_size: 0, subscriber_max_buffer_size: 131072, subscriber_max_borrowed_samples: 8192, enable_safe_overflow: true, message_type_details: MessageTypeDetails { header: TypeDetail { variant: FixedSize, type_name: "iceoryx2::service::header::publish_subscribe::Header", size: 24, alignment: 8 }, user_header: TypeDetail { variant: FixedSize, type_name: "v", size: 0, alignment: 1 }, payload: TypeDetail { variant: FixedSize, type_name: "N6my_program18FixedSizeByteArrayE", size: 256, alignment: 1 } } }) }, loan_counter: 0, is_active: true } 
| Unable to reclaim samples from connection Connection { sender: Sender { storage: Storage { shm: SharedMemory { name: FileName { value: FixedSizeByteString<255> { len: 117, data: "iox2_b9fc73e5c1f646968758453273c6c65cb372831b_239570042580385605080812073151_92154754679238304942278291628.connection" } }, size: 37822693, base_address: 0x785645d7f000, has_ownership: false, file_descriptor: FileDescriptor { value: 19, is_owned: true }, memory_lock: None }, name: FileName { value: FixedSizeByteString<255> { len: 60, data: "239570042580385605080812073151_92154754679238304942278291628" } }, _phantom_data: PhantomData<iceoryx2_cal::zero_copy_connection::common::details::SharedManagementData> }, name: FileName { value: FixedSizeByteString<255> { len: 60, data: "239570042580385605080812073151_92154754679238304942278291628" } } }, subscriber_id: UniqueSubscriberId(UniqueSystemId { value: 92154754679238304942278291628, pid: 3387564, creation_time: Time { clock_type: Realtime, seconds: 1739220278, nanoseconds: 700751965 } }) } due to ReceiverReturnedCorruptedPointerOffset. This may lead to a situation where no more samples will be delivered to this connection.
@rbdm-qnt added the bug label Feb 12, 2025
@elfenpiff (Contributor)

@rbdm-qnt This issue should only arise when the Subscriber returns a sample twice or some kind of race condition occurs. Returning a sample twice could happen if you have some kind of lifetime issue on the subscriber side.

Please be aware that:

  • the Publisher, Subscriber, and Sample are not thread-safe
  • the publisher, subscriber, and sample must not be moved between threads. You cannot receive something in thread A and move the sample to another thread B. For such use cases you must create a subscriber per thread - but this does not affect the overall performance (see the sketch below).
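
A minimal sketch of the subscriber-per-thread pattern, assuming the iceoryx2 v0.5 C++ binding; the payload type and service name are placeholders, not your actual code:

#include "iox2/iceoryx2.hpp"

#include <cstdint>
#include <thread>
#include <vector>

struct Payload {
    uint8_t data[256];
};

void worker() {
    // Each thread builds its own node and subscriber; none of these handles
    // is ever shared with another thread.
    auto node = iox2::NodeBuilder().create<iox2::ServiceType::Ipc>().expect("node");
    auto service = node.service_builder(iox2::ServiceName::create("Company/App/my_program").expect("name"))
                       .publish_subscribe<Payload>()
                       .open_or_create()
                       .expect("service");
    auto subscriber = service.subscriber_builder().create().expect("subscriber");

    while (true) {
        // The sample is received, used, and dropped entirely inside this thread.
        auto sample = subscriber.receive().expect("receive");
        if (sample.has_value()) {
            // process sample->payload() here
        }
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 2; ++i) {
        threads.emplace_back(worker);
    }
    for (auto& t : threads) {
        t.join();
    }
}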

Could you provide us with a code snippet of the subscriber receive path that covers the whole lifetime of a received sample (from receive until it goes out of scope) and give us a bit more context, so that I might be able to reproduce it locally?

Also, could you please attach the logfile to the issue - not as copy and paste (GitHub seems to cut out parts). If it's too large, maybe the last 100 lines.

@rbdm-qnt (Author)

This is the method we use to receive data in the Sub. The sub is single-threaded and has only one instance of the iceoryx2 Bus. The callback reads the data and appends it to a file to save it to disk. The sub never gets destroyed; the application runs 24/7 with no interruption, and the process and thread are never restarted. The processing of the sample is completely synchronous, with no other threads active.

const iox::string<256> Bus::kReceiveSucceeds = "receive succeeds";

void Bus::Sub::receive(std::function<void(const FixedSizeByteArray&)> cb) {
  // receive() yields an optional sample; an empty optional means no data was available
  auto sample = subscriber_->receive().expect(kReceiveSucceeds);
  if (sample.has_value()) {
    // the sample is released when it goes out of scope at the end of this function
    cb(sample->payload());
  }
}


@elBoberido (Member)

@rbdm-qnt just to be clear. You do not experience (visible) communication issues. It's just the log that indicates that there might be issues, right?

@rbdm-qnt (Author)

> @rbdm-qnt just to be clear. You do not experience (visible) communication issues. It's just the log that indicates that there might be issues, right?

Not that I'm aware of. From the error message it doesn't seem like a sample is lost (although this is almost impossible to verify on our side), just that one of the preallocated chunks got corrupted, so I don't know whether this will cause a crash down the line after the application runs for many days. I've noticed that this issue happens about twice a day, each time writing the two lines (that I posted at the start of this issue) that weigh about 50 MB in the log files: it basically prints my entire preallocated memory chunk as all zeros. This seems to happen at a similar frequency to the crashes we used to get with classic iceoryx. The error messages were similar; I can't find one now, but it was something along the lines of POPO_CHUNK_INVALID_CHUNK, and it was fatal, so this is a big upgrade either way.

Would it improve things if I changed my design to something like the following, to immediately free the sample before calling my callback? This would minimise the time we "hold on" to the sample, and make it constant. We'd rather not copy those 256 bytes one extra time, but if it guarantees stability we can keep it this way temporarily.

void Bus::Sub::receive(std::function<void(const FixedSizeByteArray&)> cb, FixedSizeByteArray& buffer) {
  auto sample = subscriber_->receive().expect(kReceiveSucceeds);
  if (!sample.has_value())
    return;

  // copy the payload out and release the sample before invoking the callback
  std::memcpy(buffer.data(), sample->payload().data(), FixedSizeByteArraySize);
  sample.reset();
  cb(buffer);
}

@elfenpiff (Contributor)

@rbdm-qnt I think we are one step closer to a possible solution. I could reproduce your bug, but only in three misuse scenarios:

  1. Calling Publisher::send_copy() on the same publisher from different threads
  2. Calling Publisher::loan() or Publisher::loan_uninit() on the same publisher from different threads
  3. Calling send(std::move(sample)) in a different thread than the one where Publisher::loan() was called to get the sample.

Could you take a look at the publisher side to see whether there is any instance where you might access the publisher by accident from multiple threads? Or maybe you can share a piece of code here.

Also, to point it out: in classic iceoryx and iceoryx2 you are not allowed to loan a sample in one thread and send or access it in another thread. send(std::move(sample)) must be called from the same thread where it was loaned (see the sketch below).
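
For illustration, a minimal sketch of the allowed pattern - loan, write, and send all on the same thread, with one publisher per thread. This assumes the v0.5 C++ binding; the payload type and service name are placeholders:

#include "iox2/iceoryx2.hpp"

#include <cstdint>
#include <utility>

struct Payload {
    uint8_t data[256];
};

void publisher_thread() {
    auto node = iox2::NodeBuilder().create<iox2::ServiceType::Ipc>().expect("node");
    auto service = node.service_builder(iox2::ServiceName::create("Company/App/my_program").expect("name"))
                       .publish_subscribe<Payload>()
                       .open_or_create()
                       .expect("service");
    // One publisher per thread; it is never handed to another thread.
    auto publisher = service.publisher_builder().create().expect("publisher");

    auto sample = publisher.loan_uninit().expect("loan");   // loaned on this thread
    auto initialized = sample.write_payload(Payload{});     // filled on this thread
    send(std::move(initialized)).expect("send");            // sent on the same thread
}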

@rbdm-qnt (Author)

> @rbdm-qnt I think we are one step closer to a possible solution. I could reproduce your bug, but only in three misuse scenarios:
>
>   1. Calling Publisher::send_copy() on the same publisher from different threads
>   2. Calling Publisher::loan() or Publisher::loan_uninit() on the same publisher from different threads
>   3. Calling send(std::move(sample)) in a different thread than the one where Publisher::loan() was called to get the sample.
>
> Could you take a look at the publisher side to see whether there is any instance where you might access the publisher by accident from multiple threads? Or maybe you can share a piece of code here.
>
> Also, to point it out: in classic iceoryx and iceoryx2 you are not allowed to loan a sample in one thread and send or access it in another thread. send(std::move(sample)) must be called from the same thread where it was loaned.

Ok, then we'll do an in-depth code review of how we use our publishers, do some tests, and report back in a week or so. So you confirm that the change I proposed to my "receive" method in my previous comment is unnecessary, right?

@elfenpiff (Contributor)

@rbdm-qnt You are already using the subscriber correctly, so there is no need for this change.

> Ok, then we'll do an in-depth code review of how we use our publishers, do some tests, and report back in a week or so.

If we can support you, please let us know! I suspect there is a concurrency issue in the publisher usage, since you seem to have had the same issue with classic iceoryx, which has the same restriction.

@elBoberido (Member)

@rbdm-qnt you mentioned that you have the issue twice a day, each time with two log entries. Does it happen more or less periodically? I just want to rule out that there is an overflow somewhere or something similar; e.g., for twice a day, a 32-bit integer would overflow if the publishing rate were about 100 kHz.
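
(For reference: 2^32 ≈ 4.29 × 10^9, so at 100 kHz a 32-bit counter wraps after roughly 4.29 × 10^9 / 10^5 = 42 950 s ≈ 12 h, i.e. about twice a day.)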

@rbdm-qnt (Author)

Thank you for your availability! Will keep you posted after we review everything.

It doesn't happen exactly periodically. I've seen it happen once every 20-28 hours, and I've seen it happen every 4-10 hours. But the field of application is finance, where the message rate has a ton of variance, so I wouldn't rule this theory out completely. For reference, we are dealing with 2-5 billion messages per day, on average.

@rbdm-qnt (Author)

Update: it looks like we had an edge case where we did use a publisher in two threads; we fixed it and the issue seems to have disappeared. Thanks! On the flip side, this happened today on one of the servers:

[New Thread 0x7ffeffe00640 (LWP 3277839)]
        0 [W] "Config::global_config()"
              | No config file was loaded, a config with default values will be used.

Thread 1 "my_program" received signal SIGBUS, Bus error.
0x00005555557a4511 in <iceoryx2_cal::dynamic_storage::posix_shared_memory::Builder<T> as iceoryx2_cal::dynamic_storage::DynamicStorageBuilder<T,iceoryx2_cal::dynamic_storage::posix_shared_memory::Storage<T>>>::create ()

So, the message about "Config::global_config()" always appears on startup. I haven't figured out how to make it load a config, but I think the default is fine; we set our desired settings via compile flags.

The problem is that the program crashed due to the second part of the message, every attempt to restart the process failed, and it was only solved by a reboot of the server. I have no idea what it could be related to; nothing out of the ordinary happened, and RAM, CPU and disk space were all fine.

@rbdm-qnt (Author)

Pinging @elfenpiff @elBoberido

@rbdm-qnt (Author)

rbdm-qnt commented Mar 1, 2025

Getting "Bus error." again every few days. The program crashes with this error, every restart fails with the same error, and it can only be solved by a system reboot. @elBoberido @elfenpiff

@elfenpiff (Contributor)

@rbdm-qnt

> So, the message about "Config::global_config()" always appears on startup. I haven't figured out how to make it load a config, but I think the default is fine; we set our desired settings via compile flags.

We fixed this in main and in the upcoming v0.6 release. Up to v0.5, iceoryx2 expects the config file under $PWD/config/iceoryx2.toml and, if it's not present, loads the default values.

With v0.6, iceoryx2 looks it up under:

  1. $PWD/config/iceoryx2.toml
  2. $HOME/.config/iceoryx2/iceoryx2.toml
  3. /etc/iceoryx2/iceoryx2.toml

See the documentation in: https://github.com/eclipse-iceoryx/iceoryx2/tree/main/config

> The problem is that the program crashed due to the second part of the message, every attempt to restart the process failed, and it was only solved by a reboot of the server. I have no idea what it could be related to; nothing out of the ordinary happened, and RAM, CPU and disk space were all fine.

The error looks familiar. I think in classic iceoryx we had this problem when the user tried to acquire more memory than the system provided, and then the shm_open or memset call failed with this errno. It seems that this could be the case here as well. Could you set the log level to trace and attach the logfile that leads up to this error?

Based on this, we could find the exact location and provide you a more helpful error message.

Also take a look at this: https://github.com/eclipse-iceoryx/iceoryx2/blob/main/FAQ.md#run-out-of-memory-when-creating-publisher-with-a-large-service-payload
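
Related to that FAQ entry: the data segment in your log above is about 10 GB (subscriber_max_buffer_size: 131072, max_loaned_samples: 8192, 256 publishers/subscribers). A hedged sketch of shrinking the service limits at creation time - the setter names assume the v0.5 C++ service builder, and the numbers are placeholders:

#include "iox2/iceoryx2.hpp"

#include <cstdint>

struct Payload {
    uint8_t data[256];
};

int main() {
    auto node = iox2::NodeBuilder().create<iox2::ServiceType::Ipc>().expect("node");
    // Smaller limits -> smaller shared-memory data segment.
    auto service = node.service_builder(iox2::ServiceName::create("Company/App/my_program").expect("name"))
                       .publish_subscribe<Payload>()
                       .max_publishers(8)                // the log shows 256
                       .max_subscribers(8)               // the log shows 256
                       .subscriber_max_buffer_size(1024) // the log shows 131072
                       .open_or_create()
                       .expect("service");
}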
