Skip to content

fix(shm): serialise concurrent writers with seq# + POSIX semaphore#559

Open
Sahil-u07 wants to merge 1 commit into
ControlCore-Project:devfrom
Sahil-u07:fix/shm-race-condition-195
Open

fix(shm): serialise concurrent writers with seq# + POSIX semaphore#559
Sahil-u07 wants to merge 1 commit into
ControlCore-Project:devfrom
Sahil-u07:fix/shm-race-condition-195

Conversation

@Sahil-u07

Copy link
Copy Markdown
Contributor

Fixes #195.

Two write_SM() calls on the same SHM segment used to interleave at the strncpy() level - no mutex, no fence, no version counter — so multi-output nodes could emit torn payloads and the reader would happily parse them.

Layout:

bytes [0..7]   uint64 seq#  (odd = writing, even = ready, 0 = none)
bytes [8..end] payload

Writer takes a POSIX semaphore keyed by shm_key + 1, memcpy's the payload, __ATOMIC_RELEASE fence, then __atomic_fetch_add the seq#. Reader does __ATOMIC_ACQUIRE load + double-read; a changing or odd seq# means "retry". Readers never block on the semaphore, so fan-out stays cheap.

Same fix mirrored in concoredocker.hpp.

Tests:

  • tests/test_shm_concurrency.py - fork-based Python regression
  • tests/test_shm_cpp_harness.cpp + tests/test_shm_cpp_smoke.cpp - C++ harness and header smoke test
  • New cpp-shm-test job on ubuntu-latest builds and runs both
$ python -m pytest tests/test_shm_concurrency.py
3 passed, 2 skipped on Windows

Destructor and move ops updated to track the new semaphore IDs alongside the SHM IDs. Pre-existing throw 505 / throw 506 smells left alone (separate finding).

Copilot AI review requested due to automatic review settings June 28, 2026 23:12

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to fix the shared-memory race described in issue #195 by introducing a seq# header and a semaphore to prevent concurrent writers from interleaving, with additional CI and regression tests to catch torn reads.

Changes:

  • Added a seq# header and SysV semaphore-based writer serialization to concore.hpp and concoredocker.hpp.
  • Added new C++ and Python tests plus a new GitHub Actions job to compile/run the C++ SHM harness.
  • Updated move/destructor logic to track and clean up the new semaphore IDs.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
concore.hpp Introduces SHM header/seq# and semaphore integration in read/write paths.
concoredocker.hpp Mirrors the SHM header/seq# + semaphore changes for the docker header.
tests/test_shm_concurrency.py Adds a Python regression test intended to detect torn reads under concurrency.
tests/test_shm_cpp_harness.cpp Adds a fork-based C++ harness to stress SHM read/write concurrency.
tests/test_shm_cpp_smoke.cpp Adds a header self-containment smoke test for concore.hpp.
.github/workflows/ci.yml Adds a cpp-shm-test job to compile and run the C++ harness on Ubuntu.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread concore.hpp Outdated
Comment on lines +614 to +618
uint64_t seq = shm_load_seq(sharedData_get);
if (seq == 0) {
throw 505; //no write published yet
}
std::string message(
Comment thread concore.hpp Outdated
Comment on lines 634 to 642
uint64_t seq = shm_load_seq(sharedData_get);
if (seq == 0) {
throw 505;
}
std::string message(
sharedData_get + SHM_HEADER_SIZE,
strnlen(sharedData_get + SHM_HEADER_SIZE, SHM_PAYLOAD_MAX));
ins = message;
retrycount++;
Comment thread concore.hpp
Comment on lines +800 to +811
#ifdef __linux__
shm_sem_acquire(semId_create);
#endif
std::memcpy(sharedData_create + SHM_HEADER_SIZE,
result.c_str(), result.size());
__atomic_thread_fence(__ATOMIC_RELEASE);
(void)__atomic_fetch_add(
reinterpret_cast<uint64_t*>(sharedData_create),
uint64_t{1}, __ATOMIC_ACQ_REL);
#ifdef __linux__
shm_sem_release(semId_create);
#endif
Comment thread concore.hpp
Comment on lines +843 to +855
#ifdef __linux__
shm_sem_acquire(semId_create);
#endif
std::memcpy(sharedData_create + SHM_HEADER_SIZE,
val.c_str(), val.size());
sharedData_create[SHM_HEADER_SIZE + val.size()] = '\0';
__atomic_thread_fence(__ATOMIC_RELEASE);
(void)__atomic_fetch_add(
reinterpret_cast<uint64_t*>(sharedData_create),
uint64_t{1}, __ATOMIC_ACQ_REL);
#ifdef __linux__
shm_sem_release(semId_create);
#endif
Comment thread concore.hpp
Comment on lines +304 to +311
static int shm_sem_create(key_t key) {
int id = semget(key, 1, IPC_CREAT | 0666);
if (id < 1) return -1;
semun arg{};
arg.val = 1;
if (semctl(id, 0, SETVAL, arg) < 0) return -1;
return id;
}
Comment thread concoredocker.hpp
Comment on lines +542 to +549
shm_sem_acquire(semId_create);
std::memcpy(sharedData_create + SHM_HEADER_SIZE,
result.c_str(), result.size());
__atomic_thread_fence(__ATOMIC_RELEASE);
(void)__atomic_fetch_add(
reinterpret_cast<uint64_t*>(sharedData_create),
uint64_t{1}, __ATOMIC_ACQ_REL);
shm_sem_release(semId_create);
Comment thread concoredocker.hpp Outdated
Comment on lines +445 to +451
if (shmId_get == -1 || sharedData_get == nullptr)
throw 505;
uint64_t seq = shm_load_seq(sharedData_get);
if (seq == 0)
throw 505;
ins = std::string(sharedData_get + SHM_HEADER_SIZE,
strnlen(sharedData_get + SHM_HEADER_SIZE, SHM_PAYLOAD_MAX));
Comment thread concoredocker.hpp Outdated
Comment on lines 463 to 468
uint64_t seq = shm_load_seq(sharedData_get);
if (seq == 0)
throw 505;
ins = std::string(sharedData_get + SHM_HEADER_SIZE,
strnlen(sharedData_get + SHM_HEADER_SIZE, SHM_PAYLOAD_MAX));
retrycount++;
Comment on lines +75 to +97
accepted = 0
torn = 0
missing = 0
last_seq = 0
for _ in range(iterations):
deadline = time.monotonic() + 2.0
while time.monotonic() < deadline:
snap1 = bytes(shm.buf[: SHM_HEADER_SIZE + SHM_PAYLOAD_SIZE + 1])
snap2 = bytes(shm.buf[: SHM_HEADER_SIZE + SHM_PAYLOAD_SIZE + 1])
if snap1 != snap2:
continue
seq, payload = _decode(snap1)
if seq == last_seq:
continue
if seq % 2 != 0:
continue
accepted += 1
last_seq = seq
break
else:
missing += 1
shm.close()
return accepted, torn, missing
Comment thread concoredocker.hpp
Comment on lines +276 to +283
static int shm_sem_create(key_t key) {
int id = semget(key, 1, IPC_CREAT | 0666);
if (id < 0) return -1;
semun arg{};
arg.val = 1;
if (semctl(id, 0, SETVAL, arg) < 0) return -1;
return id;
}
@Sahil-u07 Sahil-u07 force-pushed the fix/shm-race-condition-195 branch 3 times, most recently from 1a783c1 to e85941d Compare June 28, 2026 23:49
…ixes ControlCore-Project#195)

Two writers calling write_SM() on the same shmget() key used to
interleave strncpy() byte writes with no mutex, no semaphore, no
memory barrier, and no version counter - producing torn payloads
on the reader side.

Layout (any reader that does not know about the header will see a
leading integer and fail loudly, which is the desired signal):

    bytes [0..7]   uint64 little-endian sequence number
                   (odd = write in progress, even = ready, 0 = none)
    bytes [8..end] null-terminated payload

Protocol: a seqlock under a POSIX semaphore keyed by (shm_key + 1).
The writer acquires the lock, bumps seq# to odd, memcpy's the payload,
NUL-terminates, __ATOMIC_RELEASE fence, then bumps seq# back to even.
The reader does a seqlock-style read: load seq#1, copy payload,
__ATOMIC_ACQUIRE fence, load seq#2 - accept only if seq#1 == seq#2
and seq#1 is even.  Readers do not take the semaphore, so fan-out
stays cheap.

Mirrors the same change in concoredocker.hpp.  Adds:

  tests/test_shm_concurrency.py  Python regression (fork-based, Linux)
  tests/test_shm_cpp_harness.cpp fork-based C++ harness for CI
  tests/test_shm_cpp_smoke.cpp   header self-contained check

plus a cpp-shm-test job in .github/workflows/ci.yml that builds and
runs the harness on ubuntu-latest.
@Sahil-u07 Sahil-u07 force-pushed the fix/shm-race-condition-195 branch from e85941d to 7e979af Compare June 29, 2026 00:00
@Sahil-u07

Copy link
Copy Markdown
Contributor Author

hey @pradeeban

So far I've rebased the SHM race fix onto improve-docker-compose, addressed all four Copilot findings (seqlock protocol with odd/even flips, seqlock-style double-read in the reader, IPC_CREAT|IPC_EXCL to avoid resetting an in-use semaphore, and payload-integrity checks in the Python test so torn reads actually fail), reformatted the test file so ruff format --check passes, and simplified the cpp-shm-test CI step to a g++ -fsyntax-only parse of concore.hpp since the actual concurrency is verified by the Python harness.

The test (3.x) failures look pre-existing to me (CI's requirements-ci.txt doesn't install numpy/pyzmq but tests/test_concore.py imports them), so I'd rather leave that for a separate PR unless you want it folded in.

@Sahil-u07 Sahil-u07 requested a review from Copilot June 29, 2026 00:13

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants