experchange > cpp experimental multi-mutex algorithm...

 Chris M. Thomasson (11-26-18, 08:19 AM)
This code allows a thread to dynamically lock multiple addresses in a
totally atomic fashion, and free of deadlock. It is based on hashing
pointers into a table of locks. A thread can create a little array of
hashed pointers, sort them, remove duplicates, then take all of the
locks in the table. The sorting and uniqueness factor prevents deadlock.
_________________
// pseudo-code

mtx_table lock_table;

// some vars
int a = 3;
short b = 6;
double x = 1.234;

{
    locker locks(lock_table); // our hashed locks
    locks.push(&a);
    locks.push(&x);
    locks.push(&b);

    locks.lock();
    // a, x, and b are all locked!
    locks.unlock();
}
_________________

Here is my code, without using threads for now. Take a look at the main
function for a real example on how to use it. The code is crude, please
try to take some pity...
_____________________________
// Experimental Multi-Mutex Algorithm By Chris M. Thomasson

#include <iostream>
#include <functional>
#include <algorithm>
#include <mutex>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Allows one to lock multiple addresses
// at once, and never hit a deadlock.
// It is an experiment.
namespace ct_multex
{
    // A table of locks
    struct mutex_table
    {
        std::vector<std::mutex> m_locks;

        mutex_table(std::size_t size) : m_locks(size) { assert(size > 0); }

        // totally contrived simple hash...
        std::size_t hash(void const* ptr)
        {
            std::uintptr_t ptr_uint = (std::uintptr_t)ptr;
            return (std::size_t)(((ptr_uint << 9) * 103) % m_locks.size());
        }
    };

    // A thread's local indices into a table of locks
    struct local_locks
    {
        std::vector<std::size_t> m_lock_idxs;
        mutex_table& m_mtx_tbl;

        local_locks(mutex_table& mtx_tbl) :
            m_lock_idxs(), m_mtx_tbl(mtx_tbl) {}

        void push_ptr(void const* ptr)
        {
            std::size_t idx = m_mtx_tbl.hash(ptr);
            m_lock_idxs.push_back(idx);
        }

        void ensure_locking_order()
        {
            // sort and remove duplicates
            std::sort(m_lock_idxs.begin(), m_lock_idxs.end());
            m_lock_idxs.erase(std::unique(m_lock_idxs.begin(),
                m_lock_idxs.end()), m_lock_idxs.end());
        }

        // Take all of the locks
        void lock()
        {
            // there can be a flag to minimize this...
            ensure_locking_order();

            std::size_t n = m_lock_idxs.size();

            for (std::size_t i = 0; i < n; ++i)
            {
                m_mtx_tbl.m_locks[m_lock_idxs[i]].lock();
            }
        }

        // Unlock everything, in reverse order
        void unlock()
        {
            std::size_t n = m_lock_idxs.size();

            for (std::size_t i = 0; i < n; ++i)
            {
                m_mtx_tbl.m_locks[m_lock_idxs[n - i - 1]].unlock();
            }
        }
    };

    // RAII scoped lock: Allows a thread to actually take the locks
    // It locks a thread's local lock indices
    struct scoped_lock
    {
        local_locks& m_locks;

        scoped_lock(local_locks& locks) : m_locks(locks)
        {
            m_locks.lock();
        }

        ~scoped_lock() noexcept
        {
            m_locks.unlock();
        }
    };
}

int main(void)
{
    {
        // Our lock table
        ct_multex::mutex_table mtx_tbl(42);

        int data_0 = 123;
        long data_1 = 456;
        short data_2 = 789;

        // Usage...
        {
            ct_multex::local_locks locks(mtx_tbl);

            locks.push_ptr(&data_1);
            locks.push_ptr(&data_0);

            {
                ct_multex::scoped_lock slock(locks);

                // data_1 and data_0 are locked!
            }

            // unlocked... Add data_2 to the locks...
            locks.push_ptr(&data_2);

            {
                ct_multex::scoped_lock slock(locks);

                // Now, data_1, data_0 and data_1 are locked!
            }
        }
    }

    return 0;
}
_____________________________

 Chris M. Thomasson (11-26-18, 08:36 AM)
On 11/25/2018 10:19 PM, Chris M. Thomasson wrote:
[..]
>             {
>                 ct_multex::scoped_lock slock(locks);
>                 // Now, data_1, data_0 and data_1 are locked!

Error in a comment. Funny.

I meant: // Now, data_1, data_0 and data_2 are locked!

Sorry about that. Fwiw, there are three addresses to be locked here,
data_[0...2]. However, there might be fewer locks in a thread's local
lock vector because of hash collisions; everything is sorted and
duplicates are always removed.

Say data_0 hashed to index 41, data_1 went to 3, and data_2 went back
to 41. Humm... We have

41, 3, 41

After a call to my ct_multex::local_locks::ensure_locking_order... It
looks like:

3, 41

So, only two locks for three addresses. The hash function can be a lot
better. ;^o
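That sort-then-unique step can be sketched on its own; here is a minimal stand-alone version (the `dedup_indices` helper name is made up for illustration), doing exactly what ct_multex::local_locks::ensure_locking_order does:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sort the hashed lock indices and strip duplicates, so every thread
// acquires its locks in one global order with no index taken twice.
std::vector<std::size_t> dedup_indices(std::vector<std::size_t> idxs)
{
    std::sort(idxs.begin(), idxs.end());
    idxs.erase(std::unique(idxs.begin(), idxs.end()), idxs.end());
    return idxs;
}
```

Feeding it the collision example above, {41, 3, 41} comes back as {3, 41}: two locks for three addresses.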

[...]
 Chris M. Thomasson (11-27-18, 07:59 AM)
On 11/25/2018 10:19 PM, Chris M. Thomasson wrote:
> This code allows a thread to dynamically lock multiple addresses in a
> totally atomic fashion, and free of deadlock. It is based on hashing
> pointers into a table of locks. A thread can create a little array of
> hashed pointers, sort them, remove duplicates, then take all of the
> locks in the table. The sorting and uniqueness factor prevents deadlock. [...]

Hey now, this crude example code actually uses threads. ;^)

The sorted locking order works like a dynamic lock hierarchy. It can be
improved upon for sure. I am calling it the Multex.

Also, keep in mind that the locks are distinct from the memory pointed
to by the hashed pointers. So, this can be used to simulate atomic
operations on a system that only has semaphores. It easily solves the
Dining Philosophers problem.
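For instance, a plain variable gets atomic-style read-modify-write just by hashing its address into the table. A minimal sketch (the `lock_table` and `fetch_add_sim` names here are made up, not part of the posted code):

```cpp
#include <cstdint>
#include <mutex>
#include <vector>

// A tiny table of locks, same hashing idea as in the post.
struct lock_table
{
    std::vector<std::mutex> m_locks;
    lock_table(std::size_t n) : m_locks(n) {}

    std::mutex& lock_for(void const* ptr)
    {
        std::uintptr_t u = (std::uintptr_t)ptr;
        return m_locks[((u << 9) * 103) % m_locks.size()];
    }
};

// Simulate atomic fetch-and-add on a plain long using only the mutex
// hashed from the variable's address: lock, read-modify-write, unlock.
long fetch_add_sim(lock_table& tbl, long& var, long addend)
{
    std::lock_guard<std::mutex> guard(tbl.lock_for(&var));
    long old = var;
    var += addend;
    return old;
}
```

Any two threads touching the same address hash to the same mutex, so the update is mutually exclusive without the variable ever owning a lock of its own.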

The main and ct_thread functions show an example on how to use it:
____________________________
/*
The Multex, simple deadlock free locking abstraction
By Chris M. Thomasson
____________________________________________________________*/

#include <iostream>
#include <functional>
#include <algorithm>
#include <thread>
#include <mutex>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#define N 123456
#define THREADS 4 // number of worker threads

// Allows one to lock multiple addresses
// at once, and never hit a deadlock.
// It is an experiment.
namespace ct_multex
{
    // A table of locks
    struct mutex_table
    {
        std::vector<std::mutex> m_locks;

        mutex_table(std::size_t size) : m_locks(size) { assert(size > 0); }

        // totally contrived simple hash...
        std::size_t hash(void const* ptr)
        {
            std::uintptr_t ptr_uint = (std::uintptr_t)ptr;
            return (std::size_t)(((ptr_uint << 9) * 103) % m_locks.size());
        }
    };

    // A thread's local indices into a table of locks
    struct local_locks
    {
        std::vector<std::size_t> m_lock_idxs;
        mutex_table& m_mtx_tbl;

        local_locks(mutex_table& mtx_tbl) :
            m_lock_idxs(), m_mtx_tbl(mtx_tbl) {}

        void push_ptr(void const* ptr)
        {
            std::size_t idx = m_mtx_tbl.hash(ptr);
            m_lock_idxs.push_back(idx);
        }

        void ensure_locking_order()
        {
            // sort and remove duplicates...
            std::sort(m_lock_idxs.begin(), m_lock_idxs.end());
            m_lock_idxs.erase(std::unique(m_lock_idxs.begin(),
                m_lock_idxs.end()), m_lock_idxs.end());
        }

        // Take all of the locks
        void lock()
        {
            // there can be a flag to minimize this...
            ensure_locking_order();

            std::size_t n = m_lock_idxs.size();

            for (std::size_t i = 0; i < n; ++i)
            {
                m_mtx_tbl.m_locks[m_lock_idxs[i]].lock();
            }
        }

        // Unlock everything, in reverse order
        void unlock()
        {
            std::size_t n = m_lock_idxs.size();

            for (std::size_t i = 0; i < n; ++i)
            {
                m_mtx_tbl.m_locks[m_lock_idxs[n - i - 1]].unlock();
            }
        }
    };

    // RAII scoped lock: Allows a thread to actually take the locks
    // It locks a thread's local lock indices
    struct scoped_lock
    {
        local_locks& m_locks;

        scoped_lock(local_locks& locks) : m_locks(locks)
        {
            m_locks.lock();
        }

        ~scoped_lock() noexcept
        {
            m_locks.unlock();
        }
    };
}

// Test program...
//______________________________________

// Shared data
struct ct_shared
{
    ct_multex::mutex_table& m_multex;

    ct_shared(ct_multex::mutex_table& multex)
        : m_multex(multex), data_0(1), data_1(2), data_2(3), data_3(4) { }

    unsigned long data_0;
    unsigned long data_1;
    unsigned long data_2;
    unsigned long data_3;
};

// A worker thread: adds and subtracts equal amounts under the locks
void ct_thread(ct_shared& shared)
{
    // Create our local locks
    ct_multex::local_locks locks(shared.m_multex);

    locks.push_ptr(&shared.data_2);
    locks.push_ptr(&shared.data_0);

    // Do some work
    for (unsigned long i = 0; i < N / 2; ++i)
    {
        {
            ct_multex::scoped_lock slock(locks);
            // locked for data_0 and data_2
            shared.data_0 += i;
            shared.data_2 += i;
        }

        {
            ct_multex::scoped_lock slock(locks);
            // locked for data_0 and data_2
            shared.data_0 -= i;
            shared.data_2 -= i;
        }
    }

    locks.push_ptr(&shared.data_1);
    locks.push_ptr(&shared.data_3);

    // Do some more work...
    for (unsigned long i = 0; i < N / 2; ++i)
    {
        {
            ct_multex::scoped_lock slock(locks);
            // locked for data_0, data_1, data_2 and data_3
            shared.data_0 += i;
            shared.data_1 += i;
            shared.data_2 += i;
            shared.data_3 += i;
        }

        {
            ct_multex::scoped_lock slock(locks);
            // locked for data_0, data_1, data_2 and data_3
            shared.data_0 -= i;
            shared.data_1 -= i;
            shared.data_2 -= i;
            shared.data_3 -= i;
        }
    }
}

int main(void)
{
    {
        // Our mutex table
        ct_multex::mutex_table multex_tbl(42);

        // Our shared data
        ct_shared shared(multex_tbl);

        // Launch...
        {
            std::thread threads[THREADS];

            for (unsigned long i = 0; i < THREADS; ++i)
            {
                threads[i] = std::thread(ct_thread, std::ref(shared));
            }

            std::cout << "processing...\n\n";
            std::cout.flush();

            for (unsigned long i = 0; i < THREADS; ++i)
            {
                threads[i].join();
            }
        }

        // Verify the shared data...
        std::cout << "shared.data_0 = " << shared.data_0 << "\n";
        std::cout << "shared.data_1 = " << shared.data_1 << "\n";
        std::cout << "shared.data_2 = " << shared.data_2 << "\n";
        std::cout << "shared.data_3 = " << shared.data_3 << "\n";

        assert(shared.data_0 == 1);
        assert(shared.data_1 == 2);
        assert(shared.data_2 == 3);
        assert(shared.data_3 == 4);
    }

    std::cout << "\n\nfin!\n\n";

    return 0;
}
____________________________

Can you get it to run?

thanks.
 Melzzzzz (11-27-18, 09:24 AM)
On 2018-11-27, Chris M. Thomasson <invalid_chris_thomasson> wrote:
> Can you get it to run?
> thanks.

~/News >>> g++ -std=c++17 -O3 -march=native -pthread mmtex.cpp -o mmtex 2> errors
~/News >>> vim errors
~/News >>> ./mmtex
processing...

shared.data_0 = 1
shared.data_1 = 2
shared.data_2 = 3
shared.data_3 = 4

fin!
 Chris M. Thomasson (11-27-18, 09:25 AM)
On 11/26/2018 9:59 PM, Chris M. Thomasson wrote:
[..]
>                 m_mtx_tbl.m_locks[m_lock_idxs[i]].lock();
>             }
>         } [...]

Fwiw, there is another way to take the locks, using a try_lock and
backoff scheme, something like:
_____________________________
// Take all of the locks
void lock() // optimistic
{
    // there can be a flag to minimize this...
    ensure_locking_order();

    std::size_t n = m_lock_idxs.size();

    for (std::size_t i = 0; i < n; ++i)
    {
        if (! m_mtx_tbl.m_locks[m_lock_idxs[i]].try_lock())
        {
            // release everything taken so far, in reverse...
            for (std::size_t u = 0; u < i; ++u)
            {
                m_mtx_tbl.m_locks[m_lock_idxs[i - u - 1]].unlock();
            }

            // backoff algorithm, or whatever...

            // Try again... (wraps around to zero on the ++i)
            i = SIZE_MAX;

            continue;
        }
    }
}
_____________________________

Remember to include <cstdint> for SIZE_MAX. The try_lock version does
not even need to be sorted, just unique.
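Pulled out of the class, the same try-and-back-off idea looks like this as free functions (the `lock_all_optimistic`/`unlock_all` names are made up for this sketch):

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Optimistic multi-lock: try each mutex; on any failure release
// everything acquired so far and start over from scratch. Because we
// never hold locks while blocking on another, the index set only needs
// to be duplicate-free, not sorted.
void lock_all_optimistic(std::vector<std::mutex>& tbl,
                         std::vector<std::size_t> const& idxs)
{
    for (std::size_t i = 0; i < idxs.size(); ++i)
    {
        if (!tbl[idxs[i]].try_lock())
        {
            for (std::size_t u = i; u > 0; --u)
                tbl[idxs[u - 1]].unlock();
            // back off here (yield, sleep, ...), then retry
            i = (std::size_t)-1; // wraps to 0 on the ++i
        }
    }
}

void unlock_all(std::vector<std::mutex>& tbl,
                std::vector<std::size_t> const& idxs)
{
    for (std::size_t i = idxs.size(); i > 0; --i)
        tbl[idxs[i - 1]].unlock();
}
```

In a single-threaded run every try_lock succeeds on the first pass; under contention the full release avoids the hold-and-wait condition, which is what rules out deadlock here instead of the global sort order.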
 Chris M. Thomasson (11-27-18, 10:56 AM)
On 11/26/2018 11:24 PM, Melzzzzz wrote:
> On 2018-11-27, Chris M. Thomasson <invalid_chris_thomasson> wrote:
> ~/News >>> g++ -std=c++17 -O3 -march=native -pthread mmtex.cpp -o mmtex 2> errors
> ~/News >>> vim errors
> ~/News >>> ./mmtex
> processing...
> shared.data_0 = 1
> shared.data_1 = 2
> shared.data_2 = 3
> shared.data_3 = 4
> fin!

Thanks for trying it out Melzzzzz. :^)

The basic algorithm can be useful for many things involving locking.
There is another more "optimistic" version I am tinkering around with
that can be used for n-address based software transactional memory.

One note, this is not recursive!
 Chris M. Thomasson (11-29-18, 10:55 PM)
On 11/25/2018 10:19 PM, Chris M. Thomasson wrote:
> This code allows a thread to dynamically lock multiple addresses in a
> totally atomic fashion, and free of deadlock. It is based on hashing
> pointers into a table of locks. A thread can create a little array of
> hashed pointers, sort them, remove duplicates, then take all of the
> locks in the table. The sorting and uniqueness factor prevents deadlock. [...]

This allows one to lock addresses without having to create a mutex for
each one. Just pass the address of a variable to a thread local lock
set, lock it, and the variable is now locked wrt mutual exclusion.
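A generic helper makes that one-liner concrete; the sketch below assumes a fixed global table and invents a `with_addr_lock` wrapper (neither is in the posted code), but the mechanism is the same hashed-address lookup:

```cpp
#include <array>
#include <cstdint>
#include <mutex>

// One global table of plain mutexes; any address maps to one of them.
inline std::array<std::mutex, 64> g_addr_locks;

inline std::mutex& addr_lock(void const* p)
{
    std::uintptr_t u = (std::uintptr_t)p;
    return g_addr_locks[((u << 9) * 103) % g_addr_locks.size()];
}

// Run f on var while the hashed lock for &var is held: the variable is
// "locked" without ever declaring a mutex for it.
template <class T, class F>
void with_addr_lock(T& var, F f)
{
    std::lock_guard<std::mutex> guard(addr_lock(&var));
    f(var);
}
```

Usage is just `with_addr_lock(x, [](int& v){ v += 41; });` for any variable x, of any type, anywhere in the program. (Requires C++17 for the inline variable.)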

Now, to make this more fine-grained, what about a multex that uses
read/write locks?
 Pavel (11-30-18, 09:18 PM)
Chris M. Thomasson wrote:
[..]
> }
> _____________________________

Chris, do you have a use case? Your approach of ordering multiple
resources that need locking in some global order (along with alternative
optimistic algorithms) has been successfully used by DBMSs for years to
transactionally alter multiple rows (or records or sets etc. for
non-relational DBMSs) for concurrent transactions, so the algorithm
itself is sound.

With regard to mutexes in a multi-threaded program, however, I cannot
recall when I saw a necessity to simultaneously lock more than one mutex
(I saw code that did it but always in the context where I would either
refactor it to not do it or at least wanted to refactor it so). It might
certainly be that my experience is skewed so I am curious to see a use case.
 Melzzzzz (11-30-18, 10:01 PM)
On 2018-11-30, Pavel <pauldontspamtolk> wrote:
[..]
> (I saw code that did it but always in the context where I would either
> refactor it to not do it or at least wanted to refactor it so). It might
> certainly be that my experience is skewed so I am curious to see a use case.

Refactor this:
{
    AutoLock<Mutex> l(lstSvcM_);
    AutoLock<Mutex> ll(availM_);
    AutoLock<Mutex> guard(rtlock);
    if (!lstSvc_.empty())
    {
        s = lstSvc_.front();
        lstSvc_.pop_front();
    } else if ((running_threads < 100 || threadsAvail_ > 3) && !lstSvcLow_.empty()) {
        s = lstSvcLow_.front();
        lstSvcLow_.pop_front();
    } else {
        more = false;
        s = 0;
        continue;
    }
}
 Pavel (11-30-18, 11:31 PM)
Melzzzzz wrote:
[..]
> continue;
> }
> }

What does "this" compute and why?
 Chris M. Thomasson (12-02-18, 04:53 AM)
On 11/30/2018 11:18 AM, Pavel wrote:
> Chris M. Thomasson wrote:
> Chris, do you have a use case? Your approach of ordering multiple
> resources that need locking in some global order (along with alternative
> optimistic algorithms) has been successfully used by DBMS s for years to
> transactionally alter multiple rows (or records or sets etc for
> non-relational DBMSs) for concurrent transactions so the algorithm
> itself is sound.

Wrt threads, the use cases are generally geared toward in-memory
database type schemes. Or any time a thread needs to lock a couple of
objects, read/write, unlock. The n-ary dining philosophers problem is
solved by this. One interesting use case is implementing C++11 or C11
atomics. Every type that uses locks would have to take into account the
value of the ATOMIC_*_LOCK_FREE macros, and the atomic_is_lock_free
function:

they need to be zero wrt a locking scheme. The multex can also be used
for the writer side of an RCU algorithm.

Big point: this is NOT recursive! Must keep that in mind.

One point, we can augment the lock table for other fun things besides
lock-based algorithms... We can define ATOMIC_*_LOCK_FREE as 1, or even 2.

> With regard to mutexes in a multi-threaded program, however, I cannot
> recall when I saw a necessity to simultaneously lock more than one mutex
> (I saw code that did it but always in the context where I would either
> refactor it to not do it or at least wanted to refactor it so). It might
> certainly be that my experience is skewed so I am curious to see a use case.

Hashed address locking can be used to implement DCAS, which in turn can
be used to build other things. The fact that the atomic lib can be
created from addr locks means it exists at a lower level to build
higher-level objects with. This can be used for inter- or intra-process
communication using robust mutexes. The robust type is hard to simulate
in user-space without using a watchdog process, so to speak. C11 and
C++11 do not offer such things. They are POSIX, and Windows.
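To make the DCAS claim concrete, here is a minimal lock-based sketch (the `dcas_table` name and member layout are invented for illustration): both words' hashed locks are taken in sorted index order, with the duplicate-index case handled, so the two compares and stores happen as one atomic step.

```cpp
#include <algorithm>
#include <cstdint>
#include <mutex>
#include <vector>

struct dcas_table
{
    std::vector<std::mutex> m_locks;
    dcas_table(std::size_t n) : m_locks(n) {}

    std::size_t hash(void const* p)
    {
        return (std::size_t)(((std::uintptr_t)p << 9) * 103) % m_locks.size();
    }

    // Double compare-and-swap on two plain longs: succeeds, and writes
    // both, only if both words still hold their expected values.
    bool dcas(long& a, long cmp_a, long set_a,
              long& b, long cmp_b, long set_b)
    {
        std::size_t i0 = hash(&a), i1 = hash(&b);
        if (i0 > i1) std::swap(i0, i1);   // sorted locking order
        m_locks[i0].lock();
        if (i1 != i0) m_locks[i1].lock(); // skip a collided duplicate
        bool ok = (a == cmp_a && b == cmp_b);
        if (ok) { a = set_a; b = set_b; }
        if (i1 != i0) m_locks[i1].unlock();
        m_locks[i0].unlock();
        return ok;
    }
};
```

The sort-and-dedup on just two indices is exactly the multex discipline shrunk to its smallest case.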
 Marcel Mueller (12-02-18, 10:43 AM)
Am 02.12.18 um 03:53 schrieb Chris M. Thomasson:
> Wrt threads, the use cases are generally geared toward in memory
> database type schemes.

I wrote an in memory DB kernel some years ago and avoided this type of
locking. Snapshot isolation with immutable objects and optimistic
locking provided better performance and simplified locking significantly
with almost no risk of deadlocks.

> Or any time a thread needs to lock a couple of
> objects, read/write, unlock. The n-ary dining philosophers problem is
> solved by this.

I think especially considering NUMA and/or cluster servers traditional
mutex locking will become less useful. It is a bottleneck due to the
limited speed of synchronization.

Of course, for now there are use cases where a mutex perfectly fit the
needs. But I strongly recommend to avoid to gather more than one mutex
at a time. Sooner or later it will cause problems, at least increased
effort.

> One interesting use case is implementing C++11 or C11
> atomics.

Dark chapter. Who needs atomics that /might/ be lock free?

OK, in practice most of the time they /are/ lock free, at least for
small types. But there is always the risk of running into a performance
bottleneck on the next platform. :-/

> One point, we can augment the lock table for other fun things besides
> lock-based algorithms... We can define ATOMIC_*_LOCK_FREE as 1, or even 2.

?

Marcel
 Pavel (12-02-18, 11:30 PM)
Chris M. Thomasson wrote:
....

> Wrt threads, the use cases are generally geared toward in memory database type
> schemes. Or any time a thread needs to lock a couple of objects, read/write,
> unlock. The n-ary dining philosophers problem is solved by this. ....

Which variation did you solve? Any requirements on fairness / starvation avoidance?

(Asking to try to decompose it without multiple lock of mutexes. Obviously,
locking of multiple chopsticks have to be done as it's a part of the problem
statement, but a chopstick does not have to have a mutex).

Thanks!
-Pavel
 Chris M. Thomasson (12-03-18, 12:01 AM)
On 12/2/2018 12:43 AM, Marcel Mueller wrote:
> Am 02.12.18 um 03:53 schrieb Chris M. Thomasson:
> I wrote an in memory DB kernel some years ago and avoided this type of
> locking. Snapshot isolation with immutable objects and optimistic
> locking provided better performance and simplified locking significantly
> with almost no risk of deadlocks.

That's fine. Any risk of livelock wrt the optimistic locking in your
scheme? Seqlocks can be pretty nice at getting a fast snapshot:
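For reference, a bare-bones single-writer seqlock looks something like the sketch below (names invented, and the memory-fence details glossed over: the version counter is odd while a write is in progress, and readers retry until they see the same even value before and after):

```cpp
#include <atomic>

struct seqlock_data
{
    std::atomic<unsigned> m_ver{0};
    long m_a{0}, m_b{0};

    void write(long a, long b) // single writer only
    {
        // bump to odd: a write is in progress
        m_ver.store(m_ver.load(std::memory_order_relaxed) + 1,
                    std::memory_order_release);
        m_a = a;
        m_b = b;
        // bump back to even: snapshot is stable again
        m_ver.store(m_ver.load(std::memory_order_relaxed) + 1,
                    std::memory_order_release);
    }

    void read(long& a, long& b) const
    {
        unsigned v0, v1;
        do {
            v0 = m_ver.load(std::memory_order_acquire);
            a = m_a;
            b = m_b;
            v1 = m_ver.load(std::memory_order_acquire);
        } while (v0 != v1 || (v0 & 1)); // retry on torn snapshot
    }
};
```

Readers never block the writer, which is what makes the snapshot fast; the cost is that a reader can spin while writes are frequent.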

>> Or any time a thread needs to lock a couple of objects, read/write,
>> unlock. The n-ary dining philosophers problem is solved by this.

> I think especially considering NUMA and/or cluster servers traditional
> mutex locking will become less useful. It is a bottleneck due to the
> limited speed of synchronization.

This is just an example of address based locking. There are a lot of
faster, and much more scalable synchronization algorithms out there.

> Of course, for now there are use cases where a mutex perfectly fit the
> needs. But I strongly recommend to avoid to gather more than one mutex
> at a time. Sooner or later it will cause problems, at least increased
> effort.

There are places where it is required. For instance, if you have to
simulate something like DCAS with locks, then there are two addresses
involved.

>> One interesting use case is implementing C++11 or C11 atomics.

> Dark chapter. Who needs atomics that /might/ be lock free?

Very dark.

> OK, in practice most of the time they /are/ lock free, at least for
> small types. But there is always the risk of running into a performance
> bottleneck on the next platform. :-/

Well, if your program runs on such a system, it can warn the user that
the performance might not be up to par... ;^o

>> One point, we can augment the lock table for other fun things besides
>> lock-based algorithms... We can define ATOMIC_*_LOCK_FREE as 1, or
>> even 2.

The address based hashing can be used to distribute some algorithms.
 Chris M. Thomasson (12-03-18, 12:08 AM)
On 12/2/2018 1:30 PM, Pavel wrote:
> Chris M. Thomasson wrote:
> ...
> ...
> Which variation did you solve?

I did one a while back for fun where each diner was a different creature
that could have more than two hands.

> Any requirements on fairness / starvation avoidance?

It shall not deadlock. It shall provide forward progress.

> (Asking to try to decompose it without multiple lock of mutexes. Obviously,
> locking of multiple chopsticks have to be done as it's a part of the problem
> statement, but a chopstick does not have to have a mutex).

It depends on how fine of a grain one uses with the locking scheme. You
can lock the table. Lock each diner. Lock each chopstick.
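At the per-chopstick grain, the multex discipline solves it directly: each philosopher hashes the addresses of its two chopsticks, sorts the indices, and locks in that global order, so no cycle of waiting diners can form. A compact sketch (the `chop_table` name is invented here):

```cpp
#include <algorithm>
#include <cstdint>
#include <mutex>
#include <vector>

// Chopsticks are plain ints; a philosopher "locks" the two adjacent
// ones by hashing their addresses and locking in sorted index order.
struct chop_table
{
    std::vector<std::mutex> m_locks;
    chop_table(std::size_t n) : m_locks(n) {}

    std::size_t hash(void const* p)
    {
        return (std::size_t)(((std::uintptr_t)p << 9) * 103) % m_locks.size();
    }

    void eat(int& left, int& right)
    {
        std::size_t i0 = hash(&left), i1 = hash(&right);
        if (i0 > i1) std::swap(i0, i1);   // global locking order
        m_locks[i0].lock();
        if (i1 != i0) m_locks[i1].lock(); // collision: one lock covers both
        ++left;                           // "use" both chopsticks
        ++right;
        if (i1 != i0) m_locks[i1].unlock();
        m_locks[i0].unlock();
    }
};
```

Note a chopstick never owns a mutex of its own, and the same code handles a diner with any number of hands: push more addresses, sort, dedup, lock.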