Concurrent Programming in Python - asyncio
Let’s introduce asyncio, a Python framework for writing asynchronous, single-threaded concurrent code using coroutines. If you haven’t used it yet, that’s perfect. Some knowledge of Python is assumed, but don’t be afraid to try it out.
It’s not about using multiple cores, it’s about using a single core more efficiently 😎
Introduction
In the coming series of blog posts, we will first make sure you understand why we would even want to use async I/O in the first place. Then we’ll move on to the event loop, which is the main building block of an asynchronous framework or library in any language. Later on, we’ll look at coroutines, which are the high-level way you’re supposed to use asyncio.
It’s important to see this in practice so that you understand the inner workings of async I/O and don’t treat it as a magic thing that just happens to work.
Let’s look at the basic difference between synchronous and asynchronous execution.
Synchronous execution means that the code running now blocks the program from running anything else until that piece of code is done. If you want to run something else, you have to wait for the blocking call to finish. The entire program, like a highway lane, is only as fast as the slowest car in it; if a car stops and blocks the lane, it doesn’t matter whether you’re driving a supercharged V8 Hellcat, you still have to wait.
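To make this concrete, here is a minimal sketch of synchronous execution (the function name and the two-second delay are made up for illustration): each call blocks the program until it finishes.

```python
import time

def download_report(name: str) -> None:
    # Pretend this is a slow network call: time.sleep() blocks
    # the whole program, just like a stopped car blocks the lane.
    time.sleep(2)
    print(f"{name} done")

start = time.perf_counter()
download_report("report-a")
download_report("report-b")  # cannot even start until report-a is done
print(f"total: {time.perf_counter() - start:.1f}s")  # roughly 4 seconds
```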
We should instead do long-running tasks asynchronously, in other words, in the background. You may ask: what is this “background” I keep mentioning? Coming back to our highway analogy, the easiest way to unblock the flow is to add more lanes. Those lanes are called execution threads, because you want to do many things at once.
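As a small teaser of where this series is headed, here is the same pair of tasks sketched with asyncio (the names are again illustrative); the waits now overlap on a single thread because `asyncio.sleep()` yields control to the event loop instead of blocking:

```python
import asyncio

async def download_report(name: str) -> None:
    # await asyncio.sleep() suspends this coroutine and lets the
    # event loop run something else in the meantime.
    await asyncio.sleep(2)
    print(f"{name} done")

async def main() -> None:
    # Run both coroutines concurrently on a single thread.
    await asyncio.gather(download_report("report-a"),
                         download_report("report-b"))

asyncio.run(main())  # roughly 2 seconds total instead of 4
```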
Threads in Programs
Within a single program, you can spawn additional threads. The original thread your program started with is usually called the main thread; the other threads are often called secondary, background, or worker threads.
An important thing to note about threads: they are not free. They take up space in the operating system’s scheduler and in memory, and very often you need a way to synchronize them.
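In Python, spawning worker threads looks roughly like this minimal sketch using the standard `threading` module:

```python
import threading

def worker(task_id: int) -> None:
    # current_thread().name shows which thread is running this code.
    print(f"task {task_id} running on {threading.current_thread().name}")

print(f"starting from {threading.current_thread().name}")  # MainThread

threads = [threading.Thread(target=worker, args=(i,), name=f"worker-{i}")
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every worker thread to finish
```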
What is synchronization between threads? Threads are not independent programs; they share data within your program. So if multiple threads change the same piece of data at the same time, they can corrupt it, which could crash your program or silently break the consistency of your data.
To make a program thread-safe and avoid so-called race conditions, there are different synchronization primitives built just for that purpose: mutexes, semaphores, object queues, and so on. The most basic one is a lock.
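Here is a hedged sketch of exactly that kind of corruption: two threads increment a shared counter without any synchronization. Because `counter += 1` is a read-modify-write sequence, updates can interleave and get lost (how often this shows up varies by run and by Python version):

```python
import threading

counter = 0

def increment(n: int) -> None:
    global counter
    for _ in range(n):
        # counter += 1 is a read-modify-write: another thread can
        # sneak in between the read and the write, losing an update.
        counter += 1

threads = [threading.Thread(target=increment, args=(500_000,))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # often less than the expected 1_000_000
```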
What does a lock do? It allows one thread to acquire exclusive access to a resource until it’s done; then the thread has to release the lock so other threads can acquire it. If you try to acquire a lock that is currently held by a different thread, you have to wait. So it’s all good then, are locks the silver bullet? Sadly, NO!
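The same counter sketch, now made thread-safe with `threading.Lock`; the `with` block acquires the lock on entry and releases it on exit, even if an exception is raised:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:        # acquire exclusive access to the counter...
            counter += 1  # ...mutate it safely...
        # ...and release the lock automatically on block exit.

threads = [threading.Thread(target=increment, args=(500_000,))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # now reliably 1_000_000
```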
Let’s imagine a bunch of threads running happily ever after. Say there are four of them; they’re never blocked on anything and they run at 100% CPU usage. You’re getting everything you paid for in hardware and you’re very happy. But as soon as you have a shared resource that needs synchronizing, things get a little less optimistic and a little less easy.
Let’s first see how the situation works with just one thread. A single thread that wants access to the shared resource can easily acquire the lock, do its processing on the shared resource, release the lock, do something else, and live happily ever after until it wants access again, at which point it acquires the lock, does its processing, and releases it. So that looks good, right? Not so fast, my friend 🤷‍♂️. Now let’s take the same example with four threads, each needing the same piece of data, and the scenario changes quite a lot.
Even though the first thread happily acquired the lock, all the other threads now have to wait. When it released the lock, only one other thread got access to the shared resource; when that one released it, the first thread acquired the lock again, and only after it was done could the third thread do anything with the shared resource. What about the fourth thread? It never actually got to use the resource at all. This is a classic case of starvation.
So this single example shows two problems that can arise when using just one lock, both illustrated in the sketch after this list:
- Lock contention, which happens when a shared resource is wanted very often by many threads; most threads waste time waiting for access to the resource instead of doing other useful work.
- Lock starvation, which happens when a thread, like our fourth one, never gets access to the lock at all, wasting a lot of time and resources.
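Here is that rough sketch: four threads hammer a single lock for a couple of seconds and we count how many times each one wins. The exact distribution depends on your OS scheduler, but it is typically uneven, and a greedy workload inside the lock can leave some threads with barely any turns:

```python
import threading
import time
from collections import Counter

lock = threading.Lock()
wins = Counter()
stop = threading.Event()

def compete() -> None:
    name = threading.current_thread().name
    while not stop.is_set():
        with lock:
            wins[name] += 1        # we won the shared resource this round
            for _ in range(1000):  # keep holding the lock while "working",
                pass               # which keeps every other thread waiting

threads = [threading.Thread(target=compete, name=f"thread-{i}")
           for i in range(4)]
for t in threads:
    t.start()

time.sleep(2)  # let the four threads fight over the lock for ~2 seconds
stop.set()
for t in threads:
    t.join()

print(wins)  # acquisition counts are often strikingly uneven
```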
It’s quite surprising how complicated the situation gets with just one lock. Imagine if there are more: then there are more opportunities for contention and more opportunities for starvation as well. Apart from this, the locking mechanisms we use aren’t free either, so the more locks you have, the slower your program gets and the more memory it uses. And there’s an even more fascinating lock problem that can appear once more than one lock (and more than one thread) is in play: a deadlock.
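To make the deadlock concrete, here is a hedged sketch with two locks acquired in opposite orders. A timeout is used only so the demo terminates; with a plain blocking `acquire()` both threads would wait on each other forever:

```python
import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

def worker(first: threading.Lock, second: threading.Lock, name: str) -> None:
    with first:
        time.sleep(0.1)  # give the other thread time to grab its lock
        # Each thread now holds one lock and wants the other.
        if second.acquire(timeout=1):
            print(f"{name}: got both locks")
            second.release()
        else:
            print(f"{name}: would deadlock, gave up after 1 second")

# Acquiring the two locks in opposite orders is the classic deadlock recipe.
t1 = threading.Thread(target=worker, args=(lock_a, lock_b, "thread-1"))
t2 = threading.Thread(target=worker, args=(lock_b, lock_a, "thread-2"))
t1.start(); t2.start()
t1.join(); t2.join()
```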
Python in particular has one very special lock that is the subject of many discussions: the Global Interpreter Lock (GIL). This is a single lock that the Python interpreter uses to protect crucial shared data structures from corruption when multiple threads are used.
We will discuss the GIL in more detail in the coming posts in this series.
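As a tiny preview (a sketch assuming a standard CPython build; results will differ on free-threaded builds), CPU-bound work does not speed up with threads, because the GIL lets only one thread execute Python bytecode at a time:

```python
import threading
import time

def count_down(n: int) -> None:
    while n > 0:
        n -= 1

N = 20_000_000

start = time.perf_counter()
count_down(N)
print(f"one thread:  {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N // 2,))
t2 = threading.Thread(target=count_down, args=(N // 2,))
t1.start(); t2.start()
t1.join(); t2.join()
# About the same time as (or slower than) one thread, because the
# GIL lets only one thread execute Python bytecode at a time.
print(f"two threads: {time.perf_counter() - start:.2f}s")
```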