<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Go & DevOps Blog]]></title><description><![CDATA[English blog about Go programming, DevOps, CI/CD, Docker, Kubernetes, and practical engineering tips from a developer with 5 years of experience.]]></description><link>https://blog.fshtab.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1766393796435/e2202ffe-975f-4054-ab51-d21ba4d88ccb.png</url><title>Go &amp; DevOps Blog</title><link>https://blog.fshtab.com</link></image><generator>RSS for Node</generator><lastBuildDate>Sat, 11 Apr 2026 09:18:43 GMT</lastBuildDate><atom:link href="https://blog.fshtab.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[The most popular Go dependency is…]]></title><description><![CDATA[testify!
As you know, destination is not as important as the journey, so now that we got this out of the way, bear with me for the rest of this article, and I’ll give you the top 10, and many more stats 😊 You might even learn a few things along the ...]]></description><link>https://blog.fshtab.com/the-most-popular-go-dependency-is</link><guid isPermaLink="true">https://blog.fshtab.com/the-most-popular-go-dependency-is</guid><category><![CDATA[golang]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[Databases]]></category><category><![CDATA[Neo4j]]></category><category><![CDATA[testify]]></category><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Sat, 24 Jan 2026 08:47:23 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769244293010/530eb83e-2c06-4ab1-a704-b13e3cf0c737.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://github.com/stretchr/testify"><strong>testify</strong></a>!</p>
<p>As you know, destination is not as important as the journey, so now that we got this out of the way, bear with me for the rest of this article, and I’ll give you the top 10, and many more stats 😊 You might even learn a few things along the way!</p>
<hr />
<p>Without usage statistics, finding useful and reliable dependencies can be a bit of a challenge. In the Go community, we basically have to rely either:</p>
<ul>
<li><p>on "brand" name reputation: there are some very well known packages (<a target="_blank" href="https://github.com/gin-gonic/gin"><strong>gin</strong></a>, <a target="_blank" href="https://github.com/spf13/cobra"><strong>cobra</strong></a>, <a target="_blank" href="https://github.com/stretchr/testify"><strong>testify</strong></a>…) and organizations (<a target="_blank" href="https://github.com/google?q=&amp;type=all&amp;language=go&amp;sort=stargazers"><strong>Google</strong></a>, <a target="_blank" href="https://github.com/gorilla"><strong>gorilla</strong></a>…) that we can trust,</p>
</li>
<li><p>or on side metrics such as number of GitHub stars, number of open issues, last activity, and more.</p>
</li>
</ul>
<p>All of this can sometimes help get a feeling of how widely used and trusted a Go module is, but we can do better. What I personally want to know is how many times a module is actually required as a project dependency, to get a sense of how "battle-tested" a library is. But to do this, I would need to build a graph of the whole (open source) ecosystem, which would be insane… You can see where this is going 😉</p>
<h2 id="heading-mapping-the-go-ecosystem"><strong>Mapping the Go ecosystem</strong></h2>
<p>My <strong>first idea</strong> was to build a list of repositories (a seed) to use as a starting point. The goal was to read the dependencies of these modules from their <code>go.mod</code>, download each of them, read the dependencies of these modules from their <code>go.mod</code>, download each of them, read the dependencies of these modules from their <code>go.mod</code>, download each of… well, you know the deal.</p>
<p>I implemented this idea on the <a target="_blank" href="https://github.com/Thiht/go-stats/tree/v1"><code>v1</code></a> branch of my repository using mainly <a target="_blank" href="https://evanli.github.io/Github-Ranking/"><strong>Github-Ranking</strong></a> and <a target="_blank" href="https://awesome-go.com/"><strong>awesome-go</strong></a> as sources for building the seed. I ultimately abandoned the idea because of a few shortcomings:</p>
<ul>
<li><p>the sample is largely incomplete,</p>
</li>
<li><p>cloning so many Git repositories to find their <code>go.mod</code> is reaaally painful and slow,</p>
</li>
<li><p>and it’s particularly biased towards repositories hosted on GitHub.</p>
</li>
</ul>
<hr />
<p>Luckily for me, I came up with a <strong>second idea</strong>: the Go modules ecosystem relies on a centralized public proxy, so surely they expose some information on these modules. And they in fact do so! The proxy APIs are documented on <a target="_blank" href="https://proxy.golang.org/"><strong>proxy.golang.org</strong></a>:</p>
<ul>
<li><p><a target="_blank" href="https://go.dev/ref/mod#goproxy-protocol"><strong>proxy.golang.org</strong></a> exposes metadata on each module (versions, latest, mod file…),</p>
</li>
<li><p><a target="_blank" href="http://index.golang.org/"><strong>index.golang.org</strong></a> exposes a feed of all the published module versions since the introduction of the Go proxy (<code>2019-04-10T19:08:52.997264Z</code>, if you want to make sure not to forget its birthday).</p>
</li>
</ul>
<p>I used this information to locally download the whole index (module names and versions) since 2019. The downloaded data is available in <a target="_blank" href="https://github.com/Thiht/go-stats/tree/main/data/goproxy-modules"><strong>goproxy-modules</strong></a>. This can be used as a local immutable cache.</p>
<aside>For the full implementation details, see:<p></p><ul><li><a href="https://github.com/Thiht/go-stats/blob/main/cmd/list-goproxy-modules.go"><code>list-goproxy-modules.go</code></a></li><li><a href="https://github.com/Thiht/go-stats/blob/main/goproxy/goproxy.go"><code>goproxy.go</code></a></li></ul></aside>

<p>With all this data available locally, the seed is now pretty much exhaustive, and more suitable for data analysis. The processing now simply consists of iterating over every single module, downloading its <code>go.mod</code> file, and listing its dependencies. The resulting graph can then trivially be inserted into a specialized graph database like Neo4j.</p>
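<p>As a rough sketch of what reading the index looks like: <code>index.golang.org</code> serves the feed as JSON lines, one <code>{"Path", "Version", "Timestamp"}</code> object per line. The snippet below only shows the parsing; the actual fetching and paging logic (via the <code>since</code> parameter) lives in <code>list-goproxy-modules.go</code>, and the type name here is my own, not the go-stats one:</p>

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// indexEntry mirrors one JSON line of the index.golang.org feed.
type indexEntry struct {
	Path      string
	Version   string
	Timestamp string
}

// parseIndexPage decodes one page of the feed (JSON lines) into entries.
func parseIndexPage(body string) ([]indexEntry, error) {
	var entries []indexEntry
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		var e indexEntry
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			return nil, err
		}
		entries = append(entries, e)
	}
	return entries, sc.Err()
}

func main() {
	page := `{"Path":"github.com/stretchr/testify","Version":"v1.9.0","Timestamp":"2024-02-27T12:00:00Z"}
{"Path":"golang.org/x/crypto","Version":"v0.21.0","Timestamp":"2024-03-05T09:00:00Z"}`
	entries, err := parseIndexPage(page)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("%s@%s\n", e.Path, e.Version)
	}
}
```

<p>A real crawler would loop, passing the last <code>Timestamp</code> seen as the next <code>since</code> value, until the feed is exhausted.</p>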
<h2 id="heading-deep-diving"><strong>Deep diving</strong></h2>
<p><img src="https://blog.thibaut-rousseau.com/blog/the-most-popular-go-dependency-is/neo4j.png" alt="Neo4j logo" /></p>
<p>Neo4j is a graph-oriented database, meaning that unlike relational databases, it works on... graphs. The primary way to store data in Neo4j is with nodes and relationships. This specialized data structure makes it extremely simple to model, and more importantly to query, huge graphs.</p>
<aside>If you want to experiment with Neo4j, I would recommend using the go-stats<span> </span><a href="https://github.com/Thiht/go-stats/blob/main/docker-compose.yml"><code>docker-compose.yml</code></a><span> </span>file. You can then open<span> </span><a href="http://localhost:7474/browser/">localhost:7474</a><span> </span>(no credentials needed) to use the Neo4j browser.</aside>

<p>Neo4j, like many NoSQL databases, is schemaless, meaning you don't need to define a schema before creating data. That doesn't mean we don't need a schema, so let's see what we need!</p>
<pre><code class="lang-plaintext">(:Module { name: string, version: string })-[:DEPENDS_ON]-&gt;(:Module)
</code></pre>
<p>A Go module is basically identified by its name (e.g. <code>github.com/stretchr/testify</code> or <code>go.yaml.in/yaml/v4</code>) and its version. Each module can depend on other modules. We can add more properties to our nodes later as needed.</p>
<h3 id="heading-creating-nodes"><strong>Creating nodes</strong></h3>
<p>Neo4j uses <a target="_blank" href="https://neo4j.com/docs/cypher-manual/current/introduction/"><strong>Cypher</strong></a> as a query language. Inserting data with Cypher can be done with the <a target="_blank" href="https://neo4j.com/docs/cypher-manual/current/clauses/create/"><code>CREATE</code></a> clause, but in go-stats I've decided to use the <a target="_blank" href="https://neo4j.com/docs/cypher-manual/current/clauses/merge/"><code>MERGE</code></a> clause instead because it behaves as an upsert, letting you update or do nothing in case a node already exists.</p>
<p>The basic Cypher query to upsert a module node is:</p>
<pre><code class="lang-plaintext">MERGE (m:Module { name: $name, version: $version })
RETURN m
</code></pre>
<p><code>:Module</code> is a label attached to the node. You can think of it as the type of the node. <code>name</code> and <code>version</code> are properties of the node, they're the data belonging to each specific node.</p>
<p>To make sure we can't create multiple nodes with the same name-version pair, a unicity constraint is needed:</p>
<pre><code class="lang-plaintext">CREATE CONSTRAINT module_identity IF NOT EXISTS
FOR (m:Module)
REQUIRE (m.name, m.version) IS UNIQUE
</code></pre>
<p>Thanks to this constraint, a second <code>MERGE</code> with the same <code>name</code> and <code>version</code> properties is guaranteed to match the existing node rather than create a duplicate, even with concurrent writes. As a bonus, the constraint is backed by an index, which speeds up the match.</p>
<h3 id="heading-creating-relationships"><strong>Creating relationships</strong></h3>
<p>We can then create the dependency relationships between our module nodes:</p>
<pre><code class="lang-plaintext">MATCH (dependency:Module { name: $dependencyName, version: $dependencyVersion })
MATCH (dependent:Module { name: $dependentName, version: $dependentVersion })
MERGE (dependent)-[:DEPENDS_ON]-&gt;(dependency)
RETURN dependency, dependent
</code></pre>
<p>This query will:</p>
<ol>
<li><p>find modules that were created earlier using the <a target="_blank" href="https://neo4j.com/docs/cypher-manual/current/clauses/match/"><code>MATCH</code></a> clause and assign them to <code>dependency</code> and <code>dependent</code>,</p>
</li>
<li><p>create the directed relationship between them using the <code>-[:DEPENDS_ON]-&gt;</code> syntax.</p>
</li>
</ol>
<p>The Go index is naturally sorted chronologically. So as long as we iterate over it sequentially, it means that if a module (<em>dependent</em>) depends on another module (<em>dependency</em>), then <em>dependency</em> was necessarily added to the graph before <em>dependent</em>. If this condition doesn't hold for some reason (if we were to decide to parallelize the insertions for example), we could simply rewrite the query as:</p>
<pre><code class="lang-plaintext">MERGE (dependency:Module { name: $dependencyName, version: $dependencyVersion })
MERGE (dependent:Module { name: $dependentName, version: $dependentVersion })
MERGE (dependent)-[:DEPENDS_ON]-&gt;(dependency)
RETURN dependency, dependent
</code></pre>
<p>Using <code>MERGE</code> instead of <code>MATCH</code> would ensure the node gets created if it doesn't exist already.</p>
<hr />
<p>These queries are simplified (but close!) variants of what I actually did in go-stats. The main difference is that I enriched the nodes with some additional properties:</p>
<ul>
<li><p>version timestamp,</p>
</li>
<li><p>latest version,</p>
</li>
<li><p>semantic version splitting (major, minor, patch, label),</p>
</li>
<li><p>host, organisation, and more.</p>
</li>
</ul>
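<p>For illustration, here is a minimal sketch of how such properties could be derived from a module's name and version. The type and field names below are my own invention, not the actual go-stats schema, and edge cases like build metadata (<code>+meta</code>) are ignored:</p>

```go
package main

import (
	"fmt"
	"strings"
)

// moduleProps holds extra node properties derived from a module identity,
// similar in spirit to the enrichment done in process-modules.go.
type moduleProps struct {
	Host, Org           string
	Major, Minor, Patch string
	Label               string // pre-release label, e.g. "rc.1"
}

// deriveProps splits a module path into host/organisation and a version
// string into its semantic version components.
func deriveProps(name, version string) moduleProps {
	var p moduleProps
	parts := strings.Split(name, "/")
	p.Host = parts[0]
	if len(parts) > 1 {
		p.Org = parts[1]
	}
	v := strings.TrimPrefix(version, "v")
	if i := strings.IndexByte(v, '-'); i >= 0 {
		p.Label = v[i+1:]
		v = v[:i]
	}
	if nums := strings.SplitN(v, ".", 3); len(nums) == 3 {
		p.Major, p.Minor, p.Patch = nums[0], nums[1], nums[2]
	}
	return p
}

func main() {
	p := deriveProps("github.com/stretchr/testify", "v1.9.0")
	fmt.Printf("%s/%s %s.%s.%s\n", p.Host, p.Org, p.Major, p.Minor, p.Patch)
}
```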
<aside>For the full implementation details, see:<span> </span><a href="https://github.com/Thiht/go-stats/blob/main/cmd/process-modules.go"><code>process-modules.go</code></a></aside>

<h2 id="heading-digging-into-the-graph"><strong>Digging into the graph</strong></h2>
<p>After running go-stats for a few days, I ended up with a graph of roughly <strong>40 million nodes</strong>, and <strong>400 million relationships</strong>… that's quite a lot! The first thing these numbers tell us is that Go modules have <strong>10 direct dependencies on average</strong>.</p>
<p><img src="https://blog.thibaut-rousseau.com/blog/the-most-popular-go-dependency-is/data-analyst.jpg" alt="Meme showing a dog in a scientist outfit saying: I'm a data analyst now" /></p>
<p>For more interesting stats, let's write some Cypher, shall we?</p>
<h3 id="heading-indexing"><strong>Indexing</strong></h3>
<p>With this volume of data, the absolute first thing to do (that I clearly didn't do at first) is to create relevant indexes. I was initially under the impression that the <code>module_identity</code> constraint previously created would also act as an index for <code>:Module.name</code> since it's a composite unique index. I was wrong: a composite index can only be used when all of its properties appear in the query predicate, so creating a dedicated index was necessary:</p>
<pre><code class="lang-plaintext">CREATE INDEX module_name_idx IF NOT EXISTS
FOR (m:Module) ON (m.name)
</code></pre>
<p>I created other indexes as needed, and to do so the Cypher <a target="_blank" href="https://neo4j.com/docs/cypher-manual/current/planning-and-tuning/"><code>PROFILE</code></a> was a tremendous help.</p>
<h3 id="heading-find-the-direct-dependents-of-a-module"><strong>Find the direct dependents of a module</strong></h3>
<p>As a warm-up, and to learn a bit more about Cypher, let's list the dependents of a specific module:</p>
<pre><code class="lang-plaintext">MATCH (dependency:Module { name: 'github.com/pkg/errors', version: 'v0.9.1' })
MATCH (dependent:Module)-[:DEPENDS_ON]-&gt;(dependency)
WHERE dependent.isLatest
RETURN dependent.versionTime.year AS year, COUNT(dependent) AS nbDependents
</code></pre>
<p>I chose <a target="_blank" href="https://github.com/pkg/errors/tree/v0.9.1"><code>github.com/pkg/errors@v0.9.1</code></a> because it's a module that was deprecated long ago, and I find it interesting to know how much it's still used in the wild. Let's break the query down line by line:</p>
<ol>
<li><p><code>MATCH (dependency:Module { name: 'xxx', version: 'xxx' })</code></p>
<ul>
<li><p>finds the module node (we know it's unique because of the constraint we declared earlier) with the given <code>name</code> and <code>version</code>. This is equivalent to:</p>
<pre><code class="lang-plaintext">  MATCH (dependency:Module)
  WHERE dependency.name = 'xxx'
  AND dependency.version = 'xxx'
</code></pre>
</li>
</ul>
</li>
<li><p><code>MATCH (dependent:Module)-[:DEPENDS_ON]-&gt;(dependency)</code></p>
<ul>
<li>finds the module nodes with a direct <code>DEPENDS_ON</code> relationship towards <code>dependency</code>.</li>
</ul>
</li>
<li><p><code>WHERE dependent.isLatest</code></p>
<ul>
<li>keeps only the <code>dependent</code> modules that are at their latest version. This is useful because, for any module <code>x</code> that depends on <code>github.com/pkg/errors@v0.9.1</code>, we don't want to count every historical version of <code>x</code>: the latest one is the most relevant to us.</li>
</ul>
</li>
<li><p><code>RETURN dependent.versionTime.year AS year, COUNT(dependent) AS nbDependents</code></p>
<ul>
<li><p>counts the total dependents of <code>github.com/pkg/errors@v0.9.1</code>, grouped by the release year of the dependent. If we wanted to list them instead, we could write:</p>
<pre><code class="lang-plaintext">  RETURN dependent.name AS dependentName
  ORDER BY dependentName;
</code></pre>
</li>
</ul>
</li>
</ol>
<p><strong>Results:</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>year</strong></td><td><strong>nbDependents</strong></td></tr>
</thead>
<tbody>
<tr>
<td>2019</td><td>3</td></tr>
<tr>
<td>2020</td><td>6,774</td></tr>
<tr>
<td>2021</td><td>10,680</td></tr>
<tr>
<td>2022</td><td>11,747</td></tr>
<tr>
<td>2023</td><td>8,992</td></tr>
<tr>
<td>2024</td><td>12,220</td></tr>
<tr>
<td>2025</td><td>16,001</td></tr>
</tbody>
</table>
</div><p>That's a lot of dependents for a dead library!</p>
<h3 id="heading-find-the-transitive-dependents-of-a-module"><strong>Find the transitive dependents of a module</strong></h3>
<p>Neo4j really shines at graph traversal. Navigating relationships transitively requires virtually no changes:</p>
<pre><code class="lang-plaintext">MATCH (dependency:Module { name: 'github.com/pkg/errors', version: 'v0.9.1' })
MATCH (dependent:Module)-[:DEPENDS_ON*1..]-&gt;(dependency)
WHERE dependent.isLatest
RETURN COUNT(dependent) AS nbDependents
</code></pre>
<p>The only difference with the previous query is <code>*1..</code>, asking Neo4j to follow the <code>DEPENDS_ON</code> relationship transitively. We could also limit it to 2 levels with <code>*1..2</code>.</p>
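<p>Under the hood, this variable-length expansion is essentially a graph traversal that Neo4j performs for us. As an illustration only, here is a minimal sketch of the equivalent breadth-first walk over an in-memory adjacency map; note that, like <code>*1..</code>, the start node itself is excluded from the results:</p>

```go
package main

import (
	"fmt"
	"sort"
)

// transitiveDependents walks the reversed dependency edges breadth-first,
// mimicking what Cypher's -[:DEPENDS_ON*1..]-&gt; expansion computes.
// dependentsOf maps a module to the modules that directly depend on it.
func transitiveDependents(dependentsOf map[string][]string, root string) []string {
	seen := map[string]bool{}
	queue := []string{root}
	var result []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, dep := range dependentsOf[cur] {
			if !seen[dep] {
				seen[dep] = true
				result = append(result, dep)
				queue = append(queue, dep)
			}
		}
	}
	sort.Strings(result)
	return result
}

func main() {
	// "a" is depended on by "b" and "c"; "b" is depended on by "c" and "d".
	graph := map[string][]string{
		"a": {"b", "c"},
		"b": {"c", "d"},
	}
	fmt.Println(transitiveDependents(graph, "a"))
}
```

<p>The <code>seen</code> set is what keeps the walk from looping forever on cycles, and it also deduplicates modules reachable through several paths.</p>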
<p>I find this interesting, because in the case of the query for direct dependencies, if we used a relational database, the SQL query would be pretty simple:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">COUNT</span>(*) <span class="hljs-keyword">AS</span> nb_dependents
<span class="hljs-keyword">FROM</span> dependencies d
<span class="hljs-keyword">JOIN</span> modules m <span class="hljs-keyword">ON</span> m.id = d.dependent_id
<span class="hljs-keyword">JOIN</span> modules dependency <span class="hljs-keyword">ON</span> dependency.id = d.dependency_id
<span class="hljs-keyword">WHERE</span> dependency.name = <span class="hljs-string">'github.com/pkg/errors'</span>
  <span class="hljs-keyword">AND</span> dependency.version = <span class="hljs-string">'v0.9.1'</span>
  <span class="hljs-keyword">AND</span> m.is_latest = <span class="hljs-literal">true</span>;
</code></pre>
<p>but what's as simple as <code>*1..</code> in Cypher would make a dramatically more complex SQL query:</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Example using a recursive CTE; I'm not sure every RDBMS implements it the same way</span>
<span class="hljs-keyword">WITH</span> <span class="hljs-keyword">RECURSIVE</span> dependents_cte <span class="hljs-keyword">AS</span> (
  <span class="hljs-keyword">SELECT</span> m.id <span class="hljs-keyword">AS</span> dependency_id
  <span class="hljs-keyword">FROM</span> modules m
  <span class="hljs-keyword">WHERE</span> m.name = <span class="hljs-string">'github.com/pkg/errors'</span>
    <span class="hljs-keyword">AND</span> m.version = <span class="hljs-string">'v0.9.1'</span>

  <span class="hljs-keyword">UNION</span> <span class="hljs-keyword">ALL</span>

  <span class="hljs-keyword">SELECT</span> d.dependent_id
  <span class="hljs-keyword">FROM</span> dependencies d
  <span class="hljs-keyword">JOIN</span> dependents_cte cte <span class="hljs-keyword">ON</span> d.dependency_id = cte.dependency_id
)
<span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">COUNT</span>(<span class="hljs-keyword">DISTINCT</span> m.id) <span class="hljs-keyword">AS</span> nb_dependents
<span class="hljs-keyword">FROM</span> modules m
<span class="hljs-keyword">WHERE</span> m.id <span class="hljs-keyword">IN</span> (<span class="hljs-keyword">SELECT</span> dependency_id <span class="hljs-keyword">FROM</span> dependents_cte)
  <span class="hljs-keyword">AND</span> m.is_latest = <span class="hljs-literal">true</span>;
</code></pre>
<h3 id="heading-top-10-most-used-dependencies"><strong>Top 10 most used dependencies</strong></h3>
<p>Using the constructs from above, the query is once again pretty similar.</p>
<pre><code class="lang-plaintext">MATCH (dependent:Module)-[:DEPENDS_ON]-&gt;(dependency:Module)
WHERE dependent.isLatest
RETURN dependency.name AS dependencyName, COUNT(dependent) AS nbDependents
ORDER BY nbDependents DESC
LIMIT 10;
</code></pre>
<p><strong>Results:</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>dependencyName</strong></td><td><strong>nbDependents</strong></td></tr>
</thead>
<tbody>
<tr>
<td>github.com/stretchr/testify</td><td>259,237</td></tr>
<tr>
<td>github.com/google/uuid</td><td>104,877</td></tr>
<tr>
<td>golang.org/x/crypto</td><td>100,633</td></tr>
<tr>
<td>google.golang.org/grpc</td><td>97,228</td></tr>
<tr>
<td>github.com/spf13/cobra</td><td>93,062</td></tr>
<tr>
<td>github.com/pkg/errors</td><td>92,491</td></tr>
<tr>
<td>golang.org/x/net</td><td>76,722</td></tr>
<tr>
<td>google.golang.org/protobuf</td><td>74,971</td></tr>
<tr>
<td>github.com/sirupsen/logrus</td><td>71,730</td></tr>
<tr>
<td>github.com/spf13/viper</td><td>64,174</td></tr>
</tbody>
</table>
</div><p><code>github.com/stretchr/testify</code> is comfortably ahead of other dependencies as the most used in the open source Go ecosystem. Unsurprisingly, <code>github.com/google/uuid</code> is also a staple library used pretty much everywhere. The <code>golang.org/x/</code> dependencies also hold a strong place as the extended stdlib, as well as the infamous <code>github.com/pkg/errors</code>.</p>
<p>To get more insights, you can download the <a target="_blank" href="https://blog.thibaut-rousseau.com/blog/the-most-popular-go-dependency-is/top100.csv"><strong>top 100 as a CSV file</strong></a>.</p>
<hr />
<p><strong>That's all Folks!</strong></p>
<p>I hope you had a good time reading this post, and that you learned a thing or two!</p>
]]></content:encoded></item><item><title><![CDATA[Shift in the Software Development Paradigm: From Imperative Coding to Solution Architecture and the Economics of AI]]></title><description><![CDATA[The modern software development industry is at a point of unprecedented inflection, where classical engineering disciplines collide with the radical power of generative artificial intelligence and new startup economic models. An analysis of current t...]]></description><link>https://blog.fshtab.com/shift-in-the-software-development-paradigm-from-imperative-coding-to-solution-architecture-and-the-economics-of-ai</link><guid isPermaLink="true">https://blog.fshtab.com/shift-in-the-software-development-paradigm-from-imperative-coding-to-solution-architecture-and-the-economics-of-ai</guid><category><![CDATA[AI]]></category><category><![CDATA[software development]]></category><category><![CDATA[IT]]></category><category><![CDATA[engineering]]></category><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Thu, 18 Dec 2025 10:10:55 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766051861311/2156b35c-f490-4321-8f1e-6da154c8f554.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The modern software development industry is at a point of unprecedented inflection, where classical engineering disciplines collide with the radical power of generative artificial intelligence and new startup economic models. An analysis of current trends indicates that, despite the rapid turnover of tools and frameworks, the fundamental principles of system design remain the only reliable anchor for long-term professional relevance. This report examines the profound transformation of the developer’s role—from writing lines of code to formulating high-level solutions—as well as the emergence of the “one-person unicorn” phenomenon predicted by leaders of the technology sector.</p>
<p><strong>Historical Retrospective and the Dynamics of Technological Abstraction</strong></p>
<p>The history of the IT industry represents a continuous process of layering abstractions, the purpose of which is to distance humans from binary machine language and bring them closer to natural language and business logic. Each new iteration of abstraction has not reduced the amount of information required to create applications, but has made the ways of describing that information more concise.</p>
<p><strong>Evolutionary Stages of Programming</strong></p>
<p>The development of programming tools can be classified through the lens of reducing cognitive load associated with managing hardware resources and shifting focus toward solving applied problems.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766050595233/aa0cd230-ddd9-4e56-9b2c-170b8181b07e.webp" alt class="image--center mx-auto" /></p>
<p>The current stage, marked by the adoption of tools such as Copilot, Cursor, and ChatGPT, turns the traditional process upside down: AI does not merely follow instructions—it helps create them. This shift transforms programming from being solely a skill in writing syntactically correct code into a discipline focused on precisely describing problems and desired outcomes.</p>
<p><strong>The Crisis of Fundamental Knowledge in the Era of the “Imitation Game”</strong></p>
<p>One of the most pressing issues in today’s industry is the so-called “framework trap.” Junior developers often begin their careers by diving directly into high-level tools (e.g., React or GraphQL), bypassing the study of foundational programming principles, network protocols, and architectural patterns.</p>
<p><strong>Picasso Metaphor in Software Development</strong></p>
<p>A direct analogy is drawn between the decline of realistic painting at the turn of the 20th century and the current state of web development. After Picasso pioneered abstract forms of art, new generations of artists tried to imitate his style without first mastering the skills of realism. In programming, this manifests when developers use complex abstractions (e.g., GraphQL) without ever designing proper REST APIs or understanding the fundamentals of client-server architecture.</p>
<p>The consequences of this approach include:</p>
<ul>
<li><p><strong>Technical uncertainty:</strong> Constant self-doubt and learning through trial and error.</p>
</li>
<li><p><strong>Career stagnation:</strong> Developers spend years writing the same code without understanding how systems work “under the hood,” leading to low salaries and burnout.</p>
</li>
<li><p><strong>Career fragility:</strong> When a popular framework falls out of favor, developers lacking foundational knowledge become noncompetitive.</p>
</li>
</ul>
<p><strong>Fundamentals</strong> are defined as concepts that remain stable for decades: algorithms, data structures, memory management, SOLID principles, and basic network protocols. Mastery of these basics enables a developer to quickly learn any new technology, since most modern libraries merely repackage classical ideas.</p>
<p><strong>Senior Developer Mental Models as a Cognitive Foundation</strong></p>
<p>True seniority is determined not by title, but by the way of thinking. Experts identify a set of mental models that allow for effective complexity management and well-reasoned decision-making under conditions of uncertainty.</p>
<h3 id="heading-key-mental-models-for-managing-resources-and-risks">Key Mental Models for Managing Resources and Risks</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Model</td><td>Core Concept</td><td>Application in Engineering</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Pareto Principle (80/20)</strong></td><td>20% of effort produces 80% of results</td><td>Focus on critical system functions that deliver the most business value</td></tr>
<tr>
<td><strong>Parkinson’s Law</strong></td><td>Work expands to fill the time available</td><td>Set strict deadlines to prevent endless refactoring</td></tr>
<tr>
<td><strong>Type 1 and Type 2 Decisions</strong></td><td>Reversible and irreversible “doors”</td><td>Carefully analyze architectural foundations (e.g., databases) and make rapid decisions on secondary tools</td></tr>
<tr>
<td><strong>Conway’s Law</strong></td><td>The structure of a system mirrors the communication structure of the organization</td><td>Design teams so that the desired software architecture emerges naturally</td></tr>
<tr>
<td><strong>Circle of Competence</strong></td><td>Know the boundaries of your knowledge</td><td>Avoid using “hyped” tools in critical areas without a deep understanding of how they work</td></tr>
</tbody>
</table>
</div><p>Using these models helps avoid the “mental prison” of negative beliefs, where past failures in technical discussions or interviews paralyze a specialist’s further development. An important part of advancing to the senior level is the ability to see “beyond the code” and understand how the system interacts with the world, users, and other services.</p>
<p><strong>Artificial Intelligence and the Transformation of Professional Activity</strong><br />Integrating AI into the coding process does not signal the end of the profession but radically changes the set of required skills. The developer of the future is not the one who writes syntax, but the one who manages the process of synthesizing solutions.</p>
<p><strong>From Coding to Articulating Solutions</strong><br />In the pre-AI era, programming involved controlling every detail: manually managing memory and writing low-level instructions. In the new reality, a developer only needs to be “technically expert in their domain” to ensure the absence of critical errors produced by the machine. The focus shifts to:</p>
<ul>
<li><p><strong>Deep understanding of the business domain:</strong> AI lacks empathy and cannot grasp the context of a specific business.</p>
</li>
<li><p><strong>Translation skills:</strong> The ability to convert vague stakeholder requirements into precise prompts that the AI can use to design the architecture.</p>
</li>
<li><p><strong>Quality management:</strong> AI can “autocomplete” code to make it appear functional, but achieving reliable results requires human oversight.</p>
</li>
</ul>
<p>This shift creates a <strong>“Value Chasm”</strong> for junior developers. Senior engineers leverage AI as a productivity multiplier, while junior specialists lose opportunities to gain experience on simple tasks now handled by machines.</p>
<p><strong>Economic Phenomenon: The One-Person Unicorn</strong><br />One of the boldest predictions for 2025–2026 is the emergence of a startup valued at a billion dollars, created and managed by a single founder with the support of AI agents. Sam Altman and other Silicon Valley leaders are already betting on the date the first such “micro-unicorn” will appear.</p>
<h3 id="heading-mechanisms-for-scaling-solo-entrepreneurs"><strong>Mechanisms for Scaling Solo Entrepreneurs</strong></h3>
<p>Traditionally, implementing any idea required massive human resources: teams of developers, marketers, lawyers, and support staff. AI radically consolidates this workforce.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Metric</td><td>AI-First Unicorn (2024)</td><td>Traditional Non-AI Unicorn</td></tr>
</thead>
<tbody>
<tr>
<td>Median number of employees</td><td>203</td><td>414</td></tr>
<tr>
<td>Key drivers</td><td>AI-first infrastructure, PLG</td><td>Large sales and support departments</td></tr>
<tr>
<td>Stack dependency</td><td>High (2–3 AI platforms)</td><td>Low (custom development)</td></tr>
</tbody>
</table>
</div><p>The concept of a “one-person unicorn” is often an illusion. Most such ventures rely on <strong>“invisible orchestras”</strong>—fractional workforces consisting of freelancers, micro-agencies, and automated systems. About 56% of AI startups regularly use part-time experts for highly specialized tasks.</p>
<p><strong>Risks and Fragility of Solo Models</strong><br />Despite their efficiency, these companies face unique vulnerabilities:</p>
<ul>
<li><p><strong>Stack centralization:</strong> 72% of solo startups rely on only 2–3 AI platforms for 80% of their operations. Changes in provider policies or pricing (e.g., OpenAI or Zapier) can instantly destroy the business.</p>
</li>
<li><p><strong>Psychological pressure:</strong> Solo founders are highly prone to burnout, prompting the creation of specialized psychological support platforms in 2025 (FounderWell, SoloSanity).</p>
</li>
<li><p><strong>Legal collisions:</strong> Jurisdictions like Singapore and Estonia began discussing in 2025 granting AI agents the status of partial co-founders to ensure legal accountability.</p>
</li>
</ul>
<p><strong>Strategic Career Management and Professional Image</strong><br />In a world where “anyone can code,” technical skills become a commodity, and competitive advantage shifts to soft skills and professional positioning.</p>
<p><strong>Creating a “Technical Mask” and CV Engineering</strong><br />To succeed in 2025, coding ability alone is insufficient. Key points include:</p>
<ol>
<li><p><strong>Profile optimization:</strong> Companies source developers through LinkedIn and “surgically optimized” resumes.</p>
</li>
<li><p><strong>Demonstrating expertise:</strong> Instead of endless Udemy courses (often ineffective), focus on building real systems and publicly discussing architectural solutions.</p>
</li>
<li><p><strong>Overcoming skepticism:</strong> Developers are naturally skeptical of managers and colleagues, which can keep them from making risky but career-advancing decisions.</p>
</li>
</ol>
<p>For full-stack employment, proven tech stacks remain relevant in 2025, ensuring rapid development and stability.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Rank</td><td>Backend + Frontend Stack</td><td>Notes</td></tr>
</thead>
<tbody>
<tr>
<td>Top 1</td><td>Vue + Laravel</td><td>High prototyping speed</td></tr>
<tr>
<td>Top 2</td><td>React + Django</td><td>Strong ecosystem for AI/Data Science</td></tr>
<tr>
<td>Top 3</td><td>Angular + C#</td><td>Enterprise reliability standard</td></tr>
<tr>
<td>Honorable mention</td><td>React + Express</td><td>Popular thanks to a single language (JS) across the stack</td></tr>
</tbody>
</table>
</div><p><strong>Conclusion: The New Ethics of Engineering Mastery</strong><br />The software industry has completed a cycle where a developer’s value was measured by knowledge of specific libraries. We return to an era where engineering discipline, understanding business needs, and leveraging AI to build complex systems are paramount. Those who continue to “imitate” and ignore fundamental knowledge risk being left behind in the new economy, where AI agents replace “coders” but not engineers.</p>
<p>The future belongs to specialists who can articulate solutions, manage invisible orchestras of automation, and preserve human empathy in a world of algorithms. Lessons from the past decade show that frameworks come and go, but fundamentals form the foundation of structures capable of withstanding any technological revolution. It is never wasted effort to study core IT technologies—such as networking, Linux, operating systems, and databases—as they form the bedrock of computing and will never disappear. Mastering these basics provides the most enduring and transferable skills.</p>
<p>The shift toward “one-person unicorns” and AI-assisted programming is not a threat but the highest form of democratized entrepreneurship, demanding developers become true architects of reality.</p>
]]></content:encoded></item><item><title><![CDATA[Why Learning PHP Still Makes Sense in 2025: The Data Speaks]]></title><description><![CDATA[#php  
#webdev  
#programming
Web development is an industry where trends change rapidly—new tools emerge while others fade away. Yet PHP, originally created in 1994 under the name Personal Home Page, continues to hold its ground. Even as languages l...]]></description><link>https://blog.fshtab.com/why-learning-php-still-makes-sense-in-2025-the-data-speaks</link><guid isPermaLink="true">https://blog.fshtab.com/why-learning-php-still-makes-sense-in-2025-the-data-speaks</guid><category><![CDATA[PHP]]></category><category><![CDATA[webdev]]></category><category><![CDATA[General Programming]]></category><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Thu, 18 Dec 2025 08:15:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766045687083/6bf5b2c1-0f24-4217-a080-69438bcfc213.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>#php  </p>
<p>#webdev  </p>
<p>#programming</p>
<p>Web development is an industry where trends change rapidly—new tools emerge while others fade away. Yet PHP, originally created in 1994 under the name <em>Personal Home Page</em>, continues to hold its ground. Even as languages like Python, Node.js, and Go dominate online discussions, real-world data shows that PHP remains a key player in today’s developer ecosystem.</p>
<h2 id="heading-phps-strong-market-position">PHP’s Strong Market Position</h2>
<p>The statistics clearly demonstrate PHP’s reach:</p>
<ul>
<li><p>77.4% of websites with a known server-side language rely on PHP (W3Techs, 2025)</p>
</li>
<li><p>WordPress, built with PHP, powers 43.2% of all websites worldwide (WordPress Stats, 2025)</p>
</li>
<li><p>More than 2.1 million active PHP developers globally (SlashData, 2024)</p>
</li>
<li><p>Laravel adoption grew by 32% over the last year</p>
</li>
<li><p>Composer hosts over 380,000 packages with billions of total downloads</p>
</li>
</ul>
<h2 id="heading-job-market-demand">Job Market Demand</h2>
<p>PHP skills continue to be valuable in the employment market:</p>
<ul>
<li><p>The average PHP developer salary in the US is $89,500 per year (Indeed, 2025)</p>
</li>
<li><p>Over 215,000 PHP-related job listings appeared in the past 12 months</p>
</li>
<li><p>68% of enterprises run at least one PHP-based application</p>
</li>
<li><p>Developers with Laravel experience earn 18–24% higher salaries</p>
</li>
</ul>
<h2 id="heading-performance-gains">Performance Gains</h2>
<p>Modern PHP is far removed from its early reputation:</p>
<ul>
<li><p>PHP 8.3 runs up to 35% faster than PHP 7.0</p>
</li>
<li><p>The JIT compiler introduced in PHP 8.0 boosts performance by 20–30% for CPU-intensive tasks</p>
</li>
<li><p>Memory consumption has dropped by roughly 18%</p>
</li>
<li><p>PHP 8.x now competes with Node.js in many common web workloads</p>
</li>
</ul>
<h2 id="heading-the-php-ecosystem">The PHP Ecosystem</h2>
<p>PHP offers a mature and reliable environment:</p>
<ul>
<li><p>38 actively maintained PHP frameworks</p>
</li>
<li><p>PHP-based platforms such as WooCommerce (on WordPress) and Magento power a large share of global e-commerce websites</p>
</li>
<li><p>The PHP Foundation secured $2.4 million in funding in 2024 to support continued development</p>
</li>
<li><p>PHP 8.4, released in November 2024, added major features such as property hooks and asymmetric visibility</p>
</li>
</ul>
<h2 id="heading-why-php-is-still-beginner-friendly">Why PHP Is Still Beginner-Friendly</h2>
<p>PHP remains an accessible entry point for new developers:</p>
<ul>
<li><p>72% of junior web developers list PHP among the first three languages they learned</p>
</li>
<li><p>Building a first working application takes 43% less time compared to other backend languages</p>
</li>
<li><p>81% of hosting providers include PHP support in entry-level plans</p>
</li>
<li><p>Developers grasp core web concepts 29% faster when starting with PHP</p>
</li>
</ul>
<h2 id="heading-enterprise-usage">Enterprise Usage</h2>
<p>PHP is not limited to small or personal projects:</p>
<ul>
<li><p>37% of Fortune 500 companies use PHP in some form</p>
</li>
<li><p>Facebook maintains Hack, a language derived from PHP, for its own large-scale infrastructure</p>
</li>
<li><p>Slack, Etsy, and Wikipedia rely on PHP for mission-critical systems</p>
</li>
<li><p>Enterprise PHP applications process more than 8.2 billion transactions daily across industries</p>
</li>
</ul>
<h2 id="heading-ongoing-language-evolution">Ongoing Language Evolution</h2>
<p>PHP continues to modernize:</p>
<ul>
<li><p>Support for attributes, named arguments, and union types</p>
</li>
<li><p>73% of enterprise-requested features have already been implemented</p>
</li>
<li><p>Static analysis tools like PHPStan and Psalm reduce production bugs by an average of 23%</p>
</li>
<li><p>PHP-FIG standards promote consistent coding practices across large projects</p>
</li>
</ul>
<h2 id="heading-return-on-learning-investment">Return on Learning Investment</h2>
<p>From a learning-effort perspective, PHP delivers strong value:</p>
<ul>
<li><p>Average time to employability is about 4.3 months</p>
</li>
<li><p>Core PHP skills can be acquired in roughly 120 hours</p>
</li>
<li><p>PHP developers typically learn 3–4 additional languages within their first three years</p>
</li>
<li><p>Official PHP documentation is available in 52 languages</p>
</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>PHP is often labeled as outdated, but the numbers tell a different story. Its widespread adoption, robust ecosystem, improved performance, and steady evolution make PHP not only relevant but highly practical in 2025.</p>
<p>For developers focused on career growth, the conclusion is clear: PHP skills remain in demand, well paid, and applicable across countless industries. Whether you’re building a personal blog, a large-scale enterprise system, or a modern e-commerce platform, PHP continues to offer proven tools to get the job done.</p>
<p>In an industry that frequently chases the latest trends, PHP’s longevity highlights a simple truth: real-world usefulness often outweighs hype. Looking ahead, PHP’s role in the future of web development appears stable—not as a legacy relic, but as a constantly evolving and capable technology.</p>
]]></content:encoded></item><item><title><![CDATA[Grafana K6 Reference Guide: Complete Guide for Performance Testing Engineers]]></title><description><![CDATA[Introduction to Grafana K6
Grafana K6 is an open-source tool designed for performance testing. It's great for testing APIs, microservices, and websites at scale, providing developers and testers insights into system performance. This cheat sheet will...]]></description><link>https://blog.fshtab.com/grafana-k6-cheat-sheet-performance-engineer</link><guid isPermaLink="true">https://blog.fshtab.com/grafana-k6-cheat-sheet-performance-engineer</guid><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Sat, 09 Nov 2024 08:52:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766406489450/df6f5b8c-db17-44ee-a731-80b88026f27b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction-to-grafana-k6">Introduction to Grafana K6</h2>
<p>Grafana K6 is an open-source tool designed for performance testing. It's great for testing APIs, microservices, and websites at scale, providing developers and testers insights into system performance. This cheat sheet will cover the key aspects every performance engineer should know to get started with Grafana K6.</p>
<h3 id="heading-what-is-grafana-k6">What is Grafana K6?</h3>
<p>Grafana K6 is a modern load testing tool for developers and testers that makes performance testing simple, scalable, and easy to integrate into your CI pipeline.</p>
<h3 id="heading-when-to-use-it">When to use it?</h3>
<ul>
<li>Load testing</li>
<li>Stress testing</li>
<li>Spike testing</li>
<li>Performance bottleneck detection</li>
<li>API testing</li>
<li>Browser testing</li>
<li>Chaos engineering</li>
</ul>
<hr />
<h2 id="heading-grafana-k6-cheat-sheet-essential-aspects">Grafana K6 Cheat Sheet: Essential Aspects</h2>
<h3 id="heading-installation">Installation</h3>
<p>Install Grafana K6 via Homebrew or Docker:</p>
<pre><code class="lang-bash">brew install k6
<span class="hljs-comment"># Or with Docker</span>
docker run --rm -i grafana/k6 run - &lt;script.js
</code></pre>
<h3 id="heading-basic-test-with-a-public-rest-api">Basic Test with a Public REST API</h3>
<p>Here's how to run a simple test using a public REST API:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> http <span class="hljs-keyword">from</span> <span class="hljs-string">"k6/http"</span>;
<span class="hljs-keyword">import</span> { check, sleep } <span class="hljs-keyword">from</span> <span class="hljs-string">"k6"</span>;

<span class="hljs-comment">// Define the API endpoint and expected response</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> res = http.get(<span class="hljs-string">"https://jsonplaceholder.typicode.com/posts/1"</span>);

  <span class="hljs-comment">// Define the expected response</span>
  <span class="hljs-keyword">const</span> expectedResponse = {
    <span class="hljs-attr">userId</span>: <span class="hljs-number">1</span>,
    <span class="hljs-attr">id</span>: <span class="hljs-number">1</span>,
    <span class="hljs-attr">title</span>:
      <span class="hljs-string">"sunt aut facere repellat provident occaecati excepturi optio reprehenderit"</span>,
    <span class="hljs-attr">body</span>: <span class="hljs-string">"quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"</span>,
  };

  <span class="hljs-comment">// Assert the response is as expected</span>
  check(res, {
    <span class="hljs-string">"status is 200"</span>: <span class="hljs-function">(<span class="hljs-params">r</span>) =&gt;</span> r.status === <span class="hljs-number">200</span>,
    <span class="hljs-string">"response is correct"</span>: <span class="hljs-function">(<span class="hljs-params">r</span>) =&gt;</span>
      <span class="hljs-built_in">JSON</span>.stringify(<span class="hljs-built_in">JSON</span>.parse(r.body)) === <span class="hljs-built_in">JSON</span>.stringify(expectedResponse),
  });

  sleep(<span class="hljs-number">1</span>);
}
</code></pre>
<h4 id="heading-running-the-test-and-utilization-of-web-dashboard">Running the Test and Using the Web Dashboard</h4>
<p>To run the test and view the results in a web dashboard, we can use the following command:</p>
<pre><code class="lang-bash">K6_WEB_DASHBOARD=<span class="hljs-literal">true</span> K6_WEB_DASHBOARD_EXPORT=html-report.html k6 run ./src/rest/jsonplaceholder-api-rest.js
</code></pre>
<p>This generates an HTML report named html-report.html, written to the path given to K6_WEB_DASHBOARD_EXPORT once the test finishes.</p>
<p>We can also watch the results live in the web dashboard by opening the following URL while the test is running:</p>
<pre><code>http://127.0.0.1:5665/
</code></pre><p>Once we open this URL, the dashboard shows the test results updating in real time.</p>
<h3 id="heading-test-with-a-public-graphql-api">Test with a Public GraphQL API</h3>
<p>Here's an example using a public GraphQL API.</p>
<p>If you're not familiar with GraphQL, see the introduction <em>What is GraphQL?</em>. For details about the API used below, see the <em>GraphQL Pokémon</em> documentation, and for general guidance on testing GraphQL APIs, see <em>GraphQL Testing</em>.</p>
<p>This is a simple test that fetches a Pokémon by name and checks that the response is successful:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> http <span class="hljs-keyword">from</span> <span class="hljs-string">"k6/http"</span>;
<span class="hljs-keyword">import</span> { check } <span class="hljs-keyword">from</span> <span class="hljs-string">"k6"</span>;

<span class="hljs-comment">// Define the query and variables</span>
<span class="hljs-keyword">const</span> query = <span class="hljs-string">`
  query getPokemon($name: String!) {
    pokemon(name: $name) {
      id
      name
      types
    }
  }`</span>;

<span class="hljs-keyword">const</span> variables = {
  <span class="hljs-attr">name</span>: <span class="hljs-string">"pikachu"</span>,
};

<span class="hljs-comment">// Define the test function</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> url = <span class="hljs-string">"https://graphql-pokemon2.vercel.app/"</span>;
  <span class="hljs-keyword">const</span> payload = <span class="hljs-built_in">JSON</span>.stringify({
    <span class="hljs-attr">query</span>: query,
    <span class="hljs-attr">variables</span>: variables,
  });

  <span class="hljs-comment">// Define the headers</span>
  <span class="hljs-keyword">const</span> headers = {
    <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span>,
  };

  <span class="hljs-comment">// Make the request</span>
  <span class="hljs-keyword">const</span> res = http.post(url, payload, { <span class="hljs-attr">headers</span>: headers });

  <span class="hljs-comment">// Define the expected response</span>
  <span class="hljs-keyword">const</span> expectedResponse = {
    <span class="hljs-attr">data</span>: {
      <span class="hljs-attr">pokemon</span>: {
        <span class="hljs-attr">id</span>: <span class="hljs-string">"UG9rZW1vbjowMjU="</span>,
        <span class="hljs-attr">name</span>: <span class="hljs-string">"Pikachu"</span>,
        <span class="hljs-attr">types</span>: [<span class="hljs-string">"Electric"</span>],
      },
    },
  };

  <span class="hljs-comment">// Assert the response is as expected</span>
  check(res, {
    <span class="hljs-string">"status is 200"</span>: <span class="hljs-function">(<span class="hljs-params">r</span>) =&gt;</span> r.status === <span class="hljs-number">200</span>,
    <span class="hljs-string">"response is correct"</span>: <span class="hljs-function">(<span class="hljs-params">r</span>) =&gt;</span>
      <span class="hljs-built_in">JSON</span>.stringify(<span class="hljs-built_in">JSON</span>.parse(r.body)) === <span class="hljs-built_in">JSON</span>.stringify(expectedResponse),
  });
}
</code></pre>
<hr />
<h2 id="heading-best-practices-for-structuring-performance-projects">Best Practices for Structuring Performance Projects</h2>
<h3 id="heading-centralized-configuration">Centralized Configuration</h3>
<p>Define global configuration options such as performance thresholds, the number of virtual users (VUs), and test durations in one place for easy modification and maintenance.</p>
<p>Example configuration file:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// ./src/config/options.js</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> options = {
  <span class="hljs-attr">stages</span>: [
    { <span class="hljs-attr">duration</span>: <span class="hljs-string">'30s'</span>, <span class="hljs-attr">target</span>: <span class="hljs-number">20</span> },
    { <span class="hljs-attr">duration</span>: <span class="hljs-string">'1m'</span>, <span class="hljs-attr">target</span>: <span class="hljs-number">50</span> },
    { <span class="hljs-attr">duration</span>: <span class="hljs-string">'30s'</span>, <span class="hljs-attr">target</span>: <span class="hljs-number">0</span> },
  ],
  <span class="hljs-attr">thresholds</span>: {
    <span class="hljs-attr">http_req_duration</span>: [<span class="hljs-string">'p(95)&lt;500'</span>],
    <span class="hljs-attr">http_req_failed</span>: [<span class="hljs-string">'rate&lt;0.01'</span>],
  },
};
</code></pre>
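<p>To make the threshold expressions concrete: <code>p(95)&lt;500</code> means the 95th percentile of request durations must stay below 500 ms. The underlying math can be sketched in plain JavaScript (an illustration only, not how k6 computes it internally):</p>

```javascript
// Illustration of how a percentile threshold like 'p(95)<500' is
// evaluated -- a sketch of the math, not k6's internal implementation.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank method: smallest value with at least p% of samples at or below it.
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}

// Simulated http_req_duration samples in milliseconds.
const durations = [
  100, 120, 140, 160, 180, 200, 220, 240, 260, 280,
  300, 320, 340, 360, 380, 400, 420, 440, 460, 1200,
];
const p95 = percentile(durations, 95); // 460
console.log(`p(95) = ${p95} ms, 'p(95)<500' passes: ${p95 < 500}`);
```

<p>k6 evaluates each threshold against the aggregated metric at the end of the run, and a threshold can also be configured with <code>abortOnFail</code> to stop the test early.</p>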
<h3 id="heading-modular-code-organization">Modular Code Organization</h3>
<p>Break down test scripts into reusable functions and modules. This makes the code more maintainable and easier to understand.</p>
<p>Example of modular requests:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// ./src/utils/requests-jsonplaceholder.js</span>
<span class="hljs-keyword">import</span> http <span class="hljs-keyword">from</span> <span class="hljs-string">"k6/http"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getPost</span>(<span class="hljs-params">id</span>) </span>{
  <span class="hljs-keyword">return</span> http.get(<span class="hljs-string">`https://jsonplaceholder.typicode.com/posts/<span class="hljs-subst">${id}</span>`</span>);
}

<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">createPost</span>(<span class="hljs-params">post</span>) </span>{
  <span class="hljs-keyword">return</span> http.post(
    <span class="hljs-string">"https://jsonplaceholder.typicode.com/posts"</span>,
    <span class="hljs-built_in">JSON</span>.stringify(post),
    { <span class="hljs-attr">headers</span>: { <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span> } }
  );
}
</code></pre>
<p>Then use these functions in your test script:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// ./src/rest/jsonplaceholder-api-rest.js</span>
<span class="hljs-keyword">import</span> { check, sleep } <span class="hljs-keyword">from</span> <span class="hljs-string">"k6"</span>;
<span class="hljs-keyword">import</span> { getPost } <span class="hljs-keyword">from</span> <span class="hljs-string">"../utils/requests-jsonplaceholder.js"</span>;
<span class="hljs-keyword">import</span> { options } <span class="hljs-keyword">from</span> <span class="hljs-string">"../config/options.js"</span>;

<span class="hljs-keyword">export</span> { options };

<span class="hljs-comment">// Function to test GET request</span>
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">testGetPost</span>(<span class="hljs-params">id</span>) </span>{
  <span class="hljs-keyword">let</span> res = getPost(id);
  check(res, {
    <span class="hljs-string">"GET status is 200"</span>: <span class="hljs-function">(<span class="hljs-params">r</span>) =&gt;</span> r.status === <span class="hljs-number">200</span>,
    <span class="hljs-string">"GET response has correct id"</span>: <span class="hljs-function">(<span class="hljs-params">r</span>) =&gt;</span> <span class="hljs-built_in">JSON</span>.parse(r.body).id === id,
  });
}

<span class="hljs-comment">// Main function</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params"></span>) </span>{
  testGetPost(<span class="hljs-number">1</span>);
  sleep(<span class="hljs-number">1</span>);
}
</code></pre>
<p>Just as with the REST API example, we can improve our script by extracting small, atomic functions that can be reused to build more complex scenarios later, making it easier to understand what the test script does.</p>
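<p>For instance, the reusable validation helpers could live in a shared checks module. A sketch (the helper names <code>statusIs</code> and <code>bodyHasId</code> are illustrative, not part of k6):</p>

```javascript
// ./src/utils/checks.js (sketch) -- in the real module these would be export'ed.
// Each helper returns an object in the shape k6's check() expects,
// so helpers can be composed with the spread operator.
function statusIs(expected) {
  return { [`status is ${expected}`]: (r) => r.status === expected };
}

function bodyHasId(expected) {
  return {
    [`response has id ${expected}`]: (r) => JSON.parse(r.body).id === expected,
  };
}
```

<p>A test script can then combine them per request: <code>check(res, { ...statusIs(200), ...bodyHasId(1) })</code>.</p>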
<p>We can go a step further and parameterize the request data itself.</p>
<h3 id="heading-dynamic-data-and-parameterization">Dynamic Data and Parameterization</h3>
<p>Use dynamic data to simulate more realistic scenarios and load different data sets. K6 lets us load data from a file into a SharedArray: a read-only structure that keeps a single copy of the data in memory, shared across all VUs, instead of each VU holding its own copy.</p>
<p>We can create a users-config.js file to load the user data from a JSON file, users.json:</p>
<pre><code class="lang-json">[
    { <span class="hljs-attr">"id"</span>: <span class="hljs-number">1</span> },
    { <span class="hljs-attr">"id"</span>: <span class="hljs-number">2</span> },
    { <span class="hljs-attr">"id"</span>: <span class="hljs-number">3</span> },
    { <span class="hljs-attr">"id"</span>: <span class="hljs-number">4</span> },
    { <span class="hljs-attr">"id"</span>: <span class="hljs-number">5</span> },
    { <span class="hljs-attr">"id"</span>: <span class="hljs-number">6</span> },
    { <span class="hljs-attr">"id"</span>: <span class="hljs-number">7</span> },
    { <span class="hljs-attr">"id"</span>: <span class="hljs-number">8</span> },
    { <span class="hljs-attr">"id"</span>: <span class="hljs-number">9</span> },
    { <span class="hljs-attr">"id"</span>: <span class="hljs-number">10</span> }
]
</code></pre>
<pre><code class="lang-javascript"><span class="hljs-comment">// ./src/config/users-config.js</span>
<span class="hljs-keyword">import</span> { SharedArray } <span class="hljs-keyword">from</span> <span class="hljs-string">'k6/data'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> users = <span class="hljs-keyword">new</span> SharedArray(<span class="hljs-string">'User data'</span>, <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">return</span> <span class="hljs-built_in">JSON</span>.parse(open(<span class="hljs-string">'../data/users.json'</span>)); <span class="hljs-comment">// Load from a file</span>
});
</code></pre>
<p>And then we can use it in our test script jsonplaceholder-api-rest.js:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// ./src/rest/jsonplaceholder-api-rest.js</span>
<span class="hljs-keyword">import</span> { check, sleep } <span class="hljs-keyword">from</span> <span class="hljs-string">"k6"</span>;
<span class="hljs-keyword">import</span> { getPost } <span class="hljs-keyword">from</span> <span class="hljs-string">"../utils/requests-jsonplaceholder.js"</span>;
<span class="hljs-keyword">import</span> { options } <span class="hljs-keyword">from</span> <span class="hljs-string">"../config/options.js"</span>;
<span class="hljs-keyword">import</span> { users } <span class="hljs-keyword">from</span> <span class="hljs-string">"../config/users-config.js"</span>;

<span class="hljs-comment">// Function to test GET request</span>
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">testGetPost</span>(<span class="hljs-params">id</span>) </span>{
  <span class="hljs-keyword">let</span> res = getPost(id);
  check(res, {
    <span class="hljs-string">"GET status is 200"</span>: <span class="hljs-function">(<span class="hljs-params">r</span>) =&gt;</span> r.status === <span class="hljs-number">200</span>,
    <span class="hljs-string">"GET response has correct id"</span>: <span class="hljs-function">(<span class="hljs-params">r</span>) =&gt;</span> <span class="hljs-built_in">JSON</span>.parse(r.body).id === id,
  });
}

<span class="hljs-comment">// Main function</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> user = users[<span class="hljs-built_in">Math</span>.floor(<span class="hljs-built_in">Math</span>.random() * users.length)];

  testGetPost(user.id);
  sleep(<span class="hljs-number">1</span>);
}
</code></pre>
<hr />
<h2 id="heading-project-structure">Project Structure</h2>
<p>A well-organized project structure helps in maintaining and scaling your tests. Here's a suggested folder structure:</p>
<pre><code>/project-root
│
├── /src
│   ├── /graphql
│   │   ├── pokemon-graphql-test.js      # Test <span class="hljs-keyword">for</span> GraphQL Pokémon API
│   │   ├── other-graphql-test.js        # Other GraphQL tests
│   │
│   ├── /rest
│   │   ├── jsonplaceholder-api-rest.js # REST API test <span class="hljs-keyword">for</span> JSONPlaceholder
│   │   ├── other-rest-test.js          # Another REST API test
│   │
│   └── performance-scenarios.js  # Script combining multiple performance tests
│
├── /utils
│   ├── requests-graphql-pokemon.js # Reusable functions <span class="hljs-keyword">for</span> GraphQL requests
│   ├── requests-jsonplaceholder.js # Reusable functions <span class="hljs-keyword">for</span> REST requests
│   ├── checks.js                   # Reusable validation functions
│   ├── constants.js                 # Global constants, like URLs or headers
│
├── /config
│   ├── options.js                # Global configuration options
│   ├── users-config.js           # Configuration <span class="hljs-keyword">for</span> users data
│
├── /reports
│   └── results.html             # Output file <span class="hljs-keyword">for</span> results (generated after running tests)
│
├── /data
│   └── users.json             # Users data
│
├── README.md                    # Project documentation
└── .gitignore                   # Files and folders ignored by Git
</code></pre><p>This structure helps in keeping your project organized, scalable, and easy to maintain, avoiding clutter in the project root.</p>
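<p>The performance-scenarios.js entry above can be built on k6's <code>scenarios</code> option, which runs several workloads in a single test run. A sketch, where <code>restScenario</code> and <code>graphqlScenario</code> are assumed names of exported functions wrapping the earlier REST and GraphQL tests (in the real script, <code>options</code> would be exported):</p>

```javascript
// ./src/performance-scenarios.js (sketch)
// Each scenario names the exported function it runs via `exec`.
const options = {
  scenarios: {
    rest_api: {
      executor: "constant-vus", // fixed number of VUs for the whole duration
      exec: "restScenario",
      vus: 10,
      duration: "1m",
    },
    graphql_api: {
      executor: "ramping-vus", // ramp VUs up, then back down
      exec: "graphqlScenario",
      startVUs: 0,
      stages: [
        { duration: "30s", target: 20 },
        { duration: "30s", target: 0 },
      ],
    },
  },
};
```

<p>Both scenarios run in parallel within one <code>k6 run</code>, and their metrics can be tagged and thresholded separately.</p>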
<p>Another option would be to group test scripts into folders by functionality. You can test and compare what makes the most sense for your context. For example, if your project is about a wallet that makes transactions, you could have a folder for each type of transaction (deposit, withdrawal, transfer, etc.) and inside each folder you could have the test scripts for that specific transaction:</p>
<pre><code>/project-root
│
├── /src
│   ├── /deposit
│   │   ├── deposit-test<span class="hljs-number">-1.</span>js      # Test <span class="hljs-keyword">for</span> deposit
│   │   ├── deposit-test<span class="hljs-number">-2.</span>js      # Another deposit test
│   │
│   ├── /withdrawal
│   │   ├── withdrawal-test<span class="hljs-number">-1.</span>js      # Test <span class="hljs-keyword">for</span> withdrawal
│   │   ├── withdrawal-test<span class="hljs-number">-2.</span>js      # Another withdrawal test
│   │
│   ├── /transfer
│   │   ├── transfer-test<span class="hljs-number">-1.</span>js      # Test <span class="hljs-keyword">for</span> transfer
│   │   ├── transfer-test<span class="hljs-number">-2.</span>js      # Another transfer test
│   │
│   └── performance-scenarios.js  # Script combining multiple performance tests
│
├── /utils
│   ├── requests-deposit.js # Reusable functions <span class="hljs-keyword">for</span> deposit
│   ├── requests-withdrawal.js # Reusable functions <span class="hljs-keyword">for</span> withdrawal
│   ├── requests-transfer.js # Reusable functions <span class="hljs-keyword">for</span> transfer
│   ├── checks.js             # Reusable validation functions
│   ├── constants.js          # Global constants, like URLs or headers
│
├── /config
│   ├── options.js                # Global configuration options
│   ├── users-config.js           # Configuration <span class="hljs-keyword">for</span> users data
│   ├── accounts-config.js        # Configuration <span class="hljs-keyword">for</span> accounts data
│
├── /reports
│   └── results.html             # Output file <span class="hljs-keyword">for</span> results (generated after running tests)
│
├── /data
│   └── users.json             # Users data
│   └── accounts.json          # Accounts data
│
├── README.md                    # Project documentation
└── .gitignore                   # Files and folders ignored by Git
</code></pre><p>In this second example, we have a more complex data structure, but we can still reuse the same request functions that we created for the first example.</p>
<hr />
<h2 id="heading-summary">Summary</h2>
<p>Performance testing with K6 is critical for identifying bottlenecks and ensuring application scalability. By following best practices such as modularizing code, centralizing configurations, and using dynamic data, engineers can create maintainable and scalable performance testing scripts.</p>
<p>Key takeaways:</p>
<ul>
<li><strong>Grafana K6</strong> is a modern load testing tool that integrates well with CI/CD pipelines.</li>
<li><strong>Modular code organization</strong> makes tests more maintainable and reusable.</li>
<li><strong>Centralized configuration</strong> simplifies managing test parameters and thresholds.</li>
<li><strong>Dynamic data</strong> enables realistic test scenarios with parameterized inputs.</li>
<li><strong>Well-structured projects</strong> scale better and are easier to maintain.</li>
<li><strong>Web dashboard</strong> provides real-time visualization of test results.</li>
</ul>
<p>By following these practices, performance engineers can build robust, scalable test suites that provide valuable insights into system performance.</p>
]]></content:encoded></item><item><title><![CDATA[RabbitMQ: Implementing Message Queues Correctly]]></title><description><![CDATA[Introduction: The Power of Message Queues
Ever watched an application buckle under a flood of user requests, wondering why your system can't keep up? In 2025, companies leveraging RabbitMQ for message queues reduced system failures by 75%, ensuring s...]]></description><link>https://blog.fshtab.com/rabbitmq-message-queues-done-right</link><guid isPermaLink="true">https://blog.fshtab.com/rabbitmq-message-queues-done-right</guid><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Mon, 21 Oct 2024 15:27:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766406506000/ef0d8939-f4f6-450c-a355-c6af1cce23a9.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction-the-power-of-message-queues">Introduction: The Power of Message Queues</h2>
<p>Ever watched an application buckle under a flood of user requests, wondering why your system can't keep up? Teams that move high-traffic workloads onto <strong>RabbitMQ</strong> message queues routinely report far fewer cascading failures, because services communicate through a durable buffer instead of calling each other directly. RabbitMQ, an open-source message broker, excels at decoupling applications, enabling asynchronous processing, and scaling workloads efficiently. From e-commerce platforms handling Black Friday surges to IoT systems managing sensor data, RabbitMQ is the backbone of reliable, distributed architectures, empowering developers to build resilient systems.</p>
<p>This article is the ultimate guide to <strong>RabbitMQ: Message Queues Done Right</strong>, following a team's journey from synchronous chaos to asynchronous mastery. With comprehensive Java and Python code examples, flow charts, case studies, and a sprinkle of humor, we'll cover every aspect of RabbitMQ—from core concepts to advanced patterns, real-world challenges, and failure scenarios. Whether you're a beginner integrating your first queue or an architect designing fault-tolerant systems, you'll learn how to harness RabbitMQ's power, sidestep pitfalls, and answer tricky questions. Let's dive in and master message queues the right way!</p>
<hr />
<h2 id="heading-the-story-from-bottlenecks-to-seamless-messaging">The Story: From Bottlenecks to Seamless Messaging</h2>
<p>Meet Priya, a backend developer at a logistics startup building a delivery tracking system. Her team's synchronous REST APIs crumbled during peak hours, as order updates overwhelmed their database, causing delays and angry customers. A critical holiday season loomed, threatening disaster. Desperate, Priya turned to <strong>RabbitMQ</strong>, implementing message queues to decouple order processing from database writes. Tasks were processed asynchronously, latency dropped by 80%, and the system scaled effortlessly. Priya's journey mirrors RabbitMQ's rise since its 2007 debut, evolving from a niche tool to a cornerstone of modern architectures, powering giants like Reddit and CloudAMQP. Follow this guide to avoid Priya's bottlenecks and make RabbitMQ your messaging superpower.</p>
<hr />
<h2 id="heading-section-1-understanding-rabbitmq">Section 1: Understanding RabbitMQ</h2>
<h3 id="heading-what-is-rabbitmq">What Is RabbitMQ?</h3>
<p><strong>RabbitMQ</strong> is an open-source message broker that facilitates asynchronous communication between applications using the <strong>Advanced Message Queuing Protocol (AMQP)</strong>. It acts as a middleman, receiving messages from <strong>producers</strong> (senders) and delivering them to <strong>consumers</strong> (receivers) via <strong>queues</strong>, ensuring reliable, decoupled, and scalable messaging.</p>
<p>Key components:</p>
<ul>
<li><strong>Exchange</strong>: Routes messages to queues based on rules (e.g., direct, topic).</li>
<li><strong>Queue</strong>: Stores messages until consumed.</li>
<li><strong>Binding</strong>: Links exchanges to queues with routing keys.</li>
<li><strong>Producer</strong>: Sends messages to exchanges.</li>
<li><strong>Consumer</strong>: Retrieves messages from queues.</li>
<li><strong>Broker</strong>: The RabbitMQ server managing exchanges and queues.</li>
</ul>
<p><strong>Analogy</strong>: RabbitMQ is like a post office—producers drop letters (messages) at the exchange (mailbox), which routes them to queues (PO boxes) for consumers (recipients) to pick up, ensuring delivery even if the recipient is busy.</p>
<h3 id="heading-why-rabbitmq-matters">Why RabbitMQ Matters</h3>
<ul>
<li><strong>Decoupling</strong>: Separates producers and consumers, reducing dependencies.</li>
<li><strong>Scalability</strong>: Handles millions of messages with clustering and load balancing.</li>
<li><strong>Reliability</strong>: Ensures message delivery with persistence and acknowledgments.</li>
<li><strong>Flexibility</strong>: Supports multiple messaging patterns (e.g., pub/sub, work queues).</li>
<li><strong>Cost Efficiency</strong>: Open-source, with low operational overhead.</li>
<li><strong>Career Boost</strong>: RabbitMQ skills are in demand for distributed systems and DevOps roles.</li>
</ul>
<h3 id="heading-common-misconceptions">Common Misconceptions</h3>
<ul>
<li><strong>Myth</strong>: RabbitMQ is only for large systems. <strong>Truth</strong>: It benefits small apps by simplifying asynchronous tasks (e.g., email sending).</li>
<li><strong>Myth</strong>: RabbitMQ is complex to set up. <strong>Truth</strong>: Modern tools like Docker make deployment straightforward.</li>
<li><strong>Myth</strong>: Message queues guarantee instant delivery. <strong>Truth</strong>: They prioritize reliability and order, not real-time speed.</li>
</ul>
<p><strong>Real-World Challenge</strong>: Teams often misuse synchronous APIs for tasks better suited to queues, causing bottlenecks.</p>
<p><strong>Solution</strong>: Use RabbitMQ for async tasks like order processing or notifications.</p>
<p><strong>Takeaway</strong>: RabbitMQ decouples systems, ensuring reliable, scalable messaging.</p>
<hr />
<h2 id="heading-section-2-how-rabbitmq-works">Section 2: How RabbitMQ Works</h2>
<h3 id="heading-the-messaging-workflow">The Messaging Workflow</h3>
<ol>
<li><strong>Producer Sends Message</strong>: Publishes a message to an exchange with a routing key.</li>
<li><strong>Exchange Routes Message</strong>: Directs the message to one or more queues based on exchange type and bindings.</li>
<li><strong>Queue Stores Message</strong>: Holds messages until a consumer is ready.</li>
<li><strong>Consumer Processes Message</strong>: Retrieves and processes messages, sending acknowledgments.</li>
<li><strong>Broker Manages Delivery</strong>: Ensures persistence and reliability.</li>
</ol>
<h3 id="heading-core-concepts">Core Concepts</h3>
<p><strong>Exchange Types</strong>:</p>
<ul>
<li><strong>Direct</strong>: Routes based on exact routing key match.</li>
<li><strong>Topic</strong>: Routes using pattern-based keys (e.g., <code>order.*</code>).</li>
<li><strong>Fanout</strong>: Broadcasts to all bound queues.</li>
<li><strong>Headers</strong>: Routes based on message header attributes.</li>
</ul>
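<p>To make topic routing concrete, here is a small, illustrative Go sketch of AMQP topic-matching semantics (<code>*</code> matches exactly one dot-separated word, <code>#</code> matches zero or more). This is a toy model for intuition, not RabbitMQ's actual routing implementation.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// matchTopic reports whether a routing key matches a topic pattern.
// AMQP topic semantics: words are separated by dots, "*" matches
// exactly one word, and "#" matches zero or more words.
func matchTopic(pattern, key string) bool {
	return match(strings.Split(pattern, "."), strings.Split(key, "."))
}

func match(pat, words []string) bool {
	if len(pat) == 0 {
		return len(words) == 0
	}
	switch pat[0] {
	case "#":
		// "#" can consume zero words, or one word and try again.
		if match(pat[1:], words) {
			return true
		}
		return len(words) > 0 && match(pat, words[1:])
	case "*":
		return len(words) > 0 && match(pat[1:], words[1:])
	default:
		return len(words) > 0 && pat[0] == words[0] && match(pat[1:], words[1:])
	}
}

func main() {
	fmt.Println(matchTopic("order.*", "order.created"))    // true
	fmt.Println(matchTopic("order.*", "order.eu.created")) // false: "*" is one word
	fmt.Println(matchTopic("order.#", "order.eu.created")) // true: "#" spans words
}
```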
<p><strong>Queue Properties</strong>:</p>
<ul>
<li><strong>Durable</strong>: Survives broker restarts.</li>
<li><strong>Exclusive</strong>: Used by only one connection.</li>
<li><strong>Auto-delete</strong>: Deleted when no longer in use.</li>
</ul>
<p><strong>Message Acknowledgments</strong>:</p>
<ul>
<li><strong>Automatic ACK</strong>: Message removed immediately after delivery.</li>
<li><strong>Manual ACK</strong>: Consumer confirms processing before removal.</li>
</ul>
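<p>The practical difference between the two acknowledgment modes is easiest to see in a toy in-memory model: with manual ACKs, a message that is nacked (processing failed) goes back on the queue for redelivery instead of being lost. This sketch is purely illustrative and not how the broker is implemented.</p>

```go
package main

import "fmt"

// inMemQueue is a toy model of a queue with manual acknowledgments:
// delivered-but-unacked messages are redelivered if the consumer nacks.
type inMemQueue struct {
	ready   []string
	unacked map[int]string
	nextTag int
}

func newQueue(msgs ...string) *inMemQueue {
	return &inMemQueue{ready: msgs, unacked: map[int]string{}}
}

// deliver hands the next message to a consumer with a delivery tag.
func (q *inMemQueue) deliver() (tag int, msg string, ok bool) {
	if len(q.ready) == 0 {
		return 0, "", false
	}
	msg, q.ready = q.ready[0], q.ready[1:]
	q.nextTag++
	q.unacked[q.nextTag] = msg
	return q.nextTag, msg, true
}

// ack confirms processing; the message is removed for good.
func (q *inMemQueue) ack(tag int) { delete(q.unacked, tag) }

// nack returns the message to the queue for redelivery.
func (q *inMemQueue) nack(tag int) {
	q.ready = append(q.ready, q.unacked[tag])
	delete(q.unacked, tag)
}

func main() {
	q := newQueue("order-1", "order-2")
	tag, msg, _ := q.deliver()
	fmt.Println("got", msg) // got order-1
	q.nack(tag)             // processing failed: redeliver later
	tag, _, _ = q.deliver() // order-2
	q.ack(tag)              // processed successfully, removed for good
	fmt.Println("ready:", q.ready) // ready: [order-1]
}
```

With automatic ACKs there is no <code>nack</code> path at all: the message is gone the moment it is delivered, even if the consumer crashes mid-processing.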
<hr />
<h2 id="heading-section-3-basic-implementation">Section 3: Basic Implementation</h2>
<h3 id="heading-setting-up-rabbitmq">Setting Up RabbitMQ</h3>
<p>The easiest way to get started is using Docker:</p>
<pre><code class="lang-bash">docker run -d --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3-management
</code></pre>
<p>This starts RabbitMQ with the management UI accessible at <code>http://localhost:15672</code> (default credentials: guest/guest).</p>
<h3 id="heading-java-example-with-spring-amqp">Java Example with Spring AMQP</h3>
<p><strong>Producer</strong>:</p>
<pre><code class="lang-java"><span class="hljs-meta">@Configuration</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">RabbitConfig</span> </span>{
    <span class="hljs-meta">@Bean</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> Queue <span class="hljs-title">orderQueue</span><span class="hljs-params">()</span> </span>{
        <span class="hljs-keyword">return</span> QueueBuilder.durable(<span class="hljs-string">"order.queue"</span>).build();
    }

    <span class="hljs-meta">@Bean</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> DirectExchange <span class="hljs-title">orderExchange</span><span class="hljs-params">()</span> </span>{
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> DirectExchange(<span class="hljs-string">"order.exchange"</span>);
    }

    <span class="hljs-meta">@Bean</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> Binding <span class="hljs-title">orderBinding</span><span class="hljs-params">()</span> </span>{
        <span class="hljs-keyword">return</span> BindingBuilder
            .bind(orderQueue())
            .to(orderExchange())
            .with(<span class="hljs-string">"order.created"</span>);
    }
}

<span class="hljs-meta">@Service</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">OrderService</span> </span>{
    <span class="hljs-meta">@Autowired</span>
    <span class="hljs-keyword">private</span> RabbitTemplate rabbitTemplate;

    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">createOrder</span><span class="hljs-params">(Order order)</span> </span>{
        rabbitTemplate.convertAndSend(
            <span class="hljs-string">"order.exchange"</span>,
            <span class="hljs-string">"order.created"</span>,
            order
        );
    }
}
</code></pre>
<p><strong>Consumer</strong>:</p>
<pre><code class="lang-java"><span class="hljs-meta">@Component</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">OrderConsumer</span> </span>{
    <span class="hljs-meta">@RabbitListener(queues = "order.queue")</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">handleOrder</span><span class="hljs-params">(Order order)</span> </span>{
        System.out.println(<span class="hljs-string">"Processing order: "</span> + order.getId());
        <span class="hljs-comment">// Process order...</span>
    }
}
</code></pre>
<h3 id="heading-python-example-with-pika">Python Example with Pika</h3>
<p><strong>Producer</strong>:</p>
<pre><code class="lang-python">import pika
import json

connection = pika.BlockingConnection(
    pika.ConnectionParameters('localhost')
)
channel = connection.channel()

# Declare the exchange and queue, and bind them, so the publish below
# has somewhere to route to (declarations are idempotent).
channel.exchange_declare(exchange='order.exchange',
                         exchange_type='direct', durable=True)
channel.queue_declare(queue='order.queue', durable=True)
channel.queue_bind(queue='order.queue', exchange='order.exchange',
                   routing_key='order.created')

order = {
    'id': '12345',
    'userId': 'user789',
    'amount': 99.99
}

channel.basic_publish(
    exchange='order.exchange',
    routing_key='order.created',
    body=json.dumps(order),
    properties=pika.BasicProperties(delivery_mode=2)  # Persistent
)

connection.close()
</code></pre>
<p><strong>Consumer</strong>:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pika
<span class="hljs-keyword">import</span> json

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">callback</span>(<span class="hljs-params">ch, method, properties, body</span>):</span>
    order = json.loads(body)
    print(<span class="hljs-string">f"Processing order: <span class="hljs-subst">{order[<span class="hljs-string">'id'</span>]}</span>, User: <span class="hljs-subst">{order[<span class="hljs-string">'userId'</span>]}</span>, Amount: <span class="hljs-subst">{order[<span class="hljs-string">'amount'</span>]}</span>"</span>)
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(
    pika.ConnectionParameters(<span class="hljs-string">'localhost'</span>)
)
channel = connection.channel()

channel.queue_declare(queue=<span class="hljs-string">'order.queue'</span>, durable=<span class="hljs-literal">True</span>)
channel.basic_qos(prefetch_count=<span class="hljs-number">1</span>)
channel.basic_consume(
    queue=<span class="hljs-string">'order.queue'</span>,
    on_message_callback=callback
)

channel.start_consuming()
</code></pre>
<hr />
<h2 id="heading-section-4-advanced-patterns">Section 4: Advanced Patterns</h2>
<h3 id="heading-dead-letter-exchange-dlx">Dead Letter Exchange (DLX)</h3>
<p>When a message cannot be delivered or processed, it can be routed to a Dead Letter Exchange for analysis or retry.</p>
<p><strong>Use Cases</strong>:</p>
<ul>
<li>Handling failed message processing</li>
<li>Implementing retry mechanisms</li>
<li>Debugging message routing issues</li>
</ul>
<h3 id="heading-message-priorities">Message Priorities</h3>
<p>RabbitMQ supports priority queues where higher-priority messages are processed first.</p>
<p><strong>Implementation</strong>:</p>
<pre><code class="lang-java"><span class="hljs-meta">@Bean</span>
<span class="hljs-function"><span class="hljs-keyword">public</span> Queue <span class="hljs-title">priorityQueue</span><span class="hljs-params">()</span> </span>{
    Map&lt;String, Object&gt; args = <span class="hljs-keyword">new</span> HashMap&lt;&gt;();
    args.put(<span class="hljs-string">"x-max-priority"</span>, <span class="hljs-number">10</span>);
    <span class="hljs-keyword">return</span> QueueBuilder.durable(<span class="hljs-string">"priority.queue"</span>)
        .withArguments(args)
        .build();
}
</code></pre>
<h3 id="heading-clustering">Clustering</h3>
<p>RabbitMQ can be clustered across multiple nodes for high availability and load distribution.</p>
<p><strong>Benefits</strong>:</p>
<ul>
<li>Fault tolerance</li>
<li>Load balancing</li>
<li>Horizontal scaling</li>
</ul>
<hr />
<h2 id="heading-section-5-common-challenges-and-solutions">Section 5: Common Challenges and Solutions</h2>
<h3 id="heading-challenge-1-message-loss">Challenge 1: Message Loss</h3>
<p><strong>Problem</strong>: Messages are lost due to consumer crashes or network failures.</p>
<p><strong>Symptoms</strong>: Missing orders, incomplete logs.</p>
<p><strong>Solution</strong>:</p>
<ul>
<li>Enable durable queues and persistent messages.</li>
<li>Use manual ACKs to confirm processing.</li>
<li>Configure DLX for failed messages.</li>
</ul>
<p><strong>Prevention</strong>: Test failure scenarios with chaos engineering.</p>
<p><strong>Failure Case</strong>: Non-durable queues lose messages on restart.</p>
<p><strong>Recovery</strong>: Recreate queues with durability and resend messages.</p>
<h3 id="heading-challenge-2-queue-backlogs">Challenge 2: Queue Backlogs</h3>
<p><strong>Problem</strong>: Slow consumers cause message pileups.</p>
<p><strong>Symptoms</strong>: High queue lengths, delayed processing.</p>
<p><strong>Solution</strong>:</p>
<ul>
<li>Scale consumers with multiple workers.</li>
<li>Use prefetch (<code>basic.qos</code>) to limit unprocessed messages.</li>
<li>Monitor with RabbitMQ's management UI or Prometheus.</li>
</ul>
<p><strong>Prevention</strong>: Profile consumer performance and optimize code.</p>
<p><strong>Failure Case</strong>: Backlogs overwhelm memory.</p>
<p><strong>Recovery</strong>: Add queue limits and DLX routing.</p>
<h3 id="heading-challenge-3-connection-failures">Challenge 3: Connection Failures</h3>
<p><strong>Problem</strong>: Clients lose connection to RabbitMQ.</p>
<p><strong>Symptoms</strong>: Producer errors, stalled consumers.</p>
<p><strong>Solution</strong>:</p>
<ul>
<li>Implement retry logic with exponential backoff.</li>
<li>Use connection pooling and automatic reconnection.</li>
<li>Configure heartbeats to detect dead connections.</li>
</ul>
<p><strong>Prevention</strong>: Configure heartbeats and monitor connections.</p>
<p><strong>Failure Case</strong>: Retries overload the broker.</p>
<p><strong>Recovery</strong>: Use exponential backoff in retries.</p>
<h3 id="heading-challenge-4-misconfigured-exchanges">Challenge 4: Misconfigured Exchanges</h3>
<p><strong>Problem</strong>: Incorrect exchange types or bindings drop messages.</p>
<p><strong>Symptoms</strong>: Messages not reaching consumers.</p>
<p><strong>Solution</strong>:</p>
<ul>
<li>Validate configurations in staging.</li>
<li>Use management UI to inspect bindings.</li>
<li>Log dropped messages to DLX.</li>
</ul>
<p><strong>Prevention</strong>: Document exchange/queue setups.</p>
<p><strong>Failure Case</strong>: Topic exchange wildcards mismatch.</p>
<p><strong>Recovery</strong>: Adjust routing keys and rebind queues.</p>
<h3 id="heading-tricky-question-how-do-you-handle-duplicate-messages">Tricky Question: How do you handle duplicate messages?</h3>
<p><strong>Answer</strong>: Use idempotency:</p>
<ul>
<li>Add unique message IDs.</li>
<li>Track processed IDs in a database or cache.</li>
<li>Ignore duplicates in consumers.</li>
</ul>
<p><strong>Risk</strong>: Database overhead for ID checks.</p>
<p><strong>Solution</strong>: Use in-memory caches like Redis for performance.</p>
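<p>The idempotency pattern can be sketched in a few lines of Go. Here an in-memory map stands in for the Redis set or database table a real consumer would use; the shape of the check is the same either way.</p>

```go
package main

import "fmt"

// dedup remembers already-handled message IDs. In production the
// "seen" map would be Redis or a database table, not process memory.
type dedup struct {
	seen map[string]bool
}

func newDedup() *dedup { return &dedup{seen: map[string]bool{}} }

// handleOnce runs handler only the first time an ID appears;
// duplicates (e.g. redeliveries after a lost ack) are skipped.
func (d *dedup) handleOnce(id string, handler func()) bool {
	if d.seen[id] {
		return false // duplicate: skip
	}
	d.seen[id] = true
	handler()
	return true
}

func main() {
	d := newDedup()
	count := 0
	for _, id := range []string{"msg-1", "msg-2", "msg-1"} { // msg-1 redelivered
		d.handleOnce(id, func() { count++ })
	}
	fmt.Println("processed", count, "messages") // processed 2 messages
}
```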
<hr />
<h2 id="heading-section-6-best-practices">Section 6: Best Practices</h2>
<h3 id="heading-reliability">Reliability</h3>
<ul>
<li>Always use durable queues for important messages.</li>
<li>Enable message persistence for critical data.</li>
<li>Implement manual acknowledgments.</li>
<li>Set up Dead Letter Exchanges for error handling.</li>
</ul>
<h3 id="heading-performance">Performance</h3>
<ul>
<li>Use prefetch to control consumer load.</li>
<li>Scale consumers horizontally.</li>
<li>Monitor queue lengths and processing times.</li>
<li>Optimize message sizes.</li>
</ul>
<h3 id="heading-security">Security</h3>
<ul>
<li>Enable TLS for encrypted connections.</li>
<li>Use strong credentials and limit access.</li>
<li>Configure virtual hosts for isolation.</li>
<li>Regularly update RabbitMQ versions.</li>
</ul>
<hr />
<h2 id="heading-section-7-monitoring-and-management">Section 7: Monitoring and Management</h2>
<h3 id="heading-management-ui">Management UI</h3>
<p>RabbitMQ provides a web-based management interface for:</p>
<ul>
<li>Viewing queues, exchanges, and bindings</li>
<li>Monitoring message rates and queue lengths</li>
<li>Managing users and permissions</li>
<li>Viewing connection and channel information</li>
</ul>
<h3 id="heading-metrics-to-monitor">Metrics to Monitor</h3>
<ul>
<li><strong>Queue length</strong>: Number of messages waiting</li>
<li><strong>Message rate</strong>: Messages per second</li>
<li><strong>Consumer utilization</strong>: Active vs. idle consumers</li>
<li><strong>Connection count</strong>: Number of active connections</li>
<li><strong>Memory usage</strong>: Broker memory consumption</li>
</ul>
<h3 id="heading-integration-with-monitoring-tools">Integration with Monitoring Tools</h3>
<ul>
<li><strong>Prometheus</strong>: Export metrics for alerting</li>
<li><strong>Grafana</strong>: Visualize metrics with dashboards</li>
<li><strong>ELK Stack</strong>: Centralized logging and analysis</li>
</ul>
<hr />
<h2 id="heading-section-8-faqs">Section 8: FAQs</h2>
<p><strong>Q: When should I use RabbitMQ vs. Kafka?</strong></p>
<p>A: Use RabbitMQ for task queues and pub/sub, Kafka for high-throughput event streaming.</p>
<p><strong>Q: Can RabbitMQ handle real-time messaging?</strong></p>
<p>A: Yes, but it prioritizes reliability over sub-millisecond latency.</p>
<p><strong>Q: How do I secure RabbitMQ?</strong></p>
<p>A: Enable TLS, use strong credentials, and configure vhosts/users.</p>
<p><strong>Q: What if a consumer processes messages slowly?</strong></p>
<p>A: Scale consumers, optimize code, or use prefetch to limit load.</p>
<p><strong>Q: How do I monitor RabbitMQ?</strong></p>
<p>A: Use the management UI, Prometheus, or Grafana for metrics.</p>
<p><strong>Q: Can RabbitMQ run in the cloud?</strong></p>
<p>A: Yes, via CloudAMQP or self-hosted on AWS/GCP.</p>
<hr />
<h2 id="heading-section-9-quick-reference-checklist">Section 9: Quick Reference Checklist</h2>
<ul>
<li>[ ] Install RabbitMQ with Docker.</li>
<li>[ ] Configure durable queues and persistent messages.</li>
<li>[ ] Use Spring AMQP or Pika for integration.</li>
<li>[ ] Set up exchanges, queues, and bindings.</li>
<li>[ ] Enable manual ACKs and DLX.</li>
<li>[ ] Monitor with management UI or Prometheus.</li>
<li>[ ] Test failure cases (e.g., consumer crashes).</li>
<li>[ ] Scale with clustering and multiple consumers.</li>
</ul>
<hr />
<h2 id="heading-section-10-conclusion">Section 10: Conclusion</h2>
<p>RabbitMQ is message queuing done right, enabling decoupled, scalable, and reliable systems. From task queues to pub/sub, this guide has equipped you to implement RabbitMQ, tackle challenges, and optimize for real-world demands. By addressing every failure case, tricky question, and advanced technique, you're ready to transform your applications, whether you're building a startup or scaling an enterprise.</p>
<p><strong>Call to Action</strong>: Start now! Set up RabbitMQ, send your first message, and share your insights. Master message queues and make RabbitMQ your superpower!</p>
<hr />
<h2 id="heading-additional-resources">Additional Resources</h2>
<h3 id="heading-books">Books</h3>
<ul>
<li><em>RabbitMQ in Action</em> by Alvaro Videla and Jason J.W. Williams: Comprehensive RabbitMQ guide.</li>
<li><em>Enterprise Integration Patterns</em> by Gregor Hohpe: Messaging patterns.</li>
</ul>
<h3 id="heading-tools">Tools</h3>
<ul>
<li><strong>RabbitMQ</strong>: Message broker (Pros: Flexible, reliable; Cons: Config-heavy).</li>
<li><strong>Spring AMQP</strong>: Java integration (Pros: Easy; Cons: Spring-focused).</li>
<li><strong>Pika</strong>: Python client (Pros: Lightweight; Cons: Manual config).</li>
<li><strong>Prometheus</strong>: Monitoring (Pros: Robust; Cons: Setup effort).</li>
</ul>
<h3 id="heading-communities">Communities</h3>
<ul>
<li>r/rabbitmq</li>
<li>Stack Overflow</li>
<li>RabbitMQ Users Group</li>
</ul>
<hr />
<h2 id="heading-glossary">Glossary</h2>
<ul>
<li><strong>RabbitMQ</strong>: Open-source message broker using AMQP.</li>
<li><strong>Exchange</strong>: Routes messages to queues.</li>
<li><strong>Queue</strong>: Stores messages for consumers.</li>
<li><strong>Producer</strong>: Sends messages.</li>
<li><strong>Consumer</strong>: Processes messages.</li>
<li><strong>DLX</strong>: Dead Letter Exchange for undeliverable messages.</li>
<li><strong>ACK</strong>: Acknowledgment confirming message processing.</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Go: Understanding Concurrency Internals and the Runtime Scheduler]]></title><description><![CDATA[Here's where we started this book:

Functions that run with go are called goroutines. The Go runtime juggles these goroutines and distributes them among operating system threads running on CPU cores. Compared to OS threads, goroutines are lightweight...]]></description><link>https://blog.fshtab.com/go-concurrency-internals-scheduler</link><guid isPermaLink="true">https://blog.fshtab.com/go-concurrency-internals-scheduler</guid><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Sat, 14 Sep 2024 11:40:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766406523129/3607e257-9051-493c-bd6a-76ae6b760d6f.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Here's where we started this book:</p>
<blockquote>
<p>Functions that run with <code>go</code> are called goroutines. The Go runtime juggles these goroutines and distributes them among operating system threads running on CPU cores. Compared to OS threads, goroutines are lightweight, so you can create hundreds or thousands of them.</p>
</blockquote>
<p>That's generally correct, but it's a little too brief. In this chapter, we'll take a closer look at how goroutines work. We'll still use a simplified model, but it should help you understand how everything fits together.</p>
<hr />
<h2 id="heading-concurrency">Concurrency</h2>
<p>At the hardware level, CPU <em>cores</em> are responsible for running parallel tasks. If a processor has 4 cores, it can run 4 instructions at the same time — one on each core.</p>
<pre><code>  instr A     instr B     instr C     instr D
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ Core <span class="hljs-number">1</span>  │ │ Core <span class="hljs-number">2</span>  │ │ Core <span class="hljs-number">3</span>  │ │ Core <span class="hljs-number">4</span>  │ CPU
└─────────┘ └─────────┘ └─────────┘ └─────────┘
</code></pre><p>At the operating system level, a <em>thread</em> is the basic unit of execution. There are usually many more threads than CPU cores, so the operating system's scheduler decides which threads to run and which ones to pause. The scheduler keeps switching between threads to make sure each one gets a turn to run on a CPU, instead of waiting in line forever. This is how the operating system handles concurrency.</p>
<pre><code>┌──────────┐              ┌──────────┐
│ Thread E │              │ Thread F │              OS
└──────────┘              └──────────┘
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Thread A │ │ Thread B │ │ Thread C │ │ Thread D │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
     │           │           │           │
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Core <span class="hljs-number">1</span>   │ │ Core <span class="hljs-number">2</span>   │ │ Core <span class="hljs-number">3</span>   │ │ Core <span class="hljs-number">4</span>   │ CPU
└──────────┘ └──────────┘ └──────────┘ └──────────┘
</code></pre><p>At the Go runtime level, a <em>goroutine</em> is the basic unit of execution. The runtime scheduler runs a fixed number of OS threads, often one per CPU core. There can be many more goroutines than threads, so the scheduler decides which goroutines to run on the available threads and which ones to pause. The scheduler keeps switching between goroutines to make sure each one gets a turn to run on a thread, instead of waiting in line forever. This is how Go handles concurrency.</p>
<pre><code>┌─────┐┌─────┐┌─────┐┌─────┐┌─────┐┌─────┐
│ G15 ││ G16 ││ G17 ││ G18 ││ G19 ││ G20 │
└─────┘└─────┘└─────┘└─────┘└─────┘└─────┘
┌─────┐      ┌─────┐      ┌─────┐      ┌─────┐
│ G11 │      │ G12 │      │ G13 │      │ G14 │      Go runtime
└─────┘      └─────┘      └─────┘      └─────┘
  │            │            │            │
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Thread A │ │ Thread B │ │ Thread C │ │ Thread D │ OS
└──────────┘ └──────────┘ └──────────┘ └──────────┘
</code></pre><p>The Go runtime scheduler doesn't decide which threads run on the CPU — that's the operating system scheduler's job. The Go runtime makes sure all goroutines run on the threads it manages, but the OS controls how and when those threads actually get CPU time.</p>
<hr />
<h2 id="heading-goroutine-scheduler">Goroutine Scheduler</h2>
<p>The scheduler's job is to run M goroutines on N operating system threads, where M can be much larger than N. Here's a simple way to do it:</p>
<ol>
<li>Put all goroutines in a queue.</li>
<li>Take N goroutines from the queue and run them.</li>
<li>If a running goroutine gets blocked (for example, waiting to read from a channel or waiting on a mutex), put it back in the queue and run the next goroutine from the queue.</li>
</ol>
<p>Take goroutines G11-G14 and run them:</p>
<pre><code>┌─────┐┌─────┐┌─────┐┌─────┐┌─────┐┌─────┐
│ G15 ││ G16 ││ G17 ││ G18 ││ G19 ││ G20 │          queue
└─────┘└─────┘└─────┘└─────┘└─────┘└─────┘
┌─────┐      ┌─────┐      ┌─────┐      ┌─────┐
│ G11 │      │ G12 │      │ G13 │      │ G14 │      running
└─────┘      └─────┘      └─────┘      └─────┘
  │            │            │            │
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Thread A │ │ Thread B │ │ Thread C │ │ Thread D │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
</code></pre><p>Goroutine G12 got blocked while reading from the channel. Put it back in the queue and replace it with G15:</p>
<pre><code>┌─────┐┌─────┐┌─────┐┌─────┐┌─────┐┌─────┐
│ G16 ││ G17 ││ G18 ││ G19 ││ G20 ││ G12 │          queue
└─────┘└─────┘└─────┘└─────┘└─────┘└─────┘
┌─────┐      ┌─────┐      ┌─────┐      ┌─────┐
│ G11 │      │ G15 │      │ G13 │      │ G14 │      running
└─────┘      └─────┘      └─────┘      └─────┘
  │            │            │            │
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Thread A │ │ Thread B │ │ Thread C │ │ Thread D │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
</code></pre><p>But there are a few things to keep in mind.</p>
<h3 id="heading-starvation">Starvation</h3>
<p>Let's say goroutines G11–G14 are running smoothly without getting blocked by mutexes or channels. Does that mean goroutines G15–G20 won't run at all and will just have to wait (<em>starve</em>) until one of G11–G14 finally finishes? That would be unfortunate.</p>
<p>That's why the scheduler checks each running goroutine roughly every 10 ms to decide if it's time to pause it and put it back in the queue. This approach is called preemptive scheduling: the scheduler can interrupt running goroutines when needed so others have a chance to run too.</p>
<h3 id="heading-system-calls">System Calls</h3>
<p>The scheduler can manage a goroutine while it's running Go code. But what happens if a goroutine makes a system call, like reading from disk? In that case, the scheduler can't take the goroutine off the thread, and there's no way to know how long the system call will take. For example, if goroutines G11–G14 in our example spend a long time in system calls, all worker threads will be blocked, and the program will basically "freeze".</p>
<p>To solve this problem, the scheduler starts new threads if the existing ones get blocked in a system call. For example, here's what happens if G11 and G12 make system calls:</p>
<pre><code>┌─────┐┌─────┐┌─────┐┌─────┐
│ G17 ││ G18 ││ G19 ││ G20 │                        queue
└─────┘└─────┘└─────┘└─────┘

┌─────┐      ┌─────┐      ┌─────┐      ┌─────┐
│ G15 │      │ G16 │      │ G13 │      │ G14 │      running
└─────┘      └─────┘      └─────┘      └─────┘
  │            │            │            │
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Thread E │ │ Thread F │ │ Thread C │ │ Thread D │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
  │            │
┌──────────┐ ┌──────────┐
│ Thread A │ │ Thread B │              blocked <span class="hljs-keyword">in</span> syscall
└──────────┘ └──────────┘
</code></pre><p>The scheduler created two new threads (E and F) to keep running goroutines while threads A and B are blocked in system calls. Once the system calls finish, threads A and B can be reused for other goroutines.</p>
<hr />
<h2 id="heading-gomaxprocs">GOMAXPROCS</h2>
<p>The <code>GOMAXPROCS</code> environment variable (or the <code>runtime.GOMAXPROCS</code> function) controls how many OS threads can execute Go code at the same time. By default, it's set to the number of CPU cores. Threads blocked in system calls don't count against this limit.</p>
<p>For example, on a 4-core machine, <code>GOMAXPROCS</code> is 4 by default. This means the scheduler can use up to 4 OS threads to run goroutines. If you set <code>GOMAXPROCS</code> to 2, the scheduler will only use 2 threads, even if you have 4 cores.</p>
<p>You usually don't need to change <code>GOMAXPROCS</code> — the default works well for most programs. Goroutines waiting on network I/O are parked by the runtime's netpoller and don't occupy a thread, so raising <code>GOMAXPROCS</code> above the core count rarely helps. The common adjustment goes the other way: in containers with a CPU quota, the default (the host's core count) can be too high, and lowering <code>GOMAXPROCS</code> to match the quota reduces scheduling overhead.</p>
<hr />
<h2 id="heading-concurrency-primitives">Concurrency Primitives</h2>
<p>The Go scheduler interacts with various concurrency primitives:</p>
<ul>
<li><p><strong>Channels</strong>: When a goroutine blocks on a channel operation, the scheduler moves it out of the running state and puts it in a waiting state. When the channel operation can proceed, the scheduler moves the goroutine back to the runnable state.</p>
</li>
<li><p><strong>Mutexes</strong>: When a goroutine tries to lock a mutex that's already locked, the scheduler blocks the goroutine until the mutex becomes available.</p>
</li>
<li><p><strong>WaitGroups</strong>: When a goroutine calls <code>Wait()</code> on a wait group, the scheduler blocks it until the wait group counter reaches zero.</p>
</li>
<li><p><strong>Timers and Tickers</strong>: The scheduler manages timers and tickers, waking up goroutines when their time expires.</p>
</li>
<li><p><strong>System Calls</strong>: When a goroutine makes a blocking system call, the scheduler may create a new OS thread to keep other goroutines running.</p>
</li>
</ul>
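<p>The channel case is easy to see in a tiny program — the receiving goroutine is parked by the scheduler until a value arrives:</p>

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	ch := make(chan int) // unbuffered: both sides must be ready

	go func() {
		// This goroutine blocks on the receive; the scheduler parks it
		// until the send below makes it runnable again.
		fmt.Println("received", <-ch)
	}()

	time.Sleep(10 * time.Millisecond) // let the receiver block first
	ch <- 42                          // unblocks the parked goroutine
	time.Sleep(10 * time.Millisecond) // give it time to print
}
```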
<hr />
<h2 id="heading-scheduler-metrics">Scheduler Metrics</h2>
<p>The Go runtime provides several metrics that can help you understand how the scheduler is performing:</p>
<ul>
<li><code>runtime.NumGoroutine()</code>: Returns the number of goroutines that currently exist.</li>
<li><code>runtime.NumCPU()</code>: Returns the number of logical CPUs available.</li>
<li><code>runtime.GOMAXPROCS()</code>: Returns the current value of GOMAXPROCS.</li>
</ul>
<p>You can also use the <code>runtime/metrics</code> package to get more detailed metrics about the scheduler and goroutines.</p>
<hr />
<h2 id="heading-profiling">Profiling</h2>
<p>Profiling helps you understand where your program spends time and memory. Go provides built-in profiling support through the <code>runtime/pprof</code> package.</p>
<h3 id="heading-cpu-profile">CPU Profile</h3>
<p>A CPU profile shows which functions consume the most CPU time. To collect a CPU profile, you can use the <code>go test</code> command with the <code>-cpuprofile</code> flag:</p>
<pre><code class="lang-bash">go <span class="hljs-built_in">test</span> -cpuprofile=cpu.prof ./...
</code></pre>
<p>Or you can enable the profiling HTTP server in your program:</p>
<pre><code class="lang-go"><span class="hljs-keyword">import</span> (
    _ <span class="hljs-string">"net/http/pprof"</span>
    <span class="hljs-string">"net/http"</span>
)

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        log.Println(http.ListenAndServe(<span class="hljs-string">"localhost:6060"</span>, <span class="hljs-literal">nil</span>))
    }()
    <span class="hljs-comment">// ... rest of your program</span>
}
</code></pre>
<p>Then you can collect profiles by visiting URLs like:</p>
<ul>
<li><code>http://localhost:6060/debug/pprof/profile?seconds=30</code> for CPU profile</li>
<li><code>http://localhost:6060/debug/pprof/heap</code> for heap profile</li>
<li><code>http://localhost:6060/debug/pprof/goroutine</code> for goroutine profile</li>
</ul>
<h3 id="heading-heap-profile">Heap Profile</h3>
<p>A heap profile shows memory allocations. You can collect it similarly:</p>
<pre><code class="lang-bash">go <span class="hljs-built_in">test</span> -memprofile=heap.prof ./...
</code></pre>
<h3 id="heading-block-profile">Block Profile</h3>
<p>A block profile shows where goroutines are blocked (waiting on channels, mutexes, etc.). To enable it:</p>
<pre><code class="lang-go">runtime.SetBlockProfileRate(<span class="hljs-number">1</span>)
</code></pre>
<p>Then collect the profile:</p>
<pre><code class="lang-bash">go tool pprof http://localhost:6060/debug/pprof/block
</code></pre>
<h3 id="heading-mutex-profile">Mutex Profile</h3>
<p>A mutex profile shows contention on mutexes. To enable it:</p>
<pre><code class="lang-go">runtime.SetMutexProfileFraction(<span class="hljs-number">1</span>)
</code></pre>
<p>Then collect the profile:</p>
<pre><code class="lang-bash">go tool pprof http://localhost:6060/debug/pprof/mutex
</code></pre>
<h3 id="heading-viewing-profiles">Viewing Profiles</h3>
<p>You can view profiles using the <code>go tool pprof</code> command:</p>
<pre><code class="lang-bash">go tool pprof -http=localhost:8080 cpu.prof
</code></pre>
<p>This opens a web interface where you can view:</p>
<ul>
<li><strong>Flame graphs</strong>: Show the call hierarchy and resource usage</li>
<li><strong>Source view</strong>: Shows the exact lines of code</li>
<li><strong>Top functions</strong>: Lists functions by resource usage</li>
</ul>
<hr />
<h2 id="heading-tracing">Tracing</h2>
<p>Tracing records certain types of events while the program is running, mainly those related to concurrency and memory:</p>
<ul>
<li>goroutine creation and state changes;</li>
<li>system calls;</li>
<li>garbage collection;</li>
<li>heap size changes;</li>
<li>and more.</li>
</ul>
<p>If you enabled the profiling server as described earlier, you can collect a trace using this URL:</p>
<pre><code>http://localhost:6060/debug/pprof/trace?seconds=N
</code></pre><p>Trace files can be quite large, so it's better to use a small N value.</p>
<p>After tracing is complete, you'll get a binary file that you can open in the browser using the <code>go tool trace</code> utility:</p>
<pre><code class="lang-bash">go tool trace -http=localhost:6060 trace.out
</code></pre>
<p>In the trace web interface, you'll see each goroutine's "lifecycle" on its own line. You can zoom in and out of the trace with the W and S keys, and you can click on any event to see more details.</p>
<p>You can also collect a trace manually:</p>
<pre><code class="lang-go"><span class="hljs-keyword">import</span> (
    <span class="hljs-string">"os"</span>
    <span class="hljs-string">"runtime/trace"</span>
)

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// Start tracing and stop it when main exits.</span>
    file, _ := os.Create(<span class="hljs-string">"trace.out"</span>)
    <span class="hljs-keyword">defer</span> file.Close()
    trace.Start(file)
    <span class="hljs-keyword">defer</span> trace.Stop()

    <span class="hljs-comment">// The rest of the program code.</span>
    <span class="hljs-comment">// ...</span>
}
</code></pre>
<h3 id="heading-flight-recorder">Flight Recorder</h3>
<p>Flight recording is a tracing technique that collects execution data, such as function calls and memory allocations, within a sliding window that's limited by size or duration. It helps to record traces of interesting program behavior, even if you don't know in advance when it will happen.</p>
<p>The <code>trace.FlightRecorder</code> type (Go 1.25+) implements a flight recorder in Go. It tracks a moving window over the execution trace produced by the runtime, always containing the most recent trace data.</p>
<p>Here's an example of how you might use it.</p>
<p>First, configure the sliding window:</p>
<pre><code class="lang-go"><span class="hljs-comment">// Configure the flight recorder to keep</span>
<span class="hljs-comment">// at least 5 seconds of trace data,</span>
<span class="hljs-comment">// with a maximum buffer size of 3MB.</span>
<span class="hljs-comment">// Both of these are hints, not strict limits.</span>
cfg := trace.FlightRecorderConfig{
    MinAge:   <span class="hljs-number">5</span> * time.Second,
    MaxBytes: <span class="hljs-number">3</span> &lt;&lt; <span class="hljs-number">20</span>, <span class="hljs-comment">// 3MB</span>
}
</code></pre>
<p>Then create the recorder and start it:</p>
<pre><code class="lang-go"><span class="hljs-comment">// Create and start the flight recorder.</span>
rec := trace.NewFlightRecorder(cfg)
rec.Start()
<span class="hljs-keyword">defer</span> rec.Stop()
</code></pre>
<p>Continue with the application code as usual:</p>
<pre><code class="lang-go"><span class="hljs-comment">// Simulate some workload.</span>
done := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{})
<span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(done)
    <span class="hljs-keyword">const</span> n = <span class="hljs-number">1</span> &lt;&lt; <span class="hljs-number">20</span>
    <span class="hljs-keyword">var</span> s []<span class="hljs-keyword">int</span>
    <span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> n {
        s = <span class="hljs-built_in">append</span>(s, rand.IntN(n))
    }
    fmt.Printf(<span class="hljs-string">"done filling slice of %d elements\n"</span>, <span class="hljs-built_in">len</span>(s))
}()
&lt;-done
</code></pre>
<p>Finally, save the trace snapshot to a file when an important event occurs:</p>
<pre><code class="lang-go"><span class="hljs-comment">// Save the trace snapshot to a file.</span>
file, _ := os.Create(<span class="hljs-string">"/tmp/trace.out"</span>)
<span class="hljs-keyword">defer</span> file.Close()
n, _ := rec.WriteTo(file)
fmt.Printf(<span class="hljs-string">"wrote %dB to trace file\n"</span>, n)
</code></pre>
<pre><code>done filling slice <span class="hljs-keyword">of</span> <span class="hljs-number">1048576</span> elements
wrote <span class="hljs-number">8441</span>B to trace file
</code></pre><p>Use <code>go tool trace</code> to view the trace in the browser:</p>
<pre><code class="lang-bash">go tool trace -http=localhost:6060 /tmp/trace.out
</code></pre>
<hr />
<h2 id="heading-summary">Summary</h2>
<p>Now you can see how challenging the Go scheduler's job is. Fortunately, most of the time you don't need to worry about how it works behind the scenes — sticking to goroutines, channels, select, and other synchronization primitives is usually enough.</p>
<p>Key points to remember:</p>
<ul>
<li><strong>CPU cores</strong> execute instructions in parallel at the hardware level.</li>
<li><strong>OS threads</strong> are managed by the operating system scheduler.</li>
<li><strong>Goroutines</strong> are managed by the Go runtime scheduler.</li>
<li><strong>The Go scheduler</strong> uses preemptive scheduling to prevent starvation.</li>
<li><strong>System calls</strong> may require creating additional OS threads.</li>
<li><strong>GOMAXPROCS</strong> controls the number of OS threads used by the scheduler.</li>
<li><strong>Profiling and tracing</strong> help you understand scheduler behavior and performance.</li>
</ul>
<p>The Go scheduler is a sophisticated piece of software that handles the complex task of managing thousands of goroutines efficiently. Understanding its internals can help you write better concurrent programs, but in most cases, you can rely on it to do its job without intervention.</p>
]]></content:encoded></item><item><title><![CDATA[Go: Atomic Operations and Lock-Free Programming Techniques]]></title><description><![CDATA[Some concurrent operations don't require explicit synchronization. We can use these to create lock-free types and functions that are safe to use from multiple goroutines. Let's dive into the topic!

Non-Atomic Increment
Suppose multiple goroutines in...]]></description><link>https://blog.fshtab.com/go-atomic-operations-lock-free-concurrency</link><guid isPermaLink="true">https://blog.fshtab.com/go-atomic-operations-lock-free-concurrency</guid><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Wed, 28 Aug 2024 09:05:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766406482238/42b5adc6-85e4-4cd2-b532-692a2ffe7c66.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Some concurrent operations don't require explicit synchronization. We can use these to create lock-free types and functions that are safe to use from multiple goroutines. Let's dive into the topic!</p>
<hr />
<h2 id="heading-non-atomic-increment">Non-Atomic Increment</h2>
<p>Suppose multiple goroutines increment a shared counter:</p>
<pre><code class="lang-go">total := <span class="hljs-number">0</span>

<span class="hljs-keyword">var</span> wg sync.WaitGroup
<span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> <span class="hljs-number">5</span> {
    wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> <span class="hljs-number">10000</span> {
            total++
        }
    })
}
wg.Wait()

fmt.Println(<span class="hljs-string">"total"</span>, total)
</code></pre>
<pre><code>total <span class="hljs-number">40478</span>
</code></pre><p>There are 5 goroutines, and each one increments <code>total</code> 10,000 times, so the final result should be 50,000. But it's usually less. Let's run the code a few more times:</p>
<pre><code>total <span class="hljs-number">26775</span>
total <span class="hljs-number">22978</span>
total <span class="hljs-number">30357</span>
</code></pre><p>The race detector is reporting a problem:</p>
<pre><code>$ go run -race total.go
==================
WARNING: DATA RACE
...
==================
total <span class="hljs-number">33274</span>
Found <span class="hljs-number">1</span> data race(s)
</code></pre><p>This might seem strange — shouldn't the <code>total++</code> operation be atomic? Actually, it's not. It involves three steps (read-modify-write):</p>
<ol>
<li>Read the current value of <code>total</code>.</li>
<li>Add one to it.</li>
<li>Write the new value back to <code>total</code>.</li>
</ol>
<p>If two goroutines both read the value <code>42</code>, then each increments it and writes it back, the new <code>total</code> will be <code>43</code> instead of <code>44</code> like it should be. As a result, some increments to the counter will be lost, and the final value will be less than 50,000.</p>
<p>As we talked about in the Race conditions chapter, you can make an operation atomic by using mutexes or other synchronization tools. But for this chapter, let's agree not to use them. Here, when I say "atomic operation", I mean an operation that doesn't require the caller to use explicit locks, but is still safe to use in a concurrent environment.</p>
<hr />
<h2 id="heading-atomic-operations">Atomic Operations</h2>
<p>An operation without synchronization can only be truly atomic if it translates to a single processor instruction. Such operations don't need locks and won't cause issues when called concurrently (even the write operations).</p>
<p>In a perfect world, every operation would be atomic, and we wouldn't have to deal with mutexes. But in reality, there are only a few atomics, and they're all found in the <code>sync/atomic</code> package. This package provides a set of atomic types:</p>
<ul>
<li><code>Bool</code> — a boolean value;</li>
<li><code>Int32</code>/<code>Int64</code> — a 4- or 8-byte integer;</li>
<li><code>Uint32</code>/<code>Uint64</code> — a 4- or 8-byte unsigned integer;</li>
<li><code>Value</code> — a value of <code>any</code> type;</li>
<li><code>Pointer</code> — a pointer to a value of type <code>T</code> (generic).</li>
</ul>
<p>Each atomic type provides the following methods:</p>
<p><code>Load</code> reads the value of a variable, <code>Store</code> sets a new value:</p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> n atomic.Int32
n.Store(<span class="hljs-number">10</span>)
fmt.Println(<span class="hljs-string">"Store"</span>, n.Load())
</code></pre>
<pre><code>Store <span class="hljs-number">10</span>
</code></pre><p><code>Swap</code> sets a new value (like <code>Store</code>) and returns the old one:</p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> n atomic.Int32
n.Store(<span class="hljs-number">10</span>)
old := n.Swap(<span class="hljs-number">42</span>)
fmt.Println(<span class="hljs-string">"Swap"</span>, old, <span class="hljs-string">"-&gt;"</span>, n.Load())
</code></pre>
<pre><code>Swap <span class="hljs-number">10</span> -&gt; <span class="hljs-number">42</span>
</code></pre><p><code>CompareAndSwap</code> sets a new value only if the current value is still what you expect it to be:</p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> n atomic.Int32
n.Store(<span class="hljs-number">10</span>)
swapped := n.CompareAndSwap(<span class="hljs-number">10</span>, <span class="hljs-number">42</span>)
fmt.Println(<span class="hljs-string">"CompareAndSwap 10 -&gt; 42:"</span>, swapped)
fmt.Println(<span class="hljs-string">"n ="</span>, n.Load())
</code></pre>
<pre><code>CompareAndSwap <span class="hljs-number">10</span> -&gt; <span class="hljs-number">42</span>: <span class="hljs-literal">true</span>
n = <span class="hljs-number">42</span>
</code></pre><pre><code class="lang-go"><span class="hljs-keyword">var</span> n atomic.Int32
n.Store(<span class="hljs-number">10</span>)
swapped := n.CompareAndSwap(<span class="hljs-number">33</span>, <span class="hljs-number">42</span>)
fmt.Println(<span class="hljs-string">"CompareAndSwap 33 -&gt; 42:"</span>, swapped)
fmt.Println(<span class="hljs-string">"n ="</span>, n.Load())
</code></pre>
<pre><code>CompareAndSwap <span class="hljs-number">33</span> -&gt; <span class="hljs-number">42</span>: <span class="hljs-literal">false</span>
n = <span class="hljs-number">10</span>
</code></pre><p>Numeric types also provide an <code>Add</code> method that increments the value by the specified amount:</p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> n atomic.Int32
n.Store(<span class="hljs-number">10</span>)
n.Add(<span class="hljs-number">32</span>)
fmt.Println(<span class="hljs-string">"Add 32:"</span>, n.Load())
</code></pre>
<pre><code>Add <span class="hljs-number">32</span>: <span class="hljs-number">42</span>
</code></pre><p>And the <code>And</code>/<code>Or</code> methods for bitwise operations (Go 1.23+):</p>
<pre><code class="lang-go"><span class="hljs-keyword">const</span> (
    modeRead  = <span class="hljs-number">0</span>b100
    modeWrite = <span class="hljs-number">0</span>b010
    modeExec  = <span class="hljs-number">0</span>b001
)

<span class="hljs-keyword">var</span> mode atomic.Int32
mode.Store(modeRead)
old := mode.Or(modeWrite)

fmt.Printf(<span class="hljs-string">"mode: %b -&gt; %b\n"</span>, old, mode.Load())
</code></pre>
<pre><code>mode: <span class="hljs-number">100</span> -&gt; <span class="hljs-number">110</span>
</code></pre><p>All methods are translated to a single CPU instruction, so they are safe for concurrent calls.</p>
<blockquote>
<p>Strictly speaking, this isn't always true. Not all processors support the full set of concurrent operations, so sometimes more than one instruction is needed. But we don't have to worry about that — Go guarantees the atomicity of <code>sync/atomic</code> operations for the caller. It uses low-level mechanisms specific to each processor architecture to do this.</p>
</blockquote>
<p>Like other synchronization primitives, each atomic variable has its own internal state. So, you should only pass it as a pointer, not by value, to avoid accidentally copying the state.</p>
<p>When using <code>atomic.Value</code>, all loads and stores should use the same concrete type. The following code will cause a panic:</p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> v atomic.Value
v.Store(<span class="hljs-number">10</span>)
v.Store(<span class="hljs-string">"hi"</span>)
</code></pre>
<pre><code>panic: sync/atomic: store <span class="hljs-keyword">of</span> inconsistently typed value into Value
</code></pre><p>Now, let's go back to the counter program:</p>
<pre><code class="lang-go">total := <span class="hljs-number">0</span>

<span class="hljs-keyword">var</span> wg sync.WaitGroup
<span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> <span class="hljs-number">5</span> {
    wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> <span class="hljs-number">10000</span> {
            total++
        }
    })
}
wg.Wait()

fmt.Println(<span class="hljs-string">"total"</span>, total)
</code></pre>
<p>And rewrite it to use an atomic counter:</p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> total atomic.Int32

<span class="hljs-keyword">var</span> wg sync.WaitGroup
<span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> <span class="hljs-number">5</span> {
    wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> <span class="hljs-number">10000</span> {
            total.Add(<span class="hljs-number">1</span>)
        }
    })
}
wg.Wait()

fmt.Println(<span class="hljs-string">"total"</span>, total.Load())
</code></pre>
<pre><code>total <span class="hljs-number">50000</span>
</code></pre><p>Much better!</p>
<hr />
<h2 id="heading-atomics-composition">Atomics Composition</h2>
<p>An atomic operation in a concurrent program is a great thing. Such an operation usually translates into a single processor instruction and doesn't require locks. You can safely call it from different goroutines and receive a predictable result.</p>
<p>But what happens if you combine atomic operations? Let's find out.</p>
<h3 id="heading-atomicity">Atomicity</h3>
<p>Let's look at a function that increments a counter:</p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> counter <span class="hljs-keyword">int32</span>

<span class="hljs-comment">// increment increases the counter value by two.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">increment</span><span class="hljs-params">()</span></span> {
    counter += <span class="hljs-number">1</span>
    sleep(<span class="hljs-number">10</span>)
    counter += <span class="hljs-number">1</span>
}

<span class="hljs-comment">// sleep pauses the current goroutine for up to maxMs ms.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">sleep</span><span class="hljs-params">(maxMs <span class="hljs-keyword">int</span>)</span></span> {
    dur := time.Duration(rand.IntN(maxMs)) * time.Millisecond
    time.Sleep(dur)
}
</code></pre>
<p>As you already know, <code>increment</code> isn't safe to call from multiple goroutines because <code>counter += 1</code> causes a data race.</p>
<p>Now I will try to fix the problem and propose several options. In each case, answer the question: if you call <code>increment</code> from 100 goroutines, is the final value of the <code>counter</code> guaranteed?</p>
<p><strong>Example 1:</strong></p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> counter atomic.Int32

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">increment</span><span class="hljs-params">()</span></span> {
    counter.Add(<span class="hljs-number">1</span>)
    sleep(<span class="hljs-number">10</span>)
    counter.Add(<span class="hljs-number">1</span>)
}
</code></pre>
<pre><code>counter = <span class="hljs-number">200</span>
</code></pre><p>Is the <code>counter</code> value guaranteed?</p>
<p><strong>Answer:</strong> It is guaranteed.</p>
<p><strong>Example 2:</strong></p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> counter atomic.Int32

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">increment</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">if</span> counter.Load()%<span class="hljs-number">2</span> == <span class="hljs-number">0</span> {
        sleep(<span class="hljs-number">10</span>)
        counter.Add(<span class="hljs-number">1</span>)
    } <span class="hljs-keyword">else</span> {
        sleep(<span class="hljs-number">10</span>)
        counter.Add(<span class="hljs-number">2</span>)
    }
}
</code></pre>
<pre><code>counter = <span class="hljs-number">184</span>
</code></pre><p>Is the <code>counter</code> value guaranteed?</p>
<p><strong>Answer:</strong> It's not guaranteed.</p>
<p><strong>Example 3:</strong></p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> delta atomic.Int32
<span class="hljs-keyword">var</span> counter atomic.Int32

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">increment</span><span class="hljs-params">()</span></span> {
    delta.Add(<span class="hljs-number">1</span>)
    sleep(<span class="hljs-number">10</span>)
    counter.Add(delta.Load())
}
</code></pre>
<pre><code>counter = <span class="hljs-number">9386</span>
</code></pre><p>Is the <code>counter</code> value guaranteed?</p>
<p><strong>Answer:</strong> It's not guaranteed.</p>
<h3 id="heading-composition">Composition</h3>
<p>People sometimes think that the composition of atomic operations also magically becomes an atomic operation. But it doesn't.</p>
<p>For example, the second of the above examples:</p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> counter atomic.Int32

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">increment</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">if</span> counter.Load()%<span class="hljs-number">2</span> == <span class="hljs-number">0</span> {
        sleep(<span class="hljs-number">10</span>)
        counter.Add(<span class="hljs-number">1</span>)
    } <span class="hljs-keyword">else</span> {
        sleep(<span class="hljs-number">10</span>)
        counter.Add(<span class="hljs-number">2</span>)
    }
}
</code></pre>
<p>Call <code>increment</code> 100 times from different goroutines:</p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> wg sync.WaitGroup
<span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> <span class="hljs-number">100</span> {
    wg.Go(increment)
}
wg.Wait()
fmt.Println(<span class="hljs-string">"counter ="</span>, counter.Load())
</code></pre>
<p>Run the program with the <code>-race</code> flag — there are no races:</p>
<pre><code>% go run -race atomic<span class="hljs-number">-2.</span>go
<span class="hljs-number">192</span>
% go run -race atomic<span class="hljs-number">-2.</span>go
<span class="hljs-number">191</span>
% go run -race atomic<span class="hljs-number">-2.</span>go
<span class="hljs-number">189</span>
</code></pre><p>But can we be sure what the final value of <code>counter</code> will be? Nope. <code>counter.Load</code> and <code>counter.Add</code> calls are interleaved from different goroutines. This causes a race condition (not to be confused with a data race) and leads to an unpredictable <code>counter</code> value.</p>
<p>Check yourself by answering the question: in which example is <code>increment</code> an atomic operation?</p>
<p><strong>Answer:</strong> In none of them.</p>
<h3 id="heading-sequence-independence">Sequence Independence</h3>
<p>In all examples, <code>increment</code> is not an atomic operation. The composition of atomics is always non-atomic.</p>
<p>The first example, however, guarantees the final value of the <code>counter</code> in a concurrent environment:</p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> counter atomic.Int32

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">increment</span><span class="hljs-params">()</span></span> {
    counter.Add(<span class="hljs-number">1</span>)
    sleep(<span class="hljs-number">10</span>)
    counter.Add(<span class="hljs-number">1</span>)
}
</code></pre>
<p>If we run 100 goroutines, the <code>counter</code> will ultimately equal 200.</p>
<p>The reason is that <code>Add</code> is a sequence-independent operation. The runtime can perform such operations in any order, and the result will not change.</p>
<p>The second and third examples use sequence-dependent operations. When we run 100 goroutines, the order of operations is different each time. Therefore, the result is also different.</p>
<p>A bulletproof way to make a composite operation atomic and prevent race conditions is to use a mutex:</p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> delta <span class="hljs-keyword">int32</span>
<span class="hljs-keyword">var</span> counter <span class="hljs-keyword">int32</span>
<span class="hljs-keyword">var</span> mu sync.Mutex

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">increment</span><span class="hljs-params">()</span></span> {
    mu.Lock()
    delta += <span class="hljs-number">1</span>
    sleep(<span class="hljs-number">10</span>)
    counter += delta
    mu.Unlock()
}

<span class="hljs-comment">// After 100 concurrent increments, the final value is guaranteed:</span>
<span class="hljs-comment">// counter = 1+2+...+100 = 5050</span>
</code></pre>
<pre><code>counter = <span class="hljs-number">5050</span>
</code></pre><p>But sometimes an atomic variable with <code>CompareAndSwap</code> is all you need. Let's look at an example.</p>
<hr />
<h2 id="heading-atomic-instead-of-mutex">Atomic Instead of Mutex</h2>
<p>Let's say we have a gate that needs to be closed:</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> Gate <span class="hljs-keyword">struct</span> {
    closed <span class="hljs-keyword">bool</span> <span class="hljs-comment">// gate state</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(g *Gate)</span> <span class="hljs-title">Close</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">if</span> g.closed {
        <span class="hljs-keyword">return</span> <span class="hljs-comment">// ignore repeated calls</span>
    }
    g.closed = <span class="hljs-literal">true</span>
    <span class="hljs-comment">// free resources</span>
}
</code></pre>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">work</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> g Gate
    <span class="hljs-keyword">defer</span> g.Close()
    <span class="hljs-comment">// do something while the gate is open</span>
}
</code></pre>
<p>In a concurrent environment, there are data races on the <code>closed</code> field. We can fix this with a mutex:</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> Gate <span class="hljs-keyword">struct</span> {
    closed <span class="hljs-keyword">bool</span>
    mu sync.Mutex <span class="hljs-comment">// protects the state</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(g *Gate)</span> <span class="hljs-title">Close</span><span class="hljs-params">()</span></span> {
    g.mu.Lock()
    <span class="hljs-keyword">defer</span> g.mu.Unlock()
    <span class="hljs-keyword">if</span> g.closed {
        <span class="hljs-keyword">return</span> <span class="hljs-comment">// ignore repeated calls</span>
    }
    g.closed = <span class="hljs-literal">true</span>
    <span class="hljs-comment">// free resources</span>
}
</code></pre>
<p>Alternatively, we can use <code>CompareAndSwap</code> on an atomic <code>Bool</code> instead of a mutex:</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> Gate <span class="hljs-keyword">struct</span> {
    closed atomic.Bool
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(g *Gate)</span> <span class="hljs-title">Close</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">if</span> !g.closed.CompareAndSwap(<span class="hljs-literal">false</span>, <span class="hljs-literal">true</span>) {
        <span class="hljs-keyword">return</span> <span class="hljs-comment">// ignore repeated calls</span>
    }
    <span class="hljs-comment">// The gate is closed.</span>
    <span class="hljs-comment">// We can free resources now.</span>
}
</code></pre>
<p>The <code>Gate</code> type is now more compact and simpler.</p>
<p>This isn't a very common use case — we usually want a goroutine to wait on a locked mutex and continue once it's unlocked. But for "early exit" situations, it's perfect.</p>
<hr />
<h2 id="heading-summary">Summary</h2>
<p>Atomics are a specialized but useful tool. You can use them for simple counters and flags, but be very careful when using them for more complex operations. You can also use them instead of mutexes to exit early.</p>
<p>Key points to remember:</p>
<ul>
<li><strong>Atomic operations</strong> translate to single CPU instructions and don't require locks.</li>
<li><strong>Composition of atomics</strong> is not atomic — combining multiple atomic operations doesn't make the whole operation atomic.</li>
<li><strong>Sequence-independent operations</strong> like <code>Add</code> can be safely composed, while sequence-dependent operations cannot.</li>
<li><strong>CompareAndSwap</strong> can replace mutexes in "early exit" scenarios.</li>
<li><strong>Use mutexes</strong> when you need to make composite operations atomic.</li>
</ul>
<p>Atomics provide a powerful way to create lock-free concurrent code, but they require careful understanding of their limitations and proper use.</p>
]]></content:encoded></item><item><title><![CDATA[Go: Using Context for Request Cancellation and Timeout Management]]></title><description><![CDATA[In programming, context refers to information about the environment in which an object exists or a function executes. In Go, context usually refers to the Context interface from the context package. It was originally designed to make working with HTT...]]></description><link>https://blog.fshtab.com/go-context-cancellation-timeouts</link><guid isPermaLink="true">https://blog.fshtab.com/go-context-cancellation-timeouts</guid><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Wed, 03 Jul 2024 08:20:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766406533686/16d88083-8cea-46d8-95b3-252f5658cfd8.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In programming, <em>context</em> refers to information about the environment in which an object exists or a function executes. In Go, context usually refers to the <code>Context</code> interface from the <code>context</code> package. It was originally designed to make working with HTTP requests easier. However, contexts can also be used in regular concurrent code. Let's see how exactly.</p>
<hr />
<h2 id="heading-canceling-with-channel">Canceling with Channel</h2>
<p>Suppose we have an <code>execute()</code> function, which can run a given function and supports cancellation:</p>
<pre><code class="lang-go"><span class="hljs-comment">// execute runs fn in a separate goroutine</span>
<span class="hljs-comment">// and waits for the result unless canceled.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">execute</span><span class="hljs-params">(cancel &lt;-<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{}, fn <span class="hljs-keyword">func</span>()</span> <span class="hljs-title">int</span>) <span class="hljs-params">(<span class="hljs-keyword">int</span>, error)</span></span> {
    ch := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>, <span class="hljs-number">1</span>)

    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        ch &lt;- fn()
    }()

    <span class="hljs-keyword">select</span> {
    <span class="hljs-keyword">case</span> res := &lt;-ch:
        <span class="hljs-keyword">return</span> res, <span class="hljs-literal">nil</span>
    <span class="hljs-keyword">case</span> &lt;-cancel:
        <span class="hljs-keyword">return</span> <span class="hljs-number">0</span>, errors.New(<span class="hljs-string">"canceled"</span>)
    }
}
</code></pre>
<p>Everything is familiar here:</p>
<ul>
<li>The function takes a channel through which it can receive a cancellation signal.</li>
<li>It runs <code>fn()</code> in a separate goroutine.</li>
<li>It uses <code>select</code> to wait for <code>fn()</code> to complete or cancel, whichever occurs first.</li>
</ul>
<p>Let's write a client that cancels operations with a 50% probability:</p>
<pre><code class="lang-go"><span class="hljs-comment">// work does something for 100 ms.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">work</span><span class="hljs-params">()</span> <span class="hljs-title">int</span></span> {
    time.Sleep(<span class="hljs-number">100</span> * time.Millisecond)
    fmt.Println(<span class="hljs-string">"work done"</span>)
    <span class="hljs-keyword">return</span> <span class="hljs-number">42</span>
}

<span class="hljs-comment">// maybeCancel waits for 50 ms and cancels with 50% probability.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">maybeCancel</span><span class="hljs-params">(cancel <span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{})</span></span> {
    time.Sleep(<span class="hljs-number">50</span> * time.Millisecond)
    <span class="hljs-keyword">if</span> rand.Float32() &lt; <span class="hljs-number">0.5</span> {
        <span class="hljs-built_in">close</span>(cancel)
    }
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    cancel := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{})
    <span class="hljs-keyword">go</span> maybeCancel(cancel)

    res, err := execute(cancel, work)
    fmt.Println(res, err)
}
</code></pre>
<pre><code>work done
<span class="hljs-number">42</span> &lt;nil&gt;
</code></pre><p>Run it a few times:</p>
<pre><code><span class="hljs-number">0</span> canceled
</code></pre><pre><code>work done
<span class="hljs-number">42</span> &lt;nil&gt;
</code></pre><pre><code><span class="hljs-number">0</span> canceled
</code></pre><pre><code>work done
<span class="hljs-number">42</span> &lt;nil&gt;
</code></pre><p>No surprises here.</p>
<p>Now let's reimplement <code>execute</code> with context.</p>
<hr />
<h2 id="heading-canceling-with-context">Canceling with Context</h2>
<p>The main purpose of context in Go is to cancel operations.</p>
<p>Let's reimplement what we just did with a cancel channel – this time with a context. The <code>execute()</code> function accepts a context <code>ctx</code> instead of a <code>cancel</code> channel:</p>
<pre><code class="lang-go"><span class="hljs-comment">// execute runs fn in a separate goroutine</span>
<span class="hljs-comment">// and waits for the result unless canceled.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">execute</span><span class="hljs-params">(ctx context.Context, fn <span class="hljs-keyword">func</span>()</span> <span class="hljs-title">int</span>) <span class="hljs-params">(<span class="hljs-keyword">int</span>, error)</span></span> {
    ch := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>, <span class="hljs-number">1</span>)

    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        ch &lt;- fn()
    }()

    <span class="hljs-keyword">select</span> {
    <span class="hljs-keyword">case</span> res := &lt;-ch:
        <span class="hljs-keyword">return</span> res, <span class="hljs-literal">nil</span>
    <span class="hljs-keyword">case</span> &lt;-ctx.Done():       <span class="hljs-comment">// (1)</span>
        <span class="hljs-keyword">return</span> <span class="hljs-number">0</span>, ctx.Err()  <span class="hljs-comment">// (2)</span>
    }
}
</code></pre>
<p>The code has barely changed:</p>
<ul>
<li>➊ Instead of the <code>cancel</code> channel, the cancellation signal comes from the <code>ctx.Done()</code> channel.</li>
<li>➋ Instead of manually creating a "canceled" error, we return <code>ctx.Err()</code>.</li>
</ul>
<p>The client also changes slightly:</p>
<pre><code class="lang-go"><span class="hljs-comment">// maybeCancel waits for 50 ms and cancels with 50% probability.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">maybeCancel</span><span class="hljs-params">(cancel <span class="hljs-keyword">func</span>()</span>)</span> {
    time.Sleep(<span class="hljs-number">50</span> * time.Millisecond)
    <span class="hljs-keyword">if</span> rand.Float32() &lt; <span class="hljs-number">0.5</span> {
        cancel()
    }
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    ctx := context.Background()              <span class="hljs-comment">// (1)</span>
    ctx, cancel := context.WithCancel(ctx)   <span class="hljs-comment">// (2)</span>
    <span class="hljs-keyword">defer</span> cancel()                           <span class="hljs-comment">// (3)</span>

    <span class="hljs-keyword">go</span> maybeCancel(cancel)                   <span class="hljs-comment">// (4)</span>

    res, err := execute(ctx, work)           <span class="hljs-comment">// (5)</span>
    fmt.Println(res, err)
}
</code></pre>
<pre><code>work done
<span class="hljs-number">42</span> &lt;nil&gt;
</code></pre><p>Here's what we do:</p>
<ul>
<li>➊ Create an empty context with <code>context.Background()</code>.</li>
<li>➋ Create a manual cancel context based on the empty context with <code>context.WithCancel()</code>.</li>
<li>➌ Schedule a deferred cancel when <code>main()</code> exits.</li>
<li>➍ Cancel the context with a 50% probability.</li>
<li>➎ Pass the context to the <code>execute()</code> function.</li>
</ul>
<p><code>context.WithCancel()</code> returns the context itself and a <code>cancel</code> function to cancel it. Calling <code>cancel()</code> releases the resources occupied by the context and closes the <code>ctx.Done()</code> channel — we use this effect to interrupt <code>execute()</code>. If the context is canceled, <code>ctx.Err()</code> returns an error (in our case <code>context.Canceled</code>).</p>
<p>All in all, it works exactly the same as the previous version with the cancel channel:</p>
<pre><code>work done
<span class="hljs-number">42</span> &lt;nil&gt;
</code></pre><pre><code><span class="hljs-number">0</span> context canceled
</code></pre><pre><code><span class="hljs-number">0</span> context canceled
</code></pre><pre><code>work done
<span class="hljs-number">42</span> &lt;nil&gt;
</code></pre><p>A few nuances that were not present with the cancel channel:</p>
<p><strong>Context is layered</strong>. A context object is immutable. To add new properties to a context, a new (child) context is created based on the old (parent) context. That's why we first created an empty context and then a cancel context based on it:</p>
<pre><code class="lang-go"><span class="hljs-comment">// parent context</span>
ctx := context.Background()

<span class="hljs-comment">// child context</span>
ctx, cancel := context.WithCancel(ctx)
</code></pre>
<p>If the parent context is canceled, all child contexts are canceled as well (but not vice versa):</p>
<pre><code class="lang-go"><span class="hljs-comment">// parent context</span>
parentCtx, parentCancel := context.WithCancel(context.Background())

<span class="hljs-comment">// child context</span>
childCtx, childCancel := context.WithCancel(parentCtx)

<span class="hljs-comment">// parentCancel() cancels both parentCtx and childCtx.</span>
<span class="hljs-comment">// childCancel() cancels only childCtx.</span>
</code></pre>
<p><strong>Multiple cancels are safe</strong>. If you call <code>close()</code> on a channel twice, it will cause a panic. However, you can call <code>cancel()</code> on the context as many times as you want. The first cancel will work, and the rest will be ignored. This is convenient because you can schedule a deferred <code>cancel()</code> right after creating the context, and explicitly cancel the context if necessary (as we did in the <code>maybeCancel</code> function). This wouldn't be possible with a channel.</p>
<hr />
<h2 id="heading-timeout">Timeout</h2>
<p>The real power of context is its ability to handle both manual cancellation and <em>timeouts</em>.</p>
<pre><code class="lang-go"><span class="hljs-comment">// execute runs fn in a separate goroutine</span>
<span class="hljs-comment">// and waits for the result unless canceled.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">execute</span><span class="hljs-params">(ctx context.Context, fn <span class="hljs-keyword">func</span>()</span> <span class="hljs-title">int</span>) <span class="hljs-params">(<span class="hljs-keyword">int</span>, error)</span></span> {
    <span class="hljs-comment">// remains unchanged</span>
}

<span class="hljs-comment">// work does something for 100 ms.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">work</span><span class="hljs-params">()</span> <span class="hljs-title">int</span></span> {
    <span class="hljs-comment">// remains unchanged</span>
}
</code></pre>
<p>With a timeout of 150 ms, <code>work()</code> completes on time:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    timeout := <span class="hljs-number">150</span> * time.Millisecond
    ctx, cancel := context.WithTimeout(context.Background(), timeout)  <span class="hljs-comment">// (1)</span>
    <span class="hljs-keyword">defer</span> cancel()

    res, err := execute(ctx, work)
    fmt.Println(res, err)
}
</code></pre>
<pre><code>work done
<span class="hljs-number">42</span> &lt;nil&gt;
</code></pre><p>With a timeout of 50 ms, execution gets canceled:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    timeout := <span class="hljs-number">50</span> * time.Millisecond
    ctx, cancel := context.WithTimeout(context.Background(), timeout)  <span class="hljs-comment">// (1)</span>
    <span class="hljs-keyword">defer</span> cancel()

    res, err := execute(ctx, work)
    fmt.Println(res, err)
}
</code></pre>
<pre><code><span class="hljs-number">0</span> context deadline exceeded
</code></pre><p>The <code>execute()</code> function remains unchanged, but <code>context.WithCancel()</code> in main is now replaced with <code>context.WithTimeout()</code> ➊. This change causes <code>execute()</code> to fail with a timeout error (<code>context.DeadlineExceeded</code>) when <code>work()</code> doesn't finish on time.</p>
<p>Thanks to the context, the <code>execute()</code> function doesn't need to know whether the cancellation was triggered manually or by a timeout. All it needs to do is listen for the cancellation signal on the <code>ctx.Done()</code> channel.</p>
<p>Convenient!</p>
<hr />
<h2 id="heading-parent-and-child-timeouts">Parent and Child Timeouts</h2>
<p>Let's say we have the same <code>execute()</code> function and two functions it can run — the faster <code>work()</code> and the slower <code>slow()</code>:</p>
<pre><code class="lang-go"><span class="hljs-comment">// work does something for 100 ms.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">work</span><span class="hljs-params">()</span> <span class="hljs-title">int</span></span> {
    time.Sleep(<span class="hljs-number">100</span> * time.Millisecond)
    <span class="hljs-keyword">return</span> <span class="hljs-number">42</span>
}

<span class="hljs-comment">// slow does something for 200 ms.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">slow</span><span class="hljs-params">()</span> <span class="hljs-title">int</span></span> {
    time.Sleep(<span class="hljs-number">200</span> * time.Millisecond)
    <span class="hljs-keyword">return</span> <span class="hljs-number">99</span>
}
</code></pre>
<p>We want to run both functions, but we don't want to wait more than 150 ms total. We can create a parent context with a 150 ms timeout and then create child contexts for each function:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    parentCtx, parentCancel := context.WithTimeout(context.Background(), <span class="hljs-number">150</span>*time.Millisecond)
    <span class="hljs-keyword">defer</span> parentCancel()

    <span class="hljs-comment">// Child context for work()</span>
    workCtx, workCancel := context.WithCancel(parentCtx)
    <span class="hljs-keyword">defer</span> workCancel()

    <span class="hljs-comment">// Child context for slow()</span>
    slowCtx, slowCancel := context.WithCancel(parentCtx)
    <span class="hljs-keyword">defer</span> slowCancel()

    <span class="hljs-keyword">var</span> wg sync.WaitGroup
    wg.Add(<span class="hljs-number">2</span>)

    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> wg.Done()
        res, err := execute(workCtx, work)
        fmt.Println(<span class="hljs-string">"work:"</span>, res, err)
    }()

    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> wg.Done()
        res, err := execute(slowCtx, slow)
        fmt.Println(<span class="hljs-string">"slow:"</span>, res, err)
    }()

    wg.Wait()
}
</code></pre>
<pre><code>work: <span class="hljs-number">42</span> &lt;nil&gt;
slow: <span class="hljs-number">0</span> context deadline exceeded
</code></pre><p>Here, <code>work()</code> completes successfully, but <code>slow()</code> gets canceled because it exceeds the parent timeout of 150 ms.</p>
<hr />
<h2 id="heading-deadline">Deadline</h2>
<p>Instead of specifying a timeout duration, you can set an absolute deadline:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    deadline := time.Now().Add(<span class="hljs-number">150</span> * time.Millisecond)
    ctx, cancel := context.WithDeadline(context.Background(), deadline)
    <span class="hljs-keyword">defer</span> cancel()

    res, err := execute(ctx, work)
    fmt.Println(res, err)
}
</code></pre>
<pre><code>work done
<span class="hljs-number">42</span> &lt;nil&gt;
</code></pre><p><code>context.WithDeadline()</code> works similarly to <code>context.WithTimeout()</code>, but takes an absolute time instead of a duration.</p>
<hr />
<h2 id="heading-cancellation-cause">Cancellation Cause</h2>
<p>In Go 1.20+, you can specify a custom cause for cancellation. The <code>context.WithCancelCause()</code> function takes an additional parameter: the root <em>cause</em> of the cancellation.</p>
<pre><code class="lang-go">ctx, cancel := context.WithCancelCause(context.Background())
cancel(errors.New(<span class="hljs-string">"the night is dark"</span>))
</code></pre>
<p>Use <code>context.Cause()</code> to get the cancellation cause:</p>
<pre><code class="lang-go">fmt.Println(ctx.Err())
<span class="hljs-comment">// context canceled</span>

fmt.Println(context.Cause(ctx))
<span class="hljs-comment">// the night is dark</span>
</code></pre>
<pre><code>context canceled
the night is dark
</code></pre><p>In Go 1.21+, you can specify a custom cause for timeout (or deadline) cancellation when creating a context. This cause is accessible through <code>context.Cause()</code> when the context is canceled due to a timeout (or deadline):</p>
<pre><code class="lang-go">cause := errors.New(<span class="hljs-string">"the night is dark"</span>)
ctx, cancel := context.WithTimeoutCause(
    context.Background(), <span class="hljs-number">10</span>*time.Millisecond, cause,
)
<span class="hljs-keyword">defer</span> cancel()

time.Sleep(<span class="hljs-number">50</span> * time.Millisecond)
fmt.Println(ctx.Err())
<span class="hljs-comment">// context deadline exceeded</span>
fmt.Println(context.Cause(ctx))
<span class="hljs-comment">// the night is dark</span>
</code></pre>
<pre><code>context deadline exceeded
the night is dark
</code></pre><hr />
<h2 id="heading-contextafterfunc">context.AfterFunc</h2>
<p>Suppose we're performing a long-running task with the option to cancel:</p>
<pre><code class="lang-go"><span class="hljs-comment">// work does something for 100 ms.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">work</span><span class="hljs-params">(ctx context.Context)</span></span> {
    <span class="hljs-keyword">select</span> {
    <span class="hljs-keyword">case</span> &lt;-time.After(<span class="hljs-number">100</span> * time.Millisecond):
    <span class="hljs-keyword">case</span> &lt;-ctx.Done():
    }
}
</code></pre>
<p>Let's set the timeout to 50 ms (expecting <code>work()</code> to be canceled):</p>
<pre><code class="lang-go"><span class="hljs-comment">// the context is canceled after a 50 ms timeout</span>
ctx, cancel := context.WithTimeout(context.Background(), <span class="hljs-number">50</span>*time.Millisecond)
<span class="hljs-keyword">defer</span> cancel()

start := time.Now()
work(ctx)

<span class="hljs-keyword">if</span> ctx.Err() != <span class="hljs-literal">nil</span> {
    fmt.Println(<span class="hljs-string">"canceled after"</span>, time.Since(start))
}
</code></pre>
<pre><code>canceled after <span class="hljs-number">50</span>ms
</code></pre><p>What should we do if <code>work()</code> has occupied resources that need to be freed upon cancellation?</p>
<pre><code class="lang-go"><span class="hljs-comment">// cleanup frees up occupied resources.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">cleanup</span><span class="hljs-params">()</span></span> {
    fmt.Println(<span class="hljs-string">"cleanup"</span>)
}
</code></pre>
<p>We can add the <code>cleanup()</code> call directly into <code>work()</code>:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">work</span><span class="hljs-params">(ctx context.Context)</span></span> {
    <span class="hljs-keyword">select</span> {
    <span class="hljs-keyword">case</span> &lt;-time.After(<span class="hljs-number">100</span> * time.Millisecond):
    <span class="hljs-keyword">case</span> &lt;-ctx.Done():
        cleanup()
    }
}
</code></pre>
<p>But Go 1.21+ offers a more flexible solution.</p>
<h3 id="heading-calling-a-function-on-context-cancellation">Calling a Function on Context Cancellation</h3>
<p>You can register a function to execute when the context is canceled:</p>
<pre><code class="lang-go"><span class="hljs-comment">// the context is canceled after a 50 ms timeout</span>
ctx, cancel := context.WithTimeout(context.Background(), <span class="hljs-number">50</span>*time.Millisecond)
<span class="hljs-keyword">defer</span> cancel()

<span class="hljs-comment">// cleanup is called after the context is canceled</span>
context.AfterFunc(ctx, cleanup)

start := time.Now()
work(ctx)

<span class="hljs-keyword">if</span> ctx.Err() != <span class="hljs-literal">nil</span> {
    fmt.Println(<span class="hljs-string">"canceled after"</span>, time.Since(start))
}
</code></pre>
<pre><code>cleanup
canceled after <span class="hljs-number">50</span>ms
</code></pre><p>In this version, <code>work()</code> doesn't need to know about <code>cleanup()</code>.</p>
<p><code>context.AfterFunc()</code> offers the following:</p>
<ul>
<li><code>cleanup()</code> runs in a separate goroutine.</li>
<li>You can register multiple functions by calling <code>AfterFunc()</code> multiple times. Each function runs independently in a separate goroutine when the context is canceled.</li>
<li>You can change your mind and "detach" a registered function.</li>
</ul>
<h3 id="heading-registering-a-function-after-the-context-is-canceled">Registering a Function After the Context is Canceled</h3>
<p>If the context is already canceled when the function is registered, the function executes immediately:</p>
<pre><code class="lang-go"><span class="hljs-comment">// the context is canceled after a 50 ms timeout</span>
ctx, cancel := context.WithTimeout(context.Background(), <span class="hljs-number">50</span>*time.Millisecond)
<span class="hljs-keyword">defer</span> cancel()

start := time.Now()
work(ctx)

<span class="hljs-keyword">if</span> ctx.Err() != <span class="hljs-literal">nil</span> {
    fmt.Println(<span class="hljs-string">"canceled after"</span>, time.Since(start))
}

<span class="hljs-comment">// cleanup is called immediately since the context is already canceled</span>
context.AfterFunc(ctx, cleanup)
</code></pre>
<pre><code>canceled after <span class="hljs-number">50</span>ms
cleanup
</code></pre><h3 id="heading-canceling-the-registration">Canceling the Registration</h3>
<p>Example of "changing one's mind":</p>
<pre><code class="lang-go"><span class="hljs-comment">// the context is canceled after a 50 ms timeout</span>
ctx, cancel := context.WithTimeout(context.Background(), <span class="hljs-number">50</span>*time.Millisecond)
<span class="hljs-keyword">defer</span> cancel()

<span class="hljs-comment">// cleanup is called after the context is canceled</span>
stopCleanup := context.AfterFunc(ctx, cleanup)  <span class="hljs-comment">// (1)</span>

<span class="hljs-comment">// I changed my mind, let's not call cleanup</span>
stopped := stopCleanup()                        <span class="hljs-comment">// (2)</span>
work(ctx)

fmt.Println(<span class="hljs-string">"stopped cleanup:"</span>, stopped)
</code></pre>
<pre><code>stopped cleanup: <span class="hljs-literal">true</span>
</code></pre><p>In ➊, we saved the "detach" function (returned by <code>context.AfterFunc()</code>) in the <code>stopCleanup</code> variable, and in ➋, we called it, detaching <code>cleanup()</code> from <code>ctx</code>. As a result, the context was canceled due to the timeout, but <code>cleanup()</code> did not execute.</p>
<p>If the context is canceled and the function has already started executing, you can't detach it:</p>
<pre><code class="lang-go"><span class="hljs-comment">// the context is canceled after a 50 ms timeout</span>
ctx, cancel := context.WithTimeout(context.Background(), <span class="hljs-number">50</span>*time.Millisecond)
<span class="hljs-keyword">defer</span> cancel()

<span class="hljs-comment">// cleanup is called after the context is canceled</span>
stopCleanup := context.AfterFunc(ctx, cleanup)

work(ctx)

<span class="hljs-comment">// I changed my mind about calling cleanup, but it's already too late</span>
stopped := stopCleanup()
fmt.Println(<span class="hljs-string">"stopped cleanup:"</span>, stopped)
</code></pre>
<pre><code>cleanup
stopped cleanup: <span class="hljs-literal">false</span>
</code></pre><p>Phew. <code>AfterFunc()</code> is not the most intuitive context-related feature.</p>
<hr />
<h2 id="heading-context-with-values">Context with Values</h2>
<p>The main purpose of context in Go is to cancel operations, either manually or by timeout/deadline. But it can also pass additional information about a call using <code>context.WithValue()</code>, which creates a context with a value for a specific key:</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> contextKey <span class="hljs-keyword">string</span>

<span class="hljs-comment">// "id" and "user" keys</span>
<span class="hljs-keyword">var</span> idKey = contextKey(<span class="hljs-string">"id"</span>)
<span class="hljs-keyword">var</span> userKey = contextKey(<span class="hljs-string">"user"</span>)

<span class="hljs-comment">// work does something.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">work</span><span class="hljs-params">()</span> <span class="hljs-title">int</span></span> {
    <span class="hljs-keyword">return</span> <span class="hljs-number">42</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    {
        ctx := context.Background()
        <span class="hljs-comment">// context with request ID</span>
        ctx = context.WithValue(ctx, idKey, <span class="hljs-number">1234</span>)
        <span class="hljs-comment">// and user</span>
        ctx = context.WithValue(ctx, userKey, <span class="hljs-string">"admin"</span>)
        res := execute(ctx, work)
        fmt.Println(res)
    }

    {
        <span class="hljs-comment">// empty context</span>
        ctx := context.Background()
        res := execute(ctx, work)
        fmt.Println(res)
    }
}
</code></pre>
<p>We use values of a custom type <code>contextKey</code> as keys instead of plain strings or numbers. This prevents key collisions when two packages add values to the same context using the same underlying key, such as <code>id</code> or <code>user</code>.</p>
<p>To retrieve a value by key, use the <code>Value()</code> method:</p>
<pre><code class="lang-go"><span class="hljs-comment">// execute runs fn with respect to ctx.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">execute</span><span class="hljs-params">(ctx context.Context, fn <span class="hljs-keyword">func</span>()</span> <span class="hljs-title">int</span>) <span class="hljs-title">int</span></span> {
    reqId := ctx.Value(idKey)
    <span class="hljs-keyword">if</span> reqId != <span class="hljs-literal">nil</span> {
        fmt.Printf(<span class="hljs-string">"Request ID = %d\n"</span>, reqId)
    } <span class="hljs-keyword">else</span> {
        fmt.Println(<span class="hljs-string">"Request ID unknown"</span>)
    }

    user := ctx.Value(userKey)
    <span class="hljs-keyword">if</span> user != <span class="hljs-literal">nil</span> {
        fmt.Printf(<span class="hljs-string">"Request user = %s\n"</span>, user)
    } <span class="hljs-keyword">else</span> {
        fmt.Println(<span class="hljs-string">"Request user unknown"</span>)
    }
    <span class="hljs-keyword">return</span> fn()
}
</code></pre>
<pre><code>Request ID = <span class="hljs-number">1234</span>
Request user = admin
<span class="hljs-number">42</span>
Request ID unknown
Request user unknown
<span class="hljs-number">42</span>
</code></pre><p>Both <code>context.WithValue()</code> and <code>Context.Value()</code> work with values of type <code>any</code> (they were added to the standard library long before generics):</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">WithValue</span><span class="hljs-params">(parent Context, key, val any)</span> <span class="hljs-title">Context</span></span>

<span class="hljs-keyword">type</span> Context <span class="hljs-keyword">interface</span> {
    <span class="hljs-comment">// ...</span>
    Value(key any) any
}
</code></pre>
<p>I'm mentioning this feature for completeness, but it's generally better to avoid passing values in context. Prefer explicit parameters or custom structs instead.</p>
<hr />
<h2 id="heading-summary">Summary</h2>
<p>Use context to safely cancel and timeout operations in a concurrent environment. It's a perfect fit for remote calls, pipelines, or other long-running operations.</p>
<p>Now you know how to:</p>
<ul>
<li>Manually cancel operations.</li>
<li>Cancel on timeout or deadline.</li>
<li>Use child contexts to restrict timeouts.</li>
<li>Specify the cancellation reason.</li>
<li>Register functions to execute when the context is canceled.</li>
</ul>
<p>Context provides a powerful and standardized way to manage cancellation and timeouts across your concurrent Go programs.</p>
]]></content:encoded></item><item><title><![CDATA[Go: Using Semaphores to Manage Concurrent Execution]]></title><description><![CDATA[Having the full power of multi-core hardware is great, but sometimes we prefer to limit concurrency in certain parts of a system. Semaphores are a great way to do this. Let's learn more about them!

Mutex: One Goroutine at a Time
Let's say our progra...]]></description><link>https://blog.fshtab.com/go-semaphores-concurrency-control</link><guid isPermaLink="true">https://blog.fshtab.com/go-semaphores-concurrency-control</guid><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Wed, 12 Jun 2024 12:18:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766406508990/11f00422-4dd5-4907-9bfd-7cce8de32813.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Having the full power of multi-core hardware is great, but sometimes we prefer to limit concurrency in certain parts of a system. Semaphores are a great way to do this. Let's learn more about them!</p>
<hr />
<h2 id="heading-mutex-one-goroutine-at-a-time">Mutex: One Goroutine at a Time</h2>
<p>Let's say our program needs to call a legacy system, represented by the <code>External</code> type. This system is so ancient that it can handle no more than one call at a time. That's why we protect it with a mutex:</p>
<pre><code class="lang-go"><span class="hljs-comment">// External is an adapter for an external system.</span>
<span class="hljs-keyword">type</span> External <span class="hljs-keyword">struct</span> {
    lock sync.Mutex
}

<span class="hljs-comment">// Call calls the external system.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(e *External)</span> <span class="hljs-title">Call</span><span class="hljs-params">()</span></span> {
    e.lock.Lock()
    <span class="hljs-keyword">defer</span> e.lock.Unlock()
    <span class="hljs-comment">// Simulate a remote call.</span>
    time.Sleep(<span class="hljs-number">10</span> * time.Millisecond)
}
</code></pre>
<p>Now, no matter how many goroutines try to access the external system at the same time, they'll have to take turns:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">const</span> nCalls = <span class="hljs-number">12</span>
    ex := <span class="hljs-built_in">new</span>(External)
    start := time.Now()

    <span class="hljs-keyword">var</span> wg sync.WaitGroup
    <span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> nCalls {
        wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
            ex.Call()
            fmt.Print(<span class="hljs-string">"."</span>)
        })
    }
    wg.Wait()

    fmt.Printf(
        <span class="hljs-string">"\n%d calls took %d ms\n"</span>,
        nCalls, time.Since(start).Milliseconds(),
    )
}
</code></pre>
<pre><code>............
<span class="hljs-number">12</span> calls took <span class="hljs-number">120</span> ms
</code></pre><p>Suppose the developers of the legacy system made some changes and now they say we can make up to four simultaneous calls. In this case, our approach with the mutex stops working, because it blocks all goroutines except the one that managed to lock the mutex.</p>
<p>It would be great to use a tool that allows several goroutines to run at the same time, but no more than N. Luckily for us, such a tool already exists.</p>
<hr />
<h2 id="heading-semaphore-n-goroutines-at-a-time">Semaphore: ≤ N Goroutines at a Time</h2>
<p>So, we want to make sure that no more than 4 goroutines access the external system at the same time. To do this, we'll use a <em>semaphore</em>. You can think of a semaphore as a container with N available slots and two operations: <em>acquire</em> to take a slot and <em>release</em> to free a slot.</p>
<p>Here are the semaphore rules:</p>
<ul>
<li>Calling <code>Acquire</code> takes a free slot.</li>
<li>If there are no free slots, <code>Acquire</code> blocks the goroutine that called it.</li>
<li>Calling <code>Release</code> frees up a previously taken slot.</li>
<li>If there are any goroutines blocked on <code>Acquire</code> when <code>Release</code> is called, one of them will immediately take the freed slot and unblock.</li>
</ul>
<p>Let's see how this works. To keep things simple, let's assume that someone has already implemented a <code>Semaphore</code> type for us, and we can just use it.</p>
<p>We add a semaphore to the external system adapter:</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> External <span class="hljs-keyword">struct</span> {
    sema *Semaphore
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">NewExternal</span><span class="hljs-params">(maxConc <span class="hljs-keyword">int</span>)</span> *<span class="hljs-title">External</span></span> {
    <span class="hljs-comment">// maxConc sets the maximum allowed</span>
    <span class="hljs-comment">// number of concurrent calls.</span>
    <span class="hljs-keyword">return</span> &amp;External{NewSemaphore(maxConc)}
}
</code></pre>
<p>We acquire a slot in the semaphore before calling the external system. After the call, we release it:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(e *External)</span> <span class="hljs-title">Call</span><span class="hljs-params">()</span></span> {
    e.sema.Acquire()
    <span class="hljs-keyword">defer</span> e.sema.Release()
    <span class="hljs-comment">// Simulate a remote call.</span>
    time.Sleep(<span class="hljs-number">10</span> * time.Millisecond)
}
</code></pre>
<p>Now let's allow 4 concurrent calls and perform a total of 12 calls. The client code doesn't change:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">const</span> nCalls = <span class="hljs-number">12</span>
    <span class="hljs-keyword">const</span> nConc = <span class="hljs-number">4</span>

    ex := NewExternal(nConc)
    start := time.Now()

    <span class="hljs-keyword">var</span> wg sync.WaitGroup
    <span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> nCalls {
        wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
            ex.Call()
            fmt.Print(<span class="hljs-string">"."</span>)
        })
    }
    wg.Wait()

    fmt.Printf(
        <span class="hljs-string">"\n%d calls took %d ms\n"</span>,
        nCalls, time.Since(start).Milliseconds(),
    )
}
</code></pre>
<pre><code>............
<span class="hljs-number">12</span> calls took <span class="hljs-number">30</span> ms
</code></pre><p>12 calls were completed in three steps (each step = 4 concurrent calls). Each step took 10 ms, so the total time was 30 ms.</p>
<p>You might have noticed a downside to this approach: even though only 4 goroutines (<code>nConc</code>) run concurrently, we actually start all 12 (<code>nCalls</code>) right away. With small numbers, this isn't a big deal, but if <code>nCalls</code> is large, the waiting goroutines will use up memory for no good reason.</p>
<p>We can modify the program so that there are never more than <code>nConc</code> goroutines at any given time. To do this, we expose <code>Acquire</code> and <code>Release</code> directly on <code>External</code> (by embedding the semaphore) and remove the semaphore calls from <code>Call</code>:</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> External <span class="hljs-keyword">struct</span> {
    <span class="hljs-comment">// Semaphore is embedded into External so clients</span>
    <span class="hljs-comment">// can call Acquire and Release directly on External.</span>
    *Semaphore
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">NewExternal</span><span class="hljs-params">(maxConc <span class="hljs-keyword">int</span>)</span> *<span class="hljs-title">External</span></span> {
    <span class="hljs-keyword">return</span> &amp;External{NewSemaphore(maxConc)}
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(e *External)</span> <span class="hljs-title">Call</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// Simulate a remote call.</span>
    time.Sleep(<span class="hljs-number">10</span> * time.Millisecond)
}
</code></pre>
<p>The client calls <code>Acquire</code> before starting each new goroutine in the loop, and calls <code>Release</code> when it's finished:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">const</span> nCalls = <span class="hljs-number">12</span>
    <span class="hljs-keyword">const</span> nConc = <span class="hljs-number">4</span>

    ex := NewExternal(nConc)
    start := time.Now()

    <span class="hljs-keyword">var</span> wg sync.WaitGroup
    <span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> nCalls {
        ex.Acquire()
        wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
            <span class="hljs-keyword">defer</span> ex.Release()
            ex.Call()
            fmt.Print(<span class="hljs-string">"."</span>)
        })
    }
    wg.Wait()

    fmt.Printf(
        <span class="hljs-string">"\n%d calls took %d ms\n"</span>,
        nCalls, time.Since(start).Milliseconds(),
    )
}
</code></pre>
<pre><code>............
<span class="hljs-number">12</span> calls took <span class="hljs-number">30</span> ms
</code></pre><p>Now there are never more than 4 goroutines at any time (not counting the main goroutine, of course).</p>
<p>In summary, the semaphore helped us solve the problem of limited concurrency:</p>
<ul>
<li>goroutines are allowed to run concurrently,</li>
<li>but no more than N at the same time.</li>
</ul>
<p>Unfortunately, the standard library doesn't include a <code>Semaphore</code> type. So in the next step, we'll implement it ourselves!</p>
<blockquote>
<p>There is a semaphore available in the <code>golang.org/x/sync/semaphore</code> package. But for simple cases like ours, it's perfectly fine to use your own implementation.</p>
</blockquote>
<hr />
<h2 id="heading-implementing-a-semaphore">Implementing a Semaphore</h2>
<p>Here's a simple implementation of a semaphore using a channel:</p>
<pre><code class="lang-go"><span class="hljs-comment">// A synchronization semaphore.</span>
<span class="hljs-keyword">type</span> Semaphore <span class="hljs-keyword">struct</span> {
    ch <span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{}
}

<span class="hljs-comment">// NewSemaphore creates a new semaphore with the given capacity.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">NewSemaphore</span><span class="hljs-params">(n <span class="hljs-keyword">int</span>)</span> *<span class="hljs-title">Semaphore</span></span> {
    s := &amp;Semaphore{
        ch: <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{}, n),
    }
    <span class="hljs-comment">// Fill the channel with tokens.</span>
    <span class="hljs-keyword">for</span> i := <span class="hljs-number">0</span>; i &lt; n; i++ {
        s.ch &lt;- <span class="hljs-keyword">struct</span>{}{}
    }
    <span class="hljs-keyword">return</span> s
}

<span class="hljs-comment">// Acquire takes a slot in the semaphore if one is available.</span>
<span class="hljs-comment">// Otherwise, it blocks the calling goroutine.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(s *Semaphore)</span> <span class="hljs-title">Acquire</span><span class="hljs-params">()</span></span> {
    &lt;-s.ch
}

<span class="hljs-comment">// Release frees up a slot in the semaphore and unblocks</span>
<span class="hljs-comment">// one of the blocked goroutines (if there are any).</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(s *Semaphore)</span> <span class="hljs-title">Release</span><span class="hljs-params">()</span></span> {
    s.ch &lt;- <span class="hljs-keyword">struct</span>{}{}
}
</code></pre>
<p>The implementation uses a buffered channel with capacity N. Initially, the channel is filled with N tokens (empty structs). When a goroutine calls <code>Acquire()</code>, it tries to receive a token from the channel. If tokens are available, it gets one immediately. If not, it blocks until a token becomes available. When a goroutine calls <code>Release()</code>, it sends a token back to the channel, which unblocks one waiting goroutine.</p>
<p>This implementation is simple, safe, and efficient. It avoids data races and busy-waiting, making it suitable for production use.</p>
<hr />
<h2 id="heading-rendezvous">Rendezvous</h2>
<p>Sometimes you need two goroutines to wait for each other at a specific point before continuing. This pattern is called a <em>rendezvous</em>.</p>
<p>Let's say we have two goroutines that need to synchronize at a certain point: each one should wait for the other before continuing. Here's what their execution looks like without a rendezvous:</p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> wg sync.WaitGroup

wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
    fmt.Println(<span class="hljs-string">"1: started"</span>)
    time.Sleep(<span class="hljs-number">10</span> * time.Millisecond)
    fmt.Println(<span class="hljs-string">"1: reached the sync point"</span>)

    <span class="hljs-comment">// Sync point: how do I wait for the second goroutine?</span>

    fmt.Println(<span class="hljs-string">"1: going further"</span>)
    time.Sleep(<span class="hljs-number">20</span> * time.Millisecond)
    fmt.Println(<span class="hljs-string">"1: done"</span>)
})

time.Sleep(<span class="hljs-number">20</span> * time.Millisecond)

wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
    fmt.Println(<span class="hljs-string">"2: started"</span>)
    time.Sleep(<span class="hljs-number">20</span> * time.Millisecond)
    fmt.Println(<span class="hljs-string">"2: reached the sync point"</span>)

    <span class="hljs-comment">// Sync point: how do I wait for the first goroutine?</span>

    fmt.Println(<span class="hljs-string">"2: going further"</span>)
    time.Sleep(<span class="hljs-number">10</span> * time.Millisecond)
    fmt.Println(<span class="hljs-string">"2: done"</span>)
})

wg.Wait()
</code></pre>
<pre><code><span class="hljs-number">1</span>: started
<span class="hljs-number">1</span>: reached the sync point
<span class="hljs-number">1</span>: going further
<span class="hljs-number">2</span>: started
<span class="hljs-number">1</span>: done
<span class="hljs-number">2</span>: reached the sync point
<span class="hljs-number">2</span>: going further
<span class="hljs-number">2</span>: done
</code></pre><p>As you can see, the second goroutine is just getting started, while the first one is already finished. No one is waiting for anyone else.</p>
<p>Let's set up a rendezvous for them:</p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> wg sync.WaitGroup

ready1 := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{})
ready2 := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{})

wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
    fmt.Println(<span class="hljs-string">"1: started"</span>)
    time.Sleep(<span class="hljs-number">10</span> * time.Millisecond)
    fmt.Println(<span class="hljs-string">"1: reached the sync point"</span>)

    <span class="hljs-comment">// Sync point.</span>
    <span class="hljs-built_in">close</span>(ready1)
    &lt;-ready2

    fmt.Println(<span class="hljs-string">"1: going further"</span>)
    time.Sleep(<span class="hljs-number">20</span> * time.Millisecond)
    fmt.Println(<span class="hljs-string">"1: done"</span>)
})

time.Sleep(<span class="hljs-number">20</span> * time.Millisecond)

wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
    fmt.Println(<span class="hljs-string">"2: started"</span>)
    time.Sleep(<span class="hljs-number">20</span> * time.Millisecond)
    fmt.Println(<span class="hljs-string">"2: reached the sync point"</span>)

    <span class="hljs-comment">// Sync point.</span>
    <span class="hljs-built_in">close</span>(ready2)
    &lt;-ready1

    fmt.Println(<span class="hljs-string">"2: going further"</span>)
    time.Sleep(<span class="hljs-number">10</span> * time.Millisecond)
    fmt.Println(<span class="hljs-string">"2: done"</span>)
})

wg.Wait()
</code></pre>
<pre><code><span class="hljs-number">1</span>: started
<span class="hljs-number">1</span>: reached the sync point
<span class="hljs-number">2</span>: started
<span class="hljs-number">2</span>: reached the sync point
<span class="hljs-number">2</span>: going further
<span class="hljs-number">1</span>: going further
<span class="hljs-number">2</span>: done
<span class="hljs-number">1</span>: done
</code></pre><p>Now everything works fine: the goroutines waited for each other at the sync point before moving on.</p>
<p>Here's how it works:</p>
<ul>
<li>G1 closes its own channel when it's ready and then blocks on the other channel. G2 does the same thing.</li>
<li>When G1's channel is closed, it unblocks G2, and when G2's channel is closed, it unblocks G1.</li>
<li>As a result, both goroutines are unblocked at the same time.</li>
</ul>
<p>Here, we close the channel to signal an event. We've done this before:</p>
<ul>
<li>With a done channel in the Channels chapter (the goroutine signals the caller that the work is finished).</li>
<li>With a cancel channel in the Pipelines chapter (the caller signals the goroutine to stop working).</li>
</ul>
<p><strong>Caution: Using Print for debugging</strong></p>
<p>Since printing uses a single output device (stdout), goroutines that print concurrently have to synchronize access to it. So, using print statements adds a synchronization point to your program. This can cause unexpected results that are different from what actually happens in production (where there is no printing).</p>
<p>In the book, I use print statements only because it's much harder to understand the material without them.</p>
<hr />
<h2 id="heading-synchronization-barrier">Synchronization Barrier</h2>
<p>Imagine you walk up to a crosswalk with a traffic light and see a button that's supposed to turn the light green. You press the button, but nothing happens. Another person comes up behind you and presses the button too, but still nothing changes. Two more people arrive, both press the button, but the light stays red. Now the four of you are just standing there, not sure what to do. Finally, a fifth person comes, presses the button, and the light turns green. All five of you cross the street together.</p>
<p>This kind of logic in concurrent programs is called a <em>synchronization barrier</em>:</p>
<ul>
<li>The barrier has a counter (starting at 0) and a threshold N.</li>
<li>Each goroutine that reaches the barrier increases the counter by 1.</li>
<li>The barrier blocks any goroutine that reaches it.</li>
<li>Once the counter reaches N, the barrier unblocks all waiting goroutines.</li>
</ul>
<p>Let's say there are N goroutines. Each one first does a preparation step, then the main step. Here's what their execution looks like without a barrier:</p>
<pre><code class="lang-go"><span class="hljs-keyword">const</span> nWorkers = <span class="hljs-number">4</span>
start := time.Now()

<span class="hljs-keyword">var</span> wg sync.WaitGroup
<span class="hljs-keyword">for</span> i := <span class="hljs-keyword">range</span> nWorkers {
    wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-comment">// Simulate the preparation step.</span>
        dur := time.Duration((i+<span class="hljs-number">1</span>)*<span class="hljs-number">10</span>) * time.Millisecond
        time.Sleep(dur)
        fmt.Printf(<span class="hljs-string">"ready to go after %d ms\n"</span>, dur.Milliseconds())

        <span class="hljs-comment">// Simulate the main step.</span>
        fmt.Println(<span class="hljs-string">"go!"</span>)
    })
}

wg.Wait()
fmt.Printf(<span class="hljs-string">"all done in %d ms\n"</span>, time.Since(start).Milliseconds())
</code></pre>
<pre><code>ready to go after <span class="hljs-number">10</span> ms
go!
ready to go after <span class="hljs-number">20</span> ms
go!
ready to go after <span class="hljs-number">30</span> ms
go!
ready to go after <span class="hljs-number">40</span> ms
go!
all done <span class="hljs-keyword">in</span> <span class="hljs-number">40</span> ms
</code></pre><p>Each goroutine proceeds to the main step as soon as it's ready, without waiting for the others.</p>
<p>Let's say we want the goroutines to wait for each other before moving on to the main step. To do this, we just need to add a barrier after the preparation step:</p>
<pre><code class="lang-go"><span class="hljs-keyword">const</span> nWorkers = <span class="hljs-number">4</span>
start := time.Now()

<span class="hljs-keyword">var</span> wg sync.WaitGroup
b := NewBarrier(nWorkers)
<span class="hljs-keyword">for</span> i := <span class="hljs-keyword">range</span> nWorkers {
    wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-comment">// Simulate the preparation step.</span>
        dur := time.Duration((i+<span class="hljs-number">1</span>)*<span class="hljs-number">10</span>) * time.Millisecond
        time.Sleep(dur)
        fmt.Printf(<span class="hljs-string">"ready to go after %d ms\n"</span>, dur.Milliseconds())

        <span class="hljs-comment">// Wait for all goroutines to reach the barrier.</span>
        b.Touch()

        <span class="hljs-comment">// Simulate the main step.</span>
        fmt.Println(<span class="hljs-string">"go!"</span>)
    })
}

wg.Wait()
fmt.Printf(<span class="hljs-string">"all done in %d ms\n"</span>, time.Since(start).Milliseconds())
</code></pre>
<pre><code>ready to go after <span class="hljs-number">10</span> ms
ready to go after <span class="hljs-number">20</span> ms
ready to go after <span class="hljs-number">30</span> ms
ready to go after <span class="hljs-number">40</span> ms
go!
go!
go!
go!
all done <span class="hljs-keyword">in</span> <span class="hljs-number">40</span> ms
</code></pre><p>Now the faster goroutines waited at the barrier for the slower ones, and only after that did they all move on to the main step together.</p>
<p>Here are some examples of when a synchronization barrier can be useful:</p>
<ul>
<li><em>Parallel computing</em>. If you're sorting in parallel and then merging the results, the sorting steps must finish before the merging starts. If you merge too soon, you'll get the wrong results.</li>
<li><em>Multiplayer applications</em>. If a duel in the game involves N players, all resources for those players need to be fully prepared before the duel begins. Otherwise, some players might be at a disadvantage.</li>
<li><em>Distributed systems</em>. To create a backup, you need to wait until all nodes in the system reach a consistent state (a checkpoint). Otherwise, the backup's integrity could be compromised.</li>
</ul>
<p>The standard library doesn't have a barrier, so now is a great time to make one yourself!</p>
<hr />
<h2 id="heading-summary">Summary</h2>
<p>You've learned the classic synchronization tools — mutexes, semaphores, rendezvous, and barriers. Be careful when using them. Try to avoid complicated setups, and always write tests for tricky concurrent situations.</p>
<p>Key points to remember:</p>
<ul>
<li><strong>Mutexes</strong> allow only one goroutine at a time to access a resource.</li>
<li><strong>Semaphores</strong> allow up to N goroutines to access a resource concurrently.</li>
<li><strong>Rendezvous</strong> enables two goroutines to wait for each other at a synchronization point.</li>
<li><strong>Synchronization barriers</strong> ensure all goroutines reach a point before any proceed.</li>
<li><strong>Channels</strong> can be used to implement semaphores and other synchronization primitives.</li>
</ul>
<p>These tools provide powerful ways to control concurrency and coordinate goroutines in your Go programs.</p>
]]></content:encoded></item><item><title><![CDATA[Go: Building Concurrent Data Processing Pipelines]]></title><description><![CDATA[Now that we understand how to use goroutines and channels, let's explore how to combine them into concurrent pipelines for efficient data processing.

Leaked Goroutines
Consider a function that sends numbers within a specified range to a channel:
fun...]]></description><link>https://blog.fshtab.com/go-pipelines-concurrent-data-processing</link><guid isPermaLink="true">https://blog.fshtab.com/go-pipelines-concurrent-data-processing</guid><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Sat, 18 May 2024 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766406545387/a7b97e03-ea2e-4a4f-9ec2-e7c42137bc3d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Now that we understand how to use goroutines and channels, let's explore how to combine them into concurrent pipelines for efficient data processing.</p>
<hr />
<h2 id="heading-leaked-goroutines">Leaked Goroutines</h2>
<p>Consider a function that sends numbers within a specified range to a channel:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">rangeGen</span><span class="hljs-params">(start, stop <span class="hljs-keyword">int</span>)</span> &lt;-<span class="hljs-title">chan</span> <span class="hljs-title">int</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">for</span> i := start; i &lt; stop; i++ {
            out &lt;- i
        }
        <span class="hljs-built_in">close</span>(out)
    }()
    <span class="hljs-keyword">return</span> out
}
</code></pre>
<p>It appears to work correctly:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    generated := rangeGen(<span class="hljs-number">41</span>, <span class="hljs-number">46</span>)
    <span class="hljs-keyword">for</span> val := <span class="hljs-keyword">range</span> generated {
        fmt.Println(val)
    }
}
</code></pre>
<pre><code><span class="hljs-number">41</span>
<span class="hljs-number">42</span>
<span class="hljs-number">43</span>
<span class="hljs-number">44</span>
<span class="hljs-number">45</span>
</code></pre><p>However, let's examine what happens if we exit the loop prematurely:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    generated := rangeGen(<span class="hljs-number">41</span>, <span class="hljs-number">46</span>)
    <span class="hljs-keyword">for</span> val := <span class="hljs-keyword">range</span> generated {
        fmt.Println(val)
        <span class="hljs-keyword">if</span> val == <span class="hljs-number">42</span> {
            <span class="hljs-keyword">break</span>
        }
    }
}
</code></pre>
<pre><code><span class="hljs-number">41</span>
<span class="hljs-number">42</span>
</code></pre><p>At first glance, it still works. But there's a problem — the <code>rangeGen()</code> goroutine becomes stuck:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">rangeGen</span><span class="hljs-params">(start, stop <span class="hljs-keyword">int</span>)</span> &lt;-<span class="hljs-title">chan</span> <span class="hljs-title">int</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">for</span> i := start; i &lt; stop; i++ {    <span class="hljs-comment">// (1)</span>
            out &lt;- i                       <span class="hljs-comment">// (2)</span>
        }
        <span class="hljs-built_in">close</span>(out)
    }()
    <span class="hljs-keyword">return</span> out
}
</code></pre>
<p>Since <code>main()</code> breaks out of its loop at number 42 and stops reading from the <code>generated</code> channel, the loop inside <code>rangeGen()</code> ➊ never completes. It blocks forever trying to send number 43 to the <code>out</code> channel ➋. The goroutine is stuck. The <code>out</code> channel is never closed, so any other goroutines reading from it would get stuck too.</p>
<p>In this case, it's not critical: when <code>main()</code> exits, the runtime will terminate all other goroutines. But if <code>main()</code> continued to run and called <code>rangeGen()</code> repeatedly, the leaked goroutines would accumulate. This is problematic: goroutines are lightweight but not completely "free". Eventually, you might run out of memory (the garbage collector doesn't collect goroutines).</p>
<p>We need a mechanism to terminate a goroutine early.</p>
<hr />
<h2 id="heading-cancel-channel">Cancel Channel</h2>
<p>First, we'll create a separate <em>cancel channel</em> through which <code>main()</code> will signal <code>rangeGen()</code> to exit:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    cancel := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{})    <span class="hljs-comment">// (1)</span>
    <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(cancel)              <span class="hljs-comment">// (2)</span>

    generated := rangeGen(cancel, <span class="hljs-number">41</span>, <span class="hljs-number">46</span>)    <span class="hljs-comment">// (3)</span>
    <span class="hljs-keyword">for</span> val := <span class="hljs-keyword">range</span> generated {
        fmt.Println(val)
        <span class="hljs-keyword">if</span> val == <span class="hljs-number">42</span> {
            <span class="hljs-keyword">break</span>
        }
    }
}
</code></pre>
<p>We create a <code>cancel</code> channel ➊ and immediately set up a deferred <code>close(cancel)</code> ➋. This is a common practice to avoid tracking every place in the code where the channel needs to be closed. <code>defer</code> ensures that the channel is closed when the function exits, so you don't have to worry about it.</p>
<p>Next, we pass the <code>cancel</code> channel to the goroutine ➌. Now, when the channel closes, the goroutine needs to detect this and exit. Ideally, you'd add a check like this:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">rangeGen</span><span class="hljs-params">(cancel &lt;-<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{}, start, stop <span class="hljs-keyword">int</span>)</span> &lt;-<span class="hljs-title">chan</span> <span class="hljs-title">int</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(out)
        <span class="hljs-keyword">for</span> i := start; i &lt; stop; i++ {
            out &lt;- i
            <span class="hljs-keyword">if</span> &lt;-cancel == <span class="hljs-keyword">struct</span>{}{} {    <span class="hljs-comment">// (1)</span>
                <span class="hljs-keyword">return</span>
            }
        }
    }()
    <span class="hljs-keyword">return</span> out
}
</code></pre>
<pre><code>fatal error: all goroutines are asleep - deadlock!
</code></pre><p>If <code>cancel</code> is closed, the check ➊ passes (a receive from a closed channel returns the zero value immediately, remember?), and the goroutine exits. But while <code>cancel</code> is still open, the receive at ➊ blocks: nothing is ever sent on <code>cancel</code>, so the goroutine never reaches the next loop iteration. That's why the program above deadlocks.</p>
<p>We need a different, non-blocking approach:</p>
<ul>
<li>If <code>cancel</code> is closed, exit the goroutine;</li>
<li>Otherwise, send the next value to <code>out</code>.</li>
</ul>
<p>Go has a <em>select</em> statement for this:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">rangeGen</span><span class="hljs-params">(cancel &lt;-<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{}, start, stop <span class="hljs-keyword">int</span>)</span> &lt;-<span class="hljs-title">chan</span> <span class="hljs-title">int</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(out)
        <span class="hljs-keyword">for</span> i := start; i &lt; stop; i++ {
            <span class="hljs-keyword">select</span> {
            <span class="hljs-keyword">case</span> out &lt;- i:    <span class="hljs-comment">// (1)</span>
            <span class="hljs-keyword">case</span> &lt;-cancel:    <span class="hljs-comment">// (2)</span>
                <span class="hljs-keyword">return</span>
            }
        }
    }()
    <span class="hljs-keyword">return</span> out
}
</code></pre>
<pre><code><span class="hljs-number">41</span>
<span class="hljs-number">42</span>
</code></pre><p><code>select</code> is somewhat like <code>switch</code>, but specifically designed for channels. Here's what it does:</p>
<ul>
<li>Checks which cases are not blocked.</li>
<li>If multiple cases are ready, randomly selects one to execute.</li>
<li>If all cases are blocked, waits until one is ready.</li>
</ul>
<p>In our case, while <code>cancel</code> is open, its case ➋ is blocked (a receive blocks until a value is sent or the channel is closed). However, the <code>out &lt;- i</code> case ➊ is unblocked because <code>main()</code> is reading from <code>out</code>. So <code>select</code> will execute <code>out &lt;- i</code> in each loop iteration.</p>
<p>Then <code>main()</code> will reach number 42 and stop reading from <code>out</code>. After that, both <code>select</code> cases will block, and the goroutine will (temporarily) hang.</p>
<p>Finally, <code>main()</code> will execute the deferred <code>close(cancel)</code>, which will unblock the second <code>select</code> case ➋, and the goroutine will exit. The <code>out</code> channel will close too, thanks to <code>defer</code>.</p>
<p>If <code>main()</code> decides not to stop at 42 and continues to read all values, the cancel channel approach will still work correctly:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    cancel := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{})
    <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(cancel)

    generated := rangeGen(cancel, <span class="hljs-number">41</span>, <span class="hljs-number">46</span>)
    <span class="hljs-keyword">for</span> val := <span class="hljs-keyword">range</span> generated {
        fmt.Println(val)
    }
}
</code></pre>
<pre><code><span class="hljs-number">41</span>
<span class="hljs-number">42</span>
<span class="hljs-number">43</span>
<span class="hljs-number">44</span>
<span class="hljs-number">45</span>
</code></pre><p>Here, <code>rangeGen()</code> will finish before <code>main()</code> calls <code>close(cancel)</code>. Which is perfectly fine.</p>
<p>So thanks to the cancel channel and the select statement, the <code>rangeGen()</code> goroutine will exit correctly regardless of what happens in <code>main()</code>. No more leaked goroutines!</p>
<hr />
<h2 id="heading-cancel-vs-done">Cancel vs. Done</h2>
<p>The cancel channel is similar to the done channel that we covered in previous chapters.</p>
<p><strong>Done channel:</strong></p>
<pre><code class="lang-go"><span class="hljs-comment">// Goroutine B receives a channel to signal</span>
<span class="hljs-comment">// when it has finished its work.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">b</span><span class="hljs-params">(done <span class="hljs-keyword">chan</span>&lt;- <span class="hljs-keyword">struct</span>{})</span></span> {
    <span class="hljs-comment">// do work...</span>
    done &lt;- <span class="hljs-keyword">struct</span>{}{}
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">a</span><span class="hljs-params">()</span></span> {
    done := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{})
    <span class="hljs-keyword">go</span> b(done)
    <span class="hljs-comment">// Goroutine A waits for B to finish its work.</span>
    &lt;-done
}
</code></pre>
<p><strong>Cancel channel:</strong></p>
<pre><code class="lang-go"><span class="hljs-comment">// Goroutine B receives a channel</span>
<span class="hljs-comment">// to get a cancel signal.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">b</span><span class="hljs-params">(cancel &lt;-<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{})</span></span> {
    <span class="hljs-comment">// do work...</span>
    <span class="hljs-keyword">select</span> {
    <span class="hljs-keyword">case</span> &lt;-cancel:
        <span class="hljs-keyword">return</span>
    }
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">a</span><span class="hljs-params">()</span></span> {
    cancel := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{})
    <span class="hljs-keyword">go</span> b(cancel)
    <span class="hljs-comment">// Goroutine A signals to B</span>
    <span class="hljs-comment">// that it is time to stop.</span>
    <span class="hljs-built_in">close</span>(cancel)
}
</code></pre>
<p>In practice, both cancel and done channels are often named "done", so don't be surprised. To avoid confusion, we'll use "cancel" for cancellation and "done" for completion.</p>
<hr />
<h2 id="heading-merging-channels">Merging Channels</h2>
<p>Sometimes several independent functions send results to their own channels. But it's more convenient to work with a single result channel. So you need to merge the output channels of these functions into a single channel.</p>
<h3 id="heading-sequential-merging">Sequential Merging</h3>
<p>The simplest approach is to merge channels sequentially:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">merge</span><span class="hljs-params">(ch1, ch2 &lt;-<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)</span> &lt;-<span class="hljs-title">chan</span> <span class="hljs-title">int</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(out)
        <span class="hljs-keyword">for</span> val := <span class="hljs-keyword">range</span> ch1 {
            out &lt;- val
        }
        <span class="hljs-keyword">for</span> val := <span class="hljs-keyword">range</span> ch2 {
            out &lt;- val
        }
    }()
    <span class="hljs-keyword">return</span> out
}
</code></pre>
<p>This works, but it drains <code>ch1</code> completely before reading a single value from <code>ch2</code>. If <code>ch2</code> produces values while <code>ch1</code> is still open, its sender stays blocked until <code>ch1</code> is closed.</p>
<h3 id="heading-concurrent-merging">Concurrent Merging</h3>
<p>A better approach is to merge channels concurrently:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">merge</span><span class="hljs-params">(ch1, ch2 &lt;-<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)</span> &lt;-<span class="hljs-title">chan</span> <span class="hljs-title">int</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(out)
        <span class="hljs-keyword">var</span> wg sync.WaitGroup
        wg.Add(<span class="hljs-number">2</span>)

        <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
            <span class="hljs-keyword">defer</span> wg.Done()
            <span class="hljs-keyword">for</span> val := <span class="hljs-keyword">range</span> ch1 {
                out &lt;- val
            }
        }()

        <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
            <span class="hljs-keyword">defer</span> wg.Done()
            <span class="hljs-keyword">for</span> val := <span class="hljs-keyword">range</span> ch2 {
                out &lt;- val
            }
        }()

        wg.Wait()
    }()
    <span class="hljs-keyword">return</span> out
}
</code></pre>
<p>This approach reads from both channels simultaneously, forwarding each value as soon as it's ready. The order of values in <code>out</code> is no longer deterministic, but no sender is left waiting behind the other channel.</p>
<h3 id="heading-merging-with-select">Merging Any Number of Channels</h3>
<p>The same approach generalizes to any number of channels with a variadic parameter:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">merge</span><span class="hljs-params">(channels ...&lt;-<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)</span> &lt;-<span class="hljs-title">chan</span> <span class="hljs-title">int</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(out)
        <span class="hljs-keyword">var</span> wg sync.WaitGroup
        wg.Add(<span class="hljs-built_in">len</span>(channels))

        <span class="hljs-keyword">for</span> _, ch := <span class="hljs-keyword">range</span> channels {
            <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(c &lt;-<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)</span></span> {
                <span class="hljs-keyword">defer</span> wg.Done()
                <span class="hljs-keyword">for</span> val := <span class="hljs-keyword">range</span> c {
                    out &lt;- val
                }
            }(ch)
        }

        wg.Wait()
    }()
    <span class="hljs-keyword">return</span> out
}
</code></pre>
<hr />
<h2 id="heading-pipelines">Pipelines</h2>
<p>A pipeline is a series of stages connected by channels, where each stage is a group of goroutines running the same function. In each stage, the goroutines:</p>
<ul>
<li>Receive values from upstream via inbound channels</li>
<li>Perform some function on that data, usually producing new values</li>
<li>Send values downstream via outbound channels</li>
</ul>
<p>Each stage has any number of inbound and outbound channels, except the first and last stages, which have only outbound or inbound channels, respectively. The first stage is sometimes called the source or producer; the last stage, the sink or consumer.</p>
<p>Here's a simple pipeline example:</p>
<p><strong>The "generate" stage:</strong></p>
<pre><code class="lang-go"><span class="hljs-comment">// generate sends numbers from 1 to stop inclusive.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">generate</span><span class="hljs-params">(stop <span class="hljs-keyword">int</span>)</span> &lt;-<span class="hljs-title">chan</span> <span class="hljs-title">int</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(out)
        <span class="hljs-keyword">for</span> i := <span class="hljs-number">1</span>; i &lt;= stop; i++ {
            out &lt;- i
        }
    }()
    <span class="hljs-keyword">return</span> out
}
</code></pre>
<p><strong>The "calculate" stage:</strong></p>
<pre><code class="lang-go"><span class="hljs-comment">// Answer represents the result of a calculation.</span>
<span class="hljs-keyword">type</span> Answer <span class="hljs-keyword">struct</span> {
    x, y <span class="hljs-keyword">int</span>
}

<span class="hljs-comment">// calculate produces answers for the given numbers.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">calculate</span><span class="hljs-params">(in &lt;-<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)</span> &lt;-<span class="hljs-title">chan</span> <span class="hljs-title">Answer</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> Answer)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(out)
        <span class="hljs-keyword">for</span> n := <span class="hljs-keyword">range</span> in {
            out &lt;- fetchAnswer(n)
        }
    }()
    <span class="hljs-keyword">return</span> out
}
</code></pre>
<p><strong>The "print" stage (in the main goroutine for simplicity):</strong></p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    inputs := generate(<span class="hljs-number">5</span>)
    answers := calculate(inputs)
    <span class="hljs-keyword">for</span> ans := <span class="hljs-keyword">range</span> answers {
        fmt.Printf(<span class="hljs-string">"%d -&gt; %d\n"</span>, ans.x, ans.y)
    }
}
</code></pre>
<pre><code><span class="hljs-number">1</span> -&gt; <span class="hljs-number">1</span>
<span class="hljs-number">2</span> -&gt; <span class="hljs-number">4</span>
<span class="hljs-number">3</span> -&gt; <span class="hljs-number">9</span>
<span class="hljs-number">4</span> -&gt; <span class="hljs-number">16</span>
<span class="hljs-number">5</span> -&gt; <span class="hljs-number">25</span>
</code></pre><hr />
<h2 id="heading-preventing-goroutine-leaks">Preventing Goroutine Leaks</h2>
<p>To prevent goroutine leaks in pipelines, we need to ensure that:</p>
<ol>
<li>All channels are properly closed when no longer needed</li>
<li>Goroutines can be cancelled when the pipeline is no longer needed</li>
<li>We use <code>defer close()</code> to ensure cleanup happens</li>
</ol>
<p>Here's an improved version with cancellation support:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">generate</span><span class="hljs-params">(cancel &lt;-<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{}, stop <span class="hljs-keyword">int</span>)</span> &lt;-<span class="hljs-title">chan</span> <span class="hljs-title">int</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(out)
        <span class="hljs-keyword">for</span> i := <span class="hljs-number">1</span>; i &lt;= stop; i++ {
            <span class="hljs-keyword">select</span> {
            <span class="hljs-keyword">case</span> out &lt;- i:
            <span class="hljs-keyword">case</span> &lt;-cancel:
                <span class="hljs-keyword">return</span>
            }
        }
    }()
    <span class="hljs-keyword">return</span> out
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">calculate</span><span class="hljs-params">(cancel &lt;-<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{}, in &lt;-<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)</span> &lt;-<span class="hljs-title">chan</span> <span class="hljs-title">Answer</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> Answer)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(out)
        <span class="hljs-keyword">for</span> {
            <span class="hljs-keyword">select</span> {
            <span class="hljs-keyword">case</span> n, ok := &lt;-in:
                <span class="hljs-keyword">if</span> !ok {
                    <span class="hljs-keyword">return</span>
                }
                <span class="hljs-keyword">select</span> {
                <span class="hljs-keyword">case</span> out &lt;- fetchAnswer(n):
                <span class="hljs-keyword">case</span> &lt;-cancel:
                    <span class="hljs-keyword">return</span>
                }
            <span class="hljs-keyword">case</span> &lt;-cancel:
                <span class="hljs-keyword">return</span>
            }
        }
    }()
    <span class="hljs-keyword">return</span> out
}
</code></pre>
<hr />
<h2 id="heading-error-handling">Error Handling</h2>
<p>The <code>fetchAnswer</code> function is responsible for retrieving an answer for a given number from a remote API:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">fetchAnswer</span><span class="hljs-params">(n <span class="hljs-keyword">int</span>)</span> <span class="hljs-title">Answer</span></span> {
    <span class="hljs-comment">// ...</span>
}
</code></pre>
<p>Hoping that a remote service will <em>always</em> work properly is a bit unrealistic. We have to account for errors:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">fetchAnswer</span><span class="hljs-params">(n <span class="hljs-keyword">int</span>)</span> <span class="hljs-params">(Answer, error)</span></span> {
    <span class="hljs-comment">// ...</span>
}
</code></pre>
<p>But what should we do with these errors (if any)? The <code>out</code> channel in <code>calculate</code> carries only <code>Answer</code> values, so there is no place for them there.</p>
<p>As it turns out, we have three options.</p>
<h3 id="heading-return-on-first-error">Return on First Error</h3>
<p>If we don't tolerate errors, the easiest thing to do is to return from <code>calculate</code> as soon as <code>fetchAnswer</code> encounters an error. Since the <code>out</code> channel only accepts <code>Answer</code>s, let's add a separate <code>errc</code> channel with a place for a single error:</p>
<pre><code class="lang-go"><span class="hljs-comment">// calculate produces answers for the given numbers.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">calculate</span><span class="hljs-params">(in &lt;-<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)</span> <span class="hljs-params">(&lt;-<span class="hljs-keyword">chan</span> Answer, &lt;-<span class="hljs-keyword">chan</span> error)</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> Answer)
    errc := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> error, <span class="hljs-number">1</span>)  <span class="hljs-comment">// (1)</span>
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(out)
        <span class="hljs-keyword">for</span> n := <span class="hljs-keyword">range</span> in {
            ans, err := fetchAnswer(n)
            <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
                errc &lt;- err      <span class="hljs-comment">// (2)</span>
                <span class="hljs-keyword">return</span>
            }
            out &lt;- ans
        }
        errc &lt;- <span class="hljs-literal">nil</span>              <span class="hljs-comment">// (3)</span>
    }()
    <span class="hljs-keyword">return</span> out, errc
}
</code></pre>
<p>The error channel is buffered with a capacity of one ➊, so the goroutine can deposit its single value and exit without waiting for a receiver. By the time <code>calculate</code>'s goroutine finishes, the channel will contain either an actual error ➋ or <code>nil</code> ➌, depending on the results of the remote calls in <code>fetchAnswer</code>.</p>
<p>Since <code>errc</code> is guaranteed to contain a value (an error or nil), we can read from it in the next pipeline step without using select:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    inputs := generate(<span class="hljs-number">5</span>)
    answers, errs := calculate(inputs)

    <span class="hljs-keyword">for</span> ans := <span class="hljs-keyword">range</span> answers {
        fmt.Printf(<span class="hljs-string">"%d -&gt; %d\n"</span>, ans.x, ans.y)
    }
    <span class="hljs-keyword">if</span> err := &lt;-errs; err != <span class="hljs-literal">nil</span> {
        fmt.Println(<span class="hljs-string">"error:"</span>, err)
    }
}
</code></pre>
<pre><code><span class="hljs-number">1</span> -&gt; <span class="hljs-number">1</span>
<span class="hljs-attr">error</span>: bad number
</code></pre><p>But what if we don't want to stop the whole pipeline because of a single error? Enter the next option.</p>
<h3 id="heading-composite-result-type">Composite Result Type</h3>
<p>Let's get rid of the error channel and return an error along with the answer. To do this, we will introduce a separate <em>result</em> type:</p>
<pre><code class="lang-go"><span class="hljs-comment">// Result contains an answer or an error.</span>
<span class="hljs-keyword">type</span> Result <span class="hljs-keyword">struct</span> {
    answer Answer
    err    error
}
</code></pre>
<p>Now <code>calculate</code> can send result values to the output channel:</p>
<pre><code class="lang-go"><span class="hljs-comment">// calculate produces answers for the given numbers.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">calculate</span><span class="hljs-params">(in &lt;-<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)</span> &lt;-<span class="hljs-title">chan</span> <span class="hljs-title">Result</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> Result)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(out)
        <span class="hljs-keyword">for</span> n := <span class="hljs-keyword">range</span> in {
            ans, err := fetchAnswer(n)
            out &lt;- Result{ans, err}
        }
    }()
    <span class="hljs-keyword">return</span> out
}
</code></pre>
<p>And the next pipeline step can process these results however it wants:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    inputs := generate(<span class="hljs-number">5</span>)
    results := calculate(inputs)
    <span class="hljs-keyword">for</span> res := <span class="hljs-keyword">range</span> results {
        <span class="hljs-keyword">if</span> res.err == <span class="hljs-literal">nil</span> {
            fmt.Printf(<span class="hljs-string">"%d -&gt; %d\n"</span>, res.answer.x, res.answer.y)
        } <span class="hljs-keyword">else</span> {
            fmt.Printf(<span class="hljs-string">"%d -&gt; error: %s\n"</span>, res.answer.x, res.err)
        }
    }
}
</code></pre>
<pre><code><span class="hljs-number">1</span> -&gt; <span class="hljs-number">1</span>
<span class="hljs-number">2</span> -&gt; error: bad number
<span class="hljs-number">3</span> -&gt; <span class="hljs-number">9</span>
<span class="hljs-number">4</span> -&gt; error: bad number
<span class="hljs-number">5</span> -&gt; <span class="hljs-number">25</span>
</code></pre><p>We don't have to introduce a separate result type for each possible pipeline step in our program. A single generic <code>Result</code> will suffice:</p>
<pre><code class="lang-go"><span class="hljs-comment">// Result contains a value or an error.</span>
<span class="hljs-keyword">type</span> Result[T any] <span class="hljs-keyword">struct</span> {
    val T
    err error
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(r Result[T])</span> <span class="hljs-title">OK</span><span class="hljs-params">()</span> <span class="hljs-title">bool</span></span> {
    <span class="hljs-keyword">return</span> r.err == <span class="hljs-literal">nil</span>
}
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(r Result[T])</span> <span class="hljs-title">Val</span><span class="hljs-params">()</span> <span class="hljs-title">T</span></span> {
    <span class="hljs-keyword">return</span> r.val
}
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(r Result[T])</span> <span class="hljs-title">Err</span><span class="hljs-params">()</span> <span class="hljs-title">error</span></span> {
    <span class="hljs-keyword">return</span> r.err
}
</code></pre>
<pre><code class="lang-go"><span class="hljs-comment">// calculate produces answers for the given numbers.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">calculate</span><span class="hljs-params">(in &lt;-<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)</span> &lt;-<span class="hljs-title">chan</span> <span class="hljs-title">Result</span>[<span class="hljs-title">Answer</span>]</span> {
    <span class="hljs-comment">// ...</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    inputs := generate(<span class="hljs-number">5</span>)
    results := calculate(inputs)
    <span class="hljs-keyword">for</span> res := <span class="hljs-keyword">range</span> results {
        <span class="hljs-keyword">if</span> res.OK() {
            fmt.Printf(<span class="hljs-string">"✓ %v\n"</span>, res.Val())
        } <span class="hljs-keyword">else</span> {
            fmt.Printf(<span class="hljs-string">"✗ %v\n"</span>, res.Err())
        }
    }
}
</code></pre>
<h3 id="heading-collect-errors-separately">Collect Errors Separately</h3>
<p>Let's say we don't want to bother handling errors from individual pipeline stages. What we want is a single error collector for the entire pipeline. For simplicity, it'll just log all errors:</p>
<pre><code class="lang-go"><span class="hljs-comment">// collectErrors prints all incoming errors.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">collectErrors</span><span class="hljs-params">(in &lt;-<span class="hljs-keyword">chan</span> error)</span> &lt;-<span class="hljs-title">chan</span> <span class="hljs-title">struct</span></span>{} {
    done := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{})
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(done)
        <span class="hljs-keyword">for</span> err := <span class="hljs-keyword">range</span> in {
            fmt.Printf(<span class="hljs-string">"error: %s\n"</span>, err)
        }
    }()
    <span class="hljs-keyword">return</span> done
}
</code></pre>
<p>Since there will be a single error channel for all pipeline stages, we'll create it in <code>main</code> and pass it to each of the pipeline steps:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    errc := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> error)
    done := collectErrors(errc)

    inputs := generate(<span class="hljs-number">5</span>, errc)
    answers := calculate(inputs, errc)

    <span class="hljs-keyword">for</span> ans := <span class="hljs-keyword">range</span> answers {
        fmt.Printf(<span class="hljs-string">"%d -&gt; %d\n"</span>, ans.x, ans.y)
    }

    <span class="hljs-built_in">close</span>(errc)
    &lt;-done
}
</code></pre>
<p>At each stage of the pipeline, we'll send any errors we encounter to the error channel:</p>
<pre><code class="lang-go"><span class="hljs-comment">// calculate produces answers for the given numbers.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">calculate</span><span class="hljs-params">(in &lt;-<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>, errc <span class="hljs-keyword">chan</span>&lt;- error)</span> &lt;-<span class="hljs-title">chan</span> <span class="hljs-title">Answer</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> Answer)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(out)
        <span class="hljs-keyword">for</span> n := <span class="hljs-keyword">range</span> in {
            ans, err := fetchAnswer(n)
            <span class="hljs-keyword">if</span> err == <span class="hljs-literal">nil</span> {
                out &lt;- ans
            } <span class="hljs-keyword">else</span> {
                errc &lt;- err
            }
        }
    }()
    <span class="hljs-keyword">return</span> out
}
</code></pre>
<pre><code><span class="hljs-number">1</span> -&gt; <span class="hljs-number">1</span>
<span class="hljs-number">3</span> -&gt; <span class="hljs-number">9</span>
<span class="hljs-attr">error</span>: bad number
<span class="hljs-attr">error</span>: bad number
<span class="hljs-number">5</span> -&gt; <span class="hljs-number">25</span>
</code></pre><p>Works like a charm. But there is one caveat: since errors are no longer tied to answers, we do not know which numbers caused the remote API to fail. Of course, we can add the necessary information to the error text, or even create a richer error type, so it's probably not a big deal.</p>
<p>Also, the error collector should be reasonably fast, so that it does not slow down (or even block) the normal pipeline flow in case of occasional errors. We can add a buffer to the error channel and use <code>select</code>, just to be sure:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// A buffered channel to queue up to 100 errors.</span>
    errc := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> error, <span class="hljs-number">100</span>)
    <span class="hljs-comment">// ...</span>
}

<span class="hljs-comment">// calculate produces answers for the given numbers.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">calculate</span><span class="hljs-params">(in &lt;-<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>, errc <span class="hljs-keyword">chan</span>&lt;- error)</span> &lt;-<span class="hljs-title">chan</span> <span class="hljs-title">Answer</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> Answer)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(out)
        <span class="hljs-keyword">for</span> n := <span class="hljs-keyword">range</span> in {
            ans, err := fetchAnswer(n)
            <span class="hljs-keyword">if</span> err == <span class="hljs-literal">nil</span> {
                out &lt;- ans
            } <span class="hljs-keyword">else</span> {
                <span class="hljs-keyword">select</span> {
                <span class="hljs-keyword">case</span> errc &lt;- err:
                <span class="hljs-keyword">default</span>:
                    <span class="hljs-comment">// If errc is full, drop the error.</span>
                }
            }
        }
    }()
    <span class="hljs-keyword">return</span> out
}
</code></pre>
<p>This approach is quite rare. Using a result type is more common in practice.</p>
<hr />
<h2 id="heading-summary">Summary</h2>
<p>Pipelines are one of the most common uses of concurrency in real-world programs. Now you know how to:</p>
<ul>
<li>Combine pipelines of independent blocks.</li>
<li>Split and merge data streams.</li>
<li>Cancel pipeline stages.</li>
<li>Prevent goroutine leaks.</li>
<li>Handle pipeline step errors.</li>
</ul>
<p>Pipelines provide a powerful way to process data concurrently while maintaining clear structure and error handling capabilities.</p>
]]></content:encoded></item><item><title><![CDATA[Go: Preventing Data Races and Ensuring Thread-Safe Access]]></title><description><![CDATA[What happens if multiple goroutines modify the same data structure? Sadly, nothing good. Let's learn more about it.

Concurrent Modification
So far, our goroutines haven't gotten in each other's way. They've used channels to exchange data, which is s...]]></description><link>https://blog.fshtab.com/go-data-races-safe-concurrent-access</link><guid isPermaLink="true">https://blog.fshtab.com/go-data-races-safe-concurrent-access</guid><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Thu, 25 Apr 2024 10:45:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766406537254/39702e36-1ad7-48a1-a1f5-767d5118247a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>What happens if multiple goroutines modify the same data structure? Sadly, nothing good. Let's learn more about it.</p>
<hr />
<h2 id="heading-concurrent-modification">Concurrent Modification</h2>
<p>So far, our goroutines haven't gotten in each other's way. They've used channels to exchange data, which is safe. But what happens if several goroutines try to access the same object at the same time? Let's find out.</p>
<p>Let's write a program that counts word frequencies:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// generate creates 100 words, each 3 letters long,</span>
    <span class="hljs-comment">// and sends them to the channel.</span>
    in := generate(<span class="hljs-number">100</span>, <span class="hljs-number">3</span>)

    <span class="hljs-keyword">var</span> wg sync.WaitGroup
    wg.Add(<span class="hljs-number">2</span>)

    <span class="hljs-comment">// count reads words from the input channel</span>
    <span class="hljs-comment">// and counts how often each one appears.</span>
    count := <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(counter <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>)</span></span> {
        <span class="hljs-keyword">defer</span> wg.Done()
        <span class="hljs-keyword">for</span> word := <span class="hljs-keyword">range</span> in {
            counter[word]++
        }
    }

    counter := <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>{}
    <span class="hljs-keyword">go</span> count(counter)
    <span class="hljs-keyword">go</span> count(counter)
    wg.Wait()

    fmt.Println(counter)
}
</code></pre>
<p>Here's the <code>generate</code> function:</p>
<pre><code class="lang-go"><span class="hljs-comment">// generate creates nWords words, each wordLen letters long,</span>
<span class="hljs-comment">// and sends them to the channel.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">generate</span><span class="hljs-params">(nWords, wordLen <span class="hljs-keyword">int</span>)</span> &lt;-<span class="hljs-title">chan</span> <span class="hljs-title">string</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">string</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(out)
        <span class="hljs-keyword">for</span> ; nWords &gt; <span class="hljs-number">0</span>; nWords-- {
            out &lt;- randomWord(wordLen)
        }
    }()
    <span class="hljs-keyword">return</span> out
}

<span class="hljs-comment">// randomWord returns a random word with n letters.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">randomWord</span><span class="hljs-params">(n <span class="hljs-keyword">int</span>)</span> <span class="hljs-title">string</span></span> {
    <span class="hljs-keyword">const</span> vowels = <span class="hljs-string">"eaiou"</span>
    <span class="hljs-keyword">const</span> consonants = <span class="hljs-string">"rtnslcdpm"</span>
    chars := <span class="hljs-built_in">make</span>([]<span class="hljs-keyword">byte</span>, n)
    <span class="hljs-keyword">for</span> i := <span class="hljs-number">0</span>; i &lt; n; i += <span class="hljs-number">2</span> {
        chars[i] = consonants[rand.IntN(<span class="hljs-built_in">len</span>(consonants))]
    }
    <span class="hljs-keyword">for</span> i := <span class="hljs-number">1</span>; i &lt; n; i += <span class="hljs-number">2</span> {
        chars[i] = vowels[rand.IntN(<span class="hljs-built_in">len</span>(vowels))]
    }
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">string</span>(chars)
}
</code></pre>
<p><code>generate()</code> generates words and sends them to the <code>in</code> channel. <code>main()</code> creates an empty map called <code>counter</code> and passes it to two <code>count()</code> goroutines. <code>count()</code> reads from the <code>in</code> channel and fills the map with word counts. In the end, <code>counter</code> should contain the frequency of each word.</p>
<p>Let's run it:</p>
<pre><code>map[cec:<span class="hljs-number">1</span> ... nol:<span class="hljs-number">2</span> not:<span class="hljs-number">3</span> ... tut:<span class="hljs-number">1</span>]
</code></pre><p>And once again, just in case:</p>
<pre><code>fatal error: concurrent map writes

goroutine <span class="hljs-number">1</span> [sync.WaitGroup.Wait]:
sync.runtime_SemacquireWaitGroup(<span class="hljs-number">0x140000021c0</span>?)

goroutine <span class="hljs-number">34</span> [chan send]:
main.generate.func1()

goroutine <span class="hljs-number">35</span> [running]:
internal/runtime/maps.fatal({<span class="hljs-number">0x104b4039e</span>?, <span class="hljs-number">0x14000038a08</span>?})

goroutine <span class="hljs-number">36</span> [runnable]:
internal/runtime/maps.newTable(<span class="hljs-number">0x104b78340</span>, <span class="hljs-number">0x80</span>, <span class="hljs-number">0x0</span>, <span class="hljs-number">0x0</span>)
</code></pre><p>Panic!</p>
<p>Go doesn't let multiple goroutines write to a map at the same time. At first, this might seem odd. Here's the only operation that the <code>count()</code> goroutine does with the map:</p>
<pre><code class="lang-go">counter[word]++
</code></pre>
<p>Looks like an atomic action. Why not perform it from multiple goroutines?</p>
<p>The problem is that the action only seems atomic. The operation "increase the key value in the map" actually involves several smaller steps. If one goroutine does some of these steps and another goroutine does the rest, the map can get messed up. That's what the runtime is warning us about.</p>
<hr />
<h2 id="heading-data-race">Data Race</h2>
<p>When multiple goroutines access the same variable at the same time, and at least one of them changes it, it's called a <em>data race</em>. Concurrent map modification in the previous section is an example of a data race.</p>
<p>A data race doesn't always cause a runtime panic (the map example in the previous section is a nice exception: Go's map implementation has built-in runtime checks that can catch some data races). That's why Go provides a special tool called the <em>race detector</em>. You can turn it on with the <code>-race</code> flag, which works with the <code>test</code>, <code>run</code>, <code>build</code>, and <code>install</code> commands.</p>
<blockquote>
<p>To use Go's race detector, you'll need to install gcc, the C compiler.</p>
</blockquote>
<p>For example, take this program:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> total <span class="hljs-keyword">int</span>
    <span class="hljs-keyword">var</span> wg sync.WaitGroup

    wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        total++
    })
    wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        total++
    })

    wg.Wait()
    fmt.Println(total)
}
</code></pre>
<pre><code><span class="hljs-number">2</span>
</code></pre><p>At first glance, it seems to work correctly. But actually, it has a data race:</p>
<pre><code class="lang-text">go run -race race.go
</code></pre>
<pre><code>==================
WARNING: DATA RACE
Read at <span class="hljs-number">0x00c000112038</span> by goroutine <span class="hljs-number">6</span>:
  main.main.func1()
      race.go:<span class="hljs-number">16</span> +<span class="hljs-number">0x74</span>

Previous write at <span class="hljs-number">0x00c000112038</span> by goroutine <span class="hljs-number">7</span>:
  main.main.func2()
      race.go:<span class="hljs-number">21</span> +<span class="hljs-number">0x84</span>

Goroutine <span class="hljs-number">6</span> (running) created at:
  main.main()
      race.go:<span class="hljs-number">14</span> +<span class="hljs-number">0x104</span>

Goroutine <span class="hljs-number">7</span> (finished) created at:
  main.main()
      race.go:<span class="hljs-number">19</span> +<span class="hljs-number">0x1a4</span>
==================
<span class="hljs-number">2</span>
Found <span class="hljs-number">1</span> data race(s)
</code></pre><blockquote>
<p>If you're wondering why a data race is a problem for a simple operation like <code>total++</code> — we'll cover it later in the chapter on atomic operations.</p>
</blockquote>
<p>Channels, on the other hand, are safe for concurrent reading and writing, and they don't cause data races:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    ch := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>, <span class="hljs-number">2</span>)
    <span class="hljs-keyword">var</span> wg sync.WaitGroup

    wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        ch &lt;- <span class="hljs-number">1</span>
    })
    wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        ch &lt;- <span class="hljs-number">1</span>
    })

    wg.Wait()
    fmt.Println(&lt;-ch + &lt;-ch)
}
</code></pre>
<pre><code><span class="hljs-number">2</span>
</code></pre><p>Data races are dangerous because they're hard to spot. Your program might work fine a hundred times, but on the hundred and first try, it could give the wrong result. Always check your code with a race detector.</p>
<hr />
<h2 id="heading-sequential-modification">Sequential Modification</h2>
<p>You can often rewrite a program to avoid concurrent modifications. Here is a possible approach for our word frequency program:</p>
<ul>
<li>Each <code>count()</code> goroutine counts frequencies in its own map.</li>
<li>A separate <code>merge()</code> function goes through the frequency maps and builds the final map.</li>
</ul>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// generate creates 100 words, each 3 letters long,</span>
    <span class="hljs-comment">// and sends them to the channel.</span>
    in := generate(<span class="hljs-number">100</span>, <span class="hljs-number">3</span>)

    <span class="hljs-keyword">var</span> wg sync.WaitGroup
    wg.Add(<span class="hljs-number">2</span>)

    <span class="hljs-comment">// count reads words from the input channel</span>
    <span class="hljs-comment">// and counts how often each one appears.</span>
    count := <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(counters []<span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>, idx <span class="hljs-keyword">int</span>)</span></span> {
        <span class="hljs-keyword">defer</span> wg.Done()
        counter := <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>{}
        <span class="hljs-keyword">for</span> word := <span class="hljs-keyword">range</span> in {
            counter[word]++
        }
        counters[idx] = counter
    }

    counters := <span class="hljs-built_in">make</span>([]<span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>, <span class="hljs-number">2</span>)
    <span class="hljs-keyword">go</span> count(counters, <span class="hljs-number">0</span>)
    <span class="hljs-keyword">go</span> count(counters, <span class="hljs-number">1</span>)
    wg.Wait()

    <span class="hljs-comment">// merge combines frequency maps.</span>
    counter := merge(counters...)
    fmt.Println(counter)
}

<span class="hljs-comment">// merge combines frequency maps into a single map.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">merge</span><span class="hljs-params">(counters ...<span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>)</span> <span class="hljs-title">map</span>[<span class="hljs-title">string</span>]<span class="hljs-title">int</span></span> {
    merged := <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>{}
    <span class="hljs-keyword">for</span> _, counter := <span class="hljs-keyword">range</span> counters {
        <span class="hljs-keyword">for</span> word, freq := <span class="hljs-keyword">range</span> counter {
            merged[word] += freq
        }
    }
    <span class="hljs-keyword">return</span> merged
}
</code></pre>
<pre><code>map[cec:<span class="hljs-number">1</span> ... nol:<span class="hljs-number">2</span> not:<span class="hljs-number">3</span> ... tut:<span class="hljs-number">1</span>]
</code></pre><p>Now each <code>count()</code> goroutine works with its own map, so there are no concurrent modifications. After all goroutines finish, <code>merge()</code> combines the results into a single map. This approach avoids data races, but it requires more memory and an extra merge step.</p>
<hr />
<h2 id="heading-mutex">Mutex</h2>
<p>Sometimes you can't avoid concurrent modifications. In such cases, you need to synchronize access to shared data. A <em>mutex</em> (short for "mutual exclusion") is a synchronization primitive that ensures only one goroutine can access a piece of code at a time.</p>
<p>Let's fix our word frequency program using a mutex:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// generate creates 100 words, each 3 letters long,</span>
    <span class="hljs-comment">// and sends them to the channel.</span>
    in := generate(<span class="hljs-number">100</span>, <span class="hljs-number">3</span>)

    <span class="hljs-keyword">var</span> wg sync.WaitGroup
    wg.Add(<span class="hljs-number">2</span>)

    <span class="hljs-comment">// count reads words from the input channel</span>
    <span class="hljs-comment">// and counts how often each one appears.</span>
    count := <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(lock *sync.Mutex, counter <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>)</span></span> {
        <span class="hljs-keyword">defer</span> wg.Done()
        <span class="hljs-keyword">for</span> word := <span class="hljs-keyword">range</span> in {
            lock.Lock()       <span class="hljs-comment">// (2)</span>
            counter[word]++
            lock.Unlock()     <span class="hljs-comment">// (3)</span>
        }
    }

    <span class="hljs-keyword">var</span> lock sync.Mutex       <span class="hljs-comment">// (1)</span>
    counter := <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>{}
    <span class="hljs-keyword">go</span> count(&amp;lock, counter)
    <span class="hljs-keyword">go</span> count(&amp;lock, counter)
    wg.Wait()

    fmt.Println(counter)
}
</code></pre>
<pre><code>map[cec:<span class="hljs-number">1</span> ... nol:<span class="hljs-number">2</span> not:<span class="hljs-number">3</span> ... tut:<span class="hljs-number">1</span>]
</code></pre><p>We created the <code>lock</code> mutex ➊ and used it to protect access to the shared <code>counter</code> map ➋ ➌. This way, the <code>count()</code> goroutines don't cause data races, and the final <code>counter[word]</code> value is correct.</p>
<p>Here's how a mutex works:</p>
<ul>
<li><code>Lock()</code> acquires the mutex. If another goroutine already has it, <code>Lock()</code> blocks until the mutex becomes available.</li>
<li><code>Unlock()</code> releases the mutex, allowing other goroutines to acquire it.</li>
</ul>
<p>The code between <code>Lock()</code> and <code>Unlock()</code> is called a <em>critical section</em>. Only one goroutine can execute the critical section at a time.</p>
<p><strong>Important notes about mutexes:</strong></p>
<ul>
<li>Always unlock a mutex after locking it. Use <code>defer</code> to ensure unlocking happens even if the code panics.</li>
<li>Don't lock a mutex twice in the same goroutine without unlocking it first — this will cause a deadlock. Go's mutexes are not reentrant. This makes things harder for people who like to use mutexes in recursive functions (which isn't a great idea anyway).</li>
<li>Like a wait group, a mutex has internal state, so you should only pass it as a pointer.</li>
</ul>
<hr />
<h2 id="heading-read-write-mutex">Read-Write Mutex</h2>
<p>A regular mutex doesn't distinguish between read and write access: if one goroutine locks the mutex, others can't access the protected code. This isn't always necessary.</p>
<p>Here's the situation:</p>
<ul>
<li>One <code>writer</code> goroutine writes data.</li>
<li>Four <code>reader</code> goroutines read that same data.</li>
</ul>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> wg sync.WaitGroup
wg.Add(<span class="hljs-number">5</span>)

<span class="hljs-keyword">var</span> lock sync.Mutex

<span class="hljs-comment">// writer fills in the word frequency map.</span>
writer := <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(counter <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>, nWrites <span class="hljs-keyword">int</span>)</span></span> {
    <span class="hljs-keyword">defer</span> wg.Done()
    <span class="hljs-keyword">for</span> ; nWrites &gt; <span class="hljs-number">0</span>; nWrites-- {
        word := randomWord(<span class="hljs-number">3</span>)
        lock.Lock()
        counter[word]++
        time.Sleep(time.Millisecond)
        lock.Unlock()
    }
}

<span class="hljs-comment">// reader looks up random words in the frequency map.</span>
reader := <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(counter <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>, nReads <span class="hljs-keyword">int</span>)</span></span> {
    <span class="hljs-keyword">defer</span> wg.Done()
    <span class="hljs-keyword">for</span> ; nReads &gt; <span class="hljs-number">0</span>; nReads-- {
        word := randomWord(<span class="hljs-number">3</span>)
        lock.Lock()
        _ = counter[word]
        time.Sleep(time.Millisecond)
        lock.Unlock()
    }
}

start := time.Now()

counter := <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>{}
<span class="hljs-keyword">go</span> writer(counter, <span class="hljs-number">100</span>)
<span class="hljs-keyword">go</span> reader(counter, <span class="hljs-number">100</span>)
<span class="hljs-keyword">go</span> reader(counter, <span class="hljs-number">100</span>)
<span class="hljs-keyword">go</span> reader(counter, <span class="hljs-number">100</span>)
<span class="hljs-keyword">go</span> reader(counter, <span class="hljs-number">100</span>)
wg.Wait()

fmt.Println(<span class="hljs-string">"Took"</span>, time.Since(start))
</code></pre>
<pre><code>Took <span class="hljs-number">500</span>ms
</code></pre><p>Even though we started 4 reader goroutines, they run sequentially because of the mutex. This isn't really necessary. It makes sense for readers to wait while the writer is updating the map. But why can't the readers run in parallel? They're not changing any data.</p>
<p>The <code>sync</code> package includes a <code>sync.RWMutex</code> that separates readers and writers. It provides two sets of methods:</p>
<ul>
<li><code>Lock</code> / <code>Unlock</code> lock and unlock the mutex for both reading and writing.</li>
<li><code>RLock</code> / <code>RUnlock</code> lock and unlock the mutex for reading only.</li>
</ul>
<p>Here's how it works:</p>
<ul>
<li>If a goroutine locks the mutex with <code>Lock()</code>, other goroutines will be blocked if they try to use <code>Lock()</code> or <code>RLock()</code>.</li>
<li>If a goroutine locks the mutex with <code>RLock()</code>, other goroutines can also lock it with <code>RLock()</code> without being blocked.</li>
<li>If at least one goroutine has locked the mutex with <code>RLock()</code>, other goroutines will be blocked if they try to use <code>Lock()</code>.</li>
</ul>
<p>This creates a "single writer, multiple readers" setup. Let's verify it:</p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> wg sync.WaitGroup
wg.Add(<span class="hljs-number">5</span>)

<span class="hljs-keyword">var</span> lock sync.RWMutex          <span class="hljs-comment">// (1)</span>

<span class="hljs-comment">// writer fills in the word frequency map.</span>
writer := <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(counter <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>, nWrites <span class="hljs-keyword">int</span>)</span></span> {
    <span class="hljs-comment">// Not changed.</span>
    <span class="hljs-keyword">defer</span> wg.Done()
    <span class="hljs-keyword">for</span> ; nWrites &gt; <span class="hljs-number">0</span>; nWrites-- {
        word := randomWord(<span class="hljs-number">3</span>)
        lock.Lock()
        counter[word]++
        time.Sleep(time.Millisecond)
        lock.Unlock()
    }
}

<span class="hljs-comment">// reader looks up random words in the frequency map.</span>
reader := <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(counter <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>, nReads <span class="hljs-keyword">int</span>)</span></span> {
    <span class="hljs-keyword">defer</span> wg.Done()
    <span class="hljs-keyword">for</span> ; nReads &gt; <span class="hljs-number">0</span>; nReads-- {
        word := randomWord(<span class="hljs-number">3</span>)
        lock.RLock()           <span class="hljs-comment">// (2)</span>
        _ = counter[word]
        time.Sleep(time.Millisecond)
        lock.RUnlock()         <span class="hljs-comment">// (3)</span>
    }
}

start := time.Now()

counter := <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>{}
<span class="hljs-keyword">go</span> writer(counter, <span class="hljs-number">100</span>)
<span class="hljs-keyword">go</span> reader(counter, <span class="hljs-number">100</span>)
<span class="hljs-keyword">go</span> reader(counter, <span class="hljs-number">100</span>)
<span class="hljs-keyword">go</span> reader(counter, <span class="hljs-number">100</span>)
<span class="hljs-keyword">go</span> reader(counter, <span class="hljs-number">100</span>)
wg.Wait()

fmt.Println(<span class="hljs-string">"Took"</span>, time.Since(start))
</code></pre>
<pre><code>Took <span class="hljs-number">200</span>ms
</code></pre><p>The mutex type ➊ has changed, so have the locking ➋ and unlocking ➌ methods in the reader. Now, readers run concurrently, but they always wait while the writer updates the map. That's exactly what we need!</p>
<hr />
<h2 id="heading-channel-as-mutex">Channel as Mutex</h2>
<p>Let's go back to the program that counts word frequencies:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// generate creates 100 words, each 3 letters long,</span>
    <span class="hljs-comment">// and sends them to the channel.</span>
    in := generate(<span class="hljs-number">100</span>, <span class="hljs-number">3</span>)

    <span class="hljs-keyword">var</span> wg sync.WaitGroup
    wg.Add(<span class="hljs-number">2</span>)

    <span class="hljs-comment">// count reads words from the input channel</span>
    <span class="hljs-comment">// and counts how often each one appears.</span>
    count := <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(lock *sync.Mutex, counter <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>)</span></span> {
        <span class="hljs-keyword">defer</span> wg.Done()
        <span class="hljs-keyword">for</span> word := <span class="hljs-keyword">range</span> in {
            lock.Lock()       <span class="hljs-comment">// (2)</span>
            counter[word]++
            lock.Unlock()     <span class="hljs-comment">// (3)</span>
        }
    }

    <span class="hljs-keyword">var</span> lock sync.Mutex       <span class="hljs-comment">// (1)</span>
    counter := <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>{}
    <span class="hljs-keyword">go</span> count(&amp;lock, counter)
    <span class="hljs-keyword">go</span> count(&amp;lock, counter)
    wg.Wait()

    fmt.Println(counter)
}
</code></pre>
<p>We created the <code>lock</code> mutex ➊ and used it to protect access to the shared <code>counter</code> map ➋ ➌. This way, the <code>count()</code> goroutines don't cause data races, and the final <code>counter[word]</code> value is correct.</p>
<p>We can also use a channel instead of a mutex to protect shared data:</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> token <span class="hljs-keyword">struct</span>{}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// generate creates 100 words, each 3 letters long,</span>
    <span class="hljs-comment">// and sends them to the channel.</span>
    in := generate(<span class="hljs-number">100</span>, <span class="hljs-number">3</span>)

    <span class="hljs-keyword">var</span> wg sync.WaitGroup
    wg.Add(<span class="hljs-number">2</span>)

    <span class="hljs-comment">// count reads words from the input channel</span>
    <span class="hljs-comment">// and counts how often each one appears.</span>
    count := <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(lock <span class="hljs-keyword">chan</span> token, counter <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>)</span></span> {
        <span class="hljs-keyword">defer</span> wg.Done()
        <span class="hljs-keyword">for</span> word := <span class="hljs-keyword">range</span> in {
            lock &lt;- token{}     <span class="hljs-comment">// (2)</span>
            counter[word]++
            &lt;-lock              <span class="hljs-comment">// (3)</span>
        }
    }

    lock := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> token, <span class="hljs-number">1</span>) <span class="hljs-comment">// (1)</span>

    counter := <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>{}
    <span class="hljs-keyword">go</span> count(lock, counter)
    <span class="hljs-keyword">go</span> count(lock, counter)
    wg.Wait()

    fmt.Println(counter)
}
</code></pre>
<pre><code>map[cec:<span class="hljs-number">1</span> ... nol:<span class="hljs-number">2</span> not:<span class="hljs-number">3</span> ... tut:<span class="hljs-number">1</span>]
</code></pre><p>We created a <code>lock</code> channel with a one-element buffer ➊ and used it to protect access to the shared <code>counter</code> map ➋ ➌.</p>
<p>Two <code>count()</code> goroutines run concurrently. However, in each loop iteration, only one of them can put a token into the <code>lock</code> channel (like locking a mutex), update the counter, and take the token back out (like unlocking a mutex). So, even though the goroutines run in parallel, changes to the map happen sequentially.</p>
<p>As a result, the <code>count()</code> goroutines don't cause data races, and the final <code>counter[word]</code> value is correct — just like when we used a mutex.</p>
<p>Go's channels are a versatile concurrency tool. Often, you can use a channel instead of lower-level synchronization primitives. Sometimes, using a channel is unnecessary, as in the example above. Other times, however, it makes your code simpler and helps prevent mistakes. You'll see this idea come up again throughout the book.</p>
<hr />
<h2 id="heading-summary">Summary</h2>
<p>Now you know how to safely change shared data from multiple goroutines using mutexes. Be careful not to overuse them — it's easy to make mistakes and cause data races or deadlocks.</p>
<p>Key points to remember:</p>
<ul>
<li><strong>Data races occur</strong> when multiple goroutines access the same variable concurrently, and at least one modifies it.</li>
<li><strong>Use the race detector</strong> (<code>go run -race</code>) to detect data races in your code.</li>
<li><strong>Avoid concurrent modifications</strong> when possible by using separate data structures per goroutine.</li>
<li><strong>Use mutexes</strong> to protect shared data when concurrent modifications are unavoidable.</li>
<li><strong>Consider RWMutex</strong> when you have multiple readers and fewer writers.</li>
<li><strong>Channels can serve as mutexes</strong> in some scenarios, though mutexes are usually more appropriate.</li>
</ul>
<p>Safe concurrent programming requires careful attention to data access patterns and proper synchronization.</p>
]]></content:encoded></item><item><title><![CDATA[Distributed Services vs. Unified Applications: Striking the Perfect Balance]]></title><description><![CDATA[Distributed Services vs. Unified Applications: Striking the Perfect Balance
Hot take: You don't have a microservice architecture, you have a distributed monolith with trust issues.
In the rush to "go micro," many teams end up slicing their systems in...]]></description><link>https://blog.fshtab.com/microservices-vs-monoliths-finding-the-right-balance</link><guid isPermaLink="true">https://blog.fshtab.com/microservices-vs-monoliths-finding-the-right-balance</guid><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Fri, 22 Mar 2024 13:45:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766406500313/a725ee7b-b1fe-497e-88ec-29607508f9de.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-distributed-services-vs-unified-applications-striking-the-perfect-balance">Distributed Services vs. Unified Applications: Striking the Perfect Balance</h1>
<p>Hot take: You don't have a microservice architecture, you have a distributed monolith with trust issues.</p>
<p>In the rush to "go micro," many teams end up slicing their systems into dozens of tiny, chatty services that spend more time talking to each other than doing any real work. Every API call adds latency. <strong>Every dependency adds failure points</strong>. Every "independent" deployment ends up blocked by another team's version bump.</p>
<p>Sound familiar? 🙂</p>
<p>The pain you're feeling isn't the cost of scale, it's the cost of premature, arbitrary decomposition.</p>
<hr />
<h2 id="heading-the-microservices-trap">The Microservices Trap</h2>
<p><strong>How we got here:</strong><br />It starts innocently enough. You read about Netflix's architecture. You attend some random conference, read some articles online. Someone mentions "Conway's Law" in a late Friday meeting. Suddenly, the mandate comes down: "We're going microservices."</p>
<p>Within six months, you have:</p>
<ul>
<li>A user service</li>
<li>An auth service</li>
<li>A notification service</li>
<li>An email service (because notifications and emails are totally different domains)</li>
<li>A logging service</li>
<li>A metrics service</li>
<li>A service that just... creates UUIDs?</li>
</ul>
<p>Each one has its own:</p>
<ul>
<li>Repository</li>
<li>CI/CD pipeline</li>
<li>Database</li>
<li>Deployment schedule</li>
<li>API versioning scheme</li>
<li>Team ownership</li>
</ul>
<p><strong>The reality check:</strong><br />To fetch a user's profile, you now make 7 API calls across 4 services. Your p99 latency is 800ms. Your error budget is constantly exceeded because something is always down. Your observability costs more than your compute.<br /><strong>You've achieved distributed monolith status.</strong></p>
<hr />
<h2 id="heading-the-hidden-costs-nobody-talks-about">The Hidden Costs Nobody Talks About</h2>
<p><strong>1. Network is not free</strong></p>
<p>Monolith: function call = 0.001ms<br />Microservice: HTTP call = 5-50ms (plus serialization, auth, retries...)</p>
<p>When your checkout flow hits 12 services, that's <strong>60-600ms of network overhead</strong> before you've done any real work.</p>
<p>And that's assuming everything works. Add retries, circuit breakers, and cascading failures, and you're looking at seconds, not milliseconds.</p>
<p><strong>2. Distributed debugging is a nightmare</strong></p>
<p>Bug report: "User can't complete checkout."</p>
<p>In a monolith:</p>
<ul>
<li>Check the logs</li>
<li>Set a breakpoint</li>
<li>Find the issue</li>
<li>Fix it</li>
<li>Deploy</li>
</ul>
<p>In microservices:</p>
<ul>
<li>Which service failed?</li>
<li>Check distributed traces (if they exist)</li>
<li>Correlate logs across 6 services</li>
<li>Find the issue is a timeout in service D caused by a memory leak in service B triggered by bad data from service A</li>
<li>Coordinate deployments across 3 teams</li>
<li>Hope you didn't introduce new bugs</li>
</ul>
<p><strong>3. "Independent" deployments aren't independent</strong></p>
<p>Your user-service runs on SQLAlchemy 1.4. The payments team just upgraded their shared models package to SQLAlchemy 2.0 for "better async support." Now your queries throw deprecation warnings everywhere and half your tests fail.</p>
<hr />
<h2 id="heading-when-microservices-actually-make-sense">When Microservices Actually Make Sense</h2>
<p>Don't get me wrong, microservices <strong>can be the right choice</strong>. But they're an optimization for specific problems, not a default architecture pattern.</p>
<p><strong>When done right, microservices unlock real organizational power.</strong></p>
<p>They let large teams ship features <strong>independently</strong>, scale bottlenecks in <strong>isolation</strong>, and mix technologies to fit <strong>different workloads</strong>. You can deploy a single service without freezing the entire platform. You can experiment faster, fail safely, and iterate without merge conflicts across 50 engineers.</p>
<p>For truly global-scale systems, think payments, logistics, or media streaming, microservices let you scale the right parts independently. Instead of scaling the whole app just because one endpoint gets hammered, you scale that service and <strong>keep costs predictable</strong>.</p>
<p>They also make it easier to enforce <strong>clear domain ownership</strong>. Each team owns their service, their schema, and their roadmap, which reduces cross-team dependency chaos when you're big enough to need it.</p>
<p><strong>Good reasons to split services:</strong></p>
<p><strong>1. Genuine scale differences</strong>  </p>
<pre><code>Example: Your image processing pipeline handles <span class="hljs-number">10</span>K requests/sec
         Your admin panel handles <span class="hljs-number">10</span> requests/sec
</code></pre><p>These shouldn't share resources. Split them.</p>
<p><strong>2. Team autonomy at real scale</strong><br />If you have 50+ engineers stepping on each other's toes in the same codebase, and you've already tried modularization, <em>then</em> consider splitting.</p>
<p><strong>3. Technology constraints</strong><br />You need Python's ML libraries for recommendations but Go's performance for your API gateway. Fair enough.</p>
<p><strong>4. Actual domain boundaries</strong><br />Payments and product catalogs are genuinely different domains with different business rules, compliance requirements, and failure modes. They can evolve independently.</p>
<hr />
<h2 id="heading-the-monolith-advantage-that-nobody-admits">The Monolith Advantage (That Nobody Admits)</h2>
<p>A well-structured monolith gives you:</p>
<p><strong>Simplicity:</strong></p>
<ul>
<li>One codebase to understand</li>
<li>One deployment pipeline</li>
<li>One database transaction (ACID guarantees for free!)</li>
<li>One place to search for code</li>
<li>One set of dependencies to manage</li>
</ul>
<p><strong>Performance:</strong></p>
<ul>
<li>In-process function calls, not HTTP</li>
<li>No serialization overhead</li>
<li>No network failures</li>
<li>Shared caches actually work</li>
</ul>
<p><strong>Developer experience:</strong></p>
<ul>
<li>Run the entire app locally</li>
<li>Debugger actually works</li>
<li>Tests run fast</li>
<li>Refactoring is safe</li>
</ul>
<p><strong>"But monoliths don't scale!"</strong></p>
<p><strong>Wrong.</strong> Shopify runs on a Rails monolith and handles Black Friday traffic. GitHub's monolith serves millions of developers. Stack Overflow famously runs on a handful of servers.</p>
<p>You scale a monolith by:</p>
<ol>
<li>Vertical scaling (modern instances are HUGE)</li>
<li>Horizontal scaling (stateless apps scale fine)</li>
<li>Strategic caching</li>
<li>Database optimization</li>
</ol>
<hr />
<h2 id="heading-the-middle-path-what-you-should-actually-do">The Middle Path (What You Should Actually Do)</h2>
<p>Here's the nuance nobody talks about: <strong>You don't choose between monolith and microservices. You choose when to split.</strong></p>
<p><strong>Start with a modular monolith: This isn't just about folders, it's about Bounded Contexts.</strong>  </p>
<pre><code>app/
├── modules/
│   ├── users/
│   │   ├── domain/
│   │   ├── api/
│   │   └── repository/
│   ├── payments/
│   │   └── ...
│   └── inventory/
│       └── ...
</code></pre><p>Good modules have:</p>
<ul>
<li><strong>Clear interfaces</strong> (defined contracts between modules)</li>
<li><strong>Weak coupling</strong> (changes in one don't ripple to others)</li>
<li><strong>Strong cohesion</strong> (related logic lives together)</li>
</ul>
<p><strong>When to extract a service:</strong></p>
<p>You have data that justifies the split:</p>
<ul>
<li>This module is causing 80% of deploys</li>
<li>This team is blocked waiting for other teams</li>
<li>This component needs different scaling characteristics</li>
<li>This domain has genuinely independent lifecycle</li>
</ul>
<p><strong>The extraction looks like:</strong>  </p>
<pre><code>Monolith → Modular Monolith → <span class="hljs-number">3</span> well-defined services → Scale what needs it
</code></pre><p>Not:</p>
<p>Monolith → 47 microservices → ??? → Black Magic</p>
<hr />
<h2 id="heading-red-flags-youve-gone-too-micro">Red Flags You've Gone Too Micro 🚩</h2>
<p>You might have a problem if:</p>
<ul>
<li>Your services call each other in chains<br />If request flow looks like: A → B → C → D → B → E, you've just built a distributed ball of mud.</li>
<li>You can't add a feature without touching 5+ services<br />That's not independence, that's tight coupling with extra steps.</li>
<li>Your team spends more time on infrastructure than features<br />Kubernetes, service mesh, distributed tracing... these are costs, not features.</li>
<li>Simple changes require "cross-team coordination meetings"<br />You've replaced code dependencies with human dependencies. That's slower.</li>
<li>Your error messages look like:<br /><code>"Service timeout in payment-gateway calling order-validator calling inventory-checker calling warehouse api"</code></li>
</ul>
<hr />
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>The truth is: microservices aren't a magic scalability pill, they're an organizational tool.<br />If your team isn't struggling with coordination or monolith scaling yet, breaking things apart just creates complexity without benefit.<br />The real skill isn't in cutting your system into tiny pieces, it's knowing where to draw the lines. Strong service boundaries come from domain understanding, not arbitrary code size.</p>
<p>So before you spin up service number 47, ask yourself:</p>
<p>"Is this solving a scaling problem, or just creating a communication problem?"</p>
<p><strong>Sometimes the best architecture decision is the one you don't make.</strong></p>
<p>--<br />What's your take? Are you running a microservices architecture or a distributed monolith? Let me know in the comments, I'd love to hear your war stories.</p>
]]></content:encoded></item><item><title><![CDATA[Go: Goroutines and Concurrent Programming]]></title><description><![CDATA[This article is an introduction to concurrent programming in Go through practical examples. Let's get right to writing a concurrent program!

Goroutines
Suppose we have a function that says a phrase word by word ...]]></description><link>https://blog.fshtab.com/go-goroutines-concurrency</link><guid isPermaLink="true">https://blog.fshtab.com/go-goroutines-concurrency</guid><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Thu, 07 Mar 2024 09:15:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766407828358/b728076e-4a53-4ade-92fa-10f85daee185.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This article is an introduction to concurrent programming in Go through practical examples. Let's get right to writing a concurrent program!</p>
<hr />
<h2 id="heading-goroutines">Goroutines</h2>
<p>Suppose we have a function that says a phrase word by word, with pauses:</p>
<pre><code class="lang-go"><span class="hljs-comment">// say prints each word of the phrase.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">say</span><span class="hljs-params">(phrase <span class="hljs-keyword">string</span>)</span></span> {
    <span class="hljs-keyword">for</span> _, word := <span class="hljs-keyword">range</span> strings.Fields(phrase) {
        fmt.Printf(<span class="hljs-string">"Simon says: %s...\n"</span>, word)
        dur := time.Duration(rand.Intn(<span class="hljs-number">100</span>)) * time.Millisecond
        time.Sleep(dur)
    }
}
</code></pre>
<p>Let's call it from the main function:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    say(<span class="hljs-string">"go is awesome"</span>)
}
</code></pre>
<p>Now let's create two speakers, each with their own phrase:</p>
<pre><code class="lang-go"><span class="hljs-comment">// say prints each word of the phrase.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">say</span><span class="hljs-params">(id <span class="hljs-keyword">int</span>, phrase <span class="hljs-keyword">string</span>)</span></span> {
    <span class="hljs-keyword">for</span> _, word := <span class="hljs-keyword">range</span> strings.Fields(phrase) {
        fmt.Printf(<span class="hljs-string">"Worker #%d says: %s...\n"</span>, id, word)
        dur := time.Duration(rand.Intn(<span class="hljs-number">100</span>)) * time.Millisecond
        time.Sleep(dur)
    }
}
</code></pre>
<p>Let's run the program:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    say(<span class="hljs-number">1</span>, <span class="hljs-string">"go is awesome"</span>)
    say(<span class="hljs-number">2</span>, <span class="hljs-string">"cats are cute"</span>)
}
</code></pre>
<p>The functions run sequentially. To make them speak at the same time, let's add <code>go</code> before each call to <code>say()</code>:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">go</span> say(<span class="hljs-number">1</span>, <span class="hljs-string">"go is awesome"</span>)
    <span class="hljs-keyword">go</span> say(<span class="hljs-number">2</span>, <span class="hljs-string">"cats are cute"</span>)
    time.Sleep(<span class="hljs-number">500</span> * time.Millisecond)
}
</code></pre>
<p>Now they're really competing for our attention! When we write <code>go f()</code>, the function <code>f()</code> runs independently of everything else.</p>
<p>If you're familiar with concurrency in Python, JavaScript, or other languages with async/await, don't try to apply that experience to Go. Go takes a completely different approach to concurrency. Try to look at it with fresh eyes.</p>
<p>Functions launched with <code>go</code> are called <strong>goroutines</strong>. The Go runtime manages these goroutines and distributes them across operating system threads running on CPU cores. Compared to OS threads, goroutines are lightweight, so you can create hundreds or thousands of them.</p>
<p>You might be wondering why we need <code>time.Sleep()</code> in the main function. Let's clear that up.</p>
<hr />
<h2 id="heading-zavisimye-i-nezavisimye-goroutines">Dependent and Independent Goroutines</h2>
<p>Goroutines are completely independent. When we call <code>go say()</code>, the function runs on its own. <code>main</code> doesn't wait for it to finish. So if we write main like this:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">go</span> say(<span class="hljs-number">1</span>, <span class="hljs-string">"go is awesome"</span>)
    <span class="hljs-keyword">go</span> say(<span class="hljs-number">2</span>, <span class="hljs-string">"cats are cute"</span>)
}
</code></pre>
<p>— the program prints nothing. <code>main</code> finishes before our goroutines get a chance to say anything, and once main is done, the whole program exits.</p>
<p>The <code>main</code> function is also a goroutine, launched implicitly when the program starts. So we have three goroutines: <code>main</code>, <code>say(1)</code>, and <code>say(2)</code>, all of them independent. The only special thing about <code>main</code> is that when it finishes, everything else finishes too.</p>
<h3 id="heading-waitgroup">WaitGroup</h3>
<p>Using <code>time.Sleep()</code> to wait for goroutines is a bad idea, because we can't predict how long they will take. A better approach is to use a wait group:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> wg sync.WaitGroup <span class="hljs-comment">// (1)</span>

    wg.Add(<span class="hljs-number">1</span>)             <span class="hljs-comment">// (2)</span>
    <span class="hljs-keyword">go</span> say(&amp;wg, <span class="hljs-number">1</span>, <span class="hljs-string">"go is awesome"</span>)

    wg.Add(<span class="hljs-number">1</span>)             <span class="hljs-comment">// (2)</span>
    <span class="hljs-keyword">go</span> say(&amp;wg, <span class="hljs-number">2</span>, <span class="hljs-string">"cats are cute"</span>)

    wg.Wait()             <span class="hljs-comment">// (3)</span>
}

<span class="hljs-comment">// say prints each word of the phrase.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">say</span><span class="hljs-params">(wg *sync.WaitGroup, id <span class="hljs-keyword">int</span>, phrase <span class="hljs-keyword">string</span>)</span></span> {
    <span class="hljs-keyword">for</span> _, word := <span class="hljs-keyword">range</span> strings.Fields(phrase) {
        fmt.Printf(<span class="hljs-string">"Worker #%d says: %s...\n"</span>, id, word)
        dur := time.Duration(rand.Intn(<span class="hljs-number">100</span>)) * time.Millisecond
        time.Sleep(dur)
    }
    wg.Done()             <span class="hljs-comment">// (4)</span>
}
</code></pre>
<p><code>wg</code> ➊ has an internal counter. Calling <code>wg.Add(1)</code> ➋ increments it by one, and <code>wg.Done()</code> ➍ decrements it. <code>wg.Wait()</code> ➌ blocks the goroutine (in this case, <code>main</code>) until the counter reaches zero. So <code>main</code> waits for <code>say(1)</code> and <code>say(2)</code> to finish before exiting.</p>
<p>However, this approach mixes business logic (<code>say</code>) with concurrency logic (<code>wg</code>). As a result, we can't easily call <code>say</code> from regular, non-concurrent code.</p>
<p>In Go, it's customary to separate concurrency logic from business logic. This is usually done with separate functions. In simple cases like ours, even anonymous functions will do:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> wg sync.WaitGroup
    wg.Add(<span class="hljs-number">2</span>)

    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> wg.Done()
        say(<span class="hljs-number">1</span>, <span class="hljs-string">"go is awesome"</span>)
    }()

    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> wg.Done()
        say(<span class="hljs-number">2</span>, <span class="hljs-string">"cats are cute"</span>)
    }()

    wg.Wait()
}
</code></pre>
<p>Here's how it works:</p>
<ul>
<li><p>We know there will be two goroutines, so we call <code>wg.Add(2)</code> right away. Alternatively, we could call <code>wg.Add(1)</code> before launching each goroutine — the result is the same.</p>
</li>
<li><p>Anonymous functions are launched with <code>go</code> just like regular ones.</p>
</li>
<li><p><code>defer wg.Done()</code> guarantees that the goroutine decrements the counter before exiting, even if <code>say</code> panics.</p>
</li>
<li><p><code>say</code> itself knows nothing about concurrency and just does its job.</p>
</li>
</ul>
<h3 id="heading-waitgroupgo">WaitGroup.Go</h3>
<p>The <code>WaitGroup.Go</code> method (Go 1.25+) automatically increments the wait group counter, runs the function in a goroutine, and decrements the counter when it finishes. This means we can rewrite the example above without calling <code>wg.Add()</code> and <code>wg.Done()</code>:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> wg sync.WaitGroup

    wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        fmt.Println(<span class="hljs-string">"go is awesome"</span>)
    })

    wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        fmt.Println(<span class="hljs-string">"cats are cute"</span>)
    })

    wg.Wait()
    fmt.Println(<span class="hljs-string">"done"</span>)
}
</code></pre>
<p>The implementation uses <code>Add</code> and <code>Done</code> just like we did before:</p>
<pre><code class="lang-go"><span class="hljs-comment">// https://github.com/golang/go/blob/master/src/sync/waitgroup.go</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(wg *WaitGroup)</span> <span class="hljs-title">Go</span><span class="hljs-params">(f <span class="hljs-keyword">func</span>()</span>)</span> {
    wg.Add(<span class="hljs-number">1</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> wg.Done()
        f()
    }()
}
</code></pre>
<hr />
<h2 id="heading-channels">Channels</h2>
<p>Launching lots of goroutines is great, but how do they exchange data? In Go, goroutines can pass values to each other through <strong>channels</strong>. A channel is like a window through which one goroutine can toss something and another can catch it.</p>
<pre><code class="lang-plaintext">┌─────────────┐    ┌─────────────┐
│ goroutine A │    │ goroutine B │
│             └────┘             │
│        X &lt;-  chan  &lt;- X        │
│             ┌────┐             │
│             │    │             │
└─────────────┘    └─────────────┘
</code></pre>
<p>Goroutine B sends the value X to goroutine A.</p>
<p>Here's how it works:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// To create a channel, use `make(chan type)`.</span>
    <span class="hljs-comment">// A channel only accepts values of the declared type:</span>
    messages := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">string</span>)

    <span class="hljs-comment">// To send a value into a channel,</span>
    <span class="hljs-comment">// use the `channel &lt;-` syntax.</span>
    <span class="hljs-comment">// Let's send "ping":</span>
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> { messages &lt;- <span class="hljs-string">"ping"</span> }()

    <span class="hljs-comment">// To receive a value from a channel,</span>
    <span class="hljs-comment">// use the `&lt;-channel` syntax.</span>
    <span class="hljs-comment">// Let's receive "ping" and print it:</span>
    msg := &lt;-messages
    fmt.Println(msg)
}
</code></pre>
<p>When the program runs, the first goroutine (the anonymous one) sends a message to the second (<code>main</code>) through the <code>messages</code> channel.</p>
<p>Sending a value over a channel is a synchronous operation. When the sending goroutine writes a value to the channel (<code>messages &lt;- "ping"</code>), it blocks and waits until someone receives that value (<code>&lt;-messages</code>). Only then does it continue:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    messages := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">string</span>)

    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        fmt.Println(<span class="hljs-string">"B: Sending message..."</span>)
        messages &lt;- <span class="hljs-string">"ping"</span>                    <span class="hljs-comment">// (1)</span>
        fmt.Println(<span class="hljs-string">"B: Message sent!"</span>)       <span class="hljs-comment">// (2)</span>
    }()

    fmt.Println(<span class="hljs-string">"A: Doing some work..."</span>)
    time.Sleep(<span class="hljs-number">500</span> * time.Millisecond)
    fmt.Println(<span class="hljs-string">"A: Ready to receive a message..."</span>)

    &lt;-messages                               <span class="hljs-comment">//  (3)</span>

    fmt.Println(<span class="hljs-string">"A: Message received!"</span>)
    time.Sleep(<span class="hljs-number">100</span> * time.Millisecond)
}
</code></pre>
<p>After sending the message to the channel ➊, goroutine B blocks. Only when goroutine A receives the message ➌ does goroutine B continue and print "message sent" ➋.</p>
<p>So channels don't just pass data between goroutines; they also help synchronize independent goroutines. This will come in handy later.</p>
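<p>This synchronizing behavior is often used on its own, with the value sent acting purely as a signal. A minimal sketch (not from the article):</p>

```go
package main

import "fmt"

func main() {
	// The channel's value doesn't matter here; the send/receive
	// pair is used purely as a synchronization point.
	done := make(chan struct{})

	go func() {
		fmt.Println("worker: finished")
		done <- struct{}{} // signal completion
	}()

	<-done // block until the worker signals
	fmt.Println("main: worker is done")
}
```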
<hr />
<h2 id="heading-pattern-producer-consumer">The Producer-Consumer Pattern</h2>
<p>The "producer-consumer" pattern comes up all the time in programming:</p>
<ul>
<li><p><strong>The producer</strong> supplies data.</p>
</li>
<li><p><strong>The consumer</strong> receives and processes it.</p>
</li>
</ul>
<p>In the examples that follow, we'll explore how producers and consumers can communicate through channels.</p>
<p>We'll work with a function that counts digits in words:</p>
<pre><code class="lang-go"><span class="hljs-comment">// counter holds the digit count for each word.</span>
<span class="hljs-comment">// The key is the word, the value is the number of digits.</span>
<span class="hljs-keyword">type</span> counter <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>

<span class="hljs-comment">// countDigitsInWords counts the number of digits</span>
<span class="hljs-comment">// in the words of a phrase.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">countDigitsInWords</span><span class="hljs-params">(phrase <span class="hljs-keyword">string</span>)</span> <span class="hljs-title">counter</span></span> {
    words := strings.Fields(phrase)
    <span class="hljs-comment">// ...</span>
    <span class="hljs-keyword">return</span> stats
}
</code></pre>
<h3 id="heading-rezultatnyj-kanal">The Result Channel</h3>
<p>Here's the plan:</p>
<ol>
<li><p>Start a goroutine.</p>
</li>
<li><p>In that goroutine, iterate over the words, count the digits in each, and write the counts to the <code>counted</code> channel (the producer).</p>
</li>
<li><p>In the outer function, read values from the channel and fill in the <code>stats</code> counter (the consumer).</p>
</li>
</ol>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">countDigitsInWords</span><span class="hljs-params">(phrase <span class="hljs-keyword">string</span>)</span> <span class="hljs-title">counter</span></span> {
    words := strings.Fields(phrase)
    counted := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)

    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-comment">// Iterate over the words,</span>
        <span class="hljs-comment">// count the digits in each one,</span>
        <span class="hljs-comment">// and write the counts to the counted channel.</span>
    }()

    <span class="hljs-comment">// Read values from the counted channel</span>
    <span class="hljs-comment">// and fill in stats.</span>

    <span class="hljs-keyword">return</span> stats
}
</code></pre>
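<p>Here is one way to fill in that skeleton (a sketch; <code>countDigits</code> is an assumed helper that counts digit characters). Since the producer sends counts in word order, the consumer can match each count to its word by iterating the same slice:</p>

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// counter holds the digit count for each word.
type counter map[string]int

// countDigits counts the digit characters in a word
// (an assumed helper, not shown in the article).
func countDigits(word string) int {
	n := 0
	for _, r := range word {
		if unicode.IsDigit(r) {
			n++
		}
	}
	return n
}

func countDigitsInWords(phrase string) counter {
	words := strings.Fields(phrase)
	counted := make(chan int)

	// Producer: count digits in each word and send the counts.
	go func() {
		for _, word := range words {
			counted <- countDigits(word)
		}
	}()

	// Consumer: counts arrive in the same order as the words,
	// so we can pair them up by position.
	stats := counter{}
	for _, word := range words {
		stats[word] = <-counted
	}
	return stats
}

func main() {
	fmt.Println(countDigitsInWords("0ne 1wo thr33 4068"))
}
```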
<hr />
<h2 id="heading-generator">The Generator</h2>
<p>So far we've assumed that <code>countDigitsInWords</code> knows all the words in advance.</p>
<p>In real life we can't count on that luxury. Data may arrive from a database or over the network, and the function doesn't know how many words there will be.</p>
<p>Let's model this situation by passing a generator function <code>next</code> instead of a phrase. Each call to <code>next()</code> gives us the next word from the source. When there are no more words, it returns an empty string.</p>
<p>A sequential version of the program would look like this:</p>
<pre><code class="lang-go"><span class="hljs-comment">// counter holds the digit count for each word.</span>
<span class="hljs-comment">// The key is the word, the value is the number of digits.</span>
<span class="hljs-keyword">type</span> counter <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>

<span class="hljs-comment">// countDigitsInWords counts the digits in words,</span>
<span class="hljs-comment">// fetching the next word via next().</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">countDigitsInWords</span><span class="hljs-params">(next <span class="hljs-keyword">func</span>()</span> <span class="hljs-title">string</span>) <span class="hljs-title">counter</span></span> {
    stats := counter{}

    <span class="hljs-keyword">for</span> {
        word := next()
        <span class="hljs-keyword">if</span> word == <span class="hljs-string">""</span> {
            <span class="hljs-keyword">break</span>
        }
        count := countDigits(word)
        stats[word] = count
    }

    <span class="hljs-keyword">return</span> stats
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    phrase := <span class="hljs-string">"0ne 1wo thr33 4068"</span>
    next := wordGenerator(phrase)
    stats := countDigitsInWords(next)
    printStats(stats)
}
</code></pre>
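<p>The article doesn't show <code>wordGenerator</code> or <code>countDigits</code>; here is one possible sketch of those helpers (my assumptions, not the author's code):</p>

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// wordGenerator returns a closure that yields the words of the
// phrase one at a time, and "" once the words run out.
func wordGenerator(phrase string) func() string {
	words := strings.Fields(phrase)
	i := 0
	return func() string {
		if i >= len(words) {
			return ""
		}
		word := words[i]
		i++
		return word
	}
}

// countDigits counts the digit characters in a word.
func countDigits(word string) int {
	n := 0
	for _, r := range word {
		if unicode.IsDigit(r) {
			n++
		}
	}
	return n
}

func main() {
	next := wordGenerator("0ne 1wo thr33 4068")
	for word := next(); word != ""; word = next() {
		fmt.Println(word, countDigits(word))
	}
}
```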
<p>Now let's add concurrency.</p>
<h3 id="heading-generator-s-goroutines">The Generator with Goroutines</h3>
<p>If you try to solve this exercise like the previous one, you'll run into a couple of problems:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">countDigitsInWords</span><span class="hljs-params">(next <span class="hljs-keyword">func</span>()</span> <span class="hljs-title">string</span>) <span class="hljs-title">counter</span></span> {
    counted := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">int</span>)

    <span class="hljs-comment">// count digits in words</span>
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">for</span> {
            <span class="hljs-comment">// should return when</span>
            <span class="hljs-comment">// there are no more words</span>
            word := next()
            count := countDigits(word)
            counted &lt;- count
        }
    }()

    <span class="hljs-comment">// fill stats with the words</span>
    stats := counter{}
    <span class="hljs-keyword">for</span> {
        count := &lt;-counted
        <span class="hljs-comment">// how do we exit the loop</span>
        <span class="hljs-comment">// when there are no more words?</span>
        <span class="hljs-keyword">if</span> ... {
            <span class="hljs-keyword">break</span>
        }
        <span class="hljs-comment">// where should the word come from?</span>
        stats[...] = count
    }

    <span class="hljs-keyword">return</span> stats
}
</code></pre>
<p>Think about what to send into the <code>counted</code> channel to solve both problems. Pay attention to the pair type.</p>
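<p>One possible solution is to send a word together with its count, using an empty word as the end-of-input sentinel. A self-contained sketch (the <code>pair</code> type and <code>countDigits</code> helper are assumptions consistent with the snippets that follow):</p>

```go
package main

import (
	"fmt"
	"unicode"
)

// counter holds the digit count for each word.
type counter map[string]int

// pair carries a word together with its digit count. It solves
// both problems at once: the consumer gets the word back, and an
// empty word signals that there is no more input.
type pair struct {
	word  string
	count int
}

// countDigits counts digit characters (an assumed helper).
func countDigits(word string) int {
	n := 0
	for _, r := range word {
		if unicode.IsDigit(r) {
			n++
		}
	}
	return n
}

func countDigitsInWords(next func() string) counter {
	counted := make(chan pair)

	go func() {
		for {
			word := next()
			counted <- pair{word, countDigits(word)}
			if word == "" {
				break // no more words
			}
		}
	}()

	stats := counter{}
	for {
		p := <-counted
		if p.word == "" {
			break // sentinel: the producer is done
		}
		stats[p.word] = p.count
	}
	return stats
}

func main() {
	words := []string{"0ne", "1wo", "thr33", "4068"}
	i := 0
	next := func() string {
		if i >= len(words) {
			return ""
		}
		w := words[i]
		i++
		return w
	}
	fmt.Println(countDigitsInWords(next))
}
```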
<hr />
<h2 id="heading-reader-i-worker">Reader and Worker</h2>
<p>For more complex tasks, it's useful to have one goroutine that reads data (the reader) and another that processes it (the worker). Let's apply this approach to our function:</p>
<pre><code class="lang-plaintext">┌───────────────┐               ┌───────────────┐
│ submits words │               │ counts digits │               ┌────────────────┐
│ for counting  │ → (pending) → │ in words      │ → (counted) → │ fills in stats │
└───────────────┘               └───────────────┘               └────────────────┘
  reader           channel        worker            channel       outer function
</code></pre>
<p>Here's the plan:</p>
<ol>
<li><p>Start a goroutine that fetches words from the generator and sends them into the <code>pending</code> channel (the reader).</p>
</li>
<li><p>Start a second goroutine that reads from <code>pending</code>, counts the digits, and writes the results to the <code>counted</code> channel (the worker).</p>
</li>
<li><p>In the outer function, read from <code>counted</code> and update the final <code>stats</code> counter.</p>
</li>
</ol>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">countDigitsInWords</span><span class="hljs-params">(next <span class="hljs-keyword">func</span>()</span> <span class="hljs-title">string</span>) <span class="hljs-title">counter</span></span> {
    pending := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">string</span>)
    counted := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> pair)

    <span class="hljs-comment">// submits words for counting</span>
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-comment">// Fetch words from the generator</span>
        <span class="hljs-comment">// and send them into the pending channel.</span>
    }()

    <span class="hljs-comment">// counts digits in words</span>
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-comment">// Read words from the pending channel,</span>
        <span class="hljs-comment">// count the digits in each word,</span>
        <span class="hljs-comment">// and send the results to the counted channel.</span>
    }()

    <span class="hljs-comment">// Read values from the counted channel</span>
    <span class="hljs-comment">// and fill in stats.</span>

    <span class="hljs-keyword">return</span> stats
}
</code></pre>
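<p>Filled in, the reader-worker version might look like this (a sketch; the <code>pair</code> type and <code>countDigits</code> helper are assumptions consistent with the surrounding snippets, and the empty word again doubles as the end-of-input sentinel):</p>

```go
package main

import (
	"fmt"
	"unicode"
)

// counter holds the digit count for each word.
type counter map[string]int

// pair carries a word and its digit count.
type pair struct {
	word  string
	count int
}

// countDigits counts digit characters (an assumed helper).
func countDigits(word string) int {
	n := 0
	for _, r := range word {
		if unicode.IsDigit(r) {
			n++
		}
	}
	return n
}

func countDigitsInWords(next func() string) counter {
	pending := make(chan string)
	counted := make(chan pair)

	// reader: submits words for counting
	go func() {
		for {
			word := next()
			pending <- word
			if word == "" {
				break // "" doubles as the end-of-input sentinel
			}
		}
	}()

	// worker: counts digits in words
	go func() {
		for {
			word := <-pending
			counted <- pair{word, countDigits(word)}
			if word == "" {
				break
			}
		}
	}()

	// outer function: fills in stats
	stats := counter{}
	for {
		p := <-counted
		if p.word == "" {
			break
		}
		stats[p.word] = p.count
	}
	return stats
}

func main() {
	words := []string{"0ne", "1wo", "thr33", "4068"}
	i := 0
	next := func() string {
		if i >= len(words) {
			return ""
		}
		w := words[i]
		i++
		return w
	}
	fmt.Println(countDigitsInWords(next))
}
```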
<hr />
<h2 id="heading-imenovannye-goroutines">Named Goroutines</h2>
<p>After splitting the logic between the reader and the worker, the function has grown rather large:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">countDigitsInWords</span><span class="hljs-params">(next <span class="hljs-keyword">func</span>()</span> <span class="hljs-title">string</span>) <span class="hljs-title">counter</span></span> {
    <span class="hljs-comment">// ...</span>

    <span class="hljs-comment">// submits words for counting</span>
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-comment">// ...</span>
    }()

    <span class="hljs-comment">// counts digits in words</span>
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-comment">// ...</span>
    }()

    <span class="hljs-comment">// fills in stats</span>
    <span class="hljs-comment">// ...</span>

    <span class="hljs-keyword">return</span> stats
}
</code></pre>
<p>Three logical blocks stand out clearly:</p>
<ol>
<li><p>Submitting words for counting.</p>
</li>
<li><p>Counting the digits in the words.</p>
</li>
<li><p>Filling in the final results.</p>
</li>
</ol>
<p>It would be convenient to extract these blocks into separate functions that exchange data through channels:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">countDigitsInWords</span><span class="hljs-params">(next <span class="hljs-keyword">func</span>()</span> <span class="hljs-title">string</span>) <span class="hljs-title">counter</span></span> {
    pending := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">string</span>)
    <span class="hljs-keyword">go</span> submitWords(next, pending)

    counted := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> pair)
    <span class="hljs-keyword">go</span> countWords(pending, counted)

    <span class="hljs-keyword">return</span> fillStats(counted)
}
</code></pre>
<hr />
<h2 id="heading-vyhodnoj-kanal">The Output Channel</h2>
<p>Here's the function we've arrived at:</p>
<pre><code class="lang-go"><span class="hljs-comment">// countDigitsInWords counts the digits in words,</span>
<span class="hljs-comment">// fetching the next word via next().</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">countDigitsInWords</span><span class="hljs-params">(next <span class="hljs-keyword">func</span>()</span> <span class="hljs-title">string</span>) <span class="hljs-title">counter</span></span> {
    pending := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">string</span>)
    <span class="hljs-keyword">go</span> submitWords(next, pending)

    counted := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> pair)
    <span class="hljs-keyword">go</span> countWords(pending, counted)

    <span class="hljs-keyword">return</span> fillStats(counted)
}
</code></pre>
<p>It looks good, but there's one more thing we could change.</p>
<p>The <code>pending</code> channel is created in the parent function only to be passed into the child function <code>submitWords</code>. It would be better to create the channel inside <code>submitWords</code> and return it to the parent, so that <code>submitWords</code> fully owns it. The same goes for the <code>counted</code> channel and the <code>countWords</code> function.</p>
<p><code>countDigitsInWords</code> then becomes:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">countDigitsInWords</span><span class="hljs-params">(next <span class="hljs-keyword">func</span>()</span> <span class="hljs-title">string</span>) <span class="hljs-title">counter</span></span> {
    pending := submitWords(next)
    counted := countWords(pending)
    <span class="hljs-keyword">return</span> fillStats(counted)
}
</code></pre>
<p>Now the ownership is clear, and the program is easier to follow. But where did all the goroutines go? We start them inside <code>submitWords</code> and <code>countWords</code>, like this:</p>
<pre><code class="lang-go"><span class="hljs-comment">// submitWords submits words for counting.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">submitWords</span><span class="hljs-params">(next <span class="hljs-keyword">func</span>()</span> <span class="hljs-title">string</span>) <span class="hljs-title">chan</span> <span class="hljs-title">string</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">string</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">for</span> {
            word := next()
            out &lt;- word
            <span class="hljs-keyword">if</span> word == <span class="hljs-string">""</span> {
                <span class="hljs-keyword">break</span>
            }
        }
    }()
    <span class="hljs-keyword">return</span> out
}

<span class="hljs-comment">// countWords counts the digits in words.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">countWords</span><span class="hljs-params">(in <span class="hljs-keyword">chan</span> <span class="hljs-keyword">string</span>)</span> <span class="hljs-title">chan</span> <span class="hljs-title">pair</span></span> {
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> pair)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">for</span> {
            word := &lt;-in
            count := countDigits(word)
            out &lt;- pair{word, count}
            <span class="hljs-keyword">if</span> word == <span class="hljs-string">""</span> {
                <span class="hljs-keyword">break</span>
            }
        }
    }()
    <span class="hljs-keyword">return</span> out
}
</code></pre>
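<p>The article doesn't show <code>fillStats</code>; a minimal sketch of how it might drain the <code>counted</code> channel, stopping at the empty-word sentinel pair that <code>countWords</code> sends last:</p>

```go
package main

import "fmt"

// pair carries a word and its digit count.
type pair struct {
	word  string
	count int
}

// counter holds the digit count for each word.
type counter map[string]int

// fillStats drains the counted channel into the final counter,
// stopping at the empty-word sentinel (an assumed implementation).
func fillStats(counted chan pair) counter {
	stats := counter{}
	for {
		p := <-counted
		if p.word == "" {
			break
		}
		stats[p.word] = p.count
	}
	return stats
}

func main() {
	counted := make(chan pair)
	go func() {
		counted <- pair{"thr33", 2}
		counted <- pair{"", 0} // sentinel
	}()
	fmt.Println(fillStats(counted)) // prints map[thr33:2]
}
```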
<p>Let's make sure the program still works as expected:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    phrase := <span class="hljs-string">"0ne 1wo thr33 4068"</span>
    next := wordGenerator(phrase)
    stats := countDigitsInWords(next)
    printStats(stats)
}
</code></pre>
<p>This approach has a downside: <code>submitWords</code> and <code>countWords</code> are now more complex and start goroutines themselves. On the other hand, <code>countDigitsInWords</code> is simpler and more robust (especially once we make the channels directional, which we'll discuss later). Which variant to choose is a matter of taste, but you definitely shouldn't mix the two approaches.</p>
<p>Returning an output channel from a function and filling it inside an internal goroutine is a common pattern in Go.</p>
<hr />
<h2 id="heading-rezyume">Summary</h2>
<ul>
<li><p><strong>Goroutines</strong> are lightweight threads of execution in Go that let you write concurrent programs.</p>
</li>
<li><p>A <strong>WaitGroup</strong> is used to synchronize with multiple goroutines and wait for them to finish.</p>
</li>
<li><p><strong>Channels</strong> provide safe data transfer between goroutines and synchronize them.</p>
</li>
<li><p>Splitting the logic into a <strong>reader</strong> and a <strong>worker</strong> makes the code clearer and more scalable.</p>
</li>
<li><p>The <strong>output channel</strong> pattern simplifies composing concurrent functions.</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Postgres: Index Scans]]></title><description><![CDATA[Using an index to improve query performance is a fundamental database practice. An index is a specialized structure organized to make data access cheaper than scanning the entire disk. In PostgreSQL, the data stored on disk is referred to as the Heap...]]></description><link>https://blog.fshtab.com/postgres-index-scans</link><guid isPermaLink="true">https://blog.fshtab.com/postgres-index-scans</guid><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Sat, 20 Jan 2024 10:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766397584479/43b49ddc-5b95-4b3e-b008-99987e68f0ef.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Using an index to improve query performance is a fundamental database practice. An index is a specialized structure organized to make data access cheaper than scanning the entire disk. In PostgreSQL, the data stored on disk is referred to as the <strong>Heap</strong>.</p>
<hr />
<h2 id="heading-the-reality-of-indexes">The Reality of Indexes</h2>
<p>Having an index on a table does <strong>not</strong> guarantee that it will be used. The PostgreSQL planner evaluates multiple execution paths and chooses the one with the lowest estimated cost. </p>
<p>It is vital to test and measure the effect of an index using the <code>EXPLAIN</code> command. Blindly adding indexes can actually degrade performance, as they increase the overhead for write operations (<code>INSERT</code>, <code>UPDATE</code>, <code>DELETE</code>) and consume disk space.</p>
<hr />
<h2 id="heading-1-practical-scenario-when-indexes-are-ignored">1. Practical Scenario: When Indexes are Ignored</h2>
<p>Let's create a table with 10 million records and a composite index to see how the planner behaves.</p>
<h3 id="heading-setup">Setup</h3>
<pre><code class="lang-sql"><span class="hljs-comment">-- Create table</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> foo(id1 <span class="hljs-built_in">INT</span>, id2 <span class="hljs-built_in">INT</span>, id3 <span class="hljs-built_in">INT</span>, id4 <span class="hljs-built_in">INT</span>, <span class="hljs-keyword">descr</span> <span class="hljs-built_in">TEXT</span>);

<span class="hljs-comment">-- Insert 10M records</span>
<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> foo 
<span class="hljs-keyword">SELECT</span> i, i*<span class="hljs-number">3</span>, i+i, i*<span class="hljs-number">2</span>, <span class="hljs-string">'hello'</span> || i 
<span class="hljs-keyword">FROM</span> generate_series(<span class="hljs-number">1</span>, <span class="hljs-number">10000000</span>) i;

<span class="hljs-comment">-- Create a composite index</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">INDEX</span> idx_id1_id2_id3 <span class="hljs-keyword">ON</span> foo(id1, id2, id3);
</code></pre>
<h3 id="heading-running-explain">Running EXPLAIN</h3>
<p>If we run a query that filters for a large portion of the table:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">EXPLAIN</span> (<span class="hljs-keyword">ANALYZE</span>, BUFFERS) <span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> foo <span class="hljs-keyword">WHERE</span> id1 &gt; <span class="hljs-number">1000</span>;
</code></pre>
<p><strong>Planner Output:</strong></p>
<pre><code>Seq Scan on foo  (cost=<span class="hljs-number">0.00</span>.<span class="hljs-number">.198530</span><span class="hljs-number">.00</span> rows=<span class="hljs-number">9999027</span> width=<span class="hljs-number">28</span>) (actual time=<span class="hljs-number">0.283</span>.<span class="hljs-number">.2260</span><span class="hljs-number">.973</span> rows=<span class="hljs-number">9999000</span> loops=<span class="hljs-number">1</span>)
   <span class="hljs-attr">Filter</span>: (id1 &gt; <span class="hljs-number">1000</span>)
   Rows Removed by Filter: <span class="hljs-number">1000</span>
   <span class="hljs-attr">Buffers</span>: shared hit=<span class="hljs-number">16274</span> read=<span class="hljs-number">57256</span>
</code></pre><p><strong>Why was the Sequential Scan chosen?</strong> When using an index, the database must read the index structure and then fetch the relevant data pages from the heap. This often results in 2 disk I/Os per row. If the query returns almost the entire table (low selectivity), reading the file linearly (Seq Scan) is much faster than jumping between the index and the heap.</p>
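<p>One way to inspect the plan the planner rejected is to temporarily discourage sequential scans for the current session; this is a diagnostic trick, not a production setting:</p>

```sql
-- Disable sequential scans for this session only, so the planner
-- reveals the index-based plan and its estimated cost.
SET enable_seqscan = off;
EXPLAIN SELECT * FROM foo WHERE id1 > 1000;

-- Restore the default afterwards.
RESET enable_seqscan;
```

Note that <code>enable_seqscan = off</code> doesn't forbid sequential scans; it just assigns them a prohibitively high cost, so the planner falls back to the index path whenever one exists.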
<hr />
<h2 id="heading-2-analyzing-alternative-plans">2. Analyzing Alternative Plans</h2>
<p>Standard EXPLAIN only shows the "winning" plan. To understand why the index was rejected, we can use an extension like PG_ALL_PLANS to see all considered paths.</p>
<p><strong>Comparative Costs for id1 &gt; 1000:</strong></p>
<ul>
<li>Plan 1 (Seq Scan): Cost 198,530.00 (Winner).</li>
<li>Plan 2 (Index Scan): Cost 498,815.28 (Rejected due to high I/O cost).</li>
<li>Plan 3 (Bitmap Heap Scan): Cost 429,813.47 (Rejected; lower than index scan but higher startup than seq scan).</li>
<li>Plan 4 (Parallel Seq Scan): Cost 1,126,516.03 (Rejected).</li>
</ul>
<hr />
<h2 id="heading-3-increasing-selectivity-to-force-index-usage">3. Increasing Selectivity to Force Index Usage</h2>
<p>We can persuade the planner to use the index by reducing the number of rows returned. This increases the selectivity of the query.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">EXPLAIN</span> (<span class="hljs-keyword">ANALYZE</span>, BUFFERS) 
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> foo <span class="hljs-keyword">WHERE</span> id1 &gt; <span class="hljs-number">1000</span> <span class="hljs-keyword">AND</span> id1 &lt; <span class="hljs-number">1500000</span>;
</code></pre>
<p><strong>Updated Result:</strong></p>
<pre><code>Index Scan using idx_id1_id2_id3 on foo  (cost=<span class="hljs-number">0.43</span>.<span class="hljs-number">.187892</span><span class="hljs-number">.55</span> rows=<span class="hljs-number">1498365</span> width=<span class="hljs-number">28</span>)
   Index Cond: ((id1 &gt; <span class="hljs-number">1000</span>) AND (id1 &lt; <span class="hljs-number">1500000</span>))
   <span class="hljs-attr">Buffers</span>: shared hit=<span class="hljs-number">2</span> read=<span class="hljs-number">16768</span>
</code></pre><p>Now, the Index Scan cost (~187k) is lower than the Seq Scan cost (~223k). Because we are only fetching ~1.5M rows instead of 10M, the index becomes the most efficient path.</p>
<hr />
<h2 id="heading-summary-amp-best-practices">Summary &amp; Best Practices</h2>
<ul>
<li><p><strong>Selectivity is Key:</strong> Indexes are most effective when they filter out the vast majority of the data.</p>
</li>
<li><p><strong>Use EXPLAIN ANALYZE:</strong> Always verify your performance assumptions in a staging environment.</p>
</li>
<li><p><strong>Monitor Usage:</strong> Indexes are not free. Use <code>pg_stat_user_indexes</code> to identify and remove unused indexes that slow down writes.</p>
</li>
<li><p><strong>Analyze Statistics:</strong> Ensure the planner has up-to-date data by running the <code>ANALYZE</code> command regularly.</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[gRPC versus REST: A Performance Comparison]]></title><description><![CDATA[gRPC versus REST: A Performance Comparison
TL;DR:We benchmarked REST vs gRPC under identical local setups using .NET. While gRPC is widely believed to outperform REST, our results show REST can be equally, or even slightly faster in specific data-hea...]]></description><link>https://blog.fshtab.com/grpc-vs-rest-performance-comparison</link><guid isPermaLink="true">https://blog.fshtab.com/grpc-vs-rest-performance-comparison</guid><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Mon, 15 Jan 2024 11:30:00 GMT</pubDate><enclosure url="https://blog.postman.com/wp-content/uploads/2023/11/gRPC-vs-REST-1.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-grpc-versus-rest-a-performance-comparison">gRPC versus REST: A Performance Comparison</h1>
<p><strong>TL;DR:</strong><br />We benchmarked REST vs gRPC under identical local setups using .NET. While gRPC is widely believed to outperform REST, our results show REST can be equally fast, or even slightly faster, in specific data-heavy operations.</p>
<p>As microservices multiply across modern systems, the choice of API protocol can make or break performance. REST has ruled for years, but gRPC promises to be faster, leaner, and more efficient. So why hasn't it taken over? Microservices can be heavily dependent on each other, which means speed and stability are key. If gRPC claims to be faster than REST, why isn't it the de facto standard? In this blog we will put gRPC and REST head to head to see which is actually faster.</p>
<p><strong>gRPC is a superior technology to REST!</strong> At least, that is what multiple sources claim. According to various blogs, gRPC performs better and faster than REST on several metrics. In this blog we will test specifically how <strong><em>fast</em></strong> a REST client can handle different requests and responses, and compare it to how fast a similar gRPC client handles the same requests and responses. This begs the question...</p>
<p><code>Is gRPC faster than REST?</code></p>
<p><em>We hypothesize that gRPC is able to send and receive requests faster than a traditional REST.</em> To test this, the following experiments have been developed.</p>
<hr />
<h2 id="heading-the-experiment">The Experiment</h2>
<p>To test the hypothesis, two experiments were created, one utilizing gRPC and one utilizing REST. These experiments have to adhere to the following:</p>
<p><strong>Rules</strong></p>
<ul>
<li>To ensure accurate measurements, the results must be obtained from the same computer.</li>
<li>Multiple data structures will be tested.</li>
<li>The setup for both APIs has to be as similar as possible.</li>
<li>The time used for measuring should be obtained from the client.</li>
</ul>
<p><strong>Set up</strong></p>
<ul>
<li>To adhere to the multiple data structures rule, a local database has been created, this database will provide a single instance of an object, as well as multiple instances of objects that will be stored in a list.</li>
<li>Each API will have 12 methods to call.  <ul>
<li>Three for a single instance which takes a parameter of Id.  <ul>
<li>Each one with a larger payload  </li>
</ul>
</li>
<li>Three for a single instance which takes a parameter of Id.  <ul>
<li>Each one with a deeper payload  </li>
</ul>
</li>
<li>Three for a collection of 100 instances, which takes no parameters.  <ul>
<li>Each one with a larger payload  </li>
</ul>
</li>
<li>Three for a collection of 100 instances, which takes no parameters.  <ul>
<li>Each one with a deeper payload</li>
</ul>
</li>
</ul>
</li>
<li>Each API will be tested with a client written in C#, as a console app.  <ul>
<li>Time will be measured with .NET Stopwatch.  </li>
<li>The stopwatch will begin when the method is called and end when the API returns the full data.</li>
</ul>
</li>
<li>To minimize anomalies and outliers, each operation will be executed 100 times, and the average call speed will be evaluated.</li>
</ul>
<p><strong>Specs</strong></p>
<p>Intel Core i7-9700K, 8 cores, <em>4.6 GHz</em></p>
<p>Samsung SSD 840 EVO 250GB</p>
<p>NVIDIA GeForce GTX 1070</p>
<p>2x8 GB HyperX Fury 2666MHz DDR4 Memory</p>
<hr />
<h2 id="heading-rest">REST</h2>
<h3 id="heading-what-is-rest">What is REST?</h3>
<p>We have decided to work with the common implementation of REST and not the full implementation of a RESTful API.</p>
<p>The key features to take note of when using REST:</p>
<p><strong>Separation of client and server</strong></p>
<ul>
<li>Server and client can be implemented independently without knowing each other.</li>
<li>Server code can be changed without affecting the client.</li>
<li>Client code can be changed without affecting the server.</li>
<li>Both server and client are aware of methods available.</li>
</ul>
<p><strong>Statelessness</strong></p>
<ul>
<li>Stateless means that the server is not required to know the current state of the client and vice versa.</li>
<li>Either end can understand any method calls, without knowing the previously called methods.</li>
</ul>
<p><strong>Invocation</strong></p>
<ul>
<li>We invoke a method on the server via HTTP operations  <ul>
<li>GET  </li>
<li>POST  </li>
<li>PUT  </li>
<li>DELETE</li>
</ul>
</li>
</ul>
<hr />
<h2 id="heading-setting-up-the-experiment-for-the-rest-api">Setting Up the Experiment for the REST API</h2>
<p>The architecture for this experiment is a simple one:</p>
<p>REST Architecture</p>
<hr />
<h2 id="heading-sample-project-and-metrics">Sample Project and Metrics</h2>
<p>If you want to replicate this experiment yourself, both the database setup and the source code for the REST API can be found in the repository.</p>
<p>Running our setup yielded the following results:</p>
<p><strong>Single payload</strong></p>
<p>The difference between a single small payload and a single large payload is small in the context of a daily task. A single small payload has a mean response time of 0.0181 seconds, whilst a single large payload has a mean response time of 0.0204 seconds. Relative to each other, though, that is a 12.7% increase in response time.</p>
<p>API Single Payload</p>
<p>To put this into perspective, a small payload contains 10 values of data. A large payload contains (4+(6*9))*6+4, or 352 values. This means that we requested 3420% more data and it only took 12.7% longer.</p>
<p>To test different scenarios we also created "deep" payloads containing varying numbers of nested objects. The deepest payload contains a total of eight nested objects; however, its total number of values is far smaller than the previous payload's. The previously mentioned payload peaked at 352 values, whereas the deepest payload peaks at (4+(6*4))+(4+(7*4))+(4+(8*4)) values, or 96 values in total. In other words, the deep payloads are much smaller in size but different in structure.</p>
<p>To give a concrete example, a large payload is structured like so:  </p>
<pre><code>large_payload {
    id,
    string_Value,
    int_value,
    double_value,
    medium_...(<span class="hljs-number">2743</span> chars omitted)...s.
</code></pre><hr />
<h2 id="heading-grpc">gRPC</h2>
<h3 id="heading-what-is-grpc">What is gRPC?</h3>
<p>gRPC is a modern, open-source remote procedure call (RPC) framework that can run anywhere. It enables client and server applications to communicate transparently and develop connected systems.</p>
<p>Some key features we would like to highlight:</p>
<p><strong>HTTP/2 support</strong></p>
<p>HTTP/2 is the successor to HTTP/1.1, which is still what most websites and frameworks use today. In many ways, HTTP/2 is an improved version of HTTP/1.1, and HTTP/3 is already in the works.</p>
<p><strong>Language independent</strong></p>
<p>gRPC is language independent, which means it doesn't matter which language you develop in. The framework supports a handful of popular languages. This is quite an advantage when you're developing microservices, which might have services developed in different languages and frameworks.</p>
<p><strong>Contract First</strong></p>
<p>gRPC is strictly contract first, a design approach that works especially well in larger development teams. It also excels when developing microservices, as a contract must be created before any actual implementation can be done. The contract is designed in the .proto file, which is also where gRPC gains some of its speed: .proto files are compiled into a compact binary serialization format (Protocol Buffers), which is smaller and faster to parse than text formats such as JSON.</p>
<p><strong>Strongly typed</strong></p>
<p>As a by-product of the contract-first design, messages are strongly typed: the .proto file acts both as the contract between client and server and as an extensible mechanism for serializing structured data.</p>
<hr />
<h2 id="heading-setting-up-the-grpc-project">Setting Up the gRPC Project</h2>
<p>For the gRPC architecture we use the same as the REST, we have a client and a server running locally. The client calls the methods exposed by the proto file. The method then gets executed on the server and queries the database, once the data has been obtained it replies to the client. When the client has received all the data, we stop and log the time elapsed since the call started.</p>
<p>gRPC Architecture</p>
<hr />
<h3 id="heading-sample-project-and-metrics-1">Sample Project and Metrics</h3>
<p>If you want to replicate this experiment yourself, database setup can be found in the repository and the source code for the grpc-project can be found in the repository.</p>
<p>Running our setup yielded the following results:</p>
<p><strong>Single payloads</strong></p>
<p>gRPC Wide Payload Results</p>
<p>The test on single payloads yielded quite odd results: the medium payload proved to be the fastest on average, and the largest payload was only slightly slower than the smallest. Just to recap the numbers: a large payload contains 3420% more data than a small payload, and yet it only took 2.36% longer to get that data.</p>
<p>gRPC Deep Payload Results</p>
<p>Even more odd were the results of the deep payloads. Once again the payload containing a "medium" amount of data was the fastest, just like before. But unlike before, the deepest payload was significantly faster than the deep payload; to be precise, the deepest payload, which contains 300% more data than a deep payload, was 28.63% faster. As the results are rather unexpected, we have to take a close look at possible errors that could have occurred.</p>
<p><strong>Collection of payloads</strong></p>
<p>gRPC Wide Payload Results</p>
<p>Collections paint a different picture: a small payload collection averaged 0.1911 seconds, while a collection of large payloads took 0.7025 seconds. This means a large payload collection on average took 267.6% longer to retrieve. These results are much closer to what we would expect. We did the same test with a collection of deep payloads.</p>
<p>gRPC Deep Payload Collection Results</p>
<p>The results of this test were as you would expect: as the payloads incrementally increase in size, they also increase incrementally in response time.</p>
<hr />
<h2 id="heading-conclusion">Conclusion</h2>
<p>When we put the two charts next to each other, it's easy to see which one has an edge. The REST API is represented by the blue blocks, whilst gRPC is represented by the red blocks, just like previously.</p>
<p>Wide Payload results comparison</p>
<p>Deep payload results comparison</p>
<p>This is the case for both single instances of objects as well as collections of objects.</p>
<p>Deep payload collection result comparison</p>
<p>Wide payload collection result comparison</p>
<p>We hypothesized that gRPC would be faster than REST, based on the numerous blogs claiming this to be true, with their own tests. Contrary to popular belief, our experiments suggest that under certain conditions, REST can outperform gRPC in raw speed, reminding us that performance isn't universal but contextual.</p>
<p>These results might not seem like much, but users on average don't wait around for data to load and will abandon a web page or program if loading times are too long. When moving large amounts of data, a small amount of time can be the difference between keeping and losing a customer.</p>
<p>This prompts the question: <strong><em>when should you use gRPC, and when should you use REST?</em></strong></p>
<p>We would argue that gRPC fits into a setting, where you need to have multiple programs or services talking to each other across different languages, especially when the task that needs to connect to an endpoint is an action that needs to be executed; one such action could be TurnOnTheWater(). This argument is based on the research made into gRPC, rather than the results of these particular tests.</p>
<p>REST, on the other hand, operates on the four aforementioned HTTP operations, and these operations indicate data transfers of one sort or another. While REST can execute the same actions as gRPC, an action like TurnOnTheWater() doesn't fit what a REST API was designed for. We would instead use REST where we require data transfers and other typical CRUD mechanics.</p>
<p>Ultimately, the right choice depends on your use case: REST for simplicity and interoperability, gRPC for high-efficiency internal microservice communication.</p>
<hr />
<h2 id="heading-possible-errors">Possible Errors</h2>
<ul>
<li>Network outage during some of the tests</li>
<li>The gRPC serverside logging was set to critical, tweaking this option might yield different results.</li>
</ul>
<hr />
<h2 id="heading-whats-next">What's Next?</h2>
<p>This blog post has only been about the differences in speed between REST and gRPC, but in reality many other factors come into play if we were to truly compare the two frameworks. gRPC claims to be not only faster, but also more reliable, stable, and secure. All of these metrics, and others, would be interesting to cover; they are, however, out of scope for this particular blog post.</p>
<hr />
<h2 id="heading-technologies-used">Technologies Used</h2>
<ul>
<li>gRPC</li>
<li>.NET Web Api</li>
<li>.NET console app</li>
<li>MySQL Database</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[ClickHouse: Advantages, Challenges, and Pitfalls]]></title><description><![CDATA[ClickHouse: Advantages, Challenges, and Pitfalls
ClickHouse is one of those databases that generates excitement after the first benchmark. It's extremely fast, column-oriented, and designed for analytics at scale.
It's also remarkably simple to integ...]]></description><link>https://blog.fshtab.com/clickhouse-the-good-the-bad-and-the-ugly</link><guid isPermaLink="true">https://blog.fshtab.com/clickhouse-the-good-the-bad-and-the-ugly</guid><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Thu, 28 Dec 2023 14:15:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766406486476/300b2373-b249-416d-a2f3-0aaa614ba72f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-clickhouse-advantages-challenges-and-pitfalls">ClickHouse: Advantages, Challenges, and Pitfalls</h1>
<p>ClickHouse is one of those databases that generates excitement after the first benchmark. It's extremely fast, column-oriented, and designed for analytics at scale.</p>
<p>It's also remarkably simple to integrate into an existing infrastructure. You can stream data from Postgres, MongoDB, S3, or virtually any source. That's what makes it so attractive. You reach the point where your Postgres queries begin to struggle, you don't want to rebuild everything, so you introduce ClickHouse and suddenly your dashboard loads in milliseconds.</p>
<p>It's like adding a turbocharger to your reporting system.</p>
<p>ClickHouse is also evolving at an incredible pace. Every month they release new features, bug fixes, and faster query performance.</p>
<p>But with speed comes responsibility. ClickHouse is a powerful system. It'll reward you when you treat it properly, but it'll cause problems if you take shortcuts.</p>
<hr />
<h2 id="heading-cloud-vs-self-hosting">Cloud vs Self-Hosting</h2>
<p>Your first major decision is whether to self-host or choose a managed provider.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>Cloud</td><td>Self-Hosted</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Setup time</strong></td><td>Minutes</td><td>Days/weeks</td></tr>
<tr>
<td><strong>Cost at scale</strong></td><td>$$$</td><td>$ + engineer time</td></tr>
<tr>
<td><strong>Backup/HA</strong></td><td>Automatic</td><td>DIY</td></tr>
<tr>
<td><strong>Headaches</strong></td><td>A few but not hosting related</td><td>Many</td></tr>
<tr>
<td><strong>Good for</strong></td><td>Most use cases</td><td>Cost optimization</td></tr>
</tbody>
</table>
</div><p>The cloud approach is straightforward. You get uptime, automatic scaling, and minimal headaches. The trade-off is cost. ClickHouse Cloud, Altinity Cloud, Tinybird, they all work well, but the bill can become significant once you start moving large amounts of data. You don't deal with server issues, but you pay for that peace of mind.</p>
<p>In ClickHouse Cloud you also don't need to worry about replication, as this is handled automatically so you don't need to create replicated and distributed tables.</p>
<p>Self-hosting appears simple at first, but it's not.</p>
<p>You set up a single node, everything runs smoothly, and then one day something goes wrong: a node fails, stops merging parts, or corrupts data. And that's just the tip of the iceberg.</p>
<p>To handle real production traffic you'll end up with replicated and distributed tables. You'll need to choose between vertical scaling, horizontal scaling, or both. Then you start worrying about corrupted parts, cluster topology, and backups.</p>
<p>Running ClickHouse yourself works fine for smaller setups. Once you grow, it's a full-time job unless you use something like the <strong>Altinity ClickHouse Operator</strong> on Kubernetes. That operator makes things manageable. You define clusters in YAML, it handles replication, ZooKeeper (ClickHouse Keeper), and has excellent backup strategies. If you ever plan to self-host long-term, start there.</p>
<hr />
<h2 id="heading-the-dark-side-of-joins">The Dark Side of Joins</h2>
<p>Joins in ClickHouse are not the joins you're accustomed to. They work, but they're not "free."</p>
<p>ClickHouse doesn't have a full query optimizer like Postgres or MySQL. That means it doesn't plan your joins intelligently. If you join two large tables, it'll happily try to load everything in memory and fail in the process.</p>
<p>You have to think ahead. Filter first, join later.</p>
<p>A few ways to survive:</p>
<ul>
<li>Use <strong>CTEs</strong> or sub-queries to narrow down the joined dataset before the join actually happens.</li>
<li>Use <strong>dictionaries</strong> (in-memory lookup tables) for small reference data. They're extremely fast, but they have to fit in memory.</li>
<li>Know your <strong>sorting keys</strong>. ClickHouse relies on them for efficient reads. Poor keys make joins worse.</li>
<li>Always join against the smaller table, and keep it on the right-hand side: by default ClickHouse builds the in-memory hash table from the right-hand table.</li>
</ul>
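<p>A hypothetical example of "filter first, join later", assuming an <code>events</code> table and a much smaller <code>users</code> table: a CTE narrows the event set before the join happens, and the smaller relation sits on the right-hand side.</p>

```sql
-- Table and column names are made up for illustration.
WITH recent_events AS (
    SELECT user_id, event_type, created_at
    FROM events
    WHERE created_at >= now() - INTERVAL 7 DAY  -- filter BEFORE joining
)
SELECT e.event_type, count() AS cnt
FROM recent_events AS e
INNER JOIN users AS u ON u.id = e.user_id       -- smaller table on the right
GROUP BY e.event_type
```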
<p>You'll notice when you have a bad join since it will take a long time or fail completely.</p>
<hr />
<h2 id="heading-updates-deletes-and-the-reality-of-immutability">Updates, Deletes, and the Reality of Immutability</h2>
<p>ClickHouse was never designed for frequent updates or deletes. It's a write-once, append-forever kind of database. You can't just run <code>UPDATE users SET ...</code> like in Postgres.</p>
<p>To their credit, the ClickHouse team has made significant progress here. They've added <strong>lightweight deletes and updates</strong>, and there are new table engines like <code>ReplacingMergeTree</code> and <code>VersionedCollapsingMergeTree</code> that can simulate mutable data. But it still requires extra consideration.</p>
<p>You need to design your tables knowing that changing data later is more difficult. That's fine for analytics workloads, but painful if you expect relational behavior.</p>
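<p>As a sketch of how <code>ReplacingMergeTree</code> simulates mutable rows (table and column names here are made up for illustration): rows sharing the same sorting key are deduplicated at merge time, keeping the one with the highest version.</p>

```sql
-- The latest version per user_id wins when parts are merged.
CREATE TABLE user_profiles
(
    user_id UInt64,
    email   String,
    version UInt64
)
ENGINE = ReplacingMergeTree(version)
ORDER BY user_id;

-- Merges happen in the background, so FINAL forces deduplication
-- at query time (slower, but always returns the latest row).
SELECT * FROM user_profiles FINAL WHERE user_id = 42;
```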
<p>These kinds of issues still occur today. Hopefully the lightweight updates feature, currently in beta, will make life easier.</p>
<hr />
<h2 id="heading-inserting-data-the-right-way">Inserting Data the Right Way</h2>
<p>Here's the biggest beginner trap.</p>
<p>ClickHouse loves large inserts. It hates small ones.</p>
<p>Every insert triggers background merges, index updates, compression, and part creation. Do that one row at a time and you'll overwhelm it. Batch your inserts into chunks, ideally thousands of rows at a time. You'll immediately see CPU drop and throughput increase dramatically.</p>
<p>If you're ingesting data continuously, throw it into a queue and batch it there. That's what we do at OpenPanel.dev. It smooths out traffic spikes and keeps our ingestion fast and predictable.</p>
<hr />
<h2 id="heading-replication-and-sharding">Replication and Sharding</h2>
<p>This isn't a bad thing about ClickHouse. In fact, it's one of its best features.<br />But I still want to cover a few parts that confused me when I first started using it.</p>
<p>There are three kinds of tables you'll deal with when setting up replication or sharding:</p>
<ul>
<li>Your existing table (usually <code>MergeTree</code> or something similar)</li>
<li>A replicated table (<code>ReplicatedMergeTree</code>)</li>
<li>A distributed table (<code>Distributed</code>)</li>
</ul>
<p>Each of these plays a different role in your cluster, and it's worth understanding them before you begin.</p>
<hr />
<p>First, decide how many replicas you want; most setups use three for high availability. To get replication working, you'll replace your existing table with a replicated one.</p>
<p>You can do that by creating a new table and swapping <code>MergeTree</code> for <code>ReplicatedMergeTree</code> in the engine section.  </p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> events_replicated <span class="hljs-keyword">ON</span> CLUSTER <span class="hljs-string">'{cluster}'</span> (
  ...
)
<span class="hljs-keyword">ENGINE</span> = ReplicatedMergeTree(
  <span class="hljs-string">'/clickhouse/{installation}/{cluster}/tables/{shard}/openpanel/v1/{table}'</span>,
  <span class="hljs-string">'{replica}'</span>
)
<span class="hljs-keyword">PARTITION</span> <span class="hljs-keyword">BY</span> toYYYYMM(created_at)
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> (created_at)
</code></pre>
<p>Once that's done, any data written to one node will be replicated to the others. ZooKeeper or ClickHouse Keeper handles the synchronization automatically.</p>
<p>If you want to move your data from your existing table to the replicated table you can use <code>INSERT SELECT</code> to do this.  </p>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> events_replicated <span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> <span class="hljs-keyword">events</span>;
</code></pre>
<hr />
<p>Now let's look at where sharding and <code>Distributed</code> tables come in.<br />Sharding is how you scale horizontally by splitting data into smaller chunks and spreading them across nodes. That said, it's usually better to scale vertically first, because ClickHouse handles vertical scaling surprisingly well.</p>
<p>If you decide to shard, you'll need to create a distributed table. A distributed table knows where your data lives and redirects queries to the right node.</p>
<p>When creating one, you define how data should be split across nodes. In the example below, the data is sharded using <code>cityHash64(project_id)</code>, which spreads rows evenly based on the <code>project_id</code>.  </p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> events_distributed <span class="hljs-keyword">ON</span> CLUSTER <span class="hljs-string">'{cluster}'</span> <span class="hljs-keyword">AS</span> events_replicated
<span class="hljs-keyword">ENGINE</span> = <span class="hljs-keyword">Distributed</span>(
  <span class="hljs-string">'{cluster}'</span>,
  currentDatabase(),
  events_replicated,
  cityHash64(project_id)
)
</code></pre>
<p>Now you can query data from any node, and ClickHouse will automatically route the request to where the data actually sits.</p>
<p>If you want to dig deeper, check out the official docs on Distributed tables and Replication.</p>
<hr />
<h2 id="heading-so-why-stick-with-it">So Why Stick With It?</h2>
<p>Because when it works, it's magic.</p>
<p>At OpenPanel we hit all these issues. Slow inserts, bad joins, tricky replication, and we still use ClickHouse every single day. Once you set it up correctly, nothing else compares. It's incredibly fast and scales far beyond what most relational databases can handle.</p>
<p>You just have to respect it. Treat it like a Ferrari, not a Corolla.</p>
<p>If you want me to go deeper into how we deploy and manage our own cluster on Kubernetes using the Altinity operator, let me know in the comments. I can show exactly how we keep it stable and cost-efficient.</p>
]]></content:encoded></item><item><title><![CDATA[Go: Wait Groups and Coordinating Multiple Goroutines]]></title><description><![CDATA[Channels are a multi-purpose concurrency tool in Go. In Part 1 of the book, we covered their main use cases:

Transferring data between goroutines.
Synchronizing goroutines (the done channel).
Canceling goroutines (the cancel channel).

Transferring ...]]></description><link>https://blog.fshtab.com/go-wait-groups-goroutine-synchronization</link><guid isPermaLink="true">https://blog.fshtab.com/go-wait-groups-goroutine-synchronization</guid><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Mon, 11 Dec 2023 15:10:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766406515384/947de4d3-bec0-4ede-bd0f-9b74327574a5.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Channels are a multi-purpose concurrency tool in Go. In Part 1 of the book, we covered their main use cases:</p>
<ul>
<li>Transferring data between goroutines.</li>
<li>Synchronizing goroutines (the <em>done</em> channel).</li>
<li>Canceling goroutines (the <em>cancel</em> channel).</li>
</ul>
<p>Transferring data is what channels were designed for, and they excel at it. For canceling goroutines, there is a special tool besides channels — a <em>context</em> (which we've also discussed). For synchronizing goroutines, there is also a special tool — a <em>wait group</em>. Let's talk about it.</p>
<hr />
<h2 id="heading-wait-group">Wait Group</h2>
<p>A wait group lets you wait for one or more goroutines to finish. We started with a wait group in the very first chapter on goroutines, and now we'll go into more detail.</p>
<p>Suppose we want to start a goroutine and wait for it to complete. Here's how to do it with a done channel:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    done := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{}, <span class="hljs-number">1</span>)

    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        time.Sleep(<span class="hljs-number">50</span> * time.Millisecond)
        fmt.Print(<span class="hljs-string">"."</span>)
        done &lt;- <span class="hljs-keyword">struct</span>{}{}
    }()

    &lt;-done
    fmt.Println(<span class="hljs-string">"done"</span>)
}
</code></pre>
<pre><code>.done
</code></pre><p>And here's how to do it with a wait group:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> wg sync.WaitGroup

    wg.Add(<span class="hljs-number">1</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        time.Sleep(<span class="hljs-number">50</span> * time.Millisecond)
        fmt.Print(<span class="hljs-string">"."</span>)
        wg.Done()
    }()

    wg.Wait()
    fmt.Println(<span class="hljs-string">"done"</span>)
}
</code></pre>
<pre><code>.done
</code></pre><p>Interestingly, a <code>WaitGroup</code> doesn't know anything about the goroutines it manages. It works with an internal counter. Calling <code>wg.Add(1)</code> increments the counter by one, while <code>wg.Done()</code> decrements it. <code>wg.Wait()</code> blocks the calling goroutine (in this case, <code>main</code>) until the counter reaches zero. So, <code>main()</code> waits for the called goroutine to finish before exiting.</p>
<p>The <code>WaitGroup.Go</code> method (Go 1.25+) automatically increments the wait group counter, runs a function in a goroutine, and decrements the counter when it's done. This means we can rewrite the example above without using <code>wg.Add()</code> and <code>wg.Done()</code>:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> wg sync.WaitGroup

    wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        time.Sleep(<span class="hljs-number">50</span> * time.Millisecond)
        fmt.Print(<span class="hljs-string">"."</span>)
    })

    wg.Wait()
    fmt.Println(<span class="hljs-string">"done"</span>)
}
</code></pre>
<pre><code>.done
</code></pre><p>In short, you can wait for a goroutine to finish using these methods:</p>
<ul>
<li>Done channel.</li>
<li>Wait group <code>Add</code>+<code>Done</code>+<code>Wait</code>.</li>
<li>Wait group <code>Go</code>+<code>Wait</code>.</li>
</ul>
<p>Typically, if you just need to wait for goroutines to complete without needing a result from them, you use a wait group instead of a done channel. The default choice should be the <code>Go</code> method, but in this chapter, I'll use <code>Add</code>+<code>Done</code> a lot because they do a better job of showing how things work internally.</p>
<hr />
<h2 id="heading-inner-world">Inner World</h2>
<p>As we discussed, the wait group knows nothing about goroutines and works with a counter instead. This simplifies the implementation a lot. Conceptually, you can think of the wait group like this:</p>
<pre><code class="lang-go"><span class="hljs-comment">// A WaitGroup waits for a collection of goroutines to finish.</span>
<span class="hljs-keyword">type</span> WaitGroup <span class="hljs-keyword">struct</span> {
    n <span class="hljs-keyword">int</span>
}

<span class="hljs-comment">// Add adds delta to the WaitGroup counter.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(wg *WaitGroup)</span> <span class="hljs-title">Add</span><span class="hljs-params">(delta <span class="hljs-keyword">int</span>)</span></span> {
    wg.n += delta
    <span class="hljs-keyword">if</span> wg.n &lt; <span class="hljs-number">0</span> {
        <span class="hljs-built_in">panic</span>(<span class="hljs-string">"negative counter"</span>)
    }
}

<span class="hljs-comment">// Done decrements the WaitGroup counter by one.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(wg *WaitGroup)</span> <span class="hljs-title">Done</span><span class="hljs-params">()</span></span> {
    wg.Add(<span class="hljs-number">-1</span>)
}

<span class="hljs-comment">// Wait blocks until the WaitGroup counter is zero.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(wg *WaitGroup)</span> <span class="hljs-title">Wait</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">for</span> wg.n &gt; <span class="hljs-number">0</span> {}
}
</code></pre>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> wg WaitGroup

    wg.Add(<span class="hljs-number">1</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        time.Sleep(<span class="hljs-number">50</span> * time.Millisecond)
        fmt.Print(<span class="hljs-string">"."</span>)
        wg.Done()
    }()

    wg.Wait()
    fmt.Println(<span class="hljs-string">"done"</span>)
}
</code></pre>
<pre><code>.done
</code></pre><p>Of course, in practice it's more complicated:</p>
<ul>
<li>All methods can be called concurrently from multiple goroutines. Modifying the shared variable <code>n</code> from multiple goroutines is unsafe — concurrent access can corrupt data (we'll talk more about this in the chapter on data races).</li>
<li>A loop-based <code>Wait</code> implementation will max out a CPU core until the loop finishes (this type of waiting is also known as <em>busy waiting</em>). Such code is strongly discouraged in production.</li>
</ul>
<p>However, our naive implementation shows the properties of a wait group that are also present in the actual <code>sync.WaitGroup</code>:</p>
<ul>
<li><code>Add</code> increments or decrements (if <code>delta &lt; 0</code>) the counter. Positive deltas are much more common, but technically nothing prevents you from calling <code>Add(-1)</code>.</li>
<li><code>Wait</code> blocks execution until the counter reaches 0. So if you call <code>Wait</code> before the first <code>Add</code>, the goroutine won't block.</li>
<li>After <code>Wait</code> completes, the wait group returns to its initial state (counter is 0). You can then reuse it.</li>
</ul>
<p>As for the <code>Go</code> method, it's a simple wrapper that combines <code>Add</code> and <code>Done</code>. Here's the complete implementation taken directly from the standard library code:</p>
<pre><code class="lang-go"><span class="hljs-comment">// https://github.com/golang/go/blob/master/src/sync/waitgroup.go</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(wg *WaitGroup)</span> <span class="hljs-title">Go</span><span class="hljs-params">(f <span class="hljs-keyword">func</span>()</span>)</span> {
    wg.Add(<span class="hljs-number">1</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> wg.Done()
        f()
    }()
}
</code></pre>
<p>Try changing the example above from <code>Add</code>+<code>Done</code> to <code>Go</code> and see if it works.</p>
<hr />
<h2 id="heading-value-vs-pointer">Value vs. Pointer</h2>
<p>Another important implementation nuance: you should pass the wait group as a pointer (<code>*WaitGroup</code>), not as a value (<code>WaitGroup</code>). Otherwise, each recipient will get its own copy with a duplicate counter, and synchronization won't work.</p>
<p>Here's an example of passing a value:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">runWork</span><span class="hljs-params">(wg sync.WaitGroup)</span></span> {
    wg.Add(<span class="hljs-number">1</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        time.Sleep(<span class="hljs-number">50</span> * time.Millisecond)
        fmt.Println(<span class="hljs-string">"work done"</span>)
        wg.Done()
    }()
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> wg sync.WaitGroup
    runWork(wg)
    wg.Wait()
    fmt.Println(<span class="hljs-string">"all done"</span>)
}
</code></pre>
<pre><code>all done
</code></pre><p><code>runWork</code> got a copy of the group and increased its counter with <code>Add</code>. Meanwhile, <code>main</code> has its own copy with a zero counter, so <code>Wait</code> didn't block execution. As a result, <code>main</code> finished without waiting for the <code>runWork</code> goroutine to complete.</p>
<p>Here's an example of passing a pointer:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">runWork</span><span class="hljs-params">(wg *sync.WaitGroup)</span></span> {
    wg.Add(<span class="hljs-number">1</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        time.Sleep(<span class="hljs-number">50</span> * time.Millisecond)
        fmt.Println(<span class="hljs-string">"work done"</span>)
        wg.Done()
    }()
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> wg sync.WaitGroup
    runWork(&amp;wg)
    wg.Wait()
    fmt.Println(<span class="hljs-string">"all done"</span>)
}
</code></pre>
<pre><code>work done
all done
</code></pre><p>Now <code>runWork</code> and <code>main</code> share the same instance of the group, so everything works as it should.</p>
<p>An even better approach would be not to pass the wait group around at all. Instead, we can encapsulate it in a separate type that hides the implementation details and provides a nice interface. Let's see how to do that.</p>
<hr />
<h2 id="heading-encapsulation">Encapsulation</h2>
<p>In Go, it's considered a good practice to hide synchronization details from clients calling your code. Fellow developers won't thank you for forcing them to deal with wait groups. It's better to encapsulate the synchronization logic in a separate function or type, and provide a convenient interface.</p>
<h3 id="heading-wrapper-functions">Wrapper Functions</h3>
<p>Let's say I wrote a function called <code>RunConc</code> that runs a set of given functions concurrently:</p>
<pre><code class="lang-go"><span class="hljs-comment">// RunConc executes functions concurrently.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">RunConc</span><span class="hljs-params">(wg *sync.WaitGroup, funcs ...<span class="hljs-keyword">func</span>()</span>)</span> {
    wg.Add(<span class="hljs-built_in">len</span>(funcs))
    <span class="hljs-keyword">for</span> _, fn := <span class="hljs-keyword">range</span> funcs {
        <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
            <span class="hljs-keyword">defer</span> wg.Done()
            fn()
        }()
    }
}
</code></pre>
<p>A client would use it like this:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> wg sync.WaitGroup

    RunConc(&amp;wg, work1, work2, work3)
    wg.Wait()
    fmt.Println(<span class="hljs-string">"all done"</span>)
}
</code></pre>
<p>This works, but the client still needs to create a wait group and call <code>Wait()</code>. We can improve this by hiding the wait group inside the function:</p>
<pre><code class="lang-go"><span class="hljs-comment">// RunConc executes functions concurrently and waits for them to finish.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">RunConc</span><span class="hljs-params">(funcs ...<span class="hljs-keyword">func</span>()</span>)</span> {
    <span class="hljs-keyword">var</span> wg sync.WaitGroup
    wg.Add(<span class="hljs-built_in">len</span>(funcs))
    <span class="hljs-keyword">for</span> _, fn := <span class="hljs-keyword">range</span> funcs {
        <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
            <span class="hljs-keyword">defer</span> wg.Done()
            fn()
        }()
    }
    wg.Wait()
}
</code></pre>
<p>Now the client code is much simpler:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    RunConc(work1, work2, work3)
    fmt.Println(<span class="hljs-string">"all done"</span>)
}
</code></pre>
<h3 id="heading-encapsulated-types">Encapsulated Types</h3>
<p>For more complex scenarios, you can create a type that encapsulates the wait group:</p>
<pre><code class="lang-go"><span class="hljs-comment">// ConcurrentGroup runs functions concurrently.</span>
<span class="hljs-keyword">type</span> ConcurrentGroup <span class="hljs-keyword">struct</span> {
    wg sync.WaitGroup
}

<span class="hljs-comment">// Run adds a function to the group and executes it in a goroutine.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(cg *ConcurrentGroup)</span> <span class="hljs-title">Run</span><span class="hljs-params">(fn <span class="hljs-keyword">func</span>()</span>)</span> {
    cg.wg.Add(<span class="hljs-number">1</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">defer</span> cg.wg.Done()
        fn()
    }()
}

<span class="hljs-comment">// Wait blocks until all functions in the group have finished.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(cg *ConcurrentGroup)</span> <span class="hljs-title">Wait</span><span class="hljs-params">()</span></span> {
    cg.wg.Wait()
}
</code></pre>
<p>The client code becomes:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> cg ConcurrentGroup

    cg.Run(work1)
    cg.Run(work2)
    cg.Run(work3)
    cg.Wait()
    fmt.Println(<span class="hljs-string">"all done"</span>)
}
</code></pre>
<p>In rare cases, a client may want to explicitly access your code's synchronization machinery. But usually it's better to encapsulate the synchronization logic.</p>
<hr />
<h2 id="heading-add-after-wait">Add after Wait</h2>
<p>Normally, all <code>Add</code> calls happen before <code>Wait</code>. But technically, there's nothing stopping us from doing some of the <code>Add</code> calls before <code>Wait</code> and some after (from another goroutine).</p>
<p>Let's say we have a function <code>runWork</code> that does its job in a separate goroutine:</p>
<pre><code class="lang-go"><span class="hljs-comment">// runWork performs work in a goroutine.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">runWork</span><span class="hljs-params">(wg *sync.WaitGroup)</span></span> {
    wg.Add(<span class="hljs-number">1</span>)
    fmt.Println(<span class="hljs-string">"starting work..."</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        time.Sleep(<span class="hljs-number">50</span> * time.Millisecond)
        fmt.Println(<span class="hljs-string">"work done"</span>)
        wg.Done()
    }()
}
</code></pre>
<p>We'll do the following:</p>
<ul>
<li>Start a <code>runWork</code> goroutine (worker);</li>
<li>Start another goroutine to wait for the work to finish (waiter);</li>
<li>Start two more workers;</li>
<li>When all three workers have finished, the waiter will wake up and signal completion to the <code>main</code> function.</li>
</ul>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// main wait group</span>
    <span class="hljs-keyword">var</span> wgMain sync.WaitGroup

    <span class="hljs-comment">// worker wait group</span>
    <span class="hljs-keyword">var</span> wgWork sync.WaitGroup

    <span class="hljs-comment">// run the first worker</span>
    runWork(&amp;wgWork)

    <span class="hljs-comment">// the waiter goroutine waits for all workers to finish,</span>
    <span class="hljs-comment">// and then completes the main wait group</span>
    wgMain.Add(<span class="hljs-number">1</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        fmt.Println(<span class="hljs-string">"waiting for work to be done..."</span>)
        wgWork.Wait()
        fmt.Println(<span class="hljs-string">"all work done"</span>)
        wgMain.Done()
    }()

    <span class="hljs-comment">// run two more workers after a while</span>
    time.Sleep(<span class="hljs-number">10</span> * time.Millisecond)
    runWork(&amp;wgWork)
    runWork(&amp;wgWork)

    <span class="hljs-comment">// executes when the waiter goroutine finishes</span>
    wgMain.Wait()
}
</code></pre>
<pre><code>starting work...
waiting <span class="hljs-keyword">for</span> work to be done...
starting work...
starting work...
work done
work done
work done
all work done
</code></pre><p>This works, but be careful: the <code>sync</code> documentation warns that <code>Add</code> calls with a positive delta that start when the counter is zero must happen before <code>Wait</code>. Here it's safe only because the first worker is still running when the other two are added, so the counter never drops to zero mid-flight. In practice, this pattern is rarely used.</p>
<hr />
<h2 id="heading-multiple-waits">Multiple Waits</h2>
<p>Another not-so-popular <code>WaitGroup</code> feature: you can call <code>Wait</code> from multiple goroutines. They will all block until the group's counter reaches zero.</p>
<p>For example, we can start one worker and three waiters:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> wg sync.WaitGroup

    <span class="hljs-comment">// worker</span>
    wg.Add(<span class="hljs-number">1</span>)
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-comment">// do stuff</span>
        time.Sleep(<span class="hljs-number">50</span> * time.Millisecond)
        fmt.Println(<span class="hljs-string">"work done"</span>)
        wg.Done()
    }()

    <span class="hljs-comment">// first waiter</span>
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        wg.Wait()
        fmt.Println(<span class="hljs-string">"waiter 1 done"</span>)
    }()

    <span class="hljs-comment">// second waiter</span>
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        wg.Wait()
        fmt.Println(<span class="hljs-string">"waiter 2 done"</span>)
    }()

    <span class="hljs-comment">// main waiter</span>
    wg.Wait()
    fmt.Println(<span class="hljs-string">"main waiter done"</span>)
}
</code></pre>
<pre><code>work done
waiter <span class="hljs-number">1</span> done
waiter <span class="hljs-number">2</span> done
main waiter done
</code></pre><p>All waiters unblock after the worker calls <code>wg.Done()</code>. But the order in which this happens is not guaranteed. Could be this:</p>
<pre><code>work done
waiter <span class="hljs-number">1</span> done
waiter <span class="hljs-number">2</span> done
main waiter done
</code></pre><p>Or this:</p>
<pre><code>work done
waiter <span class="hljs-number">1</span> done
main waiter done
waiter <span class="hljs-number">2</span> done
</code></pre><p>Or even this:</p>
<pre><code>work done
main waiter done
</code></pre><p>In the last case, the main waiter finished first, and then <code>main</code> exited before the other waiters could even print anything.</p>
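<p>If you need every waiter to finish before the program exits, one option is a second wait group that tracks the waiters themselves. Here's a sketch (the <code>runDemo</code> helper and the timings are made up for illustration):</p>

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runDemo starts one worker and two waiter goroutines, then reports
// how many waiters had finished by the time it returns.
func runDemo() int {
	var work sync.WaitGroup    // tracks the worker
	var waiters sync.WaitGroup // tracks the waiter goroutines
	var finished atomic.Int32

	work.Add(1)
	go func() {
		time.Sleep(10 * time.Millisecond) // simulate work
		work.Done()
	}()

	for i := 0; i < 2; i++ {
		waiters.Add(1)
		go func() {
			defer waiters.Done()
			work.Wait() // block until the worker is done
			finished.Add(1)
		}()
	}

	waiters.Wait() // unlike before, we now wait for the waiters too
	return int(finished.Load())
}

func main() {
	fmt.Println("waiters finished:", runDemo()) // waiters finished: 2
}
```

<p>With this arrangement, <code>main</code> can no longer exit while a waiter is still blocked on <code>Wait</code>.</p>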
<p>We'll see another use case for multiple <code>Wait</code>s in the chapter on semaphores.</p>
<hr />
<h2 id="heading-panic">Panic</h2>
<p>If multiple goroutines are involved in the wait group, there are multiple possible panic sources.</p>
<p>Let's say there's a <code>work</code> function that panics on even numbers:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">work</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">if</span> n := rand.IntN(<span class="hljs-number">9</span>) + <span class="hljs-number">1</span>; n%<span class="hljs-number">2</span> == <span class="hljs-number">0</span> {
        <span class="hljs-built_in">panic</span>(fmt.Errorf(<span class="hljs-string">"bad number: %d"</span>, n))
    }
    <span class="hljs-comment">// do stuff</span>
}
</code></pre>
<p>We start four <code>work</code> goroutines:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> wg sync.WaitGroup

    <span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> <span class="hljs-number">4</span> {
        wg.Go(work)
    }

    wg.Wait()
    fmt.Println(<span class="hljs-string">"work done"</span>)
}
</code></pre>
<pre><code>panic: bad number: <span class="hljs-number">8</span>

goroutine <span class="hljs-number">9</span> [running]:
main.work()
    /sandbox/src/main.go:<span class="hljs-number">19</span> +<span class="hljs-number">0x76</span>
sync.(*WaitGroup).Go.func1()
    /usr/local/go/src/sync/waitgroup.go:<span class="hljs-number">239</span> +<span class="hljs-number">0x4a</span>
created by sync.(*WaitGroup).Go <span class="hljs-keyword">in</span> goroutine <span class="hljs-number">1</span>
    /usr/local/go/src/sync/waitgroup.go:<span class="hljs-number">237</span> +<span class="hljs-number">0x73</span> (exit status <span class="hljs-number">2</span>)
</code></pre><p>And we face a panic (unless we are very lucky).</p>
<h3 id="heading-shared-recover">Shared Recover</h3>
<p>Let's add <code>recover</code> to catch the panic and run the program again:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">defer</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        val := <span class="hljs-built_in">recover</span>()
        <span class="hljs-keyword">if</span> val == <span class="hljs-literal">nil</span> {
            fmt.Println(<span class="hljs-string">"work done"</span>)
        } <span class="hljs-keyword">else</span> {
            fmt.Println(<span class="hljs-string">"panicked!"</span>)
        }
    }()

    <span class="hljs-keyword">var</span> wg sync.WaitGroup
    <span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> <span class="hljs-number">4</span> {
        wg.Go(work)
    }
    wg.Wait()
}
</code></pre>
<pre><code>panic: bad number: <span class="hljs-number">6</span>

goroutine <span class="hljs-number">10</span> [running]:
main.work()
    /sandbox/src/main.go:<span class="hljs-number">19</span> +<span class="hljs-number">0x76</span>
sync.(*WaitGroup).Go.func1()
    /usr/local/go/src/sync/waitgroup.go:<span class="hljs-number">239</span> +<span class="hljs-number">0x4a</span>
created by sync.(*WaitGroup).Go <span class="hljs-keyword">in</span> goroutine <span class="hljs-number">1</span>
    /usr/local/go/src/sync/waitgroup.go:<span class="hljs-number">237</span> +<span class="hljs-number">0x73</span> (exit status <span class="hljs-number">2</span>)
</code></pre><p>Nope. You might expect <code>recover</code> to catch the panic and print "panicked!". Instead, we get the same unhandled panic as before.</p>
<p>The problem is that <code>recover</code> has an important limitation: it only works within the <em>same goroutine</em> that caused the panic. In our case, the panic comes from the <code>work</code> goroutines, while <code>recover</code> runs in the <code>main</code> goroutine, so it catches nothing. Goroutines are completely independent, remember? A panic can only be caught inside the goroutine where it occurs.</p>
<h3 id="heading-per-goroutine-recover">Per-Goroutine Recover</h3>
<p>Let's move <code>recover</code> inside the <code>work</code> goroutines:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> wg sync.WaitGroup
    panicked := <span class="hljs-literal">false</span>

    catchPanic := <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        err := <span class="hljs-built_in">recover</span>()
        <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
            panicked = <span class="hljs-literal">true</span>
        }
    }

    <span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> <span class="hljs-number">4</span> {
        wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
            <span class="hljs-keyword">defer</span> catchPanic()
            work()
        })
    }

    wg.Wait()
    <span class="hljs-keyword">if</span> !panicked {
        fmt.Println(<span class="hljs-string">"work done"</span>)
    } <span class="hljs-keyword">else</span> {
        fmt.Println(<span class="hljs-string">"panicked!"</span>)
    }
}
</code></pre>
<pre><code>panicked!
</code></pre><p>The panic is now caught inside the goroutine where it occurred, which then sets the <code>panicked</code> flag shared with <code>main</code>. The program works as expected and prints "panicked!".</p>
<blockquote>
<p>Here we are modifying the shared <code>panicked</code> variable from multiple goroutines. In general, this is not a good practice because it leads to data races (we'll talk about them in the next chapter). But in this particular case, there's no real harm from races.</p>
</blockquote>
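<p>If you'd rather avoid the race entirely, the flag can be an <code>atomic.Bool</code> from the standard <code>sync/atomic</code> package. Here's a sketch (the <code>runAll</code> helper is made up for illustration):</p>

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runAll executes the functions concurrently, recovers from any panics,
// and reports whether at least one of them panicked.
func runAll(funcs ...func()) bool {
	var wg sync.WaitGroup
	var panicked atomic.Bool // safe to set from multiple goroutines

	for _, fn := range funcs {
		wg.Add(1)
		go func(fn func()) {
			defer wg.Done()
			defer func() {
				if recover() != nil {
					panicked.Store(true)
				}
			}()
			fn()
		}(fn)
	}

	wg.Wait()
	return panicked.Load()
}

func main() {
	ok := func() {}
	bad := func() { panic("bad number") }
	fmt.Println(runAll(ok, bad, ok)) // true
}
```

<p>This version passes the race detector, and as a bonus it encapsulates the whole recover dance behind one function.</p>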
<p>Key takeaway: you cannot catch a panic from "child" goroutines in the "parent" goroutine. If you want to catch a panic, do it in the goroutine where it happens.</p>
<hr />
<h2 id="heading-summary">Summary</h2>
<p>The wait group is used to wait for goroutines to finish. Now you understand how it works and how to apply it. Key points to remember:</p>
<ul>
<li><strong>Wait groups use an internal counter</strong> to track goroutines, not direct references to them.</li>
<li><strong>Always pass wait groups as pointers</strong> to ensure proper synchronization.</li>
<li><strong>Encapsulate wait groups</strong> in functions or types to hide implementation details from clients.</li>
<li><strong>Panics must be recovered</strong> in the same goroutine where they occur.</li>
<li><strong>Multiple waits are possible</strong> but execution order is not guaranteed.</li>
</ul>
<p>Wait groups provide a simple and efficient way to synchronize goroutines in concurrent Go programs.</p>
]]></content:encoded></item><item><title><![CDATA[Synchronizing Multiple Goroutines in Go: Four Key Approaches]]></title><description><![CDATA[In Go, the main goroutine often needs to wait for other goroutines to finish their tasks before continuing execution or exiting the program. This is a common requirement for concurrent synchronization. Go provides several mechanisms to achieve this, ...]]></description><link>https://blog.fshtab.com/waiting-for-multiple-goroutines-in-go</link><guid isPermaLink="true">https://blog.fshtab.com/waiting-for-multiple-goroutines-in-go</guid><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Wed, 29 Nov 2023 16:42:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766406520152/131efa4a-8c59-4de8-9f63-efa1617df59f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In Go, the main goroutine often needs to wait for other goroutines to finish their tasks before continuing execution or exiting the program. This is a common requirement for concurrent synchronization. Go provides several mechanisms to achieve this, depending on the scenario and requirements.</p>
<hr />
<h2 id="heading-method-1-using-syncwaitgroup">Method 1: Using sync.WaitGroup</h2>
<p><code>sync.WaitGroup</code> is the most commonly used synchronization tool in Go, designed to wait for a group of goroutines to finish their tasks. It works through a counter mechanism and is especially suitable when the main goroutine needs to wait for multiple sub-goroutines.</p>
<h3 id="heading-example-code">Example Code</h3>
<pre><code class="lang-go"><span class="hljs-keyword">package</span> main

<span class="hljs-keyword">import</span> (
    <span class="hljs-string">"fmt"</span>
    <span class="hljs-string">"sync"</span>
)

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> wg sync.WaitGroup

    <span class="hljs-comment">// Start 3 goroutines</span>
    <span class="hljs-keyword">for</span> i := <span class="hljs-number">1</span>; i &lt;= <span class="hljs-number">3</span>; i++ {
        wg.Add(<span class="hljs-number">1</span>) <span class="hljs-comment">// Increment the counter by 1</span>
        <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(id <span class="hljs-keyword">int</span>)</span></span> {
            <span class="hljs-keyword">defer</span> wg.Done() <span class="hljs-comment">// Decrement the counter by 1 when the task is done</span>
            fmt.Printf(<span class="hljs-string">"Goroutine %d is running\n"</span>, id)
        }(i)
    }

    wg.Wait() <span class="hljs-comment">// Main goroutine waits for all goroutines to finish</span>
    fmt.Println(<span class="hljs-string">"All goroutines finished"</span>)
}
</code></pre>
<p>Output (order may vary):</p>
<pre><code class="lang-shell">Goroutine 1 is running
Goroutine 2 is running
Goroutine 3 is running
All goroutines finished
</code></pre>
<p>How it works:</p>
<ol>
<li><code>wg.Add(n)</code>: Increases the counter to indicate the number of goroutines to wait for.</li>
<li><code>wg.Done()</code>: Called by each goroutine upon completion, decreases the counter by 1.</li>
<li><code>wg.Wait()</code>: Blocks the main goroutine until the counter reaches zero.</li>
</ol>
<p><strong>Advantages:</strong></p>
<ul>
<li>Simple and easy to use, suitable for a fixed number of goroutines.</li>
<li>No need for additional channels, low performance overhead.</li>
</ul>
<hr />
<h2 id="heading-method-2-using-channel">Method 2: Using Channel</h2>
<p>By passing signals through channels, the main goroutine can wait until all other goroutines have sent completion signals. This method is more flexible, but usually a bit more complex than WaitGroup.</p>
<h3 id="heading-example-code-1">Example Code</h3>
<pre><code class="lang-go"><span class="hljs-keyword">package</span> main

<span class="hljs-keyword">import</span> <span class="hljs-string">"fmt"</span>

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    done := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{}) <span class="hljs-comment">// A signal channel to notify completion</span>
    numGoroutines := <span class="hljs-number">3</span>

    <span class="hljs-keyword">for</span> i := <span class="hljs-number">1</span>; i &lt;= numGoroutines; i++ {
        <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(id <span class="hljs-keyword">int</span>)</span></span> {
            fmt.Printf(<span class="hljs-string">"Goroutine %d is running\n"</span>, id)
            done &lt;- <span class="hljs-keyword">struct</span>{}{} <span class="hljs-comment">// Send a signal when the task is done</span>
        }(i)
    }

    <span class="hljs-comment">// Wait for all goroutines to finish</span>
    <span class="hljs-keyword">for</span> i := <span class="hljs-number">0</span>; i &lt; numGoroutines; i++ {
        &lt;-done <span class="hljs-comment">// Receive signals</span>
    }
    fmt.Println(<span class="hljs-string">"All goroutines finished"</span>)
}
</code></pre>
<p>Output (order may vary):</p>
<pre><code class="lang-shell">Goroutine 1 is running
Goroutine 2 is running
Goroutine 3 is running
All goroutines finished
</code></pre>
<p>How it works:</p>
<ol>
<li>Each goroutine sends a signal to the <code>done</code> channel upon completion.</li>
<li>The main goroutine confirms that all tasks are done by receiving the specified number of signals.</li>
</ol>
<p><strong>Advantages:</strong></p>
<ul>
<li>High flexibility, can carry data (such as task results).</li>
<li>Suitable for a dynamic number of goroutines.</li>
</ul>
<p><strong>Disadvantages:</strong></p>
<ul>
<li>Need to manually manage the number of receives, which can make the code a bit cumbersome.</li>
</ul>
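<p>The "can carry data" advantage deserves a quick demo. Instead of an empty struct, the channel can deliver each task's result; here's a sketch (the <code>collect</code> helper and the squaring task are just an illustration):</p>

```go
package main

import "fmt"

// collect runs n tasks concurrently and sums their results, using the
// results channel both for synchronization and for data transfer.
func collect(n int, task func(id int) int) int {
	results := make(chan int, n) // buffered so senders never block

	for i := 1; i <= n; i++ {
		go func(id int) {
			results <- task(id) // the completion signal carries the result
		}(i)
	}

	sum := 0
	for i := 0; i < n; i++ {
		sum += <-results
	}
	return sum
}

func main() {
	sum := collect(3, func(id int) int { return id * id })
	fmt.Println("sum of squares:", sum) // sum of squares: 14
}
```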
<hr />
<h2 id="heading-method-3-controlling-exit-with-context">Method 3: Controlling Exit with Context</h2>
<p>Using <code>context.Context</code> allows you to gracefully control goroutine exits and have the main goroutine wait until all tasks are done. This method is especially useful in scenarios requiring cancellation or timeouts.</p>
<h3 id="heading-example-code-2">Example Code</h3>
<pre><code class="lang-go"><span class="hljs-keyword">package</span> main

<span class="hljs-keyword">import</span> (
    <span class="hljs-string">"context"</span>
    <span class="hljs-string">"fmt"</span>
    <span class="hljs-string">"sync"</span>
)

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    ctx, cancel := context.WithCancel(context.Background())
    <span class="hljs-keyword">var</span> wg sync.WaitGroup

    <span class="hljs-keyword">for</span> i := <span class="hljs-number">1</span>; i &lt;= <span class="hljs-number">3</span>; i++ {
        wg.Add(<span class="hljs-number">1</span>)
        <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(id <span class="hljs-keyword">int</span>)</span></span> {
            <span class="hljs-keyword">defer</span> wg.Done()
            <span class="hljs-keyword">select</span> {
            <span class="hljs-keyword">case</span> &lt;-ctx.Done():
                fmt.Printf(<span class="hljs-string">"Goroutine %d cancelled\n"</span>, id)
                <span class="hljs-keyword">return</span>
            <span class="hljs-keyword">default</span>:
                fmt.Printf(<span class="hljs-string">"Goroutine %d is running\n"</span>, id)
            }
        }(i)
    }

    <span class="hljs-comment">// Simulate task completion</span>
    cancel()       <span class="hljs-comment">// Send cancel signal</span>
    wg.Wait()      <span class="hljs-comment">// Wait for all goroutines to exit</span>
    fmt.Println(<span class="hljs-string">"All goroutines finished"</span>)
}
</code></pre>
<p>Output (may vary: since <code>cancel()</code> is called right away, some goroutines may print "cancelled" instead of "is running"):</p>
<pre><code class="lang-shell">Goroutine 1 is running
Goroutine 2 is running
Goroutine 3 is running
All goroutines finished
</code></pre>
<p>How it works:</p>
<ol>
<li>The context is used to notify goroutines to exit.</li>
<li>WaitGroup ensures that the main goroutine waits for all goroutines to complete.</li>
</ol>
<p><strong>Advantages:</strong></p>
<ul>
<li>Supports cancellation and timeout, suitable for complex concurrent scenarios.</li>
</ul>
<p><strong>Disadvantages:</strong></p>
<ul>
<li>Slightly more complex code.</li>
</ul>
<hr />
<h2 id="heading-method-4-using-errgroup-recommended">Method 4: Using errgroup (Recommended)</h2>
<p><code>golang.org/x/sync/errgroup</code> is an advanced tool that combines the waiting functionality of WaitGroup with error handling, making it especially suitable for waiting for a group of tasks and handling errors.</p>
<h3 id="heading-example-code-3">Example Code</h3>
<pre><code class="lang-go"><span class="hljs-keyword">package</span> main

<span class="hljs-keyword">import</span> (
    <span class="hljs-string">"fmt"</span>
    <span class="hljs-string">"golang.org/x/sync/errgroup"</span>
)

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> g errgroup.Group

    <span class="hljs-keyword">for</span> i := <span class="hljs-number">1</span>; i &lt;= <span class="hljs-number">3</span>; i++ {
        id := i <span class="hljs-comment">// copy the loop variable (not needed since Go 1.22)</span>
        g.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span> <span class="hljs-title">error</span></span> {
            fmt.Printf(<span class="hljs-string">"Goroutine %d is running\n"</span>, id)
            <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span> <span class="hljs-comment">// No error</span>
        })
    }

    <span class="hljs-keyword">if</span> err := g.Wait(); err != <span class="hljs-literal">nil</span> {
        fmt.Println(<span class="hljs-string">"Error:"</span>, err)
    } <span class="hljs-keyword">else</span> {
        fmt.Println(<span class="hljs-string">"All goroutines finished"</span>)
    }
}
</code></pre>
<p>Output:</p>
<pre><code class="lang-shell">Goroutine 1 is running
Goroutine 2 is running
Goroutine 3 is running
All goroutines finished
</code></pre>
<p>How it works:</p>
<ol>
<li><code>g.Go()</code> starts a goroutine and adds it to the group.</li>
<li><code>g.Wait()</code> waits for all goroutines to finish and returns the first non-nil error (if any).</li>
</ol>
<p><strong>Advantages:</strong></p>
<ul>
<li>Simple and elegant, supports error propagation.</li>
<li>Built-in context support (can use <code>errgroup.WithContext</code>).</li>
</ul>
<p><strong>Installation:</strong></p>
<ul>
<li>Requires <code>go get golang.org/x/sync/errgroup</code>.</li>
</ul>
<hr />
<h2 id="heading-which-method-to-choose">Which Method to Choose?</h2>
<h3 id="heading-syncwaitgroup">sync.WaitGroup</h3>
<ul>
<li><strong>Applicable scenarios:</strong> Simple tasks with a fixed number.</li>
<li><strong>Advantages:</strong> Simple and efficient.</li>
<li><strong>Disadvantages:</strong> Does not support error handling or cancellation.</li>
</ul>
<h3 id="heading-channel">Channel</h3>
<ul>
<li><strong>Applicable scenarios:</strong> Dynamic tasks or when results need to be passed.</li>
<li><strong>Advantages:</strong> Highly flexible.</li>
<li><strong>Disadvantages:</strong> Manual management is more complex.</li>
</ul>
<h3 id="heading-context">context</h3>
<ul>
<li><strong>Applicable scenarios:</strong> Complex situations where cancellation or timeout is required.</li>
<li><strong>Advantages:</strong> Supports cancellation and timeout.</li>
<li><strong>Disadvantages:</strong> Code is slightly more complex.</li>
</ul>
<h3 id="heading-errgroup">errgroup</h3>
<ul>
<li><strong>Applicable scenarios:</strong> Modern applications that require error handling and waiting.</li>
<li><strong>Advantages:</strong> Elegant and powerful.</li>
<li><strong>Disadvantages:</strong> Requires extra dependency.</li>
</ul>
<hr />
<h2 id="heading-others-why-doesnt-the-main-goroutine-just-sleep">Aside: Why Doesn't the Main Goroutine Just Sleep?</h2>
<p><code>time.Sleep</code> only introduces a fixed delay and cannot accurately wait for tasks to finish. This may cause the program to exit prematurely or lead to unnecessary waiting. Synchronization tools are more reliable.</p>
<hr />
<h2 id="heading-summary">Summary</h2>
<p>The most commonly used method for the main goroutine to wait for other goroutines is <code>sync.WaitGroup</code>, which is simple and efficient. If you need error handling or cancellation capabilities, <code>errgroup</code> or a combination with <code>context</code> is recommended. Choose the appropriate tool according to your specific requirements to ensure clear program logic and prevent resource leaks.</p>
]]></content:encoded></item><item><title><![CDATA[Go: Understanding Race Conditions and Atomic Synchronization]]></title><description><![CDATA[Preventing data races with mutexes may sound easy, but dealing with race conditions is a whole other matter. Let's learn how to handle these beasts!

Race Condition
Let's say we're keeping track of the money in users' accounts:
// Accounts - money in...]]></description><link>https://blog.fshtab.com/go-race-conditions-atomic-operations</link><guid isPermaLink="true">https://blog.fshtab.com/go-race-conditions-atomic-operations</guid><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Thu, 05 Oct 2023 17:33:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766406549310/2336784b-6480-42c5-91f1-9932e1491e10.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Preventing data races with mutexes may sound easy, but dealing with race conditions is a whole other matter. Let's learn how to handle these beasts!</p>
<hr />
<h2 id="heading-race-condition">Race Condition</h2>
<p>Let's say we're keeping track of the money in users' accounts:</p>
<pre><code class="lang-go"><span class="hljs-comment">// Accounts - money in users' accounts.</span>
<span class="hljs-keyword">type</span> Accounts <span class="hljs-keyword">struct</span> {
    bal <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>
    mu  sync.Mutex
}

<span class="hljs-comment">// NewAccounts creates a new set of accounts.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">NewAccounts</span><span class="hljs-params">(bal <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>)</span> *<span class="hljs-title">Accounts</span></span> {
    <span class="hljs-keyword">return</span> &amp;Accounts{bal: maps.Clone(bal)}
}
</code></pre>
<p>We can check the balance by username or change the balance:</p>
<pre><code class="lang-go"><span class="hljs-comment">// Get returns the user's balance.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(a *Accounts)</span> <span class="hljs-title">Get</span><span class="hljs-params">(name <span class="hljs-keyword">string</span>)</span> <span class="hljs-title">int</span></span> {
    a.mu.Lock()
    <span class="hljs-keyword">defer</span> a.mu.Unlock()
    <span class="hljs-keyword">return</span> a.bal[name]
}

<span class="hljs-comment">// Set changes the user's balance.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(a *Accounts)</span> <span class="hljs-title">Set</span><span class="hljs-params">(name <span class="hljs-keyword">string</span>, amount <span class="hljs-keyword">int</span>)</span></span> {
    a.mu.Lock()
    <span class="hljs-keyword">defer</span> a.mu.Unlock()
    a.bal[name] = amount
}
</code></pre>
<p>Account operations — <code>Get</code> and <code>Set</code> — are concurrent-safe, thanks to the mutex.</p>
<p>There's also a store that sells Lego sets:</p>
<pre><code class="lang-go"><span class="hljs-comment">// A Lego set.</span>
<span class="hljs-keyword">type</span> LegoSet <span class="hljs-keyword">struct</span> {
    name  <span class="hljs-keyword">string</span>
    price <span class="hljs-keyword">int</span>
}
</code></pre>
<p>Alice has 50 coins in her account. She wants to buy two sets: "Castle" for 40 coins and "Plants" for 20 coins:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    acc := NewAccounts(<span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>{
        <span class="hljs-string">"alice"</span>: <span class="hljs-number">50</span>,
    })
    castle := LegoSet{name: <span class="hljs-string">"Castle"</span>, price: <span class="hljs-number">40</span>}
    plants := LegoSet{name: <span class="hljs-string">"Plants"</span>, price: <span class="hljs-number">20</span>}

    <span class="hljs-keyword">var</span> wg sync.WaitGroup

    <span class="hljs-comment">// Alice buys a castle.</span>
    wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        balance := acc.Get(<span class="hljs-string">"alice"</span>)
        <span class="hljs-keyword">if</span> balance &lt; castle.price {
            <span class="hljs-keyword">return</span>
        }
        time.Sleep(<span class="hljs-number">5</span> * time.Millisecond)
        acc.Set(<span class="hljs-string">"alice"</span>, balance-castle.price)
        fmt.Println(<span class="hljs-string">"Alice bought the castle"</span>)
    })

    <span class="hljs-comment">// Alice buys plants.</span>
    wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        balance := acc.Get(<span class="hljs-string">"alice"</span>)
        <span class="hljs-keyword">if</span> balance &lt; plants.price {
            <span class="hljs-keyword">return</span>
        }
        time.Sleep(<span class="hljs-number">10</span> * time.Millisecond)
        acc.Set(<span class="hljs-string">"alice"</span>, balance-plants.price)
        fmt.Println(<span class="hljs-string">"Alice bought the plants"</span>)
    })

    wg.Wait()

    balance := acc.Get(<span class="hljs-string">"alice"</span>)
    fmt.Println(<span class="hljs-string">"Alice's balance:"</span>, balance)
}
</code></pre>
<pre><code>Alice bought the castle
Alice bought the plants
Alice<span class="hljs-string">'s balance: 30</span>
</code></pre><p>What a twist! Not only did Alice buy both sets for a total of 60 coins (even though she only had 50 coins), but she also ended up with 30 coins left! Great deal for Alice, not so great for us.</p>
<p>The problem is that checking and updating the balance is not an atomic operation:</p>
<pre><code class="lang-go"><span class="hljs-comment">// body of the second goroutine</span>
balance := acc.Get(<span class="hljs-string">"alice"</span>)             <span class="hljs-comment">// (1)</span>
<span class="hljs-keyword">if</span> balance &lt; plants.price {             <span class="hljs-comment">// (2)</span>
    <span class="hljs-keyword">return</span>
}
time.Sleep(<span class="hljs-number">10</span> * time.Millisecond)
acc.Set(<span class="hljs-string">"alice"</span>, balance-plants.price)  <span class="hljs-comment">// (3)</span>
</code></pre>
<p>At point ➊, we see a balance of 50 coins (the first goroutine hasn't done anything yet), so the check at ➋ passes. By point ➌, Alice has already bought the castle (the first goroutine has finished), so her actual balance is 10 coins. But we don't know this and still think her balance is 50 coins. So at point ➌, Alice buys the plants for 20 coins, and the balance becomes 30 coins (the "assumed" balance of 50 coins minus the 20 coins for the plants = 30 coins).</p>
<p>Individual actions on the balance are safe (there's no <em>data race</em>). However, balance reads/writes from different goroutines can get "mixed up", leading to an incorrect final balance. This situation is called a <em>race condition</em>.</p>
<p>You can't fully eliminate uncertainty in a concurrent environment. Events will happen in an unpredictable order — that's just how concurrency works. However, you can protect the system's state — in our case, the purchased sets and balance — so it stays correct no matter what order things happen in.</p>
<p>Let's check and update the balance in one atomic operation, protecting the entire purchase with a mutex. This way, purchases are processed strictly sequentially:</p>
<pre><code class="lang-go"><span class="hljs-comment">// Shared mutex.</span>
<span class="hljs-keyword">var</span> mu sync.Mutex

<span class="hljs-comment">// Alice buys a castle.</span>
wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// Protect the entire purchase with a mutex.</span>
    mu.Lock()
    <span class="hljs-keyword">defer</span> mu.Unlock()

    balance := acc.Get(<span class="hljs-string">"alice"</span>)
    <span class="hljs-keyword">if</span> balance &lt; castle.price {
        <span class="hljs-keyword">return</span>
    }
    time.Sleep(<span class="hljs-number">5</span> * time.Millisecond)
    acc.Set(<span class="hljs-string">"alice"</span>, balance-castle.price)
    fmt.Println(<span class="hljs-string">"Alice bought the castle"</span>)
})

<span class="hljs-comment">// Alice buys plants.</span>
wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// Protect the entire purchase with a mutex.</span>
    mu.Lock()
    <span class="hljs-keyword">defer</span> mu.Unlock()

    balance := acc.Get(<span class="hljs-string">"alice"</span>)
    <span class="hljs-keyword">if</span> balance &lt; plants.price {
        <span class="hljs-keyword">return</span>
    }
    time.Sleep(<span class="hljs-number">10</span> * time.Millisecond)
    acc.Set(<span class="hljs-string">"alice"</span>, balance-plants.price)
    fmt.Println(<span class="hljs-string">"Alice bought the plants"</span>)
})
</code></pre>
<pre><code>Alice bought the plants
Alice<span class="hljs-string">'s balance: 30</span>
</code></pre><p>One of the goroutines will run first, lock the mutex, check and update the balance, then unlock the mutex. Only after that will the second goroutine be able to lock the mutex and make its purchase.</p>
<p>We still can't be sure which purchase will happen — it depends on the order the goroutines run. But now we are certain that Alice won't buy more than she's supposed to, and the final balance will be correct:</p>
<pre><code>Alice bought the castle
Alice<span class="hljs-string">'s balance: 10</span>
</code></pre><p>Or:</p>
<pre><code>Alice bought the plants
Alice<span class="hljs-string">'s balance: 30</span>
</code></pre><p>To reiterate:</p>
<ul>
<li>A <em>data race</em> happens when multiple goroutines access shared data, and at least one of them modifies it. We need to protect the data from this kind of concurrent access.</li>
<li>A <em>race condition</em> happens when an unpredictable order of operations leads to an incorrect system state. In a concurrent environment, we can't control the exact order things happen. Still, we need to make sure that no matter the order, the system always ends up in the correct state.</li>
</ul>
<p>Go's race detector can find data races, but it doesn't catch race conditions. It's always up to the programmer to prevent race conditions.</p>
<hr />
<h2 id="heading-compare-and-set">Compare-and-Set</h2>
<p>Let's go back to the situation with the race condition before we added the mutex:</p>
<pre><code class="lang-go"><span class="hljs-comment">// Alice's balance = 50 coins.</span>
<span class="hljs-comment">// Castle price = 40 coins.</span>
<span class="hljs-comment">// Plants price = 20 coins.</span>

<span class="hljs-comment">// Alice buys a castle.</span>
wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
    balance := acc.Get(<span class="hljs-string">"alice"</span>)
    <span class="hljs-keyword">if</span> balance &lt; castle.price {
        <span class="hljs-keyword">return</span>
    }
    time.Sleep(<span class="hljs-number">5</span> * time.Millisecond)
    acc.Set(<span class="hljs-string">"alice"</span>, balance-castle.price)
    fmt.Println(<span class="hljs-string">"Alice bought the castle"</span>)
})

<span class="hljs-comment">// Alice buys plants.</span>
wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
    balance := acc.Get(<span class="hljs-string">"alice"</span>)
    <span class="hljs-keyword">if</span> balance &lt; plants.price {
        <span class="hljs-keyword">return</span>
    }
    time.Sleep(<span class="hljs-number">10</span> * time.Millisecond)
    acc.Set(<span class="hljs-string">"alice"</span>, balance-plants.price)
    fmt.Println(<span class="hljs-string">"Alice bought the plants"</span>)
})
</code></pre>
<pre><code>Alice bought the castle
Alice bought the plants
Alice<span class="hljs-string">'s balance: 30</span>
</code></pre><p>As we discussed, the reason for the incorrect final state is that buying a set (checking and updating the balance) is not an atomic operation:</p>
<pre><code class="lang-go"><span class="hljs-comment">// body of the second goroutine</span>
balance := acc.Get(<span class="hljs-string">"alice"</span>)             <span class="hljs-comment">// (1)</span>
<span class="hljs-keyword">if</span> balance &lt; plants.price {             <span class="hljs-comment">// (2)</span>
    <span class="hljs-keyword">return</span>
}
time.Sleep(<span class="hljs-number">10</span> * time.Millisecond)
acc.Set(<span class="hljs-string">"alice"</span>, balance-plants.price)  <span class="hljs-comment">// (3)</span>
</code></pre>
<p>At point ➊, we see a balance of 50 coins, so the check at ➋ passes. By point ➌, Alice has already bought the castle, so her actual balance is 10 coins. But we don't know this and still think her balance is 50 coins. So at point ➌, Alice buys the plants for 20 coins, and the balance becomes 30 coins (the "assumed" balance of 50 coins minus the 20 coins for the plants = 30 coins).</p>
<p>To solve the problem, we can protect the entire purchase with a mutex, just like we did before. But there's another way to handle it.</p>
<p>We can keep two separate operations (checking and updating the balance), but make sure they happen atomically. To do this, we'll use a <em>compare-and-set</em> pattern:</p>
<pre><code class="lang-go"><span class="hljs-comment">// Buy attempts to purchase a set for the given buyer.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(a *Accounts)</span> <span class="hljs-title">Buy</span><span class="hljs-params">(buyer <span class="hljs-keyword">string</span>, price <span class="hljs-keyword">int</span>)</span> <span class="hljs-title">bool</span></span> {
    a.mu.Lock()
    <span class="hljs-keyword">defer</span> a.mu.Unlock()

    balance := a.bal[buyer]
    <span class="hljs-keyword">if</span> balance &lt; price {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>
    }
    a.bal[buyer] = balance - price
    <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>
}
</code></pre>
<p>Now the purchase logic is atomic — checking and updating happen together, protected by the mutex. The client code becomes simpler:</p>
<pre><code class="lang-go"><span class="hljs-comment">// Alice buys a castle.</span>
wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">if</span> acc.Buy(<span class="hljs-string">"alice"</span>, castle.price) {
        fmt.Println(<span class="hljs-string">"Alice bought the castle"</span>)
    }
})

<span class="hljs-comment">// Alice buys plants.</span>
wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">if</span> acc.Buy(<span class="hljs-string">"alice"</span>, plants.price) {
        fmt.Println(<span class="hljs-string">"Alice bought the plants"</span>)
    }
})
</code></pre>
<pre><code>Alice bought the castle
Alice<span class="hljs-string">'s balance: 10</span>
</code></pre><p>Now only one purchase can succeed, and the final balance will always be correct.</p>
<hr />
<h2 id="heading-idempotence-and-atomicity">Idempotence and Atomicity</h2>
<p>An operation is <em>idempotent</em> if performing it multiple times has the same effect as performing it once. For example, setting a value is idempotent:</p>
<pre><code class="lang-go">acc.Set(<span class="hljs-string">"alice"</span>, <span class="hljs-number">30</span>)
acc.Set(<span class="hljs-string">"alice"</span>, <span class="hljs-number">30</span>)  <span class="hljs-comment">// Same effect as the first call</span>
</code></pre>
<p>But incrementing a value is not idempotent:</p>
<pre><code class="lang-go">acc.Set(<span class="hljs-string">"alice"</span>, acc.Get(<span class="hljs-string">"alice"</span>) + <span class="hljs-number">10</span>)
acc.Set(<span class="hljs-string">"alice"</span>, acc.Get(<span class="hljs-string">"alice"</span>) + <span class="hljs-number">10</span>)  <span class="hljs-comment">// Different effect!</span>
</code></pre>
<p>An operation is <em>atomic</em> if it appears to happen all at once from the perspective of other goroutines. Even if the operation involves multiple steps internally, other goroutines can't see intermediate states.</p>
<p>In our purchase example, the <code>Buy</code> method is atomic because the entire check-and-update operation happens while holding the mutex. Other goroutines can't see the balance between the check and the update.</p>
<hr />
<h2 id="heading-locker">Locker</h2>
<p>The <code>sync.Locker</code> interface provides a standard way to work with locks:</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> Locker <span class="hljs-keyword">interface</span> {
    Lock()
    Unlock()
}
</code></pre>
<p>Both <code>sync.Mutex</code> and <code>sync.RWMutex</code> implement this interface. This allows you to write code that works with any type of lock:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">doWork</span><span class="hljs-params">(lock sync.Locker)</span></span> {
    lock.Lock()
    <span class="hljs-keyword">defer</span> lock.Unlock()
    <span class="hljs-comment">// Do work...</span>
}
</code></pre>
<hr />
<h2 id="heading-trylock">TryLock</h2>
<p>Sometimes you want to try to acquire a lock and, if it's not available, return an error immediately instead of waiting.</p>
<p>We can use the <code>TryLock</code> method of a mutex to implement this logic:</p>
<pre><code class="lang-go"><span class="hljs-comment">// External is a client for an external system.</span>
<span class="hljs-keyword">type</span> External <span class="hljs-keyword">struct</span> {
    lock sync.Mutex
}

<span class="hljs-comment">// Call calls the external system.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(e *External)</span> <span class="hljs-title">Call</span><span class="hljs-params">()</span> <span class="hljs-title">error</span></span> {
    <span class="hljs-keyword">if</span> !e.lock.TryLock() {
        <span class="hljs-keyword">return</span> errors.New(<span class="hljs-string">"busy"</span>)  <span class="hljs-comment">// (1)</span>
    }
    <span class="hljs-keyword">defer</span> e.lock.Unlock()
    <span class="hljs-comment">// Simulate a remote call.</span>
    time.Sleep(<span class="hljs-number">100</span> * time.Millisecond)
    <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>
}
</code></pre>
<p><code>TryLock</code> tries to lock the mutex, just like a regular <code>Lock</code>. But if it can't, it returns <code>false</code> right away instead of blocking the goroutine. This way, we can immediately return an error at ➊ instead of waiting for the system to become available.</p>
<p>Now, out of four simultaneous calls, only one will go through. The others will get a "busy" error:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">const</span> nCalls = <span class="hljs-number">4</span>
    ex := <span class="hljs-built_in">new</span>(External)
    start := time.Now()

    <span class="hljs-keyword">var</span> wg sync.WaitGroup
    <span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> nCalls {
        wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
            err := ex.Call()
            <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
                fmt.Println(err)
            } <span class="hljs-keyword">else</span> {
                fmt.Println(<span class="hljs-string">"success"</span>)
            }
        })
    }
    wg.Wait()

    fmt.Printf(
        <span class="hljs-string">"%d calls took %d ms\n"</span>,
        nCalls, time.Since(start).Milliseconds(),
    )
}
</code></pre>
<pre><code>busy
busy
busy
success
<span class="hljs-number">4</span> calls took <span class="hljs-number">100</span> ms
</code></pre><p>According to the standard library docs, <code>TryLock</code> is rarely needed. In fact, using it might mean there's a problem with your program's design. For example, if you're calling <code>TryLock</code> in a busy-wait loop ("keep trying until the resource is free") — that's usually a bad sign:</p>
<pre><code class="lang-go"><span class="hljs-keyword">for</span> {
    <span class="hljs-keyword">if</span> mutex.TryLock() {
        <span class="hljs-comment">// Use the shared resource.</span>
        mutex.Unlock()
        <span class="hljs-keyword">break</span>
    }
}
</code></pre>
<p>This code will keep one CPU core at 100% usage until the mutex is unlocked. It's much better to use a regular <code>Lock</code> so the scheduler can take the blocked goroutine off the CPU.</p>
<hr />
<h2 id="heading-shared-nothing">Shared Nothing</h2>
<p>Let's go back one last time to Alice and the Lego sets we started the chapter with.</p>
<p>We manage user accounts:</p>
<pre><code class="lang-go"><span class="hljs-comment">// Accounts - money in users' accounts.</span>
<span class="hljs-keyword">type</span> Accounts <span class="hljs-keyword">struct</span> {
    bal <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>
    mu  sync.Mutex
}

<span class="hljs-comment">// NewAccounts creates a new set of accounts.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">NewAccounts</span><span class="hljs-params">(bal <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>)</span> *<span class="hljs-title">Accounts</span></span> {
    <span class="hljs-keyword">return</span> &amp;Accounts{bal: maps.Clone(bal)}
}

<span class="hljs-comment">// Get returns the user's balance.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(a *Accounts)</span> <span class="hljs-title">Get</span><span class="hljs-params">(name <span class="hljs-keyword">string</span>)</span> <span class="hljs-title">int</span></span> {
    a.mu.Lock()          <span class="hljs-comment">// (1)</span>
    <span class="hljs-keyword">defer</span> a.mu.Unlock()
    <span class="hljs-keyword">return</span> a.bal[name]
}

<span class="hljs-comment">// Set changes the user's balance.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(a *Accounts)</span> <span class="hljs-title">Set</span><span class="hljs-params">(name <span class="hljs-keyword">string</span>, amount <span class="hljs-keyword">int</span>)</span></span> {
    a.mu.Lock()          <span class="hljs-comment">// (2)</span>
    <span class="hljs-keyword">defer</span> a.mu.Unlock()
    a.bal[name] = amount
}
</code></pre>
<p>And handle purchases:</p>
<pre><code class="lang-go">acc := NewAccounts(<span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>{
    <span class="hljs-string">"alice"</span>: <span class="hljs-number">50</span>,
})
castle := LegoSet{name: <span class="hljs-string">"Castle"</span>, price: <span class="hljs-number">40</span>}
plants := LegoSet{name: <span class="hljs-string">"Plants"</span>, price: <span class="hljs-number">20</span>}

<span class="hljs-comment">// Shared mutex.</span>
<span class="hljs-keyword">var</span> mu sync.Mutex

<span class="hljs-comment">// Alice buys a castle.</span>
wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// Protect the entire purchase with a mutex.</span>
    mu.Lock()            <span class="hljs-comment">// (3)</span>
    <span class="hljs-keyword">defer</span> mu.Unlock()

    <span class="hljs-comment">// Check and update the balance.</span>
})

<span class="hljs-comment">// Alice buys plants.</span>
wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// Protect the entire purchase with a mutex.</span>
    mu.Lock()            <span class="hljs-comment">// (4)</span>
    <span class="hljs-keyword">defer</span> mu.Unlock()

    <span class="hljs-comment">// Check and update the balance.</span>
})
</code></pre>
<p>This isn't a very complex use case — I'm sure you've seen worse. Still, we had to put in some effort:</p>
<ul>
<li>Protect the balance with a mutex to prevent a data race ➊ ➋.</li>
<li>Protect the entire purchase operation with a mutex (or use compare-and-set) to make sure the final state is correct ➌ ➍.</li>
</ul>
<p>We were lucky to notice and prevent the race condition during a purchase. What if we had missed it?</p>
<p>There's another approach to achieving safe concurrency: instead of protecting shared state, we can avoid sharing state between goroutines altogether. Channels can help us do this.</p>
<p>Here's the idea: we'll create a <code>Processor</code> function that accepts purchase requests through an input channel, processes them, and sends the results back through an output channel:</p>
<pre><code class="lang-go"><span class="hljs-comment">// A purchase request.</span>
<span class="hljs-keyword">type</span> Request <span class="hljs-keyword">struct</span> {
    buyer <span class="hljs-keyword">string</span>
    set   LegoSet
}

<span class="hljs-comment">// A purchase result.</span>
<span class="hljs-keyword">type</span> Purchase <span class="hljs-keyword">struct</span> {
    buyer   <span class="hljs-keyword">string</span>
    set     LegoSet
    succeed <span class="hljs-keyword">bool</span>
    balance <span class="hljs-keyword">int</span> <span class="hljs-comment">// balance after purchase</span>
}

<span class="hljs-comment">// Processor handles purchases.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">Processor</span><span class="hljs-params">(acc <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>)</span> <span class="hljs-params">(<span class="hljs-keyword">chan</span>&lt;- Request, &lt;-<span class="hljs-keyword">chan</span> Purchase)</span></span> {
    <span class="hljs-comment">// ...</span>
}
</code></pre>
<p>Buyer goroutines will send requests to the processor's input channel and receive results (successful or failed purchases) from the output channel:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">const</span> buyer = <span class="hljs-string">"Alice"</span>
    acc := <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>{buyer: <span class="hljs-number">50</span>}

    wishlist := []LegoSet{
        {name: <span class="hljs-string">"Castle"</span>, price: <span class="hljs-number">40</span>},
        {name: <span class="hljs-string">"Plants"</span>, price: <span class="hljs-number">20</span>},
    }

    reqs, purs := Processor(acc)

    <span class="hljs-comment">// Alice buys stuff.</span>
    <span class="hljs-keyword">var</span> wg sync.WaitGroup
    <span class="hljs-keyword">for</span> _, set := <span class="hljs-keyword">range</span> wishlist {
        wg.Go(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
            reqs &lt;- Request{buyer: buyer, set: set}
            pur := &lt;-purs
            <span class="hljs-keyword">if</span> pur.succeed {
                fmt.Printf(<span class="hljs-string">"%s bought the %s\n"</span>, pur.buyer, pur.set.name)
                fmt.Printf(<span class="hljs-string">"%s's balance: %d\n"</span>, buyer, pur.balance)
            }
        })
    }
    wg.Wait()
}
</code></pre>
<pre><code>Alice bought the Plants
Alice<span class="hljs-string">'s balance: 30</span>
</code></pre><p>This approach offers several benefits:</p>
<ul>
<li>Buyer goroutines send their requests and get results without worrying about how the purchase is done.</li>
<li>All the buying logic is handled inside the processor goroutine.</li>
<li>No need for mutexes.</li>
</ul>
<p>All that's left is to implement the processor. How about this:</p>
<pre><code class="lang-go"><span class="hljs-comment">// Processor handles purchases.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">Processor</span><span class="hljs-params">(acc <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>)</span> <span class="hljs-params">(<span class="hljs-keyword">chan</span>&lt;- Request, &lt;-<span class="hljs-keyword">chan</span> Purchase)</span></span> {
    in := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> Request)
    out := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> Purchase)
    acc = maps.Clone(acc)

    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">for</span> {
            <span class="hljs-comment">// Receive the purchase request.</span>
            req := &lt;-in

            <span class="hljs-comment">// Handle the purchase.</span>
            balance := acc[req.buyer]
            pur := Purchase{buyer: req.buyer, set: req.set, balance: balance}
            <span class="hljs-keyword">if</span> balance &gt;= req.set.price {
                pur.balance -= req.set.price
                pur.succeed = <span class="hljs-literal">true</span>
                acc[req.buyer] = pur.balance
            } <span class="hljs-keyword">else</span> {
                pur.succeed = <span class="hljs-literal">false</span>
            }

            <span class="hljs-comment">// Send the result.</span>
            out &lt;- pur
        }
    }()

    <span class="hljs-keyword">return</span> in, out
}
</code></pre>
<blockquote>
<p>It would have been a good idea to add a way to stop the processor using context, but I decided not to do it to keep the code simple.</p>
</blockquote>
<p>The processor clones the original account states and works with its own copy. This approach makes sure there is no concurrent access to the accounts, so there are no races. Of course, we should avoid running two processors at the same time, or we could end up with two different versions of the truth.</p>
<p>It's not always easy to structure a program in a way that avoids shared state. But if you can, it's a good option.</p>
<hr />
<h2 id="heading-summary">Summary</h2>
<p>Now you know how to protect shared data (from data races) and sequences of operations (from race conditions) in a concurrent environment using mutexes. Be careful with them and always test your code thoroughly with the race detector enabled.</p>
<p>Use code reviews, because the race detector doesn't catch every data race and can't detect race conditions at all. Having someone else look over your code can be really helpful.</p>
<p>Key points to remember:</p>
<ul>
<li><strong>Data races</strong> occur when multiple goroutines access shared data concurrently, with at least one modifying it.</li>
<li><strong>Race conditions</strong> occur when an unpredictable order of operations leads to incorrect system state.</li>
<li><strong>Atomic operations</strong> ensure that check-and-update sequences happen as a single unit.</li>
<li><strong>Compare-and-set</strong> patterns help make operations atomic.</li>
<li><strong>Shared nothing architecture</strong> avoids concurrency issues by eliminating shared state.</li>
<li><strong>TryLock</strong> can be useful but is rarely needed and may indicate design problems.</li>
</ul>
<p>Safe concurrent programming requires careful design and thorough testing.</p>
]]></content:encoded></item><item><title><![CDATA[Go: Managing Time and Timing in Concurrent Applications]]></title><description><![CDATA[In this chapter, we'll explore various techniques for managing time in concurrent Go programs.

Throttling
Suppose we have work that needs to be done in large quantities:
func work() {
    // Something very important, but not very fast.
    time.Slee...]]></description><link>https://blog.fshtab.com/go-time-handling-concurrent-programs</link><guid isPermaLink="true">https://blog.fshtab.com/go-time-handling-concurrent-programs</guid><dc:creator><![CDATA[Fedor S]]></dc:creator><pubDate>Fri, 22 Sep 2023 13:55:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766406512359/338ec724-5861-4c59-945d-fad213c2d709.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this chapter, we'll explore various techniques for managing time in concurrent Go programs.</p>
<hr />
<h2 id="heading-throttling">Throttling</h2>
<p>Suppose we have work that needs to be done in large quantities:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">work</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// Something very important, but not very fast.</span>
    time.Sleep(<span class="hljs-number">100</span> * time.Millisecond)
}
</code></pre>
<p>The simplest approach is to process tasks sequentially:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    start := time.Now()

    work()
    work()
    work()
    work()

    fmt.Println(<span class="hljs-string">"4 calls took"</span>, time.Since(start))
}
</code></pre>
<pre><code><span class="hljs-number">4</span> calls took <span class="hljs-number">400</span>ms
</code></pre><p>Four calls of 100 ms each take a total of 400 ms when executed one after the other.</p>
<p>Of course, it's faster to do the work in parallel with N handlers like this:</p>
<ul>
<li>If there's a free handler, give it the task.</li>
<li>Otherwise, wait until one becomes available.</li>
</ul>
<p>In the "Channels" chapter we solved a similar problem using a semaphore. Recall the principle:</p>
<ul>
<li>Create an empty channel with a buffer size of N.</li>
<li>Before starting, a goroutine puts a token (some value) into the channel.</li>
<li>Once finished, the goroutine takes a token from the channel.</li>
</ul>
<p>Let's create a wrapper <code>throttle(n, fn)</code> to run the work concurrently with a cap on parallelism. We'll set up a <code>sema</code> channel and make sure that no more than <code>n</code> work functions are running at the same time:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">throttle</span><span class="hljs-params">(n <span class="hljs-keyword">int</span>, fn <span class="hljs-keyword">func</span>()</span>) <span class="hljs-params">(handle <span class="hljs-keyword">func</span>()</span>, <span class="hljs-title">wait</span> <span class="hljs-title">func</span><span class="hljs-params">()</span>)</span> {
    <span class="hljs-comment">// Semaphore for n goroutines.</span>
    sema := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{}, n)

    <span class="hljs-comment">// Execute fn functions concurrently, but not more than n at a time.</span>
    handle = <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        sema &lt;- <span class="hljs-keyword">struct</span>{}{}
        <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
            fn()
            &lt;-sema
        }()
    }

    <span class="hljs-comment">// Wait until all functions have finished.</span>
    wait = <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> n {
            sema &lt;- <span class="hljs-keyword">struct</span>{}{}
        }
    }

    <span class="hljs-keyword">return</span> handle, wait
}
</code></pre>
<p>Now the client calls the <code>work()</code> function through the wrapper, not directly:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    handle, wait := throttle(<span class="hljs-number">2</span>, work)
    start := time.Now()

    handle()
    handle()
    handle()
    handle()
    wait()

    fmt.Println(<span class="hljs-string">"4 calls took"</span>, time.Since(start))
}
</code></pre>
<pre><code><span class="hljs-number">4</span> calls took <span class="hljs-number">200</span>ms
</code></pre><p>Here's how it works:</p>
<ul>
<li>The first and second calls start processing immediately;</li>
<li>The third and fourth wait for the previous two to finish.</li>
</ul>
<p>With two handlers, 4 calls complete in 200 ms.</p>
<p>Such throttling works well when the parallelism level <code>n</code> and the individual <code>work()</code> times match (more or less) the rate of <code>handle()</code> calls. Then each call has a good chance of being processed immediately or with a small delay.</p>
<p>However, if calls arrive much faster than the handlers can process them, the system will bog down. Each <code>work()</code> will still take 100 ms, but <code>handle()</code> calls will hang, waiting for a slot in the semaphore. This isn't a big deal for data pipelines, but it could be problematic for online requests.</p>
<p>Sometimes, clients may prefer to get an immediate error when all handlers are busy. We need another approach for such cases.</p>
<hr />
<h2 id="heading-backpressure">Backpressure</h2>
<p>Let's change the <code>throttle()</code> logic:</p>
<ul>
<li>If there's room in the semaphore, execute the function.</li>
<li>Otherwise, return an error immediately.</li>
</ul>
<p>This way, the client doesn't have to wait for a stuck call.</p>
<p>The select statement will help us once again.</p>
<p>Before:</p>
<pre><code class="lang-go"><span class="hljs-comment">// Execute fn functions concurrently,</span>
<span class="hljs-comment">// but not more than n at a time.</span>
handle = <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
    sema &lt;- <span class="hljs-keyword">struct</span>{}{}
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        fn()
        &lt;-sema
    }()
}
</code></pre>
<p>After (note that <code>handle</code> now returns an <code>error</code>, so <code>throttle</code>'s signature changes accordingly):</p>
<pre><code class="lang-go"><span class="hljs-comment">// Execute fn functions concurrently,</span>
<span class="hljs-comment">// but not more than n at a time.</span>
handle = <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span> <span class="hljs-title">error</span></span> {
    <span class="hljs-keyword">select</span> {
    <span class="hljs-keyword">case</span> sema &lt;- <span class="hljs-keyword">struct</span>{}{}:
        <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
            fn()
            &lt;-sema
        }()
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>
    <span class="hljs-keyword">default</span>:
        <span class="hljs-keyword">return</span> errors.New(<span class="hljs-string">"busy"</span>)
    }
}
</code></pre>
<p>Let's recall how select works:</p>
<ul>
<li>Checks which cases are not blocked.</li>
<li>If multiple cases are ready, randomly selects one to execute.</li>
<li>If all cases are blocked, waits until one is ready.</li>
</ul>
<p>The third point (all cases are blocked) actually splits into two:</p>
<ul>
<li>If there's <em>no default</em> case, select waits until one is ready.</li>
<li>If there is a <em>default</em> case, select executes it.</li>
</ul>
<p>The default case is perfect for our situation:</p>
<ul>
<li>If there's a token in the <code>sema</code> channel, we run <code>fn</code>.</li>
<li>Otherwise, we return a "busy" error without waiting.</li>
</ul>
<p>Let's look at the client:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    handle, wait := throttle(<span class="hljs-number">2</span>, work)

    start := time.Now()

    err := handle()
    fmt.Println(<span class="hljs-string">"1st call, error:"</span>, err)

    err = handle()
    fmt.Println(<span class="hljs-string">"2nd call, error:"</span>, err)

    err = handle()
    fmt.Println(<span class="hljs-string">"3rd call, error:"</span>, err)

    err = handle()
    fmt.Println(<span class="hljs-string">"4th call, error:"</span>, err)

    wait()

    fmt.Println(<span class="hljs-string">"4 calls took"</span>, time.Since(start))
}
</code></pre>
<pre><code><span class="hljs-number">1</span>st call, <span class="hljs-attr">error</span>: &lt;nil&gt;
2nd call, error: &lt;nil&gt;
3rd call, error: busy
4th call, error: busy
4 calls took 100ms
</code></pre><p>The first two calls ran concurrently (each took 100 ms), while the third and fourth got an error immediately. All calls were handled in 100 ms.</p>
<p>Of course, this approach (sometimes called <em>backpressure</em>) requires some awareness on the part of the client. The client should understand that a "busy" error means overload, and either delay further <code>handle()</code> calls or reduce their frequency.</p>
<hr />
<h2 id="heading-operation-timeout">Operation Timeout</h2>
<p>Here's a function that normally takes 10 ms, but in 20% of the calls it takes 200 ms:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">work</span><span class="hljs-params">()</span> <span class="hljs-title">int</span></span> {
    <span class="hljs-keyword">if</span> rand.Intn(<span class="hljs-number">10</span>) &lt; <span class="hljs-number">8</span> {
        time.Sleep(<span class="hljs-number">10</span> * time.Millisecond)
    } <span class="hljs-keyword">else</span> {
        time.Sleep(<span class="hljs-number">200</span> * time.Millisecond)
    }
    <span class="hljs-keyword">return</span> <span class="hljs-number">42</span>
}
</code></pre>
<p>Let's say we don't want to wait more than 50 ms. So, we set a <em>timeout</em> — the maximum time we're willing to wait for a response. If the operation doesn't complete within the timeout, we'll consider it an error.</p>
<p>Let's create a wrapper that runs the given function with the given timeout:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">withTimeout</span><span class="hljs-params">(timeout time.Duration, fn <span class="hljs-keyword">func</span>()</span> <span class="hljs-title">int</span>) <span class="hljs-params">(<span class="hljs-keyword">int</span>, error)</span></span> {
    <span class="hljs-comment">// ...</span>
}
</code></pre>
<p>We'll call it like this:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> <span class="hljs-number">10</span> {
        start := time.Now()
        timeout := <span class="hljs-number">50</span> * time.Millisecond
        <span class="hljs-keyword">if</span> answer, err := withTimeout(timeout, work); err != <span class="hljs-literal">nil</span> {
            fmt.Printf(<span class="hljs-string">"Took longer than %v. Error: %v\n"</span>, time.Since(start), err)
        } <span class="hljs-keyword">else</span> {
            fmt.Printf(<span class="hljs-string">"Took %v. Result: %v\n"</span>, time.Since(start), answer)
        }
    }
}
</code></pre>
<pre><code>Took <span class="hljs-number">10</span>ms. Result: <span class="hljs-number">42</span>
Took <span class="hljs-number">10</span>ms. Result: <span class="hljs-number">42</span>
Took <span class="hljs-number">10</span>ms. Result: <span class="hljs-number">42</span>
Took longer than <span class="hljs-number">50</span>ms. Error: timeout
Took <span class="hljs-number">10</span>ms. Result: <span class="hljs-number">42</span>
Took longer than <span class="hljs-number">50</span>ms. Error: timeout
Took <span class="hljs-number">10</span>ms. Result: <span class="hljs-number">42</span>
Took <span class="hljs-number">10</span>ms. Result: <span class="hljs-number">42</span>
Took <span class="hljs-number">10</span>ms. Result: <span class="hljs-number">42</span>
Took <span class="hljs-number">10</span>ms. Result: <span class="hljs-number">42</span>
</code></pre><p>Here's the idea behind <code>withTimeout()</code>:</p>
<ul>
<li>Run the given <code>fn()</code> in a separate goroutine.</li>
<li>Wait for the <code>timeout</code> period.</li>
<li>If <code>fn()</code> returns a result, return it.</li>
<li>If it doesn't finish in time, return an error.</li>
</ul>
<p>Here's how you can implement it:</p>
<pre><code class="lang-go"><span class="hljs-comment">// withTimeout executes a function with a given timeout.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">withTimeout</span><span class="hljs-params">(timeout time.Duration, fn <span class="hljs-keyword">func</span>()</span> <span class="hljs-title">int</span>) <span class="hljs-params">(<span class="hljs-keyword">int</span>, error)</span></span> {
    <span class="hljs-keyword">var</span> result <span class="hljs-keyword">int</span>

    done := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{})
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        result = fn()
        <span class="hljs-built_in">close</span>(done)
    }()

    <span class="hljs-keyword">select</span> {
    <span class="hljs-keyword">case</span> &lt;-done:
        <span class="hljs-keyword">return</span> result, <span class="hljs-literal">nil</span>
    <span class="hljs-keyword">case</span> &lt;-time.After(timeout):
        <span class="hljs-keyword">return</span> <span class="hljs-number">0</span>, errors.New(<span class="hljs-string">"timeout"</span>)
    }
}
</code></pre>
<p>The <code>time.After(d)</code> function returns a channel that will receive a value after duration <code>d</code>. It's a convenient way to implement timeouts.</p>
<hr />
<h2 id="heading-timer">Timer</h2>
<p>A timer is a mechanism for executing code after a certain delay. Go provides the <code>time.Timer</code> type for this purpose.</p>
<h3 id="heading-basic-timer-usage">Basic Timer Usage</h3>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    timer := time.NewTimer(<span class="hljs-number">2</span> * time.Second)
    &lt;-timer.C
    fmt.Println(<span class="hljs-string">"Timer fired!"</span>)
}
</code></pre>
<p><code>NewTimer(d)</code> creates a timer that will send the current time to channel <code>C</code> after duration <code>d</code>. You can stop the timer before it fires using <code>Stop()</code>:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    timer := time.NewTimer(<span class="hljs-number">2</span> * time.Second)

    <span class="hljs-comment">// The goroutine blocks on the timer channel;</span>
    <span class="hljs-comment">// if the timer is stopped, it never fires, and the</span>
    <span class="hljs-comment">// goroutine simply exits together with main.</span>
    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        &lt;-timer.C
        fmt.Println(<span class="hljs-string">"Timer fired!"</span>)
    }()

    time.Sleep(<span class="hljs-number">1</span> * time.Second)
    <span class="hljs-keyword">if</span> timer.Stop() {
        fmt.Println(<span class="hljs-string">"Timer stopped"</span>)
    }
}
</code></pre>
<h3 id="heading-timer-reset">Timer Reset</h3>
<p>Sometimes you need to reset a timer to extend or restart its duration. However, there are important considerations when resetting timers.</p>
<p>Let's consider a consumer that processes tokens and logs a warning if no tokens arrive within a timeout period:</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> token <span class="hljs-keyword">struct</span>{}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">consumer</span><span class="hljs-params">(cancel &lt;-<span class="hljs-keyword">chan</span> token, in &lt;-<span class="hljs-keyword">chan</span> token)</span></span> {
    <span class="hljs-keyword">const</span> timeout = time.Hour
    <span class="hljs-keyword">for</span> {
        <span class="hljs-keyword">select</span> {
        <span class="hljs-keyword">case</span> &lt;-in:
            <span class="hljs-comment">// do stuff</span>
        <span class="hljs-keyword">case</span> &lt;-time.After(timeout):
            <span class="hljs-comment">// log warning</span>
        <span class="hljs-keyword">case</span> &lt;-cancel:
            <span class="hljs-keyword">return</span>
        }
    }
}
</code></pre>
<p>Let's write a client that measures the memory usage after 100K channel sends:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    cancel := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> token)
    <span class="hljs-keyword">defer</span> <span class="hljs-built_in">close</span>(cancel)

    tokens := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> token)
    <span class="hljs-keyword">go</span> consumer(cancel, tokens)

    measure(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">for</span> <span class="hljs-keyword">range</span> <span class="hljs-number">100000</span> {
            tokens &lt;- token{}
        }
    })
}
</code></pre>
<pre><code>Memory used: <span class="hljs-number">24223</span> KB, # allocations: <span class="hljs-number">300011</span>
</code></pre><p>Behind the scenes, each <code>time.After</code> call creates a new timer, and before Go 1.23 that timer was not garbage-collected until it fired (an hour later, in our case). So our for loop is essentially creating a myriad of pending timers, doing a lot of allocations, and piling up unnecessary work for the GC. This is usually not what we want.</p>
<p>To avoid creating a timer on each loop iteration, you can create it at the beginning and reset it before moving on to the next iteration. The <code>Reset</code> method in Go 1.23+ is perfect for this:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">consumer</span><span class="hljs-params">(cancel &lt;-<span class="hljs-keyword">chan</span> token, in &lt;-<span class="hljs-keyword">chan</span> token)</span></span> {
    <span class="hljs-keyword">const</span> timeout = time.Hour
    timer := time.NewTimer(timeout)
    <span class="hljs-keyword">for</span> {
        timer.Reset(timeout)
        <span class="hljs-keyword">select</span> {
        <span class="hljs-keyword">case</span> &lt;-in:
            <span class="hljs-comment">// do stuff</span>
        <span class="hljs-keyword">case</span> &lt;-timer.C:
            <span class="hljs-comment">// log warning</span>
        <span class="hljs-keyword">case</span> &lt;-cancel:
            <span class="hljs-keyword">return</span>
        }
    }
}
</code></pre>
<pre><code>Memory used: <span class="hljs-number">0</span> KB, # allocations: <span class="hljs-number">2</span>
</code></pre><p>This approach does not create new timers, so the GC does not need to collect them.</p>
<h3 id="heading-reset-in-go-pre-123">Reset in Go pre-1.23</h3>
<p>Due to implementation quirks in Go versions prior to 1.23, <code>Reset</code> should only be called on an already stopped or expired timer with an empty output channel. So, to reset the timer correctly, you have to use a helper function:</p>
<pre><code class="lang-go"><span class="hljs-comment">// resetTimer stops, drains and resets the timer.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">resetTimer</span><span class="hljs-params">(t *time.Timer, d time.Duration)</span></span> {
    <span class="hljs-keyword">if</span> !t.Stop() {
        <span class="hljs-keyword">select</span> {
        <span class="hljs-keyword">case</span> &lt;-t.C:
        <span class="hljs-keyword">default</span>:
        }
    }
    t.Reset(d)
}
</code></pre>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">consumer</span><span class="hljs-params">(cancel &lt;-<span class="hljs-keyword">chan</span> token, in &lt;-<span class="hljs-keyword">chan</span> token)</span></span> {
    <span class="hljs-keyword">const</span> timeout = time.Hour
    timer := time.NewTimer(timeout)
    <span class="hljs-keyword">for</span> {
        resetTimer(timer, timeout)
        <span class="hljs-keyword">select</span> {
        <span class="hljs-keyword">case</span> &lt;-in:
            <span class="hljs-comment">// do stuff</span>
        <span class="hljs-keyword">case</span> &lt;-timer.C:
            <span class="hljs-comment">// log warning</span>
        <span class="hljs-keyword">case</span> &lt;-cancel:
            <span class="hljs-keyword">return</span>
        }
    }
}
</code></pre>
<pre><code>Memory used: <span class="hljs-number">0</span> KB, # allocations: <span class="hljs-number">2</span>
</code></pre><h3 id="heading-timeafterfunc">time.AfterFunc</h3>
<p>To make matters worse, <code>time.AfterFunc</code> also creates a timer, but a very different one. It has a nil <code>C</code> channel, so the <code>Reset</code> method works differently:</p>
<ul>
<li>If the timer is still active (not stopped, not expired), <code>Reset</code> clears the timeout, effectively restarting the timer.</li>
<li>If the timer is already stopped or expired, <code>Reset</code> schedules a new function execution.</li>
</ul>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">var</span> start time.Time

    work := <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        fmt.Printf(<span class="hljs-string">"work done after %dms\n"</span>, time.Since(start).Milliseconds())
    }

    <span class="hljs-comment">// run work after 10 milliseconds</span>
    timeout := <span class="hljs-number">10</span> * time.Millisecond
    start = time.Now()  <span class="hljs-comment">// ignore the data race for simplicity</span>
    t := time.AfterFunc(timeout, work)

    <span class="hljs-comment">// wait for 5 to 15 milliseconds</span>
    delay := time.Duration(<span class="hljs-number">5</span>+rand.Intn(<span class="hljs-number">11</span>)) * time.Millisecond
    time.Sleep(delay)
    fmt.Printf(<span class="hljs-string">"%dms has passed...\n"</span>, delay.Milliseconds())

    <span class="hljs-comment">// Reset behavior depends on whether the timer has expired</span>
    t.Reset(timeout)
    start = time.Now()

    time.Sleep(<span class="hljs-number">50</span> * time.Millisecond)
}
</code></pre>
<p>If the timer has not expired, <code>Reset</code> clears the timeout:</p>
<pre><code><span class="hljs-number">8</span>ms has passed...
work done after <span class="hljs-number">10</span>ms
</code></pre><p>If the timer has expired, <code>Reset</code> schedules a new function call:</p>
<pre><code>work done after <span class="hljs-number">10</span>ms
<span class="hljs-number">13</span>ms has passed...
work done after <span class="hljs-number">10</span>ms
</code></pre><p>To reiterate:</p>
<ul>
<li>Go ≤ 1.22: For a <code>Timer</code> created with <code>NewTimer</code>, <code>Reset</code> should only be called on stopped or expired timers with drained channels.</li>
<li>Go ≥ 1.23: For a <code>Timer</code> created with <code>NewTimer</code>, it's safe to call <code>Reset</code> on timers in any state (active, stopped or expired). No channel drain is required.</li>
<li>For a <code>Timer</code> created with <code>AfterFunc</code>, <code>Reset</code> either reschedules the function (if the timer is still active) or schedules the function to run again (if the timer has stopped or expired).</li>
</ul>
<p>Timers are not the most obvious things in Go, are they?</p>
<hr />
<h2 id="heading-ticker">Ticker</h2>
<p>Sometimes you want to perform an action at regular intervals. There's a tool for this in Go called a <em>ticker</em>. A ticker is like a timer, but it keeps firing until you stop it:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">work</span><span class="hljs-params">(at time.Time)</span></span> {
    fmt.Printf(<span class="hljs-string">"%s: work done\n"</span>, at.Format(<span class="hljs-string">"15:04:05.000"</span>))
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    ticker := time.NewTicker(<span class="hljs-number">50</span> * time.Millisecond)
    <span class="hljs-keyword">defer</span> ticker.Stop()

    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">for</span> {
            at := &lt;-ticker.C
            work(at)
        }
    }()

    <span class="hljs-comment">// enough for 5 ticks</span>
    time.Sleep(<span class="hljs-number">260</span> * time.Millisecond)
}
</code></pre>
<pre><code><span class="hljs-number">07</span>:<span class="hljs-number">20</span>:<span class="hljs-number">00.150</span>: work done
<span class="hljs-number">07</span>:<span class="hljs-number">20</span>:<span class="hljs-number">00.200</span>: work done
<span class="hljs-number">07</span>:<span class="hljs-number">20</span>:<span class="hljs-number">00.250</span>: work done
<span class="hljs-number">07</span>:<span class="hljs-number">20</span>:<span class="hljs-number">00.300</span>: work done
<span class="hljs-number">07</span>:<span class="hljs-number">20</span>:<span class="hljs-number">00.350</span>: work done
</code></pre><p><code>NewTicker(d)</code> creates a ticker that sends the current time to the channel <code>C</code> at interval <code>d</code>. You must stop the ticker eventually with <code>Stop()</code> to free up resources.</p>
<p>In our case, the interval is 50 ms, which allows for 5 ticks.</p>
<p>If the channel reader can't keep up with the ticker, the ticker will skip ticks:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">work</span><span class="hljs-params">(at time.Time)</span></span> {
    fmt.Printf(<span class="hljs-string">"%s: work done\n"</span>, at.Format(<span class="hljs-string">"15:04:05.000"</span>))
    time.Sleep(<span class="hljs-number">100</span> * time.Millisecond)
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    ticker := time.NewTicker(<span class="hljs-number">50</span> * time.Millisecond)
    <span class="hljs-keyword">defer</span> ticker.Stop()

    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">for</span> {
            at := &lt;-ticker.C
            work(at)
        }
    }()

    <span class="hljs-comment">// enough for 3 ticks because of the slow work()</span>
    time.Sleep(<span class="hljs-number">260</span> * time.Millisecond)
}
</code></pre>
<pre><code><span class="hljs-number">07</span>:<span class="hljs-number">20</span>:<span class="hljs-number">00.150</span>: work done
<span class="hljs-number">07</span>:<span class="hljs-number">20</span>:<span class="hljs-number">00.200</span>: work done
<span class="hljs-number">07</span>:<span class="hljs-number">20</span>:<span class="hljs-number">00.300</span>: work done
</code></pre><p>In this case, the receiver starts to fall behind after the second tick.</p>
<p>As you can see, the ticks don't pile up; they adapt to the slow receiver.</p>
<hr />
<h2 id="heading-summary">Summary</h2>
<p>Now you know that handling time in concurrent programs is not about (ab)using <code>time.Sleep</code>. Here are some useful tools you've learned:</p>
<ul>
<li><strong>Timeouts</strong> limit operation time.</li>
<li><strong>Timers</strong> help with delayed operations.</li>
<li><strong>Tickers</strong> are for periodic actions.</li>
<li><strong>Default case in select</strong> allows non-blocking processing.</li>
</ul>
<p>These tools provide efficient and reliable ways to manage time-based operations in concurrent Go programs.</p>
]]></content:encoded></item></channel></rss>