Today we are thrilled to announce the Turing Pi V2. We have been collecting a massive amount of feedback, synthesizing it into product variations, and thinking about what to do next. Today we are announcing only some of the fundamental details of the V2. We are still validating some of our hypotheses, but believe us, the other parts of the announcement will be exciting too. We will disclose more closer to opening orders.

After we released the original 7-node Turing Pi, we had many questions: what to do next, how to increase the product's value, which compute modules to choose, how many nodes, and so on. We decided to define a minimal cluster block with options to connect hard drives and extension boards. The cluster block should be a self-sufficient base unit with ample room to scale. Next, we wanted these minimal cluster blocks to be able to connect to each other and form cluster federations while remaining cost-efficient and easy to scale: scaling should be faster than connecting regular computers over the network and cheaper than typical server hardware. Finally, the minimal cluster unit should be compact, mobile, energy-efficient, cost-effective, and easy to maintain. This is one of the key differences from server racks and everything related to them.

Number of nodes

To determine the minimal cluster unit, we started by picking the optimal number of nodes. Through simple reasoning, we concluded that a 4-node cluster is the best option. One node is not a cluster. Two nodes are not enough: 1 master and 1 worker leave few options to scale within the block, especially for heterogeneous setups. Three nodes look better but are still limited in how the block can scale. More than 4 nodes would be either too expensive for a proof of concept (PoC) or difficult to fit into the mini-ITX form factor, so we settled on 4 nodes.

We believe 4 nodes is the golden median because:

  • A lower production cost for the cluster board, which results in a more affordable fully equipped cluster
  • A more solid architecture with 1 main node and 3 workers
  • More heterogeneous variations combining general-purpose and accelerated compute

Compute modules

Turing Pi Compute Module with Raspberry Pi 4 support

While searching for compute modules, we discovered a whole market: modules ranging from 128 MB to 8 GB of RAM. We expect to see 16 GB options soon, too. For hosting cloud-native apps on the edge, 1 GB of RAM is no longer enough, and the recent appearance of modules with 2, 4, and even 8 GB of RAM provides good room for growth. We also considered options with FPGA modules for machine learning but decided to hold off because the software ecosystem is still not mature enough. While studying the compute module market, we came up with the idea of a unified compute module interface, which we are implementing in the Turing Pi 2. It will allow connecting compute modules from other manufacturers and combining modules of different purposes to solve specific tasks.

Today we are announcing V2 support for the Raspberry Pi 4 Compute Module (CM4), including Lite and 8 GB RAM versions.

General-Purpose Modules   Cores   RAM, GB   Storage, GB   Network, Gbps
pi4.1-lite                4       1         SD            1
pi4.2-lite                4       2         SD            1
pi4.4-lite                4       4         SD            1
pi4.8-lite                4       8         SD            1
pi4.1-8                   4       1         8             1
pi4.2-8                   4       2         8             1
pi4.4-8                   4       4         8             1
pi4.8-8                   4       8         8             1
pi4.1-16                  4       1         16            1
pi4.2-16                  4       2         16            1
pi4.4-16                  4       4         16            1
pi4.8-16                  4       8         16            1
pi4.1_32                  4       1         32            1
pi4.2_32                  4       2         32            1
pi4.4_32                  4       4         32            1
pi4.8_32                  4       8         32            1

Turing Pi Compute Modules

Peripherals

After determining the compute module vendor and the number of nodes, we started to study the PCI Express (PCIe) bus options. PCIe is the standard bus for peripheral devices, and it is found in almost all compute modules. We have several nodes, and ideally each node could share PCIe devices in concurrent request mode. For example, if a hard drive is connected to the bus, it would be available to all nodes. We started looking for PCIe switches with multi-host support and found that none of them fit our requirements: these solutions were mostly limited to a single host, or supported multiple hosts but without concurrent access to endpoints. The second problem is the high cost of $50 or more per chip. In the Turing Pi 2, we decided to postpone experiments with PCIe switches and return to them later as we mature. For now, we assign a role to each node along the bus: the first two nodes are exposed to mini PCIe ports, and the third node is exposed to a 2-port 6 Gbps SATA controller. You can use a network file system within the cluster to access hard drives from other nodes. Why not?
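For example, the node with the SATA drives could export them over NFS to the rest of the cluster. A minimal sketch, assuming a Debian-based OS on the modules; the hostnames, mount points, and the 192.168.1.0/24 subnet are our placeholders, not Turing Pi defaults:

```shell
# On node 3 (the node with the SATA controller):
sudo apt install nfs-kernel-server                # install the NFS server
# Export the mounted SATA drive to the cluster subnet:
echo '/mnt/sata1 192.168.1.0/24(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
sudo exportfs -ra                                 # re-read /etc/exports

# On any other node, mount the shared drive:
sudo apt install nfs-common
sudo mount -t nfs node3:/mnt/sata1 /mnt/shared
```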

Sketches & result

We decided to share some sketches of how the minimal cluster unit evolved over time through discussions and observations.

As a result, we arrived at a cluster unit with 4x 260-pin node slots, 2x mini PCIe (Gen 2) ports, and 2x SATA (Gen 3) ports. The cluster board has a Layer-2 managed switch with VLAN support. You can use the mini PCIe port connected to the first node to install a network card and get another Ethernet port, or connect a 5G modem and have the first node act as a router.

Turing Pi V2 Layout

The cluster management bus now has more functions, including the ability to flash modules directly through any slot and, of course, fan connectors with speed control for each node.

Use cases

Edge infrastructure for self-hosted applications & services

We designed the Turing Pi 2 as a minimal building block for consumer and commercial-grade edge infrastructure. With the V2, it's affordable to start a proof of concept and scale as you grow, gradually migrating the applications that are more cost-effective and practical to host on the edge. Cluster blocks can be linked together to form larger clusters, and this can be done incrementally without interrupting already established processes. There are already a huge number of business cases for hosting apps locally.
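As a sketch of how blocks could be linked, here is how extra nodes join an existing cluster with k3s, a lightweight Kubernetes distribution popular on ARM boards. k3s is our example choice, not a V2 requirement, and the server address is a placeholder:

```shell
# On the first node of the first cluster block (the control plane):
curl -sfL https://get.k3s.io | sh -
# Print the join token needed by the other nodes:
sudo cat /var/lib/rancher/k3s/server/node-token

# On each node of a newly added cluster block, join as a worker:
curl -sfL https://get.k3s.io | K3S_URL=https://192.168.1.10:6443 \
    K3S_TOKEN=<token-from-above> sh -
```

New blocks can be joined one at a time this way, so the existing nodes keep serving workloads while the cluster grows.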

ARM Workstation

With up to 32 GB of RAM per cluster unit, the first node can be used to run a desktop OS, for example Ubuntu Desktop 20.04 LTS. The remaining 3 nodes can be used for compilation, testing, debugging, and developing cloud-native solutions for ARM clusters.

The Turing V2 cluster is architecturally similar to AWS Graviton-based clusters: the CM4 processor uses the ARMv8 architecture, so you can build images and applications locally and run them on AWS Graviton 1 and 2 instances, which are known to be much cheaper than comparable x86 instances.
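For instance, a container image built natively on a CM4 node targets arm64 and runs unchanged on Graviton instances. A sketch using Docker (the image name and registry are placeholders):

```shell
# On a CM4 node (arm64), a plain build already produces an arm64 image:
docker build -t myapp:arm64 .

# Or cross-build the same arm64 image from an x86 machine using buildx:
docker buildx build --platform linux/arm64 -t myapp:arm64 --load .

# Push to a registry, then run it on a Graviton instance:
docker push myregistry.example.com/myapp:arm64
```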

Turing V2 can easily be included in the existing clusters (cloud or on-premises) and shared with other participants using Kubernetes RBAC.
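Sharing a cluster with teammates via RBAC could look like the following sketch, which grants a user read-only access to a single namespace (the user and namespace names are placeholders):

```shell
# Create a namespace and a role that can only read common workload resources:
kubectl create namespace dev
kubectl create role reader --verb=get,list,watch \
    --resource=pods,services,deployments -n dev

# Bind the role to user "alice", limiting her to the "dev" namespace:
kubectl create rolebinding alice-reader --role=reader --user=alice -n dev
```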

Wrap-up

The Turing Pi 2 is more functional than the V1, and we also expect it to be cheaper to manufacture.

We are very grateful to everyone who supported us and backed the Turing Pi project by purchasing the Turing Pi V1. The Turing Pi 2 wouldn't be possible without your support. As a way to thank our supporters, we will offer 25% off the Turing Pi 2 to everyone who purchased a V1.

That’s it for now. We hope you are excited about the upcoming V2, which we plan to release next year. We will post more updates on the progress along the way. Stay tuned.