Deep Dive into NKN System Architecture

Posted on June 14, 2018

What is NKN?

NKN is a new kind of network connectivity protocol and ecosystem, powered by blockchain, for an open, decentralized, and shared Internet. The significance of NKN is that it tokenizes connectivity and data transmission capability as a valuable resource, whose value lies in its use to extend the scope of network connectivity and improve bandwidth for the community. Part of the NKN token supply is used for initial ecosystem building; the rest is generated by specific algorithms through a large volume of network data transmission. This mixed distribution model promotes the construction and use of a “New Internet”. Intuitively, NKN is a connectivity exchange for network data transmission. It uses a distributed ledger maintained by numerous nodes throughout a peer-to-peer network to identify and record validated network traffic transactions. NKN tokens let users transfer digital assets and pay for data transmission resources in a decentralized, peer-to-peer network. Its distinguishing feature is that it does not require a central authority or telecommunications organization to conduct data transfer transactions. Users need only an Internet connection and the NKN software to pay another public account or address.

Why Open Source?

Open source means that all of NKN’s source code is public: any programmer who understands the code can obtain it, read its logic, and compile and run it. The software can also be modified and recompiled into new software.

Because the core software of NKN is open source, it holds no secrets, and its internal logic is fully transparent. Anyone can audit it for vulnerabilities, Trojan horses, or back doors, which shows that the software is not covertly controlled. Subsequent evolution and upgrades of the source code are likewise visible to everyone. This openness is an essential aspect of NKN as a public blockchain: it helps the community use NKN in an open and transparent environment and jointly build and own the NKN ecosystem. (NKN GitHub: https://github.com/nknorg)

NKN Core Repository Overview

Technically speaking, NKN consists of many nodes distributed around the world (a live NKN testnet preview is coming soon). Each node connects only to its neighbor nodes. Packets can be forwarded from any node to any other node along an efficient and verifiable route. Encrypted data can be sent to any client through its permanent NKN address, even if the client has no public or static IP address.

There are two main types of devices in NKN: nodes and clients. Nodes are devices that send, receive, and, most importantly, relay data. A client is a device that only sends and receives data but does not relay it. Clients interact with each other through nodes. Nodes are the maintainers and builders of NKN and earn NKN token rewards through a useful proof of work. A node needs a public IP address to receive messages from nodes with which it has not yet established a connection. Clients are simple consumers in the NKN ecosystem: they obtain data transfer service by paying nodes. A client does not need a public IP address because it establishes a connection with a node and then sends and receives data through it. Device types are differentiated to prevent the network stability problems caused by the “free riders” inherent in a peer-to-peer network, who may enter and leave the network frequently. For this reason, nodes and clients use different schemes when selecting an NKN address. Overall, NKN hopes that the blockchain’s token-based economic model will encourage clients to become nodes and promote the co-building and sharing of the Internet.

In the NKN system, consensus must be reached on each block to prevent forks. Because a node is both a data forwarder and a consensus participant, NKN uses a consensus algorithm based on Cellular Automata, in which nodes communicate only with their neighbors, so that consensus among a large number of nodes can be reached efficiently. The relay workload for data transmission is verified by a Proof of Relay (PoR) algorithm. PoR randomly selects a small number of packets as proof, which are sent to other nodes for payment and reward. The selection can therefore be verified but cannot be predicted or controlled.

Based on the above, the technical highlights of the NKN core code repository are: the Decentralized Data Transmission Network (DDTN), the scalable Majority Voting Cellular Automata (MVCA) consensus mechanism, a useful proof of work called Proof of Relay (PoR), Relay Path Validation (RPV), and the NKN address schemes. Let’s introduce them one by one.

Decentralized Data Transmission Network (DDTN)

The decentralized data transmission network (DDTN) corresponds to the network layer of the NKN blockchain platform. Its function is to transmit any data to any node/client without any central server.

As a data transmission network, NKN may contain millions of nodes or more. Moreover, the network is dynamic as every node could join or leave the network at any time. At such scale, it is unrealistic for every node to maintain an up-to-date list of all nodes in the network. Instead, every node in the network is only connected to and aware of a few other nodes in the network which are called neighbors. Network topology, determined by the choice of neighbors, is crucial to performance and security. To be scalable and secure, we need to choose a proper topology that has the following properties:

(1) the network should be connected and have a small diameter;

(2) efficient routing algorithms exist between any node pairs using only information about neighbors;

(3) load is balanced among nodes given random traffic with uniformly distributed source and destination;

(4) choice of neighbors should be unpredictable but verifiable to prevent attacks.

NKN uses the network topology of the Chord Distributed Hash Table (DHT) for high scalability. Each node has an m-bit random address on the ring. Each node is connected to O(log N) other nodes at specific distances from it, so that the choice of neighbors can be verified. Routing between any pair of nodes takes at most O(log N) hops and is deterministic and verifiable given the topology.
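To make the topology concrete, here is a minimal Python sketch of Chord-style neighbor selection and greedy routing. It is an illustration under stated assumptions, not NKN's implementation: the 8-bit ring, the function names, and the rule "neighbor i is the first node at or after address + 2^i" follow the textbook Chord design and are chosen here only for the example.

```python
import bisect
import random

M = 8            # address bits; a tiny ring for illustration only
RING = 2 ** M

def successor(nodes, key):
    """Return the first node address at or clockwise after key (nodes sorted)."""
    i = bisect.bisect_left(nodes, key % RING)
    return nodes[i % len(nodes)]

def neighbors(nodes, n):
    """Chord-style fingers of n: successor(n + 2^i) for i = 0..M-1.

    The result depends only on n and the node list, so any other node can
    recompute it to check a claimed neighbor set.
    """
    return sorted({successor(nodes, n + 2 ** i) for i in range(M)} - {n})

def clockwise(a, b):
    """Clockwise distance from a to b on the ring."""
    return (b - a) % RING

def route(nodes, src, dest_key):
    """Greedy routing that uses only each hop's own neighbor list."""
    target = successor(nodes, dest_key)
    path = [src]
    while path[-1] != target:
        cur = path[-1]
        remaining = clockwise(cur, target)
        # Keep only neighbors that do not overshoot the target, then jump
        # to the one that leaves the least distance still to travel.
        candidates = [f for f in neighbors(nodes, cur)
                      if clockwise(cur, f) <= remaining]
        path.append(min(candidates, key=lambda f: clockwise(f, target)))
    return path

# Hypothetical 20-node network with random addresses.
nodes = sorted(random.Random(1).sample(range(RING), 20))
path = route(nodes, nodes[0], 200)
print("path:", path, "hops:", len(path) - 1)
```

Each hop at least halves the remaining clockwise distance, which is why the number of hops in this sketch never exceeds M, and why every hop can be checked by recomputing the previous node's neighbor list.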

MVCA Consensus Algorithm Based on Cellular Automata

In a centralized electronic transfer and transaction system, a middleman acts as the central authority or credit endorser that determines the validity of a transaction. In the telecommunications industry, for example, telecom operators play this role: they can attest to the validity of a data transmission bandwidth transaction.

Because NKN has no intermediary or centralized role, who is responsible for confirming the validity of data transmission/relay transactions? In such a decentralized network, reaching consensus is somewhat similar to an election: everyone votes, and the data transmission transaction that receives the majority of votes is considered validated. Without a central authority, this approach may seem to lack credibility, but a transaction validated by voting is in fact the most trustworthy. NKN is therefore a decentralized network that matches the original concept of the Internet and a realistic version of the “New Internet” owned by the community (the “New Internet” is known from Pied Piper’s Internet in “Silicon Valley”).

It is worth mentioning that the NKN consensus mechanism does not require global voting as in Byzantine Agreement; instead it adopts an Ising-model voting mechanism based on Cellular Automata, the “Majority Voting Cellular Automata” (MVCA) algorithm. The MVCA consensus algorithm can achieve a highly scalable network that supports millions or even billions of nodes. (Blockchain scalability usually has two meanings: transaction speed and the scale of the network. Here we refer only to the latter.) The algorithm is described in detail, with mathematical proofs, in the NKN whitepaper ( https://www.nkn.org/doc/NKN_Whitepaper.pdf ). In short, MVCA is efficient in terms of consensus time and number of messages sent, because each node only needs to iterate communication with its few neighbors for the network to reach a global consensus. Using a gossip protocol, the information required for consensus is sent to all participating nodes at the beginning of the consensus. This protocol takes O(log N) time, where N is the number of nodes in the entire network and each node has k neighbors, and this is the main time cost of the consensus process.
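The idea of neighbor-only majority voting can be sketched in a few lines of Python. This is a deliberately simplified, deterministic toy and not the MVCA algorithm from the whitepaper: the synchronous update rule, the tie-breaking choice, the random neighbor graph, and all names are assumptions made for illustration.

```python
import random

def random_neighbors(n, k, seed=0):
    """Give each of n nodes k random neighbors (a hypothetical sparse graph)."""
    rng = random.Random(seed)
    return [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]

def step(states, neighbors):
    """One synchronous update: every node adopts the majority of the votes
    (+1 or -1) held by itself and its neighbors; a tie keeps its own state."""
    new_states = []
    for i, nbrs in enumerate(neighbors):
        votes = states[i] + sum(states[j] for j in nbrs)
        if votes > 0:
            new_states.append(1)
        elif votes < 0:
            new_states.append(-1)
        else:
            new_states.append(states[i])
    return new_states

def run(states, neighbors, max_iters=50):
    """Iterate until no node changes its state; return (states, iterations)."""
    for iteration in range(max_iters):
        next_states = step(states, neighbors)
        if next_states == states:
            return states, iteration
        states = next_states
    return states, max_iters

# Example: 200 nodes, 10 neighbors each, 70% initially voting +1.
rng = random.Random(42)
initial = [1 if rng.random() < 0.7 else -1 for _ in range(200)]
final, iterations = run(initial, random_neighbors(200, 10))
print("iterations:", iterations, "nodes voting +1:", final.count(1))
```

No node in this sketch ever sees more than its own neighbors, yet opinions spread through the overlapping neighborhoods. Whether and how fast a given network converges is the subject of the whitepaper's analysis; the toy above does not establish it.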

Proof of Relay (PoR)

One of the key problems in NKN is to prove how much data a node relayed, which NKN defines as Proof of Relay (PoR). PoR is crucial to the NKN network as the amount of data being relayed is directly related to token rewards. An ideal proof should satisfy the following conditions:

1) verifiable: anyone, involved in data transmission or not, is able to verify the proof correctly using only public information;

2) unforgeable: no party, unless it controls all involved nodes, is able to forge a valid proof with nontrivial probability;

3) untamperable: no party, unless it controls all involved nodes, is able to modify a valid proof with nontrivial probability.

Unlike Bitcoin and Ethereum mining schemes, PoR is a useful proof of work: mining is relaying data. PoR is implemented through a special signature chain. To improve efficiency, PoR uses a hash signature chain to randomly select samples in an uncontrollable but verifiable manner. In addition, PoR is also characterized by the fact that even if a malicious attacker controls most nodes in the network, PoR is difficult to forge.

In simple terms, the PoR signature chain is a hash chain that relay nodes sign in turn when relaying data packets. The parties participating in an active communication channel, that is, a channel used for data transmission, form a signature chain in timestamp order, and a qualifying chain is one whose hash value has multiple leading zeros. Mining here means exhaustively searching for a compliant hash (the hash value must be less than a certain threshold). As the number of zeros required by the NKN system increases, the PoR workload increases exponentially. Once a miner node finds a qualifying signature chain, it also determines a block. From the hash of the block, a fixed calculation rule selects the bookkeeper among the nodes participating in the signature chain, so the choice is deterministic and verifiable. The block cannot be changed unless a certain amount of work is performed on it. Because finding a compliant hash requires exhaustive search, an attacker has a very low probability of successfully forging a transaction as long as honest nodes make up more than about 2/3 of the network.

A signature chain is a chain of signatures, signed sequentially by data relayers as they relay an NKN packet. Each element of the chain consists of the following fields:

1) Relayer NKN address and public key.

2) Next relayer NKN address.

3) Signature(signature of the previous element on chain, relayer NKN address, next relayer NKN address) signed with relayer private key.

The first element of the signature chain is signed by the source, and its signature field is replaced by signature(payload hash, payload size, source NKN address and public key, destination NKN address and public key, next relayer NKN address).

NKN uses a packet with the payload field removed as a proof of relay work for all relay nodes along the route. It satisfies the verifiable, unforgeable, and untamperable requirements: everyone is able to verify the validity of a signature chain, while no one can forge or modify a valid signature chain without controlling (that is, holding the private keys of) all nodes in the route.

A signature chain cannot be forked because each element contains the NKN address and public key of the next node. If a malicious node on the route removes or modifies previous elements of the chain when generating its signature, the chain is no longer valid. Similarly, if a partially signed signature chain is intercepted by a malicious party, no valid signature chain can be generated without the private key of the designated next node.
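The structure described above can be sketched in Python as follows. This is a toy under loud assumptions, not NKN's packet format: the signature scheme is a Lamport one-time signature built from SHA-256 (swapped in only so the example runs with the standard library), addresses are opaque placeholder bytes, and every name (Node, start_chain, extend, verify_chain) is hypothetical.

```python
import hashlib
import os

def H(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts."""
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(4, "big") + p)
    return h.digest()

def _bits(digest: bytes):
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

class Node:
    """Hypothetical participant holding a toy Lamport ONE-TIME key pair."""
    def __init__(self, name: str):
        self.addr = H(name.encode())  # placeholder address, not NKN's scheme
        self._sk = [(os.urandom(32), os.urandom(32)) for _ in range(256)]
        self.pk = b"".join(H(a) + H(b) for a, b in self._sk)

    def sign(self, msg: bytes) -> bytes:
        # Reveal one secret per bit of the message hash.
        return b"".join(self._sk[i][b] for i, b in enumerate(_bits(H(msg))))

def verify(pk: bytes, msg: bytes, sig: bytes) -> bool:
    if len(pk) != 256 * 64 or len(sig) != 256 * 32:
        return False
    for i, b in enumerate(_bits(H(msg))):
        start = 64 * i + 32 * b
        if H(sig[32 * i:32 * i + 32]) != pk[start:start + 32]:
            return False
    return True

def _first_msg(hdr, next_addr):
    return H(hdr["payload_hash"], hdr["payload_size"].to_bytes(8, "big"),
             hdr["src_addr"], hdr["src_pk"], hdr["dst_addr"], hdr["dst_pk"],
             next_addr)

def start_chain(src, dst, next_addr, payload):
    """The source signs the first element over the packet header fields."""
    hdr = {"payload_hash": H(payload), "payload_size": len(payload),
           "src_addr": src.addr, "src_pk": src.pk,
           "dst_addr": dst.addr, "dst_pk": dst.pk}
    first = {"addr": src.addr, "pk": src.pk, "next": next_addr,
             "sig": src.sign(_first_msg(hdr, next_addr))}
    return {"header": hdr, "elements": [first]}

def extend(chain, relayer, next_addr):
    """A relayer signs (previous signature, its address, next address)."""
    prev = chain["elements"][-1]
    sig = relayer.sign(H(prev["sig"], relayer.addr, next_addr))
    chain["elements"].append({"addr": relayer.addr, "pk": relayer.pk,
                              "next": next_addr, "sig": sig})

def verify_chain(chain) -> bool:
    """Check a complete chain using only public information.
    Binding each address to its public key is out of scope here."""
    hdr, elems = chain["header"], chain["elements"]
    first = elems[0]
    if (first["addr"], first["pk"]) != (hdr["src_addr"], hdr["src_pk"]):
        return False
    if not verify(first["pk"], _first_msg(hdr, first["next"]), first["sig"]):
        return False
    for prev, cur in zip(elems, elems[1:]):
        if cur["addr"] != prev["next"]:
            return False  # signer is not the designated next relayer
        msg = H(prev["sig"], cur["addr"], cur["next"])
        if not verify(cur["pk"], msg, cur["sig"]):
            return False
    return elems[-1]["next"] == hdr["dst_addr"]

src, r1, r2, dst = (Node(n) for n in ("src", "r1", "r2", "dst"))
chain = start_chain(src, dst, r1.addr, b"hello NKN")
extend(chain, r1, r2.addr)
extend(chain, r2, dst.addr)
print("valid chain:", verify_chain(chain))
```

Because each signature covers the previous signature and the designated next address, dropping or altering an earlier element makes verification fail. Note that the payload itself is not part of the chain, only its hash and size, which matches the idea of using the packet with the payload removed as the proof.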

PoR also takes efficiency into account. Every NKN packet has a proof which can be used as a receipt to generate transactions from source to relayers. However, it is inefficient to create a transaction for every packet transmitted in NKN. Instead, only packets whose last signature on the signature chain is smaller than a threshold are eligible for transactions.

The last signature on the signature chain is verifiable to everyone, while still being unpredictable and uncontrollable unless all nodes along the route including source and destination are controlled by the same party. The last signature is essentially deterministic given the payload and the full path, but cannot be computed in advance without all the private keys along the route.

With an ideal hash function, the last signature on the signature chain is random across packets. Thus, selecting only packets with a small enough last signature for transactions (with an adjusted price per packet) does not change the expected rewards for relay nodes, but it introduces some variance in pricing and rewards. The threshold should be chosen to balance the need for fewer transactions against smaller reward variance.
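The effect of the threshold on rewards can be illustrated with a short Python sketch. Everything specific here is an assumption for the example: the last signature is mapped to a number with SHA-256, the threshold is expressed as a required number of leading zero bits, and the byte strings standing in for signatures are made up.

```python
import hashlib

HASH_BITS = 256

def is_eligible(last_signature: bytes, zero_bits: int) -> bool:
    """A packet qualifies if its hashed last signature, read as an integer,
    falls below the threshold 2^(256 - zero_bits)."""
    value = int.from_bytes(hashlib.sha256(last_signature).digest(), "big")
    return value < 2 ** (HASH_BITS - zero_bits)

def total_paid(last_signatures, price_per_packet, zero_bits):
    """Total paid when only eligible packets create a transaction.

    Roughly one packet in 2^zero_bits is eligible, so the price of each
    eligible packet is scaled up by 2^zero_bits to keep the expectation."""
    adjusted_price = price_per_packet * 2 ** zero_bits
    return sum(adjusted_price for s in last_signatures
               if is_eligible(s, zero_bits))

# Made-up stand-ins for the last signatures of 20,000 relayed packets.
signatures = [i.to_bytes(8, "big") for i in range(20000)]
for zero_bits in (0, 4, 8):
    print("zero bits:", zero_bits,
          "total paid:", total_paid(signatures, 1, zero_bits))
```

With zero_bits = 0 every packet is paid at the base price. As zero_bits grows, far fewer transactions are created while the total stays near the same expected value, at the cost of larger fluctuations around it.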

It can be seen that in the PoR algorithm, a vote for a bookkeeper candidate is not cast by a person or an IP address but by a signature chain (an active communication channel). Anyone who wants to control NKN must therefore control enough active channels, and the larger NKN grows, the harder that becomes. Just as a person may easily control a limited number of votes, the larger the ballot, the harder it is to control.

In addition, if such voting were done in a P2P network where the required workload is very small, confirming a transaction would take only a small amount of data transmission capability, which could lead to forged transactions. NKN therefore introduces the PoR mechanism to validate a transaction, which takes a certain amount of data transmission capacity and time. Once the data transmission succeeds, a transaction is determined. Other nodes, in turn, need to perform only a very simple operation to know whether the transaction is real or counterfeit. The PoR mechanism resembles the way a state spends a great deal of effort on anti-counterfeiting banknote design, while ordinary people need only a simple check to tell whether a banknote is genuine.

Bitcoin’s traditional proof of work (PoW) mechanism cannot guarantee that consensus is reached before every block is created. Therefore, even when there are no malicious nodes, forks can occur naturally. In contrast, each block in NKN is created after consensus is reached, so NKN does not fork naturally. One exception is that NKN may fork if a node is completely surrounded by malicious nodes; such a fork is only local rather than global. After a fork, Bitcoin’s PoW selects the longer chain as the valid one. In NKN, however, because forks do not occur naturally, a local fork indicates that an attack is under way, and the attacked node needs to rejoin the network to get rid of its malicious neighbors and correct its state.

Finally, since Bitcoin forks naturally, it needs a period of time to confirm transactions. NKN does not fork naturally, so it does not have this problem. In NKN, once a block is created and some nodes randomly selected from the network have confirmed it (to guard against the case where a node is under attack), the transaction can be confirmed directly without additional delay. This is an advantage of NKN compared to Bitcoin.

Relay Path Validation (RPV)

Although the signatures in a signature chain guarantee that it was signed by the claimed relayers, they do not guarantee the correctness of the relay path, where a path is defined as correct if it is the path designated by DDTN. Being able to validate the correctness of a path is crucial for the security of signature chains. Otherwise, attackers could break the assumption that each hop leads to a random node by selecting a specific node as the next hop, and a malicious party could construct signature chains fully under its control by relaying packets only to nodes it controls. A signature chain fully controlled by one party is no longer unpredictable to that party and can be computed without actually transmitting any data. The malicious party could then gain an unfair economic advantage by producing more signature chains than it should, increasing its chance of receiving mining rewards.

The validity of any relay path should be agreed upon by all nodes, as it is a prerequisite for selecting a globally unique mining node for each block. There are two ways to reach this consensus:

1) nodes use global information (e.g. previous blocks) that has already been agreed on;

2) nodes use their own local information, and reach consensus later.

NKN chooses the first approach, as it does not require extra communication between nodes and is much more efficient. The disadvantage of this approach is that global information is delayed, so the current topology may differ from the topology at the time the past consensus was reached. NKN considers this acceptable as long as the valid path is unique or almost unique, and the valid path still exists and will be selected by honest nodes with nontrivial probability at the time of validation.

NKN Address Scheme

Nodes and clients use different schemes when choosing an NKN address.

First, it is important that the NKN address of a node is random (or at least uniformly distributed and uncontrollable) for two reasons.

1) It makes it hard for malicious nodes to choose specific NKN addresses. Being able to choose an NKN address would make it easier for malicious nodes to attack the system, as neighbors and routing are based on NKN addresses.

2) Load is more balanced among nodes given random NKN addresses, effectively making the network more decentralized.

To guarantee the randomness of NKN addresses and prevent malicious nodes from choosing specific ones, a unique, unpredictable, uncontrollable, yet still verifiable function is used to generate the NKN address when a node joins the NKN network. We choose the hash of the node’s public IP address and the latest block hash at the time the node joins:

Node NKN address = hash(Node public IP address, latest block hash)

such that it’s verifiable by other nodes but unpredictable in advance.
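As a rough illustration of this formula, the following Python sketch derives and re-checks a node address. The choice of SHA-256, the byte encoding of the inputs, and the function names are assumptions for the example; the text above only specifies that the address is a hash of the public IP address and the latest block hash.

```python
import hashlib

def node_address(public_ip: str, latest_block_hash: bytes) -> bytes:
    """Node NKN address = hash(public IP, latest block hash).

    The IP is length-prefixed so the two inputs cannot be confused."""
    ip = public_ip.encode("ascii")
    data = len(ip).to_bytes(2, "big") + ip + latest_block_hash
    return hashlib.sha256(data).digest()

def verify_node_address(claimed: bytes, public_ip: str,
                        latest_block_hash: bytes) -> bool:
    """Any node that knows the IP and the agreed block hash can recompute
    the address and compare it with the claimed one."""
    return claimed == node_address(public_ip, latest_block_hash)

# Made-up inputs: a documentation-range IP and a placeholder block hash.
block_hash = hashlib.sha256(b"placeholder block").digest()
addr = node_address("203.0.113.7", block_hash)
print(addr.hex(), verify_node_address(addr, "203.0.113.7", block_hash))
```

The address cannot be predicted before the latest block exists, and a node cannot pick it freely, because both inputs are outside its control at the time of joining.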

Second, clients use a different NKN address scheme from nodes. The scheme should satisfy the following properties:

1) The client NKN address is permanent, so that a client can be reached at the same NKN address when it is in a different physical network.

2) The client NKN address is associated with its public key to avoid NKN address collisions, so that other clients cannot receive packets sent to it.

NKN chooses a scheme in which the client NKN address is computed from a URL-like NKN address string consisting of an arbitrary string chosen by the client and its public key:

Client NKN address = hash(“arbitrary-string.client-public-key”)

In the NKN address string, the last substring separated by a dot (the client public key) represents the unique identity of the client, similar to the root domain in a URL; the rest is chosen by the client, similar to a subdomain in a URL. Such a scheme satisfies the above two properties. In addition, a user (key pair holder) can generate as many NKN addresses as they want, all sharing the same account (key pair).

Clients do not distribute their NKN addresses directly. Instead, a client distributes its NKN address string to nodes or other clients and they compute the client’s NKN address locally such that they know both the client’s NKN address and public key at the same time. Using such a scheme, end-to-end encryption between any clients is easy to implement.
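A short Python sketch shows how a peer could derive both values from an address string. SHA-256, the hex encoding of the public key, and the function names are assumptions made for the example, and the key shown is a made-up placeholder.

```python
import hashlib

def client_address(address_string: str) -> bytes:
    """Client NKN address = hash("arbitrary-string.client-public-key")."""
    return hashlib.sha256(address_string.encode("utf-8")).digest()

def public_key_of(address_string: str) -> bytes:
    """The last dot-separated substring is the client public key.
    Hex encoding of the key is an assumption of this sketch."""
    return bytes.fromhex(address_string.rpartition(".")[2])

# A made-up 33-byte key, for illustration only.
pubkey_hex = "02" + "ab" * 32
home = "home.laptop." + pubkey_hex
work = "work." + pubkey_hex

# Two address strings of the same key pair: different NKN addresses,
# same public key recovered locally by whoever receives the string.
print(client_address(home).hex())
print(client_address(work).hex())
print(public_key_of(home) == public_key_of(work))
```

Since the receiver of an address string learns the public key together with the address, it can encrypt to that key before sending anything, which is what makes end-to-end encryption straightforward under this scheme.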

Conclusion

The significance of open-sourcing NKN is to gain community consensus, which is the essence of a blockchain and an essential characteristic of NKN as a public blockchain. As with the Bitcoin ecosystem, the value of NKN lies in its users and ecosystem. Open source is also a manifestation of the NKN team’s confidence in its technology and in community building. Only open and transparent technologies can be publicly recognized by the community, and only when the community recognizes the value of NKN can the project last.