Data sharing between nodes / epochs

#1

How is data shared between nodes when a new epoch starts? E.g. I upload 2 GB to perform some computation over several epochs. Is the data sent to all nodes (stored in the internal state database) before execution of the contract, or does every randomly selected node have to download the data before a new epoch starts?

More specifically, I’m thinking about the feasibility of running a genetic algorithm on the Enigma network where dozens of nodes work in parallel over multiple epochs.

#2

A secret contract (with its state) has an upper data limit of 4 GB. More precisely, the physical memory allocated to the enclave by the BIOS is very small (128 MB), but v2 of the Linux driver uses the paging feature of the Linux kernel to extend the heap size to up to 4 GB (see https://software.intel.com/en-us/forums/intel-software-guard-extensions-intel-sgx/topic/670322). That said, this is a theoretical limit: if a worker is working on multiple contracts in an epoch (which depends on network adoption, i.e. active workers vs. active secret contracts), this theoretical limit is divided among those contracts.
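
As a rough back-of-the-envelope illustration of that division (assuming, purely for the example, an even split of the paged heap across contracts):

```python
# Back-of-the-envelope estimate of the heap available per contract.
# Assumption (mine, for illustration): a worker splits the paged
# enclave heap evenly across the contracts it serves in an epoch.
PAGED_HEAP_GB = 4    # theoretical limit with v2 of the SGX Linux driver
EPC_MB = 128         # physical enclave memory allocated by the BIOS

def heap_per_contract_gb(active_contracts: int) -> float:
    return PAGED_HEAP_GB / active_contracts

for n in (1, 4, 16):
    print(f"{n} contract(s) -> ~{heap_per_contract_gb(n):.2f} GB each; "
          f"anything beyond the {EPC_MB} MB EPC is paged, at a performance cost")
```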

Data / state is stored encrypted in the p2p network that Enigma workers run on. The state is shared across all workers, but it’s encrypted. At each epoch, keys are distributed to the relevant workers so that they can do the required work on the data set.
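
To make that pattern concrete, here is a minimal Python sketch. It illustrates the idea only and is not Enigma’s actual protocol: a dict stands in for the p2p store, and Fernet symmetric encryption stands in for the network’s encryption scheme.

```python
# Illustrative only, NOT Enigma's actual protocol: the dict stands in
# for the replicated p2p store, Fernet for the encryption scheme.
from cryptography.fernet import Fernet

p2p_store = {}  # shared, replicated state; holds ciphertext only

def write_state(contract_id: str, plaintext: bytes, epoch_key: bytes) -> None:
    """Anyone can replicate this blob; only key holders can read it."""
    p2p_store[contract_id] = Fernet(epoch_key).encrypt(plaintext)

def work_on_state(contract_id: str, epoch_key: bytes) -> bytes:
    """A worker selected for this epoch receives epoch_key and decrypts."""
    return Fernet(epoch_key).decrypt(p2p_store[contract_id])

epoch_key = Fernet.generate_key()  # distributed to the selected workers
write_state("ga-demo", b"population seed data", epoch_key)
print(work_on_state("ga-demo", epoch_key))
```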

I guess the answer to your question regarding feasibility depends on the data set and the performance requirements of the application. Do you have more details on this?

#3

I’m exploring the feasibility of training ANN models with an evolutionary/GA approach on the Enigma network. The data itself is a multidimensional time series, so the computational demand would be quite significant.

One advantage of evolutionary approaches for training ANNs is the better compression techniques available: the models shared and stored across the network at each epoch are in the range of a few kB (or even below 1 kB). The main disadvantage is the large number of CPUs needed to run the computation in a reasonable time.
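
To illustrate the kind of compression I mean, here is a rough sketch of seed-chain encoding, as used in Such et al.’s Deep GA: a model is shipped as the list of RNG seeds in its mutation lineage rather than as raw weights. The layer shapes and mutation step size below are placeholder values.

```python
# Sketch of seed-chain ("compact") encoding: an evolved network is
# represented by the seeds of its lineage, a few bytes per generation.
import numpy as np

SHAPES = [(32, 16), (16, 1)]   # hypothetical small ANN layer shapes
SIGMA = 0.02                   # illustrative mutation step size

def decode(seeds):
    """Rebuild full weights from a lineage of seeds (init + mutations)."""
    rngs = [np.random.default_rng(s) for s in seeds]
    weights = [rngs[0].standard_normal(shape) for shape in SHAPES]
    for rng in rngs[1:]:                 # replay each mutation in order
        for w in weights:
            w += SIGMA * rng.standard_normal(w.shape)
    return weights

lineage = [42, 7, 13]                    # what actually gets shared
weights = decode(lineage)
print(sum(w.size for w in weights), "params rebuilt from", len(lineage), "seeds")
```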

Setting aside the time needed to complete the training, the more relevant question is whether the costs are competitive. What would the price be compared to, e.g., Amazon AWS? Or, asked differently: can the Enigma network be used for such computation-intensive tasks, or should as much of the computation as possible be outsourced, e.g. by using techniques such as federated learning, where the models are only verified and averaged on the Enigma network?
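
To make the federated-learning option concrete, here is a minimal FedAvg sketch; the function name is mine for illustration, not an Enigma API. Only the small averaged parameter vectors would need to touch the network, while the heavy training runs off-network.

```python
# Minimal FedAvg sketch (illustrative; not an Enigma API): heavy
# training happens off-network, only small updates are averaged here.
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted average of client model parameters (FedAvg)."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# three clients trained locally on differently sized datasets
clients = [np.random.default_rng(i).standard_normal(8) for i in range(3)]
sizes = [100, 300, 600]
global_model = federated_average(clients, sizes)
print(global_model)
```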

#4

@can Concerning competitiveness against something like Amazon AWS, it’s clear that there is no straightforward answer, since you also pay for privacy (when using Enigma) and the value of that privacy largely depends on the data itself. I’m wondering more whether it’s an n-fold factor with, let’s say, n < 5, or much larger, n >> 5. Will we have some benchmarks after the testnet release?

Another question: What is the length of an epoch?

Thank you