Federated Learning and TF Encrypted

Goals

  • reduce burden on users
    • align with TF as far as possible by e.g. matching their API
    • align with TF Federated where possible
    • find common ground across TF Encrypted, and minimise concepts and terminology specific to federated learning
    • reduce redundancy and ambiguity; make sure every construct has a reason
  • ensure flexibility to experiment
    • models, aggregation strategies, and cryptographic techniques
  • keep the “what” separated from the “how” to lower complexity and increase reuse
    • allow higher-order ops such as Dense and ReLU to sometimes let the how depend on the what (see the sketch below)
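
A purely illustrative sketch of this separation (not TFE’s actual dispatch mechanism; all names below are made up): the layer describes what to compute, while the active protocol decides how it is computed.

class PlaintextProtocol:
  # how: ordinary floating-point matmul
  def matmul(self, x, w):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

class DenseLayer:
  # what: a dense (fully connected) transformation, agnostic of the protocol
  def __init__(self, weights):
    self.weights = weights

  def apply(self, x, protocol):
    return protocol.matmul(x, self.weights)

# a secure protocol could supply e.g. a secret-shared matmul here instead,
# without changing the layer definition
layer = DenseLayer(weights=[[1.0], [2.0]])
print(layer.apply([[3.0, 4.0]], PlaintextProtocol()))  # [[11.0]]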

Deliverables:

  • TFE native FL API and example
  • Bridge for using TFE for secure aggregation with TF Federated
  • Bridge for using TFE for secure aggregation with a distribution strategy

Background

TF Distribute strategies

From the docs, a distribute strategy is a “state & compute distribution policy on a list of devices”.

From the guide:

The only things that need to change in a user’s program are: (1) Create an instance of the appropriate tf.distribute.Strategy and (2) Move the creation and compiling of Keras model inside strategy.scope.

strategy.scope() indicates which parts of the code to run distributed. Creating a model inside this scope allows us to create mirrored variables instead of regular variables. Compiling under the scope allows us to know that the user intends to train this model using this strategy. Once this is set up, you can fit your model like you would normally [i.e. outside scope].

mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
  model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
  model.compile(loss='mse', optimizer='sgd')

dataset = tf.data.Dataset.from_tensors(([1.], [1.])).repeat(100).batch(10)
model.fit(dataset, epochs=2)
model.evaluate(dataset)

TF Federated

From the docs:

Currently, TensorFlow does not fully support serializing and deserializing eager-mode TensorFlow. Thus, serialization in TFF currently follows the TF 1.0 pattern, where all code must be constructed inside a tf.Graph that TFF controls. This means currently TFF cannot consume an already-constructed model; instead, the model definition logic is packaged in a no-arg function that returns a tff.learning.Model. This function is then called by TFF to ensure all components of the model are serialized.
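
A minimal sketch of that no-arg pattern, assuming a create_compiled_keras_model helper and a dummy_batch defined elsewhere (both names are placeholders, not part of the TFF API):

def model_fn():
  # the Keras model must be built *inside* the function so that TFF controls
  # the graph it is constructed in and can serialize all of its components
  keras_model = create_compiled_keras_model()
  return tff.learning.from_compiled_keras_model(
      keras_model, dummy_batch=dummy_batch)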

On TF distribution strategies vs Federated Core, from the docs:

the stated goal of tf.distribute is to allow users to use existing models and training code with minimal changes to enable distributed training, and much focus is on how to take advantage of distributed infrastructure to make existing training code more efficient. The goal of TFF’s Federated Core is to give researchers and practitioners explicit control over the specific patterns of distributed communication they will use in their systems. The focus in FC is on providing a flexible and extensible language for expressing distributed data flow algorithms, rather than a concrete set of implemented distributed training capabilities.

One of the primary target audiences for TFF’s FC API is researchers and practitioners who might want to experiment with new federated learning algorithms and evaluate the consequences of subtle design choices that affect the manner in which the flow of data in the distributed system is orchestrated, yet without getting bogged down by system implementation details. The level of abstraction that FC API is aiming for roughly corresponds to pseudocode one could use to describe the mechanics of a federated learning algorithm in a research publication - what data exists in the system and how it is transformed, but without dropping to the level of individual point-to-point network message exchanges.
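
To make that level of abstraction concrete, the canonical Federated Core example expresses a federated average directly over placements, with no notion of individual point-to-point messages:

import tensorflow as tf
import tensorflow_federated as tff

@tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
def get_average_temperature(sensor_readings):
  # federated operators describe what flows where, not how it is transported
  return tff.federated_mean(sensor_readings)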

From the text generation tutorial:

def create_tff_model():
  ...
  # `compile` is a helper defined earlier in the tutorial that compiles the
  # cloned Keras model and returns it
  keras_model_clone = compile(tf.keras.models.clone_model(keras_model))
  return tff.learning.from_compiled_keras_model(
      keras_model_clone, dummy_batch=dummy_batch)

# This command builds all the TensorFlow graphs and serializes them
fed_avg = tff.learning.build_federated_averaging_process(model_fn=create_tff_model)

# Perform federated training steps
state = fed_avg.initialize()
state, metrics = fed_avg.next(state, [example_dataset.take(1)])
print(metrics)

Note that state can be used to update a local clone of the model for evaluation after each iteration:

state = fed_avg.initialize()

state = tff.learning.state_with_new_model_weights(
    state,
    trainable_weights=[v.numpy() for v in keras_model.trainable_weights],
    non_trainable_weights=[
        v.numpy() for v in keras_model.non_trainable_weights
    ])

def keras_evaluate(state, round_num):
  tff.learning.assign_weights_to_keras_model(keras_model, state.model)
  print('Evaluating before training round', round_num)
  keras_model.evaluate(example_dataset, steps=2)

for round_num in range(NUM_ROUNDS):
  keras_evaluate(state, round_num)
  state, metrics = fed_avg.next(state, train_datasets)
  print('Training metrics: ', metrics)

keras_evaluate(state, NUM_ROUNDS + 1)

Terminology

Functionality and Protocol: a functionality is essentially used as a function from and to local tensors, while a protocol is a more general means of specifying how functionalities are to be computed, e.g. using cryptographic techniques. As such, only protocols are used as context managers. This roughly follows UC (universal composability) terminology, although functionalities here are not intended to be used as sub-protocols.

Suggested API

# specify the players involved
model_owner = tfe.Player('model_owner')
data_owners = [
    tfe.Player('data_owner_0'),
    tfe.Player('data_owner_1'),
    tfe.Player('data_owner_2'),
]

# build data pipeline on each data owner;
# this would likely be a unique function per owner
data_sources = [
    build_data_pipeline(data_owner)
    for data_owner in data_owners
]

# use fast, non-resilient, secure aggregation based on additive secret sharing
aggregation = tfe.functionalities.AdditiveSecureAverage

# ... alternatively we could have instantiated it explicitly,
# resulting in exactly the same thing
aggregation = tfe.functionalities.AdditiveSecureAverage(
    compute_players=data_owners,
    output_receiver=model_owner)

# ... or we could have unrolled its (simplified) implementation
def aggregation(plaintext_grads):
  pond = tfe.protocol.Pond(data_owners)
  with pond:
    grads = [
        tfe.define_private_input(grad, owner)
        for grad, owner in zip(plaintext_grads, data_owners)
    ]
    aggregated_grad = tfe.add_n(grads) / len(grads)
    return tfe.reveal(aggregated_grad, model_owner)

# initialising the federated protocol with the model owner in order
# to specify where the reference weights of the model should live
# and where updates should happen
federated = tfe.protocol.FederatedLearning(model_owner, aggregation)

# use it as a context to essentially trace the model-building code to be
# executed on both model and data owners, with all variables controlled by
# the protocol
with federated:

  # creating outside the context would cause an error due to wrong locality
  # between inputs and weights
  model = tfe.keras.Sequential()
  model.add(tfe.keras.Dense())
  model.add(tfe.keras.ReLU())

  # compiling outside the context would cause an error due to wrong locality
  # between weights and weight updates
  model.compile(
      optimizer=tf.train.AdamOptimizer(0.001),
      loss='categorical_crossentropy',
      metrics=['accuracy'])

# ... alternatively, `model_fn` functions can be passed to the protocol for
# compilation, allowing e.g. different models to be run on the players
def model_fn():
  model = tfe.keras.Sequential()
  model.add(tfe.keras.Dense())
  model.add(tfe.keras.ReLU())

  model.compile(
      optimizer=tf.train.AdamOptimizer(0.001),
      loss='categorical_crossentropy',
      metrics=['accuracy'])

  return model

model = federated.compile({
    player: model_fn
    for player in [model_owner] + data_owners
})

# fitting can be done anywhere (following ordinary TF) yet the locality of
# the training data must match the data owners
model.fit(data_sources, epochs=10)

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 4
  • Comments: 11 (11 by maintainers)

Top GitHub Comments

1 reaction
mortendahl commented, Oct 2, 2019

For example [here], we would like to avoid defining the model weights manually.

Yes, fully agree.

It would be nice to kick off the training process with model.fit(data_sources, epochs=10) and see the training progress.

Following the current abstractions in the example, it may make more sense to take the data owners as input instead of the data sources.

it would be nice to have a method which takes data in NumPy format (or even TFRecord) and a number of players. The data would then be distributed evenly among these players.

This seems useful indeed, but let’s make sure to clearly mark it as something that’s for simulation use only.
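
A minimal sketch of what such a simulation-only utility could look like (the name and signature below are purely illustrative, not part of any proposed API):

import numpy as np

def split_evenly_for_simulation(x, y, num_players):
  # simulation use only: shard a single in-memory dataset across the players
  xs = np.array_split(x, num_players)
  ys = np.array_split(y, num_players)
  return list(zip(xs, ys))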

1 reaction
yanndupis commented, Oct 1, 2019

@mortendahl - happy to explore other alternatives. In general, I think it’s important that it’s easy to use and that we can support arbitrary Keras models. So maybe we just need to improve the abstraction and automate some of the steps in the example. For example [here], we would like to avoid defining the model weights manually.

It would be nice to kick off the training process with model.fit(data_sources, epochs=10) and see the training progress. Also, I think we would like to have some utils to distribute the data among several data owners.
