I/O에서 발표된 Google AI Edge 관련 발표에 대해 자세히 알아보세요.

이 페이지는 Cloud Translation API를 통해 번역되었습니다.

TensorFlow 작업 퓨전

개요

이 페이지에서는 TensorFlow의 복합 작업을 TensorFlow Lite의 통합 작업으로 변환하는 데 필요한 설계와 단계를 설명합니다. 이 인프라는 범용이며 TensorFlow의 모든 복합 작업을 TensorFlow Lite의 상응하는 통합 작업으로 변환할 수 있도록 지원합니다.

이 인프라의 사용 예는 여기에 설명된 대로 TensorFlow Lite에 대한 TensorFlow RNN 작업을 융합하는 것입니다.

통합 작업이란 무엇인가요?

TensorFlow 작업은 기본 작업(예: tf.add)이거나 다른 기본 작업(예: tf.einsum)으로 구성될 수 있습니다. 기본 작업은 TensorFlow 그래프에서 단일 노드로 표시되는 반면 복합 작업은 TensorFlow 그래프의 노드 모음입니다. 복합 작업을 실행하는 것은 각각의 구성요소 기본 작업을 실행하는 것과 같습니다.

융합 연산은 각 기본 연산에서 실행된 모든 계산을 해당 복합 연산 내에 포함하는 단일 연산에 해당합니다.

통합 작업의 이점

융합 연산은 전체 계산을 최적화하고 메모리 공간을 줄여 기본 커널 구현의 성능을 극대화하기 위해 존재합니다. 이는 특히 지연 시간이 짧은 추론 워크로드와 리소스가 제한된 모바일 플랫폼에 매우 유용합니다.

또한 융합 연산은 양자화와 같은 복잡한 변환을 정의하기 위한 상위 수준 인터페이스를 제공합니다. 양자화는 그렇게 하지 않으면 실행 불가능하거나 더 세분화된 수준에서는 매우 어려웠습니다.

TensorFlow Lite에는 위에서 설명한 이유로 인해 융합된 작업의 인스턴스가 많이 있습니다. 이러한 통합 작업은 일반적으로 소스 TensorFlow 프로그램의 복합 작업에 해당합니다. TensorFlow Lite에서 단일 융합 작업으로 구현되는 TensorFlow의 복합 연산의 예로는 단방향 및 양방향 시퀀스 LSTM, 컨볼루션 (conv2d, 고품질 덧셈, relu), 완전 연결 (matmul, 바이어스 덧셈, relu)과 같은 다양한 RNN 작업이 있습니다. TensorFlow Lite에서 LSTM 양자화는 현재 합성된 LSTM 연산에서만 구현됩니다.

통합 작업의 문제점

TensorFlow Lite의 복합 작업을 TensorFlow에서 통합 작업으로 변환하는 것은 어려운 문제입니다. 이유는 다음과 같습니다.

복합 작업은 TensorFlow 그래프에서 잘 정의된 경계가 없는 기본 작업 집합으로 표현됩니다. 이러한 복합 연산에 해당하는 하위 그래프를 식별 (예: 패턴 일치)하는 것은 매우 어려울 수 있습니다.
통합된 TensorFlow Lite 작업을 타겟팅하는 TensorFlow 구현이 두 개 이상 있을 수 있습니다. 예를 들어 TensorFlow에는 많은 LSTM 구현 (Keras, Babelfish/lingvo 등)이 있으며 각각은 서로 다른 기본 연산으로 구성되지만 모두 TensorFlow Lite에서 동일한 통합 LSTM 작업으로 변환될 수 있습니다.

따라서 융합된 작업의 변환은 상당히 어려운 것으로 입증되었습니다.

복합 작업에서 TFLite 맞춤 작업으로 변환 (권장)

`tf.function`에 복합 작업 래핑

대부분의 경우 모델의 일부를 TFLite에서 단일 작업에 매핑할 수 있습니다. 이렇게 하면 특정 작업에 최적화된 구현을 작성할 때 성능에 도움이 될 수 있습니다. TFLite에서 통합 작업을 만들려면 그래프에서 통합 작업을 나타내는 부분을 식별하고 값이 true인 속성 값이 tfl_fusable_op인 tf.function에 대한 'experimental_implementations' 속성이 있는 tf.function으로 래핑합니다. 맞춤 작업에서 속성을 사용하면 동일한 'experimental_implementations'의 일부로 속성을 전달합니다.

예:

def get_implements_signature():
  implements_signature = [
    # 'name' will be used as a name for the operation.
    'name: "my_custom_fused_op"',
    # attr "tfl_fusable_op" is required to be set with true value.
    'attr {key: "tfl_fusable_op" value { b: true } }',
    # Example attribute "example_option" that the op accepts.
    'attr {key: "example_option" value { i: %d } }' % 10
  ]
  return ' '.join(implements_signature)

@tf.function(experimental_implements=get_implements_signature())
def my_custom_fused_op(input_1, input_2):
  # An empty function that represents pre/post processing example that
  # is not represented as part of the Tensorflow graph.
  output_1 = tf.constant(0.0, dtype=tf.float32, name='first_output')
  output_2 = tf.constant(0.0, dtype=tf.float32, name='second_output')
  return output_1, output_2

class TestModel(tf.Module):
  def __init__(self):
    super(TestModel, self).__init__()
    self.conv_1 = tf.keras.layers.Conv2D(filters=1, kernel_size=(3, 3))
    self.conv_2 = tf.keras.layers.Conv2D(filters=1, kernel_size=(3, 3))

  @tf.function(input_signature=[
      tf.TensorSpec(shape=[1, 28, 28, 3], dtype=tf.float32),
      tf.TensorSpec(shape=[1, 28, 28, 3], dtype=tf.float32),
  ])
  def simple_eval(self, input_a, input_b):
    return my_custom_fused_op(self.conv_1(input_a), self.conv_2(input_b))

tfl_fusable_op 속성이 이미 이를 암시하므로 변환기에서 allow_custom_ops를 설정할 필요는 없습니다.

맞춤 작업을 구현하고 TFLite 인터프리터로 등록

통합 작업을 TFLite 맞춤 작업으로 구현합니다. instructions를 참조하세요.

작업을 등록할 때 사용하는 이름은 구현 서명의 name 속성에 지정된 이름과 유사해야 합니다.

예시에 있는 연산의 예는

  TfLiteRegistration reg = {};
  // This name must match the name specified in the implements signature.
  static constexpr char kOpName[] = "my_custom_fused_op";
  reg.custom_name = kOpName;
  reg.prepare = [](TfLiteContext* context, TfLiteNode* node) -> TfLiteStatus {
    // Add your code.
    return kTfLiteOk;
  };
  reg.invoke = [](TfLiteContext* context, TfLiteNode* node) -> TfLiteStatus {
    // Add your code.
    return kTfLiteOk;
  };
  reg.builtin_code = kTfLiteCustom;
  resolver->AddCustom(kOpName, &reg);

복합 작업에서 통합 작업으로 변환 (Advanced)

TensorFlow 복합 작업을 TensorFlow Lite 통합 작업으로 변환하기 위한 전체 아키텍처는 다음과 같습니다.

`tf.function`에 복합 작업 래핑

TensorFlow 모델 소스 코드에서 experimental_implements 함수 주석을 사용하여 복합 작업을 식별하고 tf.function으로 추상화합니다. 임베딩 조회의 예를 참조하세요. 함수는 인터페이스를 정의하고 인수는 변환 로직을 구현하는 데 사용해야 합니다.

전환 코드 작성

변환 코드는 implements 주석을 사용하여 함수의 인터페이스별로 작성됩니다. 임베딩 조회에 대한 퓨전 예를 참고하세요. 개념적으로 변환 코드는 이 인터페이스의 복합 구현을 통합 구현으로 대체합니다.

준비 복합 함수 패스에서 변환 코드의 플러그인을 사용합니다.

고급 용도에서는 통합 연산의 피연산자를 파생하기 위해 복합 연산 피연산자의 복잡한 변환을 구현할 수 있습니다. 예를 들어 Keras LSTM을 참고하세요.

TensorFlow Lite로 변환

TFLiteConverter.from_saved_model API를 사용하여 TensorFlow Lite로 변환합니다.

자세히 들여다보기

이제 TensorFlow Lite에서 통합 작업으로의 변환에 관한 전반적인 설계를 개략적으로 설명합니다.

TensorFlow에서 작업 작성

experimental_implements 함수 속성과 함께 tf.function을 사용하면 사용자는 TensorFlow 기본 연산을 사용하여 새 작업을 명시적으로 작성하고 결과 복합 연산이 구현할 인터페이스를 지정할 수 있습니다. 이는 다음과 같은 이점을 제공하므로 매우 유용합니다.

기본 TensorFlow 그래프의 복합 작업을 위한 잘 정의된 경계
이 작업이 구현하는 인터페이스를 명시적으로 지정합니다. tf.function의 인수는 이 인터페이스의 인수에 해당합니다.

예를 들어 임베딩 조회를 구현하도록 정의된 복합 작업을 가정해 보겠습니다. 이는 TensorFlow Lite의 융합 작업에 매핑됩니다.

  @tf.function(
        experimental_implements="embedding_lookup")
    def EmbFprop(embs, ids_vec):
      """Embedding forward prop.

      Effectively, it computes:
        num = size of ids_vec
        rets = zeros([num, embedding dim])
        for i in range(num):
          rets[i, :] = embs[ids_vec[i], :]
        return rets

      Args:
        embs: The embedding matrix.
        ids_vec: A vector of int32 embedding ids.

      Returns:
        The result of embedding lookups. A matrix of shape
        [num ids in ids_vec, embedding dims].
      """
      num = tf.shape(ids_vec)[0]
      rets = inplace_ops.empty([num] + emb_shape_suf, py_utils.FPropDtype(p))

      def EmbFpropLoop(i, embs, ids_vec, rets):
        # row_id = ids_vec[i]
        row_id = tf.gather(ids_vec, i)
        # row = embs[row_id]
        row = tf.reshape(tf.gather(embs, row_id), [1] + emb_shape_suf)
        # rets[i] = row
        rets = inplace_ops.alias_inplace_update(rets, [i], row)
        return embs, ids_vec, rets

      _, _, rets = functional_ops.For(
          start=0,
          limit=num,
          delta=1,
          inputs=[embs, ids_vec, rets],
          body=EmbFpropLoop,
          rewrite_with_while=compiled)
      if len(weight_shape) > 2:
        rets = tf.reshape(rets, [num, symbolic.ToStatic(p.embedding_dim)])
      return rets

위의 설명처럼 tf.function을 통해 모델이 복합 작업을 사용하도록 하면 이러한 작업을 통합 TensorFlow Lite 작업으로 식별하고 변환하는 일반 인프라를 빌드할 수 있습니다.

TensorFlow Lite 변환기 확장

올해 초에 출시된 TensorFlow Lite 변환기는 TensorFlow 모델을 그래프로 가져오는 것만 지원했으며 모든 변수를 상응하는 상수 값으로 대체했습니다. 이러한 그래프에는 모든 함수가 인라인으로 표시되어 변수가 상수로 변환될 수 있으므로 작업 융합에서는 작동하지 않습니다.

변환 과정에서 experimental_implements 기능과 함께 tf.function을 활용하려면 변환 프로세스의 후반부까지 함수를 보존해야 합니다.

따라서 복합 작업 퓨전 사용 사례를 지원하기 위해 변환기에서 TensorFlow 모델을 가져오고 변환하는 새로운 워크플로를 구현했습니다. 구체적으로, 추가된 새로운 기능은 다음과 같습니다.

이를 통해 함수 인라인 및 변수 고정 전에 복합 작업을 나타내는 함수를 사용하여 작업 융합을 수행할 수 있습니다.

작업 퓨전 구현

작업 퓨전 패스를 더 자세히 살펴보겠습니다. 이 패스는 다음을 실행합니다.

MLIR 모듈의 모든 함수를 순환합니다.
함수에 tf._implementations 속성이 있으면 속성 값에 따라 적절한 작업 퓨전 유틸리티를 호출합니다.
연산 퓨전 유틸리티는 함수의 피연산자 및 속성 (변환을 위한 인터페이스 역할을 함)에서 작동하고 함수의 본문을 융합 연산이 포함된 이에 상응하는 함수 본문으로 바꿉니다.
대부분의 경우 교체된 본문에는 통합 작업 이외의 작업이 포함됩니다. 이는 융합된 연산의 피연산자를 얻기 위해 함수의 피연산자에 대한 일부 정적 변환에 해당합니다. 이러한 계산은 모두 일정한 방식으로 접을 수 있으므로 융합된 연산만 존재하는 내보낸 플랫 버퍼에는 존재하지 않습니다.

다음은 기본 워크플로를 보여주는 패스의 코드 스니펫입니다.

void PrepareCompositeFunctionsPass::ConvertTFImplements(FuncOp func,
                                                        StringAttr attr) {
  if (attr.getValue() == "embedding_lookup") {
    func.eraseBody();
    func.addEntryBlock();
    // Convert the composite embedding_lookup function body to a
    // TFLite fused embedding_lookup op.
    ConvertEmbeddedLookupFunc convert_embedded_lookup(func);
    if (failed(convert_embedded_lookup.VerifySignature())) {
      return signalPassFailure();
    }
    convert_embedded_lookup.RewriteFunc();
  } else if (attr.getValue() == mlir::TFL::kKerasLstm) {
     func.eraseBody();
     func.addEntryBlock();
     OpBuilder builder(func.getBody());
     if (failed(ConvertKerasLSTMLayer(func, &builder))) {
       return signalPassFailure();
     }
  } else if (.....) /* Other fusions can plug in here */
}

다음은 함수를 변환 인터페이스로 활용하여 TensorFlow Lite의 통합 작업에 이 복합 작업을 매핑하는 방법을 보여주는 코드 스니펫입니다.

void RewriteFunc() {
    Value lookup = func_.getArgument(1);
    Value value = func_.getArgument(0);
    auto output_type = func_.getType().getResult(0);

    OpBuilder builder(func_.getBody());
    auto op = builder.create<mlir::TFL::EmbeddingLookupOp>(
        func_.getLoc(), output_type, lookup, value);

    builder.create<mlir::ReturnOp>(func_.getLoc(), op.getResult());
  }

TensorFlow 작업 퓨전

개요

통합 작업이란 무엇인가요?

통합 작업의 이점

통합 작업의 문제점

복합 작업에서 TFLite 맞춤 작업으로 변환 (권장)

tf.function에 복합 작업 래핑

맞춤 작업을 구현하고 TFLite 인터프리터로 등록

복합 작업에서 통합 작업으로 변환 (Advanced)

tf.function에 복합 작업 래핑

전환 코드 작성

TensorFlow Lite로 변환

자세히 들여다보기

TensorFlow에서 작업 작성

TensorFlow Lite 변환기 확장

작업 퓨전 구현

`tf.function`에 복합 작업 래핑

`tf.function`에 복합 작업 래핑