Make it an option to use TransformerEngine activation function in FFN block #1233
base: main
Conversation
Signed-off-by: Guyue Huang <[email protected]>
* Add activation_func field in MLPSubmodules
* In extensions/transformer_engine.py, add TEActivationOp, TEActivationOpFp8, and TERowParallelLinearOp, which all have type te.ops.Sequential(te.ops.<OP_NAME>)
* Add specs of the new classes to get_layer_specs.py when instantiating mlp

Signed-off-by: Guyue Huang <[email protected]>
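As a hedged illustration of the pattern this commit describes (wrapping a single TE op in te.ops.Sequential and exposing it through MLPSubmodules.activation_func), here is a minimal sketch. The factory-function shape and the GELU op name are assumptions for illustration, not the PR's actual code:

```python
# Hedged sketch, not the PR's implementation. Assumes transformer_engine.pytorch.ops
# exposes Sequential plus an activation op such as GELU (the op name is an assumption).
import transformer_engine.pytorch.ops as te_ops


def build_te_activation_op(op_cls=None):
    """Wrap a single TE activation op in te_ops.Sequential, mirroring the
    te.ops.Sequential(te.ops.<OP_NAME>) shape described in the commit."""
    op_cls = op_cls if op_cls is not None else te_ops.GELU  # assumed op name
    return te_ops.Sequential(op_cls())


# Hypothetical use inside a layer spec (MLPSubmodules.activation_func is the field added by the PR):
# submodules = MLPSubmodules(..., activation_func=build_te_activation_op())
```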
    'transformer engine. Consider setting use_te_activation_func=False')
    return te_ops.Sequential(instance)


class TEActivationOpFp8:
Why do we need this?
In the case of an fp8 recipe, the fp8 quantization should be handled by the linear module, right?
Also, te_ops.Quantize doesn't cast using the scale_factor from the amax history, right?
This module is what I designed to enable cast fusion. However, I have decided to keep this PR focused on enabling the TE activation and to enable cast fusion in a new PR, so we can discuss the API design once I create that PR. I have removed this class.
@@ -523,6 +575,181 @@ def sharded_state_dict(self, prefix='', sharded_offsets=(), metadata=None):
    )


class TERowParallelLinearOp(te_ops.Sequential):
Why do we need the TERowParallelLinearOp module?
tl;dr: this module is no longer necessary due to a recent refactor on the TE side. I have removed it.
This module was designed to make RowParallelLinear a TE op. Previously there were two code paths implementing row-parallel linear: one was the legacy path and the other was the TE operation-based API. The latter is what I needed. However, a recent refactor of TE has unified the two code paths and there is no longer a legacy layer, so my wrapper class here is not necessary. I have removed it.
Conflicts:
    megatron/core/transformer/mlp.py
    megatron/core/transformer/transformer_config.py
* Remove wrapping the TE activation with TE Sequential; directly use the TE op class
* Remove the TE activation class dedicated to fp8; we will enable cast fusion in a new PR
* Remove the TE linear op class, because TE has refactored its linear class to use ops, so mcore doesn't need to
* Fix bug
* Remove unused file megatron/core/transformer/te_activation_func_utils.py

Signed-off-by: Guyue Huang <[email protected]>
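A hedged before/after sketch of the simplification described in this commit: instead of wrapping the activation in te_ops.Sequential, the spec references the TE op class directly. The GELU op name is an assumption for illustration; only the wrap-vs-direct-use distinction comes from the commit message:

```python
# Hedged before/after sketch of this commit's simplification; not the PR's actual diff.
# Assumes transformer_engine.pytorch.ops provides a GELU op (the op name is an assumption);
# the activation_func field on MLPSubmodules is the one added by this PR.
import transformer_engine.pytorch.ops as te_ops


def activation_spec_before():
    # Old approach: wrap the op instance in a TE Sequential container.
    return te_ops.Sequential(te_ops.GELU())


def activation_spec_after():
    # New approach: hand the TE op class to the spec directly; the MLP
    # instantiates it itself, so no extra wrapper class is needed.
    return te_ops.GELU
```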
[draft]
Add a new config parameter use_te_activation_func to control whether we want to use TE custom kernels for the activation function in the MLP.
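A hedged sketch of how such a flag might gate the activation choice when the layer spec is built; the selection helper, the PyTorch fallback, and the te_ops.GELU op name are illustrative assumptions, with only use_te_activation_func and the TE-op idea coming from the PR description:

```python
# Hedged sketch, not the PR's implementation. use_te_activation_func is the config
# parameter described in the PR; the helper function, the torch fallback, and the
# te_ops.GELU op name are assumptions for illustration.
import torch.nn.functional as F
import transformer_engine.pytorch.ops as te_ops


def select_activation_func(use_te_activation_func: bool):
    """Pick the MLP activation: a TE custom-kernel op when the flag is set,
    otherwise the usual framework-level activation function."""
    if use_te_activation_func:
        return te_ops.GELU  # assumed TE op name; kernel provided by TransformerEngine
    return F.gelu  # standard PyTorch activation as the fallback
```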