This CL fixes two issues:
1. In the Quantize pass, float16 quantization replaces the original op with an identical one, which causes an infinite loop for LSTM ops. This CL adds checks to avoid such situations (sketched below).
2. In the SplitMergedOperands pass, const stateful operands are duplicated, but under float16 quantization stateful operands follow a const->dequantize pattern. This CL adds duplication for that pattern as well.
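As an illustration of the first check, a minimal sketch in MLIR terms; the helper name and the exact structural comparison are assumptions, not the actual pass code:
```c++
#include "llvm/ADT/STLExtras.h"
#include "mlir/IR/Operation.h"

// Hypothetical guard: if float16 quantization would produce an op that is
// structurally identical to the original, skip the rewrite. Replacing an op
// with an identical op makes the greedy pattern driver match it again on the
// next iteration, which is the infinite loop seen with LSTM ops.
static bool IsIdenticalReplacement(mlir::Operation* original,
                                   mlir::Operation* replacement) {
  return original->getName() == replacement->getName() &&
         original->getAttrDictionary() == replacement->getAttrDictionary() &&
         llvm::equal(original->getOperands(), replacement->getOperands()) &&
         llvm::equal(original->getResultTypes(),
                     replacement->getResultTypes());
}
```
In the pattern's matchAndRewrite, returning failure() when this predicate holds leaves the op untouched and breaks the loop.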
PiperOrigin-RevId: 627826212
Imported from GitHub PR https://github.com/openxla/xla/pull/11784
Copybara import of the project:
--
fc334e480802d2686907ff720a97697e51177690 by mmakevic <Milica.Makevic@amd.com>:
Add dropout descriptor for rocm
--
52cf3ad49e70b3ecf632d8bf83283a068fbb58d0 by mmakevic <Milica.Makevic@amd.com>:
Use SetRNNDescriptor version 2
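For context, a hedged sketch of the direction of these two commits; the MIOpen argument list below follows the public docs but should be checked against the installed headers:
```c++
#include <miopen/miopen.h>

// Illustrative only: thread a dropout descriptor through the V2 RNN
// descriptor setter, which accepts dropout (the legacy
// miopenSetRNNDescriptor has no dropout parameter).
miopenStatus_t ConfigureLstmWithDropout(miopenRNNDescriptor_t rnn_desc,
                                        miopenDropoutDescriptor_t dropout_desc,
                                        int hidden_size, int num_layers) {
  return miopenSetRNNDescriptor_V2(
      rnn_desc, hidden_size, num_layers, dropout_desc, miopenRNNlinear,
      miopenRNNunidirection, miopenLSTM, miopenRNNNoBias, miopenRNNdefault,
      miopenFloat);
}
```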
Merging this change closes #11784
PiperOrigin-RevId: 627770243
- Allow injected values to be roots of epilogues.
- Allow n-ary values to be injected.
- Always generate an epilogue function if one is requested.
- Simplify the EmitEpilogue interface to hide indices and calling convention
details from the caller.
This simplifies the logic in the reduction emitter somewhat.
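Purely to illustrate the direction of the interface change, a hypothetical "after" signature; this is not the actual XLA emitter code:
```c++
#include "llvm/ADT/SmallVector.h"
#include "mlir/IR/ValueRange.h"

// Hypothetical simplified shape: the caller hands over only the values to
// inject (possibly n-ary, hence the range), and the emitter derives the
// epilogue indices and applies the calling convention internally instead of
// requiring the caller to thread both through the call.
llvm::SmallVector<mlir::Value> EmitEpilogue(int epilogue_index,
                                            mlir::ValueRange injected_values);
```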
PiperOrigin-RevId: 627770230
- Use the correct absl::BitGen.
- Fix the definition of `std::common_type` for `shlo_ref::F16`.
- Fix building `shlo_ref::F16` from values that can be converted to `float`.
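A minimal sketch of what the last two fixes amount to, with hypothetical storage and member details (the real shlo_ref::F16 differs):
```c++
#include <type_traits>

namespace shlo_ref {
struct F16 {
  // Accept any value convertible to float, not only float itself, going
  // through an explicit float conversion first.
  template <typename T,
            typename = std::enable_if_t<std::is_convertible_v<T, float>>>
  explicit F16(T value) : value_(static_cast<float>(value)) {}
  explicit operator float() const { return value_; }

 private:
  float value_;  // Stand-in storage; the real type holds half-precision bits.
};
}  // namespace shlo_ref

namespace std {
// F16 arithmetic promotes through float, so common_type with another type
// should resolve as if F16 were float.
template <typename T>
struct common_type<shlo_ref::F16, T> : common_type<float, T> {};
template <typename T>
struct common_type<T, shlo_ref::F16> : common_type<T, float> {};
template <>
struct common_type<shlo_ref::F16, shlo_ref::F16> {
  using type = shlo_ref::F16;
};
}  // namespace std
```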
PiperOrigin-RevId: 627723091
Here is the signature of the provided API:
```c++
// Converts a TensorFlow model (either from a SavedModel or an MLIR module) to a
// StableHLO MLIR module.
//
// Args:
//   input_path: The path to the input TensorFlow SavedModel or MLIR module.
//   context: The MLIR context to use for parsing or creating the MLIR module.
//   exported_model_signatures: A comma-separated list of exported model
//     signatures (functions) to convert.
//   tag_names: A comma-separated list of tag names used for loading SavedModel.
//   input_arg_shapes_str: A string representation of input argument shapes.
//     Shapes for different tensors are separated by ':', and dimension sizes
//     for the same tensor are separated by ','. For example,
//     'input-arg-shapes=1,2::1,?' expresses input arguments with shapes [1,2],
//     [] and [1,?].
//   is_input_mlir_module: If true, `input_path` is treated as an MLIR module
//     instead of a SavedModel.
//
// Returns:
//   An absl::StatusOr containing the converted StableHLO MLIR module on
//   success, or an absl::Status with an error message on failure.
absl::StatusOr<OwningOpRef<ModuleOp>> TfToStablehlo(
    absl::string_view input_path, MLIRContext* context,
    absl::string_view exported_model_signatures, absl::string_view tag_names,
    absl::string_view input_arg_shapes_str, bool is_input_mlir_module = false);
```
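For illustration, a call site might look like the following; the input path and the signature/tag values are placeholders:
```c++
absl::Status RunConversion(mlir::MLIRContext& context) {
  // Placeholder path, signature, and tags. The shapes string follows the
  // format documented above: [1,2] for the first argument, [] for the
  // second, and [1,?] for the third.
  absl::StatusOr<mlir::OwningOpRef<mlir::ModuleOp>> module =
      TfToStablehlo("/path/to/saved_model", &context,
                    /*exported_model_signatures=*/"serving_default",
                    /*tag_names=*/"serve",
                    /*input_arg_shapes_str=*/"1,2::1,?");
  if (!module.ok()) return module.status();
  // Use *module ...
  return absl::OkStatus();
}
```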
PiperOrigin-RevId: 627698652
Imported from GitHub PR https://github.com/openxla/xla/pull/11313
Copybara import of the project:
--
98ee3b456f62a9404bfaf976ab9a2d25415293cc by Harsha HS <harsha.havanurshamsundara@amd.com>:
Distinguish order of padding operation between CUDA and ROCm
--
6f318827a4f2f31938349e24d292966fc6b95b97 by Harsha HS <harsha.havanurshamsundara@amd.com>:
Use CHECK-DAG to check for low and high padding mlir
--
8a3da357c20a865dc91f090d6e600ade492c3e7f by Harsha HS <harsha.havanurshamsundara@amd.com>:
Add -no_gpu tag to the test, so that it is picked by CI
--
f13a4003aedb5a005c3b0668c474da30b46fe5e2 by Harsha HS <harsha.havanurshamsundara@amd.com>:
Revert "Add -no_gpu tag to the test, so that it is picked by CI"
This reverts commit 8a3da357c20a865dc91f090d6e600ade492c3e7f.
--
9fca062f688dca198e232b1cfcd04eba3a8d99d2 by Harsha HS <harsha.havanurshamsundara@amd.com>:
Remove tags as it is covered by CPU CI
Merging this change closes #11313
PiperOrigin-RevId: 627638965
Imported from GitHub PR https://github.com/openxla/xla/pull/11761
Currently `--xla_gpu_enable_nccl_comm_splitting` is a no-op unless running in single-process mode. This PR allows it to work in multiprocess mode by removing the IsLocal check and fixing the key in the rank -> comm map, which was causing the following error:
```
7: Communicator for rank 1 not found in a NCCL clique devices=[3,7]; stream=0
5: E0403 15:37:24.104311 3871038 pjrt_stream_executor_client.cc:2809] Execution of replica 0 failed: INTERNAL: Communicator for rank 1 not found in a NCCL clique devices=[1,5]; stream=0
6: E0403 15:37:24.104477 3870182 pjrt_stream_executor_client.cc:2809] Execution of replica 0 failed: INTERNAL: Communicator for rank 1 not found in a NCCL clique devices=[2,6]; stream=0
4: E0403 15:37:24.105872 3871021 pjrt_stream_executor_client.cc:2809] Execution of replica 0 failed: INTERNAL: Communicator for rank 1 not found in a NCCL clique devices=[0,4]; stream=0
```
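As a loose illustration of the keying fix (all names hypothetical; the actual XLA types differ): the communicator map must be looked up by the rank within the clique, not by a global rank, or lookups like those above fail even though the communicator exists.
```c++
#include <map>
#include <optional>

struct NcclComm {};  // Stand-in for the real communicator handle.

// Hypothetical clique-local table: keys are ranks within the clique
// (0..clique_size-1). With a two-device clique such as devices=[3,7],
// inserting under the global rank (e.g. 7) makes the later lookup of
// clique-local rank 1 fail with "Communicator for rank 1 not found".
std::map<int, NcclComm> rank_to_comm;

void AddComm(int rank_in_clique, NcclComm comm) {
  rank_to_comm[rank_in_clique] = comm;  // Fixed key; was the global rank.
}

std::optional<NcclComm> FindComm(int rank_in_clique) {
  auto it = rank_to_comm.find(rank_in_clique);
  if (it == rank_to_comm.end()) return std::nullopt;
  return it->second;
}
```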
Copybara import of the project:
--
c77ed08f77e252a5068a23d02555ffa765913d42 by Trevor Morris <tmorris@nvidia.com>:
Enable nccl comm split for multiprocess mode
Merging this change closes #11761
PiperOrigin-RevId: 627626547
Imported from GitHub PR https://github.com/openxla/xla/pull/11603
Configure the PJRT GPU plugin so it can be built for ROCm as well.
Copybara import of the project:
--
a1c8bcb4be41dc56899118d44bf604a2723a3c56 by mmakevic <Milica.Makevic@amd.com>:
Configure pjrt gpu plugin for rocm
--
9ca24357f52c53febb474c798c67d0b8dda586ee by mmakevic <Milica.Makevic@amd.com>:
Change platform name defining
Merging this change closes #11603
PiperOrigin-RevId: 627625764
Before this CL, `hlo_sharding_util::ReshapeSharding` could handle the cases where the source and target shapes can be transformed into each other by merging and splitting dimensions, and it returned `std::nullopt` if a transpose was needed between the source and target shapes.
This CL extracts `gcd(source_sharding_tile_size, target_shape)` when `source_shape % source_sharding_tile_size == 0` in the major dimensions. We also skip the source dim if `source_sharding_tile_size` is 1. An example is shown below, followed by an illustrative sketch.
```
input_shape: [6, 2, 5]
output_shape: [4, 3, 5]
input_sharding: {devices=[2, 1, 5]<=[10]}
output_sharding: {devices=[2, 1, 5]<=[10]}
```
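As a sketch of the dimension-wise rule (illustrative only, not the actual `ReshapeSharding` code), the tile count that can be carried over to a target dimension is the gcd of the source tile count and the target dimension size:
```c++
#include <cstdint>
#include <numeric>

// Illustrative only. For the example above: source dim 0 has size 6 with
// tile count 2 (6 % 2 == 0), target dim 0 has size 4, and std::gcd(2, 4)
// == 2, so the 2-way sharding carries over to the output.
int64_t PropagatedTileCount(int64_t source_dim_size, int64_t source_tile_count,
                            int64_t target_dim_size) {
  if (source_tile_count == 1) return 1;  // Skip the dim; nothing to propagate.
  if (source_dim_size % source_tile_count != 0) return 1;
  return std::gcd(source_tile_count, target_dim_size);
}
```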
PiperOrigin-RevId: 627592738