Commit Graph

165835 Commits

Author SHA1 Message Date
Jacques Pienaar
3d599f3ad5 Prefetch LLVM c9549e10e9ea70428ada80a34d15afeaf5710b2d
Needed for building the JAX Python wheels.

PiperOrigin-RevId: 644150860
2024-06-17 15:56:34 -07:00
Kyle Lucke
1c0ad8a6c1 Remove unused TpuExecutor functions.
PiperOrigin-RevId: 644145033
2024-06-17 15:27:37 -07:00
Dimitar (Mitko) Asenov
6029d13a58 [XLA:GPU] Pass the correct stream when running the Cub Sort kernel.
PiperOrigin-RevId: 644136103
2024-06-17 15:20:32 -07:00
A. Unique TensorFlower
5252bc9129 Create a new LegalizeMlirToHloReproducer proto and dump it instead of just the MLIR module.
PiperOrigin-RevId: 644130713
2024-06-17 15:13:29 -07:00
Kyle Lucke
ef4eb00fd1 Stop using xla/statusor.h now that it just contains an alias for absl::StatusOr.
In some situations, this meant also changing unrelated files to directly include tsl/platform/statusor.h to get the definitions for TF_ASSIGN_OR_RETURN, etc., where they were getting transitively included for free.
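
A minimal sketch of the kind of change involved: the include path and the `TF_ASSIGN_OR_RETURN` macro are real, while the helper functions below are illustrative stand-ins.

```
#include "absl/status/status.h"
#include "absl/status/statusor.h"
#include "tsl/platform/statusor.h"  // now included directly for TF_ASSIGN_OR_RETURN

// Illustrative helper returning a StatusOr value.
absl::StatusOr<int> ComputeAnswer() { return 42; }

absl::Status UseAnswer() {
  // Previously this macro was often available only because xla/statusor.h
  // was pulled in transitively somewhere up the include chain.
  TF_ASSIGN_OR_RETURN(int answer, ComputeAnswer());
  (void)answer;
  return absl::OkStatus();
}
```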

PiperOrigin-RevId: 644129488
2024-06-17 14:28:28 -07:00
A. Unique TensorFlower
636f805ea0 Modify boot_id per LocalTopology when using mock NCCL
A device's slice_index depends on its LocalTopologyProto's boot_id. Currently, all devices in a mocked GPU client will have the same slice_index due to the boot_id being identical, which breaks hybrid mesh construction in AOT compilation.

This change sets a distinct boot_id for each LocalTopology.
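
A hedged sketch of the fix: the stand-in struct and setter below are illustrative only, not the real LocalTopologyProto API.

```
#include <string>
#include <vector>

#include "absl/strings/str_cat.h"

// Stand-in for the real proto, which lives in XLA's distributed runtime.
struct LocalTopologyProto {
  std::string boot_id;
  void set_boot_id(std::string id) { boot_id = std::move(id); }
};

// Give each mocked LocalTopology a distinct boot_id so that its devices
// resolve to a distinct slice_index instead of all sharing one.
void AssignDistinctBootIds(std::vector<LocalTopologyProto>& topologies) {
  for (int i = 0; i < static_cast<int>(topologies.size()); ++i) {
    topologies[i].set_boot_id(absl::StrCat("mock_boot_id_", i));
  }
}
```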

PiperOrigin-RevId: 644118436
2024-06-17 13:56:17 -07:00
Greg Olechwierowicz
7a70848100 [XLA:GPU] Print PGLE profile when found.
PiperOrigin-RevId: 644112277
2024-06-17 13:50:58 -07:00
Quoc Truong
4a4d5519f9 Update GCS Staging artifacts for TensorFlow upload.
PiperOrigin-RevId: 644108046
2024-06-17 13:45:36 -07:00
David Dunleavy
f599e3f3a3 Don't use layering_check as it doesn't work with XLA's toolchain
Set `bes_upload_mode` to `fully_async` to prevent errors

Use `_DEFAULT_BAZEL_OPTIONS` for JAX builds

PiperOrigin-RevId: 644106310
2024-06-17 13:28:51 -07:00
Adam Banaś
87e0544384 [xla:cpu] Add more convolution benchmarks
This CL adds more convolution benchmarks. The benchmarks are based on shapes from XLA convolution tests, TF convolution benchmarks, and Eigen spatial convolution benchmarks.

PiperOrigin-RevId: 644104994
2024-06-17 13:19:52 -07:00
Kyle Lucke
d450b82f4d Stop using xla/statusor.h now that it just contains an alias for absl::StatusOr.
In some situations, this meant also changing unrelated files to directly include tsl/platform/statusor.h to get the definitions for TF_ASSIGN_OR_RETURN, etc., where they were getting transitively included for free.

PiperOrigin-RevId: 644097216
2024-06-17 12:59:33 -07:00
A. Unique TensorFlower
c41df4594a Implements the FullyReplicatedShard method for the BasicStringArray class.
PiperOrigin-RevId: 644072353
2024-06-17 12:51:32 -07:00
A. Unique TensorFlower
64f503b9fd Remove dependency of tensorflow/lite/string_util
PiperOrigin-RevId: 644065618
2024-06-17 12:16:52 -07:00
Arturo Schmidt
0f2d21520a Replace the translate version of ConvertMlirToGraph (with control ret nodes) with the tf2xla version and remove the translate version. Functionality is unchanged.
PiperOrigin-RevId: 644065323
2024-06-17 12:11:37 -07:00
Quentin Khan
f4f2393888 Register StableHLO composite in the built-in operation resolver.
PiperOrigin-RevId: 644060676
2024-06-17 12:01:43 -07:00
A. Unique TensorFlower
25379bb66b Move TF dependence from TFL tflite_copts
PiperOrigin-RevId: 644050058
2024-06-17 11:52:05 -07:00
Junwhan Ahn
b2eda97caf Replace the use of xla::ifrt::Array::Reshard() in JAX Python binding with xla::ifrt::Client::CopyArrays()
The IFRT API now distinguishes resharding from copying, so this CL reflects that semantic change in the JAX Python binding. Since pjit input sharding is relatively easy to batch, it was also rewritten to leverage the batched `CopyArrays` API, as sketched below.
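
A minimal sketch of the batching idea, using stand-in types rather than the real xla::ifrt signatures:

```
#include <map>
#include <utility>
#include <vector>

using Array = int;      // stand-in for an IFRT array handle
using DeviceSet = int;  // stand-in for a destination device list

// Group per-argument transfers by destination so each destination gets
// one batched CopyArrays-style call instead of one call per array.
std::map<DeviceSet, std::vector<Array>> GroupByDestination(
    const std::vector<std::pair<Array, DeviceSet>>& transfers) {
  std::map<DeviceSet, std::vector<Array>> batches;
  for (const auto& [array, dst] : transfers) {
    batches[dst].push_back(array);
  }
  return batches;
}
```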

PiperOrigin-RevId: 644048754
2024-06-17 11:43:36 -07:00
zoranjovanovic-ns
e6e20850e2 PR #13842: [ROCm] fixed build due to https://github.com/openxla/xla/commit/19c11baa83f31e25a3f841cf41fa47a53e8ca161
Imported from GitHub PR https://github.com/openxla/xla/pull/13842

@xla-rotation
https://github.com/openxla/xla/blob/main/xla/service/gpu/ir_emitter_triton_rocm.cc was partially updated by that change, but it was left missing an include directive, which broke the ROCm build.
Copybara import of the project:

--
710f0acd1cb36381938a0b0322fc39d5f983f841 by Zoran Jovanovic <zjovanov@amd.com>:

[ROCm] fixed build due to 19c11baa83

Merging this change closes #13842

PiperOrigin-RevId: 644048524
2024-06-17 11:30:29 -07:00
A. Unique TensorFlower
59f14f10dd [Multi-host GPU] Build GpuTopology only by device ids when the topology is asymmetric.
PiperOrigin-RevId: 644048385
2024-06-17 11:22:13 -07:00
A. Unique TensorFlower
076a07d11c Remove unneeded dependency from quantize_weights.
PiperOrigin-RevId: 644043376
2024-06-17 11:14:01 -07:00
A. Unique TensorFlower
c8eac773e6 Remove unneeded dependency from tfl_quantizer.
PiperOrigin-RevId: 644043255
2024-06-17 10:45:59 -07:00
A. Unique TensorFlower
5ee3db654b Remove unneeded header from mlir_tflite_runner.
PiperOrigin-RevId: 644042893
2024-06-17 10:40:18 -07:00
A. Unique TensorFlower
8acac2b81a Remove unneeded dependency from quantize_model.
PiperOrigin-RevId: 644042409
2024-06-17 10:34:39 -07:00
A. Unique TensorFlower
d4669c0734 Remove unneeded dependency from sparsify_model.
PiperOrigin-RevId: 644041988
2024-06-17 10:28:40 -07:00
Arturo Schmidt
0f76d9fae8 Migrate tensorflow::ConvertMlirFunctionToFunctionLibraryDef to tensorflow::tf2xla::v2::ConvertMlirFunctionToFunctionLibraryDef. Functionality is unchanged; only the location of the code differs. The previous location is deleted.
PiperOrigin-RevId: 644041897
2024-06-17 10:08:57 -07:00
David Dunleavy
7557d352d0 Go back to old continuous build until L4 RBE is ready
PiperOrigin-RevId: 644035693
2024-06-17 10:01:14 -07:00
Penporn Koanantakool
a3d43ee9bd [oneDNN] Remove unused variable.
PiperOrigin-RevId: 644034495
2024-06-17 09:42:12 -07:00
Matthias Kramm
8b5f1d5f8a Add XlaSplitND and XlaConcatND ops to tf_generated_ops.td.
PiperOrigin-RevId: 644021276
2024-06-17 09:15:24 -07:00
Emilio Cota
337286957f [xla] add missing includes for absl::StrCat
PiperOrigin-RevId: 644016440
2024-06-17 09:09:52 -07:00
George Karpenkov
1cbaea7c4f [XLA] [NFC] Remove dead code
PiperOrigin-RevId: 644012964
2024-06-17 09:03:55 -07:00
Shraiysh
67dc709e2e PR #13190: Add pipelined while loop annotator
Imported from GitHub PR https://github.com/openxla/xla/pull/13190

This patch recognises pipelined while loops via the rotate-right pattern: it
detects a rotate-right on sharded inputs and labels the surrounding while
loop as pipelined. Because this is an unsafe optimization, it is hidden
behind a debug flag so it won't be triggered unexpectedly in TPU pipelines
or other GPU pipelines.

Copybara import of the project:

--
b643b17a5d25d46e838a37fe87a4134b2cc512aa by Shraiysh Vaishay <svaishay@nvidia.com>:

Add pipelined while loop annotator

This patch recognises pipelined while loops via the rotate-right pattern: it
detects a rotate-right on sharded inputs and labels the surrounding while
loop as pipelined. Because this is an unsafe optimization, it is hidden
behind a debug flag so it won't be triggered unexpectedly in TPU pipelines
or other GPU pipelines.

Merging this change closes #13190

PiperOrigin-RevId: 644012174
2024-06-17 08:57:54 -07:00
Emilio Cota
9ec6201b4f [tflite] add missing include for absl::StrCat
PiperOrigin-RevId: 644006957
2024-06-17 08:16:43 -07:00
pemeliya
3e0c870fe9 PR #13462: [ROCM][NFC] gpublas-lt refactoring after adding workspace and scratch allocator
Imported from GitHub PR https://github.com/openxla/xla/pull/13462

This PR https://github.com/openxla/xla/pull/11514 added workspace allocation to cublas-lt. In doing so, it duplicated the implementation of a number of functions in gpu/cu/hipblas-lt, i.e. now we have:

```
DoMatmul(..., std::optional<DeviceMemoryBase> workspace)
DoMatmul(..., std::optional<ScratchAllocator*> scratch_allocator)
DoMatmul(..., std::optional<DeviceMemoryBase> workspace, std::optional<ScratchAllocator*> scratch_allocator)
```
and the same holds for `ExecuteOnStream`. This makes the gpublas_lt interface barely readable. The first two functions outlined above just forward to the third, most generic one, so there is no need to implement them inside the derived classes (hip_blas_lt.h and cuda_blas_lt.h); instead, the forwarding can be handled in the gpu_blas_lt.h interface, as sketched below.
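
A minimal sketch of the proposed forwarding arrangement, with stand-in types and simplified signatures:

```
#include <optional>

// Stand-ins for the real stream_executor types.
struct DeviceMemoryBase {};
class ScratchAllocator {};

class GpuBlasLtInterface {
 public:
  virtual ~GpuBlasLtInterface() = default;

  // Thin, non-virtual overloads live once in the interface and simply
  // forward to the generic entry point below.
  void DoMatmul(std::optional<DeviceMemoryBase> workspace) {
    DoMatmul(workspace, std::nullopt);
  }
  void DoMatmul(std::optional<ScratchAllocator*> scratch_allocator) {
    DoMatmul(std::nullopt, scratch_allocator);
  }

  // The single generic method that the cuBLASLt/hipBLASLt back ends override.
  virtual void DoMatmul(
      std::optional<DeviceMemoryBase> workspace,
      std::optional<ScratchAllocator*> scratch_allocator) = 0;
};
```

One wrinkle worth noting with this shape: a derived class that overrides the virtual overload hides the base class's non-virtual ones, so it also needs a `using GpuBlasLtInterface::DoMatmul;` declaration to keep them callable.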

@xla-rotation: could you please have a look?

Copybara import of the project:

--
6d3700a7b4141dee82a3b3f4d6be492a0a67d92b by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

refactoring

--
495b2cc7b5a4e944804acddc9abc9442d9cce32a by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

cuda side adaptions

--
4078221daebb8cb88faebe9423e87a1a781a765b by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

parameter fix

Merging this change closes #13462

PiperOrigin-RevId: 644003051
2024-06-17 08:10:34 -07:00
Alexander Belyaev
ecb3380d36 [XLA:GPU][MLIR-based emitters] Add more tests for thread indexing maps.
Minor clean-ups.

PiperOrigin-RevId: 644002369
2024-06-17 08:03:43 -07:00
Peter Hawkins
68da791239 [TSL] Remove apparently unnecessary "template" keywords that are yielding a clang warning.
PiperOrigin-RevId: 643994891
2024-06-17 07:43:42 -07:00
Emilio Cota
fc4cf5a631 [tsl] logging_test: test LOG/VLOG/VLOG_IS_ON and associated flags/envvars
We have substantial changes coming soon to TSL logging; add tests
to make sure we will not introduce regressions.

Also remove vmodule_test from tensorflow/core/platform since
it is now redundant. Besides, TSL is a more appropriate place for
these tests.

PiperOrigin-RevId: 643994337
2024-06-17 07:36:30 -07:00
Benjamin Chetioui
19259c3bea [XLA:GPU] Dissociate the legacy Triton support logic from the new one.
Now, in order to query support information about the legacy Triton emitters,
it is necessary to call functions in the `xla::gpu::legacy_triton` namespace.

This helps clarify our tests, and most notably allows us to evolve the new
(respectively old) Triton emitters without back- (respectively front-) porting
the logic to the old (respectively new) Triton emitters.

PiperOrigin-RevId: 643988589
2024-06-17 07:15:18 -07:00
Alexander Pivovarov
b5b8550ff0 PR #13771: Fix algebraic_simplifier for rsqrt
Imported from GitHub PR https://github.com/openxla/xla/pull/13771

Some algebraic-simplifier patterns used the `IsPositive()` check on their operands instead of `IsNonNegative()`.

As a result, the patterns did not fire for operands that are merely non-negative (>= 0), even though they should.

## Fixes:

### Fix for pattern `rsqrt(B) * rsqrt(B) => 1/B`
Issue: the pattern did not fire for B >= 0, for example when B = abs(x).
Solution: fixed by checking that B is non-negative.
Validation:
- If `B==0` the `result is inf`
- If `B>0` the `result > 0`
- If `B is inf` the `result is 0`
- If `B is nan` the `result is nan`

### Fix for pattern `rsqrt(pow(A, -2)) => A`
Issue: the pattern did not fire for A >= 0, for example when A = abs(x).
Solution: fixed by checking that A is non-negative.
Validation (before and after the simplification):
- If `A==0` the result is `0`
- If `A>0` the `result > 0`
- If `A is inf` the `result is inf`
- If `A is nan` the `result is nan`

Additional fix: since we know that A is non-negative, we can use A directly without wrapping it in `abs()`.

### Fix for pattern `rsqrt(1/A) => sqrt(A)`
Issue: the pattern did not fire for A >= 0, for example when A = abs(x).
Solution: fixed by checking that A is non-negative.
Validation (before and after the simplification):
- If `A==0` the result is `0`
- If `A>0` the `result > 0`
- If `A is inf` the `result is inf`
- If `A is nan` the `result is nan`
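
As a quick standalone numerical check of the three identities on non-negative operands (plain C++, not XLA code; the edge cases mirror the validation notes above):

```
#include <cmath>
#include <cstdio>

int main() {
  // rsqrt(B) * rsqrt(B) == 1/B; both sides are inf at B == 0.
  for (double b : {0.0, 0.5, 2.0, 1e10}) {
    double rsqrt_b = 1.0 / std::sqrt(b);
    std::printf("B=%-6g rsqrt(B)*rsqrt(B)=%-12g 1/B=%g\n", b,
                rsqrt_b * rsqrt_b, 1.0 / b);
  }
  for (double a : {0.0, 0.5, 2.0, 1e10}) {
    // rsqrt(pow(A, -2)) == A for A >= 0, so no abs() wrapper is needed.
    std::printf("A=%-6g rsqrt(pow(A,-2))=%-12g A=%g\n", a,
                1.0 / std::sqrt(std::pow(a, -2.0)), a);
    // rsqrt(1/A) == sqrt(A) for A >= 0; both sides are 0 at A == 0.
    std::printf("A=%-6g rsqrt(1/A)=%-12g sqrt(A)=%g\n", a,
                1.0 / std::sqrt(1.0 / a), std::sqrt(a));
  }
  return 0;
}
```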
Copybara import of the project:

--
e0251aa875c38442834be6f93ae534e481d06e2a by Alexander Pivovarov <pivovaa@amazon.com>:

Fix algebraic_simplifier for rsqrt

Merging this change closes #13771

PiperOrigin-RevId: 643982420
2024-06-17 06:52:31 -07:00
A. Unique TensorFlower
c2c0ca8848 Integrate Triton up to 71b8d336c22e508ff0c37fc090da6a38adf09a11 (71b8d336c2)
PiperOrigin-RevId: 643973562
2024-06-17 05:58:42 -07:00
pemeliya
33bc0a0d85 PR #13779: [ROCM] added memory management functions
Imported from GitHub PR https://github.com/openxla/xla/pull/13779

Here we enable the XLA-managed workspace buffer for rocBLAS.

@xla-rotation: would you please have a look?
Copybara import of the project:

--
548701656df3a9e12bc6bae201113c5a7410f9b4 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

added memory management functions

--
b01f6c03b0ef1b9f9588fa438fa278aac68d727b by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

addressing reviewer comments

--
3b96edbee38cbfcc30779f8820956c219a512da7 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

beautified rocblas wrapper

--
d65b695a1444f59d89095031ac8e00e3e1556d61 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

added explicit nullptr

Merging this change closes #13779

PiperOrigin-RevId: 643967546
2024-06-17 05:51:57 -07:00
Christian Sigg
30d6e3a6af Split SlowReduceWindow from hlo_evaluator_test into a separate target so that we can exclude it from tsan/asan/zapfhahn.
PiperOrigin-RevId: 643966476
2024-06-17 05:45:33 -07:00
Tamás Danyluk
bdc917fbf9 [XLA:GPU] Fix that Simplify returned false, even if it did simplify
I recently worked with this code, and it was confusing that it reported no simplification even though it had in fact simplified.
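
An illustrative shape of this kind of bug, with stand-in names rather than the actual XLA:GPU code:

```
// The routine mutated its input but still reported "no change", which
// misleads callers such as fixed-point rewrite drivers.
bool Simplify(long& value) {
  if (value != 0 && value % 4 == 0) {
    value /= 4;   // the simplification itself happened...
    return true;  // ...the fix is to actually report it instead of
                  // falling through to `return false` below.
  }
  return false;
}
```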

PiperOrigin-RevId: 643965860
2024-06-17 05:10:51 -07:00
Alexander Lyashuk
53cd4fd5d0 [XLA:GPU] Make BufferComparator accept tolerance as a parameter
PiperOrigin-RevId: 643952464
2024-06-17 04:32:57 -07:00
Tamás Danyluk
a1eda34ca6 [XLA:GPU] Add and test != operator for IndexingMap-related types
PiperOrigin-RevId: 643948829
2024-06-17 04:25:01 -07:00
Penporn Koanantakool
e1d7ca5fc2 Add kDomain and kOptimizationBarrier as no-op opcodes for thunks. Move Constant op handling next to other ops that also emit empty thunks.
PiperOrigin-RevId: 643948540
2024-06-17 04:17:05 -07:00
George Karpenkov
d9c89b9884 [XLA] Remove ServiceInterface
The abstraction does not add much value over just using Service class.

PiperOrigin-RevId: 643946442
2024-06-17 03:56:52 -07:00
Greg Olechwierowicz
ceef8097a4 [XLA:GPU] Improve error message for missing costs/latencies in PGLE.
PiperOrigin-RevId: 643933487
2024-06-17 03:24:47 -07:00
pemeliya
0eea73a7ec PR #13722: [ROCM] rocBLAS: default algorithm fallback
Imported from GitHub PR https://github.com/openxla/xla/pull/13722

Some number types (like complex64) are generally supported by rocBLAS, but the library does not provide any solutions for autotuning. In that case, we fall back to the default solution (kDefaultAlgorithm).

I have also added the workspace buffer provided by the XLA runtime, which was previously ignored by rocBLAS.

@xla-rotation: would you please have a look?
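
A hedged sketch of the fallback described above; the type and function names are illustrative stand-ins, not the actual rocBLAS wrapper API (only kDefaultAlgorithm is named in the change):

```
#include <vector>

using AlgorithmId = int;
constexpr AlgorithmId kDefaultAlgorithm = -1;  // stand-in value

// Return the autotuning candidates; fall back to the default algorithm
// when the library exposes none (e.g. for complex64 on rocBLAS).
std::vector<AlgorithmId> CandidateAlgorithms(
    const std::vector<AlgorithmId>& library_solutions) {
  if (library_solutions.empty()) {
    return {kDefaultAlgorithm};
  }
  return library_solutions;
}
```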
Copybara import of the project:

--
57645a2bfb357b9d00f6b85ae3b0e77a4b00fb61 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

use fallback default algorithm if no solutions are provided by the library

--
be8327f8e2e11f0eaf2383171340f69a23759260 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

adding space

Merging this change closes #13722

PiperOrigin-RevId: 643925837
2024-06-17 03:04:29 -07:00
Dimitar (Mitko) Asenov
658f70625d Reverts changelist 578813627
PiperOrigin-RevId: 643924789
2024-06-17 02:48:45 -07:00
A. Unique TensorFlower
5cd6ae2cfe compat: Update forward compatibility horizon to 2024-06-17
PiperOrigin-RevId: 643922506
2024-06-17 02:41:22 -07:00