Commit Graph

166848 Commits

Author SHA1 Message Date
dependabot[bot]
8603f6ae0a Bump certifi from 2024.6.2 to 2024.7.4
Bumps [certifi](https://github.com/certifi/python-certifi) from 2024.6.2 to 2024.7.4.
- [Commits](https://github.com/certifi/python-certifi/compare/2024.06.02...2024.07.04)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-11 17:04:16 +00:00
Dragan Mladjenovic
05797a7187 PR #13879: [ROCm] Pin amdhip64 soversion in dso loader
Imported from GitHub PR https://github.com/openxla/xla/pull/13879

This prevents us from accidentally loading a second copy of the HIP runtime in local_config_rocm. Do the same for rocblas to guard against an ABI break in ROCm 6.0.

Merging this change closes #13879

PiperOrigin-RevId: 651388560
2024-07-11 09:57:58 -07:00
Adrian Kuegel
8ab1fc3e38 Add missing header include.
This is needed for M_LN2l.
Without the include, the build fails on macOS.

PiperOrigin-RevId: 651387236
2024-07-11 09:28:16 -07:00
Oleg Shyshkov
c56871e192 [XLA:GPU] Return early in RemoveUnusedSymbols/Dimensions.
`DetectUnusedVariables` can be expensive, but often we don't have symbols in the indexing map at all, so there is nothing to remove.
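
The early-return idea can be sketched in Python as follows. This is an illustration only, not the XLA implementation: `detect_unused_variables` and `remove_unused_symbols` are hypothetical stand-ins, and the indexing map is modelled as a list of expression strings plus a list of symbol names.

```python
import re


def detect_unused_variables(exprs, symbols):
    """Hypothetical stand-in for the expensive analysis: scans every expression."""
    used = set()
    for expr in exprs:
        # Collect every identifier that appears in the expression.
        used.update(re.findall(r"[A-Za-z_]\w*", expr))
    return [s for s in symbols if s not in used]


def remove_unused_symbols(exprs, symbols):
    """Return (kept_symbols, removed_symbols) for a toy indexing map."""
    # Early return: with no symbols there is nothing to remove,
    # so the expensive scan is skipped entirely.
    if not symbols:
        return [], []
    unused = detect_unused_variables(exprs, symbols)
    kept = [s for s in symbols if s not in unused]
    return kept, unused
```

The guard changes no results; it only avoids the scan in the common case where the map has no symbols.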

PiperOrigin-RevId: 651385393
2024-07-11 09:21:17 -07:00
A. Unique TensorFlower
c4fb138843 [XLA:GPU] Remove sparse pass from ROCm Triton emitter
PiperOrigin-RevId: 651379019
2024-07-11 09:14:22 -07:00
Victor Stone
1c7e37a219 Improve HloVerifier's error message when the size of minor-to-major and the size of dimensions mismatch.
PiperOrigin-RevId: 651378879
2024-07-11 08:54:18 -07:00
pemeliya
ca4c83e3d3 PR #13479: [ROCM] adding new ROCM-6.2 features: hipGetFuncBySymbol and error codes
Imported from GitHub PR https://github.com/openxla/xla/pull/13479

In this PR we enable some new ROCm 6.2 features, mainly **hipGetFuncBySymbol**, which was previously missing from rocm_runtime and had to be worked around. This affects only ROCm-specific files.

@xla-rotation: could you have a look, please?
Copybara import of the project:

--
bcd2b2341887d305583161a592c23750f5ee584c by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

adding new ROCM-6.2 features

--
3eb5aa9c69e8905d9f408ab2a141130084670d29 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

solving conflicts after rebase

--
09938d6c6b9a358e6571fb13acdc1623fe205a4e by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

added blas get_version test

--
215b92dddd440f5bacee8d7f678e1f5138761e00 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

added runtime_version from DeviceDescription

Merging this change closes #13479

PiperOrigin-RevId: 651378287
2024-07-11 08:46:58 -07:00
Oleg Shyshkov
a4c7375c85 [XLA:GPU] Use llvm::SmallVector instead of std::vector.
PiperOrigin-RevId: 651371534
2024-07-11 08:39:25 -07:00
A. Unique TensorFlower
ac2fd97dc6 Automated Code Change
PiperOrigin-RevId: 651370675
2024-07-11 08:32:24 -07:00
Tori Baker
483085cdf7 Add gpu.thread_id conversion to nvvm after sparse dot lowering
We already convert the Triton GPU dialect to NVVM in TritonGPUTOLLVMPass, but since SparseDot has to be lowered afterwards and that lowering generates a gpu.thread_id, add a pattern to convert that to NVVM as well.

PiperOrigin-RevId: 651369703
2024-07-11 08:23:16 -07:00
A. Unique TensorFlower
070cc89fd3 Automated Code Change
PiperOrigin-RevId: 651368824
2024-07-11 08:16:19 -07:00
A. Unique TensorFlower
d24e5e9c17 Automated Code Change
PiperOrigin-RevId: 651368594
2024-07-11 08:09:28 -07:00
A. Unique TensorFlower
bf622e8572 Automated Code Change
PiperOrigin-RevId: 651368231
2024-07-11 08:02:11 -07:00
A. Unique TensorFlower
3cb9c1c060 Automated Code Change
PiperOrigin-RevId: 651366444
2024-07-11 07:51:45 -07:00
A. Unique TensorFlower
a404eb3239 Automated Code Change
PiperOrigin-RevId: 651365254
2024-07-11 07:44:43 -07:00
Christian Sigg
a892116f56 [XLA:GPU] Move SparseWGMMAOpPattern from Triton to OpenXLA.
PiperOrigin-RevId: 651361331
2024-07-11 07:37:34 -07:00
Sergey Kozub
016a0a596d PR #14796: Fix gemm_fusion_autotuner_test on Hopper
Imported from GitHub PR https://github.com/openxla/xla/pull/14796

Updated result type and error thresholds for the SelectsSplitK test.
Previously this failed on Hopper.
Copybara import of the project:

--
5005f288b67a2a34ec643cfcc3fbae815b5f0ef6 by Sergey Kozub <skozub@nvidia.com>:

Fix gemm_fusion_autotuner_test on Hopper

Merging this change closes #14796

PiperOrigin-RevId: 651359673
2024-07-11 07:30:45 -07:00
A. Unique TensorFlower
9aea579e75 Automated Code Change
PiperOrigin-RevId: 651358878
2024-07-11 07:21:42 -07:00
A. Unique TensorFlower
f8975ea946 Automated Code Change
PiperOrigin-RevId: 651355024
2024-07-11 07:14:47 -07:00
A. Unique TensorFlower
53184cf551 Automated Code Change
PiperOrigin-RevId: 651353992
2024-07-11 07:07:51 -07:00
A. Unique TensorFlower
068d60380f Automated Code Change
PiperOrigin-RevId: 651352887
2024-07-11 07:00:55 -07:00
Adrian Kuegel
7a0ab26bce Mark gloo_collectives_test with tag nomac.
Gloo is not supported on macOS.

PiperOrigin-RevId: 651352354
2024-07-11 06:40:41 -07:00
A. Unique TensorFlower
a590708f67 Automated Code Change
PiperOrigin-RevId: 651350115
2024-07-11 06:30:21 -07:00
A. Unique TensorFlower
ad220e4377 Automated Code Change
PiperOrigin-RevId: 651347794
2024-07-11 06:23:49 -07:00
A. Unique TensorFlower
29e6d860b0 Automated Code Change
PiperOrigin-RevId: 651347343
2024-07-11 06:17:10 -07:00
Oleg Shyshkov
5479bd7ebc [XLA:GPU] Replace block_id_to_tile_offsets_indexing with N-d tile_offsets_indexing map.
Currently we compute an indexing map from 1-d block_id to N-d tile offset for each TiledHloInstruction. We use that indexing map to deduplicate identical tiles. To get the map we compute delinearization of block_id in SymbolicTileAnalysis.

Composition and simplification of `block_id_to_tile_offsets_indexing` is actually very computationally intensive, because the expression has a lot of mods and floordivs from delinearization. This is not necessary for our purposes.

After this change, `TiledHloComputation` will have an N-d to M-d map from N-d tile indexing into M-d tile offsets of the instruction. This way, expressions in the map
are much smaller and easier to simplify (see changes in symbolic_tile_analysis_test).

This change has the additional benefit that we don't enforce a 1-d launch grid at an early stage.
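
To illustrate why the 1-d form is heavier, here is a minimal Python sketch, not the XLA code. The helper names `delinearize` and `tile_offsets`, the grid shape, and the tile sizes are all assumptions made for the example. For simplicity it maps N-d tile indices to N-d offsets, whereas the change describes a more general N-d to M-d map.

```python
def delinearize(block_id, grid_shape):
    """Convert a 1-d block id into N-d tile indices (row-major order)."""
    indices = []
    for dim_size in reversed(grid_shape):
        indices.append(block_id % dim_size)  # one mod per dimension
        block_id //= dim_size                # one floordiv per dimension
    return tuple(reversed(indices))


def tile_offsets(tile_indices, tile_sizes):
    """Map N-d tile indices to element offsets: one multiply per dimension."""
    return tuple(i * size for i, size in zip(tile_indices, tile_sizes))


grid_shape = (2, 3)    # tiles per dimension (assumed for the example)
tile_sizes = (16, 32)  # elements per tile in each dimension (assumed)

# 1-d path: offsets written in terms of block_id need delinearization first.
offsets_from_block_id = tile_offsets(delinearize(4, grid_shape), tile_sizes)

# N-d path: offsets written directly in terms of the tile indices.
offsets_from_indices = tile_offsets((1, 1), tile_sizes)
```

Both paths give the same offsets, but only the first carries the mod and floordiv terms that make the composed expressions expensive to simplify.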

PiperOrigin-RevId: 651344451
2024-07-11 06:10:20 -07:00
A. Unique TensorFlower
a53acb0d0a Automated Code Change
PiperOrigin-RevId: 651343668
2024-07-11 05:57:06 -07:00
A. Unique TensorFlower
88f418747f Automated Code Change
PiperOrigin-RevId: 651343567
2024-07-11 05:51:28 -07:00
A. Unique TensorFlower
1e2db29600 Automated Code Change
PiperOrigin-RevId: 651343129
2024-07-11 05:45:49 -07:00
A. Unique TensorFlower
ba76840342 Integrate LLVM at llvm/llvm-project@694b132177
Updates LLVM usage to match
[694b132177a9](https://github.com/llvm/llvm-project/commit/694b132177a9)

PiperOrigin-RevId: 651340955
2024-07-11 05:40:07 -07:00
Benjamin Chetioui
1028724c1f [XLA:GPU][NFC] Delete path to reduce from legacy Triton emitter.
Reductions only arise when using the new Triton generic emitter now.

PiperOrigin-RevId: 651335997
2024-07-11 04:54:44 -07:00
A. Unique TensorFlower
8116978a26 Automated Code Change
PiperOrigin-RevId: 651334675
2024-07-11 04:49:20 -07:00
A. Unique TensorFlower
da6b16843e Automated Code Change
PiperOrigin-RevId: 651334435
2024-07-11 04:36:47 -07:00
Chao
fcfd6083cb PR #14792: [ROCM ] hotfix ROCm build
Imported from GitHub PR https://github.com/openxla/xla/pull/14792

The ROCm part of the change in c40dbf2b3c is missing, and the internal CL was merged without checking it.

@xla-rotation @gflegar @beckerhe

Thanks in advance!

Copybara import of the project:

--
0f4236ca8a3767666ce03713fd7ae9e4d1254e5c by Chao Chen <cchen104@amd.com>:

fixed build due to c40dbf2b3c

Merging this change closes #14792

PiperOrigin-RevId: 651333429
2024-07-11 04:29:04 -07:00
Johannes Reifferscheid
8ec15da58d Simplifier optimizations.
- minimize storage uniquer invocations
- don't allocate std::functions
- don't put symbol and dims ranges in dense map in RangeEvaluator,
  also don't put them in a vector first.

After this, the biggest thing left to do is to remove the MLIR simplifier,
which is now responsible for 2/3 or so of the runtime of simplify.

PiperOrigin-RevId: 651330275
2024-07-11 04:21:40 -07:00
Benjamin Chetioui
c6a85cf3f5 [XLA:GPU] Add support for multidimensional tiles in Triton reduction lowering rule.
PiperOrigin-RevId: 651327801
2024-07-11 04:12:27 -07:00
A. Unique TensorFlower
64f9e5f814 Automated Code Change
PiperOrigin-RevId: 651327215
2024-07-11 04:05:44 -07:00
A. Unique TensorFlower
61de8c3d40 compat: Update forward compatibility horizon to 2024-07-11
PiperOrigin-RevId: 651323797
2024-07-11 03:58:35 -07:00
A. Unique TensorFlower
4ab30daf71 Update GraphDef version to 1920.
PiperOrigin-RevId: 651323780
2024-07-11 03:51:33 -07:00
Adrian Kuegel
366a25f834 Fix remaining build issues in the gpu directory for non-gpu builds.
PiperOrigin-RevId: 651319915
2024-07-11 03:40:04 -07:00
Henning Becker
3b0732c5c4 Add support for libnvjitlink
This is preparing the CUDA backend for linking and compiling with libnvjitlink. The plan is to replace ptxas and nvlink command line tools eventually.

This change is so far only adding a function `CompileAndLinkUsingLibNvJitLink`, but it's not yet being used (outside of the corresponding unit tests).

PiperOrigin-RevId: 651319016
2024-07-11 03:33:34 -07:00
Alexander Belyaev
9d353fbe06 [XLA:GPU][MLIR-based emitter] Set unroll factor to 1 for scatters.
PiperOrigin-RevId: 651318862
2024-07-11 03:27:00 -07:00
Benjamin Chetioui
9b84e02372 [XLA:GPU][NFC] Refactor code to reduce SoftmaxRewriterTriton::FindAllFusibleDiamondChains's complexity.
PiperOrigin-RevId: 651318662
2024-07-11 03:20:19 -07:00
Johannes Reifferscheid
2b29dd5fe0 Reduce number of simplify calls in multi-result affine map simplifier.
Currently, we rerun the simplifier for all results, even when only one changes.
Also, we rerun our simplifier in the last round (when the upstream simplifier
does not find any more changes), but it's not necessary, since Simplify is
idempotent.
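
A rough Python sketch of the resulting control flow, under stated assumptions: `ours` and `upstream` are hypothetical simplifier callables, not the actual XLA or MLIR interfaces, and `ours` is assumed to be idempotent. The returned call count is there only to make the saving visible.

```python
def simplify_results(results, ours, upstream):
    """Simplify each result to a fixed point, re-running only what changed.

    `ours` and `upstream` are hypothetical simplifier callables; `ours` is
    assumed to be idempotent, i.e. ours(ours(x)) == ours(x).
    """
    calls = 0  # number of times `ours` was invoked
    simplified = []
    for expr in results:
        expr = ours(expr)
        calls += 1
        while True:
            after_upstream = upstream(expr)
            if after_upstream == expr:
                # Upstream found nothing new. Since `ours` is idempotent,
                # a final re-run would not change the result either.
                break
            # Only this result changed, so only this result is re-simplified.
            expr = ours(after_upstream)
            calls += 1
        simplified.append(expr)
    return simplified, calls
```

Because each result is handled independently, a change in one result never triggers another pass over the others.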

PiperOrigin-RevId: 651317828
2024-07-11 03:01:27 -07:00
A. Unique TensorFlower
33bcc06d0f Automated Code Change
PiperOrigin-RevId: 651307920
2024-07-11 02:54:16 -07:00
Shaogang Wang
5981600814 PR #14725: [XLA:GPU] Lowering FusedMHABackward thunk to command buffer
Imported from GitHub PR https://github.com/openxla/xla/pull/14725

This PR lowers FusedMHABackwardThunk into a command buffer; the command buffer lowering knob is DebugOptions::CUDNN.
Copybara import of the project:

--
ff9156f57569cb5e88a4671a110365e79c9f857f by Shawn Wang <shawnw@nvidia.com>:

support lowering fusedMHABackward to command buffer

--
83ddf0cbadf5f0f9513c67e7bbdd7ecea4f3404c by Shawn Wang <shawnw@nvidia.com>:

fix rebase conflicts

--
9dd82651d0434beab21bed16ab2edea06611f8a0 by Shawn Wang <shawnw@nvidia.com>:

remove duplicated inclusion

Merging this change closes #14725

PiperOrigin-RevId: 651306567
2024-07-11 02:41:00 -07:00
Adrian Kuegel
3b7b99c1b3 Avoid compile errors in builds without GPU configured.
Currently, triton_test_util depends on ir_emitter_triton unconditionally, but
ir_emitter_triton only gives access to the ir_emitter_triton.h header in builds
with a GPU configured.
We can make the ir_emitter_triton.h header available in all builds if we add a
stub implementation that returns errors.

PiperOrigin-RevId: 651303559
2024-07-11 02:32:40 -07:00
A. Unique TensorFlower
535e270f4c Automated Code Change
PiperOrigin-RevId: 651296179
2024-07-11 02:24:59 -07:00
A. Unique TensorFlower
63a7d95df2 Automated Code Change
PiperOrigin-RevId: 651294944
2024-07-11 02:14:58 -07:00
A. Unique TensorFlower
1384ec54a7 Automated Code Change
PiperOrigin-RevId: 651294385
2024-07-11 02:07:58 -07:00