Commit Graph

163386 Commits

A. Unique TensorFlower
36304aadbd Add RBE toolchains for Clang on Windows.
PiperOrigin-RevId: 627830607
2024-04-24 13:50:35 -07:00
A. Unique TensorFlower
a9eaf69885 Fix float16 quantization for LSTM ops
This CL fixes 2 issues:
1. In the Quantize pass, for float16 quantization, the pass replaces the original op with an identical one, which causes an infinite loop for LSTM ops. This CL adds checks to avoid such situations.
2. In the SplitMergedOperands pass, const stateful operands are duplicated. But for float16 quantization, stateful operands have a const->dequantize pattern. This CL adds duplication for such patterns as well.

PiperOrigin-RevId: 627826212
2024-04-24 13:30:06 -07:00
A. Unique TensorFlower
c2059d7396 Uses intervals instead of a liveness matrix to construct all memory constraints.
PiperOrigin-RevId: 627825584
2024-04-24 13:23:12 -07:00
A. Unique TensorFlower
80d76b8af0 Add IsPredeterminedError to annotate an error buffer which doesn't have any events on the device.
PiperOrigin-RevId: 627822828
2024-04-24 13:09:14 -07:00
Eugene Zhulenev
ab771c1914 [tsl:concurrency] Disable ambiguous AndThen callback for AsyncValueRef<absl::Status>
PiperOrigin-RevId: 627814960
2024-04-24 12:49:21 -07:00
Alexander Belyaev
71d9dd0f06 [XLA:GPU][MLIR-based emitters] Add verifier for apply_indexing op.
PiperOrigin-RevId: 627807819
2024-04-24 12:42:18 -07:00
Roman Dzhabarov
f9e60c7763 Reverts 21f6e08a3f
PiperOrigin-RevId: 627800352
2024-04-24 12:07:44 -07:00
Anastasia Petrushkina
e30abe681f Reverts d9417b25e8
PiperOrigin-RevId: 627795237
2024-04-24 11:52:57 -07:00
A. Unique TensorFlower
f497887d7d Defines an alternate interface for the Memory Term Reducer that uses primitive intervals instead of liveness matrices.
PiperOrigin-RevId: 627790657
2024-04-24 11:46:30 -07:00
David Dunleavy
5371a1b2f0 Move tsl/concurrency to xla/tsl/concurrency
PiperOrigin-RevId: 627785092
2024-04-24 11:32:05 -07:00
Christian Sigg
8aef6ef46c Fix execution engine after 7da63426ac.
PiperOrigin-RevId: 627782515
2024-04-24 11:10:10 -07:00
Changhui Lin
5ea9fc1642 Update GpuTopologyProto to include multi-host information.
PiperOrigin-RevId: 627775502
2024-04-24 10:56:09 -07:00
David Dunleavy
a8b703351d Remove unused requires-gpu tag from tf_gpu_test_tags
PiperOrigin-RevId: 627774282
2024-04-24 10:51:41 -07:00
mmakevic-amd
7b201a9d70 PR #11784: [ROCm] Add Dropout support based on MIOpen API
Imported from GitHub PR https://github.com/openxla/xla/pull/11784

Copybara import of the project:

--
fc334e480802d2686907ff720a97697e51177690 by mmakevic <Milica.Makevic@amd.com>:

Add dropout descriptor for rocm

--
52cf3ad49e70b3ecf632d8bf83283a068fbb58d0 by mmakevic <Milica.Makevic@amd.com>:

Use SetRNNDescriptor version 2

Merging this change closes #11784

PiperOrigin-RevId: 627770243
2024-04-24 10:39:54 -07:00
Johannes Reifferscheid
b46d184d9d Remove unnecessary epilogue restrictions and corner cases.
- Allow injected values to be roots of epilogues.
- Allow n-ary values to be injected.
- Always generate an epilogue function if it's requested.
- Simplify the EmitEpilogue interface to hide indexes and calling convention
  details from the caller.

This simplifies the logic in the reduction emitter somewhat.

PiperOrigin-RevId: 627770230
2024-04-24 10:26:17 -07:00
Deqiang Chen
dadbf6d47e Add XlaHostComputeOp to the allowed list in mark_ops_for_outside_compilation.cc
PiperOrigin-RevId: 627746297
2024-04-24 09:22:32 -07:00
A. Unique TensorFlower
51a4843aff Allow for short-circuiting optimization in ShapeUtil::EqualStructure.
Note that operator& doesn't do short-circuiting, unlike operator&&.
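
As a standalone illustration of that difference (the `Expensive` helper below is hypothetical, not part of ShapeUtil):

```c++
#include <iostream>

// Hypothetical stand-in for a costly per-subshape comparison.
bool Expensive(const char* tag) {
  std::cout << "evaluated " << tag << "\n";
  return true;
}

int main() {
  // operator&& short-circuits: the right-hand side is never evaluated once
  // the left-hand side is false.
  bool a = false && Expensive("rhs of &&");  // prints nothing

  // operator& evaluates both operands unconditionally, so the costly call
  // still runs even though the result is already determined.
  bool b = false & Expensive("rhs of &");    // prints "evaluated rhs of &"

  std::cout << a << " " << b << "\n";
  return 0;
}
```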

PiperOrigin-RevId: 627745915
2024-04-24 09:07:42 -07:00
A. Unique TensorFlower
da8b29341c Don't try to patch pywrap_calibrator.so, as it was removed.
Follow-up to
c8ee776269

PiperOrigin-RevId: 627732799
2024-04-24 08:18:03 -07:00
A. Unique TensorFlower
f484927771 BUILD visibility change only
PiperOrigin-RevId: 627726838
2024-04-24 08:04:28 -07:00
Alan Kelly
75c816d7fd FC bias is per-channel quantized when per-channel quantization is used
PiperOrigin-RevId: 627725365
2024-04-24 07:52:24 -07:00
Quentin Khan
ff20b3c8c0 #shlo_ref Fix OSS.
- Use the correct absl::BitGen.
- Fix the definition of `std::common_type` for `shlo_ref::F16`.
- Fix building `shlo_ref::F16` from values that can be converted to `float` (both `F16` fixes are sketched generically below).
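
A generic C++ sketch of those two techniques, using a hypothetical `FakeF16` stand-in rather than the real `shlo_ref::F16`:

```c++
#include <type_traits>

// Hypothetical stand-in for shlo_ref::F16; it only illustrates the two
// generic techniques named above, not the real type's layout or semantics.
struct FakeF16 {
  // Allow construction from any value that is convertible to float.
  template <typename T,
            typename = std::enable_if_t<std::is_convertible_v<T, float>>>
  explicit FakeF16(T value) : value_(static_cast<float>(value)) {}
  float value_;  // a real half type would store 16 bits instead
};

// Program-defined std::common_type specializations so that mixed
// FakeF16/float expressions agree on float as the promotion type.
namespace std {
template <>
struct common_type<FakeF16, float> { using type = float; };
template <>
struct common_type<float, FakeF16> { using type = float; };
}  // namespace std

static_assert(std::is_same_v<std::common_type_t<FakeF16, float>, float>);

int main() {
  FakeF16 h(3);  // int -> float is a valid conversion, so this compiles
  (void)h;
  return 0;
}
```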

PiperOrigin-RevId: 627723091
2024-04-24 07:38:16 -07:00
A. Unique TensorFlower
cc32f277ca Integrate LLVM at llvm/llvm-project@b3ca9c30de
Updates LLVM usage to match
[b3ca9c30dedf](https://github.com/llvm/llvm-project/commit/b3ca9c30dedf)

PiperOrigin-RevId: 627709586
2024-04-24 06:38:54 -07:00
Adrian Kuegel
4cfb194769 Small code simplifications in hlo_instructions.cc (NFC)
PiperOrigin-RevId: 627700209
2024-04-24 06:03:01 -07:00
Sandeep Dasgupta
d070b7c9ab Tool to convert TF SavedModel to StableHLO
Here is the signature of the provided API:

```c++
// Converts a TensorFlow model (either from a SavedModel or an MLIR module) to a
// StableHLO MLIR module.
//
// Args:
//  input_path: The path to the input TensorFlow SavedModel or MLIR module.
//  context: The MLIR context to use for parsing or creating the MLIR module.
//  exported_model_signatures: A comma-separated list of exported model
//    signatures (functions) to convert.
//  tag_names: A comma-separated list of tag names used for loading SavedModel.
//  input_arg_shapes_str: A string representation of input argument shapes.
//    Shapes for different tensors are separated by ':', and dimension sizes for
//    the same tensor are separated by ','. For example,
//    'input-arg-shapes=1,2::1,?' expresses input arguments with shapes [1,2],
//    [] and [1,?].
//  is_input_mlir_module: If true, `input_path` is treated as an MLIR
//    module instead of a SavedModel.
//
// Returns:
//   An absl::StatusOr containing the converted StableHLO MLIR module on
//   success, or an absl::Status with an error message on failure.
absl::StatusOr<OwningOpRef<ModuleOp>> TfToStablehlo(
    absl::string_view input_path, MLIRContext* context,
    absl::string_view exported_model_signatures, absl::string_view tag_names,
    absl::string_view input_arg_shapes_str, bool is_input_mlir_module = false);
```
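
A hedged usage sketch for this API (the SavedModel path, signature, tag, and shape string below are placeholders; it assumes the declaration above plus the usual MLIR and Abseil headers):

```c++
#include "absl/status/status.h"
#include "mlir/IR/MLIRContext.h"

absl::Status ConvertSavedModelToStablehlo() {
  mlir::MLIRContext context;
  auto module_or = TfToStablehlo(
      /*input_path=*/"/tmp/my_saved_model", &context,
      /*exported_model_signatures=*/"serving_default",
      /*tag_names=*/"serve",
      /*input_arg_shapes_str=*/"1,224,224,3",
      /*is_input_mlir_module=*/false);
  if (!module_or.ok()) return module_or.status();
  (*module_or)->dump();  // Print the resulting StableHLO module to stderr.
  return absl::OkStatus();
}
```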
PiperOrigin-RevId: 627698652
2024-04-24 05:49:45 -07:00
A. Unique TensorFlower
1f1cba4df5 Automated Code Change
PiperOrigin-RevId: 627677804
2024-04-24 04:08:14 -07:00
A. Unique TensorFlower
908dadf58c Automated Code Change
PiperOrigin-RevId: 627674518
2024-04-24 03:51:56 -07:00
Lu Teng
1eb72c151a PR #11306: [XLA:GPU] Fix not unique name issue in sanitize constant pass
Imported from GitHub PR https://github.com/openxla/xla/pull/11306

Fix `Instruction name is not unique` error reported by JAX UT [`ShardMapTest.test_matmul_reduce_scatter`](https://github.com/google/jax/blob/jaxlib-v0.4.24/tests/shard_map_test.py#L149-L162) in XLA:GPU.

### Background
Error message:
```
!ContainsKey(instruction_names, instruction->name()) Instruction name is not unique: param_1
```
This error is reported after `PrepareHloModuleForIrEmitting()`. The non-unique name `param_1` is generated by 2 different passes, `GpuSanitizeConstantNames` and `FusionWrapper`. The related HLO changes are as follows:

1. Original HLO: there is no `param_1`, only `param_0`, in `async_computation`:
```ll
%main.19_spmd (param: s32[4,4], param.1: s32[4,8]) -> s32[2,8] {
  %param = s32[4,4]{1,0} parameter(0), sharding={devices=[2,2]<=[4]}, metadata={op_name="jit(fwd)/jit(main)/shard_map[mesh=Mesh(\'x\': 2, \'y\': 2) in_names=({0: (\'x\',), 1: (\'y\',)}, {0: (\'x\',)}) out_names=({0: (\'x\', \'y\')},) check_rep=True rewrite=True auto=frozenset()]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=161}
  %param.1 = s32[4,8]{1,0} parameter(1), sharding={devices=[2,1,2]<=[4] last_tile_dim_replicate}, metadata={op_name="jit(fwd)/jit(main)/shard_map[mesh=Mesh(\'x\': 2, \'y\': 2) in_names=({0: (\'x\',), 1: (\'y\',)}, {0: (\'x\',)}) out_names=({0: (\'x\', \'y\')},) check_rep=True rewrite=True auto=frozenset()]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=161}
  %dot.1 = s32[4,8]{1,0} dot(s32[4,4]{1,0} %param, s32[4,8]{1,0} %param.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, metadata={op_name="jit(fwd)/jit(main)/jit(shmap_body)/dot_general[dimension_numbers=(((1,), (0,)), ((), ())) precision=None preferred_element_type=int32]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=158}
  %reduce-scatter-start = ((s32[4,8]{1,0}), s32[2,8]{1,0}) reduce-scatter-start(s32[4,8]{1,0} %dot.1), channel_id=1, replica_groups={{0,1},{2,3}}, use_global_device_ids=true, dimensions={0}, to_apply=%region_0.7, metadata={op_name="jit(fwd)/jit(main)/jit(shmap_body)/reduce_scatter[axis_name=y scatter_dimension=0 axis_index_groups=None axis_size=2 tiled=True]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=159}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"collective_backend_config":{"is_sync":false,"no_parallel_custom_call":false}}
  ROOT %reduce-scatter-done = s32[2,8]{1,0} reduce-scatter-done(((s32[4,8]{1,0}), s32[2,8]{1,0}) %reduce-scatter-start), metadata={op_name="jit(fwd)/jit(main)/jit(shmap_body)/reduce_scatter[axis_name=y scatter_dimension=0 axis_index_groups=None axis_size=2 tiled=True]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=159}
}
%async_computation (param_0: s32[4,8]) -> s32[2,8] {
  %param_0 = s32[4,8]{1,0} parameter(0)
  ROOT %reduce-scatter.2 = s32[2,8]{1,0} reduce-scatter(s32[4,8]{1,0} %param_0), channel_id=1, replica_groups={{0,1},{2,3}}, use_global_device_ids=true, dimensions={0}, to_apply=%region_0.7, metadata={op_name="jit(fwd)/jit(main)/jit(shmap_body)/reduce_scatter[axis_name=y scatter_dimension=0 axis_index_groups=None axis_size=2 tiled=True]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=159}
}
```

2. `param_0` was changed to `param_1` after the `GpuSanitizeConstantNames` pass:
```ll
%main.19_spmd (param: s32[4,4], param.1: s32[4,8]) -> s32[2,8] {
  %param = s32[4,4]{1,0} parameter(0), sharding={devices=[2,2]<=[4]}, metadata={op_name="jit(fwd)/jit(main)/shard_map[mesh=Mesh(\'x\': 2, \'y\': 2) in_names=({0: (\'x\',), 1: (\'y\',)}, {0: (\'x\',)}) out_names=({0: (\'x\', \'y\')},) check_rep=True rewrite=True auto=frozenset()]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=161}
  %param.1 = s32[4,8]{1,0} parameter(1), sharding={devices=[2,1,2]<=[4] last_tile_dim_replicate}, metadata={op_name="jit(fwd)/jit(main)/shard_map[mesh=Mesh(\'x\': 2, \'y\': 2) in_names=({0: (\'x\',), 1: (\'y\',)}, {0: (\'x\',)}) out_names=({0: (\'x\', \'y\')},) check_rep=True rewrite=True auto=frozenset()]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=161}
  %dot.1 = s32[4,8]{1,0} dot(s32[4,4]{1,0} %param, s32[4,8]{1,0} %param.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, metadata={op_name="jit(fwd)/jit(main)/jit(shmap_body)/dot_general[dimension_numbers=(((1,), (0,)), ((), ())) precision=None preferred_element_type=int32]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=158}
  %reduce-scatter-start = ((s32[4,8]{1,0}), s32[2,8]{1,0}) reduce-scatter-start(s32[4,8]{1,0} %dot.1), channel_id=1, replica_groups={{0,1},{2,3}}, use_global_device_ids=true, dimensions={0}, to_apply=%region_0.7, metadata={op_name="jit(fwd)/jit(main)/jit(shmap_body)/reduce_scatter[axis_name=y scatter_dimension=0 axis_index_groups=None axis_size=2 tiled=True]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=159}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"collective_backend_config":{"is_sync":false,"no_parallel_custom_call":false}}
  ROOT %reduce-scatter-done = s32[2,8]{1,0} reduce-scatter-done(((s32[4,8]{1,0}), s32[2,8]{1,0}) %reduce-scatter-start), metadata={op_name="jit(fwd)/jit(main)/jit(shmap_body)/reduce_scatter[axis_name=y scatter_dimension=0 axis_index_groups=None axis_size=2 tiled=True]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=159}
}
%async_computation (param_1: s32[4,8]) -> s32[2,8] {
  %param_1 = s32[4,8]{1,0} parameter(0)
  ROOT %reduce-scatter.2 = s32[2,8]{1,0} reduce-scatter(s32[4,8]{1,0} %param_1), channel_id=1, replica_groups={{0,1},{2,3}}, use_global_device_ids=true, dimensions={0}, to_apply=%region_0.7, metadata={op_name="jit(fwd)/jit(main)/jit(shmap_body)/reduce_scatter[axis_name=y scatter_dimension=0 axis_index_groups=None axis_size=2 tiled=True]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=159}
}
```

3. Another `param_1` was generated after the `FusionWrapper` pass:
```ll
%async_computation (param_1: s32[4,8]) -> s32[2,8] {
  %param_1 = s32[4,8]{1,0} parameter(0)
  ROOT %reduce-scatter.2 = s32[2,8]{1,0} reduce-scatter(s32[4,8]{1,0} %param_1), channel_id=1, replica_groups={{0,2},{1,3},{4,6},{5,7}}, use_global_device_ids=true, dimensions={0}, to_apply=%region_0.7, metadata={op_name="jit(fwd)/jit(main)/jit(shmap_body)/reduce_scatter[axis_name=y scatter_dimension=0 axis_index_groups=None axis_size=2 tiled=True]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=159}
}
%wrapped_dot_computation (param_0.1: s32[4,4], param_1: s32[4,8]) -> s32[4,8] {
  %param_0.1 = s32[4,4]{1,0} parameter(0)
  %param_1 = s32[4,8]{1,0} parameter(1)
  ROOT %dot.2 = s32[4,8]{1,0} dot(s32[4,4]{1,0} %param_0.1, s32[4,8]{1,0} %param_1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, metadata={op_name="jit(fwd)/jit(main)/jit(shmap_body)/dot_general[dimension_numbers=(((1,), (0,)), ((), ())) precision=None preferred_element_type=int32]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=158}
}

%main.19_spmd (param: s32[4,4], param.1: s32[4,8]) -> s32[2,8] {
  %param = s32[4,4]{1,0} parameter(0), sharding={devices=[2,2]<=[4]}, metadata={op_name="jit(fwd)/jit(main)/shard_map[mesh=Mesh(\'x\': 2, \'y\': 2) in_names=({0: (\'x\',), 1: (\'y\',)}, {0: (\'x\',)}) out_names=({0: (\'x\', \'y\')},) check_rep=True rewrite=True auto=frozenset()]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=161}
  %param.1 = s32[4,8]{1,0} parameter(1), sharding={devices=[2,1,2]<=[4] last_tile_dim_replicate}, metadata={op_name="jit(fwd)/jit(main)/shard_map[mesh=Mesh(\'x\': 2, \'y\': 2) in_names=({0: (\'x\',), 1: (\'y\',)}, {0: (\'x\',)}) out_names=({0: (\'x\', \'y\')},) check_rep=True rewrite=True auto=frozenset()]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=161}
  %dot.1 = s32[4,8]{1,0} dot(s32[4,4]{1,0} %param, s32[4,8]{1,0} %param.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, metadata={op_name="jit(fwd)/jit(main)/jit(shmap_body)/dot_general[dimension_numbers=(((1,), (0,)), ((), ())) precision=None preferred_element_type=int32]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=158}
  %reduce-scatter-start = ((s32[4,8]{1,0}), s32[2,8]{1,0}) reduce-scatter-start(s32[4,8]{1,0} %dot.1), channel_id=1, replica_groups={{0,1},{2,3}}, use_global_device_ids=true, dimensions={0}, to_apply=%region_0.7, metadata={op_name="jit(fwd)/jit(main)/jit(shmap_body)/reduce_scatter[axis_name=y scatter_dimension=0 axis_index_groups=None axis_size=2 tiled=True]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=159}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"collective_backend_config":{"is_sync":false,"no_parallel_custom_call":false}}
  ROOT %reduce-scatter-done = s32[2,8]{1,0} reduce-scatter-done(((s32[4,8]{1,0}), s32[2,8]{1,0}) %reduce-scatter-start), metadata={op_name="jit(fwd)/jit(main)/jit(shmap_body)/reduce_scatter[axis_name=y scatter_dimension=0 axis_index_groups=None axis_size=2 tiled=True]" source_file="/home/sdp/tenglu/intel-jax/tests/shard_map_test.py" source_line=159}
}
```

### Root cause
`GpuSanitizeConstantNames` runs before `FusionWrapper` and may change HLO instruction names via a [local name uniquer](https://github.com/openxla/xla/blob/main/xla/service/gpu/gpu_sanitize_constant_names.cc#L37). There are two issues here:

* The original HLO instruction names may be changed unexpectedly, even though [the pass does not intend to do so](https://github.com/openxla/xla/blob/main/xla/service/gpu/gpu_sanitize_constant_names.cc#L50-L51).
* The global name uniquer of the HLO module isn't aware of this change and may wrongly assign the same name to another HLO instruction in a later pass, e.g. `FusionWrapper`.

In this case, `param_0` is unexpectedly renamed to `param_1` by `GpuSanitizeConstantNames`.

### Solution
Only record HLO instruction names in the local name uniquer of `GpuSanitizeConstantNames` and do not change the original names. This exactly follows the pass's intended design.
Copybara import of the project:

--
c39bc256957db681cfe1706fb716986da89edf3a by Lu Teng <teng.lu@intel.com>:

Fix not unique name issue in sanitize constant pass.

Merging this change closes #11306

PiperOrigin-RevId: 627661920
2024-04-24 02:52:42 -07:00
A. Unique TensorFlower
5e65cdf3d0 Update GraphDef version to 1842.
PiperOrigin-RevId: 627652913
2024-04-24 02:24:51 -07:00
A. Unique TensorFlower
b2d1ffadd6 compat: Update forward compatibility horizon to 2024-04-24
PiperOrigin-RevId: 627652860
2024-04-24 02:11:11 -07:00
A. Unique TensorFlower
9fab96b5ab Automated Code Change
PiperOrigin-RevId: 627644782
2024-04-24 01:51:36 -07:00
Harsha H S
427a14ea52 PR #11313: Use CHECK-DAG to validate low and high padding mlir
Imported from GitHub PR https://github.com/openxla/xla/pull/11313

Copybara import of the project:

--
98ee3b456f62a9404bfaf976ab9a2d25415293cc by Harsha HS <harsha.havanurshamsundara@amd.com>:

Distinguish order of padding operation between CUDA and ROCm

--
6f318827a4f2f31938349e24d292966fc6b95b97 by Harsha HS <harsha.havanurshamsundara@amd.com>:

Use CHECK-DAG to check for low and high padding mlir

--
8a3da357c20a865dc91f090d6e600ade492c3e7f by Harsha HS <harsha.havanurshamsundara@amd.com>:

Add -no_gpu tag to the test, so that it is picked by CI

--
f13a4003aedb5a005c3b0668c474da30b46fe5e2 by Harsha HS <harsha.havanurshamsundara@amd.com>:

Revert "Add -no_gpu tag to the test, so that it is picked by CI"

This reverts commit 8a3da357c20a865dc91f090d6e600ade492c3e7f.

--
9fca062f688dca198e232b1cfcd04eba3a8d99d2 by Harsha HS <harsha.havanurshamsundara@amd.com>:

Remove tags as it is covered by CPU CI

Merging this change closes #11313

PiperOrigin-RevId: 627638965
2024-04-24 01:44:31 -07:00
A. Unique TensorFlower
09e58b5c60 Automated Code Change
PiperOrigin-RevId: 627637071
2024-04-24 01:30:24 -07:00
A. Unique TensorFlower
193db149b6 Automated Code Change
PiperOrigin-RevId: 627636880
2024-04-24 01:04:05 -07:00
A. Unique TensorFlower
c121a459d8 Automated Code Change
PiperOrigin-RevId: 627627484
2024-04-24 00:50:19 -07:00
Trevor Morris
46e1b15508 PR #11761: [GPU] Support nccl comm splitting in multiprocess mode
Imported from GitHub PR https://github.com/openxla/xla/pull/11761

Currently `--xla_gpu_enable_nccl_comm_splitting` is a no-op unless running in single-process mode. This PR allows it to work in multiprocess mode by removing the IsLocal check and fixing the key in the rank -> comm map, which was causing the following error:
```
7:  Communicator for rank 1 not found in a NCCL clique devices=[3,7]; stream=0
5: E0403 15:37:24.104311 3871038 pjrt_stream_executor_client.cc:2809] Execution of replica 0 failed: INTERNAL: Communicator for rank 1 not found in a NCCL clique devices=[1,5]; stream=0
6: E0403 15:37:24.104477 3870182 pjrt_stream_executor_client.cc:2809] Execution of replica 0 failed: INTERNAL: Communicator for rank 1 not found in a NCCL clique devices=[2,6]; stream=0
4: E0403 15:37:24.105872 3871021 pjrt_stream_executor_client.cc:2809] Execution of replica 0 failed: INTERNAL: Communicator for rank 1 not found in a NCCL clique devices=[0,4]; stream=0
```
Copybara import of the project:

--
c77ed08f77e252a5068a23d02555ffa765913d42 by Trevor Morris <tmorris@nvidia.com>:

Enable nccl comm split for multiprocess mode

Merging this change closes #11761

PiperOrigin-RevId: 627626547
2024-04-24 00:35:54 -07:00
A. Unique TensorFlower
7a93c411ee Automated Code Change
PiperOrigin-RevId: 627625947
2024-04-24 00:26:44 -07:00
Emilio Cota
e0c55cb113 multihost_hlo_runner: support host parameter buffers
PiperOrigin-RevId: 627625806
2024-04-24 00:20:18 -07:00
mmakevic-amd
a795106888 PR #11603: [ROCm] Configure pjrt gpu plugin for rocm
Imported from GitHub PR https://github.com/openxla/xla/pull/11603

Configure PJRT GPU plugin so it can be built for ROCm as well.
Copybara import of the project:

--
a1c8bcb4be41dc56899118d44bf604a2723a3c56 by mmakevic <Milica.Makevic@amd.com>:

Configure pjrt gpu plugin for rocm

--
9ca24357f52c53febb474c798c67d0b8dda586ee by mmakevic <Milica.Makevic@amd.com>:

Change platform name defining

Merging this change closes #11603

PiperOrigin-RevId: 627625764
2024-04-24 00:13:47 -07:00
A. Unique TensorFlower
521d646206 Automated Code Change
PiperOrigin-RevId: 627625513
2024-04-24 00:07:13 -07:00
A. Unique TensorFlower
5f858cf009 Automated Code Change
PiperOrigin-RevId: 627608517
2024-04-23 23:00:46 -07:00
A. Unique TensorFlower
0c135c2530 Automated Code Change
PiperOrigin-RevId: 627608157
2024-04-23 22:54:28 -07:00
TensorFlower Gardener
4a80864d46 Merge pull request #64520 from Intel-tensorflow:amin/xla-disable-remapper
PiperOrigin-RevId: 627598905
2024-04-23 21:52:53 -07:00
A. Unique TensorFlower
bdd3d08755 Update ops-related pbtxt files.
PiperOrigin-RevId: 627597893
2024-04-23 21:44:26 -07:00
Ce Zheng
75ef9c3f2b [XLA] Avoid crash in LatencyHidingScheduler when it fails to schedule a node and improve error message.
PiperOrigin-RevId: 627596827
2024-04-23 21:28:59 -07:00
Ziyin Huang
ff9a2058aa Add new sets of XlaSparseDenseMatmulWithStaticBufferSizeOps that take a static buffer size as an attribute.
PiperOrigin-RevId: 627594700
2024-04-23 21:08:35 -07:00
Zixuan Jiang
d9417b25e8 Support shape transpose in hlo_sharding_util::ReshapeSharding.
Before this CL, `hlo_sharding_util::ReshapeSharding` could handle the cases where source and target shapes can be transformed into each other by merging and splitting dimension sizes. It returned `std::nullopt` if a transpose was needed between source and target shapes.

This CL extracts the `gcd(source_sharding_tile_size, target_shape)` when `source_shape % source_sharding_tile_size == 0` in the major dimensions. We also skip a source dimension if its `source_sharding_tile_size` is 1. An example is shown below.
```
input_shape: [6, 2, 5]
output_shape: [4, 3, 5]
input_sharding: {devices=[2, 1, 5]<=[10]}
output_sharding: {devices=[2, 1, 5]<=[10]}
```
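
A toy sketch of the gcd rule applied to the major dimension of this example (illustrative only, not the actual `ReshapeSharding` implementation):

```c++
#include <iostream>
#include <numeric>  // std::gcd

int main() {
  // From the example above: the major source dim of size 6 is tiled 2 ways
  // and maps onto the major target dim of size 4.
  int source_dim = 6, source_tile = 2, target_dim = 4;
  if (source_tile > 1 && source_dim % source_tile == 0) {
    // gcd(2, 4) == 2, so the output keeps a 2-way tile on its major dim,
    // i.e. {devices=[2, 1, 5]<=[10]} for the output sharding.
    std::cout << "propagated tile: " << std::gcd(source_tile, target_dim)
              << "\n";
  }
  return 0;
}
```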
PiperOrigin-RevId: 627592738
2024-04-23 20:54:54 -07:00
Steven Toribio
7c5bc63acd Add support for Composite ops in the TFLite flatbuffer schema
PiperOrigin-RevId: 627588497
2024-04-23 20:40:50 -07:00
Thai Nguyen
c8ee776269 Remove the CalibrationSingleton
The Singleton is a legacy design that only works on CPU.

PiperOrigin-RevId: 627586362
2024-04-23 20:28:15 -07:00
A. Unique TensorFlower
5de67474fb Integrate LLVM at llvm/llvm-project@688c10d236
Updates LLVM usage to match
[688c10d23630](https://github.com/llvm/llvm-project/commit/688c10d23630)

PiperOrigin-RevId: 627579261
2024-04-23 20:21:34 -07:00
A. Unique TensorFlower
d7825e61e7 Update ops-related pbtxt files.
PiperOrigin-RevId: 627575911
2024-04-23 19:45:42 -07:00