~yeonykim2/sandbox-yeonykim2/+git/trunk:dev-onnxruntime

Last commit made on 2023-11-10
Get this branch:
git clone -b dev-onnxruntime https://git.launchpad.net/~yeonykim2/sandbox-yeonykim2/+git/trunk

Branch information

Name:
dev-onnxruntime
Repository:
lp:~yeonykim2/sandbox-yeonykim2/+git/trunk

Recent commits

42b032a... by SuYeon <email address hidden>

[Add] Add wget repository URL source path

Add tar.gz

Signed-off-by: SuYeon <email address hidden>

34ca5bd... by SuYeon <email address hidden>

[Add] Add package tar.gz file

Add tar.gz file

dc4324e... by SuYeon <email address hidden>

[tmp] Previous successful CMake build without a URL.

Previously successful build files uploaded to GitHub.

7d08940... by SuYeon <email address hidden>

[build] deps.txt SHA1 modified

Eigen version changed from 3.4.0 to 3.4.
The commit tags were changed accordingly.

48e440c... by SuYeon <email address hidden>

[build] Add external source path

Add external CMake source path

6afb7eb... by SuYeon <email address hidden>

[build] Add external source tar.gz files

Add external source tar.gz files

d1b85f5... by kunal-vaishnavi <email address hidden>

Reduce LLaMA memory usage (#18181)

### Description
This PR reduces the memory usage when exporting and benchmarking LLaMA.

### Motivation and Context
- Exporting: The PyTorch model is now deleted from memory as soon as the
export succeeds, rather than only after the exported ONNX model has also
been converted to the desired precision (see the sketch below).
- Benchmarking: In the ONNX model with GroupQueryAttention, the KV cache
inputs use the same GPU memory for both the prompt and token-generation
benchmarks.
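
A minimal sketch of the export-then-free pattern described above, assuming a
standard PyTorch-to-ONNX export workflow. This is not the onnxruntime
benchmark code; the tiny model stands in for LLaMA, and the precision
conversion step is left as a placeholder.

```python
# Sketch only: free the PyTorch model right after export, instead of keeping
# it alive while the exported ONNX model is converted to lower precision.
import gc
import torch

class TinyModel(torch.nn.Module):
    """Stand-in for the real LLaMA model."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x):
        return self.linear(x)

def export_then_free(path: str = "model.onnx") -> None:
    model = TinyModel().eval()
    torch.onnx.export(model, torch.randn(1, 16), path)

    # Drop the PyTorch weights as soon as the export has succeeded.
    del model
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

    # ...the ONNX precision conversion would run here, with the PyTorch
    # model no longer occupying memory...

if __name__ == "__main__":
    export_then_free()
```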

2b95e74... by RandySheriffH <email address hidden>

Versioning for custom op (#18088)

Allow custom ops to have versions.

---------

Co-authored-by: Randy Shuai <email address hidden>

62c7894... by Scott McKay <email address hidden>

Add mobile CIs to list run by script for external PRs. (#18094)

### Description
Add the mobile CIs to the list so that we check external PRs don't break
them.

### Motivation and Context
A recent external PR was found to break the iOS CI after check-in.

ed41a28... by Aditya Goel <email address hidden>

Fix cast removal bug (#17953)

The `RemoveDuplicateCastTransformer` fairly naively removed Cast nodes
from the graph without considering precision loss when using the same
`TypeGroup`. For instance, F64 -> F32 -> F64 would be optimised out of
the graph.

I also noticed that signedness was not accounted for, which is not
covered by any existing issue but is a problem. For example, doing int ->
unsigned int -> int produces very different values for negative inputs
and so should not be optimised out.
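
A small numpy illustration of both failure modes, assuming the Casts follow
the usual C-style conversion semantics; this is not onnxruntime code, just a
demonstration of why the removed Cast chains are not no-ops.

```python
import numpy as np

# Precision: F64 -> F32 -> F64 silently rounds to the nearest float32 value.
x = np.float64(1.0000000000000002)      # representable in float64, not float32
roundtrip = np.float64(np.float32(x))   # the Cast chain that was being removed
print(x == roundtrip)                   # False: precision was lost

# Signedness: a negative int viewed through an unsigned type becomes a huge
# value, so a width-changing chain (or any consumer of the intermediate)
# sees something very different from the original input.
i = np.int64(-1)
u = i.astype(np.uint32)                 # int -> unsigned int
print(int(u))                           # 4294967295
print(int(u.astype(np.int64)))          # 4294967295, not -1
```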

One could argue that we shouldn't be performing such cast elimination at
all (at least not in this transformer). The original scope might well be
restricted to eliminating only the unnecessary casts introduced by the
`InsertCastTransformer` and no others.

### Motivation and Context
This should fix https://github.com/microsoft/onnxruntime/issues/17565,
https://github.com/microsoft/onnxruntime/issues/9915 and
https://github.com/microsoft/onnxruntime/issues/8787.