gh-146640: Optimize int operations by mutating uniquely-referenced operands in place (JIT only)#146641

Draft
eendebakpt wants to merge 12 commits into python:main from eendebakpt:int_int_inplace

Conversation


@eendebakpt eendebakpt commented Mar 30, 2026

In this PR we add in-place binary operations for the int type, similar to #146397. A complication compared to the float case is that int operations can return small ints, which must not be modified in place. In this PR we:

  • Use `PyJitRef_MakeUnique(sym_new_compact_int(ctx));` to mark that a symbol is an int that is either uniquely referenced or one of the small ints.
  • Add `_BINARY_OP_ADD_INT_INPLACE` (with variants for the other operations), which handles both small ints as input and overflow. An alternative to handling these cases inside the opcode would be to deopt, but we decided against that because it could lead to many deopts.

Micro benchmark results:

| Pattern | main (ns/iter) | branch (ns/iter) | Speedup | Notes |
|---|---|---|---|---|
| `total += a*b + c` (non-small) | 34.4 | 27.8 | 1.24x | in-place mutation on unique `a*b` result |
| `total += a + b` (non-small) | 19.8 | 13.0 | 1.52x | `a+b` result is unique, in-place on `+=` |
| `t = a + b` (plain assign) | 13.4 | 14.1 | -- | no `+=`, not affected |
| `total += a*b + c*d` (chain) | 40.4 | 34.6 | 1.17x | two unique products, in-place on `+` |
| `total += a*b + 1` (small int) | 23.2 | 24.1 | -- | target is a small int, falls back to the standard path |

All values are non-small integers (> 1024) unless noted.

Selected pyperformance benchmarks:

```
### deltablue ###
Mean +- std dev: 1.95 ms +- 0.03 ms -> 1.97 ms +- 0.03 ms: 1.01x slower
Not significant

### scimark_fft ###
Mean +- std dev: 208 ms +- 5 ms -> 206 ms +- 1 ms: 1.01x faster
Not significant

### scimark_lu ###
Mean +- std dev: 58.1 ms +- 0.2 ms -> 58.7 ms +- 0.8 ms: 1.01x slower
Not significant

### scimark_monte_carlo ###
Mean +- std dev: 41.7 ms +- 1.7 ms -> 41.6 ms +- 0.8 ms: 1.00x faster
Not significant

### scimark_sor ###
Mean +- std dev: 66.8 ms +- 0.6 ms -> 64.3 ms +- 0.5 ms: 1.04x faster
Significant (t=26.88)

### scimark_sparse_mat_mult ###
Mean +- std dev: 3.64 ms +- 0.03 ms -> 3.64 ms +- 0.04 ms: 1.00x slower
Not significant

### spectral_norm ###
Mean +- std dev: 56.4 ms +- 0.2 ms -> 51.5 ms +- 0.4 ms: 1.10x faster
Significant (t=79.61)
```

```c
// On mutation success, DUP the target so POP_TOP_INT can safely
// decref the original. Tier 2 only.
tier2 op(_BINARY_OP_ADD_INT_INPLACE, (left, right -- res, l, r)) {
    INT_INPLACE_OP(left, right, left, +);
}
```
Member

I think you can factor out all the common code here into a function that takes a function pointer (the operation to do).

For each op, pass the function pointer e.g. _PyCompactLong_Add or whatever, and then the CPU/inliner can fix it up/constant-fold it away in the compiler.
