Backport checkpoint restore arg layout handling to 12.9.x #2145
/ok to test
I edited the PR description:
-The 12.9.x branch did not already have the `CUcheckpointGpuPair` generated blocks that main had before #2144, so this backport includes those conditional template blocks as well.
+Unlike main, the CUDA 12.9 headers do not define `CUcheckpointGpuPair`, so this backport keeps the 12.9 restore-args layout reserved-only while preserving the conditional template support needed for newer checkpoint restore layouts.
My current understanding of this PR (@kkraus14, please correct me if this is incorrect): This is primarily a maintenance/alignment backport, not a functional fix for the normal CUDA 12.9.x build/test path. For CUDA 12.9 headers, the struct is:

    typedef struct CUcheckpointRestoreArgs_st {
        cuuint64_t reserved[8];
    } CUcheckpointRestoreArgs;

and CUDA 12.9 does not define `CUcheckpointGpuPair`. What this PR does provide is keeping the checkpoint restore-args generation logic on 12.9.x in line with main.

So I think the value is future maintenance and reduced semantic drift between the branch copies, rather than fixing a currently observed CUDA 12.9 behavior. Keith, please correct me if I'm missing a functional 12.9.x scenario that this backport is intended to cover.
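To make the layout comparison in the comment above concrete, here is a rough `ctypes` sketch. The first structure mirrors the CUDA 12.9 definition quoted in the comment. The second mirrors one of the newer layouts listed in the PR description, but its field types (`gpuPairs` as a pointer, `gpuPairsCount` as an unsigned int) are assumptions that this PR does not state. The class names and `layout_summary` helper are hypothetical and are not part of cuda-python.

```python
import ctypes


class RestoreArgsReservedOnly(ctypes.Structure):
    # Layout quoted in the comment above for the CUDA 12.9 headers.
    _fields_ = [("reserved", ctypes.c_uint64 * 8)]


class RestoreArgsWithPairs(ctypes.Structure):
    # ASSUMPTION: field types are guesses made for illustration only;
    # the PR description gives only the field names and char[52].
    _fields_ = [
        ("gpuPairs", ctypes.c_void_p),
        ("gpuPairsCount", ctypes.c_uint),
        ("reserved", ctypes.c_char * 52),
    ]


def layout_summary(struct_type):
    """Return {field name: (byte offset, byte size)} for a ctypes struct."""
    return {
        name: (getattr(struct_type, name).offset, getattr(struct_type, name).size)
        for name, _ in struct_type._fields_
    }


if __name__ == "__main__":
    for struct_type in (RestoreArgsReservedOnly, RestoreArgsWithPairs):
        print(struct_type.__name__, ctypes.sizeof(struct_type), layout_summary(struct_type))
```

The reserved-only structure is 64 bytes (eight 64-bit words). Whether the newer layout matches that size depends on the real field types and the platform's pointer width, so the sketch prints it rather than asserting it.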
    {{if struct_field_types.get('CUcheckpointRestoreArgs_st.reserved') == 'char'}}
    @property
    def reserved(self):
        return PyBytes_FromStringAndSize(self._pvt_ptr[0].reserved, {{struct_field_array_lengths['CUcheckpointRestoreArgs_st.reserved']}})
    @reserved.setter
    def reserved(self, reserved):
        if len(reserved) != {{struct_field_array_lengths['CUcheckpointRestoreArgs_st.reserved']}}:
            raise ValueError("reserved length must be {{struct_field_array_lengths['CUcheckpointRestoreArgs_st.reserved']}}, is " + str(len(reserved)))
        if CHAR_MIN == 0:
            for i, b in enumerate(reserved):
                if b < 0 and b > -129:
                    b = b + 256
                self._pvt_ptr[0].reserved[i] = b
        else:
            for i, b in enumerate(reserved):
                if b > 127 and b < 256:
                    b = b - 256
                self._pvt_ptr[0].reserved[i] = b
    {{endif}}
This is potentially concerning because it is an API-breaking change, though the `reserved` field is only reserved for future use and is currently always supposed to be zeroed. Should we special-case this to always return the `list[cuuint64_t]` to maintain the previous behavior?
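For readers following the setter in the hunk above, the byte folding it performs can be sketched in plain Python. This is a minimal illustration and not the generated Cython code: `to_c_char` and `normalize_reserved` are hypothetical names, and the `char_is_unsigned` flag stands in for the `CHAR_MIN == 0` check.

```python
def to_c_char(value: int, char_is_unsigned: bool) -> int:
    """Fold a Python int into the range of a C `char`, as the setter does."""
    if char_is_unsigned:
        # Mirrors the CHAR_MIN == 0 branch: map -128..-1 onto 128..255.
        if -129 < value < 0:
            return value + 256
    else:
        # Mirrors the signed branch: map 128..255 onto -128..-1.
        if 127 < value < 256:
            return value - 256
    return value


def normalize_reserved(data, length: int, char_is_unsigned: bool) -> list:
    """Check the length, then fold every element like the setter loop."""
    if len(data) != length:
        raise ValueError(f"reserved length must be {length}, is {len(data)}")
    return [to_c_char(b, char_is_unsigned) for b in data]
```

Iterating over a Python `bytes` object yields ints in 0..255, so on a platform with signed `char` only the second branch ever changes a value. The first branch matters when the caller passes a sequence of possibly negative ints instead.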
 cdef struct CUcheckpointRestoreArgs_st:
-    cuuint64_t reserved[8]
+    {{if struct_field_types.get('CUcheckpointRestoreArgs_st.reserved') == 'char'}}
+    char reserved[{{struct_field_array_lengths['CUcheckpointRestoreArgs_st.reserved']}}]
+    {{endif}}
+    {{if struct_field_types.get('CUcheckpointRestoreArgs_st.reserved') == 'cuuint64_t'}}
+    cuuint64_t reserved[{{struct_field_array_lengths['CUcheckpointRestoreArgs_st.reserved']}}]
+    {{endif}}
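The conditional blocks in the hunk above choose one declaration for `reserved` based on what the header parser recorded for that field. The selection can be sketched in plain Python without the template engine. The dictionary names are taken from the diff, while `render_reserved_field` is a hypothetical helper written only for illustration.

```python
RESERVED_KEY = "CUcheckpointRestoreArgs_st.reserved"


def render_reserved_field(struct_field_types, struct_field_array_lengths):
    """Return the declaration the template would emit, or None if neither branch matches."""
    field_type = struct_field_types.get(RESERVED_KEY)
    if field_type not in ("char", "cuuint64_t"):
        # Neither {{if}} block fires, so nothing is emitted for this field.
        return None
    return f"{field_type} reserved[{struct_field_array_lengths[RESERVED_KEY]}]"


# CUDA 12.9 headers: reserved-only layout, as described in this PR.
print(render_reserved_field({RESERVED_KEY: "cuuint64_t"}, {RESERVED_KEY: 8}))
# A newer layout from the PR description, where reserved is a char array.
print(render_reserved_field({RESERVED_KEY: "char"}, {RESERVED_KEY: 52}))
```

Because the two conditions are mutually exclusive, at most one declaration is rendered for any given set of headers.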
 # SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE

-# This code was automatically generated with version 12.9.0, generator version 49a8141. Do not modify it directly.
+# This code was automatically generated with version 12.9.0, generator version 0.3.1.dev1711+g875fec45. Do not modify it directly.
I wonder why this wasn't updated in the previous refresh...
Backport of the CUDA checkpoint restore argument layout handling from #2144 to the 12.9.x branch.

This keeps `CUcheckpointRestoreArgs` rendering version-flexible across the checkpoint restore layouts:
- `reserved` remains `cuuint64_t[8]`
- `gpuPairs`, `gpuPairsCount`, `reserved` as `char[44]`, and `reserved1`
- `gpuPairs`, `gpuPairsCount`, and `reserved` as `char[52]`

Unlike main, the CUDA 12.9 headers do not define `CUcheckpointGpuPair`, so this backport keeps the 12.9 restore-args layout reserved-only while preserving the conditional template support needed for newer checkpoint restore layouts.

Validation: