In the Linux kernel, the following vulnerability has been resolved:
drm/i915/ttm: fix CCS handling
Crucible + recent Mesa seems to sometimes hit:
GEM_BUG_ON(num_ccs_blks > NUM_CCS_BLKS_PER_XFER)
And it looks like we can also trigger this with gem_lmem_swapping, if we modify the test to use slightly larger object sizes.
Looking closer, we have the following issues in migrate_copy():
- We are using plain integers in various places, which we can easily overflow with a large object.
- We pass the entire object size (when the src is lmem) into emit_pte() and then try to copy it, which doesn't work, since we only have a few fixed-size windows in which to map the pages and perform the copy. With an object > 8M we therefore aren't properly copying the pages. And then with an object > 64M we trigger the GEM_BUG_ON(num_ccs_blks > NUM_CCS_BLKS_PER_XFER).
So it looks like our copy handling for any object > 8M (which is our CHUNK_SZ) is currently broken on DG2.
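The arithmetic behind the 8M and 64M thresholds can be sketched in userspace C. This is only an illustrative model, not the kernel's migrate_copy(): the helper names (ccs_blocks_for, copy_in_chunks, truncated_len) are hypothetical, and the CCS constants (256 main-memory bytes per CCS byte, 256 CCS bytes per block, 1024 blocks per transfer) are assumptions chosen to be consistent with the 64M limit described above. The assert() stands in for the GEM_BUG_ON.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants; only CHUNK_SZ = 8M is stated in the text above. */
#define SZ_1M                   (1024ull * 1024ull)
#define CHUNK_SZ                (8ull * SZ_1M)
#define NUM_BYTES_PER_CCS_BYTE  256ull
#define NUM_CCS_BYTES_PER_BLOCK 256ull
#define NUM_CCS_BLKS_PER_XFER   1024ull

/* Number of CCS blocks needed to cover `bytes` of main memory (rounded up). */
static uint64_t ccs_blocks_for(uint64_t bytes)
{
	uint64_t ccs_bytes = (bytes + NUM_BYTES_PER_CCS_BYTE - 1) /
			     NUM_BYTES_PER_CCS_BYTE;

	return (ccs_bytes + NUM_CCS_BYTES_PER_BLOCK - 1) /
	       NUM_CCS_BYTES_PER_BLOCK;
}

/*
 * Hypothetical illustration of the overflow issue: narrowing a 64-bit
 * object size to 32 bits silently drops the high bits.
 */
static uint32_t truncated_len(uint64_t size)
{
	return (uint32_t)size;
}

/*
 * Walk an object in CHUNK_SZ pieces using 64-bit sizes throughout.
 * Returns the number of chunks and reports the largest per-chunk CCS
 * block count through *max_blks.
 */
static uint64_t copy_in_chunks(uint64_t object_size, uint64_t *max_blks)
{
	uint64_t remaining = object_size;
	uint64_t chunks = 0;

	*max_blks = 0;
	while (remaining) {
		uint64_t len = remaining < CHUNK_SZ ? remaining : CHUNK_SZ;
		uint64_t blks = ccs_blocks_for(len);

		/* Models the GEM_BUG_ON: a single chunk always fits. */
		assert(blks <= NUM_CCS_BLKS_PER_XFER);
		if (blks > *max_blks)
			*max_blks = blks;

		/* A real implementation would map and copy `len` bytes here. */
		remaining -= len;
		chunks++;
	}

	return chunks;
}
```

Under these assumptions, a whole 64M object needs exactly 1024 CCS blocks, so anything larger exceeds the per-transfer limit if its full size is passed down in one go, while each 8M chunk needs only 128 blocks. The truncated_len() helper shows why plain 32-bit sizes are unsafe: a size of 4G + 8M narrows to 8M.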
Testcase: igt@gem_lmem_swapping
(cherry picked from commit 8676145eb2f53a9940ff70910caf0125bd8a4bc2)
| Software | From | Fixed in |
|---|---|---|
| linux / linux_kernel | 5.19 | 5.19.8 |
| linux / linux_kernel | 6.0-rc1 | 6.0-rc1.x |
| linux / linux_kernel | 6.0-rc2 | 6.0-rc2.x |
| linux / linux_kernel | 6.0-rc3 | 6.0-rc3.x |