[6.1.0-rc5-next-20221118] Kernel crash (alloc_buddy_hugetlb_folio) - THP tests
From: Sachin Sant
Date: Sun Nov 20 2022 - 09:18:00 EST
While running the transparent hugepage defragmentation test [1]
on a Power10 server, the following crash was seen:
[11725.379229] Kernel attempted to read user page (8) - exploit attempt? (uid: 0)
[11725.379251] BUG: Kernel NULL pointer dereference on read at 0x00000008
[11725.379257] Faulting instruction address: 0xc0000000004da04c
[11725.379266] Oops: Kernel access of bad area, sig: 11 [#1]
[11725.379269] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[11725.379275] Modules linked in: nvram(E) rpadlpar_io(E) rpaphp(E) uinput(E) torture(E) vmac(E) poly1305_generic(E) chacha_generic(E) chacha20poly1305(E) n_gsm(E) pps_ldisc(E) ppp_synctty(E) ppp_async(E) ppp_generic(E) serport(E) slcan(E) can_dev(E) slip(E) slhc(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) pcrypt(E) crypto_user(E) n_hdlc(E) dummy(E) veth(E) nfsv3(E) nfs_acl(E) nfs(E) lockd(E) grace(E) fscache(E) netfs(E) tun(E) brd(E) overlay(E) exfat(E) vfat(E) fat(E) btrfs(E) blake2b_generic(E) xor(E) raid6_pq(E) zstd_compress(E) xfs(E) loop(E) sctp(E) ip6_udp_tunnel(E) udp_tunnel(E) dm_mod(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) bonding(E) rfkill(E) tls(E) ip_set(E) nf_tables(E) libcrc32c(E) nfnetlink(E) sunrpc(E) pseries_rng(E) vmx_crypto(E) ext4(E) mbcache(E) jbd2(E)
[11725.379333] sd_mod(E) t10_pi(E) crc64_rocksoft(E) crc64(E) sg(E) ibmvscsi(E) scsi_transport_srp(E) ibmveth(E) fuse(E) [last unloaded: ipistorm(OE)]
[11725.379371] CPU: 1 PID: 2273459 Comm: sysctl Tainted: G OE 6.1.0-rc5-next-20221118 #1
[11725.379376] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1030.00 (NH1030_026) hv:phyp pSeries
[11725.379381] NIP: c0000000004da04c LR: c0000000004d9f94 CTR: 0000000000000000
[11725.379385] REGS: c00000000872b740 TRAP: 0300 Tainted: G OE (6.1.0-rc5-next-20221118)
[11725.379390] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24048224 XER: 00000001
[11725.379402] CFAR: c0000000004d9fa0 DAR: 0000000000000008 DSISR: 40000000 IRQMASK: 0
[11725.379402] GPR00: c0000000004d9f94 c00000000872b9e0 c0000000013bee00 0000000000000000
[11725.379402] GPR04: c000000002ac8ac0 c000000001218470 0000000000000005 c000000471d7e280
[11725.379402] GPR08: 0000000000000000 0000000000000000 0000000000000002 0000000000000000
[11725.379402] GPR12: 0000000024048242 c000000efffff300 0000000000000000 0000000000000000
[11725.379402] GPR16: 0000000000000000 0000000000000000 0000000000000000 c00000000872bb18
[11725.379402] GPR20: 0000000000000001 0000000000000100 0000000000300cca 0000000000000001
[11725.379402] GPR24: c00000000872bb18 c000000002ac8ac0 0000000000000002 0000000000000000
[11725.379402] GPR28: 0000000000000001 0000000000000005 0000000000000001 0000000000346cca
[11725.379455] NIP [c0000000004da04c] alloc_buddy_hugetlb_folio+0x17c/0x250
[11725.379464] LR [c0000000004d9f94] alloc_buddy_hugetlb_folio+0xc4/0x250
[11725.379469] Call Trace:
[11725.379471] [c00000000872b9e0] [c0000000004dc004] alloc_fresh_hugetlb_folio.part.78+0x224/0x2f0 (unreliable)
[11725.379478] [c00000000872ba70] [c0000000004dfa38] alloc_pool_huge_page+0x118/0x190
[11725.379484] [c00000000872bac0] [c0000000004dff8c] __nr_hugepages_store_common+0x4dc/0x610
[11725.379490] [c00000000872bba0] [c0000000004e036c] hugetlb_sysctl_handler_common+0x13c/0x180
[11725.379496] [c00000000872bc40] [c0000000006413b0] proc_sys_call_handler+0x210/0x350
[11725.379503] [c00000000872bcc0] [c00000000055da10] vfs_write+0x2e0/0x460
[11725.379508] [c00000000872bd80] [c00000000055dd6c] ksys_write+0x7c/0x140
[11725.379513] [c00000000872bdd0] [c000000000035220] system_call_exception+0x140/0x350
[11725.379519] [c00000000872be10] [c00000000000c6d4] system_call_common+0xf4/0x278
[11725.379525] --- interrupt: c00 at 0x7fff7fb20c34
[11725.379530] NIP: 00007fff7fb20c34 LR: 0000000134d554bc CTR: 0000000000000000
[11725.379533] REGS: c00000000872be80 TRAP: 0c00 Tainted: G OE (6.1.0-rc5-next-20221118)
[11725.379538] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28002202 XER: 00000000
[11725.379548] IRQMASK: 0
[11725.379548] GPR00: 0000000000000004 00007fffedd64fc0 00007fff7fc07300 0000000000000003
[11725.379548] GPR04: 000000014eac37f0 0000000000000006 fffffffffffffff6 0000000000000000
[11725.379548] GPR08: 000000014eac37f0 0000000000000000 0000000000000000 0000000000000000
[11725.379548] GPR12: 0000000000000000 00007fff80000940 0000000000000000 0000000000000000
[11725.379548] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[11725.379548] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[11725.379548] GPR24: 0000000000000001 0000000000000010 0000000000000006 000000014eac5800
[11725.379548] GPR28: 00007fff7fbfc2c8 000000014eac5800 0000000000000006 000000014eac0fc0
[11725.379589] NIP [00007fff7fb20c34] 0x7fff7fb20c34
[11725.379593] LR [0000000134d554bc] 0x134d554bc
[11725.379596] --- interrupt: c00
[11725.379598] Instruction dump:
[11725.379601] 39400001 55283032 7d291ef4 7fc8f050 7f184a14 7d49f036 7d40c0a8 7d4a4b78
[11725.379609] 7d40c1ad 40c2fff4 39200000 382100b0 <e9490008> e8010010 81810008 eae1ffb8
[11725.379616] ---[ end trace 0000000000000000 ]---
[11725.380328]
[11726.380333] Kernel panic - not syncing: Fatal exception
[11726.392564] Rebooting in 10 seconds..
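For reference, the word marked <e9490008> in the instruction dump can be decoded by hand. The sketch below is only an illustration: it handles just the Power ISA DS-form `ld` encoding (primary opcode 58, extended opcode 0), and the helper names are hypothetical rather than taken from any kernel tool. It decodes the word as a load from offset 8 off r9. Since GPR09 is zero in the register dump, this agrees with DAR = 0x8 and the reported NULL pointer dereference at 0x00000008.

```python
def decode_ds_form(word):
    """Split a 32-bit Power ISA DS-form instruction word into its fields."""
    opcode = (word >> 26) & 0x3F   # primary opcode (top 6 bits)
    rt = (word >> 21) & 0x1F       # target register
    ra = (word >> 16) & 0x1F       # base address register
    disp = word & 0xFFFC           # displacement; low 2 bits hold the XO field
    if disp & 0x8000:              # sign-extend the 16-bit displacement
        disp -= 0x10000
    xo = word & 0x3                # extended opcode
    return opcode, rt, ra, disp, xo


def describe(word):
    """Return a disassembly string for 'ld'; anything else is left undecoded."""
    opcode, rt, ra, disp, xo = decode_ds_form(word)
    if opcode == 58 and xo == 0:
        return f"ld r{rt},{disp}(r{ra})"
    return "unknown"


if __name__ == "__main__":
    # Faulting instruction word from the "Instruction dump" above
    print(describe(0xE9490008))    # ld r10,8(r9)
```
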
Git bisect points to the following patch:
commit 7a410b1d773853d1d6ec522a871869541fe48c22 (refs/bisect/bad)
Date: Thu Nov 17 13:15:01 2022 -0800
mm/hugetlb: change hugetlb allocation functions to return a folio
Reverting this patch allows the test to run to completion.
- Sachin
[1] https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/transparent_hugepages_defrag.py