Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated README for MarkDown (This is not a final version but a draft). #16

Closed
wants to merge 1 commit into from
Closed

Conversation

ghost
Copy link

@ghost ghost commented Mar 29, 2012

This is a MarkDown version of the README file... must be renamed to README.md to work properly (I think).

Linux is a clone of the operating system Unix, written from scratch by
Linus Torvalds with assistance from a loosely-knit team of hackers across
the Net. It aims towards POSIX and Single UNIX Specification compliance.
<blockquote>
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Someone necessarily lend a hand to make critically posts I might state. That is the first time I frequented your website page and thus far? I amazed with the analysis you made to make this actual put up extraordinary. Fantastic task!
Comprar Nike Air Max Baratas http://sofos.scsalud.es/fondosDoc/Farmacia/AIR/comprar-nike-air-max-baratas.cfm

koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 2, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fabio-porcedda pushed a commit to fabio-porcedda/linux-ge863-pro3 that referenced this pull request Apr 4, 2012
Printing the "start_ip" for every secondary cpu is very noisy on a large
system - and doesn't add any value. Drop this message.

Console log before:
Booting Node   0, Processors  #1
smpboot cpu 1: start_ip = 96000
 #2
smpboot cpu 2: start_ip = 96000
 #3
smpboot cpu 3: start_ip = 96000
 #4
smpboot cpu 4: start_ip = 96000
       ...
 torvalds#31
smpboot cpu 31: start_ip = 96000
Brought up 32 CPUs

Console log after:
Booting Node   0, Processors  #1 #2 #3 #4 #5 torvalds#6 torvalds#7 Ok.
Booting Node   1, Processors  torvalds#8 torvalds#9 torvalds#10 torvalds#11 torvalds#12 torvalds#13 torvalds#14 torvalds#15 Ok.
Booting Node   0, Processors  torvalds#16 torvalds#17 torvalds#18 torvalds#19 torvalds#20 torvalds#21 torvalds#22 torvalds#23 Ok.
Booting Node   1, Processors  torvalds#24 torvalds#25 torvalds#26 torvalds#27 torvalds#28 torvalds#29 torvalds#30 torvalds#31
Brought up 32 CPUs

Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4f452eb42507460426@agluck-desktop.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
fabio-porcedda pushed a commit to fabio-porcedda/linux-ge863-pro3 that referenced this pull request Apr 4, 2012
Add a parameter to avoid using MSI/MSI-X for PCIe native hotplug; it's
known to be buggy on some platforms.

In my environment, while shutting down, following stack trace is shown
sometimes.

  irq 16: nobody cared (try booting with the "irqpoll" option)
  Pid: 1081, comm: reboot Not tainted 3.2.0 #1
  Call Trace:
   <IRQ>  [<ffffffff810cec1d>] __report_bad_irq+0x3d/0xe0
   [<ffffffff810cee1c>] note_interrupt+0x15c/0x210
   [<ffffffff810cc485>] handle_irq_event_percpu+0xb5/0x210
   [<ffffffff810cc621>] handle_irq_event+0x41/0x70
   [<ffffffff810cf675>] handle_fasteoi_irq+0x55/0xc0
   [<ffffffff81015356>] handle_irq+0x46/0xb0
   [<ffffffff814fbe9d>] do_IRQ+0x5d/0xe0
   [<ffffffff814f146e>] common_interrupt+0x6e/0x6e
   [<ffffffff8106b040>] ? __do_softirq+0x60/0x210
   [<ffffffff8108aeb1>] ? hrtimer_interrupt+0x151/0x240
   [<ffffffff814fb5ec>] call_softirq+0x1c/0x30
   [<ffffffff810152d5>] do_softirq+0x65/0xa0
   [<ffffffff8106ae9d>] irq_exit+0xbd/0xe0
   [<ffffffff814fbf8e>] smp_apic_timer_interrupt+0x6e/0x99
   [<ffffffff814f9e5e>] apic_timer_interrupt+0x6e/0x80
   <EOI>  [<ffffffff814f0fb1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
   [<ffffffff812629fc>] pci_bus_write_config_word+0x6c/0x80
   [<ffffffff81266fc2>] pci_intx+0x52/0xa0
   [<ffffffff8127de3d>] pci_intx_for_msi+0x1d/0x30
  [<ffffffff8127e4fb>] pci_msi_shutdown+0x7b/0x110
   [<ffffffff81269d34>] pci_device_shutdown+0x34/0x50
   [<ffffffff81326c4f>] device_shutdown+0x2f/0x140
   [<ffffffff8107b981>] kernel_restart_prepare+0x31/0x40
   [<ffffffff8107b9e6>] kernel_restart+0x16/0x60
   [<ffffffff8107bbfd>] sys_reboot+0x1ad/0x220
   [<ffffffff814f4b90>] ? do_page_fault+0x1e0/0x460
   [<ffffffff811942d0>] ? __sync_filesystem+0x90/0x90
   [<ffffffff8105c9aa>] ? __cond_resched+0x2a/0x40
   [<ffffffff814ef090>] ? _cond_resched+0x30/0x40
   [<ffffffff81169e17>] ? iterate_supers+0xb7/0xd0
   [<ffffffff814f9382>] system_call_fastpath+0x16/0x1b
  handlers:
  [<ffffffff8138a0f0>] usb_hcd_irq
  [<ffffffff8138a0f0>] usb_hcd_irq
  [<ffffffff8138a0f0>] usb_hcd_irq
  Disabling IRQ torvalds#16

An un-wanted interrupt is generated when PCI driver switches from
MSI/MSI-X to INTx while shutting down the device.  The interrupt does
not happen if MSI/MSI-X is not used on the device.
I confirmed that this problem does not happen if pcie_hp=nomsi was
specified and hotplug operation worked fine as usual.

v2: Automatically disable MSI/MSI-X against following device:
    PCI bridge: Integrated Device Technology, Inc. Device 807f (rev 02)
v3: Based on the review comment, combile the if statements.
v4: Removed module parameter.
    Move some code to build pciehp as a module.
    Move device specific code to driver/pci/quirks.c.
v5: Drop a device specific code until getting a vendor statement.

Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: MUNEDA Takahiro <muneda.takahiro@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
vivien pushed a commit to vivien/linux that referenced this pull request Apr 6, 2012
Vivek reported a kernel crash:
[   94.217015] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
[   94.218004] IP: [<ffffffff81142fae>] kmem_cache_free+0x5e/0x200
[   94.218004] PGD 13abda067 PUD 137d52067 PMD 0
[   94.218004] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[   94.218004] CPU 0
[   94.218004] Modules linked in: [last unloaded: scsi_wait_scan]
[   94.218004]
[   94.218004] Pid: 0, comm: swapper/0 Not tainted 3.2.0+ torvalds#16 Hewlett-Packard HP xw6600 Workstation/0A9Ch
[   94.218004] RIP: 0010:[<ffffffff81142fae>]  [<ffffffff81142fae>] kmem_cache_free+0x5e/0x200
[   94.218004] RSP: 0018:ffff88013fc03de0  EFLAGS: 00010006
[   94.218004] RAX: ffffffff81e0d020 RBX: ffff880138b3c680 RCX: 00000001801c001b
[   94.218004] RDX: 00000000003aac1d RSI: ffff880138b3c680 RDI: ffffffff81142fae
[   94.218004] RBP: ffff88013fc03e10 R08: ffff880137830238 R09: 0000000000000001
[   94.218004] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   94.218004] R13: ffffea0004e2cf00 R14: ffffffff812f6eb6 R15: 0000000000000246
[   94.218004] FS:  0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[   94.218004] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   94.218004] CR2: 000000000000001c CR3: 00000001395ab000 CR4: 00000000000006f0
[   94.218004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   94.218004] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   94.218004] Process swapper/0 (pid: 0, threadinfo ffffffff81e00000, task ffffffff81e0d020)
[   94.218004] Stack:
[   94.218004]  0000000000000102 ffff88013fc0db20 ffffffff81e22700 ffff880139500f00
[   94.218004]  0000000000000001 000000000000000a ffff88013fc03e20 ffffffff812f6eb6
[   94.218004]  ffff88013fc03e90 ffffffff810c8da2 ffffffff81e01fd8 ffff880137830240
[   94.218004] Call Trace:
[   94.218004]  <IRQ>
[   94.218004]  [<ffffffff812f6eb6>] icq_free_icq_rcu+0x16/0x20
[   94.218004]  [<ffffffff810c8da2>] __rcu_process_callbacks+0x1c2/0x420
[   94.218004]  [<ffffffff810c9038>] rcu_process_callbacks+0x38/0x250
[   94.218004]  [<ffffffff810405ee>] __do_softirq+0xce/0x3e0
[   94.218004]  [<ffffffff8108ed04>] ? clockevents_program_event+0x74/0x100
[   94.218004]  [<ffffffff81090104>] ? tick_program_event+0x24/0x30
[   94.218004]  [<ffffffff8183ed1c>] call_softirq+0x1c/0x30
[   94.218004]  [<ffffffff8100422d>] do_softirq+0x8d/0xc0
[   94.218004]  [<ffffffff81040c3e>] irq_exit+0xae/0xe0
[   94.218004]  [<ffffffff8183f4be>] smp_apic_timer_interrupt+0x6e/0x99
[   94.218004]  [<ffffffff8183e330>] apic_timer_interrupt+0x70/0x80

Once a queue is quiesced, it's not supposed to have any elvpriv data or
icq's, and elevator switching depends on that.  Request alloc path
followed the rule for elvpriv data but forgot apply it to icq's
leading to the following crash during elevator switch. Fix it by not
allocating icq's if ELVPRIV is not set for the request.

Reported-by: Vivek Goyal <vgoyal@redhat.com>
Tested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
vivien pushed a commit to vivien/linux that referenced this pull request Apr 6, 2012
…S block during isolation for migration

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a
 torvalds#7 [d72d3d1] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8de
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
vivien pushed a commit to vivien/linux that referenced this pull request Apr 6, 2012
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not
update the real num tx queues. netdev_queue_update_kobjects() is already
called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when
upper layer driver, e.g., FCoE protocol stack is monitoring the netdev
event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove
extra queues allocated for FCoE, the associated txq sysfs kobjects are already
removed, and trying to update the real num queues would cause something like
below:

...
PID: 25138  TASK: ffff88021e64c440  CPU: 3   COMMAND: "kworker/3:3"
 #0 [ffff88021f007760] machine_kexec at ffffffff810226d9
 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
 #2 [ffff88021f0078a0] oops_end at ffffffff813bca78
 #3 [ffff88021f0078d0] no_context at ffffffff81029e72
 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
 torvalds#6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
 torvalds#7 [ffff88021f007b10] page_fault at ffffffff813bc045
    [exception RIP: sysfs_find_dirent+17]
    RIP: ffffffff81178611  RSP: ffff88021f007bc0  RFLAGS: 00010246
    RAX: ffff88021e64c440  RBX: ffffffff8156cc63  RCX: 0000000000000004
    RDX: ffffffff8156cc63  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88021f007be0   R8: 0000000000000004   R9: 0000000000000008
    R10: ffffffff816fed00  R11: 0000000000000004  R12: 0000000000000000
    R13: ffffffff8156cc63  R14: 0000000000000000  R15: ffff8802222a0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 torvalds#8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
 torvalds#9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
torvalds#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
torvalds#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
torvalds#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
torvalds#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
torvalds#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
torvalds#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
torvalds#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
torvalds#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
torvalds#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
torvalds#19 [ffff88021f007e68] worker_thread at ffffffff81060513
torvalds#20 [ffff88021f007ee8] kthread at ffffffff810648b6
torvalds#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4

Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
neilbrown pushed a commit to neilbrown/linux that referenced this pull request Apr 9, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a
 torvalds#7 [d72d3d1] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8de
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 9, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 11, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 12, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
psanford pushed a commit to retailnext/linux that referenced this pull request Apr 16, 2012
…S block during isolation for migration

BugLink: http://bugs.launchpad.net/bugs/931719

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a
 torvalds#7 [d72d3d1] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8de
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 19, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
amery referenced this pull request in linux-sunxi/linux-sunxi Apr 20, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 4, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 4, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 5, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 7, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 9, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 14, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 16, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 17, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 21, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 22, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 22, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 23, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 24, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
@ismaell
Copy link
Contributor

ismaell commented May 30, 2012

Nonsense. Markdown is for wikis, not for help files.

@ghost
Copy link
Author

ghost commented May 30, 2012

It's also for READMEs.

kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <jjcolemanx86@gmail.com>
Co-developed-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: John J Coleman <jjcolemanx86@gmail.com>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <jjcolemanx86@gmail.com>
Co-developed-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: John J Coleman <jjcolemanx86@gmail.com>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <jjcolemanx86@gmail.com>
Co-developed-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: John J Coleman <jjcolemanx86@gmail.com>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <jjcolemanx86@gmail.com>
Co-developed-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: John J Coleman <jjcolemanx86@gmail.com>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <jjcolemanx86@gmail.com>
Co-developed-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: John J Coleman <jjcolemanx86@gmail.com>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <jjcolemanx86@gmail.com>
Co-developed-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: John J Coleman <jjcolemanx86@gmail.com>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <jjcolemanx86@gmail.com>
Co-developed-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: John J Coleman <jjcolemanx86@gmail.com>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 12, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <jjcolemanx86@gmail.com>
Co-developed-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: John J Coleman <jjcolemanx86@gmail.com>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 12, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <jjcolemanx86@gmail.com>
Co-developed-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: John J Coleman <jjcolemanx86@gmail.com>
Signed-off-by: NipaLocal <nipa@local>
github-actions bot pushed a commit to anon503/linux that referenced this pull request Feb 18, 2025
[ Upstream commit 6d00234 ]

Function xen_pin_page calls xen_pte_lock, which in turn grab page
table lock (ptlock). When locking, xen_pte_lock expect mm->page_table_lock
to be held before grabbing ptlock, but this does not happen when pinning
is caused by xen_mm_pin_all.

This commit addresses lockdep warning below, which shows up when
suspending a Xen VM.

[ 3680.658422] Freezing user space processes
[ 3680.660156] Freezing user space processes completed (elapsed 0.001 seconds)
[ 3680.660182] OOM killer disabled.
[ 3680.660192] Freezing remaining freezable tasks
[ 3680.661485] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 3680.685254]
[ 3680.685265] ==================================
[ 3680.685269] WARNING: Nested lock was not taken
[ 3680.685274] 6.12.0+ torvalds#16 Tainted: G        W
[ 3680.685279] ----------------------------------
[ 3680.685283] migration/0/19 is trying to lock:
[ 3680.685288] ffff88800bac33c0 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: xen_pin_page+0x175/0x1d0
[ 3680.685303]
[ 3680.685303] but this task is not holding:
[ 3680.685308] init_mm.page_table_lock
[ 3680.685311]
[ 3680.685311] stack backtrace:
[ 3680.685316] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685324] Tainted: [W]=WARN
[ 3680.685328] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685339] Call Trace:
[ 3680.685344]  <TASK>
[ 3680.685347]  dump_stack_lvl+0x77/0xb0
[ 3680.685356]  __lock_acquire+0x917/0x2310
[ 3680.685364]  lock_acquire+0xce/0x2c0
[ 3680.685369]  ? xen_pin_page+0x175/0x1d0
[ 3680.685373]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685381]  ? xen_pin_page+0x175/0x1d0
[ 3680.685386]  xen_pin_page+0x175/0x1d0
[ 3680.685390]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685394]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685401]  ? stop_one_cpu+0x91/0x100
[ 3680.685405]  __xen_pgd_pin+0x5d/0x250
[ 3680.685410]  xen_mm_pin_all+0x70/0xa0
[ 3680.685415]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685420]  xen_suspend+0x57/0x1a0
[ 3680.685428]  multi_cpu_stop+0x6b/0x120
[ 3680.685432]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685439]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685443]  cpu_stopper_thread+0x8c/0x140
[ 3680.685448]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685454]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685458]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685462]  kthread+0xde/0x110
[ 3680.685467]  ? __pfx_kthread+0x10/0x10
[ 3680.685471]  ret_from_fork+0x2f/0x50
[ 3680.685478]  ? __pfx_kthread+0x10/0x10
[ 3680.685482]  ret_from_fork_asm+0x1a/0x30
[ 3680.685489]  </TASK>
[ 3680.685491]
[ 3680.685491] other info that might help us debug this:
[ 3680.685497] 1 lock held by migration/0/19:
[ 3680.685500]  #0: ffffffff8284df38 (pgd_lock){+.+.}-{3:3}, at: xen_mm_pin_all+0x14/0xa0
[ 3680.685512]
[ 3680.685512] stack backtrace:
[ 3680.685518] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685528] Tainted: [W]=WARN
[ 3680.685531] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685538] Call Trace:
[ 3680.685541]  <TASK>
[ 3680.685544]  dump_stack_lvl+0x77/0xb0
[ 3680.685549]  __lock_acquire+0x93c/0x2310
[ 3680.685554]  lock_acquire+0xce/0x2c0
[ 3680.685558]  ? xen_pin_page+0x175/0x1d0
[ 3680.685562]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685568]  ? xen_pin_page+0x175/0x1d0
[ 3680.685572]  xen_pin_page+0x175/0x1d0
[ 3680.685578]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685582]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685588]  ? stop_one_cpu+0x91/0x100
[ 3680.685592]  __xen_pgd_pin+0x5d/0x250
[ 3680.685596]  xen_mm_pin_all+0x70/0xa0
[ 3680.685600]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685607]  xen_suspend+0x57/0x1a0
[ 3680.685611]  multi_cpu_stop+0x6b/0x120
[ 3680.685615]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685620]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685625]  cpu_stopper_thread+0x8c/0x140
[ 3680.685629]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685634]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685638]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685642]  kthread+0xde/0x110
[ 3680.685645]  ? __pfx_kthread+0x10/0x10
[ 3680.685649]  ret_from_fork+0x2f/0x50
[ 3680.685654]  ? __pfx_kthread+0x10/0x10
[ 3680.685657]  ret_from_fork_asm+0x1a/0x30
[ 3680.685662]  </TASK>
[ 3680.685267] xen:grant_table: Grant tables using version 1 layout
[ 3680.685921] OOM killer enabled.
[ 3680.685934] Restarting tasks ... done.

Signed-off-by: Maksym Planeta <maksym@exostellar.io>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241204103516.3309112-1-maksym@exostellar.io>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
github-actions bot pushed a commit to anon503/linux that referenced this pull request Feb 18, 2025
[ Upstream commit 6d00234 ]

Function xen_pin_page calls xen_pte_lock, which in turn grab page
table lock (ptlock). When locking, xen_pte_lock expect mm->page_table_lock
to be held before grabbing ptlock, but this does not happen when pinning
is caused by xen_mm_pin_all.

This commit addresses lockdep warning below, which shows up when
suspending a Xen VM.

[ 3680.658422] Freezing user space processes
[ 3680.660156] Freezing user space processes completed (elapsed 0.001 seconds)
[ 3680.660182] OOM killer disabled.
[ 3680.660192] Freezing remaining freezable tasks
[ 3680.661485] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 3680.685254]
[ 3680.685265] ==================================
[ 3680.685269] WARNING: Nested lock was not taken
[ 3680.685274] 6.12.0+ torvalds#16 Tainted: G        W
[ 3680.685279] ----------------------------------
[ 3680.685283] migration/0/19 is trying to lock:
[ 3680.685288] ffff88800bac33c0 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: xen_pin_page+0x175/0x1d0
[ 3680.685303]
[ 3680.685303] but this task is not holding:
[ 3680.685308] init_mm.page_table_lock
[ 3680.685311]
[ 3680.685311] stack backtrace:
[ 3680.685316] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685324] Tainted: [W]=WARN
[ 3680.685328] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685339] Call Trace:
[ 3680.685344]  <TASK>
[ 3680.685347]  dump_stack_lvl+0x77/0xb0
[ 3680.685356]  __lock_acquire+0x917/0x2310
[ 3680.685364]  lock_acquire+0xce/0x2c0
[ 3680.685369]  ? xen_pin_page+0x175/0x1d0
[ 3680.685373]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685381]  ? xen_pin_page+0x175/0x1d0
[ 3680.685386]  xen_pin_page+0x175/0x1d0
[ 3680.685390]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685394]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685401]  ? stop_one_cpu+0x91/0x100
[ 3680.685405]  __xen_pgd_pin+0x5d/0x250
[ 3680.685410]  xen_mm_pin_all+0x70/0xa0
[ 3680.685415]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685420]  xen_suspend+0x57/0x1a0
[ 3680.685428]  multi_cpu_stop+0x6b/0x120
[ 3680.685432]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685439]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685443]  cpu_stopper_thread+0x8c/0x140
[ 3680.685448]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685454]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685458]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685462]  kthread+0xde/0x110
[ 3680.685467]  ? __pfx_kthread+0x10/0x10
[ 3680.685471]  ret_from_fork+0x2f/0x50
[ 3680.685478]  ? __pfx_kthread+0x10/0x10
[ 3680.685482]  ret_from_fork_asm+0x1a/0x30
[ 3680.685489]  </TASK>
[ 3680.685491]
[ 3680.685491] other info that might help us debug this:
[ 3680.685497] 1 lock held by migration/0/19:
[ 3680.685500]  #0: ffffffff8284df38 (pgd_lock){+.+.}-{3:3}, at: xen_mm_pin_all+0x14/0xa0
[ 3680.685512]
[ 3680.685512] stack backtrace:
[ 3680.685518] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685528] Tainted: [W]=WARN
[ 3680.685531] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685538] Call Trace:
[ 3680.685541]  <TASK>
[ 3680.685544]  dump_stack_lvl+0x77/0xb0
[ 3680.685549]  __lock_acquire+0x93c/0x2310
[ 3680.685554]  lock_acquire+0xce/0x2c0
[ 3680.685558]  ? xen_pin_page+0x175/0x1d0
[ 3680.685562]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685568]  ? xen_pin_page+0x175/0x1d0
[ 3680.685572]  xen_pin_page+0x175/0x1d0
[ 3680.685578]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685582]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685588]  ? stop_one_cpu+0x91/0x100
[ 3680.685592]  __xen_pgd_pin+0x5d/0x250
[ 3680.685596]  xen_mm_pin_all+0x70/0xa0
[ 3680.685600]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685607]  xen_suspend+0x57/0x1a0
[ 3680.685611]  multi_cpu_stop+0x6b/0x120
[ 3680.685615]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685620]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685625]  cpu_stopper_thread+0x8c/0x140
[ 3680.685629]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685634]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685638]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685642]  kthread+0xde/0x110
[ 3680.685645]  ? __pfx_kthread+0x10/0x10
[ 3680.685649]  ret_from_fork+0x2f/0x50
[ 3680.685654]  ? __pfx_kthread+0x10/0x10
[ 3680.685657]  ret_from_fork_asm+0x1a/0x30
[ 3680.685662]  </TASK>
[ 3680.685267] xen:grant_table: Grant tables using version 1 layout
[ 3680.685921] OOM killer enabled.
[ 3680.685934] Restarting tasks ... done.

Signed-off-by: Maksym Planeta <maksym@exostellar.io>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241204103516.3309112-1-maksym@exostellar.io>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
github-actions bot pushed a commit to anon503/linux that referenced this pull request Feb 19, 2025
[ Upstream commit 6d00234 ]

Function xen_pin_page calls xen_pte_lock, which in turn grab page
table lock (ptlock). When locking, xen_pte_lock expect mm->page_table_lock
to be held before grabbing ptlock, but this does not happen when pinning
is caused by xen_mm_pin_all.

This commit addresses lockdep warning below, which shows up when
suspending a Xen VM.

[ 3680.658422] Freezing user space processes
[ 3680.660156] Freezing user space processes completed (elapsed 0.001 seconds)
[ 3680.660182] OOM killer disabled.
[ 3680.660192] Freezing remaining freezable tasks
[ 3680.661485] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 3680.685254]
[ 3680.685265] ==================================
[ 3680.685269] WARNING: Nested lock was not taken
[ 3680.685274] 6.12.0+ torvalds#16 Tainted: G        W
[ 3680.685279] ----------------------------------
[ 3680.685283] migration/0/19 is trying to lock:
[ 3680.685288] ffff88800bac33c0 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: xen_pin_page+0x175/0x1d0
[ 3680.685303]
[ 3680.685303] but this task is not holding:
[ 3680.685308] init_mm.page_table_lock
[ 3680.685311]
[ 3680.685311] stack backtrace:
[ 3680.685316] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685324] Tainted: [W]=WARN
[ 3680.685328] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685339] Call Trace:
[ 3680.685344]  <TASK>
[ 3680.685347]  dump_stack_lvl+0x77/0xb0
[ 3680.685356]  __lock_acquire+0x917/0x2310
[ 3680.685364]  lock_acquire+0xce/0x2c0
[ 3680.685369]  ? xen_pin_page+0x175/0x1d0
[ 3680.685373]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685381]  ? xen_pin_page+0x175/0x1d0
[ 3680.685386]  xen_pin_page+0x175/0x1d0
[ 3680.685390]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685394]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685401]  ? stop_one_cpu+0x91/0x100
[ 3680.685405]  __xen_pgd_pin+0x5d/0x250
[ 3680.685410]  xen_mm_pin_all+0x70/0xa0
[ 3680.685415]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685420]  xen_suspend+0x57/0x1a0
[ 3680.685428]  multi_cpu_stop+0x6b/0x120
[ 3680.685432]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685439]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685443]  cpu_stopper_thread+0x8c/0x140
[ 3680.685448]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685454]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685458]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685462]  kthread+0xde/0x110
[ 3680.685467]  ? __pfx_kthread+0x10/0x10
[ 3680.685471]  ret_from_fork+0x2f/0x50
[ 3680.685478]  ? __pfx_kthread+0x10/0x10
[ 3680.685482]  ret_from_fork_asm+0x1a/0x30
[ 3680.685489]  </TASK>
[ 3680.685491]
[ 3680.685491] other info that might help us debug this:
[ 3680.685497] 1 lock held by migration/0/19:
[ 3680.685500]  #0: ffffffff8284df38 (pgd_lock){+.+.}-{3:3}, at: xen_mm_pin_all+0x14/0xa0
[ 3680.685512]
[ 3680.685512] stack backtrace:
[ 3680.685518] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685528] Tainted: [W]=WARN
[ 3680.685531] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685538] Call Trace:
[ 3680.685541]  <TASK>
[ 3680.685544]  dump_stack_lvl+0x77/0xb0
[ 3680.685549]  __lock_acquire+0x93c/0x2310
[ 3680.685554]  lock_acquire+0xce/0x2c0
[ 3680.685558]  ? xen_pin_page+0x175/0x1d0
[ 3680.685562]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685568]  ? xen_pin_page+0x175/0x1d0
[ 3680.685572]  xen_pin_page+0x175/0x1d0
[ 3680.685578]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685582]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685588]  ? stop_one_cpu+0x91/0x100
[ 3680.685592]  __xen_pgd_pin+0x5d/0x250
[ 3680.685596]  xen_mm_pin_all+0x70/0xa0
[ 3680.685600]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685607]  xen_suspend+0x57/0x1a0
[ 3680.685611]  multi_cpu_stop+0x6b/0x120
[ 3680.685615]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685620]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685625]  cpu_stopper_thread+0x8c/0x140
[ 3680.685629]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685634]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685638]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685642]  kthread+0xde/0x110
[ 3680.685645]  ? __pfx_kthread+0x10/0x10
[ 3680.685649]  ret_from_fork+0x2f/0x50
[ 3680.685654]  ? __pfx_kthread+0x10/0x10
[ 3680.685657]  ret_from_fork_asm+0x1a/0x30
[ 3680.685662]  </TASK>
[ 3680.685267] xen:grant_table: Grant tables using version 1 layout
[ 3680.685921] OOM killer enabled.
[ 3680.685934] Restarting tasks ... done.

Signed-off-by: Maksym Planeta <maksym@exostellar.io>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241204103516.3309112-1-maksym@exostellar.io>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
github-actions bot pushed a commit to anon503/linux that referenced this pull request Feb 19, 2025
[ Upstream commit 6d00234 ]

Function xen_pin_page calls xen_pte_lock, which in turn grab page
table lock (ptlock). When locking, xen_pte_lock expect mm->page_table_lock
to be held before grabbing ptlock, but this does not happen when pinning
is caused by xen_mm_pin_all.

This commit addresses lockdep warning below, which shows up when
suspending a Xen VM.

[ 3680.658422] Freezing user space processes
[ 3680.660156] Freezing user space processes completed (elapsed 0.001 seconds)
[ 3680.660182] OOM killer disabled.
[ 3680.660192] Freezing remaining freezable tasks
[ 3680.661485] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 3680.685254]
[ 3680.685265] ==================================
[ 3680.685269] WARNING: Nested lock was not taken
[ 3680.685274] 6.12.0+ torvalds#16 Tainted: G        W
[ 3680.685279] ----------------------------------
[ 3680.685283] migration/0/19 is trying to lock:
[ 3680.685288] ffff88800bac33c0 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: xen_pin_page+0x175/0x1d0
[ 3680.685303]
[ 3680.685303] but this task is not holding:
[ 3680.685308] init_mm.page_table_lock
[ 3680.685311]
[ 3680.685311] stack backtrace:
[ 3680.685316] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685324] Tainted: [W]=WARN
[ 3680.685328] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685339] Call Trace:
[ 3680.685344]  <TASK>
[ 3680.685347]  dump_stack_lvl+0x77/0xb0
[ 3680.685356]  __lock_acquire+0x917/0x2310
[ 3680.685364]  lock_acquire+0xce/0x2c0
[ 3680.685369]  ? xen_pin_page+0x175/0x1d0
[ 3680.685373]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685381]  ? xen_pin_page+0x175/0x1d0
[ 3680.685386]  xen_pin_page+0x175/0x1d0
[ 3680.685390]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685394]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685401]  ? stop_one_cpu+0x91/0x100
[ 3680.685405]  __xen_pgd_pin+0x5d/0x250
[ 3680.685410]  xen_mm_pin_all+0x70/0xa0
[ 3680.685415]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685420]  xen_suspend+0x57/0x1a0
[ 3680.685428]  multi_cpu_stop+0x6b/0x120
[ 3680.685432]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685439]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685443]  cpu_stopper_thread+0x8c/0x140
[ 3680.685448]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685454]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685458]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685462]  kthread+0xde/0x110
[ 3680.685467]  ? __pfx_kthread+0x10/0x10
[ 3680.685471]  ret_from_fork+0x2f/0x50
[ 3680.685478]  ? __pfx_kthread+0x10/0x10
[ 3680.685482]  ret_from_fork_asm+0x1a/0x30
[ 3680.685489]  </TASK>
[ 3680.685491]
[ 3680.685491] other info that might help us debug this:
[ 3680.685497] 1 lock held by migration/0/19:
[ 3680.685500]  #0: ffffffff8284df38 (pgd_lock){+.+.}-{3:3}, at: xen_mm_pin_all+0x14/0xa0
[ 3680.685512]
[ 3680.685512] stack backtrace:
[ 3680.685518] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685528] Tainted: [W]=WARN
[ 3680.685531] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685538] Call Trace:
[ 3680.685541]  <TASK>
[ 3680.685544]  dump_stack_lvl+0x77/0xb0
[ 3680.685549]  __lock_acquire+0x93c/0x2310
[ 3680.685554]  lock_acquire+0xce/0x2c0
[ 3680.685558]  ? xen_pin_page+0x175/0x1d0
[ 3680.685562]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685568]  ? xen_pin_page+0x175/0x1d0
[ 3680.685572]  xen_pin_page+0x175/0x1d0
[ 3680.685578]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685582]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685588]  ? stop_one_cpu+0x91/0x100
[ 3680.685592]  __xen_pgd_pin+0x5d/0x250
[ 3680.685596]  xen_mm_pin_all+0x70/0xa0
[ 3680.685600]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685607]  xen_suspend+0x57/0x1a0
[ 3680.685611]  multi_cpu_stop+0x6b/0x120
[ 3680.685615]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685620]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685625]  cpu_stopper_thread+0x8c/0x140
[ 3680.685629]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685634]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685638]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685642]  kthread+0xde/0x110
[ 3680.685645]  ? __pfx_kthread+0x10/0x10
[ 3680.685649]  ret_from_fork+0x2f/0x50
[ 3680.685654]  ? __pfx_kthread+0x10/0x10
[ 3680.685657]  ret_from_fork_asm+0x1a/0x30
[ 3680.685662]  </TASK>
[ 3680.685267] xen:grant_table: Grant tables using version 1 layout
[ 3680.685921] OOM killer enabled.
[ 3680.685934] Restarting tasks ... done.

Signed-off-by: Maksym Planeta <maksym@exostellar.io>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241204103516.3309112-1-maksym@exostellar.io>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Feb 19, 2025
[ Upstream commit 6d00234 ]

Function xen_pin_page calls xen_pte_lock, which in turn grab page
table lock (ptlock). When locking, xen_pte_lock expect mm->page_table_lock
to be held before grabbing ptlock, but this does not happen when pinning
is caused by xen_mm_pin_all.

This commit addresses lockdep warning below, which shows up when
suspending a Xen VM.

[ 3680.658422] Freezing user space processes
[ 3680.660156] Freezing user space processes completed (elapsed 0.001 seconds)
[ 3680.660182] OOM killer disabled.
[ 3680.660192] Freezing remaining freezable tasks
[ 3680.661485] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 3680.685254]
[ 3680.685265] ==================================
[ 3680.685269] WARNING: Nested lock was not taken
[ 3680.685274] 6.12.0+ torvalds#16 Tainted: G        W
[ 3680.685279] ----------------------------------
[ 3680.685283] migration/0/19 is trying to lock:
[ 3680.685288] ffff88800bac33c0 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: xen_pin_page+0x175/0x1d0
[ 3680.685303]
[ 3680.685303] but this task is not holding:
[ 3680.685308] init_mm.page_table_lock
[ 3680.685311]
[ 3680.685311] stack backtrace:
[ 3680.685316] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685324] Tainted: [W]=WARN
[ 3680.685328] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685339] Call Trace:
[ 3680.685344]  <TASK>
[ 3680.685347]  dump_stack_lvl+0x77/0xb0
[ 3680.685356]  __lock_acquire+0x917/0x2310
[ 3680.685364]  lock_acquire+0xce/0x2c0
[ 3680.685369]  ? xen_pin_page+0x175/0x1d0
[ 3680.685373]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685381]  ? xen_pin_page+0x175/0x1d0
[ 3680.685386]  xen_pin_page+0x175/0x1d0
[ 3680.685390]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685394]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685401]  ? stop_one_cpu+0x91/0x100
[ 3680.685405]  __xen_pgd_pin+0x5d/0x250
[ 3680.685410]  xen_mm_pin_all+0x70/0xa0
[ 3680.685415]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685420]  xen_suspend+0x57/0x1a0
[ 3680.685428]  multi_cpu_stop+0x6b/0x120
[ 3680.685432]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685439]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685443]  cpu_stopper_thread+0x8c/0x140
[ 3680.685448]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685454]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685458]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685462]  kthread+0xde/0x110
[ 3680.685467]  ? __pfx_kthread+0x10/0x10
[ 3680.685471]  ret_from_fork+0x2f/0x50
[ 3680.685478]  ? __pfx_kthread+0x10/0x10
[ 3680.685482]  ret_from_fork_asm+0x1a/0x30
[ 3680.685489]  </TASK>
[ 3680.685491]
[ 3680.685491] other info that might help us debug this:
[ 3680.685497] 1 lock held by migration/0/19:
[ 3680.685500]  #0: ffffffff8284df38 (pgd_lock){+.+.}-{3:3}, at: xen_mm_pin_all+0x14/0xa0
[ 3680.685512]
[ 3680.685512] stack backtrace:
[ 3680.685518] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685528] Tainted: [W]=WARN
[ 3680.685531] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685538] Call Trace:
[ 3680.685541]  <TASK>
[ 3680.685544]  dump_stack_lvl+0x77/0xb0
[ 3680.685549]  __lock_acquire+0x93c/0x2310
[ 3680.685554]  lock_acquire+0xce/0x2c0
[ 3680.685558]  ? xen_pin_page+0x175/0x1d0
[ 3680.685562]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685568]  ? xen_pin_page+0x175/0x1d0
[ 3680.685572]  xen_pin_page+0x175/0x1d0
[ 3680.685578]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685582]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685588]  ? stop_one_cpu+0x91/0x100
[ 3680.685592]  __xen_pgd_pin+0x5d/0x250
[ 3680.685596]  xen_mm_pin_all+0x70/0xa0
[ 3680.685600]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685607]  xen_suspend+0x57/0x1a0
[ 3680.685611]  multi_cpu_stop+0x6b/0x120
[ 3680.685615]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685620]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685625]  cpu_stopper_thread+0x8c/0x140
[ 3680.685629]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685634]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685638]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685642]  kthread+0xde/0x110
[ 3680.685645]  ? __pfx_kthread+0x10/0x10
[ 3680.685649]  ret_from_fork+0x2f/0x50
[ 3680.685654]  ? __pfx_kthread+0x10/0x10
[ 3680.685657]  ret_from_fork_asm+0x1a/0x30
[ 3680.685662]  </TASK>
[ 3680.685267] xen:grant_table: Grant tables using version 1 layout
[ 3680.685921] OOM killer enabled.
[ 3680.685934] Restarting tasks ... done.

Signed-off-by: Maksym Planeta <maksym@exostellar.io>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241204103516.3309112-1-maksym@exostellar.io>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Feb 20, 2025
When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush()
generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC,
which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4, similar problem also exists in upstream:

    crash> bt 2091206
    PID: 2091206  TASK: ffff2050df92a300  CPU: 109  COMMAND: "kworker/u260:0"
     #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
     #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
     #2 [ffff800084a2f880] schedule at ffff800040bfa4b4
     #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
     #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
     #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
     torvalds#6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
     torvalds#7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
     torvalds#8 [ffff800084a2fa60] generic_make_request at ffff800040570138
     torvalds#9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
    torvalds#10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
    torvalds#11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
    torvalds#12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
    torvalds#13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
    torvalds#14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
    torvalds#15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
    torvalds#16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
    torvalds#17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
    torvalds#18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def284 ("xfs: don't allow log IO to be throttled"),
the metadata submitted by xlog_write_iclog() should not be throttled.
But due to the existence of the dm layer, throttling flush_bio indirectly
causes the metadata bio to be throttled.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes
wbt_should_throttle() return false to avoid wbt_wait().

Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Tianxiang Peng <txpeng@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
github-actions bot pushed a commit to anon503/linux that referenced this pull request Feb 20, 2025
[ Upstream commit 6d00234 ]

Function xen_pin_page calls xen_pte_lock, which in turn grab page
table lock (ptlock). When locking, xen_pte_lock expect mm->page_table_lock
to be held before grabbing ptlock, but this does not happen when pinning
is caused by xen_mm_pin_all.

This commit addresses lockdep warning below, which shows up when
suspending a Xen VM.

[ 3680.658422] Freezing user space processes
[ 3680.660156] Freezing user space processes completed (elapsed 0.001 seconds)
[ 3680.660182] OOM killer disabled.
[ 3680.660192] Freezing remaining freezable tasks
[ 3680.661485] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 3680.685254]
[ 3680.685265] ==================================
[ 3680.685269] WARNING: Nested lock was not taken
[ 3680.685274] 6.12.0+ torvalds#16 Tainted: G        W
[ 3680.685279] ----------------------------------
[ 3680.685283] migration/0/19 is trying to lock:
[ 3680.685288] ffff88800bac33c0 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: xen_pin_page+0x175/0x1d0
[ 3680.685303]
[ 3680.685303] but this task is not holding:
[ 3680.685308] init_mm.page_table_lock
[ 3680.685311]
[ 3680.685311] stack backtrace:
[ 3680.685316] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685324] Tainted: [W]=WARN
[ 3680.685328] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685339] Call Trace:
[ 3680.685344]  <TASK>
[ 3680.685347]  dump_stack_lvl+0x77/0xb0
[ 3680.685356]  __lock_acquire+0x917/0x2310
[ 3680.685364]  lock_acquire+0xce/0x2c0
[ 3680.685369]  ? xen_pin_page+0x175/0x1d0
[ 3680.685373]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685381]  ? xen_pin_page+0x175/0x1d0
[ 3680.685386]  xen_pin_page+0x175/0x1d0
[ 3680.685390]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685394]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685401]  ? stop_one_cpu+0x91/0x100
[ 3680.685405]  __xen_pgd_pin+0x5d/0x250
[ 3680.685410]  xen_mm_pin_all+0x70/0xa0
[ 3680.685415]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685420]  xen_suspend+0x57/0x1a0
[ 3680.685428]  multi_cpu_stop+0x6b/0x120
[ 3680.685432]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685439]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685443]  cpu_stopper_thread+0x8c/0x140
[ 3680.685448]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685454]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685458]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685462]  kthread+0xde/0x110
[ 3680.685467]  ? __pfx_kthread+0x10/0x10
[ 3680.685471]  ret_from_fork+0x2f/0x50
[ 3680.685478]  ? __pfx_kthread+0x10/0x10
[ 3680.685482]  ret_from_fork_asm+0x1a/0x30
[ 3680.685489]  </TASK>
[ 3680.685491]
[ 3680.685491] other info that might help us debug this:
[ 3680.685497] 1 lock held by migration/0/19:
[ 3680.685500]  #0: ffffffff8284df38 (pgd_lock){+.+.}-{3:3}, at: xen_mm_pin_all+0x14/0xa0
[ 3680.685512]
[ 3680.685512] stack backtrace:
[ 3680.685518] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685528] Tainted: [W]=WARN
[ 3680.685531] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685538] Call Trace:
[ 3680.685541]  <TASK>
[ 3680.685544]  dump_stack_lvl+0x77/0xb0
[ 3680.685549]  __lock_acquire+0x93c/0x2310
[ 3680.685554]  lock_acquire+0xce/0x2c0
[ 3680.685558]  ? xen_pin_page+0x175/0x1d0
[ 3680.685562]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685568]  ? xen_pin_page+0x175/0x1d0
[ 3680.685572]  xen_pin_page+0x175/0x1d0
[ 3680.685578]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685582]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685588]  ? stop_one_cpu+0x91/0x100
[ 3680.685592]  __xen_pgd_pin+0x5d/0x250
[ 3680.685596]  xen_mm_pin_all+0x70/0xa0
[ 3680.685600]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685607]  xen_suspend+0x57/0x1a0
[ 3680.685611]  multi_cpu_stop+0x6b/0x120
[ 3680.685615]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685620]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685625]  cpu_stopper_thread+0x8c/0x140
[ 3680.685629]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685634]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685638]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685642]  kthread+0xde/0x110
[ 3680.685645]  ? __pfx_kthread+0x10/0x10
[ 3680.685649]  ret_from_fork+0x2f/0x50
[ 3680.685654]  ? __pfx_kthread+0x10/0x10
[ 3680.685657]  ret_from_fork_asm+0x1a/0x30
[ 3680.685662]  </TASK>
[ 3680.685267] xen:grant_table: Grant tables using version 1 layout
[ 3680.685921] OOM killer enabled.
[ 3680.685934] Restarting tasks ... done.

Signed-off-by: Maksym Planeta <maksym@exostellar.io>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241204103516.3309112-1-maksym@exostellar.io>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
github-actions bot pushed a commit to anon503/linux that referenced this pull request Feb 20, 2025
[ Upstream commit 6d00234 ]

Function xen_pin_page calls xen_pte_lock, which in turn grab page
table lock (ptlock). When locking, xen_pte_lock expect mm->page_table_lock
to be held before grabbing ptlock, but this does not happen when pinning
is caused by xen_mm_pin_all.

This commit addresses lockdep warning below, which shows up when
suspending a Xen VM.

[ 3680.658422] Freezing user space processes
[ 3680.660156] Freezing user space processes completed (elapsed 0.001 seconds)
[ 3680.660182] OOM killer disabled.
[ 3680.660192] Freezing remaining freezable tasks
[ 3680.661485] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 3680.685254]
[ 3680.685265] ==================================
[ 3680.685269] WARNING: Nested lock was not taken
[ 3680.685274] 6.12.0+ torvalds#16 Tainted: G        W
[ 3680.685279] ----------------------------------
[ 3680.685283] migration/0/19 is trying to lock:
[ 3680.685288] ffff88800bac33c0 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: xen_pin_page+0x175/0x1d0
[ 3680.685303]
[ 3680.685303] but this task is not holding:
[ 3680.685308] init_mm.page_table_lock
[ 3680.685311]
[ 3680.685311] stack backtrace:
[ 3680.685316] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685324] Tainted: [W]=WARN
[ 3680.685328] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685339] Call Trace:
[ 3680.685344]  <TASK>
[ 3680.685347]  dump_stack_lvl+0x77/0xb0
[ 3680.685356]  __lock_acquire+0x917/0x2310
[ 3680.685364]  lock_acquire+0xce/0x2c0
[ 3680.685369]  ? xen_pin_page+0x175/0x1d0
[ 3680.685373]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685381]  ? xen_pin_page+0x175/0x1d0
[ 3680.685386]  xen_pin_page+0x175/0x1d0
[ 3680.685390]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685394]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685401]  ? stop_one_cpu+0x91/0x100
[ 3680.685405]  __xen_pgd_pin+0x5d/0x250
[ 3680.685410]  xen_mm_pin_all+0x70/0xa0
[ 3680.685415]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685420]  xen_suspend+0x57/0x1a0
[ 3680.685428]  multi_cpu_stop+0x6b/0x120
[ 3680.685432]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685439]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685443]  cpu_stopper_thread+0x8c/0x140
[ 3680.685448]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685454]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685458]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685462]  kthread+0xde/0x110
[ 3680.685467]  ? __pfx_kthread+0x10/0x10
[ 3680.685471]  ret_from_fork+0x2f/0x50
[ 3680.685478]  ? __pfx_kthread+0x10/0x10
[ 3680.685482]  ret_from_fork_asm+0x1a/0x30
[ 3680.685489]  </TASK>
[ 3680.685491]
[ 3680.685491] other info that might help us debug this:
[ 3680.685497] 1 lock held by migration/0/19:
[ 3680.685500]  #0: ffffffff8284df38 (pgd_lock){+.+.}-{3:3}, at: xen_mm_pin_all+0x14/0xa0
[ 3680.685512]
[ 3680.685512] stack backtrace:
[ 3680.685518] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685528] Tainted: [W]=WARN
[ 3680.685531] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685538] Call Trace:
[ 3680.685541]  <TASK>
[ 3680.685544]  dump_stack_lvl+0x77/0xb0
[ 3680.685549]  __lock_acquire+0x93c/0x2310
[ 3680.685554]  lock_acquire+0xce/0x2c0
[ 3680.685558]  ? xen_pin_page+0x175/0x1d0
[ 3680.685562]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685568]  ? xen_pin_page+0x175/0x1d0
[ 3680.685572]  xen_pin_page+0x175/0x1d0
[ 3680.685578]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685582]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685588]  ? stop_one_cpu+0x91/0x100
[ 3680.685592]  __xen_pgd_pin+0x5d/0x250
[ 3680.685596]  xen_mm_pin_all+0x70/0xa0
[ 3680.685600]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685607]  xen_suspend+0x57/0x1a0
[ 3680.685611]  multi_cpu_stop+0x6b/0x120
[ 3680.685615]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685620]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685625]  cpu_stopper_thread+0x8c/0x140
[ 3680.685629]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685634]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685638]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685642]  kthread+0xde/0x110
[ 3680.685645]  ? __pfx_kthread+0x10/0x10
[ 3680.685649]  ret_from_fork+0x2f/0x50
[ 3680.685654]  ? __pfx_kthread+0x10/0x10
[ 3680.685657]  ret_from_fork_asm+0x1a/0x30
[ 3680.685662]  </TASK>
[ 3680.685267] xen:grant_table: Grant tables using version 1 layout
[ 3680.685921] OOM killer enabled.
[ 3680.685934] Restarting tasks ... done.

Signed-off-by: Maksym Planeta <maksym@exostellar.io>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241204103516.3309112-1-maksym@exostellar.io>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
KexyBiscuit pushed a commit to AOSC-Tracking/linux that referenced this pull request Feb 21, 2025
[ Upstream commit 6d00234 ]

Function xen_pin_page calls xen_pte_lock, which in turn grab page
table lock (ptlock). When locking, xen_pte_lock expect mm->page_table_lock
to be held before grabbing ptlock, but this does not happen when pinning
is caused by xen_mm_pin_all.

This commit addresses lockdep warning below, which shows up when
suspending a Xen VM.

[ 3680.658422] Freezing user space processes
[ 3680.660156] Freezing user space processes completed (elapsed 0.001 seconds)
[ 3680.660182] OOM killer disabled.
[ 3680.660192] Freezing remaining freezable tasks
[ 3680.661485] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 3680.685254]
[ 3680.685265] ==================================
[ 3680.685269] WARNING: Nested lock was not taken
[ 3680.685274] 6.12.0+ torvalds#16 Tainted: G        W
[ 3680.685279] ----------------------------------
[ 3680.685283] migration/0/19 is trying to lock:
[ 3680.685288] ffff88800bac33c0 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: xen_pin_page+0x175/0x1d0
[ 3680.685303]
[ 3680.685303] but this task is not holding:
[ 3680.685308] init_mm.page_table_lock
[ 3680.685311]
[ 3680.685311] stack backtrace:
[ 3680.685316] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685324] Tainted: [W]=WARN
[ 3680.685328] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685339] Call Trace:
[ 3680.685344]  <TASK>
[ 3680.685347]  dump_stack_lvl+0x77/0xb0
[ 3680.685356]  __lock_acquire+0x917/0x2310
[ 3680.685364]  lock_acquire+0xce/0x2c0
[ 3680.685369]  ? xen_pin_page+0x175/0x1d0
[ 3680.685373]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685381]  ? xen_pin_page+0x175/0x1d0
[ 3680.685386]  xen_pin_page+0x175/0x1d0
[ 3680.685390]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685394]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685401]  ? stop_one_cpu+0x91/0x100
[ 3680.685405]  __xen_pgd_pin+0x5d/0x250
[ 3680.685410]  xen_mm_pin_all+0x70/0xa0
[ 3680.685415]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685420]  xen_suspend+0x57/0x1a0
[ 3680.685428]  multi_cpu_stop+0x6b/0x120
[ 3680.685432]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685439]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685443]  cpu_stopper_thread+0x8c/0x140
[ 3680.685448]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685454]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685458]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685462]  kthread+0xde/0x110
[ 3680.685467]  ? __pfx_kthread+0x10/0x10
[ 3680.685471]  ret_from_fork+0x2f/0x50
[ 3680.685478]  ? __pfx_kthread+0x10/0x10
[ 3680.685482]  ret_from_fork_asm+0x1a/0x30
[ 3680.685489]  </TASK>
[ 3680.685491]
[ 3680.685491] other info that might help us debug this:
[ 3680.685497] 1 lock held by migration/0/19:
[ 3680.685500]  #0: ffffffff8284df38 (pgd_lock){+.+.}-{3:3}, at: xen_mm_pin_all+0x14/0xa0
[ 3680.685512]
[ 3680.685512] stack backtrace:
[ 3680.685518] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685528] Tainted: [W]=WARN
[ 3680.685531] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685538] Call Trace:
[ 3680.685541]  <TASK>
[ 3680.685544]  dump_stack_lvl+0x77/0xb0
[ 3680.685549]  __lock_acquire+0x93c/0x2310
[ 3680.685554]  lock_acquire+0xce/0x2c0
[ 3680.685558]  ? xen_pin_page+0x175/0x1d0
[ 3680.685562]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685568]  ? xen_pin_page+0x175/0x1d0
[ 3680.685572]  xen_pin_page+0x175/0x1d0
[ 3680.685578]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685582]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685588]  ? stop_one_cpu+0x91/0x100
[ 3680.685592]  __xen_pgd_pin+0x5d/0x250
[ 3680.685596]  xen_mm_pin_all+0x70/0xa0
[ 3680.685600]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685607]  xen_suspend+0x57/0x1a0
[ 3680.685611]  multi_cpu_stop+0x6b/0x120
[ 3680.685615]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685620]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685625]  cpu_stopper_thread+0x8c/0x140
[ 3680.685629]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685634]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685638]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685642]  kthread+0xde/0x110
[ 3680.685645]  ? __pfx_kthread+0x10/0x10
[ 3680.685649]  ret_from_fork+0x2f/0x50
[ 3680.685654]  ? __pfx_kthread+0x10/0x10
[ 3680.685657]  ret_from_fork_asm+0x1a/0x30
[ 3680.685662]  </TASK>
[ 3680.685267] xen:grant_table: Grant tables using version 1 layout
[ 3680.685921] OOM killer enabled.
[ 3680.685934] Restarting tasks ... done.

Signed-off-by: Maksym Planeta <maksym@exostellar.io>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241204103516.3309112-1-maksym@exostellar.io>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Feb 21, 2025
[ Upstream commit 6d00234 ]

Function xen_pin_page calls xen_pte_lock, which in turn grab page
table lock (ptlock). When locking, xen_pte_lock expect mm->page_table_lock
to be held before grabbing ptlock, but this does not happen when pinning
is caused by xen_mm_pin_all.

This commit addresses lockdep warning below, which shows up when
suspending a Xen VM.

[ 3680.658422] Freezing user space processes
[ 3680.660156] Freezing user space processes completed (elapsed 0.001 seconds)
[ 3680.660182] OOM killer disabled.
[ 3680.660192] Freezing remaining freezable tasks
[ 3680.661485] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 3680.685254]
[ 3680.685265] ==================================
[ 3680.685269] WARNING: Nested lock was not taken
[ 3680.685274] 6.12.0+ torvalds#16 Tainted: G        W
[ 3680.685279] ----------------------------------
[ 3680.685283] migration/0/19 is trying to lock:
[ 3680.685288] ffff88800bac33c0 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: xen_pin_page+0x175/0x1d0
[ 3680.685303]
[ 3680.685303] but this task is not holding:
[ 3680.685308] init_mm.page_table_lock
[ 3680.685311]
[ 3680.685311] stack backtrace:
[ 3680.685316] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685324] Tainted: [W]=WARN
[ 3680.685328] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685339] Call Trace:
[ 3680.685344]  <TASK>
[ 3680.685347]  dump_stack_lvl+0x77/0xb0
[ 3680.685356]  __lock_acquire+0x917/0x2310
[ 3680.685364]  lock_acquire+0xce/0x2c0
[ 3680.685369]  ? xen_pin_page+0x175/0x1d0
[ 3680.685373]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685381]  ? xen_pin_page+0x175/0x1d0
[ 3680.685386]  xen_pin_page+0x175/0x1d0
[ 3680.685390]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685394]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685401]  ? stop_one_cpu+0x91/0x100
[ 3680.685405]  __xen_pgd_pin+0x5d/0x250
[ 3680.685410]  xen_mm_pin_all+0x70/0xa0
[ 3680.685415]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685420]  xen_suspend+0x57/0x1a0
[ 3680.685428]  multi_cpu_stop+0x6b/0x120
[ 3680.685432]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685439]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685443]  cpu_stopper_thread+0x8c/0x140
[ 3680.685448]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685454]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685458]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685462]  kthread+0xde/0x110
[ 3680.685467]  ? __pfx_kthread+0x10/0x10
[ 3680.685471]  ret_from_fork+0x2f/0x50
[ 3680.685478]  ? __pfx_kthread+0x10/0x10
[ 3680.685482]  ret_from_fork_asm+0x1a/0x30
[ 3680.685489]  </TASK>
[ 3680.685491]
[ 3680.685491] other info that might help us debug this:
[ 3680.685497] 1 lock held by migration/0/19:
[ 3680.685500]  #0: ffffffff8284df38 (pgd_lock){+.+.}-{3:3}, at: xen_mm_pin_all+0x14/0xa0
[ 3680.685512]
[ 3680.685512] stack backtrace:
[ 3680.685518] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685528] Tainted: [W]=WARN
[ 3680.685531] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685538] Call Trace:
[ 3680.685541]  <TASK>
[ 3680.685544]  dump_stack_lvl+0x77/0xb0
[ 3680.685549]  __lock_acquire+0x93c/0x2310
[ 3680.685554]  lock_acquire+0xce/0x2c0
[ 3680.685558]  ? xen_pin_page+0x175/0x1d0
[ 3680.685562]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685568]  ? xen_pin_page+0x175/0x1d0
[ 3680.685572]  xen_pin_page+0x175/0x1d0
[ 3680.685578]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685582]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685588]  ? stop_one_cpu+0x91/0x100
[ 3680.685592]  __xen_pgd_pin+0x5d/0x250
[ 3680.685596]  xen_mm_pin_all+0x70/0xa0
[ 3680.685600]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685607]  xen_suspend+0x57/0x1a0
[ 3680.685611]  multi_cpu_stop+0x6b/0x120
[ 3680.685615]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685620]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685625]  cpu_stopper_thread+0x8c/0x140
[ 3680.685629]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685634]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685638]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685642]  kthread+0xde/0x110
[ 3680.685645]  ? __pfx_kthread+0x10/0x10
[ 3680.685649]  ret_from_fork+0x2f/0x50
[ 3680.685654]  ? __pfx_kthread+0x10/0x10
[ 3680.685657]  ret_from_fork_asm+0x1a/0x30
[ 3680.685662]  </TASK>
[ 3680.685267] xen:grant_table: Grant tables using version 1 layout
[ 3680.685921] OOM killer enabled.
[ 3680.685934] Restarting tasks ... done.

Signed-off-by: Maksym Planeta <maksym@exostellar.io>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241204103516.3309112-1-maksym@exostellar.io>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337 pushed a commit to CachyOS/linux that referenced this pull request Feb 21, 2025
[ Upstream commit 6d00234 ]

Function xen_pin_page calls xen_pte_lock, which in turn grab page
table lock (ptlock). When locking, xen_pte_lock expect mm->page_table_lock
to be held before grabbing ptlock, but this does not happen when pinning
is caused by xen_mm_pin_all.

This commit addresses lockdep warning below, which shows up when
suspending a Xen VM.

[ 3680.658422] Freezing user space processes
[ 3680.660156] Freezing user space processes completed (elapsed 0.001 seconds)
[ 3680.660182] OOM killer disabled.
[ 3680.660192] Freezing remaining freezable tasks
[ 3680.661485] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 3680.685254]
[ 3680.685265] ==================================
[ 3680.685269] WARNING: Nested lock was not taken
[ 3680.685274] 6.12.0+ torvalds#16 Tainted: G        W
[ 3680.685279] ----------------------------------
[ 3680.685283] migration/0/19 is trying to lock:
[ 3680.685288] ffff88800bac33c0 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: xen_pin_page+0x175/0x1d0
[ 3680.685303]
[ 3680.685303] but this task is not holding:
[ 3680.685308] init_mm.page_table_lock
[ 3680.685311]
[ 3680.685311] stack backtrace:
[ 3680.685316] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685324] Tainted: [W]=WARN
[ 3680.685328] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685339] Call Trace:
[ 3680.685344]  <TASK>
[ 3680.685347]  dump_stack_lvl+0x77/0xb0
[ 3680.685356]  __lock_acquire+0x917/0x2310
[ 3680.685364]  lock_acquire+0xce/0x2c0
[ 3680.685369]  ? xen_pin_page+0x175/0x1d0
[ 3680.685373]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685381]  ? xen_pin_page+0x175/0x1d0
[ 3680.685386]  xen_pin_page+0x175/0x1d0
[ 3680.685390]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685394]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685401]  ? stop_one_cpu+0x91/0x100
[ 3680.685405]  __xen_pgd_pin+0x5d/0x250
[ 3680.685410]  xen_mm_pin_all+0x70/0xa0
[ 3680.685415]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685420]  xen_suspend+0x57/0x1a0
[ 3680.685428]  multi_cpu_stop+0x6b/0x120
[ 3680.685432]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685439]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685443]  cpu_stopper_thread+0x8c/0x140
[ 3680.685448]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685454]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685458]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685462]  kthread+0xde/0x110
[ 3680.685467]  ? __pfx_kthread+0x10/0x10
[ 3680.685471]  ret_from_fork+0x2f/0x50
[ 3680.685478]  ? __pfx_kthread+0x10/0x10
[ 3680.685482]  ret_from_fork_asm+0x1a/0x30
[ 3680.685489]  </TASK>
[ 3680.685491]
[ 3680.685491] other info that might help us debug this:
[ 3680.685497] 1 lock held by migration/0/19:
[ 3680.685500]  #0: ffffffff8284df38 (pgd_lock){+.+.}-{3:3}, at: xen_mm_pin_all+0x14/0xa0
[ 3680.685512]
[ 3680.685512] stack backtrace:
[ 3680.685518] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685528] Tainted: [W]=WARN
[ 3680.685531] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685538] Call Trace:
[ 3680.685541]  <TASK>
[ 3680.685544]  dump_stack_lvl+0x77/0xb0
[ 3680.685549]  __lock_acquire+0x93c/0x2310
[ 3680.685554]  lock_acquire+0xce/0x2c0
[ 3680.685558]  ? xen_pin_page+0x175/0x1d0
[ 3680.685562]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685568]  ? xen_pin_page+0x175/0x1d0
[ 3680.685572]  xen_pin_page+0x175/0x1d0
[ 3680.685578]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685582]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685588]  ? stop_one_cpu+0x91/0x100
[ 3680.685592]  __xen_pgd_pin+0x5d/0x250
[ 3680.685596]  xen_mm_pin_all+0x70/0xa0
[ 3680.685600]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685607]  xen_suspend+0x57/0x1a0
[ 3680.685611]  multi_cpu_stop+0x6b/0x120
[ 3680.685615]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685620]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685625]  cpu_stopper_thread+0x8c/0x140
[ 3680.685629]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685634]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685638]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685642]  kthread+0xde/0x110
[ 3680.685645]  ? __pfx_kthread+0x10/0x10
[ 3680.685649]  ret_from_fork+0x2f/0x50
[ 3680.685654]  ? __pfx_kthread+0x10/0x10
[ 3680.685657]  ret_from_fork_asm+0x1a/0x30
[ 3680.685662]  </TASK>
[ 3680.685267] xen:grant_table: Grant tables using version 1 layout
[ 3680.685921] OOM killer enabled.
[ 3680.685934] Restarting tasks ... done.

Signed-off-by: Maksym Planeta <maksym@exostellar.io>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241204103516.3309112-1-maksym@exostellar.io>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
staging-kernelci-org pushed a commit to kernelci/linux that referenced this pull request Feb 21, 2025
[ Upstream commit 6d00234 ]

Function xen_pin_page calls xen_pte_lock, which in turn grab page
table lock (ptlock). When locking, xen_pte_lock expect mm->page_table_lock
to be held before grabbing ptlock, but this does not happen when pinning
is caused by xen_mm_pin_all.

This commit addresses lockdep warning below, which shows up when
suspending a Xen VM.

[ 3680.658422] Freezing user space processes
[ 3680.660156] Freezing user space processes completed (elapsed 0.001 seconds)
[ 3680.660182] OOM killer disabled.
[ 3680.660192] Freezing remaining freezable tasks
[ 3680.661485] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 3680.685254]
[ 3680.685265] ==================================
[ 3680.685269] WARNING: Nested lock was not taken
[ 3680.685274] 6.12.0+ torvalds#16 Tainted: G        W
[ 3680.685279] ----------------------------------
[ 3680.685283] migration/0/19 is trying to lock:
[ 3680.685288] ffff88800bac33c0 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: xen_pin_page+0x175/0x1d0
[ 3680.685303]
[ 3680.685303] but this task is not holding:
[ 3680.685308] init_mm.page_table_lock
[ 3680.685311]
[ 3680.685311] stack backtrace:
[ 3680.685316] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685324] Tainted: [W]=WARN
[ 3680.685328] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685339] Call Trace:
[ 3680.685344]  <TASK>
[ 3680.685347]  dump_stack_lvl+0x77/0xb0
[ 3680.685356]  __lock_acquire+0x917/0x2310
[ 3680.685364]  lock_acquire+0xce/0x2c0
[ 3680.685369]  ? xen_pin_page+0x175/0x1d0
[ 3680.685373]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685381]  ? xen_pin_page+0x175/0x1d0
[ 3680.685386]  xen_pin_page+0x175/0x1d0
[ 3680.685390]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685394]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685401]  ? stop_one_cpu+0x91/0x100
[ 3680.685405]  __xen_pgd_pin+0x5d/0x250
[ 3680.685410]  xen_mm_pin_all+0x70/0xa0
[ 3680.685415]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685420]  xen_suspend+0x57/0x1a0
[ 3680.685428]  multi_cpu_stop+0x6b/0x120
[ 3680.685432]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685439]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685443]  cpu_stopper_thread+0x8c/0x140
[ 3680.685448]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685454]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685458]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685462]  kthread+0xde/0x110
[ 3680.685467]  ? __pfx_kthread+0x10/0x10
[ 3680.685471]  ret_from_fork+0x2f/0x50
[ 3680.685478]  ? __pfx_kthread+0x10/0x10
[ 3680.685482]  ret_from_fork_asm+0x1a/0x30
[ 3680.685489]  </TASK>
[ 3680.685491]
[ 3680.685491] other info that might help us debug this:
[ 3680.685497] 1 lock held by migration/0/19:
[ 3680.685500]  #0: ffffffff8284df38 (pgd_lock){+.+.}-{3:3}, at: xen_mm_pin_all+0x14/0xa0
[ 3680.685512]
[ 3680.685512] stack backtrace:
[ 3680.685518] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685528] Tainted: [W]=WARN
[ 3680.685531] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685538] Call Trace:
[ 3680.685541]  <TASK>
[ 3680.685544]  dump_stack_lvl+0x77/0xb0
[ 3680.685549]  __lock_acquire+0x93c/0x2310
[ 3680.685554]  lock_acquire+0xce/0x2c0
[ 3680.685558]  ? xen_pin_page+0x175/0x1d0
[ 3680.685562]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685568]  ? xen_pin_page+0x175/0x1d0
[ 3680.685572]  xen_pin_page+0x175/0x1d0
[ 3680.685578]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685582]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685588]  ? stop_one_cpu+0x91/0x100
[ 3680.685592]  __xen_pgd_pin+0x5d/0x250
[ 3680.685596]  xen_mm_pin_all+0x70/0xa0
[ 3680.685600]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685607]  xen_suspend+0x57/0x1a0
[ 3680.685611]  multi_cpu_stop+0x6b/0x120
[ 3680.685615]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685620]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685625]  cpu_stopper_thread+0x8c/0x140
[ 3680.685629]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685634]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685638]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685642]  kthread+0xde/0x110
[ 3680.685645]  ? __pfx_kthread+0x10/0x10
[ 3680.685649]  ret_from_fork+0x2f/0x50
[ 3680.685654]  ? __pfx_kthread+0x10/0x10
[ 3680.685657]  ret_from_fork_asm+0x1a/0x30
[ 3680.685662]  </TASK>
[ 3680.685267] xen:grant_table: Grant tables using version 1 layout
[ 3680.685921] OOM killer enabled.
[ 3680.685934] Restarting tasks ... done.

Signed-off-by: Maksym Planeta <maksym@exostellar.io>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241204103516.3309112-1-maksym@exostellar.io>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
steev pushed a commit to steev/linux that referenced this pull request Feb 23, 2025
[ Upstream commit 6d00234 ]

Function xen_pin_page calls xen_pte_lock, which in turn grab page
table lock (ptlock). When locking, xen_pte_lock expect mm->page_table_lock
to be held before grabbing ptlock, but this does not happen when pinning
is caused by xen_mm_pin_all.

This commit addresses lockdep warning below, which shows up when
suspending a Xen VM.

[ 3680.658422] Freezing user space processes
[ 3680.660156] Freezing user space processes completed (elapsed 0.001 seconds)
[ 3680.660182] OOM killer disabled.
[ 3680.660192] Freezing remaining freezable tasks
[ 3680.661485] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 3680.685254]
[ 3680.685265] ==================================
[ 3680.685269] WARNING: Nested lock was not taken
[ 3680.685274] 6.12.0+ torvalds#16 Tainted: G        W
[ 3680.685279] ----------------------------------
[ 3680.685283] migration/0/19 is trying to lock:
[ 3680.685288] ffff88800bac33c0 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: xen_pin_page+0x175/0x1d0
[ 3680.685303]
[ 3680.685303] but this task is not holding:
[ 3680.685308] init_mm.page_table_lock
[ 3680.685311]
[ 3680.685311] stack backtrace:
[ 3680.685316] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685324] Tainted: [W]=WARN
[ 3680.685328] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685339] Call Trace:
[ 3680.685344]  <TASK>
[ 3680.685347]  dump_stack_lvl+0x77/0xb0
[ 3680.685356]  __lock_acquire+0x917/0x2310
[ 3680.685364]  lock_acquire+0xce/0x2c0
[ 3680.685369]  ? xen_pin_page+0x175/0x1d0
[ 3680.685373]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685381]  ? xen_pin_page+0x175/0x1d0
[ 3680.685386]  xen_pin_page+0x175/0x1d0
[ 3680.685390]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685394]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685401]  ? stop_one_cpu+0x91/0x100
[ 3680.685405]  __xen_pgd_pin+0x5d/0x250
[ 3680.685410]  xen_mm_pin_all+0x70/0xa0
[ 3680.685415]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685420]  xen_suspend+0x57/0x1a0
[ 3680.685428]  multi_cpu_stop+0x6b/0x120
[ 3680.685432]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685439]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685443]  cpu_stopper_thread+0x8c/0x140
[ 3680.685448]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685454]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685458]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685462]  kthread+0xde/0x110
[ 3680.685467]  ? __pfx_kthread+0x10/0x10
[ 3680.685471]  ret_from_fork+0x2f/0x50
[ 3680.685478]  ? __pfx_kthread+0x10/0x10
[ 3680.685482]  ret_from_fork_asm+0x1a/0x30
[ 3680.685489]  </TASK>
[ 3680.685491]
[ 3680.685491] other info that might help us debug this:
[ 3680.685497] 1 lock held by migration/0/19:
[ 3680.685500]  #0: ffffffff8284df38 (pgd_lock){+.+.}-{3:3}, at: xen_mm_pin_all+0x14/0xa0
[ 3680.685512]
[ 3680.685512] stack backtrace:
[ 3680.685518] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685528] Tainted: [W]=WARN
[ 3680.685531] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685538] Call Trace:
[ 3680.685541]  <TASK>
[ 3680.685544]  dump_stack_lvl+0x77/0xb0
[ 3680.685549]  __lock_acquire+0x93c/0x2310
[ 3680.685554]  lock_acquire+0xce/0x2c0
[ 3680.685558]  ? xen_pin_page+0x175/0x1d0
[ 3680.685562]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685568]  ? xen_pin_page+0x175/0x1d0
[ 3680.685572]  xen_pin_page+0x175/0x1d0
[ 3680.685578]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685582]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685588]  ? stop_one_cpu+0x91/0x100
[ 3680.685592]  __xen_pgd_pin+0x5d/0x250
[ 3680.685596]  xen_mm_pin_all+0x70/0xa0
[ 3680.685600]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685607]  xen_suspend+0x57/0x1a0
[ 3680.685611]  multi_cpu_stop+0x6b/0x120
[ 3680.685615]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685620]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685625]  cpu_stopper_thread+0x8c/0x140
[ 3680.685629]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685634]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685638]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685642]  kthread+0xde/0x110
[ 3680.685645]  ? __pfx_kthread+0x10/0x10
[ 3680.685649]  ret_from_fork+0x2f/0x50
[ 3680.685654]  ? __pfx_kthread+0x10/0x10
[ 3680.685657]  ret_from_fork_asm+0x1a/0x30
[ 3680.685662]  </TASK>
[ 3680.685267] xen:grant_table: Grant tables using version 1 layout
[ 3680.685921] OOM killer enabled.
[ 3680.685934] Restarting tasks ... done.

Signed-off-by: Maksym Planeta <maksym@exostellar.io>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241204103516.3309112-1-maksym@exostellar.io>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
github-actions bot pushed a commit to anon503/linux that referenced this pull request Feb 24, 2025
[ Upstream commit 6d00234 ]

Function xen_pin_page calls xen_pte_lock, which in turn grab page
table lock (ptlock). When locking, xen_pte_lock expect mm->page_table_lock
to be held before grabbing ptlock, but this does not happen when pinning
is caused by xen_mm_pin_all.

This commit addresses lockdep warning below, which shows up when
suspending a Xen VM.

[ 3680.658422] Freezing user space processes
[ 3680.660156] Freezing user space processes completed (elapsed 0.001 seconds)
[ 3680.660182] OOM killer disabled.
[ 3680.660192] Freezing remaining freezable tasks
[ 3680.661485] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 3680.685254]
[ 3680.685265] ==================================
[ 3680.685269] WARNING: Nested lock was not taken
[ 3680.685274] 6.12.0+ torvalds#16 Tainted: G        W
[ 3680.685279] ----------------------------------
[ 3680.685283] migration/0/19 is trying to lock:
[ 3680.685288] ffff88800bac33c0 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: xen_pin_page+0x175/0x1d0
[ 3680.685303]
[ 3680.685303] but this task is not holding:
[ 3680.685308] init_mm.page_table_lock
[ 3680.685311]
[ 3680.685311] stack backtrace:
[ 3680.685316] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685324] Tainted: [W]=WARN
[ 3680.685328] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685339] Call Trace:
[ 3680.685344]  <TASK>
[ 3680.685347]  dump_stack_lvl+0x77/0xb0
[ 3680.685356]  __lock_acquire+0x917/0x2310
[ 3680.685364]  lock_acquire+0xce/0x2c0
[ 3680.685369]  ? xen_pin_page+0x175/0x1d0
[ 3680.685373]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685381]  ? xen_pin_page+0x175/0x1d0
[ 3680.685386]  xen_pin_page+0x175/0x1d0
[ 3680.685390]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685394]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685401]  ? stop_one_cpu+0x91/0x100
[ 3680.685405]  __xen_pgd_pin+0x5d/0x250
[ 3680.685410]  xen_mm_pin_all+0x70/0xa0
[ 3680.685415]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685420]  xen_suspend+0x57/0x1a0
[ 3680.685428]  multi_cpu_stop+0x6b/0x120
[ 3680.685432]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685439]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685443]  cpu_stopper_thread+0x8c/0x140
[ 3680.685448]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685454]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685458]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685462]  kthread+0xde/0x110
[ 3680.685467]  ? __pfx_kthread+0x10/0x10
[ 3680.685471]  ret_from_fork+0x2f/0x50
[ 3680.685478]  ? __pfx_kthread+0x10/0x10
[ 3680.685482]  ret_from_fork_asm+0x1a/0x30
[ 3680.685489]  </TASK>
[ 3680.685491]
[ 3680.685491] other info that might help us debug this:
[ 3680.685497] 1 lock held by migration/0/19:
[ 3680.685500]  #0: ffffffff8284df38 (pgd_lock){+.+.}-{3:3}, at: xen_mm_pin_all+0x14/0xa0
[ 3680.685512]
[ 3680.685512] stack backtrace:
[ 3680.685518] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G        W          6.12.0+ torvalds#16
[ 3680.685528] Tainted: [W]=WARN
[ 3680.685531] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0
[ 3680.685538] Call Trace:
[ 3680.685541]  <TASK>
[ 3680.685544]  dump_stack_lvl+0x77/0xb0
[ 3680.685549]  __lock_acquire+0x93c/0x2310
[ 3680.685554]  lock_acquire+0xce/0x2c0
[ 3680.685558]  ? xen_pin_page+0x175/0x1d0
[ 3680.685562]  _raw_spin_lock_nest_lock+0x2f/0x70
[ 3680.685568]  ? xen_pin_page+0x175/0x1d0
[ 3680.685572]  xen_pin_page+0x175/0x1d0
[ 3680.685578]  ? __pfx_xen_pin_page+0x10/0x10
[ 3680.685582]  __xen_pgd_walk+0x233/0x2c0
[ 3680.685588]  ? stop_one_cpu+0x91/0x100
[ 3680.685592]  __xen_pgd_pin+0x5d/0x250
[ 3680.685596]  xen_mm_pin_all+0x70/0xa0
[ 3680.685600]  xen_pv_pre_suspend+0xf/0x280
[ 3680.685607]  xen_suspend+0x57/0x1a0
[ 3680.685611]  multi_cpu_stop+0x6b/0x120
[ 3680.685615]  ? update_cpumasks_hier+0x7c/0xa60
[ 3680.685620]  ? __pfx_multi_cpu_stop+0x10/0x10
[ 3680.685625]  cpu_stopper_thread+0x8c/0x140
[ 3680.685629]  ? smpboot_thread_fn+0x20/0x1f0
[ 3680.685634]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 3680.685638]  smpboot_thread_fn+0xed/0x1f0
[ 3680.685642]  kthread+0xde/0x110
[ 3680.685645]  ? __pfx_kthread+0x10/0x10
[ 3680.685649]  ret_from_fork+0x2f/0x50
[ 3680.685654]  ? __pfx_kthread+0x10/0x10
[ 3680.685657]  ret_from_fork_asm+0x1a/0x30
[ 3680.685662]  </TASK>
[ 3680.685267] xen:grant_table: Grant tables using version 1 layout
[ 3680.685921] OOM killer enabled.
[ 3680.685934] Restarting tasks ... done.

Signed-off-by: Maksym Planeta <maksym@exostellar.io>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241204103516.3309112-1-maksym@exostellar.io>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
github-actions bot pushed a commit to anon503/linux that referenced this pull request Feb 25, 2025
When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush()
generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC,
which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4, similar problem also exists in upstream:

    crash> bt 2091206
    PID: 2091206  TASK: ffff2050df92a300  CPU: 109  COMMAND: "kworker/u260:0"
     #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
     #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
     #2 [ffff800084a2f880] schedule at ffff800040bfa4b4
     #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
     #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
     #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
     torvalds#6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
     torvalds#7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
     torvalds#8 [ffff800084a2fa60] generic_make_request at ffff800040570138
     torvalds#9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
    torvalds#10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
    torvalds#11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
    torvalds#12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
    torvalds#13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
    torvalds#14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
    torvalds#15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
    torvalds#16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
    torvalds#17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
    torvalds#18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def284 ("xfs: don't allow log IO to be throttled"),
the metadata submitted by xlog_write_iclog() should not be throttled.
But due to the existence of the dm layer, throttling flush_bio indirectly
causes the metadata bio to be throttled.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes
wbt_should_throttle() return false to avoid wbt_wait().

Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Tianxiang Peng <txpeng@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
thestinger pushed a commit to GrapheneOS/kernel_common-6.12 that referenced this pull request Feb 25, 2025
I encountered an issue where Perfetto sometimes fails to display the
blocked function when a thread enters the D state.

The problem arises from the caller value in the trace_sched_blocked_reason,
and I observed the following outputs while using bpftrace:

bpftrace -e 't:sched:sched_blocked_reason /args.caller > 0 &&
args.caller < 0xFFFFFFFF/ {@[args.caller]=count();}'
Attaching 1 probe...
^C

@[0x26a8]: 1
@[0x188]: 1
@[0x3]: 1
@[0xf833]: 1
@[0x4]: 10
@[0x9]: 14
@[0x1]: 23

When try_to_wake_up is waking a thread executing __schedule, it can trigger
the trace_sched_blocked_reason. This function attempts to obtain the
thread's backtrace using __get_wchan. However, __get_wchan requires the
thread to be blocking in cpu_switch_to, which is not always the case.

                                         | set_current_state(TASK_UNINTERRUPTIBLE)
                                         | __schedule
                                         |   deactivate_task
                                         |     p->on_rq = 0
                                         |     dequeue_task
try_to_wake_up                           |   pick_next_task
  ttwu_state_match(TASK_UNINTERRUPTIBLE) |   context_switch
  p->on_rq == 0                          |     prepare_task_switch
  p->__state & TASK_UNINTERRUPTIBLE      |     switch_mm
    trace_sched_blocked_reason           |     __switch_to
      __get_wchan                        |       ......
        unwind_init_from_task            |       cpu_switch_to
          state->fp = thread_saved_fp(p) |         mov x10, #THREAD_CPU_CONTEXT
            // Load an invalid fp        |         add x8, x0, x10
                                         |         ......
                                         |         stp fp, x9, [x8], torvalds#16
                                         |         str lr, [x8]
                                         |         ......
                                         |         ret

To resolve this, we can move the trace_sched_blocked_reason call inside
__schedule to ensure accurate caller information.

Fixes: 44447de ("ANDROID: sched: move blocked reason trace point to cover all class")
Change-Id: Ib63839107dbe835a52f191cceef9452d13e18785
Signed-off-by: pengdonglin <pengdonglin@xiaomi.com>
[jstultz: backported to android16-6.12]
Signed-off-by: John Stultz <jstultz@google.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Feb 26, 2025
Fixed warning due to unintialized struct nand_operation op->cs
WARN_ON(op->cs >= nanddev_ntargets(&chip->base)

[   14.588522] ------------[ cut here ]------------
[   14.588529] WARNING: CPU: 0 PID: 1392 at drivers/mtd/nand/raw/internals.h:139 nand_reset_op+0x1e0/0x1f8
[   14.588553] Modules linked in: bdc udc_core
[   14.588579] CPU: 0 UID: 0 PID: 1392 Comm: rtcwake Tainted: G        W          6.14.0-rc4-g5394eea10651 torvalds#16
[   14.588590] Tainted: [W]=WARN
[   14.588593] Hardware name: Broadcom STB (Flattened Device Tree)
[   14.588598] Call trace:
[   14.588604]  dump_backtrace from show_stack+0x18/0x1c
[   14.588622]  r7:00000009 r6:0000008b r5:60000153 r4:c0fa558c
[   14.588625]  show_stack from dump_stack_lvl+0x70/0x7c
[   14.588639]  dump_stack_lvl from dump_stack+0x18/0x1c
[   14.588653]  r5:c08d40b0 r4:c1003cb0
[   14.588656]  dump_stack from __warn+0x84/0xe4
[   14.588668]  __warn from warn_slowpath_fmt+0x18c/0x194
[   14.588678]  r7:c08d40b0 r6:c1003cb0 r5:00000000 r4:00000000
[   14.588681]  warn_slowpath_fmt from nand_reset_op+0x1e0/0x1f8
[   14.588695]  r8:70c40dff r7:89705f41 r6:36b4a597 r5:c26c9444 r4:c26b0048
[   14.588697]  nand_reset_op from brcmnand_resume+0x13c/0x150
[   14.588714]  r9:00000000 r8:00000000 r7:c24f8010 r6:c228a3f8 r5:c26c94bc r4:c26b0040
[   14.588717]  brcmnand_resume from platform_pm_resume+0x34/0x54
[   14.588735]  r5:00000010 r4:c0840a50
[   14.588738]  platform_pm_resume from dpm_run_callback+0x5c/0x14c
[   14.588757]  dpm_run_callback from device_resume+0xc0/0x324
[   14.588776]  r9:c24f8054 r8:c24f80a0 r7:00000000 r6:00000000 r5:00000010 r4:c24f8010
[   14.588779]  device_resume from dpm_resume+0x130/0x160
[   14.588799]  r9:c22539e4 r8:00000010 r7:c22bebb0 r6:c24f8010 r5:c22539dc r4:c22539b0
[   14.588802]  dpm_resume from dpm_resume_end+0x14/0x20
[   14.588822]  r10:c2204e40 r9:00000000 r8:c228a3fc r7:00000000 r6:00000003 r5:c228a414
[   14.588826]  r4:00000010
[   14.588828]  dpm_resume_end from suspend_devices_and_enter+0x274/0x6f8
[   14.588848]  r5:c228a414 r4:00000000
[   14.588851]  suspend_devices_and_enter from pm_suspend+0x228/0x2bc
[   14.588868]  r10:c3502910 r9:c3501f40 r8:00000004 r7:c228a438 r6:c0f95e18 r5:00000000
[   14.588871]  r4:00000003
[   14.588874]  pm_suspend from state_store+0x74/0xd0
[   14.588889]  r7:c228a438 r6:c0f934c8 r5:00000003 r4:00000003
[   14.588892]  state_store from kobj_attr_store+0x1c/0x28
[   14.588913]  r9:00000000 r8:00000000 r7:f09f9f08 r6:00000004 r5:c3502900 r4:c0283250
[   14.588916]  kobj_attr_store from sysfs_kf_write+0x40/0x4c
[   14.588936]  r5:c3502900 r4:c0d92a48
[   14.588939]  sysfs_kf_write from kernfs_fop_write_iter+0x104/0x1f0
[   14.588956]  r5:c3502900 r4:c3501f40
[   14.588960]  kernfs_fop_write_iter from vfs_write+0x250/0x420
[   14.588980]  r10:c0e14b48 r9:00000000 r8:c25f5780 r7:00443398 r6:f09f9f68 r5:c34f7f00
[   14.588983]  r4:c042a88c
[   14.588987]  vfs_write from ksys_write+0x74/0xe4
[   14.589005]  r10:00000004 r9:c25f5780 r8:c02002f0 r7:00000000 r6:00000000 r5:c34f7f00
[   14.589008]  r4:c34f7f00
[   14.589011]  ksys_write from sys_write+0x10/0x14
[   14.589029]  r7:00000004 r6:004421c0 r5:00443398 r4:00000004
[   14.589032]  sys_write from ret_fast_syscall+0x0/0x5c
[   14.589044] Exception stack(0xf09f9fa8 to 0xf09f9ff0)
[   14.589050] 9fa0:                   00000004 00443398 00000004 00443398 00000004 00000001
[   14.589056] 9fc0: 00000004 00443398 004421c0 00000004 b6ecbd58 00000008 bebfbc38 0043eb78
[   14.589062] 9fe0: 00440eb0 bebfbaf8 b6de18a0 b6e579e8
[   14.589065] ---[ end trace 0000000000000000 ]---

The fix uses the higher level nand_reset(chip, chipnr); where chipnr = 0, when
doing PM resume operation in compliance with the controller support for single die
nand chip. This ensures that the op->cs is initialized before calling the
nand_reset_op().

Fixes : 97d90da ("mtd: nand: provide several helpers to do common NAND operations")
Signed-off-by: Kamal Dasu <kdasu@broadcom.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Feb 27, 2025
Fixed warning on PM resume as shown below caused due to uninitialized
struct nand_operation that checks chip select field :
WARN_ON(op->cs >= nanddev_ntargets(&chip->base)

[   14.588522] ------------[ cut here ]------------
[   14.588529] WARNING: CPU: 0 PID: 1392 at drivers/mtd/nand/raw/internals.h:139 nand_reset_op+0x1e0/0x1f8
[   14.588553] Modules linked in: bdc udc_core
[   14.588579] CPU: 0 UID: 0 PID: 1392 Comm: rtcwake Tainted: G        W          6.14.0-rc4-g5394eea10651 torvalds#16
[   14.588590] Tainted: [W]=WARN
[   14.588593] Hardware name: Broadcom STB (Flattened Device Tree)
[   14.588598] Call trace:
[   14.588604]  dump_backtrace from show_stack+0x18/0x1c
[   14.588622]  r7:00000009 r6:0000008b r5:60000153 r4:c0fa558c
[   14.588625]  show_stack from dump_stack_lvl+0x70/0x7c
[   14.588639]  dump_stack_lvl from dump_stack+0x18/0x1c
[   14.588653]  r5:c08d40b0 r4:c1003cb0
[   14.588656]  dump_stack from __warn+0x84/0xe4
[   14.588668]  __warn from warn_slowpath_fmt+0x18c/0x194
[   14.588678]  r7:c08d40b0 r6:c1003cb0 r5:00000000 r4:00000000
[   14.588681]  warn_slowpath_fmt from nand_reset_op+0x1e0/0x1f8
[   14.588695]  r8:70c40dff r7:89705f41 r6:36b4a597 r5:c26c9444 r4:c26b0048
[   14.588697]  nand_reset_op from brcmnand_resume+0x13c/0x150
[   14.588714]  r9:00000000 r8:00000000 r7:c24f8010 r6:c228a3f8 r5:c26c94bc r4:c26b0040
[   14.588717]  brcmnand_resume from platform_pm_resume+0x34/0x54
[   14.588735]  r5:00000010 r4:c0840a50
[   14.588738]  platform_pm_resume from dpm_run_callback+0x5c/0x14c
[   14.588757]  dpm_run_callback from device_resume+0xc0/0x324
[   14.588776]  r9:c24f8054 r8:c24f80a0 r7:00000000 r6:00000000 r5:00000010 r4:c24f8010
[   14.588779]  device_resume from dpm_resume+0x130/0x160
[   14.588799]  r9:c22539e4 r8:00000010 r7:c22bebb0 r6:c24f8010 r5:c22539dc r4:c22539b0
[   14.588802]  dpm_resume from dpm_resume_end+0x14/0x20
[   14.588822]  r10:c2204e40 r9:00000000 r8:c228a3fc r7:00000000 r6:00000003 r5:c228a414
[   14.588826]  r4:00000010
[   14.588828]  dpm_resume_end from suspend_devices_and_enter+0x274/0x6f8
[   14.588848]  r5:c228a414 r4:00000000
[   14.588851]  suspend_devices_and_enter from pm_suspend+0x228/0x2bc
[   14.588868]  r10:c3502910 r9:c3501f40 r8:00000004 r7:c228a438 r6:c0f95e18 r5:00000000
[   14.588871]  r4:00000003
[   14.588874]  pm_suspend from state_store+0x74/0xd0
[   14.588889]  r7:c228a438 r6:c0f934c8 r5:00000003 r4:00000003
[   14.588892]  state_store from kobj_attr_store+0x1c/0x28
[   14.588913]  r9:00000000 r8:00000000 r7:f09f9f08 r6:00000004 r5:c3502900 r4:c0283250
[   14.588916]  kobj_attr_store from sysfs_kf_write+0x40/0x4c
[   14.588936]  r5:c3502900 r4:c0d92a48
[   14.588939]  sysfs_kf_write from kernfs_fop_write_iter+0x104/0x1f0
[   14.588956]  r5:c3502900 r4:c3501f40
[   14.588960]  kernfs_fop_write_iter from vfs_write+0x250/0x420
[   14.588980]  r10:c0e14b48 r9:00000000 r8:c25f5780 r7:00443398 r6:f09f9f68 r5:c34f7f00
[   14.588983]  r4:c042a88c
[   14.588987]  vfs_write from ksys_write+0x74/0xe4
[   14.589005]  r10:00000004 r9:c25f5780 r8:c02002fA0 r7:00000000 r6:00000000 r5:c34f7f00
[   14.589008]  r4:c34f7f00
[   14.589011]  ksys_write from sys_write+0x10/0x14
[   14.589029]  r7:00000004 r6:004421c0 r5:00443398 r4:00000004
[   14.589032]  sys_write from ret_fast_syscall+0x0/0x5c
[   14.589044] Exception stack(0xf09f9fa8 to 0xf09f9ff0)
[   14.589050] 9fa0:                   00000004 00443398 00000004 00443398 00000004 00000001
[   14.589056] 9fc0: 00000004 00443398 004421c0 00000004 b6ecbd58 00000008 bebfbc38 0043eb78
[   14.589062] 9fe0: 00440eb0 bebfbaf8 b6de18a0 b6e579e8
[   14.589065] ---[ end trace 0000000000000000 ]---

The fix uses the higher level nand_reset(chip, chipnr); where chipnr = 0, when
doing PM resume operation in compliance with the controller support for single
die nand chip. Switching from nand_reset_op() to nand_reset() implies more
than just setting the cs field op->cs, it also reconfigures the data interface
(ie. the timings). Tested and confirmed the NAND chip is in sync timing wise
with host after the fix.

Fixes: 97d90da ("mtd: nand: provide several helpers to do common NAND operations")
Cc: stable@vger.kernel.org
Signed-off-by: Kamal Dasu <kamal.dasu@broadcom.com>
github-actions bot pushed a commit to anon503/linux that referenced this pull request Mar 5, 2025
Fixed warning on PM resume as shown below caused due to uninitialized
struct nand_operation that checks chip select field :
WARN_ON(op->cs >= nanddev_ntargets(&chip->base)

[   14.588522] ------------[ cut here ]------------
[   14.588529] WARNING: CPU: 0 PID: 1392 at drivers/mtd/nand/raw/internals.h:139 nand_reset_op+0x1e0/0x1f8
[   14.588553] Modules linked in: bdc udc_core
[   14.588579] CPU: 0 UID: 0 PID: 1392 Comm: rtcwake Tainted: G        W          6.14.0-rc4-g5394eea10651 torvalds#16
[   14.588590] Tainted: [W]=WARN
[   14.588593] Hardware name: Broadcom STB (Flattened Device Tree)
[   14.588598] Call trace:
[   14.588604]  dump_backtrace from show_stack+0x18/0x1c
[   14.588622]  r7:00000009 r6:0000008b r5:60000153 r4:c0fa558c
[   14.588625]  show_stack from dump_stack_lvl+0x70/0x7c
[   14.588639]  dump_stack_lvl from dump_stack+0x18/0x1c
[   14.588653]  r5:c08d40b0 r4:c1003cb0
[   14.588656]  dump_stack from __warn+0x84/0xe4
[   14.588668]  __warn from warn_slowpath_fmt+0x18c/0x194
[   14.588678]  r7:c08d40b0 r6:c1003cb0 r5:00000000 r4:00000000
[   14.588681]  warn_slowpath_fmt from nand_reset_op+0x1e0/0x1f8
[   14.588695]  r8:70c40dff r7:89705f41 r6:36b4a597 r5:c26c9444 r4:c26b0048
[   14.588697]  nand_reset_op from brcmnand_resume+0x13c/0x150
[   14.588714]  r9:00000000 r8:00000000 r7:c24f8010 r6:c228a3f8 r5:c26c94bc r4:c26b0040
[   14.588717]  brcmnand_resume from platform_pm_resume+0x34/0x54
[   14.588735]  r5:00000010 r4:c0840a50
[   14.588738]  platform_pm_resume from dpm_run_callback+0x5c/0x14c
[   14.588757]  dpm_run_callback from device_resume+0xc0/0x324
[   14.588776]  r9:c24f8054 r8:c24f80a0 r7:00000000 r6:00000000 r5:00000010 r4:c24f8010
[   14.588779]  device_resume from dpm_resume+0x130/0x160
[   14.588799]  r9:c22539e4 r8:00000010 r7:c22bebb0 r6:c24f8010 r5:c22539dc r4:c22539b0
[   14.588802]  dpm_resume from dpm_resume_end+0x14/0x20
[   14.588822]  r10:c2204e40 r9:00000000 r8:c228a3fc r7:00000000 r6:00000003 r5:c228a414
[   14.588826]  r4:00000010
[   14.588828]  dpm_resume_end from suspend_devices_and_enter+0x274/0x6f8
[   14.588848]  r5:c228a414 r4:00000000
[   14.588851]  suspend_devices_and_enter from pm_suspend+0x228/0x2bc
[   14.588868]  r10:c3502910 r9:c3501f40 r8:00000004 r7:c228a438 r6:c0f95e18 r5:00000000
[   14.588871]  r4:00000003
[   14.588874]  pm_suspend from state_store+0x74/0xd0
[   14.588889]  r7:c228a438 r6:c0f934c8 r5:00000003 r4:00000003
[   14.588892]  state_store from kobj_attr_store+0x1c/0x28
[   14.588913]  r9:00000000 r8:00000000 r7:f09f9f08 r6:00000004 r5:c3502900 r4:c0283250
[   14.588916]  kobj_attr_store from sysfs_kf_write+0x40/0x4c
[   14.588936]  r5:c3502900 r4:c0d92a48
[   14.588939]  sysfs_kf_write from kernfs_fop_write_iter+0x104/0x1f0
[   14.588956]  r5:c3502900 r4:c3501f40
[   14.588960]  kernfs_fop_write_iter from vfs_write+0x250/0x420
[   14.588980]  r10:c0e14b48 r9:00000000 r8:c25f5780 r7:00443398 r6:f09f9f68 r5:c34f7f00
[   14.588983]  r4:c042a88c
[   14.588987]  vfs_write from ksys_write+0x74/0xe4
[   14.589005]  r10:00000004 r9:c25f5780 r8:c02002fA0 r7:00000000 r6:00000000 r5:c34f7f00
[   14.589008]  r4:c34f7f00
[   14.589011]  ksys_write from sys_write+0x10/0x14
[   14.589029]  r7:00000004 r6:004421c0 r5:00443398 r4:00000004
[   14.589032]  sys_write from ret_fast_syscall+0x0/0x5c
[   14.589044] Exception stack(0xf09f9fa8 to 0xf09f9ff0)
[   14.589050] 9fa0:                   00000004 00443398 00000004 00443398 00000004 00000001
[   14.589056] 9fc0: 00000004 00443398 004421c0 00000004 b6ecbd58 00000008 bebfbc38 0043eb78
[   14.589062] 9fe0: 00440eb0 bebfbaf8 b6de18a0 b6e579e8
[   14.589065] ---[ end trace 0000000000000000 ]---

The fix uses the higher level nand_reset(chip, chipnr); where chipnr = 0, when
doing PM resume operation in compliance with the controller support for single
die nand chip. Switching from nand_reset_op() to nand_reset() implies more
than just setting the cs field op->cs, it also reconfigures the data interface
(ie. the timings). Tested and confirmed the NAND chip is in sync timing wise
with host after the fix.

Fixes: 97d90da ("mtd: nand: provide several helpers to do common NAND operations")
Cc: stable@vger.kernel.org
Signed-off-by: Kamal Dasu <kamal.dasu@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
github-actions bot pushed a commit to anon503/linux that referenced this pull request Mar 19, 2025
With recent ftrace changes, argument tracing has been added to the
function tracer. As a result, ftrace opportunistically reads the first
FTRACE_REGS_MAX_ARGS (i.e., 6) registers. On s390, only five arguments are
passed in registers, and the 6-th is read from the stack. If a function
has fewer than 6 arguments, the following KASAN report may be observed:

 BUG: KASAN: stack-out-of-bounds in regs_get_kernel_stack_nth+0xa8/0xb0
 Read of size 8 at addr 00007f7fe066fdb8 by task swapper/31/0

 CPU: 31 UID: 0 PID: 0 Comm: swapper/31 Not tainted 6.14.0-rc4-00006-g76fe0337c219 torvalds#16
 Hardware name: IBM 3931 A01 704 (KVM/Linux)
 Call Trace:
  [<00007fffe0147224>] dump_stack_lvl+0x104/0x168
  [<00007fffe011381c>] print_address_description.constprop.0+0x34/0x338
  [<00007fffe0113b64>] print_report+0x44/0x138
  [<00007fffe0ad9422>] kasan_report+0xc2/0x180
  [<00007fffe0159ff8>] regs_get_kernel_stack_nth+0xa8/0xb0
  [<00007fffe05ebeda>] trace_function+0x23a/0x4d0
  [<00007fffe0615d32>] irqsoff_tracer_call+0xd2/0x110
  [<00007fffe2b4e34c>] ftrace_common+0x1c/0x40
  [<00007fffe0150826>] arch_cpu_idle_enter+0x6/0x10
  [<00007fffe035a1c8>] do_idle+0x168/0x2e0
  [<00007fffe035a9d0>] cpu_startup_entry+0x90/0xb0
  [<00007fffe017d25a>] smp_start_secondary+0x3da/0x4e0
  [<00007fffe2b4e20a>] restart_int_handler+0x72/0x88
 no locks held by swapper/31/0.

 The buggy address belongs to stack of task swapper/31/0
  and is located at offset 0 in frame:
  do_idle+0x0/0x2e0

 This frame has 1 object:
  [32, 40) '__mask'

 The buggy address belongs to the virtual mapping at
  [00007f7fe0660000, 00007f7fe0671000) created by:
  dup_task_struct+0x66/0x4e0

 The buggy address belongs to the physical page:
 page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x80f23
 flags: 0x3ffff00000000000(node=0|zone=1|lastcpupid=0x1ffff)
 raw: 3ffff00000000000 0000000000000000 0000000000000122 0000000000000000
 raw: 0000000000000000 0000000000000000 ffffffff00000001 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  00007f7fe066fc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00007f7fe066fd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 >00007f7fe066fd80: 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f3 f3 f3 00
                                         ^
  00007f7fe066fe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00007f7fe066fe80: 00 f1 f1 f1 f1 00 f2 f2 f2 00 00 f3 f3 00 00 00

The function regs_get_kernel_stack_nth() verifies that the requested
argument is located on the stack, making it safe to read even if it is
not actually present. Make use of READ_ONCE_NOCHECK() helper to silence
KASAN reports in this case.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
klarasm pushed a commit to klarasm/linux that referenced this pull request Mar 20, 2025
there is a global spinlock between reset and clk, if locked in reset,
then print some debug information, maybe dead-lock when uart driver
try to disable clk.

Backtrace stopped: frame did not save the PC
(gdb) thread 4
[Switching to thread 4 (Thread 4)]
#0  cpu_relax () at ./arch/riscv/include/asm/vdso/processor.h:22
22      ./arch/riscv/include/asm/vdso/processor.h: No such file or directory.
(gdb) bt
#0  cpu_relax () at ./arch/riscv/include/asm/vdso/processor.h:22
#1  arch_spin_lock (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at ./include/asm-generic/spinlock.h:49
#2  do_raw_spin_lock (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at ./include/linux/spinlock.h:186
#3  0xffffffff80aa21ce in __raw_spin_lock_irqsave (lock=0xffffffff81a57cd0 <enable_lock>) at ./include/linux/spinlock_api_smp.h:111
#4  _raw_spin_lock_irqsave (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at kernel/locking/spinlock.c:162
#5  0xffffffff80563416 in clk_enable_lock () at ./include/linux/spinlock.h:325
torvalds#6  0xffffffff805648de in clk_core_disable_lock (core=0xffffffd900512500) at drivers/clk/clk.c:1062
torvalds#7  0xffffffff8056527e in clk_disable (clk=<optimized out>) at drivers/clk/clk.c:1084
torvalds#8  clk_disable (clk=0xffffffd9048b5100) at drivers/clk/clk.c:1079
torvalds#9  0xffffffff8059e5d4 in serial_pxa_console_write (co=<optimized out>, s=0xffffffff81a68250 <text> "[   14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n", count=<optimized out>)
    at drivers/tty/serial/pxa_k1x.c:1724
torvalds#10 0xffffffff8004a34c in call_console_driver (dropped_text=0xffffffff81a68650 <dropped_text> "", len=69,
    text=0xffffffff81a68250 <text> "[   14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n", con=0xffffffff81964c10 <serial_pxa_console>) at kernel/printk/printk.c:1942
torvalds#11 console_emit_next_record (con=con@entry=0xffffffff81964c10 <serial_pxa_console>, ext_text=<optimized out>, dropped_text=0xffffffff81a68650 <dropped_text> "", handover=0xffffffc80578baa7,
    text=0xffffffff81a68250 <text> "[   14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n") at kernel/printk/printk.c:2731
torvalds#12 0xffffffff8004a49a in console_flush_all (handover=0xffffffc80578baa7, next_seq=<synthetic pointer>, do_cond_resched=false) at kernel/printk/printk.c:2793
torvalds#13 console_unlock () at kernel/printk/printk.c:2860
torvalds#14 0xffffffff8004b388 in vprintk_emit (facility=facility@entry=0, level=<optimized out>, level@entry=-1, dev_info=dev_info@entry=0x0, fmt=<optimized out>, args=<optimized out>)
    at kernel/printk/printk.c:2268
torvalds#15 0xffffffff8004b3ae in vprintk_default (fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:2279
torvalds#16 0xffffffff8004b646 in vprintk (fmt=fmt@entry=0xffffffff813be470 "\001\066[RESET][%s][%d]:assert = %d, id = %d \n", args=args@entry=0xffffffc80578bbd8) at kernel/printk/printk_safe.c:50
torvalds#17 0xffffffff80a880d6 in _printk (fmt=fmt@entry=0xffffffff813be470 "\001\066[RESET][%s][%d]:assert = %d, id = %d \n") at kernel/printk/printk.c:2289
torvalds#18 0xffffffff80a90bb6 in spacemit_reset_set (rcdev=rcdev@entry=0xffffffff81f563a8 <k1x_reset_controller+8>, id=id@entry=59, assert=assert@entry=true) at drivers/reset/reset-spacemit-k1x.c:373
torvalds#19 0xffffffff805823b6 in spacemit_reset_update (assert=true, id=59, rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>) at drivers/reset/reset-spacemit-k1x.c:401
torvalds#20 spacemit_reset_update (assert=true, id=59, rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>) at drivers/reset/reset-spacemit-k1x.c:387
torvalds#21 spacemit_reset_assert (rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>, id=59) at drivers/reset/reset-spacemit-k1x.c:413
torvalds#22 0xffffffff8058158e in reset_control_assert (rstc=0xffffffd902b2f280) at drivers/reset/core.c:485
torvalds#23 0xffffffff807ccf96 in cpp_disable_clocks (cpp_dev=cpp_dev@entry=0xffffffd904cc9040) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:960
torvalds#24 0xffffffff807cd0b2 in cpp_release_hardware (cpp_dev=cpp_dev@entry=0xffffffd904cc9040) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:1038
torvalds#25 0xffffffff807cd990 in cpp_close_node (sd=<optimized out>, fh=<optimized out>) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:1135
torvalds#26 0xffffffff8079525e in subdev_close (file=0xffffffd906645d00) at drivers/media/v4l2-core/v4l2-subdev.c:105
torvalds#27 0xffffffff8078e49e in v4l2_release (inode=<optimized out>, filp=0xffffffd906645d00) at drivers/media/v4l2-core/v4l2-dev.c:459
torvalds#28 0xffffffff80154974 in __fput (file=0xffffffd906645d00) at fs/file_table.c:320
torvalds#29 0xffffffff80154aa2 in ____fput (work=<optimized out>) at fs/file_table.c:348
torvalds#30 0xffffffff8002677e in task_work_run () at kernel/task_work.c:179
torvalds#31 0xffffffff800053b4 in resume_user_mode_work (regs=0xffffffc80578bee0) at ./include/linux/resume_user_mode.h:49
torvalds#32 do_work_pending (regs=0xffffffc80578bee0, thread_info_flags=<optimized out>) at arch/riscv/kernel/signal.c:478
torvalds#33 0xffffffff800039c6 in handle_exception () at arch/riscv/kernel/entry.S:374
Backtrace stopped: frame did not save the PC
(gdb) thread 1
[Switching to thread 1 (Thread 1)]
#0  0xffffffff80047e9c in arch_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/asm-generic/spinlock.h:49
49      ./include/asm-generic/spinlock.h: No such file or directory.
(gdb) bt
#0  0xffffffff80047e9c in arch_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/asm-generic/spinlock.h:49
#1  do_raw_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/linux/spinlock.h:186
#2  0xffffffff80aa21ce in __raw_spin_lock_irqsave (lock=0xffffffff81a57cd8 <g_cru_lock>) at ./include/linux/spinlock_api_smp.h:111
#3  _raw_spin_lock_irqsave (lock=0xffffffff81a57cd8 <g_cru_lock>) at kernel/locking/spinlock.c:162
#4  0xffffffff8056c4cc in ccu_mix_disable (hw=0xffffffff81956858 <sdh2_clk+120>) at ./include/linux/spinlock.h:325
#5  0xffffffff80564832 in clk_core_disable (core=0xffffffd900529900) at drivers/clk/clk.c:1051
torvalds#6  clk_core_disable (core=0xffffffd900529900) at drivers/clk/clk.c:1031
torvalds#7  0xffffffff805648e6 in clk_core_disable_lock (core=0xffffffd900529900) at drivers/clk/clk.c:1063
torvalds#8  0xffffffff8056527e in clk_disable (clk=<optimized out>) at drivers/clk/clk.c:1084
torvalds#9  clk_disable (clk=clk@entry=0xffffffd904fafa80) at drivers/clk/clk.c:1079
torvalds#10 0xffffffff808bb898 in clk_disable_unprepare (clk=0xffffffd904fafa80) at ./include/linux/clk.h:1085
torvalds#11 0xffffffff808bb916 in spacemit_sdhci_runtime_suspend (dev=<optimized out>) at drivers/mmc/host/sdhci-of-k1x.c:1469
torvalds#12 0xffffffff8066e8e2 in pm_generic_runtime_suspend (dev=<optimized out>) at drivers/base/power/generic_ops.c:25
torvalds#13 0xffffffff80670398 in __rpm_callback (cb=cb@entry=0xffffffff8066e8ca <pm_generic_runtime_suspend>, dev=dev@entry=0xffffffd9018a2810) at drivers/base/power/runtime.c:395
torvalds#14 0xffffffff806704b8 in rpm_callback (cb=cb@entry=0xffffffff8066e8ca <pm_generic_runtime_suspend>, dev=dev@entry=0xffffffd9018a2810) at drivers/base/power/runtime.c:529
torvalds#15 0xffffffff80670bdc in rpm_suspend (dev=0xffffffd9018a2810, rpmflags=<optimized out>) at drivers/base/power/runtime.c:672
torvalds#16 0xffffffff806716de in pm_runtime_work (work=0xffffffd9018a2948) at drivers/base/power/runtime.c:974
torvalds#17 0xffffffff800236f4 in process_one_work (worker=worker@entry=0xffffffd9013ee9c0, work=0xffffffd9018a2948) at kernel/workqueue.c:2289
torvalds#18 0xffffffff80023ba6 in worker_thread (__worker=0xffffffd9013ee9c0) at kernel/workqueue.c:2436
torvalds#19 0xffffffff80028bb2 in kthread (_create=0xffffffd9017de840) at kernel/kthread.c:376
torvalds#20 0xffffffff80003934 in handle_exception () at arch/riscv/kernel/entry.S:249
Backtrace stopped: frame did not save the PC
(gdb)

Change-Id: Ia95b41ffd6c1893c9c5e9c1c9fc0c155ea902d2c
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant