IRQ problem with 2.6.32/2.6.39 kernel on Debian Squeeze x86_64

Posted by MasterM on Super User
Published on 2011-06-24T11:05:14Z


I recently assembled a new computer, so all of the hardware is fairly new. Since then I've been having problems with IRQs when running Debian 6.0. At random, usually after an hour or so of uptime, I hear a beep and the following shows up in dmesg:

[ 3537.762795] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 3537.762797] Pid: 0, comm: swapper Tainted: P        W  O 2.6.39-2-amd64 #1
[ 3537.762798] Call Trace:
[ 3537.762799]  <IRQ>  [<ffffffff810924d4>] ? __report_bad_irq+0x3a/0xa2
[ 3537.762803]  [<ffffffff810926a4>] ? note_interrupt+0x168/0x1da
[ 3537.762805]  [<ffffffff81090dd4>] ? handle_irq_event_percpu+0x171/0x18f
[ 3537.762807]  [<ffffffff8100e0e2>] ? read_tsc+0x5/0x16
[ 3537.762809]  [<ffffffff8106b8a2>] ? update_ts_time_stats+0x32/0x6b
[ 3537.762810]  [<ffffffff81090e26>] ? handle_irq_event+0x34/0x52
[ 3537.762812]  [<ffffffff81063fb7>] ? sched_clock_idle_wakeup_event+0x12/0x1c
[ 3537.762813]  [<ffffffff81092df2>] ? handle_fasteoi_irq+0x82/0xa4
[ 3537.762815]  [<ffffffff8100aadb>] ? handle_irq+0x1a/0x23
[ 3537.762816]  [<ffffffff8100a384>] ? do_IRQ+0x45/0xaa
[ 3537.762818]  [<ffffffff81332c93>] ? common_interrupt+0x13/0x13
[ 3537.762818]  <EOI>  [<ffffffff81332c8e>] ? common_interrupt+0xe/0x13
[ 3537.762821]  [<ffffffff81026800>] ? native_safe_halt+0x2/0x3
[ 3537.762829]  [<ffffffffa016ed58>] ? acpi_idle_do_entry+0x39/0x62 [processor]
[ 3537.762831]  [<ffffffffa016edde>] ? acpi_idle_enter_c1+0x5d/0xad [processor]
[ 3537.762834]  [<ffffffff81261033>] ? cpuidle_idle_call+0x11f/0x1cc
[ 3537.762835]  [<ffffffff81008dd2>] ? cpu_idle+0xab/0xe1
[ 3537.762837]  [<ffffffff8169fc60>] ? start_kernel+0x3e0/0x3eb
[ 3537.762838]  [<ffffffff8169f3c8>] ? x86_64_start_kernel+0x102/0x10f
[ 3537.762839] handlers:
[ 3537.762840] [<ffffffffa0358d5a>] (rtl8169_interrupt+0x0/0x2d7 [r8169])
[ 3537.762842] [<ffffffffa08ff2ca>] (nv_kern_isr+0x0/0x54 [nvidia])
[ 3537.762902] Disabling IRQ #16

After IRQ 16 is disabled, Xorg either hogs the CPU or becomes unstable (to the point of hanging the system completely). When I restart Xorg, everything is fine again and the problem doesn't come back until after the next reboot.

I tried upgrading the kernel from the stock 2.6.32 to 2.6.39 from the unstable repository, but that didn't help. Booting with the irqpoll option only seems to lengthen the time before the problem occurs.
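The "nobody cared" and "Disabling IRQ #16" lines come from the kernel's spurious-interrupt detector. The Python sketch below is a simplified model of that heuristic, not kernel code. It assumes the thresholds used by note_interrupt() in kernel/irq/spurious.c of this era (a verdict every 100,000 interrupts, triggered by more than 99,900 unhandled ones) and leaves out the time-based reset the real code also applies to the unhandled count. IrqLine is a hypothetical helper introduced only for illustration.

```python
# Simplified model of the spurious-IRQ heuristic in note_interrupt()
# (kernel/irq/spurious.c, 2.6.3x era). Thresholds follow that code; the
# real implementation also applies a time-based reset to the unhandled
# count, which is omitted here. IrqLine is a hypothetical helper.

WINDOW = 100_000          # interrupts sampled before a verdict is made
UNHANDLED_LIMIT = 99_900  # more unhandled than this -> "nobody cared"


class IrqLine:
    def __init__(self, number: int) -> None:
        self.number = number
        self.count = 0        # interrupts seen in the current window
        self.unhandled = 0    # interrupts that no handler claimed
        self.disabled = False

    def note_interrupt(self, handled: bool) -> None:
        """Account for one interrupt on this line."""
        if self.disabled:
            return
        if not handled:
            self.unhandled += 1
        self.count += 1
        if self.count < WINDOW:
            return
        # End of the window: disable the line if almost nothing was handled.
        if self.unhandled > UNHANDLED_LIMIT:
            print(f'irq {self.number}: nobody cared '
                  f'(try booting with the "irqpoll" option)')
            print(f"Disabling IRQ #{self.number}")
            self.disabled = True
        self.count = 0
        self.unhandled = 0


if __name__ == "__main__":
    # A line that keeps firing while every registered handler says "not mine".
    line16 = IrqLine(16)
    for _ in range(WINDOW):
        line16.note_interrupt(handled=False)
    print("disabled:", line16.disabled)
```

In this model a line stays enabled as long as more than 100 of every 100,000 interrupts are claimed by some handler. It is only disabled when nearly all of them go unclaimed, for example when a device keeps asserting a shared line but none of the registered handlers (here rtl8169_interrupt and nv_kern_isr) recognizes the interrupt as its own.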

I'm using the latest NVIDIA drivers and the Realtek firmware from the firmware-realtek package. I have two GTX 560 Ti cards running in SLI. Disabling SLI or removing one card completely doesn't solve the problem either.

Output of uname -a is:

Linux whitestar 2.6.39-2-amd64 #1 SMP Wed Jun 8 11:01:04 UTC 2011 x86_64 GNU/Linux

Output of lspci is:

00:00.0 Host bridge: Intel Corporation Sandy Bridge DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Sandy Bridge PCI Express Root Port (rev 09)
00:01.1 PCI bridge: Intel Corporation Sandy Bridge PCI Express Root Port (rev 09)
00:16.0 Communication controller: Intel Corporation Cougar Point HECI Controller #1 (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 05)
00:1a.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #2 (rev 05)
00:1b.0 Audio device: Intel Corporation Cougar Point High Definition Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 1 (rev b5)
00:1c.1 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 2 (rev b5)
00:1c.2 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 3 (rev b5)
00:1c.4 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 5 (rev b5)
00:1c.6 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5)
00:1d.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation Cougar Point LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation Cougar Point 6 port SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation Cougar Point SMBus Controller (rev 05)
01:00.0 VGA compatible controller: nVidia Corporation Device 1200 (rev a1)
01:00.1 Audio device: nVidia Corporation Device 0e0c (rev a1)
02:00.0 VGA compatible controller: nVidia Corporation Device 1200 (rev a1)
02:00.1 Audio device: nVidia Corporation Device 0e0c (rev a1)
04:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)
06:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)
07:00.0 PCI bridge: Device 1b21:1080 (rev 01)
08:02.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8110SC/8169SC Gigabit Ethernet (rev 10)
08:03.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev c0)

Contents of /proc/interrupts:

CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
0:         77          0          0          0          0          0          0          0   IO-APIC-edge      timer
1:          2          0          0          0          0          0          0          0   IO-APIC-edge      i8042
8:          1          0          0          0          0          0          0          0   IO-APIC-edge      rtc0
9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
12:          4          0          0          0          0          0          0          0   IO-APIC-edge      i8042
16:     699083          0          0          0          0          0          0          0   IO-APIC-fasteoi   nvidia, eth0
17:      87810          0          0          0          0          0          0          0   IO-APIC-fasteoi   firewire_ohci, hda_intel, nvidia
18:        242          0          0          0          0          0          0          0   IO-APIC-fasteoi   hda_intel
23:      85925          0          0          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb5, ehci_hcd:usb6
40:          0          0          0          0          0          0          0          0   PCI-MSI-edge      PCIe PME
41:          0          0          0          0          0          0          0          0   PCI-MSI-edge      PCIe PME
42:          0          0          0          0          0          0          0          0   PCI-MSI-edge      PCIe PME
43:          0          0          0          0          0          0          0          0   PCI-MSI-edge      PCIe PME
44:          0          0          0          0          0          0          0          0   PCI-MSI-edge      PCIe PME
45:          0          0          0          0          0          0          0          0   PCI-MSI-edge      PCIe PME
46:      79853          0          0          0          0          0          0          0   PCI-MSI-edge      ahci
48:          1          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
49:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
50:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
51:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
52:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
53:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
54:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
55:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
56:          1          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
57:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
58:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
59:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
60:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
61:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
62:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
63:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
64:     173506          0          0          0          0          0          0          0   PCI-MSI-edge      hda_intel
NMI:        482         89         25         13        277         24         11         10   Non-maskable interrupts
LOC:     783857     194752     114133      70577     372438     179065     117179     162016   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:        482         89         25         13        277         24         11         10   Performance monitoring interrupts
IWI:          0          0          0          0          0          0          0          0   IRQ work interrupts
RES:     131917      46750       7432       3291     150003       9576       3435       3067   Rescheduling interrupts
CAL:       2759       6563       7150       6997       5387       7140       7269       6678   Function call interrupts
TLB:       4396       2038       1336        492       5434       1896       1121        606   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:         37         37         37         37         37         37         37         37   Machine check polls
ERR:          0
MIS:          0
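In the listing above, IRQ 16 (the line the kernel disabled) is a legacy IO-APIC-fasteoi line shared by nvidia and eth0, while devices such as ahci and xhci_hcd get their own PCI-MSI-edge vectors. A minimal Python sketch for spotting shared lines in this output follows. It assumes the column layout shown above (a header of CPU columns, then per line "N:", one count per CPU, a single type field and a comma-separated device list); other kernel versions may format the type field differently. The names shared_irqs and SAMPLE are hypothetical, and SAMPLE is an excerpt of the output quoted in this question.

```python
# Sketch: find interrupt lines in /proc/interrupts that are shared by more
# than one device. Assumes the layout quoted in the question; other kernel
# versions may format the type field differently.

from typing import Dict, List, Tuple

# Excerpt of the /proc/interrupts output quoted in the question.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:         77          0          0          0          0          0          0          0   IO-APIC-edge      timer
 16:     699083          0          0          0          0          0          0          0   IO-APIC-fasteoi   nvidia, eth0
 17:      87810          0          0          0          0          0          0          0   IO-APIC-fasteoi   firewire_ohci, hda_intel, nvidia
 18:        242          0          0          0          0          0          0          0   IO-APIC-fasteoi   hda_intel
 23:      85925          0          0          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb5, ehci_hcd:usb6
 46:      79853          0          0          0          0          0          0          0   PCI-MSI-edge      ahci
NMI:        482         89         25         13        277         24         11         10   Non-maskable interrupts
"""


def shared_irqs(text: str) -> Dict[str, Tuple[str, List[str]]]:
    """Map IRQ number -> (type field, devices) for lines with 2+ devices."""
    rows = text.strip("\n").splitlines()
    ncpus = len(rows[0].split())  # header row: CPU0 CPU1 ...
    shared: Dict[str, Tuple[str, List[str]]] = {}
    for row in rows[1:]:
        fields = row.split()
        if not fields:
            continue
        irq = fields[0].rstrip(":")
        if not irq.isdigit():
            continue  # NMI, LOC, ERR, ... are counters, not IRQ lines
        rest = fields[1 + ncpus:]
        if len(rest) < 2:
            continue  # no device name listed
        chip, names = rest[0], " ".join(rest[1:])
        devices = [d.strip() for d in names.split(",") if d.strip()]
        if len(devices) > 1:
            shared[irq] = (chip, devices)
    return shared


if __name__ == "__main__":
    # On a live system, pass open("/proc/interrupts").read() instead.
    for irq, (chip, devices) in shared_irqs(SAMPLE).items():
        print(f"IRQ {irq} ({chip}) shared by: {', '.join(devices)}")
```

Run against the excerpt, IRQs 16, 17 and 23 come out as shared; single-device lines and the per-CPU counters such as NMI are skipped.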

Finally, right after boot these lines usually appear in dmesg:

[   18.367094] hda-intel: IRQ timing workaround is activated for card #1. Suggest a bigger bdl_pos_adj.
[   18.458859] hda-intel: IRQ timing workaround is activated for card #2. Suggest a bigger bdl_pos_adj.

I'm not sure whether this is related or a symptom of a bigger problem, so I'm posting it just in case.

I don't know what other information might be relevant here. Don't hesitate to ask for more in the comments.

© Super User or respective owner
