Search Results

Search found 5564 results on 223 pages for 'multi gpu'.

Page 130/223 | < Previous Page | 126 127 128 129 130 131 132 133 134 135 136 137  | Next Page >

  • ubuntu 10.04 logs itself out overnight

    - by Corey
    Every night when I leave work, I lock the screen via ubuntu's "power" button in the top right hand panel. When I come to work in the morning, I'm greeted with the log-in screen. This doesn't happen every night, but most. I'm running ubuntu 10.04 on a Dell inspiron. I've included some HW specs, and also dmesg output. Please let me know what other logs may be useful. thanks! Corey ~$ dmesg [20559.696062] type=1503 audit(1285957687.048:16): operation="open" pid=6212 parent=1 profile="/usr/bin/evince" requested_mask="::r" denied_mask="::r" fsuid=1000 ouid=0 name="/usr/local/lib/libltdl.so.7.2.2" [21127.951621] type=1503 audit(1285958255.300:17): operation="open" pid=6390 parent=1 profile="/usr/bin/evince" requested_mask="::r" denied_mask="::r" fsuid=1000 ouid=0 name="/usr/local/lib/libltdl.so.7.2.2" [291038.528014] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung [291038.528025] render error detected, EIR: 0x00000000 [291038.528042] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 22973891 at 22973890) [291038.828014] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung [291038.828023] render error detected, EIR: 0x00000000 [291038.828042] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 22973894 at 22973890) ~$ lspci -vv 00:00.0 Host bridge: Intel Corporation 4 Series Chipset DRAM Controller (rev 03) Subsystem: Dell Device 02e1 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx- Latency: 0 Capabilities: <access denied> Kernel driver in use: agpgart-intel Kernel modules: intel-agp 00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03) Subsystem: Dell Device 02e1 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 27 Region 0: Memory at fe400000 (64-bit, non-prefetchable) [size=4M] Region 2: Memory at d0000000 (64-bit, prefetchable) [size=256M] Region 4: I/O ports at dc00 [size=8] Capabilities: <access denied> Kernel driver in use: i915 Kernel modules: i915 00:1b.0 Audio device: Intel Corporation N10/ICH 7 Family High Definition Audio Controller (rev 01) Subsystem: Dell Device 02e1 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 32 bytes Interrupt: pin A routed to IRQ 16 Region 0: Memory at feaf8000 (64-bit, non-prefetchable) [size=16K] Capabilities: <access denied> Kernel driver in use: HDA Intel Kernel modules: snd-hda-intel 00:1c.0 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 1 (rev 01) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 32 bytes Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: 00001000-00001fff Memory behind bridge: 80000000-801fffff Prefetchable memory behind bridge: 0000000080200000-00000000803fffff Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport Kernel modules: shpchp 00:1c.1 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 2 (rev 01) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 32 bytes Bus: primary=00, secondary=02, subordinate=02, sec-latency=0 I/O behind bridge: 0000e000-0000efff Memory behind bridge: feb00000-febfffff Prefetchable memory behind bridge: 00000000fdf00000-00000000fdffffff Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport Kernel modules: shpchp 00:1d.0 USB Controller: Intel Corporation N10/ICH7 Family USB UHCI Controller #1 (rev 01) Subsystem: Dell Device 02e1 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 23 Region 4: I/O ports at d880 [size=32] Kernel driver in use: uhci_hcd 00:1d.1 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #2 (rev 01) Subsystem: Dell Device 02e1 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin B routed to IRQ 19 Region 4: I/O ports at d800 [size=32] Kernel driver in use: uhci_hcd 00:1d.2 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #3 (rev 01) Subsystem: Dell Device 02e1 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin C routed to IRQ 18 Region 4: I/O ports at d480 [size=32] Kernel driver in use: uhci_hcd 00:1d.3 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #4 (rev 01) Subsystem: Dell Device 02e1 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin D routed to IRQ 16 Region 4: I/O ports at d400 [size=32] Kernel driver in use: uhci_hcd 00:1d.7 USB Controller: Intel Corporation N10/ICH 7 Family USB2 EHCI Controller (rev 01) (prog-if 20) Subsystem: Dell Device 02e1 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 23 Region 0: Memory at feaf7c00 (32-bit, non-prefetchable) [size=1K] Capabilities: <access denied> Kernel driver in use: ehci_hcd 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) (prog-if 01) Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Bus: primary=00, secondary=03, subordinate=03, sec-latency=32 Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> 00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01) Subsystem: Dell Device 02e1 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Capabilities: <access denied> Kernel modules: iTCO_wdt, intel-rng 00:1f.2 IDE interface: Intel Corporation N10/ICH7 Family SATA IDE Controller (rev 01) (prog-if 8f [Master SecP SecO PriP PriO]) Subsystem: Dell Device 02e1 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin B routed to IRQ 19 Region 0: I/O ports at d080 [size=8] Region 1: I/O ports at d000 [size=4] Region 2: I/O ports at cc00 [size=8] Region 3: I/O ports at c880 [size=4] Region 4: I/O ports at c800 [size=16] Capabilities: <access denied> Kernel driver in use: ata_piix 00:1f.3 SMBus: Intel Corporation N10/ICH 7 Family SMBus Controller (rev 01) Subsystem: Dell Device 02e1 Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Interrupt: pin B routed to IRQ 5 Region 4: I/O ports at 0400 [size=32] Kernel modules: i2c-i801 02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller (rev 02) Subsystem: Dell Device 02e1 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 32 bytes Interrupt: pin A routed to IRQ 26 Region 0: I/O ports at e800 [size=256] Region 2: Memory at fdfff000 (64-bit, prefetchable) [size=4K] Region 4: Memory at fdfe0000 (64-bit, prefetchable) [size=64K] Expansion ROM at febe0000 [disabled] [size=128K] Capabilities: <access denied> Kernel driver in use: r8169 Kernel modules: r8169 log$ tail -n 15 Xorg.0.log.old for help. Please also check the log file at "/var/log/Xorg.0.log" for additional information. (II) Power Button: Close (II) UnloadModule: "evdev" (II) Power Button: Close (II) UnloadModule: "evdev" (II) USB Optical Mouse: Close (II) UnloadModule: "evdev" (II) Dell Dell USB Entry Keyboard: Close (II) UnloadModule: "evdev" (II) Macintosh mouse button emulation: Close (II) UnloadModule: "evdev" (II) AIGLX: Suspending AIGLX clients for VT switch ddxSigGiveUp: Closing log

    Read the article

  • Centos 7. Freeradius fails to start on boot

    - by Alex
    I was messing around with FreeRADIUS and MySQL (MariaDB) and it seems FreeRADIUS service can't start properly on startup. But it starts fine using root user or in debug mode (radiusd -X) and works just fine! Debug mode shows no errors. systemctl command shows that radiusd.service has failed to start. /var/log/messages output: Aug 21 15:52:29 nexus-test systemd: Starting The Apache HTTP Server... Aug 21 15:52:29 nexus-test systemd: Starting MariaDB database server... Aug 21 15:52:29 nexus-test systemd: Starting FreeRADIUS high performance RADIUS server.... Aug 21 15:52:29 nexus-test systemd: Started OpenSSH server daemon. Aug 21 15:52:29 nexus-test mysqld_safe: 140821 15:52:29 mysqld_safe Logging to '/var/log/mariadb/mariadb.log'. Aug 21 15:52:29 nexus-test mysqld_safe: 140821 15:52:29 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql Aug 21 15:52:30 nexus-test systemd: Started Postfix Mail Transport Agent. Aug 21 15:52:30 nexus-test avahi-daemon[604]: Registering new address record for fe80::250:56ff:fe85:e4af on eth0.*. Aug 21 15:52:30 nexus-test systemd: radiusd.service: control process exited, code=exited status=1 Aug 21 15:52:30 nexus-test systemd: Failed to start FreeRADIUS high performance RADIUS server.. Aug 21 15:52:30 nexus-test systemd: Unit radiusd.service entered failed state. Aug 21 15:52:31 nexus-test kdumpctl: kexec: loaded kdump kernel Aug 21 15:52:31 nexus-test kdumpctl: Starting kdump: [OK] Aug 21 15:52:31 nexus-test systemd: Started Crash recovery kernel arming. Aug 21 15:52:31 nexus-test systemd: Started The Apache HTTP Server. Aug 21 15:52:31 nexus-test systemd: Started MariaDB database server. /var/log/radius/radius.log output: Thu Aug 21 15:24:16 2014 : Info: rlm_sql (sql): Driver rlm_sql_mysql (module rlm_sql_mysql) loaded and linked Thu Aug 21 15:24:16 2014 : Info: rlm_sql (sql): Attempting to connect to database "radius" Thu Aug 21 15:24:16 2014 : Info: rlm_sql (sql): Opening additional connection (0) Thu Aug 21 15:24:16 2014 : Error: rlm_sql_mysql: Couldn't connect socket to MySQL server radius@localhost:radius Thu Aug 21 15:24:16 2014 : Error: rlm_sql_mysql: Mysql error 'Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)' Thu Aug 21 15:24:16 2014 : Error: rlm_sql (sql): Opening connection failed (0) Thu Aug 21 15:24:16 2014 : Error: /etc/raddb/mods-enabled/sql[47]: Instantiation failed for module "sql" After seeing this I tried to replicate the problem, killed mariadb.service and started to run debug mode again. And it spits out the same problem as in the radius.log. I tried disabling iptables and firewalld and rebooting, but no luck: systemctl disable iptables systemctl disable firewalld So maybe the problem is in the process startup order or delay of some kind. Maybe FreeRADIUS's SQL module can't connect to not yet started MariaDB? If it, how can I fix this? In earlier versions of RHEL/CENTOS I know you easily see service start order in like rc.d or stuff, now IDK. I am new to this fancy "systemd", "systemctl", "firewalld" stuff Centos 7 introduced so sorry I'm a little bit confused. Also this new FreeRADIUS 3 structure... PS. MariaDB is enabled on startup, credentials in FR DB configuration are correct A little update: cat /etc/systemd/system/multi-user.target.wants/radiusd.service output: [Unit] Description=FreeRADIUS high performance RADIUS server. After=syslog.target network.target [Service] Type=forking PIDFile=/var/run/radiusd/radiusd.pid ExecStartPre=-/bin/chown -R radiusd.radiusd /var/run/radiusd ExecStartPre=/usr/sbin/radiusd -C ExecStart=/usr/sbin/radiusd -d /etc/raddb ExecReload=/usr/sbin/radiusd -C ExecReload=/bin/kill -HUP $MAINPID [Install] WantedBy=multi-user.target

    Read the article

  • Why does calling setScaleX during pinch zoom gesture cause flicker?

    - by numan
    I am trying to create a zoomable container and I am targeting API 14+ In my onScale (i am using the ScaleGestureDetector to detect pinch-zoom) I am doing something like this: public boolean onScale (ScaleGestureDetector detector) { float scaleFactor = detector.getScaleFactor(); setScaleX(getScaleX() * scaleFactor); setScaleY(getScaleY() * scaleFactor); return true; }; It works but the zoom is not smooth. In fact it noticeably flickers. I also tried it with hardware layer thinking that the scaling would happen on the GPU once the texture was uploaded and thus would be super fast. But it made no difference - the zoom is not smooth and flickers weirdly sometimes. What am I doing wrong?

    Read the article

  • Best approach for GPGPU/CUDA/OpenCL in Java?

    - by Frederik
    General-purpose computing on graphics processing units (GPGPU) is a very attractive concept to harness the power of the GPU for any kind of computing. I'd love to use GPGPU for image processing, particles, and fast geometric operations. Right now, it seems the two contenders in this space are CUDA and OpenCL. I'd like to know: Is OpenCL usable yet from Java on Windows/Mac? What are the libraries ways to interface to OpenCL/CUDA? Is using JNA directly an option? Am I forgetting something? Any real-world experience/examples/war stories are appreciated.

    Read the article

  • Recommendation for high performance WPF Chart

    - by Ajaxx
    We're working on a WPF-based desktop application that charts financial markets information (candlestick charts, overlayed indicator curves, volume, etc). The charts are displayed in real-time with responses to market ticks being shown in real-time (updating one to two times per second is probably a reasonable display refresh policy). We've been looking for a software package (commercial is fine by us) that has the capability of displaying these charts. Additionally, we'd like to have an approach that can render the initial amount of data in a reasonable timeframe (give or take 100-200ms from the time we hand the data over to a complete render on screen). Also we view multiple charts (5-10) simultaneously so a solution that chews up 50% of my CPU to display one chart really isn't going to work well. Has anyone had any good experiences with charting controls. We've had to hand roll the last few charts we've done and I'd prefer not to do it again. Solutions that can make use of the GPU to minimize CPU utilization would be nice as well.

    Read the article

  • glFramebufferTexture2D performance

    - by nornagon
    I'm doing heavy computation using the GPU, which involves a lot of render-to-texture operations. It's an iterative computation, so there's a lot of rendering to a texture, then rendering that texture to another texture, then rendering the second texture back to the first texture and so on, passing the texture through a shader each time. My question is: is it better to have a separate FBO for each texture I want to render into, or should I rather have one FBO and bind the target texture using glFramebufferTexture2D each time I want to change render target? My platform is OpenGL ES 2.0 on the iPhone.

    Read the article

  • OpenCL or OpenGL – which one to use?

    - by Malte Schledjewski
    My Problem involves a black and white image with a black area in the middle. I never worked with OpenGL or OpenCL before so I do not know which one to chose. I want to put some white circles over the area and check at the end whether the whole image is white. I will try many combinations so I want to use the GPU because of its parallelism. Should I use OpenGL and create the circle as a texture and put it on top of the image or should I write some OpenCL kernels which work on the pixel/entries in the matrix?

    Read the article

  • Do GLSL geometry shaders work on the GMA X3100 under OSX

    - by GameFreak
    I am trying to use a trivial geometry shader but when run in Shader Builder on a laptop with a GMA X3100 it falls back and uses the software render. According this document the GMA X3100 does support EXT_geometry_shader4. The input is POINTS and the output is LINE_STRIP. What would be required to get it to run on the GPU (if possible) uniform vec2 offset; void main() { gl_Position = gl_PositionIn[0]; EmitVertex(); gl_Position = gl_PositionIn[0] + vec4(offset.x,offset.y,0,0); EmitVertex(); EndPrimitive(); }

    Read the article

  • Updating a Cuda 4.0 project to Cuda 4.2

    - by aljndrrr
    I have a VS2010 project that was tested with CUDA 4.0, today I installed CUDA 4.2 and I want to update this project, the problem is that when I try to run the project it asks me for cudart32_40_17.dll, but since this is CUDA 4.2 I only have on my folders (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin) cudart32_42_9.dll. I already set the Build Customizations to Cuda 4.2 and it compiles without any problem, the only problem is when I try to run it, the app asks me for the previous version of the dll. Is there a way to especify that the project must use cudart32_42_9.dll?

    Read the article

  • Java2D OpenGL Hardware Acceleration Doesn't Work

    - by Aaron
    It doesn't work with OpenGL with even the simplest of programs. Here is what I am doing.. java -Dsun.java2d.opengl=True -jar Java2Demo.jar (Java2Demo.jar is usually included with the JDK..) The text output is: OpenGL pipeline enabled for default config on screen 0 When I don't pass in the above VM argument things work fine (but slowly). When I do pass in the above argument nothing shows up... If I move the window around it captures whatever image it was on top of and jumbles it into nonsense. I'm running Windows XP Pro SP3 (Microsoft Windows XP [Version 5.1.2600]) (under Parallels on OS X 10.5.8) I used "Geeks3D GPU Caps Viewer" to tell me I have Open GL version: 2.0 NVIDIA-1.5.48 I have tried this with two version of the JVM. First: java version "1.6.0_13" Java(TM) SE Runtime Environment (build 1.6.0_13-b03) Java HotSpot(TM) Client VM (build 11.3-b02, mixed mode) and second: java version "1.6.0_20" Java(TM) SE Runtime Environment (build 1.6.0_20-b02) Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)

    Read the article

  • TMS320C64x Quick start reference for porgrammers

    - by osgx
    Hello Is thare any quickstart guide for programmers for writing DSP-accelerated appliations for TMS320C64x? I have a program with custom algorythm (not the fft, or usial filtering) and I want to accelerate it using multi-DSP coprocessor. So, how should I modify source to move computation from main CPU to DSPs? What limitations are there for DSP-running code? I have some experience with CUDA. In CUDA I should mark every function as being host, device, or entry point for device (kernel). There are also functions to start kernels and to upload/download data to/from GPU. There are also some limitations, for device code, described in CUDA Reference manual. I hope, there is an similar interface and a documentation for DSP.

    Read the article

  • Matrix inversion in OpenCL

    - by buchtak
    Hi, I am trying to accelerate some computations using OpenCL and part of the algorithm consists of inverting a matrix. Is there any open-source library or freely available code to compute lu factorization (lapack dgetrf and dgetri) of matrix or general inversion written in OpenCL or CUDA? The matrix is real and square but doesn't have any other special properties besides that. So far, I've managed to find only basic blas matrix-vector operations implementations on gpu. The matrix is rather small, only about 60-100 rows and cols, so it could be computed faster on cpu, but it's used kinda in the middle of the algorithm, so I would have to transfer it to host, calculate the inverse, and then transfer the result back on the device where it's then used in much larger computations.

    Read the article

  • Fastest sort of fixed length 6 int array

    - by kriss
    Answering to another StackOverflow question (this one) I stumbled upon an interresting sub-problem. What is the fastest way to sort an array of 6 ints ? As the question is very low level (will be executed by a GPU): we can't assume libraries are available (and the call itself has it's cost), only plain C to avoid emptying instruction pipeline (that has a very high cost) we should probably minimize branches, jumps, and every other kind of control flow breaking (like those hidden behind sequence points in && or ||). room is constrained and minimizing registers and memory use is an issue, ideally in place sort is probably best. Really this question is a kind of Golf where the goal is not to minimize source length but execution speed. I call it 'Zening` code as used in the title of the book Zen of Code optimization by Michael Abrash and it's sequels.

    Read the article

  • OpenGL Shading Language backwards compatibility

    - by Luca
    I've noticed that my GLSL shaders are not compilable when the GLSL version is lower than 130. What are the most critical elements for having a backward compatible shader source? I don't want to have a full backward compatibility, but I'd like to understand the main guidelines for having simple (forward compatible) shaders running on GPU with GLSL lower than 130. Of course the problem could be solved with the preprocessor #if __VERSION__ < 130 #define VERTEX_IN attribute #else #define VERTER_IN in #endif But there probably many issues that I ignore. Thank you

    Read the article

  • HLSL: Enforce Constant Register Limit at Compile Time

    - by Andrew Russell
    In HLSL, is there any way to limit the number of constant registers that the compiler uses? Specifically, if I have something like: float4 foobar[300]; In a vs_2_0 vertex shader, the compiler will merrily generate the effect with more than 256 constant registers. But a 2.0 vertex shader is only guaranteed to have access to 256 constant registers, so when I try to use the effect, it fails in an obscure and GPU-dependent way at runtime. I would much rather have it fail at compile time. This problem is especially annoying as the compiler itself allocates constant registers behind the scenes, on top of the ones I am asking for. I have to check the assembly to see if I'm over the limit. Ideally I'd like to do this in HLSL (I'm using the XNA content pipeline), but if there's a flag that can be passed to the compiler that would also be interesting.

    Read the article

  • HPC (mainly on Java)

    - by Insectatorious
    I'm looking for some way of using the number-crunching ability of a GPU (with Java perhaps?) in addition to using the multiple cores that the target machine has. I will be working on implementing (at present) the A* Algorithm but in the future I hope to replace it with a Genetic Algorithm of sorts. I've looked at Project Fortress but as I'm building my GUI in JavaFX, I'd prefer not to stray too far from a JVM. Of course, should no feasible solution be available, I will migrate to the easiest solution to implement.

    Read the article

  • Measuring device drivers CPU/IO utilization caused by my program

    - by Lior Kogan
    Sometimes code can utilize device drivers up to the point where the system is unresponsive. Lately I've optimized a WIN32/VC++ code which made the system almost unresponsive. The CPU usage, however, was very low. The reason was 1000's of creations and destruction of GDI objects (pens, brushes, etc.). Once I refactored the code to create all objects only once - the system became responsive again. This leads me to the question: Is there a way to measure CPU/IO usage of device drivers (GPU/disk/etc) for a given program / function / line of code?

    Read the article

  • directx 9 hlsl vs. directx 10 hlsl : syntex the same.

    - by numerical25
    For the past month or so I been busting my behind trying to learn directx. So I been mixing back back and fourth from directx 9 to directx 10. One of the major changes I've seen in the two is how to process vectors in the graphics card. one of the drastic changes I notice is how you get the gpu to recognize your structs. In directx 9 you define the Flexible Vertex Formats your typical set up would be like this... #define CUSTOMFVF (D3DFVF_XYZRHW | D3DFVF_DIFFUSE) in directx 10 I believe the equivalent is the input vertex description D3D10_INPUT_ELEMENT_DESC layout[] = { {"POSITION",0,DXGI_FORMAT_R32G32B32_FLOAT, 0 , 0, D3D10_INPUT_PER_VERTEX_DATA, 0}, {"COLOR",0,DXGI_FORMAT_R32G32B32A32_FLOAT, 0 , 12, D3D10_INPUT_PER_VERTEX_DATA, 0} }; I notice in directx 10. it is more descriptive. besides this, what are some of the drastic changes made. and is the hlsl syntax the same for both.

    Read the article

  • DirectX 9 HLSL vs. DirectX 10 HLSL: syntax the same?

    - by numerical25
    For the past month or so, I have been busting my behind trying to learn DirectX. So I've been mixing back back and forth between DirectX 9 and 10. One of the major changes I've seen in the two is how to process vectors in the graphics card. One of the drastic changes I notice is how you get the GPU to recognize your structs. In DirectX 9, you define the Flexible Vertex Formats. Your typical set up would be like this: #define CUSTOMFVF (D3DFVF_XYZRHW | D3DFVF_DIFFUSE) In DirectX 10, I believe the equivalent is the input vertex description: D3D10_INPUT_ELEMENT_DESC layout[] = { {"POSITION",0,DXGI_FORMAT_R32G32B32_FLOAT, 0 , 0, D3D10_INPUT_PER_VERTEX_DATA, 0}, {"COLOR",0,DXGI_FORMAT_R32G32B32A32_FLOAT, 0 , 12, D3D10_INPUT_PER_VERTEX_DATA, 0} }; I notice in DirectX 10 that it is more descriptive. Besides this, what are some of the drastic changes made, and is the HLSL syntax the same for both?

    Read the article

  • How do I use compiler intrinsic __fmul_?

    - by Eric Thoma
    I am writing a massively parallel GPU application. I have been optimizing it by hand. I received a 20% performance increase with _fdividef(x, y), and according to The Cuda C Programming Guide (section C.2.1), using similar functions for multiplication and adding is also beneficial. The function is stated as this: "_fmulrn,rz,ru,rd". __fdividef(x,y) was not stated with the arguments in brackets. I was wondering, what are those brackets? If I run the simple code: int t = __fmul_(5,4); I a compiler error about how _fmul is undefined. I have the CUDA runtime included, so I don't think it is a setup thing; rather it is something to do with those square brackets. How do I correctly use this function? Thank you.

    Read the article

  • Image/"most resembling pixel" search optimization?

    - by SigTerm
    The situation: Let's say I have an image A, say, 512x512 pixels, and image B, 5x5 or 7x7 pixels. Both images are 24bit rgb, and B have 1bit alpha mask (so each pixel is either completely transparent or completely solid). I need to find within image A a pixel which (with its' neighbors) most closely resembles image B, OR the pixel that probably most closely resembles image B. Resemblance is calculated as "distance" which is sum of "distances" between non-transparent B's pixels and A's pixels divided by number of non-transparent B's pixels. Here is a sample SDL code for explanation: struct Pixel{ unsigned char b, g, r, a; }; void fillPixel(int x, int y, SDL_Surface* dst, SDL_Surface* src, int dstMaskX, int dstMaskY){ Pixel& dstPix = *((Pixel*)((char*)(dst->pixels) + sizeof(Pixel)*x + dst->pitch*y)); int xMin = x + texWidth - searchWidth; int xMax = xMin + searchWidth*2; int yMin = y + texHeight - searchHeight; int yMax = yMin + searchHeight*2; int numFilled = 0; for (int curY = yMin; curY < yMax; curY++) for (int curX = xMin; curX < xMax; curX++){ Pixel& cur = *((Pixel*)((char*)(dst->pixels) + sizeof(Pixel)*(curX & texMaskX) + dst->pitch*(curY & texMaskY))); if (cur.a != 0) numFilled++; } if (numFilled == 0){ int srcX = rand() % src->w; int srcY = rand() % src->h; dstPix = *((Pixel*)((char*)(src->pixels) + sizeof(Pixel)*srcX + src->pitch*srcY)); dstPix.a = 0xFF; return; } int storedSrcX = rand() % src->w; int storedSrcY = rand() % src->h; float lastDifference = 3.40282347e+37F; //unsigned char mask = for (int srcY = searchHeight; srcY < (src->h - searchHeight); srcY++) for (int srcX = searchWidth; srcX < (src->w - searchWidth); srcX++){ float curDifference = 0; int numPixels = 0; for (int tmpY = -searchHeight; tmpY < searchHeight; tmpY++) for(int tmpX = -searchWidth; tmpX < searchWidth; tmpX++){ Pixel& tmpSrc = *((Pixel*)((char*)(src->pixels) + sizeof(Pixel)*(srcX+tmpX) + src->pitch*(srcY+tmpY))); Pixel& tmpDst = *((Pixel*)((char*)(dst->pixels) + sizeof(Pixel)*((x + dst->w + tmpX) & dstMaskX) + dst->pitch*((y + dst->h + tmpY) & dstMaskY))); if (tmpDst.a){ numPixels++; int dr = tmpSrc.r - tmpDst.r; int dg = tmpSrc.g - tmpDst.g; int db = tmpSrc.g - tmpDst.g; curDifference += dr*dr + dg*dg + db*db; } } if (numPixels) curDifference /= (float)numPixels; if (curDifference < lastDifference){ lastDifference = curDifference; storedSrcX = srcX; storedSrcY = srcY; } } dstPix = *((Pixel*)((char*)(src->pixels) + sizeof(Pixel)*storedSrcX + src->pitch*storedSrcY)); dstPix.a = 0xFF; } This thing is supposed to be used for texture generation. Now, the question: The easiest way to do this is brute force search (which is used in example routine). But it is slow - even using GPU acceleration and dual core cpu won't make it much faster. It looks like I can't use modified binary search because of B's mask. So, how can I find desired pixel faster? Additional Info: It is allowed to use 2 cores, GPU acceleration, CUDA, and 1.5..2 gigabytes of RAM for the task. I would prefer to avoid some kind of lengthy preprocessing phase that will take 30 minutes to finish. Ideas?

    Read the article

  • webgl adding projection doesnt display object

    - by dazed3confused
    I am having a look at web gl, and trying to render a cube, but I am having a problem when I try to add projection into the vertex shader. I have added an attribute, but when I use it to multiple the modelview and position, it stops displaying the cube. Im not sure why and was wondering if anyone could help? Ive tried looking at a few examples but just cant get this to work vertex shader attribute vec3 aVertexPosition; uniform mat4 uMVMatrix; uniform mat4 uPMatrix; void main(void) { gl_Position = uPMatrix * uMVMatrix * vec4(aVertexPosition, 1.0); //gl_Position = uMVMatrix * vec4(aVertexPosition, 1.0); } fragment shader #ifdef GL_ES precision highp float; // Not sure why this is required, need to google it #endif uniform vec4 uColor; void main() { gl_FragColor = uColor; } function init() { // Get a reference to our drawing surface canvas = document.getElementById("webglSurface"); gl = canvas.getContext("experimental-webgl"); /** Create our simple program **/ // Get our shaders var v = document.getElementById("vertexShader").firstChild.nodeValue; var f = document.getElementById("fragmentShader").firstChild.nodeValue; // Compile vertex shader var vs = gl.createShader(gl.VERTEX_SHADER); gl.shaderSource(vs, v); gl.compileShader(vs); // Compile fragment shader var fs = gl.createShader(gl.FRAGMENT_SHADER); gl.shaderSource(fs, f); gl.compileShader(fs); // Create program and attach shaders program = gl.createProgram(); gl.attachShader(program, vs); gl.attachShader(program, fs); gl.linkProgram(program); // Some debug code to check for shader compile errors and log them to console if (!gl.getShaderParameter(vs, gl.COMPILE_STATUS)) console.log(gl.getShaderInfoLog(vs)); if (!gl.getShaderParameter(fs, gl.COMPILE_STATUS)) console.log(gl.getShaderInfoLog(fs)); if (!gl.getProgramParameter(program, gl.LINK_STATUS)) console.log(gl.getProgramInfoLog(program)); /* Create some simple VBOs*/ // Vertices for a cube var vertices = new Float32Array([ -0.5, 0.5, 0.5, // 0 -0.5, -0.5, 0.5, // 1 0.5, 0.5, 0.5, // 2 0.5, -0.5, 0.5, // 3 -0.5, 0.5, -0.5, // 4 -0.5, -0.5, -0.5, // 5 -0.5, 0.5, -0.5, // 6 -0.5,-0.5, -0.5 // 7 ]); // Indices of the cube var indicies = new Int16Array([ 0, 1, 2, 1, 2, 3, // front 5, 4, 6, 5, 6, 7, // back 0, 1, 5, 0, 5, 4, // left 2, 3, 6, 6, 3, 7, // right 0, 4, 2, 4, 2, 6, // top 5, 3, 1, 5, 3, 7 // bottom ]); // create vertices object on the GPU vbo = gl.createBuffer(); gl.bindBuffer(gl.ARRAY_BUFFER, vbo); gl.bufferData(gl.ARRAY_BUFFER, vertices, gl.STATIC_DRAW); // Create indicies object on th GPU ibo = gl.createBuffer(); gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, ibo); gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, indicies, gl.STATIC_DRAW); gl.clearColor(0.0, 0.0, 0.0, 1.0); gl.enable(gl.DEPTH_TEST); // Render scene every 33 milliseconds setInterval(render, 33); } var mvMatrix = mat4.create(); var pMatrix = mat4.create(); function render() { // Set our viewport and clear it before we render gl.viewport(0, 0, canvas.width, canvas.height); gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT); gl.useProgram(program); // Bind appropriate VBOs gl.bindBuffer(gl.ARRAY_BUFFER, vbo); gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, ibo); // Set the color for the fragment shader program.uColor = gl.getUniformLocation(program, "uColor"); gl.uniform4fv(program.uColor, [0.3, 0.3, 0.3, 1.0]); // // code.google.com/p/glmatrix/wiki/Usage program.uPMatrix = gl.getUniformLocation(program, "uPMatrix"); program.uMVMatrix = gl.getUniformLocation(program, "uMVMatrix"); mat4.perspective(45, gl.viewportWidth / gl.viewportHeight, 1.0, 10.0, pMatrix); mat4.identity(mvMatrix); mat4.translate(mvMatrix, [0.0, -0.25, -1.0]); gl.uniformMatrix4fv(program.uPMatrix, false, pMatrix); gl.uniformMatrix4fv(program.uMVMatrix, false, mvMatrix); // Set the position for the vertex shader program.aVertexPosition = gl.getAttribLocation(program, "aVertexPosition"); gl.enableVertexAttribArray(program.aVertexPosition); gl.vertexAttribPointer(program.aVertexPosition, 3, gl.FLOAT, false, 3*4, 0); // position // Render the Object gl.drawElements(gl.TRIANGLES, 36, gl.UNSIGNED_SHORT, 0); } Thanks in advance for any help

    Read the article

  • Can I program Nvidia's CUDA using only Python or do I have to learn C?

    - by Aquateenfan
    I guess the question speaks for itself. I'm interested in doing some serious computations but am not a programmer by trade. I can string enough python together to get done what I want. But can I write a program in python and have the GPU execute it using CUDA? Or do I have to use some mix of python and C? The examples on Klockner's (sp) "pyCUDA" webpage had a mix of both python and C, so I'm not sure what the answer is. If anyone wants to chime in about Opencl, feel free. I heard about this CUDA business only a couple of weeks ago and didn't know you could use your video cards like this. thx

    Read the article

  • Modifying an image with OpenGL ?

    - by chmike
    I have a device to acquire XRay images. Due to some technical constrains, the detector is made of heterogeneous pixel size and multiple tilted and partially overlapping tiles. The image is thus distorted. The detector geometry is known precisely. I need a function converting these distorted images into a flat image with homogeneous pixel size. I have already done this by CPU, but I would like to give a try with OpenGL to use the GPU in a portable way. I have no experience with OpenGL programming, and most of the information I could find on the web was useless for this use. How should I proceed ? How do I do this ? Image size are 560x860 pixels and we have batches of 720 images to process. I'm on Ubuntu.

    Read the article

  • How does Photoshop (Or drawing programs) blit?

    - by user146780
    I'm getting ready to make a drawing application in Windows. I'm just wondering, do drawing programs have a memory bitmap which they lock, then set each pixel, then blit? I don't understand how Photoshop can move entire layers without lag or flicker without using hardware acceleration. Also in a program like Expression Design, I could have 200 shapes and move them around all at once with no lag. I'm really wondering how this can be done without GPU help. I don't think super efficient algorithms could justify that? Thanks

    Read the article

< Previous Page | 126 127 128 129 130 131 132 133 134 135 136 137  | Next Page >