Search Results

Search found 209 results on 9 pages for 'absence'.

Page 9/9 | < Previous Page | 5 6 7 8 9 

  • Mac 10.6 Universal Binary scipy: cephes/specfun "_aswfa_" symbol not found

    - by Markus
    Hi folks, I can't get scipy to function in 32 bit mode when compiled as a i386/x86_64 universal binary, and executed on my 64 bit 10.6.2 MacPro1,1. My python setup With the help of this answer, I built a 32/64 bit intel universal binary of python 2.6.4 with the intention of using the arch command to select between the architectures. (I managed to make some universal binaries of a few libraries I wanted using lipo.) That all works. I then installed scipy according to the instructions on hyperjeff's article, only with more up-to-date numpy (1.4.0) and skipping the bit about moving numpy aside briefly during the installation of scipy. Now, everything except scipy seems to be working as far as I can tell, and I can indeed select between 32 and 64 bit mode using arch -i386 python and arch -x86_64 python. The error Scipy complains in 32 bit mode: $ arch -x86_64 python -c "import scipy.interpolate; print 'success'" success $ arch -i386 python -c "import scipy.interpolate; print 'success'" Traceback (most recent call last): File "<string>", line 1, in <module> File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/interpolate/__init__.py", line 7, in <module> from interpolate import * File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/interpolate/interpolate.py", line 13, in <module> import scipy.special as spec File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/special/__init__.py", line 8, in <module> from basic import * File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/special/basic.py", line 8, in <module> from _cephes import * ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/special/_cephes.so, 2): Symbol not found: _aswfa_ Referenced from: /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/special/_cephes.so Expected in: flat namespace in /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/special/_cephes.so Attempt at tracking down the problem It looks like scipy.interpolate imports something called _cephes, which looks for a symbol called _aswfa_ but can't find it in 32 bit mode. Browsing through scipy's source, I find an ASWFA subroutine in specfun.f. The only scipy product file with a similar name is specfun.so, but both that and _cephes.so appear to be universal binaries: $ cd /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/special/ $ file _cephes.so specfun.so _cephes.so: Mach-O universal binary with 2 architectures _cephes.so (for architecture i386): Mach-O bundle i386 _cephes.so (for architecture x86_64): Mach-O 64-bit bundle x86_64 specfun.so: Mach-O universal binary with 2 architectures specfun.so (for architecture i386): Mach-O bundle i386 specfun.so (for architecture x86_64): Mach-O 64-bit bundle x86_64 Ho hum. I'm stuck. Things I may try but haven't figured out how yet include compiling specfun.so myself manually, somehow. I would imagine that scipy isn't broken for all 32 bit machines, so I guess something is wrong with the way I've installed it, but I can't figure out what. I don't really expect a full answer given my fairly unique (?) setup, but if anyone has any clues that might point me in the right direction, they'd be greatly appreciated. (edit) More details to address questions: I'm using gfortran (GNU Fortran from GCC 4.2.1 Apple Inc. build 5646). Python 2.6.4 was installed more-or-less like so: cd /tmp curl -O http://www.python.org/ftp/python/2.6.4/Python-2.6.4.tar.bz2 tar xf Python-2.6.4.tar.bz2 cd Python-2.6.4 # Now replace buggy pythonw.c file with one that supports the "arch" command: curl http://bugs.python.org/file14949/pythonw.c | sed s/2.7/2.6/ > Mac/Tools/pythonw.c ./configure --enable-framework=/Library/Frameworks --enable-universalsdk=/ --with-universal-archs=intel make -j4 sudo make frameworkinstall Scipy 0.7.1 was installed pretty much as described as here, but it boils down to a simple sudo python setup.py install. It would indeed appear that the symbol is undefined in the i386 architecture if you look at the _cephes library with nm, as suggested by David Cournapeau: $ nm -arch x86_64 /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/special/_cephes.so | grep _aswfa_ 00000000000d4950 T _aswfa_ 000000000011e4b0 d _oblate_aswfa_data 000000000011e510 d _oblate_aswfa_nocv_data (snip) $ nm -arch i386 /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/special/_cephes.so | grep _aswfa_ U _aswfa_ 0002e96c d _oblate_aswfa_data 0002e99c d _oblate_aswfa_nocv_data (snip) however, I can't yet explain its absence.

    Read the article

  • How do I use UIImagePickerController just to display the camera and not take a picture?

    - by Thomas
    Hello All: I'd like to know how to open the camera inside of a pre-defined frame (not the entire screen). When the view loads, I have a box, and inside it, I want to display what the camera sees. I don't want to snap a picture, just basically use the camera as a viewfinder. I have searched this site and have not yet found what I'm looking for. Please help. Thanks! Thomas Update 1: Here is what I have tried so far. 1.) I added UIImageView to my xib. 2.) Connect the following outlet to the UIImageView in IB IBOutlet UIImageView *cameraWindow; 3.) I put the following code in viewWillAppear -(void)viewWillAppear:(BOOL)animated { [super viewWillAppear:animated]; UIImagePickerController *picker = [[UIImagePickerController alloc] init]; picker.delegate = self; picker.sourceType = UIImagePickerControllerSourceTypeCamera; [self presentModalViewController:picker animated:YES]; NSLog(@"viewWillAppear ran"); } But this method does not run, as evident by the absence of NSLog statement from my console. Please help! Thanks, Thomas Update 2: OK I got it to run by putting the code in viewDidLoad but my camera still doesn't show up...any suggestions? Anyone....? I've been reading the UIImagePickerController class reference, but am kinda unsure how to make sense of it. I'm still learning iPhone, so it's a bit of a struggle. Please help! - (void)viewDidLoad { [super viewDidLoad]; // Create a bool variable "camera" and call isSourceTypeAvailable to see if camera exists on device BOOL camera = [UIImagePickerController isSourceTypeAvailable:UIImagePickerControllerSourceTypeCamera]; // If there is a camera, then display the world throught the viewfinder if(camera) { UIImagePickerController *picker = [[UIImagePickerController alloc] init]; // Since I'm not actually taking a picture, is a delegate function necessary? picker.delegate = self; picker.sourceType = UIImagePickerControllerSourceTypeCamera; [self presentModalViewController:picker animated:YES]; NSLog(@"Camera is available"); } // Otherwise, do nothing. else NSLog(@"No camera available"); } Thanks! Thomas Update 3: A-HA! Found this on the Apple Class Reference. Discussion The delegate receives notifications when the user picks an image or movie, or exits the picker interface. The delegate also decides when to dismiss the picker interface, so you must provide a delegate to use a picker. If this property is nil, the picker is dismissed immediately if you try to show it. Gonna play around with the delegate now. Then I'm going to read on wtf a delegate is. Backwards? Whatever :-p Update 4: The two delegate functions for the class are – imagePickerController:didFinishPickingMediaWithInfo: – imagePickerControllerDidCancel: and since I don't actually want to pick an image or give the user the option to cancel, I am just defining the methods. They should never run though....I think.

    Read the article

  • Why Wifi no longer works 12.04.1

    - by Roger
    starting this morning over wifi (Realtek RTL8188CE) on CLEVO W253HU. May be due to the update before yesterday, more pilot managed, but somehow it worked yesterday. If someone has an idea of the problem. Back command lines: cat /etc/lsb-release DISTRIB_ID=Ubuntu DISTRIB_RELEASE=12.04 DISTRIB_CODENAME=precise DISTRIB_DESCRIPTION="Ubuntu 12.04.1 LTS" lsusb Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub Bus 002 Device 006: ID 192f:0416 Avago Technologies, Pte. Bus 002 Device 004: ID 5986:0315 Acer, Inc lspci 00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09) 00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) 00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04) 00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05) 00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 05) 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) 00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b5) 00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b5) 00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05) 00:1f.0 ISA bridge: Intel Corporation HM65 Express Chipset Family LPC Controller (rev 05) 00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 05) 00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05) 02:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8188CE 802.11b/g/n WiFi Adapter (rev 01) 03:00.0 Ethernet controller: JMicron Technology Corp. JMC250 PCI Express Gigabit Ethernet Controller (rev 05) 03:00.1 System peripheral: JMicron Technology Corp. SD/MMC Host Controller (rev 90) 03:00.2 SD Host controller: JMicron Technology Corp. Standard SD Host Controller (rev 90) 03:00.3 System peripheral: JMicron Technology Corp. MS Host Controller (rev 90) lspci -nn | grep -i net 02:00.0 Network controller [0280]: Realtek Semiconductor Co., Ltd. RTL8188CE 802.11b/g/n WiFi Adapter [10ec:8176] (rev 01) 03:00.0 Ethernet controller [0200]: JMicron Technology Corp. JMC250 PCI Express Gigabit Ethernet Controller [197b:0250] (rev 05) lspci -k 00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09) Subsystem: CLEVO/KAPOK Computer Device 4140 Kernel driver in use: agpgart-intel 00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) Subsystem: CLEVO/KAPOK Computer Device 4140 Kernel driver in use: i915 Kernel modules: i915 00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04) Subsystem: CLEVO/KAPOK Computer Device 4140 Kernel driver in use: mei Kernel modules: mei 00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05) Subsystem: CLEVO/KAPOK Computer Device 4140 Kernel driver in use: ehci_hcd 00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 05) Subsystem: CLEVO/KAPOK Computer Device 4140 Kernel driver in use: snd_hda_intel Kernel modules: snd-hda-intel 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) Kernel driver in use: pcieport Kernel modules: shpchp 00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b5) Kernel driver in use: pcieport Kernel modules: shpchp 00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b5) Kernel driver in use: pcieport Kernel modules: shpchp 00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05) Subsystem: CLEVO/KAPOK Computer Device 4140 Kernel driver in use: ehci_hcd 00:1f.0 ISA bridge: Intel Corporation HM65 Express Chipset Family LPC Controller (rev 05) Subsystem: CLEVO/KAPOK Computer Device 4140 Kernel modules: iTCO_wdt 00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 05) Subsystem: CLEVO/KAPOK Computer Device 4140 Kernel driver in use: ahci 00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05) Subsystem: CLEVO/KAPOK Computer Device 4140 Kernel modules: i2c-i801 02:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8188CE 802.11b/g/n WiFi Adapter (rev 01) Subsystem: Realtek Semiconductor Co., Ltd. Device 9196 Kernel modules: rtl8192ce 03:00.0 Ethernet controller: JMicron Technology Corp. JMC250 PCI Express Gigabit Ethernet Controller (rev 05) Subsystem: CLEVO/KAPOK Computer Device 4140 Kernel driver in use: jme Kernel modules: jme 03:00.1 System peripheral: JMicron Technology Corp. SD/MMC Host Controller (rev 90) Subsystem: CLEVO/KAPOK Computer Device 4140 Kernel driver in use: sdhci-pci Kernel modules: sdhci-pci 03:00.2 SD Host controller: JMicron Technology Corp. Standard SD Host Controller (rev 90) Subsystem: CLEVO/KAPOK Computer Device 4140 Kernel modules: sdhci-pci 03:00.3 System peripheral: JMicron Technology Corp. MS Host Controller (rev 90) Subsystem: CLEVO/KAPOK Computer Device 4140 Kernel driver in use: jmb38x_ms Kernel modules: jmb38x_ms sudo lshw -C network *-network NON-RÉCLAMÉ description: Network controller produit: RTL8188CE 802.11b/g/n WiFi Adapter fabriquant: Realtek Semiconductor Co., Ltd. identifiant matériel: 0 information bus: pci@0000:02:00.0 version: 01 bits: 64 bits horloge: 33MHz fonctionnalités: pm msi pciexpress cap_list configuration: latency=0 ressources: portE/S:e000(taille=256) mémoire:f7d00000-f7d03fff *-network description: Ethernet interface produit: JMC250 PCI Express Gigabit Ethernet Controller fabriquant: JMicron Technology Corp. identifiant matériel: 0 information bus: pci@0000:03:00.0 nom logique: eth0 version: 05 numéro de série: 00:90:f5:c1:c6:45 taille: 100Mbit/s capacité: 1Gbit/s bits: 32 bits horloge: 33MHz fonctionnalités: pm pciexpress msix msi bus_master cap_list rom ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation configuration: autonegotiation=on broadcast=yes driver=jme driverversion=1.0.8 duplex=full ip=192.168.1.54 latency=0 link=yes multicast=yes port=MII speed=100Mbit/s ressources: irq:44 mémoire:f7c20000-f7c23fff portE/S:d100(taille=128) portE/S:d000(taille=256) mémoire:f7c10000-f7c1ffff mémoire:f7c00000-f7c0ffff lsmod Module Size Used by btusb 18288 0 rfcomm 47604 0 bnep 18281 2 bluetooth 180104 11 btusb,rfcomm,bnep parport_pc 32866 0 ppdev 17113 0 binfmt_misc 17540 1 snd_hda_codec_realtek 224173 0 dm_crypt 23125 0 snd_hda_codec_hdmi 32474 0 uvcvideo 72627 0 videodev 98259 1 uvcvideo v4l2_compat_ioctl32 17128 1 videodev snd_hda_intel 33773 2 snd_hda_codec 127706 3 snd_hda_codec_realtek,snd_hda_codec_hdmi,snd_hda_intel snd_hwdep 13668 1 snd_hda_codec snd_pcm 97188 3 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec snd_seq_midi 13324 0 snd_rawmidi 30748 1 snd_seq_midi jmb38x_ms 17646 0 psmouse 87692 0 serio_raw 13211 0 memstick 16569 1 jmb38x_ms snd_seq_midi_event 14899 1 snd_seq_midi rtl8192ce 84826 0 rtl8192c_common 75767 1 rtl8192ce rtlwifi 111202 1 rtl8192ce snd_seq 61896 2 snd_seq_midi,snd_seq_midi_event snd_timer 29990 2 snd_pcm,snd_seq snd_seq_device 14540 3 snd_seq_midi,snd_rawmidi,snd_seq mac80211 506816 3 rtl8192ce,rtl8192c_common,rtlwifi mac_hid 13253 0 snd 78855 14 snd_hda_codec_realtek,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hwdep,snd_pcm,snd_rawmidi,snd_seq,snd_timer,snd_seq_device cfg80211 205544 2 rtlwifi,mac80211 soundcore 15091 1 snd snd_page_alloc 18529 2 snd_hda_intel,snd_pcm mei 41616 0 lp 17799 0 parport 46562 3 parport_pc,ppdev,lp usbhid 47199 0 hid 99559 1 usbhid i915 473035 3 drm_kms_helper 46978 1 i915 drm 242038 4 i915,drm_kms_helper jme 41259 0 i2c_algo_bit 13423 1 i915 sdhci_pci 18826 0 sdhci 33205 1 sdhci_pci wmi 19256 0 video 19596 1 i915 iwconfig lo no wireless extensions. eth0 no wireless extensions. ifconfig eth0 Link encap:Ethernet HWaddr 00:90:f5:c1:c6:45 inet adr:192.168.1.54 Bcast:192.168.1.255 Masque:255.255.255.0 adr inet6: fe80::290:f5ff:fec1:c645/64 Scope:Lien UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 Packets reçus:4513 erreurs:0 :0 overruns:0 frame:0 TX packets:4359 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 lg file transmission:1000 Octets reçus:3471675 (3.4 MB) Octets transmis:712722 (712.7 KB) Interruption:44 lo Link encap:Boucle locale inet adr:127.0.0.1 Masque:255.0.0.0 adr inet6: ::1/128 Scope:Hôte UP LOOPBACK RUNNING MTU:16436 Metric:1 Packets reçus:686 erreurs:0 :0 overruns:0 frame:0 TX packets:686 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 lg file transmission:0 Octets reçus:64556 (64.5 KB) Octets transmis:64556 (64.5 KB) sudo iwlist scan lo Interface doesn't support scanning. eth0 Interface doesn't support scanning. uname -r -m 3.2.0-30-generic x86_64 cat /etc/network/interfaces auto lo iface lo inet loopback nm-tool NetworkManager Tool State: connected (global) - Device: eth0 [Connexion filaire 1] ------------------------------------------ Type: Wired Driver: jme State: connected Default: yes HW Address: 00:90:F5:C1:C6:45 Capabilities: Carrier Detect: yes Speed: 100 Mb/s Wired Properties Carrier: on IPv4 Settings: Address: 192.168.1.54 Prefix: 24 (255.255.255.0) Gateway: 192.168.1.1 DNS: 192.168.1.1 sudo rfkill listrfkill list 1: hci0: Bluetooth Soft blocked: no Hard blocked: no The absence of line "Kernel driver in use:" the return of lspci-k made ??me think that it is not loaded yet but he seems to be. lsmod | grep rtl8192ce rtl8192ce 137478 0 rtlwifi 118749 1 rtl8192ce mac80211 506816 2 rtl8192ce,rtlwifi I found something disturbing in / var / log / syslog Sep 14 11:40:11 pcroger kernel: [ 64.048783] rtl8192ce-0:rtl92c_init_sw_vars():<0-0> Failed to request firmware! Sep 14 11:40:11 pcroger kernel: [ 64.048795] rtlwifi-0:rtl_pci_probe():<0-0> Can't init_sw_vars. Sep 14 11:40:11 pcroger kernel: [ 64.048835] rtl8192ce 0000:02:00.0: PCI INT A disabled Sep 14 11:40:11 pcroger kernel: [ 64.943345] ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x6 frozen Sep 14 11:40:11 pcroger kernel: [ 64.943358] ata1.00: failed command: READ FPDMA QUEUED Sep 14 11:40:11 pcroger kernel: [ 64.943371] ata1.00: cmd 60/00:00:00:68:6a/04:00:0b:00:00/40 tag 0 ncq 524288 in Sep 14 11:40:11 pcroger kernel: [ 64.943374] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Sep 14 11:40:11 pcroger kernel: [ 64.943381] ata1.00: status: { DRDY } Ubuntu and takes forever to start (2 min).

    Read the article

  • Protecting a WebCenter app with OAM 11g - the Webcenter side

    - by Martin Deh
    Recently, there was a customer requirment to enable a WebCenter custom portal application to have multiple login-type pages and have the authentication be handle through Oracle Access Manager (OAM) As my security colleagues would tell me, this is fully supported through OAM.  Basically, all that would have to be done is to define in OAM individual resources (directories, URLS , .etc) that needed to be secured. Once that was done, OAM would handle the rest and the user would typically then be prompted by a login page, which was provided by OAM.  I am not going to discuss talking about OAM security in this blog.  In addition, my colleague Chris Johnson (ATEAM security) has already blogged his side of the story here:  http://fusionsecurity.blogspot.com/2012/06/protecting-webcenter-app-with-oam-11g.html .  What I am going to cover is what was done on the WebCenter/ADF side of things. In the test application, basically the structure of pages defined in the pages.xml are as follows:  In this screenshot, notice that "Delegated Security" has been selected, and of the absence for the anonymous-role for the "secured" page (A - B is the same)  This essentially in the WebCenter world means that each of these pages are protected, and only accessible by those define by the applications "role".  For more information on how WebCenter handles security, which by the way extends from ADF security, please refer to the documentation.  The (default) navigation model was configured.  You can see that with this set up, a user will be able to view the "links", where the links define navigation to the respective page:   Note from this dialog, you could also set some security on each link via the "visible" property.  However, the recommended best practice is to set the permissions through the page hierarchy (pages.xml).  Now based on this set up, the expected behavior is that I could only see the link for secured A page only if I was already authenticated (logged in).  But, this is not the use case of the requirement, since any user (anonymous) should be able to view (and click on the link).  So how is this accomplished?  There is now a patch that enables this.  In addition, the portal application's web.xml will need an additional context parameter: <context-param>     <param-name>oracle.webcenter.navigationframework.SECURITY_LEVEL</param-name>     <param-value>public</param-value>  </context-param>  As Chris mentions in his part of the blog, the code that is responsible for displaying the "links" is based upon the retrieval of the navigation model "node" prettyURL.  The prettyURL is a generated URL that also includes the adf.ctrl-state token, which is very important to the ADF framework runtime.  URLs that are void of this token, get new tokens from the ADF runtime.  This can lead to potential memory issues.  <af:forEach var="node" varStatus="vs"    items="#{navigationContext.defaultNavigationModel.listModel['startNode=/,includeStartNode=false']}">                 <af:spacer width="10" height="10" id="s1"/>                 <af:panelGroupLayout id="pgl2" layout="vertical"                                      inlineStyle="border:blue solid 1px">                   <af:goLink id="pt_gl1" text="#{node.title}"                              destination="#{node.goLinkPrettyUrl}"                              targetFrame="#{node.attributes['Target']}"                              inlineStyle="font-size:large;#{node.selected ? 'font-weight:bold;' : ''}"/>                   <af:spacer width="10" height="10" id="s2"/>                   <af:outputText value="#{node.goLinkPrettyUrl}" id="ot2"                                  inlineStyle="font-size:medium; font-weight:bold;"/>                 </af:panelGroupLayout>               </af:forEach>  So now that the links are visible to all, clicking on a secure link will be intercepted by OAM.  Since the OAM can also configure in the Authentication Scheme, the challenging URL (the login page(s)) can also come from anywhere.  In this case the each login page have been defined in the custom portal application.  This was another requirement as well, since this login page also needed to have ADF based content.  This would not be possible if the login page came from OAM.  The following is the example login page: <?xml version='1.0' encoding='UTF-8'?> <jsp:root xmlns:jsp="http://java.sun.com/JSP/Page" version="2.1"           xmlns:f="http://java.sun.com/jsf/core"           xmlns:h="http://java.sun.com/jsf/html"           xmlns:af="http://xmlns.oracle.com/adf/faces/rich">   <jsp:directive.page contentType="text/html;charset=UTF-8"/>   <f:view>     <af:document title="Settings" id="d1">       <af:panelGroupLayout id="pgl1" layout="vertical"/>       <af:outputText value="LOGIN FORM FOR A" id="ot1"/>       <form id="loginform" name="loginform" method="POST"             action="XXXXXXXX:14100/oam/server/auth_cred_submit">         <table>           <tr>             <td align="right">username:</td>             <td align="left">               <input name="username" type="text"/>             </td>           </tr>                      <tr>             <td align="right">password:</td>             <td align="left">               <input name="password" type="password"/>             </td>           </tr>                      <tr>             <td colspan="2" align="center">               <input value=" login " type="submit"/>             </td>           </tr>         </table>         <input name="request_id" type="hidden" value="${param['request_id']}"                id="itsss"/>       </form>     </af:document>   </f:view> </jsp:root> As you can see the code is pretty straight forward.  The most important section is in the form tag, where the submit is a POST to the OAM server.  This example page is mostly HTML, however, it is valid to have adf tags mixed in as well.  As a side note, this solution is really to tailored for a specific requirement.  Normally, there would be only one login page (or dialog/popup), and the OAM challenge resource would be /adfAuthentication.  This maps to the adfAuthentication servlet.  Please see the documentation for more about ADF security here. 

    Read the article

  • CodePlex Daily Summary for Sunday, November 21, 2010

    CodePlex Daily Summary for Sunday, November 21, 2010Popular ReleasesMDownloader: MDownloader-0.15.24.6966: Fixed Updater; Fixed minor bugs;Smith Html Editor: Smith Html Editor V0.75: The first public release.MiniTwitter: 1.59: MiniTwitter 1.59 ???? ?? User Streams ????????????????? ?? ?????????????? ???????? ?????????????.NET Extensions - Extension Methods Library for C# and VB.NET: Release 2011.01: Added new extensions for - object.CountLoopsToNull Added new extensions for DateTime: - DateTime.IsWeekend - DateTime.AddWeeks Added new extensions for string: - string.Repeat - string.IsNumeric - string.ExtractDigits - string.ConcatWith - string.ToGuid - string.ToGuidSave Added new extensions for Exception: - Exception.GetOriginalException Added new extensions for Stream: - Stream.Write (overload) And other new methods ... Release as of dotnetpro 01/2011Code Sample from Microsoft: Visual Studio 2010 Code Samples 2010-11-19: Code samples for Visual Studio 2010Prism Training Kit: Prism Training Kit 4.0: Release NotesThis is an updated version of the Prism training Kit that targets Prism 4.0 and added labs for some of the new features of Prism 4.0. This release consists of a Training Kit with Labs on the following topics Modularity Dependency Injection Bootstrapper UI Composition Communication MEF Navigation Note: Take into account that this is a Beta version. If you find any bugs please report them in the Issue Tracker PrerequisitesVisual Studio 2010 Microsoft Word 2...Free language translator and file converter: Free Language Translator 2.2: Starting with version 2.0, the translator encountered a major redesign that uses MEF based plugins and .net 4.0. I've also fixed some bugs and added support for translating subtitles that can show up in video media players. Version 2.1 shows the context menu 'Translate' in Windows Explorer on right click. Version 2.2 has links to start the media file with its associated subtitle. Download the zip file and expand it in a temporary location on your local disk. At a minimum , you should uninstal...Free Silverlight & WPF Chart Control - Visifire: Visifire SL and WPF Charts v3.6.4 Released: Hi, Today we are releasing Visifire 3.6.4 with few bug fixes: * Multi-line Labels were getting clipped while exploding last DataPoint in Funnel and Pyramid chart. * ClosestPlotDistance property in Axis was not behaving as expected. * In DateTime Axis, Chart threw exception on mouse click over PlotArea if there were no DataPoints present in Chart. * ToolTip was not disappearing while changing the DataSource property of the DataSeries at real-time. * Chart threw exception ...Microsoft SQL Server Product Samples: Database: AdventureWorks 2008R2 SR1: Sample Databases for Microsoft SQL Server 2008R2 (SR1)This release is dedicated to the sample databases that ship for Microsoft SQL Server 2008R2. See Database Prerequisites for SQL Server 2008R2 for feature configurations required for installing the sample databases. See Installing SQL Server 2008R2 Databases for step by step installation instructions. The SR1 release contains minor bug fixes to the installer used to create the sample databases. There are no changes to the databases them...VidCoder: 0.7.2: Fixed duplicated subtitles when running multiple encodes off of the same title.Craig's Utility Library: Craig's Utility Library Code 2.0: This update contains a number of changes, added functionality, and bug fixes: Added transaction support to SQLHelper. Added linked/embedded resource ability to EmailSender. Updated List to take into account new functions. Added better support for MAC address in WMI classes. Fixed Parsing in Reflection class when dealing with sub classes. Fixed bug in SQLHelper when replacing the Command that is a select after doing a select. Fixed issue in SQL Server helper with regard to generati...MFCMAPI: November 2010 Release: Build: 6.0.0.1023 Full release notes at SGriffin's blog. If you just want to run the tool, get the executable. If you want to debug it, get the symbol file and the source. The 64 bit build will only work on a machine with Outlook 2010 64 bit installed. All other machines should use the 32 bit build, regardless of the operating system. Facebook BadgeDotNetNuke® Community Edition: 05.06.00: Major HighlightsAdded automatic portal alias creation for single portal installs Updated the file manager upload page to allow user to upload multiple files without returning to the file manager page. Fixed issue with Event Log Email Notifications. Fixed issue where Telerik HTML Editor was unable to upload files to secure or database folder. Fixed issue where registration page is not set correctly during an upgrade. Fixed issue where Sendmail stripped HTML and Links from emails...mVu Mobile Viewer: mVu Mobile Viewer 0.7.10.0: Tube8 fix.EPPlus-Create advanced Excel 2007 spreadsheets on the server: EPPlus 2.8.0.1: EPPlus-Create advanced Excel 2007 spreadsheets on the serverNew Features Improved chart support Different chart-types series on the same chart Support for secondary axis and a lot of new properties Better styling Encryption and Workbook protection Table support Import csv files Array formulas ...and a lot of bugfixesAutoLoL: AutoLoL v1.4.2: Added support for more clients (French and Russian) Settings are now stored sepperatly for each user on a computer Auto Login is much faster now Auto Login detects and handles caps lock state properly nowTailspinSpyworks - WebForms Sample Application: TailspinSpyworks-v0.9: Contains a number of bug fixes and additional tutorial steps as well as complete database implementation details.ASP.NET MVC Project Awesome (jQuery Ajax helpers): 1.3 and demos: It contains a rich set of helpers (controls) that you can use to build highly responsive and interactive Ajax-enabled Web applications. These helpers include Autocomplete, AjaxDropdown, Lookup, Confirm Dialog, Popup Form and Pager tested on mozilla, safari, chrome, opera, ie 9b/8/7/6 new stuff in 1.3 Autocomplete helper Autocomplete and AjaxDropdown can have parentId and be filled with data depending on the value of the parent PopupForm besides Content("ok") on success can also return J...Nearforums - ASP.NET MVC forum engine: Nearforums v4.1: Version 4.1 of the ASP.NET MVC forum engine, with great improvements: TinyMCE added as visual editor for messages (removed CKEditor). Integrated AntiSamy for cleaner html user post and add more prevention to potential injections. Admin status page: a page for the site admin to check the current status of the configuration / db / etc. View Roadmap for more details.UltimateJB: UltimateJB 2.01 PL3 KakaRoto + PSNYes by EvilSperm: Voici une version attendu avec impatience pour beaucoup : - La Version PSNYes pour pouvoir jouer sur le PSN avec une PS3 Jailbreaker. - Pour l'instant le PSNYes n'est disponible qu'avec les PS3 en firmwares 3.41 !!! - La version PL3 KAKAROTO intégre ses dernières modification et prépare a l'intégration du Firmware 3.30 !!! Conclusion : - UltimateJB PSNYes => Valide l'utilisation du PSN : Uniquement compatible avec les 3.41 - ultimateJB DEFAULT => Pas de PSN mais disponible pour les PS3 sui...New Projects1600hours: 1600hours project made in C++.aoleDownload: Aole Series DownloadBills and Cash Flow: Bills and Cash Flow is a simple multi-tenant application to track bills and view cash flowCUDAagrep: CUDAagrep, a fast CUDA implementation of agrep algorithm for approximate DNA/RNA sequence matching.DNN5 Simple Ticketing Module: This is a simple DNN module that accepts trouble tickets and creates a knowledge base for a company.EntityOH: Dynamic Entities ORMFxcop ASP.NET Security Rules: Fxcop ASP.NET security rules This is a set of code analysis rules aiming at analyzing ASP.NET and ASP.NET MVC security against best practices. The rules can be used by Visual Studio 10 Ultimate or FxCop v10 standalone.Head First Design Patterns - Code Examples in C#: This project consists of ported code examples from the book Head First Design Patterns by Eric and Elizabeth Freeman into C#.HTML5 Media Player (Video / Audio): A .NET implementation of the VideoJS and AudioJS open source projects with video and audio support for HTML5. Excellent for use with iPod, iPad, iPhone, etc.Keyword Auction Simulator: This is the project for simulating the keyword auction like Adwords.mAdcOW Office Add-Ins: A collection of handy Office 2010 add-ins.Manga to Epub: Manga to Epub allow you to convert a bunch of images to a single "epub" file, readable on your reader. It handles most of the image types as well as several archives. You have multiple customization options, such as trimming the images in order to remove white borders.Mapua Career Ramp Up: A joint endeavor with the Philippine IT industry leaders and with Mapua School of Information Technology to build an online collaborative database system to Ramp-Up graduating students on their career as future IT Professionals. minami: Minami is a Project what focuse the work on Stability and Features. Is Development in C++minami-dev: Comes later the Description.Mobile RPG: Mobile RPG is five ATtiny85 microcontrollers playing their own RPG characters with a primary MCU acting as GM. Its a fun exercise in autonomous role playing.NetSnoop: Netsnoop allows everyone to get a quick overview over alle the current connections on their workstation.nGso: GSO algorithm implementation based on http://www.springerlink.com/content/y065470472612847/fulltext.pdf Glowworm swarm optimization for simultaneous capture of multiple local optima of multimodal functions K.N. Krishnanand · D. GhoseOpenID Starter Kit for ASP.NET MVC: OpenID Starter Kit for ASP.NET MVC is used to jump start building your web application with ASP.NET MVC with OpenID login system. It is also a good education resource if you want to learn how to implement OpenID into a ASP.NET MVC.Orchard Contact Us Module: Add a contact us page to your Orchard site using this module.Persian Scheduler and Calendar Control: This is a Jalali (Persian or shamsi) calendar and scheduler control in silverlight. Choosing the name 'Jalali' is in honor of 'Hakim omar khayyam' the founder of Jalali calendar. This is under the lisence of 'Barid New Systems' company.Popfly Metadata Generator: Creates Metadata for New project.PurpleStoat: A modular, extensible Silverlight application shell using Prism, Unity and the Enterprise Library, and written in C#. It includes a WCF service which provides AuthZ and logging services to the shell, which are also available to the modules.QL Config Compare Tool: The QL Config Compare Tool enables you to compare two QuakeLive configs. It creates a detailed overview of the differences and is able to save statistics.SQL PHI Identifier: SQL PHI Identifier is an auditing tool for DBA's in a healthcare environment to be able to help identify which databases/tables might hold protected health information (PHI). Using this information a DBA can then take the necessary steps to secure that data adequately.Sqlite ORM: Sqlite ORM is at present a simple Class to Table mapper for Sqlite databases. Tables are created on demand, and designed to future proof for Sharding. Code has 100% unit test coverage.Test shop: Test shopVarMerger - ??????? ????????? ??? ???????? ????????????.: VarMerger - ?????????? (Add-In) ??? MS Word 2007, ??????? ????????? ??????????? ???????? ???????? ??????? ?? ??????, ?????????? ????????? ?????? ? ??????. Visual Studio Add-In For creating Vista Gadget: The absence of tools in Visual Studio that can help developers to create Vista gadgets is strange and disappointing, in my opinion., I want to show you some tools that can help you to develop Vista gadgets using only Visual Studio 2008 or 2010 IDE.Vocal Remover - VST Plugin: VST Plugin Removes vocal form songs using M/S system trick with EQ on mid signal. source in C++ IDE: Visual Studio 2010 Express Edition LIB: Steinberg VST SDK 2.4Windows Phone 7 To Go: A project with demos for Windows Phone 7 FeaturesWinware: Winware is not only an Entity Framework, but beyond.XTengine: Xtengine makes it easier for XNA developers to develop in a compositional manner. You'll no longer have to write specific game classes with deep hierarchies or hardcode to load levels. It's developed in C# with XNA 4.0, with WP7 in mind.

    Read the article

  • Parsing Concerns

    - by Jesse
    If you’ve ever written an application that accepts date and/or time inputs from an external source (a person, an uploaded file, posted XML, etc.) then you’ve no doubt had to deal with parsing some text representing a date into a data structure that a computer can understand. Similarly, you’ve probably also had to take values from those same data structure and turn them back into their original formats. Most (all?) suitably modern development platforms expose some kind of parsing and formatting functionality for turning text into dates and vice versa. In .NET, the DateTime data structure exposes ‘Parse’ and ‘ToString’ methods for this purpose. This post will focus mostly on parsing, though most of the examples and suggestions below can also be applied to the ToString method. The DateTime.Parse method is pretty permissive in the values that it will accept (though apparently not as permissive as some other languages) which makes it pretty easy to take some text provided by a user and turn it into a proper DateTime instance. Here are some examples (note that the resulting DateTime values are shown using the RFC1123 format): DateTime.Parse("3/12/2010"); //Fri, 12 Mar 2010 00:00:00 GMT DateTime.Parse("2:00 AM"); //Sat, 01 Jan 2011 02:00:00 GMT (took today's date as date portion) DateTime.Parse("5-15/2010"); //Sat, 15 May 2010 00:00:00 GMT DateTime.Parse("7/8"); //Fri, 08 Jul 2011 00:00:00 GMT DateTime.Parse("Thursday, July 1, 2010"); //Thu, 01 Jul 2010 00:00:00 GMT Dealing With Inaccuracy While the DateTime struct has the ability to store a date and time value accurate down to the millisecond, most date strings provided by a user are not going to specify values with that much precision. In each of the above examples, the Parse method was provided a partial value from which to construct a proper DateTime. This means it had to go ahead and assume what you meant and fill in the missing parts of the date and time for you. This is a good thing, especially when we’re talking about taking input from a user. We can’t expect that every person using our software to provide a year, day, month, hour, minute, second, and millisecond every time they need to express a date. That said, it’s important for developers to understand what assumptions the software might be making and plan accordingly. I think the assumptions that were made in each of the above examples were pretty reasonable, though if we dig into this method a little bit deeper we’ll find that there are a lot more assumptions being made under the covers than you might have previously known. One of the biggest assumptions that the DateTime.Parse method has to make relates to the format of the date represented by the provided string. Let’s consider this example input string: ‘10-02-15’. To some people. that might look like ‘15-Feb-2010’. To others, it might be ‘02-Oct-2015’. Like many things, it depends on where you’re from. This Is America! Most cultures around the world have adopted a “little-endian” or “big-endian” formats. (Source: Date And Time Notation By Country) In this context,  a “little-endian” date format would list the date parts with the least significant first while the “big-endian” date format would list them with the most significant first. For example, a “little-endian” date would be “day-month-year” and “big-endian” would be “year-month-day”. It’s worth nothing here that ISO 8601 defines a “big-endian” format as the international standard. While I personally prefer “big-endian” style date formats, I think both styles make sense in that they follow some logical standard with respect to ordering the date parts by their significance. Here in the United States, however, we buck that trend by using what is, in comparison, a completely nonsensical format of “month/day/year”. Almost no other country in the world uses this format. I’ve been fortunate in my life to have done some international travel, so I’ve been aware of this difference for many years, but never really thought much about it. Until recently, I had been developing software for exclusively US-based audiences and remained blissfully ignorant of the different date formats employed by other countries around the world. The web application I work on is being rolled out to users in different countries, so I was recently tasked with updating it to support different date formats. As it turns out, .NET has a great mechanism for dealing with different date formats right out of the box. Supporting date formats for different cultures is actually pretty easy once you understand this mechanism. Pulling the Curtain Back On the Parse Method Have you ever taken a look at the different flavors (read: overloads) that the DateTime.Parse method comes in? In it’s simplest form, it takes a single string parameter and returns the corresponding DateTime value (if it can divine what the date value should be). You can optionally provide two additional parameters to this method: an ‘System.IFormatProvider’ and a ‘System.Globalization.DateTimeStyles’. Both of these optional parameters have some bearing on the assumptions that get made while parsing a date, but for the purposes of this article I’m going to focus on the ‘System.IFormatProvider’ parameter. The IFormatProvider exposes a single method called ‘GetFormat’ that returns an object to be used for determining the proper format for displaying and parsing things like numbers and dates. This interface plays a big role in the globalization capabilities that are built into the .NET Framework. The cornerstone of these globalization capabilities can be found in the ‘System.Globalization.CultureInfo’ class. To put it simply, the CultureInfo class is used to encapsulate information related to things like language, writing system, and date formats for a certain culture. Support for many cultures are “baked in” to the .NET Framework and there is capacity for defining custom cultures if needed (thought I’ve never delved into that). While the details of the CultureInfo class are beyond the scope of this post, so for now let me just point out that the CultureInfo class implements the IFormatInfo interface. This means that a CultureInfo instance created for a given culture can be provided to the DateTime.Parse method in order to tell it what date formats it should expect. So what happens when you don’t provide this value? Let’s crack this method open in Reflector: When no IFormatInfo parameter is provided (i.e. we use the simple DateTime.Parse(string) overload), the ‘DateTimeFormatInfo.CurrentInfo’ is used instead. Drilling down a bit further we can see the implementation of the DateTimeFormatInfo.CurrentInfo property: From this property we can determine that, in the absence of an IFormatProvider being specified, the DateTime.Parse method will assume that the provided date should be treated as if it were in the format defined by the CultureInfo object that is attached to the current thread. The culture specified by the CultureInfo instance on the current thread can vary depending on several factors, but if you’re writing an application where a single instance might be used by people from different cultures (i.e. a web application with an international user base), it’s important to know what this value is. Having a solid strategy for setting the current thread’s culture for each incoming request in an internationally used ASP .NET application is obviously important, and might make a good topic for a future post. For now, let’s think about what the implications of not having the correct culture set on the current thread. Let’s say you’re running an ASP .NET application on a server in the United States. The server was setup by English speakers in the United States, so it’s configured for US English. It exposes a web page where users can enter order data, one piece of which is an anticipated order delivery date. Most users are in the US, and therefore enter dates in a ‘month/day/year’ format. The application is using the DateTime.Parse(string) method to turn the values provided by the user into actual DateTime instances that can be stored in the database. This all works fine, because your users and your server both think of dates in the same way. Now you need to support some users in South America, where a ‘day/month/year’ format is used. The best case scenario at this point is a user will enter March 13, 2011 as ‘25/03/2011’. This would cause the call to DateTime.Parse to blow up since that value doesn’t look like a valid date in the US English culture (Note: In all likelihood you might be using the DateTime.TryParse(string) method here instead, but that method behaves the same way with regard to date formats). “But wait a minute”, you might be saying to yourself, “I thought you said that this was the best case scenario?” This scenario would prevent users from entering orders in the system, which is bad, but it could be worse! What if the order needs to be delivered a day earlier than that, on March 12, 2011? Now the user enters ‘12/03/2011’. Now the call to DateTime.Parse sees what it thinks is a valid date, but there’s just one problem: it’s not the right date. Now this order won’t get delivered until December 3, 2011. In my opinion, that kind of data corruption is a much bigger problem than having the Parse call fail. What To Do? My order entry example is a bit contrived, but I think it serves to illustrate the potential issues with accepting date input from users. There are some approaches you can take to make this easier on you and your users: Eliminate ambiguity by using a graphical date input control. I’m personally a fan of a jQuery UI Datepicker widget. It’s pretty easy to setup, can be themed to match the look and feel of your site, and has support for multiple languages and cultures. Be sure you have a way to track the culture preference of each user in your system. For a web application this could be done using something like a cookie or session state variable. Ensure that the current user’s culture is being applied correctly to DateTime formatting and parsing code. This can be accomplished by ensuring that each request has the handling thread’s CultureInfo set properly, or by using the Format and Parse method overloads that accept an IFormatProvider instance where the provided value is a CultureInfo object constructed using the current user’s culture preference. When in doubt, favor formats that are internationally recognizable. Using the string ‘2010-03-05’ is likely to be recognized as March, 5 2011 by users from most (if not all) cultures. Favor standard date format strings over custom ones. So far we’ve only talked about turning a string into a DateTime, but most of the same “gotchas” apply when doing the opposite. Consider this code: someDateValue.ToString("MM/dd/yyyy"); This will output the same string regardless of what the current thread’s culture is set to (with the exception of some cultures that don’t use the Gregorian calendar system, but that’s another issue all together). For displaying dates to users, it would be better to do this: someDateValue.ToString("d"); This standard format string of “d” will use the “short date format” as defined by the culture attached to the current thread (or provided in the IFormatProvider instance in the proper method overload). This means that it will honor the proper month/day/year, year/month/day, or day/month/year format for the culture. Knowing Your Audience The examples and suggestions shown above can go a long way toward getting an application in shape for dealing with date inputs from users in multiple cultures. There are some instances, however, where taking approaches like these would not be appropriate. In some cases, the provider or consumer of date values that pass through your application are not people, but other applications (or other portions of your own application). For example, if your site has a page that accepts a date as a query string parameter, you’ll probably want to format that date using invariant date format. Otherwise, the same URL could end up evaluating to a different page depending on the user that is viewing it. In addition, if your application exports data for consumption by other systems, it’s best to have an agreed upon format that all systems can use and that will not vary depending upon whether or not the users of the systems on either side prefer a month/day/year or day/month/year format. I’ll look more at some approaches for dealing with these situations in a future post. If you take away one thing from this post, make it an understanding of the importance of knowing where the dates that pass through your system come from and are going to. You will likely want to vary your parsing and formatting approach depending on your audience.

    Read the article

  • std::basic_stringstream<unsigned char> won't compile with MSVC 10

    - by Michael J
    I'm trying to get UTF-8 chars to co-exist with ANSI 8-bit chars. My strategy has been to represent utf-8 chars as unsigned char so that appropriate overloads of functions can be used for the two character types. e.g. namespace MyStuff { typedef uchar utf8_t; typedef std::basic_string<utf8_t> U8string; } void SomeFunc(std::string &s); void SomeFunc(std::wstring &s); void SomeFunc(MyStuff::U8string &s); This all works pretty well until I try to use a stringstream. std::basic_ostringstream<MyStuff::utf8_t> ostr; ostr << 1; MSVC Visual C++ Express V10 won't compile this: c:\program files\microsoft visual studio 10.0\vc\include\xlocmon(213): warning C4273: 'id' : inconsistent dll linkage c:\program files\microsoft visual studio 10.0\vc\include\xlocnum(65) : see previous definition of 'public: static std::locale::id std::numpunct<unsigned char>::id' c:\program files\microsoft visual studio 10.0\vc\include\xlocnum(65) : while compiling class template static data member 'std::locale::id std::numpunct<_Elem>::id' with [ _Elem=Tk::utf8_t ] c:\program files\microsoft visual studio 10.0\vc\include\xlocnum(1149) : see reference to function template instantiation 'const _Facet &std::use_facet<std::numpunct<_Elem>>(const std::locale &)' being compiled with [ _Facet=std::numpunct<Tk::utf8_t>, _Elem=Tk::utf8_t ] c:\program files\microsoft visual studio 10.0\vc\include\xlocnum(1143) : while compiling class template member function 'std::ostreambuf_iterator<_Elem,_Traits> std::num_put<_Elem,_OutIt>:: do_put(_OutIt,std::ios_base &,_Elem,std::_Bool) const' with [ _Elem=Tk::utf8_t, _Traits=std::char_traits<Tk::utf8_t>, _OutIt=std::ostreambuf_iterator<Tk::utf8_t,std::char_traits<Tk::utf8_t>> ] c:\program files\microsoft visual studio 10.0\vc\include\ostream(295) : see reference to class template instantiation 'std::num_put<_Elem,_OutIt>' being compiled with [ _Elem=Tk::utf8_t, _OutIt=std::ostreambuf_iterator<Tk::utf8_t,std::char_traits<Tk::utf8_t>> ] c:\program files\microsoft visual studio 10.0\vc\include\ostream(281) : while compiling class template member function 'std::basic_ostream<_Elem,_Traits> & std::basic_ostream<_Elem,_Traits>::operator <<(int)' with [ _Elem=Tk::utf8_t, _Traits=std::char_traits<Tk::utf8_t> ] c:\program files\microsoft visual studio 10.0\vc\include\sstream(526) : see reference to class template instantiation 'std::basic_ostream<_Elem,_Traits>' being compiled with [ _Elem=Tk::utf8_t, _Traits=std::char_traits<Tk::utf8_t> ] c:\users\michael\dvl\tmp\console\console.cpp(23) : see reference to class template instantiation 'std::basic_ostringstream<_Elem,_Traits,_Alloc>' being compiled with [ _Elem=Tk::utf8_t, _Traits=std::char_traits<Tk::utf8_t>, _Alloc=std::allocator<uchar> ] . c:\program files\microsoft visual studio 10.0\vc\include\xlocmon(213): error C2491: 'std::numpunct<_Elem>::id' : definition of dllimport static data member not allowed with [ _Elem=Tk::utf8_t ] Any ideas? ** Edited 19 June 2012 ** OK, I've gotten closer to understanding this, but not how to solve it. As we all know, static class variables get defined twice: once in the class definition and once outside the class definition which establishes storage space. e.g. // in .h file class CFoo { // ... static int x; }; // in .cpp file int CFoo::x = 42; Now in the VC10 headers we get something like this: template<class _Elem> class numpunct : public locale::facet { // ... _CRTIMP2_PURE static locale::id id; // ... } When the header is included in an application, _CRTIMP2_PURE is defined as __declspec(dllimport), which means that the variable is imported from a dll. Now the header also contains the following template<class _Elem> locale::id numpunct<_Elem>::id; Note the absence of the __declspec(dllimport) qualifier. i.e. The class declaration says that the static linkage of the id variable is in the dll, but for the general case, it gets declared outside the dll. For the known cases, there are specialisations. template locale::id numpunct<char>::id; template locale::id numpunct<wchar_t>::id; These are protected by #ifs so that they are only included when building the DLL. They are excluded otherwise. i.e. the char and wchar_t versions of numpunct ARE inside the dll So we have the class definition saying that id's storage is in the DLL, but that is only true for the char and wchar_t specialisations, meaning that my unsigned char version is doomed. :-( The only way forward that I can think of is to create my own specialisation: basically copying it from the header file and fixing it. This raises many issues. Anybody have a better idea?

    Read the article

  • Toorcon 15 (2013)

    - by danx
    The Toorcon gang (senior staff): h1kari (founder), nfiltr8, and Geo Introduction to Toorcon 15 (2013) A Tale of One Software Bypass of MS Windows 8 Secure Boot Breaching SSL, One Byte at a Time Running at 99%: Surviving an Application DoS Security Response in the Age of Mass Customized Attacks x86 Rewriting: Defeating RoP and other Shinanighans Clowntown Express: interesting bugs and running a bug bounty program Active Fingerprinting of Encrypted VPNs Making Attacks Go Backwards Mask Your Checksums—The Gorry Details Adventures with weird machines thirty years after "Reflections on Trusting Trust" Introduction to Toorcon 15 (2013) Toorcon 15 is the 15th annual security conference held in San Diego. I've attended about a third of them and blogged about previous conferences I attended here starting in 2003. As always, I've only summarized the talks I attended and interested me enough to write about them. Be aware that I may have misrepresented the speaker's remarks and that they are not my remarks or opinion, or those of my employer, so don't quote me or them. Those seeking further details may contact the speakers directly or use The Google. For some talks, I have a URL for further information. A Tale of One Software Bypass of MS Windows 8 Secure Boot Andrew Furtak and Oleksandr Bazhaniuk Yuri Bulygin, Oleksandr ("Alex") Bazhaniuk, and (not present) Andrew Furtak Yuri and Alex talked about UEFI and Bootkits and bypassing MS Windows 8 Secure Boot, with vendor recommendations. They previously gave this talk at the BlackHat 2013 conference. MS Windows 8 Secure Boot Overview UEFI (Unified Extensible Firmware Interface) is interface between hardware and OS. UEFI is processor and architecture independent. Malware can replace bootloader (bootx64.efi, bootmgfw.efi). Once replaced can modify kernel. Trivial to replace bootloader. Today many legacy bootkits—UEFI replaces them most of them. MS Windows 8 Secure Boot verifies everything you load, either through signatures or hashes. UEFI firmware relies on secure update (with signed update). You would think Secure Boot would rely on ROM (such as used for phones0, but you can't do that for PCs—PCs use writable memory with signatures DXE core verifies the UEFI boat loader(s) OS Loader (winload.efi, winresume.efi) verifies the OS kernel A chain of trust is established with a root key (Platform Key, PK), which is a cert belonging to the platform vendor. Key Exchange Keys (KEKs) verify an "authorized" database (db), and "forbidden" database (dbx). X.509 certs with SHA-1/SHA-256 hashes. Keys are stored in non-volatile (NV) flash-based NVRAM. Boot Services (BS) allow adding/deleting keys (can't be accessed once OS starts—which uses Run-Time (RT)). Root cert uses RSA-2048 public keys and PKCS#7 format signatures. SecureBoot — enable disable image signature checks SetupMode — update keys, self-signed keys, and secure boot variables CustomMode — allows updating keys Secure Boot policy settings are: always execute, never execute, allow execute on security violation, defer execute on security violation, deny execute on security violation, query user on security violation Attacking MS Windows 8 Secure Boot Secure Boot does NOT protect from physical access. Can disable from console. Each BIOS vendor implements Secure Boot differently. There are several platform and BIOS vendors. It becomes a "zoo" of implementations—which can be taken advantage of. Secure Boot is secure only when all vendors implement it correctly. Allow only UEFI firmware signed updates protect UEFI firmware from direct modification in flash memory protect FW update components program SPI controller securely protect secure boot policy settings in nvram protect runtime api disable compatibility support module which allows unsigned legacy Can corrupt the Platform Key (PK) EFI root certificate variable in SPI flash. If PK is not found, FW enters setup mode wich secure boot turned off. Can also exploit TPM in a similar manner. One is not supposed to be able to directly modify the PK in SPI flash from the OS though. But they found a bug that they can exploit from User Mode (undisclosed) and demoed the exploit. It loaded and ran their own bootkit. The exploit requires a reboot. Multiple vendors are vulnerable. They will disclose this exploit to vendors in the future. Recommendations: allow only signed updates protect UEFI fw in ROM protect EFI variable store in ROM Breaching SSL, One Byte at a Time Yoel Gluck and Angelo Prado Angelo Prado and Yoel Gluck, Salesforce.com CRIME is software that performs a "compression oracle attack." This is possible because the SSL protocol doesn't hide length, and because SSL compresses the header. CRIME requests with every possible character and measures the ciphertext length. Look for the plaintext which compresses the most and looks for the cookie one byte-at-a-time. SSL Compression uses LZ77 to reduce redundancy. Huffman coding replaces common byte sequences with shorter codes. US CERT thinks the SSL compression problem is fixed, but it isn't. They convinced CERT that it wasn't fixed and they issued a CVE. BREACH, breachattrack.com BREACH exploits the SSL response body (Accept-Encoding response, Content-Encoding). It takes advantage of the fact that the response is not compressed. BREACH uses gzip and needs fairly "stable" pages that are static for ~30 seconds. It needs attacker-supplied content (say from a web form or added to a URL parameter). BREACH listens to a session's requests and responses, then inserts extra requests and responses. Eventually, BREACH guesses a session's secret key. Can use compression to guess contents one byte at-a-time. For example, "Supersecret SupersecreX" (a wrong guess) compresses 10 bytes, and "Supersecret Supersecret" (a correct guess) compresses 11 bytes, so it can find each character by guessing every character. To start the guess, BREACH needs at least three known initial characters in the response sequence. Compression length then "leaks" information. Some roadblocks include no winners (all guesses wrong) or too many winners (multiple possibilities that compress the same). The solutions include: lookahead (guess 2 or 3 characters at-a-time instead of 1 character). Expensive rollback to last known conflict check compression ratio can brute-force first 3 "bootstrap" characters, if needed (expensive) block ciphers hide exact plain text length. Solution is to align response in advance to block size Mitigations length: use variable padding secrets: dynamic CSRF tokens per request secret: change over time separate secret to input-less servlets Future work eiter understand DEFLATE/GZIP HTTPS extensions Running at 99%: Surviving an Application DoS Ryan Huber Ryan Huber, Risk I/O Ryan first discussed various ways to do a denial of service (DoS) attack against web services. One usual method is to find a slow web page and do several wgets. Or download large files. Apache is not well suited at handling a large number of connections, but one can put something in front of it Can use Apache alternatives, such as nginx How to identify malicious hosts short, sudden web requests user-agent is obvious (curl, python) same url requested repeatedly no web page referer (not normal) hidden links. hide a link and see if a bot gets it restricted access if not your geo IP (unless the website is global) missing common headers in request regular timing first seen IP at beginning of attack count requests per hosts (usually a very large number) Use of captcha can mitigate attacks, but you'll lose a lot of genuine users. Bouncer, goo.gl/c2vyEc and www.github.com/rawdigits/Bouncer Bouncer is software written by Ryan in netflow. Bouncer has a small, unobtrusive footprint and detects DoS attempts. It closes blacklisted sockets immediately (not nice about it, no proper close connection). Aggregator collects requests and controls your web proxies. Need NTP on the front end web servers for clean data for use by bouncer. Bouncer is also useful for a popularity storm ("Slashdotting") and scraper storms. Future features: gzip collection data, documentation, consumer library, multitask, logging destroyed connections. Takeaways: DoS mitigation is easier with a complete picture Bouncer designed to make it easier to detect and defend DoS—not a complete cure Security Response in the Age of Mass Customized Attacks Peleus Uhley and Karthik Raman Peleus Uhley and Karthik Raman, Adobe ASSET, blogs.adobe.com/asset/ Peleus and Karthik talked about response to mass-customized exploits. Attackers behave much like a business. "Mass customization" refers to concept discussed in the book Future Perfect by Stan Davis of Harvard Business School. Mass customization is differentiating a product for an individual customer, but at a mass production price. For example, the same individual with a debit card receives basically the same customized ATM experience around the world. Or designing your own PC from commodity parts. Exploit kits are another example of mass customization. The kits support multiple browsers and plugins, allows new modules. Exploit kits are cheap and customizable. Organized gangs use exploit kits. A group at Berkeley looked at 77,000 malicious websites (Grier et al., "Manufacturing Compromise: The Emergence of Exploit-as-a-Service", 2012). They found 10,000 distinct binaries among them, but derived from only a dozen or so exploit kits. Characteristics of Mass Malware: potent, resilient, relatively low cost Technical characteristics: multiple OS, multipe payloads, multiple scenarios, multiple languages, obfuscation Response time for 0-day exploits has gone down from ~40 days 5 years ago to about ~10 days now. So the drive with malware is towards mass customized exploits, to avoid detection There's plenty of evicence that exploit development has Project Manager bureaucracy. They infer from the malware edicts to: support all versions of reader support all versions of windows support all versions of flash support all browsers write large complex, difficult to main code (8750 lines of JavaScript for example Exploits have "loose coupling" of multipe versions of software (adobe), OS, and browser. This allows specific attacks against specific versions of multiple pieces of software. Also allows exploits of more obscure software/OS/browsers and obscure versions. Gave examples of exploits that exploited 2, 3, 6, or 14 separate bugs. However, these complete exploits are more likely to be buggy or fragile in themselves and easier to defeat. Future research includes normalizing malware and Javascript. Conclusion: The coming trend is that mass-malware with mass zero-day attacks will result in mass customization of attacks. x86 Rewriting: Defeating RoP and other Shinanighans Richard Wartell Richard Wartell The attack vector we are addressing here is: First some malware causes a buffer overflow. The malware has no program access, but input access and buffer overflow code onto stack Later the stack became non-executable. The workaround malware used was to write a bogus return address to the stack jumping to malware Later came ASLR (Address Space Layout Randomization) to randomize memory layout and make addresses non-deterministic. The workaround malware used was to jump t existing code segments in the program that can be used in bad ways "RoP" is Return-oriented Programming attacks. RoP attacks use your own code and write return address on stack to (existing) expoitable code found in program ("gadgets"). Pinkie Pie was paid $60K last year for a RoP attack. One solution is using anti-RoP compilers that compile source code with NO return instructions. ASLR does not randomize address space, just "gadgets". IPR/ILR ("Instruction Location Randomization") randomizes each instruction with a virtual machine. Richard's goal was to randomize a binary with no source code access. He created "STIR" (Self-Transofrming Instruction Relocation). STIR disassembles binary and operates on "basic blocks" of code. The STIR disassembler is conservative in what to disassemble. Each basic block is moved to a random location in memory. Next, STIR writes new code sections with copies of "basic blocks" of code in randomized locations. The old code is copied and rewritten with jumps to new code. the original code sections in the file is marked non-executible. STIR has better entropy than ASLR in location of code. Makes brute force attacks much harder. STIR runs on MS Windows (PEM) and Linux (ELF). It eliminated 99.96% or more "gadgets" (i.e., moved the address). Overhead usually 5-10% on MS Windows, about 1.5-4% on Linux (but some code actually runs faster!). The unique thing about STIR is it requires no source access and the modified binary fully works! Current work is to rewrite code to enforce security policies. For example, don't create a *.{exe,msi,bat} file. Or don't connect to the network after reading from the disk. Clowntown Express: interesting bugs and running a bug bounty program Collin Greene Collin Greene, Facebook Collin talked about Facebook's bug bounty program. Background at FB: FB has good security frameworks, such as security teams, external audits, and cc'ing on diffs. But there's lots of "deep, dark, forgotten" parts of legacy FB code. Collin gave several examples of bountied bugs. Some bounty submissions were on software purchased from a third-party (but bounty claimers don't know and don't care). We use security questions, as does everyone else, but they are basically insecure (often easily discoverable). Collin didn't expect many bugs from the bounty program, but they ended getting 20+ good bugs in first 24 hours and good submissions continue to come in. Bug bounties bring people in with different perspectives, and are paid only for success. Bug bounty is a better use of a fixed amount of time and money versus just code review or static code analysis. The Bounty program started July 2011 and paid out $1.5 million to date. 14% of the submissions have been high priority problems that needed to be fixed immediately. The best bugs come from a small % of submitters (as with everything else)—the top paid submitters are paid 6 figures a year. Spammers like to backstab competitors. The youngest sumitter was 13. Some submitters have been hired. Bug bounties also allows to see bugs that were missed by tools or reviews, allowing improvement in the process. Bug bounties might not work for traditional software companies where the product has release cycle or is not on Internet. Active Fingerprinting of Encrypted VPNs Anna Shubina Anna Shubina, Dartmouth Institute for Security, Technology, and Society (I missed the start of her talk because another track went overtime. But I have the DVD of the talk, so I'll expand later) IPsec leaves fingerprints. Using netcat, one can easily visually distinguish various crypto chaining modes just from packet timing on a chart (example, DES-CBC versus AES-CBC) One can tell a lot about VPNs just from ping roundtrips (such as what router is used) Delayed packets are not informative about a network, especially if far away from the network More needed to explore about how TCP works in real life with respect to timing Making Attacks Go Backwards Fuzzynop FuzzyNop, Mandiant This talk is not about threat attribution (finding who), product solutions, politics, or sales pitches. But who are making these malware threats? It's not a single person or group—they have diverse skill levels. There's a lot of fat-fingered fumblers out there. Always look for low-hanging fruit first: "hiding" malware in the temp, recycle, or root directories creation of unnamed scheduled tasks obvious names of files and syscalls ("ClearEventLog") uncleared event logs. Clearing event log in itself, and time of clearing, is a red flag and good first clue to look for on a suspect system Reverse engineering is hard. Disassembler use takes practice and skill. A popular tool is IDA Pro, but it takes multiple interactive iterations to get a clean disassembly. Key loggers are used a lot in targeted attacks. They are typically custom code or built in a backdoor. A big tip-off is that non-printable characters need to be printed out (such as "[Ctrl]" "[RightShift]") or time stamp printf strings. Look for these in files. Presence is not proof they are used. Absence is not proof they are not used. Java exploits. Can parse jar file with idxparser.py and decomile Java file. Java typially used to target tech companies. Backdoors are the main persistence mechanism (provided externally) for malware. Also malware typically needs command and control. Application of Artificial Intelligence in Ad-Hoc Static Code Analysis John Ashaman John Ashaman, Security Innovation Initially John tried to analyze open source files with open source static analysis tools, but these showed thousands of false positives. Also tried using grep, but tis fails to find anything even mildly complex. So next John decided to write his own tool. His approach was to first generate a call graph then analyze the graph. However, the problem is that making a call graph is really hard. For example, one problem is "evil" coding techniques, such as passing function pointer. First the tool generated an Abstract Syntax Tree (AST) with the nodes created from method declarations and edges created from method use. Then the tool generated a control flow graph with the goal to find a path through the AST (a maze) from source to sink. The algorithm is to look at adjacent nodes to see if any are "scary" (a vulnerability), using heuristics for search order. The tool, called "Scat" (Static Code Analysis Tool), currently looks for C# vulnerabilities and some simple PHP. Later, he plans to add more PHP, then JSP and Java. For more information see his posts in Security Innovation blog and NRefactory on GitHub. Mask Your Checksums—The Gorry Details Eric (XlogicX) Davisson Eric (XlogicX) Davisson Sometimes in emailing or posting TCP/IP packets to analyze problems, you may want to mask the IP address. But to do this correctly, you need to mask the checksum too, or you'll leak information about the IP. Problem reports found in stackoverflow.com, sans.org, and pastebin.org are usually not masked, but a few companies do care. If only the IP is masked, the IP may be guessed from checksum (that is, it leaks data). Other parts of packet may leak more data about the IP. TCP and IP checksums both refer to the same data, so can get more bits of information out of using both checksums than just using one checksum. Also, one can usually determine the OS from the TTL field and ports in a packet header. If we get hundreds of possible results (16x each masked nibble that is unknown), one can do other things to narrow the results, such as look at packet contents for domain or geo information. With hundreds of results, can import as CSV format into a spreadsheet. Can corelate with geo data and see where each possibility is located. Eric then demoed a real email report with a masked IP packet attached. Was able to find the exact IP address, given the geo and university of the sender. Point is if you're going to mask a packet, do it right. Eric wouldn't usually bother, but do it correctly if at all, to not create a false impression of security. Adventures with weird machines thirty years after "Reflections on Trusting Trust" Sergey Bratus Sergey Bratus, Dartmouth College (and Julian Bangert and Rebecca Shapiro, not present) "Reflections on Trusting Trust" refers to Ken Thompson's classic 1984 paper. "You can't trust code that you did not totally create yourself." There's invisible links in the chain-of-trust, such as "well-installed microcode bugs" or in the compiler, and other planted bugs. Thompson showed how a compiler can introduce and propagate bugs in unmodified source. But suppose if there's no bugs and you trust the author, can you trust the code? Hell No! There's too many factors—it's Babylonian in nature. Why not? Well, Input is not well-defined/recognized (code's assumptions about "checked" input will be violated (bug/vunerabiliy). For example, HTML is recursive, but Regex checking is not recursive. Input well-formed but so complex there's no telling what it does For example, ELF file parsing is complex and has multiple ways of parsing. Input is seen differently by different pieces of program or toolchain Any Input is a program input executes on input handlers (drives state changes & transitions) only a well-defined execution model can be trusted (regex/DFA, PDA, CFG) Input handler either is a "recognizer" for the inputs as a well-defined language (see langsec.org) or it's a "virtual machine" for inputs to drive into pwn-age ELF ABI (UNIX/Linux executible file format) case study. Problems can arise from these steps (without planting bugs): compiler linker loader ld.so/rtld relocator DWARF (debugger info) exceptions The problem is you can't really automatically analyze code (it's the "halting problem" and undecidable). Only solution is to freeze code and sign it. But you can't freeze everything! Can't freeze ASLR or loading—must have tables and metadata. Any sufficiently complex input data is the same as VM byte code Example, ELF relocation entries + dynamic symbols == a Turing Complete Machine (TM). @bxsays created a Turing machine in Linux from relocation data (not code) in an ELF file. For more information, see Rebecca "bx" Shapiro's presentation from last year's Toorcon, "Programming Weird Machines with ELF Metadata" @bxsays did same thing with Mach-O bytecode Or a DWARF exception handling data .eh_frame + glibc == Turning Machine X86 MMU (IDT, GDT, TSS): used address translation to create a Turning Machine. Page handler reads and writes (on page fault) memory. Uses a page table, which can be used as Turning Machine byte code. Example on Github using this TM that will fly a glider across the screen Next Sergey talked about "Parser Differentials". That having one input format, but two parsers, will create confusion and opportunity for exploitation. For example, CSRs are parsed during creation by cert requestor and again by another parser at the CA. Another example is ELF—several parsers in OS tool chain, which are all different. Can have two different Program Headers (PHDRs) because ld.so parses multiple PHDRs. The second PHDR can completely transform the executable. This is described in paper in the first issue of International Journal of PoC. Conclusions trusting computers not only about bugs! Bugs are part of a problem, but no by far all of it complex data formats means bugs no "chain of trust" in Babylon! (that is, with parser differentials) we need to squeeze complexity out of data until data stops being "code equivalent" Further information See and langsec.org. USENIX WOOT 2013 (Workshop on Offensive Technologies) for "weird machines" papers and videos.

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

< Previous Page | 5 6 7 8 9