Search Results

Search found 36521 results on 1461 pages for 'aq advanced queue oracle support streams propagation schedule dblink troubleshoo'.

Page 632/1461 | < Previous Page | 628 629 630 631 632 633 634 635 636 637 638 639  | Next Page >

  • ThreadQueue problems in "Accelerated C# 2008"

    - by Singlet
    Example for threading queue book "Accelerated C# 2008" (CrudeThreadPool class) not work correctly. If I insert long job in WorkFunction() on 2-processor machine executing for next task don't run before first is over. How to solve this problem? I want to load the processor to 100 percent public class CrudeThreadPool { static readonly int MAX_WORK_THREADS = 4; static readonly int WAIT_TIMEOUT = 2000; public delegate void WorkDelegate(); public CrudeThreadPool() { stop = 0; workLock = new Object(); workQueue = new Queue(); threads = new Thread[ MAX_WORK_THREADS ]; for( int i = 0; i < MAX_WORK_THREADS; ++i ) { threads[i] = new Thread( new ThreadStart(this.ThreadFunc) ); threads[i].Start(); } } private void ThreadFunc() { lock( workLock ) { int shouldStop = 0; do { shouldStop = Interlocked.Exchange( ref stop, stop ); if( shouldStop == 0 ) { WorkDelegate workItem = null; if( Monitor.Wait(workLock, WAIT_TIMEOUT) ) { // Process the item on the front of the queue lock( workQueue ) { workItem =(WorkDelegate) workQueue.Dequeue(); } workItem(); } } } while( shouldStop == 0 ); } } public void SubmitWorkItem( WorkDelegate item ) { lock( workLock ) { lock( workQueue ) { workQueue.Enqueue( item ); } Monitor.Pulse( workLock ); } } public void Shutdown() { Interlocked.Exchange( ref stop, 1 ); } private Queue workQueue; private Object workLock; private Thread[] threads; private int stop; } public class EntryPoint { static void WorkFunction() { Console.WriteLine( "WorkFunction() called on Thread {0}",Thread.CurrentThread.GetHashCode() ); //some long job double s = 0; for (int i = 0; i < 100000000; i++) s += Math.Sin(i); } static void Main() { CrudeThreadPool pool = new CrudeThreadPool(); for( int i = 0; i < 10; ++i ) { pool.SubmitWorkItem( new CrudeThreadPool.WorkDelegate( EntryPoint.WorkFunction) ); } pool.Shutdown(); } }

    Read the article

  • Question regarding xsd

    - by Hima
    I have an application which reads the data from the database, creates an object out of the data, marshalls it into an xml and enqueue the xml to a queue which is producer. The xml is dequeued from the queue by a consumer. I need to use xsds at two different places. For database access while reading the data from the database and For interaction between producer and consumer Can the same xsd be used in both the cases? Or do I need to use different xsds?

    Read the article

  • C# start a static thread

    - by user595605
    I have a Queue of items I want to process in a thread, and any instance of a class can add items to the Queue to be processed. My idea for doing this is to have a static Thread in the class that processes the items, the only problem is that I don't know where to start this thread, since I can't start it in its initialization. Is there a way I can start a static thread? Or should I be changing the architecture completely?

    Read the article

  • it is very important for me this problem [closed]

    - by davit-datuashvili
    please help this is very important problem for me i am going to get job and need such kind of practise implement heaps priortiy queue and so on what is wrong in my java code please tell i want insert number with heap property and return minimum element what is wrong explain please look http://stackoverflow.com/questions/2902781/priority-queue-implementation/2903288#2903288

    Read the article

  • What is "Disable class based route addition" good for?

    - by id.roppert.dejroppert
    In the advanced TCP/IP settings of a VPN connection, i found a checkbox labeled with "Disable class based route addition". The checkbox is only enabled as long as "Use default gateway on remote network" is switched off. What is "Disable class based route addition" good for? Detailed instructions to find the settings: Open Properties of VPN connection Go to Networking tab Open Properties of "Internet Protocol Version 4 (TCP/IPv4)" (and/or TCP/IPv6) Click "Advanced..." Button Change to "IP Settings" tab Here you can find the checkboxes mentioned above

    Read the article

  • Fields and Properties in Microsoft Word 2007

    - by O_O
    I have added some advanced properties into my Microsoft Word 2007 document. These were created by doing the following: Click the Office button - Prepare - Properties. Under the Document Properties drop-down menu, select Advanced Properties. In the Custom tab, add properties as needed. My question is how do you insert these custom properties into the Word document so that they are in text form and gets updated when you update the properties in that one spot? Thank you!

    Read the article

  • Active Directory: Viewing "Attribute Editor" after finding an account via ADUC's "Find" option

    - by Beaming Mel-Bin
    When I activate the Advanced features (View - Advanced Features) and open a user's properties by navigating to their OU and right clicking the user object, I see the Attribute Editor tab. However, if I search for a user (right click the domain - Find - search for the user), and double click on the user, I do not see the tab. I cannot normally navigate to users because some OUs have too many users. Can someone suggest an alternative that allows me to view the Attribute Editor tab?

    Read the article

  • vmware esx licensing limit on vCPUs per VM

    - by maruti
    when a server has more than 8 cores per CPU (total 16 logical procs) and ESX standard license is applied, what does it mean for VM performance? Since each VM on host is allowed only 4 vCPUs max VMWare ESX/ESXi limits the no of vCPUs per guest VM depending on the license: standard Lic = 4 vCPU Advanced Lic = 4 since i dont know exact number is there need to upgrade to Advanced version for any perf benefits if none of VMs have workloads that need more than 4 vCPUs?

    Read the article

  • Basic question on Internet Connection Sharing

    - by Apps
    I've basic question on Internet Connection Sharing in Windows XP. I've a Internet Connection through LAN and I've created an Ad-hoc network that is supposed to share the internet connection. My question is on which connection should I enable "Internet Connection Sharing"? Is it on LAN Connection'Settings > Advanced of Ad-hoc network Connections' Settings > Advanced ? Can you please help me?

    Read the article

  • Poor performance of single processor 32bit Windows XP xompared SMP in VBA+Excel

    - by Adam Ryczkowski
    Welcome! On many computers I experienced poor performance of 32 bit guests running on 64 bit Linux host (I used only the Debian family). At last I managed to collect benchmark data. I made the benchmark by running custom VBA macro, (which we use in our company) that generates 284 pages long Word document full of Excel Pie charts, tables and comments. The macro is run as a single task (excluding the standard services) on a set of identically configured Windows XP 32-bit systems. I measured the time (in sec.) needed to perform the test. The computer (i.e. my notebook Asus P53E) supports both VT-d extensions and native Windows XP. It has 2-core processor, each core is hyperthreaded, so in total we have 4 mostly independent execution units. I use the latest VirtualBox 4.2 and VMWare Workstation 9.0 for Linux, installed together on the same host (running Mint 13 Maya) but never run simultaneously. The results (in column Time) are no less accurate than ± 10% Here are the results (sorry for the format, but I couldn't find out a better solution for tables in SO): +---------------+-------------+------------------------------------------------------+---------+------------+----------------+------+ | Host software | # processor | Windows kernel | IO APIC | VT-x/AMD-V | 2D Video Accel | Time | +---------------+-------------+------------------------------------------------------+---------+------------+----------------+------+ | VirtualBox | 1 | Advanced Configuration and Power Interface (ACPI) PC | 0 | 1 | 0 | 1139 | | VirtualBox | 1 | Advanced Configuration and Power Interface (ACPI) PC | 0 | 1 | 1 | 1050 | | VirtualBox | 1 | Advanced Configuration and Power Interface (ACPI) PC | 0 | 0 | 1 | 1644 | | VirtualBox | 4 | ACPI Multiprocessor PC | 1 | 1 | 1 | 6809 | | VMWare | 1 | ACPI Uniprocessor PC | | 1 | 1 | 1175 | | VMWare | 4 | ACPI Multiprocessor PC | | 1 | 1 | 3412 | | Native | 4 | ACPI Multiprocessor PC | | | | 1693 | | Native | 1 | Advanced Configuration and Power Interface (ACPI) PC | | | | 1170 | +---------------+-------------+------------------------------------------------------+---------+------------+----------------+------+ Here are the striking conclusions: Although I've read in the VirtualBox fora about abysmal performance with 32-bit guest on 64-bit host, VMWare also has problems compared to native run, still being twice faster(!) than VBox. Although VBA is inherently single-threaded, the Excel calculations, which take much more than a half of total computation time, supposedly aren't. So one would expect some speed gain when running on 2+ cores ("+" for hyperthreading). What we see is a speed loss. And quite big one too. For the VirtualBox the VT-d extension isn't a big deal. Can anyone shed some light on why the singlethreaded Windows kernel is so much faster than the SMP one?

    Read the article

  • How does Bittorrent work?

    - by mumtaz
    I want to learn more about the bittorrent way of file sharing. I am a technically advanced user (programmer), so technically advanced material is no problem, but it should be concise and to the point. I need a good resource book/web which explains the overall bittorrent architecture. I am not interested in details, just the overall architecture and the terminology like seeds, peers, etc. Any suggestions?

    Read the article

  • SQL server in VMware

    - by UndertheFold
    Please provide your tips and best practices for virtualizing SQL Server in VMWare ESX I am interested in advanced configurations and settings. Please provide reasoning behind your recommendations Edit: Just to clarify, I already have over 70 Virtual SQL servers in separate clusters using an ISCSI equallogic San - What I am really looking for are those advanced configurations like: How you configured your disks / RDM's Do you make use of settings like Mem.ShareScanGHz - http://communities.vmware.com/thread/143828 - that are not well documented

    Read the article

  • What is "Disable class based route addition" good for?

    - by JRoppert
    In the advanced TCP/IP settings of a VPN connection, i found a checkbox labeled with "Disable class based route addition". The checkbox is only enabled as long as "Use default gateway on remote network" is switched off. What is "Disable class based route addition" good for? Detailed instructions to find the settings: Open Properties of VPN connection Go to Networking tab Open Properties of "Internet Protocol Version 4 (TCP/IPv4)" (and/or TCP/IPv6) Click "Advanced..." Button Change to "IP Settings" tab Here you can find the checkboxes mentioned above

    Read the article

  • UPK 3.6.1 (is Coming)

    - by marc.santosusso
    In anticipation of the release of UPK 3.6.1, I'd like to briefly describe some of the features that will be available in this new version. Topic Editor in Tabs Topic Editors now open in tabs instead of separate Developer windows. This offers several improvements: First, the bubble editor can be docked and resized in the same way as other editor panes. That's right, you can resize the bubble editor! The second enhancement that this changes brings is an improved undo and redo which allows each action to be undone and redone in the Topic Editor. New Sound Editor The topic and web page editors include a new sound editor with all the bells and whistles necessary to record, edit, import, and export, sound. Sound can be captured during topic recording--which is great for a Subject Matter Expert (SME) to narrate what they're recording--or after the topic has been recorded. Sound can also be added to web pages and played on the concept panes of modules, sections and topics. Turn off bubbles in Topics Authors may opt to hide bubbles either per frame or for an entire topic. When you want to draw a user's attention to the content on the screen instead of the bubble. This feature works extremely well in conjunction with the new sound capabilities. For instance, consider recording conceptual information with narration and no bubbles. Presentation Output UPK content can be published as a Presentation in Microsoft PowerPoint format. Publishing for Presentation will create a presentation for each topic published. The presentation template can be customized Using the same methods offered for the UPK document outputs, allowing your UPK-generated presentations to match your corporate branding. Autosave and Recovery The Developer will automatically save your work as often as you would like. This affords authors the ability to recover these automatically saved documents if their system or UPK were to close unexpectedly. The Developer defaults to save open documents every ten minutes. Package Editor Enhancement Files in packages will now open in the associated application when double-clicked. Authors can also choose to "Open with..." from the context menu (AKA right click menu.) See It! Window See It! mode may now be launched in a non-fullscreen window. This is available from the kp.html file in any Player package. This version of See It! mode offers on-screen navigation controls including previous frame, next frame, pause etc. Firefox Enhancments The UPK Player will now offer both Do It! mode and sound playback when viewed using Firefox web browser. Player Support for Safari The UPK Player is now fully supported on the Safari web browser for both Mac OS and Windows platforms. Keep document checked out Authors may choose to keep a document checked out when performing a check in. This allows an author to have a new version created on the server and continue editing. Close button on individual tabs A close button has been added to the tabs making it easier to close a specific tab. Outline Editor Enhancements Authors will have the option to prevent concepts from immediately displaying in the Developer when an outline item is selected. This makes it faster to move around in the outline editor. Tell us which feature you're most excited to use in the comments.

    Read the article

  • Pie Charts Just Don't Work When Comparing Data - Number 10 of Top 10 Reasons to Never Ever Use a Pie

    - by Tony Wolfram
    When comparing data, which is what a pie chart is for, people have a hard time judging the angles and areas of the multiple pie slices in order to calculate how much bigger one slice is than the others. Pie Charts Don't Work A slice of pie is good for serving up a portion of desert. It's not good for making a judgement about how big the slice is, what percentage of 100 it is, or how it compares to other slices. People have trouble comparing angles and areas to each other. Controlled studies show that people will overestimate the percentage that a pie slice area represents. This is because we have trouble calculating the area based on the space between the two angles that define the slice. This picture shows how a pie chart is useless in determing the largest value when you have to compare pie slices.   You can't compare angles and slice areas to each other. Human perception and cognition is poor when viewing angles and areas and trying to make a mental comparison. Pie charts overload the working memory, forcing the person to make complicated calculations, and at the same time make a decision based on those comparisons. What's the point of showing a pie chart when you want to compare data, except to say, "well, the slices are almost the same, but I'm not really sure which one is bigger, or by how much, or what order they are from largest to smallest. But the colors sure are pretty. Plus, I like round things. Oh,was I suppose to make some important business decision? Sorry." Bad Choices and Bad Decisions Interaction Designers, Graphic Artists, Report Builders, Software Developers, and Executives have all made the decision to use pie charts in their reports, software applications, and dashboards. It was a bad decision. It was a poor choice. There are always better options and choices, yet the designer still made the decision to use a pie chart. I'll expore why people make such poor choices in my upcoming blog entires. (Hint: It has more to do with emotions than with analytical thinking.) I've outlined my opinions and arguments about the evils of using pie charts in "Countdown of Top 10 Reasons to Never Ever Use a Pie Chart." Each of my next 10 blog entries will support these arguments with illustrations, examples, and references to studies. But my goal is not to continuously and endlessly rage against the evils of using pie charts. This blog is not about pie charts. This blog is about understanding why designers choose to use a pie chart. Why, when give better alternatives, and acknowledging the shortcomings of pie charts, do designers over and over again still freely choose to place a pie chart in a report? As an extra treat and parting shot, check out the nice pie chart that Wikipedia uses to illustrate the United States population by state.   Remember, somebody chose to use this pie chart, with all its glorious colors, and post it on Wikipedia for all the world to see. My next blog will give you a better alternative for displaying comparable data - the sorted bar chart.

    Read the article

  • BYOD is not a fashion statement; it’s an architectural shift - by Indus Khaitan

    - by Greg Jensen
    Ten years ago, if you asked a CIO, “how mobile is your enterprise?”. The answer would be, “100%, we give Blackberry to all our employees.”Few things have changed since then: 1.    Smartphone form-factors have matured, especially after the launch of iPhone. 2.    Rapid growth of productivity applications and services that enable creation and consumption of digital content 3.    Pervasive mobile data connectivityThere are two threads emerging from the change. Users are rapidly mingling their personas of an individual as well as an employee. In the first second, posting a picture of a fancy dinner on Facebook, to creating an expense report for the same meal on the mobile device. Irrespective of the dual persona, a user’s personal and corporate lives intermingle freely on a single hardware and more often than not, it’s an employees personal smartphone being used for everything. A BYOD program enables IT to “control” an employee owned device, while enabling productivity. More often than not the objective of BYOD programs are financial; instead of the organization, an employee pays for it.  More than a fancy device, BYOD initiatives have become sort of fashion statement, of corporate productivity, of letting employees be in-charge and a show of corporate empathy to not force an archaic form-factor in a world of new device launches every month. BYOD is no longer a means of effectively moving expense dollars and support costs. It does not matter who owns the device, it has to be protected.  BYOD brings an architectural shift.  BYOD is an architecture, which assumes that every device is vulnerable, not just what your employees have brought but what organizations have purchased for their employees. It's an architecture, which forces us to rethink how to provide productivity without comprising security.Why assume that every device is vulnerable? Mobile operating systems are rapidly evolving with leading upgrade announcement every other month. It is impossible for IT to catch-up. More than that, user’s are savvier than earlier.  While IT could install locks at the doors to prevent intruders, it may degrade productivity—which incentivizes user’s to bypass restrictions. A rapidly evolving mobile ecosystem have moving parts which are vulnerable. Hence, creating a mobile security platform, which uses the fundamental blocks of BYOD architecture such as identity defragmentation, IT control and data isolation, ensures that the sprawl of corporate data is contained. In the next post, we’ll dig deeper into the BYOD architecture. Normal 0 false false false EN-US JA X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Cambria","serif"; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin;}

    Read the article

  • Integrating NetBeans for Raspberry Pi Java Development

    - by speakjava
    Raspberry Pi IDE Java Development The Raspberry Pi is an incredible device for building embedded Java applications but, despite being able to run an IDE on the Pi it really pushes things to the limit.  It's much better to use a PC or laptop to develop the code and then deploy and test on the Pi.  What I thought I'd do in this blog entry was to run through the steps necessary to set up NetBeans on a PC for Java code development, with automatic deployment to the Raspberry Pi as part of the build process. I will assume that your starting point is a Raspberry Pi with an SD card that has one of the latest Raspbian images on it.  This is good because this now includes the JDK 7 as part of the distro, so no need to download and install a separate JDK.  I will also assume that you have installed the JDK and NetBeans on your PC.  These can be downloaded here. There are numerous approaches you can take to this including mounting the file system from the Raspberry Pi remotely on your development machine.  I tried this and I found that NetBeans got rather upset if the file system disappeared either through network interruption or the Raspberry Pi being turned off.  The following method uses copying over SSH, which will fail more gracefully if the Pi is not responding. Step 1: Enable SSH on the Raspberry Pi To run the Java applications you create you will need to start Java on the Raspberry Pi with the appropriate class name, classpath and parameters.  For non-JavaFX applications you can either do this from the Raspberry Pi desktop or, if you do not have a monitor connected through a remote command line.  To execute the remote command line you need to enable SSH (a secure shell login over the network) and connect using an application like PuTTY. You can enable SSH when you first boot the Raspberry Pi, as the raspi-config program runs automatically.  You can also run it at any time afterwards by running the command: sudo raspi-config This will bring up a menu of options.  Select '8 Advanced Options' and on the next screen select 'A$ SSH'.  Select 'Enable' and the task is complete. Step 2: Configure Raspberry Pi Networking By default, the Raspbian distribution configures the ethernet connection to use DHCP rather than a static IP address.  You can continue to use DHCP if you want, but to avoid having to potentially change settings whenever you reboot the Pi using a static IP address is simpler. To configure this on the Pi you need to edit the /etc/network/interfaces file.  You will need to do this as root using the sudo command, so something like sudo vi /etc/network/interfaces.  In this file you will see this line: iface eth0 inet dhcp This needs to be changed to the following: iface eth0 inet static     address 10.0.0.2     gateway 10.0.0.254     netmask 255.255.255.0 You will need to change the values in red to an appropriate IP address and to match the address of your gateway. Step 3: Create a Public-Private Key Pair On Your Development Machine How you do this will depend on which Operating system you are using: Mac OSX or Linux Run the command: ssh-keygen -t rsa Press ENTER/RETURN to accept the default destination for saving the key.  We do not need a passphrase so simply press ENTER/RETURN for an empty one and once more to confirm. The key will be created in the file .ssh/id_rsa.pub in your home directory.  Display the contents of this file using the cat command: cat ~/.ssh/id_rsa.pub Open a window, SSH to the Raspberry Pi and login.  Change directory to .ssh and edit the authorized_keys file (don't worry if the file does not exist).  Copy and paste the contents of the id_rsa.pub file to the authorized_keys file and save it. Windows Since Windows is not a UNIX derivative operating system it does not include the necessary key generating software by default.  To generate the key I used puttygen.exe which is available from the same site that provides the PuTTY application, here. Download this and run it on your Windows machine.  Follow the instructions to generate a key.  I remove the key comment, but you can leave that if you want. Click "Save private key", confirm that you don't want to use a passphrase and select a filename and location for the key. Copy the public key from the part of the window marked, "Public key for pasting into OpenSSH authorized_keys file".  Use PuTTY to connect to the Raspberry Pi and login.  Change directory to .ssh and edit the authorized_keys file (don't worry if this does not exist).  Paste the key information at the end of this file and save it. Logout and then start PuTTY again.  This time we need to create a saved session using the private key.  Type in the IP address of the Raspberry Pi in the "Hostname (or IP address)" field and expand "SSH" under the "Connection" category.  Select "Auth" (see the screen shot below). Click the "Browse" button under "Private key file for authentication" and select the file you saved from puttygen. Go back to the "Session" category and enter a short name in the saved sessions field, as shown below.  Click "Save" to save the session. Step 4: Test The Configuration You should now have the ability to use scp (Mac/Linux) or pscp.exe (Windows) to copy files from your development machine to the Raspberry Pi without needing to authenticate by typing in a password (so we can automate the process in NetBeans).  It's a good idea to test this using something like: scp /tmp/foo [email protected]:/tmp on Linux or Mac or pscp.exe foo pi@raspi:/tmp on Windows (Note that we use the saved configuration name instead of the IP address or hostname so the public key is picked up). pscp.exe is another tool available from the creators of PuTTY. Step 5: Configure the NetBeans Build Script Start NetBeans and create a new project (or open an existing one that you want to deploy automatically to the Raspberry Pi). Select the Files tab in the explorer window and expand your project.  You will see a build.xml file.  Double click this to edit it. This file will mostly be comments.  At the end (but within the </project> tag) add the XML for <target name="-post-jar">, shown below Here's the code again in case you want to use cut-and-paste: <target name="-post-jar">   <echo level="info" message="Copying dist directory to remote Pi"/>   <exec executable="scp" dir="${basedir}">     <arg line="-r"/>     <arg value="dist"/>     <arg value="[email protected]:NetBeans/CopyTest"/>   </exec>  </target> For Windows it will be slightly different: <target name="-post-jar">   <echo level="info" message="Copying dist directory to remote Pi"/>   <exec executable="C:\pi\putty\pscp.exe" dir="${basedir}">     <arg line="-r"/>     <arg value="dist"/>     <arg value="pi@raspi:NetBeans/CopyTest"/>   </exec> </target> You will also need to ensure that pscp.exe is in your PATH (or specify a fully qualified pathname). From now on when you clean and build the project the dist directory will automatically be copied to the Raspberry Pi ready for testing.

    Read the article

  • Instant Rename and Rename Refactoring

    - by Petr
    During the last weeks I have got  a few questions about rename refactoring and some users also complain to me that the refactoring in NetBeans 6.x was much faster. So I would like to explain the situation. For some people, who don't know, Instant Rename action and Rename Refactoring  can look like one action. But it's not true, even if  both actions use the same shortcut (CTRL + R). NetBeans 6.x contained only Instant Rename action (speaking about PHP support), which we can mark as very simple rename refactoring through one file. From NetBeans 7.0 the Instant Rename action works only in "non public" context. It means that this action is used for fast renaming variables that has local context like inside a method, or for renaming private methods and fields that can not be used outside of the scope, where they are declared. From user point of view these two action can be simply recognized. When is after CTRL+R called Instant Rename action, then the identifier is surrounded with rectangle and you can rename it directly in the file. It's fast and simple, also the usages of this identifier are renamed in the same time as you write. The picture below shows Instant Rename action for $message identifier, that is visible only in the print_test method and due this after CTRL+R is called Instant Rename. In NetBeans 7.0, there was added Rename Refactoring that is called for public identifiers. It means for identifiers that could be used in other files. If you press CTRL+R shortcut when the caret is inside $hello identifier from the picture above, NetBeans recognizes that $hello is declared / used in a global context and calls the Rename Refactoring that brings a dialog to change the name of the identifier. From this dialog you have to preview suggested changes, through pressing Preview button and then execute the refactoring through Do Refactoring button. Yes, it's more complicated from user point of view than Instant Rename, but in Rename Refactoring NetBeans can change more files at once. It should be  the developer responsibility to decide whether the suggested changes are right and the refactoring can be executed or in some files original name should be kept. Someone can argue that he doesn't use $hello variable in any other file so Instant Rename could be used in such case. Yes it's true, but in such case NetBeans has to know all usages of all identifiers and keep this informations up to date during editing a file. I'm sure that this is not possible due to the performance problems, mainly for big projects. So the usages are computed after pressing the Preview button. And why is the Refactor button always disabled in the Rename dialog and user has to always go through the preview phase? NetBeans has API and SPI for implementing refactoring actions and this dialog is a part of this infrastructure. If you rename an identifier for example in Java, the Refactor buttons is enabled, but Java is strongly type language and you can be almost in 99% sure that the IDE will suggest the right results. In PHP as a dynamic language, we can not be sure, what NetBeans finds is only a "guess". This is why NetBeans pushes developers to preview the changes for PHP rename. I hope that I have explain it clearly. I'm open to any discussion. What I have described above is situation in NetBeans 7.0, 7.0.1 and probably it will be also in NetBeans 7.1, because there is no plan to change it. Please write your opinion here.

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

  • CSV Anyone?

    - by Tim Dexter
    CSV, a dead format? With today's abilities to generate Excel output (Im working on some posts for the shiny new Excel templates) and the sophistication of the eText outputs, do you really need CSV? I guess so seeing as it was added after the base product was released as an enhancement. But I'd be interested in hearing use cases? I used to think the same of text output. But now Im working in the public sector where older technologies abound I can see a use for text output. Its also used as a machine readable format, etc. BIP still does not support a text format from its RTF templates but its very doable using XSL templates. Back to CSV, I noticed an enhancement in the big list of new stuff in the 10.1.3.4.1 rollup. You can now generate CSV from structured data ie a data template or a SQL XML query. This piqued my interest so I ran off a data template to CSV. Its interesting, in a geeky kinda way, I guess Im that way inclined.Here's my data structure:<EMPLOYEES> <LIST_G_DEPT> <G_DEPT>  <DEPARTMENT_ID>10</DEPARTMENT_ID>  <DEPARTMENT_NAME>Administration</DEPARTMENT_NAME>  <LIST_G_EMP>   <G_EMP>     <EMPLOYEE_ID>200</EMPLOYEE_ID>     <EMP_NAME>Jennifer Whalen</EMP_NAME>     <EMAIL>JWHALEN</EMAIL>     <PHONE_NUMBER>515.123.4444</PHONE_NUMBER>     <HIRE_DATE>1987-09-17T00:00:00.000-06:00</HIRE_DATE>     <SALARY>4400</SALARY>    </G_EMP>   </LIST_G_EMP>  <TOTAL_EMPS>1</TOTAL_EMPS>  <TOTAL_SALARY>4400</TOTAL_SALARY>  <AVG_SALARY>4400</AVG_SALARY>  <MAX_SALARY>4400</MAX_SALARY>  <MIN_SALARY>4400</MIN_SALARY> </G_DEPT>Poor ol Jennifer, she needs some help in the Admin department :0)Pushing this to CSV and BP completely flattens the data out so you get a completely denormalized data output in the CSV.TOTAL_SALARY,DEPARTMENT_ID,DEPARTMENT_NAME,MIN_SALARY,AVG_SALARY,MAX_SALARY,EMPLOYEE_ID,SALARY,PHONE_NUMBER,HIRE_DATE,EMP_NAME,EMAIL,TOTAL_EMPS4400,10,Administration,4400,4400,4400,200,4400,515.123.4444,1987-09-17T00:00:00.000-06:00,"Jennifer Whalen",JWHALEN,119000,20,Marketing,6000,9500,13000,201,13000,515.123.5555,1996-02-17T00:00:00.000-07:00,"Michael Hartstein",MHARTSTE,219000,20,Marketing,6000,9500,13000,202,6000,603.123.6666,1997-08-17T00:00:00.000-06:00,"Pat Fay",PFAY,2Useful? Its a little confusing to start with but it is completely de-normalized so there is lots of repetition. But if it floats your boat, you got it!

    Read the article

  • What's up with LDoms: Part 9 - Direct IO

    - by Stefan Hinker
    In the last article of this series, we discussed the most general of all physical IO options available for LDoms, root domains.  Now, let's have a short look at the next level of granularity: Virtualizing individual PCIe slots.  In the LDoms terminology, this feature is called "Direct IO" or DIO.  It is very similar to root domains, but instead of reassigning ownership of a complete root complex, it only moves a single PCIe slot or endpoint device to a different domain.  Let's look again at hardware available to mars in the original configuration: root@sun:~# ldm ls-io NAME TYPE BUS DOMAIN STATUS ---- ---- --- ------ ------ pci_0 BUS pci_0 primary pci_1 BUS pci_1 primary pci_2 BUS pci_2 primary pci_3 BUS pci_3 primary /SYS/MB/PCIE1 PCIE pci_0 primary EMP /SYS/MB/SASHBA0 PCIE pci_0 primary OCC /SYS/MB/NET0 PCIE pci_0 primary OCC /SYS/MB/PCIE5 PCIE pci_1 primary EMP /SYS/MB/PCIE6 PCIE pci_1 primary EMP /SYS/MB/PCIE7 PCIE pci_1 primary EMP /SYS/MB/PCIE2 PCIE pci_2 primary EMP /SYS/MB/PCIE3 PCIE pci_2 primary OCC /SYS/MB/PCIE4 PCIE pci_2 primary EMP /SYS/MB/PCIE8 PCIE pci_3 primary EMP /SYS/MB/SASHBA1 PCIE pci_3 primary OCC /SYS/MB/NET2 PCIE pci_3 primary OCC /SYS/MB/NET0/IOVNET.PF0 PF pci_0 primary /SYS/MB/NET0/IOVNET.PF1 PF pci_0 primary /SYS/MB/NET2/IOVNET.PF0 PF pci_3 primary /SYS/MB/NET2/IOVNET.PF1 PF pci_3 primary All of the "PCIE" type devices are available for SDIO, with a few limitations.  If the device is a slot, the card in that slot must support the DIO feature.  The documentation lists all such cards.  Moving a slot to a different domain works just like moving a PCI root complex.  Again, this is not a dynamic process and includes reboots of the affected domains.  The resulting configuration is nicely shown in a diagram in the Admin Guide: There are several important things to note and consider here: The domain receiving the slot/endpoint device turns into an IO domain in LDoms terminology, because it now owns some physical IO hardware. Solaris will create nodes for this hardware under /devices.  This includes entries for the virtual PCI root complex (pci_0 in the diagram) and anything between it and the actual endpoint device.  It is very important to understand that all of this PCIe infrastructure is virtual only!  Only the actual endpoint devices are true physical hardware. There is an implicit dependency between the guest owning the endpoint device and the root domain owning the real PCIe infrastructure: Only if the root domain is up and running, will the guest domain have access to the endpoint device. The root domain is still responsible for resetting and configuring the PCIe infrastructure (root complex, PCIe level configurations, error handling etc.) because it owns this part of the physical infrastructure. This also means that if the root domain needs to reset the PCIe root complex for any reason (typically a reboot of the root domain) it will reset and thus disrupt the operation of the endpoint device owned by the guest domain.  The result in the guest is not predictable.  I recommend to configure the resulting behaviour of the guest using domain dependencies as described in the Admin Guide in Chapter "Configuring Domain Dependencies". Please consult the Admin Guide in Section "Creating an I/O Domain by Assigning PCIe Endpoint Devices" for all the details! As you can see, there are several restrictions for this feature.  It was introduced in LDoms 2.0, mainly to allow the configuration of guest domains that need access to tape devices.  Today, with the higher number of PCIe root complexes and the availability of SR-IOV, the need to use this feature is declining.  I personally do not recommend to use it, mainly because of the drawbacks of the depencies on the root domain and because it can be replaced with SR-IOV (although then with similar limitations). This was a rather short entry, more for completeness.  I believe that DIO can usually be replaced by SR-IOV, which is much more flexible.  I will cover SR-IOV in the next section of this blog series.

    Read the article

  • Isis Finally Rolls Out

    - by David Dorf
    Google has rolled their wallet out for several chains; I see the NFC readers in Walgreen's when I'm sent their for milk.  But Isis has been relatively quiet until now.  As of last week they have finally launched in their two test cities: Austin, and Salt Lake City.  Below are the supported carriers and phones as of now, but more phones will be added later. Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} AT&T supports: HTC One™ X, LG Escape™, Samsung Galaxy Exhilarate™, Samsung Galaxy S® III, Samsung Galaxy Rugby Pro™ T-Mobile supports: Samsung Galaxy S® II, Samsung Galaxy S® III, Samsung Galaxy S® Relay 4G Verizon supports: Droid Incredible 4G LTE. Of course iPhone owners have no wallet since Apple didn't included an NFC chip. To start using Isis, you have to take your NFC-capable phone to your carrier's store to get the SIM replaced with a more sophisticated one that has a secure element configured for Isis.  The "secure element" is the cryptographic logic that secures mobile payments.  Carriers like the secure element in the SIM while non-carriers (like Google) prefer the secure element in the phone's electronics. (I'm not entirely sure if you could support both Isis and Google Wallet on the same phone.  Anybody know?) Then you can download the Isis app from Google Play and load your cards.  Most credit cards are supported, and there's a process to verify the credit cards are valid.  Then you can select from the list of participating retailers to "follow."  Selecting a retailer allows that retailer to give you offers via the app. The app is well done and easy to use.  You can select a default payment type and also switch between them easily.  When the phone is tapped on the reader, there are two exchanges of information.  The payment information is transferred, and then the Isis "SmartTap" information which includes optional loyalty number and digital coupons.  Of course the value of mobile wallets comes from the ease of handling all three data types (i.e. payment, loyalty, offers). There are several advertisements for Isis running now, and my favorite is below.

    Read the article

  • Welcome To The Nashorn Blog

    - by jlaskey
    Welcome to all.  Time to break the ice and instantiate The Nashorn Blog.  I hope to contribute routinely, but we are very busy, at this point, preparing for the next development milestone and, of course, getting ready for open source. So, if there are long gaps between postings please forgive. We're just coming back from JavaOne and are stoked by the positive response to all the Nashorn sessions. It was great for the team to have the front and centre slide from Georges Saab early in the keynote. It seems we have support coming from all directions. Most of the session videos are posted. Check out the links. Nashorn: Optimizing JavaScript and Dynamic Language Execution on the JVM. Unfortunately, Marcus - the code generation juggernaut,  got saddled with the first session of the first day. Still, he had a decent turnout. The talk focused on issues relating to optimizations we did to get good performance from the JVM. Much yet to be done but looking good. Nashorn: JavaScript on the JVM. This was the main talk about Nashorn. I delivered the little bit of this and a little bit of that session with an overview, a follow up on the open source announcement, a run through a few of the Nashorn features and some demos. The room was SRO, about 250±. High points: Sam Pullara, from Twitter, came forward to describe how painless it was to get Mustache.js up and running (20x over Rhino), and,  John Ceccarelli, from NetBeans came forward to describe how Nashorn has become an integral part of Netbeans. A healthy Q & A at the end was very encouraging. Meet the Nashorn JavaScript Team. Michel, Attila, Marcus and myself hosted a Q & A. There was only a handful of people in the room (we assume it was because of a conflicting session ;-) .) Most of the questions centred around Node.jar, which leads me to believe, Nashorn + Node.jar is what has the most interest. Akhil, Mr. Node.jar, sitting in the audience, fielded the Node.jar questions. Nashorn, Node, and Java Persistence. Doug Clarke, Akhil and myself, discussed the title topics, followed by a lengthy Q & A (security had to hustle us out.) 80 or so in the room. Lots of questions about Node.jar. It was great to see Doug's use of Nashorn + JPA. Nashorn in action, with such elegance and grace. Putting the Metaobject Protocol to Work: Nashorn’s Java Bindings. Attila discussed how he applied Dynalink to Nashorn. Good turn out for this session as well. I have a feeling that once people discover and embrace this hidden gem, great things will happen for all languages running on the JVM. Finally, there were quite a few JavaOne sessions that focused on non-Java languages and their impact on the JVM. I've always believed that one's tool belt should carry a variety of programming languages, not just for domain/task applicability, but also to enhance your thinking and approaches to problem solving. For the most part, future blog entries will focus on 'how to' in Nashorn, but if you have any suggestions for topics you want discussed, please drop a line.  Cheers. 

    Read the article

  • Juggling with JDKs on Apple OS X

    - by Blueberry Coder
    I recently got a shiny new MacBook Pro to help me support our ADF Mobile customers. It is really a wonderful piece of hardware, although I am still adjusting to Apple's peculiar keyboard layout. Did you know, for example, that the « delete » key actually performs a « backspace »? But I disgress... As you may know, ADF Mobile development still requires JDeveloper 11gR2, which in turn runs on Java 6. On the other hand, JDeveloper 12c needs JDK 7. I wanted to install both versions, and wasn't sure how to do it.   If you remember, I explained in a previous blog entry how to install JDeveloper 11gR2 on Apple's OS X. The trick was to use the /usr/libexec/java_home command in order to invoke the proper JDK. In this case, I could have done the same thing; the two JDKs can coexist without any problems, since they install in completely different locations. But I wanted more than just installing JDeveloper. I wanted to be able to select my JDK when using the command line as well. On Windows, this is easy, since I keep all my JDKs in a central location. I simply have to move to the appropriate folder or type the folder name in the command I want to execute. Problem is, on OS X, the paths to the JDKs are... let's say convoluted.  Here is the one for Java 6. /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home The Java 7 path is not better, just different. /Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home Intuitive, isn't it? Clearly, I needed something better... On OS X, the default command shell is bash. It is possible to configure the shell environment by creating a file named « .profile » in a user's home folder. Thus, I created such a file and put the following inside: export JAVA_7_HOME=$(/usr/libexec/java_home -v1.7) export JAVA_6_HOME=$(/usr/libexec/java_home -v1.6) export JAVA_HOME=$JAVA_7_HOME alias java6='export JAVA_HOME=$JAVA_6_HOME' alias java7='export JAVA_HOME=$JAVA_7_HOME'  The first two lines retrieve the current paths for Java 7 and Java 6 and store them in two environment variables. The third line marks Java 7 as the default. The last two lines create command aliases. Thus, when I type java6, the value for JAVA_HOME is set to JAVA_6_HOME, for example.  I now have an environment which works even better than the one I have on Windows, since I can change my active JDK on a whim. Here a sample, fresh from my terminal window. fdesbien-mac:~ fdesbien$ java6 fdesbien-mac:~ fdesbien$ java -version java version "1.6.0_65" Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609) Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode) fdesbien-mac:~ fdesbien$ fdesbien-mac:~ fdesbien$ java7 fdesbien-mac:~ fdesbien$ java -version java version "1.7.0_45" Java(TM) SE Runtime Environment (build 1.7.0_45-b18) Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode) fdesbien-mac:~ fdesbien$ Et voilà! Maximum flexibility without downsides, just I like it. 

    Read the article

< Previous Page | 628 629 630 631 632 633 634 635 636 637 638 639  | Next Page >