Search Results

Search found 25518 results on 1021 pages for 'iterative development'.

Page 635/1021 | < Previous Page | 631 632 633 634 635 636 637 638 639 640 641 642  | Next Page >

  • Rails Passenger Nginx cannot load such file -- bundler

    - by Stuart
    I have set up Rails, Passenger, nginx, and PostgreSQL on Ubuntu Server 12.04LTS. Upon trying to access the application/website, however, I am greeted with an error page saying that the application could not be started because a source file is missing. Error message: cannot load such file -- bundler. My nginx config (/opt/nginx/conf/nginx.conf): user railsapp; worker_processes 1; events { worker_connections 1024; } http { include mime.types; default_type application/octet-stream; sendfile on; keepalive_timeout 65; passenger_root /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14; passenger_ruby /home/railsapp/.rvm/rubies/ruby-1.9.3-p194/bin/ruby; server { listen 80; server_name fitness_schedules.local; root /home/railsapp/fitness_schedules/public; passenger_enabled on; rack_env development; } } Here is the error message: A source file that the application requires, is missing. It is possible that you didn't upload your application files correctly. Please check whether all your application files are uploaded. A required library may not installed. Please install all libraries that this application requires. Further information about the error may have been written to the application's log file. Please check it in order to analyse the problem. Error message: cannot load such file -- bundler Exception class: LoadError Application root: /home/railsapp/fitness_schedules Here is the backtrace from the webpage that is presented by nginx: Backtrace: # File Line Location 0 /home/railsapp/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb 36 in `require' 1 /home/railsapp/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb 36 in `require' 2 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/lib/phusion_passenger/utils.rb 325 in `prepare_app_process' 3 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/lib/phusion_passenger/rack/application_spawner.rb 156 in `block in initialize_server' 4 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/lib/phusion_passenger/utils.rb 563 in `report_app_init_status' 5 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/lib/phusion_passenger/rack/application_spawner.rb 154 in `initialize_server' 6 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/lib/phusion_passenger/abstract_server.rb 204 in `start_synchronously' 7 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/lib/phusion_passenger/abstract_server.rb 180 in `start' 8 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/lib/phusion_passenger/rack/application_spawner.rb 129 in `start' 9 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/lib/phusion_passenger/spawn_manager.rb 253 in `block (2 levels) in spawn_rack_application' 10 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/lib/phusion_passenger/abstract_server_collection.rb 132 in `lookup_or_add' 11 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/lib/phusion_passenger/spawn_manager.rb 246 in `block in spawn_rack_application' 12 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/lib/phusion_passenger/abstract_server_collection.rb 82 in `block in synchronize' 13 prelude> 10:in `synchronize' 14 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/lib/phusion_passenger/abstract_server_collection.rb 79 in `synchronize' 15 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/lib/phusion_passenger/spawn_manager.rb 244 in `spawn_rack_application' 16 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/lib/phusion_passenger/spawn_manager.rb 137 in `spawn_application' 17 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/lib/phusion_passenger/spawn_manager.rb 275 in `handle_spawn_application' 18 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/lib/phusion_passenger/abstract_server.rb 357 in `server_main_loop' 19 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/lib/phusion_passenger/abstract_server.rb 206 in `start_synchronously' 20 /home/railsapp/.rvm/gems/ruby-1.9.3-p194/gems/passenger-3.0.14/helper-scripts/passenger-spawn-server 99 in `' In ~/fitness_schedules/log there are only development and test logs, no production/development logs.

    Read the article

  • register_globals error in php

    - by user145862
    I was stuck up with the error directive 'register_globals' is no longer available in PHP in unknown on line 0 when tried to check the php version using "php -v" after enabling register_globals in php.ini file. I am not getting any php version info by doing so. Instead it throws the above mentioned error.After turning off this option, php info works quite well. It is very essential for me to have register_globals to be turned on.How can I have this corrected. my php.ini is as follows: ; Default Value: None ; Development Value: "GP" ; Production Value: "GP" ; http://php.net/request-order request_order = "GP" ; Whether or not to register the EGPCS variables as global variables. You may ; want to turn this off if you don't want to clutter your scripts' global scope ; with user data. ; You should do your best to write your scripts so that they do not require ; register_globals to be on; Using form variables as globals can easily lead ; to possible security problems, if the code is not very well thought of. ; register_globals = On ; Determines whether the deprecated long $HTTP_*_VARS type predefined variables ; are registered by PHP or not. As they are deprecated, we obviously don't ; recommend you use them. They are on by default for compatibility reasons but ; they are not recommended on production servers. ; Default Value: On ; Development Value: Off ; Production Value: Off ; register_long_arrays = Off ; This directive determines whether PHP registers $argv & $argc each time it ; runs. $argv contains an array of all the arguments passed to PHP when a script ; is invoked. $argc contains an integer representing the number of arguments ; that were passed when the script was invoked. These arrays are extremely ; useful when running scripts from the command line. When this directive is ; enabled, registering these variables consumes CPU cycles and memory each time ; a script is executed. For performance reasons, this feature should be disabled ; on production servers. ; Note: This directive is hardcoded to On for the CLI SAPI ; Default Value: On ; Development Value: Off ; Production Value: Off ; register_argc_argv = Off ; When enabled, the SERVER and ENV variables are created when they're first ; used (Just In Time) instead of when the script starts. If these variables ; are not used within a script, having this directive on will result in a ; performance gain. The PHP directives register_globals, register_long_arrays, ; and register_argc_argv must be disabled for this directive to have any affect. ; auto_globals_jit = On

    Read the article

  • How to restore Linode to Vagrant VM?

    - by Iain Elder
    I'm trying to set up a Linux development environment so I can safely make changes to my website without breaking the live site. Linode hosts my live site. A simple solution would be to host my development server on Linode as well, but I want to avoid doubling my hosting costs. The cheapest way I see is to use Vagrant on my Windows workstation to host my development environment. After I attempt to restore the backup to Vagrant and reboot the VM, I can no longer ssh into the Vagrant host. It's probably because by restoring the backup I overwrite some special Vagrant configuration, but I'm not sure how to avoid that. How do I make this approach work? If my approach is fundamentally wrong, can you suggest an alternative? Creating the backup On the Linode I used these commands to create a compressed copy of the entire filesystem, while ignoring things that shouldn't be included in the backup: $ sudo rsync -ahvz --exclude={/dev/*,/proc/*,/sys/*,/tmp/*,/run/*,/mnt/*,/backup/*} /* /backup/2 $ sudo tar -czf /backup/2.gz /backup/2 The backup file is called 2.gz because this is thesecond backup. The first backup is called 1.gz. I use WinSCP to copy the backup file to my Windows workstation. Setting up the Vagrant host I need a Vagrant box that matches my Linode operating system (Ubuntu 12.04.3 LTS, kernel 3.9.3). I selected the closet match from vagrantbox.es: Ubuntu Server Precise 12.04.3 amd64 Kernel is ready for Docker (Docker not included) On my workstation I ran these commands to add the box and initialize and boot an instance: $ vagrant box add ubuntu-precise http://nitron-vagrant.s3-website-us-east-1.amazonaws.com/vagrant_ubuntu_12.04.3_amd64_virtualbox.box $ mkdir linode-test $ cd linode-test $ vagrant init ubuntu-precise $ vagrant up Now Vagrant is running a machine with SSH on port 2222. The operating system version is the same. The kernel version is 3.8.0. Sounds close enough. Restoring the backup With WinSCP I copied the backup file 2.gz to /home/vagrant/2.gz on the Vagrant box. With PuTTY I connected via ssh to my new Vagrant box: On the box move the backup to the filesystem root. $ sudo mv 2.gz / Extract the archive to the filesystem root: $ sudo tar -xvpz -f 2.gz -C / --strip-components=2 (I discovered I need to use strip components because all files in the archive have the prefix backup/2/. I'll fix this for the next backup.) After the tar command completes, I log out of the box. Testing the backup When I try to log in again, it doesn't let me log in as vagrant with a password any more. It does let me log in as iain, my user on the live Linode, with a password. That surprised me because I disabled password authentication on my live Linode. I figured that I have to restart the ssh service for the change to take effect. Instead of restarting just ssh, I chose to restart the whole system. Now I can't even get to the login screen. PuTTY says "connection refused" when I try to connect. What went wrong?

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

  • Developing an Implementation Plan with Iterations by Russ Pitts

    - by user535886
    Developing an Implementation Plan with Iterations by Russ Pitts  Ok, so you have come to grips with understanding that applying the iterative concept, as defined by OUM is simply breaking up the project effort you have estimated for each phase into one or more six week calendar duration blocks of work. Idea being the business user(s) or key recipient(s) of work product(s) being developed never go longer than six weeks without having some sort of review or prototyping of the work results for an iteration…”think-a-little”, “do-a-little”, and “show-a-little” in a six week or less timeframe…ideally the business user(s) or key recipients(s) are involved throughout. You also understand the OUM concept that you only plan for that which you have knowledge of. The concept further defined, a project plan initially is developed at a high-level, and becomes more detailed as project knowledge grows. Agreeing to this concept means you also have to admit to the fallacy that one can plan with precision beyond six weeks into a project…Anything beyond six weeks is a best guess in most cases when dealing with software implementation projects. Project planning, as defined by OUM begins with the Implementation Plan view, which is a very high-level perspective of the effort estimated for each of the five OUM phases, as well as the number of iterations within each phase. You might wonder how can you predict the number of iterations for each phase at this early point in the project. Remember project planning is not an exact science, and initially is high-level and abstract in nature, and then becomes more detailed and precise as the project proceeds. So where do you start in defining iterations for each phase for a project? The following are three easy steps to initially define the number of iterations for each phase: Step 1 => Start with identifying the known factors… …Prior to starting a project you should know: · The agreed upon time-period for an iteration (e.g 6 weeks, or 4 weeks, or…) within a phase (recommend keeping iteration time-period consistent within a phase, if not for the entire project) · The number of resources available for the project · The number of total number of man-day (effort) you have estimated for each of the five OUM phases of the project · The number of work days for a week Step 2 => Calculate the man-days of effort required for an iteration within a phase… Lets assume for the sake of this example there are 10 project resources, and you have estimated 2,536 man-days of work effort which will need to occur for the elaboration phase of the project. Let’s also assume a week for this project is defined as 5 business days, and that each iteration in the elaboration phase will last a calendar duration of 6 weeks. A simple calculation is performed to calculate the daily burn rate for a single iteration, which produces a result of… ((Number of resources * days per week) * duration of iteration) = Number of days required per iteration ((10 resources * 5 days/week) * 6 weeks) = 300 man days of effort required per iteration Step 3 => Calculate the number of iterations that can occur within a phase Next calculate the number of iterations that can occur for the amount of man-days of effort estimated for the phase being considered… (number of man-days of effort estimated / number of man-days required per iteration) = # of iterations for phase (2,536 man-days of estimated effort for phase / 300 man days of effort required per iteration) = 8.45 iterations, which should be rounded to a whole number such as 9 iterations* *Note - It is important to note this is an approximate calculation, not an exact science. This particular example is a simple one, which assumes all resources are utilized throughout the phase, including tech resources, etc. (rounding down or up to a whole number based on project factor considerations). It is also best in many cases to round up to higher number, as this provides some calendar scheduling contingency.

    Read the article

  • How to Build Services from Legacy Applications

    - by Chris Falter
    The SOA consultants invaded the executive suite at your company or agency, preached the true religion, and converted the unbelievers. Now by divine imperative you must convert your legacy applications into a suite of reusable services.  But as usual, you lack the time and resources that you need in order to develop the services properly.  So you googled or bing’ed, found this blog post, and began crying in gratitude.  Yes, as the title implies, I am going to reveal my easy, 3-step, works-every-time process for converting silos of legacy applications into the inventory of services your CIO has been dreaming about.  So just close your eyes and count to 3 … now open them … and here it is…. Not. While wishful thinking is too often the coin of the IT realm, even the most naive practitioner knows that converting legacy applications into reusable services requires more than a magic wand.  The reason is simple: if your starting point is your legacy applications, then you will simply be bolting a web service technology layer on top of your legacy API.  And that legacy API is built in the image of the silo applications.  Enter the wide gate of the legacy API, follow the broad path of generating service interfaces from existing code, and you will arrive at the siloed enterprise destruction that you thought you were escaping. The Straight and Narrow Path This past week I had the opportunity to learn how the FBI Criminal Justice Information Systems department has been transitioning from silo applications to a service inventory.  Lafe Hutcheson, IT Specialist in the architecture group and fellow attendee at an SOA Architect Certification Workshop, was my guide.  Lafe has survived the chaos of an SOA initiative, so it is not surprising that he was able to return from a US Army deployment to Kabul, Afghanistan with nary a scratch.  According to Lafe, building their service inventory is a three-phase process: Model a business process.  This requires intense collaboration between the IT and business wings of the organization, of course.  The FBI uses IBM Websphere tools to model the process with BPMN. Identify candidate services to facilitate the business process. Convert the BPMN to an executable BPEL orchestration, model and develop the services, and use a BPEL engine to run the process.  The FBI uses ActiveVOS for orchestration services. The 12 Step Program to End Your Legacy API Addiction Thomas Erl has documented a process for building a web service inventory that is quite similar to the FBI process. Erl’s process adds a technology architecture definition phase, which allows for the technology environment to influence the inventory blueprint.  For example, if you are using an enterprise service bus, you will probably not need to build your own utility services for logging or intermediate routing.  Erl also lists a service-oriented analysis phase that highlights the 12-step process of applying the principles of service orientation to modeling your services.  Erl depicts the modeling of a service inventory as an iterative process: model a business process, define the relevant technology architecture, define the service inventory blueprint, analyze the services, then model another business process, rinse and repeat.  (Astute readers will note that Erl’s diagram, restricted to analysis and modeling process, does not include the implementation phase that concludes the FBI service development methodology.) The service-oriented analysis phase is where you find the 12 steps that will free you from your legacy API addiction. In a nutshell, you identify the steps in the process that need services; identify the different types of services (agnostic entity services, service compositions, and utility services) that are required; apply service-orientation principles; and normalize the inventory into cohesive service models. Rather than discuss each of the 12 steps individually, I will close by simply referring my readers to Erl’s explanation.

    Read the article

  • Valuing "Working Software over Comprehensive Documentation"

    - by tom.spitz
    Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;} I subscribe to the tenets put forth in the Manifesto for Agile Software Development - http://agilemanifesto.org. As Oracle's chief methodologist, that might seem a self-deprecating attitude. After all, the agile manifesto tells us that we should value "individuals and interactions" over "processes and tools." My job includes process development. I also subscribe to ideas put forth in a number of subsequent works including Balancing Agility and Discipline: A Guide for the Perplexed (Boehm/Turner, Addison-Wesley) and Agile Project Management: Creating Innovative Products (Highsmith, Addison-Wesley). Both of these books talk about finding the right balance between "agility and discipline" or between a "predictive and adaptive" project approach. So there still seems to be a place for us in creating the Oracle Unified Method (OUM) to become the "single method framework that supports the successful implementation of every Oracle product." After all, the real idea is to apply just enough ceremony and produce just enough documentation to suit the needs of the particular project that supports an enterprise in moving toward its desired future state. The thing I've been struggling with - and the thing I'd like to hear from you about right now - is the prevalence of an ongoing obsession with "documents." OUM provides a comprehensive set of guidance for an iterative and incremental approach to engineering and implementing software systems. Our intent is first to support the information technology system implementation and, as necessary, support the creation of documentation. OUM, therefore, includes a supporting set of document templates. Our guidance is to employ those templates, sparingly, as needed; not create piles of documentation that you're not gonna (sic) need. In other words, don't serve the method, make the method serve you. Yet, there seems to be a "gimme" mentality in some circles that if you give me a sample document - or better yet - a repository of samples - then I will be able to do anything cheaply and quickly. The notion is certainly appealing AND reuse can save time. Plus, documents are a lowest common denominator way of packaging reusable stuff. However, without sustained investment and management I've seen "reuse repositories" turn quickly into garbage heaps. So, I remain a skeptic. I agree that providing document examples that promote consistency is helpful. However, there may be too much emphasis on the documents themselves and not enough on creating a system that meets the evolving needs of the business. How can we shift the emphasis toward working software and away from our dependency on documents - especially on large, complex implementation projects - while still supporting the need for documentation? I'd like to hear your thoughts.

    Read the article

  • Open source adventures with... wait for it... Microsoft

    - by Jeff
    Last week, Microsoft announced that it was going to open source the rest of the ASP.NET MVC Web stack. The core MVC framework has been open source for a long time now, but the other pieces around it are also now out in the wild. Not only that, but it's not what I call "big bang" open source, where you release the source with each version. No, they're actually committing in real time to a public repository. They're also taking contributions where it makes sense. If that weren't exciting enough, CodePlex, which used to be a part of the team I was on, has been re-org'd to a different part of the company where it is getting the love and attention (and apparently money) that it deserves. For a period of several months, I lobbied to get a PM gig with that product, but got nowhere. A year and a half later, I'm happy to see it finally treated right. In any case, I found a bug in Razor, the rendering engine, before the beta came out. I informally sent the bug info to some people, but it wasn't fixed for the beta. Now, with the project being developed in the open, I was able to submit the issue, and went back and forth with the developer who wrote the code (I met him once at a meet up in Bellevue, I think), and he committed a fix. I tried it a day later, and the bug was gone. There's a lot to learn from all of this. That open source software is surprisingly efficient and often of high quality is one part of it. For me the win is that it demonstrates how open and collaborative processes, as light as possible, lead to better software. In other words, even if this were a project being developed internally, at a bank or something, getting stakeholders involved early and giving people the ability to respond leads to awesomeness. While there is always a place for big thinking, experience has shown time and time again that trying to figure everything out up front takes too long, and rarely meets expectations. This is a lesson that probably half of Microsoft has yet to learn, including the team I was on before I split. It's the reason that team still hasn't shipped anything to general availability. But I've seen what an open and iterative development style can do for teams, at Microsoft and other places that I've worked. When you can have a conversation with people, and take ideas and turn them into code quickly, you're winning. So why don't people like winning? I think there are a lot of reasons, and they can generally be categorized into fear, skepticism and bad experiences. I can't give the Web stack teams enough credit. Not only did they dream big, but they changed a culture that often seems immovable and hopelessly stuck. This is a very public example of this culture change, but it's starting to happen at every scale in Microsoft. It's really interesting to see in a company that has been written off as dead the last decade.

    Read the article

  • Solving Diophantine Equations Using Python

    - by HARSHITH
    In mathematics, a Diophantine equation (named for Diophantus of Alexandria, a third century Greek mathematician) is a polynomial equation where the variables can only take on integer values. Although you may not realize it, you have seen Diophantine equations before: one of the most famous Diophantine equations is: We are not certain that McDonald's knows about Diophantine equations (actually we doubt that they do), but they use them! McDonald's sells Chicken McNuggets in packages of 6, 9 or 20 McNuggets. Thus, it is possible, for example, to buy exactly 15 McNuggets (with one package of 6 and a second package of 9), but it is not possible to buy exactly 16 nuggets, since no non- negative integer combination of 6's, 9's and 20's adds up to 16. To determine if it is possible to buy exactly n McNuggets, one has to solve a Diophantine equation: find non-negative integer values of a, b, and c, such that 6a + 9b + 20c = n. Write an iterative program that finds the largest number of McNuggets that cannot be bought in exact quantity. Your program should print the answer in the following format (where the correct number is provided in place of n): "Largest number of McNuggets that cannot be bought in exact quantity: n"

    Read the article

  • Improving long-polling Ajax performance

    - by Bears will eat you
    I'm writing a webapp (Firefox-compatible only) which uses long polling (via jQuery's ajax abilities) to send more-or-less constant updates from the server to the client. I'm concerned about the effects of leaving this running for long periods of time, say, all day or overnight. The basic code skeleton is this: function processResults(xml) { // do stuff with the xml from the server } function fetch() { setTimeout(function () { $.ajax({ type: 'GET', url: 'foo/bar/baz', dataType: 'xml', success: function (xml) { processResults(xml); fetch(); }, error: function (xhr, type, exception) { if (xhr.status === 0) { console.log('XMLHttpRequest cancelled'); } else { console.debug(xhr); fetch(); } } }); }, 500); } (The half-second "sleep" is so that the client doesn't hammer the server if the updates are coming back to the client quickly - which they usually are.) After leaving this running overnight, it tends to make Firefox crawl. I'd been thinking that this could be partially caused by a large stack depth since I've basically written an infinitely recursive function. However, if I use Firebug and throw a breakpoint into fetch, it looks like this is not the case. The stack that Firebug shows me is only about 4 or 5 frames deep, even after an hour. One of the solutions I'm considering is changing my recursive function to an iterative one, but I can't figure out how I would insert the delay in between Ajax requests without spinning. I've looked at the JS 1.7 "yield" keyword but I can't quite wrap my head around it, to figure out if it's what I need here. Is the best solution just to do a hard refresh on the page periodically, say, once every hour? Is there a better/leaner long-polling design pattern that won't put a hurt on the browser even after running for 8 or 12 hours? Or should I just skip the long polling altogether and use a different "constant update" pattern since I usually know how frequently the server will have a response for me?

    Read the article

  • Python progression path - From apprentice to guru

    - by Morlock
    Hi all, I've been learning, working, and playing with Python for a year and a half now. As a biologist slowly making the turn to bio-informatics, this language has been a the very core of all the major contributions I have made in the lab. (bash and R scripts have helped some too. My C++ capabilities are very not functional yet). I more or less fell in love with the way Python permits me to express beautiful solutions and also with the semantics of the language that allows such a natural flow from thoughts to workable code. What I would like to know from you is your answer to a kind of question I have seldom seen in this or other forums. Let me sum up what I do NOT want to ask first ;) I don't want to know how to QUICKLY learn Python Nor do I want to find out the best way to get acquainted with the language Finally, I don't want to know a 'one trick that does it all' approach. What I do want to know your opinion about, is: What are the steps YOU would recommend to a Python journeyman, from apprenticeship to guru status (feel free to stop wherever your expertise dictates it), in order that one IMPROVES CONSTANTLY, becoming a better and better Python coder, one step at a time. The kind of answers I would enjoy (but feel free to surprise the readership :P ), is formatted more or less like this: Read this (eg: python tutorial), pay attention to that kind of details Code for so manytime/problems/lines of code Then, read this (eg: this or that book), but this time, pay attention to this Tackle a few real-life problems Then, proceed to reading Y. Be sure to grasp these concepts Code for X time Come back to such and such basics or move further to... (you get the point :) This process depicts an iterative Learn/Code cycle, and I really care about knowing your opinion on what exactly one should pay attention to, at various stages, in order to progress CONSTANTLY (with due efforts, of course). If you come from a specific field of expertise, discuss the path you see as appropriate in this field. Thanks a lot for sharing your opinions and good Python coding!

    Read the article

  • Disco/MapReduce: Using results of previous iteration as input to new iteration

    - by muckabout
    Currently am implementing PageRank on Disco. As an iterative algorithm, the results of one iteration are used as input to the next iteration. I have a large file which represents all the links, with each row representing a page and the values in the row representing the pages to which it links. For Disco, I break this file into N chunks, then run MapReduce for one round. As a result, I get a set of (page, rank) tuples. I'd like to feed this rank to the next iteration. However, now my mapper needs two inputs: the graph file, and the pageranks. I would like to "zip" together the graph file and the page ranks, such that each line represents a page, it's rank, and it's out links. Since this graph file is separated into N chunks, I need to split the pagerank vector into N parallel chunks, and zip the regions of the pagerank vectors to the graph chunks This all seems more complicated than necessary, and as a pretty straightforward operation (with the quintessential mapreduce algorithm), it seems I'm missing something about Disco that could really simplify the approach. Any thoughts?

    Read the article

  • Project Euler Question 14 (Collatz Problem)

    - by paradox
    The following iterative sequence is defined for the set of positive integers: n -n/2 (n is even) n -3n + 1 (n is odd) Using the rule above and starting with 13, we generate the following sequence: 13 40 20 10 5 16 8 4 2 1 It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1. Which starting number, under one million, produces the longest chain? NOTE: Once the chain starts the terms are allowed to go above one million. I tried coding a solution to this in C using the bruteforce method. However, it seems that my program stalls when trying to calculate 113383. Please advise :) #include <stdio.h> #define LIMIT 1000000 int iteration(int value) { if(value%2==0) return (value/2); else return (3*value+1); } int count_iterations(int value) { int count=1; //printf("%d\n", value); while(value!=1) { value=iteration(value); //printf("%d\n", value); count++; } return count; } int main() { int iteration_count=0, max=0; int i,count; for (i=1; i<LIMIT; i++) { printf("Current iteration : %d\n", i); iteration_count=count_iterations(i); if (iteration_count>max) { max=iteration_count; count=i; } } //iteration_count=count_iterations(113383); printf("Count = %d\ni = %d\n",max,count); }

    Read the article

  • java BufferedReader specific length returns NUL characters

    - by Bastien
    I have a TCP socket client receiving messages (data) from a server. messages are of the type length (2 bytes) + data (length bytes), delimited by STX & ETX characters. I'm using a bufferedReader to retrieve the two first bytes, decode the length, then read again from the same bufferedReader the appropriate length and put the result in a char array. most of the time, I have no problem, but SOMETIMES (1 out of thousands of messages received), when attempting to read (length) bytes from the reader, I get only part of it, the rest of my array being filled with "NUL" characters. I imagine it's because the buffer has not yet been filled. char[] bufLen = new char[2]; _bufferedReader.read(bufLen); int len = decodeLength(bufLen); char[] _rawMsg = new char[len]; _bufferedReader.read(_rawMsg); return _rawMsg; I solved the problem in several iterative ways: first I tested the last char of my array: if it wasn't ETX I would read chars from the bufferedReader one by one until I would reach ETX, then start over my regular routine. the consequence is that I would basically DROP one message. then, in order to still retrieve that message, I would find the first occurence of the NUL char in my "truncated" message, read & store additional characters one at a time until I reached ETX, and append them to my "truncated" messages, confirming length is ok. it works also, but I'm really thinking there's something I could do better, like checking if the total number of characters I need are available in the buffer before reading it, but can't find the right way to do it... any idea / pointer ? thanks !

    Read the article

  • Windows Batch file to echo a specific line number

    - by Lee
    So for the second part of my current dilemma, I have a list of folders in c:\file_list.txt. I need to be able to extract them (well, echo them with some mods) based on the line number because this batch script is being called by an iterative macro process. I'm passing the line number as a parameter. @echo off setlocal enabledelayedexpansion set /a counter=0 set /a %%a = "" for /f "usebackq delims=" %%a in (c:\file_list.txt) do ( if "!counter!"=="%1" goto :printme & set /a counter+=1 ) :printme echo %%a which gives me an output of %a. Doh! So, I've tried echoing !a! (result: ECHO is off.); I've tried echoing %a (result: a) I figured the easy thing to do would be to modify the head.bat code found here: http://stackoverflow.com/questions/130116/dos-batch-commands-to-read-first-line-from-text-file except rather than echoing every line - I'd just echo the last line found. Not as simple as one might think. I've noticed that my counter is staying at zero for some reason; I'm wondering if the set /a counter+=1 is doing what I think it's doing.

    Read the article

  • How to perform spatial partitioning in n-dimensions?

    - by kevin42
    I'm trying to design an implementation of Vector Quantization as a c++ template class that can handle different types and dimensions of vectors (e.g. 16 dimension vectors of bytes, or 4d vectors of doubles, etc). I've been reading up on the algorithms, and I understand most of it: here and here I want to implement the Linde-Buzo-Gray (LBG) Algorithm, but I'm having difficulty figuring out the general algorithm for partitioning the clusters. I think I need to define a plane (hyperplane?) that splits the vectors in a cluster so there is an equal number on each side of the plane. [edit to add more info] This is an iterative process, but I think I start by finding the centroid of all the vectors, then use that centroid to define the splitting plane, get the centroid of each of the sides of the plane, continuing until I have the number of clusters needed for the VQ algorithm (iterating to optimize for less distortion along the way). The animation in the first link above shows it nicely. My questions are: What is an algorithm to find the plane once I have the centroid? How can I test a vector to see if it is on either side of that plane?

    Read the article

  • Performance question: Inverting an array of pointers in-place vs array of values

    - by Anders
    The background for asking this question is that I am solving a linearized equation system (Ax=b), where A is a matrix (typically of dimension less than 100x100) and x and b are vectors. I am using a direct method, meaning that I first invert A, then find the solution by x=A^(-1)b. This step is repated in an iterative process until convergence. The way I'm doing it now, using a matrix library (MTL4): For every iteration I copy all coeffiecients of A (values) in to the matrix object, then invert. This the easiest and safest option. Using an array of pointers instead: For my particular case, the coefficients of A happen to be updated between each iteration. These coefficients are stored in different variables (some are arrays, some are not). Would there be a potential for performance gain if I set up A as an array containing pointers to these coefficient variables, then inverting A in-place? The nice thing about the last option is that once I have set up the pointers in A before the first iteration, I would not need to copy any values between successive iterations. The values which are pointed to in A would automatically be updated between iterations. So the performance question boils down to this, as I see it: - The matrix inversion process takes roughly the same amount of time, assuming de-referencing of pointers is non-expensive. - The array of pointers does not need the extra memory for matrix A containing values. - The array of pointers option does not have to copy all NxN values of A between each iteration. - The values that are pointed to the array of pointers option are generally NOT ordered in memory. Hopefully, all values lie relatively close in memory, but *A[0][1] is generally not next to *A[0][0] etc. Any comments to this? Will the last remark affect performance negatively, thus weighing up for the positive performance effects?

    Read the article

  • dotNet Templated, Repeating, Databound ServerControl: Modifying underlying ServerControl data per te

    - by Campbeln
    I have a server control that wraps an underlying class which manages a number of indexes to track where it is in a dataset (ie: RenderedRecordCount, ErroredRecordCount, NewRecordCount, etc.). I've got the server control rendering great, but OnDataBinding I'm having an issue as to seems to happen after CreateChildControls and before Render (both of which properly manage the iteration of the underlying indexes). While I'm somewhat familiar with the ASP.NET page lifecycle, this one seems to be beyond me at the moment. So... How do I hook into the iterative process OnDataBinding uses so I can manage the underlying indexes? Will I have to iterate over the ITemplates myself, managing the indexes as I go or is there an easier solution? [edit: Agh... writing the problem out is very cathartic... I'm thinking this is exactly what I will need to do...] Also... I implemented the iteration of the underlying indexes during CreateChildControls originally in the belief that was the proper place to hook in for events like OnDataBinding (thinking it was done as the controls were being .Add'd). Now it seems that this may actually be unnecessary. So I guess the secondary question is: What happens during CreateChildControls? Are the unadulterated (read: with various <%-tags in place) controls added to the .Controls collection without any other processing?

    Read the article

  • converting a Tree to newick format. java

    - by Esmond
    I'm having problems converting a binary rooted tree to newick format. The full explanation for such a format can be found: http://code.google.com/p/mrsrf/wiki/NewickTree An example of a newick format would be as follows: for a tree T such as http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic8/images/completetreetwo.gif the newick representation would be: (((8,9),(10,11)),((12,13),(14,15))) the internal node will become the commas while the leaves will be retained. such trees have internal nodes which will always have 2 children. I have a problem using recursion to come out with this newick format. The output contains far too many nodes and braces. Any comments to resolve this problem is appreciated or even an iterative algorithm would be welcomed import java.util.Stack; public class Tree { .... public String inOrderNewick(Node root, String output) throws ItemNotFoundException { if (root.hasChild()) { output += "("; output += inOrderNewick(root.child1, output); output += ","; output += inOrderNewick(root.child2, output); output += ")"; return output; } else { output += root.getSeq(); return output; } } }

    Read the article

  • Highlighting a piechart slice from an HTML element (mouseover)

    - by nickhar
    I have a series of HTML table cells with data - an example of which is: <tr id="rrow1"> <td> <a href="/electricity" class="category">Electricity</a> </td> <td> 901.471 </td> </tr> <tr id="rrow2">... <tr id="rrow3">... etc In this case, each <tr> (or hypathetically for the wider community a div/span/tr/td) is assigned a sequential id based on $rrow++; in a while loop (in PHP). I also have a Piechart using the highcharts library, where i'd like to highlight the slice (sliced: true) based upon onmouseover of particular div/span/tr/td element - in this case #rrow1 as above, but multiple/iterative elements as required and (sliced: false) onmouseout... As a simple example, I've tried accessing various derivatives of the following, but failed: $('#rrow1').mouseover(function() { chart.series[0].graph.attr('sliced', true); }); $('#rrow1').mouseout(function() { chart.series[0].graph.attr('sliced', false); }); The nearest I've found is this but bastardised at most and without success: plotOptions: { series: { mouseOver: function() { if( $('#rrow1').mouseover ) series.x = sliced: true; }, mouseOut: function() { if( $('#rrow1').mouseout ) series.x = sliced: false; } } } These are far from approaching correct and despite searching I can't find a valid/helpful example to work from or draw direction. You can view the pie chart in question on jsfiddle here.

    Read the article

  • DOS Batch file to echo a specific line number

    - by Lee
    So for the second part of my current dilemma, I have a list of folders in "c:\file_list.txt". I need to be able to extract them (well, echo them with some mods) based on the line number because this batch script is being called by an iterative macro process. I'm passing the line number as a parameter. @echo off setlocal enabledelayedexpansion set /a counter=0 set /a %%a = "" for /f "usebackq delims=" %%a in (c:\file_list.txt) do (if "!counter!"=="%1" goto :printme & set /a counter+=1) :printme echo %%a which gives me an output of "%a". Doh! So, I've tried echoing !a! (result: "ECHO is off."); I've tried echoing %a (result: a) I figured the easy thing to do would be to modify the "head.bat" code found here: http://stackoverflow.com/questions/130116/dos-batch-commands-to-read-first-line-from-text-file except rather than echoing every line - I'd just echo the last line found. Not as simple as one might think. I've noticed that my counter is staying at zero for some reason; I'm wondering if the "set /a counter+=1" is doing what I think it's doing.

    Read the article

  • SQL Server database change workflow best practices

    - by kubi
    The Background My group has 4 SQL Server Databases: Production UAT Test Dev I work in the Dev environment. When the time comes to promote the objects I've been working on (tables, views, functions, stored procs) I make a request of my manager, who promotes to Test. After testing, she submits a request to an Admin who promotes to UAT. After successful user testing, the same Admin promotes to Production. The Problem The entire process is awkward for a few reasons. Each person must manually track their changes. If I update, add, remove any objects I need to track them so that my promotion request contains everything I've done. In theory, if I miss something testing or UAT should catch it, but this isn't certain and it's a waste of the tester's time, anyway. Lots of changes I make are iterative and done in a GUI, which means there's no record of what changes I made, only the end result (at least as far as I know). We're in the fairly early stages of building out a data mart, so the majority of the changes made, at least count-wise, are minor things: changing the data type for a column, altering the names of tables as we crystallize what they'll be used for, tweaking functions and stored procs, etc. The Question People have been doing this kind of work for decades, so I imagine there have got to be a much better way to manage the process. What I would love is if I could run a diff between two databases to see how the structure was different, use that diff to generate a change script, use that change script as my promotion request. Is this possible? If not, are there any other ways to organize this process? For the record, we're a 100% Microsoft shop, just now updating everything to SQL Server 2008, so any tools available in that package would be fair game.

    Read the article

  • Tail recursion and memoization with C#

    - by Jay
    I'm writing a function that finds the full path of a directory based on a database table of entries. Each record contains a key, the directory's name, and the key of the parent directory (it's the Directory table in an MSI if you're familiar). I had an iterative solution, but it started looking a little nasty. I thought I could write an elegant tail recursive solution, but I'm not sure anymore. I'll show you my code and then explain the issues I'm facing. Dictionary<string, string> m_directoryKeyToFullPathDictionary = new Dictionary<string, string>(); ... private string ExpandDirectoryKey(Database database, string directoryKey) { // check for terminating condition string fullPath; if (m_directoryKeyToFullPathDictionary.TryGetValue(directoryKey, out fullPath)) { return fullPath; } // inductive step Record record = ExecuteQuery(database, "SELECT DefaultDir, Directory_Parent FROM Directory where Directory.Directory='{0}'", directoryKey); // null check string directoryName = record.GetString("DefaultDir"); string parentDirectoryKey = record.GetString("Directory_Parent"); return Path.Combine(ExpandDirectoryKey(database, parentDirectoryKey), directoryName); } This is how the code looked when I realized I had a problem (with some minor validation/massaging removed). I want to use memoization to short circuit whenever possible, but that requires me to make a function call to the dictionary to store the output of the recursive ExpandDirectoryKey call. I realize that I also have a Path.Combine call there, but I think that can be circumvented with a ... + Path.DirectorySeparatorChar + .... I thought about using a helper method that would memoize the directory and return the value so that I could call it like this at the end of the function above: return MemoizeHelper( m_directoryKeyToFullPathDictionary, Path.Combine(ExpandDirectoryKey(database, parentDirectoryKey)), directoryName); But I feel like that's cheating and not going to be optimized as tail recursion. Any ideas? Should I be using a completely different strategy? This doesn't need to be a super efficient algorithm at all, I'm just really curious. I'm using .NET 4.0, btw. Thanks!

    Read the article

  • No idea how to solve SICP exercise 1.11

    - by Javier Badia
    This is not homework. Exercise 1.11: A function f is defined by the rule that f(n) = n if n<3 and f(n) = f(n - 1) + 2f(n - 2) + 3f(n - 3) if n 3. Write a procedure that computes f by means of a recursive process. Write a procedure that computes f by means of an iterative process. Implementing it recursively is simple enough. But I couldn't figure out how to do it iteratively. I tried comparing with the Fibonacci example given, but I didn't know how to use it as an analogy. So I gave up (shame on me) and Googled for an explanation, and I found this: (define (f n) (if (< n 3) n (f-iter 2 1 0 n))) (define (f-iter a b c count) (if (< count 3) a (f-iter (+ a (* 2 b) (* 3 c)) a b (- count 1)))) After reading it, I understand the code and how it works. But what I don't understand is the process needed to get from the recursive defintion of the function to this. I don't get how the code formed in someone's head. Could you explain the thought process needed to arrive at the solution?

    Read the article

  • How to get the size of a binary tree ?

    - by Andrei Ciobanu
    I have a very simple binary tree structure, something like: struct nmbintree_s { unsigned int size; int (*cmp)(const void *e1, const void *e2); void (*destructor)(void *data); nmbintree_node *root; }; struct nmbintree_node_s { void *data; struct nmbintree_node_s *right; struct nmbintree_node_s *left; }; Sometimes i need to extract a 'tree' from another and i need to get the size to the 'extracted tree' in order to update the size of the initial 'tree' . I was thinking on two approaches: 1) Using a recursive function, something like: unsigned int nmbintree_size(struct nmbintree_node* node) { if (node==NULL) { return(0); } return( nmbintree_size(node->left) + nmbintree_size(node->right) + 1 ); } 2) A preorder / inorder / postorder traversal done in an iterative way (using stack / queue) + counting the nodes. What approach do you think is more 'memory failure proof' / performant ? Any other suggestions / tips ? NOTE: I am probably going to use this implementation in the future for small projects of mine. So I don't want to unexpectedly fail :).

    Read the article

< Previous Page | 631 632 633 634 635 636 637 638 639 640 641 642  | Next Page >