Search Results

Search found 9861 results on 395 pages for 'embedded systems'.

Page 232/395 | < Previous Page | 228 229 230 231 232 233 234 235 236 237 238 239 | Next Page >

Collaborative work (small team) - Best practices

- by LEM01

I'm currently working in a very small team of programmers (2-3) and I'm looking for advices/best practices on how to organise our work. We're all working on the same application using PHP. Today we're kind of all working on our way. Today situation: List item that have to be worked on by each dev 1/week. What has to be done is defined at a high functional level (ex: Build the search engine for this product..) Commit / merge our individual branches (git) every week before the next meeting No real dev rules, no code review No test written (aouutch) Problems faced: Code quality issue: discovering someone else code is sometime tough (inline, variable+function+class names, spaces, comments..) Changes in already existing classes (impact on someone else work) Responsibility of each dev unclear: after getting someone else code and discover something messy, should I make the change? Should he make the change? How to plan those things,... What I'm looking for: Basically I'm looking into structuring the way we develop things in order to avoid frustration and improve overall quality. How to define coding standards (naming convention, code rules...)? Do you you any validation script to make sure code is valid before committing? Do you think that defining an architect role in the team is needed? Someone that would actually define what has to be developed during the next phase. By defining interfaces or class descriptions that have to be written. (Does it make sense in such a small team?) Today we're losing time into understanding what others did or tried to do, we're also losing time in discussion like "you should have done it that way! Why is this class doing that and not that..? Shouldn't we have a embedded class rather that this set of data...". I'm looking into a work process, maybe with more defined responsibilities and process in order to improve our performance. If you have experience, advices, best practices or anything to share that we could benefit from it will be much appreciated! Thanks a lot for your time!

Read the article
Pixels - A cry for some insight

- by CarrotFile

I'm pretty new to web developing and I'd love some clarification. Although reading more than one book on the topic, I cannot seem to wrap my head around the pixel concept. I encounter problems with this issue when trying to use CSS and pixel units for design that fits different screen sizes. To my understanding a pixel is the most basic unit used by a monitor in order to compose an image on the screen. So if me resolution is 800 by 600, everything on my screen is rendered using those 800*600 basic building blocks. If I were to enlarge my screen resolution, 3 things would accrue: A. The basic image building block(the pixel) would shrink in size B. The pixels would move close together C. Well, more pixels would now be available All these combined lead to a sharper(depending on the viewing distance) and more detail enabling image. Well so far so good. Here is were I start getting lost: To my knowledge a pixel is not a physical, real object. Monitors are not embedded with a few thousand pixels. I am drawn to this conclusion because anyone can change his screen's resolution, making a pixel on his screen bigger or smaller, and adding or subtracting the amount of total pixels on screen. Adding to that, I have herd that different monitors have different pixel densities. For example Apple's retina monitors. Taking all of the above as my knowledge base, These are my questions: If a pixel has no real world constant size, what does comparing different pixel densities matter? Each screen company can define it's own pixel concept and declare the higher density. What does a bigger pixel density mean? Say we take two screens with the same physical dimensions, but with a different pixel density, am I to assert that the main difference would be the larger density screen being able to display a higher max resolution? Or am I to assert that given the same resolution on both monitors, the higher density one would display a sharper, smaller image? If a pixel is not a fixed size within one monitor, is it a fixed size between the same resolution on two different monitors? For example, would two different monitors, set to the same resolution, be comprised of same size, same quantity pixels? I'd love some help (:

Read the article
So Much Happening at Devoxx

- by Tori Wieldt

Devoxx, the premier Java conference in Europe, has been sold out for a while. The organizers (thanks Stephan and crew!) cap the attendance to make sure all attendees have a great experience, and that speaks volumes about their priorities. The speakers, hackathons, labs, and networking are all first class. The Oracle Technology Network will be there, and if you were smart/lucky enough to get a ticket, come find us and join the fun: IoT Hack Fest Build fun and creative Internet of Things (IoT) applications with Java Embedded, Raspberry Pi and Leap Motion on the University Days (Monday and Tuesday). Learn from top experts Yara & Vinicius Senger and Geert Bevin at two Raspberry Pi & Leap Motion hands-on labs and hacking sessions. Bring your computer. Training and equipment will be provided. Devoxx will also host an Internet of Things shop in the exhibition floor where attendees can purchase Arduino, Raspberry PI and Robot starter kits. Bring your IoT wish list! Video Interviews Yolande Poirier and I will be interviewing members of the Java Community in the back of the Expo hall on Wednesday and Thursday. Videos are posted on Parleys and YouTube/Java. We have a few slots left, so contact me (you can DM @Java) if you want to share your insights or cool new tip or trick with the rest of the developer community. (No commercials, no fluff. Keep it techie and keep it real.) Oracle Keynote Wednesday morning Mark Reinhold, Chief Java Platform Architect, and Brian Goetz, Java Language Architect will provide an update on Java 8 and beyond. Oracle Booth Drop by the Oracle booth to see old and new friends. We'll have Java in Action demos and the experts to explain them and answer your questions. We are raffling off Raspberry Pi's each day, so be sure to get your badged scanned. We'll have beer in the booth each evening. Look for @Java in her lab coat. See you at Devoxx!

Read the article
Why can't Ubuntu find an ext3 filesystem on my hard-drive?

- by urig

This question is related to this question: Not enough components to start the RAID array? I'm trying to retrieve data from a "Western Digital MyBook World Edition (white light)" NAS device. This is basically an embedded Linux box with a 1TB HDD in it formatted in ext3. It stopped booting one day for no apparent reason. I have extracted the HDD from the NAS device and installed it in a desktop machine running Ubuntu 10.10 in the hope of accessing the files on the drive. I have followed instructions in this forum post, intended to mount the drive through Terminal: http://mybookworld.wikidot.com/forum/t-90514/how-to-recover-data-from-wd-my-book-world-edition-nas-device#post-976452 I have identified the partition that I want to mount and recover files from as /dev/sd4 by running "fdisk -l" and getting this: Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x0001cf00 Device Boot Start End Blocks Id System /dev/sdb1 5 248 1959930 fd Linux raid autodetect /dev/sdb2 249 280 257040 fd Linux raid autodetect /dev/sdb3 281 403 987997+ fd Linux raid autodetect /dev/sdb4 404 121601 973522935 fd Linux raid autodetect// When I try to mount using: "mount -t ext3 /dev/sdb4 /media/xyz" I get the following error: mount: wrong fs type, bad option, bad superblock on /dev/sdb4, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so And "dmesg | tail" shows me: [ 15.184757] [drm] Initialized nouveau 0.0.16 20090420 for 0000:01:00.0 on minor 0 [ 15.986859] [drm] nouveau 0000:01:00.0: Allocating FIFO number 1 [ 15.988379] [drm] nouveau 0000:01:00.0: nouveau_channel_alloc: initialised FIFO 1 [ 16.353379] EXT4-fs (sda5): re-mounted. Opts: errors=remount-ro,commit=0 [ 16.705944] tg3 0000:02:00.0: eth0: Link is up at 100 Mbps, full duplex [ 16.705951] tg3 0000:02:00.0: eth0: Flow control is off for TX and off for RX [ 16.706102] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 19.125673] EXT4-fs (sda5): re-mounted. Opts: errors=remount-ro,commit=0 [ 27.600012] eth0: no IPv6 routers present [ 373.478031] EXT3-fs (sdb4): error: can't find ext3 filesystem on dev sdb4. I guess that last line is the punch line :) Why can't it find the ext3 filesystem on my drive? What do I need to do to mount this partition and copy its contents? Does it have anything to do with the drive being part of a RAID Array (see question mentioned above)? Many thanks to any who can help.

Read the article
Why do some user agents have spam urls in them (and why are they always Opera/Presto User-Agents)?

- by Erx_VB.NExT.Coder

If you go to (say) the last 100 entries (visits) to the botsvsbrowsers.com website (exact link, feel free to take a look: http://www.botsvsbrowsers.com/recent/listings/index.html ), you'd notice that almost every User Agent that has the keywords "Opera" and "Presto" inside them, will almost certainly have a web link (URL/Web Address) inside it, and it won't just be a normal web address, but a HTML anchor tag/link to that address. Why is this so, I could not even find a single discussion about it on the internet, nowhere, I tried varying my search terms many times. If the user agent contains the words "Opera" and "Presto" it doesnt mean it will have this weblink, but it means there is about an 80% change that it will. A typical anchor tag/link inside a user agent will look like this: Mozilla/4.0 <a href="http://osis-uk.co.uk/disabled-equipment">disability equipment</a> (Windows NT 5.1; U; en) Presto/2.10.229 Version/11.60 If you check it out at the website, http://www.botsvsbrowsers.com/recent/listings/index.html you will notice that the back and forward arrows are in there unescaped format. This isn't just true for botsvsbrowsers, but several other user agent listing sites. I'm really confused and feel line I'm in a room full of 10,000 people and am the only one seeing this ghost :). If I'm doing statistical analysis, should I include or exclude this type of user agent from my listing (ie: are these just normal users who've set their user agents to attempt to drive some traffic to their sites as they browser the web), or is there something else going on? The fact that it is so consistent in terms of its format leads me to believe that it is an automated process (the setting or alteration of the user agent) so I cannot decide or understand the process by which this change is made (I know how to change a user agent), but unsure which program or facility is doing this, especially since it is exclusive to Opera (Presto) user agents that are beyond I think an 8 or 9 point something browser version. I've run some statistical tests, parsing entries from all over the place, writing custom programs, to get a better understanding of this. Keep in mind that I see normal URL's in user agents infrequently, they are just text such as +http://www.someSite.com appended to a user agent normally, especially if its a crawler or bot it provided its service URL, this is normal and isnt done with an embedded link (A HREF=) etc, so I'm not talking about "those".

Read the article
Play Framework Plugin for NetBeans IDE

- by Geertjan

The start of minimal support for the Play Framework in NetBeans IDE 7.3 Beta would constitute (1) recognizing Play projects, (2) an action to run a Play project, and (3) classpath support. Well, most of that I've created already, as can be seen, e.g., below you can see logical views in the Projects window for Play projects (i.e., I can open all the samples that come with the Play distribution). Right-clicking a Play project lets you run it and, if the embedded browser is selected in the Options window, you can see the result in the IDE. Make a change to your code and refresh the browser, which immediately shows you your changes: What needs to be done, among other things: A wizard for creating new Play projects, i.e., it would use the Play command line to create the application and then open it in the IDE. Integration of everything available on the Play command line. Maybe the logical view, i.e., what is shown in the Projects window, should be changed. Right now, only the folders "app" and "test" are shown there, with everything else accessible in the Files window, as can be seen in the screenshot above. More work on the classpath, i.e., I've hardcoded a few things just to get things to work correctly. Options window extension to register the Play executable, instead of the current hardcoded solution. Scala integrations, i.e., investigate if/how the NetBeans Scala plugin is helpful and, if not, create different/additional solutions. E.g., the HTML templates are partly in Scala, i.e., need to embed Scala support into HTML. Hyperlinking in the "routes" file, as well as special support for the "application.conf" file. Anyone interested, especially if you're a Play fan (a "playboy"?), in joining me in working on this NetBeans plugin? I'll be uploading the sources to a java.net repository soon. It will be here, once it has been made publicly accessible: http://java.net/projects/nbplay/sources/nbplay Kind of cool detail is that the NetBeans plugin is based on Maven, which means that you could use any Maven-supporting IDE to work on this plugin.

Read the article
ADF Mobile Released!!

- by Denis T

ADFmfAnnounce We are pleased to announce the general availability of the newest version of Oracle’s ADF Mobile framework. This new framework provides the much anticipated on-device capabilities that the latest mobile applications require. Feature Highlights Java - Oracle brings a Java VM embedded with each application so you can develop all your business logic in the platform neutral language you know and love! (Yes, even iOS!) JDBC - Since we give you Java, we also provide JDBC along with a SQLite driver and engine that also supports encryption out of the box. Multi-Platform - Truly develop your application only once and deploy to multiple platforms. iOS and Android platforms are supported for both phone and tablet. Flexible - You can decide how to implement the UI: (a) Use existing server-based UI framework like JSF. (b) Use your own favorite HTML5 framework like JQuery. (c) Use our declarative HTML5 component set provided with the framework. ADF Mobile XML or AMX for short, provides all the normal input and layout controls you expect and we also add charts/maps/gauges along with it to provide a very comprehensive UI controls. You can also mix and match any of the three for ultimate flexibility! Device Feature Access - You can get access to device features from either Java or JavaScript to invoke features like camera, GPS, email, SMS, contacts, etc. Secure - ADF Mobile provides integrated security that works with your server back-end as well. Whether you’re using remote URLs, local HTML or AMX, you can secure any/all of your features with a single consistent login page. Since we also give you SQLite encryption, we are assured that your data is safe. Rapid - Using the same development techniques that ADF developers are already used to, you can quickly create mobile applications without ever learning another language! Architecture ADF Mobile is a “hybrid” architecture that employs a natively built “container” on each platform that hosts a number of browser windows that are used to display the application content. We add the Java VM as a natively built library to the container for business logic. How To Get Started ADF Mobile is an extension to the recently released JDeveloper version 11.1.2.3.0. Simple get the latest JDeveloper from Oracle Technology Network and use the Check for Updates feature to get the ADF Mobile extension. Note: ADF Mobile does not require developers to learn any other languages or frameworks but to build/deploy to iOS, you must be on an Apple MacintoshTM and have Xcode installed. To build/deploy to Android™ you must have the Android SDK installed.

Read the article
Core i7-620M vs Core i5-540M

- by Shalan

Hi, I'm recommending a laptop to a colleague, and the specific laptop he has chosen has the above CPU chips as options. Both chips have 2-cores/4-threads. The i7-620M is a 2.66 GHz (4MB Cache) while the i5-540M is a 2.53 GHz (3MB Cache)....both Arrandale architecture. He is a .NET programmer working with SQL Server and Oracle, and occasionally uses Adobe Fireworks for web-related design elements. He also loves playing around in Adobe Premiere Pro, and does a lot of media/video work. Would you notice any significant performance difference between the 2? The laptop manufacturer claims that the battery life on both is the same irrespective of the chip used (although I find that hard to believe), but there is a major cost difference between them, with the Core i7-620M being the more expensive. According to http://ark.intel.com, the one thing that seems different (besides the obvious speeds/frequencies/etc) is a feature called "Embedded" - what is this exactly? You can see the quick comparison here - http://ark.intel.com/Compare.aspx?ids=43544,43560 I would sincerely appreciate any advice me on this. THANKS!

Read the article
Apache: Setting DocumentRoot to cgi directory results in downloading file instead of executing it.

- by fastmonkeywheels

I have a c-compiled CGI application that I need to execute from the DocumentRoot of my Apache server. The CGI file is called index.cgi and is located at /usr/lib/cgi-bin/index.cgi. I have the following Directory definition <Directory "/usr/lib/cgi-bin/"> Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch AllowOverride None Order allow,deny Allow from all DirectoryIndex index.cgi </Directory> I have the following VirtualHost setting: <VirtualHost *:80> ServerAdmin webmaster@localhost DocumentRoot /usr/lib/cgi-bin # ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/ ErrorLog /var/log/apache2/error.log LogLevel warn CustomLog /var/log/apache2/access.log combined </VirtualHost> If I go to 127.0.0.1 or 127.0.0.1/index.cgi I get prompted to download the index.cgi file, however if I enable the ScriptAlias in the vhost configuration block and go to 127.0.0.1/cgi-bin/index.cgi I see the output of my CGI application. I had originally solved this problem with mod_rewrite, however that worked on my test system the target (embedded) doesn't have that module available so I'm looking at another route (again).

Read the article
Windows 7: Touch gestures in IE not working without explorer.exe being run once

- by Michael

Details: Internet Explorer 9 and Windows 7 Professional, running on a HP TouchSmart (touch screen PC). It is going to be a kiosk PC (running a custom GUI for displaying websites). Scenario 1: When running Internet Explorer as a normal program in Windows 7, touch functions work perfectly. I can scroll the website by dragging it with my finger, I can pinch zoom and I can touch-and-hold right click. I now change the default shell in Windows to Internet Explorer (ie. IE starts instead of explorer.exe). Internet Explorer of course starts up when logging in. However, touch functions are reduced to basic clicking (no dragging, no pinch zooming, no touch-and-hold right click). Then I manually start explorer.exe, and the touch functions work again! And here is the weird part: When I kill explorer.exe, the touch functions keeps working - even if I close IE and start a new instance. Scenario 2: The exact same, but instead of changing the default shell to Internet Explorer, I change it to my own program, which uses an embedded Internet Explorer ("WebBrowser"). Same thing happens. What I've tried: Autorun programs: When explorer.exe launches, it launches all the autorun programs. There are no relevant programs being run by explorer, but just in case, I have manually started all the autorun programs, so that it is identical (but without explorer.exe) to a normal login. It still does not work (until I launch explorer.exe). Specifically TabTip.exe, TabTip32.exe and wisptis.exe are all running. All services are also started. To sum it up Running explorer.exe once changes something in the touch capabilities of Internet Explorer. It doesn't matter if explorer.exe is running - as long as it has been run once. Does anyone know what causes this behavior? Or how I can circumvent it neatly? Thanks!

Read the article
Mediawiki extension error

- by vinylguitar

I'm running the latest version of mediawiki using MoWeS Portable II from my desktop. I just installed this extension on the wiki http://www.mediawiki.org/wiki/Extension:MsUpload It adds an option to upload files (to be embedded in an article) to the edit screen of an article. After installing it when I try and edit an article I get the following error: Fatal error: Call to undefined method OutputPage::addModules() in C:\Users\User\Desktop\knowledge mapedia 10 25 13 copy\mowes_portable\www\mediawiki\extensions\MsUpload\msupload.php on line 65 Also here is what I posted in the localsettings.php file (I put it in at the end of localsettings.php if it makes a difference): Start --------------------------------------- MsUpload $wgMSU_ShowAutoKat = false; #autocategorisation $wgMSU_CheckedAutoKat = false; #checkbox for autocategorisation checked $wgMSU_debug = false; #debug mode $wgMSU_ImgParams = '400px'; #default max-size for inserted image $wgMSU_UseDragDrop = true; #show drag&drop area require_once "$IP/extensions/MsUpload/msupload.php"; End --------------------------------------- MsUpload require_once "$IP/extensions/msupload/msupload.php"; At line 65 in the localsettings.php file there is the following: line 64 ## Database settings line 65 $wgDBtype = "mysql"; line 66 $wgDBserver = "localhost"; line 67 $wgDBname = "mediawiki"; line 68 $wgDBuser = "root"; line 69 $wgDBpassword = ""; Any idea what I'm doing wrong?

Read the article
Embed album art in OGG through command line in linux

- by teratomata

I want to convert my music from flac to ogg, and currently oggenc does that perfectly except for album art. Metaflac can output album art, however there seems to be no command line tool to embed album art into ogg. MP3Tag and EasyTag are able to do it, and there is a specification for it here which calls for the image to be base64 encoded. However so far I have been unsuccessful in being able to take an image file, converting it to base64 and embedding it into an ogg file. If I take a base64 encoded image from an ogg file that already has the image embedded, I can easily embed it into another image using vorbiscomment: vorbiscomment -l withimage.ogg > textfile vorbiscomment -c textfile noimage.ogg My problem is taking something like a jpeg and converting it to base64. Currently I have: base64 --wrap=0 ./image.jpg Which gives me the image file converted to base64, using vorbiscomment and following the tagging rules, I can embed that into an ogg file like so: echo "METADATA_BLOCK_PICTURE=$(base64 --wrap=0 ./image.jpg)" > ./folder.txt vorbiscomment -c textfile noimage.ogg However this gives me an ogg whose image does not work properly. I noticed when comparing the base64 strings that all properly embedding pictures have a header line but all the base64 strings I generate are lacking this header. Further analysis of the header: od -c header.txt 0000000 \0 \0 \0 003 \0 \0 \0 \n i m a g e / j p 0000020 e g \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 0000040 \0 \0 \0 \0 \0 \0 \0 \0 035 332 0000052 Which follows the spec given above. Notice 003 corresponds to front cover and image/jpeg is the mime type. So finally, my question is, how can I base64 encode a file and generate this header along with it for embedding into an ogg file?

Read the article
Is it possible to shrink the size of an HP Smart Array logical drive?

- by ewwhite

I know extension is quite possible using the hpacucli utility, but is there an easy way to reduce the size of an existing logical drive (not array)? The controller is a P410i in a ProLiant DL360 G6 server. I'd like to reduce logicaldrive 1 from 72GB to 40GB. => ctrl all show config detail Smart Array P410i in Slot 0 (Embedded) Bus Interface: PCI Slot: 0 Serial Number: 5001438006FD9A50 Cache Serial Number: PAAVP9VYFB8Y RAID 6 (ADG) Status: Disabled Controller Status: OK Chassis Slot: Hardware Revision: Rev C Firmware Version: 3.66 Rebuild Priority: Medium Expand Priority: Medium Surface Scan Delay: 3 secs Surface Scan Mode: Idle Queue Depth: Automatic Monitor and Performance Delay: 60 min Elevator Sort: Enabled Degraded Performance Optimization: Disabled Inconsistency Repair Policy: Disabled Wait for Cache Room: Disabled Surface Analysis Inconsistency Notification: Disabled Post Prompt Timeout: 15 secs Cache Board Present: True Cache Status: OK Accelerator Ratio: 25% Read / 75% Write Drive Write Cache: Enabled Total Cache Size: 512 MB No-Battery Write Cache: Disabled Cache Backup Power Source: Batteries Battery/Capacitor Count: 1 Battery/Capacitor Status: OK SATA NCQ Supported: True Array: A Interface Type: SAS Unused Space: 412476 MB Status: OK Logical Drive: 1 Size: 72.0 GB Fault Tolerance: RAID 1+0 Heads: 255 Sectors Per Track: 32 Cylinders: 18504 Strip Size: 256 KB Status: OK Array Accelerator: Enabled Unique Identifier: 600508B1001C132E4BBDFAA6DAD13DA3 Disk Name: /dev/cciss/c0d0 Mount Points: /boot 196 MB, / 12.0 GB, /usr 8.0 GB, /var 4.0 GB, /tmp 2.0 GB OS Status: LOCKED Logical Drive Label: AE438D6A5001438006FD9A50BE0A Mirror Group 0: physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK) physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK) Mirror Group 1: physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 146 GB, OK) physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 146 GB, OK) SEP (Vendor ID PMCSIERA, Model SRC 8x6G) 250 Device Number: 250 Firmware Version: RevC WWID: 5001438006FD9A5F Vendor ID: PMCSIERA Model: SRC 8x6G

Read the article
On a dual-GPU laptop, is using the discrete GPU ever more power efficient?

- by Mahmoud Al-Qudsi

Given a laptop with a dual integrated/discrete GPU configuration, is it ever more power efficient to use the discrete GPU instead of the integrated? Obviously when writing an email or working on a spreadsheet, the integrated GPU will always use less power. But let's say you're doing something graphics-medium but not graphics-intensive/heavy - is there a point where it actually makes sense to fire up the discrete GPU, not for performance but for power-saving reasons? Off the top of my head, I can think of a scenario where the external GPU supports hardware decoding of a particular video codec - I'd imagine there is a "price point" where using the GPU saves more energy than decoding that fully in software would. But I think most GPUs, integrated or discrete, pretty much decode just the plain-Jane h264. But maybe there is something more complicated, perhaps if you're doing something like desktop/windowing animations or a flash animation on a website (not an embedded flash video) - maybe the discrete GPU will use enough less power to make up for switching to it? I guess this question can be summed up as to whether or not you can say beyond doubt that if you don't care for performance on a laptop with two GPUs, always use the integrated GPU for maximum battery life.

Read the article
Improving Performance of RDP Over LAN

- by Jared Brown

Architecture: A deployment of 6 new HP thin clients (Windows XP Embedded) with TCP/IP access to several new HP servers (Windows 2003 Server). Each thin client is connected over fiber optic to a Gigabit Cisco switch, which the servers are connected to. There are 10/100 Ethernet to fiber converter boxes on each end of the fiber cables. Problem: Noticeable lag over RDP while using the Unigraphics CAD package. 3D models take .5 to 1 second to respond to mouse actions. Other Details: Network throughput on each thin client's RDP session is 7288 kbps. RDP connection settings - color setting: 15k, all themes, etc. turned off. Local and remote system performance stats are well within norms (CPU, Memory, and Network). Question: Are there newer versions of terminal services or RDP I can use on my existing OSes? Are there compression algorithms, etc. that are well suited for a high-bandwidth LAN? Are there valid alternatives that will yield higher performance (i.e. UltraVNC with drivers installed)? Are there TCP/IP tuning options I can exploit?

Read the article
Windows 2003 DNS or IIS6 Problem?

- by Mario

Weird DNS problem... We have an intranet located internally on a windows 2003 / iis6 server - DNS handled internally on another windows 2003 server. The intranet, amongst other functions, hosts a ecommerce store I wrote that sells nike apparel embroidered with our company logo. Up until recently, it would send an email to payroll and the cost would be deducted from the employees paycheck. lets say this store is located at http://mydomain.com (only available internally) Now, we've been told by the accountants that we can no longer auto deduct from payroll and the employee needs to pay with a credit card or cash. So i went to thawte.com and ordered an SSL cert to be on the safe side (even though the CC gateway is secure) and they told me i need to drop the .com from the domain name Not wanting to mess with a system thats perfectly functional, i created another DNS entry that just points to mydomain (no .com) and left the old one in there. so they would go to http://mydomain On my Mac (OS X 10.6) i can hit either one just fine On Windows XP / Windows XP Embedded or Windows 7 (the vast majority of the pc's on our network) http://mydomain - returns nothing http://mydomain.com still works https://mydomain.com works but says the cert is invalid (as it should, it was issued to mydomain - not mydomain.com) my question is: why does it work on my Mac and not on a Windows PC (i get dhcp and dns just like any other pc on the network) and will removing the .com one from the DNS server resolve this? I've done all the usual attempts - ipconfig /flushdns, ipconfig /renew and release even going so far as to stop and restart DNS client on my Windows 7 box; rebooting and shutting down - adding a regedit entry something along the lines of SecureResponses and rebooting nothing works... I think its the .com and the not conflicting in DNS but i'm not sure - and why not on OS X We're closed on sunday and i'm going to remote in and see what happens if i remove the .com from DNS but any other ideas? -Mario

Read the article
administrator user unable to login, suspicious user accounts "sky$", "admin$"

- by mks

I have a Windows 2008 R2 Standard (64 bit) running in a virtual machine. Suddenly from yesterday onwards I am not able to login as administrator. Nobody changed the password. Both in the console as well as using remote desktop I am unable to login. Whenever I login as Administrator I am getting this error: "The user name or password is incorrect" Nothing has changed in the machine and I have logged in the past successfully both through console and via remote desktop several time on the same machine. One strange behaviour I noticed is, I am seeing some additional user accounts if I try to login as other user. The suspicious user account are: sky$ admin$ SUPPORT_388945a0 Is it created by some malware/virus? Or is it some windows hidden account? Microsoft site says that SUPPORT_388945a0 is: The Support_388945a0 account enables Help and Support Service interoperability with signed scripts. This account is primarily used to control access to signed scripts that are accessible from within Help and Support Services. Administrators can use this account to delegate the ability for an ordinary user, who does not have administrative access over a computer, to run signed scripts from links embedded within Help and Support Services. These scripts can be programmed to use the Support_388945a0 account credentials instead of the user’s credentials to perform specific administrative operations on the local computer that otherwise would not be supported by the ordinary user’s account. When the delegated user clicks on a link in Help and Support Services, the script executes under the security context of the Support_388945a0 account. This account has limited access to the computer and is disabled by default. However I am not sure from where this "admin$" and "sky$" came. Anyone has similar experience?

Read the article
Network corruption - corrupt downloads, corrupt streams, etc.

- by rfrankel

I've been having some problems with my home LAN. Downloaded executables won't run, my remote desktop sessions keep getting interrupted due to encryption errors, flash video streams show visible corruption (both Hulu and YouTube), and I've had a couple downloads for which the md5 hashes don't match. The problem has even occurred with a couple images embedded in webpages, though that's rare enough (presumably because images are relatively smaller files). I've had this problem across two Windows machines and a Mac, so it's neither machine-specific nor at the app or OS level. Comcast claims it's nothing to do with them, and my Linksys/Cisco RV016 router is out of warranty, so I have no access to official support. When I log into my router, it shows no error packets or dropped packets received. I plugged a laptop directly into the router and was able to download a 5.5 MB file and verify its MD5 hash, which is not proof that the problem is downstream of the router, but makes it seem quite likely, since I failed to download the same file several times from two desktops (one Mac, one Windows). Could this be a wiring problem? If so, is there any way clever/elegant to determine which wiring is faulty with just software? If I can avoid tracing all the wires throughout my entire house it would make my life quite a bit easier.

Read the article
How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article
What causes "A disk read error occurred, Press Ctrl + Alt + Del to restart"?

- by Mehrdad

I have a virtual machine containing Windows XP SP3. When I resized the VHD file (and the embedded partition), and tried booting, I got: A disk read error occurred Press Ctrl + Alt + Del to restart Some notes: FixBoot and FixMBR don't help. ChkDsk doesn't help. The partition is indeed active. The partition starts at sector 63 (it also did so before the problem) of cylinder 1, head 1, and is marked as type 0x07 (NTFS) My host OS reads the VHD and the partition completely fine I'm interested in knowing the cause rather than the fix. So "re-format the disk", "reinstall Windows", etc. aren't valid solutions. It's a virtual machine after all... I have nothing to lose, so I don't care about fixing it. I just want to know what's causing this problem, in case I run into it again on a physical machine (which I have done before). More info: The layout of the original, dynamic VHD (which works correctly): +-----------------------------------------------------------------------------+ ¦ Disk: 3 MBR/GPT: MBR ¦ ¦ Size: 127.00GB CHS: 16578 255 63 ¦ ¦ Sectors: 266338304 Disk Signature: 0xEE3EEE3E ¦ ¦ Partitions: 1 Partition Order: 1 ¦ ¦ Media Type: Fixed Interface: SCSI ¦ ¦ Description: Msft Virtual Disk ¦ +-----------------------------------------------------------------------------¦ ¦Pos Idx Type/Name Size Boot Hide Start Sector Total Sectors DL Vol Label ¦ +--- --- --------- ---- ---- ---- -------------- -------------- -- -----------¦ ¦ 1 1 07-NTFS 1.5G Yes No 63 3,148,677 F: <None> ¦ +-----------------------------------------------------------------------------+ The layout of the resized, fixed-size VHD (which doesn't work): +-----------------------------------------------------------------------------+ ¦ Disk: 3 MBR/GPT: MBR ¦ ¦ Size: 1.50GB CHS: 196 255 63 ¦ ¦ Sectors: 3149824 Disk Signature: 0xEE3EEE3E ¦ ¦ Partitions: 1 Partition Order: 1 ¦ ¦ Media Type: Fixed Interface: SCSI ¦ ¦ Description: Msft Virtual Disk ¦ +-----------------------------------------------------------------------------¦ ¦Pos Idx Type/Name Size Boot Hide Start Sector Total Sectors DL Vol Label ¦ +--- --- --------- ---- ---- ---- -------------- -------------- -- -----------¦ ¦ 1 1 07-NTFS 1.5G Yes No 63 3,148,677 F: <None> ¦ +-----------------------------------------------------------------------------+

Read the article
What do I need in order to extract and combine text files from multiple ZIP files, via command line?

- by Iszi

I've got an interesting scripting challenge in front of me. I'm fairly certain there's a way to do it, but I feel like I'm probably lacking some particular tools and/or functional knowledge. There's some fifty-plus ZIP files that each contain, among other things, text files that need to be merged with one another. The structure is something like this: C:\Reports\FirstJob-1.zip |-MyName |-FirstJob |-1 |-[Some other folders] |-TXTReports |-English |-[Some other files] |-Report.txt C:\Reports\FirstJob-2.zip |-MyName |-FirstJob |-1 |-[Some other folders] |-TXTReports |-English |-[Some other files] |-Report.txt C:\Reports\SecondJob-1.zip |-MyName |-SecondJob |-1 |-[Some other folders] |-TXTReports |-English |-[Some other files] |-Report.txt If I had all the Report.txt files in one regular folder, and uniquely named, I could probably just write a FOR statement that targets *.txt and runs something like type filename.txt >> Consolidated.txt on each. However, these all have the same file name and are embedded deep within separate ZIP files. The potentially useful tools I currently have at my disposal are Windows XP Professional SP3, PowerShell, and WinZip. I'd rather not download or install anything else, but I do understand that third-party tools (or additional tools from Microsoft or WinZip) may be necessary. Whatever tools I use should run natively in Windows. I really don't want to have to mess with Cygwin or other emulators on this system. At the very least, I need a tool that will allow me to analyze and manipulate ZIP files from the command line. Also, are there any other particular complications to this that I've not yet thought of?

Read the article
Red Hat 5.3 on HP Proliant DL380 G5 and failed drive on RAID controller

- by thinkdreams

I have a development ERP server here in my office that I assist with support on, and originally the DBA requested a single drive setup for some of the drives on the server. Thus the hardware RAID controller (an HP embedded controller) looks like: c0d0 (2 drive) RAID-1 c0d1 (2 drive) RAID-1 c0d2 (1 drive) No RAID <-- Failed c0d3 (1 drive) No RAID c0d4 (1 drive) No RAID c0d5 (1 drive) No RAID c0d2 has failed. I replaced the drive immediately with a spare using the hot-swap, but the c0d2 continues to mark itself as failed, even when I umount the partition. I'm loathe to reboot the server since I'm concerned about the server coming back up in rescue mode but I'm afraid that's the only way to get the system to re-read the drive. I assumed there was some sort of auto-detection routine for this, but I haven't been able to figure out the proper procedure. I have installed the HP ACU CLI utilties, so I can see the hardware RAID setup. I'd really like to find out what the proper procedure should have been, where I went wrong, and how to correct it now. Obviously this goes without saying I should NOT have listened to the DBA and set the drives up as RAID-1 throughout as was my first instinct. He wasn't worried about data loss, but it sure would have been easier to replace the failed drive. :)

Read the article
Firefox and Chrome keeps forcing HTTPS on Rails app using nginx/Passenger

- by Steve

I've got a really weird problem here where every time I try to browse my Rails app in non-SSL mode Chrome (v16) and Firefox (v7) keeps forcing my website to be served in HTTPS. My Rails application is deployed on a Ubuntu VPS using Capistrano, nginx, Passenger and a wildcard SSL certificate. I have set these parameters for port 80 in the nginx.conf: passenger_set_cgi_param HTTP_X_FORWARDED_PROTO http; passenger_set_cgi_param HTTPS off; The long version of my nginx.conf can be found here: https://gist.github.com/2eab42666c609b015bff The ssl-redirect.include file contains: rewrite ^/sign_up https://$host$request_uri? permanent ; rewrite ^/login https://$host$request_uri? permanent ; rewrite ^/settings/password https://$host$request_uri? permanent ; It is to make sure those three pages use HTTPS when coming from non-SSL request. My production.rb file contains this line: # Enable HTTP and HTTPS in parallel config.middleware.insert_before Rack::Lock, Rack::SSL, :exclude => proc { |env| env['HTTPS'] != 'on' } I have tried redirecting to HTTP via nginx rewrites, Ruby on Rails redirects and also used Rails view url using HTTP protocol. My application.rb file contains this methods used in a before_filter hook: def force_http if Rails.env.production? if request.ssl? redirect_to :protocol => 'http', :status => :moved_permanently end end end Every time I try to redirect to HTTP non-SSL the browser attempts to redirect it back to HTTPS causing an infinite redirect loop. Safari, however, works just fine. Even when I've disabled serving SSL in nginx the browsers still try to connect to the site using HTTPS. I should also mention that when I pushed my app on to Heroku, the Rails redirect work just fine for all browsers. The reason why I want to use non-SSL is that my homepage contains non-secure dynamic embedded objects and a non-secure CDN and I want to prevent security warnings. I don't know what is causing the browser to keep forcing HTTPS requests.

Read the article
Linux: Force fsck of a read-only mounted filesystem?

- by Timothy Miller

I'm developing for a headless embedded appliance, running CentOS 6.2. The user can connect a keyboard, but not a monitor, and a serial console would require opening the case, something we don't want the user to have to do. This all pretty much obviates the possibility of using a recovery USB drive to boot from, unless all it does is blindly reimage the harddrive. I would like to provide some recovery facilities, and I have written a tool that comes up on /dev/tty1 in place of getty to provide these functions. One such function is fsck. I have found out how to remount the root and other file systems read-only. Now that they are read-only, it should be safe to fsck them and then reboot. Unfortunately, fsck complains to me that the filesystems are mounted and refuses to do anything. How can I force fsck to run on a read-only mounted partition? Based on my research, this is going to have to be something obscure. "-f" just means to force repair of a clean (but unmounted) partition. I need to repair a clean or unclean mounted partition. From what I read, this is something "only experts" should do, but no one has bothered to explain how the experts do it. I'm hoping someone can reveal this to me. BTW, I've noticed that e2fsck 1.42.4 on Gentoo will let you fsck a mounted partition, even mounted read-write, but it seems only to do so if fsck is run from a terminal, so it can ask the user if they're sure they want to do something so dangerous. I'm not sure if the CentOS version does the same thing, but it appears that fsck CAN repair a mounted partition, but it flatly refuses to when not run from a terminal. One last-resort option is for me to compile my own hacked fsck. But I'm afraid I'll mess it up in some unexpected way. Thanks! Note: Originally posted here.

Read the article
redirect from mysite.com to www.mysite.com

- by jml

hi there, i know that this has been answered many many times, so if someone wants to point me to another thread that answers my question specifically, that is fine... for right now, my searches aren't yielding many results. so i have a website like mysite.com that has a flash swf embedded in it and i go to www.mysite.com ... all of the sudden, things don't work properly. i would like to get to the bottom of this, because it's not like the page just "doesn't load" at all; it loads and i can only do certain things; as if certain functionality is disabled (might be url requests for specific urls etc). do i need to manage this in my control panel? i wouldn't assume so, because the site loads; just has a crippled functionality from within the swf. i was thinking it might have more to do with my crossdomain.xml file; could this be the case? thanks for any tips or suggestions.

Read the article

< Previous Page | 228 229 230 231 232 233 234 235 236 237 238 239 | Next Page >