darryl lawrence - Developer IT

Programmez un jeu pour téléphone portable avec MIDlet Pascal, par Darryl Kpizingui

Darryl Kpizingui signe, avec ce tutoriel, le troisième volet de la série d'articles consacrés au compilateur MIDlet Pascal, qui permet de créer des programmes pour téléphones portables. Il nous guide pas à pas dans la réalisation d'un logiciel complet, un jeu de Mario Sokoban. Vous avez à présent toutes les cartes en main pour vous lancer dans la réalisation de vos propres logiciels pour téléphones portables !

Read the article

Infrastructure and Platform As A Service in Private Cloud at Lawrence Livermore National Laboratory

- by Anand Akela

Scientists at the National Ignition Facility (NIF)— the world’s largest laser, at the Lawrence Livermore National Laboratory (LLNL)— need research environment that requires re-creating the physical environment and conditions that exist inside the sun. They have built private cloud infrastructure using Oracle VM and Oracle Enterprise Manager 12c to provision such an environment for research. Tim Frazier of LLNL joined the "Managing Your Private Cloud With Oracle Enterprise Manager' session at Oracle Open World 2012 and discussed how the latest features in Oracle VM and Oracle Enterprise Manager 12c enables them to accelerate application provisioning in their private cloud. He also talked about how to increase service delivery agility, improve standardized roll outs, and do proactive management to gain total control of the private cloud environment. He also presented at the "Scene and Be Heard Theater" at Oracle OpenWorld 2012 and shared a lot of good information about his project and what they are doing in their private cloud environment. Learn more by looking at Tim's presentation .

Read the article

Utiliser le format d'échange de données JSON sous Free Pascal avec la librairie fcl-json, par Darryl

JSON est un format d'échange de données tout comme le XML. Il présente l'avantage d'être très léger comparé à ce dernier, ce qui fait qu'il est surtout présent dans les applications web. Cet article rédigé par Darryl Kpizingui vous présente les bases nécessaires pour comprendre JSON et aussi pour apprendre à utiliser la librairie fcl-json de Free Pascal.

Read the article

dell vostro 1000 broadcom wireless connection

- by lorrenuy

I have a problem with the hardware broadcom wifi. I press the hotkey fn+f2 to activate the hardware and this will not work. I'll look at the drivers but it appears to be installed. How can I solve this problem? Ubuntu is all new to me so if possible, give me a clear explanation. Now do I connect the lan cable. I use the Ubuntu 11.10 lawrence@lawrence-Vostro-1000:~$ sudo lshw -class network [sudo] password for lawrence: PCI (sysfs) *-network description: Network controller product: BCM4311 802.11b/g WLAN vendor: Broadcom Corporation physical id: 0 bus info: pci@0000:05:00.0 version: 01 width: 32 bits clock: 33MHz capabilities: pm msi pciexpress bus_master cap_list configuration: driver=b43-pci-bridge latency=0 resources: irq:18 memory:c0200000-c0203fff *-network description: Ethernet interface product: BCM4401-B0 100Base-TX vendor: Broadcom Corporation physical id: 0 bus info: pci@0000:08:00.0 logical name: eth1 version: 02 serial: 00:1c:23:a2:b9:a9 size: 100Mbit/s capacity: 100Mbit/s width: 32 bits clock: 33MHz capabilities: pm bus_master cap_list ethernet physical mii 10bt 10bt-fd 100bt 100bt-fd autonegotiation configuration: autonegotiation=on broadcast=yes driver=b44 driverversion=2.0 duplex=full ip=192.168.1.18 latency=64 link=yes multicast=yes port=twisted pair speed=100Mbit/s resources: irq:21 memory:c0300000-c0301fff lawrence@lawrence-Vostro-1000:~$ lawrence@lawrence-Vostro-1000:~$ rfkill list all 0: dell-wifi: Wireless LAN Soft blocked: yes Hard blocked: yes

Read the article

java.net.SocketException: Connection reset

- by Darryl

I am getting the following error trying to read from a socket. I'm doing a readInt() on that InputStream, and I am getting this error. Perusing the documentation this suggests that the client part of the connection closed the connection. In this scenario, I am the server. I have access to the client log files and it is not closing the connection, and in fact its log files suggest I am closing the connection. So does anybody have an idea why this is happening? What else to check for? Does this arise when there are local resources that are perhaps reaching thresholds? Many thanks, Darryl

Read the article

On which crowdsourced design site have you the best experience? (ie, crowdspring, mycroburst, etc)

- by Darryl Hein

I wasn't sure which site to ask this on (as Graphic Design hasn't reached beta yet), so I thought I would try here. I'm looking to have a couple logos and website designs done. I've had some great local designers, but each one has moved or gone else where so I keep having to look for new designers. My thought and realization in the last couple days is to go to a crowdsourced design site like crowdspring.com or mycroburst.com. Both of these sites look good, but I'm wondering what else is out there? Are there better ones and how have your experiences been them?

Read the article

UBUNTU 12.04 Networking

- by Fr. Lawrence winslow

I have installed UBUNTU 12.04 on my Athlon 64-bit Desktop (2GB Memory) in a dual-boot with Win XP. The machine is hardwired to my D-Link N router which uses a DSL Modem to access the Internet. My Home Network was established in the router via the XP machine. When I boot UBUNTU I am able to access the Internet via this set-up but I can neither access the home network nor see the three other laptop computers (wireless access) on the network. These others are running Win Vista, Win 7, and UBUNTU 10.04 (this latter accesses the home network normally). All are visible and accessible when XP is running. I would appreciate thoughts as I am seriously thinking about changing all the computers to UBUNTU but do need to be able to operate the network.

Read the article

Atmospheric Scattering

- by Lawrence Kok

I'm trying to implement atmospheric scattering based on Sean O`Neil algorithm that was published in GPU Gems 2. But I have some trouble getting the shader to work. My latest attempts resulted in: http://img253.imageshack.us/g/scattering01.png/ I've downloaded sample code of O`Neil from: http://http.download.nvidia.com/developer/GPU_Gems_2/CD/Index.html. Made minor adjustments to the shader 'SkyFromAtmosphere' that would allow it to run in AMD RenderMonkey. In the images it is see-able a form of banding occurs, getting an blueish tone. However it is only applied to one half of the sphere, the other half is completely black. Also the banding appears to occur at Zenith instead of Horizon, and for a reason I managed to get pac-man shape. I would appreciate it if somebody could show me what I'm doing wrong. Vertex Shader: uniform mat4 matView; uniform vec4 view_position; uniform vec3 v3LightPos; const int nSamples = 3; const float fSamples = 3.0; const vec3 Wavelength = vec3(0.650,0.570,0.475); const vec3 v3InvWavelength = 1.0f / vec3( Wavelength.x * Wavelength.x * Wavelength.x * Wavelength.x, Wavelength.y * Wavelength.y * Wavelength.y * Wavelength.y, Wavelength.z * Wavelength.z * Wavelength.z * Wavelength.z); const float fInnerRadius = 10; const float fOuterRadius = fInnerRadius * 1.025; const float fInnerRadius2 = fInnerRadius * fInnerRadius; const float fOuterRadius2 = fOuterRadius * fOuterRadius; const float fScale = 1.0 / (fOuterRadius - fInnerRadius); const float fScaleDepth = 0.25; const float fScaleOverScaleDepth = fScale / fScaleDepth; const vec3 v3CameraPos = vec3(0.0, fInnerRadius * 1.015, 0.0); const float fCameraHeight = length(v3CameraPos); const float fCameraHeight2 = fCameraHeight * fCameraHeight; const float fm_ESun = 150.0; const float fm_Kr = 0.0025; const float fm_Km = 0.0010; const float fKrESun = fm_Kr * fm_ESun; const float fKmESun = fm_Km * fm_ESun; const float fKr4PI = fm_Kr * 4 * 3.141592653; const float fKm4PI = fm_Km * 4 * 3.141592653; varying vec3 v3Direction; varying vec4 c0, c1; float scale(float fCos) { float x = 1.0 - fCos; return fScaleDepth * exp(-0.00287 + x*(0.459 + x*(3.83 + x*(-6.80 + x*5.25)))); } void main( void ) { // Get the ray from the camera to the vertex, and its length (which is the far point of the ray passing through the atmosphere) vec3 v3FrontColor = vec3(0.0, 0.0, 0.0); vec3 v3Pos = normalize(gl_Vertex.xyz) * fOuterRadius; vec3 v3Ray = v3CameraPos - v3Pos; float fFar = length(v3Ray); v3Ray = normalize(v3Ray); // Calculate the ray's starting position, then calculate its scattering offset vec3 v3Start = v3CameraPos; float fHeight = length(v3Start); float fDepth = exp(fScaleOverScaleDepth * (fInnerRadius - fCameraHeight)); float fStartAngle = dot(v3Ray, v3Start) / fHeight; float fStartOffset = fDepth*scale(fStartAngle); // Initialize the scattering loop variables float fSampleLength = fFar / fSamples; float fScaledLength = fSampleLength * fScale; vec3 v3SampleRay = v3Ray * fSampleLength; vec3 v3SamplePoint = v3Start + v3SampleRay * 0.5; // Now loop through the sample rays for(int i=0; i<nSamples; i++) { float fHeight = length(v3SamplePoint); float fDepth = exp(fScaleOverScaleDepth * (fInnerRadius - fHeight)); float fLightAngle = dot(normalize(v3LightPos), v3SamplePoint) / fHeight; float fCameraAngle = dot(normalize(v3Ray), v3SamplePoint) / fHeight; float fScatter = (-fStartOffset + fDepth*( scale(fLightAngle) - scale(fCameraAngle)))/* 0.25f*/; vec3 v3Attenuate = exp(-fScatter * (v3InvWavelength * fKr4PI + fKm4PI)); v3FrontColor += v3Attenuate * (fDepth * fScaledLength); v3SamplePoint += v3SampleRay; } // Finally, scale the Mie and Rayleigh colors and set up the varying variables for the pixel shader vec4 newPos = vec4( (gl_Vertex.xyz + view_position.xyz), 1.0); gl_Position = gl_ModelViewProjectionMatrix * vec4(newPos.xyz, 1.0); gl_Position.z = gl_Position.w * 0.99999; c1 = vec4(v3FrontColor * fKmESun, 1.0); c0 = vec4(v3FrontColor * (v3InvWavelength * fKrESun), 1.0); v3Direction = v3CameraPos - v3Pos; } Fragment Shader: uniform vec3 v3LightPos; varying vec3 v3Direction; varying vec4 c0; varying vec4 c1; const float g =-0.90f; const float g2 = g * g; const float Exposure =2; void main(void){ float fCos = dot(normalize(v3LightPos), v3Direction) / length(v3Direction); float fMiePhase = 1.5 * ((1.0 - g2) / (2.0 + g2)) * (1.0 + fCos*fCos) / pow(1.0 + g2 - 2.0*g*fCos, 1.5); gl_FragColor = c0 + fMiePhase * c1; gl_FragColor.a = 1.0; }

Read the article

Disk failing on dell mini, are there diagnostic tools?

- by John Lawrence Aspden

Hi, my dell mini 10v running Ubuntu 10.10 runs ok for hours, but then will suddenly slow down drastically. Switching to a console shows lots of error messages, which also get into /var/log/syslog. These errors happen every couple of seconds. I'm figuring that the disk is failing, but is there anyway to be sure, and can laptop disks be replaced easily? Mar 20 11:08:12 dell-mini kernel: [ 2476.378774] ata1.00: status: { DRDY ERR } Mar 20 11:08:12 dell-mini kernel: [ 2476.378785] ata1.00: error: { UNC } Mar 20 11:08:12 dell-mini kernel: [ 2476.449841] ata1.00: configured for UDMA/133 Mar 20 11:08:12 dell-mini kernel: [ 2476.449887] ata1: EH complete Mar 20 11:08:14 dell-mini kernel: [ 2478.777754] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 Mar 20 11:08:14 dell-mini kernel: [ 2478.777976] ata1.00: BMDMA stat 0x24 Mar 20 11:08:14 dell-mini kernel: [ 2478.778059] ata1.00: failed command: READ DMA Mar 20 11:08:14 dell-mini kernel: [ 2478.778162] ata1.00: cmd c8/00:08:43:3b:97/00:00:00:00:00/e0 tag 0 dma 4096 in Mar 20 11:08:14 dell-mini kernel: [ 2478.778166] res 51/40:00:47:3b:97/00:00:00:00:00/00 Emask 0x9 (media error)

Read the article

How to fix the Dell Mini 10v (1010) touchpad with Ubuntu 12.04.1

- by John Lawrence Aspden

I've just installed 12.04.1 from scratch on my Mini 10v, and the touchpad is behaving a bit oddly. It's got buttons integrated into the touchpad, and it looks as though a touch on a button is also being registered as a touch on the touchpad, causing all sorts of problems with clicking and dragging. Is there a fix for this? I seem to remember on very old versions of ubuntu having to disable the lower part of the touchpad, but I don't think it's been an issue for some time.

Read the article

Versioning millions of files with distributed SCM

- by C. Lawrence Wenham

I'm looking into the feasibility of using off-the-shelf distributed SCMs such as Git or Mercurial to manage millions of XML files. Each file would be a commercial transaction, such as a purchase order, that would be updated perhaps 10 times during the lifecycle of the transaction until it is "done" and changes no more. And by "manage", I mean that the SCM would be used to not just version the files, but also to replicate them to other machines for redundancy and transfer of IP. Lets suppose, for the sake of example, that a goal is to provide good performance if it was handling the volume of orders that Amazon.com claimed to have at its peak in December 2010: about 150,000 orders per minute. We're expecting the system to be distributed over many servers in order to get reasonable performance. We're also planning to use solid-state drives exclusively. There is a reason why we don't want to use an RDBMS for primary storage, but it's a bit beyond the scope of this question. Does anyone have first-hand experience with the performance of distributed SCMs under such a load, and what strategies were used? Open-source preferred, since the final product is to be FOSS, too.

Read the article

Efficient inline templates and C++

- by Darryl Gove

I've talked before about calling inline templates from C++, I've also talked about calling inline templates efficiently. This time I want to talk about efficiently calling inline templates from C++. The obvious starting point is that I need to declare the inline templates as being extern "C": extern "C" { int mytemplate(int); } This enables us to call it, but the call may not be very efficient because the compiler will treat it as a function call, and may produce suboptimal code based on that premise. So we need to add the no_side_effect pragma: extern "C" { int mytemplate(int); #pragma no_side_effect(mytemplate) } However, this may still not produce optimal code. We've discussed how the no_side_effect pragma cannot be combined with exceptions, well we know that the code cannot produce exceptions, but the compiler doesn't know that. If we tell the compiler that information it may be able to produce even better code. We can do this by adding the "throw()" keyword to the template declaration: extern "C" { int mytemplate(int) throw(); #pragma no_side_effect(mytemplate) } The following is an example of how these changes might improve performance. We can take our previous example code and migrate it to C++, adding the use of a try...catch construct: #include <iostream extern "C" { int lzd(int); #pragma no_side_effect(lzd) } int a; int c=0; class myclass { int routine(); }; int myclass::routine() { try { for(a=0; a<1000; a++) { c=lzd(c); } } catch(...) { std::cout << "Something happened" << std::endl; } return 0; } Compiling this produces a slightly suboptimal code sequence in the hot loop: $ CC -O -xtarget=T4 -S t.cpp t.il ... /* 0x0014 23 */ lzd %o0,%o0 /* 0x0018 21 */ add %l6,1,%l6 /* 0x001c */ cmp %l6,1000 /* 0x0020 */ bl,pt %icc,.L77000033 /* 0x0024 23 */ st %o0,[%l7] There's a store in the delay slot of the branch, so we're repeatedly storing data back to memory. If we change the function declaration to include "throw()", we get better code: $ CC -O -xtarget=T4 -S t.cpp t.il ... /* 0x0014 21 */ add %i1,1,%i1 /* 0x0018 23 */ lzd %o0,%o0 /* 0x001c 21 */ cmp %i1,999 /* 0x0020 */ ble,pt %icc,.L77000019 /* 0x0024 */ nop The store has gone, but the code is still suboptimal - there's a nop in the delay slot rather than useful work. However, it's good enough for this example. The point I'm making is that the compiler produces the better code with both the "throw()" and the no side effect pragma.

Read the article

CON6714 - Mixed-Language Development: Leveraging Native Code from Java

- by Darryl Gove

Here's the abstract from my JavaOne talk: There are some situations in which it is necessary to call native code (C/C++ compiled code) from Java applications. This session describes how to do this efficiently and how to performance-tune the resulting applications. The objectives for the session are: Explain reasons for using native code in Java applications Describe pitfalls of calling native code from Java Discuss performance-tuning of Java apps that use native code I'll cover how to call native code from Java, debugging native code, and then I'll dig into performance tuning the code. The talk is not going too deep on performance tuning - focusing on the JNI specific topics; I'll do a bit more about performance tuning in my OpenWorld talk later in the day.

Read the article

Architectural Composition Languages

- by C. Lawrence Wenham

Recently stumbled upon this paper (PDF) talking about ACLs, or Architectural Composition Languages. They're a fusion of two earlier lines of research: Architectural Definition Languages (such as UML) and Object Composition Languages (such as XAML, WWF, or scripting languages). The goal of an ACL is to have a high-level description of a program's architecture which can also be compiled into a runnable program. The high-level description assists automated analysis, while the 'executability' means changes can be tested immediately. You would still author the components of the program in a conventional programming language (C, Java, Python, etc), but they would be composed into a complete program by the ACL. One of the expected benefits is that a program can be ported to a different platform by swapping in "similar but different" components. I've been hankering for something like this for a long time (see this answer I gave on a StackOverflow question a few years ago). The paper mentions that the researchers were working on a language called ACL/1 that initially targeted Java, but would be ported to support .Net as well. However, I can't find any more mention of ACL/1 anywhere. Has there been any more work done on this? Are there any other implementations of the ACL concept that are available for use or experimentation?

Read the article

Current SPARC Architectures

- by Darryl Gove

Different generations of SPARC processors implement different architectures. The architecture that the compiler targets is controlled implicitly by the -xtarget flag and explicitly by the -arch flag. If an application targets a recent architecture, then the compiler gets to play with all the instructions that the new architecture provides. The downside is that the application won't work on older processors that don't have the new instructions. So for developer's there is a trade-off between performance and portability. The way we have solved this in the compiler is to assume a "generic" architecture, and we've made this the default behaviour of the compiler. The only flag that doesn't make this assumption is -fast which tells the compiler to assume that the build machine is also the deployment machine - so the compiler can use all the instructions that the build machine provides. The -xtarget=generic flag tells the compiler explicitly to use this generic model. We work hard on making generic code work well across all processors. So in most cases this is a very good choice. It is also of interest to know what processors support the various architectures. The following Venn diagram attempts to show this: A textual description is as follows: The T1 and T2 processors, in addition to most other SPARC processors that were shipped in the last 10+ years supported V9b, or sparcvis2. The SPARC64 processors from Fujitsu, used in the M-series machines, added support for the floating point multiply accumulate instruction in the sparcfmaf architecture. Support for this instruction also appeared in the T3 - this is called sparcvis3 Later SPARC64 processors added the integer multiply accumulate instruction, this architecture is sparcima. Finally the T4 includes support for both the integer and floating point multiply accumulate instructions in the sparc4 architecture. So the conclusion should be: Floating point multiply accumulate is supported in both the T-series and M-series machines, so it should be a relatively safe bet to start using it. The T4 is a very good machine to deploy to because it supports all the current instruction sets.

Read the article

Post 12.04 Update, stuck on splash screen

- by Lawrence

I updated to 12.04 a couple of weeks ago and I haven't started up Ubuntu until now. On start up the computer gets stuck on the splash screen. I am a beginner in all of this linux mechanics. I've seen many people post about relatively the same problem but I have a hard time following. I am using Wubi and running it along side Windows Starter on a Toshiba netbook. Thanks for bearing with my unfamiliarity haha,

Read the article

Compiling for T4

- by Darryl Gove

I've recently had quite a few queries about compiling for T4 based systems. So it's probably a good time to review what I consider to be the best practices. Always use the latest compiler. Being in the compiler team, this is bound to be something I'd recommend But the serious points are that (a) Every release the tools get better and better, so you are going to be much more effective using the latest release (b) Every release we improve the generated code, so you will see things get better (c) Old releases cannot know about new hardware. Always use optimisation. You should use at least -O to get some amount of optimisation. -xO4 is typically even better as this will add within-file inlining. Always generate debug information, using -g. This allows the tools to attribute information to lines of source. This is particularly important when profiling an application. The default target of -xtarget=generic is often sufficient. This setting is designed to produce a binary that runs well across all supported platforms. If the binary is going to be deployed on only a subset of architectures, then it is possible to produce a binary that only uses the instructions supported on these architectures, which may lead to some performance gains. I've previously discussed which chips support which architectures, and I'd recommend that you take a look at the chart that goes with the discussion. Crossfile optimisation (-xipo) can be very useful - particularly when the hot source code is distributed across multiple source files. If you're allowed to have something as geeky as favourite compiler optimisations, then this is mine! Profile feedback (-xprofile=[collect: | use:]) will help the compiler make the best code layout decisions, and is particularly effective with crossfile optimisations. But what makes this optimisation really useful is that codes that are dominated by branch instructions don't typically improve much with "traditional" compiler optimisation, but often do respond well to being built with profile feedback. The macro flag -fast aims to provide a one-stop "give me a fast application" flag. This usually gives a best performing binary, but with a few caveats. It assumes the build platform is also the deployment platform, it enables floating point optimisations, and it makes some relatively weak assumptions about pointer aliasing. It's worth investigating. SPARC64 processor, T3, and T4 implement floating point multiply accumulate instructions. These can substantially improve floating point performance. To generate them the compiler needs the flag -fma=fused and also needs an architecture that supports the instruction (at least -xarch=sparcfmaf). The most critical advise is that anyone doing performance work should profile their application. I cannot overstate how important it is to look at where the time is going in order to determine what can be done to improve it. I also presented at Oracle OpenWorld on this topic, so it might be helpful to review those slides.

Read the article

Pragmas and exceptions

- by Darryl Gove

The compiler pragmas: #pragma no_side_effect(routinename) #pragma does_not_write_global_data(routinename) #pragma does_not_read_global_data(routinename) are used to tell the compiler more about the routine being called, and enable it to do a better job of optimising around the routine. If a routine does not read global data, then global data does not need to be stored to memory before the call to the routine. If the routine does not write global data, then global data does not need to be reloaded after the call. The no side effect directive indicates that the routine does no I/O, does not read or write global data, and the result only depends on the input. However, these pragmas should not be used on routines that throw exceptions. The following example indicates the problem: #include <iostream extern "C" { int exceptional(int); #pragma no_side_effect(exceptional) } int exceptional(int a) { if (a==7) { throw 7; } else { return a+1; } } int a; int c=0; class myclass { public: int routine(); }; int myclass::routine() { for(a=0; a<1000; a++) { c=exceptional(c); } return 0; } int main() { myclass f; try { f.routine(); } catch(...) { std::cout << "Something happened" << a << c << std::endl; } } The routine "exceptional" is declared as having no side effects, however it can throw an exception. The no side effects directive enables the compiler to avoid storing global data back to memory, and retrieving it after the function call, so the loop containing the call to exceptional is quite tight: $ CC -O -S test.cpp ... .L77000061: /* 0x0014 38 */ call exceptional ! params = %o0 ! Result = %o0 /* 0x0018 36 */ add %i1,1,%i1 /* 0x001c */ cmp %i1,999 /* 0x0020 */ ble,pt %icc,.L77000061 /* 0x0024 */ nop However, when the program is run the result is incorrect: $ CC -O t.cpp $ ./a.out Something happend00 If the code had worked correctly, the output would have been "Something happened77" - the exception occurs on the seventh iteration. Yet, the current code produces a message that uses the original values for the variables 'a' and 'c'. The problem is that the exception handler reads global data, and due to the no side effects directive the compiler has not updated the global data before the function call. So these pragmas should not be used on routines that have the potential to throw exceptions.

Read the article

Inline template efficiency

- by Darryl Gove

I like inline templates, and use them quite extensively. Whenever I write code with them I'm always careful to check the disassembly to see that the resulting output is efficient. Here's a potential cause of inefficiency. Suppose we want to use the mis-named Leading Zero Detect (LZD) instruction on T4 (this instruction does a count of the number of leading zero bits in an integer register - so it should really be called leading zero count). So we put together an inline template called lzd.il looking like: .inline lzd lzd %o0,%o0 .end And we throw together some code that uses it: int lzd(int); int a; int c=0; int main() { for(a=0; a<1000; a++) { c=lzd(c); } return 0; } We compile the code with some amount of optimisation, and look at the resulting code: $ cc -O -xtarget=T4 -S lzd.c lzd.il $ more lzd.s .L77000018: /* 0x001c 11 */ lzd %o0,%o0 /* 0x0020 9 */ ld [%i1],%i3 /* 0x0024 11 */ st %o0,[%i2] /* 0x0028 9 */ add %i3,1,%i0 /* 0x002c */ cmp %i0,999 /* 0x0030 */ ble,pt %icc,.L77000018 /* 0x0034 */ st %i0,[%i1] What is surprising is that we're seeing a number of loads and stores in the code. Everything could be held in registers, so why is this happening? The problem is that the code is only inlined at the code generation stage - when the actual instructions are generated. Earlier compiler phases see a function call. The called functions can do all kinds of nastiness to global variables (like 'a' in this code) so we need to load them from memory after the function call, and store them to memory before the function call. Fortunately we can use a #pragma directive to tell the compiler that the routine lzd() has no side effects - meaning that it does not read or write to memory. The directive to do that is #pragma no_side_effect(<routine name), and it needs to be placed after the declaration of the function. The new code looks like: int lzd(int); #pragma no_side_effect(lzd) int a; int c=0; int main() { for(a=0; a<1000; a++) { c=lzd(c); } return 0; } Now the loop looks much neater: /* 0x0014 10 */ add %i1,1,%i1 ! 11 ! { ! 12 ! c=lzd(c); /* 0x0018 12 */ lzd %o0,%o0 /* 0x001c 10 */ cmp %i1,999 /* 0x0020 */ ble,pt %icc,.L77000018 /* 0x0024 */ nop

Read the article

Library order is important

- by Darryl Gove

I've written quite extensively about link ordering issues, but I've not discussed the interaction between archive libraries and shared libraries. So let's take a simple program that calls a maths library function: #include <math.h int main() { for (int i=0; i<10000000; i++) { sin(i); } } We compile and run it to get the following performance: bash-3.2$ cc -g -O fp.c -lm bash-3.2$ timex ./a.out real 6.06 user 6.04 sys 0.01 Now most people will have heard of the optimised maths library which is added by the flag -xlibmopt. This contains optimised versions of key mathematical functions, in this instance, using the library doubles performance: bash-3.2$ cc -g -O -xlibmopt fp.c -lm bash-3.2$ timex ./a.out real 2.70 user 2.69 sys 0.00 The optimised maths library is provided as an archive library (libmopt.a), and the driver adds it to the link line just before the maths library - this causes the linker to pick the definitions provided by the static library in preference to those provided by libm. We can see the processing by asking the compiler to print out the link line: bash-3.2$ cc -### -g -O -xlibmopt fp.c -lm /usr/ccs/bin/ld ... fp.o -lmopt -lm -o a.out... The flag to the linker is -lmopt, and this is placed before the -lm flag. So what happens when the -lm flag is in the wrong place on the command line: bash-3.2$ cc -g -O -xlibmopt -lm fp.c bash-3.2$ timex ./a.out real 6.02 user 6.01 sys 0.01 If the -lm flag is before the source file (or object file for that matter), we get the slower performance from the system maths library. Why's that? If we look at the link line we can see the following ordering: /usr/ccs/bin/ld ... -lmopt -lm fp.o -o a.out So the optimised maths library is still placed before the system maths library, but the object file is placed afterwards. This would be ok if the optimised maths library were a shared library, but it is not - instead it's an archive library, and archive library processing is different - as described in the linker and library guide: "The link-editor searches an archive only to resolve undefined or tentative external references that have previously been encountered." An archive library can only be used resolve symbols that are outstanding at that point in the link processing. When fp.o is placed before the libmopt.a archive library, then the linker has an unresolved symbol defined in fp.o, and it will search the archive library to resolve that symbol. If the archive library is placed before fp.o then there are no unresolved symbols at that point, and so the linker doesn't need to use the archive library. This is why libmopt needs to be placed after the object files on the link line. On the other hand if the linker has observed any shared libraries, then at any point these are checked for any unresolved symbols. The consequence of this is that once the linker "sees" libm it will resolve any symbols it can to that library, and it will not check the archive library to resolve them. This is why libmopt needs to be placed before libm on the link line. This leads to the following order for placing files on the link line: Object files Archive libraries Shared libraries If you use this order, then things will consistently get resolved to the archive libraries rather than to the shared libaries.

Read the article

It could be worse....

- by Darryl Gove

As "guest" pointed out, in my file I/O test I didn't open the file with O_SYNC, so in fact the time was spent in OS code rather than in disk I/O. It's a straightforward change to add O_SYNC to the open() call, but it's also useful to reduce the iteration count - since the cost per write is much higher: ... #define SIZE 1024 void test_write() { starttime(); int file = open("./test.dat",O_WRONLY|O_CREAT|O_SYNC,S_IWGRP|S_IWOTH|S_IWUSR); ... Running this gave the following results: Time per iteration 0.000065606310 MB/s Time per iteration 2.709711563906 MB/s Time per iteration 0.178590114758 MB/s Yup, disk I/O is way slower than the original I/O calls. However, it's not a very fair comparison since disks get written in large blocks of data and we're deliberately sending a single byte. A fairer result would be to look at the I/O operations per second; which is about 65 - pretty much what I'd expect for this system. It's also interesting to examine at the profiles for the two cases. When the write() was trapping into the OS the profile indicated that all the time was being spent in system. When the data was being written to disk, the time got attributed to sleep. This gives us an indication how to interpret profiles from apps doing I/O. It's the sleep time that indicates disk activity.

Read the article

Monday, 1st October: Presenting at JavaOne and Oracle Open World

- by Darryl Gove

On Monday 1 October I will be presenting at both JavaOne and Oracle Open World. The full conference schedule is available from here. The logistics for my sessions are as follows: JavaOne: 8:30am Monday 1 October. CON6714: "Mixed-Language Development: Leveraging Native Code from Java". San Francisco Hilton - Continental Ballroom 6 Oracle OpenWorld: 10:45am Monday 1 October. CON6382: "Maximizing Your SPARC T4 Oracle Solaris Application Performance". Marriott Marquis - Golden Gate C3 Hope to see you there!

Read the article

Providing feedback on the Solaris Studio 12.4 Beta

- by Darryl Gove

Obviously, the point of the Solaris Studio 12.4 Beta programme was for everyone to try out the new version of the compiler and tools, and for us to gather feedback on what was working, what was broken, and what was missing. We've had lots of useful feedback - you can see some of it on the forums. But we're after more. Hence we have a Solaris Studio 12.4 Beta survey where you can tell us more about your experiences. Your comments are really helpful to us. Thanks.

Read the article

Write and fprintf for file I/O

- by Darryl Gove

fprintf() does buffered I/O, where as write() does unbuffered I/O. So once the write() completes, the data is in the file, whereas, for fprintf() it may take a while for the file to get updated to reflect the output. This results in a significant performance difference - the write works at disk speed. The following is a program to test this: #include <fcntl.h #include <unistd.h #include <stdio.h #include <stdlib.h #include <errno.h #include <stdio.h #include <sys/time.h #include <sys/types.h #include <sys/stat.h static double s_time; void starttime() { s_time=1.0*gethrtime(); } void endtime(long its) { double e_time=1.0*gethrtime(); printf("Time per iteration %5.2f MB/s\n", (1.0*its)/(e_time-s_time*1.0)*1000); s_time=1.0*gethrtime(); } #define SIZE 10*1024*1024 void test_write() { starttime(); int file = open("./test.dat",O_WRONLY|O_CREAT,S_IWGRP|S_IWOTH|S_IWUSR); for (int i=0; i<SIZE; i++) { write(file,"a",1); } close(file); endtime(SIZE); } void test_fprintf() { starttime(); FILE* file = fopen("./test.dat","w"); for (int i=0; i<SIZE; i++) { fprintf(file,"a"); } fclose(file); endtime(SIZE); } void test_flush() { starttime(); FILE* file = fopen("./test.dat","w"); for (int i=0; i<SIZE; i++) { fprintf(file,"a"); fflush(file); } fclose(file); endtime(SIZE); } int main() { test_write(); test_fprintf(); test_flush(); } Compiling and running I get 0.2MB/s for write() and 6MB/s for fprintf(). A large difference. There's three tests in this example, the third test uses fprintf() and fflush(). This is equivalent to write() both in performance and in functionality. Which leads to the suggestion that fprintf() (and other buffering I/O functions) are the fastest way of writing to files, and that fflush() should be used to enforce synchronisation of the file contents.

Read the article

SPARC Architecture 2011

- by Darryl Gove

With what appears to be minimal fanfare, an update of the SPARC Architecture has been released. If you ever look at SPARC disassembly code, then this is the document that you need to bookmark. If you are not familiar with it, then it basically describes how a SPARC processor should behave - it doesn't describe a particular implementation, just the "generic" processor. As with all revisions, it supercedes the SPARC v9 book published back in the 90s, having both corrections, and definitions of new instructions. Anyway, should be an interesting read

Search Results

Search found 157 results on 7 pages for 'darryl lawrence'.

Page 1/7 | 1 2 3 4 5 6 7 | Next Page >

- by Anand Akela

- by lorrenuy

- by Darryl

- by Darryl Hein

- by Fr. Lawrence winslow

- by Lawrence Kok

- by John Lawrence Aspden

- by John Lawrence Aspden

- by C. Lawrence Wenham

- by Darryl Gove

- by Darryl Gove

- by C. Lawrence Wenham

- by Darryl Gove

- by Lawrence

- by Darryl Gove

- by Darryl Gove

- by Darryl Gove

- by Darryl Gove

- by Darryl Gove

- by Darryl Gove

- by Darryl Gove

- by Darryl Gove

- by Darryl Gove

1 2 3 4 5 6 7 | Next Page >