processors - Page 9 - Developer IT

What is the laptop that FULLY compatible with Ubuntu?

- by user63187

i have met alot of problems with all ubuntu releases on my current laptop (DELL inspiron N4030), such as: out loud fan low battery life screen brightness returns to maximum after every startup weak support for my VGA (Mobility Radeon HD 5430 Series ATI card) ubuntu is probably not scaling my processors correctly sometimes ubuntu hangs after return from suspend sometimes ubuntu doesn't shutdown correctly Now, it's time to change my laptop for good, Can you HELP me to choose a laptop that FULLY compatible with ubuntu releases, please notice that Not every laptop are available in libyan markets.

Read the article

Lenovo Server Stable Growing

Server Snapshot: In the past year, Lenovo launched its second generation of ThinkServers, equipped with more memory and storage capacity, the latest Intel processors, and virtualization capabilities. Yet the company remains in the "other" category in the various server measurement surveys.

Read the article

Lenovo Server Stable Growing

Server Snapshot: In the past year, Lenovo launched its second generation of ThinkServers, equipped with more memory and storage capacity, the latest Intel processors, and virtualization capabilities. Yet the company remains in the "other" category in the various server measurement surveys.

Read the article

IBM Adds Nvidia GPUs to High-Performance Servers

GPU processors will be used as high-performance calculation engines in number-crunching tasks. Have GPUs finally arrived in HPC?

Read the article

IBM System x Server Buyer's Guide

IBM has taken to the road with the message that Intel's Nehalem EX processors coupled with Big Blue's system engineering talents has resulted in a platform well-suited for virtualization, consolidation and mission-critical applications. Does the server hardware live up to the praise?

Read the article

IBM System x Server Buyer's Guide

IBM has taken to the road with the message that Intel's Nehalem EX processors coupled with Big Blue's system engineering talents has resulted in a platform well-suited for virtualization, consolidation and mission-critical applications. Does the server hardware live up to the praise?

Read the article

AMD Unleashes Six-Core Desktop CPU

Hardware Central: "AMD today announced the availability of a new six-core desktop processor and platform to accompany it, which includes a new chipset and support for hobbyists who like to tweak their processors to the limits of their heat sink and warranty."

Read the article

AMD Unveils Pricing, Performance for Opteron 6000 Series

Hardware Central: "AMD has formally released the "Magny-Cours" line of eight- and 12-core Opteron server processors, disclosing the speeds and feeds and promising performance-per-watt that beats Intel's newest Xeon."

Read the article

Is Oracle Ready to Dump AMD From Sun's x86 Server Lineup?

Sun traditionally offered both AMD and Intel processors in its x86 server lineup. Now, it's looking like AMD's Opteron won't make the cut under Oracle's ownership.

Read the article

Intel to Unleash Atom-ic Power at Computex

Hardware Central: "Intel plans to introduce a series of new Atom processors at the opening of the giant Computex show in Taipei this week, as well as offer a preview a number of other offerings. But Atom will be the star of the show."

Read the article

Kernel Log: Linux 2.6.34 goes into testing

The H Open: "Improvements include graphics drivers for recent Radeon GPUs and for the graphics cores of some Intel processors that are only expected to be released early next year. Another new addition is the LogFS SSD file system."

Read the article

Review: AbiWord: The Underappreciated Word Processor

OpenOffice.org isn't the only game in town for open source word processing. One of the best, and underexposed, open word processors is AbiWord.

Read the article

Intel Exceeds Q1 Projections on Mobile Strength

That raft of new mobile processors in January sure paid off as the chip giant blows past all projections for a tremendous first quarter, signaling renewed growth for the IT sector at large.

Read the article

OpenCL 1.1 backward compatible, enhanced performance

Linux Magazine: "The Khronos Group today announced OpenCL 1.1, a backwards compatible update that boosts performance in the parallel programming standard. OpenCL is a free programming standard designed from the ground up to optimize coding in muliticore processors."

Read the article

AbiWord: The Underappreciated Word Processor

Linux Planet: "OpenOffice.org isn't the only game in town for open source word processing. One of the best, and underexposed, open word processors is AbiWord."

Read the article

Software Programming Research Promises Faster Applications

Information Week: "A new approach to memory management allows computer code to operate more efficiently on multicore processors and can reduce the overhead of security checks."

Read the article

Hyper-V Server 2012 with Zambezi AMD FX-Series - Hardware assisted virtualization not present

- by Vazgen

I'm trying to set up VDI across Windows Server 2012 VMs running on Hyper-V 2012. The wizard's compatibility check for the Virtualization Host server failed with "Hardware-assisted virtualization is not present on the server". I'm running an FX-8120 CPU and have the ASUS M5A97 motherboard. I know I'm supposed to enable No-Execute (Hyper-V Hardware Considerations) but I cannot find that or any other synonyms of it in my motherboards UEFI BIOS (NX, XD, EVP, XN... nothing). I found this: PAE/NX/SSE2 Support Requirement Guide for Windows 8 which in short says "Windows 8 and Windows Server 2012 requires that systems must have processors that support NX, and NX must be turned on for important security safeguards to function effectively and avoid potential security vulnerabilities." this leads me to believe NX is on by default if I was able to get this far and install Hyper-V 2012 and Windows Server 2012.. Also I tried to disable AVX in cmd with "bcdedit /set xsavedisable 1". Did not resolve My processor is Zambezi FX-8120 and also supports RVI/SLAT/other synonym: processor: Newegg Processor FX-8120 support proof: AMD Processors with Rapid Virtualization Indexing Required to Run Hyper-V in Windows 8 What's going on here? I bought this CPU specifically after I had the same problems with an older AMD Athelon II and made sure to buy one with AMD-V and RVI. Thank you

Read the article

Network connection to Firebird 2.1 became slow after upgrading to Ubuntu 10.04

- by lyle

We've got a setup that we're using for different clients : a program connecting to a Firebird server on a local network. So far we mostly used 32bit processors running Ubuntu LTS (recently upgraded to 10.04). Now we introduced servers running on 64bit processors, running Ubuntu 10.04 64bit. Suddenly some queries run slower than they used to. In short: running the query locally works fine on both 64bit and 32bit servers, but when running the same queries over the network the 64bit server is suddenly much slower. We did a few checks with both local and remote connections to both 64bit and 32bit servers, using identical databases and identical queries, running in Flamerobin. Running the query locally takes a negligible amount of time: 0.008s on the 64bit server, 0.014s on the 32bit servers. So the servers themselves are running fine. Running the queries over the network, the 64bit server suddenly needs up to 0.160s to respond, while the 32bit server responds in 0.055s. So the older servers are twice as fast over the network, in spite of the newer servers being twice as fast if run locally. Apart from that the setup is identical. All servers are running the same installation of Ubuntu 10.04, same version of Firebird and so on, the only difference is that some are 64 and some 32bit. Any idea?? I tried to google it, but I couldn't find any complains that Firebird 64bit is slower than Firebird 32bit, except that the Firebird 2.1 change log mentions that there's a new network API which is twice as fast, as soon as the drivers are updated to use it. So I could imagine that the 64bit driver is still using the old API, but that's a bit of a stretch, I guess. Thanx in advance for any replies! :)

Read the article

What is optimal hardware configuration for heavy load LAMP application

- by Piotr K.

I need to run Linux-Apache-PHP-MySQL application (Moodle e-learning platform) for a large number of concurrent users - I am aiming 5000 users. By concurrent I mean that 5000 people should be able to work with the application at the same time. "Work" means not only do database reads but writes as well. The application is not very typical, since it is doing a lot of inserts/updates on the database, so caching techniques are not helping to much. We are using InnoDB storage engine. In addition application is not written with performance in mind. For instance one Apache thread usually occupies about 30-50 MB of RAM. I would be greatful for information what hardware is needed to build scalable configuration that is able to handle this kind of load. We are using right now two HP DLG 380 with two 4 core processors which are able to handle much lower load (typically 300-500 concurrent users). Is it reasonable to invest in this kind of boxes and build cluster using them or is it better to go with some more high-end hardware? I am particularly curious how many and how powerful servers are needed (number of processors/cores, size of RAM) what network equipment should be used (what kind of switches, network cards) any other hardware, like particular disc storage solutions, etc, that are needed Another thing is how to put together everything, that is what is the most optimal architecture. Clustering with MySQL is rather hard (people are complaining about MySQL Cluster, even here on Stackoverflow).

Read the article

Two large, linked Excel files take 30 minutes to save, except in VMWare environment

- by Gerald L

I support some tax consultants who love to use Excel when they should probably be using Access. Anyway, they have created two Excel files, A and B. File B has cells linked to file A. File A is 27 MB and file B is 16 MB. One worksheet has roughly 1 million rows and there is another worksheet doing a whole bunch of SUMIF on the 1 million rows. Not the best idea, but whatever. Both Excel files open and recalculate within a reasonable amount of time (1-2 minutes). For a files that large, this is acceptable. Here is the problem: Once you change a cell, and save the file B, it takes a solid 30 minutes to save the file, and the processors are going full speed. I've tried this on 6 different machines, all running Windows XP SP3 with Office 2007 SP2 and all patches. The specs vary from one machine with 512 MB or RAM to a machine with 4 GB of RAM and quad processors. Same result every time. Here is the clincher: If I do this same save operation on a VMWare virtual machine, the file gets saved in 1 minute. I've tried this with my ESX servers at the office, my Mac Fusion at home, and VMWare workstation at the office. It does not matter how much RAM the virtual machine has... it saves in about 1 minute every time. Does anybody have any idea why this is happening and how to fix?

Read the article

Very uneven CPU utilization with SQL Server 2012 on 2 processor computer with 16 cores / processor

- by cooplarsh

After installing SQL Server Enterprise 2012 with the Server + Cal license model, on a computer with 2 processors each with 16 cores (and no hyperthreading involved) and putting the server under extremely heavy load the 16 cores on the first processor were very underutilized, the first 4 cores on the 2nd CPU were heavily utilized, and the last 12 cores were not used at all (because of the 20 core limit for this sql server version). Total CPU utilization was displaying as around 25%. Unfortunately, the server suffered from extremely poor performance even though if the tasks were evenly distributed across the 20 cores it wouldn't have been anywhere near as bad. The Windows Server was running on a VMWare virtual image under ESX Server, but all of the CPU was allocated to the windows server. We tried changing affinity settings (e.g., allocating most cores to CPU and the others to I/O), but that didn't help solve the performance problems. Upgrading the product edition to SQL Server Enterprise Core 2012 not only allowed the SQL Server to utilize the 12 previously unused cores on the 2nd processor, but it also resulted in a much more even distribution of tasks across all of the processors. To get through the backlog of requests cpU utilization jumped to around 90%, and then came down to around 33% once it was caught up, but performance improved dramatically since we failed over to the newly updated version And the performance issues went away. I was wondering if anyone knows what might cause SQL Server to unevenly distribute the load, relying almost exclusively on the first 4 cores of the 2nd processor that had 12 cores idle, and allocate only a few tasks to each of the 16 cores on the first processor. Also, is there any way we could have more evenly distributed the load across the 20 cores that were being used without the product edition upgrade? The flip side of that question is what did the product upgrade do that caused SQL Server to start evenly distributing the load across all of the cores that it recognized? Thanks to any insight to answer these questions and/or links that might help me better understand how to make sense of what was happenings.

Read the article

How to manage processes-to-CPU cores affinities ?

- by Philippe

I use a distributed user-space filesystem (GlusterFS) and I would like to be sure GlusterFS processes will always have the computing power they need. Each execution node of my grid have 2 CPU, with 4 cores per CPU and 2 threads per core (16 "processors" are seen by Linux). My goal is to guarantee that GlusterFS processes have enough processing power to be reliable, responsive and fast. (There is no marketing here, just the dreams of a sysadmin ;-) I consider two main points : GlusterFS processes I/O for data access (on local disks, or remote disks) I thought about binding the Linux Kernel and GlusterFS instances on a specific "processor". I would like to be sure that : No grid job will impact the kernel and the GlusterFS instances Researchers jobs won't be affected by system processes (I'd like to reserve a pool of cores to job execution and be sure that no system process will use these CPUs) But what about I/O ? As we handle a huge amount of data (several terabytes), we'll have a lot of interuptions. How can I distribute these operations on my processors ? What are the "best practices" ? Thanks for your comments!

Read the article

Why do manufacturers not show all hardware power usage?

- by Drew

I find it slightly more difficult to build a computer when I do not know how much power is needed for a component. When selecting a power supply for a computer, it is difficult to know how large of one to get. You don't want to go too large for cost reasons and circuit reasons, but you don't want to go too low and not be able to properly use every component. For instance, a graphics card might say "Minimum of a 500 Watt power supply. (Minimum recommended power supply with +12 Volt current rating of 30 Amps.)" But it really needs 360W (12V * 30A). So why don't they just say "Uses 360W max and xxxW peak"? Processors, I have noticed are good at reporting their power usage, but aside from processors and sometimes graphics cards, power usage is easily found. What is the power consumed by the Blu-ray / DVD drives? By the HDDs/SSDs? By the Mobo? etc. Why are these questions not easily answered when building a machine?

Read the article

Performance Optimization – It Is Faster When You Can Measure It

- by Alois Kraus

Performance optimization in bigger systems is hard because the measured numbers can vary greatly depending on the measurement method of your choice. To measure execution timing of specific methods in your application you usually use Time Measurement Method Potential Pitfalls Stopwatch Most accurate method on recent processors. Internally it uses the RDTSC instruction. Since the counter is processor specific you can get greatly different values when your thread is scheduled to another core or the core goes into a power saving mode. But things do change luckily: Intel's Designer's vol3b, section 16.11.1 "16.11.1 Invariant TSC The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor's support for invariant TSC is indicated by CPUID.80000007H:EDX[8]. The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource." DateTime.Now Good but it has only a resolution of 16ms which can be not enough if you want more accuracy. Reporting Method Potential Pitfalls Console.WriteLine Ok if not called too often. Debug.Print Are you really measuring performance with Debug Builds? Shame on you. Trace.WriteLine Better but you need to plug in some good output listener like a trace file. But be aware that the first time you call this method it will read your app.config and deserialize your system.diagnostics section which does also take time. In general it is a good idea to use some tracing library which does measure the timing for you and you only need to decorate some methods with tracing so you can later verify if something has changed for the better or worse. In my previous article I did compare measuring performance with quantum mechanics. This analogy does work surprising well. When you measure a quantum system there is a lower limit how accurately you can measure something. The Heisenberg uncertainty relation does tell us that you cannot measure of a quantum system the impulse and location of a particle at the same time with infinite accuracy. For programmers the two variables are execution time and memory allocations. If you try to measure the timings of all methods in your application you will need to store them somewhere. The fastest storage space besides the CPU cache is the memory. But if your timing values do consume all available memory there is no memory left for the actual application to run. On the other hand if you try to record all memory allocations of your application you will also need to store the data somewhere. This will cost you memory and execution time. These constraints are always there and regardless how good the marketing of tool vendors for performance and memory profilers are: Any measurement will disturb the system in a non predictable way. Commercial tool vendors will tell you they do calculate this overhead and subtract it from the measured values to give you the most accurate values but in reality it is not entirely true. After falling into the trap to trust the profiler timings several times I have got into the habit to Measure with a profiler to get an idea where potential bottlenecks are. Measure again with tracing only the specific methods to check if this method is really worth optimizing. Optimize it Measure again. Be surprised that your optimization has made things worse. Think harder Implement something that really works. Measure again Finished! - Or look for the next bottleneck. Recently I have looked into issues with serialization performance. For serialization DataContractSerializer was used and I was not sure if XML is really the most optimal wire format. After looking around I have found protobuf-net which uses Googles Protocol Buffer format which is a compact binary serialization format. What is good for Google should be good for us. A small sample app to check out performance was a matter of minutes: using ProtoBuf; using System; using System.Diagnostics; using System.IO; using System.Reflection; using System.Runtime.Serialization; [DataContract, Serializable] class Data { [DataMember(Order=1)] public int IntValue { get; set; } [DataMember(Order = 2)] public string StringValue { get; set; } [DataMember(Order = 3)] public bool IsActivated { get; set; } [DataMember(Order = 4)] public BindingFlags Flags { get; set; } } class Program { static MemoryStream _Stream = new MemoryStream(); static MemoryStream Stream { get { _Stream.Position = 0; _Stream.SetLength(0); return _Stream; } } static void Main(string[] args) { DataContractSerializer ser = new DataContractSerializer(typeof(Data)); Data data = new Data { IntValue = 100, IsActivated = true, StringValue = "Hi this is a small string value to check if serialization does work as expected" }; var sw = Stopwatch.StartNew(); int Runs = 1000 * 1000; for (int i = 0; i < Runs; i++) { //ser.WriteObject(Stream, data); Serializer.Serialize<Data>(Stream, data); } sw.Stop(); Console.WriteLine("Did take {0:N0}ms for {1:N0} objects", sw.Elapsed.TotalMilliseconds, Runs); Console.ReadLine(); } } The results are indeed promising: Serializer Time in ms N objects protobuf-net 807 1000000 DataContract 4402 1000000 Nearly a factor 5 faster and a much more compact wire format. Lets use it! After switching over to protbuf-net the transfered wire data has dropped by a factor two (good) and the performance has worsened by nearly a factor two. How is that possible? We have measured it? Protobuf-net is much faster! As it turns out protobuf-net is faster but it has a cost: For the first time a type is de/serialized it does use some very smart code-gen which does not come for free. Lets try to measure this one by setting of our performance test app the Runs value not to one million but to 1. Serializer Time in ms N objects protobuf-net 85 1 DataContract 24 1 The code-gen overhead is significant and can take up to 200ms for more complex types. The break even point where the code-gen cost is amortized by its faster serialization performance is (assuming small objects) somewhere between 20.000-40.000 serialized objects. As it turned out my specific scenario involved about 100 types and 1000 serializations in total. That explains why the good old DataContractSerializer is not so easy to take out of business. The final approach I ended up was to reduce the number of types and to serialize primitive types via BinaryWriter directly which turned out to be a pretty good alternative. It sounded good until I measured again and found that my optimizations so far do not help much. After looking more deeper at the profiling data I did found that one of the 1000 calls did take 50% of the time. So how do I find out which call it was? Normal profilers do fail short at this discipline. A (totally undeserved) relatively unknown profiler is SpeedTrace which does unlike normal profilers create traces of your applications by instrumenting your IL code at runtime. This way you can look at the full call stack of the one slow serializer call to find out if this stack was something special. Unfortunately the call stack showed nothing special. But luckily I have my own tracing as well and I could see that the slow serializer call did happen during the serialization of a bool value. When you encounter after much analysis something unreasonable you cannot explain it then the chances are good that your thread was suspended by the garbage collector. If there is a problem with excessive GCs remains to be investigated but so far the serialization performance seems to be mostly ok. When you do profile a complex system with many interconnected processes you can never be sure that the timings you just did measure are accurate at all. Some process might be hitting the disc slowing things down for all other processes for some seconds as well. There is a big difference between warm and cold startup. If you restart all processes you can basically forget the first run because of the OS disc cache, JIT and GCs make the measured timings very flexible. When you are in need of a random number generator you should measure cold startup times of a sufficiently complex system. After the first run you can try again getting different and much lower numbers. Now try again at least two times to get some feeling how stable the numbers are. Oh and try to do the same thing the next day. It might be that the bottleneck you found yesterday is gone today. Thanks to GC and other random stuff it can become pretty hard to find stuff worth optimizing if no big bottlenecks except bloatloads of code are left anymore. When I have found a spot worth optimizing I do make the code changes and do measure again to check if something has changed. If it has got slower and I am certain that my change should have made it faster I can blame the GC again. The thing is that if you optimize stuff and you allocate less objects the GC times will shift to some other location. If you are unlucky it will make your faster working code slower because you see now GCs at times where none were before. This is where the stuff does get really tricky. A safe escape hatch is to create a repro of the slow code in an isolated application so you can change things fast in a reliable manner. Then the normal profilers do also start working again. As Vance Morrison does point out it is much more complex to profile a system against the wall clock compared to optimize for CPU time. The reason is that for wall clock time analysis you need to understand how your system does work and which threads (if you have not one but perhaps 20) are causing a visible delay to the end user and which threads can wait a long time without affecting the user experience at all. Next time: Commercial profiler shootout.

Read the article

Solaris X86 64-bit Assembly Programming

- by danx

Solaris X86 64-bit Assembly Programming This is a simple example on writing, compiling, and debugging Solaris 64-bit x86 assembly language with a C program. This is also referred to as "AMD64" assembly. The term "AMD64" is used in an inclusive sense to refer to all X86 64-bit processors, whether AMD Opteron family or Intel 64 processor family. Both run Solaris x86. I'm keeping this example simple mainly to illustrate how everything comes together—compiler, assembler, linker, and debugger when using assembly language. The example I'm using here is a C program that calls an assembly language program passing a C string. The assembly language program takes the C string and calls printf() with it to print the string. AMD64 Register Usage But first let's review the use of AMD64 registers. AMD64 has several 64-bit registers, some special purpose (such as the stack pointer) and others general purpose. By convention, Solaris follows the AMD64 ABI in register usage, which is the same used by Linux, but different from Microsoft Windows in usage (such as which registers are used to pass parameters). This blog will only discuss conventions for Linux and Solaris. The following chart shows how AMD64 registers are used. The first six parameters to a function are passed through registers. If there's more than six parameters, parameter 7 and above are pushed on the stack before calling the function. The stack is also used to save temporary "stack" variables for use by a function. 64-bit Register Usage %rip Instruction Pointer points to the current instruction %rsp Stack Pointer %rbp Frame Pointer (saved stack pointer pointing to parameters on stack) %rdi Function Parameter 1 %rsi Function Parameter 2 %rdx Function Parameter 3 %rcx Function Parameter 4 %r8 Function Parameter 5 %r9 Function Parameter 6 %rax Function return value %r10, %r11 Temporary registers (need not be saved before used) %rbx, %r12, %r13, %r14, %r15 Temporary registers, but must be saved before use and restored before returning from the current function (usually with the push and pop instructions). 32-, 16-, and 8-bit registers To access the lower 32-, 16-, or 8-bits of a 64-bit register use the following: 64-bit register Least significant 32-bits Least significant 16-bits Least significant 8-bits %rax%eax%ax%al %rbx%ebx%bx%bl %rcx%ecx%cx%cl %rdx%edx%dx%dl %rsi%esi%si%sil %rdi%edi%di%axl %rbp%ebp%bp%bp %rsp%esp%sp%spl %r9%r9d%r9w%r9b %r10%r10d%r10w%r10b %r11%r11d%r11w%r11b %r12%r12d%r12w%r12b %r13%r13d%r13w%r13b %r14%r14d%r14w%r14b %r15%r15d%r15w%r15b %r16%r16d%r16w%r16b There's other registers present, such as the 64-bit %mm registers, 128-bit %xmm registers, 256-bit %ymm registers, and 512-bit %zmm registers. Except for %mm registers, these registers may not present on older AMD64 processors. Assembly Source The following is the source for a C program, helloas1.c, that calls an assembly function, hello_asm(). $ cat helloas1.c extern void hello_asm(char *s); int main(void) { hello_asm("Hello, World!"); } The assembly function called above, hello_asm(), is defined below. $ cat helloas2.s /* * helloas2.s * To build: * cc -m64 -o helloas2-cpp.s -D_ASM -E helloas2.s * cc -m64 -c -o helloas2.o helloas2-cpp.s */ #if defined(lint) || defined(__lint) /* ARGSUSED */ void hello_asm(char *s) { } #else /* lint */ #include <sys/asm_linkage.h> .extern printf ENTRY_NP(hello_asm) // Setup printf parameters on stack mov %rdi, %rsi // P2 (%rsi) is string variable lea .printf_string, %rdi // P1 (%rdi) is printf format string call printf ret SET_SIZE(hello_asm) // Read-only data .text .align 16 .type .printf_string, @object .printf_string: .ascii "The string is: %s.\n\0" #endif /* lint || __lint */ In the assembly source above, the C skeleton code under "#if defined(lint)" is optionally used for lint to check the interfaces with your C program--very useful to catch nasty interface bugs. The "asm_linkage.h" file includes some handy macros useful for assembly, such as ENTRY_NP(), used to define a program entry point, and SET_SIZE(), used to set the function size in the symbol table. The function hello_asm calls C function printf() by passing two parameters, Parameter 1 (P1) is a printf format string, and P2 is a string variable. The function begins by moving %rdi, which contains Parameter 1 (P1) passed hello_asm, to printf()'s P2, %rsi. Then it sets printf's P1, the format string, by loading the address the address of the format string in %rdi, P1. Finally it calls printf. After returning from printf, the hello_asm function returns itself. Larger, more complex assembly functions usually do more setup than the example above. If a function is returning a value, it would set %rax to the return value. Also, it's typical for a function to save the %rbp and %rsp registers of the calling function and to restore these registers before returning. %rsp contains the stack pointer and %rbp contains the frame pointer. Here is the typical function setup and return sequence for a function: ENTRY_NP(sample_assembly_function) push %rbp // save frame pointer on stack mov %rsp, %rbp // save stack pointer in frame pointer xor %rax, %r4ax // set function return value to 0. mov %rbp, %rsp // restore stack pointer pop %rbp // restore frame pointer ret // return to calling function SET_SIZE(sample_assembly_function) Compiling and Running Assembly Use the Solaris cc command to compile both C and assembly source, and to pre-process assembly source. You can also use GNU gcc instead of cc to compile, if you prefer. The "-m64" option tells the compiler to compile in 64-bit address mode (instead of 32-bit). $ cc -m64 -o helloas2-cpp.s -D_ASM -E helloas2.s $ cc -m64 -c -o helloas2.o helloas2-cpp.s $ cc -m64 -c helloas1.c $ cc -m64 -o hello-asm helloas1.o helloas2.o $ file hello-asm helloas1.o helloas2.o hello-asm: ELF 64-bit LSB executable AMD64 Version 1 [SSE FXSR FPU], dynamically linked, not stripped helloas1.o: ELF 64-bit LSB relocatable AMD64 Version 1 helloas2.o: ELF 64-bit LSB relocatable AMD64 Version 1 $ hello-asm The string is: Hello, World!. Debugging Assembly with MDB MDB is the Solaris system debugger. It can also be used to debug user programs, including assembly and C. The following example runs the above program, hello-asm, under control of the debugger. In the example below I load the program, set a breakpoint at the assembly function hello_asm, display the registers and the first parameter, step through the assembly function, and continue execution. $ mdb hello-asm # Start the debugger > hello_asm:b # Set a breakpoint > ::run # Run the program under the debugger mdb: stop at hello_asm mdb: target stopped at: hello_asm: movq %rdi,%rsi > $C # display function stack ffff80ffbffff6e0 hello_asm() ffff80ffbffff6f0 0x400adc() > $r # display registers %rax = 0x0000000000000000 %r8 = 0x0000000000000000 %rbx = 0xffff80ffbf7f8e70 %r9 = 0x0000000000000000 %rcx = 0x0000000000000000 %r10 = 0x0000000000000000 %rdx = 0xffff80ffbffff718 %r11 = 0xffff80ffbf537db8 %rsi = 0xffff80ffbffff708 %r12 = 0x0000000000000000 %rdi = 0x0000000000400cf8 %r13 = 0x0000000000000000 %r14 = 0x0000000000000000 %r15 = 0x0000000000000000 %cs = 0x0053 %fs = 0x0000 %gs = 0x0000 %ds = 0x0000 %es = 0x0000 %ss = 0x004b %rip = 0x0000000000400c70 hello_asm %rbp = 0xffff80ffbffff6e0 %rsp = 0xffff80ffbffff6c8 %rflags = 0x00000282 id=0 vip=0 vif=0 ac=0 vm=0 rf=0 nt=0 iopl=0x0 status=<of,df,IF,tf,SF,zf,af,pf,cf> %gsbase = 0x0000000000000000 %fsbase = 0xffff80ffbf782a40 %trapno = 0x3 %err = 0x0 > ::dis # disassemble the current instructions hello_asm: movq %rdi,%rsi hello_asm+3: leaq 0x400c90,%rdi hello_asm+0xb: call -0x220 <PLT:printf> hello_asm+0x10: ret 0x400c81: nop 0x400c85: nop 0x400c88: nop 0x400c8c: nop 0x400c90: pushq %rsp 0x400c91: pushq $0x74732065 0x400c96: jb +0x69 <0x400d01> > 0x0000000000400cf8/S # %rdi contains Parameter 1 0x400cf8: Hello, World! > [ # Step and execute 1 instruction mdb: target stopped at: hello_asm+3: leaq 0x400c90,%rdi > [ mdb: target stopped at: hello_asm+0xb: call -0x220 <PLT:printf> > [ The string is: Hello, World!. mdb: target stopped at: hello_asm+0x10: ret > [ mdb: target stopped at: main+0x19: movl $0x0,-0x4(%rbp) > :c # continue program execution mdb: target has terminated > $q # quit the MDB debugger $ In the example above, at the start of function hello_asm(), I display the stack contents with "$C", display the registers contents with "$r", then disassemble the current function with "::dis". The first function parameter, which is a C string, is passed by reference with the string address in %rdi (see the register usage chart above). The address is 0x400cf8, so I print the value of the string with the "/S" MDB command: "0x0000000000400cf8/S". I can also print the contents at an address in several other formats. Here's a few popular formats. For more, see the mdb(1) man page for details. address/S C string address/C ASCII character (1 byte) address/E unsigned decimal (8 bytes) address/U unsigned decimal (4 bytes) address/D signed decimal (4 bytes) address/J hexadecimal (8 bytes) address/X hexadecimal (4 bytes) address/B hexadecimal (1 bytes) address/K pointer in hexadecimal (4 or 8 bytes) address/I disassembled instruction Finally, I step through each machine instruction with the "[" command, which steps over functions. If I wanted to enter a function, I would use the "]" command. Then I continue program execution with ":c", which continues until the program terminates. MDB Basic Cheat Sheet Here's a brief cheat sheet of some of the more common MDB commands useful for assembly debugging. There's an entire set of macros and more powerful commands, especially some for debugging the Solaris kernel, but that's beyond the scope of this example. $C Display function stack with pointers $c Display function stack $e Display external function names $v Display non-zero variables and registers $r Display registers ::fpregs Display floating point (or "media" registers). Includes %st, %xmm, and %ymm registers. ::status Display program status ::run Run the program (followed by optional command line parameters) $q Quit the debugger address:b Set a breakpoint address:d Delete a breakpoint $b Display breakpoints :c Continue program execution after a breakpoint [ Step 1 instruction, but step over function calls ] Step 1 instruction address::dis Disassemble instructions at an address ::events Display events Further Information "Assembly Language Techniques for Oracle Solaris on x86 Platforms" by Paul Lowik (2004). Good tutorial on Solaris x86 optimization with assembly. The Solaris Operating System on x86 Platforms An excellent, detailed tutorial on X86 architecture, with Solaris specifics. By an ex-Sun employee, Frank Hofmann (2005). "AMD64 ABI Features", Solaris 64-bit Developer's Guide contains rules on data types and register usage for Intel 64/AMD64-class processors. (available at docs.oracle.com) Solaris X86 Assembly Language Reference Manual (available at docs.oracle.com) SPARC Assembly Language Reference Manual (available at docs.oracle.com) System V Application Binary Interface (2003) defines the AMD64 ABI for UNIX-class operating systems, including Solaris, Linux, and BSD. Google for it—the original website is gone. cc(1), gcc(1), and mdb(1) man pages.

Search Results

Search found 740 results on 30 pages for 'processors'.

Page 9/30 | < Previous Page | 5 6 7 8 9 10 11 12 13 14 15 16 | Next Page >

- by user63187

- by Vazgen

- by lyle

- by Piotr K.

- by Gerald L

- by cooplarsh

- by Philippe

- by Drew

- by Alois Kraus

- by danx

< Previous Page | 5 6 7 8 9 10 11 12 13 14 15 16 | Next Page >