Search Results

Search found 571 results on 23 pages for 'registers'.

Page 4/23 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Optimizing Solaris 11 SHA-1 on Intel Processors

    - by danx
    SHA-1 is a "hash" or "digest" operation that produces a 160 bit (20 byte) checksum value on arbitrary data, such as a file. It is intended to uniquely identify text and to verify it hasn't been modified. Max Locktyukhin and others at Intel have improved the performance of the SHA-1 digest algorithm using multiple techniques. This code has been incorporated into Solaris 11 and is available in the Solaris Crypto Framework via the libmd(3LIB), the industry-standard libpkcs11(3LIB) library, and Solaris kernel module sha1. The optimized code is used automatically on systems with a x86 CPU supporting SSSE3 (Intel Supplemental SSSE3). Intel microprocessor architectures that support SSSE3 include Nehalem, Westmere, Sandy Bridge microprocessor families. Further optimizations are available for microprocessors that support AVX (such as Sandy Bridge). Although SHA-1 is considered obsolete because of weaknesses found in the SHA-1 algorithm—NIST recommends using at least SHA-256, SHA-1 is still widely used and will be with us for awhile more. Collisions (the same SHA-1 result for two different inputs) can be found with moderate effort. SHA-1 is used heavily though in SSL/TLS, for example. And SHA-1 is stronger than the older MD5 digest algorithm, another digest option defined in SSL/TLS. Optimizations Review SHA-1 operates by reading an arbitrary amount of data. The data is read in 512 bit (64 byte) blocks (the last block is padded in a specific way to ensure it's a full 64 bytes). Each 64 byte block has 80 "rounds" of calculations (consisting of a mixture of "ROTATE-LEFT", "AND", and "XOR") applied to the block. Each round produces a 32-bit intermediate result, called W[i]. Here's what each round operates: The first 16 rounds, rounds 0 to 15, read the 512 bit block 32 bits at-a-time. These 32 bits is used as input to the round. The remaining rounds, rounds 16 to 79, use the results from the previous rounds as input. Specifically for round i it XORs the results of rounds i-3, i-8, i-14, and i-16 and rotates the result left 1 bit. The remaining calculations for the round is a series of AND, XOR, and ROTATE-LEFT operators on the 32-bit input and some constants. The 32-bit result is saved as W[i] for round i. The 32-bit result of the final round, W[79], is the SHA-1 checksum. Optimization: Vectorization The first 16 rounds can be vectorized (computed in parallel) because they don't depend on the output of a previous round. As for the remaining rounds, because of step 2 above, computing round i depends on the results of round i-3, W[i-3], one can vectorize 3 rounds at-a-time. Max Locktyukhin found through simple factoring, explained in detail in his article referenced below, that the dependencies of round i on the results of rounds i-3, i-8, i-14, and i-16 can be replaced instead with dependencies on the results of rounds i-6, i-16, i-28, and i-32. That is, instead of initializing intermediate result W[i] with: W[i] = (W[i-3] XOR W[i-8] XOR W[i-14] XOR W[i-16]) ROTATE-LEFT 1 Initialize W[i] as follows: W[i] = (W[i-6] XOR W[i-16] XOR W[i-28] XOR W[i-32]) ROTATE-LEFT 2 That means that 6 rounds could be vectorized at once, with no additional calculations, instead of just 3! This optimization is independent of Intel or any other microprocessor architecture, although the microprocessor has to support vectorization to use it, and exploits one of the weaknesses of SHA-1. Optimization: SSSE3 Intel SSSE3 makes use of 16 %xmm registers, each 128 bits wide. The 4 32-bit inputs to a round, W[i-6], W[i-16], W[i-28], W[i-32], all fit in one %xmm register. The following code snippet, from Max Locktyukhin's article, converted to ATT assembly syntax, computes 4 rounds in parallel with just a dozen or so SSSE3 instructions: movdqa W_minus_04, W_TMP pxor W_minus_28, W // W equals W[i-32:i-29] before XOR // W = W[i-32:i-29] ^ W[i-28:i-25] palignr $8, W_minus_08, W_TMP // W_TMP = W[i-6:i-3], combined from // W[i-4:i-1] and W[i-8:i-5] vectors pxor W_minus_16, W // W = (W[i-32:i-29] ^ W[i-28:i-25]) ^ W[i-16:i-13] pxor W_TMP, W // W = (W[i-32:i-29] ^ W[i-28:i-25] ^ W[i-16:i-13]) ^ W[i-6:i-3]) movdqa W, W_TMP // 4 dwords in W are rotated left by 2 psrld $30, W // rotate left by 2 W = (W >> 30) | (W << 2) pslld $2, W_TMP por W, W_TMP movdqa W_TMP, W // four new W values W[i:i+3] are now calculated paddd (K_XMM), W_TMP // adding 4 current round's values of K movdqa W_TMP, (WK(i)) // storing for downstream GPR instructions to read A window of the 32 previous results, W[i-1] to W[i-32] is saved in memory on the stack. This is best illustrated with a chart. Without vectorization, computing the rounds is like this (each "R" represents 1 round of SHA-1 computation): RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR With vectorization, 4 rounds can be computed in parallel: RRRRRRRRRRRRRRRRRRRR RRRRRRRRRRRRRRRRRRRR RRRRRRRRRRRRRRRRRRRR RRRRRRRRRRRRRRRRRRRR Optimization: AVX The new "Sandy Bridge" microprocessor architecture, which supports AVX, allows another interesting optimization. SSSE3 instructions have two operands, a input and an output. AVX allows three operands, two inputs and an output. In many cases two SSSE3 instructions can be combined into one AVX instruction. The difference is best illustrated with an example. Consider these two instructions from the snippet above: pxor W_minus_16, W // W = (W[i-32:i-29] ^ W[i-28:i-25]) ^ W[i-16:i-13] pxor W_TMP, W // W = (W[i-32:i-29] ^ W[i-28:i-25] ^ W[i-16:i-13]) ^ W[i-6:i-3]) With AVX they can be combined in one instruction: vpxor W_minus_16, W, W_TMP // W = (W[i-32:i-29] ^ W[i-28:i-25] ^ W[i-16:i-13]) ^ W[i-6:i-3]) This optimization is also in Solaris, although Sandy Bridge-based systems aren't widely available yet. As an exercise for the reader, AVX also has 256-bit media registers, %ymm0 - %ymm15 (a superset of 128-bit %xmm0 - %xmm15). Can %ymm registers be used to parallelize the code even more? Optimization: Solaris-specific In addition to using the Intel code described above, I performed other minor optimizations to the Solaris SHA-1 code: Increased the digest(1) and mac(1) command's buffer size from 4K to 64K, as previously done for decrypt(1) and encrypt(1). This size is well suited for ZFS file systems, but helps for other file systems as well. Optimized encode functions, which byte swap the input and output data, to copy/byte-swap 4 or 8 bytes at-a-time instead of 1 byte-at-a-time. Enhanced the Solaris mdb(1) and kmdb(1) debuggers to display all 16 %xmm and %ymm registers (mdb "$x" command). Previously they only displayed the first 8 that are available in 32-bit mode. Can't optimize if you can't debug :-). Changed the SHA-1 code to allow processing in "chunks" greater than 2 Gigabytes (64-bits) Performance I measured performance on a Sun Ultra 27 (which has a Nehalem-class Xeon 5500 Intel W3570 microprocessor @3.2GHz). Turbo mode is disabled for consistent performance measurement. Graphs are better than words and numbers, so here they are: The first graph shows the Solaris digest(1) command before and after the optimizations discussed here, contained in libmd(3LIB). I ran the digest command on a half GByte file in swapfs (/tmp) and execution time decreased from 1.35 seconds to 0.98 seconds. The second graph shows the the results of an internal microbenchmark that uses the Solaris libpkcs11(3LIB) library. The operations are on a 128 byte buffer with 10,000 iterations. The results show operations increased from 320,000 to 416,000 operations per second. Finally the third graph shows the results of an internal kernel microbenchmark that uses the Solaris /kernel/crypto/amd64/sha1 module. The operations are on a 64Kbyte buffer with 100 iterations. third graph shows the results of an internal kernel microbenchmark that uses the Solaris /kernel/crypto/amd64/sha1 module. The operations are on a 64Kbyte buffer with 100 iterations. The results show for 1 kernel thread, operations increased from 410 to 600 MBytes/second. For 8 kernel threads, operations increase from 1540 to 1940 MBytes/second. Availability This code is in Solaris 11 FCS. It is available in the 64-bit libmd(3LIB) library for 64-bit programs and is in the Solaris kernel. You must be running hardware that supports Intel's SSSE3 instructions (for example, Intel Nehalem, Westmere, or Sandy Bridge microprocessor architectures). The easiest way to determine if SSSE3 is available is with the isainfo(1) command. For example, nehalem $ isainfo -v $ isainfo -v 64-bit amd64 applications sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu 32-bit i386 applications sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu If the output also shows "avx", the Solaris executes the even-more optimized 3-operand AVX instructions for SHA-1 mentioned above: sandybridge $ isainfo -v 64-bit amd64 applications avx xsave pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu 32-bit i386 applications avx xsave pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu No special configuration or setup is needed to take advantage of this code. Solaris libraries and kernel automatically determine if it's running on SSSE3 or AVX-capable machines and execute the correctly-tuned code for that microprocessor. Summary The Solaris 11 Crypto Framework, via the sha1 kernel module and libmd(3LIB) and libpkcs11(3LIB) libraries, incorporated a useful SHA-1 optimization from Intel for SSSE3-capable microprocessors. As with other Solaris optimizations, they come automatically "under the hood" with the current Solaris release. References "Improving the Performance of the Secure Hash Algorithm (SHA-1)" by Max Locktyukhin (Intel, March 2010). The source for these SHA-1 optimizations used in Solaris "SHA-1", Wikipedia Good overview of SHA-1 FIPS 180-1 SHA-1 standard (FIPS, 1995) NIST Comments on Cryptanalytic Attacks on SHA-1 (2005, revised 2006)

    Read the article

  • LAN not picking up gigabit connection through patch panel

    - by user332555
    I have just purchased 2 FVS318G switches to install at my store. How this is set up is the server is in the back room. We have Cat 5E ran up through the ceiling and is patched into a panel in the back room. The 2 switches I just purchased are right next to the server in the back where all the cables patch in. I do a direct connection from the server to switch, avoiding the patch panel completely, and receive 1.0 gbps connection no problem. When i patch in the register computers from the front into the panel and then to the switch I am only getting 100 mbps on the registers up front. The patch panel does say Cat 5E on it but I am not sure if there is any interference in the line somewhere and I cannot get the full 1.0 gbps to the front registers like I want. Any ideas??

    Read the article

  • Pasting from vim in terminal to Google Docs (Firefox + Vimperator) - need to understand

    - by LIttle Ancient Forest Kami
    I had some trouble with copy-pasting text from vim in terminal to Google Docs (aka Drive) document (hereafter GDd) in FF browser (with Vimperator). Note: I have a file opened in Vim 7.2 in terminal :version displays both +clipboard and +xterm-clipboard I'm on Ubuntu 10.04 LTS, so I don't think that's Unity-related I want to use Vim, not GVim, nor gedit... I'm avid fan of mouseless navigation, so solution with mouse was not what I wanted. I have the solution, but I need understanding. What I tried and where it gets me: Yanking whole file text via: ggvGy allows me to: paste it via mouse middle button, NOT with Ctrl+v or Shift+Insert here, in text area for entering question text in gedit but NOT in GDd where I want it pasted, even if I switch Vimperator to pass-through mode with Insert does NOT show in XClip after xclip -o From gedit, I can copy-paste the text into GDd (Vimperator's pass-through mode not required). :%! !xclip -i (or :first, last) reports whole file (all lines, to be precise) as filtered, though shell returns 1 `xclip -o' returns nothing (is empty) or returns previously copied value with 2. no surprise, but I can't paste at all not only to GDd but also to gedit or here setting clipboard (:set clipboard=unnamed) to unnamed doesn't help using "+y or "*y on whole file text actually does the trick So, the question (it's actually three, say "split" and I will): why middle mouse button pastes different things than Ctrl+v and how to know what will be pasted with each? why just yanking (without registers) works with mouse but not with keyboard / XClip? why didn't unnamed register help? After setting, it should make unnamed and * registers same?

    Read the article

  • Multiple Graphics Cards on Fedora 15 (Gnome 3)

    - by Michael
    I have the (delightful) misfortune of having 3 graphics cards. They are XFX Radeon 5750s. Each drives 2 monitors via dvi. I am having a really hard time getting these running on fedora 15 (gnome 3). So my setup is 3 columns of 2 monitors (the upper monitor is mounted upside down in each column to reduce the bezel between monitors). When the (graphical) login screen comes up all 6 have the blue stripey background that must be the default, but then when I login, things get interesting. In the xorg.conf below, you will see only 2 of the screens in the serverlayout while the other 4 are commented out. Logging in with only 2 of the screens active works well (and it even remembers that the top one is upside down, and should be considered above the lower, i am not sure where it stores this info, but i set it using the graphical "Displays" settings) However, as soon as I uncomment a third screen, or more, it gives me an error message when I login. It's one of those friendly, less helpful messages (Oh no! Something has gone wrong. A problem has occurred and the system can't recover. Please log out and try again). If i do not use an xorg.conf, then the "Displays" prefs pane shows only the two monitors on one of my graphics cards Thanks to anyone who can help me get going! (xorg.conf and then lspci below, and xorg log) xorg.conf Section "ServerLayout" Identifier "X.org Configured" Screen "Screen0" 0 0 Screen "Screen1" Below "Screen0" # Screen "Screen2" RightOf "Screen0" # Screen "Screen3" RightOf "Screen1" # Screen "Screen4" RightOf "Screen3" # Screen "Screen5" RightOf "Screen4" InputDevice "Mouse0" "CorePointer" InputDevice "Keyboard0" "CoreKeyboard" EndSection Section "Files" ModulePath "/usr/lib/xorg/modules" FontPath "catalogue:/etc/X11/fontpath.d" FontPath "built-ins" EndSection Section "Module" Load "record" Load "dri" Load "dbe" Load "extmod" Load "dri2" Load "glx" EndSection Section "InputDevice" Identifier "Keyboard0" Driver "kbd" EndSection Section "InputDevice" Identifier "Mouse0" Driver "mouse" Option "Protocol" "auto" Option "Device" "/dev/input/mice" Option "ZAxisMapping" "4 5 6 7" EndSection Section "Monitor" Identifier "Monitor0" VendorName "Monitor Vendor" ModelName "Monitor Model" EndSection Section "Monitor" Identifier "Monitor1" VendorName "Monitor Vendor" ModelName "Monitor Model" EndSection Section "Monitor" Identifier "Monitor2" VendorName "Monitor Vendor" ModelName "Monitor Model" EndSection Section "Monitor" Identifier "Monitor3" VendorName "Monitor Vendor" ModelName "Monitor Model" EndSection Section "Monitor" Identifier "Monitor4" VendorName "Monitor Vendor" ModelName "Monitor Model" EndSection Section "Monitor" Identifier "Monitor5" VendorName "Monitor Vendor" ModelName "Monitor Model" EndSection Section "Device" Identifier "Card0" Driver "radeon" BusID "PCI:4:0:0" EndSection Section "Device" Identifier "Card1" Driver "radeon" BusID "PCI:5:0:0" EndSection Section "Device" Identifier "Card2" Driver "radeon" BusID "PCI:6:0:0" EndSection Section "Screen" Identifier "Screen0" Device "Card0" Monitor "Monitor0" SubSection "Display" Viewport 0 0 Depth 24 EndSubSection EndSection Section "Screen" Identifier "Screen1" Device "Card0" Monitor "Monitor1" SubSection "Display" Viewport 0 0 Depth 24 EndSubSection EndSection Section "Screen" Identifier "Screen2" Device "Card1" Monitor "Monitor2" SubSection "Display" Viewport 0 0 Depth 24 EndSubSection EndSection Section "Screen" Identifier "Screen3" Device "Card1" Monitor "Monitor3" SubSection "Display" Viewport 0 0 Depth 24 EndSubSection EndSection Section "Screen" Identifier "Screen4" Device "Card2" Monitor "Monitor4" SubSection "Display" Viewport 0 0 Depth 24 EndSubSection EndSection Section "Screen" Identifier "Screen5" Device "Card2" Monitor "Monitor5" SubSection "Display" Viewport 0 0 Depth 24 EndSubSection EndSection lspci output follows [tgm@tgm ~]$ lspci 00:00.0 Host bridge: Intel Corporation 5520/5500/X58 I/O Hub to ESI Port (rev 13) 00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 13) 00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 13) 00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 13) 00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 13) 00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 13) 00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 13) 00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 13) 00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4 00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5 00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6 00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2 00:1b.0 Audio device: Intel Corporation 82801JI (ICH10 Family) HD Audio Controller 00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1 00:1c.1 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 2 00:1c.2 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 3 00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1 00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2 00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3 00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90) 00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller 00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Controller #1 00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller 00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Controller #2 02:00.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for mainboards (rev a3) 03:00.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for mainboards (rev a3) 03:02.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for mainboards (rev a3) 04:00.0 VGA compatible controller: ATI Technologies Inc Juniper [Radeon HD 5750 Series] 04:00.1 Audio device: ATI Technologies Inc Juniper HDMI Audio [Radeon HD 5700 Series] 05:00.0 VGA compatible controller: ATI Technologies Inc Juniper [Radeon HD 5750 Series] 05:00.1 Audio device: ATI Technologies Inc Juniper HDMI Audio [Radeon HD 5700 Series] 06:00.0 VGA compatible controller: ATI Technologies Inc Juniper [Radeon HD 5750 Series] 06:00.1 Audio device: ATI Technologies Inc Juniper HDMI Audio [Radeon HD 5700 Series] 07:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02) 08:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02) xorg log (posted to fpaste due to how long it is. thanks to marcusw for the request) http://www.fpaste.org/r5ww/ xorg log with all 6 monitors enabled in xorg.conf (they all turn on and have blue, but then one gets the aforementioned user-friendly error message). http://www.fpaste.org/X63H/

    Read the article

  • How Do Computers Work? [closed]

    - by Rob P.
    This is almost embarrassing ask...I have a degree in Computer Science (and a second one in progress). I've worked as a full-time .NET Developer for nearly five years. I generally seem competent at what I do. But I Don't Know How Computers Work! Please, bare with me for a second. A quick Google of 'How a Computer Works' will yield lots and lots of results, but I struggled to find one that really answered what I'm looking for. I realize this is a huge, huge question, so really, if you can just give me some keywords or some direction. I know there are components....the power supply, the motherboard, ram, CPU, etc...and I get the 'general idea' of what they do. But I really don't understand how you go from a line of code like Console.Readline() in .NET (or Java or C++) and have it actually do stuff. Sure, I'm vaguely aware of MSIL (in the case of .NET), and that some magic happens with the JIT compiler and it turns into native code (I think). I'm told Java is similar, and C++ cuts out the middle step. I've done some mainframe assembly, it was a few years back now. I remember there were some instructions and some CPU registers, and I wrote code....and then some magic happened....and my program would work (or crash). From what I understand, an 'Emulator' would simulate what happens when you call an instruction and it would update the CPU registers; but what makes those instructions work the way they do? Does this turn into an Electronics question and not a 'Computer' question? I'm guessing there isn't any practical reason for me to understand this, but I feel like I should be able to. (Yes, this is what happens when you spend a day with a small child. It takes them about 10 minutes and five iterations of asking 'Why?' for you to realize how much you don't know)

    Read the article

  • Web Content Filtering for Windows Clients

    - by djoyce
    I'm working with a small business to solve a bunch of problems. One is their Windows 7 POS registers need to have web access restricted to only three remote support sites, but the back office machine needs an unfiltered connection. I'd like something I can install and configure on the few registers to block all but those few sites. In a perfect world this would restrict the normal register user, but the admin user would not be filtered. Free is best, if it works, but a small fee would be alright too. Microsoft's Family Safety filter is close, but requires a Windows Live account, which isn't ideal, but may be alright. Anyone use this in a small business environment? I'd prefer something easily managed at the local machines. K9 Web Protection is interesting and I'm going to look into it more. Are there other options? Seems like someone would have made something simple like this as an open source project, but maybe not.

    Read the article

  • Getting to grips with the stack in nasm

    - by MarkPearl
    Today I spent a good part of my day getting to grips with the stack and nasm. After looking at my notes on nasm I think this is one area for the course I am doing they could focus more on… So here are some snippets I have put together that have helped me understand a little bit about the stack… Simplest example of the stack You will probably see examples like the following in circulation… these demonstrate the simplest use of the stack… org 0x100 bits 16 jmp main main: push 42h push 43h push 44h mov ah,2h ;set to display characters pop dx    ;get the first value int 21h   ;and display it pop dx    ;get 2nd value int 21h   ;and display it pop dx    ;get 3rd value int 21h   ;and display it int 20h The output from above code would be… DCB Decoupling code using “call” and “ret” This is great, but it oversimplifies what I want to use the stack for… I do not know if this goes against the grain of assembly programmers or not, but I want to write loosely coupled assembly code – and I want to use the stack as a mechanism for passing values into my decoupled code. In nasm we have the call and return instructions, which provides a mechanism for decoupling code, for example the following could be done… org 0x100 bits 16 jmp main ;---------------------------------------- displayChar: mov ah,2h mov dx,41h int 21h ret ;---------------------------------------- main: call displayChar int 20h   This would output the following to the console A So, it would seem that call and ret allow us to jump to segments of our code and then return back to the calling position – a form of segmenting the code into what we would called in higher order languages “functions” or “methods”. The only issue is, in higher order languages there is a way to pass parameters into the functions and return results. Because of the primitive nature of the call and ret instructions, this does not seem to be obvious. We could of course use the registers to pass values into the subroutine and set values coming out, but the problem with this is we… Have a limited number of registers Are threading our code with tight coupling (it would be hard to migrate methods outside of their intended use in a particular program to another one) With that in mind, I turn to the stack to provide a loosely coupled way of calling subroutines… First attempt with the Stack Initially I thought this would be simple… we could use code that looks as follows to achieve what I want… org 0x100 bits 16 jmp main ;---------------------------------------- displayChar: mov ah,2h pop dx int 21h ret ;---------------------------------------- main: push 41h call displayChar int 20h   However running this application does not give the desired result, I want an ‘A’ to be returned, and I am getting something totally different (you will to). Reading up on the call and ret instructions a discovery is made… they are pushing and popping things onto and off the stack as well… When the call instruction is executed, the current value of IP (the address of the instruction to follow) is pushed onto the stack, when ret is called, the last value on the stack is popped off into the IP register. In effect what the above code is doing is as follows with the stack… push 41h push current value of ip pop current value of ip to dx pop 41h to ip This is not what I want, I need to access the 41h that I pushed onto the stack, but the call value (which is necessary) is putting something in my way. So, what to do? Remember we have other registers we can use as well as a thing called indirect addressing… So, after some reading around, I came up with the following approach using indirect addressing… org 0x100 bits 16 jmp main ;---------------------------------------- displayChar: mov bp,sp mov ah,2h mov dx,[bp+2] int 21h ret ;---------------------------------------- main: push 41h call displayChar int 20h In essence, what I have done here is used a trick with the stack pointer… it goes as follows… Push 41 onto the stack Make the call to the function, which will push the IP register onto the stack and then jump to the displayChar label Move the value in the stack point to the bp register (sp currently points at IP register) Move the at the location of bp minus 2 bytes to dx (this is now the value 41h) display it, execute the ret instruction, which pops the ip value off the stack and goes back to the calling point This approach is still very raw, some further reading around shows that I should be pushing the value of bp onto the stack before replacing it with sp, but it is the starting thread to getting loosely coupled subroutines. Let’s see if you get what the following output would be? org 0x100 bits 16 jmp main ;---------------------------------------- displayChar: mov bp,sp mov ah,2h mov dx,[bp+4] int 21h mov dx,[bp+2] int 21h ret ;---------------------------------------- main: push 41h push 42h call displayChar int 20h The output is… AB Where to from here? If by any luck some assembly programmer comes along and see this code and notices that I have made some fundamental flaw in my logic… I would like to know, so please leave a comment… appreciate any feedback!

    Read the article

  • 32bit to 64bit inline assembly porting

    - by Simone Margaritelli
    I have a piece of C++ code (compiled with g++ under a GNU/Linux environment) that load a function pointer (how it does that doesn't matter), pushes some arguments onto the stack with some inline assembly and then calls that function, the code is like : unsigned long stack[] = { 1, 23, 33, 43 }; /* save all the registers and the stack pointer */ unsigned long esp; asm __volatile__ ( "pusha" ); asm __volatile__ ( "mov %%esp, %0" :"=m" (esp)); for( i = 0; i < sizeof(stack); i++ ){ unsigned long val = stack[i]; asm __volatile__ ( "push %0" :: "m"(val) ); } unsigned long ret = function_pointer(); /* restore registers and stack pointer */ asm __volatile__ ( "mov %0, %%esp" :: "m" (esp) ); asm __volatile__ ( "popa" ); I'd like to add some sort of #ifdef _LP64 // 64bit inline assembly #else // 32bit version as above example #endif But i don't know inline assembly for 64bit machines, anyone could help me? Thanks

    Read the article

  • Account verification Yelp style, how is it more "secure" than traditional verification?

    - by Chad
    For business owners to "take control" of their business page on Yelp, they register for it. The Yelp system performs a telephone call-back. From watching to the video here, it sounds like a telephone version of what we all typically do - e-mail check. For e-mail check, it basically goes like this: User registers verify e-mail sent they click link inside verify e-mail site verifies Here's Yelp's: User registers verify screen shown with code Yelp calls user user enters code site verifies It's essentially the same thing, via phone. Is there any reason you can see why this method is better than the e-mail method?

    Read the article

  • Design - Where should objects be registered when using Windsor

    - by Fredrik Jansson
    I will have the following components in my application DataAccess DataAccess.Test Business Business.Test Application I was hoping to use Castle Windsor as IoC to glue the layers together but I am bit uncertain about the design of the gluing. My question is who should be responsible for registering the objects into Windsor? I have a couple of ideas; Each layer can register its own objects. To test the BL, the test bench could register mock classes for the DAL. Each layer can register the object of its dependencies, e.g. the business layer registers the components of the data access layer. To test the BL, the test bench would have to unload the "real" DAL object and register the mock objects. The application (or test app) registers all objects of the dependencies. Can someone help me with some ideas and pros/cons with the different paths? Links to example projects utilizing Castle Windsor in this way would be very helpful.

    Read the article

  • 80x86 16-bit asm: lea cx, [cx*8+cx] causes error on NASM (compiling .com file)

    - by larz
    Title says it all. The error NASM gives (dispite my working OS) is "invalid effective address". Now i've seen many examples of how to use LEA and i think i gots it right but yet my NASM dislikes it. I tried "lea cx, [cx+9]" and it worked; "lea cx, [bx+cx]" didn't. Now if i extended my registers to 32-bits (i.e. "lea ecx, [ecx*8+ecx]") everything would be well but i am restricted to use 16- and 8-bit registers only. Is here anyone so knoweledgeable who could explain me WHY my assembler doesn't let me use lea the way i supposed it should be used? Thanks.

    Read the article

  • Software to find dependencys in a database that not have them as restrictions

    - by Tomas Friden
    I have a database in SQL server 2005 that originaly comes fom an old mainframe. All relations was set in the surrounding software and there are non i the database. I need to find the relations, not by field name but by actual contence in the registers. (as suggestions, I realize I'l have to check them up) It would be nice with some extras, like finding motherless posts etc. but not a primary demand. There are some 200 registers with max 2 000 000 posts in any register. Does any one know of a software that can help me with this? The software I can find presuposes that relations are set in the database :(

    Read the article

  • Binding functions of derived class with luabind

    - by Anamon
    I am currently developing a plugin-based system in C++ which provides a Lua scripting interface, for which I chose to use luabind. I'm using Lua 5 and luabind 0.9, both statically linked and compiled with MSVC++ 8. I am now having trouble binding functions with luabind when they are defined in a derived class, but not its parent class. More specifically, I have an abstract base class called 'IPlugin' from which all plugin classes inherit. When the plugin manager initialises, it registers that class and its functions like this: luabind::open(L); luabind::module(L) [ luabind::class_("IPlugin") .def("start", (void(IPlugin::*)())&IPlugin::start) ]; As it is only known at runtime what effective plugin classes are available, I had to solve loading plugins in a kind of roundabout way. The plugin manager exposes a factory function to Lua, which takes the name of a plugin class and a desired object name. The factory then creates the object, registers the plugin's class as inheriting from the 'IPlugin' base class, and immediately calls a function on the created object that registers itself as a global with the Lua state, like this: void PluginExample::registerLuaObject(lua_State *L, string a_name) { luabind::globals(L)[a_name] = (PluginExample*)this; } I initially did this because I had problems with Lua determining the most derived class of the object, as if I register it from the StreamManager it is only known as a subtype of 'IPlugin' and not the specific subtype. I'm not sure anymore if this is even necessary though, but it works and the created object is subsequently accessible from Lua under 'a_name'. The problem I have, though, is that functions defined in the derived class, which were not declared at all in the parent class, cannot be used. Virtual functions defined in the base class, such as 'start' above, work fine, and calling them from Lua on the new object runs the respective redefined code from the 'PluginExample' class. But if I add a new function to 'PluginExample', here for example a function taking no arguments and returning void, and register it like this: luabind::module(L) [ luabind::class_("PluginExample") .def(luabind::constructor()) .def("func", &PluginExample::func) ]; calling 'func' on the new object yields the following Lua runtime error: No matching overload found, candidates: void func(PluginExample&) I am correctly using the ':' syntax so the 'self' argument is not needed and it seems suddenly Lua cannot determine the derived type of the object anymore. I am sure I am doing something wrong, probably having to do with the two-step binding required by my system architecture, but I can't figure out where. I'd much appreciate some help =)

    Read the article

  • Does the compiler provides extra stack space for byte-spilling?

    - by xuwicha
    From the sample code below which I got here, I don't understand why the value of registers are move to specific part in stack when byte-spilling is performed. pushq %rbp movq %rsp, %rbp subq $96, %rsp leaq L__unnamed_cfstring_23(%rip), %rax leaq L__unnamed_cfstring_26(%rip), %rcx movl $42, %edx leaq l_objc_msgSend_fixup_alloc(%rip), %r8 movl $0, -4(%rbp) movl %edi, -8(%rbp) movq %rsi, -16(%rbp) movq %rax, -48(%rbp) ## 8-byte Spill movq %rcx, -56(%rbp) ## 8-byte Spill movq %r8, -64(%rbp) ## 8-byte Spill movl %edx, -68(%rbp) ## 4-byte Spill callq _objc_autoreleasePoolPush movq L_OBJC_CLASSLIST_REFERENCES_$_(%rip), %rcx movq %rcx, %rdi movq -64(%rbp), %rsi ## 8-byte Reload movq %rax, -80(%rbp) ## 8-byte Spill callq *l_objc_msgSend_fixup_alloc(%rip) movq L_OBJC_SELECTOR_REFERENCES_27(%rip), %rsi movq %rax, %rdi movq -56(%rbp), %rdx ## 8-byte Reload movl -68(%rbp), %ecx ## 4-byte Reload And also, I don't know what is the purpose of byte-spilling since the program logic can still be achieved if the function is the one saving the value of the registers it will be used inside it. I really have no idea why is this happening. Please help me understand this.

    Read the article

  • What happens when a computer program runs?

    - by gaijinco
    I know the general theory but I can't fit in the details. I know that a program resides in the secondary memory of a computer. Once the program begins execution it is entirely copied to the RAM. Then the processor retrive a few instructions (it depends on the size of the bus) at a time, puts them in registers and executes them. I also know that a computer program uses two kinds of memory: stack and heap, which are also part of the primary memory of the computer. The stack is used for non-dynamic memory, and the heap for dynamic memory (for example, everything related to the new operator in C++) What I can't understand is how those two things connect. At what point is the stack used for the execution of the instructions? Instructions go from the RAM, to the stack, to the registers?

    Read the article

  • How to determine the port numbers for peripheral devices?

    - by smwikipedia
    I know that peripheral devices such as a hard driver, a floppy driver, etc are controlled by reading/writing certain control registers on their device controllers. I am wondering about the following questions: Is it true that when these peripheral devices are plugged onto the computer, the addresses(port numbers) of their control registers are thus determined by how they are attached to the address bus (i.e. the hard-wiring rules, not any soft things)? Who makes the scheme of the port number assignment? If I was given a naked computer(with no operating system and with many peripheral devices), how could I figure out the port number assignment so I can use them to control peripheral deveices. At last and as usual, thanks for your patience and reply. 8^)

    Read the article

  • How to determine the port numbers for peripherals devices?

    - by smwikipedia
    I know that peripheral devices such as a hard driver, a floppy driver, etc are controlled by reading/writing certain control registers on their device controllers. I am wondering about the following questions: Is it true that when these peripheral devices are plugged onto the computer, the addresses(port numbers) of their control registers are thus determined by how they are attached to the address bus (i.e. the hard-wiring)? Who makes the scheme of the port number assignment? If I was given a naked computer(with no operating system and with many peripheral devices), how could I figure out the port number assignment so I can use them to control peripheral deveices.

    Read the article

  • Address of instruction causing SIGSEGV in external program

    - by karramba
    I want to get address of instruction that causes external program to SIGSEGV. I tried using ptrace for this, but I'm getting EIP from kernel space (probably default signal handler?). How GDB is able to get the correct EIP? Is there a way to make GDB provide this information using some API? edit: I don't have sources of the program, only binary executable. I need automation, so I can't simply use "run", "info registers" in GDB. I want to implement "info registers" in my own mini-debugger :)

    Read the article

  • Scan for first zero bit (Assembly)?

    - by cthulhu
    I have some numbers in AH, AL, BL, BH registers. I need to check whether there is 0 bit in each of the registers in left half of the number. If yes, then put into check variable value 10 else -10. How can I do this? I tried something like that: org 100h check dw 0 mov ah, 11111111b mov al, 11111111b mov bl, 11111111b mov bh, 11111111b mov check, -10 shr ah, 4 shr al, 4 shr bl, 4 shr bh, 4 cmp ah, 0Fh jz first first: cmp al, 0Fh jz second second: cmp bl, 0Fh jz third third: cmp bh, 0Fh jz final final: mov check, 10 ret

    Read the article

  • Operations on 64bit words in 32bit system

    - by Vilo
    I'm new here same as I'm new with assembly. I hope that you can help me to start. I'm using 32bit (i686) Ubuntu to make programs in assembly, using gcc compiler. I know that general-purpose-registers are 32bit (4 bytes) max, but what when I have to operate on 64 bit numbers? Intel's instruction says that higher bits are stored in %edx and lower in %eax Great... So how can I do something with this 2-registers number? I have to convert 64bit dec to bin, then save it to memory and show on the screen. How to make the 64bit quadword at start of the program in .data section?

    Read the article

  • Installing Lubuntu 14.04.1 forcepae fails

    - by Rantanplan
    I tried to install Lubuntu 14.04.1 from a CD. First, I chose Try Lubuntu without installing which gave: ERROR: PAE is disabled on this Pentium M (PAE can potentially be enabled with kernel parameter "forcepae" ... Following the description on https://help.ubuntu.com/community/PAE, I used forcepae and tried Try Lubuntu without installing again. That worked fine. dmesg | grep -i pae showed: [ 0.000000] Kernel command line: file=/cdrom/preseed/lubuntu.seed boot=casper initrd=/casper/initrd.lz quiet splash -- forcepae [ 0.008118] PAE forced! On the live-CD session, I tried installing Lubuntu double clicking on the install button on the desktop. Here, the CD starts running but then stops running and nothing happens. Next, I rebooted and tried installing Lubuntu directly from the boot menu screen using forcepae again. After a while, I receive the following error message: The installer encountered an unrecoverable error. A desktop session will now be run so that you may investigate the problem or try installing again. Hitting Enter brings me to the desktop. For what errors should I search? And how? Finally, I rebooted once more and tried Check disc for defects with forcepae option; no errors have been found. Now, I am wondering how to find the error or whether it would be better to follow advice c in https://help.ubuntu.com/community/PAE: "Move the hard disk to a computer on which the processor has PAE capability and PAE flag (that is, almost everything else than a Banias). Install the system as usual but don't add restricted drivers. After the install move the disk back." Thanks for some hints! Perhaps some of the following can help: On Lubuntu 12.04: cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 13 model name : Intel(R) Pentium(R) M processor 1.50GHz stepping : 6 microcode : 0x17 cpu MHz : 600.000 cache size : 2048 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de pse tsc msr mce cx8 mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss tm pbe up bts est tm2 bogomips : 1284.76 clflush size : 64 cache_alignment : 64 address sizes : 32 bits physical, 32 bits virtual power management: uname -a Linux humboldt 3.2.0-67-generic #101-Ubuntu SMP Tue Jul 15 17:45:51 UTC 2014 i686 i686 i386 GNU/Linux lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 12.04.5 LTS Release: 12.04 Codename: precise cpuid eax in eax ebx ecx edx 00000000 00000002 756e6547 6c65746e 49656e69 00000001 000006d6 00000816 00000180 afe9f9bf 00000002 02b3b001 000000f0 00000000 2c04307d 80000000 80000004 00000000 00000000 00000000 80000001 00000000 00000000 00000000 00000000 80000002 20202020 20202020 65746e49 2952286c 80000003 6e655020 6d756974 20295228 7270204d 80000004 7365636f 20726f73 30352e31 007a4847 Vendor ID: "GenuineIntel"; CPUID level 2 Intel-specific functions: Version 000006d6: Type 0 - Original OEM Family 6 - Pentium Pro Model 13 - Stepping 6 Reserved 0 Brand index: 22 [not in table] Extended brand string: " Intel(R) Pentium(R) M processor 1.50GHz" CLFLUSH instruction cache line size: 8 Feature flags afe9f9bf: FPU Floating Point Unit VME Virtual 8086 Mode Enhancements DE Debugging Extensions PSE Page Size Extensions TSC Time Stamp Counter MSR Model Specific Registers MCE Machine Check Exception CX8 COMPXCHG8B Instruction SEP Fast System Call MTRR Memory Type Range Registers PGE PTE Global Flag MCA Machine Check Architecture CMOV Conditional Move and Compare Instructions FGPAT Page Attribute Table CLFSH CFLUSH instruction DS Debug store ACPI Thermal Monitor and Clock Ctrl MMX MMX instruction set FXSR Fast FP/MMX Streaming SIMD Extensions save/restore SSE Streaming SIMD Extensions instruction set SSE2 SSE2 extensions SS Self Snoop TM Thermal monitor 31 reserved TLB and cache info: b0: unknown TLB/cache descriptor b3: unknown TLB/cache descriptor 02: Instruction TLB: 4MB pages, 4-way set assoc, 2 entries f0: unknown TLB/cache descriptor 7d: unknown TLB/cache descriptor 30: unknown TLB/cache descriptor 04: Data TLB: 4MB pages, 4-way set assoc, 8 entries 2c: unknown TLB/cache descriptor On Lubuntu 14.04.1 live-CD with forcepae: cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 13 model name : Intel(R) Pentium(R) M processor 1.50GHz stepping : 6 microcode : 0x17 cpu MHz : 600.000 cache size : 2048 KB physical id : 0 siblings : 1 core id : 0 cpu cores : 1 apicid : 0 initial apicid : 0 fdiv_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss tm pbe bts est tm2 bogomips : 1284.68 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 32 bits virtual power management: uname -a Linux lubuntu 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:12 UTC 2014 i686 i686 i686 GNU/Linux lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 14.04.1 LTS Release: 14.04 Codename: trusty cpuid CPU 0: vendor_id = "GenuineIntel" version information (1/eax): processor type = primary processor (0) family = Intel Pentium Pro/II/III/Celeron/Core/Core 2/Atom, AMD Athlon/Duron, Cyrix M2, VIA C3 (6) model = 0xd (13) stepping id = 0x6 (6) extended family = 0x0 (0) extended model = 0x0 (0) (simple synth) = Intel Pentium M (Dothan B1) / Celeron M (Dothan B1), 90nm miscellaneous (1/ebx): process local APIC physical ID = 0x0 (0) cpu count = 0x0 (0) CLFLUSH line size = 0x8 (8) brand index = 0x16 (22) brand id = 0x16 (22): Intel Pentium M, .13um feature information (1/edx): x87 FPU on chip = true virtual-8086 mode enhancement = true debugging extensions = true page size extensions = true time stamp counter = true RDMSR and WRMSR support = true physical address extensions = false machine check exception = true CMPXCHG8B inst. = true APIC on chip = false SYSENTER and SYSEXIT = true memory type range registers = true PTE global bit = true machine check architecture = true conditional move/compare instruction = true page attribute table = true page size extension = false processor serial number = false CLFLUSH instruction = true debug store = true thermal monitor and clock ctrl = true MMX Technology = true FXSAVE/FXRSTOR = true SSE extensions = true SSE2 extensions = true self snoop = true hyper-threading / multi-core supported = false therm. monitor = true IA64 = false pending break event = true feature information (1/ecx): PNI/SSE3: Prescott New Instructions = false PCLMULDQ instruction = false 64-bit debug store = false MONITOR/MWAIT = false CPL-qualified debug store = false VMX: virtual machine extensions = false SMX: safer mode extensions = false Enhanced Intel SpeedStep Technology = true thermal monitor 2 = true SSSE3 extensions = false context ID: adaptive or shared L1 data = false FMA instruction = false CMPXCHG16B instruction = false xTPR disable = false perfmon and debug = false process context identifiers = false direct cache access = false SSE4.1 extensions = false SSE4.2 extensions = false extended xAPIC support = false MOVBE instruction = false POPCNT instruction = false time stamp counter deadline = false AES instruction = false XSAVE/XSTOR states = false OS-enabled XSAVE/XSTOR = false AVX: advanced vector extensions = false F16C half-precision convert instruction = false RDRAND instruction = false hypervisor guest status = false cache and TLB information (2): 0xb0: instruction TLB: 4K, 4-way, 128 entries 0xb3: data TLB: 4K, 4-way, 128 entries 0x02: instruction TLB: 4M pages, 4-way, 2 entries 0xf0: 64 byte prefetching 0x7d: L2 cache: 2M, 8-way, sectored, 64 byte lines 0x30: L1 cache: 32K, 8-way, 64 byte lines 0x04: data TLB: 4M pages, 4-way, 8 entries 0x2c: L1 data cache: 32K, 8-way, 64 byte lines extended feature flags (0x80000001/edx): SYSCALL and SYSRET instructions = false execution disable = false 1-GB large page support = false RDTSCP = false 64-bit extensions technology available = false Intel feature flags (0x80000001/ecx): LAHF/SAHF supported in 64-bit mode = false LZCNT advanced bit manipulation = false 3DNow! PREFETCH/PREFETCHW instructions = false brand = " Intel(R) Pentium(R) M processor 1.50GHz" (multi-processing synth): none (multi-processing method): Intel leaf 1 (synth) = Intel Pentium M (Dothan B1), 90nm

    Read the article

  • Redirect/rewrite dynamic URL to sub-domain and create DNS for subdomain

    - by Abdul Majeed
    I have created an application in PHP, I would like to re-direct the following URL to corresponding sub-domain. Dynamic URL pattern: http://mydomain.com/mypage.php?user_name=testuser I wish to re-direct this to the corresponding sub domain: http://testuser.mydomain.com/ How do I create a rewrite rule for this purpose? How do I register DNS for sub-domain without using CPANEL? (I want to activate sub-domain when the user registers to the system.)

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >