Search Results

Search found 6716 results on 269 pages for 'distributed algorithm'.

Page 126/269 | < Previous Page | 122 123 124 125 126 127 128 129 130 131 132 133 | Next Page >

Developing Schema Compare for Oracle (Part 5): Query Snapshots

- by Simon Cooper

If you've emailed us about a bug you've encountered with the EAP or beta versions of Schema Compare for Oracle, we probably asked you to send us a query snapshot of your databases. Here, I explain what a query snapshot is, and how it helps us fix your bug. Problem 1: Debugging users' bug reports When we started the Schema Compare project, we knew we were going to get problems with users' databases - configurations we hadn't considered, features that weren't installed, unicode issues, wierd dependencies... With SQL Compare, users are generally happy to send us a database backup that we can restore using a single RESTORE DATABASE command on our test servers and immediately reproduce the problem. Oracle, on the other hand, would be a lot more tricky. As Oracle generally has a 1-to-1 mapping between instances and databases, any databases users sent would have to be restored to their own instance. Furthermore, the number of steps required to get a properly working database, and the size of most oracle databases, made it infeasible to ask every customer who came across a bug during our beta program to send us their databases. We also knew that there would be lots of issues with data security that would make it hard to get backups. So we needed an easier way to be able to debug customers issues and sort out what strange schema data Oracle was returning. Problem 2: Test execution time Another issue we knew we would have to solve was the execution time of the tests we would produce for the Schema Compare engine. Our initial prototype showed that querying the data dictionary for schema information was going to be slow (at least 15 seconds per database), and this is generally proportional to the size of the database. If you're running thousands of tests on the same databases, each one registering separate schemas, not only would the tests would take hours and hours to run, but the test servers would be hammered senseless. The solution To solve these, we needed to be able to populate the schema of a database without actually connecting to it. Well, the IDataReader interface is the primary way we read data from an Oracle server. The data dictionary queries we use return their data in terms of simple strings and numbers, which we then process and reconstruct into an object model, and the results of these queries are identical for identical schemas. So, we can record the raw results of the queries once, and then replay these results to construct the same object model as many times as required without needing to actually connect to the original database. This is what query snapshots do. They are binary files containing the raw unprocessed data we get back from the oracle server for all the queries we run on the data dictionary to get schema information. The core of the query snapshot generation takes the results of the IDataReader we get from running queries on Oracle, and passes the row data to a BinaryWriter that writes it straight to a file. The query snapshot can then be replayed to create the same object model; when the results of a specific query is needed by the population code, we can simply read the binary data stored in the file on disk and present it through an IDataReader wrapper. This is far faster than querying the server over the network, and allows us to run tests in a reasonable time. They also allow us to easily debug a customers problem; using a simple snapshot generation program, users can generate a query snapshot that could be sent along with a bug report that we can immediately replay on our machines to let us debug the issue, rather than having to obtain database backups and restore databases to test systems. There are also far fewer problems with data security; query snapshots only contain schema information, which is generally less sensitive than table data. Query snapshots implementation However, actually implementing such a feature did have a couple of 'gotchas' to it. My second blog post detailed the development of the dependencies algorithm we use to ensure we get all the dependencies in the database, and that algorithm uses data from both databases to find all the needed objects - what database you're comparing to affects what objects get populated from both databases. We get information on these additional objects using an appropriate WHERE clause on all the population queries. So, in order to accurately replay the results of querying the live database, the query snapshot needs to be a snapshot of a comparison of two databases, not just populating a single database. Furthermore, although the code population queries (eg querying all_tab_cols to get column information) can simply be passed straight from the IDataReader to the BinaryWriter, we need to hook into and run the live dependencies algorithm while we're creating the snapshot to ensure we get the same WHERE clauses, and the same query results, as if we were populating straight from a live system. We also need to store the results of the dependencies queries themselves, as the resulting dependency graph is stored within the OracleDatabase object that is produced, and is later used to help order actions in synchronization scripts. This is significantly helped by the dependencies algorithm being a deterministic algorithm - given the same input, it will always return the same output. Therefore, when we're replaying a query snapshot, and processing dependency information, we simply have to return the results of the queries in the order we got them from the live database, rather than trying to calculate the contents of all_dependencies on the fly. Query snapshots are a significant feature in Schema Compare that really helps us to debug problems with the tool, as well as making our testers happier. Although not really user-visible, they are very useful to the development team to help us fix bugs in the product much faster than we otherwise would be able to.

Read the article
Is your team is a high-performing team?

As a child I can remember looking out of the car window as my father drove along the Interstate in Florida while seeing prisoners wearing bright orange jump suits and prison guards keeping a watchful eye on them. The prisoners were taking part in a prison road gang. These road gangs were formed to help the state maintain the state highway infrastructure. The prisoner’s primary responsibilities are to pick up trash and debris from the roadway. This is a prime example of a work group or working group used by most prison systems in the United States. Work groups or working groups can be defined as a collection of individuals or entities working together to achieve a specific goal or accomplish a specific set of tasks. Typically these groups are only established for a short period of time and are dissolved once the desired outcome has been achieved. More often than not group members usually feel as though they are expendable to the group and some even dread that they are even in the group. "A team is a small number of people with complementary skills who are committed to a common purpose, performance goals, and approach for which they are mutually accountable." (Katzenbach and Smith, 1993) So how do you determine that a team is a high-performing team? This can be determined by three base line criteria that include: consistently high quality output, the promotion of personal growth and well being of all team members, and most importantly the ability to learn and grow as a unit. Initially, a team can successfully create high-performing output without meeting all three criteria, however this will erode over time because team members will feel detached from the group or that they are not growing then the quality of the output will decline. High performing teams are similar to work groups because they both utilize a collection of individuals or entities to accomplish tasks. What distinguish a high-performing team from a work group are its characteristics. High-performing teams contain five core characteristics. These characteristics are what separate a group from a team. The five characteristics of a high-performing team include: Purpose, Performance Measures, People with Tasks and Relationship Skills, Process, and Preparation and Practice. A high-performing team is much more than a work group, and typically has a life cycle that can vary from team to team. The standard team lifecycle consists of five states and is comparable to a human life cycle. The five states of a high-performing team lifecycle include: Formulating, Storming, Normalizing, Performing, and Adjourning. The Formulating State of a team is first realized when the team members are first defined and roles are assigned to all members. This initial stage is very important because it can set the tone for the team and can ultimately determine its success or failure. In addition, this stage requires the team to have a strong leader because team members are normally unclear about specific roles, specific obstacles and goals that my lay ahead of them. Finally, this stage is where most team members initially meet one another prior to working as a team unless the team members already know each other. The Storming State normally arrives directly after the formulation of a new team because there are still a lot of unknowns amongst the newly formed assembly. As a general rule most of the parties involved in the team are still getting used to the workload, pace of work, deadlines and the validity of various tasks that need to be performed by the group. In this state everything is questioned because there are so many unknowns. Items commonly questioned include the credentials of others on the team, the actual validity of a project, and the leadership abilities of the team leader. This can be exemplified by looking at the interactions between animals when they first meet. If we look at a scenario where two people are walking directly toward each other with their dogs. The dogs will automatically enter the Storming State because they do not know the other dog. Typically in this situation, they attempt to define which is more dominating via play or fighting depending on how the dogs interact with each other. Once dominance has been defined and accepted by both dogs then they will either want to play or leave depending on how the dogs interacted and other environmental variables. Once the Storming State has been realized then the Normalizing State takes over. This state is entered by a team once all the questions of the Storming State have been answered and the team has been tested by a few tasks or projects. Typically, participants in the team are filled with energy, and comradery, and a strong alliance with team goals and objectives. A high school football team is a perfect example of the Normalizing State when they start their season. The player positions have been assigned, the depth chart has been filled and everyone is focused on winning each game. All of the players encourage and expect each other to perform at the best of their abilities and are united by competition from other teams. The Performing State is achieved by a team when its history, working habits, and culture solidify the team as one working unit. In this state team members can anticipate specific behaviors, attitudes, reactions, and challenges are seen as opportunities and not problems. Additionally, each team member knows their role in the team’s success, and the roles of others. This is the most productive state of a group and is where all the time invested working together really pays off. If you look at an Olympic figure skating team skate you can easily see how the time spent working together benefits their performance. They skate as one unit even though it is comprised of two skaters. Each skater has their routine completely memorized as well as their partners. This allows them to anticipate each other’s moves on the ice makes their skating look effortless. The final state of a team is the Adjourning State. This state is where accomplishments by the team and each individual team member are recognized. Additionally, this state also allows for reflection of the interactions between team members, work accomplished and challenges that were faced. Finally, the team celebrates the challenges they have faced and overcome as a unit. Currently in the workplace teams are divided into two different types: Co-located and Distributed Teams. Co-located teams defined as the traditional group of people working together in an office, according to Andy Singleton of Assembla. This traditional type of a team has dominated business in the past due to inadequate technology, which forced workers to primarily interact with one another via face to face meetings. Team meetings are primarily lead by the person with the highest status in the company. Having personally, participated in meetings of this type, usually a select few of the team members dominate the flow of communication which reduces the input of others in group discussions. Since discussions are dominated by a select few individuals the discussions and group discussion are skewed in favor of the individuals who communicate the most in meetings. In addition, Team members might not give their full opinions on a topic of discussion in part not to offend or create controversy amongst the team and can alter decision made in meetings towards those of the opinions of the dominating team members. Distributed teams are by definition spread across an area or subdivided into separate sections. That is exactly what distributed teams when compared to a more traditional team. It is common place for distributed teams to have team members across town, in the next state, across the country and even with the advances in technology over the last 20 year across the world. These teams allow for more diversity compared to the other type of teams because they allow for more flexibility regarding location. A team could consist of a 30 year old male Italian project manager from New York, a 50 year old female Hispanic from California and a collection of programmers from India because technology allows them to communicate as if they were standing next to one another. In addition, distributed team members consult with more team members prior to making decisions compared to traditional teams, and take longer to come to decisions due to the changes in time zones and cultural events. However, team members feel more empowered to speak out when they do not agree with the team and to notify others of potential issues regarding the work that the team is doing. Virtual teams which are a subset of the distributed team type is changing organizational strategies due to the fact that a team can now in essence be working 24 hrs a day because of utilizing employees in various time zones and locations. A primary example of this is with customer services departments, a company can have multiple call centers spread across multiple time zones allowing them to appear to be open 24 hours a day while all a employees work from 9AM to 5 PM every day. Virtual teams also allow human resources departments to go after the best talent for the company regardless of where the potential employee works because they will be a part of a virtual team all that is need is the proper technology to be setup to allow everyone to communicate. In addition to allowing employees to work from home, the company can save space and resources by not having to provide a desk for every team member. In fact, those team members that randomly come into the office can actually share one desk amongst multiple people. This is definitely a cost cutting plus given the current state of the economy. One thing that can turn a team into a high-performing team is leadership. High-performing team leaders need to focus on investing in ongoing personal development, provide team members with direction, structure, and resources needed to accomplish their work, make the right interventions at the right time, and help the team manage boundaries between the team and various external parties involved in the teams work. A team leader needs to invest in ongoing personal development in order to effectively manage their team. People have said that attitude is everything; this is very true about leaders and leadership. A team takes on the attitudes and behaviors of its leaders. This can potentially harm the team and the team’s output. Leaders must concentrate on self-awareness, and understanding their team’s group dynamics to fully understand how to lead them. In addition, always learning new leadership techniques from other effective leaders is also very beneficial. Providing team members with direction, structure, and resources that they need to accomplish their work collectively sounds easy, but it is not. Leaders need to be able to effectively communicate with their team on how their work helps the company reach for its organizational vision. Conversely, the leader needs to allow his team to work autonomously within specific guidelines to turn the company’s vision into a reality. This being said the team must be appropriately staffed according to the size of the team’s tasks and their complexity. These tasks should be clear, and be meaningful to the company’s objectives and allow for feedback to be exchanged with the leader and the team member and the leader and upper management. Now if the team is properly staffed, and has a clear and full understanding of what is to be done; the company also must supply the workers with the proper tools to achieve the tasks that they are asked to do. No one should be asked to dig a hole without being given a shovel. Finally, leaders must reward their team members for accomplishments that they achieve. Awards could range from just a simple congratulatory email, a party to close the completion of a large project, or other monetary rewards. Managing boundaries is very important for team leaders because it can alter attitudes of team members and can add undue stress to the team which will force them to loose focus on the tasks at hand for the group. Team leaders should promote communication between team members so that burdens are shared amongst the team and solutions can be derived from hearing the opinions of multiple sources. This also reinforces team camaraderie and working as a unit. Team leaders must manage the type and timing of interventions as to not create an even bigger mess within the team. Poorly timed interventions can really deflate team members and make them question themselves. This could really increase further and undue interventions by the team leader. Typically, the best time for interventions is when the team is just starting to form so that all unproductive behaviors are removed from the team and that it can retain focus on its agenda. If an intervention is effectively executed the team will feel energized about the work that they are doing, promote communication and interaction amongst the group and improve moral overall. High-performing teams are very import to organizations because they consistently produce high quality output and develop a collective purpose for their work. This drive to succeed allows team members to utilize specific talents allowing for growth in these areas. In addition, these team members usually take on a sense of ownership with their projects and feel that the other team members are irreplaceable. References: http://blog.assembla.com/assemblablog/tabid/12618/bid/3127/Three-ways-to-organize-your-team-co-located-outsourced-or-global.aspx Katzenbach, J.R. & Smith, D.K. (1993). The Wisdom of Teams: Creating the High-performance Organization. Boston: Harvard Business School.

Read the article
What's up with OCFS2?

- by wcoekaer

On Linux there are many filesystem choices and even from Oracle we provide a number of filesystems, all with their own advantages and use cases. Customers often confuse ACFS with OCFS or OCFS2 which then causes assumptions to be made such as one replacing the other etc... I thought it would be good to write up a summary of how OCFS2 got to where it is, what we're up to still, how it is different from other options and how this really is a cool native Linux cluster filesystem that we worked on for many years and is still widely used. Work on a cluster filesystem at Oracle started many years ago, in the early 2000's when the Oracle Database Cluster development team wrote a cluster filesystem for Windows that was primarily focused on providing an alternative to raw disk devices and help customers with the deployment of Oracle Real Application Cluster (RAC). Oracle RAC is a cluster technology that lets us make a cluster of Oracle Database servers look like one big database. The RDBMS runs on many nodes and they all work on the same data. It's a Shared Disk database design. There are many advantages doing this but I will not go into detail as that is not the purpose of my write up. Suffice it to say that Oracle RAC expects all the database data to be visible in a consistent, coherent way, across all the nodes in the cluster. To do that, there were/are a few options : 1) use raw disk devices that are shared, through SCSI, FC, or iSCSI 2) use a network filesystem (NFS) 3) use a cluster filesystem(CFS) which basically gives you a filesystem that's coherent across all nodes using shared disks. It is sort of (but not quite) combining option 1 and 2 except that you don't do network access to the files, the files are effectively locally visible as if it was a local filesystem. So OCFS (Oracle Cluster FileSystem) on Windows was born. Since Linux was becoming a very important and popular platform, we decided that we would also make this available on Linux and thus the porting of OCFS/Windows started. The first version of OCFS was really primarily focused on replacing the use of Raw devices with a simple filesystem that lets you create files and provide direct IO to these files to get basically native raw disk performance. The filesystem was not designed to be fully POSIX compliant and it did not have any where near good/decent performance for regular file create/delete/access operations. Cache coherency was easy since it was basically always direct IO down to the disk device and this ensured that any time one issues a write() command it would go directly down to the disk, and not return until the write() was completed. Same for read() any sort of read from a datafile would be a read() operation that went all the way to disk and return. We did not cache any data when it came down to Oracle data files. So while OCFS worked well for that, since it did not have much of a normal filesystem feel, it was not something that could be submitted to the kernel mail list for inclusion into Linux as another native linux filesystem (setting aside the Windows porting code ...) it did its job well, it was very easy to configure, node membership was simple, locking was disk based (so very slow but it existed), you could create regular files and do regular filesystem operations to a certain extend but anything that was not database data file related was just not very useful in general. Logfiles ok, standard filesystem use, not so much. Up to this point, all the work was done, at Oracle, by Oracle developers. Once OCFS (1) was out for a while and there was a lot of use in the database RAC world, many customers wanted to do more and were asking for features that you'd expect in a normal native filesystem, a real "general purposes cluster filesystem". So the team sat down and basically started from scratch to implement what's now known as OCFS2 (Oracle Cluster FileSystem release 2). Some basic criteria were : Design it with a real Distributed Lock Manager and use the network for lock negotiation instead of the disk Make it a Linux native filesystem instead of a native shim layer and a portable core Support standard Posix compliancy and be fully cache coherent with all operations Support all the filesystem features Linux offers (ACL, extended Attributes, quotas, sparse files,...) Be modern, support large files, 32/64bit, journaling, data ordered journaling, endian neutral, we can mount on both endian /cross architecture,.. Needless to say, this was a huge development effort that took many years to complete. A few big milestones happened along the way... OCFS2 was development in the open, we did not have a private tree that we worked on without external code review from the Linux Filesystem maintainers, great folks like Christopher Hellwig reviewed the code regularly to make sure we were not doing anything out of line, we submitted the code for review on lkml a number of times to see if we were getting close for it to be included into the mainline kernel. Using this development model is standard practice for anyone that wants to write code that goes into the kernel and having any chance of doing so without a complete rewrite or.. shall I say flamefest when submitted. It saved us a tremendous amount of time by not having to re-fit code for it to be in a Linus acceptable state. Some other filesystems that were trying to get into the kernel that didn't follow an open development model had a lot harder time and a lot harsher criticism. March 2006, when Linus released 2.6.16, OCFS2 officially became part of the mainline kernel, it was accepted a little earlier in the release candidates but in 2.6.16. OCFS2 became officially part of the mainline Linux kernel tree as one of the many filesystems. It was the first cluster filesystem to make it into the kernel tree. Our hope was that it would then end up getting picked up by the distribution vendors to make it easy for everyone to have access to a CFS. Today the source code for OCFS2 is approximately 85000 lines of code. We made OCFS2 production with full support for customers that ran Oracle database on Linux, no extra or separate support contract needed. OCFS2 1.0.0 started being built for RHEL4 for x86, x86-64, ppc, s390x and ia64. For RHEL5 starting with OCFS2 1.2. SuSE was very interested in high availability and clustering and decided to build and include OCFS2 with SLES9 for their customers and was, next to Oracle, the main contributor to the filesystem for both new features and bug fixes. Source code was always available even prior to inclusion into mainline and as of 2.6.16, source code was just part of a Linux kernel download from kernel.org, which it still is, today. So the latest OCFS2 code is always the upstream mainline Linux kernel. OCFS2 is the cluster filesystem used in Oracle VM 2 and Oracle VM 3 as the virtual disk repository filesystem. Since the filesystem is in the Linux kernel it's released under the GPL v2 The release model has always been that new feature development happened in the mainline kernel and we then built consistent, well tested, snapshots that had versions, 1.2, 1.4, 1.6, 1.8. But these releases were effectively just snapshots in time that were tested for stability and release quality. OCFS2 is very easy to use, there's a simple text file that contains the node information (hostname, node number, cluster name) and a file that contains the cluster heartbeat timeouts. It is very small, and very efficient. As Sunil Mushran wrote in the manual : OCFS2 is an efficient, easily configured, quickly installed, fully integrated and compatible, feature-rich, architecture and endian neutral, cache coherent, ordered data journaling, POSIX-compliant, shared disk cluster file system. Here is a list of some of the important features that are included : Variable Block and Cluster sizes Supports block sizes ranging from 512 bytes to 4 KB and cluster sizes ranging from 4 KB to 1 MB (increments in power of 2). Extent-based Allocations Tracks the allocated space in ranges of clusters making it especially efficient for storing very large files. Optimized Allocations Supports sparse files, inline-data, unwritten extents, hole punching and allocation reservation for higher performance and efficient storage. File Cloning/snapshots REFLINK is a feature which introduces copy-on-write clones of files in a cluster coherent way. Indexed Directories Allows efficient access to millions of objects in a directory. Metadata Checksums Detects silent corruption in inodes and directories. Extended Attributes Supports attaching an unlimited number of name:value pairs to the file system objects like regular files, directories, symbolic links, etc. Advanced Security Supports POSIX ACLs and SELinux in addition to the traditional file access permission model. Quotas Supports user and group quotas. Journaling Supports both ordered and writeback data journaling modes to provide file system consistency in the event of power failure or system crash. Endian and Architecture neutral Supports a cluster of nodes with mixed architectures. Allows concurrent mounts on nodes running 32-bit and 64-bit, little-endian (x86, x86_64, ia64) and big-endian (ppc64) architectures. In-built Cluster-stack with DLM Includes an easy to configure, in-kernel cluster-stack with a distributed lock manager. Buffered, Direct, Asynchronous, Splice and Memory Mapped I/Os Supports all modes of I/Os for maximum flexibility and performance. Comprehensive Tools Support Provides a familiar EXT3-style tool-set that uses similar parameters for ease-of-use. The filesystem was distributed for Linux distributions in separate RPM form and this had to be built for every single kernel errata release or every updated kernel provided by the vendor. We provided builds from Oracle for Oracle Linux and all kernels released by Oracle and for Red Hat Enterprise Linux. SuSE provided the modules directly for every kernel they shipped. With the introduction of the Unbreakable Enterprise Kernel for Oracle Linux and our interest in reducing the overhead of building filesystem modules for every minor release, we decide to make OCFS2 available as part of UEK. There was no more need for separate kernel modules, everything was built-in and a kernel upgrade automatically updated the filesystem, as it should. UEK allowed us to not having to backport new upstream filesystem code into an older kernel version, backporting features into older versions introduces risk and requires extra testing because the code is basically partially rewritten. The UEK model works really well for continuing to provide OCFS2 without that extra overhead. Because the RHEL kernel did not contain OCFS2 as a kernel module (it is in the source tree but it is not built by the vendor in kernel module form) we stopped adding the extra packages to Oracle Linux and its RHEL compatible kernel and for RHEL. Oracle Linux customers/users obviously get OCFS2 included as part of the Unbreakable Enterprise Kernel, SuSE customers get it by SuSE distributed with SLES and Red Hat can decide to distribute OCFS2 to their customers if they chose to as it's just a matter of compiling the module and making it available. OCFS2 today, in the mainline kernel is pretty much feature complete in terms of integration with every filesystem feature Linux offers and it is still actively maintained with Joel Becker being the primary maintainer. Since we use OCFS2 as part of Oracle VM, we continue to look at interesting new functionality to add, REFLINK was a good example, and as such we continue to enhance the filesystem where it makes sense. Bugfixes and any sort of code that goes into the mainline Linux kernel that affects filesystems, automatically also modifies OCFS2 so it's in kernel, actively maintained but not a lot of new development happening at this time. We continue to fully support OCFS2 as part of Oracle Linux and the Unbreakable Enterprise Kernel and other vendors make their own decisions on support as it's really a Linux cluster filesystem now more than something that we provide to customers. It really just is part of Linux like EXT3 or BTRFS etc, the OS distribution vendors decide. Do not confuse OCFS2 with ACFS (ASM cluster Filesystem) also known as Oracle Cloud Filesystem. ACFS is a filesystem that's provided by Oracle on various OS platforms and really integrates into Oracle ASM (Automatic Storage Management). It's a very powerful Cluster Filesystem but it's not distributed as part of the Operating System, it's distributed with the Oracle Database product and installs with and lives inside Oracle ASM. ACFS obviously is fully supported on Linux (Oracle Linux, Red Hat Enterprise Linux) but OCFS2 independently as a native Linux filesystem is also, and continues to also be supported. ACFS is very much tied into the Oracle RDBMS, OCFS2 is just a standard native Linux filesystem with no ties into Oracle products. Customers running the Oracle database and ASM really should consider using ACFS as it also provides storage/clustered volume management. Customers wanting to use a simple, easy to use generic Linux cluster filesystem should consider using OCFS2. To learn more about OCFS2 in detail, you can find good documentation on http://oss.oracle.com/projects/ocfs2 in the Documentation area, or get the latest mainline kernel from http://kernel.org and read the source. One final, unrelated note - since I am not always able to publicly answer or respond to comments, I do not want to selectively publish comments from readers. Sometimes I forget to publish comments, sometime I publish them and sometimes I would publish them but if for some reason I cannot publicly comment on them, it becomes a very one-sided stream. So for now I am going to not publish comments from anyone, to be fair to all sides. You are always welcome to email me and I will do my best to respond to technical questions, questions about strategy or direction are sometimes not possible to answer for obvious reasons.

Read the article
Why should I use Amazon Route 53 over my registrar's DNS servers?

- by Abtin Forouzandeh

I am building a site that I anticipate will have high usage. Currently, my registrar (GoDaddy) is handling DNS. However, Amazon's Route 53 looks interesting. They promise high speed and offer globally distributed DNS servers and a programmable interface. While GoDaddy doesn't offer a programmable interface, I assume their servers are geographically distributed as well. What are the main reasons I should opt to use Amazon Route 53 over free registrar-based DNS?

Read the article
mod_deflate Supported Encodings for Compression

- by sparc

It seems to me, that mod_deflate in Apache 2.2 will always return: Content-Encoding: gzip and never: Content-Encoding: deflate It was explained to me, that although there may be a deflate algorithm, mod_deflate is named after a file-format, in which the algorithm could be any of: gzip, bzip. pkzip Of those three, mod_deflate provides gzip. It seems as though gzip is the most popular and widely-supported algorithm in web browsers, but I know some web servers and proxies do return Content-Encoding: deflate. Aside from the confusion of the module's name, it true that mod_deflate will only return Content-Encoding: gzip? Thank you.

Read the article
Objective-C Plugin Architecture Security (Mac, not iphone)

- by Tom Dalling

I'm possibly writing a plugin system for a Cocoa application (Mac, not iphone). A common approach is the make each plugin a bundle, then inject the bundle into the main application. I'm concerned with the security implications of doing this, as the bundle will have complete access to the Objective-C runtime. I am especially concerned with a plugin having access to the code that handles registration and serial keys. Another plugin system we are considering is based on distributed notifications. Basically, each plugin will be a separate process, and they will communicate via distributed notifications only. Is there a way to load bundles securely (e.g. sandboxing)? If not, do you see any problems with using distributed notifications? Are there any other plugin architectures that would be better?

Read the article
Is it illegal to rewrite every line of an open source project in a slightly different way, and use it in a closed source project?

- by Chris Barry

There is some code which is GPL or LGPL that I am considering using for an iPhone project. If I took that code (JavaScript) and rewrote it in a different language for use on the iPhone would that be a legal issue? In theory the process that has happened is that I have gone through each line of the project, learnt what it is doing, and then reimplemented the ideas in a new language. To me it seems this is like learning how to implement something, but then reimplementing it separately from the original licence. Therefore you have only copied the algorithm, which arguably you could have learnt from somewhere else other than the original project. Does the licence cover the specific implementation or the algorithm as well? EDIT------ Really glad to see this topic create a good conversation. To give a bit more backing to the project, the code involved does some kind of audio analysis. I believe it is non-trivial to learn or implement, although I was prepared to embark on this task (I'm at the level where I can implement an FFT algorithm, and this was going to go beyond that.) It is a fairly low LOC script, so I didn't think it would be too hard to do a straight port. I really like the idea of rereleasing my port as well as using it in the application. I don't see any problem with that, and it would be a great way to give something back to the community. I was going to add a line about not wanting to discuss the moral issues, but I'm quite glad I didn't as it seems to have fired the debate a bit. I still feel a bit odd about using open source code to learn from. Does this mean that anything one learns from an open source project is not allowed to be used in a closed source project? And how long after or different does an implementation have to be to not be considered violation of the licence? Murky! EDIT 2 -------- Follow up question

Read the article
How does braking assist of car racing games work?

- by Ayush Khemka

There are a lot of PC car racing games around which have this unique driving assist which helps brake your car so that you can safely turn it. While in some games it just an 'assist', it will just help your car brake but won't ensure a safe turn. While in others, the braking assist will help you get a safe turn. I was wondering on what could be the algorithm that is followed to achieve it. A very basic algorithm I could think of was, Pre-determine the braking distance of an ideal car for every turn of the track, depending on the radius of the turn, and then start braking the car accordingly. For example, for a turn of less than 90o, the car would start braking automatically at 50m distance from the start of the turn. A more advanced algorithm, which would ensure a safe turn, could be Pre-determine the speed of the car at the start of each turn, individually for each track, turn and car. Also, pre-determine the deceleration rate of each car individually, which varies because of the car's performance. The braking assist would keep recording the speed of the car at a certain instant of time. Start braking the car appropriately so that the car gets to the exact speed needed at the start of the turn. For example, let the speed of a particular car at the start of a turn 43m in radius, be 120km/h. Let the deceleration rate of the car be 200km/h2. If, at some instant of time, the speed of the car is 200km/h, then the car would automatically start braking at 400m from the start of the turn.

Read the article
How to find the average color of an image.

- by Edward Boyle

Years ago I was the lead developer of a large Scrapbook Web Site. One of the things I implemented was to allow shoppers to find Scrapbook papers and embellishments of like colors (“more like this color”). Below is the base algorithm I wrote to extract the color from an image. It worked out pretty well. I took the returned values and stored them in an associated table for the products. Yet another algorithm was used to SELECT near matches. This algorithm has turned out to be very handy for me. I have used it for borders and subtle outlined text overlays. I am sure you will find more creative uses for it. Enjoy… private Color GetColor(Bitmap bmp) { int r = 0; int g = 0; int b = 0; Color mColor = System.Drawing.Color.White; for (int i = 1; i < bmp.Width; i++) { for (int x = 1; x < bmp.Height; x++) { mColor = bmp.GetPixel(i, x); r += mColor.R; g += mColor.G; b += mColor.B; } } r = (r / (bmp.Height * bmp.Width)); g = (g / (bmp.Height * bmp.Width)); b = (b / (bmp.Height * bmp.Width)); return System.Drawing.Color.FromArgb(r, g, b); } You could also get the RGB values by passing in the RGB by ref private Color GetColor(ref int r, ref int g, ref int b, Bitmap bmp) but that is a bit much as you can simply get it from the return value: mReturnedColor.R; mReturnedColor.G; mReturnedColor.B;

Read the article
Continuous Physics Engine's Collision Detection Techniques

- by Griffin

I'm working on a purely continuous physics engine, and I need to choose algorithms for broad and narrow phase collision detection. "Purely continuous" means I never do intersection tests, but instead want to find ways to catch every collision before it happens, and put each into "planned collisions" stack that is ordered by TOI. Broad Phase The only continuous broad-phase method I can think of is encasing each body in a circle and testing if each circle will ever overlap another. This seems horribly inefficient however, and lacks any culling. I have no idea what continuous analogs might exist for today's discrete collision culling methods such as quad-trees either. How might I go about preventing inappropriate and pointless broad test's such as a discrete engine does? Narrow Phase I've managed to adapt the narrow SAT to a continuous check rather than discrete, but I'm sure there's other better algorithms out there in papers or sites you guys might have come across. What various fast or accurate algorithm's do you suggest I use and what are the advantages / disatvantages of each? Final Note: I say techniques and not algorithms because I have not yet decided on how I will store different polygons which might be concave, convex, round, or even have holes. I plan to make a decision on this based on what the algorithm requires (for instance if I choose an algorithm that breaks down a polygon into triangles or convex shapes I will simply store the polygon data in this form).

Read the article
How to practice object oriented programming?

- by user1620696

I've always programmed in procedural languages and currently I'm moving towards object orientation. The main problem I've faced is that I can't see a way to practice object orientation in an effective way. I'll explain my point. When I've learned PHP and C it was pretty easy to practice: it was just matter of choosing something and thinking about an algorithm for that thing. In PHP for example, it was matter os sitting down and thinking: "well, just to practice, let me build one application with an administration area where people can add products". This was pretty easy, it was matter of thinking of an algorithm to register some user, to login the user, and to add the products. Combining these with PHP features, it was a good way to practice. Now, in object orientation we have lots of additional things. It's not just a matter of thinking about an algorithm, but analysing requirements deeper, writing use cases, figuring out class diagrams, properties and methods, setting up dependency injection and lots of things. The main point is that in the way I've been learning object orientation it seems that a good design is crucial, while in procedural languages one vague idea was enough. I'm not saying that in procedural languages we can write good software without design, just that for sake of practicing it is feasible, while in object orientation it seems not feasible to go without a good design, even for practicing. This seems to be a problem, because if each time I'm going to practice I need to figure out tons of requirements, use cases and so on, it seems to become not a good way to become better at object orientation, because this requires me to have one whole idea for an app everytime I'm going to practice. Because of that, what's a good way to practice object orientation?

Read the article
In the days of modern computing, in 'typical business apps' - why does performance matter?

- by Prog

This may seem like an odd question to some of you. I'm a hobbyist Java programmer. I have developed several games, an AI program that creates music, another program for painting, and similar stuff. This is to tell you that I have an experience in programming, but not in professional development of business applications. I see a lot of talk on this site about performance. People often debate what would be the most efficient algorithm in C# to perform a task, or why Python is slow and Java is faster, etc. What I'm trying to understand is: why does this matter? There are specific areas of computing where I see why performance matters: games, where tens of thousands of computations are happening every second in a constant-update loop, or low level systems which other programs rely on, such as OSs and VMs, etc. But for the normal, typical high-level business app, why does performance matter? I can understand why it used to matter, decades ago. Computers were much slower and had much less memory, so you had to think carefully about these things. But today, we have so much memory to spare and computers are so fast: does it actually matter if a particular Java algorithm is O(n^2)? Will it actually make a difference for the end users of this typical business app? When you press a GUI button in a typical business app, and behind the scenes it invokes an O(n^2) algorithm, in these days of modern computing - do you actually feel the inefficiency? My question is split in two: In practice, today does performance matter in a typical normal business program? If it does, please give me real-world examples of places in such an application, where performance and optimizations are important.

Read the article
String patterns that can be used to filter and group files

- by Louis Rhys

One of our application filters files in certain directory, extract some data from it and export a document from the extracted data. The algorithm for extracting the data depends on the file, and so far we use regex to select the algorithm to be used, for example .*\.txt will be processed by algorithm A, foo[0-5]\.xml will be processed by algo B, etc. However now we need some files to be processed together. For example, in one case we need two files, foo.*\.xml and bar.*\.xml. Part of the information to be extracted exist in the foo file, and the other part in the bar file. Moreover, we need to make sure the wild card is compatible. For example, if there are 6 files foo1.xml foo23.xml bar1.xml bar9.xml bar23.xml foo4.xml I would expect foo1 and bar1 to be identified as a group, and foo23 and bar23 as another group. bar9 and foo4 has no pair, so they will not be treated. Now, since the filter is configured by user, we need to have a pattern that can express the above requirement. I don't think you can express meaning like above in standard regex. (foo|bar).*\.xml will match all 6 file above and we can't identify which file is paired for a particular file. Is there any standard pattern that can express it? Or any idea how to modify regex to support this, that can be implemented easily?

Read the article
Obstacle Avoidance steering behavior: how can an entity avoid an obstacle while other forces are acting on the entity?

- by Prog

I'm trying to implement the Obstacle Avoidance steering behavior in my 2D game. Currently my approach is to apply a force on the entity, in the direction of the normal of the heading, scaled by a number that gets bigger the closer we are to the obstacle. This is supposed to push the entity to the side and avoid the obstacle that blocks it's way. However, in the same time that my entity tries to avoid an obstacle, it Seeks to a point more or less behind the obstacle (which is the reason it needs to avoid the obstacle in the first place). The Seek algorithm constantly applies a force on the entity that pushes it (more or less) in the direction of the obstacle, while the Obstacle Avoidance algorithm constantly applies a force that pushes the entity away (more accurately, to the side) of the obstacle. The result is that sometimes the entity succesfully avoids the obstacle, and sometimes it collides with it, depending on the strength of the avoidance force I'm applying. How can I make sure that a force will succeed in steering the entity in some direction, while other forces are currently acting on the entity? (And while still looking natural). I can't allow entities to collide with obstacles when realistically they should be able to easily avoid them, doesn't matter what they're currently doing. Also, the Obstacle Avoidance algorithm is made exactly for the case where another force is acting on the entity. Otherwise it wouldn't be moving and there would be no need to avoid anything. So maybe I'm missing something. Thanks

Read the article
Automatic Appointment Conflict Resolution

- by Thomas

I'm trying to figure out an algorithm for resolving appointment times. I currently have a naive algorithm that pushes down conflicting appointments repeatedly, until there are no more appointments. # The appointment list is always sorted on start time appointment_list = [ <Appointment: 10:00 -> 12:00>, <Appointment: 11:00 -> 12:30>, <Appointment: 13:00 -> 14:00>, <Appointment: 13:30 -> 14:30>, ] Constraints are that appointments: cannot be after 15:00 cannot be before 9:00 This is the naive algorithm for i, app in enumerate(appointment_list): for possible_conflict in appointment_list[i+1:]: if possible_conflict.start < app.end: difference = app.end - possible_conflict.start possible_conflict.end += difference possible_conflict.start += difference else: break This results in the following resolution, which obviously breaks those constraints, and the last appointment will have to be pushed to the following day. appointment_list = [ <Appointment: 10:00 -> 12:00>, <Appointment: 12:00 -> 13:30>, <Appointment: 13:30 -> 14:30>, <Appointment: 14:30 -> 15:30>, ] Obviously this is sub-optimal, It performs 3 appointment moves when the confict could have been resolved with one: if we were able to push the first appointment backwards, we could avoid moving all the subsequent appointments down. I'm thinking that there should be a sort of edit-distance approach that would calculate the least number of appointments that should be moved in order to resolve the scheduling conflict, but I can't get the a handle on the methodology. Should it be breadth-first or depth first solution search. When do I know if the solution is "good enough"?

Read the article
Finding header files

- by rwallace

A C or C++ compiler looks for header files using a strict set of rules: relative to the directory of the including file (if "" was used), then along the specified and default include paths, fail if still not found. An ancillary tool such as a code analyzer (which I'm currently working on) has different requirements: it may for a number of reasons not have the benefit of the setup performed by a complex build process, and have to make the best of what it is given. In other words, it may find a header file not present in the include paths it knows, and have to take its best shot at finding the file itself. I'm currently thinking of using the following algorithm: Start in the directory of the including file. Is the header file found in the current directory or any subdirectory thereof? If so, done. If we are at the root directory, the file doesn't seem to be present on this machine, so skip it. Otherwise move to the parent of the current directory and go to step 2. Is this the best algorithm to use? In particular, does anyone know of any case where a different algorithm would work better?

Read the article
Mechanics of reasoning during programming interviews

- by user129506

This is not the usual "I don't want to write code during an interview", in this question the assumption is that I need to write code during an interview (think about the level of rewriting the quicksort or mergesort from scratch) I know how the algorithm work or I have a basic idea of how I should start working from there, i.e. I don't remember the algorithm by heart I noticed that even on a whiteboard, I always end up writing bugged code or code that doesn't compile. If there's a typo, whatever I usually live with that.. but when there's a crash due to some uncaught particular case I end up losing confidence in my skills. I realize that perhaps interviewers might want to look at how I write code and/or how I solve problems rather than proof-compiling my whiteboard code, but I'd like to ask how should I approach the above problem in mental terms, i.e. what mental steps should I follow when writing code for an interview with the two bullet points above. There must be a unique and agreed series of steps I should follow to avoid getting stuck/caught into particular exception cases (limit cases) that might end up wasting my time and my energies rather than focusing on the overall algorithm for the general case. I hope I made my point clear

Read the article
Approach to Authenticate Clients to TCP Server

- by dab

I'm writing a Server/Client application where clients will connect to the server. What I want to do, is make sure that the client connecting to the server is actually using my protocol and I can "trust" the data being sent from the client to the server. What I thought about doing is creating a sort of hash on the client's machine that follows a particular algorithm. What I did in a previous version was took their IP address, the client version, and a few other attributes of the client and sent it as a calculated hash to the server, who then took their IP, and the version of the protocol the client claimed to be using, and calculated that number to see if they matched. This works ok until you get clients that connect from within a router environment where their internal IP is different from their external IP. My fix for this was to pass the client's internal IP used to calculate this hash with the authentication protocol. My fear is this approach is not secure enough. Since I'm passing the data used to create the "auth hash". Here's an example of what I'm talking about: Client IP: 192.168.1.10, Version: 2.4.5.2 hash = 2*4*5*1 * (1+9+2) * (1+6+8) * (1) * (1+0) Client Connects to Server client sends: auth hash ip version Server calculates that info, and accepts or denies the hash. Before I go and come up with another algorithm to prove a client can provide data a server (or use this existing algorithm), I was wondering if there are any existing, proven, and secure systems out there for generating a hash that both sides can generate with general knowledge. The server won't know about the client until the very first connection is established. The protocol's intent is to manage a network of clients who will be contributing data to the server periodically. New clients will be added simply by connecting the client to the server and "registering" with the server. So a client connects to the server for the first time, and registers their info (mac address or some other kind of unique computer identifier), then when they connect again, the server will recognize that client as a previous person and associate them with their data in the database.

Read the article
How do I do high quality scaling of a image?

- by pbhogan

I'm writing some code to scale a 32 bit RGBA image in C/C++. I have written a few attempts that have been somewhat successful, but they're slow and most importantly the quality of the sized image is not acceptable. I compared the same image scaled by OpenGL (i.e. my video card) and my routine and it's miles apart in quality. I've Google Code Searched, scoured source trees of anything I thought would shed some light (SDL, Allegro, wxWidgets, CxImage, GD, ImageMagick, etc.) but usually their code is either convoluted and scattered all over the place or riddled with assembler and little or no comments. I've also read multiple articles on Wikipedia and elsewhere, and I'm just not finding a clear explanation of what I need. I understand the basic concepts of interpolation and sampling, but I'm struggling to get the algorithm right. I do NOT want to rely on an external library for one routine and have to convert to their image format and back. Besides, I'd like to know how to do it myself anyway. :) I have seen a similar question asked on stack overflow before, but it wasn't really answered in this way, but I'm hoping there's someone out there who can help nudge me in the right direction. Maybe point me to some articles or pseudo code... anything to help me learn and do. Here's what I'm looking for: 1. No assembler (I'm writing very portable code for multiple processor types). 2. No dependencies on external libraries. 3. I am primarily concerned with scaling DOWN, but will also need to write a scale up routine later. 4. Quality of the result and clarity of the algorithm is most important (I can optimize it later). My routine essentially takes the following form: DrawScaled( uint32 *src, uint32 *dst, src_x, src_y, src_w, src_h, dst_x, dst_y, dst_w, dst_h ); Thanks! UPDATE: To clarify, I need something more advanced than a box resample for downscaling which blurs the image too much. I suspect what I want is some kind of bicubic (or other) filter that is somewhat the reverse to a bicubic upscaling algorithm (i.e. each destination pixel is computed from all contributing source pixels combined with a weighting algorithm that keeps things sharp. EXAMPLE: Here's an example of what I'm getting from the wxWidgets BoxResample algorithm vs. what I want on a 256x256 bitmap scaled to 55x55. And finally: the original 256x256 image

Read the article
New Book From Luís Abreu: ASP.NET 4.0 – The Complete Course (Portuguese)

- by Paulo Morgado

Thsi book, with several practical examples, presents how to build web applications using ASP.NET 4.0. Starts by introducing the framework to build pages and controls and gradually introduces all the new features available. More compact that its previous versions (part of the content was moved to FCA’s site in the form of apendices), this new book gives emphasis to to the new features in ASP.NET 4.0 and targets both developers new to ASP.NET and developers moving from previous versions of ASP.NET. This time there’s good new for Brazilian readers. The book will be distributed in Brazil by: Zamboni Comércio de Livros Ltda. Av.Parada Pinto, 1476 São Paulo – SP Telf. / Fax: +55 11 2233-2333 E-mail: [email protected] Our book (LINQ Com C# (Portuguese)) isn’t still distributed in Brazil, but, if you want it, you can always try that distributer.

Read the article
To sample or not to sample...

- by [email protected]

Ideally, we would know the exact answer to every question. How many people support presidential candidate A vs. B? How many people suffer from H1N1 in a given state? Does this batch of manufactured widgets have any defective parts? Knowing exact answers is expensive in terms of time and money and, in most cases, is impractical if not impossible. Consider asking every person in a region for their candidate preference, testing every person with flu symptoms for H1N1 (assuming every person reported when they had flu symptoms), or destructively testing widgets to determine if they are "good" (leaving no product to sell). Knowing exact answers, fortunately, isn't necessary or even useful in many situations. Understanding the direction of a trend or statistically significant results may be sufficient to answer the underlying question: who is likely to win the election, have we likely reached a critical threshold for flu, or is this batch of widgets good enough to ship? Statistics help us to answer these questions with a certain degree of confidence. This focuses on how we collect data. In data mining, we focus on the use of data, that is data that has already been collected. In some cases, we may have all the data (all purchases made by all customers), in others the data may have been collected using sampling (voters, their demographics and candidate choice). Building data mining models on all of your data can be expensive in terms of time and hardware resources. Consider a company with 40 million customers. Do we need to mine all 40 million customers to get useful data mining models? The quality of models built on all data may be no better than models built on a relatively small sample. Determining how much is a reasonable amount of data involves experimentation. When starting the model building process on large datasets, it is often more efficient to begin with a small sample, perhaps 1000 - 10,000 cases (records) depending on the algorithm, source data, and hardware. This allows you to see quickly what issues might arise with choice of algorithm, algorithm settings, data quality, and need for further data preparation. Instead of waiting for a model on a large dataset to build only to find that the results don't meet expectations, once you are satisfied with the results on the initial sample, you can take a larger sample to see if model quality improves, and to get a sense of how the algorithm scales to the particular dataset. If model accuracy or quality continues to improve, consider increasing the sample size. Sampling in data mining is also used to produce a held-aside or test dataset for assessing classification and regression model accuracy. Here, we reserve some of the build data (data that includes known target values) to be used for an honest estimate of model error using data the model has not seen before. This sampling transformation is often called a split because the build data is split into two randomly selected sets, often with 60% of the records being used for model building and 40% for testing. Sampling must be performed with care, as it can adversely affect model quality and usability. Even a truly random sample doesn't guarantee that all values are represented in a given attribute. This is particularly troublesome when the attribute with omitted values is the target. A predictive model that has not seen any examples for a particular target value can never predict that target value! For other attributes, values may consist of a single value (a constant attribute) or all unique values (an identifier attribute), each of which may be excluded during mining. Values from categorical predictor attributes that didn't appear in the training data are not used when testing or scoring datasets. In subsequent posts, we'll talk about three sampling techniques using Oracle Database: simple random sampling without replacement, stratified sampling, and simple random sampling with replacement.

Read the article
What steps can you take to ensure sane build environments when compiling software?

- by Chris Adams

Hi guys, I've been stuck with a compilation problem when building a standardised virtual machine on CentOS 5.4, and I'm in the dark here as to a) why this error is occurring, and b) how to fix it, and in the hope that someone else stumbles across this problem too, I'm hoping someone can help me find the solution here. I'm getting a configure: error: newly created file is older than distributed files! error when trying to compile Ruby Enterprise like below when I try to run the installer, and the solutions offered to on the forums (of checking the tine, and touching the files to update the time associated with them) don't seem to be helping here. What steps can I take to work out what the cause of this problem? [vagrant@vagrant-centos-5 ruby-enterprise-1.8.7-2009.10]$ sudo ./installer Welcome to the Ruby Enterprise Edition installer This installer will help you install Ruby Enterprise Edition 1.8.7-2009.10. Don't worry, none of your system files will be touched if you don't want them to, so there is no risk that things will screw up. You can expect this from the installation process: 1. Ruby Enterprise Edition will be compiled and optimized for speed for this system. 2. Ruby on Rails will be installed for Ruby Enterprise Edition. 3. You will learn how to tell Phusion Passenger to use Ruby Enterprise Edition instead of regular Ruby. Press Enter to continue, or Ctrl-C to abort. Checking for required software... * C compiler... found at /usr/bin/gcc * C++ compiler... found at /usr/bin/g++ * The 'make' tool... found at /usr/bin/make * Zlib development headers... found * OpenSSL development headers... found * GNU Readline development headers... found -------------------------------------------- Target directory Where would you like to install Ruby Enterprise Edition to? (All Ruby Enterprise Edition files will be put inside that directory.) [/opt/ruby-enterprise] : -------------------------------------------- Compiling and optimizing the memory allocator for Ruby Enterprise Edition In the mean time, feel free to grab a cup of coffee. ./configure --prefix=/opt/ruby-enterprise --disable-dependency-tracking checking build system type... i686-pc-linux-gnu checking host system type... i686-pc-linux-gnu checking for a BSD-compatible install... /usr/bin/install -c checking whether build environment is sane... configure: error: newly created file is older than distributed files! Check your system clock This is a virtual machine running on virtualbox, and the time of the host and the virtual machine are identical, and up to date. I've also tried running this after updating time with an ntp-client, so no avail. I tried this after reading this post here of someone having a similar problem [vagrant@vagrant-centos-5 ruby-enterprise-1.8.7-2009.10]$ date Tue Apr 27 08:09:05 BST 2010 The other approach I've tried is to touch the top level the files in the build folder like suggested here, but this hasn't worked either (an to be honest, I'm not sure why it would have worked either) [vagrant@vagrant-centos-5 ruby-enterprise-1.8.7-2009.10]$ sudo touch ruby-enterprise-1.8.7-2009.10/* I'm not sure what I can do next here - the problem seems to be the bash configure script that returns this error error: newly created file is older than distributed files!, at line :2214 { echo "$as_me:$LINENO: checking whether build environment is sane" >&5 echo $ECHO_N "checking whether build environment is sane... $ECHO_C" >&6; } # Just in case sleep 1 echo timestamp > conftest.file # Do `set' in a subshell so we don't clobber the current shell's # arguments. Must try -L first in case configure is actually a # symlink; some systems play weird games with the mod time of symlinks # (eg FreeBSD returns the mod time of the symlink's containing # directory). if ( set X `ls -Lt $srcdir/configure conftest.file 2> /dev/null` if test "$*" = "X"; then # -L didn't work. set X `ls -t $srcdir/configure conftest.file` fi rm -f conftest.file if test "$*" != "X $srcdir/configure conftest.file" \ && test "$*" != "X conftest.file $srcdir/configure"; then # If neither matched, then we have a broken ls. This can happen # if, for instance, CONFIG_SHELL is bash and it inherits a # broken ls alias from the environment. This has actually # happened. Such a system could not be considered "sane". { { echo "$as_me:$LINENO: error: ls -t appears to fail. Make sure there is not a broken alias in your environment" >&5 echo "$as_me: error: ls -t appears to fail. Make sure there is not a broken alias in your environment" >&2;} { (exit 1); exit 1; }; } fi ### PROBLEM LINE #### # this line is the problem line - this is returned true, sometimes it isn't and I can't # see a pattern that that determines when this will test will pass or not. test "$2" = conftest.file ) then # Ok. : else { { echo "$as_me:$LINENO: error: newly created file is older than distributed files! Check your system clock" >&5 echo "$as_me: error: newly created file is older than distributed files! Check your system clock" >&2;} { (exit 1); exit 1; }; } fi the thing that makes this really frustrating is that this script works sometimes, when the VM has been running for an hour or so it works, but not at boot. There's nothing I see in the crontab that suggests any hourly tasks are run that might change the state of the system enough make a difference to this script working. I'm totally at a loss when it comes to debugging beyond here. What's the best approach to take here? Thanks

Read the article
Oracle Releases New Mainframe Re-Hosting in Oracle Tuxedo 11g

- by Jason Williamson

I'm excited to say that we've released our next generation of Re-hosting in 11g. In fact I'm doing some hands-on labs now for our Systems Integrators in Italy in a couple of weeks and targeting Latin America next month. If you are an SI, or Rehosting firm and are looking to become an Oracle Partner or get a better understanding of Tuxedo and how to use the workbench for rehosting...drop me a line. Oracle Tuxedo Application Runtime for CICS and Batch 11g provides a CICS API emulation and Batch environment that exploits the full range of Oracle Tuxedo's capabilities. Re-hosted applications run in a multi-node, grid environment with centralized production control. Also, enterprise integration of CICS application services benefits from an open and SOA-enabled framework. Key features include: CICS Application Runtime: Can run IBM CICS applications unchanged in an application grid, which enables the distribution of large workloads across multiple processors and nodes. This simplifies CICS administration and can scale to over 100,000 users and over 50,000 transactions per second. 3270 Terminal Server: Protects business users from change through support for tn3270 terminal emulation. Distributed CICS Resource Management: Simplifies deployment and administration by allowing customers to run CICS regions in a distributed configuration. Batch Application Runtime: Provides robust IBM JES-like job management that enables local or remote job submissions. In addition, distributed batch initiators can enable parallelization of jobs and support fail-over, shortening the batch window and helping to meet stringent SLAs. Batch Execution Environment: Helps to run IBM batch unchanged and also supports JCL functionality and all common batch utilities. Oracle Tuxedo Application Rehosting Workbench 11g provides a set of automated migration tools integrated around a central repository. The tools provide high precision which results in very low error rates and the ability to handle large applications. This enables less expensive, low-risk migration projects. Key capabilities include: Workbench Repository and Cataloguer: Ensures integrity of the migrated application assets through full dependency checking. The Cataloguer generates and maintains all relevant meta-data on source and target components. File Migrator: Supports reliable migration of datasets and flat files to an ISAM or Oracle Database 11g. This is done through the automated migration utilities for data unloading, reloading and validation. It also generates logical access functions to shield developers from data repository changes. DB2 Migrator: Similarly, this tool automates the migration of DB2 schema and data to Oracle Database 11g. COBOL Migrator: Supports migration of IBM mainframe COBOL assets (OLTP and Batch) to open systems. Adapts programs for compiler dialects and data access variations. JCL Migrator: Supports migration of IBM JCL jobs to a Tuxedo ART environment, maintaining the flow and characteristics of batch jobs.

Read the article
Big Data – Various Learning Resources – How to Start with Big Data? – Day 20 of 21

- by Pinal Dave

In yesterday’s blog post we learned how to become a Data Scientist for Big Data. In this article we will go over various learning resources related to Big Data. In this series we have covered many of the most essential details about Big Data. At the beginning of this series, I have encouraged readers to send me questions. One of the most popular questions is - “I want to learn more about Big Data. Where can I learn it?” This is indeed a great question as there are plenty of resources out to learn about Big Data and it is indeed difficult to select on one resource to learn Big Data. Hence I decided to write here a few of the very important resources which are related to Big Data. Learn from Pluralsight Pluralsight is a global leader in high-quality online training for hardcore developers. It has fantastic Big Data Courses and I started to learn about Big Data with the help of Pluralsight. Here are few of the courses which are directly related to Big Data. Big Data: The Big Picture Big Data Analytics with Tableau NoSQL: The Big Picture Understanding NoSQL Data Analysis Fundamentals with Tableau I encourage all of you start with this video course as they are fantastic fundamentals to learn Big Data. Learn from Apache Resources at Apache are single point the most authentic learning resources. If you want to learn fundamentals and go deep about every aspect of the Big Data, I believe you must understand various concepts in Apache’s library. I am pretty impressed with the documentation and I am personally referencing it every single day when I work with Big Data. I strongly encourage all of you to bookmark following all the links for authentic big data learning. Haddop - The Apache Hadoop® project develops open-source software for reliable, scalable, distributed computing. Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which include support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heat maps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner. Avro: A data serialization system. Cassandra: A scalable multi-master database with no single points of failure. Chukwa: A data collection system for managing large distributed systems. HBase: A scalable, distributed database that supports structured data storage for large tables. Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. Mahout: A Scalable machine learning and data mining library. Pig: A high-level data-flow language and execution framework for parallel computation. ZooKeeper: A high-performance coordination service for distributed applications. Learn from Vendors One of the biggest issues with about learning Big Data is setting up the environment. Every Big Data vendor has different environment request and there are lots of things require to set up Big Data framework. Many of the users do not start with Big Data as they are afraid about the resources required to set up framework as well as a time commitment. Here Hortonworks have created fantastic learning environment. They have created Sandbox with everything one person needs to learn Big Data and also have provided excellent tutoring along with it. Sandbox comes with a dozen hands-on tutorial that will guide you through the basics of Hadoop as well it contains the Hortonworks Data Platform. I think Hortonworks did a fantastic job building this Sandbox and Tutorial. Though there are plenty of different Big Data Vendors I have decided to list only Hortonworks due to their unique setup. Please leave a comment if there are any other such platform to learn Big Data. I will include them over here as well. Learn from Books There are indeed few good books out there which one can refer to learn Big Data. Here are few good books which I have read. I will update the list as I will learn more. Ethics of Big Data Balancing Risk and Innovation Big Data for Dummies Head First Data Analysis: A Learner’s Guide to Big Numbers, Statistics, and Good Decisions If you search on Amazon there are millions of the books but I think above three books are a great set of books and it will give you great ideas about Big Data. Once you go through above books, you will have a clear idea about what is the next step you should follow in this series. You will be capable enough to make the right decision for yourself. Tomorrow In tomorrow’s blog post we will wrap up this series of Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Big Data – Operational Databases Supporting Big Data – RDBMS and NoSQL – Day 12 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the Cloud in the Big Data Story. In this article we will understand the role of Operational Databases Supporting Big Data Story. Even though we keep on talking about Big Data architecture, it is extremely crucial to understand that Big Data system can’t just exist in the isolation of itself. There are many needs of the business can only be fully filled with the help of the operational databases. Just having a system which can analysis big data may not solve every single data problem. Real World Example Think about this way, you are using Facebook and you have just updated your information about the current relationship status. In the next few seconds the same information is also reflected in the timeline of your partner as well as a few of the immediate friends. After a while you will notice that the same information is now also available to your remote friends. Later on when someone searches for all the relationship changes with their friends your change of the relationship will also show up in the same list. Now here is the question – do you think Big Data architecture is doing every single of these changes? Do you think that the immediate reflection of your relationship changes with your family member is also because of the technology used in Big Data. Actually the answer is Facebook uses MySQL to do various updates in the timeline as well as various events we do on their homepage. It is really difficult to part from the operational databases in any real world business. Now we will see a few of the examples of the operational databases. Relational Databases (This blog post) NoSQL Databases (This blog post) Key-Value Pair Databases (Tomorrow’s post) Document Databases (Tomorrow’s post) Columnar Databases (The Day After’s post) Graph Databases (The Day After’s post) Spatial Databases (The Day After’s post) Relational Databases We have earlier discussed about the RDBMS role in the Big Data’s story in detail so we will not cover it extensively over here. Relational Database is pretty much everywhere in most of the businesses which are here for many years. The importance and existence of the relational database are always going to be there as long as there are meaningful structured data around. There are many different kinds of relational databases for example Oracle, SQL Server, MySQL and many others. If you are looking for Open Source and widely accepted database, I suggest to try MySQL as that has been very popular in the last few years. I also suggest you to try out PostgreSQL as well. Besides many other essential qualities PostgreeSQL have very interesting licensing policies. PostgreSQL licenses allow modifications and distribution of the application in open or closed (source) form. One can make any modifications and can keep it private as well as well contribute to the community. I believe this one quality makes it much more interesting to use as well it will play very important role in future. Nonrelational Databases (NOSQL) We have also covered Nonrelational Dabases in earlier blog posts. NoSQL actually stands for Not Only SQL Databases. There are plenty of NoSQL databases out in the market and selecting the right one is always very challenging. Here are few of the properties which are very essential to consider when selecting the right NoSQL database for operational purpose. Data and Query Model Persistence of Data and Design Eventual Consistency Scalability Though above all of the properties are interesting to have in any NoSQL database but the one which most attracts to me is Eventual Consistency. Eventual Consistency RDBMS uses ACID (Atomicity, Consistency, Isolation, Durability) as a key mechanism for ensuring the data consistency, whereas NonRelational DBMS uses BASE for the same purpose. Base stands for Basically Available, Soft state and Eventual consistency. Eventual consistency is widely deployed in distributed systems. It is a consistency model used in distributed computing which expects unexpected often. In large distributed system, there are always various nodes joining and various nodes being removed as they are often using commodity servers. This happens either intentionally or accidentally. Even though one or more nodes are down, it is expected that entire system still functions normally. Applications should be able to do various updates as well as retrieval of the data successfully without any issue. Additionally, this also means that system is expected to return the same updated data anytime from all the functioning nodes. Irrespective of when any node is joining the system, if it is marked to hold some data it should contain the same updated data eventually. As per Wikipedia - Eventual consistency is a consistency model used in distributed computing that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. In other words - Informally, if no additional updates are made to a given data item, all reads to that item will eventually return the same value. Tomorrow In tomorrow’s blog post we will discuss about various other Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article

Search Results

Search found 6716 results on 269 pages for 'distributed algorithm'.

Page 126/269 | < Previous Page | 122 123 124 125 126 127 128 129 130 131 132 133 | Next Page >

- by Simon Cooper

- by wcoekaer

- by Abtin Forouzandeh

- by sparc

- by Tom Dalling

- by Chris Barry

- by Ayush Khemka

- by Edward Boyle

- by Griffin

- by user1620696

- by Prog

- by Louis Rhys

- by Prog

- by Thomas

- by rwallace

- by user129506

- by dab

- by pbhogan

- by Paulo Morgado

- by [email protected]

- by Chris Adams

- by Jason Williamson

- by Pinal Dave

- by Pinal Dave

< Previous Page | 122 123 124 125 126 127 128 129 130 131 132 133 | Next Page >