distributed transactions - Page 85

What is a good Java web crawler library?

- by DrDee

Hi, I am about to develop a crawler in Java but don't feel like reinventing the wheel. A quick Google search gives a whole bunch of Java libraries to build a web crawler. Besides that Nutch is of course a very robust package but seems a bit too advanced for my needs. I only need to crawl a handful websites a week containing a couple of 1000 pages each. Which open source Java library would you recommend considering: speed multithreading (or even distributed) extending it with new functionality active maintained and documentation?

Read the article

Can EC2 instances be set up to come from different IP ranges?

- by Joshua Frank

I need to run a web crawler and I want to do it from EC2 because I want the HTTP requests to come from different IP ranges so I don't get blocked. So I thought distributing this on EC2 instances might help, but I can't find any information about what the outbound IP range will be. I don't want to go to the trouble of figuring out the extra complexity of EC2 and distributed data, only to find that all the instances use the same address block and I get blocked by the server anyway. NOTE: This isn't for a DoS attack or anything. I'm trying to harvest data for a legitimate business purpose, I'm respecting robots.txt, and I'm only making one request per second, but the host is still shutting me down. Edit: Commenter Paul Dixon suggests that the act of blocking even my modest crawl indicates that the host doesn't want me to crawl them and therefore that I shouldn't do it (even assuming I can work around the blocking). Do people agree with this?

Read the article

Standard way of distributing source code?

- by penyuan

I am relatively new to programming, and have built a few working C++ commandline programs with Xcode in Mac OS X (no dependencies on Mac-only libraries or APIs). My question is: What is the standard way of packaging and distributing the source code (and possibly compiled binaries)? i.e. Almost all Linux programs seemed to be distributed that a user simply needs to run ./configure && make && make install from the source directory. Thank you.

Read the article

SQL Server becomes slow after restart

- by Tobi DM

I already posted this one on stackoverflow but someone gave me the hint to that I might have more luck on serverfault. We use SQL Server 2005 on an Windwos Server 2008. Ther Server has 48 GB RAM. SQL Server is configured to use 40 GB RAM. There is only one database hosted (About 70 GB). The only app beside SQL Server is our App-Server which connects the clients to the database. Now we encounter the following problem: After a restart of the server our the performance is great. The server grabs the 40 GB RAM wich it is allowed to and then runs fast as hell. But after about 4 weeks the system becomes slower and slower. The execution of statements (seen in the profiler) is raising slowly. But I cannot see that there is something going wrong on the server. CPU usage is at about 20% I/O also seems to be no Problem The process monitor does also not show that there are strange apps or something like that. Eventlog does also have no interessting messages No open transactions or blockings to see We do not use cursors in our app We tried already the following things without effect: Droped the cache by using the statements DBCC FreeProcCache DBCC FREESYSTEMCACHE('ALL') DBCC DropCleanbuffers Restarted the Appserver we are using. Restart the sql server service But nothing did help exept restarting the whole server. Any ideas?

Read the article

Is this way of using Excel 2007 Pivot table for BI scalable ?

- by Sim

Hi all, Background: We need to consolidate sales data across the country to do analysis Our Internet connection/IT expertise/IT investment is not quite strong, therefore full BI solution is out of question I tried several SaaS BI solution (GoodData, ZohoReports) and while they're good, they seem not to fully support what we need We're looking at 'bout 2 millions record for every 2 months My current approach Our (10) sites currently gathers data from all their branches and consolidate them into 1 Excel file with Pivot table and embed source data In HQ, I will request 10 sites to send back those Excel files periodically We will import those Excel to our MSSQL server There will be a master Excel file, that will also have the same pivot table (as those came from site Excel file), and datasource is the MSSQL server More details For testing, I currently use MSSQL 2008 Express on my laptop So far, I imported our transactions for the past 2 months and there are 2 millions+ row in 1 table in MSSQL (we just use 1 table, corresponding to our common pivot table structure). DB size is ~ 600 MB In the master Excel file, if not including the source data, it's just < 10MB. Including the source data will increase the size to 60 MB (so I supposed Office 2007 automatically zip the data ?) I try using the Pivot (drag-and-drop fields) and the performance so far is OK (my laptop specs: C2D T7200, 3GB RAM, Windows XP) So my question is : If we're looking at full year transaction (roughly 15 millions rows in MSSQL 2008 Express, 3.6 GB in size), is there any issue with that 15 million rows in 1 table in SQL Express ? Is there any performance issue with the pivot table at that time ? Can it still embed the source data ? (I google-ed but didn't find the maximum size of source data Excel 2007 can embed) Any other suggestions on how we can better do this ? Given that we can't afford the full BI solution, any light-weight/budget/SaaS BI that you can recommend ? Thanks

Read the article

XML over HTTP with JMS and Spring

- by Will Sumekar

I have a legacy HTTP server where I need to send an XML file over HTTP request (POST) using Java (not browser) and the server will respond with another XML in its HTTP response. It is similar to Web Service but there's no WSDL and I have to follow the existing XML structure to construct my XML to be sent. I have done a research and found an example that matches my requirement here. The example uses HttpClient from Apache Commons. (There are also other examples I found but they use java.net networking package (like URLConnection) which is tedious so I don't want to use them). But it's also my requirement to use Spring and JMS. I know from Spring's reference that it's possible to combine HttpClient, JMS and Spring. My question is, how? Note that it's NOT in my requirement to use HttpClient. If you have a better suggestion, I'm welcome. Appreciate it. For your reference, here's the XML-over-HTTP example I've been talking about: /* * $Header: * $Revision$ * $Date$ * ==================================================================== * * Copyright 2002-2004 The Apache Software Foundation * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. * ==================================================================== * * This software consists of voluntary contributions made by many * individuals on behalf of the Apache Software Foundation. For more * information on the Apache Software Foundation, please see * <http://www.apache.org/>. * * [Additional notices, if required by prior licensing conditions] * */ import java.io.File; import java.io.FileInputStream; import org.apache.commons.httpclient.HttpClient; import org.apache.commons.httpclient.methods.InputStreamRequestEntity; import org.apache.commons.httpclient.methods.PostMethod; /** * * This is a sample application that demonstrates * how to use the Jakarta HttpClient API. * * This application sends an XML document * to a remote web server using HTTP POST * * @author Sean C. Sullivan * @author Ortwin Glück * @author Oleg Kalnichevski */ public class PostXML { /** * * Usage: * java PostXML http://mywebserver:80/ c:\foo.xml * * @param args command line arguments * Argument 0 is a URL to a web server * Argument 1 is a local filename * */ public static void main(String[] args) throws Exception { if (args.length != 2) { System.out.println( "Usage: java -classpath <classpath> [-Dorg.apache.commons."+ "logging.simplelog.defaultlog=<loglevel>]" + " PostXML <url> <filename>]"); System.out.println("<classpath> - must contain the "+ "commons-httpclient.jar and commons-logging.jar"); System.out.println("<loglevel> - one of error, "+ "warn, info, debug, trace"); System.out.println("<url> - the URL to post the file to"); System.out.println("<filename> - file to post to the URL"); System.out.println(); System.exit(1); } // Get target URL String strURL = args[0]; // Get file to be posted String strXMLFilename = args[1]; File input = new File(strXMLFilename); // Prepare HTTP post PostMethod post = new PostMethod(strURL); // Request content will be retrieved directly // from the input stream // Per default, the request content needs to be buffered // in order to determine its length. // Request body buffering can be avoided when // content length is explicitly specified post.setRequestEntity(new InputStreamRequestEntity( new FileInputStream(input), input.length())); // Specify content type and encoding // If content encoding is not explicitly specified // ISO-8859-1 is assumed post.setRequestHeader( "Content-type", "text/xml; charset=ISO-8859-1"); // Get HTTP client HttpClient httpclient = new HttpClient(); // Execute request try { int result = httpclient.executeMethod(post); // Display status code System.out.println("Response status code: " + result); // Display response System.out.println("Response body: "); System.out.println(post.getResponseBodyAsString()); } finally { // Release current connection to the connection pool // once you are done post.releaseConnection(); } } }

Read the article

Is it possible to partition more than one way at a time in SQL Server?

- by meeting_overload

I'm considering various ways to partition my data in SQL Server. One approach I'm looking at is to partition a particular huge table into 8 partitions, then within each of these partitions to partition on a different partition column. Is this even possible in SQL Server, or am I limited to definining one parition column+function+scheme per table? I'm interested in the more general answer, but this strategy is one I'm considering for Distributed Partitioned View, where I'd partition the data under the first scheme using DPV to distribute the huge amount of data over 8 machines, and then on each machine partition that portion of the full table on another parition key in order to be able to drop (for example) sub-paritions as required.

Read the article

Load balancer - how to write one for a custom application?

- by Poni

Hi! I've written a simple server application which will run distributed on several machines. My question is how does a network load balancer works, in general? I've heard of round-robin and other algorithms, but what I haven't got answer to is how does the process really goes? In socket terms. The client connects to one of the load balancer machines, asks for a "free-to-connect-to" server and simply connects to it? That's the simpliest way I can think of. .. or, does it use the load balancer as a proxy (that implies that all the NBs must be always connected to the application servers, and data is transferred through them)? It's more of a general question. How would you do this? Thank you all!

Read the article

DLL dependant on curllib.dll - How can I fix this?

- by haraldo

Hi there, I'm new to developing in C++. I've developed a dll where I'm using curllib to make HTTP requests. When running the dll via depend.exe it notifies me that my dll now depends on the curllib.dll. This simply doesn't work for me. My dll is set as a static library not shared and will be distributed on its own. I cannot rely on a user having libcurl.dll installed. I thought by including libcurl into my project this is all that would be needed and my dll could be independent. If this is impossible to resolve is there an alternative method I can use to create HTTP requests? Obviously I would prefer to use libcurl. Thanks in advance.

Read the article

git: having 2 push/pull repos in sync (or 1 push/pull and 1 pull in sync)

- by xavjuan

Hello, We work on multiple geographically seperate sites. Today I have our git clones all live on one site A. Then users from site B have to ssh over to do a git clone or to push in changes. These are bare repos where the update is through pushes. Ideally, for git clone/push performance, I'd like to limit having to go over ssh. I'd like to have a copy of git repo X live on site A and site B... and have some syncing mechanism between them. OR to have X live on both sites, but only allow pushing to A (and have that setup correctly at clone time on B) I'm worried about the case where someone on site A pushes changes to the repo at site A at the same time that someone on site B pushes a truely conflicting change to the repo at site B. Is there some 'sync'ing solution built into git for distributed open repos like this? Or a way to have a clone from X set the origin/parent to the X from the other site? thanks, -John

Read the article

Would Mercurial help me work from 2 PCs?

- by rikh

I currently use Perforce for source control, but want to start working on the code from 2 different PCs at the same time (desktop and laptop). The laptop would not be able to access the perforce server very often, which makes Perforce a poor choice in this setup. Distributed source control tools like Mercurial seem better suited to the task, but I am still not clear if this would work or not. Does anyone have any experience of using Mercurial to work on 2 machines at once (eg desktop in the week, laptop in evening and weekends). Does it help, or is it still a pain the butt keeping everything in sync and knowing what is going on.

Read the article

python mysqldb - mysql server gone away - can't reconnect

- by david.barkhuizen

When attempting to import a bunch of data into mysql tables using python and mysqldb, I run into the following error '2006 - mySQL Server has gone away', and then I am unable to reconnect again within the script. I am iniitially re-using a connection object across transactions ( delineated by conn.commit() ), then when I first encounter this exception, if I create a new connection by calling MySQLdb.connect(), this new connection also fails with the same exception. This error does not occur immediately, I can pump a fair amount of data into the db, but then faithfully occurs after I have inserted a couple thousand records, so roughly once the db has committed a certain transaction volume, it always falls over like this. If I rerun the script, WITHOUT restarting the db server. then it resumes where it left off, pumps in some data, then falls over again. Before recommendations to change time-out timings, does anyone know why I am not able to establish a new connection after the initial failure ? - Even if I try a couple of times waiting a couple of seconds between each. (btw, I'm running Windows 7, mysql server 5.1.48, mysqldb 1.2.3.gamma.1, python 2.6)

Read the article

What is the best way to password protect folder/page using php without a db or username

- by Salt Packets

What is the best way to password protect folder using php without a database or user name but using. Basically I have a page that will list contacts for organization and need to password protect that folder without having account for every user . Just one password that gets changes every so often and distributed to the group. I understand that it is not very secure but never the less I would like to know how to do this. In the best way. It would be nice if the password is remembered for a while once user entered it correctly.

Read the article

php cache zend framework

- by msaif

server side is PHP + zend framework. problem: i have huge of data appox 5000 records and no of columns are 5 in input.txt file. i like to read all data into memory only once and send some data to the every browser request. but if i update that input.txt file then updated data must be auto synchronized to that memory location. so i need to solve that problem by using memory caching technique.but caching technique has expire time.but if input.txt is updated before cache expire then i need to auto synchronize to that memory location. now i am using zend framework 1.10.is it possible in zend framework. can anybody give me some line of code of zendfrmawork i have no option to use memchached server(distributed). Only zend framwork.

Read the article

Plugin architecture in .net: unloading

- by henchman

Hello everybody, I need to implement a plugin architecture within c#/.net in order to load custom user defined actions data type handling code for a custom data grid / conversion / ... from non-static linked assembly files. Because the application has to handle many custom user defined actions, Iam in need for unloading them once executed in order to reduce memory usage. I found several good articles about plugin architectures, eg: ExtensionManager PluginArchitecture ... but none of them gave me enough sausage for properly unloading an assembly. As the program is to be distributed and the user defined actions are (as the name states) user defined: how to i prevent the assembly from executing malicious code (eg. closing my progra, deleting files)? Are there any other pitfalls one of you has encountered?

Read the article

FileInputStream for a generic file System

- by Akhil

I have a file that contains java serialized objects like "Vector". I have stored this file over Hadoop Distributed File System(HDFS). Now I intend to read this file (using method readObject) in one of the map task. I suppose FileInputStream in = new FileInputStream("hdfs/path/to/file"); wont' work as the file is stored over HDFS. So I thought of using org.apache.hadoop.fs.FileSystem class. But Unfortunately it does not have any method that returns FileInputStream. All it has is a method that returns FSDataInputStream but I want a inputstream that can read serialized java objects like vector from a file rather than just primitive data types that FSDataInputStream would do. Please help!

Read the article

List of private iPhone APIs?

- by diego nunes

. . Hi there, everybody. . . I need to do an app to be distributed ad hoc (it doesn't need to go to the store) but I need to get the information about the "data usage" (gprs/3g traffic). It is available on the system, but there is no official API call to get that info. One app made it through Apple testing (it's called "Download Meter"), though, and I emailed the guys to see if they would share the call, but they were not in that mood. . . Is there any list of private APIs or anything like that? Does anyone have any ideas of how could I get that info? Again: the app doesn't need to go to the store, but I need to install it on stock iPhone (ad hoc will do). . Thanks.

Read the article

Building a minimal plugin architecture in Python.

- by dF

I have an application, written in Python, which is used by a fairly technical audience (scientists). I'm looking for a good way to make the application extensible by the users, i.e. a scripting/plugin architecture. I am looking for something extremely lightweight. Most scripts, or plugins, are not going to be developed and distributed by a third-party and installed, but are going to be something whipped up by a user in a few minutes to automate a repeating task, add support for a file format, etc. So plugins should have the absolute minimum boilerplate code, and require no 'installation' other than copying to a folder (so something like setuptools entry points, or the Zope plugin architecture seems like too much.) Are there any systems like this already out there, or any projects that implement a similar scheme that I should look at for ideas / inspiration?

Read the article

Articles about replication schemes/algorithms?

- by jkff

I'm designing an hierarchical distributed system (every node has zero or more "master" nodes to which it propagates its current data). The data gets continuously updated and I'd like to guarantee that at least N nodes have almost-current data at any given time. I do not need complete consistency, only eventual consistency (t.i. for any time instant, the current snapshot of data should eventually appear on at least N nodes. It is tricky to define the term "current" here, but still). Nodes may fail and go back up at any moment, and there is no single "central" node. O overflowers! Point me to some good papers describing replication schemes. I've so far found one: Consistency Management in Optimistic Replication Algorithms

Read the article

Question about mysql indexes on low to medium cardinality columns

- by Kevin J

I have a general question about the way that database indexing works, particularly in mysql. Let's say I have a table with a million rows with a column "ClientID" that is distributed relatively equally among 30 values. Thus, this column is very low cardinality (30) relative to the primary key (1 million). Now, I understand that you shouldn't create indexes on low cardinality fields. However, in this case, queries are only ever done with one of the 30 clientIDs. Thus, wouldn't creating an index on ClientID be helpful, as the search space is automatically reduced to 1/30th what it normally would be? Or is my understanding of how the index works flawed? Thanks

Read the article

Building Cocoa UIs for OS X with C# and Mono

- by Antony Perkov

Has anyone spent any time comparing the various Objective C bridges and associated Cocoa wrappers for Mono? I want to port an existing C# application to run on OS X. Ideally I'd run the application on Mono, and build a native Cocoa UI for it. I'm wondering which bridge would be the best choice. In case it's useful to anyone, here are some links to bridges I've found so far: CocoSharp - distributed with Mono on OS X - www.cocoa-sharp.com Monobjc - better documentation than the others (in my opinion) - www.mono-project.com/CocoaSharp and www.monobjc.net NObjective - (apparently) faster than the others - code.google.com/p/nobjective MObjc / MCocoa - code.google.com/p/mobjc and code.google.com/p/mcocoa ObjC# - www.mono-project.com/ObjCSharp

Read the article

What should I learn to improve my Java skills ?

- by hory.incpp

Hello, I currently know Java SE and I want to learn something more 'enterprise'. I would like something more distributed (app server, server programming, web, content management system ...) but any suggestion is ok. There are many frameworks which I've heard: spring, hibernate, persistence, ejb, jsp, servlet, jsf, jboss, glassfish, ant etc etc etc etc. I'm very confused where to start. So the question is: Can somebody explain to me what actually there frameworks are; and which one should I start with ? Thank you.

Read the article

assistance with classifying tests

- by amateur

I have a .net c# library that I have created that I am currently creating some unit tests for. I am at present writing unit tests for a cache provider class that I have created. Being new to writing unit tests I have 2 questions These being: My cache provider class is the abstraction layer to my distributed cache - AppFabric. So to test aspects of my cache provider class such as adding to appfabric cache, removing from cache etc involves communicating with appfabric. Therefore the tests to test for such, are they still categorised as unit tests or integration tests? The above methods I am testing due to interacting with appfabric, I would like to time such methods. If they take longer than a specified benchmark, the tests have failed. Again I ask the question, can this performance benchmark test be classifed as a unit test? The way I have my tests set up I want to include all unit tests together, integration tests together etc, therefore I ask these questions that I would appreciate input on.

Read the article

continuous integration web service

- by Josh Moore

I am in a position where I could become a team leader of a team distributed over two countries. This team would be the tech. team for a start up company that we plan to bootstrap on limited funds. So I am trying to find out ways to minimize upfront expenses. Right now we are planning to use Java and will have a lot of junit tests. I am planing on using github for VCS and lighthouse for a bug tracker. In addition I want to add a continuous integration server but I do not know of any continuous integration servers that are offered as a web service. Does anybody know if there are continuous integration servers available in a software as a service model? P.S. if anybody knows were I can get these three services at one location that would be great to know to.

Read the article

amazon design doubt

- by praveen

I was looking at the amazon website and was wondering how one of the feature would have been implemented. The feature : what customers buy after viewing a particular item. If i were to develop such a feature i would probably generate a session id for each user session and store the session id-page id combination in a log file. and if a book is bought set a separate flag for the session id-page id. A separate program can then be run on the log file periodically, to identify the groups that were bought together/viewed together and that information can be stored in a persistent file. This is ofcourse a simple solution without taking into consideration the distributed nature of the servers - but would this suffice or can you help me identify a better design.

Search Results

Search found 2993 results on 120 pages for 'distributed transactions'.

Page 85/120 | < Previous Page | 81 82 83 84 85 86 87 88 89 90 91 92 | Next Page >

- by DrDee

- by Joshua Frank

- by penyuan

- by Tobi DM

- by Sim

- by Will Sumekar

- by meeting_overload

- by Poni

- by haraldo

- by xavjuan

- by rikh

- by david.barkhuizen

- by Salt Packets

- by msaif

- by henchman

- by Akhil

- by diego nunes

- by dF

- by jkff

- by Kevin J

- by Antony Perkov

- by hory.incpp

- by amateur

- by Josh Moore

- by praveen

< Previous Page | 81 82 83 84 85 86 87 88 89 90 91 92 | Next Page >