Search Results

Search found 777 results on 32 pages for 'volumes'.

Page 3/32 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

Triangular bounding volumes

- by Cheery

I've come up with an alternative for beziers that might be easier to ray-trace, perhaps even though a plain vertex shader. Though there's missing a piece. I need to find the parametric surface equation from the surface normals I have for edge vertices. I also have to know it's peak and valley so I can constraint the depth of my bounding triangle. Image explains the overall idea: I build a bounding-volume from a control triangle. Then apply a function to each parametric coordinate of the triangle (s+t+u=1 where s,t,u = 0) to get the height coordinate for that certain point. Simply put, it produces a procedurally generated height-map for the triangle's surface. I just need to find a function that generates the height-map so I can make it work.

Read the article
Database for managing large volumes of (system) metrics

- by symcbean

Hi, I'm looking at building a system for managing and reporting stats on web page performance. I'll be collecting a lot more stats than are available in the standard log formats (approx 20 metrics) but compared to most types of database applications, the base data structure will be very simple. My problem is that I'll be accumulating a lot of data - in the region of 100,000 records (i.e. sets of metrics) per hour. Of course, resources are very limited! So that its possible to sensibly interact with the data, I'd need to consolidate each metric into one minute bins, broken down by URL, then for anything more than 1 day old, consolidated into 10 minute bins, then at 1 week, hourly bins. At the front end, I want to provide a view (prefereably as plots) of the last hour of data, with the facility for users to drill up/down through defined hierarchies of URLs (which do not always map directly to the hierarchy expressed in the path of the URL) and to view different time frames. Rather than coding all this myself and using a relational database, I was wondering if there were tools available which would facilitate both the management of the data and the reporting. I had a look at Mondrian however I can't see from the documentation I've looked at whether it's possible to drop the more granular information while maintaining the consolidated views of the data. RRDTool looks promising in terms of managing the data consolidation, but seems to be rather limited in terms of querying the dataset as a multi-dimensional/relational database. What else whould I be looking at?

Read the article
Cannot disable spotlight indexing on volume

- by jayhendren

I have a FAT32 partition on my HDD. When using OSX, it is mounted to /Volumes/MEDIA. After a recent upgrade to Mavericks, spotlight is having trouble indexing it, eating up almost all of my system resources, and I cannot get the indexing to stop: [jay-mba-osx ~]% sudo mdutil -v -a -i off /: Indexing disabled. [jay-mba-osx ~]% sudo mdutil -v -V /Volumes/MEDIA -i off [jay-mba-osx ~]% mdutil -v -a -s /: Indexing disabled. /Volumes/BOOTCAMP: Indexing disabled. /Volumes/MEDIA: Indexing enabled. [jay-mba-osx ~]% [jay-mba-osx ~]% sudo mdutil -v -V /Volumes/MEDIA -E /Volumes/MEDIA: Indexing enabled. [jay-mba-osx ~]% sudo mdutil -v -V /Volumes/MEDIA -i off [jay-mba-osx ~]% mdutil -v -a -s /: Indexing disabled. /Volumes/BOOTCAMP: Indexing disabled. /Volumes/MEDIA: Indexing enabled. [jay-mba-osx ~]% How to I tell spotlight to "cease and desist" on my MEDIA volume? I only want or need spotlight indexing on my OSX partition.

Read the article
Copy Network Volume configuration among Linux systems

- by David Yu

I have several standalone Debian Linux (Lenny) systems. As it stands now, all of the systems are configured with a generic login account. This login account has a network volume that connects to a Windows share on a Windows server. I need to create a batch of user accounts on all of the systems (this part I figured out). After I create all of the user accounts, I need all of them to have the same network volume mapping as the current generic account. Is the network volume configuration saved somewhere, where I could copy that configuration across all of the user accounts?

Read the article
Shrink a Volume Group in LVM / Linux in order to install Windows on the freed space

- by Stephan Kristyn

I have a Volume Group with Unused space. This 40Gig should become an entidy in order to install Microsoft windows 7 on it. I do not have extra space on the drive - that is why I want to shrink the VG! LVG berta resides on sda2 and consists of lv_root lv_swap unused_space I want it to become lv_root lv_swap and have a seperate entidy made out of unused_space. Microsoft Windows 7 has to get installed on this entidy. I do not understand why Linux made simple things complicated. I utterly hate LVM and think its absolute bollocks. Useful Sources: http://www.centos.org/docs/5/html/Deployment_Guide-en-US/s1-system-config-lvm.html Edit: I found the answer. The necessary steps depict how complicated LVM really is. In my opinion it is best to avoiding LVM until pvresize matures as promised in its man pages. Answer: http://fedorasolved.org/Members/zcat/shrink-lvm-for-new-partition If you run into problems when you want to remove lvswap even if in resuce mode, then try swapoff /dev/vg_1/lv_swap lvchange -an /dev/vg_1/lv_swap

Read the article
Is it possible to allow the user to access the 'Volumes' without asking the Administrator's password?

- by Tom

Am the administrator of my Ubuntu system. Recently I added a new user account. But when ever the user tries to access or open the 'Volumes'(Drives where movies, songs and other files are stored) it asks for the Administrator's password. I created the user account to my other family members and I don't want to tell them my password. So is it possible to allow them to access the Volumes without asking Administrator's password ? UPDATE : Ubuntu was installed alongside Windows in my system. I will provide a screenshot of the Volume details -

Read the article
Is it possible to allow the User Account to access the 'Volumes' without asking the Administrator's password?

- by Tom

Am the administrator of my Ubuntu system. Recently I added a new user account. But when ever the user tries to access or open the 'Volumes'(Drives where movies, songs and other files are stored) it asks for the Administrator's password. I created the user account to my other family members and I don't want to tell them my password. So is it possible to allow them to access the Volumes without asking Administrator's password ? UPDATE : Ubuntu was installed alongside Windows in my system. I will provide a screenshot of the Volume details -

Read the article
How should I configure TRIM Support for LVM logical volumes?

- by Zack Perry

I am setting up a notebook for software demo purpose. The machine has a Intel Core i7 CPU, 8GB RAM, a 128GB SSD, and runs Ubuntu 12.04 LTS 64bit desktop. As it is, the SSD is configured to have a single volume group, with /boot, /swap, and / all in their respective logical volumes. They collectively consume 30GB space. I plan to use the remaining for logical volumes for KVM guests, all run Ubuntu 12.04 Server I would like to ensure that the SSD is utilized optimally. Although on this site, there are some great info about setting up TRIM support for file system setups that do not involve LVM, I have not found explicit guide regarding my planned setup. I did found this page which talks about adding issue_discards in /etc/lvm/lvm.conf. But in said file on my machine, I didn't find the cited content. I double-checked man lvm.conf(5), didn't see any mentioning of this option either. Thus, I'm not sure what to do. Furthermore, even say adding the option is the right thing to do, should I in my machine's /etc/fstab still add mount options such as noatime etc? Any tips, pointers, and/or further guidance are greatly appreciated.

Read the article
Fin de LINQ to HPC : Microsoft abandonne sa plateforme de traitement de gros volumes de données pour son concurrent Hadoop

Fin de LINQ to HPC : Microsoft abandonne sa plateforme de traitement de gros volumes de données pour se concentrer sur le support de son concurrent Hadoop Microsoft abandonne LINQ to HPC (High performance computing), nom de code Dryad, sa propre plateforme haute performance pour des calculs distribués et la gestion intensive des données, pour se concentrer sur le support de son concurrent Hadoop dans ses produits. L'éditeur avait récemment manifesté son intérêt pour la plateforme Java de stockage et de traitement par lot de très grandes quantités de données (Big Data) Hadoop, en publiant notamment deux connecteurs

Read the article
How to best transfer large payloads of data using wsHttp with WCF with message security

- by jpierson

I have a case where I need to transfer large amounts of serialized object graphs (via NetDataContractSerializer) using WCF using wsHttp. I'm using message security and would like to continue to do so. Using this setup I would like to transfer serialized object graph which can sometimes approach around 300MB or so but when I try to do so I've started seeing a exception of type System.InsufficientMemoryException appear. After a little research it appears that by default in WCF that a result to a service call is contained within a single message by default which contains the serialized data and this data is buffered by default on the server until the whole message is completely written. Thus the memory exception is being caused by the fact that the server is running out of memory resources that it is allowed to allocate because that buffer is full. The two main recommendations that I've come across are to use streaming or chunking to solve this problem however it is not clear to me what that involves and whether either solution is possible with my current setup (wsHttp/NetDataContractSerializer/Message Security). So far I understand that to use streaming message security would not work because message encryption and decryption need to work on the whole set of data and not a partial message. Chunking however sounds like it might be possible however it is not clear to me how it would be done with the other constraints that I've listed. If anybody could offer some guidance on what solutions are available and how to go about implementing it I would greatly appreciate it. Related resources: Chunking Channel How to: Enable Streaming Large attachments over WCF Custom Message Encoder Another spotting of InsufficientMemoryException I'm also interested in any type of compression that could be done on this data but it looks like I would probably be best off doing this at the transport level once I can transition into .NET 4.0 so that the client will automatically support the gzip headers if I understand this properly.

Read the article
Using Hibernate's ScrollableResults to slowly read 90 million records

- by at

I simply need to read each row in a table in my MySQL database using Hibernate and write a file based on it. But there are 90 million rows and they are pretty big. So it seemed like the following would be appropriate: ScrollableResults results = session.createQuery("SELECT person FROM Person person") .setReadOnly(true).setCacheable(false).scroll(ScrollMode.FORWARD_ONLY); while (results.next()) storeInFile(results.get()[0]); The problem is the above will try and load all 90 million rows into RAM before moving on to the while loop... and that will kill my memory with OutOfMemoryError: Java heap space exceptions :(. So I guess ScrollableResults isn't what I was looking for? What is the proper way to handle this? I don't mind if this while loop takes days (well I'd love it to not). I guess the only other way to handle this is to use setFirstResult and setMaxResults to iterate through the results and just use regular Hibernate results instead of ScrollableResults. That feels like it will be inefficient though and will start taking a ridiculously long time when I'm calling setFirstResult on the 89 millionth row... UPDATE: setFirstResult/setMaxResults doesn't work, it turns out to take an unusably long time to get to the offsets like I feared. There must be a solution here! Isn't this a pretty standard procedure?? I'm willing to forgo Hibernate and use JDBC or whatever it takes. UPDATE 2: the solution I've come up with which works ok, not great, is basically of the form: select * from person where id > <offset> and <other_conditions> limit 1 Since I have other conditions, even all in an index, it's still not as fast as I'd like it to be... so still open for other suggestions..

Read the article
Practical size limitations for RDBMS

- by grenade

I am working on a project that must store very large datasets and associated reference data. I have never come across a project that required tables quite this large. I have proved that at least one development environment cannot cope at the database tier with the processing required by the complex queries against views that the application layer generates (views with multiple inner and outer joins, grouping, summing and averaging against tables with 90 million rows). The RDBMS that I have tested against is DB2 on AIX. The dev environment that failed was loaded with 1/20th of the volume that will be processed in production. I am assured that the production hardware is superior to the dev and staging hardware but I just don't believe that it will cope with the sheer volume of data and complexity of queries. Before the dev environment failed, it was taking in excess of 5 minutes to return a small dataset (several hundred rows) that was produced by a complex query (many joins, lots of grouping, summing and averaging) against the large tables. My gut feeling is that the db architecture must change so that the aggregations currently provided by the views are performed as part of an off-peak batch process. Now for my question. I am assured by people who claim to have experience of this sort of thing (which I do not) that my fears are unfounded. Are they? Can a modern RDBMS (SQL Server 2008, Oracle, DB2) cope with the volume and complexity I have described (given an appropriate amount of hardware) or are we in the realm of technologies like Google's BigTable? I'm hoping for answers from folks who have actually had to work with this sort of volume at a non-theoretical level.

Read the article
How would you handle making an array or list that would have more entries than the standard implemen

- by faceless1_14

I am trying to create an array or list that could handle in theory, given adequate hardware and such, as many as 100^100 BigInteger entries. The problem with using an array or standard list is that they can only hold Integer.MAX_VALUE number of entries. How would you work around this limitations? A whole new class/interface? A wrapper for list? another data type entirely?

Read the article
Doing large updates against indexed view

- by user217136

We have an indexed view that runs across three large tables. Two of these tables (A & B) are constantly getting updated with user transactions and the other table (C) contains data product info that is needs to be updated once a week. This product table contains over 6 million records. We need this view across these three tables for our core business process and unfortunately we cannot change this aspect. We even had a sql server MVP come in to help test under load to make sure we have the most efficient configuration. There is one column in the product table that gets utilized in the view and has to be updated each week. The problem we are now encountering is that as volume is increasing on our transactions against tables A & B, the update to Table C is causing deadlocks. I have tried several different methods to no avail: 1) I was hoping that we could change the view so that table C could be a dirty read "WITH (NOLOCK)" but apparently that functionality is not available with indexes views. 2) I thought about updating a new column in Table C and then just renaming it when the process is done but you cannot do that due to the dependency in the view. 3) I also entertained the idea of writing this value to a temporary product table, and then running an ALTER statement against the view to have it point to my new table. however when i did that the indexes on my view were dropped and it took quite a bit of time to recreate them. 4) we tried to do the weekly update in small chunks (as small as 100 records at a time) but we still run into dead locks. questions: a) we are using sql server 2005. Does sql server 2008 have a new functionality with their indexed views that would help us? Is there now a way to do dirty reads w/ an indexed view? b) a better approach to altering an existing view to point to a new table? thanks!

Read the article
large amount of data in many text files - how to process?

- by Stephen

Hi, I have large amounts of data (a few terabytes) and accumulating... They are contained in many tab-delimited flat text files (each about 30MB). Most of the task involves reading the data and aggregating (summing/averaging + additional transformations) over observations/rows based on a series of predicate statements, and then saving the output as text, HDF5, or SQLite files, etc. I normally use R for such tasks but I fear this may be a bit large. Some candidate solutions are to 1) write the whole thing in C (or Fortran) 2) import the files (tables) into a relational database directly and then pull off chunks in R or Python (some of the transformations are not amenable for pure SQL solutions) 3) write the whole thing in Python Would (3) be a bad idea? I know you can wrap C routines in Python but in this case since there isn't anything computationally prohibitive (e.g., optimization routines that require many iterative calculations), I think I/O may be as much of a bottleneck as the computation itself. Do you have any recommendations on further considerations or suggestions? Thanks

Read the article
Getting started with massive data

- by Max

I'm a math guy and occasionally do some statistics/machine learning analysis consulting projects on the side. The data I have access to are usually on the smaller side, at most a couple hundred of megabytes (and almost always far less), but I want to learn more about handling and analyzing data on the gigabyte/terabyte scale. What do I need to know and what are some good resources to learn from? Hadoop/MapReduce is one obvious start. Is there a particular programming language I should pick up? (I primarily work now in Python, Ruby, R, and occasionally Java, but it seems like C and Clojure are often used for large-scale data analysis?) I'm not really familiar with the whole NoSQL movement, except that it's associated with big data. What's a good place to learn about it, and is there a particular implementation (Cassandra, CouchDB, etc.) I should get familiar with? Where can I learn about applying machine learning algorithms to huge amounts of data? My math background is mostly on the theory side, definitely not on the numerical or approximation side, and I'm guessing most of the standard ML algorithms don't really scale. Any other suggestions on things to learn would be great!

Read the article
Performing Aggregate Functions on Multi-Million Row Tables

- by Daniel Short

I'm having some serious performance issues with a multi-million row table that I feel I should be able to get results from fairly quick. Here's a run down of what I have, how I'm querying it, and how long it's taking: I'm running SQL Server 2008 Standard, so Partitioning isn't currently an option I'm attempting to aggregate all views for all inventory for a specific account over the last 30 days. All views are stored in the following table: CREATE TABLE [dbo].[LogInvSearches_Daily]( [ID] [bigint] IDENTITY(1,1) NOT NULL, [Inv_ID] [int] NOT NULL, [Site_ID] [int] NOT NULL, [LogCount] [int] NOT NULL, [LogDay] [smalldatetime] NOT NULL, CONSTRAINT [PK_LogInvSearches_Daily] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY] ) ON [PRIMARY] This table has 132,000,000 records, and is over 4 gigs. A sample of 10 rows from the table: ID Inv_ID Site_ID LogCount LogDay -------------------- ----------- ----------- ----------- ----------------------- 1 486752 48 14 2009-07-21 00:00:00 2 119314 51 16 2009-07-21 00:00:00 3 313678 48 25 2009-07-21 00:00:00 4 298863 0 1 2009-07-21 00:00:00 5 119996 0 2 2009-07-21 00:00:00 6 463777 534 7 2009-07-21 00:00:00 7 339976 503 2 2009-07-21 00:00:00 8 333501 570 4 2009-07-21 00:00:00 9 453955 0 12 2009-07-21 00:00:00 10 443291 0 4 2009-07-21 00:00:00 (10 row(s) affected) I have the following index on LogInvSearches_Daily: /****** Object: Index [IX_LogInvSearches_Daily_LogDay] Script Date: 05/12/2010 11:08:22 ******/ CREATE NONCLUSTERED INDEX [IX_LogInvSearches_Daily_LogDay] ON [dbo].[LogInvSearches_Daily] ( [LogDay] ASC ) INCLUDE ( [Inv_ID], [LogCount]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] I need to pull inventory only from the Inventory for a specific account id. I have an index on the Inventory as well. I'm using the following query to aggregate the data and give me the top 5 records. This query is currently taking 24 seconds to return the 5 rows: StmtText ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- SELECT TOP 5 Sum(LogCount) AS Views , DENSE_RANK() OVER(ORDER BY Sum(LogCount) DESC, Inv_ID DESC) AS Rank , Inv_ID FROM LogInvSearches_Daily D (NOLOCK) WHERE LogDay DateAdd(d, -30, getdate()) AND EXISTS( SELECT NULL FROM propertyControlCenter.dbo.Inventory (NOLOCK) WHERE Acct_ID = 18731 AND Inv_ID = D.Inv_ID ) GROUP BY Inv_ID (1 row(s) affected) StmtText ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |--Top(TOP EXPRESSION:((5))) |--Sequence Project(DEFINE:([Expr1007]=dense_rank)) |--Segment |--Segment |--Sort(ORDER BY:([Expr1006] DESC, [D].[Inv_ID] DESC)) |--Stream Aggregate(GROUP BY:([D].[Inv_ID]) DEFINE:([Expr1006]=SUM([LOALogs].[dbo].[LogInvSearches_Daily].[LogCount] as [D].[LogCount]))) |--Sort(ORDER BY:([D].[Inv_ID] ASC)) |--Nested Loops(Inner Join, OUTER REFERENCES:([D].[Inv_ID])) |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1011], [Expr1012], [Expr1010])) | |--Compute Scalar(DEFINE:(([Expr1011],[Expr1012],[Expr1010])=GetRangeWithMismatchedTypes(dateadd(day,(-30),getdate()),NULL,(6)))) | | |--Constant Scan | |--Index Seek(OBJECT:([LOALogs].[dbo].[LogInvSearches_Daily].[IX_LogInvSearches_Daily_LogDay] AS [D]), SEEK:([D].[LogDay] > [Expr1011] AND [D].[LogDay] < [Expr1012]) ORDERED FORWARD) |--Index Seek(OBJECT:([propertyControlCenter].[dbo].[Inventory].[IX_Inventory_Acct_ID]), SEEK:([propertyControlCenter].[dbo].[Inventory].[Acct_ID]=(18731) AND [propertyControlCenter].[dbo].[Inventory].[Inv_ID]=[LOA (13 row(s) affected) I tried using a CTE to pick up the rows first and aggregate them, but that didn't run any faster, and gives me essentially the same execution plan. (1 row(s) affected) StmtText ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --SET SHOWPLAN_TEXT ON; WITH getSearches AS ( SELECT LogCount -- , DENSE_RANK() OVER(ORDER BY Sum(LogCount) DESC, Inv_ID DESC) AS Rank , D.Inv_ID FROM LogInvSearches_Daily D (NOLOCK) INNER JOIN propertyControlCenter.dbo.Inventory I (NOLOCK) ON Acct_ID = 18731 AND I.Inv_ID = D.Inv_ID WHERE LogDay DateAdd(d, -30, getdate()) -- GROUP BY Inv_ID ) SELECT Sum(LogCount) AS Views, Inv_ID FROM getSearches GROUP BY Inv_ID (1 row(s) affected) StmtText ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |--Stream Aggregate(GROUP BY:([D].[Inv_ID]) DEFINE:([Expr1004]=SUM([LOALogs].[dbo].[LogInvSearches_Daily].[LogCount] as [D].[LogCount]))) |--Sort(ORDER BY:([D].[Inv_ID] ASC)) |--Nested Loops(Inner Join, OUTER REFERENCES:([D].[Inv_ID])) |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1008], [Expr1009], [Expr1007])) | |--Compute Scalar(DEFINE:(([Expr1008],[Expr1009],[Expr1007])=GetRangeWithMismatchedTypes(dateadd(day,(-30),getdate()),NULL,(6)))) | | |--Constant Scan | |--Index Seek(OBJECT:([LOALogs].[dbo].[LogInvSearches_Daily].[IX_LogInvSearches_Daily_LogDay] AS [D]), SEEK:([D].[LogDay] > [Expr1008] AND [D].[LogDay] < [Expr1009]) ORDERED FORWARD) |--Index Seek(OBJECT:([propertyControlCenter].[dbo].[Inventory].[IX_Inventory_Acct_ID] AS [I]), SEEK:([I].[Acct_ID]=(18731) AND [I].[Inv_ID]=[LOALogs].[dbo].[LogInvSearches_Daily].[Inv_ID] as [D].[Inv_ID]) ORDERED FORWARD) (8 row(s) affected) (1 row(s) affected) So given that I'm getting good Index Seeks in my execution plan, what can I do to get this running faster? Thanks, Dan

Read the article
Read from one large file and write to many (tens, hundreds, or thousands) files in Java?

- by Rudiger

I have a large-ish file (4-5 GB compressed) of small messages that I wish to parse into approximately 6,000 files by message type. Messages are small; anywhere from 5 to 50 bytes depending on the type. Each message starts with a fixed-size type field (a 6-byte key). If I read a message of type '000001', I want to write append its payload to 000001.dat, etc. The input file contains a mixture of messages; I want N homogeneous output files, where each output file contains only the messages of a given type. What's an efficient a fast way of writing these messages to so many individual files? I'd like to use as much memory and processing power to get it done as fast as possible. I can write compressed or uncompressed files to the disk. I'm thinking of using a hashmap with a message type key and an outputstream value, but I'm sure there's a better way to do it. Thanks!

Read the article
How to pick a chunksize for python multiprocessing with large datasets

- by Sandro

I am attempting to to use python to gain some performance on a task that can be highly parallelized using http://docs.python.org/library/multiprocessing. When looking at their library they say to use chunk size for very long iterables. Now, my iterable is not long, one of the dicts that it contains is huge: ~100000 entries, with tuples as keys and numpy arrays for values. How would I set the chunksize to handle this and how can I transfer this data quickly? Thank you.

Read the article
What's the best way to transfer a large dataset over a .NET web service?

- by Malvineous

I've inherited a C# .NET application which talks to a web service, and the web service talks to an Oracle database. I need to add an export function to the UI, to produce an Excel spreadsheet of some of the data. I have created a web service function to run a database query, load the data into a DataTable and then return it, which works fine for a small number of rows. However there is enough data in the full run that the client application locks up for a few minutes and then returns a timeout error. Obviously this isn't the best way to retrieve such a large dataset. Before I go ahead and come up with some dodgy way of splitting the call, I'm wondering if there is already something in place that can handle this. At the moment I'm thinking of a startExport function then repeatedly calling a next50Rows function until there is no data left, but because web services are stateless this means I'm going to have to keep some sort of ID number around and deal with the associated permissions. It would mean that I don't have to load the entire data set into the web server's memory though, which is one good thing. So if anyone knows a better way to retrieve a large amount of data (in a table format) over a .NET web service, please let me know!

Read the article
What should i do for accomodating large scale data storage and retrieval?

- by kailashbuki

There's two columns in the table inside mysql database. First column contains the fingerprint while the second one contains the list of documents which have that fingerprint. It's much like an inverted index built by search engines. An instance of a record inside the table is shown below; 34 "doc1, doc2, doc45" The number of fingerprints is very large(can range up to trillions). There are basically following operations in the database: inserting/updating the record & retrieving the record accoring to the match in fingerprint. The table definition python snippet is: self.cursor.execute("CREATE TABLE IF NOT EXISTS `fingerprint` (fp BIGINT, documents TEXT)") And the snippet for insert/update operation is: if self.cursor.execute("UPDATE `fingerprint` SET documents=CONCAT(documents,%s) WHERE fp=%s",(","+newDocId, thisFP))== 0L: self.cursor.execute("INSERT INTO `fingerprint` VALUES (%s, %s)", (thisFP,newDocId)) The only bottleneck i have observed so far is the query time in mysql. My whole application is web based. So time is a critical factor. I have also thought of using cassandra but have less knowledge of it. Please suggest me a better way to tackle this problem.

Read the article
Avoid an "out of memory error" in Java(eclipse), when using large data structure?

- by gnomed

OK, so I am writing a program that unfortunately needs to use a huge data structure to complete its work, but it is failing with a "out of memory error" during its initialization. While I understand entirely what that means and why it is a problem, I am having trouble overcoming it, since my program needs to use this large structure and I don't know any other way to store it. The program first indexes a large corpus of text files that I provide. This works fine. Then it uses this index to initialize a large 2D array. This array will have nXn entries, where "n" is the number of unique words in the corpus of text. For the relatively small chunk I am testing it on(about 60 files) it needs to make approximately 30,000x30,000 entries. this will probably be bigger once I run it on my full intended corpus too. It consistently fails every time, after it indexes, while it is initializing the data structure(to be worked on later). Things I have done include: revamp my code to use a primitive "int[]" instead of a "TreeMap" eliminate redundant structures, etc... Also, I have run eclipse with "eclipse -vmargs -Xmx2g" to max out my allocated memory I am fairly confident this is not going to be a simple line of code solution, but is most likely going to require a very new approach. I am looking for what that approach is, any ideas? Thanks, B.

Read the article
Export large amount of data from Oracle 10G to SQL Server 2005

- by uniball

Dear all, I need to export 100 million data rows (avg row length ~ 100 bytes) from Oracle 10G database table into SQL server (over WAN/VLAN with 6MBits/sec capacity) on a regular basis. So far, these are the options that I have tried and a quick summary. Has anyone tried this before? Are there other better options? Which option would be the best in terms of performance and reliability? The time taken has been calculated using tests on smaller amounts of data and then extrapolating it to estimate the time required. Using data import wizard on the SQL server or SSIS packages to import the data. It will take around 150 hours to complete the task. Using Oracle batch job to spool data into a comma-delimited flat-file. Then using SSIS package to FTP this file to the SQL server and then load directly from the flat-file. The issue here is the size of the flat-file which is expected to run in GBs. Although this option is drastically different, I am even considering the option of using Linked Server to query the Oracle data directly at run-time to avoid bringing in data. Performance is a big problem and I have limited control over the Oracle database in terms of creating table indexes. Regards, Uniball

Read the article
How to load 1 milion records from database fast?

- by Leonard P.

Now we have a firebird database with 1.000.000 that must be processed after ALL are loaded in RAM memory. To get all of those we must extract data using (select * first 1000 ...) for 8 hours. What is the solution for this?

Read the article
what changes when your input is giga/terabyte sized?

- by Wang

I just took my first baby step today into real scientific computing today when I was shown a data set where the smallest file is 48000 fields by 1600 rows (haplotypes for several people, for chromosome 22). And this is considered tiny. I write Python, so I've spent the last few hours reading about HDF5, and Numpy, and PyTable, but I still feel like I'm not really grokking what a terabyte-sized data set actually means for me as a programmer. For example, someone pointed out that with larger data sets, it becomes impossible to read the whole thing into memory, not because the machine has insufficient RAM, but because the architecture has insufficient address space! It blew my mind. What other assumptions have I been relying in the classroom that just don't work with input this big? What kinds of things do I need to start doing or thinking about differently? (This doesn't have to be Python specific.)

Read the article

Search Results

Search found 777 results on 32 pages for 'volumes'.

Page 3/32 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by Cheery

- by symcbean

- by jayhendren

- by David Yu

- by Stephan Kristyn

- by Tom

- by Tom

- by Zack Perry

- by jpierson

- by at

- by grenade

- by faceless1_14

- by user217136

- by Stephen

- by Max

- by Daniel Short

- by Rudiger

- by Sandro

- by Malvineous

- by kailashbuki

- by gnomed

- by uniball

- by Leonard P.

- by Wang

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >