pdf scraping - Page 132

A download manager for Linux which saves downloaded files in directories by date like 2012_06_29

- by Gart

I've been using Download Master on Windows for years and what I liked most about it is that this program can automatically put downloaded files into directories by download date:

    /Downloads
    |
    |--/2012_06_28
    |   |
    |   |--a.zip
    |   |--b.pdf
    |   ...
    |
    |--/2012_06_29
    |   |
    |   |--c.txt
    |   ...
    ...

I'm looking for something similar for Linux. Is there any free download manager that can do this? I have tried KGet and uGet but they both seem to lack this feature. If there is a way to configure them to do that, I'll be happy to know about it. Thank you.
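
For illustration, the dated-directory layout described above can be sketched in a few lines of Python. This is a minimal sketch and not a replacement for a download manager; the function names `dated_target` and `download` are hypothetical, and the `YYYY_MM_DD` format is taken from the example directories in the question.

```python
import datetime
import os
import shutil
import urllib.parse
import urllib.request


def dated_target(base_dir, filename, today=None):
    """Return base_dir/YYYY_MM_DD/filename, creating the dated directory."""
    today = today or datetime.date.today()
    day_dir = os.path.join(base_dir, today.strftime("%Y_%m_%d"))
    os.makedirs(day_dir, exist_ok=True)
    return os.path.join(day_dir, os.path.basename(filename))


def download(url, base_dir):
    """Fetch url and store it under today's dated directory (hypothetical helper)."""
    # Use the last path component of the URL as the file name.
    name = os.path.basename(urllib.parse.urlsplit(url).path) or "download"
    target = dated_target(base_dir, name)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Calling `download(url, "/Downloads")` on 29 June 2012 would place the file under `/Downloads/2012_06_29/`.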

Text comparison utility

- by Aaron

I know this has been asked before... but I have a spin, as I have been trying out various free software offerings. I want to rid our department of DiffDoc; the problem is that I am having trouble locating something that will do what we need. WinMerge has been the latest attempt... The problem is simple: one Word doc, and one PDF with a portion of it containing the text to be compared against. Compare them and be done. Raw text, ignore whitespace, ignore carriage returns, etc. Just compare the text and give me the results in some sort of report. NOTE: I have tried ExamDiff, kdiff3, Tortoise, and a few others...
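
To make the "raw text, ignore whitespace, ignore carriage returns" requirement concrete, here is a minimal Python sketch using the standard `difflib` module. It assumes the text has already been extracted from the Word document and the PDF by some other tool (that step is not shown), and the names `normalize` and `compare_text` are hypothetical.

```python
import difflib
import re


def normalize(text):
    """Collapse every whitespace run (spaces, tabs, CR/LF) to a single space."""
    return re.sub(r"\s+", " ", text).strip()


def compare_text(a, b):
    """Return a word-level unified diff of two texts, ignoring whitespace layout."""
    words_a = normalize(a).split(" ")
    words_b = normalize(b).split(" ")
    # lineterm="" keeps the header lines free of trailing newlines.
    return list(difflib.unified_diff(words_a, words_b,
                                     "word_doc", "pdf_excerpt", lineterm=""))
```

An empty result means the two texts match once whitespace is ignored; otherwise the returned lines can be written out as a simple report.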

Apache/PHP serving file multiple times

- by easement

I have a system with a download.php page. The page takes an id, loads a file based on the DB record and then serves it up. I've noticed a couple of instances where files are requested multiple times in short time spans (20ms), times that are too quick for human input.

There are plenty of instances where the downloader functions fine. However, in taking a closer look at the downloader’s usage, I did see some interesting behavior. For instance, the IP address xxx.xxx.xxx.xxx (which is one in a range owned by xxxxxx.de in Germany) came to the site through Google. They browsed around and then came to the page http://site.com/xxxx/press+125.php. There they issued a request for /download.php?id=/ZZ/n+aH55Y= (a PDF) at 9:04:23AM. That alone is not a big deal. However, what is interesting is that the server seems to have been quite preoccupied with serving that request. In the logs the request first completes between 9:09:48 and 9:10:00. It looks like the user must have gotten tired of waiting during that time and requested the document two more times. Between 09:14:47 and 09:15:00 the same request appears again, except it is from 9:04:43AM, 20 seconds later than the first request. Then it pops up a third time, with a request that started at 09:05:06 completing between 09:19:55 and 09:19:58!

I’m suspicious of that document. In looking through the logs I see other instances where it takes the server a little while to handle that specific file. Check out this list of requests from zzz.zzz.zzz.zzz [different than above] for the file /download.php?id=/ZZ/n+aH55Y= (the same document as before):

    Request time    Complete time
    04:32:43        04:33:36
    04:32:50        04:33:36
    04:32:51        04:33:38
    04:33:05        04:33:38
    04:33:34        04:33:42
    04:33:05        04:33:42

So something is definitely going on. Whether it has to do with this specific document tripping up the server, the download.php page’s code, or if we’re just seeing the evidence of some server-level overload as it plays out in real time, I’m not yet sure.
In fairness, there are other instances of people downloading /download.php?id=/ZZ/n+aH55Y= (the same PDF) without error. However, it is interesting that the multiple processes only seem to happen with this one file, and then only when it is accessed through the page http://site.com/press+125.php. It bears further investigation whether there’s something amiss inside the code that causes the system to fire off multiple download requests that occupy the server. I don't know if this press+125.php is a rabbit hole, but it is a weird coincidence. Any ideas? I'm totally out of ideas. Apache maxed out? Things like that.

    ///DOWNLOAD.php
    $file = new files();
    $file->comparison_filter("id", "=", $id); //sql to load
    if ($file->load()) {
        $file->serve();
    }

    //FILES
    function serve() {
        if ($this->is_loaded) {
            if (file_exists($this->get_value("filename"))) {
                // Send the stored content type, if there is one
                if ($this->get_value("content_type") != "") {
                    header("Content-Type: " . $this->get_value("content_type"));
                }
                header("Content-Length: " . filesize($this->get_value("filename")));
                // Non-image files are offered as an attachment download
                if ($this->get_value("flag_image") == 0 || $this->get_value("flag_image") == false) {
                    header("Cache-Control: private");
                    header("Content-Disposition: attachment; filename=" . urlencode($this->get_value("original_filename")));
                }
                set_time_limit(0);
                @readfile($this->get_value("filename"));
                exit;
            }
        }
    }
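
As a quick worked check of the request list from zzz.zzz.zzz.zzz quoted in the question, the following Python sketch computes how long each request took. The times are copied from the question; the names `REQUESTS`, `duration_seconds` and `summarize` are hypothetical, and it is assumed that each pair falls on the same day.

```python
from datetime import datetime

# (request time, complete time) pairs copied from the question.
REQUESTS = [
    ("04:32:43", "04:33:36"),
    ("04:32:50", "04:33:36"),
    ("04:32:51", "04:33:38"),
    ("04:33:05", "04:33:38"),
    ("04:33:34", "04:33:42"),
    ("04:33:05", "04:33:42"),
]


def duration_seconds(start, end):
    """Seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def summarize(requests):
    """Return (start, end, seconds) for every request pair."""
    return [(start, end, duration_seconds(start, end)) for start, end in requests]
```

For the six requests this gives 53, 46, 47, 33, 8 and 37 seconds: the requests start at different times but finish within a few seconds of each other, which is the pattern the question describes.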

How much HDD space would I need to cache the web while respecting robots.txt?

- by Koning Baard XIV

I want to experiment with creating a web crawler. I'll start with indexing a few medium-sized websites like Stack Overflow or Smashing Magazine. If it works, I'd like to start crawling the entire web. I'll respect robots.txt. I'll save all html, pdf, word, excel, powerpoint, keynote, etc. documents (not exes, dmgs etc., just documents) in a MySQL DB. Next to that, I'll have a second table containing all results and descriptions, and a table with words and on what page to find those words (aka an index). How much HDD space do you think I need to save all the pages? Is it as low as 1 TB or is it about 10 TB, 20? Maybe 30? 1000? Thanks
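
The estimate boils down to simple arithmetic: number of documents times average document size, plus some allowance for the result and index tables. The Python sketch below only shows that arithmetic; the function names are hypothetical, and any page count or average size fed into it is a placeholder to be replaced with figures measured on a trial crawl, not a real statistic.

```python
def crawl_storage_bytes(pages, avg_page_bytes, index_overhead=0.0):
    """Raw storage for `pages` documents plus a fractional allowance for index tables."""
    return int(pages * avg_page_bytes * (1.0 + index_overhead))


def to_terabytes(n_bytes):
    """Convert bytes to decimal terabytes (10**12 bytes)."""
    return n_bytes / 10**12
```

With placeholder inputs of 1,000,000,000 documents at 100,000 bytes each and no index overhead, `to_terabytes(crawl_storage_bytes(1_000_000_000, 100_000))` gives 100.0, so the answer scales linearly with whatever the real averages turn out to be.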

Good documentation tool that is not Latex?

- by flpgdt

I am far from being an expert in LaTeX, but I'm OK documenting my projects with it. However, I seldom find people in the corporate world eager to learn LaTeX and go along with the documentation; in 99% of cases they just ask me for the Word version. For technical documentation I find less resistance, but still, whenever I start a project with someone not familiar with LaTeX, getting started is troublesome. That said, LaTeX is a bit of an oversized tool for my needs, really. My documents hardly go beyond tables, lists, a few images and type styles (although I'd love to still be able to produce hyperlinked PDFs). What other tools are out there that are simpler and have an easier learning curve than LaTeX, but are still PDF-worthy and have minimally decent capabilities? It also has to run on Windows :( Oh, yeah, MS Word is not an option ;)

cant make outbound calls - asterisk

- by deanvz

I have a basic Atcom IP01 with the following config:

- Registered VoIP (SIP) trunk
- Registered VoIP phone - ext
- Dial plan
- Outbound call rule

I made use of this manual that the manufacturer supplies: http://www.atcom.cn/cn/download/pbx/ip01/ATCOM%20IP01-User%20Manual-V1.0-EN.pdf

Whenever I try to make a call, it seems that the outbound call rule that I defined does not get regarded as the default rule, even though the dial plan lists this as the only outbound call rule. When dialling, I see the following in the log file:

    [Jan 1 09:10:07] NOTICE[176]: chan_sip.c:14377 handle_request_invite: Call from '6001' to extension '00765243679' rejected because extension not found.

The 00765243679 is a cellular number. Am I missing a configuration in order to make outbound calls? Land line, other VoIP numbers and cellular calls have been tried.

How to make a quiet laptop?

- by psihodelia

Most modern laptops have very noisy fans. I am looking for a quiet laptop or a small stationary computer which has all its hardware built into the display. Most tasks will be PDF/doc processing, real-time audio processing, web surfing and Skype video chats. Certainly, there is no fanless model today; but maybe some of the existing laptops do not switch on their fans so often, or implement different solutions? For example, an iPad has no fan at all and it is fast enough for my needs, but it has no normal operating system, so I can't use it for anything but audio chats and web surfing. Or maybe I can buy a laptop and tweak it to make it absolutely noiseless? Can you recommend any solution please?

Enable file download via redirect in IE7

- by Christian W

Our application enables our customers to download files to their computer. The way I have implemented it is using asp.net with a dropdown. When the user clicks the dropdown they get the choice of "PDF", "Powerpoint", and a couple of other choices depending on circumstances. Then, on postback, depending on the choice the user made, it will return a file (changing the content-header and such and then bitbanging a file to the user). This works perfectly in all browsers, but IE7 complains that this is a security risk and blocks the download. Is there any way for the users to authorize downloads from our web application?

Error headers: ap_headers_output_filter() after putting cache header in htaccess file

- by Brad

Receiving error:

    [debug] mod_headers.c(663): headers: ap_headers_output_filter()

after I included this within the htaccess file:

    # 6 DAYS
    <FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$">
    Header set Cache-Control "max-age=518400, public"
    </FilesMatch>

    # 2 DAYS
    <FilesMatch "\.(xml|txt)$">
    Header set Cache-Control "max-age=172800, public, must-revalidate"
    </FilesMatch>

    # 2 HOURS
    <FilesMatch "\.(html|htm)$">
    Header set Cache-Control "max-age=7200, must-revalidate"
    </FilesMatch>

Any help as to what I could do to fix this is appreciated.
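
As a small sanity check on the configuration quoted above, the `max-age` values are expressed in seconds, and the following Python sketch confirms that they match the "6 DAYS", "2 DAYS" and "2 HOURS" comments. The helper name `max_age` is hypothetical; the check says nothing about the cause of the debug message itself.

```python
from datetime import timedelta


def max_age(**duration):
    """Convert a duration (days=..., hours=...) to a Cache-Control max-age in seconds."""
    return int(timedelta(**duration).total_seconds())


# The three values used in the htaccess rules.
SIX_DAYS = max_age(days=6)
TWO_DAYS = max_age(days=2)
TWO_HOURS = max_age(hours=2)
```

These come out as 518400, 172800 and 7200, the same numbers used in the three `Header set Cache-Control` lines.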

Dir and Findstr commands taking a long time to complete in Batch File

- by user2405934

    dir %DRIVE_NAME%: /S /C /A-D /Q /T:C | findstr ".zip$ .doc$ .xls$ .xpt$ .cpt$ .cpo$ .xlsx$ .pdf$ .dat$ .txt$ .docx$ .csv$" >> file.info

I am using the above command to list all information in a file, as below:

    03/27/2013 01:02 PM 86,280 uusr\fr02 h123_frf67_rk_20140327.txt
    03/27/2013 01:02 PM 5,513 usr\fr02 h123_frf67_rk_20140328.txt

%DRIVE_NAME%: is a mapped drive. The folders will be the same; there are not more than 100 folders and their sub-folders, and there will only be 2 or 3 files at a time in any one of the folders. Now the issue is that for one folder it works perfectly, but for 80 to 90 folders it is taking too much time. I think it's because of findstr and the different extensions used. Is there any way to make it faster?
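
One alternative to piping `dir` into `findstr` is a single directory walk that filters by extension as it goes. The Python sketch below illustrates the idea under some assumptions: the names `list_files` and `write_report` are hypothetical, the owner column produced by `/Q` is not reproduced, and whether this is actually faster on a mapped drive would have to be measured.

```python
import os
import time

# Extensions taken from the findstr pattern in the question.
EXTENSIONS = (".zip", ".doc", ".xls", ".xpt", ".cpt", ".cpo",
              ".xlsx", ".pdf", ".dat", ".txt", ".docx", ".csv")


def list_files(root, extensions=EXTENSIONS):
    """Yield (ctime, size, path) for files under root ending in one of extensions."""
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            # Case-insensitive match, unlike findstr without /I.
            if name.lower().endswith(extensions):
                path = os.path.join(folder, name)
                info = os.stat(path)
                # st_ctime is the creation time on Windows only.
                yield info.st_ctime, info.st_size, path


def write_report(root, report_path):
    """Append one line per matching file to report_path."""
    with open(report_path, "a", encoding="utf-8") as report:
        for created, size, path in list_files(root):
            stamp = time.strftime("%m/%d/%Y %I:%M %p", time.localtime(created))
            report.write(f"{stamp} {size:>12,} {path}\n")
```

Something like `write_report("Z:\\", "file.info")` would then stand in for the one-line batch command, with `Z:` as a placeholder for the mapped drive.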

Forcing Acrobat Reader font

- by Jack

Hello, I have a netbook with Linpus Linux and I'm trying to open automatically generated documents with Acrobat Reader that use Verdana but without having it embedded inside the PDF file. Linpus doesn't come natively with any Verdana font, so I had to install it inside /usr/share/fonts/ by running mkfontdir and fc-cache to force a recache of the fonts. After that I was able to select it inside other programs (e.g. OpenOffice), but I'm still unable to open these PDFs. It seems that Acrobat is unable to find the font anyway. Since I have no control over how these PDFs are generated, is there a way to force Acrobat to use a specific font if the one it needs is not found? Or maybe Acrobat needs a different kind of font configuration on Linux? Thanks in advance

Linux: disable USB without disabling power

- by Ergot

TLDR

I want to toggle between the following usages of a USB port via the terminal:

- use it like a normal USB port
- only supply energy to charge

Story

I recently got myself something like a Magna Doodle that can save your drawings to PDF, which can be moved to your computer via USB afterwards. Now the thing is that you can't save anything while it's plugged in. Because USB is the only way to charge it, and out of laziness, I want to keep it plugged in and toggle the connection to the computer only when needed; it bugs me that I can't find a software solution. I noticed that it's charging and usable when it is plugged in and the computer is shut down or suspended. So I guess that there's a way to do it.

Tech info

- Computer: ThinkPad X201
- Linux kernel: 3.14.5-1-ARCH
- "Magna doodle": Boogie Board Sync

How SSD hard drive affected speed of your website (asp.net/linq/ms sql database)

- by Sergey Osypchuk

I have a small database (<1G), but we have a lot of complex logic in the website and the client complains about render time, which is 3-5 seconds. We are not Google, and thousands of users a day is our dream, so size is not a problem, but speed is important. Can anybody share their experience with SSD drives for an ASP.NET (MVC)/LINQ/MS SQL based application? How much did your performance increase? UPDATE: this whitepaper states that it will be 20 times faster. http://www.texmemsys.com/files/f000174.pdf

How to diagnose RAM?

- by x-man

I have a Java process that is aborted after a while with SIGSEGV. It started to happen after I upgraded the server with more RAM. Having tested on different JVMs, I suspect it might be a hardware problem, but no problem was detected by memtest86. So, what else can I do to detect what the source of the problem is? Should I take out the RAM modules one by one to detect the faulty module? The server is running 64-bit openSUSE 11.3. The memory is not ECC, it seems. I have a kit of this (3*4GB * 2 = 24GB): http://www.kingston.com/datasheets/KHX1600C9S3K2_8GX.pdf

What are current options to scan or convert a hand written note to a file on my laptop?

- by goldenmean

I wonder why there are not many options when it comes to a device, connectable to a laptop/desktop, for scanning or converting hand-written notes. I am looking for either of these:

1] A device that allows me to write with a digital pen on some special surface, which is connected to my laptop and thus converts my hand-written notes to a pdf/jpg/word file. (Microsoft's failed attempt at a Windows-based tablet PC in the past comes to mind, but not anymore.) Is there any such solution I can use with my laptop?

2] A document scanning device, apart from the flatbed scanners integrated these days into multi-function printers; is there anything that is portable enough to connect to my laptop?

REST-based file server

- by Chris Wenham

I need to be able to PUT files and GET them later using nothing but HTTP, so I went searching for something that might match the terms "REST file server" or "HTTP file server" or "REST drop-box", etc. Unfortunately, these terms bring up the wrong kind of results on Google. What I want is the equivalent of an SMB fileshare over HTTP. Some ideal features:

- Can PUT a file of any type at http://servername/service/any/path/I/want/document.pdf
- Anyone with access can GET that file at the URL I PUT it at
- Supports AV scanning on any new file that has been PUT
- Supports DELETE of existing resources (files)

Our shop runs Windows, but I'd be interested to know about Unix software that can do this kind of thing, too. It's to be used in an IT department for private users only. It won't be on a public-facing IP address. Does anything like this exist?
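
To illustrate the PUT/GET/DELETE behaviour listed above, here is a minimal sketch built on Python's standard `http.server` module. It is a toy under stated assumptions, not a product recommendation: the names `FileHandler` and `make_server` are hypothetical, there is no authentication and no AV scanning, and whole files are held in memory.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import unquote, urlsplit


class FileHandler(BaseHTTPRequestHandler):
    def _target(self):
        """Map the URL path to a file under the server root, or None if it escapes it."""
        root = self.server.root
        relative = unquote(urlsplit(self.path).path).lstrip("/")
        target = os.path.abspath(os.path.join(root, relative))
        try:
            inside = os.path.commonpath([root, target]) == root
        except ValueError:  # e.g. paths on different Windows drives
            inside = False
        return target if inside and target != root else None

    def _reply(self, status, body=b""):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_PUT(self):
        target = self._target()
        if target is None:
            return self._reply(403)
        length = int(self.headers.get("Content-Length", 0))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as out:
            out.write(self.rfile.read(length))
        self._reply(201)

    def do_GET(self):
        target = self._target()
        if target is None or not os.path.isfile(target):
            return self._reply(404)
        with open(target, "rb") as src:
            self._reply(200, src.read())

    def do_DELETE(self):
        target = self._target()
        if target is None or not os.path.isfile(target):
            return self._reply(404)
        os.remove(target)
        self._reply(204)


def make_server(root, host="127.0.0.1", port=8080):
    """Create a server that stores files under root (hypothetical helper)."""
    server = ThreadingHTTPServer((host, port), FileHandler)
    server.root = os.path.abspath(root)
    return server
```

Running `make_server("files").serve_forever()` would accept a PUT to a path such as `/any/path/I/want/document.pdf` and serve the same bytes back on a GET to that URL.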

OS X Automator empty, blank or null value.

- by Brian

I have some data files, mostly Excel, Word and PDF files, and most of them have no extension, so they are missing the .doc or .xls. This data now needs to be used in a Windows environment. I have created Automator apps for each of the file types I want to add the extension to. The problem is that they also add the extension to files that already have one, so data.xls becomes data.xls.xls. I would like to figure out a way to add the extension only to the files without an extension. How do I tell the Finder filter that I only want it to return files without extensions? I see how to add a line to filter by extension, but I don't know how to tell it that I want only files with a blank or null extension. Thanks
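
Outside Automator, the "only files without an extension" test is easy to express in a script. The following Python sketch is one way to do it, assuming a folder that holds files of a single type; the function name `add_missing_extension` is hypothetical, and hidden files such as `.DS_Store` are deliberately skipped.

```python
from pathlib import Path


def add_missing_extension(folder, extension):
    """Rename files in folder that have no extension; return the new paths."""
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        # An empty suffix means there is no extension (data.xls has ".xls").
        if path.is_file() and path.suffix == "" and not path.name.startswith("."):
            target = path.with_name(path.name + extension)
            if not target.exists():  # never overwrite an existing file
                path.rename(target)
                renamed.append(target)
    return renamed
```

With this check, `data.xls` is left alone while a file named `report` becomes `report.xls`.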

LPR command won't recognize CUPS printer

- by Datapimp23

I have a CUPS server with one shared printer configured on it. It prints test pages without problems.

    printername (Idle, Accepting Jobs, Shared)
    Description: desc
    Location:
    Driver: Zebra ZPL Label Printer (grayscale, 2-sided printing)
    Connection: socket://172.20.50.26
    Defaults: job-sheets=none, none media=oe_w288h432_4x6in sides=one-sided

This is the output from lpstat -t. It shows that the printer is idle and accepting requests:

    admin@SERVER:~$ lpstat -t
    scheduler is running
    no system default destination
    device for printername: socket://172.20.50.26
    printername accepting requests since Thu 26 Jan 2012 01:29:35 PM CET
    printer printername is idle. enabled since Thu 26 Jan 2012 01:29:35 PM CET

Now when I want to send a print job to it via an lpr command, it won't recognize the printer:

    /usr/bin/lpr -P printername test.pdf

Result:

    lpr: ttn_seg_zebra1: unknown printer

What am I missing here?

Copy files with filter (XP)

- by fire

I have a huge folder (over 6GB) with multiple sub-folders that I want to copy onto an external hard drive, however I do not want it to copy any PDF, EXE or ZIP files across to save space. Is there any software that will help me achieve this? I have looked at TeraCopy but this doesn't seem to have any filter mechanism on it. I am using Windows XP (* sigh *). *edit: found the xcopy command, will this do it? Can anyone help me with the syntax?
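
For comparison, the same filtered copy can be sketched with Python's standard `shutil.copytree` and an `ignore` callback. This is a minimal sketch under assumptions: the names `skip_unwanted` and `copy_filtered` are hypothetical, the extension match is case-insensitive, and the destination folder must not exist yet.

```python
import os
import shutil

# Extensions to leave behind, as listed in the question.
SKIPPED = (".pdf", ".exe", ".zip")


def skip_unwanted(folder, names):
    """copytree `ignore` callback: return the file names in folder to leave out."""
    return [name for name in names
            if name.lower().endswith(SKIPPED)
            and os.path.isfile(os.path.join(folder, name))]


def copy_filtered(source, destination):
    """Copy the whole tree under source, skipping PDF, EXE and ZIP files."""
    shutil.copytree(source, destination, ignore=skip_unwanted)
```

The sub-folder structure is preserved; only the files with the listed extensions are left out of the copy.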

Virtual Network Printer

- by user113720

I'm pretty new to Microsoft servers, so don't blame me if the question isn't that smart [I'm a Unix guy]. I need to install a virtual printer on a Microsoft Windows Server 2008 R2. The requirements are:

- The printer must print to a file {whatever file... txt or pdf}
- The printer must run on a server
- The printer must accept plaintext from a specific IP:port
- The connection between the device that prints and the server is a local network

I've tried to install a virtual printer, but I cannot specify the constraint about the socket from which to receive the data to print. Thank you so much
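
The "accept plaintext on a socket and print to a file" requirement can be illustrated with a small listener. The Python sketch below is only an illustration under assumptions: the names `listen` and `receive_job` are hypothetical, "specific IP:port" is read as listening on a chosen address and port while accepting data only from one sender IP, and the received bytes are appended unchanged to a text file.

```python
import socket


def listen(host, port):
    """Open a TCP listening socket on host:port."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((host, port))
    server_sock.listen(1)
    return server_sock


def receive_job(server_sock, allowed_ip, out_path):
    """Accept one connection and append its data to out_path.

    Returns the number of bytes written (0 if the sender is not allowed_ip).
    """
    conn, (peer_ip, _peer_port) = server_sock.accept()
    written = 0
    with conn:
        if peer_ip != allowed_ip:
            return 0  # drop connections from any other address
        with open(out_path, "ab") as out:
            while True:
                chunk = conn.recv(4096)
                if not chunk:  # sender closed the connection
                    break
                out.write(chunk)
                written += len(chunk)
    return written
```

A loop around `receive_job(listen("0.0.0.0", 9100), "192.168.0.50", "jobs.txt")` would keep collecting jobs; the port, sender address and file name here are placeholders.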

Cannot copy anything onto WD Elements 1TB External USB HDD

- by Aashish Vaghela

I have a Western Digital 1023 Elements 1TB External USB HDD. Recently, it has developed an unusual problem. I cannot copy any file of any size onto that 1TB hard drive, even though it has more than 400 GB free (out of 931GB actual size). I tried copying movies from a friend's laptop, which did not work. I also tried another desktop to copy some study material e-books (in PDF), which also did not work. I get the same CRC error when I try to copy anything from a computer's hard drive onto this WD 1TB hard drive. The other direction works: I can copy any file from the USB HDD onto the local machine's HDD on any computer. It's like one-way traffic. This HDD is only 1 year old. What are my options? Any suggestions? Regards, Aashish.V

Get an yerror plot without a line in Octave

- by queueoverflow

I'd like to print a plot with y-error-bars and just plain points. My current Octave script looks like this:

    errorbar(x_list, y_list, Delta_y_list, "~.x");
    title("physikalisches Pendel");
    xlabel("a^2 [m^2]");
    ylabel("aT^2 [ms^2]");
    print -dpdf plot.pdf

The plot I get has a line, although I specified the .x style option. How can I get rid of that line? And the ylabel is in the scale as well; is there some way to fix that?

What is the best free or low-cost Java reporting library (e.g. BIRT, JasperReports, etc.) for making

- by Max3000

I want to print, email and write to PDF very simple reports. The reports are basically a list of items, divided into various sections/columns. The sections are not necessarily identical. Think newspaper.

I just wasted a solid 2 days of work trying to make this kind of report using JasperReports. I find that Jasper is great for outputting "normalized" data, the kind that would come out of a database for instance, each row neatly describing an item and each item printed on a line. I'm simplifying a bit but that's the idea. However, given what I want to do, I always ended up completely lost: data not being displayed for no apparent reason, columns of text never the correct size, column positioning always ending up incorrect, pagination not sanely possible (I was never able to figure it out; the FAQ gives an obscure workaround), etc. I came to the conclusion that Jasper is really not built to make the kind of reports I want. Am I missing something? I'm ready to pay for a tool, as long as the price is reasonable. By reasonable I mean a few $100s. Thanks.

EDIT: To answer cetus, here is more information about the report I made in Jasper. What I want is something like this:

    text text text text
    -------------------
    text    | text
    text    |----------
    text    | text
    text    | text
    --------| text
    text    |----------
    text    | text

What I made in Jasper is this:

    (detail band)
    subreport | subreport
    ------------------------------------
    subreport | subreport
    ------------------------------------
    subreport | subreport

The subreports are all the same actual report. This report has one field (called "field") and basically just prints this field in a detail band. Hence, running a single subreport simply lists all items from the datasource. The datasource itself is a simple custom JRDatasource containing a collection of strings in the field "field". The datasource iterates over the collection until there are no more strings. Each subreport has its own datasource.
I tried many different variations of the above, with all sorts of different properties for the report, subreports, etc. IMO, this is fairly simple stuff. However, the problems I encounter are as follows:

- Subreports starting from the 3rd don't show up when their position type is 'float'. They do show up when they have 'fix relative to top'. However, I don't want to do this because the first two subreports can be of any length.
- I can't make each subreport stretch according to its own length. Instead, they either don't stretch at all (which is not desirable because they have different lengths) or they stretch according to the longest subreport. This makes a weird layout for sure.
- Pagination doesn't happen. If some subreports fall outside the page, they simply don't show. One alternative is to increase the 'page height' considerably and the 'detail band height' accordingly. However, in this case it is not really possible to know the total height in advance, so I'm stuck with calculating/guessing it myself, before the report is even generated. More importantly, long reports end up on one page and this is not acceptable (the printout text is too small, it's ugly/non-professional to have different reports with different PDF page lengths, etc.).

BTW, I used iReport, so it's possible that I'm listing limitations of iReport here and not of Jasper itself. That's one of the things I'm trying to find out by asking this question here. One alternative would be to generate the jrxml myself with just static text, but I'm afraid I'll encounter the very same limitations. Anyway, I just generally wasted so much time getting anything done with Jasper that I can't help thinking it's not the right tool for the job. (Not to say that Jasper doesn't excel in what it's good at.)

Corosync - stopping the service crashes the server

- by Antipop

I am trying to set up a test cluster on a Xen Server with 2 paravirtualized CentOS 5.4 machines. I am using Pacemaker+Corosync, and following the instructions found at http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf and other sites. Anyway, when I try to manually stop the corosync service, about 80% of the time the whole VM locks up with the message "Waiting for corosync services to unload" and I am forced to shut the machine down manually. For the remaining 20%, the VM keeps responding and adds dots to the above message, but it won't actually stop the service. There aren't many resources on the internet about this particular error. Any ideas about this? Thanks in advance.

Search Results

Search found 4479 results on 180 pages for 'pdf scraping'.
