Search Results

Search found 282 results on 12 pages for 'extraction'.

Page 3/12 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

[cli/linux] get plain text from raw emails (not attachments extraction)

- by etuardu

Hi, having a raw email as input (i.e. the text between "DATA" and "." sent by an SMTP client), I need to extract the mail content (which I know is always text only) as plain text. This means decoding the transfer encoding (if any: it could be base64 or quoted-printable), merging multiparts (if any), and stripping headers. I tried various tools that should do that: mewdecode, uudecode, uudeview... I only managed to get the last one to work, but it won't output anything if the mail is not MIME encoded, and it stores its output under an unpredictable filename that cannot be forced, so it's hard to use in a non-interactive shell script. Since this is a pretty common job (every mail client has to do it), it's weird that it's so complicated. Do you have any hints? (Actually, forcing uudeview to write to a given file would be good enough.) Thank you!
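One possible approach, sketched here in Python rather than with the tools named in the question, is to let the standard-library `email` package do the decoding. The function name `plain_text_from_raw` is hypothetical, and the sketch assumes the raw message is available as bytes and that only `text/plain` parts are wanted:

```python
from email import message_from_bytes, policy


def plain_text_from_raw(raw: bytes) -> str:
    """Return the decoded text/plain content of a raw email message."""
    # policy.default gives EmailMessage objects with get_content().
    msg = message_from_bytes(raw, policy=policy.default)
    chunks = []
    for part in msg.walk():
        # Skip multipart containers and anything marked as an attachment.
        if part.is_multipart() or part.get_content_disposition() == "attachment":
            continue
        if part.get_content_type() == "text/plain":
            # get_content() undoes base64 / quoted-printable and the charset.
            chunks.append(part.get_content())
    # Merge all text parts; headers are never included in the result.
    return "\n".join(chunks)
```

A message without any MIME headers is treated as `text/plain` by the parser, so it should pass through the same code path. In a shell script this could be wrapped in a small command that reads the message from standard input and prints the result.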

Media Information Extractor for Java

- by eyazici

I need a media information extraction library (pure Java or a JNI wrapper) that can handle common media formats. I primarily use it for video files, and I need at least the following information: video length (runtime), video bitrate, video framerate, video format and codec, video size (width x height), audio channels, audio format, and audio bitrate and sampling rate. There are several libraries and tools around, but I couldn't find one for Java.

Transferring Music from iPod to iTunes?

- by Tio

Is there a way for me to transfer music from my iPod to my iTunes? Say, if I accidentally deleted some music on my computer but I have the files on my iPod?

opening offline sync files from a .CAB file

- by Rob

OK, I have downloaded from Windows Live Spaces (don't know if this is useful, but it might be) a .CAB file containing an Index.XML file and package.cab, package01.cab through to package12.cab. The Index.XML simply has the names of all the subsequent package.cab files and their offsets. The first package.cab has a single 26MB XML file which appears to be an OfflineSyncFile definition, which I am guessing is the metadata for all the other packageXX.cab files. Now the question I have is how I should go about extracting these things and piecing it all back together again. I have tried WinRAR, which extracts all 800MB for me into unnamed files and randomly named directories. I have also tried the standard extract in Windows Explorer, with much the same results.

Extracting information from active directory

- by Nop at NaDa

I work in the IT support department of a branch of a huge company. I have to take care of a database with all the users, computers, etc. I'm trying to find a way to automatically update the database as much as possible, but the IT infrastructure guys don't give me enough privileges to use Active Directory in order to dump the users, nor do they have the time to give me the information that I need. Some days ago I found Active Directory Explorer from Sysinternals, which allows me to browse through Active Directory, and I found all the information that I need there (username, real name, creation date, privileges, company, etc.). Unfortunately I'm unable to export the data to a human-readable format. I'm only able to take a snapshot of the whole database in a machine-readable format. Taking the snapshot takes hours, and I'm afraid that the infrastructure guys won't like me doing entire snapshots on a regular basis. Do you know of any tool (command-line is preferable) that would allow me to retrieve the values of the keys or export them to XML, CSV, etc.?

Using chilkat to extract RAR files with progress bar?

- by Dodi300

Hello. Does anyone know how to show the progress of archives extracting, when using chilkat? I already have a progress bar called "progressBar1" on my form. At the moment the whole program freezes when extraction is started. Maybe have another thread? I'm using this code:

    Chilkat.Rar rar = new Chilkat.Rar();
    bool success;
    success = rar.Open("abc123.rar");
    if (success != true)
    {
        MessageBox.Show(rar.LastErrorText);
        return;
    }
    success = rar.Unrar("c:/temp/unrarDest/");
    if (success != true)
    {
        MessageBox.Show(rar.LastErrorText);
    }
    else
    {
        MessageBox.Show("Success.");
    }

If anyone has any alternative ways to extract .rar files, it would be great to know. Thanks.

ubuntu 12.04 installation problem on windows 7 64bit

- by zakariya

06-26 20:57 ERROR TaskList: Extraction failed with code: 2
Traceback (most recent call last):
  File "\lib\wubi\backends\common\tasklist.py", line 197, in __call__
  File "\lib\wubi\backends\win32\backend.py", line 450, in extract_diskimage
Exception: Extraction failed with code: 2
06-26 20:57 DEBUG TaskList: # Cancelling tasklist
06-26 20:57 DEBUG TaskList: # Finished tasklist
06-26 20:57 ERROR root: Extraction failed with code: 2
Traceback (most recent call last):
  File "\lib\wubi\application.py", line 58, in run
  File "\lib\wubi\application.py", line 132, in select_task
  File "\lib\wubi\application.py", line 158, in run_installer
  File "\lib\wubi\backends\common\tasklist.py", line 197, in __call__
  File "\lib\wubi\backends\win32\backend.py", line 450, in extract_diskimage
Exception: Extraction failed with code: 2

Extract news links from news website

- by Ali

Is there any reliable method to find the collection of links that lead to the detailed news pages? In other words, after visiting the first page of a website, I just want those links that refer to a news item. Any solution?

Get the rendered text from HTML (Delphi)

- by Daisetsu

I have some HTML and I need to extract the actual written text from the page. So far I have tried using a web browser and rendering the page, then going to the document property and grabbing the text. This works, but only where the browser is supported (IE COM object). The problem is that I want this to be able to run under Wine as well, so I need a solution that doesn't use IE COM. There must be a reasonable programmatic way to do this.

What is the best way to parse html in C#?

- by Luke

I'm looking for a library/method to parse an html file with more html specific features than generic xml parsing libraries.

Getting BeautifulSoup to find a specific <p>

- by Ryan

I'm trying to put together a basic HTML scraper for a variety of scientific journal websites, specifically trying to get the abstract or introductory paragraph. The current journal I'm working on is Nature, and the article I've been using as my sample can be seen at http://www.nature.com/nature/journal/v463/n7284/abs/nature08715.html. I can't get the abstract out of that page, however. I'm searching for everything between the <p class="lead">...</p> tags, but I can't seem to figure out how to isolate them. I thought it would be something simple like

    from BeautifulSoup import BeautifulSoup
    import re
    import urllib2

    address="http://www.nature.com/nature/journal/v463/n7284/full/nature08715.html"
    html = urllib2.urlopen(address).read()
    soup = BeautifulSoup(html)
    abstract = soup.find('p', attrs={'class' : 'lead'})
    print abstract

Using Python 2.5, BeautifulSoup 3.0.8, running this returns 'None'. I have no option of using anything else that needs to be compiled/installed (like lxml). Is BeautifulSoup confused, or am I?

How do you parse an HTML in vb.net

- by tooleb

I would like to know if there is a simple way to parse HTML in VB.NET. I know that HTML is not a strict subset of XML, but it would be nice if it could be treated that way. Is there anything out there that would let me parse HTML in an XML-like way in VB.NET?

need help working with the Jericho Html Parser

- by rookie

Hi all, I've simply used the program at the following URL: http://jericho.htmlparser.net/samples/console/src/ExtractText.java My goal is to extract the main body text, summarize it, and present the summarized text as output to the user. My problem is that I'm not sure how to modify the above program to get only the required text from the webpage, without the links or any other information. Again, I'd really appreciate any help I could get. Thanks in advance.

looking for alternative to Webzinc .NET , screen scraping, web automation library for .net

- by gpow

I came across this .NET library: http://www.webzinc.com/online/faq.aspx However, I was wondering if there is a free alternative out there?

Extracting information from PDFs of research papers

- by Christopher Gutteridge

I need a mechanism for extracting bibliographic metadata from PDF documents, to save people entering it by hand or cutting and pasting it. At the very least, the title and abstract. The list of authors and their affiliations would be good. Extracting the references would be amazing. Ideally this would be an open source solution. The problem is that not all PDFs encode the text, and many which do fail to preserve the logical order of the text, so just doing pdf2text gives you line 1 of column 1, line 1 of column 2, line 2 of column 1, etc. I know there are a lot of libraries. It's identifying the abstract, title, authors, etc. in the document that I need to solve. This is never going to be possible every time, but 80% would save a lot of human effort.

parsing HTML on the iPhone

- by Ben Alpert

Can anyone recommend a C or Objective-C library for HTML parsing? It needs to handle messy HTML code that won't quite validate. Does such a library exist, or am I better off just trying to use regular expressions?

How to extract data from a PDF?

- by Fermin

Hi, my company receives data from an external company via Excel. We export this into SQL Server to run reports on the data. They are now changing to PDF format. Is there a way to reliably port the data from the PDF and insert it into our SQL Server 2008 database? Would this require writing an app, or is there an automated way of doing this?

How do I extract HTML content using Regex in PHP

- by gAMBOOKa

I know, I know... regex is not the best way to extract HTML text. But I need to extract article text from a lot of pages, and I can store regexes in the database for each website. I'm not sure how XML parsers would work with multiple websites. You'd need a separate function for each website. In any case, I don't know much about regexes, so bear with me. I've got an HTML page in a format similar to this:

    <html>
    <head>...</head>
    <body>
    <div class=nav>...</div><p id="someshit" />
    <div class=body>....</div>
    <div class=footer>...</div>
    </body>

I need to extract the contents of the body class container. I tried this:

    $pattern = "/<div class=\"body\">\(.*?\)<\/div>/sui"
    $text = $htmlPageAsIs;
    if (preg_match($pattern, $text, $matches))
        echo "MATCHED!";
    else
        echo "Sorry gambooka, but your text is in another castle.";

What am I doing wrong? My text ends up in another castle.
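Two things in the pattern as posted would prevent a match against the sample markup: the escaped parentheses `\(` and `\)` match literal parenthesis characters instead of forming a capture group, and the pattern requires quotes around `body` while the sample uses the unquoted `class=body`. The sketch below illustrates both fixes in Python rather than PHP (the same pattern syntax should carry over to `preg_match`); the function name `extract_body` is hypothetical, and the approach assumes the container holds no nested `<div>`:

```python
import re

# Optional quote captured in group 1 and required again via \1;
# the unescaped parentheses around .*? form the capture group we want.
BODY_RE = re.compile(
    r'<div\s+class=(["\']?)body\1\s*>(.*?)</div>',
    re.DOTALL | re.IGNORECASE,
)


def extract_body(html):
    """Return the inner text of the first <div class=body>, or None."""
    match = BODY_RE.search(html)
    return match.group(2) if match else None
```

Because `.*?` stops at the first `</div>`, a container with nested divs would be cut short, which is one reason a real HTML parser is usually the safer choice.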

Extracting Window Contents

- by user293392

I need to extract a window's content if it is text-based, or at least the file path associated with that window. To date, I have considered: 1. win32api 2. third-party libraries 3. wrapper classes. However, I am not satisfied with these solutions. Any ideas how this can be done in a clean way?

How can I read from an std::istream (using operator>>)?

- by dehmann

How can I read from an std::istream using operator>>? I tried the following:

    void foo(const std::istream& in) {
        std::string tmp;
        while(in >> tmp) {
            std::cout << tmp;
        }
    }

But it gives an error:

    error: no match for 'operator>>' in 'in >> tmp'

Extract part of a git repository?

- by Riobard

Assume my git repository has the following structure:

    /.git
    /Project
    /Project/SubProject-0
    /Project/SubProject-1
    /Project/SubProject-2

and the repository has quite some commits. Now one of the subprojects (SubProject-0) grows pretty big, and I want to take SubProject-0 out and set it up as a standalone project. Is it possible to extract all the commit history involving SubProject-0 from the parent git repository and move it to a new one?

Access Adobe InDesign files

- by PeterMmm

I need some directions for the following problem: I have a lot of InDesign files and I have to set up a process that will track whether a certain paragraph or text block has changed between different versions of the file. If the text block has changed, I want to extract that text block in a "portable" format (HTML, PDF, TXT). Is there an Adobe product that would do that? Is there any public API to access an InDesign file? Is there the possibility to export InDesign to, say, HTML?

How to extract common / significant phrases from a series of text entries

- by arronsky

I have a series of text items: raw HTML from a MySQL database. I want to find the most common phrases in these entries (not the single most common phrase, and ideally, not enforcing word-for-word matching). My example is any review on Yelp.com, which shows 3 snippets from hundreds of reviews of a given restaurant, in the format: "Try the hamburger" (in 44 reviews), e.g., the "Review Highlights" section of this page: http://www.yelp.com/biz/sushi-gen-los-angeles/ I have NLTK installed and I've played around with it a bit, but am honestly overwhelmed by the options. This seems like a rather common problem, and I haven't been able to find a straightforward solution by searching here. Thanks in advance for any help.
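As a starting point, and only a rough one, phrases can be approximated by word n-grams counted once per entry, which yields the "(in N reviews)" style of figure. The sketch below uses only the Python standard library instead of NLTK; the function name `common_phrases`, the crude tag stripping, and the default of three-word phrases are all assumptions, and it does enforce word-for-word matching, which the question would ideally avoid:

```python
import re
from collections import Counter


def common_phrases(texts, n=3, top=3):
    """Return the `top` n-word phrases ranked by how many texts contain them."""
    counts = Counter()
    for text in texts:
        # Crude tag removal; a real HTML parser would be more robust.
        plain = re.sub(r"<[^>]+>", " ", text)
        words = re.findall(r"[a-z']+", plain.lower())
        # A set, so each phrase is counted at most once per text.
        grams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(grams)
    return counts.most_common(top)
```

Filtering out phrases made mostly of stop words, or merging phrases that differ only by word stems, would be natural next steps toward looser matching.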

Is there anything for Python that is like readability.js?

- by Emre Sevinç

Hi, I'm looking for a package / module / function etc. that is approximately the Python equivalent of Arc90's readability.js (http://lab.arc90.com/experiments/readability and http://lab.arc90.com/experiments/readability/js/readability.js), so that I can give it some input.html and the result is a cleaned-up version of that HTML page's "main text". I want this so that I can use it on the server side (unlike the JS version, which runs only on the browser side). Any ideas? PS: I have tried Rhino + env.js, and that combination works, but the performance is unacceptable: it takes minutes to clean up most of the HTML content :( (I still couldn't find out why there is such a big performance difference).

Extract strings in python

- by shadyabhi

Basically, I want to extract the strings "AAA", "BBB", "CCC", "DDD" from a text file:

    ...... (other text goes here).....
    <TD align="left" class=texttd><font class='textfont'>AAA</font></TD>
    ..... (useless text here).....
    <TD align="left" class=texttd><font class='textfont'>BBB</font></TD>
    ....(more text).....
    <TD align="left" class=texttd><font class='textfont'>CCC</font></TD>
    <TD align="left" class=texttd><font class='textfont'>DDD</font></TD>
    ......(more text).....

I want something like this: if I do

    data = foo("file.txt")

I get

    data = ['AAA','BBB','CCC','DDD']

What is the best possible way? My file is not big.
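For a small file whose markup is as regular as in the sample, a single regular expression may be enough. This is a minimal sketch that assumes every wanted string sits inside a `<font class='textfont'>` element written exactly as shown; `foo` is the name used in the question, while `extract_strings` is a hypothetical helper:

```python
import re

# Non-greedy capture of whatever sits between the font tags.
FONT_RE = re.compile(r"<font class='textfont'>(.*?)</font>", re.DOTALL)


def extract_strings(text):
    """Return the contents of every matching <font> element, in order."""
    return FONT_RE.findall(text)


def foo(path):
    """Read the whole file (it is small) and extract the strings."""
    with open(path, encoding="utf-8") as handle:
        return extract_strings(handle.read())
```

If the attribute quoting or tag case varies between files, an HTML parser would be more forgiving than this pattern.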

