Search Results

Search found 71496 results on 2860 pages for 'http content length'.

Designing a Content-Based ETL Process with .NET and SFDC

- by Patrick

As my firm makes the transition to using SFDC as our main operational system, we've put together a couple of SFDC portals where we can post customer-specific documents to be viewed at will. As such, we've needed pseudo-ETL applications that can extract metadata from the documents our analysts generate internally (most are industry-standard PDF, XML, or MS Office formats) and place in networked "queue" folders. From there, our applications scoop up the queued documents and upload them to the appropriate SFDC CRM Content Library along with some select pieces of metadata. I've mostly used DbAmp to broker communication with SFDC (DbAmp is a Linked Server provider that allows you to use SQL conventions to interact with your SFDC Org data). I've been able to create [console] applications in C# that work pretty well, and they're usually structured something like this:

    static void Main()
    {
        // Load parameters from app.config.
        // Get documents from queue.
        var files = someInterface.GetFiles(someFilterOrRegexPattern);
        foreach (var file in files)
        {
            // Extract metadata from the file.
            // Validate some attributes of the file; add any validation errors to an
            // in-memory structure (e.g. List<ValidationErrors>).
            if (isValid)
            {
                var fileData = File.ReadAllBytes(file);
                // Upload using some wrapper for an ORM or DAL.
                someInterface.Upload(fileData, meta.Param1, meta.Param2, ...);
            }
            else
            {
                // Bounce the file.
            }
        }
        // Report any validation errors (via message bus or SMTP or some such).
    }

And that's pretty much it. Most of the time I wrap all these operations in a "Worker" class that takes the needed interfaces as constructor parameters. This approach has worked reasonably well, but I have a gut feeling that there's something awful about it and would love some feedback. Is writing an ETL process as a C# console app a bad idea?
I'm also wondering if there are some design patterns that would be useful in this scenario that I'm clearly overlooking. Thanks in advance!
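
One way to keep this structure testable, and close to what the question already describes, is to give the Worker its collaborators through the constructor and to put each step behind a small interface. The sketch below is in Python rather than C#, purely for brevity. DocumentQueue, Uploader, Worker and the in-memory fakes are hypothetical names and are not part of DbAmp or any SFDC API.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Protocol


class DocumentQueue(Protocol):
    """Where queued documents come from (e.g. a networked folder). Hypothetical."""

    def get_files(self) -> Iterable[str]: ...
    def read_bytes(self, path: str) -> bytes: ...
    def bounce(self, path: str) -> None: ...


class Uploader(Protocol):
    """Where accepted documents go (e.g. a DAL in front of the CRM). Hypothetical."""

    def upload(self, data: bytes, metadata: dict[str, str]) -> None: ...


@dataclass
class Worker:
    queue: DocumentQueue
    uploader: Uploader
    extract_metadata: Callable[[str], dict[str, str]]
    validate: Callable[[str, dict[str, str]], list[str]]  # returns error messages
    errors: dict[str, list[str]] = field(default_factory=dict)

    def run(self) -> dict[str, list[str]]:
        """Process every queued file once and return validation errors for reporting."""
        for path in self.queue.get_files():
            meta = self.extract_metadata(path)
            problems = self.validate(path, meta)
            if problems:
                self.errors[path] = problems
                self.queue.bounce(path)
            else:
                self.uploader.upload(self.queue.read_bytes(path), meta)
        return self.errors


# In-memory stand-ins so the pipeline runs without a file share or SFDC.
class MemoryQueue:
    def __init__(self, files: dict[str, bytes]):
        self.files = files
        self.bounced: list[str] = []

    def get_files(self) -> list[str]:
        return list(self.files)

    def read_bytes(self, path: str) -> bytes:
        return self.files[path]

    def bounce(self, path: str) -> None:
        self.bounced.append(path)


class RecordingUploader:
    def __init__(self):
        self.uploaded: list[tuple[str, int]] = []

    def upload(self, data: bytes, metadata: dict[str, str]) -> None:
        self.uploaded.append((metadata["name"], len(data)))


queue = MemoryQueue({"report.pdf": b"%PDF-1.4", "notes.txt": b"hi"})
uploader = RecordingUploader()
worker = Worker(
    queue=queue,
    uploader=uploader,
    extract_metadata=lambda p: {"name": p, "ext": p.rsplit(".", 1)[-1]},
    validate=lambda p, m: [] if m["ext"] in {"pdf", "xml", "docx"} else [f"unsupported type: {m['ext']}"],
)
report = worker.run()
print(report)  # the structure you would hand to the SMTP/message-bus reporter
```

Because every dependency arrives through the constructor, the same Worker can run unchanged against real folder and DbAmp-backed implementations. The loop itself can be unit-tested with fakes like these.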

Should I implement slugs with my already fairly long URLs?

- by Earlz

I'm considering implementing slugs in my blog. My blog uses MongoDB, and one side effect of using MongoDB is that it uses relatively long hex string IDs. Example:

    Before: http://lastyearswishes.com/blog/view/5070f025d1f1a5760fdfafac
    After:  http://lastyearswishes.com/blog/view/5070f025d1f1a5760fdfafac/improvements-on-barelymvc

Of course, that's a relatively short title. I have some longer ones, but I intend to cap slugs at a reasonable maximum length. At what point does a URL become so long that it hurts SEO instead of improving it? In this case, should I leave my URLs alone, or add slugs?
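
A common pattern is to keep the ID as the authoritative part of the URL and treat the slug as decorative: cap it at a fixed length and cut it on a word boundary. The following is a minimal Python sketch that assumes ASCII-only slugs and a hypothetical 50-character cap. The question does not specify a limit.

```python
import re
import unicodedata


def slugify(title: str, max_length: int = 50) -> str:
    """Turn a post title into a lowercase, hyphen-separated, length-capped slug."""
    # Fold accented letters to ASCII ("é" -> "e") and drop other non-ASCII characters.
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    if len(slug) <= max_length:
        return slug
    # Cut at the last hyphen that keeps us within the limit, so no word is chopped.
    cut = slug.rfind("-", 0, max_length + 1)
    return slug[:cut] if cut != -1 else slug[:max_length]


def post_url(post_id: str, title: str) -> str:
    # The ID stays first, so routing can ignore the slug entirely.
    return f"http://lastyearswishes.com/blog/view/{post_id}/{slugify(title)}"
```

If routing uses only the ID, old links keep working when a title (and therefore its slug) changes. Some sites also redirect a request with an outdated slug to the current one.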

Noob-Friendly Guides to WSGI?

- by Johnny McKenzie

Hello, world! I have recently been delving into server-side web development with Python, and I have hit a brick wall: I know little about server-side code and HTTP (other than the very basics with PHP, shudder), and all of the docs for WSGI that I have found seem to be written for people already well established in the field. Are there any noob-friendly guides to server-side scripting (the theory of it) or to WSGI out there? Anything on HTTP would be helpful, and video tutorials are also greatly appreciated. Thanks in advance.
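
It helps to see how small WSGI is at its core. As specified in PEP 3333, an application is just a callable that takes the request environment (a dict) and a start_response function, and returns the response body as an iterable of bytes. This minimal sketch uses only the standard library's wsgiref reference server. The host and port are arbitrary choices.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    # environ is a dict of CGI-style variables describing the HTTP request.
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}\n".encode("utf-8")
    # start_response takes the status line and a list of (header, value) pairs.
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # The return value is an iterable of byte strings: the response body.
    return [body]


if __name__ == "__main__":
    with make_server("127.0.0.1", 8000, app) as server:
        print("Serving on http://127.0.0.1:8000/")
        server.serve_forever()
```

Run it and visit http://127.0.0.1:8000/anything to see the path echoed back. Frameworks such as Flask are, underneath, WSGI callables of this same shape.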

Wrong content for URL cache on Google

- by user32592

I have a website, natural-track.com, and when I check Google's cache for it I get a completely different website: "This is Google's cache of http://www.backpackers-planet.com/modules.php?name=Web_Links&l_op=visit&lid=3379", which is unrelated to my site. I have checked with the host, and they say all is well on their side. How can we fix this? The site has also dropped out of Google Search. We are about to rebuild the site on a better, more professional platform, but first we would like to understand what happened and how to fix it.

Canonicals with differing content

- by Jimbo Jonny

Interesting conundrum here with canonicals. Let's say I have a site with a "verified" system where other websites can become "verified". The URL they send people to in order to confirm verification is something like "blah.com/verify/company1" and "blah.com/verify/company2". But logically "blah.com/verify" itself is not verifying anyone in particular, so it redirects to the signup form for getting verified, at "blah.com/verify/register". As for the actual registered companies, I figure it doesn't make sense to index every individual URL when the only difference is which company name it says yay or nay to, so canonicals could come in handy on those pages to condense the indexing. Yet making "blah.com/verify" the canonical "hub" doesn't work well, because it's a signup form, not a verification page, so it technically has quite different content from the verification pages themselves. At the same time, it's a bit unfair to pick one company to receive all the canonical benefit as the "hub", yet a bit wasteful to have Google index every individual verification page and spread out all that link juice. Basically, I'm just looking for advice: what's best here from a search engine standpoint?

What is recommended minimum object size for gzip benefits?

- by utt73

I'm working on improving page display times, and one of the methods is to gzip content from the web server. Google recommends:

    "Note that gzipping is only beneficial for larger resources. Due to the overhead and latency of compression and decompression, you should only gzip files above a certain size threshold; we recommend a minimum range between 150 and 1000 bytes. Gzipping files below 150 bytes can actually make them larger."

We serve our content through Akamai, using their network as a proxy and CDN. What they've told me:

    "Following up on your question regarding what is the minimum size Akamai will compress the requested object when sending it to the end user: The minimum size is 860 bytes."

My reply:

    "What are the reasons Akamai's minimum size is 860 bytes? And why, for example, is this not the case for files Akamai serves for Facebook? (see below) Google recommends gzipping more aggressively, and that seems appropriate on our site, where the most frequent hits, by far, are AJAX calls that are under 860 bytes."

Akamai's response:

    "The reasons 860 bytes is the minimum size for compression is twofold: (1) The overhead of compressing an object under 860 bytes outweighs performance gain. (2) Objects under 860 bytes can be transmitted via a single packet anyway, so there isn't a compelling reason to compress them."

So I'm here for some fact checking. Is the 860-byte limit due to packet size the end of this reasoning? Why would high-traffic sites push this lower, closer to the 150-byte limit: just to save on bandwidth costs, or is there a performance gain in doing so?
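
The fixed size cost is easy to check locally. Per RFC 1952, every gzip stream carries a 10-byte header and an 8-byte trailer on top of the deflate data, so very small bodies can come out larger. The sketch below measures this with Python's gzip module. The payloads are made up, and the measurement says nothing about CPU cost.

```python
import gzip
import json


def gzip_savings(payload: bytes) -> int:
    """Bytes saved by gzip; a negative number means the gzipped body is larger."""
    return len(payload) - len(gzip.compress(payload))


# A tiny AJAX-style response versus a larger, repetitive JSON document (made-up data).
tiny = b'{"ok":true}'
larger = json.dumps([{"id": i, "status": "ok", "items": []} for i in range(40)]).encode()

for name, payload in [("tiny", tiny), ("larger", larger)]:
    saved = gzip_savings(payload)
    print(f"{name}: {len(payload)} bytes raw, {saved:+d} bytes saved by gzip")
```

This only measures size. It does not capture compression time on either end, or whether a response fits in one packet, which is the other half of Akamai's argument. On a typical 1500-byte Ethernet MTU, a single TCP segment carries roughly 1,460 bytes of payload.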

Similar domains using my business' content, and stealing SEO results

- by Murciano

I've been hired to create a website for a restaurant in my city, let's call it "Flying Dragon" Chinese restaurant. The restaurant has never had a website, though the business itself is about ten years old. However, if you Google the restaurant's name, the first site that comes up seems to be affiliated with the restaurant itself, even though it is not. This site - let's say, flyingdragonchinese.com - is also the one that Google has apparently selected, in its results, to be the official website of the restaurant - in essence, the first Google result is flyingdragonchinese.com, and directly beneath it, within the same entry, are the Google reviews and contact information. Upon visiting flyingdragonchinese.com (again, not the actual name), I see that the website has taken the menu content from the restaurant, in the same manner that Yelp does, but it also seems (to the untrained eye) to be the restaurant's official site. Basically, someone has created a fake website for the business (I am not sure why) using its actual menu and contact information, and is hogging the search results. The concept is similar to a "scraping site" except that the information seems to have been stolen manually. The main problem is that visitors to this site will have an inaccurate impression of the restaurant. I feel like the obvious solution is to register a new domain for my site, and simply beat out this competitor (or whatever it is) with smarter SEO and business verification with Google. However, the Conan-the-Barbarian-web-designer part of me wants to somehow bash this other site (deservedly?) into oblivion. But I don't know what I can really do, besides maybe issuing a cease-and-desist letter, or trying to contact the web host for the site, although there is no contact information available on this "fake" site for the site owner. Has anyone ever experienced something like this? Is there any solution?

VPS Server OS differences

- by silvercover

I have two VPS servers: one runs Linux and the other runs Windows. I've uploaded the same file to their public_html folders and can see it in my browser via each server's static IP address, like http://178.63.165.178/getorder/file.xml and http://178.63.165.178/getorder/file.xml. On the other side, there is a device called SMSPrinter that is configured to read those XML files over GPRS and needs a static IP address to reach the destination server. Unfortunately, this device can only read the file from the Windows server and cannot reach the file on the Linux server. There is nothing in the device's manual suggesting it needs a Windows server or any specific OS! I've also set the file permissions on the Linux server to 777 to rule out any restriction. What could be the cause of our problem? Thanks.

RequestContextHolder.currentRequestAttributes() and accessing HTTP Session

- by Umesh Awasthi

I need to access the HTTP session to fetch as well as store some information. I am using Spring MVC for my application, and I have two options here: (1) use the request/session in my controller method and do my work there, or (2) use RequestContextHolder to access session information. I am separating some calculation logic from the controller and want to access session information in this new layer, and for that I have two options: (1) pass the session or request object to the method in the other layer and perform my work there, or (2) use RequestContextHolder.currentRequestAttributes() to access the request/session and perform my work. I am not sure which is the right way to go. With the second approach, I can see that method calls will be cleaner and I won't need to pass the request/session each time.
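
The trade-off is not specific to Spring. The following is a language-neutral sketch in Python, not Spring code. Option 1 passes the session explicitly. Option 2 reads the session from an ambient, per-request holder, modeled here with the standard contextvars module as a rough stand-in for the thread-bound request attributes that RequestContextHolder exposes. The cart and price data are hypothetical.

```python
from contextvars import ContextVar

PRICES = {"apple": 2.0, "pear": 3.5}  # hypothetical catalogue


# Option 1: the service-layer function receives the session explicitly.
def cart_total_explicit(session: dict) -> float:
    return sum(PRICES[item] for item in session.get("cart", []))


# Option 2: the function reads the session from an ambient, per-request holder.
current_session: ContextVar = ContextVar("current_session")


def cart_total_ambient() -> float:
    session = current_session.get()  # raises LookupError when no request is active
    return sum(PRICES[item] for item in session.get("cart", []))


def handle_request(session: dict) -> tuple[float, float]:
    """Plays the controller: binds the session only for the duration of this request."""
    token = current_session.set(session)
    try:
        return cart_total_explicit(session), cart_total_ambient()
    finally:
        current_session.reset(token)
```

The explicit version can be called and tested without any request in flight. The ambient version gives cleaner call sites but fails (here with LookupError) when called outside a request. That hidden dependency is one common argument for passing the data in.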

Google crawler reports a not-found error for a link inside the <head> tag

- by inckka

I've found a crawl error on my site, listed as a page not found (404) link. Here's the broken link: http://mydomain.com/blog/comments/feed/ I'm using Google Webmaster Tools and found that this broken link comes from the <head> tag of my site's pages. Here's the actual code where the link is situated:

    <head>
      <link rel="alternate" type="application/rss+xml" title="My Domain Blog » Feed" href="http://www.my-domain.com/blog/feed/" />
    </head>

So Google reports this link as not found. This link's target is not an actual page or location, but it is essential for the blog feeds. Anyway, I have to fix this and get it off Google's crawl error list, but I have no idea how, because I cannot redirect or send a 404 header for this link target. Does anyone have an idea how to fix this?

"X-Robots-Tag: noindex" on an HTTP 301 response

- by Peter O.

I understand that a resource served with "X-Robots-Tag: noindex" instructs some search engines, including Google, not to index that resource further. I also understand that an HTTP 301 response causes search engines to use the redirected URL instead of the original URL to refer to the resource. But what happens if both "X-Robots-Tag: noindex" and status code 301 occur on the same response? It's likely that the original URL will no longer be indexed, but will that cause the redirected URL to no longer be indexed too? This possibility is not mentioned in the X-Robots-Tag specification.
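
To test empirically what a given crawler does, you first need a response that carries both signals. The following is a minimal sketch using Python's standard wsgiref server. The /old-page and /new-page paths are hypothetical. The code only shows how to emit the combination, not how any search engine interprets it.

```python
from wsgiref.simple_server import make_server


def redirect_app(environ, start_response):
    """Serve a permanent redirect that also carries X-Robots-Tag: noindex."""
    if environ.get("PATH_INFO") == "/old-page":
        start_response("301 Moved Permanently", [
            ("Location", "http://127.0.0.1:8000/new-page"),
            ("X-Robots-Tag", "noindex"),  # header attached to the redirect response itself
            ("Content-Length", "0"),
        ])
        return [b""]
    body = b"new page\n"
    start_response("200 OK", [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    # Inspect the headers with: curl -sI http://127.0.0.1:8000/old-page
    with make_server("127.0.0.1", 8000, redirect_app) as server:
        server.serve_forever()
```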

Alternative to nofollow: custom 302 url shortener?

- by Dogweather

Here's the scenario: lots of blogging platforms make it tedious to add nofollow to links within post content, i.e., you need to edit the HTML, format it correctly, etc. I have a client who posts lots of content with links that should be nofollowed, and since the blogging platform they're using makes this hard, I thought of a novel way to handle it: I install a URL shortener web app on the client's domain. The shortener works as normal, except it redirects via 302 instead of 301. The PageRank will therefore stay at the shortener's domain and not flow on to the target site. Part 2: in order to get the PageRank to collect somewhere meaningful, say on the site's home page, the shortened URLs would be generated like /link?12345 instead of /link/12345, and the path /link would then 301 to the home page. This way, the ID is a query parameter, not a path element, so all the incoming shortened links point at one path, which transfers PageRank to the home page. So that's my idea. I wanted to see if anybody could find problems with it. Thanks!
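
Writing the routing in this proposal as a small pure function makes it easier to reason about and test before deploying. This is a sketch under stated assumptions: the ID table, the target URL, and the choice of 404 for unknown IDs are hypothetical. Whether search engines actually handle a 302, or consolidate signals by path, the way the proposal hopes is exactly the open question. The code does not settle it.

```python
from urllib.parse import urlsplit

# Hypothetical table of short IDs -> outbound target URLs.
SHORT_LINKS = {"12345": "https://external.example/some/article"}
HOME = "/"


def resolve(request_target: str) -> tuple[int, str]:
    """Return (status, Location) for a request to the shortener; Location is "" for 404."""
    parts = urlsplit(request_target)
    if parts.path != "/link":
        return 404, ""
    link_id = parts.query  # "/link?12345": the ID is a query string, not a path segment
    if not link_id:
        return 301, HOME  # bare /link: permanent redirect to the home page
    if link_id in SHORT_LINKS:
        return 302, SHORT_LINKS[link_id]  # temporary redirect to the outbound site
    return 404, ""  # unknown ID (an assumption; the proposal doesn't say)
```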

HTTP 303 redirection and robots.txt

- by Ian Dickinson

On a site I'm working on, we're using the HTTP 303 redirect pattern (see this article for background) to distinguish between information and non-information resources. So some URLs under /id get redirected to dynamically created pages under /doc. These dynamic pages are built from a database and contain links to other /doc/ resources, so in general we don't want them to be crawled. Our robots.txt contains:

    Disallow: /doc

However, we do want the non-redirected pages under /id to be indexed by Google et al.:

    Allow: /id

So the question I have, which I can't find an answer to so far, is: if an allowed /id page 303-redirects to a /doc page, will the /doc page still be blocked by robots.txt? If yes, we're OK; otherwise I'm going to disallow all /id resources in the robots file, as having the crawler hammer the database would be worse than losing search indexing for the /id pages.
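
Crawlers evaluate robots.txt rules per URL, so a redirect target gets its own verdict, separate from the URL that redirected to it. The following minimal sketch uses Python's urllib.robotparser to show how the two rules classify an /id URL versus a /doc URL. The User-agent line and the hostnames are assumptions. Whether a given crawler re-checks robots.txt after following a 303 is crawler behavior that this code does not demonstrate. Note also that robotparser applies rules in file order, whereas Google documents a most-specific-match rule. Both give the same answer here because /doc and /id do not overlap.

```python
from urllib.robotparser import RobotFileParser

# The question's two rules, wrapped in a catch-all group (the User-agent line is assumed).
ROBOTS_TXT = """\
User-agent: *
Disallow: /doc
Allow: /id
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
parser.modified()  # mark the rules as loaded; harmless if parse() already did


def allowed(url: str, agent: str = "ExampleBot") -> bool:
    return parser.can_fetch(agent, url)


# Each URL gets its own verdict: a crawler following /id/... -> /doc/...
# would have to check the /doc URL separately before fetching it.
for url in ["http://example.org/id/thing", "http://example.org/doc/thing"]:
    print(url, "->", "allowed" if allowed(url) else "blocked")
```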

URL Rewrite http to https EXCEPT files in a specific subfolder

- by BrettRobi

I am trying to force all traffic on my website to use HTTPS, using the URL Rewrite 2.0 module added to IIS 7.5. I got that working, and now I need to exclude a couple of pages from SSL. So I need a rule that redirects every URL to HTTPS except those in a specific folder. I've been banging my head against the wall on this and am hoping someone can help. I tried creating a rule that matches every URL except those in a nossl subfolder, as in this example:

    <rule name="HTTP to HTTPS redirect" enabled="true" stopProcessing="true">
      <match url="(/nossl/.*)" negate="true" />
      <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
        <add input="{HTTPS}" pattern="off" />
      </conditions>
      <action type="Redirect" url="https://{HTTP_HOST}/{R:1}" redirectType="Found" />
    </rule>

But this doesn't work. Can anyone help?
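
Two details of URL Rewrite are worth checking here. First, for a rule defined at site level, the match pattern is tested against the URL path without its leading slash (e.g. "content/default.aspx"), so "(/nossl/.*)" never matches and the negated rule fires for every URL. Second, a negated pattern by definition did not match, so {R:1} has no capture to refer to. A commonly suggested shape is to match "(.*)" and move the exclusion into a negated condition on {REQUEST_URI} with a pattern like "^/nossl/". The Python sketch below only models that decision logic so it can be checked quickly. It is not IIS configuration, the function name is hypothetical, and query strings are left out for brevity.

```python
import re
from typing import Optional

# Case-insensitive, like URL Rewrite's default; tested against the full path
# including its leading slash, the way {REQUEST_URI} would be.
EXCLUDED = re.compile(r"^/nossl/", re.IGNORECASE)


def https_redirect_target(https_on: bool, host: str, path: str) -> Optional[str]:
    """Location to redirect to, or None to serve the request unchanged."""
    if https_on:
        return None  # condition {HTTPS} = "off" fails: already on HTTPS
    if EXCLUDED.match(path):
        return None  # negated exclusion condition: /nossl/ stays on plain HTTP
    # Equivalent of match url="(.*)" plus "https://{HTTP_HOST}/{R:1}", where {R:1}
    # is the path without its leading slash.
    return f"https://{host}/{path.lstrip('/')}"
```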

Authorization pop-up requested by http://localhost:51675 every time I run Firefox

- by user10711

I'm using Ubuntu 10.04. Whenever I run Firefox, I get a pop-up requesting authorization. It says 'A user name and password are being requested by http://localhost:51675. The site says: "server"'. I have tried every password I know and none is accepted. If I click 'Cancel', it disappears but reappears after about 5 minutes. The whole 'experience' is accompanied by a great deal of hard disk activity. Can anyone help with this?

I used a 301 Permanent Redirect to a 3rd party site by mistake! Can I stop the redirection?

- by Dees

Oh noes! I've been parking a domain name for a friend/client of mine on my hosting provider (Dreamhost, FWIW) for a while, and they eventually asked me to redirect their domain to a third-party website that is currently featuring some relevant promotional content. Once this period ends, we will probably go ahead and set up a proper website for the domain on my hosting account. I used Dreamhost's "redirect" hosting option in their domain configuration panel, not realizing that it would implement a 301 permanent redirect, or what the implications were. Now it seems that for any client that has visited the site recently, the 301 redirect is still cached and in effect, even though I have changed the domain settings back to regular Dreamhost full hosting. It seems the only thing that can be done is to wait out the TTL/cache expiration for the redirect. I have no idea how long that might be, so I'm wondering if there is any good way to bust the cached redirect or otherwise undo its long-term effects. I put a simple HTML meta refresh in the domain folder in place of the 301 to keep the intended functionality, but I'm still not able to access the domain's other content normally, even via FTP, etc. Isn't there anything I can do? Otherwise, how long does it take for a cached redirect to expire? It's gonna be a bummer if it's really permanent.

How do I deal with content scrapers? [closed]

- by aem

Possible Duplicate: How to protect SHTML pages from crawlers/spiders/scrapers? My Heroku (Bamboo) app has been getting a bunch of hits from a scraper identifying itself as GSLFBot. Googling that name turns up various results from people who've concluded that it doesn't respect robots.txt (e.g., http://www.0sw.com/archives/96). I'm considering updating my app to keep a list of banned user agents, serving all requests from those user agents a 400 or similar, and adding GSLFBot to that list. Is that an effective technique, and if not, what should I do instead? (As a side note, it seems odd for an abusive scraper to have a distinctive user agent.)
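
A user-agent blocklist is straightforward to sketch as middleware. Below is a Python WSGI version. Bamboo was a Ruby stack, where the same idea would be written as Rack middleware. The token list and the choice of 403 rather than 400 are assumptions; 403 fits a well-formed request that you simply refuse. The client supplies the User-Agent header itself, so this only stops scrapers that keep announcing themselves.

```python
def block_user_agents(app, banned_tokens=("gslfbot",), status="403 Forbidden"):
    """Wrap a WSGI app; requests whose User-Agent contains a banned token get `status`."""
    tokens = tuple(t.lower() for t in banned_tokens)

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in tokens):
            body = (status + "\n").encode("ascii")
            start_response(status, [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    body = b"hello\n"
    start_response("200 OK", [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))])
    return [body]


protected = block_user_agents(hello_app)
```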

HTTP(S) based file server

- by Michael

I've got a server running Ubuntu 10.04, with OpenSSH already set up for SSH and SFTP. I've been looking for a web-based (HTTP, or preferably HTTPS) file server, perhaps a web front end to an (S)FTP server, that gives access to a specific folder and also allows uploads. It needs user authentication, preferably using PAM. This web-based solution is for users who are not allowed to use FTP software or browser extensions and don't have Flash/Java browser plugins in their corporate environments. So far I have looked into:
- Webmin: includes a file manager, but it uses Java, and I'm looking for a plugin-free implementation.
- Apache2: I was able to set up HTTPS and PAM authentication, but the bare-bones setup doesn't include file upload (as far as I'm aware).
- HFS: haven't tried it, because it is Windows/Wine only and I don't want to run it under Wine.

HTTP Session Invalidation in Servlet/GlassFish

- by reza_rahman

HTTP session invalidation is something most of us take for granted and don't think much about. However, for security- and performance-sensitive applications, it is helpful to have at least a basic understanding of how it works in Servlets. In a brief, code-centric blog post, Servlet specification lead Shing Wai Chan introduces the APIs for session invalidation and explains how you can fine-tune the underlying reaper thread for session invalidation in GlassFish 4 when needed. Don't hesitate to post a question here if the blog is not clear; this is a relatively esoteric topic...

SSL issue and redirects from https to http

- by Asghar

I have a site, www.example.com, for which I purchased and installed an SSL certificate, and it was working fine. I also have a subdomain, app.example.com, which was not on SSL. Both www.example.com and app.example.com are on the same IP address. Later we decided to put SSL only on app.frostbox.com, so I configured SSL for app.frostbox.com and it worked fine. Now the issue is that Google is indexing my site as https://www.example.com/, and when users visit it, an invalid-certificate security warning is shown; when users accept the warning, they are shown my app.example.com content. Note: my SSL configuration is in /etc/httpd/conf.d/ssl.conf; its contents are here: http://pastebin.com/GCWhpQJq NOTE: I tried solutions in .htaccess, such as 301 redirects, but none of them worked.

PHP - Internal APIs/Libraries - What makes sense?

- by Mark Locker

I've been having a discussion lately with some colleagues about the best way to approach a new project, and thought it'd be interesting to get some external thoughts thrown into the mix. Basically, we're redeveloping a fairly large site (written in PHP) and have differing opinions on how the platform should be set up.

Requirements: The platform will need to support multiple internal websites, as well as external (non-PHP) projects, which at the moment consist of a mobile app and a toolbar. We have no plans or need in the foreseeable future to open up an API externally (for use in products other than our own).

My opinion: We should have a library of well-documented native model classes which can be shared between projects. These models will represent everything in our database and can take advantage of object-oriented features such as inheritance, traits, magic methods, etc., as well as employing an ORM. We can then add an API layer on top of these models which accepts requests and routes them to the appropriate methods, translating the response so that it can be used platform-independently. The routing for each method can be set up as and when it's required.

Their opinion: We should have a single HTTP API which is used by all projects (internal PHP ones or otherwise).

My thoughts: To me, there are a number of issues with the HTTP-API-only approach:
- It will be very expensive performance-wise. One page request will result in several additional HTTP requests (which, although local, still need to be handled by Apache).
- You'll lose all of the best features PHP has for OO development, from simple inheritance to employing an ORM, which can save you writing a lot of code.
- For internal projects, the actual process makes me cringe. To get a user's name, for example, a request would go out of our box, over the LAN, back in, then run through a script which calls a method, JSON-encodes the output and feeds that back. That would then need to be JSON-decoded and presented as an array ready to use. Working with arrays, as opposed to objects, makes me sad in a modern PHP framework.

Their thoughts (and my responses):
- Having one way of doing things keeps things simple. My response: you'd only do things differently if you were using a different language anyway.
- It will become robust. My response: seeing as the API will run off the library of models, I think my option would be just as robust.

What do you think? I'd be really interested to hear the thoughts of others on this, especially as opinions on both sides are not founded on any past experience.
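
To make the "my opinion" side concrete, here is a minimal sketch of a shared model library with a thin API layer on top. It is written in Python rather than PHP only to keep the examples here in one language. Every name (User, UserRepository, make_api, the "user.get" method strings) is hypothetical. The sketch only illustrates the layering: internal code calls the models directly, and external clients go through JSON.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class User:
    id: int
    first_name: str
    last_name: str

    def display_name(self) -> str:
        return f"{self.first_name} {self.last_name}"


class UserRepository:
    """Stand-in for the shared, ORM-backed model library."""

    def __init__(self, users: list[User]):
        self._users = {u.id: u for u in users}

    def get(self, user_id: int) -> User:
        return self._users[user_id]


def make_api(repo: UserRepository):
    """Thin JSON layer over the models; routes are added only as clients need them."""
    routes = {
        "user.get": lambda p: asdict(repo.get(int(p["id"]))),
        "user.display_name": lambda p: repo.get(int(p["id"])).display_name(),
    }

    def handle(method: str, params: dict) -> str:
        if method not in routes:
            return json.dumps({"error": f"unknown method: {method}"})
        return json.dumps({"result": routes[method](params)})

    return handle


repo = UserRepository([User(1, "Ada", "Lovelace")])

# Internal projects call the models directly: no HTTP, no JSON round trip.
internal_name = repo.get(1).display_name()

# External clients (mobile app, toolbar) go through the API layer instead.
api = make_api(repo)
external_response = api("user.display_name", {"id": "1"})
```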

Significant number of non-HTTP requests hitting my site

- by Mark Westling

I'm seeing a significant number of non-HTTP requests hitting a site I just launched. They show up in the server (nginx) logs as non-ASCII and get rejected (correctly) with a 400 status. Here are some lines from the log:

    95.132.198.189 - - [09/Jan/2011:13:53:30 -0500] "œ$A\x10õœ²É9J" 400 173 "-" "-"
    79.100.145.126 - - [09/Jan/2011:13:57:42 -0500] "#§i²¸oYi á¹„\x13VJ—x·—œ\x04N \x1DÔvbÛè½\x10§¬\x1E0œ_^¼+\x09ÜÅ\x08DÌÃiJeT€¿æ]œr\x1EëîyIÐ/ßýúê5Ç¸" 400 173 "-" "-"
    79.100.145.126 - - [09/Jan/2011:13:58:33 -0500] "¯Ú%ø=Œ›D@\x12î½‰¼\x1C†ÄÀe\x015mˆàd˜Û%pÛÿ" 400 173 "-" "-"

What should I make of this? Is this some sort of scripted attack? Or could these be correct requests that have somehow been garbled? They're not affecting the performance of the site, and I'm not seeing any other signs of attacks (e.g., no strange POSTs), so at this point I'm more curious than afraid.

deny-uncovered-http-methods in Servlet 3.1

- by reza_rahman

Servlet 3.1 is a relatively minor release included in Java EE 7. However, the Java EE foundational API still contains some very important changes. One such set of features are the security enhancements done in Servlet 3.1 such as the new deny-uncovered-http-methods option. Servlet 3.1 co-spec lead Shing Wai Chan outlines the use case for the feature and shows you how to use it in a recent code example driven post. You can also check out the official specification yourself or try things out with the newly released Java EE 7 SDK.
