
Technical tip: problems in your feeds


Some blogs have problems that keep them from showing up in the blog aggregator. Explaining the nature of these problems is a long topic: it would mean explaining what the web, HTML, XML, Atom and RSS are, and why standards matter on the web.

Maybe I'll write something simplified about all of that later. What matters right now is that anyone who writes in Microsoft Word or a similar program and then does a copy and paste into their blog is wrecking the blog, because Microsoft's programmers are bastards who don't care about standards and are determined to ruin the web and trash it.

Anyone who wants to know whether their feed conforms to the standards or not can use the feed validator.

And this is just a sample. You'll find that all the problems come from tags like o:p and st:city and things like that, inventions courtesy of Microsoft.
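To get a rough idea of why an aggregator chokes on these tags, here is a minimal sketch in Python. It is not what the feed validator does internally; it only checks that the XML parses at all. The entry text is a made-up example, and `is_well_formed` is a hypothetical helper name. The `o:` prefix is never declared, so a strict XML parser rejects the whole entry.

```python
import xml.etree.ElementTree as ET

# Made-up Atom entry with Word-style markup pasted into the content.
# The o: prefix is never declared, so this is not well-formed XML.
BROKEN_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example post</title>
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <p>Hello<o:p></o:p></p>
    </div>
  </content>
</entry>"""


def is_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


ok, error = is_well_formed(BROKEN_ENTRY)
print(ok, error)  # ok is False; the message mentions an unbound prefix
```

A feed that passes this check can still fail the feed validator, which checks much more than well-formedness.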

As much as the problem is caused by Microsoft's programs, blogspot could still have shielded you from it. Every decent blogging tool helps you produce acceptable code and filters out Microsoft's junk.

sample of error messages I get when trying to aggregate posts.

links point directly to feedvalidator reports

Comments

So am I OK, doctor?

You're fine for now, but you have to save yourself and switch to Drupal right away, and while you're at it learn HTML and CSS.

So how do we move to Drupal?

Do we change the blog, or move the whole blog?

Please explain a bit more.

Moving to another blog management system means leaving blogspot completely and going to another service.

The advice isn't to move to Drupal; the advice is to move to WordPress or Drupal.

And WordPress may be closer and easier for a lot of people, and blogsome offers a free service built on WordPress.

The real problem, of course, is moving the old posts and comments. I've almost finished a script that moves blogspot data to Drupal, for the free Drupal service I offer.

But that was four days before the referendum. Since the referendum, of course, I haven't had time for anything except demonstrations and misbehaving.

Anyway, I'll get back to working on this soon, and you should nag Alif to make you a similar script for WordPress.

Why Drupal of all things???????

Why not anything else ... isn't this a kind of creating standards, and isn't that against open source???

I actually think that with diversity you get a real opposition ... but with uniformity it becomes a line dictated by you, boss .... hahaha

Where is Drupal in this post? I didn't mention it at all; the talk was about standards that anyone can apply.

I only linked to WordPress.

Just a small comment: if you don't know how to use Word and why these tags are inserted, please shut your mouth and stop talking about things you don't know.

These tags are inserted in the HTML file to give you the chance to turn the HTML file back into a Word file and use the same features you originally used to create the page. When you are sure you need only the HTML code, you can use the Office HTML filter tool that comes with Office to get clean HTML code.

So being lazy to know about this must make you stop saying: "hey they dont know bla bla bla" Mr. Genius!!!!

you don't get it

it is a simple fact, this is not valid XML data

period.

by using these programs they got invalid HTML in their blogs and invalid Atom XML data. it does not validate, and since XML parsing software assumes strict adherence to standards (that's the whole point of XML really), it breaks everything for many parsers and hence for many applications that try to process their Atom feeds.

as I clearly said, the problem is so common that a decent CMS now has features to strip these tags instead of assuming people will come up with clean code.
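A rough idea of what such a stripping feature does, sketched in Python. This is not the code of any real CMS: `strip_word_tags` and the prefix list are assumptions for illustration (the post only names o:p and st:city), and a real filter would use a proper HTML parser rather than a regular expression.

```python
import re

# Assumed set of prefixes Word pastes in; only o: and st: come from the post.
WORD_PREFIXES = ("o", "st", "st1", "w", "v")

# Matches opening, closing and self-closing tags such as <o:p>, </st:city>.
_WORD_TAG = re.compile(
    r"</?(?:%s):[A-Za-z][\w.-]*(?:\s[^<>]*)?/?>" % "|".join(WORD_PREFIXES)
)


def strip_word_tags(html):
    """Drop Word-specific tags but keep whatever text they wrapped."""
    return _WORD_TAG.sub("", html)


dirty = "<p>I live in <st:city>Cairo</st:city>.<o:p></o:p></p>"
print(strip_word_tags(dirty))  # <p>I live in Cairo.</p>
```

The text inside the tags survives; only the non-standard wrappers go away, which is all the aggregator needs.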

I don't care why microsoft did it, the fact remains it breaks the XML according to the standard.

and proper namespace encapsulation would allow them to do the same without breaking the XML code, but that's beside the point.
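The namespace point can be shown with a small Python sketch. The snippets are made up, `parses` is a hypothetical helper, and the namespace URI is the one Office markup commonly uses for the `o:` prefix (treat it as an assumption here). The only difference between the two strings is the `xmlns:o` declaration.

```python
import xml.etree.ElementTree as ET

# Assumed URI for the o: prefix; any declared URI would make the XML parse.
OFFICE_NS = "urn:schemas-microsoft-com:office:office"

undeclared = "<p>Hello<o:p>note</o:p></p>"
declared = '<p xmlns:o="%s">Hello<o:p>note</o:p></p>' % OFFICE_NS


def parses(xml_text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses(undeclared))  # False: the o: prefix is unbound
print(parses(declared))    # True: same tags, but the prefix is declared

# A namespace-aware consumer can now recognise and skip the foreign element.
root = ET.fromstring(declared)
print(root[0].tag)  # {urn:schemas-microsoft-com:office:office}p
```

Declaring the prefix only makes the markup well-formed XML; whether a given feed or HTML flavour allows foreign elements there is a separate question.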

I didn't say they don't know bla bla, I said they deliberately break standards, which is even worse than claiming they are stupid. I'm actually claiming they are evil.

Besides, no one talked about saving except you. All Alaa talked about was copy & paste.

The fact that there's a feature to export clean HTML doesn't change the fact that you have junk in your clipboard when you copy from MS word.

Another point: if you want to preserve the features so you can reimport them, you can use XML/XSLT or whatever, not break a standard like this.

By the way, I'm 99% sure that you won't publish this, but it was just a message for you to know that you don't know many things and are just talking too much ;)

man I'm 99% sure it sucks to be you

you asshole whatever made you think I won't publish your fucking comment? where and when did I demonstrate a tendency to block certain speech?

in fact if you registered and logged in before posting the comment you wouldn't even need to wait for approval, the only reason we have comment moderation for non logged in users is to block spam (uses less resources than a spam filter).

I didn't make any claims about knowledge, I pointed to a feed validator where you can clearly see that some blog posts written in word break the XML and invalidate it due to non standard tags, there is no single mistake in this simple set of facts.

now go fuck yourself or something, it will be better use of your time.

Also, in new versions of Office like XP and 2003 it's not a separate tool any more: you can simply click File->Save As->Filtered HTML and you'll get W3C CSS and HTML code. Next time try to read first so you don't make yourself look like that ;)

it is nice to know that there is a way to get microsoft products to produce clean html, but that should not only be the default it should be the only option, anything else is plain stupidity.

and btw many in Egypt are still stuck at Windows 98

I don't need to read anything, I don't use M$ products and I don't plan to, I simply informed bloggers that by using M$ products to write for their blogs they break their HTML and XML, it is their job to find out a solution I proposed none, it wouldn't break anything if their blog was based on drupal for instance because it would strip away the rubbish.

and it is you and your fellow brain dead microsofties who look ridiculous when you tell people to follow an obscure step to get correct HTML.

It did that because Word is not intended mainly for making web pages. It's a word processing program, if you know the meaning of that term, and saving as an HTML page is an extra feature in any word processing program. There's another program named FrontPage, which is made for making web pages and generates clean HTML code by default.

This is a process known in software engineering as design (in case you know it well): you have a list of mandatory features and another of optional ones, and you can't ever throw out a mandatory feature because someone on earth wants the optional feature to be number one because it's the feature he's using!!!!

wake up mon, HTML is a standard, when you say you'll output HTML you should output HTML, you don't want to output HTML don't, you want to output something similar call it something else.

but when something says "save as html" it should do what it says and save as fucking HTML which is a strict standard.

besides who said anything about saving, we are talking about copy and paste here, you prepare in word, you copy, you paste in IE and you get these funky NON STANDARD tags that simply invalidate your website and XML feed.

babbling about Software engineering doesn't change anything, HTML is HTML, what word delivers in these cases is not HTML.

now if you want to do something useful go fix the bloody product or give people advice on how to avoid its shortcomings.

and in my world a feature that exists should work properly, if it doesn't then it is natural that people will criticize it, but you are being misleading anyway, we are not talking about a bug here, we are talking about a deliberate design decision, someone up there made a clear decision to ignore the standards, and the end user, the internet community at large and non M$ developers suffer because of it.

MESSAGE TO THE WORLD, THIS IS A M$ EMPLOYEE WHO IS ACTUALLY TRYING TO CONVINCE US THAT IT MAKES SENSE FOR THEM TO DELIBERATELY BREAK AND IGNORE INTERNET STANDARDS

Micro$oft is well known for ignoring standards. They have no problem spreading a product that claims to use standards yet actually fucks up the standards and causes damage to the internet, if this will ease their own life. This is no bloody software engineering. I call it a failure in software engineering, as they were not able to properly engineer their product.

When they say HTML, it's not really proper HTML.

When they say XML, it's not really proper XML.

If you must use a word processor, then I advise you to choose one that supports OpenDocument. It's the standard format for office applications, it's free, open to the public, and it's XML, which means your web applications (wordpress, drupal ..etc) will like it.

You can get a list of office applications that support open document here.

And so you can have all the facts and not to be fooled by Micro$oft, it's said that the new office 12 will use XML-based formats for its documents. But please, before taking any decision, go over google or that comparison or here to get some facts about that so called XML.

By the way, I believe that someone speaking in this way, with these dirty words, can't do anything in his life except blame everyone but himself, and really has no role except talking too much

good now go check your beliefs against reality and see if they hold. then go tell it to someone who cares.

Wowww, you've created a great fear inside me now, Ohhh my God, please save my soul!!

what fear? where did I try to intimidate or threaten you? where did I claim I could save your soul or that your soul needs saving?

do you even know what you are talking about?

what do you think you are achieving by commenting here exactly?

Nothing, I just want you to know a couple of things:

1- I don't care about you at all and never thought of convincing you of anything, because you mean nothing to me.

2- Before saying anything, please take a look at any software engineering book and don't try to talk about systems you don't know. In other words, being the guru of HTML and CSS etc. doesn't mean that you're the guru in other fields, and it seems you know nothing about software engineering.

That's all and that's my final comment, enjoy your great life!

so if you don't care about me, if you don't plan to convince me why are you posting here?

I never claimed to be a guru in anything, and I'll repeat: adherence to standards when it comes to shared spaces like the web is not optional. I can't see how a book on Software Engineering will convince me that it is ok to break web standards and cause grief to end users, the internet community and third party developers.

you are not making any coherent arguments, there is simply no excuse for deliberately breaking standards and still insisting on calling the result HTML.

now we are repeating ourselves here, I told you we are talking about copy and paste, I told you when something is called HTML it should be HTML, according to your own words the nonstandard tags were a deliberate design decision, I told you that breaking a standard is not excusable and I demonstrated how it makes things difficult for everyone involved, instead of responding to any of this you just talk about software engineering, fear, gurus and other rubbish which has no bearing on the topic.

which is why I asked you what exactly are you trying to achieve? if you are trying to help people because you think what I said was incorrect or misleading then you gotta respond to my counter arguments, if you are trying to convince me then you'd better start acting rationally.

if it is something else, just tell me what it is so we can stop wasting time.

The main problem with all of you is that you don't know what you're talking about. Let me say why:

1- Microsoft Word IS NOT DESIGNED TO OUTPUT HTML AT ALL, IT'S A WORD PROCESSING APPLICATION, and I'll tell you what I mean. If I design a new programming language with C-like syntax, and I put new objects inside it so that no C/C++ compiler can run it, then you can NEVER EVER tell me that I am not following the standards, simply because IT'S A NEW LANGUAGE. As an example, JAVA has C++-like syntax; can you claim that JAVA IS NOT FOLLOWING THE STANDARDS?!!!!! Word is the same: it's using HTML-like syntax in which A LOT OF NEW TAGS are used. THEY HAVE USED HTML-LIKE SYNTAX for the same reason the JAVA LANGUAGE DESIGNER USED C-LIKE SYNTAX.

2- Due to reason 1, and because IT'S A WORD PROCESSING APPLICATION, these new tags are needed, because normal HTML is not enough to handle a word processing application's needs. By the way, THERE'S AN APPLICATION NAMED OPEN WORD, DEVELOPED BY SUN UNDER THE OLD NAME OF OPEN OFFICE, AND IT'S FOLLOWING THE SAME STANDARDS BECAUSE THIS IS SIMPLE A,B,C SOFTWARE ENGINEERING.

3- There's another application in Office named FRONT PAGE, designed TO OUTPUT HTML.

I HOPE AGAIN TO READ MORE ABOUT SOFTWARE ENGINEERING AND SOFTWARE DESIGN, FOR ME I CAN'T SAY THAT BOEING IS BAD BEFORE I KNOW WHAT'S THE PLANE AND HOW IT'S EVEN FLYING!!!

will you stop repeating yourself.

your argument makes no sense. if a wordprocessor is not supposed to output html then it should not attempt to do so, if it does then it should do it properly, you want to invent a new format give it another name and don't use it to paste inside a webbrowser, these people are just doing copy and paste.

and this is the crux of the problem, if you check any javascript rich editor widget like TinyMCE or HTMLArea you'll find they have a special button to clean up after M$ word.

the reality of the webpublishing world is that way too many users will type in word then copy and paste into the browser no matter what you tell them, and this will either break websites or make the job of the webdeveloper much more difficult.

your answer as a fucking M$ employee is that the user is at fault, the user is stupid and using a wordprocessing application for something else, the user should stop using that and start using this other application (which will still output bad html).

complete rubbish. it doesn't take much to realize that office suites have to be web-ready these days, we are not in 1992 when no one had an internet connection.

but the thing you don't want to see is that word is broken deliberately, M$ breaks standards and builds products that integrate based on their broken data to give the impression that users can never seek a solution from any other vendor.

copy and paste from open office does not introduce non standard html.

and frontpage does not output standard html either.

and you are a moron, I'm glad people like you work in M$, might lead to its demise slightly earlier.

No, I'm just saying that you're a fucking jerk who still doesn't know what I mean, just a mindless jerk who has nothing to do in life other than complaining. Enjoy your wine, your fucking life ;)

For the second time, still the same opinion: just a jerk who has nothing to do except drinking wine, having sex and blaming the whole world ;)

By the way, Open Office will generate plain text if you copy from it and paste into Blogger. If this is your idea of standards, then again, nothing but a dummy jerk ;)

did you actually install OpenOffice and test?

I just did a test with open office version 1.1.5 and version 2.0 using the blogspot rich text widget, htmlarea and tinymce with Mozilla-firefox 1.5 as my browser. on a GNU/Linux machine of course.

and it worked flawlessly, producing reasonably good html.

so either something in your setup changes OpenOffice behavior, or you are just spreading FUD (more likely).

besides yes if you paste plain text and hit submit the result will be fairly standard html that will not break other software.

now I'm not sure how the clipboard in M$ land works, but here on linux when you paste the two apps communicate and agree on the most suitable mime format to use for pasting content.

if your argument is correct (it isn't cause the tags M$ adds are completely useless) all word needs to do is switch to proper html when it detects that the end application is a webbrowser or something that expects html.
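The negotiation idea can be sketched as a toy model in Python. This is not the real clipboard API of any desktop; `pick_format` and the format lists are invented for illustration, and the Word-specific type name is a placeholder.

```python
def pick_format(offered, accepted):
    """Return the first format the target prefers that the source also offers."""
    for mime in accepted:
        if mime in offered:
            return mime
    return None


# Hypothetical: the word processor offers its private format plus clean ones.
offered_by_source = ["application/x-hypothetical-word", "text/html", "text/plain"]

# Hypothetical: a browser's rich text widget, in order of preference.
accepted_by_browser = ["text/html", "text/plain"]

print(pick_format(offered_by_source, accepted_by_browser))  # text/html
```

In this model the private format never reaches the browser, because the browser never asks for it; the source keeps its round-trip data for targets that understand it.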

now go fuck yourself.