Robots.txt

robots.txt is the filename used for implementing the Robots Exclusion Protocol, a standard used by websites to indicate to visiting web crawlers and other web robots which portions of the website they are allowed to visit. The standard, developed in 1994, relies on voluntary compliance. Malicious bots can use the file as a directory of which pages to visit, though standards bodies discourage countering this with security through obscurity. Some archival sites ignore robots.txt. The standard was used in the 1990s to mitigate server overload. In the 2020s, websites began denying bots that collect information for generative artificial intelligence. The "robots.txt" file can be used in conjunction with sitemaps, a complementary robot inclusion standard for websites.
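
As a minimal sketch of how a crawler consumes the file, the following Python snippet uses the standard-library urllib.robotparser module to check whether a given user agent may fetch a path; the example.com URLs, the sample rules in the comment, and the "MyCrawler/1.0" user-agent string are placeholders.

    # Minimal sketch of checking robots.txt rules with Python's standard library.
    # The example.com URLs and the user-agent string are placeholders.
    from urllib.robotparser import RobotFileParser

    # A typical robots.txt might contain, for example:
    #   User-agent: *
    #   Disallow: /private/
    #   Sitemap: https://example.com/sitemap.xml
    parser = RobotFileParser()
    parser.set_url("https://example.com/robots.txt")
    parser.read()  # fetch and parse the file

    print(parser.can_fetch("MyCrawler/1.0", "https://example.com/private/page.html"))
    print(parser.can_fetch("MyCrawler/1.0", "https://example.com/index.html"))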

Search engine optimization

Search engine optimization (SEO) is the process of improving the quality and quantity of website traffic to a website or a web page from search engines. SEO targets unpaid search traffic (usually referred to as "organic" results) rather than direct traffic, referral traffic, social media traffic, or paid traffic. Unpaid search engine traffic may originate from a variety of kinds of searches, including image search, video search, academic search, news search, and industry-specific vertical search engines. As an Internet marketing strategy, SEO considers how search engines work, the computer-programmed algorithms that dictate search engine results, what people search for, the actual search queries or keywords typed into search engines, and which search engines are preferred by a target audience. SEO is performed because a website will receive more visitors from a search engine when it ranks higher on the search engine results page (SERP), with the aim of either converting the visitors or building brand awareness.

XML pipeline

In software, an XML pipeline is formed when XML (Extensible Markup Language) processes, especially XML transformations and XML validations, are connected. For instance, given two transformations T1 and T2, the two can be connected so that an input XML document is transformed by T1 and then the output of T1 is fed as input document to T2. Simple pipelines like the one described above are called linear; a single input document always goes through the same sequence of transformations to produce a single output document.
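
As a hedged illustration of a linear pipeline, the sketch below chains two XSLT transformations so that the output of the first becomes the input of the second; it assumes the third-party lxml library is installed, and the file names t1.xsl, t2.xsl, and input.xml are placeholders.

    # Minimal sketch of a linear XML pipeline, assuming the third-party lxml
    # library is installed; t1.xsl, t2.xsl, and input.xml are placeholder files.
    from lxml import etree

    t1 = etree.XSLT(etree.parse("t1.xsl"))   # first transformation
    t2 = etree.XSLT(etree.parse("t2.xsl"))   # second transformation

    doc = etree.parse("input.xml")
    intermediate = t1(doc)       # output of T1 ...
    result = t2(intermediate)    # ... becomes the input of T2

    print(etree.tostring(result, pretty_print=True).decode())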

OpenCms

OpenCms is an open-source content management system written in Java. It is distributed by Alkacon Software under the LGPL license. OpenCms requires a JSP Servlet container such as Apache Tomcat. It is a CMS application with a browser-based work environment, asset management, user management, workflow management, a WYSIWYG editor, internationalization support, content versioning, and many more features, including proxying of requests to another endpoint. OpenCms was launched in 1999, based on its closed-source predecessor MhtCms. The first open-source version was released in 2000. OpenCms is used or has been used by large organizations such as WIPO, the LGT Group, the University of Stuttgart, the Archdiocese of Cologne, and the Chicago Mercantile Exchange.

Web development

Web development is the work involved in developing a website for the Internet (World Wide Web) or an intranet (a private network). Web development can range from developing a simple single static page of plain text to complex web applications, electronic businesses, and social network services. A more comprehensive list of tasks to which Web development commonly refers may include Web engineering, Web design, Web content development, client liaison, client-side/server-side scripting, Web server and network security configuration, and e-commerce development. Among Web professionals, "Web development" usually refers to the main non-design aspects of building Web sites: writing markup and coding. Web development may use content management systems (CMS) to make content changes easier and available with basic technical skills. For larger organizations and businesses, Web development teams can consist of hundreds of people (Web developers) and follow standard methods like Agile methodologies while developing Web sites. Smaller organizations may only require a single permanent or contracting developer, or secondary assignment to related job positions such as a graphic designer or information systems technician. Web development may be a collaborative effort between departments rather than the domain of a designated department. There are three kinds of Web developer specialization: front-end developer, back-end developer, and full-stack developer. Front-end developers are responsible for behavior and visuals that run in the user browser, while back-end developers deal with the servers. Since the commercialization of the Web, the industry has boomed and has become one of the most used technologies ever.

Apache Cocoon

Apache Cocoon, usually abbreviated as Cocoon, is a web application framework built around the concepts of pipelines, separation of concerns, and component-based web development. The framework focuses on XML and XSLT publishing and is built using the Java programming language. Cocoon's use of XML is intended to improve compatibility of publishing formats, such as HTML and PDF. The content management systems Apache Lenya and Daisy have been created on top of the framework. Cocoon is also commonly used as a data warehousing ETL tool or as middleware for transporting data between systems.

Open Archives Initiative Protocol for Metadata Harvesting

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a protocol developed for harvesting metadata descriptions of records in an archive so that services can be built using metadata from many archives. An implementation of OAI-PMH must support representing metadata in Dublin Core, but may also support additional representations. The protocol is usually just referred to as the OAI Protocol. OAI-PMH uses XML over HTTP. Version 2.0 of the protocol was released in 2002; the document was last updated in 2015 and is licensed under a Creative Commons BY-SA license.
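
A minimal sketch of a harvesting request is shown below, using only the Python standard library; the repository base URL is a placeholder, while the ListRecords verb and the oai_dc (Dublin Core) metadata prefix are defined by the protocol itself.

    # Minimal sketch of an OAI-PMH ListRecords request over HTTP, using only the
    # Python standard library. The base URL below is a placeholder repository.
    from urllib.parse import urlencode
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    BASE_URL = "https://example.org/oai"   # hypothetical OAI-PMH endpoint
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}

    with urlopen(f"{BASE_URL}?{urlencode(params)}") as response:
        tree = ET.parse(response)

    ns = {"oai": "http://www.openarchives.org/OAI/2.0/"}
    for record in tree.findall(".//oai:record", ns):
        identifier = record.find(".//oai:identifier", ns)
        print(identifier.text if identifier is not None else "(no identifier)")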

Keyhole Markup Language

Keyhole Markup Language (KML) is an XML notation for expressing geographic annotation and visualization within two-dimensional maps and three-dimensional Earth browsers. KML was developed for use with Google Earth, which was originally named Keyhole Earth Viewer. It was created by Keyhole, Inc, which was acquired by Google in 2004. KML became an international standard of the Open Geospatial Consortium in 2008. Google Earth was the first program able to view and graphically edit KML files, but KML support is now available in many GIS software applications, such as Marble, QGIS, and ArcGIS.
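
As a small illustration, the sketch below builds a minimal KML document containing a single Placemark using Python's standard ElementTree module; the placemark name and coordinates are arbitrary example values.

    # Minimal sketch: building a one-Placemark KML document with the Python
    # standard library. The placemark name and coordinates are example values.
    import xml.etree.ElementTree as ET

    KML_NS = "http://www.opengis.net/kml/2.2"
    ET.register_namespace("", KML_NS)

    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = "Example point"
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML coordinates are written as longitude,latitude[,altitude]
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = "-122.08,37.42,0"

    ET.ElementTree(kml).write("example.kml", xml_declaration=True, encoding="UTF-8")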

Site map

A sitemap is a list of pages of a web site within a domain. There are three primary kinds of sitemap: sitemaps used during the planning of a website by its designers; human-visible listings, typically hierarchical, of the pages on a site; and structured listings intended for web crawlers such as search engines.

Sitemaps

Sitemaps is a protocol in XML format meant for a webmaster to inform search engines about URLs on a website that are available for web crawling. It allows webmasters to include additional information about each URL: when it was last updated, how often it changes, and how important it is in relation to other URLs of the site. This allows search engines to crawl the site more efficiently and to find URLs that may be isolated from the rest of the site's content. The Sitemaps protocol is a URL inclusion protocol and complements robots.txt, a URL exclusion protocol.
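
The sketch below writes a small sitemap.xml with Python's standard library; the example.com URLs and per-URL values are placeholders, while loc, lastmod, changefreq, and priority are elements defined by the protocol (only loc is required).

    # Minimal sketch of generating a sitemap.xml for the Sitemaps protocol with
    # the Python standard library. The example.com URLs are placeholders.
    import xml.etree.ElementTree as ET

    SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
    ET.register_namespace("", SITEMAP_NS)

    pages = [
        {"loc": "https://example.com/", "lastmod": "2024-01-15",
         "changefreq": "daily", "priority": "1.0"},
        {"loc": "https://example.com/about", "lastmod": "2023-11-02",
         "changefreq": "monthly", "priority": "0.5"},
    ]

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        for tag, value in page.items():
            ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = value

    ET.ElementTree(urlset).write("sitemap.xml", xml_declaration=True, encoding="UTF-8")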

Google Base

Google Base was a database provided by Google which allowed users to add content such as text, images, and structured information in formats such as XML, PDF, Excel, RTF, or WordPerfect. Google Base was launched in 2005 and downgraded to Google Merchant Center in September 2010. If Google found user-added content relevant, submitted content appeared on its shopping search engine, Google Maps, or even the web search. The piece of content could then be labeled with attributes like the ingredients for a recipe or the camera model for stock photography. Because information about the service was leaked before public release, it generated much interest in the information technology community prior to release. Google subsequently responded on their blog with an official statement: "You may have seen stories today reporting on a new product that we're testing, and speculating about our plans. Here's what's really going on. We are testing a new way for content owners to submit their content to Google, which we hope will complement existing methods such as our web crawl and Google Sitemaps. We think it's an exciting product, and we'll let you know when there's more news." Files could be uploaded to the Google Base servers by browsing the user's computer or the web, by various FTP methods, or by API coding. Online tools were provided to view the number of downloads of the user's files, and other performance measures. On December 17, 2010, it was announced that Google Base's API would be deprecated in favor of a set of new APIs known as Google Shopping APIs.

Resources of a Resource

Resources of a Resource (ROR) is an XML format for describing the content of an internet resource or website in a generic fashion so this content can be better understood by search engines, spiders, web applications, etc. The ROR format provides several pre-defined terms for describing objects like sitemaps, products, events, reviews, jobs, classifieds, etc. The format can be extended with custom terms. RORweb.com is the official website of ROR; the ROR format was created by AddMe.com as a way to help search engines better understand content and meaning. Similar concepts, like Google Sitemaps and Google Base, have also been developed since the introduction of the ROR format. ROR objects are placed in an ROR feed called ror.xml. This file is typically located in the root directory of the resource or website it describes. When a search engine like Google or Yahoo searches the web to determine how to categorize content, the ROR feed allows the search engine's "spider" to quickly identify all the content and attributes of the website. This has three main benefits: it allows the spider to correctly categorize the content of the website into its engine; it allows the spider to extract very detailed information about the objects on a website (sitemaps, products, events, reviews, jobs, classifieds, etc.); and it allows the website owner to optimize the site for inclusion of its content into the search engines.

Google Data Protocol

GData (Google Data Protocol) provides a simple protocol for reading and writing data on the Internet, designed by Google. GData combines common XML-based syndication formats (Atom and RSS) with a feed-publishing system based on the Atom Publishing Protocol, plus some extensions for handling queries. It relies on XML or JSON as a data format. According to the Google Developers portal, "The Google Data Protocol is a REST-inspired technology for reading, writing, and modifying information on the web. It is used in some older Google APIs." However, "Most Google APIs are not Google Data APIs." Google provides GData client libraries for Java, JavaScript, .NET, PHP, Python, and Objective-C.
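
Because GData feeds are built on Atom, the hedged sketch below reads entry titles from an Atom feed using only the Python standard library; the feed URL is a placeholder, and any authentication a real GData service would require is omitted.

    # Hedged sketch: reading entry titles from an Atom-based feed with the Python
    # standard library. The feed URL is a placeholder, and the authentication a
    # real GData service would require is omitted.
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    FEED_URL = "https://example.com/feeds/items"   # hypothetical GData-style feed

    with urlopen(FEED_URL) as response:
        tree = ET.parse(response)

    ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}
    for entry in tree.findall("atom:entry", ATOM_NS):
        title = entry.find("atom:title", ATOM_NS)
        print(title.text if title is not None else "(untitled)")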

Bing Webmaster Tools

Bing Webmaster Tools (previously the Bing Webmaster Center) is a free service, part of Microsoft's Bing search engine, which allows webmasters to add their websites to the Bing index crawler and see their site's performance in Bing (clicks, impressions), among other features. The service also offers tools for webmasters to troubleshoot the crawling and indexing of their website, including submission of new URLs, sitemap creation, submission and ping tools, website statistics, consolidation of content submission, and new content and community resources.

PowerMapper

PowerMapper is a web crawler that automatically creates a site map of a website using thumbnails from each web page.

Canonical link element

A canonical link element is an HTML element that helps webmasters prevent duplicate content issues in search engine optimization by specifying the "canonical" or "preferred" version of a web page. It is described in RFC 6596, which was published in April 2012.
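
For illustration, a canonical link element takes the form <link rel="canonical" href="https://example.com/page"/> in the document head; the sketch below extracts that URL from an HTML page with Python's standard html.parser module, using placeholder sample markup.

    # Minimal sketch: extracting the canonical URL from an HTML document with the
    # Python standard library. The sample markup below is a placeholder.
    from html.parser import HTMLParser

    class CanonicalFinder(HTMLParser):
        def __init__(self):
            super().__init__()
            self.canonical = None

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "link" and attrs.get("rel") == "canonical":
                self.canonical = attrs.get("href")

    html = """<html><head>
    <link rel="canonical" href="https://example.com/page"/>
    </head><body>Duplicate copy of the page</body></html>"""

    finder = CanonicalFinder()
    finder.feed(html)
    print(finder.canonical)   # -> https://example.com/page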

XenForo

XenForo is an Internet forum software package written in the PHP programming language. The software is developed by former vBulletin lead developers Kier Darby and Mike Sullivan. The first public beta of XenForo was released in October 2010, and the first stable version followed on March 8, 2011. The program includes several search engine optimization (SEO) features. On November 12, 2014, Chris Deeming joined the development team. One of his products, Xen Media Gallery, now XenForo Media Gallery, joined the XenForo family of products.

Yesod (web framework)

Yesod (Hebrew pronunciation: [jeˈsod]; Hebrew: יְסוֺד, "Foundation") is a web framework based on the programming language Haskell for productive development of type-safe, high-performance web applications following the representational state transfer (REST) model, where uniform resource locators (URLs) identify resources and Hypertext Transfer Protocol (HTTP) methods identify transitions. It was developed by Michael Snoyman et al. and is free and open-source software released under an MIT License. Yesod is based on templates that generate instances for listed entities and dynamic content process functions, using Template Haskell constructs to host embedded domain-specific language (eDSL) content templates called QuasiQuotes, where the content is translated into code expressions by metaprogramming instructions. There are also web-like language snippet templates that admit code expression interpolations, making them fully type-checked at compile time. Yesod divides its functions into separate libraries (database, HTML rendering, forms, etc.) so functions may be used as needed.

Hreflang

The rel="alternate" hreflang="x" link attribute is an HTML meta element described in RFC 8288. Hreflang specifies the language and optional geographic restrictions for a document. Hreflang is interpreted by search engines and can be used by webmasters to clarify the lingual and geographical targeting of a website.
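
As a simple illustration, the sketch below prints rel="alternate" hreflang link elements for a few language and region variants of a page; the example.com URLs and language codes are example values, and x-default is a commonly used fallback value.

    # Minimal sketch: generating rel="alternate" hreflang link elements for the
    # head of an HTML page. The example.com URLs and language codes are examples.
    variants = {
        "en": "https://example.com/en/",
        "de": "https://example.com/de/",
        "en-GB": "https://example.com/uk/",
        "x-default": "https://example.com/",   # commonly used fallback value
    }

    for lang, url in variants.items():
        print(f'<link rel="alternate" hreflang="{lang}" href="{url}"/>')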

ASP.NET Web Forms

ASP.NET Web Forms is a web application framework and one of several programming models supported by the Microsoft ASP.NET technology. Web Forms applications can be written in any programming language which supports the Common Language Runtime, such as C# or Visual Basic. The main building blocks of Web Forms pages are server controls, which are reusable components responsible for rendering HTML markup and responding to events. A technique called view state is used to persist the state of server controls between normally stateless HTTP requests. Web Forms was included in the original .NET Framework 1.0 release in 2002 (see .NET Framework version history and ASP.NET version history), as the first programming model available in ASP.NET. Unlike newer ASP.NET components, Web Forms is not supported by ASP.NET Core.